sql - Cassandra - secondary index and query performance

- June 15, 2010

My table schema is:

a)

    create table friend_list (
        userid uuid,
        friendid uuid,
        accepted boolean,
        ts_accepted timestamp,
        primary key ((userid, accepted), ts_accepted)
    ) with clustering order by (ts_accepted desc);

Here I am able to perform queries like:

    1. select * from friend_list where userid="---" and accepted=true;
    2. select * from friend_list where userid="---" and accepted=false;
    3. select * from friend_list where userid="---" and accepted in (true,false);

But the 3rd query involves more reads, so I tried changing the schema like this:

b)

    create table friend_list (
        userid uuid,
        friendid uuid,
        accepted boolean,
        ts_accepted timestamp,
        primary key (userid, ts_accepted)
    ) with clustering order by (ts_accepted desc);

    create index on friend_list (accepted);

With the type b schema, the 1st and 2nd queries still work, and I can simplify the third query to:

    3. select * from friend_list where userid="---";

I believe the second schema gives better performance for the third query, as it won't do a condition check on every row.

Cassandra experts, please suggest which schema is best for achieving this: a or b.

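First of all, are you aware that your second schema does not work at all like the first one? In the first one the 'accepted' field is part of the key; in the second it is not at all! You don't have the same unique constraint, so you should check that this is not a problem for your model.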
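To make the difference in uniqueness concrete, here is a small sketch. The UUIDs and the timestamp are made-up values used only for illustration. Both statements share the same userid and ts_accepted and differ only in accepted:

```sql
-- Hypothetical values, for illustration only.
INSERT INTO friend_list (userid, accepted, ts_accepted, friendid)
VALUES (11111111-1111-1111-1111-111111111111, true,
        '2010-06-15 10:00:00+0000',
        22222222-2222-2222-2222-222222222222);

INSERT INTO friend_list (userid, accepted, ts_accepted, friendid)
VALUES (11111111-1111-1111-1111-111111111111, false,
        '2010-06-15 10:00:00+0000',
        33333333-3333-3333-3333-333333333333);

-- Schema a, key ((userid, accepted), ts_accepted):
--   the keys differ, so the table ends up with two rows.
-- Schema b, key (userid, ts_accepted):
--   the keys are identical, so the second insert overwrites
--   the first (an upsert) and only one row remains.
```

The same reasoning applies to changing a row from pending to accepted: with schema b it is an ordinary update of a regular column, while with schema a the accepted value is part of the key, so the old row has to be deleted and a new one inserted.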

Second, if you do not want to have to include the 'accepted' field in every request, you have 2 possibilities:

1 - You can use 'accepted' as a clustering column:

primary key ((userid), accepted, ts_accepted)

This way your 3rd request can be:

    select * from friend_list where userid="---";

and you get the same result more efficiently.

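But this approach has a problem: it will create larger partitions, because all rows of a user (accepted or not) now live in the same partition, which is not the best for performance.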
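As a sketch of this first possibility, the full table definition could look like the following. The clustering order on accepted and the sample UUID are assumptions added for the example:

```sql
CREATE TABLE friend_list (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    -- userid alone is the partition key; accepted and ts_accepted
    -- are clustering columns, in that order.
    PRIMARY KEY ((userid), accepted, ts_accepted)
) WITH CLUSTERING ORDER BY (accepted ASC, ts_accepted DESC);

-- All friends of a user, whatever their status (3rd request):
SELECT * FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111;

-- Only accepted friends (1st request), still a single-partition read:
SELECT * FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111
  AND accepted = true;
```

Note that rows in the partition are sorted by accepted first and by ts_accepted only within each accepted value, so the unfiltered query does not return one global newest-first list.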

2 - Create 2 separate tables

This approach fits the Cassandra spirit better. In Cassandra it is not unusual to duplicate data if it can improve the efficiency of requests.

So in your case you would keep your first schema for the first table, and use it for your first and second requests,

and create another table with the same data but a different schema, either with a secondary index if 'accepted' does not need to be part of the primary key (as you did in your second schema), or with a primary key like this:

primary key ((userid), accepted, ts_accepted)

I prefer the secondary index for the second table if possible, because the accepted column has low cardinality (2 values) and is therefore well suited to secondary indexes.
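A rough sketch of the two-table approach follows. The table names friend_list_by_status and friend_list_by_user are hypothetical, as are the sample values, and the logged batch is one possible way to keep the two copies in step, not something the approach requires:

```sql
-- Table 1: schema a, serves the 1st and 2nd requests.
CREATE TABLE friend_list_by_status (
    userid uuid, friendid uuid, accepted boolean, ts_accepted timestamp,
    PRIMARY KEY ((userid, accepted), ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

-- Table 2: same data keyed by user only, serves the 3rd request.
CREATE TABLE friend_list_by_user (
    userid uuid, friendid uuid, accepted boolean, ts_accepted timestamp,
    PRIMARY KEY (userid, ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

CREATE INDEX ON friend_list_by_user (accepted);

-- Write the same row to both tables.
BEGIN BATCH
    INSERT INTO friend_list_by_status (userid, accepted, ts_accepted, friendid)
    VALUES (11111111-1111-1111-1111-111111111111, true,
            '2010-06-15 10:00:00+0000',
            22222222-2222-2222-2222-222222222222);
    INSERT INTO friend_list_by_user (userid, accepted, ts_accepted, friendid)
    VALUES (11111111-1111-1111-1111-111111111111, true,
            '2010-06-15 10:00:00+0000',
            22222222-2222-2222-2222-222222222222);
APPLY BATCH;
```

The price of the duplication is that every write, update and delete has to be applied to both tables by the application.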

Edit:

Also, you used a timestamp in your primary key. Be aware that this may be a problem if the same user can create 2 rows in this table, because a timestamp does not guarantee uniqueness: what happens if 2 rows are created in the same millisecond?

You should use a timeuuid. This type is commonly used in Cassandra to guarantee uniqueness by combining a timestamp and a UUID.
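A minimal sketch of schema a with a timeuuid clustering column is shown below. The sample UUIDs are made up, and toTimestamp() assumes a reasonably recent Cassandra version (older releases offer dateOf() for the same purpose):

```sql
CREATE TABLE friend_list (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timeuuid,   -- was: timestamp
    PRIMARY KEY ((userid, accepted), ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

-- now() generates a new timeuuid on the coordinator, so two rows
-- written in the same millisecond still get distinct keys.
INSERT INTO friend_list (userid, accepted, ts_accepted, friendid)
VALUES (11111111-1111-1111-1111-111111111111, true, now(),
        22222222-2222-2222-2222-222222222222);

-- The time component can be read back as a regular timestamp.
SELECT friendid, toTimestamp(ts_accepted)
FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111
  AND accepted = true;
```

Rows are still returned newest first, since timeuuid values are ordered by their time component.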

Furthermore, a timestamp in the primary key can create temporary hotspots on a Cassandra node, so it is better to avoid it.
