Cassandra - secondary index and query performance
My table schema:
a)
    create table friend_list (
        userid uuid,
        friendid uuid,
        accepted boolean,
        ts_accepted timestamp,
        primary key ((userid, accepted), ts_accepted)
    ) with clustering order by (ts_accepted desc);
Here I am able to perform queries like:
    1. select * from friend_list where userid="---" and accepted=true;
    2. select * from friend_list where userid="---" and accepted=false;
    3. select * from friend_list where userid="---" and accepted in (true,false);
But the 3rd query involves more reads, so I tried to change the schema to:
b)
    create table friend_list (
        userid uuid,
        friendid uuid,
        accepted boolean,
        ts_accepted timestamp,
        primary key (userid, ts_accepted)
    ) with clustering order by (ts_accepted desc);
    create index on friend_list (accepted);
With the type b schema, the 1st and 2nd queries still work, and I can simplify the third query to:
    3. select * from friend_list where userid="---";
I believe the second schema gives better performance for the third query, since it won't do a condition check on every row.
Cassandra experts, please suggest the best schema for achieving this: a or b.
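First of all, are you aware that your second schema does not work at all like the first one? In the first one, the 'accepted' field is part of the key, but in the second it is not at all! You don't have the same uniqueness constraint, so you should check that this is not a problem for your model.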
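To make the difference concrete, here is a small sketch of what can happen under schema b. The UUIDs and the timestamp are made-up placeholder values, not data from the question. Because Cassandra inserts are upserts, two rows that share the same (userid, ts_accepted) collapse into one, whatever their 'accepted' value:

```sql
-- Schema b: primary key (userid, ts_accepted).
-- All literal values below are hypothetical placeholders.
INSERT INTO friend_list (userid, friendid, accepted, ts_accepted)
VALUES (11111111-1111-1111-1111-111111111111,
        22222222-2222-2222-2222-222222222222,
        false, '2015-06-01 12:00:00+0000');

-- Same userid and same ts_accepted: this overwrites the row above
-- instead of creating a second one.
INSERT INTO friend_list (userid, friendid, accepted, ts_accepted)
VALUES (11111111-1111-1111-1111-111111111111,
        33333333-3333-3333-3333-333333333333,
        true, '2015-06-01 12:00:00+0000');
```

Under schema a, the same two inserts land in different partitions, (userid, false) and (userid, true), so both rows are kept.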
Second, if you just want to avoid including the 'accepted' field in every request, you have 2 possibilities:
1 - You can use 'accepted' as a clustering column:
    primary key ((userid), accepted, ts_accepted)
This way your 3rd request can be:
    select * from friend_list where userid="---";
And you will get the same result more efficiently.
But this approach has a problem: it will create larger partitions, which is not the best for good performance.
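As a sketch of option 1, the full table could look like the following. The column list is taken from the question; the explicit clustering order and the UUID literal are assumptions added for illustration:

```sql
-- Option 1: 'accepted' becomes the first clustering column,
-- so one partition holds all friends of a user.
CREATE TABLE friend_list (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    PRIMARY KEY ((userid), accepted, ts_accepted)
) WITH CLUSTERING ORDER BY (accepted ASC, ts_accepted DESC);

-- All friends of a user (placeholder UUID), read from a single partition.
SELECT * FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111;

-- Only accepted friends: 'accepted' can still be restricted
-- because it is the first clustering column.
SELECT * FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111
  AND accepted = true;
```

Note that rows in the first query come back grouped by 'accepted' first and only then ordered by ts_accepted, so the result is not globally sorted by time.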
2 - Create 2 separate tables
This approach is more in keeping with the Cassandra spirit. With Cassandra it is not unusual to duplicate data if it can improve the efficiency of requests.
So in your case you would keep your first schema for the first table, and use it for your first and second requests.
You would then create another table with the same data but a slightly different schema: either with the secondary index, if 'accepted' does not need to be part of the primary key (as you did in your second schema), or with a primary key like this:
    primary key ((userid), accepted, ts_accepted)
I would prefer the secondary index for the second table if possible, because the accepted column has a low cardinality (2) and is thus well suited to secondary indexes.
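A minimal sketch of the two-table layout follows. The table names friend_list_by_status and friend_list_by_user are hypothetical, chosen only to tell the two copies apart, and the literal values are placeholders. The logged batch is one possible way to keep both copies in step and is an assumption, not something the original answer prescribes:

```sql
-- Table 1: schema a, serves requests 1 and 2.
CREATE TABLE friend_list_by_status (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    PRIMARY KEY ((userid, accepted), ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

-- Table 2: schema b, serves request 3.
CREATE TABLE friend_list_by_user (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    PRIMARY KEY (userid, ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

CREATE INDEX ON friend_list_by_user (accepted);

-- Every write goes to both tables.
BEGIN BATCH
    INSERT INTO friend_list_by_status (userid, friendid, accepted, ts_accepted)
    VALUES (11111111-1111-1111-1111-111111111111,
            22222222-2222-2222-2222-222222222222,
            true, '2015-06-01 12:00:00+0000');
    INSERT INTO friend_list_by_user (userid, friendid, accepted, ts_accepted)
    VALUES (11111111-1111-1111-1111-111111111111,
            22222222-2222-2222-2222-222222222222,
            true, '2015-06-01 12:00:00+0000');
APPLY BATCH;
```

The price of this design is that the application has to write every change twice, and changing 'accepted' in the first table means deleting the old row and inserting a new one, since the column is part of that table's partition key.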
Edit:
Also, you used a timestamp in your primary key. Be aware that this may be a problem if the same user can create 2 rows in this table, because a timestamp does not guarantee uniqueness: what happens if 2 rows are created in the same millisecond?
You should use a timeuuid. This type is commonly used in Cassandra and guarantees uniqueness by combining a timestamp and a UUID.
Furthermore, a timestamp in a primary key can create temporary hotspots on a Cassandra node, so it is better to avoid it.
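The timeuuid suggestion could be sketched as below, applied here to schema a. The column name ts_accepted is kept from the question, the UUID literals are placeholders, and toTimestamp() assumes Cassandra 2.2 or later (older versions offer dateOf() instead):

```sql
-- Same layout as schema a, with a timeuuid clustering column.
CREATE TABLE friend_list (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timeuuid,
    PRIMARY KEY ((userid, accepted), ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

-- now() generates a new, unique timeuuid on the coordinator.
INSERT INTO friend_list (userid, friendid, accepted, ts_accepted)
VALUES (11111111-1111-1111-1111-111111111111,
        22222222-2222-2222-2222-222222222222,
        true, now());

-- The time component can still be read back as a timestamp.
SELECT friendid, toTimestamp(ts_accepted)
FROM friend_list
WHERE userid = 11111111-1111-1111-1111-111111111111
  AND accepted = true;
```

Rows remain ordered by time within the partition, since timeuuid values sort by their embedded timestamp.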