sql - Cassandra - secondary index and query performance
My table schema:
a)
create table friend_list ( userid uuid, friendid uuid, accepted boolean, ts_accepted timestamp, primary key ((userid, accepted), ts_accepted) ) with clustering order by (ts_accepted desc);
Here I am able to perform queries like:
1. select * from friend_list where userid="---" and accepted=true;
2. select * from friend_list where userid="---" and accepted=false;
3. select * from friend_list where userid="---" and accepted in (true,false);
But the 3rd query involves more reads, so I tried changing the schema:
b)
create table friend_list ( userid uuid, friendid uuid, accepted boolean, ts_accepted timestamp, primary key (userid, ts_accepted) ) with clustering order by (ts_accepted desc);
create index on friend_list (accepted);
With the type b schema, the 1st and 2nd queries still work, and I can simplify the third query:
3. select * from friend_list where userid="---";
I believe the second schema gives better performance for the third query, as it won't do a condition check on every row.
Cassandra experts, please suggest the best schema for achieving this: a or b.
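First of all, are you aware that your second schema does not work the same way as the first one? In the first one the 'accepted' field is part of the key; in the second it is not at all! The two schemas do not have the same uniqueness constraint, so you should check that this is not a problem for your model.
Second, if you just want to avoid including the 'accepted' field in every request, you have 2 possibilities:
1 - You can use 'accepted' as a clustering column: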
primary key ((userid), accepted, ts_accepted)
This way your 3rd request can be:
select * from friend_list where userid="---";
and you get the same result more efficiently.
But this approach has a problem: it creates larger partitions, which is not the best for performance.
2 - Create 2 separate tables
This approach is more in keeping with the Cassandra spirit. In Cassandra it is not unusual to duplicate data if doing so improves the efficiency of requests.
So in your case you would keep your first schema for the first table, serving your first and second requests,
and create another table holding the same data with a different schema: either with a secondary index, if 'accepted' does not need to be part of the primary key (as you did in your second schema), or with a primary key such as:
primary key ((userid), accepted, ts_accepted)
I would prefer the secondary index for the second table if possible, because the 'accepted' column has low cardinality (2 values) and is therefore well suited to secondary indexes.
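The two-table approach could look roughly like the sketch below. The table names friend_list_by_status and friend_list_by_user are made up for illustration, the second table uses the primary-key variant rather than the secondary index, and the ? marks are bind markers for a prepared statement. Treat it as a starting point under those assumptions, not a tuned design.

```sql
-- Table 1: serves queries 1 and 2 (one partition per user and status).
CREATE TABLE friend_list_by_status (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    PRIMARY KEY ((userid, accepted), ts_accepted)
) WITH CLUSTERING ORDER BY (ts_accepted DESC);

-- Table 2: serves query 3 (one partition per user, both statuses).
CREATE TABLE friend_list_by_user (
    userid uuid,
    friendid uuid,
    accepted boolean,
    ts_accepted timestamp,
    PRIMARY KEY ((userid), accepted, ts_accepted)
) WITH CLUSTERING ORDER BY (accepted ASC, ts_accepted DESC);

-- Write the same row to both tables; a logged batch ensures that
-- either both inserts are eventually applied or neither is.
BEGIN BATCH
    INSERT INTO friend_list_by_status (userid, friendid, accepted, ts_accepted)
    VALUES (?, ?, ?, ?);
    INSERT INTO friend_list_by_user (userid, friendid, accepted, ts_accepted)
    VALUES (?, ?, ?, ?);
APPLY BATCH;

-- Queries 1 and 2 read table 1, query 3 reads table 2.
SELECT * FROM friend_list_by_status WHERE userid = ? AND accepted = true;
SELECT * FROM friend_list_by_user WHERE userid = ?;
```

Note that rows in friend_list_by_user come back ordered by 'accepted' first and by ts_accepted only within each status, so if you need a single list sorted by time you have to merge it on the client side.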
Edit:
Also, you used a timestamp in your primary key. Be aware that this may be a problem if the same user can create 2 rows in this table, because a timestamp does not guarantee uniqueness: what happens if 2 rows are created in the same millisecond? The second write would silently overwrite the first.
You should use a timeuuid instead. This type, commonly used in Cassandra, guarantees uniqueness by combining a timestamp with a UUID.
Furthermore, a timestamp in the primary key can create temporary hotspots on a Cassandra node, so it is better to avoid it.
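As a sketch of the timeuuid suggestion, schema a) could be adapted as follows. The column name id_accepted is hypothetical, and toTimestamp() assumes Cassandra 2.2 or later (older versions offer dateOf() instead).

```sql
-- Same layout as schema a), but the clustering column is a timeuuid,
-- so two rows created in the same millisecond no longer collide.
CREATE TABLE friend_list (
    userid uuid,
    friendid uuid,
    accepted boolean,
    id_accepted timeuuid,
    PRIMARY KEY ((userid, accepted), id_accepted)
) WITH CLUSTERING ORDER BY (id_accepted DESC);

-- now() generates a new, unique timeuuid on the coordinator node.
INSERT INTO friend_list (userid, friendid, accepted, id_accepted)
VALUES (?, ?, true, now());

-- The embedded timestamp can still be read back when needed.
SELECT friendid, toTimestamp(id_accepted) AS ts_accepted
FROM friend_list
WHERE userid = ? AND accepted = true;
```

Rows are still returned newest first, because timeuuid values sort by their time component.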