SQL - Create bins with unique values in each bin
I want to bin a numeric column (var) so that each bin holds approximately the same number of rows. An additional requirement is that one (unique) value in the column cannot be assigned to more than one bin. For example, if the value 1 in column var is assigned to bin 1, it is not allowed to assign the value 1 to bin 2.
I am aware of the functions ntile() and percent_rank(), but I do not see how they could be used for the task at hand.
drop table if exists binme;
create table binme (var numeric);
insert into binme values (0), (0), (0), (1), (1), (1.5), (1.5), (2), (2), (2), (2.5), (3), (3), (3.5), (4.5), (5), (6), (7), (10), (11);
select (var * 100)::int, ntile(5) over (order by var), percent_rank() over (order by var) from binme;
For example, with 5 bins the required result would be:
var   ntile  required_bin
0     1      1
0     1      1
0     1      1
1     1      1
1     2      1     <- 1 has to be in bin 1
1.5   2      2
1.5   2      2
2     2      2
2     3      2
2     3      2     <- 2 has to be in bin 2
2.5   3      3
3     3      3
3     4      3
3.5   4      3
4.5   4      4
5     4      4
6     5      4
7     5      4
10    5      5
11    5      5
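The mismatches between ntile and required_bin all come from ties: ntile(5) cuts the 20 ordered rows into tiles of exactly 4 rows regardless of equal values. A quick check (a sketch in PostgreSQL syntax against the sample table; the alias tile is illustrative) lists every value whose rows straddle a tile boundary:

```sql
-- Sample data from the question.
drop table if exists binme;
create table binme (var numeric);
insert into binme values
  (0), (0), (0), (1), (1), (1.5), (1.5), (2), (2), (2),
  (2.5), (3), (3), (3.5), (4.5), (5), (6), (7), (10), (11);

-- ntile() assigns tiles by row position only, so rows with an equal
-- var can be split across two tiles. Keep only such values.
select var,
       array_agg(distinct tile order by tile) as tiles
from (
  select var, ntile(5) over (order by var) as tile
  from binme
) t
group by var
having count(distinct tile) > 1
order by var;
```

On the sample data this returns the values 1, 2 and 3, with tiles {1,2}, {2,3} and {3,4}. Since tiles depend only on row position, the result is the same however ties are ordered internally.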
I somehow intuitively feel that it may be necessary to group by var first, count the number of rows for each value, and then use a recursive query to assign bins to the original data. It should be possible to work this out from the following:
select var, cnt, sum(cnt) over (order by var) as nrows
from (select var, count(*) as cnt from binme group by var) a;
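If approximate balance is acceptable, the group-first idea can be finished without recursion. The sketch below (PostgreSQL; the fixed bin count 5 and the names grouped, numbered and nrows_before are illustrative, and this is a simple non-greedy approximation rather than an optimal split) decides the bin once per distinct value from the number of rows sorted before it, then joins the bin back to every row:

```sql
drop table if exists binme;
create table binme (var numeric);
insert into binme values
  (0), (0), (0), (1), (1), (1.5), (1.5), (2), (2), (2),
  (2.5), (3), (3), (3.5), (4.5), (5), (6), (7), (10), (11);

with grouped as (
  -- one row per distinct value with its row count
  select var, count(*) as cnt
  from binme
  group by var
), numbered as (
  -- nrows_before: rows with a strictly smaller var; total: all rows
  select var,
         sum(cnt) over (order by var) - cnt as nrows_before,
         sum(cnt) over () as total
  from grouped
)
-- The bin is computed per distinct value and joined back to each row,
-- so a value can never end up in two bins. 5 is the number of bins.
select b.var,
       floor(n.nrows_before * 5 / n.total)::int + 1 as bin
from binme b
join numbered n using (var)
order by b.var;
```

With the sample data this yields bins of 5, 5, 3, 3 and 4 rows (values 0 and 1 in bin 1, 1.5 and 2 in bin 2, 2.5 and 3 in bin 3, 3.5 to 5 in bin 4, 6 to 11 in bin 5).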
If you are looking for an approximation (which still ensures that equal values are placed in the same bucket), you can indeed use width_bucket, as mentioned by @greg. To balance the number of items per bucket, it has to be applied to the cumulated sum, not to the var value itself. Here is a demo (SQL Fiddle; an improved solution follows below):
select o.var,
       width_bucket(o.cumsum, 1, o.cnt + 1, 5) as bucket
from (
  select b.var,
         (select count(*) from binme t) as cnt,
         (select count(*) from binme t where t.var <= b.var) as cumsum
  from binme b
) o;
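The two correlated subqueries scan the table once per row. A sketch of the same bucketing with window functions instead (PostgreSQL; this rewrite is a substitution, not the query from the fiddle) relies on the default RANGE frame of count(*) over (order by var), which includes all peer rows with an equal var and therefore equals the t.var <= b.var count:

```sql
drop table if exists binme;
create table binme (var numeric);
insert into binme values
  (0), (0), (0), (1), (1), (1.5), (1.5), (2), (2), (2),
  (2.5), (3), (3), (3.5), (4.5), (5), (6), (7), (10), (11);

-- count(*) over (order by var): rows with var <= current var (peers included)
-- count(*) over ():             total number of rows (cnt)
select var,
       width_bucket(count(*) over (order by var), 1, count(*) over () + 1, 5) as bucket
from binme
order by var;
```

On the sample data this gives buckets of 3, 4, 4, 5 and 4 rows, the same assignment as the subquery version, and no value appears in two buckets.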
The cumulated sum (or, more precisely, the cumulated count) is at least 1 (the min, inclusive) and at most cnt, so the max (exclusive) is cnt + 1; the 3rd parameter specifies the number of buckets. The first bucket is 1 (not 0), so subtract 1 if you need a 0-based bucket number.
Alternatively, you can use < instead of <= and set the range to [0, cnt), which is a better solution: SQL Fiddle.
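A window-function sketch of this [0, cnt) variant (PostgreSQL; using rank() - 1 to count the rows with a strictly smaller var is a substitution for the t.var < b.var subquery, not necessarily what the linked fiddle does). rank() gives tied rows the same value, so equal values still share a bucket:

```sql
drop table if exists binme;
create table binme (var numeric);
insert into binme values
  (0), (0), (0), (1), (1), (1.5), (1.5), (2), (2), (2),
  (2.5), (3), (3), (3.5), (4.5), (5), (6), (7), (10), (11);

-- rank() - 1: number of rows with var strictly smaller than the current one,
-- which lies in [0, cnt) and matches width_bucket(..., 0, cnt, 5).
select var,
       width_bucket(rank() over (order by var) - 1, 0, count(*) over (), 5) as bucket
from binme
order by var;
```

On the sample data the buckets hold 5, 5, 3, 3 and 4 rows; value 1 now lands in bucket 1 together with 0, as in the required result.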