sql - Create bins with unique values in each bin


I want to bin a numeric column (var) in such a way that there is approximately the same number of rows in each bin. An additional requirement is that one (unique) value in the column cannot be assigned to more than one bin. For example, if the value 1 in column var is assigned to bin 1, it is not allowed to also assign the value 1 to bin 2.

I am aware of the functions ntile() and percent_rank(), but I do not see how they can be used for the task at hand.

    drop table if exists binme;
    create table binme (var numeric);

    insert into binme values
        (0), (0), (0),
        (1), (1), (1.5), (1.5),
        (2), (2), (2), (2.5),
        (3), (3), (3.5), (4.5),
        (5), (6), (7), (10), (11);

    select
        (var * 100)::int,
        ntile(5) over (order by var),
        percent_rank() over (order by var)
    from binme;

For example, if 5 bins are required, the result should be:

    var   ntile   required_bin
    0     1       1
    0     1       1
    0     1       1
    1     1       1
    1     2       1    has to be in bin 1
    1.5   2       2
    1.5   2       2
    2     2       2
    2     3       2
    2     3       2    has to be in bin 2
    2.5   3       3
    3     3       3
    3     4       3
    3.5   4       3
    4.5   4       4
    5     4       4
    6     5       4
    7     5       4
    10    5       5
    11    5       5

I somehow intuitively feel that it may be necessary to group by var first, count the number of rows for each value, and then use a recursive query to assign the bin to the original data. It should be possible to figure this out starting from the following:

    select
        var,
        cnt,
        sum(cnt) over (order by var) as nrows
    from
        (select var, count(*) as cnt from binme group by var) a;

If you are looking for an approximation (which ensures that the same values are placed in the same bucket), you can indeed use width_bucket as mentioned by @greg, but to balance the number of items per bucket, it has to be applied to the cumulated sum and not to the var value itself. Here is a demo (SQL Fiddle; an improved solution is described below):

    select
        o.var,
        width_bucket(o.cumsum, 1, o.cnt + 1, 5) as bucket
    from
        (select
            b.var,
            (select count(*) from binme t) as cnt,
            (select count(*) from binme t where t.var <= b.var) as cumsum
         from
            binme b
        ) o;

The cumulated sum (or cumulated count, to be more precise) is at least 1 (the minimum, inclusive), and the maximum (exclusive) is cnt + 1; the last argument of width_bucket specifies the number of buckets. The first bucket is 1, not 0 (subtract 1 to get a 0-based bucket number).
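To make the bounds concrete, here is a small check, assuming the 20-row sample table from the question, so that cnt + 1 = 21 and each of the 5 buckets covers 4 cumulative counts. The literal values and the alias `v(x)` are only for illustration.

```sql
-- Range [1, 21) split into 5 equal buckets of width 4 (PostgreSQL).
select
    x as cumsum,
    width_bucket(x, 1, 21, 5) as bucket
from (values (1), (4), (5), (20), (21)) as v(x);
-- cumsum  1 -> bucket 1  (lower bound is inclusive)
-- cumsum  4 -> bucket 1
-- cumsum  5 -> bucket 2
-- cumsum 20 -> bucket 5
-- cumsum 21 -> bucket 6  (at or above the exclusive upper bound)
```

A cumulative count of 21 cannot occur with 20 rows; it is included only to show why the upper bound has to be cnt + 1 rather than cnt. With cnt as the upper bound, the rows with the largest value would fall into an extra sixth bucket.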

Alternatively, you can take < instead of <= and set the range to [0, cnt), which is the better solution: SQL Fiddle.
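The linked fiddle is not reproduced here, so the following is only a sketch of what that variant might look like, derived from the first query by swapping the comparison and the bounds. The alias `below` is a made-up name for the number of rows strictly smaller than the current value.

```sql
-- Sketch of the "<" variant (PostgreSQL): count only rows strictly below
-- the current value, so the operand lies in [0, cnt) and all rows with
-- the same var still share one bucket.
select
    o.var,
    width_bucket(o.below, 0, o.cnt, 5) as bucket
from
    (select
        b.var,
        (select count(*) from binme t) as cnt,
        (select count(*) from binme t where t.var < b.var) as below
     from
        binme b
    ) o
order by o.var;
```

On the sample data this assigns 5, 5, 3, 3 and 4 rows to buckets 1 through 5, and the first two buckets match the required_bin column from the question. The `<=` version assigns 3, 4, 4, 5 and 4 rows, because each value is placed by the position of its last row rather than its first.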

