How to use data.table to select a subset of rows by rank?
I have a large data set, so I'd like to do this efficiently.
    > dt <- data.table(id=1:200, category=sample(letters, 200, replace=TRUE))
    > dt[,count:=length(id), by=category]
    > dt
          id category count
      1:   1        o    13
      2:   2        o    13
     ---
    199: 170        n     3
    200: 171        h     3
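As an aside, data.table's special symbol `.N` holds the number of rows in the current group, so the count column can be built without referring to `id` at all. A minimal sketch on the same toy data; the `set.seed(1)` call is an addition, there only to make the example reproducible:

```r
library(data.table)

set.seed(1)  # added only for reproducibility; not part of the original example
dt <- data.table(id = 1:200, category = sample(letters, 200, replace = TRUE))

# .N is the number of rows in the current group, so this gives the same
# per-category count as length(id)
dt[, count := .N, by = category]
```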
What I want to do is efficiently change the category to 'other' for every category that is not among the k most common ones. Something along the lines of:
    dt[rank > 5, category := "other", by = category]
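A minimal runnable sketch of that idea, assuming k = 5 and the same toy data (the seed is again added only for reproducibility). It counts rows per category with `.N`, takes the k largest groups, and relabels everything else by reference, so no `count` column is ever added to `dt`:

```r
library(data.table)

set.seed(1)  # added only for reproducibility
dt <- data.table(id = 1:200, category = sample(letters, 200, replace = TRUE))
k <- 5  # number of most common categories to keep (assumed value)

# One row per category with its size in column N; dt itself is not modified
counts <- dt[, .N, by = category]

# Sort by descending frequency and keep the first k category names.
# Ties at the k-th place are broken by the order in which categories first appear.
top <- head(counts[order(-N)], k)$category

# Relabel every row whose category is outside the top k, by reference
dt[!(category %chin% top), category := "other"]
```

This keeps exactly k categories. If tied categories at the cut-off should all survive, rank the counts with a dense rank instead of slicing the sorted table.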
I'm new to data.table, and I'm not quite sure how to compute the rank efficiently. Here's an approach that works, but it seems clunky:
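    counts <- unique(dt$count)                    # distinct group sizes
    decision <- max(counts[rank(-counts) > 5])    # largest count outside the top 5 distinct values
    dt[count <= decision, category := 'other']    # relabel every row at or below that count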
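For comparison, the same rule can likely be written as a single update using `frank()` with `ties.method = "dense"`, which ranks the distinct count values the same way `rank(-counts)` does on the `unique()` vector. A sketch under the same assumptions (toy data, added seed) that runs both versions and checks they relabel the same rows:

```r
library(data.table)

set.seed(1)  # added only for reproducibility
dt <- data.table(id = 1:200, category = sample(letters, 200, replace = TRUE))
dt[, count := .N, by = category]

# The approach from the question, applied to a copy for comparison
old <- copy(dt)
counts <- unique(old$count)
decision <- max(counts[rank(-counts) > 5])
old[count <= decision, category := "other"]

# Same rule in one step: a dense rank gives each distinct count value one rank
# (1 = most frequent), so rows ranked below the top 5 distinct counts are relabelled
dt[frank(-count, ties.method = "dense") > 5, category := "other"]
```

Because tied counts share a rank, both versions keep every category whose count is among the 5 largest distinct values, which can be more than 5 categories.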
I'd appreciate any advice. To be honest, I don't need the 'count' column if it isn't necessary.