r - Row index for a data.table "binary search" on a subset of columns


I have a large data set and need the row numbers of the rows that fulfill certain conditions. I am using the data.table package.

library(data.table)
days <- strptime(c("2013-01-01 8:00:00", "2013-02-01 8:00:00"), format="%Y-%m-%d %H:%M:%S")
datetime <- rep(seq(days[1], days[2], length.out=1e6/5), 5)
update <- rep(letters[3:1], length.out=1e6)
group <- rep(c("aaa", "bbb", "ccc"), length.out=1e6)
weight <- trunc(rnorm(1e6, 110, 3))
weight2 <- rnorm(1e6, 100, 1.5)
dt <- data.table(datetime, update, group, weight, weight2)
setkey(dt, datetime, update, group, weight, weight2)
# Example row whose non-datetime values should be looked up
exp <- dt[1e6/2]

I cannot create a data.table subset without the column datetime, since that column is used in the key. Creating a new key on the subset would change the row order, and I need to be certain that the original order is preserved.

It is possible to get the row numbers I need using either of the following two commands.

system.time(dt[, which(dt$update==exp$update & dt$group==exp$group & dt$weight==exp$weight & dt$weight2==exp$weight2)])
system.time(which(dt$update==exp$update & dt$group==exp$group & dt$weight==exp$weight & dt$weight2==exp$weight2))
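For comparison, the same kind of vector scan can also be written in `i` with `which = TRUE`, which asks data.table for the matching row numbers directly. The sketch below uses a small hypothetical table (`small`) and example row (`target`) rather than the 1e6-row `dt`, and only three columns, purely to show that the two forms return the same indices. It makes no claim about relative speed.

```r
library(data.table)

# Hypothetical stand-in for dt (names and values are illustrative only)
small <- data.table(
  update = c("a", "b", "a", "b"),
  group  = c("aaa", "bbb", "aaa", "bbb"),
  weight = c(110, 111, 110, 111)
)
target <- small[2]  # row whose values we want to look up

# Vector scan written with base which()
scan_rows <- which(small$update == target$update &
                   small$group  == target$group &
                   small$weight == target$weight)

# Same scan expressed in `i`; which = TRUE returns row numbers instead of rows
i_rows <- small[update == target$update &
                group  == target$group &
                weight == target$weight,
                which = TRUE]

print(scan_rows)
print(i_rows)
```

Both forms report rows 2 and 4 here. Either way, every row of the table is still compared against the target values.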

However, I need a faster way to do that.

Thank you for any suggestions.

It is also possible to get the row numbers in the following way.

which(is.na(dt[list(dt$datetime, dt$update, dt$group, dt$weight, exp$weight2), which=TRUE]) == FALSE)

However, this is 4 times slower than the vector search examples in the question.
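One way to work around the re-keying concern from the question is to record each row's original position in a column before building a second table keyed on the non-datetime columns. The sketch below is only an illustration under assumptions: `small`, `lookup`, `target` and `row_id` are hypothetical names, the table is tiny, and the cost of copying and keying the lookup table is not measured here, so it may or may not pay off on the real data.

```r
library(data.table)

# Hypothetical stand-in for dt, keyed with datetime first as in the question
small <- data.table(
  datetime = as.POSIXct("2013-01-01 08:00:00", tz = "UTC") + 0:5,
  update   = c("a", "b", "a", "b", "a", "b"),
  group    = c("aaa", "bbb", "aaa", "bbb", "ccc", "bbb"),
  weight   = c(110, 111, 110, 111, 110, 111)
)
setkey(small, datetime, update, group, weight)

# Copy the non-datetime columns and store each row's position in `small`
lookup <- small[, .(update, group, weight)]
lookup[, row_id := .I]

# Keying reorders `lookup` only; `small` keeps its original order
setkey(lookup, update, group, weight)

# Binary search on the re-keyed copy, then read back the original positions
target <- small[2]
rows <- sort(lookup[.(target$update, target$group, target$weight), row_id])
print(rows)
```

Because `row_id` travels with each row when `lookup` is sorted, the returned values refer to positions in the original table, here rows 2, 4 and 6.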

