For the present problem, a compiled version of a binary search should work well. At least it would free you from post processing the output of Nearest
, which might be tedious.
cBinarySearchAfter = Compile[{{sortedlist, _Real, 1}, {y, _Real}},
Block[{a, fa, b, fb, c, fc},
a = 1;
b = Length[sortedlist];
fa = Compile`GetElement[sortedlist, a];
fb = Compile`GetElement[sortedlist, b];
If[y < fa, Return[1];];
If[y >= fb, Return[b + 1];];
(*Always fa<=y<fb;*)
While[b > a + 1,
c = a + Quotient[b - a, 2];
fc = Compile`GetElement[sortedlist, c];
If[fc <= y,
a = c; fa = fc;
,
b = c; fb = fc;
];
];
Return[b]
],
CompilationTarget -> "C",
RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"
];
Here a usage examle.
n = 1000000;
m = 1000000;
list = RandomReal[{0, 1}, n];
sortedlist = Join[{0.}, Sort[list], {1.}]; // AbsoluteTiming // First
xlist = RandomReal[{0, 1}, m];
pos = cBinarySearchAfter[sortedlist, xlist]; // AbsoluteTiming // First
And @@ (Thread[sortedlist[[pos - 1]] <= xlist])
And @@ (Thread[xlist < sortedlist[[pos]]])
0.076439
0.098603
True
True
It is only half as fast as the Nearest
approach without postprocessing:
nf = Nearest[Sort[list] -> "Index"]; // AbsoluteTiming // First
Flatten[nf[xlist]] + 1; // AbsoluteTiming // First
0.101369
0.057467
I think once written in C/C++ and properly parallelized, it would be on par with Nearest
or sightly faster.