Getting random sample of rows from OmniSQL
Hi, I was wondering whether anyone found a way to get a sample that is not deterministic for each row. The normal solution of using rand() is not possible and the usage of sample_ratio() is deterministic for each row which disqualifies it for my purposes.
I would happy to hear any suggestions.
-
Hi @ldnker,
Thanks for joining our forum.
We haven't implemented the rand function yet, but you can re-use the procedure we use to distribute the point evenly in our point maps, making it randomly.
try this
for 0.1 percent
SELECT col1,col2,..., col2 FROM table WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) < 4294967296/1000 AND rowid != 0 LIMIT 1000;
for 1 percent
SELECT col1,col2,..., col2 FROM table WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) < 4294967296/100 AND rowid != 0 LIMIT 1000;
the LIMIT is optional, and with a little more complicated query you can get a proximate number of records close to the one needed.
so if you want around 1000 rows
SELECT col1,col2,..., col2 FROM table WHERE MOD( MOD (rowid, 2147483648) * extract(epoch from now()) , 4294967296) < (SELECT (4294967296/count(*))*1002 from table) AND rowid != 0 LIMIT 1000;
I used a projection query as an example, but it fits also for aggregate queries.
being based on epoch that's is returned in seconds, if you run the query in the same seconds, the same rows are returned, but you can change the filter adding in the mix microseconds or nanoseconds
something like EXTRACT(epoch FROM now()+EXTRACT(NANOSECOND from now())
Regards, Candido
-
The credits for the idea go to @todd, I just elaborated his idea.
to makes the syntax simpler, using UDFs would help, but as you can read in the docs, additional software is needed and I would check the performances of the implementation.
Please sign in to leave a comment.
Comments
3 comments