A specific practical scenario requiring fast GPU groupby queries
Hi there,
I have a production scenario where I need to process 10 billion rows (each with 10+ columns) with filtering and group-by aggregation within seconds. The uncompressed data size is estimated at 100+ GB.
To achieve this, I plan to purchase 10+ Azure P4 machines and distribute the workload across the VMs; an A100 is too expensive for my budget. The data is downloaded from S3, staged in CPU memory, and then copied to the GPU. The queries I most need to accelerate are group-by aggregations. The whole process runs only once, so the total overhead includes loading data into CPU memory, moving it to the GPU, and the computation itself.
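For concreteness, here is a rough sketch of the staging step I have in mind (buffer sizes and bandwidth figures are my own ballpark assumptions, not measurements): with ~100 GB sharded across 10+ VMs, each GPU receives a few GB, and pinned host memory plus an asynchronous copy should keep the transfer cost under a second per machine on PCIe 3.0 x16.

```cuda
// Sketch of the CPU -> GPU staging step, assuming the data has already
// been downloaded from S3 into host memory. Illustrative numbers:
// ~100 GB over PCIe 3.0 x16 (~12 GB/s effective) is roughly 8 s in
// total, i.e. under 1 s per VM once sharded across 10+ machines.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>

int main() {
    const size_t shard_bytes = 1ull << 30;  // 1 GB shard per GPU (illustrative)

    // Pinned (page-locked) host memory roughly doubles H2D throughput
    // versus pageable memory and lets cudaMemcpyAsync overlap the copy
    // with kernel execution on the same stream.
    char* host_buf = nullptr;
    cudaMallocHost((void**)&host_buf, shard_bytes);
    memset(host_buf, 0, shard_bytes);       // stand-in for the S3 download

    char* dev_buf = nullptr;
    cudaMalloc((void**)&dev_buf, shard_bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev_buf, host_buf, shard_bytes,
                    cudaMemcpyHostToDevice, stream);
    // ... enqueue the filter / group-by kernels on the same stream here ...
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dev_buf);
    cudaFreeHost(host_buf);
    printf("shard staged\n");
    return 0;
}
```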
I know that HeavyDB spends some time compiling queries into LLVM programs, so subsequent runs of the same query are faster, but in my scenario I never run the same query again.
I also have another idea: pre-filter the WHERE clauses on the CPU and implement a hash-based group-by on the GPU (https://adms-conf.org/2015/gpu-optimizer-camera-ready.pdf). However, I am not sure whether this approach is feasible.
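For concreteness, here is a minimal sketch of the kind of kernel I have in mind (my own illustrative code, not the paper's implementation; the integer keys, single SUM aggregate, and fixed table size are simplifying assumptions):

```cuda
// Minimal hash-based GPU group-by (SUM aggregate), assuming the WHERE
// clause was already applied on the CPU so `keys`/`vals` hold only the
// qualifying rows. Open addressing with linear probing; the table must
// be sized comfortably above the expected number of groups.
#include <cuda_runtime.h>

constexpr int EMPTY_KEY = -1;              // sentinel: keys assumed non-negative
constexpr unsigned TABLE_SIZE = 1u << 20;  // power of two, illustrative

__global__ void hash_group_by_sum(const int* keys, const float* vals, int n,
                                  int* table_keys, float* table_sums) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int key = keys[i];
    unsigned slot = (unsigned)key * 2654435761u & (TABLE_SIZE - 1);

    // Linear probing: claim a slot with atomicCAS, then accumulate.
    while (true) {
        int prev = atomicCAS(&table_keys[slot], EMPTY_KEY, key);
        if (prev == EMPTY_KEY || prev == key) {
            atomicAdd(&table_sums[slot], vals[i]);
            break;
        }
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
}
```

Before launch, `table_keys` would be initialized to EMPTY_KEY (e.g., cudaMemset with 0xFF bytes) and `table_sums` to zero; a real version would also need overflow handling and probably shared-memory pre-aggregation, as the linked paper discusses.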
I would love to have your expert opinions and suggestions for the best approach to solving this scenario. Thank you very much in advance!
-
Hi Yanzhuo Zhou,
If you run the same queries with different filter values, the LLVM plan can be reused. Additionally, GPUs are faster than CPUs at filtering, thanks to their superior memory bandwidth.
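As rough arithmetic (ballpark bandwidth figures, not benchmarks): a filter scan is typically bandwidth-bound, so a GPU with ~900 GB/s of memory bandwidth can scan a 10 GB shard in roughly 11 ms, while a CPU socket at ~100 GB/s needs roughly 100 ms for the same scan; the speedup tracks the bandwidth ratio.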
One helpful optimization you might consider is 'fragment skipping,' a form of partition pruning. In our database, each table is divided into fragments, by default 32 million rows each. When a simple filter requests data outside the minimum and maximum values of a column within a specific fragment, that fragment is skipped entirely: it is not read from disk and not cached, so unnecessary data is excluded completely.
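To make the mechanism concrete, here is a simplified sketch of the metadata check (illustrative C++ code, not our actual implementation):

```cuda
// Sketch of the metadata check behind fragment skipping, assuming
// per-fragment min/max statistics for the filtered column. Fragments
// whose [min, max] range misses the predicate are never read from
// disk or copied to the GPU.
#include <vector>

struct FragmentMeta {
    long long min_val;   // column minimum within the fragment
    long long max_val;   // column maximum within the fragment
};

// Returns the fragments that can satisfy `col BETWEEN lo AND hi`.
std::vector<int> fragments_to_scan(const std::vector<FragmentMeta>& meta,
                                   long long lo, long long hi) {
    std::vector<int> keep;
    for (int f = 0; f < (int)meta.size(); ++f) {
        // Skip when the fragment's value range cannot overlap the filter.
        if (meta[f].max_val < lo || meta[f].min_val > hi) continue;
        keep.push_back(f);
    }
    return keep;
}
```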
If you frequently use a specific field for filtering, loading the data sorted on that field will increase the likelihood of fragment skipping: each fragment then covers a narrow value range, so more fragments can be skipped during query execution, resulting in improved performance.
Additionally, if this type of partition pruning proves beneficial for your workloads, you might consider adjusting the fragment size; experimenting with different sizes can further improve query performance for your specific use cases. The toy experiment below illustrates both effects.
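As a toy illustration (synthetic data, not a benchmark), the following host-only snippet counts how many fragments survive pruning for a ~1% selective filter, comparing a sorted and a shuffled load at two fragment sizes:

```cuda
// Toy experiment showing why sorted loads and fragment size matter:
// with data sorted on the filtered column, each value range is confined
// to a few fragments, so most fragments' min/max fall outside a
// selective filter and get skipped; shuffled data leaves almost every
// fragment overlapping the filter.
#include <algorithm>
#include <random>
#include <vector>
#include <cstdio>

int main() {
    const int n_rows = 1'000'000;
    std::vector<long long> col(n_rows);
    for (int i = 0; i < n_rows; ++i) col[i] = i;   // sorted column

    std::vector<long long> shuffled = col;
    std::mt19937 rng(42);
    std::shuffle(shuffled.begin(), shuffled.end(), rng);

    const long long lo = 0, hi = n_rows / 100;     // ~1% selective filter

    for (int frag_rows : {10'000, 100'000}) {      // two fragment sizes
        for (auto* data : {&col, &shuffled}) {
            int scanned = 0, total = 0;
            for (int start = 0; start < n_rows; start += frag_rows, ++total) {
                int end = std::min(start + frag_rows, n_rows);
                auto mn = *std::min_element(data->begin() + start, data->begin() + end);
                auto mx = *std::max_element(data->begin() + start, data->begin() + end);
                if (!(mx < lo || mn > hi)) ++scanned;  // fragment survives pruning
            }
            printf("frag_rows=%6d %s: scan %d of %d fragments\n",
                   frag_rows, data == &col ? "sorted  " : "shuffled",
                   scanned, total);
        }
    }
    return 0;
}
```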
Regards,
Candido
I'm truly sorry for the late reply; I only just saw your message.