I have a production scenario where I need to process 10 billion rows (each with 10+ columns) with filtering and group-by aggregation within seconds. The estimated uncompressed data size is 100 GB+.
To achieve this, I plan to purchase 10+ Azure P4 machines and distribute the workload across the VMs; A100s are too expensive for my budget. The data is downloaded from S3, first staged in CPU memory, and then copied to the GPU. The queries I most need to accelerate are group-by aggregations. The whole process runs only once, so the total cost includes loading the data into CPU memory, moving it to the GPU, and computing.
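For context, the per-VM flow I have in mind looks roughly like the sketch below. I am using cuDF/RAPIDS here only as a stand-in for whatever GPU engine I end up choosing, and the bucket, file, and column names are made-up placeholders:

```python
import pyarrow.parquet as pq
import pyarrow.fs as pafs
import cudf  # RAPIDS GPU dataframe library

# 1) Download a Parquet chunk from S3 into host (CPU) memory as an Arrow table.
#    Bucket, path, and column names are placeholders.
s3 = pafs.S3FileSystem(region="us-east-1")
table = pq.read_table("my-bucket/events/part-00001.parquet", filesystem=s3)

# 2) Copy the Arrow table from host memory to GPU memory.
gdf = cudf.DataFrame.from_arrow(table)

# 3) Run the hot group-by aggregation on the GPU.
result = gdf.groupby("customer_id").agg({"amount": "sum", "qty": "mean"})
print(result.head())
```

Since everything is computed exactly once, the S3 download and the host-to-device copy are part of the end-to-end time I care about, not just the group-by kernel itself.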
I know that HeavyDB has some LLVM compilation overhead on the first query and that subsequent queries run faster, but in my scenario I never query again.
I also have another idea: pre-filter the WHERE clauses on the CPU and implement a hash-based group-by on the GPU (https://adms-conf.org/2015/gpu-optimizer-camera-ready.pdf). However, I am not sure whether this approach is feasible.
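To make the idea concrete, here is a minimal sketch of the GPU side written with Numba's CUDA support (assuming a recent Numba that provides cuda.atomic.cas). The kernel builds a fixed-size open-addressing hash table keyed on an integer group key and accumulates sums with atomics; the key/value names and sizes are placeholders, not my real schema:

```python
import numpy as np
from numba import cuda

TABLE_SIZE = 1 << 20          # power of two, sized well above the expected group count
EMPTY_KEY = np.int64(-1)      # sentinel marking an unused hash-table slot

@cuda.jit
def groupby_sum_kernel(keys, values, table_keys, table_sums):
    i = cuda.grid(1)
    if i >= keys.shape[0]:
        return
    k = keys[i]
    v = values[i]
    slot = (k * 2654435761) & (TABLE_SIZE - 1)        # simple multiplicative hash
    while True:
        prev = cuda.atomic.cas(table_keys, slot, EMPTY_KEY, k)
        if prev == EMPTY_KEY or prev == k:            # claimed the slot, or found our key
            cuda.atomic.add(table_sums, slot, v)
            break
        slot = (slot + 1) & (TABLE_SIZE - 1)          # linear probing on collision

def gpu_groupby_sum(keys_host, values_host):
    # CPU pre-filtering (the WHERE clause) would happen before this call; here we only aggregate.
    table_keys = cuda.to_device(np.full(TABLE_SIZE, EMPTY_KEY, dtype=np.int64))
    table_sums = cuda.to_device(np.zeros(TABLE_SIZE, dtype=np.float64))
    d_keys = cuda.to_device(np.ascontiguousarray(keys_host, dtype=np.int64))
    d_vals = cuda.to_device(np.ascontiguousarray(values_host, dtype=np.float64))
    threads = 256
    blocks = (d_keys.size + threads - 1) // threads
    groupby_sum_kernel[blocks, threads](d_keys, d_vals, table_keys, table_sums)
    tk, ts = table_keys.copy_to_host(), table_sums.copy_to_host()
    used = tk != EMPTY_KEY
    return dict(zip(tk[used].tolist(), ts[used].tolist()))

# Example: rows are filtered on the CPU, then only the surviving rows are aggregated on the GPU.
keys = np.random.randint(0, 10_000, size=1_000_000)
vals = np.random.rand(1_000_000)
mask = vals > 0.5                      # the pre-filter (WHERE) done on the CPU
print(len(gpu_groupby_sum(keys[mask], vals[mask])))
```

This is only a rough single-GPU sketch; a real version would need to handle multi-column keys, table overflow, and non-sum aggregates, which is part of why I am unsure about feasibility.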
I would love to hear your expert opinions and suggestions on the best approach for this scenario. Thank you very much in advance!