Hi, I am reading the HeavyDB code/docs and have some questions about the executor.
From my understanding, the execution logic can be summarized as follows: Calcite logical plan --> optimized DAG (after HeavyDB's DAG optimization) --> query steps (each DAG node is translated into a query step) --> work units (each query step is converted into a work unit) --> work-unit execution (both JIT compilation and kernel execution). Please correct me if any of this is wrong!
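To make my mental model explicit, here is a minimal C++ sketch of the pipeline as I understand it. All names here (`RelAlgNode`, `WorkUnit`, `createWorkUnit`, `executeWorkUnit`) are hypothetical placeholders I made up for illustration, not the actual HeavyDB classes:

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical placeholders -- not the real HeavyDB class names.
struct RelAlgNode { const char* op; };          // node of the optimized DAG
struct WorkUnit   { const RelAlgNode* node; };  // unit handed to the executor
struct ResultSet  {};                           // output buffer of one step

WorkUnit createWorkUnit(const RelAlgNode& node) {
    return WorkUnit{&node};                     // query step -> work unit
}

std::shared_ptr<ResultSet> executeWorkUnit(const WorkUnit& wu) {
    // JIT-compile the kernel(s) for this work unit, then run them.
    std::cout << "executing " << wu.node->op << "\n";
    return std::make_shared<ResultSet>();
}

int main() {
    // Optimized DAG, already coalesced where possible.
    std::vector<RelAlgNode> dag = {{"join"}, {"groupby"}};
    for (const auto& node : dag) {              // one step per DAG node
        auto result = executeWorkUnit(createWorkUnit(node));
        (void)result;                           // fed to the next step?
    }
}
```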
I have some questions about the execution process:

1. Suppose we have a single GPU: are different work units executed sequentially? For example, if work unit 1 is a join and work unit 2 is a group-by (assuming the join and group-by are not fused into a compound node), does the executor materialize all of the join results before executing the group-by?
2. How is data passed between different work units, and what happens if the intermediate results are too big? In the example above, you don't know the size of the join output until you execute it, but you need to prepare the intermediate result buffer when creating the work unit, so how do you know how much buffer space to allocate? (See the sketch after this list for the pattern I have in mind.) Moreover, the intermediate join result might be larger than GPU memory; do you just fail, or do you offload to the CPU?
3. I noticed HeavyDB has the fusion optimization described at https://heavyai.github.io/heavydb/execution/optimizer.html. I'm wondering about the granularity of the fusion: do you just coalesce kernel launches (i.e., still launch multiple kernels for different logical operators), or do you combine different operations on the same record (i.e., launch one kernel that performs multiple operations, such as group-by, project, and filter)?
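To make question 2 concrete, here is a minimal sketch of the allocate-estimated-buffer-then-retry pattern I imagine an engine could use when the output size is unknown in advance. Everything here is my own invention for illustration; I do not know whether HeavyDB actually does this:

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

struct OverflowError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for a join kernel: the true output size is only learned by
// running it, and it fails if the preallocated buffer is too small.
std::size_t runJoin(std::vector<int>& out, std::size_t true_output_rows) {
    if (true_output_rows > out.size())
        throw OverflowError("output buffer too small");
    return true_output_rows;
}

// Allocate an estimated buffer, run, and double the buffer on overflow.
// (In a real engine the fallback might instead be punting to the CPU.)
std::vector<int> joinWithRetry(std::size_t estimate, std::size_t true_rows) {
    for (std::size_t cap = estimate;; cap *= 2) {
        std::vector<int> out(cap);
        try {
            out.resize(runJoin(out, true_rows));
            return out;  // buffer was large enough this time
        } catch (const OverflowError&) {
            std::cout << "retrying with capacity " << cap * 2 << "\n";
        }
    }
}

int main() {
    auto result = joinWithRetry(/*estimate=*/4, /*true_rows=*/10);
    std::cout << "join produced " << result.size() << " rows\n";
}
```

Is something like this what happens, or is the output size estimated some other way?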
Thanks a lot!! Any comments are highly appreciated!