Analyzing the time of queries sub-steps

Comments

3 comments

  • Avatar
    Jiashen Cao

    Thanks!

    In addition, putting the query inside select * from (...) limit 0 seems to affect the query execution time in debug timer as well. My inner query is just a join between two tables select * from tb1,tb2 where tb1.k=tb2.k. Here, I compare both runs under the system is warm up.

    The following log is when I just run the inner query, which indicates ~400ms.

    439ms total duration for executeRelAlgQuery
      439ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:516
        0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:517
        439ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:814
          439ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:962
            439ms start(0ms) executeProject RelAlgExecutor.cpp:2166
              438ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3388
                26ms start(0ms) compileWorkUnit NativeCodegen.cpp:2589
                  24ms start(2ms) getInstance HashJoin.cpp:295
                    24ms start(2ms) reify PerfectJoinHashTable.cpp:354
                      0ms start(2ms) getOneColumnFragment ColumnFetcher.cpp:78
                        New thread(18)
                          24ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                            23ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                        End thread(18)
                  0ms start(26ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
                  0ms start(26ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
                0ms start(27ms) ExecutionKernel::run ExecutionKernel.cpp:125
                0ms start(27ms) fetchChunks Execute.cpp:2941
                0ms start(27ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                41ms start(27ms) executePlanWithoutGroupBy Execute.cpp:3362
                  41ms start(27ms) launchGpuCode QueryExecutionContext.cpp:223
                0ms start(69ms) collectAllDeviceResults Execute.cpp:2333
                  0ms start(69ms) reduceMultiDeviceResults Execute.cpp:1309
                    0ms start(69ms) reduceMultiDeviceResultSets Execute.cpp:1355
                25ms start(69ms) compileWorkUnit NativeCodegen.cpp:2589
                  24ms start(70ms) getInstance HashJoin.cpp:295
                    24ms start(70ms) reify PerfectJoinHashTable.cpp:354
                      0ms start(70ms) getOneColumnFragment ColumnFetcher.cpp:78
                        New thread(19)
                          24ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                            23ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                        End thread(19)
                  0ms start(94ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
                  0ms start(95ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
                0ms start(95ms) ExecutionKernel::run ExecutionKernel.cpp:125
                0ms start(95ms) fetchChunks Execute.cpp:2941
                0ms start(95ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                  New thread(20)
                    0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                    0ms start(17ms) fetchChunks Execute.cpp:2941
                    0ms start(17ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                    42ms start(17ms) executePlanWithGroupBy Execute.cpp:3583
                      42ms start(17ms) launchGpuCode QueryExecutionContext.cpp:223
                      0ms start(59ms) getRowSet QueryExecutionContext.cpp:158
                        0ms start(59ms) reduceMultiDeviceResults Execute.cpp:1309
                          0ms start(59ms) reduceMultiDeviceResultSets Execute.cpp:1355
                  End thread(20)
                  New thread(21)
                    0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                    0ms start(59ms) fetchChunks Execute.cpp:2941
                    0ms start(59ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                    38ms start(59ms) executePlanWithGroupBy Execute.cpp:3583
                      38ms start(59ms) launchGpuCode QueryExecutionContext.cpp:223
                      0ms start(97ms) getRowSet QueryExecutionContext.cpp:158
                        0ms start(97ms) reduceMultiDeviceResults Execute.cpp:1309
                          0ms start(97ms) reduceMultiDeviceResultSets Execute.cpp:1355
                  End thread(21)
                16ms start(95ms) executePlanWithGroupBy Execute.cpp:3583
                    New thread(22)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(303ms) fetchChunks Execute.cpp:2941
                      0ms start(303ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      39ms start(304ms) executePlanWithGroupBy Execute.cpp:3583
                        39ms start(304ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(343ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(343ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(343ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(22)
                    New thread(23)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(179ms) fetchChunks Execute.cpp:2941
                      0ms start(179ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      43ms start(179ms) executePlanWithGroupBy Execute.cpp:3583
                        43ms start(179ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(222ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(222ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(222ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(23)
                    New thread(24)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(97ms) fetchChunks Execute.cpp:2941
                      0ms start(97ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      39ms start(97ms) executePlanWithGroupBy Execute.cpp:3583
                        39ms start(97ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(136ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(136ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(136ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(24)
                    New thread(25)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(137ms) fetchChunks Execute.cpp:2941
                      0ms start(137ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      42ms start(137ms) executePlanWithGroupBy Execute.cpp:3583
                        42ms start(137ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(179ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(179ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(179ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(25)
                    New thread(26)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(264ms) fetchChunks Execute.cpp:2941
                      0ms start(264ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      39ms start(264ms) executePlanWithGroupBy Execute.cpp:3583
                        39ms start(264ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(303ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(303ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(303ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(26)
                    New thread(27)
                      0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:125
                      0ms start(222ms) fetchChunks Execute.cpp:2941
                      0ms start(222ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:696
                      41ms start(222ms) executePlanWithGroupBy Execute.cpp:3583
                        41ms start(222ms) launchGpuCode QueryExecutionContext.cpp:223
                        0ms start(264ms) getRowSet QueryExecutionContext.cpp:158
                          0ms start(264ms) reduceMultiDeviceResults Execute.cpp:1309
                            0ms start(264ms) reduceMultiDeviceResultSets Execute.cpp:1355
                    End thread(27)
                  16ms start(95ms) launchGpuCode QueryExecutionContext.cpp:223
                  0ms start(112ms) getRowSet QueryExecutionContext.cpp:158
                    0ms start(112ms) reduceMultiDeviceResults Execute.cpp:1309
                      0ms start(112ms) reduceMultiDeviceResultSets Execute.cpp:1355
                0ms start(439ms) resultsUnion Execute.cpp:1279
    

    But, if I add the limit statement, the execution time becomes much less (~50ms)

    52ms total duration for executeRelAlgQuery
      52ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:516
        0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:517
        52ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:814
          52ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:962
            52ms start(0ms) executeSort RelAlgExecutor.cpp:2985
              52ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3388
                25ms start(0ms) compileWorkUnit NativeCodegen.cpp:2589
                  24ms start(1ms) getInstance HashJoin.cpp:295
                    24ms start(1ms) reify PerfectJoinHashTable.cpp:354
                      0ms start(1ms) getOneColumnFragment ColumnFetcher.cpp:78
                        New thread(13)
                          24ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                            23ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                        End thread(13)
                  0ms start(25ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
                  0ms start(25ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
                0ms start(26ms) collectAllDeviceResults Execute.cpp:2333
                26ms start(26ms) compileWorkUnit NativeCodegen.cpp:2589
                  24ms start(27ms) getInstance HashJoin.cpp:295
                    24ms start(27ms) reify PerfectJoinHashTable.cpp:354
                      0ms start(27ms) getOneColumnFragment ColumnFetcher.cpp:78
                        New thread(14)
                          23ms start(0ms) initHashTableForDevice PerfectJoinHashTable.cpp:721
                            23ms start(0ms) initHashTableOnGpu PerfectHashTableBuilder.h:89
                        End thread(14)
                  0ms start(52ms) markDeadRuntimeFuncs NativeCodegen.cpp:1849
                  0ms start(52ms) optimizeAndCodegenGPU NativeCodegen.cpp:1326
                0ms start(52ms) resultsUnion Execute.cpp:1279
    

    I wonder which represents the closest query execution time?

    0
    Comment actions Permalink
  • Avatar
    Jiashen Cao

    Also, on the client side, the \ iming command reports execution and total time. Do those include Calcite server start and parsing time?

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi,

    Probably the time difference is the time spent on projection.

    In the earlier release we had some timings on that, but lately.they are disappeared.

    Yes in the client everything in the server is included. The client is calling a thrifth endpoint, and ita reporting the time of a execution in the server reporter by the thrifth call and a Total time spent ti return the result back to the user.

    1
    Comment actions Permalink

Please sign in to leave a comment.