vs Oracle11.2.0.4,Why is HeavyDB6.4 slower?



  • Avatar
    Candido Dessanti


    I'm sorry, but I didn't fully understand what you mean when you say a piece of data...Are you running a projection query with a like filter 1000 times with randomly chosen data and getting a total runtime of 490 seconds, that's 490ms per query execution?

    Running a similar query that uses a like filter on a dictionary-encoded text, the database needs to generate and compile two plans to execute the query; this is an expensive operation because each compilation takes around 180ms in my system, while the execution takes just a few ms on GPU. You can get these timings by setting the enable-debug-timer parameter to true (it can be set into the heavy.conf file in the main section).

    This is the output of the first execution of this query on a table with more than 130M or records

    SELECT id 
    FROM   tableaddresses w
    WHERE  region='Bexar County' 
    AND    denominazi like '%PER%';

    I highlighted in BOLD the compile section of an enable-debug-timer enabled instance (this output con be found in the logs)

    378ms total duration for executeRelAlgQuery
      378ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:613
        0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:614
        378ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:896
          378ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:1155
            378ms start(0ms) executeCompound RelAlgExecutor.cpp:2282
              378ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3652
                201ms start(0ms) compileWorkUnit NativeCodegen.cpp:2852
                  0ms start(20ms) markDeadRuntimeFuncs NativeCodegen.cpp:1994
                  180ms start(20ms) optimizeAndCodegenGPU NativeCodegen.cpp:1417
                    0ms start(20ms) initializeNVPTXBackend NativeCodegen.cpp:1546
                    180ms start(20ms) generateNativeGPUCode NativeCodegen.cpp:1200
                      0ms start(20ms) check_module_requires_libdevice NativeCodegen.cpp:274
                      5ms start(20ms) optimize_ir NativeCodegen.cpp:312
                      11ms start(27ms) generatePTX NativeCodegen.cpp:1507
                      161ms start(38ms) ptx_to_cubin NvidiaKernel.cpp:129
                  New thread(18)
                    0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:132
                    0ms start(0ms) fetchChunks Execute.cpp:3380
                    0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:793
                    2ms start(0ms) executePlanWithoutGroupBy Execute.cpp:3764
                      2ms start(0ms) launchGpuCode QueryExecutionContext.cpp:227
                  End thread(18)
                0ms start(204ms) collectAllDeviceResults Execute.cpp:2630
                  0ms start(204ms) reduceMultiDeviceResults Execute.cpp:1528
                    0ms start(204ms) reduceMultiDeviceResultSets Execute.cpp:1602
                170ms start(204ms) compileWorkUnit NativeCodegen.cpp:2852
                  0ms start(205ms) markDeadRuntimeFuncs NativeCodegen.cpp:1994
                  168ms start(206ms) optimizeAndCodegenGPU NativeCodegen.cpp:1417
                    168ms start(206ms) generateNativeGPUCode NativeCodegen.cpp:1200
                      0ms start(206ms) check_module_requires_libdevice NativeCodegen.cpp:274
                      5ms start(206ms) optimize_ir NativeCodegen.cpp:312
                      11ms start(213ms) generatePTX NativeCodegen.cpp:1507
                      148ms start(224ms) ptx_to_cubin NvidiaKernel.cpp:129
                  New thread(19)
                    0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:132
                    0ms start(0ms) fetchChunks Execute.cpp:3380
                    0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:793
                    3ms start(0ms) executePlanWithGroupBy Execute.cpp:3991
                      3ms start(0ms) launchGpuCode QueryExecutionContext.cpp:227
                      0ms start(3ms) getRowSet QueryExecutionContext.cpp:161
                        0ms start(3ms) reduceMultiDeviceResults Execute.cpp:1528
                          0ms start(3ms) reduceMultiDeviceResultSets Execute.cpp:1602
                  End thread(19)
                0ms start(378ms) resultsUnion Execute.cpp:1499

    As you may have noticed, the execution on the GPU takes only a few milliseconds, so it's reasonable to expect that the same query would take a similar amount of time on a larger table.

    I tested the query on a table with 600 million records, and it took approximately 380 milliseconds, with an increased GPU runtime of 9 ms compared to the 3 ms on the smaller table.

    I'm uncertain why the database exhibits this behavior, and I plan to inquire internally to investigate further.

    Could you please enable the "enable-debug-timer" option on your instance and confirm whether your queries are spending an extended amount of time on code generation?

    Additionally, I observed that running these queries in CPU mode significantly improves their speed because the compilation phase takes significantly less time.

    On my workstation, for example, the query on the 130M record table takes 150 ms, while on the 600M record table, it takes just 200 ms.

    If you'd like to try running the query in CPU mode, you can add the appropriate hint to the query.

    SELECT /*+ cpu_mode */ id 
    FROM   tableaddresses w
    WHERE  region='Bexar County' 
    AND    denominazi like '%PER%';

    or set the session to use the CPU as executor.

    ALTER SESSION SET executor_device = 'CPU';

    Just a few words about caching: We cache both the query plan and the data we read from disk to memory, but we don't cache results (unless you explicitly instruct the query to do so).

    When you run a query using the same value in the "LIKE" filter more than once, the query isn't recompiled because it utilizes the cached plan. However, this mechanism should work consistently regardless of the number of combinations used.

    Could you please clarify what you mean when referring to caching that works when running 1000 pieces repeatedly five times but doesn't work on 2000 pieces? Do you mean that using 2000 different filter combinations you get a slowdown compared to 1000?



    Comment actions Permalink
  • Avatar

    Yes,It's a total runtime of 490 seconds

    Repeated 5 times for 2000 pieces, HeavyDB is slower: 5799,1681, 1593, 1705, 1834, 1623

    Comment actions Permalink
  • Avatar
    Candido Dessanti


    I got the point. As I told you in github, we are investigating the problem. Have you tried to run your queries in CPU mode?
    Just to be sure that the issue is the recompilation of the code, could you run a like query with the enable-debug-timer set to true and post the output here? The CPU memory occupation is abnormal compared with the GPU one that's really bigger than my run with 100k combination on a similat table like your. Could you share the DDL of your table? 
    you need just tp run the sql command

    show create table addressinfo;

    in my case is 

    CREATE TABLE tableaddresses_test (

    and after 100k queries on a freshly started database this is the memory summary

    heavysql> \memory_summary
    HeavyDB Server CPU Memory Summary:
                MAX            USE      ALLOCATED           FREE
        51391.78 MB     1513.43 MB     4096.00 MB     2582.57 MB
    HeavyDB Server GPU Memory Summary:
    [GPU]            MAX            USE      ALLOCATED           FREE
      [0]     9673.83 MB      991.82 MB     4096.00 MB     3104.18 MB


    Comment actions Permalink
  • Avatar
    Candido Dessanti


    I conducted additional tests to optimize your query, and you might want to consider using a "NONE ENCODED" string for your "code" field.

    All the queries now return in 20ms, but the memory consumption has increased significantly, approximately tripling to around 3GB for my 130M records dataset.

    The DDL for your table would appear as follows:

    create table addressinfo (
    id integer, 
    regionid text encoding dict(16), 
    code text encoding none);

    Text encoding "none" isn't used very often due to its limitations, but considering the bug you just encountered, it could serve as an acceptable workaround.

    Please inform me if this improves the speed of your queries.


    Comment actions Permalink
  • Avatar
    heavysql> show create table addressinfo;
      id TEXT ENCODING DICT(32),
      ods_id TEXT ENCODING DICT(32),
      code TEXT ENCODING DICT(32),
      province TEXT ENCODING DICT(32),
      city TEXT ENCODING DICT(32),
      regionid TEXT ENCODING DICT(32),
      region_name TEXT ENCODING DICT(32),
      street_id TEXT ENCODING DICT(32),
      street_name TEXT ENCODING DICT(32),
      road_id TEXT ENCODING DICT(32),
      road_name TEXT ENCODING DICT(32),
      number_plate_id TEXT ENCODING DICT(32),
      number_plate_name TEXT ENCODING DICT(32),
      cpn_id TEXT ENCODING DICT(32),
      cpn_code TEXT ENCODING DICT(32),
      community_name TEXT ENCODING DICT(32),
      premises_network TEXT ENCODING DICT(32),
      building_code TEXT ENCODING DICT(32),
      building TEXT ENCODING DICT(32),
      unit_id TEXT ENCODING DICT(32),
      unit TEXT ENCODING DICT(32),
      layer_id TEXT ENCODING DICT(32),
      layer TEXT ENCODING DICT(32),
      room_id TEXT ENCODING DICT(32),
      room TEXT ENCODING DICT(32),
      rms_code TEXT ENCODING DICT(32),
      substation_id TEXT ENCODING DICT(32),
      simple_mark TEXT ENCODING DICT(32),
      inner_symbol_id TEXT ENCODING DICT(32),
      if_inner TEXT ENCODING DICT(32),
      switch_no TEXT ENCODING DICT(32),
      type INTEGER,
      queryindex TEXT ENCODING DICT(32),
      remark TEXT ENCODING DICT(32),
      alias_name TEXT ENCODING DICT(32),
      if_relate TEXT ENCODING DICT(32),
      addresstype TEXT ENCODING DICT(32),
      source TEXT ENCODING DICT(32),
      create_time TIMESTAMP(0),
      fldtag TEXT ENCODING DICT(32),
      original_company TEXT ENCODING DICT(32),
      useremark TEXT ENCODING DICT(32),
      village_code TEXT ENCODING DICT(32),
      uptimestamp_time TIMESTAMP(0),
      create_op TEXT ENCODING DICT(32));
    Comment actions Permalink
  • Avatar

    look this topic:


    Comment actions Permalink

Please sign in to leave a comment.