Comments

7 comments

  • Candido Dessanti

    Do you mean DDLs and queries?

    Which size are you going to try, and how many GPUs are you going to use?

  • Dom

    I have the DDL and the queries. I will test a 20 GB benchmark with a single card, but I need help with how to run these queries. I need each query to run separately, but within the same test. I will repeat that test 10 times, and after every test I need to reset the cache, don't I? Thanks for your help :)

  • Candido Dessanti

    We have a script for running queries against an OmniSciDB instance and collecting the results in various formats, but you have to install our Python driver.

    You can get everything in our GitHub repo at this link

    https://github.com/omnisci/omniscidb/blob/master/Benchmarks/run_benchmark.py

    I don't remember right now if it's present in the open-source edition.

    Edit: I just checked, and that part isn't included in the binary distribution, but you can download it without issue from the GitHub repository.
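
    Since the script talks to the database through the Python driver, a quick connectivity check along these lines can confirm that pymapd is installed and can reach the server (a minimal sketch; the credentials and database name are the defaults used elsewhere in this thread, so adjust as needed):

    from pymapd import connect

    # Connect using the default credentials from the examples in this thread.
    con = connect(user="admin", password="HyperInteractive",
                  host="localhost", dbname="omnisci")

    # Run a trivial query to verify the round trip works.
    cur = con.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchall())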

    Regards, Candido

    0
    Comment actions Permalink
  • Dom

    Hello again. I have prepared all the data through the web application and installed all the requirements. Can you please help me with this benchmark? I'm still getting the following error: [screenshot of the error]

    Benchmark command: sudo python3 benchmark.py -u admin -p HyperInteractive -S localhost -t supplier -l testovani -d /home/dominik/Desktop/test -i 2 -n omnisci

    And another question: when I have 22 queries, how can I specify all the tables here? From what I saw, I can only specify one table.

  • Candido Dessanti

    Hi,

    I tried benchmark.py with pymapd, and it worked as expected. I see you used the -S parameter, which specifies a target database for storing the results; that is the cause of the error.

    The invocation below works, and it outputs the resulting JSON to standard output; if you prefer a file as output, change the -e switch:

    python run_benchmark.py -t mytable  -l "first run" -i 10 -u admin -p HyperInteractive -s localhost -n tpch1000 -d queries/ -e output
    

    The -t switch is used when you want to run the queries against different tables. For example, if you want to query two different tables, one with a billion rows and another with ten billion, you can write the query this way:

    SELECT FIELD1,AVG(FIELD2)
    FROM ###TAB###
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5
    

    run_benchmark.py will substitute ###TAB### with the name you specify via the -t switch.
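
    Conceptually, that substitution is just a string replacement over each query's text; here is a rough sketch of the idea (the file path and table names are purely illustrative):

    from pathlib import Path

    def render_query(template_path, table_name):
        # Mirror what run_benchmark.py does with the -t switch:
        # swap the ###TAB### placeholder for a concrete table name.
        template = Path(template_path).read_text()
        return template.replace("###TAB###", table_name)

    # Illustrative usage: render the same template for two tables.
    for tab in ("flights_1b", "flights_10b"):
        print(render_query("queries/query_1.sql", tab))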

    With the TPC-H benchmark, though, this isn't useful, because the queries span multiple tables; it's better to create a separate database for each scale factor and change the database name when you connect to HeavyAI.
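
    For instance, a small driver loop along these lines could run the whole suite once per scale factor, changing only the -n switch (the database names here are hypothetical, and -t is passed only because the script expects it):

    import subprocess

    # Run the same query suite once per scale factor, switching only
    # the target database via -n. The -t value is a placeholder here,
    # since TPC-H queries name their tables directly.
    for db in ("tpch_sf1", "tpch_sf10", "tpch_sf100"):
        subprocess.run([
            "python3", "run_benchmark.py",
            "-u", "admin", "-p", "HyperInteractive", "-s", "localhost",
            "-n", db, "-t", "lineitem", "-l", "tpch_" + db,
            "-i", "10", "-d", "queries/", "-e", "output",
        ], check=True)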

  • Dom

    Thank you so much, guys. I have one last question: in the following results, query_exec_first is always large, but query_exec_avg is much smaller. Isn't the query_exec_first value more accurate in measuring how long it took to process the query?

    "query_id": "query_1", "query_result_set_count": 4, "query_error_info": "", "query_conn_first": 22.8, "query_conn_avg": 1.7, "query_conn_min": 1.1, "query_conn_max": 2.6, "query_conn_85": 2.2, "query_exec_first": 39270, "query_exec_avg": 681.6, "query_exec_min": 645, "query_exec_max": 944, "query_exec_85": 652.6, "query_exec_25": 646.0, "query_exec_stdd": 92.8, "query_exec_trimmed_avg": 649, "query_exec_trimmed_max": 653, "query_render_first": null, "query_render_avg": null, "query_render_min": null, "query_render_max": null, "query_render_85": null, "query_render_25": null, "query_render_stdd": null, "query_total_first": 39292.8, "query_total_avg": 683.3, "query_total_min": 646.3, "query_total_max": 945.1, "query_total_85": 653.9, "query_total_all": 45450.0, "query_total_trimmed_avg": 651.2, "results_iter_count": 10, "results_iter_first": 0.0, "results_iter_avg": 0.0, "results_iter_min": 0.0, "results_iter_max": 0.0, "results_iter_85": 0.0, "cpu_mem_usage_mb": 1487.4, "gpu_mem_usage_mb": 1487.4 }, "debug": { "query_exec_times": [ 944, 651, 650, 650, 653, 650, 646, 645, 645 ], "query_total_times": [ 945.1, 653.1, 652.6, 651.6, 654.1, 652.2, 647.7, 646.8, 646.3 ], "detailed_timing_last_iteration": {} } },

  • Candido Dessanti

    Hi,

    The first run of a given query will be slower than the subsequent ones, for two reasons.

    The first is that the engine may be forced to read the data from disk into CPU and GPU memory, which takes time proportional to the amount of data that has to move.

    The second is that the query plan has to be generated from scratch, which can take from roughly 100 ms upwards depending on the complexity of the query. Subsequent queries that differ only in their literals will reuse that plan, so they run faster than the first one.

    As an example, this simple query runs against a small table containing 120 million rows on a notebook with an entry-level GPU:

    select extract(year from arr_timestamp), avg(arrdelay)
    from flights_sk_8_120m
    where distance < 10
    group by 1;

    Looking at the logs, we see that the database had to create a plan to run the query:

    130ms start(0ms) compileWorkUnit NativeCodegen.cpp:2575

    and had to read data from disk to populate the CPU and GPU caches:

    371ms start(130ms) fetchChunks Execute.cpp:2572

    The rest of the runtime operations took another 9 ms:

    9ms start(502ms) executePlanWithGroupBy Execute.cpp:3214
    4ms start(502ms) launchGpuCode QueryExecutionContext.cpp:221
    4ms start(507ms) getRowSet QueryExecutionContext.cpp:156

    If I run the query another time, changing the literal in the filter:

    select extract(year from arr_timestamp), avg(arrdelay)
    from flights_sk_8_120m
    where distance < 11
    group by 1;

    the total runtime falls to 7 ms, because the engine already had a valid plan for the query and the data was already in the GPU's RAM:

    0ms start(2ms) fetchChunks Execute.cpp:2572
    0ms start(2ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:775
      6ms start(2ms) executePlanWithGroupBy Execute.cpp:3214   
        6ms start(2ms) launchGpuCode QueryExecutionContext.cpp:221
    

    This is why the benchmark excludes the first run's results when calculating the average: we think the subsequent runs are more representative of the actual performance.
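
    You can verify this from the debug section of your results: the nine query_exec_times samples already exclude the 39270 ms first run, and the reported averages follow from them directly (assuming the trimmed average simply drops the minimum and maximum):

    # Samples from "query_exec_times" in the debug output above;
    # the 39270 ms first run is already excluded from this list.
    times = [944, 651, 650, 650, 653, 650, 646, 645, 645]

    avg = sum(times) / len(times)              # 681.6 -> query_exec_avg
    trimmed = sorted(times)[1:-1]              # drop the min and the max
    trimmed_avg = sum(trimmed) / len(trimmed)  # ~649  -> query_exec_trimmed_avg

    print(round(avg, 1), round(trimmed_avg, 1))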

    Best regards, Candido

