Set up a TPC-H benchmark
Hey, I'm trying to do a TPC-H test, but I'm lost. Can you please provide me with a tutorial on it?
-
Hi,
The first run of a given query will be slower than subsequent runs, for two reasons.
The first is that the engine may have to read the data from disk into CPU and GPU memory, which takes time depending on how much data needs to move.
The second is that the query plan has to be generated from scratch, which can take 100 ms or more depending on the complexity of the query. Subsequent queries that differ only in their literals will reuse that plan, so they run faster than the first one.
As an example, take this simple query run against a small table containing 120 million rows on a notebook with an entry-level GPU:
select extract(year from arr_timestamp), avg(arrdelay) from flights_sk_8_120m where distance < 10 group by 1;
Looking at the logs, we can see that the database had to create a plan to run the query:
130ms start(0ms) compileWorkUnit NativeCodegen.cpp:2575
and had to read data from disk to populate the CPU and GPU caches:
371ms start(130ms) fetchChunks Execute.cpp:2572
The rest of the runtime operations took another 9 ms:
9ms start(502ms) executePlanWithGroupBy Execute.cpp:3214
4ms start(502ms) launchGpuCode QueryExecutionContext.cpp:221
4ms start(507ms) getRowSet QueryExecutionContext.cpp:156
If I run the query again, changing the literal in the filter:
select extract(year from arr_timestamp), avg(arrdelay) from flights_sk_8_120m where distance < 11 group by 1;
the total runtime falls to 7 ms, because the engine already had a valid plan for the query and the data was already in the GPU's RAM:
0ms start(2ms) fetchChunks Execute.cpp:2572
0ms start(2ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:775
6ms start(2ms) executePlanWithGroupBy Execute.cpp:3214
6ms start(2ms) launchGpuCode QueryExecutionContext.cpp:221
This is why the benchmark doesn't use the results of the first run to calculate the average: we think the subsequent runs are more representative of the performance.
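If you want to reproduce this behavior yourself, here is a minimal sketch using the pymapd driver (the credentials and table name are the ones used in this thread; adjust them for your setup):

import time
from pymapd import connect

# Placeholder connection parameters; change them to match your server
con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="omnisci")

sql = ("select extract(year from arr_timestamp), avg(arrdelay) "
       "from flights_sk_8_120m where distance < {} group by 1")

cur = con.cursor()
for run, literal in enumerate([10, 11], start=1):
    start = time.time()
    cur.execute(sql.format(literal))   # same plan shape, different literal
    cur.fetchall()
    print(f"run {run}: {(time.time() - start) * 1000:.0f} ms")

# The first run pays for plan compilation and disk-to-memory transfers;
# the second reuses the plan and the cached data, so it is much faster.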
Best regards, Candido
-
I have the DDL and the queries. I will run a 20 GB benchmark with a single card, but I need help with how to run these queries. I need each query to run separately, but within the same test. I will repeat that test 10 times, and after every test I need to reset the cache, don't I? Thanks for your help :)
-
We have a tool that runs queries against an OmniSciDB instance and returns the results in various formats, but you have to install our Python driver first.
You can get everything from our GitHub repo at this link:
https://github.com/omnisci/omniscidb/blob/master/Benchmarks/run_benchmark.py
I don't remember offhand whether it's present in the open-source edition.
Edit: I just checked, and that part isn't included in the binary distribution, but you can download it without issue from the GitHub repository.
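About resetting the cache between tests: restarting the server is the surest way to start completely cold, but recent OmniSciDB releases also accept ALTER SYSTEM CLEAR statements to drop the data caches. A minimal sketch with pymapd, assuming your version supports those statements:

from pymapd import connect

# Placeholder credentials; adjust for your setup
con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="omnisci")

# Drop cached table data from GPU and CPU memory between test rounds.
# Note: this does not clear the compiled-plan cache; restart the server
# if you also want to measure plan generation from scratch.
con.execute("ALTER SYSTEM CLEAR GPU MEMORY")
con.execute("ALTER SYSTEM CLEAR CPU MEMORY")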
Regards, Candido
-
Hello again. I have prepared all the data through the web application and installed all the requirements. Can you please help me with this benchmark? I'm still getting the following error:
benchmark command: sudo python3 benchmark.py -u admin -p HyperInteractive -S localhost -t supplier -l testovani -d /home/dominik/Desktop/test -i 2 -n omnisci
And another question: I have 22 queries, so how can I specify all the tables here? From what I saw, I can only specify one table.
-
Hi,
I tried benchmark.py with pymapd, and it worked as expected. I see you used the -S parameter, which specifies a target database for storing the results, instead of lowercase -s for the server to benchmark; that's the cause of the error.
Run it this way and it will output the resulting JSON to standard output; if you prefer a file, you can change the -e switch:
python run_benchmark.py -t mytable -l "first run" -i 10 -u admin -p HyperInteractive -s localhost -n tpch1000 -d queries/ -e output
The -t switch is used when you want to run the queries against different tables. For example, if you want to query two different tables, one with a billion rows and another with ten billion, you can write the query this way:
SELECT FIELD1,AVG(FIELD2) FROM ###TAB### GROUP BY 1 ORDER BY 2 DESC LIMIT 5
run_benchmark.py will substitute ###TAB### with the name you specify with the -t switch.
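Conceptually the substitution is just a string replacement; an illustrative sketch (not the script's actual code, and the table name is hypothetical):

def apply_table(query_text: str, table: str) -> str:
    # Replace the ###TAB### placeholder with the target table name
    return query_text.replace("###TAB###", table)

sql = "SELECT FIELD1, AVG(FIELD2) FROM ###TAB### GROUP BY 1 ORDER BY 2 DESC LIMIT 5"
print(apply_table(sql, "flights_1b"))  # hypothetical table name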
With the TPC-H benchmark this isn't useful, though, because the queries touch multiple tables; it's better to create a separate database for each scale factor and change the database name when you connect to HeavyAI.
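For example, you could script one run per scale-factor database like this (a sketch; the database names tpch10/tpch100/tpch1000 and the queries/ directory are assumptions):

import subprocess

# Hypothetical databases, one per TPC-H scale factor
for dbname in ["tpch10", "tpch100", "tpch1000"]:
    subprocess.run([
        "python", "run_benchmark.py",
        "-u", "admin", "-p", "HyperInteractive",
        "-s", "localhost", "-n", dbname,
        "-t", "lineitem",          # placeholder; TPC-H queries name their own tables
        "-l", f"tpch {dbname}",
        "-d", "queries/",
        "-i", "10",
        "-e", "output",
    ], check=True)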
-
Thank you so much, guys. I have one last question: in the following results, query_exec_first is always large, but query_exec_avg is much smaller. Isn't the query_exec_first value more accurate for measuring how long the query took to process?
"query_id": "query_1", "query_result_set_count": 4, "query_error_info": "", "query_conn_first": 22.8, "query_conn_avg": 1.7, "query_conn_min": 1.1, "query_conn_max": 2.6, "query_conn_85": 2.2, "query_exec_first": 39270, "query_exec_avg": 681.6, "query_exec_min": 645, "query_exec_max": 944, "query_exec_85": 652.6, "query_exec_25": 646.0, "query_exec_stdd": 92.8, "query_exec_trimmed_avg": 649, "query_exec_trimmed_max": 653, "query_render_first": null, "query_render_avg": null, "query_render_min": null, "query_render_max": null, "query_render_85": null, "query_render_25": null, "query_render_stdd": null, "query_total_first": 39292.8, "query_total_avg": 683.3, "query_total_min": 646.3, "query_total_max": 945.1, "query_total_85": 653.9, "query_total_all": 45450.0, "query_total_trimmed_avg": 651.2, "results_iter_count": 10, "results_iter_first": 0.0, "results_iter_avg": 0.0, "results_iter_min": 0.0, "results_iter_max": 0.0, "results_iter_85": 0.0, "cpu_mem_usage_mb": 1487.4, "gpu_mem_usage_mb": 1487.4 }, "debug": { "query_exec_times": [ 944, 651, 650, 650, 653, 650, 646, 645, 645 ], "query_total_times": [ 945.1, 653.1, 652.6, 651.6, 654.1, 652.2, 647.7, 646.8, 646.3 ], "detailed_timing_last_iteration": {} } },