Why is HeavyDB 6.4 slower than Oracle 11.2.0.4?
Prepare data:
Sample and extract data as randomly as possible from each prefecture and city in the existing address library. Each sample record contains two query criteria: regionid and code.
Test method: execute the query statement for each sample record.
For every 1,000 records processed with select id from addressinfo where regionid='xxx' and code like '%xxx%', record the elapsed time for that round (in seconds):
HeavyDB: 490, 557, 553, 704, 941.
Oracle: 611, 470, 514, 525, 544.
Overall, the Oracle queries are faster and more stable than the HeavyDB ones.
Other observations:
If the same 1,000 records are queried repeatedly five times, the HeavyDB cache works, and the execution times are: 590, 20, 19, 19, 20, 21.
But the HeavyDB cache didn't work when the test was repeated five times over 2,000 records, and it was even slower: 5799, 1681, 1593, 1705, 1834, 1623.
If you execute an equality query, select id from addressinfo where regionid='xxx' and code = 'xxx', Oracle is 10 times more efficient than HeavyDB.
Why is HeavyDB slower? Thanks!
-
Hi,
I'm sorry, but I didn't fully understand what you mean when you say "a piece of data". Are you running a projection query with a LIKE filter 1,000 times with randomly chosen values and getting a total runtime of 490 seconds, that is, 490 ms per query execution?
Running a similar query that uses a LIKE filter on a dictionary-encoded text column, the database needs to generate and compile two plans to execute the query; this is an expensive operation, because each compilation takes around 180 ms on my system, while the execution itself takes just a few ms on GPU. You can get these timings by setting the enable-debug-timer parameter to true (it can be set in the heavy.conf file, in the main section).
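For reference, this is all it takes in the configuration file (a minimal sketch; keep the rest of your heavy.conf as it is, and restart the server afterwards so the parameter is picked up):
# heavy.conf, main section
enable-debug-timer = true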
This is the output of the first execution of this query on a table with more than 130M records:
SELECT id FROM tableaddresses w WHERE region='Bexar County' AND denominazi like '%PER%';
I highlighted in bold the compile sections of an enable-debug-timer enabled instance (this output can be found in the logs):
378ms total duration for executeRelAlgQuery
  378ms start(0ms) executeRelAlgQueryNoRetry RelAlgExecutor.cpp:613
    0ms start(0ms) Query pre-execution steps RelAlgExecutor.cpp:614
    378ms start(0ms) executeRelAlgSeq RelAlgExecutor.cpp:896
      378ms start(0ms) executeRelAlgStep RelAlgExecutor.cpp:1155
        378ms start(0ms) executeCompound RelAlgExecutor.cpp:2282
          378ms start(0ms) executeWorkUnit RelAlgExecutor.cpp:3652
            201ms start(0ms) compileWorkUnit NativeCodegen.cpp:2852
              0ms start(20ms) markDeadRuntimeFuncs NativeCodegen.cpp:1994
              180ms start(20ms) optimizeAndCodegenGPU NativeCodegen.cpp:1417
                0ms start(20ms) initializeNVPTXBackend NativeCodegen.cpp:1546
                180ms start(20ms) generateNativeGPUCode NativeCodegen.cpp:1200
                  0ms start(20ms) check_module_requires_libdevice NativeCodegen.cpp:274
                  5ms start(20ms) optimize_ir NativeCodegen.cpp:312
                  11ms start(27ms) generatePTX NativeCodegen.cpp:1507
                  161ms start(38ms) ptx_to_cubin NvidiaKernel.cpp:129
            New thread(18)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:132
                0ms start(0ms) fetchChunks Execute.cpp:3380
                0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:793
                2ms start(0ms) executePlanWithoutGroupBy Execute.cpp:3764
                  2ms start(0ms) launchGpuCode QueryExecutionContext.cpp:227
            End thread(18)
            0ms start(204ms) collectAllDeviceResults Execute.cpp:2630
              0ms start(204ms) reduceMultiDeviceResults Execute.cpp:1528
                0ms start(204ms) reduceMultiDeviceResultSets Execute.cpp:1602
            170ms start(204ms) compileWorkUnit NativeCodegen.cpp:2852
              0ms start(205ms) markDeadRuntimeFuncs NativeCodegen.cpp:1994
              168ms start(206ms) optimizeAndCodegenGPU NativeCodegen.cpp:1417
                168ms start(206ms) generateNativeGPUCode NativeCodegen.cpp:1200
                  0ms start(206ms) check_module_requires_libdevice NativeCodegen.cpp:274
                  5ms start(206ms) optimize_ir NativeCodegen.cpp:312
                  11ms start(213ms) generatePTX NativeCodegen.cpp:1507
                  148ms start(224ms) ptx_to_cubin NvidiaKernel.cpp:129
            New thread(19)
              0ms start(0ms) ExecutionKernel::run ExecutionKernel.cpp:132
                0ms start(0ms) fetchChunks Execute.cpp:3380
                0ms start(0ms) getQueryExecutionContext QueryMemoryDescriptor.cpp:793
                3ms start(0ms) executePlanWithGroupBy Execute.cpp:3991
                  3ms start(0ms) launchGpuCode QueryExecutionContext.cpp:227
                  0ms start(3ms) getRowSet QueryExecutionContext.cpp:161
                    0ms start(3ms) reduceMultiDeviceResults Execute.cpp:1528
                      0ms start(3ms) reduceMultiDeviceResultSets Execute.cpp:1602
            End thread(19)
          0ms start(378ms) resultsUnion Execute.cpp:1499
As you may have noticed, the execution on the GPU takes only a few milliseconds, so it's reasonable to expect that the same query would take a similar amount of time on a larger table.
I tested the query on a table with 600 million records, and it took approximately 380 milliseconds, with an increased GPU runtime of 9 ms compared to the 3 ms on the smaller table.
I'm uncertain why the database exhibits this behavior, and I plan to inquire internally to investigate further.
Could you please enable the "enable-debug-timer" option on your instance and confirm whether your queries are spending an extended amount of time on code generation?
Additionally, I observed that running these queries in CPU mode makes them significantly faster, because the compilation phase takes far less time.
On my workstation, for example, the query on the 130M record table takes 150 ms, while on the 600M record table, it takes just 200 ms.
If you'd like to try running the query in CPU mode, you can add the appropriate hint to the query.
SELECT /*+ cpu_mode */ id FROM tableaddresses w WHERE region='Bexar County' AND denominazi like '%PER%';
or set the session to use the CPU as executor.
ALTER SESSION SET executor_device = 'CPU';
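(If you go this way and later want the session back on the GPU, the same command accepts 'GPU' as a value.)
ALTER SESSION SET executor_device = 'GPU';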
Just a few words about caching: We cache both the query plan and the data we read from disk to memory, but we don't cache results (unless you explicitly instruct the query to do so).
When you run a query using the same value in the "LIKE" filter more than once, the query isn't recompiled because it utilizes the cached plan. However, this mechanism should work consistently regardless of the number of combinations used.
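To illustrate the mechanism (the literal values below are made up):
-- first execution: both plans are generated and compiled (~380 ms in the timings above)
SELECT id FROM addressinfo WHERE regionid='110000' AND code LIKE '%A01%';
-- same literals again: the cached plan is reused, no recompilation
SELECT id FROM addressinfo WHERE regionid='110000' AND code LIKE '%A01%';
-- a different value in the LIKE filter: the plan has to be compiled again
SELECT id FROM addressinfo WHERE regionid='110000' AND code LIKE '%B02%';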
Could you please clarify what you mean when referring to caching that works when running 1000 pieces repeatedly five times but doesn't work on 2000 pieces? Do you mean that using 2000 different filter combinations you get a slowdown compared to 1000?
Regards,
Candido
-
Ok,
I got the point. As I told you on GitHub, we are investigating the problem. Have you tried running your queries in CPU mode?
Just to be sure that the issue is the recompilation of the code, could you run a LIKE query with enable-debug-timer set to true and post the output here? Your CPU memory occupation is abnormal compared with the GPU one; it's far bigger than in my run with 100k combinations on a table similar to yours. Could you share the DDL of your table?
You just need to run the SQL command: show create table addressinfo;
In my case it is:
CREATE TABLE tableaddresses_test (
  id INTEGER NOT NULL,
  region TEXT ENCODING DICT(16),
  denominazi TEXT ENCODING DICT(16));
and after 100k queries on a freshly started database this is the memory summary:
heavysql> \memory_summary
HeavyDB Server CPU Memory Summary:
            MAX            USE      ALLOCATED           FREE
    51391.78 MB     1513.43 MB     4096.00 MB     2582.57 MB

HeavyDB Server GPU Memory Summary:
[GPU]           MAX            USE      ALLOCATED           FREE
  [0]     9673.83 MB      991.82 MB     4096.00 MB     3104.18 MB
Regards,
Candido
-
Hello,
I conducted additional tests to optimize your query, and you might want to consider using a "NONE ENCODED" string for your "code" field.
All the queries now return in 20ms, but the memory consumption has increased significantly, approximately tripling to around 3GB for my 130M records dataset.
The DDL for your table would appear as follows:
create table addressinfo (
  id integer,
  regionid text encoding dict(16),
  code text encoding none);
Text encoding "none" isn't used very often due to its limitations, but considering the bug you just encountered, it could serve as an acceptable workaround.
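If you want to try it side by side with your existing table, here is a minimal sketch (addressinfo_none is a hypothetical name, and I'm copying only the columns the query touches, assuming their types match):
-- new table with the code column left unencoded
create table addressinfo_none (
  id integer,
  regionid text encoding dict(16),
  code text encoding none);
-- copy the relevant columns from the original table
insert into addressinfo_none select id, regionid, code from addressinfo;
Then run the same LIKE query against addressinfo_none and compare the timings.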
Please inform me if this improves the speed of your queries.
Candido.
-
heavysql> show create table addressinfo;
Result
CREATE TABLE ADDRESSINFO (
id TEXT ENCODING DICT(32),
ods_id TEXT ENCODING DICT(32),
code TEXT ENCODING DICT(32),
province TEXT ENCODING DICT(32),
city TEXT ENCODING DICT(32),
regionid TEXT ENCODING DICT(32),
region_name TEXT ENCODING DICT(32),
street_id TEXT ENCODING DICT(32),
street_name TEXT ENCODING DICT(32),
road_id TEXT ENCODING DICT(32),
road_name TEXT ENCODING DICT(32),
number_plate_id TEXT ENCODING DICT(32),
number_plate_name TEXT ENCODING DICT(32),
cpn_id TEXT ENCODING DICT(32),
cpn_code TEXT ENCODING DICT(32),
community_name TEXT ENCODING DICT(32),
premises_network TEXT ENCODING DICT(32),
building_code TEXT ENCODING DICT(32),
building TEXT ENCODING DICT(32),
unit_id TEXT ENCODING DICT(32),
unit TEXT ENCODING DICT(32),
layer_id TEXT ENCODING DICT(32),
layer TEXT ENCODING DICT(32),
room_id TEXT ENCODING DICT(32),
room TEXT ENCODING DICT(32),
rms_code TEXT ENCODING DICT(32),
substation_id TEXT ENCODING DICT(32),
simple_mark TEXT ENCODING DICT(32),
inner_symbol_id TEXT ENCODING DICT(32),
if_inner TEXT ENCODING DICT(32),
switch_no TEXT ENCODING DICT(32),
type INTEGER,
queryindex TEXT ENCODING DICT(32),
remark TEXT ENCODING DICT(32),
alias_name TEXT ENCODING DICT(32),
if_relate TEXT ENCODING DICT(32),
addresstype TEXT ENCODING DICT(32),
source TEXT ENCODING DICT(32),
create_time TIMESTAMP(0),
fldtag TEXT ENCODING DICT(32),
original_company TEXT ENCODING DICT(32),
useremark TEXT ENCODING DICT(32),
village_code TEXT ENCODING DICT(32),
uptimestamp_time TIMESTAMP(0),
create_op TEXT ENCODING DICT(32));
heavysql>
-
Take a look at this topic: