Query is crashing omnisql
Hi,
I have installed omniscidb 5.9 on a Centos 7 server to test it. While running the following query from omnisql:
CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);
I got this output: Thrift error: No more data to read. Thrift connection error: No more data to read. Retrying connection Thrift error: No more data to read. Thrift connection error: No more data to read. Retrying connection Thrift: Sat Dec 11 00:14:41 2021 TSocket::write_partial() send() : Broken pipe Thrift error: write() send(): Broken pipe Thrift connection error: write() send(): Broken pipe Retrying connection Thrift: Sat Dec 11 00:14:49 2021 TSocket::write_partial() send() : Broken pipe Thrift error: write() send(): Broken pipe Thrift connection error: write() send(): Broken pipe Retrying connection
Looking at the logs I saw: 2021-12-10T23:40:09.458391 F 10663 0 2 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2)
The tables used are the following. CREATE TABLE hits ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, hits_val SMALLINT NOT NULL );
CREATE TABLE base ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, base_val BOOLEAN NOT NULL );
The query runs with no issues, when I modify it to:
CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT DISTINCT alpha FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);
Can somebody help me out?
Many thanks.
-
Hi @amok,
Could you check the file system that you are using to store data? Probably it's full so it's crashing the server.
In case it's full you should allocate more space to it, or move omnisci_storage in a more capable file system.
If you need some assistance on such an operation feel free to ask
Regards, Candido
-
Hi candido,
thank you for the prompt reply. I don't think it is a storage issue:
If you look at the filesystem, there is plenty of free space everywhere. Especially in the mount point where omnisci storage is located, there is around 4.2T free: Filesystem Size Used Avail Use% Mounted on devtmpfs 252G 0 252G 0% /dev tmpfs 252G 24K 252G 1% /dev/shm tmpfs 252G 82M 252G 1% /run tmpfs 252G 0 252G 0% /sys/fs/cgroup /dev/nvme0n1p4 100G 59G 42G 59% / /dev/nvme0n1p7 100G 64M 100G 1% /tmp /dev/nvme0n1p6 100G 22G 79G 22% /home /dev/nvme0n1p9 5.1T 918G 4.2T 18% /core/omnisci /dev/nvme0n1p2 509M 123M 387M 24% /boot /dev/nvme0n1p1 256M 12M 245M 5% /boot/efi tmpfs 51G 0 51G 0% /run/user/1005
Anything else I should check?
The error in the log, that I mentioned in the previous post, looks suspicious: 2021-12-10T23:40:09.458391 F 10663 0 2 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2)
By the way, I am using the CPU version for my initial testing.
Many thanks!
-
Hi,
here are sample files to help you in your testing: hits.csv 2021-05-04,121,1000,1 2020-04-20,131,2000,1 2020-04-21,131,2000,0 2020-04-22,131,2000,2 2020-04-23,131,2000,3 2020-04-24,131,2000,2 2020-04-25,131,2000,0 2020-04-26,131,2000,4 2020-04-27,131,2000,5 2020-04-28,131,2000,1
base.csv 2021-05-04,121,1000,1 2021-12-10,131,2000,1 2021-12-11,131,2000,1 2021-12-12,131,2000,1 2021-12-13,131,2000,1 2021-12-14,131,2000,1 2021-12-15,131,2000,1 2021-12-16,131,2000,1 2021-12-17,131,2000,1 2021-12-18,131,2000,1
Thanks.
-
Hi,
I will try with this data.
I already tried with some data(50m rows for both tables) I derived by another table, with omniscidb version from 5.5 to 5.9 and I don't get any error with a query or with cats (CPU or GPU is the same)
Which version are you using? Which are the cardinality of alpha and beta (I am on 4.8 million distinct rows on 50m total) and the size of tables? How many records is the query itself is returning (I am around 100)
Sorry for the massive number of questions.
Candido.
P.s. I can share my data, just to know if it's a problem with the data
-
Hi,
thanks for your support. This is the version I have: omnisql> \status Server Version : 5.9.0-20211123-d294f1e842
I get the error, even with this super slim dataset I shared (10 lines per table). The query is not returning anything because it is crashing.
Another thing I just noticed. Every time I connect to omnisql, I get this: Thrift: Sat Dec 11 19:25:26 2021 TSocket::open() connect() : Connection refused Do you get something like this?
Please, let me know how can I help?
-
Hi Candido,
I have tested the scenario above, using the docker image (two different servers) and I get the some error. Steps: 1. Followed the instructions from here https://docs.omnisci.com/installation-and-configuration/installation/install-docker/docker-open-source-cpu to install docker 2. Connected to the container's bash and added a folder mkdir ../omnisci-storage/data/mapd_import/ added file base.csv and hits.csv with just the 10 lines. 3. Connected to the container's omnisql: CREATE TABLE hits ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, hits_val SMALLINT NOT NULL ); CREATE TABLE base ( daydt DATE NOT NULL, alpha TEXT NOT NULL, beta TEXT NOT NULL, base_val SMALLINT NOT NULL ); copy base from '/omnisci-storage/data/mapd_import/base.csv' with (header = 'false'); copy hits from '/omnisci-storage/data/mapd_import/hits.csv' with (header = 'false'); CREATE TABLE last_date AS (SELECT alpha,beta,DATEDIFF('daydt', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 and (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt and base_val=1) and beta NOT IN ('1000') GROUP BY alpha,beta);
After the last query the container crashed.
Regards.
-
Hi,
tried with your data and everything worked without issues
omnisql> \\d base CREATE TABLE base ( daydt DATE NOT NULL ENCODING DAYS(32), alpha TEXT NOT NULL ENCODING DICT(32), beta TEXT NOT NULL ENCODING DICT(32), base_val SMALLINT NOT NULL); omnisql> \\d hits CREATE TABLE hits ( daydt DATE NOT NULL ENCODING DAYS(32), alpha TEXT NOT NULL, beta TEXT NOT NULL, hits_val SMALLINT NOT NULL, SHARED DICTIONARY (alpha) REFERENCES hits(alpha), SHARED DICTIONARY (beta) REFERENCES hits(beta)); mnisql> select * from base; daydt|alpha|beta|base_val 2021-05-04|121|1000|1 2021-12-10|131|2000|1 2021-12-11|131|2000|1 2021-12-12|131|2000|1 2021-12-13|131|2000|1 2021-12-14|131|2000|1 2021-12-15|131|2000|1 2021-12-16|131|2000|1 2021-12-17|131|2000|1 2021-12-18|131|2000|1 10 rows returned. Execution time: 26 ms, Total time: 28 ms 10 rows returned. Execution time: 62 ms, Total time: 64 ms omnisql> select * from hits; daydt|alpha|beta|hits_val 2021-05-04|121|1000|1 2020-04-20|131|2000|1 2020-04-21|131|2000|0 2020-04-22|131|2000|2 2020-04-23|131|2000|3 2020-04-24|131|2000|2 2020-04-25|131|2000|0 2020-04-26|131|2000|4 2020-04-27|131|2000|5 2020-04-28|131|2000|1 10 rows returned. Execution time: 26 ms, Total time: 28 ms omnisql> \\cpu omnisql> create table last_date as SELECT /*+ cpu_mode */ alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta; omnisql> SELECT /*+ cpu_mode */ alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta; alpha|beta|last_dt 131|2000|-592 1 rows returned. Execution time: 54 ms, Total time: 54 ms
except for the huge time, the query tools to return everything is ok.
Ok, tomorrow I will try with a docker installation, because I am trying on an Ubuntu 20 tarball installation.
Candido
-
Thanks Candido, This is what I get:
omnisql> \\d base CREATE TABLE base ( daydt DATE NOT NULL ENCODING DAYS(32), alpha TEXT NOT NULL ENCODING DICT(32), beta TEXT NOT NULL ENCODING DICT(32), base_val SMALLINT NOT NULL); omnisql> \\d hits CREATE TABLE hits ( daydt DATE NOT NULL ENCODING DAYS(32), alpha TEXT NOT NULL ENCODING DICT(32), beta TEXT NOT NULL ENCODING DICT(32), hits_val SMALLINT NOT NULL); omnisql> select * from base; daydt|alpha|beta|base_val 2021-05-04|121|1000|1 2021-12-10|131|2000|1 2021-12-11|131|2000|1 2021-12-12|131|2000|1 2021-12-13|131|2000|1 2021-12-14|131|2000|1 2021-12-15|131|2000|1 2021-12-16|131|2000|1 2021-12-17|131|2000|1 2021-12-18|131|2000|1 10 rows returned. Execution time: 19 ms, Total time: 22 ms omnisql> select * from hits; daydt|alpha|beta|hits_val 2021-05-04|121|1000|1 2020-04-20|131|2000|1 2020-04-21|131|2000|0 2020-04-22|131|2000|2 2020-04-23|131|2000|3 2020-04-24|131|2000|2 2020-04-25|131|2000|0 2020-04-26|131|2000|4 2020-04-27|131|2000|5 2020-04-28|131|2000|1 10 rows returned.
I don't get the lines "SHARED...\
-
Hi @amok
I have reproduced your error but changed the query in this way.
SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND (alpha, beta) IN (SELECT DISTINCT alpha, beta FROM base WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN ('1000') GROUP BY alpha, beta;
So using more than one column in the IN clause against a subquery.
The software needs just one column returned by the subquery, so a fatal error is issued, and the server has to stop, and with just one column into IN clause, it's almost impossible to get such an error.
Can you try to run this query?
SELECT alpha, beta, DATEDIFF('DAY', current_date(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha) IN (SELECT alpha FROM base WHERE current_date()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta;
And if you are getting an error, please post a more detailed log, starting from the start of statement execution like this one.
2021-12-12T09:21:19.656922 I 149 0 3 DBHandler.cpp:1273 stdlog_begin sql_execute 25 0 omnisci admin 843-0UvY {"query_str"} {"CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);"} 2021-12-12T09:21:19.658816 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql 'CREATE TABLE last_date AS (SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits WHERE hits_val>0 AND (alpha,beta) IN (SELECT DISTINCT alpha,beta FROM base WHERE CURRENT_DATE()=daydt AND base_val=1) AND beta NOT IN ('1000') GROUP BY alpha,beta);' 2021-12-12T09:21:19.679722 I 149 0 3 Calcite.cpp:588 Time in Thrift 1 (ms), Time in Java Calcite server 19 (ms) 2021-12-12T09:21:19.681614 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql '(SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND ROW(alpha, beta) IN (SELECT DISTINCT alpha, beta FROM base WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN ('1000') GROUP BY alpha, beta)' 2021-12-12T09:21:19.724882 I 149 0 3 Calcite.cpp:588 Time in Thrift 1 (ms), Time in Java Calcite server 42 (ms) 2021-12-12T09:21:19.725740 I 149 0 3 Calcite.cpp:553 User calcite catalog omnisci sql '(SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), MAX(daydt)) AS last_dt FROM hits WHERE hits_val > 0 AND ROW(alpha, beta) IN (SELECT DISTINCT alpha, beta FROM base WHERE CURRENT_DATE() = daydt AND base_val = 1) AND beta NOT IN ('1000') GROUP BY alpha, beta)' 2021-12-12T09:21:19.760475 I 149 0 3 Calcite.cpp:588 Time in Thrift 0 (ms), Time in Java Calcite server 34 (ms) 2021-12-12T09:21:19.802247 F 149 0 3 RelAlgDagBuilder.cpp:55 Check failed: size_t(1) == row_set->colCount() (1 == 2) 2021-12-12T09:21:19.940335 I 149 0 11 MapDServer.cpp:317 Interrupt signal (6) received.
Regards, Candido
-
Hi Candido,
thank you for your support.
In my original post I have mentioned that using (alpha, beta) IN (SELECT...) crashes the server, but if I change it to (alpha) IN (SELECT...) it works.
Could you please explain if this is a bug or the software is designed to work like this? If we can't have two columns in the IN clause, could you please explain how we should write the queries? Is there some documentation I could read?
Apologies for all the questions, but I would like to understand if the software will fit my use case.
-
Hi @amok ,
Don't worry, if possible you can re-write your query with an EXIST instead the IN this way
SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits h WHERE hits_val>0 AND exists (SELECT 1 FROM base b WHERE CURRENT_DATE()=daydt AND base_val=1 AND h.alpha=b.alpha AND h.beta=b.beta) AND beta NOT IN ('1000') GROUP BY alpha,beta;
Candido
-
Hi @amok ,
Don't worry, if possible you can re-write your query with an EXIST instead the IN this way
SELECT alpha, beta, DATEDIFF('DAY', CURRENT_DATE(), max(daydt)) AS last_dt FROM hits h WHERE hits_val>0 AND exists (SELECT 1 FROM base b WHERE CURRENT_DATE()=daydt AND base_val=1 AND h.alpha=b.alpha AND h.beta=b.beta) AND beta NOT IN ('1000') GROUP BY alpha,beta;
Candido
-
Well let me know if the performances are fine, because we can rewrite the query in other ways.
about CONCAT of strings, you can enable using that switch enable-interoperability=true, but it's a developer option and it's not working on aggregates, but just in projections queries.
so you should run a CTAS and then a select to make it works
-
You have to set up the parameter I mentioned in the previous post and then you can use combat using a double pipe on projections or updates (simple. and against a text encoding none column).
So
Select alpha||'-'||beta from hits
You cannot use it with the group by queries, so it's not so useful for your queries
Candido
Please sign in to leave a comment.
Comments
17 comments