CREATE FOREIGN Usage
Hi Team,
I was trying to use FOREIGN storage from s3. I tried enabling the flag and while using CREATE FOREIGN TABLE SQL is throwing parsing error.
I'm compiling the OmnisciDB from scratch and enabled ENABLE_S3_FSI. Am I missing anything. I example of creating a table from S3 would be really helpful.
Thanks Gopinath
-
Hi @Gopinath_Jaganmohan,
I just asked, but have you tried to use the s3 parameters of copy command docs here when you created the server for foreign data wrapper?
Could you share the syntax you used to try to load your external file?
regards, Candido
-
Thanks for the Reply.
Ok, I just tried with one of the examples for this forum. Will the below example work? What I have to do to enable this while compiling the code.
"create foreign table flight_fsi_en ( flight_year SMALLINT, flight_mmonth SMALLINT, flight_dayofmonth SMALLINT, flight_dayofweek SMALLINT, deptime SMALLINT, crsdeptime SMALLINT, arrtime SMALLINT, crsarrtime SMALLINT, uniquecarrier TEXT ENCODING NONE, flightnum SMALLINT, tailnum TEXT ENCODING NONE, actualelapsedtime SMALLINT, crselapsedtime SMALLINT, airtime SMALLINT, arrdelay SMALLINT, depdelay SMALLINT, origin TEXT ENCODING NONE, dest TEXT ENCODING NONE, distance SMALLINT, taxiin SMALLINT, taxiout SMALLINT, cancelled SMALLINT, cancellationcode TEXT ENCODING NONE, diverted SMALLINT, carrierdelay SMALLINT, weatherdelay SMALLINT, nasdelay SMALLINT, securitydelay SMALLINT, lateaircraftdelay SMALLINT, dep_timestamp TIMESTAMP(0) ENCODING FIXED(32), arr_timestamp TIMESTAMP(0) ENCODING FIXED(32), carrier_name TEXT ENCODING NONE, plane_type TEXT ENCODING NONE, plane_manufacturer TEXT ENCODING NONE, plane_issue_date DATE ENCODING DAYS(16), plane_model TEXT ENCODING NONE, plane_status TEXT ENCODING NONE, plane_aircraft_type TEXT ENCODING NONE, plane_engine_type TEXT ENCODING NONE, plane_year SMALLINT, origin_name TEXT ENCODING NONE, origin_city TEXT ENCODING NONE, origin_state TEXT ENCODING NONE, origin_country TEXT ENCODING NONE, origin_lat FLOAT, origin_lon FLOAT, dest_name TEXT ENCODING NONE, dest_city TEXT ENCODING NONE, dest_state TEXT ENCODING NONE, dest_country TEXT ENCODING NONE, dest_lat FLOAT, dest_lon FLOAT) server test_server with (file_path='flights/flights.csv.gz');
We have requirement to use the FOREIGN storage at least to start with I'm good with local files. I found test case but not sure functionalities are implemented.
-
Hi @Gopinath_Jaganmohan,
As @Pey.Silvester stated the FSI on s3 storage isn't ready yet, but it shouldn't take a long time to be generally available
About the functionality on local files, it should be available but, if I am not wrong, you have .to enable the feature at database startup explicitly
The parameter is called enable-fsi and has to be set to true.
so first of all, start the server with the parameter enable-fsi turned to true (or add the parameter on omnisci.conf)
bin/omnisci_server --data /mapd_storage/data_test --enable-fsi=true
then connect to the database with omnisql, create your SERVR then the table
omnisql> CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv WITH (storage_type = 'LOCAL_FILE', base_path = '/home/candido/'); omnisql> CREATE FOREIGN TABLE test_table_fsi (test_column_1 INTEGER NOT NULL, test_column_2 text encoding dict(32), test_column_3 date not null) SERVER test_server WITH (file_path='test_fsi_table.csv', header='false'); omnisql> SELECT * FROM test_table_fsi LIMIT 2; test_column_1|test_column_2|test_column_3 1|Hello|2020-10-12 2|Goddbye|2021-01-13 2 rows returned. Execution time: 157 ms, Total time: 170 ms
If you are starting the server from the command line, be sure to have the number of files opened set to unlimited; you can change with limit -n 65535.
basically, the options are the same as the COPY command, so if you have a pipe-delimited file, you have to specify the delimiter in the WITH clause
omnisql> CREATE FOREIGN TABLE test_table_fsi_s (test_column_1 INTEGER NOT NULL, test_column_2 text encoding dict(32), test_column_3 date not null) SERVER test_server with (file_path='test_fsi_table_s.csv', header='false', delimiter='|'); omnisql> select * from test_table_fsi_s limit 2; test_column_1|test_column_2|test_column_3 1|Hello|2020-10-12 2|Goddbye|2021-01-13 2 rows returned. Execution time: 129 ms, Total time: 152 ms
If you have further questions, don't hesitate to ask.
Regards, Candido
-
Thank you very much, Candido. It worked I thought it enabled by default. I know it's difficult, but when is the expected timeline for S3 FSI.
Some clarification: What does "CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv" command achieve?
Also, I'm planning to use Arrow instead of CSV, Pls can you provide an example of Using Arrow format too.
Regards Gopinath
-
Hi,
It's not enabled by default because it's a quite new feature, that needs some refining.
The S3 storage type could be ready for the next release, but it's still in the works so it could shift. Anyway, we will notify thru the community forum when will be released.
The CREATE SERVER is needed to describe where the files are located, which kind of file is expected, and other fixed attributes (e.g. username, URL, the key of S3 storage) that would be annoying to specify in each CREATE FOREIGN TABLE statement. So if you have a fixed local path with parquet files you will issue a command like
CREATE SERVER myfiles_srv FOREIGN DATA WRAPPER omnisci_parquet WITH ( storage_type = 'LOCAL_FILE' , base_path = '/some_path' );
So the users that have granted the use of that server just need to know the name and the fields of parquet files without the need to specify that the files are in parquet format and to know where the files are located in the server. In the case of S3, you don't need to give any part or credential to end-users to make them able to load the files.
If you need to move the files in another filesystem you have just to alter the server definition (I haven't tried it yet), without the need of altering all the foreign tables.
Right now we support CSV and PARQUET files only; I will check if and when we will extend the support to arrow files too, and I will report back.
Regards, Candido
-
Hi Team,
As I see FSI is implemented and documented, I started trying it with S3 parquet as per documentation.
When I try to create Server with the below create a statement.
CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH ( storage_type = 'AWS_S3', base_path = '/', s3_bucket = 'ursa-labs-taxi-data.s3.us-east-2.amazonaws.com' );
I'm getting error as below.
Error: Invalid foreign server option "S3_BUCKET". Option must be one of the following: BASE_PATH, STORAGE_TYPE.
Not sure what its wrong in this. BTW, I'm compiling the code from the github source.
Regards Gopinath
-
Try this. I was surprised you didn't get the error: "Foreign server options must contain "AWS_REGION" "
CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH ( storage_type = 'AWS_S3', base_path ='/', s3_bucket = 'ursa-labs-taxi-data.s3.us-east-2.amazonaws.com', aws_region = 'us-west-1' );
-
Nope it should be generally available to OS or FREE users and the s3 support should be enabled by default when you build from source.
BTW I'll chekcn internally, and on a build of OS.
If you are in doubt to speed-up this check you could download the EE using a FREE license.
Check this link
-
Hi @Gopinath_Jaganmohan ,
I'm sorry but FSI for AWS is a feature of EE/Free and it's not included in OS version. if you want to try or use it, you should download an EE version of the software and activate it with a FREE key.
I'll come back to you if this feature will be available to OS users.
My apologies for the wrong information.
Candido
Please sign in to leave a comment.
Comments
21 comments