CREATE FOREIGN Usage

Comments

21 comments

  • Avatar
    Candido Dessanti

    Hi @Gopinath_Jaganmohan,

    I just asked, but have you tried to use the s3 parameters of copy command docs here when you created the server for foreign data wrapper?

    Could you share the syntax you used to try to load your external file?

    regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Pey Silvester

    Hi @Gopinath_Jaganmohan

    FOREIGN storage with S3 is not currently available. We'll notify you when it is.

    Regards, Pey

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Thanks for the Reply.

    Ok, I just tried with one of the examples for this forum. Will the below example work? What I have to do to enable this while compiling the code.

    "create foreign table flight_fsi_en ( flight_year SMALLINT, flight_mmonth SMALLINT, flight_dayofmonth SMALLINT, flight_dayofweek SMALLINT, deptime SMALLINT, crsdeptime SMALLINT, arrtime SMALLINT, crsarrtime SMALLINT, uniquecarrier TEXT ENCODING NONE, flightnum SMALLINT, tailnum TEXT ENCODING NONE, actualelapsedtime SMALLINT, crselapsedtime SMALLINT, airtime SMALLINT, arrdelay SMALLINT, depdelay SMALLINT, origin TEXT ENCODING NONE, dest TEXT ENCODING NONE, distance SMALLINT, taxiin SMALLINT, taxiout SMALLINT, cancelled SMALLINT, cancellationcode TEXT ENCODING NONE, diverted SMALLINT, carrierdelay SMALLINT, weatherdelay SMALLINT, nasdelay SMALLINT, securitydelay SMALLINT, lateaircraftdelay SMALLINT, dep_timestamp TIMESTAMP(0) ENCODING FIXED(32), arr_timestamp TIMESTAMP(0) ENCODING FIXED(32), carrier_name TEXT ENCODING NONE, plane_type TEXT ENCODING NONE, plane_manufacturer TEXT ENCODING NONE, plane_issue_date DATE ENCODING DAYS(16), plane_model TEXT ENCODING NONE, plane_status TEXT ENCODING NONE, plane_aircraft_type TEXT ENCODING NONE, plane_engine_type TEXT ENCODING NONE, plane_year SMALLINT, origin_name TEXT ENCODING NONE, origin_city TEXT ENCODING NONE, origin_state TEXT ENCODING NONE, origin_country TEXT ENCODING NONE, origin_lat FLOAT, origin_lon FLOAT, dest_name TEXT ENCODING NONE, dest_city TEXT ENCODING NONE, dest_state TEXT ENCODING NONE, dest_country TEXT ENCODING NONE, dest_lat FLOAT, dest_lon FLOAT) server test_server with (file_path='flights/flights.csv.gz');

    We have requirement to use the FOREIGN storage at least to start with I'm good with local files. I found test case but not sure functionalities are implemented.

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @Gopinath_Jaganmohan,

    As @Pey.Silvester stated the FSI on s3 storage isn't ready yet, but it shouldn't take a long time to be generally available

    About the functionality on local files, it should be available but, if I am not wrong, you have .to enable the feature at database startup explicitly

    The parameter is called enable-fsi and has to be set to true.

    so first of all, start the server with the parameter enable-fsi turned to true (or add the parameter on omnisci.conf)

    bin/omnisci_server --data /mapd_storage/data_test --enable-fsi=true

    then connect to the database with omnisql, create your SERVR then the table

    omnisql> CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv  
    WITH (storage_type = 'LOCAL_FILE', base_path = '/home/candido/');
    omnisql> CREATE FOREIGN TABLE test_table_fsi 
    (test_column_1 INTEGER NOT NULL,
    test_column_2 text encoding dict(32),
    test_column_3 date not null) 
    SERVER test_server 
    WITH (file_path='test_fsi_table.csv', header='false');
    omnisql> SELECT * FROM test_table_fsi LIMIT 2;
    test_column_1|test_column_2|test_column_3
    1|Hello|2020-10-12
    2|Goddbye|2021-01-13
    2 rows returned.
    Execution time: 157 ms, Total time: 170 ms
    

    If you are starting the server from the command line, be sure to have the number of files opened set to unlimited; you can change with limit -n 65535.

    basically, the options are the same as the COPY command, so if you have a pipe-delimited file, you have to specify the delimiter in the WITH clause

    omnisql> CREATE FOREIGN TABLE test_table_fsi_s 
    (test_column_1 INTEGER NOT NULL,
    test_column_2 text encoding dict(32),
    test_column_3 date not null) 
    SERVER test_server 
    with (file_path='test_fsi_table_s.csv', 
    header='false', 
    delimiter='|');
    omnisql> select * from test_table_fsi_s limit 2;
    test_column_1|test_column_2|test_column_3
    1|Hello|2020-10-12
    2|Goddbye|2021-01-13
    2 rows returned.
    Execution time: 129 ms, Total time: 152 ms
    

    If you have further questions, don't hesitate to ask.

    Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Thank you very much, Candido. It worked I thought it enabled by default. I know it's difficult, but when is the expected timeline for S3 FSI.

    Some clarification: What does "CREATE SERVER test_server FOREIGN DATA WRAPPER omnisci_csv" command achieve?

    Also, I'm planning to use Arrow instead of CSV, Pls can you provide an example of Using Arrow format too.

    Regards Gopinath

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi,

    It's not enabled by default because it's a quite new feature, that needs some refining.

    The S3 storage type could be ready for the next release, but it's still in the works so it could shift. Anyway, we will notify thru the community forum when will be released.

    The CREATE SERVER is needed to describe where the files are located, which kind of file is expected, and other fixed attributes (e.g. username, URL, the key of S3 storage) that would be annoying to specify in each CREATE FOREIGN TABLE statement. So if you have a fixed local path with parquet files you will issue a command like

    CREATE SERVER myfiles_srv FOREIGN DATA WRAPPER omnisci_parquet WITH ( storage_type = 'LOCAL_FILE' , base_path = '/some_path' );

    So the users that have granted the use of that server just need to know the name and the fields of parquet files without the need to specify that the files are in parquet format and to know where the files are located in the server. In the case of S3, you don't need to give any part or credential to end-users to make them able to load the files.

    If you need to move the files in another filesystem you have just to alter the server definition (I haven't tried it yet), without the need of altering all the foreign tables.

    Right now we support CSV and PARQUET files only; I will check if and when we will extend the support to arrow files too, and I will report back.

    Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Hi @candido.dessanti ,

    Thank you very much. Once I tried to implement Arrow directly I think in release 5.4.1 or the first version of Create Temp with CSV file. I think I can contribute to the Arrow part.

    Regards Gopinath

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Thanks,

    Anyway I will ask internally, because it would be a nice feature to have in the database.

    If the disk space is the concern you can read arrow files and load into a temporary table using pymapd; the temporary tables are IM so they won't use disk space

    0
    Comment actions Permalink
  • Avatar
    Richard Wood

    Thanks all, I appreciate this discussion! :+1:

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Hi Team,

    As I see FSI is implemented and documented, I started trying it with S3 parquet as per documentation.

    When I try to create Server with the below create a statement.

    CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH ( storage_type = 'AWS_S3', base_path = '/', s3_bucket = 'ursa-labs-taxi-data.s3.us-east-2.amazonaws.com' );

    I'm getting error as below.

    Error: Invalid foreign server option "S3_BUCKET". Option must be one of the following: BASE_PATH, STORAGE_TYPE.

    Not sure what its wrong in this. BTW, I'm compiling the code from the github source.

    Regards Gopinath

    0
    Comment actions Permalink
  • Avatar
    Brad Shao

    Try this. I was surprised you didn't get the error: "Foreign server options must contain "AWS_REGION" "

    CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH ( storage_type = 'AWS_S3', base_path ='/', s3_bucket = 'ursa-labs-taxi-data.s3.us-east-2.amazonaws.com', aws_region = 'us-west-1' );

    1
    Comment actions Permalink
  • Avatar
    Brad Shao

    Forgot to say, use the actual region of your bucket.

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Thanks Brad. Is this only available in the EE version? because I'm compiling the source from github.

    Or I'm missing any compile flag to enable AWS S3 FSI?

    Thanks Gopi

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Nope it should be generally available to OS or FREE users and the s3 support should be enabled by default when you build from source.

    BTW I'll chekcn internally, and on a build of OS.

    If you are in doubt to speed-up this check you could download the EE using a FREE license.

    Check this link

    https://www.heavy.ai/product/downloads/free

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Thanks, Candido, I will wait for your reply.

    Regards Gopinath

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    @candido.dessanti Any update on this?

    Regards Gopinath

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @Gopinath_Jaganmohan ,

    I'm sorry but FSI for AWS is a feature of EE/Free and it's not included in OS version. if you want to try or use it, you should download an EE version of the software and activate it with a FREE key.

    I'll come back to you if this feature will be available to OS users.

    My apologies for the wrong information.

    Candido

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Thanks, Candido, for the clarification. That's what I thought.

    It would have been nicer if it was in the OS version. Pls can we have these called out properly in documentation.

    Gopi

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Yes we should impove this and aother things in the documentation

    0
    Comment actions Permalink
  • Avatar
    Gopinath Jaganmohan

    Hi Candido, Do we have "CREATE SERVER example_S3_parquet_server FOREIGN DATA WRAPPER parquet_file WITH ( storage_type = 'AWS_S3', base_path ='/', s3_bucket = 'ursa-labs-taxi-data.s3.us-east-2.amazonaws.com', aws_region = 'us-west-1' );"

     

    In the current OSS version(6.4)?

    -Gopi

     

     

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi Gopinath Jaganmohan,

    The current version of the OS version is still 6.2, but we should have 6.4 soon.

    Of course, if you want to try the 6.4, you can download and install the Free version.

     

    Regards,

    Candido

    0
    Comment actions Permalink

Please sign in to leave a comment.