Data import from Postgres (PostGIS)

Comments

5 comments

  • Avatar
    Candido Dessanti

    Hi @andrewh ,

    To use SQLimporter, you have to ensure that all the JARs are present in the local machine and the paths are correct; the examples supplied in our docs have an example path only.

    In my system, to call the utility, I run this command.

    java -cp /opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render/bin/omnisci-utility-5.10.2.jar:/home/candido/Downloads/postgresql-42.2.20.jar com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t test_table -su test_user -sp password -c "jdbc:postgresql://192.168.1.11:5433/test_database" -ss "select * from test_table"

    this assume that the database in installed into /opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render the postgres JDBC driver is in /home/candido/Downloads/ directory and this is the correct user in my case jdbc:postgresql://192.168.1.11:5433/test_database; I have the postgres installed in the 192.168.1.11 host and it's using the 5433 port (It's not the default) and I'm connecting to a database called test_database.

    About your question, if the target table does not exist, the SQLImporter will create a new table inferencing datatype from the source table but will append data if the table already exists. So it's up to you to create the table in advance.

    When you import data from Immerse, in the preview screen, you can change the datatypes, the names of the column, and depending on the data, other attributes, like the delimiter, the compression and the geometry datatypes, etc.

    image|690x248

    Probably the preview tools isn't detecting as timestamp, but you can enforce the datatype of the column.

    You can check the accepted formats here https://docs.heavy.ai/sql/data-manipulation-dml/functions-operators#accepted-date-time-and-timestamp-formats

    Regards, Candido

    1
    Comment actions Permalink
  • Avatar
    Andrew

    Hi Candido

    Many thanks- my problem was with the path to the JDBC JAR I still get an error with the SQLImporter- the table gets created but no rows inserted:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/postgis/PGgeometry
            at com.mapd.utility.db_vendors.PostGis_types.get_wkt(Db_vendor_types.java:142)
            at com.mapd.utility.SQLImporter.setColValue(SQLImporter.java:1039)
            at com.mapd.utility.SQLImporter.executeQuery(SQLImporter.java:414)
    

    My geometry is a polygon type, SRID 4326. I'm not sure if the SQLImporter needs a PG geography datatype (as opposed to geometry) I've tried casting it to a geography datatype, with no success. I've attached relevant files tbl_test002 data.txt|attachment (306 Bytes) tbl_test002 schema.txt|attachment (207 Bytes)

    For the Immerse importer- it seems:

    • you can only specify or override the datatypes for sources such as CSV etc
    • if a shapefile source- it doesn't pick up the datatypes (such as date or timestamp)
    • if a geojson source- it does pick up the datatype image|690x399 image|690x294

    The upshot of all this is that I can't quite get the SQLimporter to work- but for now, the geojson option through Immerse is o.k, although probably not ideal for big datasets

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @andrewh,

    Lately, PostGIS started to decouple the jdbc files from the Postgresql jdbc and move data containing geometries from Postgres to Heavydb need more JAR to be added to the classpath

    java -cp /home/candido/Downloads/postgresql-42.2.20.jar:/home/candido/Downloads/postgis-jdbc-2.5.0.jar:/home/candido/Downloads/postgis-geometry-2.5.0.jar:/opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render/bin/omnisci-utility-5.10.2.jar com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t tbl_test002 -su test_postgis -sp password -c "jdbc:postgresql://127.0.0.1:5432/postgis_db" -ss "select * from tbl_test002"

    Adding the postgis-jdbc-2.5.0.jar and postgis-geometry-2.5.0.jar along postgresql-42.2.20.jar should solve the issue. However, I strongly suggest you use these versions because I had some problems using more recent versions of the drivers.

    Importing geometries with Immerse isn't so bad because the only significant overhead is the need to move files from your desktop to the server (except for the files on S3); after that, COPY command is issued to ingest data, and being parallel is probably the fastest option to load data into the database.

    Aren't you able to change data types using shapefiles? Then, if the data is compatible, the loader automatically does a cast.

    For example, changing the inferred type of objected column in the taxi_zones shapefile to a timestamp and loading the data with Immerse or Omnisql is flawless.

    omnisql> \\d taxi_zones_2
    CREATE TABLE taxi_zones_2 (
      OBJECTID TIMESTAMP(0),
      Shape_Leng DOUBLE,
      Shape_Area DOUBLE,
      zone TEXT ENCODING DICT(32),
      LocationID INTEGER,
      borough TEXT ENCODING DICT(32),
      omnisci_geo GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
    omnisql> copy taxi_zones_2 from '/mapd_storage/opendata/taxi/taxi_zones.zip' with (source_type='geo_file');
    Result
    Appending geo to table 'taxi_zones_2'...
    omnisql> select distinct objectid from taxi_zones_2 limit 5;
    objectid
    1970-01-01 00:00:01
    1970-01-01 00:00:02
    1970-01-01 00:00:03
    1970-01-01 00:00:04
    1970-01-01 00:00:05
    

    Naturally, if you already have data into a db, the fastest way to import it is likely to be the SQLImporter. Let me know if I can help in speeding up the data load.

    Best Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Andrew

    Hi Candido

    Many thanks- the extra JARs solved the problem.

    I didn't try changing the data type in the omnisci database and then loading from a shapefile- but that is ok My strong preference is for the SQLImporter method, since my data changes regularly- so a scripted/automated process direct from PostGIS DB (using the SQLimporter truncate option)

    Andrew

    0
    Comment actions Permalink
  • Avatar
    Andrew

    Hi Candido

    Sorry, one further question: Does (or can) SQLImporter create a logfile of what It has done? I noticed it sometime fails (say too many connections to database) without any error

    Andrew

    0
    Comment actions Permalink

Please sign in to leave a comment.