Data import from Postgres (PostGIS)
I am having problems with importing from a Postgres database:
Command (run on omnisci server)
java -cp /opt/omnisci/bin/omnisci-utility-5.10.2.jar:/tmp/postgresql-42.2.5.jar com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t tbl_import -su postgres -sp password -c "jdbc:postgresql://192.168.20.165/db_source:5432" -ss "select * from public.tbl_source"
I've assumed this JAR is the correct one; other posts on the forum have a different path/name.
192.168.20.165/db_source:5432 is my (remote) Postgres server, database, and port respectively. I also tried other formats.
public.tbl_source is obviously the source table
java.sql.SQLException: No suitable driver found for jdbc:postgresql://192.168.20.165/db_source:5432
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:228)
    at com.mapd.utility.SQLImporter.executeQuery(SQLImporter.java:360)
    at com.mapd.utility.SQLImporter.doWork(SQLImporter.java:348)
    at com.mapd.utility.SQLImporter.main(SQLImporter.java:333)
Notes:
- I've got the libpostgresql-jdbc-java package installed.
- It is not clear from the documentation whether this utility will create the destination table in omnisci, or do I need to do that first?
- Alternatively, is there a way to define datatypes when using the importer in Immerse? The problem I'm having is that timestamp or datetime columns from a shapefile are imported as strings.
Hi @andrewh ,
To use SQLImporter, you have to ensure that all the JARs are present on the local machine and that the paths are correct; the paths in our docs are examples only.
On my system, I call the utility with this command:
java -cp /opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render/bin/omnisci-utility-5.10.2.jar:/home/candido/Downloads/postgresql-42.2.20.jar com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t test_table -su test_user -sp password -c "jdbc:postgresql://192.168.1.11:5433/test_database" -ss "select * from test_table"
This assumes that the database is installed in /opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render and that the Postgres JDBC driver is in the /home/candido/Downloads/ directory. The connection string jdbc:postgresql://192.168.1.11:5433/test_database is the correct one in my case: Postgres is installed on the 192.168.1.11 host, it's using port 5433 (not the default), and I'm connecting to a database called test_database.
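For reference, the standard PostgreSQL JDBC URL puts the port after the host, before the database name (the command in the question put it after the database name instead). A minimal sketch of the correct shape, using the host/port/database from my example:

```shell
# Standard PostgreSQL JDBC URL: jdbc:postgresql://HOST:PORT/DATABASE
# (the port goes after the host, not after the database name)
HOST=192.168.1.11
PORT=5433            # non-default port in this example
DB=test_database
URL="jdbc:postgresql://${HOST}:${PORT}/${DB}"
echo "$URL"
```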
About your question: if the target table does not exist, SQLImporter will create a new table, inferring the datatypes from the source table; if the table already exists, it will append the data. So it's up to you whether to create the table in advance.
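If you want to control the datatypes yourself, one option is to pre-create the target table so SQLImporter appends into it. A hedged sketch (the column names and types here are hypothetical; match them to your source schema):

```shell
# Write the DDL for the target table to a file, then run it with omnisql
# before the first SQLImporter run. Column names/types are illustrative only.
cat > /tmp/create_tbl_import.sql <<'SQL'
CREATE TABLE tbl_import (
  id INTEGER,
  created_at TIMESTAMP(0),
  geom GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32));
SQL
# then: omnisql -u admin -p HyperInteractive omnisci < /tmp/create_tbl_import.sql
cat /tmp/create_tbl_import.sql
```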
When you import data with Immerse, you can change the datatypes and the column names in the preview screen and, depending on the data, other attributes such as the delimiter, the compression, and the geometry datatypes.
The preview tool probably isn't detecting the column as a timestamp, but you can enforce the column's datatype.
You can check the accepted formats here https://docs.heavy.ai/sql/data-manipulation-dml/functions-operators#accepted-date-time-and-timestamp-formats
Many thanks; my problem was with the path to the JDBC JAR. I still get an error with SQLImporter: the table gets created but no rows are inserted:
Exception in thread "main" java.lang.NoClassDefFoundError: org/postgis/PGgeometry
    at com.mapd.utility.db_vendors.PostGis_types.get_wkt(Db_vendor_types.java:142)
    at com.mapd.utility.SQLImporter.setColValue(SQLImporter.java:1039)
    at com.mapd.utility.SQLImporter.executeQuery(SQLImporter.java:414)
My geometry is a polygon type, SRID 4326. I'm not sure whether SQLImporter needs a PG geography datatype (as opposed to geometry); I've tried casting it to geography, with no success. I've attached the relevant files: tbl_test002 data.txt (306 bytes) and tbl_test002 schema.txt (207 bytes).
For the Immerse importer, it seems:
- you can only specify or override the datatypes for sources such as CSV
- with a shapefile source, it doesn't pick up the datatypes (such as date or timestamp)
- with a geojson source, it does pick up the datatypes
The upshot of all this is that I can't quite get SQLImporter to work, but for now the geojson option through Immerse is OK, although probably not ideal for big datasets.
Lately, PostGIS has started to decouple its JDBC files from the PostgreSQL JDBC driver, so moving data containing geometries from Postgres to HeavyDB needs more JARs added to the classpath:
java -cp /home/candido/Downloads/postgresql-42.2.20.jar:/home/candido/Downloads/postgis-jdbc-2.5.0.jar:/home/candido/Downloads/postgis-geometry-2.5.0.jar:/opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render/bin/omnisci-utility-5.10.2.jar com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t tbl_test002 -su test_postgis -sp password -c "jdbc:postgresql://127.0.0.1:5432/postgis_db" -ss "select * from tbl_test002"
Adding the PostGIS JARs alongside postgresql-42.2.20.jar should solve the issue. However, I strongly suggest you use these exact versions, because I had some problems using more recent versions of the drivers.
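The long -cp argument above is just a colon-separated list of JAR paths (on Linux); assembling it in a variable makes the command easier to maintain. The paths below are the ones from my example, so adjust them for your install:

```shell
# Colon-separated classpath (Linux). Every JAR must be listed: the Postgres
# driver, both PostGIS JARs, and the omnisci utility JAR.
PG_JAR=/home/candido/Downloads/postgresql-42.2.20.jar
POSTGIS_JARS=/home/candido/Downloads/postgis-jdbc-2.5.0.jar:/home/candido/Downloads/postgis-geometry-2.5.0.jar
UTIL_JAR=/opt/mapd/omnisci-ee-5.10.2-20220214-05c2d3ef72-Linux-x86_64-render/bin/omnisci-utility-5.10.2.jar
CP="${PG_JAR}:${POSTGIS_JARS}:${UTIL_JAR}"
echo "$CP"
# then: java -cp "$CP" com.mapd.utility.SQLImporter ...
```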
Importing geometries with Immerse isn't so bad, because the only significant overhead is moving the files from your desktop to the server (except for files on S3); after that, a COPY command is issued to ingest the data and, being parallel, it is probably the fastest option for loading data into the database.
Aren't you able to change data types when using shapefiles? If the data is compatible, the loader automatically does a cast.
For example, changing the inferred type of the OBJECTID column in the taxi_zones shapefile to a timestamp and loading the data with Immerse or omnisql works flawlessly.
omnisql> \d taxi_zones_2
CREATE TABLE taxi_zones_2 (
  OBJECTID TIMESTAMP(0),
  Shape_Leng DOUBLE,
  Shape_Area DOUBLE,
  zone TEXT ENCODING DICT(32),
  LocationID INTEGER,
  borough TEXT ENCODING DICT(32),
  omnisci_geo GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
omnisql> copy taxi_zones_2 from '/mapd_storage/opendata/taxi/taxi_zones.zip' with (source_type='geo_file');
Result
Appending geo to table 'taxi_zones_2'...
omnisql> select distinct objectid from taxi_zones_2 limit 5;
objectid
1970-01-01 00:00:01
1970-01-01 00:00:02
1970-01-01 00:00:03
1970-01-01 00:00:04
1970-01-01 00:00:05
Naturally, if you already have the data in a database, the fastest way to import it is likely SQLImporter. Let me know if I can help speed up the data load.
Best Regards, Candido
Many thanks- the extra JARs solved the problem.
I didn't try changing the data type in the omnisci database and then loading from a shapefile, but that's OK. My strong preference is for the SQLImporter method, since my data changes regularly, so I want a scripted/automated process direct from the PostGIS DB (using the SQLImporter truncate option).
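A hedged sketch of such an automated refresh, built from the question's original command with the port moved to its standard position in the URL. The -tr truncate flag and the /tmp JAR paths are assumptions here, so verify the exact flag name and locations against your SQLImporter version before scheduling this:

```shell
# Dry-run sketch of a scheduled PostGIS -> omnisci refresh.
# -tr (truncate before load) and the JAR paths are assumed; verify both.
CP=/opt/omnisci/bin/omnisci-utility-5.10.2.jar:/tmp/postgresql-42.2.5.jar:/tmp/postgis-jdbc-2.5.0.jar:/tmp/postgis-geometry-2.5.0.jar
CMD="java -cp $CP com.mapd.utility.SQLImporter -u admin -p HyperInteractive -db omnisci --port 6274 -t tbl_import -tr -su postgres -sp password -c jdbc:postgresql://192.168.20.165:5432/db_source -ss \"select * from public.tbl_source\""
echo "$CMD"   # print instead of run; swap echo for eval in a cron job
```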