Unable to load geo and array columns using pyomnisci

Comments


  • Candido Dessanti

    Hi,

    I am not sure I understand the problem you are facing, but you can refer to this topic/post about how to insert spatial data using pymapd:

    https://community.heavy.ai/t/how-use-load-table-with-spatial-data-like-point/2570/3?u=candido.dessanti

    Let me know if it fixes your problem inserting geo data into omniscidb.

    Regards, Candido

  • Anirudh Simha

    Hi Candido, I was able to load it using the columnar method (sketched below), but it's very slow compared to the arrow method. I will try doing it via omnisql, since our source is Parquet files in S3, and will see how it goes. By the way, is geo and array column support for arrow on the roadmap for the near future? Thanks for all the help.

    Regards, Anirudh Simha
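
    A minimal sketch of the columnar path mentioned above, assuming the target table has a POINT column that accepts WKT strings (per the topic linked earlier); table name, column names, and credentials are placeholders:

    import pandas as pd
    import pymapd
    
    con = pymapd.connect(user="xxx", password="xxx", host="xxx", port=6274, dbname="xxx")
    # hypothetical table with a TEXT column and a POINT column;
    # geo values are assumed to be passed as WKT strings
    df = pd.DataFrame({
        "name": ["a", "b"],
        "location": ["POINT (1 2)", "POINT (3 4)"],
    })
    con.load_table("xxx", df, method="columnar", preserve_index=False)
    con.close()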

  • Candido Dessanti

    Hi,

    I cannot provide a date: while we would like to do everything, engineering resources are limited, so it depends on how much a feature is requested. If you have a GitHub account, you can open an issue at this link: https://github.com/omnisci/pymapd/issues?q=is%3Aissue

    Could you provide a snippet of the code you are using to load the table with arrow? (I think the problems with arrays and geometry objects are related.)

    The throughput shouldn't be so bad with the columnar method. The problem is that the method is serial, so just 1 core is used, while the COPY command by default uses all the cores of the machine where the server is installed (see the sketch below).

    Candido.
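
    A minimal sketch of driving the server-side COPY from Python, assuming a CSV file that is readable from the server's filesystem (table name, path, and credentials are placeholders):

    import pymapd
    
    con = pymapd.connect(user="xxx", password="xxx", host="xxx", port=6274, dbname="xxx")
    # COPY executes inside omniscidb itself, which parallelizes
    # the load across all of the server's cores by default
    con.execute("COPY xxx FROM '/path/to/data.csv' WITH (header='true')")
    con.close()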

  • Anirudh Simha

    Sure, I'll open an issue on GitHub :slightly_smiling_face: Please find below the code we use for importing.

    import pymapd
    import pyarrow.parquet as pq
    from pyarrow import fs
    
    con = pymapd.connect(user="xxx", password="xxx", host="xxx", port=xxx, dbname="xxx")
    # load arrow_table from S3, e.g.:
    # s3 = fs.S3FileSystem(region="xxx")
    # arrow_table = pq.read_table("xxx", filesystem=s3)
    con.load_table("xxx", arrow_table, method='arrow', preserve_index=False)
    con.close()
    
  • Candido Dessanti

    Well,

    this is something I did a long time ago to load an arrow table into the database (I guess everything is local because it used shared memory).

    It's multithreaded, but the speedup wasn't spectacular, because the table is locked every time the load_table_arrow method is called.

    import math
    import time
    from threading import Thread
    
    import pyarrow as pa
    from pymapd import connect
    
    uri = "mapd://admin:HyperInteractive@localhost:6274/omnisci?protocol=binary"
    
    def connect_then_load(uri, table_name, thread_num, num_iter):
      # each thread opens its own connection and loads its slice of batches
      con = connect(uri=uri)
      for x in range(num_iter):
        batch_idx = thread_num * num_iter + x
        if batch_idx < len(df_b):
          con.load_table_arrow(table_name, df_b[batch_idx])
    
    num_threads = 8
    reader = pa.RecordBatchStreamReader("/opt/root_ubuntu18/opt/opendata/flights/flights_none.arrow")
    df_b = reader.read_all().to_batches()
    # ceiling division so every batch is assigned to exactly one thread
    num_iter = math.ceil(len(df_b) / num_threads)
    
    start_time = time.time()
    
    thread_runs = []
    for thread_num in range(num_threads):
      t = Thread(target=connect_then_load, args=(uri, 'flights_parquet', thread_num, num_iter))
      t.start()
      thread_runs.append(t)
    
    for t in thread_runs:
      t.join()
    
    print("loaded %d batches in %.3f secs" % (len(df_b), time.time() - start_time))
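
    Two notes on the sketch above: each thread opens its own connection, since a single pymapd connection wraps one Thrift client and isn't safely shareable across threads, and the per-call table lock mentioned earlier is what caps the overall speedup.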
    
  • Anirudh Simha

    Definitely looks worth trying!

    Thanks a lot, Candido.

    Regards, Anirudh Simha

  • Anirudh Simha

    Hi Candido, I was hoping that the COPY command would work, but it also throws the same error! This is what I'm using:

    COPY xxx FROM 's3://xxx' WITH(parquet='true', s3_access_key='XXXXXXXX', s3_secret_key='XXXXXXXX');
    Loader Failed due to : Arrow array appends not yet supported in 0.286000 secs
    
