How to pass ARRAY[] parameters in sql and python?
Hi Experts,
I've implemented a UDF which takes 2 parameters, both are ARRAY[] float. I was wondering how to pass values to it via pymapd and sql?
SQL: I've tried this one select feature, float_dot(feature, ARRAY[1.25 ,1.25 ]) from samplevector ; and got the following error Exception: Function float_dot(FLOAT[], NUMERIC(3,2)[]) not supported. Existing extension function implementations: float_dot({float, i64, i8}, {float, i64, i8}) -> float
seems like the array was converted to NUMERIC by default. and then I tried another select * , float_dot(feature, ARRAY[CAST(1.25 AS FLOAT),CAST(1.25 AS FLOAT)]) from samplevector where float_dot(feature, ARRAY[CAST(1.25 AS FLOAT),CAST(1.25 AS FLOAT)]) >5 ;
it run successfully for the first time d|feature|rowid|EXPR$3
1|{1.2, 3.4}|0|5.75
Howeveer if I run it second time, I will get error:
Thrift error: No more data to read.
Thrift connection error: No more data to read.
Retrying connection
Could you please give some suggestion?
Also How to use pymapd to pass array?
I tried in Pyson:
import pymapd import pandas as pd con = pymapd.connect(user="admin", password="HyperInteractive", host="localhost") data=[1.23,1.25] query = """select feature, float_dot(feature, :testdata) from samplevector""" df = pd.read_sql(query, con,params={'testdata': data}) Traceback (most recent call last): File "/home/faceos/.local/lib/python3.7/site-packages/pymapd/cursor.py", line 117, in execute nonce=None, first_n=-1, at_most_n=-1) File "/home/faceos/.local/lib/python3.7/site-packages/omnisci/mapd/MapD.py", line 1598, in sql_execute return self.recv_sql_execute() File "/home/faceos/.local/lib/python3.7/site-packages/omnisci/mapd/MapD.py", line 1627, in recv_sql_execute raise result.e omnisci.mapd.ttypes.TMapDException: TMapDException(error_msg='Exception: Function float_dot(FLOAT[], NULL) not supported.\ Existing extension function implementations:\ float_dot({float, i64, i8}, {float, i64, i8}) -> float\ ')
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py", line 1586, in execute cur.execute(args, kwargs) File "/home/faceos/.local/lib/python3.7/site-packages/pymapd/cursor.py", line 119, in execute raise _translate_exception(e) from e pymapd.exceptions.Error: Exception: Function float_dot(FLOAT[], NULL) not supported. Existing extension function implementations: float_dot({float, i64, i8}, {float, i64, i8}*) -> float
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py", line 1590, in execute self.con.rollback() AttributeError: 'Connection' object has no attribute 'rollback'
The above exception was the direct cause of the following exception:
Traceback (most recent call last): File "", line 1, in File "/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py", line 412, in read_sql chunksize=chunksize, File "/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py", line 1633, in read_query cursor = self.execute(args) File "/home/faceos/.local/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute raise ex from inner_exc pandas.io.sql.DatabaseError: Execution failed on sql: select feature, float_dot(feature, :testdata) from samplevector Exception: Function float_dot(FLOAT[], NULL) not supported. Existing extension function implementations: float_dot({float, i64, i8}, {float, i64, i8}*) -> float
unable to rollback
Any suggestions?
Best, Zipepng
-
Hi @zipeng -
Currently, we do not support using variable-width functions in UDFs. So passing ~~arrays~~, strings or ~~geospatial~~ types won't work. Using integers, float, boolean should work.
As we develop the user-defined function functionality further, we hope to support these other types, but unfortunately I don't know the timeline for that.
Best, Randy
Edited based on @anon45925161 clarification
-
Hi @zipeng,
Thanks for the question! To clarify @randyzwitch's response, we do support variable-length datatypes as UDF inputs -- namely arrays and geospatial types (we do not yet support string inputs). We also support varlen array outputs now, in C++ -- see https://github.com/omnisci/omniscidb/blob/master/Tests/UdfTest.cpp#L473 for an example.
However, we do not support the
ARRAY
function yet. Our variable length runtime is built to read directly from input buffers.ARRAY
attempts to mimic an input buffer, but isn't quite identical. We have a project underway to move the varlen runtime from reading from input buffers to having its own intermediate representation (the representation already exists, calledArrayDatum
), but it is performance sensitive so we have to be a little careful.I'll post here once we have some progress -- should be weeks to months, not months to years.
Thanks, Alex
Please sign in to leave a comment.
Comments
3 comments