If you use pandas.read_csv and heavyai.load_table to ingest a file larger than 2 GB, you might see the following exception:
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array.
To avoid this problem when ingesting large files, set the read_csv
parameter chunksize to a number of rows small enough that each chunk stays under 2 GB. Reading the file in chunks lets heavyai
convert each chunk without error. For example:
import pandas as pd

# With chunksize set, read_csv returns an iterator of DataFrames rather than one large DataFrame
chunks = pd.read_csv('myfile.csv', parse_dates=[0],
                     date_parser=lambda epoch: pd.to_datetime(epoch, unit='s'),
                     chunksize=1000000)
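Each chunk can then be appended to the target table with load_table. The sketch below is a minimal end-to-end example assuming a connection created with heavyai.connect(); the connection parameters and the table name mytable are placeholders that you should adjust for your environment.

import pandas as pd
import heavyai

# Placeholder connection parameters; replace with your own credentials and host.
con = heavyai.connect(user='admin', password='HyperInteractive',
                      host='localhost', dbname='heavyai')

# Read the file in chunks of 1,000,000 rows so no single chunk exceeds 2 GB.
chunks = pd.read_csv('myfile.csv', parse_dates=[0],
                     date_parser=lambda epoch: pd.to_datetime(epoch, unit='s'),
                     chunksize=1000000)

# Load each chunk separately; successive calls append rows to the same table.
for chunk in chunks:
    con.load_table('mytable', chunk)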