Keeping geometric data types in any capacity when joining

Comments

9 comments

  • Avatar
    Michael Flaxman

    Just chiming in here in case anyone else is reading this later on .... as of 5.11, we are now supporting multi-fragment joins such as the above.

    There are still some limits relative to internal pointers on large (>2G) geo columns which we are doing further work to address.

    1
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @ldnker,

    Is this just happening when you join two tables? I'm going to reproduce by myself, because it would work; it doesn't look like a multistep query that needs intermediate results.

    0
    Comment actions Permalink
  • Avatar
    ldnker

    Hi @candido.dessanti , yes this happened when I was just joining the two tables.

    More specifically this happens when I add one of the geometry columns to the Select statement. If I just return the id or any other non-geometric column it works like a charm.

    Let me know if I should to replicate this in one of the demo environments.

    Best Frederic

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi,

    I can't reproduce the issue.

    I created two tables that I join with a bigint column. The object table contains 5m records and the geometries table around 200k, and the relationship is N-1.

    omnisql> \\d geometries
    CREATE TABLE geometries (
      sez2011 BIGINT,
      omnisci_mp GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32),
      omnisci_p GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32));
    omnisql> \\d objects
    CREATE TABLE objects (
      sez2011 BIGINT,
      name TEXT,
      place TEXT,
      SHARED DICTIONARY (name) REFERENCES italy_poi(name),
      SHARED DICTIONARY (place) REFERENCES italy_poi(place));
    omnisql> select count(*) from objects;
    EXPR$0
    5769509
    1 rows returned.
    Execution time: 167 ms, Total time: 168 ms
    omnisql> select count(*) from geometries;
    EXPR$0
    269009
    1 row returned.
    Execution time: 12 ms, Total time: 13 ms
    

    if I execute a query similar to yours, I haven't any problem

    omnisql> select * from objects o inner join geometries g on o.sez2011=g.sez2011 limit 10;
    240080000039|NULL|NULL|240080000039|MULTIPOLYGON (((11.3269491453315 45.5305159862761,11.3276593439875 45.5305015274931,11.3278205279856 45.5307614922201,11.3273821544495 45.5310764841414,11.3269112591291 45.5309173118001,11.326338188409 45.5305284334023,11.3258662034412 45.5305740728651,11.3254992437202 45.5304487634126,11.3252829067992 45.5306376915102,11.3253637083459 45.5308115740916,11.3250277616666 45.531050206875,11.3247265160665 45.5310428308002,11.3252616167652 45.5301205699938,11.3248200581059 45.5300507906499,11.3248009473667 45.5302852324817,11.3244266115708 45.5302140701237,11.3244778249992 45.529981262763,11.3240775053036 45.529899371569,11.3241412915867 45.5295830385432,11.3235102180967 45.529420387712,11.3229827449298 45.5293613372042,11.3218416326315 45.5291730377493,11.3216229487777 45.5296162727892,11.3220948499265 45.5298789616348,11.3218293950529 45.5301206538128,11.3213630259602 45.5299906085851,11.3212807994901 45.5298595156194,11.3210435916302 45.5297743554831,11.3209727645484 45.5289881748748,11.3211777859 45.5288354985085,11.3214424864023 45.5285758271481,11.3210858364222 45.52838953935,11.3209041167614 45.5282537106091,11.3211873412697 45.5281332207509,11.3215317536711 45.5280227053575,11.3212812185852 45.5276925002819,11.3214639440744 45.5276190329006,11.3214224536537 45.5275456074288,11.3213707373111 45.5274543704127,11.3217754155964 45.5272526180033,11.3225007854973 45.5270646119151,11.322808988077 45.5266195328565,11.3232088048585 45.5260668301611,11.3233267382362 45.5258978929026,11.3232647121527 45.5256358746093,11.3229429308898 45.5253633789371,11.3226431102132 45.5250791905099,11.3230833277679 45.5245729235581,11.3232384767957 45.5243762422001,11.3239054248314 45.5242479152625,11.3245117717071 45.5241230668147,11.3246626459643 45.5245227997771,11.3249096606508 45.5250781008625,11.3250829145895 45.5253198349501,11.3252082240419 45.5256413228464,11.3255518820721 45.5261338853865,11.3253329467612 45.5262598653912,11.3253891893315 45.5263824926346,11.3254340325135 45.5264603186056,11.324857273756 45.5266813493924,11.3247152843162 45.5268867898392,11.3250928052352 45.5269578264686,11.3247249235049 45.5274311525409,11.3244183973057 45.5278401894159,11.3232347049393 45.5275492535567,11.3231568370588 45.527917135287,11.3234762713888 45.5279961347245,11.3249085710034 45.5283905451784,11.3265077543103 45.5286640047695,11.3262441434554 45.5290249295202,11.3260760024777 45.5294514007538,11.3261315744958 45.5294795220389,11.3260974601498 45.5298155106277,11.3263204187743 45.5300989865931,11.3269491453315 45.5305159862761)))|POINT (11.3233019278028 45.526939092915)
    [CUT]
    10 rows returned.
    Execution time: 30 ms, Total time: 34 ms
    

    I also tried to duplicate the records in the geometries table, changing the relationship from N-1 to N-N, but the query runs without issues. Which datatype are you using to join the tables? Wich relationship between the tables?

    We do this kind of join in Immerse since 4.0 release for choropleth maps, so I'm really curious to find out the reason of error you are getting.

    Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    ldnker

    Hi.

    Thank you for your quick answer and I am happy to hear that this should generally be possible.

    Below you will find the table definitions for the geometries table and the relevant part of the objects table.

    If this helps in any way I could also try to make a small part of our data available to you for testing internally.

    
    CREATE TABLE geometries (
      location_mapped_id TEXT ENCODING DICT(32),
      api TEXT ENCODING DICT(16),
      original_type TEXT ENCODING DICT(16),
      creation_date DATE ENCODING DAYS(32),
      point_wkt GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32),
      polygon_wkt GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
      mpolygon_wkt GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32));
    

    And the relevant part of the objects table

    CREATE TABLE objects (
      location_mapped_id TEXT,
      location_mapped_voc TEXT,
      location_mapped_country TEXT,
      SHARED DICTIONARY (location_mapped_id) REFERENCES objects(location_mapped_id),
      SHARED DICTIONARY (location_mapped_voc) REFERENCES objects(location_mapped_voc),
      SHARED DICTIONARY (location_mapped_country) REFERENCES objects(location_mapped_country);
    
    

    Just by looking at it, I see that they are different data types and one is part of a shared dictionary that I must have forgotten in the other table. If you recommend this I would be happy to try changing the geometries table accordingly.

    The relationship is / should be N-1 meaning we have a repetition of the location_mapped_id variable in the objects table but not in the geometries table.

    Best Frederic

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Yeah I tried both Objects to geometries, many to one relationship and many to many, using a bigint.

    If you have a many to many with a dictionary encoded string to could be a problem, but I want to try before saying anything.

    Candido

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @ldnker,

    Looking at your DDLs, I noticed that you are referencing as a shared dictionary column the table itself.

    [quote="ldnker, post:5, topic:2726"]

    CREATE TABLE objects (
      location_mapped_id TEXT,
      location_mapped_voc TEXT,
      location_mapped_country TEXT,
      **SHARED DICTIONARY (location_mapped_id) REFERENCES objects(location_mapped_id),**
      SHARED DICTIONARY (location_mapped_voc) REFERENCES objects(location_mapped_voc),
      SHARED DICTIONARY (location_mapped_country) REFERENCES objects(location_mapped_country);
    

    [/quote]

    It's wrong, and I haven't any idea about the problem that could arise from this syntax (likely nothing special), but I tried. The join is working, as it's working without using a shared dictionary, and it's also working using a shared dictionary key used by both tables.

    CREATE TABLE geometries_de (
      sez2011 TEXT NOT NULL ENCODING DICT(32),
      omnisci_mp GEOMETRY(MULTIPOLYGON, 4326) ENCODING COMPRESSED(32),
      omnisci_p GEOMETRY(POINT, 4326) ENCODING COMPRESSED(32),
      omnisci_pl GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32));
    
    
    CREATE TABLE objects_de_s (
      sez2011 TEXT NOT NULL,
      name TEXT ENCODING DICT(32),
      place TEXT ENCODING DICT(32),
      SHARED DICTIONARY (sez2011) REFERENCES geometries_de(sez2011));
    

    the query

    select * from objects_de_s o inner join geometries_de g on o.sez2011=g.sez2011 limit 10;

    is running without issues with many-to-one and many-to-many relationship.

    I'm sharing the dump of both tables here

    https://drive.google.com/drive/folders/1Wgk5E79vBNeGlIJDknRvlWSE-3YARuVy?usp=sharing

    open omnisql and run the commands

    restore table objects from '/path/objects_de_s.dmp'
    restore table geometries from '/path/geometries.dmp'
    

    the path can be the $OMNISCI_STORAGE/mapd_import or another one in the import allow-list (allowed-import-paths = ["root_path_1", "root_path_2", ...])

    If you can share your tables I can try out them in our systems

    Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi @ldnker,

    I think I reproduced your issue but to be sure you would run this query on your geometries table.

    select max(cast(offset_in_fragment() as bigint))+1 as max_fragment_offset,count(*) the_count from geometries

    If the max_fragment_offset is lower than the_count it0s likely your table is spawning more than fragment, so the error when you run your join with the limit keyword (you should get another one without the limit keyword).

    To try to fix this, you can try to

    Increase the fragment size (the default is 32M, meaning that in 1 fragment can't be more than 32M rows) and the chunck_size to the max value of 2GB with a CTAS

    create table geometries_uf as select * from geometries_mf with (fragment_size=64000000,max_chunk_size=2147483648);

    and re-run the query to be sure that the table is one fragment only.

    Regards, Candido

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Thank Mike,

    to remind us that this feature has been made in the actual release.

    Candido

    0
    Comment actions Permalink

Please sign in to leave a comment.