TPCH Q16, sort is slow

Comments

1 comment

  • Avatar
    Candido Dessanti

    Hi,

    Yes probably the last sort operation is limiting your performance, Anyway also for an SF of 10, the 1800ms figure isn't exactly a stellar performer.

    The join and the semijoin itself should go thru a perfect hash join, and after the hash index is created, they shouldn't take so much (on my workstation with a SF of 100 and 1 GPU it's taking around
    100ms) Removing the distinct in the count(distinct ps_suppkey) and the operation from the sort, the query is taking around 400ms (with the sort of the first 3 fields taking 329ms), and using the approx_count_distinct function is taking around 1000ms (the sort is 327ms).

    So I guess we have some troubles while trying to sort the results of a reduced operation; we have an optimized path if just the top n results are needed, so if you try to run

    heavysql> select p_brand, p_type, p_size, approx_count_distinct(ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#41' and p_type not like 'PROMO ANODIZED%' and p_size in (34, 10, 44, 45, 19, 26, 18, 47) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc,p_brand, p_type, p_size limit 10;
    p_brand|p_type|p_size|supplier_cnt
    Brand#33|MEDIUM BRUSHED TIN|34|611
    Brand#11|STANDARD BURNISHED NICKEL|26|591
    Brand#55|LARGE BURNISHED STEEL|10|590
    Brand#14|SMALL BRUSHED BRASS|45|587
    Brand#44|LARGE BRUSHED BRASS|18|578
    Brand#44|STANDARD BURNISHED BRASS|18|577
    Brand#14|ECONOMY BRUSHED COPPER|19|575
    Brand#54|ECONOMY PLATED COPPER|18|575
    Brand#24|STANDARD PLATED COPPER|34|573
    Brand#25|PROMO BRUSHED COPPER|47|571
    10 rows returned.
    Execution time: 833 ms, Total time: 836 ms
    

    and in this case, the sort is done in parallel, and it takes 64ms

    without limit, the query returns the results in more than 3000ms with the sort alone taking 2464ms.

    sorting for the dimension only 1100ms (sort 334)

    sorting the count distinct only so writing the query this way

    select p_brand, p_type, p_size, approx_count_distinct(ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#41' and p_type not like 'PROMO ANODIZED%' and p_size in (34, 10, 44, 45, 19, 26, 18, 47) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc;

    just the sort itself takes 1960ms, so there is something we are doing bad, but I don't know what can be the reason. I opened an issue about something similar (sort with top n) but it hasn't be prioritized.

    Candido

    0
    Comment actions Permalink

Please sign in to leave a comment.