An issue with approx_count_distinct operator in 6.X version

Comments

4 comments

  • Avatar
    Neill Lewis

    Hi Kaylani,

    Please provide the exact version you're using, by going to http(s)://yourImmerseServer/version.txt and reporting the results. TOmniSci exception is surprising to observe in currently supported versions of the application, so confirming the version is helpful.


    The error message is warning that your system is likely unable to process your query without running out of memory. The messaging stems from the use of our watchdog process which is enabled by default. You can disable this by setting `enable-watchdog = false` in your heavy.conf file and restarting services. However, if your system does not have adequate memory to process the query, you will instead encounter the following

    ERR_OUT_OF_CPU_MEM: Not enough host memory to execute the query

    However, you should definitely consider the suggestion provided by the error message to, replace within your view all instances of `COUNT(DISTINCT expression) with APPROX_COUNT_DISTINCT(expression). Count Distinct is indeed a very memory intensive operation, and replacing with APPROX_COUNT_DISTINCT will significantly improve performance and reduce memory requirements.

     

    Here's some further background on APPROX_COUNT_DISTINCT function:

    APPROX_COUNT_DISTINCT(x, e) gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT when possible to preserve memory. When data cardinalities permit, HEAVY.AI uses the precise implementation of COUNT(DISTINCT x) for APPROX_COUNT_DISTINCT.
    1
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    HI Kayalani,

    In the 6.x the engineering did some changes on count distinct, so probably your case is a taking a new code path.

     

    could you run the query

     

    select min(m_645b86c1_location.m_id=,max(m_645b86c1_location."m_id"),approx_count_distinct(m_645b86c1_location."m_id",10) from m_645b86c1_location

    This could help us to reproduce what's going on with your data and fix the problem.

    Candido

    0
    Comment actions Permalink
  • Avatar
    Kalyani N

    Sure Candido.

    Will do that.

    Thanks,
    Kalyani

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    I'm sorry but I forgot to say that you should gove us the result of the query.

    Candido

    0
    Comment actions Permalink

Please sign in to leave a comment.