Encoded string lenght for use in Immerse

Comments

4 comments

  • Avatar
    Candido Dessanti

    Hi Gianfranco,

    The LENGTH or CHAR_LENGTH function is fully supported on a dictionary-encoded string and can be used in Immerse or the database, so I'm unsure I fully understand the issue.

    An ENCODING NONE string cannot be used on expressions like GROUP BY, and DISTINCT without being converted into an ENCODED DICT with the ENCODE_TEXT function.


    0
    Comment actions Permalink
  • Avatar
    Gianfranco Campana

    Thanks Candido, maybe I'm missing something.
    I'm referring to this part of documentation (Function and Operators page):

    and to this in data type page

    and to this error in Immerse trying to use the length function (with watchdog enabled):

    Immerse is trying to render a measure using the custom exspression " lenght(field1) "

    where "field1" is encoded this way:

    FIELD1       TEXT ENCODING DICT(16)

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Candido Dessanti

    Hi Gianfranco,

    When working with DICT ENCODED strings, the LENGTH and CHAR_LENGTH functions perform an implicit conversion of the string's data to NONE ENCODED. This operation is CPU-intensive and it's condiered expensive enough to be placed under the WATCHDOG control.
    Therefore, it is limited to a certain number of rows that can be processed under the WATCHDOG. By default, the maximum number of rows that can be converted is set to 1M, which means that in order to use the function directly, you would need to change this parameter or disable the watchdog entirely.

    Which expression are you using with regexp count?

    Regards,
    Candido

    0
    Comment actions Permalink
  • Avatar
    Gianfranco Campana

    Thanks for the explanation Candido.

    For a 4,5M rows table using in a custom dimension :

    case when LENGTH(field1) = 3 then 'TYPE_1' else 'TYPE_2' end

    returns the Chart Render Error from watchdog, while using this expression :

    case when REGEXP_COUNT(field1, '[A-Za-z0-9]') = 3 then 'TYPE_1' else 'TYPE_2' end

    works well.

    Given that I wish to avoid disabling watchdog and also using cpu-intensive tasks, it seems to me that REGEXP_COUNT is the best practice whenever I need to use the length function on numebr of rows > 1M.

     

     

     

     

     

    0
    Comment actions Permalink

Please sign in to leave a comment.