An issue with approx_count_distinct operator in 6.X version
Hello Team,
In 6. X throws the error: Failure from KBNet Engine - "View Location_Availability republish failed!. Error: TOmniSciException({ErrorMsg: View 'm_645b86c1_Location_Availability' query has failed with an error: 'Consider using approx_count_distinct operator instead of count_distinct operator to lower the memory requirements'.\nThe view must be dropped and re-created to resolve the error.
We ran the below query:
SELECT COUNT(Distinct @location.id) AS "m_Location_Count", COUNT(Distinct Case when @location.activeflag =1 Then @location.id else Null End) AS "m_Active_Location_Count", COUNT(DISTINCT Case when @loc_summary.locationid IS NULL and @location.activeflag =1 Then @location.id else Null End) AS "m_Available_Location_Count", COUNT(DISTINCT Case when @loc_summary.locationid IS NOT NULL Then @location.id else Null End) AS "m_Used_Location_Count" FROM #location LEFT JOIN #loc_summary ON @location.id = @loc_summary.locationid WHERE @location.locationgroup_name = 'Main'
Gtt the following error message:
Failure from KBNet Engine - "View Location_Availability republish failed!. Error: TOmniSciException({ErrorMsg:View 'm_645b86c1_Location_Availability' query has failed with an error: 'Consider using approx_count_distinct operator instead of count_distinct operator to lower the memory requirements'.\nThe view must be dropped and re-created to resolve the error. \nQuery:\nSELECT COUNT(DISTINCT m_645b86c1_location.\"m_id\") AS \"m_Location_Count\", COUNT(DISTINCT CASE WHEN m_645b86c1_location.\"m_activeflag\" = 1 THEN m_645b86c1_location.\"m_id\" ELSE NULL END) AS \"m_Active_Location_Count\", COUNT(DISTINCT CASE WHEN m_645b86c1_loc_summary.\"m_locationid\" IS NULL AND m_645b86c1_location.\"m_activeflag\" = 1 THEN m_645b86c1_location.\"m_id\" ELSE NULL END) AS \"m_Available_Location_Count\", COUNT(DISTINCT CASE WHEN m_645b86c1_loc_summary.\"m_locationid\" IS NOT NULL THEN m_645b86c1_location.\"m_id\" ELSE NULL END) AS \"m_Used_Location_Count\" FROM m_645b86c1_location LEFT JOIN m_645b86c1_loc_summary ON m_645b86c1_location.\"m_id\" = m_645b86c1_loc_summary.\"m_locationid\" WHERE m_645b86c1_location.\"m_locationgroup_name\" = 'Main';})"
could you please help?
Thanks,
Kalyani
-
Official comment
Hello,
If you're interested in assessing whether version 7.x will address your concern (though I'm uncertain), you may want to consider trying the free version by registering and requesting a key through our website:
https://www.heavy.ai/product/downloads
It's possible that you might be overlooking the impact of the improvements detailed in the release notes.
Should you wish to resolve your issues, kindly furnish the requested data that we have repeatedly asked for in order to identify and rectify the problem. If we're unable to reproduce your issue, it becomes less likely that we can provide a solution.
Regards,
CandidoComment actions -
Hi Kaylani,
Please provide the exact version you're using, by going to http(s)://yourImmerseServer/version.txt and reporting the results. TOmniSci exception is surprising to observe in currently supported versions of the application, so confirming the version is helpful.
The error message is warning that your system is likely unable to process your query without running out of memory. The messaging stems from the use of our watchdog process which is enabled by default. You can disable this by setting `enable-watchdog = false` in your heavy.conf file and restarting services. However, if your system does not have adequate memory to process the query, you will instead encounter the followingERR_OUT_OF_CPU_MEM: Not enough host memory to execute the query
However, you should definitely consider the suggestion provided by the error message to, replace within your view all instances of `COUNT(DISTINCT expression) with APPROX_COUNT_DISTINCT(expression). Count Distinct is indeed a very memory intensive operation, and replacing with APPROX_COUNT_DISTINCT will significantly improve performance and reduce memory requirements.
Here's some further background on APPROX_COUNT_DISTINCT function:
APPROX_COUNT_DISTINCT(x, e) gives an approximate count of the value x, based on an expected error rate defined in e. The error rate is an integer value from 1 to 100. The lower the value of e, the higher the precision, and the higher the memory cost. Select a value for e based on the level of precision required. On large tables with large cardinalities, consider using APPROX_COUNT_DISTINCT when possible to preserve memory. When data cardinalities permit, HEAVY.AI uses the precise implementation of COUNT(DISTINCT x) for APPROX_COUNT_DISTINCT.
-
HI Kayalani,
In the 6.x the engineering did some changes on count distinct, so probably your case is a taking a new code path.
could you run the query
select min(m_645b86c1_location.m_id=,max(m_645b86c1_location."m_id"),approx_count_distinct(m_645b86c1_location."m_id",10) from m_645b86c1_location
This could help us to reproduce what's going on with your data and fix the problem.
Candido -
Hello Team,
At present, we are facing the same issue in version 6.4.0. To resolve this bottleneck and continue with our work smoothly, we are contemplating an upgrade to a newer version. We would greatly appreciate any potential workarounds you can suggest for this particular issue.
Interestingly, we did not encounter any such problems in version 5.9.4. It appears that the newer version, 6.4.0, introduced this issue. We are keen to understand why this problem has surfaced in the newer version. Your insights would be valuable in helping us make an informed decision regarding the upgrade.
Thank you for your assistance and support.
-
Hello @re,
Are you encountering the same error as @Kalyani N while running a
COUNT(DISTINCT ...)
query?Could you please provide the SQL of the problematic query, as well as the minimum and maximum values of the column you are using in the
COUNT(DISTINCT ...)
function? Additionally, it would be helpful to know the approximate count of distinct values for that column and the number of groups that the query would return.To retrieve the minimum and maximum values, as well as the approximate distinct count of a column "column_name" from the table "table_name," you can use the following SQL query:
SELECT MAX(field_distinct),
MIN(field_distinct),
APPROX_COUNT_DISTINCT(field_distinct) FROM table_name;To get the number of groups that the query would return based on the column "column_name," use the following SQL (replace "table_name" with the actual table name and "column_name" with the name of the column used in the
COUNT(DISTINCT ...)
part of your problematic query):SELECT COUNT(*) FROM (SELECT field1, field2, ..., fieldn
FROM table_name
GROUP BY field1, field2, ..., fieldn) AS grouped_data;As I mentioned earlier, some changes have been made in version 6.x to improve performance. However, it seems that there might be some regression in certain scenarios.
Thanks for your patience.
Candido -
Hi Candido Dessanti / Neill Lewis
Greetings Team,
We're eagerly awaiting your reply. Kindly inform us about the anticipated release date of the 7.x.x version of HeavyDB on GitHub. This release will greatly assist us in addressing numerous issues and facilitate the seamless upgrade of our current 5.9.x version to the latest iteration.
These are the fixes we are anticipating which will solve most of our issues.
1.Fixes an out of CPU memory error that could occur when executing a query with a count distinct function call on a high cardinality column.
2.Fix issue where “Cannot use fast path for COUNT DISTINCT” could be reported from a count distinct operation.
3.Fixed an issue where COUNT DISTINCT or APPROX_COUNT_DISTINCT, when run on a CASE statement that outputs literal strings, could cause a crash.
4.Fixes a crash when using COUNT () or COUNT (1) with the window function, i.e., COUNT (*) OVER (PARTITION BY x).
5.Fixes an issue that cause incorrect results in multiple aggregation of date columns that include COUNT DISTINCT.Only till version 6.4.0 we are able to see in github.
Looking forward to your much-appreciated response.
-
I trust you're doing well.irstly, I'm yet to receive any feedback regarding the release of a 7.x OS version of the software. Consequently, I find myself in a bit of a bind as I'm unable to provide you with information on this matter. The master OS repository is nearly in sync with the 7.0 release, we are laking just some commits, and opting to build from the source would yield almost identical executables for both the EE/FREE versions. If you find yourself unable to perform the build on your own, or if you're more comfortable with me handling it, I'd be more than willing to build the executables and share them with you. Do let me know if this would be of assistance to you.
Moving on to the second issue at hand, it appears that I've made several requests to all parties involved for the results of simple queries. Unfortunately, I haven't received any feedback in this regard. It's important to highlight that while the fixes implemented in the 7.0 version potentially address specific issues experienced by our customers, I'm uncertain if these fixes will resolve the error you've encountered.
We can offer solutions and workarounds when we have some data to work with, but we can't do so much if we aren't getting any
There is nothing particular going on, we are just too busy to update the OS version on a regular basis·
Best Regards,
Candido -
Hi guys,
The OS version of release 7.x is available for download in the tarball, yum/apt, and docker format.
You can download the tarballs at these links
GPU accelerated: https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz
CPU: https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz
or you can install/upgrade using Docker and other package managers as described on our docs.
Please note that the software requirements have changed: revise the new system requirements
https://docs.heavy.ai/installation-and-configuration/system-requirements/software-requirementsThank you.
Please sign in to leave a comment.
Comments
12 comments