More than ever, system administrators are receiving requests from users or management for insights into specific aspects of server usage that are not accessible through the Admin Console.
In recent releases, many useful features have been added to the HeavyDB database that give administrators a chance to service user requests promptly and provide management with a better understanding of how the system is performing, by integrating metrics coming from external tools into the database and making them available in Heavy Immerse.
To enhance the integration of external data and derive valuable insights, the HeavyDB platform offers several capabilities:
- Heavy Connect (also known as Foreign Tables): Introduced in version 6.0, Heavy Connect allows users to seamlessly display data from external tools within Heavy Immerse by querying external data sources through foreign tables. This feature enables administrators to access and visualize data from various sources alongside their native HeavyDB data.
- Cross-Database Query: Introduced in the 6.4 release, this feature enables users to perform queries that involve data from both the HeavyDB system tables and external tools. With Cross-Database Query, administrators can create unified dashboards that present a holistic view of the system's performance, leveraging data from multiple sources (a one-line example follows this list).
- Cross-Linking: Heavy Immerse allows administrators to establish cross-links between different data sources within the platform. This feature enables the integration of data from diverse Heavy Immerse sources, facilitating comprehensive data analysis and visualization.
- System Tables: Introduced in version 5 and significantly enhanced in version 6, System Tables provide administrators with access to crucial system metrics and performance data. These tables offer valuable insights into the system's behavior, allowing administrators to make informed decisions and optimize system performance.
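As a quick illustration of a cross-database query, a table that lives in another database can be referenced by qualifying it with the database name. A minimal sketch, run while connected to any database other than information_schema:
SELECT COUNT(*) FROM information_schema.request_logs;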
To improve user experience without compromising system performance, HeavyDB offers the following features:
- Query Hints and the Executor Resource Manager (ERM): Query Hints, along with the new ERM, enable administrators to fine-tune query execution in Heavy Immerse dashboards. By pushing the monitoring queries to the CPU, they avoid interfering with GPU workloads and keep performance smooth. The ERM, available since version 6.4 and enabled by default in version 7.0, optimizes query execution and increases concurrency on the platform.
- Custom SQL Sources: This feature empowers administrators to enforce Query Hints without creating static objects like views to reference System Tables. Custom SQL Sources provide a flexible way to tailor queries and optimize performance without the need for additional software installation (see the sketch after this list).
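As a sketch of how these two features combine, a Custom SQL Source can embed a query hint directly in its SQL. Here the cpu_mode hint (the per-step variant of the g_cpu_mode hint used later in this article) forces the query onto the CPU; gpu_statistics is the foreign table defined later in the article:
SELECT /*+ cpu_mode */ ts, gpu_percent_utilization
FROM gpu_statistics;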
A typical use case of these capabilities is an administrator creating a dashboard to monitor GPU usage (compute, memory bandwidth, operating temperatures, and so on) over time. By using Heavy Immerse in conjunction with the nvidia-smi CLI tool in query mode, administrators can seamlessly visualize and analyze GPU metrics alongside other system performance data. This integration enhances the monitoring capabilities without requiring the installation of any additional software.
OS operations
The only requirement on the OS side is to run the nvidia-smi command as a daemon (using, for example, the setsid command) and redirect its output to a CSV file. Additionally, it is necessary to force the refresh of the request_logs system table.
In the example provided, various GPU metrics, including the GPU model, compute and memory utilization, and the GPU card temperature, are logged to a CSV file. The file is written to a gpu_monitoring subdirectory of the database storage's import folder, where the foreign table defined below can pick it up for further analysis and visualization. The sampling interval is currently set to one second, but it can be customized depending on the specific requirements of the environment.
If needed, additional metrics can be added or existing metrics can be changed in the nvidia-smi command to capture more detailed information about the GPU's performance. Additionally, the sampling interval can be adjusted to a higher or lower value, depending on the desired level of granularity in monitoring the GPU's behavior.
# Create the directory that will hold the CSV log; the foreign table defined later reads from it.
mkdir -p /var/lib/heavyai/storage/import/monitoring/gpu_monitoring

setsid nvidia-smi -l 1 \
--query-gpu=timestamp,index,count,gpu_name,utilization.gpu,utilization.memory,temperature.gpu \
--format=csv,nounits,noheader \
>>/var/lib/heavyai/storage/import/monitoring/gpu_monitoring/system_gpu_monitoring.log
Force the refresh of the request_logs system table.
setsid /bin/bash -c 'while true; do echo "REFRESH FOREIGN TABLES request_logs;" | /opt/heavyai/bin/heavysql -q -p HyperInteractive information_schema; sleep 120; done'
Both commands can be run at boot as services, using systemd or cron.
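As a sketch, and assuming a cron implementation that supports the @reboot schedule and a database running as the heavyai user (both assumptions to adapt to your environment), a single /etc/cron.d file could start both collectors at boot; a systemd unit with Restart=always is an equally valid choice:
# /etc/cron.d/heavyai-gpu-monitoring (hypothetical file name)
@reboot heavyai setsid nvidia-smi -l 1 --query-gpu=timestamp,index,count,gpu_name,utilization.gpu,utilization.memory,temperature.gpu --format=csv,nounits,noheader >> /var/lib/heavyai/storage/import/monitoring/gpu_monitoring/system_gpu_monitoring.log
@reboot heavyai setsid /bin/bash -c 'while true; do echo "REFRESH FOREIGN TABLES request_logs;" | /opt/heavyai/bin/heavysql -q -p HyperInteractive information_schema; sleep 120; done'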
Database DDLs and configurations
Check that the enable-logs-system-tables and enable-seconds-refresh-interval parameters are set to true in your heavy.conf file.
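For reference, the corresponding lines in heavy.conf (typically located in the storage directory, for example /var/lib/heavyai/heavy.conf) would look like this:
# heavy.conf (excerpt)
enable-logs-system-tables = true
enable-seconds-refresh-interval = true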
Connect to the instance and create a new monitoring database and a role to grant read-only access to regular users.
CREATE DATABASE monitoring;
CREATE ROLE r_monitoring;
GRANT ACCESS, SELECT, VIEW DASHBOARD, VIEW SQL EDITOR ON DATABASE monitoring TO r_monitoring;
GRANT ACCESS, SELECT ON DATABASE information_schema TO r_monitoring;
The r_monitoring role should be granted to non-admin users so that they can access the dashboards created in the monitoring database.
GRANT r_monitoring TO demouser;
Connect to the newly created database and create the Heavy Connect server and the foreign table that maps the data of the external CSV file.
CREATE SERVER gpu_statistics_s FOREIGN DATA WRAPPER DELIMITED_FILE
WITH ( STORAGE_TYPE = 'LOCAL_FILE',
BASE_PATH = '/var/lib/heavyai/storage/import/monitoring/' );
CREATE FOREIGN TABLE IF NOT EXISTS gpu_statistics (
  ts TIMESTAMP(3) NOT NULL,
  gpu_index TINYINT NOT NULL,
  number_of_gpus TINYINT NOT NULL,
  gpu_model TEXT NOT NULL ENCODING DICT(8),
  gpu_percent_utilization TINYINT NOT NULL,
  gpu_memory_utilization TINYINT NOT NULL,
  gpu_temperature SMALLINT NOT NULL
)
SERVER gpu_statistics_s
WITH ( FILE_PATH = 'gpu_monitoring',
REFRESH_TIMING_TYPE='SCHEDULED',
REFRESH_START_TIME='2023-04-10 15:00:00',
REFRESH_INTERVAL='5S',
REFRESH_TYPE='APPEND'
);
The data will be refreshed every five seconds, starting at the timestamp specified in the REFRESH_START_TIME option, appending the new rows found in the files of the gpu_monitoring directory.
Heavy Immerse
Configure Immerse by adding the ui/enable_custom_source_manager flag in the feature section of the servers.json file. An example of a minimal servers.json file is attached to the article.
Open Heavy Immerse using your browser, connect to the monitoring database, and create a new dashboard called Gpu Monitoring, then Add a Chart and create a Custom Source named gpu_stats_cpu.
Use a simple projection query, adding the /*+ g_cpu_mode */ hint to force the chart's queries to run on the CPU. This frees the GPU's resources for business users; a sketch of such a query is shown below.
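A minimal sketch of the gpu_stats_cpu Custom Source query, projecting all the columns of the foreign table defined earlier (trim the column list to what your charts actually need):
SELECT /*+ g_cpu_mode */
       ts,
       gpu_index,
       gpu_model,
       gpu_percent_utilization,
       gpu_memory_utilization,
       gpu_temperature
FROM gpu_statistics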
Use the newly created Custom Source to create a Combo Chart displaying the average gpu percent utilization, then click Apply.
Check that the data in the foreign table is being updated (see the query below), and then add more charts as needed.
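A quick way to confirm that new samples are arriving, from heavysql or the SQL Editor, is to check the newest timestamp and the row count of the foreign table:
SELECT MAX(ts) AS last_sample, COUNT(*) AS total_samples FROM gpu_statistics;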
We now have an almost complete dashboard to monitor the usage of the GPUs in the system, but we can improve it by indicating which database operations are responsible for that usage.
In HeavyDB v6, the system tables have been greatly enhanced, with the notable addition of the request_logs table, which tracks the most relevant operations in the database, such as logins and queries.
Create a new Custom Source in Heavy Immerse to add the table to the dashboard.
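A sketch of the cross-database query for this Custom Source follows. The column names used here (log_timestamp, api_name, user_name, dashboard_name, query_string, execution_time_ms) are assumptions based on a typical request_logs layout and may differ between releases, so verify them against your instance (for example with SHOW CREATE TABLE request_logs in the information_schema database):
SELECT /*+ g_cpu_mode */
       log_timestamp,
       api_name,
       user_name,
       dashboard_name,
       query_string,
       execution_time_ms
FROM information_schema.request_logs
WHERE api_name = 'sql_execute'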
Create a relation between the two data sources using the cross-link feature.
Add a simple chart using the newly created Custom Source to list all the sql_execute operations.
Now select the intervals with spikes in GPU utilization to find the queries that may have used the GPU the most.
In this case, a query without the appropriate filters joins a big table with one million complex geometries, producing tens of billions of rows.
Besides the query strings, we can also find out which dashboards or users use the GPUs the most.
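For example, a Custom Source (or heavysql query) along these lines ranks dashboards and users by the total execution time of their sql_execute requests, a reasonable proxy for GPU usage; as above, the column names are assumptions to verify against your release:
SELECT /*+ g_cpu_mode */
       dashboard_name,
       user_name,
       SUM(execution_time_ms) AS total_execution_ms
FROM information_schema.request_logs
WHERE api_name = 'sql_execute'
GROUP BY dashboard_name, user_name
ORDER BY total_execution_ms DESC
LIMIT 10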
In this example, the "bike sharing" dashboard is the one that appears to use the most GPU resources. As seen in the dashboard screenshot above, the lower-right bar chart is segmented by the elapsed time of each chart, with the chart with id 4 using around 30% of the total GPU resources consumed.
The Heavy Immerse configuration files and the dashboard shown in this article can be downloaded and used freely in a 7.0 environment.