History
The original iteration of the renderer was built using the well-known OpenGL graphics API. One of the major drawbacks of OpenGL in general is that it does not give the user full control of GPU memory. The user controls the size of the buffers/images used and the data that goes in them, but the driver ultimately controls when the memory is actually allocated/deallocated because all OpenGL commands are stored in a command queue. Allocation/deallocation is not guaranteed to happen on demand; it tends to happen at the moment a command in the command queue requires that memory to exist. Due to the command-queue system, there can be times where the driver hangs onto a buffer even after it has technically been freed by the user, just in case another command later in the queue might reference it. The driver then garbage collects it later. Additionally, OpenGL memory is not aliasable. In other words, you cannot allocate memory for one particular purpose, such as shader storage buffer memory, and then re-use that memory for another purpose, such as image or framebuffer memory. This rather rigid system did not allow us to build a configurable, slab-based memory manager for the renderer like the query engine's. As such, memory for the renderer was largely separated into two categories: render query output buffer memory (controlled via the render-mem-bytes config option), and everything else (i.e. textures, framebuffers, auxiliary buffers for poly/line rendering, etc.)
Render Query Output Buffer (QOB)
A CUDA-enabled query engine is able to execute queries on the GPU using the CUDA API. CUDA includes an interoperability API for a number of other GPU libraries, such as OpenGL and Vulkan, allowing CUDA to operate on OpenGL/Vulkan buffers as if they were allocated in CUDA. However, the OpenGL interop API is only one-directional, meaning you can allocate a buffer in OpenGL and map it for use in CUDA, but not the other way around: you cannot allocate a CUDA buffer and map it for OpenGL. Vulkan, on the other hand, has bi-directional support, but only gained it recently.
Due to the one-directionality of the interop, we needed to allocate a buffer in the renderer to pass to the query engine for the output of render queries, which can then be unmapped and rendered directly by the renderer. This allows for very fast, copy-free, in-situ rendering. Unfortunately, due to some historical limitations, it is difficult for us to know up front how big a buffer needs to be for any given query. Instead, it was decided early on that this would be a pre-allocated buffer whose size can be user controlled, sort of like an additional query engine slab. The size of the render query output buffer is controlled by the render-mem-bytes config option. It defaults to 1GB in size and is lazily allocated on each GPU upon the first render request.
Calculating the size of the Render QOB
The basic equation for determining the size of the QOB for a particular query is: take the number of output columns (X), add one extra slot for a hidden bit-mask, multiply by 8 bytes per slot, and then multiply by the number of resulting rows of the query (N). So:
N * (X + 1) * 8
So, for example, let's say you are using the tweets_2017_may table (a commonly used internal test dataset), which has 60,888,641 total rows, and you try to render the following query from that table:
SELECT conv_4326_900913_x(lon) AS x, conv_4326_900913_y(lat) AS y
FROM tweets_2017_may WHERE ((tweets_2017_may.lat >= -82.80951140396493
AND tweets_2017_may.lat <= 82.80951140396508) AND (tweets_2017_may.lon
>= -177.1188354492201 AND tweets_2017_may.lon <= 177.1188354492192))
When hit-testing is enabled, a rowid column is automatically injected into the query and counts toward the number of columns in the output. Using the above example with hit-testing enabled, the query produces 60,885,545 resulting rows and three output columns (x, y, and rowid), so the QOB size required would be 60885545 * (3 + 1) * 8 = 1948337440 bytes, or about 1.95GB.
NOTE: this equation does not apply in all cases. For instance, if a query LIMIT is used, in render-query speak this is a per-fragment limit, and is not a strict limit on the number of rows.
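Since this arithmetic comes up repeatedly below, here is a minimal Python sketch of the equation (the qob_bytes helper is purely illustrative, not part of the product):

def qob_bytes(num_rows, num_projected_columns, hit_testing=True):
    # When hit-testing is enabled, a hidden rowid column is injected and
    # counts toward X; the extra "+ 1" slot is for the hidden bit-mask.
    x = num_projected_columns + (1 if hit_testing else 0)
    return num_rows * (x + 1) * 8

# The worked example above: 60,885,545 resulting rows, two projected
# columns (x and y), hit-testing enabled:
print(qob_bytes(60_885_545, 2))  # 1948337440 bytes (~1.95GB)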
Immerse caps the max number of points rendered for a pointmap chart (with the "# of Points" slider). That cap uses a "SAMPLE_RATIO" method in the filter clause of the SQL to cap the number of resulting rows. You can plug that cap in for N in the equation above to get the QOB size for the current pointmap chart. For instance, Immerse defaults the max # of Points to 10 million. So, the QOB size required for a pointmap chart that uses just a lat and lon measure and has hit-testing enabled would be:
10000000 * (3 + 1) * 8 = 320MB
If you add a color measure: 10000000 * (4 + 1) * 8 = 400MB
If you also add a size measure: 10000000 * (5 + 1) * 8 = 480MB
If you are doing multi-layer rendering, the total size of the QOB required is the sum of the per-layer QOB requirements. Say you are rendering two layers using the same tweets_2017_may source as above, where each layer has a lon, lat, color, and size measure, has hit-testing enabled, and has the max number of points set to 10 million; the total QOB size would be 2 * 480MB = 960MB.
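These figures are straightforward to verify with plain arithmetic following the equation above (remember that the injected rowid counts toward X):

# Pointmap capped at 10 million points, hit-testing enabled:
n = 10_000_000
print(n * (3 + 1) * 8)      # lon/lat only:     320000000 bytes (320MB)
print(n * (4 + 1) * 8)      # + color measure:  400000000 bytes (400MB)
print(n * (5 + 1) * 8)      # + size measure:   480000000 bytes (480MB)
print(2 * n * (5 + 1) * 8)  # two such layers:  960000000 bytes (960MB)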
Troubleshooting
If the configured render QOB is not sufficient in size, an error such as the following will be thrown:
ERR_OUT_OF_RENDER_MEM: Insufficient GPU memory for query results in render output buffer sized by render-mem-bytes
If you look in the log, there should be an additional log line giving you a hint as to how big the requested QOB should be:
Not enough buffer memory on device id 0 to render the query results. Render buffer size: 1000000000 bytes. Requested size: 1461253080 bytes.
NOTE: that is just a hint; the requested size is only the size that failed for the most recent allocation. It may not indicate the full size required to render the query, as there could be allocations queued up behind the failed one still. In any event, it is a good hint to work from.
If you hit this kind of error, you can use the hint to increase the render-mem-bytes config flag. Or, if that's already too high, try lowering the max # of Points limit for the charts (via Immerse).
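One reasonable approach is to pad the hinted size and round it up before setting the new flag value. This is just a sketch; the suggest_render_mem_bytes helper, the headroom factor, and the rounding boundary are illustrative, not anything built into the product:

def suggest_render_mem_bytes(requested_bytes, headroom=1.25, round_to=512 * 1024**2):
    # Pad the size from the log hint and round up to a 512MB boundary,
    # since the hint may undercount allocations still queued behind it.
    padded = int(requested_bytes * headroom)
    return ((padded + round_to - 1) // round_to) * round_to

# Using the hint from the example log line above:
print(suggest_render_mem_bytes(1_461_253_080))  # 2147483648 (2GB)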
All Other Render Memory
All other render memory is allocated in what we call "scratch" space, meaning it is not controlled by a memory manager; it consists of ad-hoc, on-the-fly allocations. This includes all memory for the final rendered image framebuffers, auxiliary buffers used for hit-testing and accumulation rendering, and buffers used for rendering non-point geo (i.e. lines, polygons, and multipolygons).
Since a slab-based memory manager does not back this memory, it ultimately eats into the scratch space reserved by the res-gpu-mem config flag.
Calculating the size of the render scratch space
This is much more difficult to calculate because it depends on the node's heaviest use cases. In general, the more GPUs on a node, the more scratch space is required, because multi-GPU compositing requires a bit more memory to compensate for copying/compositing the framebuffers/textures and hit-testing/accumulation textures from the other GPUs. Also, the heavier the geo (line/poly) tables (i.e. the number of points across all the polygons/lines rendered), the more scratch space is needed. So it's all use-case dependent, which can vary widely.
You can use the following as a guideline:
2 GPUs, no accumulation rendering, no line/poly rendering, max render size 2k*2k = 200MB
8 GPUs, with accumulation rendering, no line/poly rendering, max render size 2k*2k = 256MB
8 GPUs, with accumulation rendering, does poly rendering equivalent to the zip codes data set (33,144 rows, approx 5.5million vertices), max render size of 2k*2k = 400MB (approx)
Troubleshooting
Out-of-memory errors that relate to the render scratch space being too small look like the following:
OutOfGpuMemoryError: <error message> not enough gpu memory available for the requested buffer size of <N> bytes on gpu <gpu idx>.
Such errors are prefaced with the "OutOfGpuMemoryError" exception string. If you see such an exception, you likely need to increase the res-gpu-mem config flag to increase the scratch space available to the renderer.
Since it is difficult to calculate the total size the scratch space will need to be, even when the exception message tells you how big the buffer was that failed to allocate, it is generally suggested to increase res-gpu-mem by 100MB-250MB each time such an OOM error is hit.
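There is no size hint to work from in this case, so a simple incremental bump, per the suggestion above, is the usual approach (a sketch; the helper name is illustrative):

def bump_res_gpu_mem(current_bytes, increment=250 * 1000**2):
    # Bump res-gpu-mem by 250MB after each OutOfGpuMemoryError.
    return current_bytes + increment

print(bump_res_gpu_mem(500_000_000))  # 750000000 (750MB)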
Balancing render-mem-bytes and res-gpu-mem
Ultimately, the goal for memory management from the renderer's perspective is to find the right balance of render-mem-bytes and res-gpu-mem settings, limiting the number of OutOfGpuMemoryError and ERR_OUT_OF_RENDER_MEM exceptions for typical use cases. Unfortunately, this can only really be done in an iterative, trial-and-error manner at the moment, but hopefully the above sections can help narrow in on proper settings more quickly.
In general, the 1GB default for render-mem-bytes should be adequate as long as you're not doing more than 2-layer renders and stick with the 10-million-point limit imposed by Immerse.
A res-gpu-mem of 500MB should also be a good starting point, unless you know that there are heavy polygon tables that will be rendered. Again, increasing res-gpu-mem by 250MB on each OutOfGpuMemoryError will help narrow it in.
Also, a typical rule of thumb is to keep the sum of render-mem-bytes + res-gpu-mem to no more than 20-25% of the memory on a single GPU.
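As a quick sanity check on that rule of thumb (a sketch; the function name and the 25% cap value are illustrative):

def within_render_budget(render_mem_bytes, res_gpu_mem, gpu_mem_bytes, cap=0.25):
    # The renderer's share (render-mem-bytes + res-gpu-mem) should stay
    # under roughly 20-25% of a single GPU's memory.
    return (render_mem_bytes + res_gpu_mem) <= cap * gpu_mem_bytes

# Example: 1GB render-mem-bytes + 500MB res-gpu-mem on a 16GB GPU:
print(within_render_budget(1_000_000_000, 500_000_000, 16_000_000_000))  # True (1.5GB <= 4GB)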
Fail Safes
Once render memory seems stable and well balanced, you can consider adding some fail-safe configuration parameters to prevent out-of-memory errors from inhibiting usage for atypical, heavy use cases. These are the config settings that can be used:
enable-auto-clear-render-mem - If enabled, automatically clears render GPU memory on OutOfGpuMemoryErrors during rendering. If an OutOfGpuMemoryError exception is thrown while rendering, many users respond by running \clear_gpu via the omnisql command-line interface or by restarting the server to refresh/defrag the memory heap. With this flag enabled, that process is automated. At present, only GPU memory in the renderer is cleared automatically; it will not auto-clear memory allocated by the query engine.
render-oom-retry-threshold - A render execution time limit in milliseconds under which a render request will be retried if an OutOfGpuMemoryError is thrown. Requires enable-auto-clear-render-mem = true. If enable-auto-clear-render-mem = true, a retry of the render request can be performed after an OutOfGpuMemoryError exception. A retry only occurs if the first run took less than the threshold set here (in milliseconds). The retry is attempted after the render GPU memory is automatically cleared; clearing the memory might allow the request to succeed. Providing a reasonable threshold might give more stability to memory-constrained servers with rendering enabled. Only a single retry is attempted. A value of 0 disables retries.
If you wish to enable auto-clear render memory, it is suggested to pair it with render-oom-retry-threshold = 180000 (i.e. 3 minutes).
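For example, the two fail safes might be set together in the server's config file (the exact file name and location vary by installation; the values below are the suggested ones from above):

enable-auto-clear-render-mem = true
render-oom-retry-threshold = 180000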