This article describes how to get up and running quickly using the AWS Deep Learning AMI for Ubuntu 18.04, which ships with Docker, nvidia-docker, and the NVIDIA drivers preinstalled so you can skip those installation steps.
AWS Setup
In the AWS console, select Launch Instance.
Give your instance a name.
In the AMI section, type "Deep Learning" into the search box and press enter.
In the resulting window, scroll to the bottom and find "Deep Learning AMI (Ubuntu 18.04) Version 64.3" or similar; the exact version number changes over time. We don't need any of the variants bundled with TensorFlow or PyTorch; the standard "Ubuntu 18.04" AMI is sufficient for our needs. Choose the orange Select button on the right side.
Change instance type to whatever is desired. The smallest NVIDIA GPU-enabled instance type is g4dn.xlarge at the time of this writing, which features a single NVIDIA Tesla T4 GPU.
Choose a key pair for which you have access to the private key, or create a new one.
Under Network settings, click Edit and change the security group name as desired. All settings are at your discretion, but for simplicity the following rules are suggested; they allow SSH access and access to Immerse's default web port.
Type | Protocol | Port Range | Source Type
ssh | TCP | 22 | Anywhere*
Custom TCP | TCP | 6273 | Anywhere*
*You could also configure access to be only allowed from your current IP address for additional security.
Under Configure storage, adjust the volume size in GiB to meet your estimated needs.
Finally, Launch the instance. Wait a minute or two for it to come online. Then, click on the instance name and click the connect button at the top. Choose the SSH tab and copy the command provided.
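The copied command will look something like the example below; the key file name and public DNS name here are placeholders, so substitute your own. If you created a new key pair and downloaded the .pem file, you may first need to restrict its permissions so SSH will accept it:

chmod 400 ~/Downloads/my-key.pem
ssh -i ~/Downloads/my-key.pem ubuntu@ec2-3-91-xx-xx.compute-1.amazonaws.com

Note that the default user on Ubuntu-based AMIs is ubuntu.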
HEAVY.AI Installation
For good measure, let's run `nvidia-smi` to confirm that the system has GPUs and that the drivers are working. The AMI should have taken care of driver installation, so anything other than output like the following would be unexpected:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 37C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
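Optionally, since the AMI also ships with Docker and the NVIDIA container runtime, you can confirm that GPUs are visible from inside a container as well. The CUDA image tag below is only an example and may need to be adjusted to a tag currently published on Docker Hub:

docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu18.04 nvidia-smi

You should see the same table as above printed from within the container.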
Now, let's prepare for HEAVY.AI installation with these commands:
sudo mkdir -p /var/lib/heavyai && sudo chown $USER /var/lib/heavyai
echo "port = 6274
http-port = 6278
calcite-port = 6279
data = \"/var/lib/heavyai\"
null-div-by-zero = true
[web]
port = 6273
frontend = \"/opt/heavyai/frontend\"" \
>/var/lib/heavyai/heavy.conf
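To confirm the configuration file was written correctly, print it back out:

cat /var/lib/heavyai/heavy.conf

Note that port 6273 in the [web] section is the Immerse web port we opened in the security group earlier.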
Next, download a release image from releases.heavy.ai into our home directory. These instructions pin a specific image version rather than pulling the latest, which makes future maintenance and troubleshooting easier.
We'll use the *-render-docker.tar.gz versions for this instruction set.
cd ~ && wget https://releases.heavy.ai/ee/tar/heavyai-ee-6.1.1-20220726-1bd2aaaa8d-Linux-x86_64-render-docker.tar.gz
Once the wget command finishes, load the image. In this example, we're using 6.1.1:
docker load < heavyai-ee-6.1.1-20220726-1bd2aaaa8d-Linux-x86_64-render-docker.tar.gz
Verify the version desired is loaded:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
heavyai/heavyai-ee-cuda v6.1.1 17253e143843 10 days ago 3.86GB
Now, we'll start the server with a docker run command:
docker run -d --gpus=all \
-v /var/lib/heavyai:/var/lib/heavyai \
-p 6273-6278:6273-6278 \
heavyai/heavyai-ee-cuda:v6.1.1
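The database and web server should now be running inside a single container. As a quick check, list the running containers and then browse to Immerse on the port we opened earlier, substituting your instance's public IP or DNS name:

docker ps
# then visit http://<instance-public-ip>:6273 in a browser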
At this point, the server is ready to use. However, for easier management going forward, let's optionally set up the Docker Compose CLI.
Installing Docker Compose Plugin
First, let's add Docker's apt repository so the plugin can be installed with apt:
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
and finally, let's install:
sudo apt install docker-compose-plugin
This installs the latest version of Docker Compose. Note that the command no longer uses a hyphen: where you previously ran 'docker-compose up', you now run 'docker compose up'. The same applies to every other Compose subcommand.
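To confirm the plugin is installed and see which version you received, run:

docker compose version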
Setting up HEAVY.AI using docker compose:
Now, let's create a docker-compose.yml file at /var/lib/heavyai/docker-compose.yml.
Here's an example docker-compose file:
version: '3.7'

x-heavydbandweb: &heavydbandweb
  image: "heavyai/heavyai-ee-cuda:v6.1.1"
  restart: on-failure
  networks:
    - heavyai-backend
    - heavyai-frontend

services:
  heavydb:
    <<: *heavydbandweb
    deploy:
      resources:
        reservations:
          devices:
            - capabilities:
                - gpu
    container_name: heavydb
    ipc: shareable
    ports:
      - 6274:6274
      - 6276:6276
      - 6278:6278
    environment:
      - CUDA_CACHE_PATH=/var/lib/heavyai/CUDACache
      - CUDA_CACHE_MAXSIZE=4294967296
    volumes:
      - /var/lib/heavyai:/var/lib/heavyai
    ulimits:
      stack: 1073741824
    command: /opt/heavyai/bin/heavydb /var/lib/heavyai/storage --config /var/lib/heavyai/heavy.conf

  heavyweb:
    <<: *heavydbandweb
    container_name: heavyweb
    ports:
      - 6273:6273
    depends_on:
      - heavydb
    volumes:
      - /var/lib/heavyai:/var/lib/heavyai
    command: /opt/heavyai/bin/heavy_web_server --data /var/lib/heavyai/storage --config /var/lib/heavyai/heavy.conf --backend-url http://heavydb:6278

networks:
  heavyai-frontend:
    driver: bridge
    name: heavyai-frontend
  heavyai-backend:
    driver: bridge
    name: heavyai-backend
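Before starting anything, you can optionally have Compose parse and validate the file; this prints the fully resolved configuration, or an error if the YAML is malformed:

docker compose -f /var/lib/heavyai/docker-compose.yml config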
Now, let's stop and remove ALL existing containers from the server with these commands:
docker stop $(docker ps -q)
docker rm $(docker ps -aq)
and finally, let's start our new containers, which now separate heavydb from the web server:
cd /var/lib/heavyai && docker compose up -d
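If everything started cleanly, both services should be listed as running and Immerse should again be reachable on port 6273. A quick check (the container names come from the compose file above):

docker compose ps
docker logs heavydb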