Deep Learning Setup: ECS GPU Task On Ubuntu (Part 3)
To run a GPU-based task in ECS, we need to launch our own EC2 instances, as Fargate still doesn’t support GPUs. That shouldn’t be too hard with the ECS GPU-optimized AMIs.
However, that isn’t always an option, since teams often already rely on their preferred AMI setup, such as a CIS-optimized Ubuntu AMI or any other flavor. In that case, the GPU setup has to be installed and configured from scratch.
In this series of four articles, we’ll walk through installing and configuring an ECS task that requires GPU resources on Ubuntu 18.04.
Part 3: The NVIDIA-Docker Runtime
Docker NVIDIA Runtime
Installing and configuring the NVIDIA runtime for Docker is well documented, but I ran into a few gotchas along the way, so I found it valuable to write down the steps that worked for me. I still recommend reading the official documentation, which is much more detailed, and keep in mind that the steps may change depending on your needs and preferences.
Repository Configuration
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
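The `distribution` value decides which repository list gets downloaded, so it can be worth printing it before running `curl`. Below is a minimal sketch, assuming a helper named `distribution_id` (a hypothetical name, not part of any NVIDIA tooling) that reads an os-release style file inside a subshell so the current shell's variables are left untouched:

```shell
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file.
# Defaults to /etc/os-release; the subshell keeps ID and VERSION_ID
# out of the calling shell's environment.
distribution_id() {
    (
        . "${1:-/etc/os-release}"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Print the value for this machine, if the default file is readable.
if [ -r /etc/os-release ]; then
    distribution_id
fi
```

On Ubuntu 18.04 this should print `ubuntu18.04`, which is the directory name used in the repository URL above.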
Runtime Installation
sudo apt-get install nvidia-container-runtime
Create the Systemd drop-in file
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
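The empty `ExecStart=` line matters: systemd needs the inherited command cleared before a drop-in can redefine it. The following is a rough sketch of a sanity check, assuming a hypothetical helper named `dropin_registers_nvidia` that only inspects the file's text and does not talk to systemd:

```shell
# Hypothetical helper: succeed only if the drop-in first clears ExecStart
# and then redefines it with the nvidia runtime registered.
dropin_registers_nvidia() {
    file="$1"
    # Line number of the first empty "ExecStart=" (the reset).
    reset_line=$(grep -n -m1 '^ExecStart=$' "$file" | cut -d: -f1)
    # Line number of the first ExecStart that adds the nvidia runtime.
    add_line=$(grep -n -m1 '^ExecStart=.*--add-runtime=nvidia=' "$file" | cut -d: -f1)
    [ -n "$reset_line" ] && [ -n "$add_line" ] && [ "$reset_line" -lt "$add_line" ]
}

# Check the drop-in created above, if it exists on this machine.
dropin=/etc/systemd/system/docker.service.d/override.conf
if [ -f "$dropin" ]; then
    if dropin_registers_nvidia "$dropin"; then
        echo "drop-in looks fine"
    else
        echo "drop-in is missing the reset or the nvidia runtime" >&2
    fi
fi
```

To see the unit as systemd merges it, `systemctl cat docker` prints the original service file followed by the drop-in.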
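Create the Daemon configuration file
Note that NVIDIA’s documentation presents the systemd drop-in above and this daemon configuration file as alternative ways to register the runtime. Registering it in both places can make dockerd reject the configuration on its next restart, so consider keeping only one of the two.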
sudo tee /etc/docker/daemon.json <<EOF
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
EOF
sudo pkill -SIGHUP dockerd
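A typo in `daemon.json` is an easy gotcha, so it may help to check that the file parses before signaling dockerd. Here is a minimal sketch, assuming `python3` is installed on the instance; `valid_json` is a hypothetical helper name:

```shell
# Hypothetical helper: exit status 0 only if the file parses as JSON.
# Assumes python3 is available; json.tool ships with its standard library.
valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Check the daemon configuration written above, if it exists on this machine.
config=/etc/docker/daemon.json
if [ -f "$config" ]; then
    if valid_json "$config"; then
        echo "$config parses as JSON"
    else
        echo "$config is not valid JSON; fix it before reloading dockerd" >&2
    fi
fi
```

This only checks JSON syntax. It does not verify that Docker accepts the keys inside the file.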
Test The Runtime Installation
sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi
Expected output: the nvidia-smi table, listing the driver version and the GPUs attached to the instance.
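As far as I understand, the `--gpus` flag can work even when the `nvidia` runtime is not registered with the daemon, so it may be worth checking the registration explicitly as well. The sketch below assumes a hypothetical helper, `lists_nvidia_runtime`, that reads the JSON printed by `docker info --format '{{json .Runtimes}}'` from standard input:

```shell
# Hypothetical helper: read the JSON map of registered runtimes on stdin
# and succeed if it contains an "nvidia" key.
lists_nvidia_runtime() {
    grep -q '"nvidia":'
}

# Usage on the instance (shown as comments, since it needs Docker and a GPU):
#   sudo docker info --format '{{json .Runtimes}}' | lists_nvidia_runtime \
#       && echo "nvidia runtime registered"
#   sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
```

The second command selects the registered runtime by name instead of relying on `--gpus`, so it exercises the configuration created in the previous steps.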
In the next article, we’ll configure the ECS Agent with the GPU driver.