Deep Learning Setup: ECS GPU Task On Ubuntu (Part 3)

Michael Loewenstein
CodeX


To run a GPU-based task in ECS, we need to provide our own EC2 instances, since Fargate still doesn’t support GPUs. With the ECS GPU-optimized AMIs, that shouldn’t be too hard.

However, that is not always an option: many teams are already committed to their own AMI setup, such as an Ubuntu CIS-optimized AMI or some other flavor. In that case, they need to install and configure everything from scratch.

In this series of four articles, we’ll walk through installing and configuring everything an ECS task with GPU resource requirements needs on Ubuntu 18.04.

Part 1: The NVIDIA driver

Part 2: The ECS agent

Part 3: The NVIDIA-Docker runtime

Part 4: GPU configuration on ECS Agent

Docker NVIDIA Runtime

Installing and configuring the NVIDIA runtime for Docker is well documented, but I hit a few gotchas along the way, so I found it worthwhile to write down the steps that worked for me. I still recommend reading the official documentation: it is far more detailed, and the exact steps may differ depending on your needs and preferences.

Repository Configuration

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
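
The repository URL depends on the value of $distribution, so it can be worth printing it before running apt-get update. The sketch below rebuilds the same string; distribution_from is a hypothetical helper name, not part of any NVIDIA tooling. On Ubuntu 18.04 the value is ubuntu18.04.

```shell
# Hypothetical helper: build the "<ID><VERSION_ID>" string from an os-release file.
# The subshell keeps the sourced variables from leaking into the caller.
distribution_from() {
  (
    . "${1:-/etc/os-release}"
    echo "${ID:-}${VERSION_ID:-}"
  )
}

# Print the value and the list URL that the curl command above would request.
if [ -r /etc/os-release ]; then
  distribution=$(distribution_from /etc/os-release)
  echo "distribution: $distribution"
  echo "list URL: https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list"
fi
```

If the printed value is not one NVIDIA publishes a list for, the file written to /etc/apt/sources.list.d will not contain valid apt sources and apt-get update will complain.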

Runtime Installation

sudo apt-get install nvidia-container-runtime
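
Both Docker configuration steps that follow point at /usr/bin/nvidia-container-runtime, so a quick check that the package actually put the binary there can save a confusing Docker error later. This is a minimal sketch; is_executable is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the given path is an executable file.
is_executable() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Path referenced by the Docker configuration in this article.
runtime=/usr/bin/nvidia-container-runtime

if is_executable "$runtime"; then
  echo "found: $runtime"
else
  echo "missing: $runtime (Docker will not be able to start the nvidia runtime)"
fi
```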

Create the Systemd drop-in file

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
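
One detail in the drop-in that is easy to miss is the empty ExecStart= line. It clears the command inherited from the packaged docker.service; without it, systemd sees two ExecStart= settings and refuses to load the unit. The sketch below checks that the first ExecStart line in a drop-in is the empty reset; dropin_resets_execstart is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the first ExecStart line in the file is the
# empty reset ("ExecStart=") that clears the packaged unit's command.
dropin_resets_execstart() {
  # Print only the first line starting with ExecStart=, then stop reading.
  first_line=$(sed -n '/^ExecStart=/{p;q;}' "$1")
  [ "$first_line" = "ExecStart=" ]
}

dropin=/etc/systemd/system/docker.service.d/override.conf
if [ -r "$dropin" ]; then
  if dropin_resets_execstart "$dropin"; then
    echo "drop-in resets ExecStart before redefining it"
  else
    echo "drop-in is missing the empty ExecStart= line"
  fi
fi
```

To see the unit exactly as systemd merged it, sudo systemctl cat docker prints the packaged unit followed by the drop-in.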

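Create the Daemon configuration file

Note that NVIDIA's documentation presents the systemd drop-in and the daemon configuration file as alternative ways to register the runtime. If the drop-in is already in place, you may want to skip this step: declaring the nvidia runtime both as a dockerd flag and in daemon.json can make Docker reject the configuration.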

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
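
Before running a GPU container, it helps to confirm that Docker picked up the registration. docker info prints a "Runtimes:" line listing every registered runtime. The sketch below looks for nvidia on that line; lists_nvidia_runtime is a hypothetical helper name, and the sample line is illustrative only, since the exact output depends on the Docker version.

```shell
# Hypothetical helper: read `docker info` text on stdin and succeed
# if the "Runtimes:" line mentions nvidia.
lists_nvidia_runtime() {
  grep -i 'Runtimes:' | grep -qw 'nvidia'
}

# Illustrative sample line only; pipe real `docker info` output in practice.
if printf ' Runtimes: nvidia runc\n' | lists_nvidia_runtime; then
  echo "nvidia runtime is registered"
else
  echo "nvidia runtime is NOT registered"
fi
```

On the instance itself, the check would be: sudo docker info | lists_nvidia_runtime && echo registered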

Test The Runtime Installation

sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi

Expected output: the standard nvidia-smi table, listing the driver version and the GPUs attached to the instance.
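
For a scripted check, for example in an instance bootstrap script, the table is awkward to parse. nvidia-smi can print one line per GPU with --query-gpu=name --format=csv,noheader, which makes counting straightforward. A minimal sketch follows; count_gpus is a hypothetical helper name and the GPU names in the sample are placeholders.

```shell
# Hypothetical helper: count non-empty lines on stdin (one line per GPU).
count_gpus() {
  grep -c '[^[:space:]]'
}

# Placeholder input standing in for the nvidia-smi query output.
gpus=$(printf 'GPU-A\nGPU-B\n' | count_gpus)
echo "GPUs visible inside the container: $gpus"
```

With the runtime installed, the real input would come from: sudo docker run --gpus all --rm nvidia/cuda:10.1-base nvidia-smi --query-gpu=name --format=csv,noheader | count_gpus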

In the next article, we’ll configure the ECS Agent with the GPU driver.
