Deep Learning Setup: ECS GPU Task On Ubuntu (Part 1)

Michael Loewenstein
Published in CodeX
4 min read · Jan 3, 2021

To run a GPU-based task in ECS, we need to create our own EC2 instance, as Fargate still doesn’t support GPUs. That shouldn’t be too hard with the ECS GPU-optimized AMIs.

However, that isn’t always an option: many teams are already committed to their favorite AMI setup, such as an Ubuntu CIS-optimized AMI or some other flavor. This means they need to install and configure everything from scratch.

In this set of four articles, we’ll review how to install and configure an ECS task that requires GPU resources on Ubuntu 18.04.

Part 1: The NVIDIA driver

Part 2: The ECS agent

Part 3: The NVIDIA-Docker runtime

Part 4: GPU configuration on ECS Agent

NVIDIA Driver & Toolkit

Although installing and configuring the NVIDIA toolkit and drivers is very well documented, I ran into a few gotchas, so I found it valuable to document the required steps myself. I still recommend going over the official documentation, which is much more detailed; the steps may also change according to your needs and preferences.

Pre-Installation

Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable:

$ lspci | grep -i nvidia

If your graphics card is from NVIDIA and it is listed in https://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.

Verify You Have a Supported Version of Linux

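$ uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system (this example comes from a Red Hat machine; on Ubuntu 18.04 the release lines name Ubuntu instead):

x86_64
Red Hat Enterprise Linux Workstation release 6.0 (Santiago)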

Verify the System Has GCC Installed

$ gcc --version

I didn’t have GCC installed and got the following error: “Command ‘gcc’ not found, but can be installed with: sudo apt install gcc”, which is easily solved by running:

$ sudo apt install gcc

Verify the System has the Correct Kernel Headers and Development Packages Installed

$ uname -r

This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers.

The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo apt-get install linux-headers-$(uname -r)

This is for Ubuntu only; for a different OS, please see the official NVIDIA documentation.
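The pre-installation checks above can be collected into a single report, which is handy when preparing more than one instance. This is a minimal sketch, assuming an Ubuntu host where dpkg is available; check, has_nvidia_gpu, and has_kernel_headers are hypothetical helper names of my own, not part of any NVIDIA tooling.

```shell
#!/bin/sh
# Pre-installation checks from this section, collected into one report.
# All helper names here are illustrative, not part of any NVIDIA tooling.

failures=0

# check LABEL COMMAND [ARGS...]: run the command quietly and report OK or FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK    $label"
    else
        echo "FAIL  $label"
        failures=$((failures + 1))
    fi
}

# True when lspci lists at least one NVIDIA device.
has_nvidia_gpu() { lspci | grep -qi nvidia; }

# True when the headers package for the running kernel is installed (Ubuntu/dpkg).
has_kernel_headers() { dpkg -s "linux-headers-$(uname -r)"; }

check "NVIDIA GPU visible on the PCI bus" has_nvidia_gpu
check "gcc installed" gcc --version
check "kernel headers for $(uname -r)" has_kernel_headers

echo "$failures check(s) failed"
```

Any line marked FAIL points back to the matching step above. The script only reports; it does not install anything.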

Toolkit Installation

We can either install only the NVIDIA drivers or install the NVIDIA toolkit, which is a set of tools plus the drivers. I prefer to install the toolkit since it will help us verify that the installation succeeded later on.

Download the NVIDIA CUDA Toolkit

The NVIDIA CUDA Toolkit is available at https://developer.nvidia.com/cuda-downloads. Select the operating system, distribution, and version, then select your preferred installation type. I selected deb (local).

Then run the commands the page provides. For Linux Ubuntu 18.04 x86_64:

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda-repo-ubuntu1804-11-2-local_11.2.0-460.27.04-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1804-11-2-local_11.2.0-460.27.04-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-ubuntu1804-11-2-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda

Post Installation

Environment Setup — Add Path:

$ export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
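Note that the export above only lasts for the current shell session. If you want it to survive a new login, one option is to append the line to your shell startup file. This is a minimal sketch, assuming bash and ~/.bashrc; append_once and cuda_path_line are hypothetical names I made up for illustration.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
# (Hypothetical helper, so that running it twice does not duplicate the entry.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches the whole line, -F treats the text literally (no regex).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep ${PATH} unexpanded, so it is evaluated when the file is sourced.
cuda_path_line='export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}'
```

To apply it for bash, call append_once "$HOME/.bashrc" "$cuda_path_line" and open a new shell.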

Set Up the Persistence Daemon:

$ sudo /usr/bin/nvidia-persistenced --verbose

Verifying the installation

Now that everything is in place, let’s verify the drivers are installed successfully:

$ cat /proc/driver/nvidia/version

The output should be similar to the following:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.27.04 Fri Dec 11 23:35:05 UTC 2020
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
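If you want to run this check from a script (for example, while baking an AMI), the driver version can be pulled out of that file. This is a minimal sketch; driver_version is a hypothetical helper, and the pattern assumes the "NVRM version: ... Kernel Module <version>" layout shown in the output above.

```shell
#!/bin/sh
# driver_version: read the text of /proc/driver/nvidia/version on stdin
# and print only the kernel module version (for example 460.27.04).
driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

version_file=/proc/driver/nvidia/version
if [ -r "$version_file" ]; then
    echo "Loaded NVIDIA driver: $(driver_version < "$version_file")"
else
    # The file only exists while the NVIDIA kernel module is loaded.
    echo "No $version_file: the NVIDIA kernel module is not loaded"
fi
```

The printed version can then be compared with the one in the installer file name (460.27.04 in this article).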

And verify the toolkit installation:

$ nvcc -V

The output should be similar to the following:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

As a final test, we’ll run a sample GPU program. Note that this might take a long time, since the command builds the samples first.

$ mkdir tmp && cuda-install-samples-11.2.sh tmp && cd tmp/NVIDIA_CUDA-11.2_Samples && make && cd 1_Utilities/deviceQuery && ./deviceQuery

The output lists the detected GPU and its properties, and should end with the line Result = PASS.
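To turn the deviceQuery run into a pass/fail check for automation, you can look for the final "Result = PASS" line that the sample prints on success. This is a minimal sketch, assuming it is run from the deviceQuery directory; device_query_passed and the DEVICE_QUERY variable are hypothetical names of my own.

```shell
#!/bin/sh
# device_query_passed: read deviceQuery output on stdin and succeed only
# if it contains the final "Result = PASS" line.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Assumption: we are in the deviceQuery directory; set DEVICE_QUERY to override.
binary=${DEVICE_QUERY:-./deviceQuery}

if [ -x "$binary" ]; then
    if "$binary" | device_query_passed; then
        echo "GPU check passed"
    else
        echo "GPU check failed"
    fi
else
    echo "deviceQuery binary not found at $binary"
fi
```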

Now, we have our NVIDIA driver installed and functioning. In the next article, we’ll install and configure the ECS agent and adjust it to work with the driver we just installed.
