NVIDIA GPU-Optimized AMI
NVIDIA | 24.05.1 | Linux/Unix, Ubuntu 22.04 - 64-bit Amazon Machine Image (AMI)
AMI is not configured as advertised.
None of the advertised utilities are installed in the AMI, and neither is CUDA. This is current as of 3/14/24. As far as I can tell, it is a raw installation of Ubuntu 22.04.
root@ip-172-31-38-109:~/cuda-samples/Samples/5_Domain_Specific/nbody# jupyterlab --version
jupyterlab: command not found
root@ip-172-31-38-109:~/cuda-samples/Samples/5_Domain_Specific/nbody# miniconda --version
miniconda: command not found
I've already done a lot of troubleshooting trying to get CUDA installed, so I won't copy-paste my whole terminal session.
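For anyone who wants to repeat this kind of check on their own instance, here is a minimal sketch that probes the PATH for a handful of commands. The list of names is an assumption, not taken from the AMI listing; it uses `jupyter` and `conda` because JupyterLab is normally started with `jupyter lab` and Miniconda normally provides the `conda` command. Adjust the list to whatever the listing advertises.

```python
import shutil

# Hypothetical list of commands to probe; edit to match what the listing advertises.
TOOLS = ["nvidia-smi", "nvcc", "docker", "jupyter", "conda"]


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
```

A command reported as NOT FOUND may still be installed outside the current user's PATH (for example under a different account's home directory), so this only shows what the current shell can see.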
Missing drivers
This should be preconfigured to run NVIDIA GPU Cloud (NGC) containers such as the PyTorch one; however, it fails on launch on AWS (on a p3.2xlarge instance).
After SSHing in, I see this error message:
```
Installing drivers ...
modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.2.0-1011-aws
```
And sure enough, running containers such as PyTorch (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) does not work:
```
~$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.11-py3
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
```
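The `driver not loaded` error is consistent with the earlier `modprobe` failure. A rough way to confirm it before starting a container is to look for the `nvidia` kernel module in `/proc/modules`. The sketch below assumes a Linux host; the helper names are made up for illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nvidia_driver_loaded(proc_modules_text):
    """True if a kernel module named exactly 'nvidia' is listed."""
    return "nvidia" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    # On hosts without /proc/modules, treat the module list as empty.
    text = path.read_text() if path.exists() else ""
    print("nvidia module loaded:", nvidia_driver_loaded(text))
```

If this prints `False`, `docker run --gpus all ...` can be expected to fail in the same way until a driver matching the running kernel is installed.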
doesn't include what it claims
Claims to include nvidia-cuda-toolkit, but it doesn't. Running nvcc --version returns an error message. The other versions are out of date and incompatible.
Outdated and Useless
This AMI is outdated and doesn't have the proper packages. This is what NVIDIA suggests on their website (https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html), but they are unable to provide the same on their AMIs.