# Startup errors: GPU
# Unknown runtime specified nvidia.
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.
Edit or create /etc/docker/daemon.json (requires administrator privileges) and add the following:
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart Docker:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
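If /etc/docker/daemon.json already contains other settings, pasting the snippet over the file would discard them. The following is a minimal sketch, not an official tool, of merging the nvidia runtime entry into existing content. The function name `add_nvidia_runtime` and the sample `log-driver` setting are hypothetical and used only for illustration.

```python
import json

# Runtime entry taken from the daemon.json snippet in this section.
NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def add_nvidia_runtime(config_text: str) -> str:
    """Return daemon.json text with the nvidia runtime registered.

    Hypothetical helper: existing keys and runtimes are preserved.
    """
    # An empty or missing file is treated as an empty JSON object.
    config = json.loads(config_text) if config_text.strip() else {}
    # Keep any runtimes that are already registered.
    runtimes = config.setdefault("runtimes", {})
    runtimes["nvidia"] = dict(NVIDIA_RUNTIME)
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Assumed example of pre-existing content; not from the original note.
    existing = '{"log-driver": "json-file"}'
    print(add_nvidia_runtime(existing))
```

Write the result back to /etc/docker/daemon.json with administrator privileges, then restart Docker as shown above.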
# Installing nvidia-container-runtime (offline)
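1. Download
libnvidia-container (package download location)
The NVIDIA Container Toolkit packages are now served from the https://nvidia.github.io/libnvidia-container repository. They can be browsed on the gh-pages branch of the libnvidia-container GitHub repository; for offline download, prefer the stable releases in the stable folder.
Pick the directory that matches your system, read the values in the repo file inside it, then follow that path back under stable to find the rpm packages to download.
The core libnvidia-container packages are: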
libnvidia-container-tools
libnvidia-container1
nvidia-container-runtime
nvidia-container-toolkit
nvidia-container-toolkit-base
nvidia-docker2
Copy the packages to the server and install them manually:
rpm -Uvh *.rpm --nodeps --force
On Ubuntu, run:
dpkg -i --force-overwrite *.deb
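Because the install commands above use --nodeps and --force, a missing package is not reported at install time. As a rough sketch, the downloaded files can be checked against the package list above before installing. The helper name `missing_packages` and the version numbers in the example are hypothetical; the sketch assumes the usual naming where an rpm file has "-" and a deb file has "_" between the package name and its version.

```python
import re

# Core packages listed in this section.
CORE_PACKAGES = [
    "libnvidia-container-tools",
    "libnvidia-container1",
    "nvidia-container-runtime",
    "nvidia-container-toolkit",
    "nvidia-container-toolkit-base",
    "nvidia-docker2",
]


def missing_packages(filenames):
    """Return the core packages that have no matching file in filenames."""
    missing = []
    for name in CORE_PACKAGES:
        # The name must be followed by "-" (rpm) or "_" (deb) and a digit,
        # so "nvidia-container-toolkit-base-..." does not count as
        # "nvidia-container-toolkit".
        pattern = re.compile(r"^" + re.escape(name) + r"[-_]\d")
        if not any(pattern.match(f) for f in filenames):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical file names; real versions will differ.
    files = [
        "libnvidia-container1-1.13.5-1.x86_64.rpm",
        "nvidia-container-toolkit-base-1.13.5-1.x86_64.rpm",
    ]
    print(missing_packages(files))
```

In practice the list of file names could come from `os.listdir(".")` in the download directory.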
2. Edit the configuration file
vi /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
3. Finish
Check that the runtime is installed:
nvidia-container-runtime -v
Restart Docker for the change to take effect:
systemctl restart docker
Run a container to verify (append an explicit image tag if nvidia/cuda cannot be pulled without one):
docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi
Alternatively, enter a running container and run nvidia-smi there to confirm that the GPUs are visible.
# Fixing nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown
Problem: with NVIDIA-Docker, the GPU cannot be attached when a container starts.
Error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: insufficient permissions: unknown.
This looks like a permissions problem. See:
https://github.com/NVIDIA/nvidia-docker/issues/1547
Solution:
Open the file /etc/nvidia-container-runtime/config.toml.
Find the line user = "root:video",
uncomment it,
and change it to user = "root:root".
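The edit described above can also be scripted. The sketch below rewrites the first `user = ...` line, commented or not, in the text of config.toml. The function name `set_cli_user` and the sample file content are assumptions for illustration; a real config.toml contains more keys, and the result should be reviewed before it is written back.

```python
import re

# Matches a "user = ..." line, with or without a leading "#".
USER_LINE = re.compile(r"^[ \t]*#?[ \t]*user[ \t]*=.*$", re.MULTILINE)


def set_cli_user(config_text: str, user: str = "root:root") -> str:
    """Return config text with the first user line uncommented and set."""
    new_text, count = USER_LINE.subn(f'user = "{user}"', config_text, count=1)
    if count == 0:
        # Fail loudly rather than return the file unchanged.
        raise ValueError("no user line found in config text")
    return new_text


if __name__ == "__main__":
    # Assumed excerpt of config.toml, shortened for the example.
    sample = (
        "[nvidia-container-cli]\n"
        '#root = "/run/nvidia/driver"\n'
        '#user = "root:video"\n'
    )
    print(set_cli_user(sample))
```

Only the user line changes; every other line, including other commented keys, is left as it was.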
The cause is most likely a user group mismatch.
Check the device ownership with:
ll /dev/nvidia*
The output is:
crw-rw---- 1 root vglusers 195, 0 Aug 17 09:55 /dev/nvidia0
crw-rw---- 1 root vglusers 195, 1 Aug 17 09:55 /dev/nvidia1
crw-rw---- 1 root vglusers 195, 2 Aug 17 09:55 /dev/nvidia2
crw-rw---- 1 root vglusers 195, 3 Aug 17 09:55 /dev/nvidia3
crw-rw---- 1 root vglusers 195, 255 Aug 17 09:55 /dev/nvidiactl
crw-rw---- 1 root vglusers 195, 254 Jan 11 11:58 /dev/nvidia-modeset
crw-rw-rw- 1 root root 235, 0 Aug 17 09:55 /dev/nvidia-uvm
crw-rw-rw- 1 root root 235, 1 Aug 17 09:55 /dev/nvidia-uvm-tools
/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root huangqinlong 80 Aug 17 12:15 ./
drwxr-xr-x 22 root root 4420 Jan 11 11:58 ../
cr-------- 1 root root 238, 1 Aug 17 12:15 nvidia-cap1
cr--r--r-- 1 root root 238, 2 Aug 17 12:15 nvidia-cap2
So Docker must be configured with the same user/group as the devices in order to gain access to the GPUs.
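To see at a glance which groups own the device nodes, the listing can be summarized by group. This is a small sketch that parses text in the `ls -l` format shown above; the function name `device_groups` is hypothetical, and the parsing assumes the ten-column layout of that listing.

```python
def device_groups(ls_output: str) -> dict:
    """Map each owning group to the character devices it owns."""
    groups = {}
    for line in ls_output.splitlines():
        fields = line.split()
        # Character devices start with "c" and have 10 columns:
        # mode, links, owner, group, major, minor, month, day, time, path.
        if len(fields) < 10 or not fields[0].startswith("c"):
            continue
        group, path = fields[3], fields[-1]
        groups.setdefault(group, []).append(path)
    return groups


if __name__ == "__main__":
    # Three lines copied from the listing in this section.
    listing = (
        "crw-rw---- 1 root vglusers 195, 0 Aug 17 09:55 /dev/nvidia0\n"
        "crw-rw---- 1 root vglusers 195, 255 Aug 17 09:55 /dev/nvidiactl\n"
        "crw-rw-rw- 1 root root 235, 0 Aug 17 09:55 /dev/nvidia-uvm\n"
    )
    for group, paths in device_groups(listing).items():
        print(group, paths)
```

In the listing above the GPU nodes belong to vglusers rather than video, which is consistent with the default root:video setting being refused access.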