python - TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN
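I have been working on getting an application that relies on TensorFlow to run as a docker container with nvidia-docker. I built my application on top of the tensorflow/tensorflow:latest-gpu-py3 image. I run my docker container with the following command:
sudo nvidia-docker run -d -p 9090:9090 -v /src/weights:/weights myname/myrepo:mylabel
When looking at the logs through portainer, I see the following: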
2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0
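The container appears to start properly, and my application does seem to be running. When I send it prediction requests, the predictions come back correctly, but at the slow speed I would expect when running inference on the CPU, so I think it is fairly clear that the GPU is not being used for some reason. I have also tried running nvidia-smi from within the same container to make sure it sees my GPU, and these are the results: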
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             Off  | 0000:00:07.0     Off |                  N/A |
| N/A   28C    P8     7W /  31W |     25MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
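I am certainly no expert on this, but it does look like the GPU is visible from inside the container. Any ideas on how to get this working with TensorFlow?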
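One way to tell a driver-level failure apart from a TensorFlow problem is to call `cuInit` directly through `ctypes` inside the container. The following is a minimal sketch and not part of the original question: it assumes the driver library is exposed under the soname `libcuda.so.1`, and the helper name `probe_cuinit` is hypothetical. The value 999 is `CUDA_ERROR_UNKNOWN` in the CUDA driver API.

```python
import ctypes

# Small subset of CUresult codes from the CUDA driver API.
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def probe_cuinit(lib=None, lib_name="libcuda.so.1"):
    """Call cuInit(0) and return a readable status string.

    `lib` lets a caller pass an already loaded library object; when it is
    None the function tries to load `lib_name` (an assumed soname).
    """
    if lib is None:
        try:
            lib = ctypes.CDLL(lib_name)
        except OSError as exc:
            return "could not load {}: {}".format(lib_name, exc)
    code = lib.cuInit(0)
    return CU_RESULTS.get(code, "CUresult {}".format(code))
```

Running `print(probe_cuinit())` in the same container as the application shows whether the error reproduces without TensorFlow. If it also reports `CUDA_ERROR_UNKNOWN`, the problem most likely sits below TensorFlow, in the driver or in how the container is given access to it.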
Solution:
I run TensorFlow on my Ubuntu 16.04 desktop.
My code had been running fine on the GPU. But today I could not find the GPU device with the following code:
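import tensorflow as tf
from tensorflow.python.client import device_lib as _device_lib

# tf.Session is the TensorFlow 1.x API; creating it initializes CUDA.
with tf.Session() as sess:
    local_device_protos = _device_lib.list_local_devices()
    print(local_device_protos)
    # Print just the name of each device TensorFlow can see.
    for x in local_device_protos:
        print(x.name)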
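If only the GPU entries are of interest, the device list can be filtered on `device_type`. This is a small sketch rather than part of the original answer: the helper name `gpu_device_names` is hypothetical, and it relies only on each entry having `name` and `device_type` attributes, as the entries returned by `list_local_devices()` do.

```python
def gpu_device_names(devices):
    """Return the names of entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]
```

Calling `gpu_device_names(_device_lib.list_local_devices())` and getting an empty list back means TensorFlow is not seeing any GPU.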
When I ran tf.Session(), I noticed the following error:
cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
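I checked my Nvidia driver in the system details, and used nvcc -V and nvidia-smi to check the driver, CUDA and cuDNN. Everything seemed fine.
Then I went to Additional Drivers to check the driver details, and there I found that many versions of the NVIDIA driver were listed, with the latest one selected. When I first installed the driver, there had been only one.
So I selected the old version and applied the change.
Then I ran tf.Session() and the problem was still there. I figured I should reboot the computer, and after the reboot the problem was gone.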
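The answer does not say why the reboot helped. The TensorFlow diagnostics in the question's log compare the kernel module version with the libcuda version, and a rough equivalent of that check can be scripted after a driver change. The sketch below is only an illustration: the helper names are hypothetical, and the regular expressions assume the text formats shown earlier in the question (the `NVRM version:` line from `/proc/driver/nvidia/version` and the `Driver Version:` header of `nvidia-smi`).

```python
import re


def kernel_module_version(proc_text):
    """Extract the version from /proc/driver/nvidia/version contents."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def smi_driver_version(smi_text):
    """Extract the version from the header of nvidia-smi output."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", smi_text)
    return match.group(1) if match else None


def versions_agree(proc_text, smi_text):
    """True only when both versions were found and are identical."""
    kernel = kernel_module_version(proc_text)
    user = smi_driver_version(smi_text)
    return kernel is not None and kernel == user
```

The inputs would come from reading `/proc/driver/nvidia/version` and capturing the output of `nvidia-smi`. A disagreement, or a version that cannot be found at all, suggests the driver installation is in an inconsistent state and that a reboot or reinstall is worth trying.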
sess = tf.Session()
2018-07-01 12:02:41.336648: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-01 12:02:41.464166: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-01 12:02:41.464482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.8225
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.27GiB
2018-07-01 12:02:41.464494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 12:02:42.308689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 12:02:42.308721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-07-01 12:02:42.308729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-07-01 12:02:42.309686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7022 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: