python - TensorFlow memory use when running on GPU: why does it look like not all memory is used?
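This is a follow-up to a question I posted here: Memory error with larger images when running convolutional neural network using TensorFlow on AWS instance g2.2xlarge
I built a CNN model in Python using TensorFlow and am running it on an NVIDIA GRID K520 GPU. It runs fine with 64x64 images but produces a memory error with 128x128 images (even when the input contains only 1 image).
The error says: Ran out of memory trying to allocate 2.00GiB. 2 GiB is the size of the weight matrix of my first fully connected layer (input: 128 * 128 * 2 (channels), output: 128 * 128, times 4 bytes per value = 2.14748 GB = 2.0 GiB).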
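The size quoted above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative and assume float32 weights (4 bytes each), as stated in the question.

```python
# Size of the first fully connected layer's weight matrix, in bytes.
n_in = 128 * 128 * 2       # flattened input: height * width * channels
n_out = 128 * 128          # flattened output: height * width
bytes_per_float32 = 4      # assumption: weights are stored as float32

weight_bytes = n_in * n_out * bytes_per_float32

print(weight_bytes)            # total bytes
print(weight_bytes / 10**9)    # decimal gigabytes (GB)
print(weight_bytes / 2**30)    # binary gibibytes (GiB)
```

The same byte count reads as roughly 2.147 GB in decimal units and exactly 2.0 GiB in binary units, which is why both figures appear in the question.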
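From here, I can see that the GRID K520 has 8 GB = 7.45 GiB of memory. When I start running my code, I also see the output: Total memory: 3.94GiB, Free memory: 3.91GiB.
My question is: what is the relationship between all these numbers? If there are 7.45 GiB of memory on the GPU, why is the total memory only 3.94 GiB, and, most importantly, why can the GPU not allocate 2 GiB, which is only just over half of the total memory? (I am not a computer scientist, so a detailed answer would be valuable.)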
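The unit conversions behind these figures can be reproduced as follows. This is a minimal sketch; the helper name `gb_to_gib` is made up for illustration. It does not explain the gap between 7.45 GiB and 3.94 GiB. One explanation the thread itself does not give, offered here only as an assumption, is that the GRID K520 board carries two GPUs sharing the advertised 8 GB, and a g2.2xlarge instance exposes one of them.

```python
GIB = 2**30  # bytes in one gibibyte

def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes (hypothetical helper)."""
    return gb * 10**9 / GIB

board_gib = gb_to_gib(8)                      # advertised 8 GB for the board
reported_total_gib = 3.94                     # "Total memory" printed by TensorFlow
reported_total_gb = reported_total_gib * GIB / 10**9
requested_gib = 2.0                           # the allocation that fails
share = requested_gib / reported_total_gib    # fraction of reported total

print(round(board_gib, 2))
print(round(reported_total_gb, 2))
print(round(share, 3))
```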
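Some more specific information, in case it is useful:
I tried using allow_growth and per_process_gpu_memory_fraction. I still got the memory error, but also some memory statistics (I would really appreciate it if someone could explain these numbers to me):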
allow_growth = True
Stats:
Limit: 3878682624
InUse: 2148557312
MaxInUse: 2148557312
NumAllocs: 13
MaxAllocSize: 2147483648
allow_growth = False
Stats:
Limit: 3878682624
InUse: 3878682624
MaxInUse: 3878682624
NumAllocs: 13
MaxAllocSize: 3877822976
per_process_gpu_memory_fraction = 0.5
allow_growth = False
Stats:
Limit: 2116026368
InUse: 859648
MaxInUse: 859648
NumAllocs: 12
MaxAllocSize: 409600
per_process_gpu_memory_fraction = 0.5
allow_growth = True
Stats:
Limit: 2116026368
InUse: 1073664
MaxInUse: 1073664
NumAllocs: 12
MaxAllocSize: 623616
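One way to read these statistics is to compare the remaining headroom in each run with the 2 GiB tensor that fails to allocate. The sketch below assumes that Limit is the allocator's cap in bytes and InUse is the number of bytes currently allocated; that interpretation is an assumption, not something the thread confirms.

```python
GIB = 2**30
# Weight matrix of the fully connected layer: 32768 x 16384 float32 values.
WEIGHT_BYTES = 32768 * 16384 * 4

# (Limit, InUse) pairs copied from the four runs above.
runs = {
    "allow_growth=True": (3878682624, 2148557312),
    "allow_growth=False": (3878682624, 3878682624),
    "fraction=0.5, allow_growth=False": (2116026368, 859648),
    "fraction=0.5, allow_growth=True": (2116026368, 1073664),
}

def headroom(limit, in_use):
    """Bytes still available under the allocator limit."""
    return limit - in_use

for name, (limit, in_use) in runs.items():
    free = headroom(limit, in_use)
    print(f"{name}: limit {limit / GIB:.2f} GiB, "
          f"headroom {free / GIB:.2f} GiB, "
          f"room for another 2 GiB tensor: {free >= WEIGHT_BYTES}")
```

Under that reading, none of the four runs has room for one more 2 GiB tensor. In the first run, InUse exceeds the 2 GiB weight matrix by 1073664 bytes, and with a 0.5 fraction the limit itself (about 1.97 GiB) is already below 2 GiB.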
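Minimal working example: it uses a dummy training set with the same image size as my real input and only one fully connected layer (the full model code is here). This example works for inputs of size: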
X_train = np.random.rand(1, 64, 64, 2)
Y_train = np.random.rand(1, 64, 64)
but does not work for inputs of size:
X_train = np.random.rand(1, 128, 128, 2)
Y_train = np.random.rand(1, 128, 128)
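The difference between the two input sizes is easier to see when the weight matrix of the fully connected layer is sized for both. This is a sketch of the arithmetic only; `fc_weight_bytes` is a hypothetical helper and assumes float32 weights and the two-channel input used in the example.

```python
def fc_weight_bytes(height, width, channels=2, bytes_per_value=4):
    """Bytes needed for a dense layer mapping H*W*C inputs to H*W outputs."""
    n_in = height * width * channels
    n_out = height * width
    return n_in * n_out * bytes_per_value

small = fc_weight_bytes(64, 64)      # the case that works
large = fc_weight_bytes(128, 128)    # the case that fails

print(small / 2**20)    # size in MiB for 64x64
print(large / 2**30)    # size in GiB for 128x128
print(large // small)   # growth factor when each side doubles
```

Doubling the image side multiplies the weight matrix by 16, because both the input and the output dimension grow by a factor of 4.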
Code:
import numpy as np
import tensorflow as tf

# Dummy training set:
X_train = np.random.rand(1, 128, 128, 2)
Y_train = np.random.rand(1, 128, 128)
print('X_train.shape at input = ', X_train.shape, ", Size = ",
      X_train.shape[0] * X_train.shape[1] * X_train.shape[2]
      * X_train.shape[3])
print('Y_train.shape at input = ', Y_train.shape, ", Size = ",
      Y_train.shape[0] * Y_train.shape[1] * Y_train.shape[2])


def create_placeholders(n_H0, n_W0):
    x = tf.placeholder(tf.float32, shape=[None, n_H0, n_W0, 2], name='x')
    y = tf.placeholder(tf.float32, shape=[None, n_H0, n_W0], name='y')
    return x, y


def forward_propagation(x):
    x_temp = tf.contrib.layers.flatten(x)  # size (n_im, n_H0 * n_W0 * 2)
    n_out = np.int(x.shape[1] * x.shape[2])  # size (n_im, n_H0 * n_W0)
    # FC: input size (n_im, n_H0 * n_W0 * 2), output size (n_im, n_H0 * n_W0)
    FC1 = tf.contrib.layers.fully_connected(
        x_temp,
        n_out,
        activation_fn=tf.tanh,
        normalizer_fn=None,
        normalizer_params=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(),
        weights_regularizer=None,
        biases_initializer=None,
        biases_regularizer=None,
        reuse=True,
        variables_collections=None,
        outputs_collections=None,
        trainable=True,
        scope='fc1')
    # Reshape output from FC layer into array of size (n_im, n_H0, n_W0, 1):
    FC_M = tf.reshape(FC1, [tf.shape(x)[0], tf.shape(x)[1], tf.shape(x)[2], 1])
    return FC_M


def compute_cost(FC_M, Y):
    cost = tf.square(FC_M - Y)
    return cost


def model(X_train, Y_train, learning_rate=0.0001, num_epochs=100):
    (m, n_H0, n_W0, _) = X_train.shape
    # Create Placeholders
    X, Y = create_placeholders(n_H0, n_W0)
    # Build the forward propagation
    DECONV = forward_propagation(X)
    # Add cost function to tf graph
    cost = compute_cost(DECONV, Y)
    # Backpropagation
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
    # Initialize all the variables globally
    init = tf.global_variables_initializer()
    # Memory config
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    # Start the session to compute the tf graph
    with tf.Session(config = config) as sess:
        # Initialization
        sess.run(init)
        # Training loop
        for epoch in range(num_epochs):
            _, temp_cost = sess.run([optimizer, cost],
                                    feed_dict={X: X_train, Y: Y_train})
            print ('EPOCH = ', epoch, 'COST = ', np.mean(temp_cost))


# Finally run the model
model(X_train, Y_train, learning_rate=0.00002, num_epochs=5)
Traceback:
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 2.00GiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:983] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:594] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [32768,16384] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "myAutomap_MinExample.py", line 99, in <module>
num_epochs=5)
File "myAutomap_MinExample.py", line 85, in model
sess.run(init)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [32768,16384] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'zeros', defined at:
File "myAutomap_MinExample.py", line 99, in <module>
num_epochs=5)
File "myAutomap_MinExample.py", line 72, in model
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 289, in minimize
name=name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 403, in apply_gradients
self._create_slots(var_list)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/rmsprop.py", line 103, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 647, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 121, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1352, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 103, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ubuntu/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: zeros = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [32768,16384] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
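The traceback points at a `zeros` node of shape [32768, 16384] created in `_create_slots` of the RMSProp optimizer, so the failing allocation appears to be an optimizer slot rather than the weight matrix itself. The sketch below estimates the memory for the weights plus the optimizer's slots. The count of two slots per variable ("rms" and "momentum") is an assumption based on the TF 1.x RMSProp implementation; only the "momentum" slot is visible in the traceback, and gradients or temporaries are not counted.

```python
GIB = 2**30

# Shape of the failing `zeros` node, copied from the traceback.
shape = (32768, 16384)
tensor_bytes = shape[0] * shape[1] * 4   # float32 values

# Assumption: one weight tensor plus two RMSProp slots of the same shape.
n_copies = 1 + 2
needed_bytes = n_copies * tensor_bytes

# Allocator limit reported in the statistics earlier in the question.
limit_bytes = 3878682624

print(tensor_bytes / GIB)           # size of one such tensor in GiB
print(needed_bytes / GIB)           # weights plus assumed slots in GiB
print(needed_bytes > limit_bytes)   # whether that exceeds the limit
```

If that assumption holds, the 128x128 model would need about 6 GiB for these three tensors alone, well above the reported limit of roughly 3.61 GiB.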
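Answer:
It would be good if you could upload your code, or at least a minimal example, so that we can see what is going on. Looking at these numbers, it seems that allow_growth is working as it should, i.e. it only allocates the amount of memory it actually needs (the 2.148 GB, or 2.0 GiB, you calculated above).
You could also provide the full console output of the error you get.
My guess is that you are confusing a non-fatal warning message from the TF resource allocator with the actual error that causes your program to fail.
Is the message you are seeing similar to this one?
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_1_bfc) ran out of memory trying to allocate 2.55GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
Since this is only a warning, you can ignore it unless you want to optimize the runtime performance of your code. It will not cause your program to fail.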
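To tell warnings from errors in a log like the one in the question, the leading letter of each TensorFlow log line can be inspected. The sketch below is illustrative only: `severity_of` is a hypothetical helper, and the mapping of I, W, E and F to severities is an assumption based on the W and E prefixes visible in the log above.

```python
# Assumed mapping from the one-letter prefix to a severity name.
SEVERITY = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}

def severity_of(line):
    """Return the severity of a TensorFlow-style log line, or None."""
    prefix, _, rest = line.partition(" ")
    if prefix in SEVERITY and rest.startswith("tensorflow/"):
        return SEVERITY[prefix]
    return None

# Lines copied from the traceback in the question.
log = [
    "W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of "
    "memory trying to allocate 2.00GiB. See logs for memory state.",
    "E tensorflow/core/common_runtime/executor.cc:594] Executor failed to "
    "create kernel. Internal: Dst tensor is not initialized.",
    "Traceback (most recent call last):",
]

for line in log:
    print(severity_of(line), "|", line[:50])
```

In the question's log, the out-of-memory line carries a W prefix but is followed by an E line and a Python exception, so in that run the allocation failure does coincide with a real error.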