【6.5.1】GPU settings in TensorFlow

September 19, 2018 tensorflow

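1. TensorFlow device placement

1.1 Placement rules

By default, when an operation has both a CPU and a GPU implementation, TensorFlow places it on the GPU. If several GPUs are visible, it picks the one with the lowest ID (GPU:0).

1.2 Manual device placement

If you do not want TensorFlow to assign devices to operations automatically, use with tf.device to create a device context. Every operation created inside that context is placed on the specified device.

Example: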

import tensorflow as tf  # TensorFlow 1.x API

# ops placed on the CPU
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

# ops placed on a single GPU (here the third visible GPU)
with tf.device('/device:GPU:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

# ops placed on several GPUs, one copy per device
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])

2. TensorFlow GPU configuration

2.1 Restricting which GPUs are visible

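import os

# By default TF claims almost all of the memory on every visible GPU.
# Making only GPU 0 and GPU 1 visible keeps it off the other cards.
# If the value is an empty string, no GPU is visible and TF falls back to the CPU.
# Set this before TensorFlow initializes the GPUs.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

# Print the GPU IDs exposed to TF
print(os.environ['CUDA_VISIBLE_DEVICES'])
>>> 0,1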
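
As a rough sketch, the environment setup can be wrapped in a small helper so that the ID list is built from integers rather than typed by hand. The function name restrict_visible_gpus is hypothetical (it is not part of TensorFlow or CUDA), and the helper only has an effect if it is called before TensorFlow initializes the GPUs.

```python
import os


def restrict_visible_gpus(gpu_ids):
    """Expose only the given physical GPU IDs to CUDA applications.

    Hypothetical helper for illustration. It must run before TensorFlow
    initializes the GPUs. An empty list hides every GPU.
    """
    # Number the GPUs by PCI bus ID so the IDs match what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Build a comma-separated list such as "0,1".
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = restrict_visible_gpus([0, 1])
print(visible)
```

Calling restrict_visible_gpus([]) sets the variable to an empty string, which corresponds to the CPU-only case described above.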

2.2 Limiting the fraction of GPU memory

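import tensorflow as tf  # TensorFlow 1.x API

# Create a tf.ConfigProto() instance before opening the session.
# allow_soft_placement=True lets TF fall back to an available device
# (e.g. the CPU) for ops that cannot be placed on the requested GPU.
gpuConfig = tf.ConfigProto(allow_soft_placement=True)

# Cap this process at 60% of each visible GPU's memory
gpuConfig.gpu_options.per_process_gpu_memory_fraction = 0.6

# Apply the configuration to the session
with tf.Session(config=gpuConfig) as sess:
    pass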

With this setting, if the selected card has 8000 MB of memory, this process can use at most 4800 MB of it (8000 × 0.6).
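
The arithmetic can be checked with a few lines of plain Python. This is only an illustrative sketch: gpu_memory_budget_mb is a made-up helper name, and the real limit enforced by TensorFlow may differ slightly because of driver and runtime overhead.

```python
def gpu_memory_budget_mb(total_mb, fraction):
    """Approximate per-process memory cap in MB for a given fraction.

    Hypothetical helper: it mirrors the calculation behind
    per_process_gpu_memory_fraction and does not query any GPU.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(round(total_mb * fraction))


# An 8000 MB card capped at 60%
print(gpu_memory_budget_mb(8000, 0.6))  # 4800
```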

2.3 Allocating memory on demand

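import tensorflow as tf  # TensorFlow 1.x API

# Create a tf.ConfigProto() instance before opening the session.
# allow_soft_placement=True lets TF fall back to an available device
# (e.g. the CPU) for ops that cannot be placed on the requested GPU.
gpuConfig = tf.ConfigProto(allow_soft_placement=True)

# Start with a small allocation and grow it as the process needs more
gpuConfig.gpu_options.allow_growth = True

# Apply the configuration to the session
with tf.Session(config=gpuConfig) as sess:
    pass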

3. Miscellaneous

3.1 Listing the currently available devices

Code:

import os

# Set these before TensorFlow initializes the GPUs
os.environ["CUDA_DEVICE_ORDER"] = 'PCI_BUS_ID'
os.environ["CUDA_VISIBLE_DEVICES"] = '5,6,7'   # some GPUs are already in use; without this line the call fails

from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

Output:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5909479428792073686
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 12028352922
locality {
  bus_id: 2
}
incarnation: 11590753222351081897
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:87:00.0, compute capability: 3.7"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 12026360628
locality {
  bus_id: 2
}
incarnation: 13251448707318534734
physical_device_desc: "device: 1, name: Tesla K80, pci bus id: 0000:8d:00.0, compute capability: 3.7"
, name: "/device:GPU:2"
device_type: "GPU"
memory_limit: 12026360628
locality {
  bus_id: 2
}
incarnation: 4554490859833684458
physical_device_desc: "device: 2, name: Tesla K80, pci bus id: 0000:8e:00.0, compute capability: 3.7"
]

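Note that the device names here (/device:GPU:0 to /device:GPU:2) do not match the IDs given in CUDA_VISIBLE_DEVICES (5, 6, 7): the visible devices are renumbered starting from 0.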
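
The renumbering can be sketched as a simple mapping from logical device names to the physical IDs listed in CUDA_VISIBLE_DEVICES. The function logical_gpu_names below is a hypothetical helper, not a TensorFlow API, and it assumes every listed ID refers to a GPU that actually exists on the machine.

```python
def logical_gpu_names(cuda_visible_devices):
    """Map TensorFlow-style logical names to physical GPU IDs.

    Hypothetical helper: assumes all listed IDs are valid, so the
    visible devices are numbered 0, 1, 2, ... in the order given.
    """
    ids = [p.strip() for p in cuda_visible_devices.split(",") if p.strip()]
    return {"/device:GPU:%d" % i: pid for i, pid in enumerate(ids)}


mapping = logical_gpu_names("5,6,7")
for name in sorted(mapping):
    print(name, "-> physical GPU", mapping[name])
```

With the value '5,6,7' used above, /device:GPU:2 therefore refers to physical GPU 7, so a tf.device string should always use the renumbered index.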

4. My own setup

import os

# Set these before TensorFlow initializes the GPUs
os.environ["CUDA_DEVICE_ORDER"] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # remove this line to use every available GPU

import tensorflow as tf  # TensorFlow 1.x API

gpuConfig = tf.ConfigProto(allow_soft_placement=True)
gpuConfig.gpu_options.allow_growth = True

with tf.Session(config=gpuConfig) as sess:
    pass

References

https://blog.csdn.net/mzpmzk/article/details/78647711 (highly recommended)
