Linux [11] - Software Installation 3 - NVIDIA (GPU driver, CUDA)

CUDA (Compute Unified Device Architecture) is the computing platform from GPU vendor NVIDIA. It is a general-purpose parallel computing architecture that lets the GPU solve complex computational problems.

I. Installation

Installing NVIDIA on CentOS 7.4 (note: this is the installation on G03)

1.1 Install gcc

yum -y install gcc-c++

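Important: if an older NVIDIA driver is already installed, uninstall it first. Also, install CUDA before the driver. Alternatively, follow my steps in the order given and reinstall the driver once at the end.

1.2 Detect the GPU model and the required driver

Add the ELRepo repository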

$ sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

Install the NVIDIA driver detection tool

$ sudo yum install nvidia-detect
$ nvidia-detect -v

Probing for supported NVIDIA devices...
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[1a03:2000] ASPEED Technology, Inc. ASPEED Graphics Family
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia
[10de:102d] NVIDIA Corporation GK210GL [Tesla K80]
This device requires the current 390.25 NVIDIA driver kmod-nvidia

Every Tesla K80 device listed requires driver 390.25, so download that version:

cd /data/src
wget -r -np -nd http://us.download.nvidia.com/XFree86/Linux-x86_64/390.25/NVIDIA-Linux-x86_64-390.25.run
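The driver version can also be extracted from the nvidia-detect output by a script instead of being read off by eye. The following is a minimal sketch: the function names required_versions and driver_url are hypothetical helpers (not part of nvidia-detect), and the URL pattern is simply copied from the 390.25 download above and assumed to hold for other versions.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of nvidia-detect.

# Read `nvidia-detect -v` output on stdin and print each distinct
# driver version it mentions, one per line.
required_versions() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p' | sort -u
}

# Print the download URL for a driver version. The pattern is assumed
# from the 390.25 URL used in this article.
driver_url() {
    echo "http://us.download.nvidia.com/XFree86/Linux-x86_64/$1/NVIDIA-Linux-x86_64-$1.run"
}

# Example use on a machine with nvidia-detect installed:
#   nvidia-detect -v | required_versions
#   wget "$(driver_url 390.25)"
```

If required_versions prints more than one line, the machine has devices that need different drivers and the download step should be reconsidered.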

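Driver conflict

The NVIDIA driver conflicts with the nouveau driver that ships with the system. Run the following command to see whether nouveau is loaded: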

lsmod | grep nouveau

nouveau              1622010  0 
video                  24520  1 nouveau
mxm_wmi                13021  1 nouveau
drm_kms_helper        159169  2 ast,nouveau
ttm                    99345  2 ast,nouveau
drm                   370825  6 ast,ttm,drm_kms_helper,nouveau
i2c_algo_bit           13413  3 ast,igb,nouveau
i2c_core               40756  8 ast,drm,igb,i2c_i801,ipmi_ssif,drm_kms_helper,i2c_algo_bit,nouveau
wmi                    19070  2 mxm_wmi,nouveau
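For use in a script, the same check can be made stricter than a plain grep by looking only at the module-name column. This is a small sketch; module_loaded is a hypothetical helper name, and it expects lsmod-style text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the named module appears in the first
# column of lsmod-style input on stdin, exit 1 otherwise.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example use:
#   if lsmod | module_loaded nouveau; then
#       echo "nouveau is still loaded"
#   fi
```

The same call is a convenient way to confirm, after the blacklist steps and a reboot, that nouveau is no longer loaded.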

Edit /etc/modprobe.d/blacklist.conf to stop the nouveau module from loading; create the file if it does not exist. This needs root privileges, because ordinary users cannot create .conf files under /etc.

$ su root
# echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
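Note that the redirection with > replaces any existing blacklist.conf. If the file may already hold other entries, an append-if-missing variant is safer. The sketch below assumes nothing beyond standard grep and printf; ensure_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append each given line to FILE only if that
# exact line is not already present, leaving existing content untouched.
ensure_lines() {
    file=$1
    shift
    touch "$file"
    for line in "$@"; do
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}

# Example use (as root):
#   ensure_lines /etc/modprobe.d/blacklist.conf \
#       'blacklist nouveau' 'options nouveau modeset=0'
```

Running it twice leaves the file unchanged the second time, so it is safe to rerun.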

Rebuild the initramfs image (a reboot is needed afterwards for the blacklist to take effect)

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# dracut /boot/initramfs-$(uname -r).img $(uname -r)

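1.3 Install the NVIDIA driver

Change to the directory containing the NVIDIA installer and run it (it is better to postpone this until after CUDA is installed)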

$ chmod +x NVIDIA-Linux-x86_64-390.25.run
$ sh NVIDIA-Linux-x86_64-390.25.run
If the installation completes, you can check the GPU status with the nvidia-smi command.

Error:

 You appear to be running an X server; please exit X before installing.  For   
         further details, please see the section INSTALLING THE NVIDIA DRIVER in the   
         README available on the Linux driver download page at www.nvidia.com.

Solution:

Shut down the graphical interface:

init 3  

Then repeat the sh installation step above. Once it finishes, check the GPU status:

nvidia-smi
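To confirm from a script that every GPU is running the expected driver, nvidia-smi can print machine-readable output with its --query-gpu and --format options. The following is a sketch only; all_gpus_on_driver is a hypothetical helper that reads lines of the form "name, driver_version" on stdin.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if stdin has at least one line and
# the last comma-separated field of every line equals the wanted version.
all_gpus_on_driver() {
    awk -F', ' -v want="$1" '
        { n++ }
        $NF != want { bad++ }
        END { exit !(n > 0 && bad == 0) }
    '
}

# Example use:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
#       | all_gpus_on_driver 390.25 && echo "all GPUs on 390.25"
```

An empty input counts as a failure, so the check also catches the case where nvidia-smi finds no GPUs.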

1.4 Install CUDA

Download the CUDA rpm package from the official site, https://developer.nvidia.com/cuda-downloads. Be sure to pick the one that matches your own system version.

wget -c https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda-repo-rhel7-9-1-local-9.1.85-1.x86_64
sudo rpm -i cuda-repo-rhel7-9-1-local-9.1.85-1.x86_64
sudo yum clean all
sudo yum install cuda
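A quick way to confirm which toolkit release was installed is to read it from the nvcc version banner. This sketch assumes the default install prefix /usr/local/cuda (nvcc is usually not on PATH right after installation); cuda_release is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `nvcc --version` output on stdin and print
# the release number from the "release X.Y," part of the banner.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Example use (path assumes the default install prefix):
#   /usr/local/cuda/bin/nvcc --version | cuda_release
```

For the package installed above, the printed release is expected to match the 9.1 in the rpm file name.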

Test c