Friday, September 11, 2015

Nvidia GPU Coolness

https://devtalk.nvidia.com/default/topic/1003810/linux/adjust-nvidia-gpu-fan-speed-multiple-gpus-one-monitor-/
The following two commands make it possible to adjust the fan speed of multiple GPUs. Run them as root, since they rewrite /etc/X11/xorg.conf, and restart X afterwards:
  1. nvidia-xconfig --enable-all-gpus
  2. nvidia-xconfig --cool-bits=4
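The --cool-bits value is a bitmask, and 4 sets bit 2, which is the bit that enables manual fan control. Below is a minimal sketch for checking whether an existing config already has that bit set. It assumes nvidia-xconfig wrote an `Option "Coolbits" "N"` line into /etc/X11/xorg.conf, which is its default output file. The helper names `coolbits_value` and `coolbits_has_fan_control` are hypothetical and are not part of any NVIDIA tool.

```shell
#!/bin/bash
# Sketch: check whether a Coolbits value enables manual fan control.
# Coolbits is a bitmask; bit 2 (value 4) enables manual fan control.

# Hypothetical helper: print the first Coolbits value found in an
# xorg.conf-style file (default assumed: /etc/X11/xorg.conf).
# With --enable-all-gpus there may be one Screen section per GPU, hence head -n 1.
coolbits_value() {
  local conf=${1:-/etc/X11/xorg.conf}
  sed -n 's/^[[:space:]]*Option[[:space:]]*"Coolbits"[[:space:]]*"\([0-9]*\)".*/\1/p' "$conf" | head -n 1
}

# Hypothetical helper: exit status 0 if the given Coolbits value has bit 2 (4) set.
coolbits_has_fan_control() {
  (( (${1:-0} & 4) != 0 ))
}
```

For example, `coolbits_has_fan_control 12` succeeds (12 = 8 + 4), while `coolbits_has_fan_control 8` fails.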


This article is worth further investigation:

https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness#TOC-Faking-a-Head-for-a-Headless-X-Server

List the NVIDIA fan control settings:
nvidia-settings -q all | grep Fan
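Unlike nvidia-settings, nvidia-smi does not need a running X server. It can report the temperature and fan speed of every GPU in a single call. Below is a minimal sketch that uses the `index`, `temperature.gpu` and `fan.speed` query fields. The function names `format_gpu_status` and `gpu_status` are hypothetical. On some cards the fan field reads "[Not Supported]" instead of a number.

```shell
#!/bin/bash
# Sketch: print temperature and fan speed for every GPU via nvidia-smi.

# Hypothetical helper: format "index, temp, fan" CSV lines read from stdin.
format_gpu_status() {
  local idx temp fan
  # IFS=', ' treats ", " as a single field separator
  while IFS=', ' read -r idx temp fan; do
    printf 'GPU %s: %s C, fan %s %%\n' "$idx" "$temp" "$fan"
  done
}

# Hypothetical helper: query all GPUs at once; nounits yields bare numbers.
gpu_status() {
  nvidia-smi --query-gpu=index,temperature.gpu,fan.speed \
             --format=csv,noheader,nounits | format_gpu_status
}
```

Calling `gpu_status` prints one line per GPU, such as "GPU 0: 45 C, fan 30 %".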

An automatic NVIDIA GPU fan adjustment script, based on https://bbs.archlinux.org/viewtopic.php?pid=1392961#p1392961 and modified by me for multiple GPUs:


#!/bin/bash

# nvautoadjust
# Periodically checks the temperature of each NVIDIA GPU and adjusts its fan accordingly.

# Recommended invocation: nvautoadjust &
# Run at startup to continually monitor temperature and adjust fan speed.

# Refresh interval in seconds
interval=5
# Threshold temperatures and fan speeds for the cooling levels.
# Thresholds are in degrees C; speeds are in percent of maximum.
# Set a number between 35 and 100 for fan speeds.
min_threshold=40
mid_threshold=50
train_threshold=70
max_threshold=80

min_speed=45
mid_speed=60
train_speed=95
max_speed=100

# Get the number of NVIDIA GPUs
nGPU=$(ls -1 /proc/driver/nvidia/gpus | wc -l)

# Enable manual fan control on GPU $1 and set its fan to $2 percent.
# Assumes fan N belongs to GPU N (one fan per GPU).
# Attributes are quoted so the shell cannot glob-expand the [brackets].
set_fan_speed() {
  nvidia-settings -a "[gpu:$1]/GPUFanControlState=1" \
                  -a "[fan:$1]/GPUTargetFanSpeed=$2" > /dev/null
}

# Continually loop
while true; do
  for (( i=0; i<nGPU; i++ )); do
    # Get current temperature (C) and fan speed (%) as plain integers
    current_temp=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i "$i")
    current_speed=$(nvidia-smi --query-gpu=fan.speed --format=csv,noheader,nounits -i "$i")

    # Choose the target speed with arithmetic (( )) comparisons;
    # [[ a > b ]] compares strings, not numbers.
    # Between mid_threshold and train_threshold the current speed is left unchanged.
    target=""
    if (( current_temp > max_threshold )); then
      target=$max_speed
    elif (( current_temp > train_threshold )); then
      target=$train_speed
    elif (( current_temp > min_threshold && current_temp < mid_threshold )); then
      target=$mid_speed
    elif (( current_temp <= min_threshold )); then
      target=$min_speed
    fi

    # Only set the speed if it actually needs changing, as nvidia-settings eats CPU cycles
    if [[ -n $target && $current_speed != "$target" ]]; then
      set_fan_speed "$i" "$target"
    fi
  done

  # Wait until the interval expires before rechecking all GPUs
  sleep "$interval"
done
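To check the threshold table without touching the GPUs, you can pull the decision logic into a pure function and run it on sample temperatures. The sketch below copies the script's threshold and speed values. `target_speed` is a hypothetical helper. It prints nothing for the 50 to 70 C band, which is where the script leaves the current speed alone.

```shell
#!/bin/bash
# Sketch: the script's threshold table as a pure function (dry run, no nvidia-settings).
min_threshold=40; mid_threshold=50; train_threshold=70; max_threshold=80
min_speed=45; mid_speed=60; train_speed=95; max_speed=100

# Hypothetical helper: print the target fan speed (%) for temperature $1 (C),
# or nothing when the current speed would be left unchanged.
target_speed() {
  local t=$1
  if (( t > max_threshold )); then
    echo "$max_speed"
  elif (( t > train_threshold )); then
    echo "$train_speed"
  elif (( t > min_threshold && t < mid_threshold )); then
    echo "$mid_speed"
  elif (( t <= min_threshold )); then
    echo "$min_speed"
  fi
}

# Show the decision for a range of temperatures
for t in 35 40 45 50 60 70 75 85; do
  printf '%3s C -> %s\n' "$t" "$(target_speed "$t")"
done
```

This output makes the unchanged 50 to 70 C band easy to spot when you tune the thresholds.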

