OveractionK's Archive

GPU utilization logging with nvidia-smi

평화로운가을남자 2019. 5. 8. 00:55

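I went looking for this feature after realizing that the GPU utilization of the model I was training was lower than expected. I had heard that even MNIST pushes utilization above 30 percent, so it was honestly a shock to find a beefy RTX 2080 Ti sitting at only 13 percent. I suspect the cause is that the model has only a single hidden layer.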

nvidia-smi offers many more query fields (listed below), but since I only needed to watch GPU utilization over one hour, I ran the following:

$ nvidia-smi --query-gpu=timestamp,name,utilization.gpu --format=csv -l 5 -t 3600 -f gpu_log.csv
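
With `--format=csv`, the log file starts with a header row followed by one row per sample, so it can be summarized after the run. Below is a minimal Python sketch for averaging the utilization column. It assumes the default CSV layout with units (a header column named `utilization.gpu [%]` and values such as `13 %`); the helper names `read_gpu_utilization` and `summarize` are hypothetical, not part of any NVIDIA tool.

```python
import csv
from statistics import mean

# Assumed column name in nvidia-smi --format=csv output (header and units enabled)
UTIL_COLUMN = "utilization.gpu [%]"


def read_gpu_utilization(lines):
    """Return utilization.gpu samples (percent, as floats) from CSV lines.

    `lines` can be an open file or any iterable of strings.
    """
    # skipinitialspace strips the space nvidia-smi puts after each comma
    reader = csv.DictReader(lines, skipinitialspace=True)
    values = []
    for row in reader:
        raw = row.get(UTIL_COLUMN) or ""
        parts = raw.split()
        try:
            values.append(float(parts[0]))  # '13 %' -> 13.0
        except (IndexError, ValueError):
            continue  # skip empty or non-numeric entries such as '[N/A]'
    return values


def summarize(values):
    """Return (sample count, mean, max) of the utilization samples."""
    if not values:
        return (0, 0.0, 0.0)
    return (len(values), mean(values), max(values))
```

For example, `with open("gpu_log.csv", newline="") as f: print(summarize(read_gpu_utilization(f)))` reports how many samples were logged and the mean and peak utilization over the run.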

Query fields and their descriptions

$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 5
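
A space after a comma inside `--query-gpu=` makes the shell split the option into separate arguments, so with a long field list it can be safer to assemble the command programmatically. The following is a minimal Python sketch of that idea; `build_query_cmd` and `QUERY_FIELDS` are hypothetical names, and it uses only the flags already shown in the command above (`--query-gpu`, `--format=csv`, `-l`).

```python
# Hypothetical field list, copied from the query command above
QUERY_FIELDS = [
    "timestamp", "name", "pci.bus_id", "driver_version", "pstate",
    "pcie.link.gen.max", "pcie.link.gen.current", "temperature.gpu",
    "utilization.gpu", "utilization.memory",
    "memory.total", "memory.free", "memory.used",
]


def build_query_cmd(fields, interval_s=5):
    """Build an nvidia-smi argument list that logs `fields` as CSV every interval_s seconds."""
    cleaned = [f.strip() for f in fields]
    # Reject names that would corrupt the comma-separated --query-gpu value
    if any(not f or " " in f or "," in f for f in cleaned):
        raise ValueError("field names must be non-empty and contain no spaces or commas")
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(cleaned),
        "--format=csv",
        "-l", str(interval_s),
    ]
```

The returned list can be handed to `subprocess.run` or `subprocess.Popen` directly, with no shell involved, so stray whitespace can no longer split the option.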

timestamp: The timestamp of when the query was made, in the format "YYYY/MM/DD HH:MM:SS.msec".
name: The official product name of the GPU. This is an alphanumeric string. Available for all products.
pci.bus_id: PCI bus ID as "domain:bus:device.function", in hex.
driver_version: The version of the installed NVIDIA display driver. This is an alphanumeric string.
pstate: The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).
pcie.link.gen.max: The maximum PCIe link generation possible with this GPU and system configuration. For example, if the GPU supports a higher PCIe generation than the system supports, this reports the system PCIe generation.
pcie.link.gen.current: The current PCIe link generation. This may be reduced when the GPU is not in use.
temperature.gpu: Core GPU temperature, in degrees C.
utilization.gpu: Percent of time over the past sample period during which one or more kernels were executing on the GPU. The sample period may be between 1 second and 1/6 second depending on the product.
utilization.memory: Percent of time over the past sample period during which global (device) memory was being read or written. The sample period may be between 1 second and 1/6 second depending on the product.
memory.total: Total installed GPU memory.
memory.free: Total free memory.
memory.used: Total memory allocated by active contexts.
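
Note that utilization.gpu (and likewise utilization.memory) measures time, not how much of the GPU each kernel uses. The sketch below is a conceptual model of that definition only; the driver's actual sampling mechanism is not described here. It computes the percentage of a sample window covered by at least one kernel, counting overlapping kernels once. The function name `busy_percent` and the kernel timings in the example are hypothetical.

```python
def busy_percent(intervals, window_start, window_end):
    """Percent of [window_start, window_end) covered by at least one (start, end) interval.

    Conceptual model of the utilization.gpu definition: overlapping kernels
    are counted once, and idle gaps between kernels lower the result.
    """
    period = window_end - window_start
    if period <= 0:
        raise ValueError("window must have positive length")
    # Clip each interval to the window, drop ones that fall outside, sort by start
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if min(e, window_end) > max(s, window_start)
    )
    busy = 0.0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap found: close the previous merged run and start a new one
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the current merged run
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / period
```

For example, kernels running from 0 to 50 ms, 40 to 100 ms and 500 to 530 ms inside a 1000 ms window give `busy_percent([(0, 50), (40, 100), (500, 530)], 0, 1000) == 13.0`. By the same definition, a single small kernel that runs for the whole window reports 100 percent even if it leaves most of the GPU's cores idle, so a low reading such as 13 percent mainly indicates that the GPU sat idle between kernels for most of each sample period.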

Source

https://nvidia.custhelp.com/app/answers/detail/a_id/3751/~/useful-nvidia-smi-queries