How to use the nvidia-smi command (nvidia-smi, Development Technology)

Date: 2024-05-08 01:04:06  Author: 石家庄SEO  Category: Development Technology

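    The nvidia-smi command in detail

    1. Introduction to the nvidia-smi command

    In deep learning and similar workloads, nvidia-smi is a command we use constantly to check GPU usage, and it is one you really need to know. Most users only ever run plain nvidia-smi with no options, and that alone covers the vast majority of cases. Still, in the spirit of thoroughness, this article draws on the available documentation and collects some other commonly used nvidia-smi options.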

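    2. GPUs supported by nvidia-smi

    NVIDIA's SMI tool supports essentially any NVIDIA GPU released since 2011. This includes Tesla, Quadro, and GeForce devices from the Fermi architecture family and later (Kepler, Maxwell, Pascal, Volta, Turing, Ampere, and so on).
    Supported products include:
    Tesla: S1070, S2050, C1060, C2050/70, M2050/70/90, X2070/90, K10, K20, K20X, K40, K80, M40, P40, P100, V100, A100, H100.
    Quadro: 4000, 5000, 6000, 7000, M2070-Q, K series, M series, P series, RTX series.
    GeForce: varying levels of support, with fewer metrics available than on Tesla and Quadro products.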

    3. Commands supported on both desktop and server GPUs

    (1)nvidia-smi

    username@hostname:~# nvidia-smi
    Mon Dec  5 08:48:49 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.172.01   Driver Version: 450.172.01   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  A100-SXM4-40GB      On   | 00000000:26:00.0 Off |                    0 |
    | N/A   22C    P0    54W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   1  A100-SXM4-40GB      On   | 00000000:2C:00.0 Off |                    0 |
    | N/A   25C    P0    53W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   2  A100-SXM4-40GB      On   | 00000000:66:00.0 Off |                    0 |
    | N/A   25C    P0    50W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   3  A100-SXM4-40GB      On   | 00000000:6B:00.0 Off |                    0 |
    | N/A   23C    P0    51W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   4  A100-SXM4-40GB      On   | 00000000:A2:00.0 Off |                    0 |
    | N/A   23C    P0    57W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   5  A100-SXM4-40GB      On   | 00000000:A7:00.0 Off |                    0 |
    | N/A   25C    P0    52W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

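    The meaning of each part of this output is described below.

    [Figure: annotated nvidia-smi output]

    Only the less obvious fields are explained here; the self-explanatory ones are skipped.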

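    Persistence-M (persistence mode): with persistence mode on, the GPU responds to new jobs faster, at the cost of higher idle power draw. Jobs still start normally with persistence mode off; it just takes longer each time a new GPU application launches.
    Fan (fan speed): shown on actively cooled cards. Server cards are usually passively cooled and show N/A here. The value ranges from 0 to 100% and is the fan speed the system is requesting; if the fan is physically blocked, the actual speed may not reach the displayed value. Some devices report no fan speed because they do not rely on a fan and are kept cool by other means (for example, lab servers that sit in an air-conditioned room all year).
    Temp (temperature): in degrees Celsius.
    Perf (performance state): ranges from P0 to P12, where P0 is maximum performance and P12 is minimum performance.
    Disp.A: Display Active, indicates whether a display is initialized on the GPU.
    ECC (error correction): not every card has this feature. It is found mainly on newer cards, and older cards lack it.
    MIG: Multi-Instance GPU, a technology that lets one physical GPU be split into several GPUs. It is available starting with the NVIDIA Ampere architecture. MIG allows such a GPU to be securely partitioned into up to 7 separate GPU instances for CUDA applications, giving multiple users independent GPU resources for the best GPU utilization. It is especially useful for workloads that do not fully saturate the GPU's compute capacity, where users may want to run different workloads in parallel to maximize utilization.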
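The fields described above can also be read from a script instead of from the table. The following is a minimal sketch, assuming Python 3.7+ and nvidia-smi on the PATH. It relies on the `--query-gpu` and `--format=csv,noheader,nounits` options that appear in the help output; the property names in `FIELDS` are an assumption and should be checked with `nvidia-smi --help-query-gpu` on your driver. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import csv
import io
import subprocess

# Assumed --query-gpu property names; confirm with `nvidia-smi --help-query-gpu`.
FIELDS = [
    "index", "name", "persistence_mode", "fan.speed",
    "temperature.gpu", "pstate", "utilization.gpu",
    "memory.used", "memory.total",
]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        gpus.append(dict(zip(fields, (value.strip() for value in record))))
    return gpus


def query_gpus(fields=FIELDS):
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout, fields)
```

Calling `query_gpus()` on a GPU machine returns a list of dictionaries keyed by the property names. Values are kept as strings because unsupported properties (such as fan speed on a passively cooled card) are reported as text rather than numbers.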

    On a machine where some GPUs have MIG enabled, nvidia-smi prints the following:

    Mon Dec  5 16:48:56 2022
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.172.01   Driver Version: 450.172.01   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  A100-SXM4-40GB      On   | 00000000:26:00.0 Off |                    0 |
    | N/A   22C    P0    54W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   1  A100-SXM4-40GB      On   | 00000000:2C:00.0 Off |                    0 |
    | N/A   24C    P0    53W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   2  A100-SXM4-40GB      On   | 00000000:66:00.0 Off |                    0 |
    | N/A   25C    P0    50W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   3  A100-SXM4-40GB      On   | 00000000:6B:00.0 Off |                    0 |
    | N/A   23C    P0    51W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   4  A100-SXM4-40GB      On   | 00000000:A2:00.0 Off |                    0 |
    | N/A   23C    P0    56W / 400W |   3692MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   5  A100-SXM4-40GB      On   | 00000000:A7:00.0 Off |                    0 |
    | N/A   25C    P0    52W / 400W |      0MiB / 40537MiB |      0%      Default |
    |                               |                      |             Disabled |
    +-------------------------------+----------------------+----------------------+
    |   6  A100-SXM4-40GB      On   | 00000000:E1:00.0 Off |                   On |
    | N/A   24C    P0    43W / 400W |     22MiB / 40537MiB |     N/A      Default |
    |                               |                      |              Enabled |
    +-------------------------------+----------------------+----------------------+
    |   7  A100-SXM4-40GB      On   | 00000000:E7:00.0 Off |                   On |
    | N/A   22C    P0    52W / 400W |  19831MiB / 40537MiB |     N/A      Default |
    |                               |                      |              Enabled |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | MIG devices:                                                                |
    +------------------+----------------------+-----------+-----------------------+
    | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
    |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
    |                  |                      |        ECC|                       |
    |==================+======================+===========+=======================|
    |  6    1   0   0  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
    |                  |      0MiB / 32767MiB |           |                       |
    +------------------+----------------------+-----------+-----------------------+
    |  6    2   0   1  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
    |                  |      0MiB / 32767MiB |           |                       |
    +------------------+----------------------+-----------+-----------------------+
    |  7    1   0   0  |     11MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
    |                  |      0MiB / 32767MiB |           |                       |
    +------------------+----------------------+-----------+-----------------------+
    |  7    2   0   1  |  19820MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
    |                  |      4MiB / 32767MiB |           |                       |
    +------------------+----------------------+-----------+-----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    4   N/A  N/A     22888      C   python                           3689MiB |
    |    7    2    0      37648      C   ...vs/tf2.4-py3.8/bin/python    19805MiB |
    +-----------------------------------------------------------------------------+

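    In the output above, I used MIG to split GPUs 6 and 7 into four GPUs: GPU 6:0, GPU 6:1, GPU 7:0, and GPU 7:1.
    Each of them has 20 GB of memory. MIG can split an A100 into as many as 7 GPUs; I will cover the details in a later post.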

    (2)nvidia-smi -h

    Run nvidia-smi -h to view the command's help text, as shown below:

    username@hostname:$ nvidia-smi -h
    NVIDIA System Management Interface -- v450.172.01

    NVSMI provides monitoring information for Tesla and select Quadro devices.
    The data is presented in either a plain text or an XML format, via stdout or a file.
    NVSMI also provides several management operations for changing the device state.

    Note that the functionality of NVSMI is exposed through the NVML C-based
    library. See the NVIDIA developer website for more information about NVML.
    Python wrappers to NVML are also available.  The output of NVSMI is
    not guaranteed to be backwards compatible; NVML and the bindings are backwards
    compatible.

    http://developer.nvidia.com/nvidia-management-library-nvml/
    http://pypi.python.org/pypi/nvidia-ml-py/
    Supported products:
    - Full Support
        - All Tesla products, starting with the Kepler architecture
        - All Quadro products, starting with the Kepler architecture
        - All GRID products, starting with the Kepler architecture
        - GeForce Titan products, starting with the Kepler architecture
    - Limited Support
        - All Geforce products, starting with the Kepler architecture
    nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

        -h,   --help                Print usage information and exit.

      LIST OPTIONS:

        -L,   --list-gpus           Display a list of GPUs connected to the system.
        -B,   --list-blacklist-gpus Display a list of blacklisted GPUs in the system.

      SUMMARY OPTIONS:

        <no arguments>              Show a summary of GPUs connected to the system.

        [plus any of]

        -i,   --id=                 Target a specific GPU.
        -f,   --filename=           Log to a specified file, rather than to stdout.
        -l,   --loop=               Probe until Ctrl+C at specified second interval.

      QUERY OPTIONS:

        -q,   --query               Display GPU or Unit info.

        [plus any of]

        -u,   --unit                Show unit, rather than GPU, attributes.
        -i,   --id=                 Target a specific GPU or Unit.
        -f,   --filename=           Log to a specified file, rather than to stdout.
        -x,   --xml-format          Produce XML output.
              --dtd                 When showing xml output, embed DTD.
        -d,   --display=            Display only selected information: MEMORY,
                                        UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
                                        COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS,
                                        PAGE_RETIREMENT, ACCOUNTING, ENCODER_STATS, FBC_STATS,
                                        ROW_REMAPPER
                                    Flags can be combined with comma e.g. ECC,POWER.
                                    Sampling data with max/min/avg is also returned
                                    for POWER, UTILIZATION and CLOCK display types.
                                    Doesn't work with -u or -x flags.
        -l,   --loop=               Probe until Ctrl+C at specified second interval.
        -lms, --loop-ms=            Probe until Ctrl+C at specified millisecond interval.

      SELECTIVE QUERY OPTIONS:

        Allows the caller to pass an explicit list of properties to query.

        [one of]

        --query-gpu=                Information about GPU.
                                    Call --help-query-gpu for more info.
        --query-supported-clocks=   List of supported clocks.
                                    Call --help-query-supported-clocks for more info.
        --query-compute-apps=       List of currently active compute processes.
                                    Call --help-query-compute-apps for more info.
        --query-accounted-apps=     List of accounted compute processes.
                                    Call --help-query-accounted-apps for more info.
                                    This query is not supported on vGPU host.
        --query-retired-pages=      List of device memory pages that have been retired.
                                    Call --help-query-retired-pages for more info.
        --query-remapped-rows=      Information about remapped rows.
                                    Call --help-query-remapped-rows for more info.

        [mandatory]

        --format=                   Comma separated list of format options:
                                      csv - comma separated values (MANDATORY)
                                      noheader - skip the first line with column headers
                                      nounits - don't print units for numerical
                                                 values

        [plus any of]

        -i,   --id=                 Target a specific GPU or Unit.
        -f,   --filename=           Log to a specified file, rather than to stdout.
        -l,   --loop=               Probe until Ctrl+C at specified second interval.
        -lms, --loop-ms=            Probe until Ctrl+C at specified millisecond interval.

      DEVICE MODIFICATION OPTIONS:

        [any one of]

        -pm,  --persistence-mode=   Set persistence mode: 0/DISABLED, 1/ENABLED
        -e,   --ecc-config=         Toggle ECC support: 0/DISABLED, 1/ENABLED
        -p,   --reset-ecc-errors=   Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
        -c,   --compute-mode=       Set MODE for compute applications:
                                    0/DEFAULT, 1/EXCLUSIVE_PROCESS,
                                    2/PROHIBITED
              --gom=                Set GPU Operation Mode:
                                        0/ALL_ON, 1/COMPUTE, 2/LOW_DP
        -r    --gpu-reset           Trigger reset of the GPU.
                                    Can be used to reset the GPU HW state in situations
                                    that would otherwise require a machine reboot.
                                    Typically useful if a double bit ECC error has
                                    occurred.
                                    Reset operations are not guarenteed to work in
                                    all cases and should be used with caution.
        -vm   --virt-mode=          Switch GPU Virtualization Mode:
                                    Sets GPU virtualization mode to 3/VGPU or 4/VSGA
                                    Virtualization mode of a GPU can only be set when
                                    it is running on a hypervisor.
        -lgc  --lock-gpu-clocks=    Specifies <minGpuClock,maxGpuClock> clocks as a
                                        pair (e.g. 1500,1500) that defines the range
                                        of desired locked GPU clock speed in MHz.
                                        Setting this will supercede application clocks
                                        and take effect regardless if an app is running.
                                        Input can also be a singular desired clock value
                                        (e.g. <GpuClockValue>).
        -rgc  --reset-gpu-clocks
                                    Resets the Gpu clocks to the default values.
        -ac   --applications-clocks= Specifies <memory,graphics> clocks as a
                                        pair (e.g. 2000,800) that defines GPU's
                                        speed in MHz while running applications on a GPU.
        -rac  --reset-applications-clocks
                                    Resets the applications clocks to the default values.
        -acp  --applications-clocks-permission=
                                    Toggles permission requirements for -ac and -rac commands:
                                    0/UNRESTRICTED, 1/RESTRICTED
        -pl   --power-limit=        Specifies maximum power management limit in watts.
        -cc   --cuda-clocks=        Overrides or restores default CUDA clocks.
                                    In override mode, GPU clocks higher frequencies when running CUDA applications.
                                    Only on supported devices starting from the Volta series.
                                    Requires administrator privileges.
                                    0/RESTORE_DEFAULT, 1/OVERRIDE
        -am   --accounting-mode=    Enable or disable Accounting Mode: 0/DISABLED, 1/ENABLED
        -caa  --clear-accounted-apps
                                    Clears all the accounted PIDs in the buffer.
              --auto-boost-default= Set the default auto boost policy to 0/DISABLED
                                    or 1/ENABLED, enforcing the change only after the
                                    last boost client has exited.
              --auto-boost-permission=
                                    Allow non-admin/root control over auto boost mode:
                                    0/UNRESTRICTED, 1/RESTRICTED
        -mig  --multi-instance-gpu= Enable or disable Multi Instance GPU: 0/DISABLED, 1/ENABLED
                                    Requires root.

       [plus optional]

        -i,   --id=                 Target a specific GPU.
        -eow, --error-on-warning    Return a non-zero error for warnings.

      UNIT MODIFICATION OPTIONS:

        -t,   --toggle-led=         Set Unit LED state: 0/GREEN, 1/AMBER

       [plus optional]

        -i,   --id=                 Target a specific Unit.

      SHOW DTD OPTIONS:

              --dtd                 Print device DTD and exit.

         [plus optional]

        -f,   --filename=           Log to a specified file, rather than to stdout.
        -u,   --unit                Show unit, rather than device, DTD.

        --debug=                    Log encrypted debug information to a specified file.

     STATISTICS: (EXPERIMENTAL)
        stats                       Displays device statistics. "nvidia-smi stats -h" for more information.

     Device Monitoring:
        dmon                        Displays device stats in scrolling format.
                                    "nvidia-smi dmon -h" for more information.

        daemon                      Runs in background and monitor devices as a daemon process.
                                    This is an experimental feature. Not supported on Windows baremetal
                                    "nvidia-smi daemon -h" for more information.

        replay                      Used to replay/extract the persistent stats generated by daemon.
                                    This is an experimental feature.
                                    "nvidia-smi replay -h" for more information.

     Process Monitoring:
        pmon                        Displays process stats in scrolling format.
                                    "nvidia-smi pmon -h" for more information.

     TOPOLOGY:
        topo                        Displays device/system topology. "nvidia-smi topo -h" for more information.

     DRAIN STATES:
        drain                       Displays/modifies GPU drain states for power idling. "nvidia-smi drain -h" for more information.

     NVLINK:
        nvlink                      Displays device nvlink information. "nvidia-smi nvlink -h" for more information.

     CLOCKS:
        clocks                      Control and query clock information. "nvidia-smi clocks -h" for more information.

     ENCODER SESSIONS:
        encodersessions             Displays device encoder sessions information. "nvidia-smi encodersessions -h" for more information.

     FBC SESSIONS:
        fbcsessions                 Displays device FBC sessions information. "nvidia-smi fbcsessions -h" for more information.

     GRID vGPU:
        vgpu                        Displays vGPU information. "nvidia-smi vgpu -h" for more information.

     MIG:
        mig                         Provides controls for MIG management. "nvidia-smi mig -h" for more information.

    Please see the nvidia-smi(1) manual page for more detailed information.

    (3)nvidia-smi -L

    Run nvidia-smi -L to list every GPU device together with its UUID, as shown below:

    username@hostname:$ nvidia-smi -L
    GPU 0: A100-SXM4-40GB (UUID: GPU-9f2df045-8650-7b2e-d442-cc0d7ba0150d)
    GPU 1: A100-SXM4-40GB (UUID: GPU-ad117600-5d0e-557d-c81d-7c4d60555eaa)
    GPU 2: A100-SXM4-40GB (UUID: GPU-087ca21d-0c14-66d0-3869-59ff65a48d58)
    GPU 3: A100-SXM4-40GB (UUID: GPU-eb9943a0-d78e-4220-fdb2-e51f2b490064)
    GPU 4: A100-SXM4-40GB (UUID: GPU-8302a1f7-a8be-753f-eb71-6f60c5de1703)
    GPU 5: A100-SXM4-40GB (UUID: GPU-ee1be011-0d98-3d6f-8c89-b55c28966c63)
    GPU 6: A100-SXM4-40GB (UUID: GPU-8b3c7f5b-3fb4-22c3-69c9-4dd6a9f31586)
      MIG 3g.20gb Device 0: (UUID: MIG-GPU-8b3c7f5b-3fb4-22c3-69c9-4dd6a9f31586/1/0)
      MIG 3g.20gb Device 1: (UUID: MIG-GPU-8b3c7f5b-3fb4-22c3-69c9-4dd6a9f31586/2/0)
    GPU 7: A100-SXM4-40GB (UUID: GPU-fab40b5f-c286-603f-8909-cdb73e5ffd6c)
      MIG 3g.20gb Device 0: (UUID: MIG-GPU-fab40b5f-c286-603f-8909-cdb73e5ffd6c/1/0)
      MIG 3g.20gb Device 1: (UUID: MIG-GPU-fab40b5f-c286-603f-8909-cdb73e5ffd6c/2/0)
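If a script needs the UUIDs (for example to pin a job to one MIG device), the `nvidia-smi -L` listing is simple enough to parse line by line. This is a minimal sketch, assuming the line layout is "GPU N: name (UUID: ...)" with indented "MIG profile Device N: (UUID: ...)" lines underneath, as printed by the 450 driver used in this article. Other driver versions may format these lines differently, and the function name `parse_gpu_list` is hypothetical.

```python
import re

# Patterns assume the `nvidia-smi -L` layout of driver 450.x; adjust if yours differs.
GPU_RE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (\S+)\)$")
MIG_RE = re.compile(r"^MIG (\S+)\s+Device\s+(\d+): \(UUID: (\S+)\)$")


def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into GPUs, each with its MIG devices."""
    gpus = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            gpus.append({
                "index": int(gpu_match.group(1)),
                "name": gpu_match.group(2),
                "uuid": gpu_match.group(3),
                "mig": [],
            })
            continue
        mig_match = MIG_RE.match(line)
        if mig_match and gpus:
            # A MIG line belongs to the most recently seen physical GPU.
            gpus[-1]["mig"].append({
                "profile": mig_match.group(1),
                "device": int(mig_match.group(2)),
                "uuid": mig_match.group(3),
            })
    return gpus
```

The text to parse can come from `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`. Lines that match neither pattern are ignored rather than treated as errors, which keeps the sketch tolerant of extra output.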

    (4)nvidia-smi -q

    Run nvidia-smi -q to list detailed information for every GPU device. To show the details of a single GPU only, select it with the -i option, as shown below:

    username@hostname:$ nvidia-smi -q -i 0

    ==============NVSMI LOG==============

    Timestamp : Mon Dec  5 17:31:45 2022
    Driver Version : 450.172.01
    CUDA Version : 11.0

    Attached GPUs : 8
    GPU 00000000:26:00.0
        Product Name : A100-SXM4-40GB
        Product Brand : Tesla
        Display Mode : Enabled
        Display Active : Disabled
        Persistence Mode : Enabled
        MIG Mode
            Current : Disabled
            Pending : Disabled
        Accounting Mode : Disabled
        Accounting Mode Buffer Size : 4000
        Driver Model
            Current : N/A
            Pending : N/A
        Serial Number : 1564720004631
        GPU UUID : GPU-9f2df045-8650-7b2e-d442-cc0d7ba0150d
        Minor Number : 2
        VBIOS Version : 92.00.19.00.10
        MultiGPU Board : No
        Board ID : 0x2600
        GPU Part Number : 692-2G506-0200-002
        Inforom Version
            Image Version : G506.0200.00.04
            OEM Object : 2.0
            ECC Object : 6.16
            Power Management Object : N/A
        GPU Operation Mode
            Current : N/A
            Pending : N/A
        GPU Virtualization Mode
            Virtualization Mode : None
            Host VGPU Mode : N/A
        IBMNPU
            Relaxed Ordering Mode : N/A
        PCI
            Bus : 0x26
            Device : 0x00
            Domain : 0x0000
            Device Id : 0x20B010DE
            Bus Id : 00000000:26:00.0
            Sub System Id : 0x134F10DE
            GPU Link Info
                PCIe Generation
                    Max : 4
                    Current : 4
                Link Width
                    Max : 16x
                    Current : 16x
            Bridge Chip
                Type : N/A
                Firmware : N/A
            Replays Since Reset : 0
            Replay Number Rollovers : 0
            Tx Throughput : 0 KB/s
            Rx Throughput : 0 KB/s
        Fan Speed : N/A
        Performance State : P0
        Clocks Throttle Reasons
            Idle : Active
            Applications Clocks Setting : Not Active
            SW Power Cap : Not Active
            HW Slowdown : Not Active
                HW Thermal Slowdown : Not Active
                HW Power Brake Slowdown : Not Active
            Sync Boost : Not Active
            SW Thermal Slowdown : Not Active
            Display Clock Setting : Not Active
        FB Memory Usage
            Total : 40537 MiB
            Used : 0 MiB
            Free : 40537 MiB
        BAR1 Memory Usage
            Total : 65536 MiB
            Used : 29 MiB
            Free : 65507 MiB
        Compute Mode : Default
        Utilization
            Gpu : 0 %
            Memory : 0 %
            Encoder : 0 %
            Decoder : 0 %
        Encoder Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        FBC Stats
            Active Sessions : 0
            Average FPS : 0
            Average Latency : 0
        Ecc Mode
            Current : Enabled
            Pending : Enabled
        ECC Errors
            Volatile
                SRAM Correctable : 0
                SRAM Uncorrectable : 0
                DRAM Correctable : 0
                DRAM Uncorrectable : 0
            Aggregate
                SRAM Correctable : 0
                SRAM Uncorrectable : 0
                DRAM Correctable : 0
                DRAM Uncorrectable : 0
        Retired Pages
            Single Bit ECC : N/A
            Double Bit ECC : N/A
            Pending Page Blacklist : N/A
        Remapped Rows
            Correctable Error : 0
            Uncorrectable Error : 0
            Pending : No
            Remapping Failure Occurred : No
            Bank Remap Availability Histogram
                Max : 640 bank(s)
                High : 0 bank(s)
                Partial : 0 bank(s)
                Low : 0 bank(s)
                None : 0 bank(s)
        Temperature
            GPU Current Temp : 26 C
            GPU Shutdown Temp : 92 C
            GPU Slowdown Temp : 89 C
            GPU Max Operating Temp : 85 C
            Memory Current Temp : 38 C
            Memory Max Operating Temp : 95 C
        Power Readings
            Power Management : Supported
            Power Draw : 55.79 W
            Power Limit : 400.00 W
            Default Power Limit : 400.00 W
            Enforced Power Limit : 400.00 W
            Min Power Limit : 100.00 W
            Max Power Limit : 400.00 W
        Clocks
            Graphics : 210 MHz
            SM : 210 MHz
            Memory : 1215 MHz
            Video : 585 MHz
        Applications Clocks
            Graphics : 1095 MHz
            Memory : 1215 MHz
        Default Applications Clocks
            Graphics : 1095 MHz
            Memory : 1215 MHz
        Max Clocks
            Graphics : 1410 MHz
            SM : 1410 MHz
            Memory : 1215 MHz
            Video : 1290 MHz
        Max Customer Boost Clocks
            Graphics : 1410 MHz
        Clock Policy
            Auto Boost : N/A
            Auto Boost Default : N/A
        Processes : None
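The help text lists an `-x` (`--xml-format`) flag for `-q`, which is easier to consume from a program than the indented text report. The sketch below assumes the XML produced by `nvidia-smi -q -x` has one `gpu` element per device with child elements named `product_name`, `performance_state`, `fb_memory_usage/total`, and `temperature/gpu_temp`. Those element names are an assumption here and should be verified against your driver's output (or its DTD, printed by `nvidia-smi --dtd`). The function name `summarize_query_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_query_xml(xml_text):
    """Extract a few fields per GPU from `nvidia-smi -q -x` output.

    Element names are assumed; a missing element yields None for that field.
    """
    root = ET.fromstring(xml_text)
    summary = []
    for gpu in root.iter("gpu"):
        summary.append({
            "bus_id": gpu.get("id"),
            "name": gpu.findtext("product_name"),
            "pstate": gpu.findtext("performance_state"),
            "memory_total": gpu.findtext("fb_memory_usage/total"),
            "gpu_temp": gpu.findtext("temperature/gpu_temp"),
        })
    return summary
```

Because `findtext` returns `None` for a missing element, the function degrades gracefully if a field is absent or renamed on a given driver version.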

    (5)nvidia-smi -l [second]

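    Run nvidia-smi -l [second] to refresh the panel every [second] seconds. A 1-second interval is a common choice when monitoring GPU utilization; the example below refreshes every 2 seconds:

    nvidia-smi -l 2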

    (6)nvidia-smi -pm

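    On Linux, you can put GPUs into persistence mode to keep the NVIDIA driver loaded even when no application is accessing the cards. This is particularly useful when you run a series of short jobs. Persistence mode draws a few more watts on each idle GPU, but it avoids the fairly long delay that otherwise occurs each time a GPU application starts. It is also necessary if you have assigned specific clock speeds or power limits to the GPUs, because those changes are lost when the NVIDIA driver is unloaded. Enable persistence mode on all GPUs by running:

    nvidia-smi -pm 1

    You can also enable persistence mode on one specific GPU:

    nvidia-smi -pm 1 -i 0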
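When these commands are issued from a provisioning script, it can help to build the argument list in one place. The following is a small sketch, assuming Python 3 and using only the `-pm` and `-i` options shown above. The helper names `persistence_cmd` and `set_persistence` are hypothetical, and changing persistence mode normally requires root privileges.

```python
import subprocess


def persistence_cmd(enable, gpu_index=None):
    """Build the nvidia-smi argument list for setting persistence mode."""
    cmd = ["nvidia-smi", "-pm", "1" if enable else "0"]
    if gpu_index is not None:
        # Restrict the change to one GPU; without -i it applies to all GPUs.
        cmd += ["-i", str(gpu_index)]
    return cmd


def set_persistence(enable, gpu_index=None):
    """Run the command; raises CalledProcessError if nvidia-smi fails."""
    return subprocess.run(persistence_cmd(enable, gpu_index), check=True)
```

Passing the arguments as a list avoids shell quoting issues, and keeping the builder separate from the runner makes it possible to log or review the exact command before executing it.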

    (7)nvidia-smi dmon

    Running nvidia-smi dmon monitors overall GPU usage at a 1-second update interval.

    (8)nvidia-smi pmon

    Running nvidia-smi pmon monitors per-process GPU usage at a 1-second update interval.
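The scrolling output of dmon and pmon can be captured and turned into records. This is a rough sketch under the assumption that the output starts with a `#`-prefixed line of column names, followed by a `#`-prefixed line of units, and then whitespace-separated data rows; check the real layout on your system before relying on it. The function name `parse_scrolling_stats` is hypothetical.

```python
def parse_scrolling_stats(text):
    """Parse dmon/pmon-style output into a list of dicts keyed by column name.

    Assumes the first '#' line holds column names; later '#' lines
    (units, repeated headers) are skipped.
    """
    columns = None
    rows = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header cannot be labeled
        values = line.split()
        if len(values) == len(columns):
            rows.append(dict(zip(columns, values)))
    return rows
```

To collect a bounded sample instead of an endless stream, the text could come from something like `subprocess.run(["nvidia-smi", "dmon", "-c", "5"], capture_output=True, text=True).stdout`, assuming your version supports the `-c` count option listed by `nvidia-smi dmon -h`.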

    (9) …

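    4. Commands supported only on server GPUs

    Using nvidia-smi to view system/GPU topology and NVLink

    To make proper use of the more advanced NVIDIA GPU features (such as GPU Direct), it is vital that the system topology is configured correctly. Topology refers to how the various system devices (GPUs, InfiniBand HCAs, storage controllers, and so on) connect to each other and to the system's CPUs. Certain topology types reduce performance or even make some features unavailable. To help diagnose such issues, nvidia-smi supports system topology and connectivity queries:

    nvidia-smi topo --matrix            # view the system/GPU topology
    nvidia-smi nvlink --status          # query the NVLink connections themselves to check their status and health
    nvidia-smi nvlink --capabilities    # query the NVLink connections themselves to check their capabilities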