Authors: Louis Bavoil and Iain Cantlay

With all modern graphics APIs (D3D11, D3D12, GL4 and Vulkan), an application can query the elapsed GPU time for any given range of render calls by using timestamp queries. Most game engines today use this mechanism to measure the GPU time spent on a whole frame and per pass. This blog post includes full source code for a simple D3D12 application (SetStablePowerState.exe) that can be run to disable and restore GPU Boost at any time, for all graphics applications running on the system. Disabling GPU Boost helps you get more deterministic GPU times from timestamp queries. And because the clocks are changed at the system level, you can run SetStablePowerState.exe even if your game is using a different graphics API than D3D12. The only requirement is that you use Windows 10 and have the Windows 10 SDK installed.
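
For reference, here is a minimal sketch of a per-pass timestamp query in D3D12 (error checking omitted; pDevice, pCommandList, pCommandQueue and pReadbackBuffer are assumed to already exist in the engine, with pReadbackBuffer being a readback-heap buffer large enough for two UINT64 values):

 // Create a query heap with two timestamp slots (begin and end of the pass).
 D3D12_QUERY_HEAP_DESC QueryHeapDesc = {};
 QueryHeapDesc.Type = D3D12_QUERY_HEAP_TYPE_TIMESTAMP;
 QueryHeapDesc.Count = 2;
 ID3D12QueryHeap* pQueryHeap = nullptr;
 pDevice->CreateQueryHeap(&QueryHeapDesc, IID_PPV_ARGS(&pQueryHeap));

 // Record a timestamp before and after the render calls being measured,
 // then resolve both values into the readback buffer.
 pCommandList->EndQuery(pQueryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0);
 // ... render calls for the pass being measured ...
 pCommandList->EndQuery(pQueryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 1);
 pCommandList->ResolveQueryData(pQueryHeap, D3D12_QUERY_TYPE_TIMESTAMP, 0, 2, pReadbackBuffer, 0);

 // Once the command list has finished executing on the GPU, convert the
 // tick delta to milliseconds using the command queue's timestamp frequency.
 UINT64 TicksPerSecond = 0;
 pCommandQueue->GetTimestampFrequency(&TicksPerSecond);
 UINT64* pTimestamps = nullptr;
 D3D12_RANGE ReadRange = { 0, 2 * sizeof(UINT64) };
 pReadbackBuffer->Map(0, &ReadRange, reinterpret_cast<void**>(&pTimestamps));
 double ElapsedMs = 1000.0 * double(pTimestamps[1] - pTimestamps[0]) / double(TicksPerSecond);
 pReadbackBuffer->Unmap(0, nullptr);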

Motivation

On some occasions, we have found ourselves confused by the fact that the measured GPU time for a given pass we were working on would change over time, even though we had not made any changes to that pass. The GPU times would be stable within a run, but would sometimes vary slightly from run to run. Later on, we learned that this can happen as a side effect of the GPU having a variable Core Clock frequency, depending on the current GPU temperature and possibly other factors such as power consumption. This can happen with any GPU that has variable frequencies, and in particular with all NVIDIA GPUs that include a version of GPU Boost, that is, all GPUs based on the Kepler, Maxwell and Pascal architectures, and beyond.

SetStablePowerState.exe

All NVIDIA GPUs that have GPU Boost have a well-defined Base Clock frequency associated with them. That is the GPU Core Clock frequency that the GPU should be able to sustain while staying within the reference power-usage and temperature targets. For reference, the Base Clock of each GeForce GPU is listed on its Specifications page on GeForce.com.

Using D3D12, there is an easy way for an application to request that the NVIDIA driver lock the GPU Core Clock frequency to its Base Clock value: the ID3D12Device::SetStablePowerState method. When SetStablePowerState(TRUE) is called, a system-wide change of GPU power-management policy happens for the NVIDIA GPU associated with the current D3D12 device, and the current GPU Core Clock gets locked to the reference Base Clock recorded in the VBIOS for that GPU, unless thermal events happen. If the GPU detects that it is overheating, it will down-clock itself even if SetStablePowerState(TRUE) was called. But in practice, that should never happen if the GPU is in a properly cooled case and its fan is working correctly. The result is that the GPU Core Clock frequency is stable at the Base Clock once any D3D12 application in the system has called SetStablePowerState(TRUE). In other words, GPU Boost gets disabled. And our driver takes care of restoring the previous GPU power-management state when the locking D3D12 device gets released.

Knowing all that, we have written a simple standalone D3D12 application (SetStablePowerState.exe) that can lock and unlock the current GPU Core Clock frequency for any NVIDIA GPU with GPU Boost. The GPU Core Clock frequency gets locked instantly when the app is launched and restored when it is closed, so you can launch it whenever you want to start profiling GPU times and close it when you are done. You can monitor your current GPU Core Clock frequency by using NVAPI (see Appendix) or by using an external GPU monitoring tool such as GPU-Z.

Using this standalone SetStablePowerState.exe application to lock the clocks before profiling GPU times (and restore them afterwards) makes it unnecessary to ever call ID3D12Device::SetStablePowerState from a game engine directly. We actually recommend never calling this D3D12 method from engine code, especially for applications that have both D3D11 and D3D12 paths, to avoid any confusion when comparing GPU profiling results on D3D12 vs D3D11.

Gotchas

Using SetStablePowerState only modifies the GPU Core Clock frequency; it does not modify the GPU Memory Clock frequency. So if an application sees a 1:1 ratio between the GPU Core Clock and the GPU Memory Clock on a normal run, SetStablePowerState can shift that ratio to something like 0.8:1. That is worth knowing because the relative performance limiters will shift slightly: when GPU Boost is disabled, a pass that is both math-throughput and memory-bandwidth limited may become more math limited or, equivalently, relatively less memory limited.
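
If you want to see how this ratio shifts on your own system, you can read both clock domains with NVAPI. Below is a minimal sketch, assuming NVAPI has already been initialized and hGpu is a valid NvPhysicalGpuHandle (see the NVAPI Appendix at the end of this post for the setup code):

 // Read the current Graphics (Core) and Memory clock frequencies, in kHz,
 // and print the Core/Memory ratio.
 NV_GPU_CLOCK_FREQUENCIES Table = { 0 };
 Table.version = NV_GPU_CLOCK_FREQUENCIES_VER;
 Table.ClockType = NV_GPU_CLOCK_FREQUENCIES_CURRENT_FREQ;
 if (NvAPI_GPU_GetAllClockFrequencies(hGpu, &Table) == NVAPI_OK &&
     Table.domain[NVAPI_GPU_PUBLIC_CLOCK_GRAPHICS].bIsPresent &&
     Table.domain[NVAPI_GPU_PUBLIC_CLOCK_MEMORY].bIsPresent)
 {
     double CoreMhz = Table.domain[NVAPI_GPU_PUBLIC_CLOCK_GRAPHICS].frequency / 1000.0;
     double MemoryMhz = Table.domain[NVAPI_GPU_PUBLIC_CLOCK_MEMORY].frequency / 1000.0;
     printf("Core: %.0f MHz, Memory: %.0f MHz, ratio: %.2f\n", CoreMhz, MemoryMhz, CoreMhz / MemoryMhz);
 }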

Finally, for the SetStablePowerState call to succeed, you need to have the Windows 10 SDK installed. With Windows 10 up to Version 1511, that is all you need. But with more recent versions of Windows 10 (starting from the Anniversary Update), you also need to enable “developer mode” in the OS settings; otherwise, the call to SetStablePowerState will cause a D3D12 device removal.

Afterword: Some History and How Our Advice Evolved

If you have been following our DX12 Do's And Don'ts blog, you may have noticed that the advice on SetStablePowerState has changed. That could use some explanation…

In the first wave of DX12 games, we saw a couple of beta pre-releases that always called SetStablePowerState(TRUE) by default. As we discussed above, this API call significantly lowers the Core Clock frequency on NVIDIA GPUs and does not represent the end-user experience accurately. It is therefore quite inappropriate to call it by default in a shipping product, or even a beta.

We have also seen confusion result from the use of SetStablePowerState because it only works when the D3D12 debug layer is present on a system. We have seen multiple cases where development engineers and QA departments got different performance results because SetStablePowerState failed on some systems and the failure was quietly ignored.

Hence, our recommendation was to avoid SetStablePowerState or use it very thoughtfully and carefully.

For the Windows 10 Anniversary Update (aka Redstone), Microsoft changed the implementation: “SetStablePowerState now requires developer mode be enabled; otherwise, device removal will now occur.” (http://forums.directxtech.com/index.php?topic=5734.new). So any calls to SetStablePowerState will obviously fail on end-user systems or most QA systems. This is a change for the better and makes much of our previous advice irrelevant.

We are still left with the question of whether or not to test with SetStablePowerState. Do you test with reduced performance and more stable results? Do you test end-user performance and accept some variability? Do you monitor clocks and show a warning when variability exceeds a threshold? To be perfectly honest, we have changed our minds more than once at NVIDIA DevTech, and for good reason: there is no one true answer. The answer depends on exactly what you are trying to achieve and what matters most to you. We have done all three. We have largely settled on stabilizing the clocks for our in-depth, precise analyses.

Appendix: SetStablePowerState.cpp

 #include <dxgi1_4.h>
 #include <d3d12.h>
 #include <stdio.h>

 void Error(const char *str)
 {
     fprintf(stderr, "ERROR: %s\n", str);
     // Keep the process alive so the error message stays visible in the console.
     Sleep(INFINITE);
 }

 void GetHardwareAdapter(IDXGIFactory4* pFactory, IDXGIAdapter1** ppAdapter)
 {
     *ppAdapter = nullptr;
     for (UINT AdapterIndex = 0; ; ++AdapterIndex)
     {
         IDXGIAdapter1* pAdapter = nullptr;
         if (DXGI_ERROR_NOT_FOUND == pFactory->EnumAdapters1(AdapterIndex, &pAdapter))
         {
             break;
         }

         // Check for D3D12 support without actually creating the device.
         if (SUCCEEDED(D3D12CreateDevice(pAdapter, D3D_FEATURE_LEVEL_11_0, __uuidof(ID3D12Device), nullptr)))
         {
             *ppAdapter = pAdapter;
             return;
         }
         pAdapter->Release();
     }
 }

 int main(int argc, char *argv[])
 {
     IDXGIFactory4* pFactory = nullptr;
     if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&pFactory))))
     {
         Error("CreateDXGIFactory1 failed");
     }

     IDXGIAdapter1* pAdapter = nullptr;
     GetHardwareAdapter(pFactory, &pAdapter);
     if (!pAdapter)
     {
         Error("Failed to find DX12-compatible DXGI adapter");
     }

     ID3D12Device* pDevice = nullptr;
     if (FAILED(D3D12CreateDevice(pAdapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&pDevice))))
     {
         Error("D3D12CreateDevice failed for Adapter");
     }

     if (FAILED(pDevice->SetStablePowerState(TRUE)))
     {
         Error("SetStablePowerState failed. Do you have the Win10 SDK installed?");
     }

     printf("SUCCESS. Close this program to restore default clocks.\n");
     // Keep the D3D12 device alive (and the clocks locked) until this process is closed.
     Sleep(INFINITE);

     return 0;
 }

Appendix: Monitoring the GPU Core Clock using NVAPI

If you want to monitor your NVIDIA GPU Core Clock frequency without having to use an external tool, you can use the NvAPI_GPU_GetAllClockFrequencies function from NVAPI, as in the example code below. We recommend not calling this function every frame, to avoid the risk of introducing a significant performance hit. Instead, we recommend calling it at the beginning and end of a given time interval (for instance before/after a GPU profiling session, or before/after playing a level) and displaying a warning if the GPU Core Clock frequency has changed during that interval.

 #include "nvapi.h"

 class NvApiWrapper
 {
 public:
     struct FrequencyInfo
     {
         unsigned int NvGraphicsClockInMhz;
     };

     NvApiWrapper()
         : m_GpuHandle(0)
     {
     }

     bool Init()
     {
         NvAPI_Status Status = NvAPI_Initialize();
         if (Status != NVAPI_OK)
         {
             return false;
         }

         NvPhysicalGpuHandle NvGpuHandles[NVAPI_MAX_PHYSICAL_GPUS] = { 0 };
         NvU32 NvGpuCount = 0;
         Status = NvAPI_EnumPhysicalGPUs(NvGpuHandles, &NvGpuCount);
         if (Status != NVAPI_OK || NvGpuCount == 0)
         {
             return false;
         }

         // For simplicity, monitor the first physical GPU in the system.
         m_GpuHandle = NvGpuHandles[0];
         return true;
     }

     bool GetCoreClockMhz(FrequencyInfo *pInfo)
     {
         NV_GPU_CLOCK_FREQUENCIES table = { 0 };
         table.version = NV_GPU_CLOCK_FREQUENCIES_VER;
         table.ClockType = NV_GPU_CLOCK_FREQUENCIES_CURRENT_FREQ;

         NvAPI_Status Status = NvAPI_GPU_GetAllClockFrequencies(m_GpuHandle, &table);
         if (Status != NVAPI_OK)
         {
             return false;
         }

         if (!table.domain[NVAPI_GPU_PUBLIC_CLOCK_GRAPHICS].bIsPresent)
         {
             return false;
         }

         // NVAPI reports frequencies in kHz; convert to MHz, rounding to the nearest integer.
         NvU32 GraphicsClockInKhz = table.domain[NVAPI_GPU_PUBLIC_CLOCK_GRAPHICS].frequency;
         pInfo->NvGraphicsClockInMhz = NvU32((GraphicsClockInKhz + 500) / 1000);
         return true;
     }

 private:
     NvPhysicalGpuHandle m_GpuHandle;
 };
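
As a usage example, here is a minimal sketch following the approach described above: sample the GPU Core Clock at the beginning and end of a profiling session and warn if it changed in between (RunGpuProfilingSession is a hypothetical placeholder for the profiled workload):

 // Sample the GPU Core Clock before and after a profiling session and warn
 // if it changed, which would make the GPU timings harder to compare.
 NvApiWrapper NvApi;
 if (NvApi.Init())
 {
     NvApiWrapper::FrequencyInfo Before = {}, After = {};
     bool bBeforeOk = NvApi.GetCoreClockMhz(&Before);

     RunGpuProfilingSession(); // hypothetical placeholder for the profiled workload

     bool bAfterOk = NvApi.GetCoreClockMhz(&After);
     if (bBeforeOk && bAfterOk && Before.NvGraphicsClockInMhz != After.NvGraphicsClockInMhz)
     {
         printf("WARNING: GPU Core Clock changed during the session: %u MHz -> %u MHz\n",
                Before.NvGraphicsClockInMhz, After.NvGraphicsClockInMhz);
     }
 }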