NVIDIA® Nsight™ Graphics 2023.4 is released with the following changes:

Feature Enhancements:

  • In this release, we have evolved the Nsight Graphics acceleration structure viewer into the much more powerful Ray Tracing Inspector! This tool adds advanced, visual inspection of ray tracing applications. This includes the new ability to display traversal timing as a heatmap for a quick evaluation of the hotspots in your scene's traversal. It also adds the ability to see instance AABB overlaps in world space for identification of scene complexity. Opacity micromap (OMM) analysis has received additional improvements as well, including new statistics that help you understand OMM use and efficiency.
  • D3D12 Work Graphs are now supported:
    • Added support in the Frame Debugger for capture and replay of D3D12 Work Graphs. Note: the D3D12 Work Graphs functionality is in a preview state and may not work in all instances.
    • In GPU Trace, D3D12 DispatchGraph() calls are now shown as timeline events in the Actions row. This support is limited to the preview release of D3D12 Work Graphs in Agility SDK 1.711.3-preview.
  • In the Shader Profiler, the new “Instruction Mix” view reveals statistics about the currently selected code block. This includes the types of instructions (pipeline, family, operation) and the producer/consumer relationships of long-latency instructions such as memory accesses. Use this view to navigate through the data dependencies in your shader.

Improvements:

  • GPU Trace
    • The Real-Time Shader Profiler is out of preview and is now enabled by default.
    • The new “Instruction Mix” view is fully integrated with the GPU Trace timeline. Upon selection of a time-region, it will display the types of instructions that the hardware sampling profiler landed on, and the number of unique instruction addresses per category.
    • The GPU Trace HUD Overlay now has a “Background Compiles” indicator that states whether optimized shader compilation is occurring in the background within the driver. This indicator is available for D3D12 and Vulkan applications. To learn more about shader compilation and how to control it from your engine, see this NVIDIA blog article.
    • When the Real-Time Shader Profiler is enabled, the SM Warp Occupancy timeline row now reveals the individual shader stages being executed. Notably, this reveals ray tracing stages individually, allowing you to quickly identify, for example, whether Any Hit shaders are consuming a large fraction of performance.
    • The Real-Time Shader Profiler has a new “SM Warps Stalled at Issue Stage” timeline row that reveals the aggregate set of warp stall reasons at each point in time.
    • VK_EXT_shader_object is now fully supported. VkShaderEXT shader bindings are shown on the timeline in the same way as pipeline state objects. VkShaderEXT objects are also listed in the Shader Pipelines view.
    • When the Real-Time Shader Profiler is enabled, a new Avg. Warp Latency timeline row reveals the average duration of a shader warp at each point in time. Use this to visually identify short and long duration shaders.
    • The Real-Time Shader Profiler now provides an estimation of cycle counts per shader/function/source-line, in the Average Warp Latency column.
    • The Data Transfer dialog can now be dismissed if the application was terminated during the transfer of a trace report.
  • Shader Profiler
    • Each Shader Profiler view now has an Instruction Mix column that reveals the decomposition of instruction types within that code block.
    • VK_EXT_shader_object is now fully supported by the Frame Debugger | Shader Profiler.
    • The Total Samples column in the source view reveals the total performance per source line, including inlined functions. Use this column to visually scan for the most expensive source lines in a shader.
  • Frame Debugger
  • Aftermath
    • Added the ability to calculate a signature or hash for crash dump files.
    • Added an Automated Analysis component to the Exception Summary screen.

Known Issues:

  • GPU Trace
    • The GPU Contexts row may contain incomplete data in scenarios with high-frequency context switching. This is a limitation of the NVIDIA driver.
    • On Windows, CommandList timeline events may appear to be active for longer than their true duration, when in reality the underlying hardware queue was in a wait state for the initial portion of that time. This only occurs when Windows Hardware Accelerated GPU Scheduling is enabled.
    • The Average Warp Latency column in the Shader Profiler will provide accurate values within regions of execution where a single workload was executing; however, selecting broader regions will wash out the results (caused by averaging). The recommended approach is to use the Avg. Warp Latency timeline row to identify a region of execution, select a perf marker or range within that, and then use the Shader Profiler’s values.
    • When the Real-Time Shader Profiler is enabled, the new SM Warp Occupancy timeline row (per-shader-stage) is only able to show complete stats for D3D12 and Vulkan contexts in the traced process. Other contexts will have incomplete values in this row; however, the neighboring “SM Warp Occupancy (HW)” row contains the full set of stats, directly sampled from hardware counters.
  • Shader Profiler
    • The Vulkan Shader Profiler’s support for KHR_non_semantic_info is contingent on shader compiler support. dxc -fspv-debug=vulkan-with-source works well, aside from this compiler bug. In the Vulkan SDK, glslangValidator -gVS does not produce sufficient info at the time of this release.
    • In the Frame Debugger, when Collect SASS Execution Counters is true, the Shader Profiler may encounter a Device Lost error on R535 drivers. To work around this, enable Nsight Aftermath before the Frame Debugger session.
    • In the Frame Debugger, instruction execution counters are no longer collected by default. To re-enable these counters, we recommend first increasing the Windows TDR Timeout to 30 seconds; then in the Frame Debugger launch settings | Troubleshooting tab, set Collect SASS Execution Counters to “Yes”.
    • In the Frame Debugger, instruction execution counters for D3D12 pixel shaders will only be collected on R545.95 and newer drivers, due to limitations in the DX driver.
    • In the Frame Debugger, instruction execution counters for VkShaderEXT objects will only be collected on R546.24 and newer drivers, where the capability was newly introduced.
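
The Windows TDR Timeout mentioned above is commonly adjusted through the TdrDelay registry value (a DWORD, in seconds) under the GraphicsDrivers key. The following is a minimal sketch, not an official Nsight Graphics utility: the helper name tdr_reg_file is hypothetical, and it only generates the text of a .reg file rather than modifying the registry itself.

```python
# Hypothetical helper (not part of Nsight Graphics): builds the text of a
# .reg file that sets the Windows TDR timeout. It does not touch the registry.

# Registry key that holds the Windows TDR settings.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(timeout_seconds: int) -> str:
    """Return .reg file contents that set TdrDelay to timeout_seconds."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout_seconds must fit in a positive 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        # .reg files encode DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # 30 seconds is the value recommended in the note above.
    print(tdr_reg_file(30))
```

Saving the printed text as a .reg file and importing it typically requires administrator rights, and the new timeout generally takes effect only after a restart.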

For more details and known issues, please see the full release notes!

For an overview of Nsight™ Graphics and access to resources, please visit the main Nsight™ Graphics page.

NVIDIA® Nsight™ Graphics 2023.4 is available for download under the NVIDIA Registered Developer Program.
