The first GPUs based on the Turing architecture have arrived with many new features. Extensions have been added to both Vulkan and OpenGL to give developers access to these features, and the Khronos registries and repositories have been updated to include the specifications and tools for the new extensions.

More details about the extensions below can be found at the official Khronos specification repositories:

In addition to the specifications, some tools are necessary for Vulkan developers to use these new extensions. All the following tool repositories have support for these new extensions:

Vulkan SDK

The official Vulkan SDK version 1.1.85.0 from LunarG packages all the above components together, providing the tools developers need to build Vulkan applications for Turing. The updated Vulkan SDK version 1.1.92.0 adds support for VK_NV_ray_tracing and SPV_NV_ray_tracing. The latest SDK for Windows or Linux can be downloaded from the LunarG SDK page: LunarG Vulkan SDK.

Drivers

Starting with the first GeForce RTX 2080 driver, version 411.63 (available here), all Turing-capable drivers going forward support all of these Vulkan and OpenGL extensions.

Extensions Overview

The following sections give a brief overview of each new extension and provide links to their specifications:

Raytracing

Turing brings hardware acceleration for ray tracing through dedicated units called RT cores. The RT cores accelerate BVH (bounding volume hierarchy) traversal as well as ray-triangle intersection. This acceleration is exposed in Vulkan through a new ray tracing pipeline with a series of new shader stages. The programming model is similar to the DXR (DirectX Raytracing) model, which is briefly described in this blog post: Introduction to NVIDIA RTX and DirectX Ray Tracing.

A GTC 2018 presentation about Vulkan ray tracing can be found here: video, slides.
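To make the ray-triangle intersection step concrete, here is a minimal CPU-side sketch in Python. It uses the well-known Möller-Trumbore test as a stand-in; the source does not say which algorithm the RT cores implement, and the function and helper names are hypothetical.

```python
# Illustrative only: Moller-Trumbore ray-triangle test on the CPU.
# This is NOT a description of what RT cores do internally.

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return (t, u, v) for a hit, or None for a miss.

    t is the distance along the ray; u and v are the barycentric
    weights of v1 and v2 at the hit point.
    """
    edge1 = _sub(v1, v0)
    edge2 = _sub(v2, v0)
    p = _cross(direction, edge2)
    det = _dot(edge1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = _sub(origin, v0)
    u = _dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(t_vec, edge1)
    v = _dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(edge2, q) * inv_det
    if t < eps:                 # intersection lies behind the ray origin
        return None
    return t, u, v
```

In a ray tracer, a test like this runs once for each candidate triangle that BVH traversal yields, which is why both steps benefit from dedicated hardware.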

Mesh Shaders

Mesh shaders provide a new programmable geometry processing pipeline, replacing the traditional vertex/tessellation/geometry pipeline. This pipeline is built around two shader stages: the task shader and the mesh shader. If enabled, the task shader specifies the number of mesh shader workgroups to spawn for each task, which allows variable workload expansion or reduction. The mesh shader writes a compact mesh description (a meshlet) to on-chip memory and then feeds that output to the rasterizer for further processing. The result is a flexible and efficient compute-like programming model that supports generic cooperative thread group features (workgroups, shared memory, barrier synchronization, etc.). It can be used to implement efficient culling or LOD schemes, procedural geometry generation, and many other techniques.
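The workload reduction performed by a task shader can be sketched on the CPU as follows. This is a Python illustration, not shader code. The single-plane bounding-sphere test, the data layout, and all names are assumptions made for this example.

```python
# Illustrative only: a task stage that culls meshlets and reports how many
# mesh shader workgroups should be launched for the survivors.

def task_stage(meshlet_bounds, plane):
    """Cull meshlets against one plane.

    meshlet_bounds: list of bounding spheres (cx, cy, cz, radius).
    plane: (nx, ny, nz, d); a point p is on the visible side when
           dot(n, p) + d >= 0 (hypothetical convention).
    Returns (mesh_workgroup_count, surviving_meshlet_indices).
    """
    nx, ny, nz, d = plane
    survivors = []
    for index, (cx, cy, cz, radius) in enumerate(meshlet_bounds):
        signed_distance = nx * cx + ny * cy + nz * cz + d
        # Keep the meshlet unless its sphere is entirely behind the plane.
        if signed_distance >= -radius:
            survivors.append(index)
    return len(survivors), survivors
```

With three meshlets and the plane z >= 0, a sphere centered at z = -5 with radius 1 is dropped, so only two mesh workgroups would be launched instead of three.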

More details can be found in the technical blog post by Christoph Kubisch, Introduction to Turing Mesh Shaders, and in his SIGGRAPH 2018 presentation, SIGGRAPH 2018 Mesh Shaders.

Shading Rate Image

This hardware feature allows applications to dynamically control the number of fragment shaders that will be launched in a particular area of the screen while rendering primitives, which we refer to as the shading rate. The shading rate can be as coarse as one fragment shader for each 4x4 block of pixels or as fine as launching 16 fragment shader invocations per pixel. The shading rate can vary across the screen and is controlled using a shading rate image. Each texel of this image controls the shading rate for a 16x16 region of the screen and holds an 8-bit index value that is mapped to a shading rate via a per-viewport shading rate palette.
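The lookup from screen pixel to shading rate can be sketched on the CPU as follows. The 16x16 region size, the 8-bit index, and the palette indirection come from the description above. The palette contents, the tuple layout, and the function names are hypothetical.

```python
# Illustrative only: shading rate image lookup and the resulting shader work.

TEXEL_REGION = 16  # each shading rate image texel covers 16x16 pixels

# Hypothetical palette: index -> (fragment_width, fragment_height, invocations_per_pixel)
PALETTE = [
    (1, 1, 16),  # finest rate: 16 invocations per pixel
    (1, 1, 1),   # one invocation per pixel
    (2, 2, 1),   # one invocation per 2x2 block
    (4, 4, 1),   # coarsest rate: one invocation per 4x4 block
]

def shading_rate_at(pixel_x, pixel_y, rate_image, palette=PALETTE):
    """Return the palette entry that governs the given screen pixel."""
    texel_x = pixel_x // TEXEL_REGION
    texel_y = pixel_y // TEXEL_REGION
    index = rate_image[texel_y][texel_x]
    if not 0 <= index < len(palette):
        raise IndexError("shading rate index has no palette entry")
    return palette[index]

def invocations_for_region(entry, width=TEXEL_REGION, height=TEXEL_REGION):
    """Fragment shader invocations needed to shade a fully covered region."""
    frag_w, frag_h, per_pixel = entry
    fragments = (width // frag_w) * (height // frag_h)
    return fragments * per_pixel
```

For one fully covered 16x16 region, the coarsest entry needs 16 invocations while the finest needs 4096, which shows how much shading work the rate image can save or add per region.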

The GLSL and SPIR-V extensions also expose built-ins that allow fragment shaders to read the effective fragment size in pixels (gl_FragmentSizeNV) as well as the number of fragment shader invocations launched for a fully covered pixel (gl_InvocationsPerPixelNV).

This extension allows developers to implement more efficient variable shading rate techniques, such as foveated rendering, lens adaptation (for VR), and content- or motion-adaptive shading.

Shader Image Footprint

This extension provides a set of GLSL (and SPIR-V) query functions that report the set of pixels that would be accessed when performing a filtered texture lookup, which we call the image or texture footprint. Footprints are supported for 2D and 3D images and include a 64-bit bitfield reporting on coverage for an 8x8 (2D) or 4x4x4 (3D) neighborhood. Each bit in this bitfield is set if and only if any pixel in a block of pixels would be read by the lookup. The size of these pixel blocks is given by a requested granularity, which can vary from 2x2 to 256x256 for 2D images. In addition to the bitfield, the footprint also includes information identifying the location of the neighborhood in the full image.
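The coverage bitfield can be illustrated with a small CPU-side sketch for the 2D case. The 8x8 neighborhood and the block granularity come from the description above. The choice of anchor, the bit ordering (row-major, bit 0 at the anchor block), and the function name are assumptions and may not match the layout defined by the extension.

```python
# Illustrative only: build a 64-bit coverage mask from the texels a lookup reads.

def footprint_2d(texels, granularity):
    """Return ((anchor_block_x, anchor_block_y), mask).

    texels: iterable of integer (x, y) texel coordinates read by a lookup.
    granularity: edge length in texels of each coverage block (e.g. 2 for 2x2).
    A bit is set if and only if any texel in the corresponding block is read.
    """
    blocks = {(x // granularity, y // granularity) for x, y in texels}
    if not blocks:
        raise ValueError("lookup touched no texels")
    anchor_x = min(bx for bx, _ in blocks)
    anchor_y = min(by for _, by in blocks)
    mask = 0
    for bx, by in blocks:
        col, row = bx - anchor_x, by - anchor_y
        if col >= 8 or row >= 8:
            raise ValueError("footprint exceeds 8x8 blocks; use a coarser granularity")
        mask |= 1 << (row * 8 + col)  # assumed row-major bit order
    return (anchor_x, anchor_y), mask
```

A bilinear lookup that reads texels (3,3), (4,3), (3,4), and (4,4) at 2x2 granularity straddles four blocks and sets four bits. The same four texels at 8x8 granularity fall into a single block and set one bit.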

This is an important component for implementing multipass decoupled and image-space shading pipelines, where identifying the set of pixels that are actually visible allows the application to reduce shading work in a subsequent pass. Model-space rendering is an example of where this can be used to avoid wasting GPU cycles rendering to parts of the model that will never be seen.

Corner Sampled Image

This extension adds support for corner-sampled images, which are arranged as a set of pixel rectangles storing values at the corners of each rectangle. By contrast, conventional images store values at the center of each pixel rectangle. This allows lookups sampling the edge of the image to get exact values on the edge of the texture. This facilitates implementing Ptex (Per-face Texture [Burley and Lacewell 2008]) texturing in real-time applications by providing proper filtering and interpolation. Ptex uses separate images for each face of a subdivision surface or polygon mesh. With sample locations at pixel corners, continuity between adjacent patches can be maintained by duplicating values along shared edges.
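The difference between the two conventions is easiest to see in one dimension. The sketch below places sample i of an N-texel row in normalized coordinates under each convention and applies linear filtering to a corner-sampled row. The function names are hypothetical, and the filtering is a simplified CPU model rather than the hardware's exact behavior.

```python
# Illustrative only: sample positions for conventional vs corner-sampled images.

def center_sample_position(i, size):
    """Conventional image: value i sits at the center of texel i."""
    return (i + 0.5) / size

def corner_sample_position(i, size):
    """Corner-sampled image: value i sits on a texel corner, so the first
    and last values lie exactly on the image edges (0.0 and 1.0)."""
    return i / (size - 1)

def sample_corner_1d(values, u):
    """Linearly filter a corner-sampled row (at least two values) at u in [0, 1]."""
    pos = u * (len(values) - 1)
    lo = min(int(pos), len(values) - 2)
    t = pos - lo
    return values[lo] * (1.0 - t) + values[lo + 1] * t
```

If two adjacent Ptex faces store rows [0.0, 2.0, 4.0] and [4.0, 5.0, 6.0], sampling the first at u = 1.0 and the second at u = 0.0 both return the duplicated edge value 4.0, so the result is continuous across the shared edge.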

Representative Fragment Test

This extension optimizes occlusion query techniques that use a fragment shader to record the set of visible primitives. A large visible primitive may spawn a large number of fragment shaders, and having each fragment shader record visibility results in a lot of wasted work. This feature is an early fragment test that allows the hardware to stop generating new fragments for a given primitive once it ensures that at least one other (representative) fragment will be processed. While this performance optimization can significantly reduce the amount of work to record visibility, it does not guarantee that every "extra" fragment will be discarded.

Fragment Shader Barycentrics

This feature provides new GLSL and SPIR-V fragment shader built-in inputs that hold barycentric coordinates, and also allows fragment shaders to directly read raw per-vertex attribute values in order to perform barycentric interpolation manually. The three-component vector built-ins gl_BaryCoordNV and gl_BaryCoordNoPerspNV provide perspective-corrected and non-perspective-corrected barycentric coordinates, respectively. Fragment shader inputs accessed using per-vertex values are declared in GLSL using the "pervertexNV" qualifier and, as with tessellation and geometry shader inputs, are declared as arrays with separate elements for each vertex.

This feature allows applications to reduce the amount of data passed from vertex to fragment shaders by using more compactly packed representations. Additionally, it allows fragment shaders to interpolate using per-vertex values fetched directly from memory or to perform arbitrary interpolation operations using raw attributes accessed from the primitive's vertices.
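Manual interpolation from raw per-vertex values amounts to a weighted sum, sketched here on the CPU in Python. The second function shows the standard relationship between screen-space and perspective-corrected weights using each vertex's clip-space w. The function names are hypothetical, and the code is a model of the math rather than shader code.

```python
# Illustrative only: barycentric interpolation of per-vertex attributes.

def interpolate(bary, attr0, attr1, attr2):
    """Blend three per-vertex attribute tuples with barycentric weights."""
    b0, b1, b2 = bary
    return tuple(b0 * a + b1 * b + b2 * c
                 for a, b, c in zip(attr0, attr1, attr2))

def perspective_correct(bary_no_persp, clip_w):
    """Convert screen-space weights to perspective-corrected weights.

    Each weight is divided by its vertex's clip-space w, then the three
    results are renormalized to sum to 1.
    """
    weighted = [b / w for b, w in zip(bary_no_persp, clip_w)]
    total = sum(weighted)
    return tuple(x / total for x in weighted)
```

A fragment shader with access to the raw vertex data could apply the same weighted sum to attributes it unpacks itself, for example colors stored in a compact format and fetched from memory.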

Compute Shader Derivatives

This extension allows compute shaders to compute quad-based derivatives, which was previously only possible in fragment shaders. Compute shaders can then use built-in derivative functions like dFdx(), texture lookup functions that rely on automatic level-of-detail computation, and the texture level-of-detail query function textureQueryLod(). The extension provides two layout qualifiers that arrange compute shader invocations into quads using either a linear index or the (x,y) coordinates assigned to each invocation.
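The two quad arrangements and the resulting derivatives can be modeled on the CPU as follows. The sketch assumes that four consecutive linear indices form a quad in the linear mode, that 2x2 blocks of (x,y) coordinates form a quad in the other mode, and that lanes are ordered top-left, top-right, bottom-left, bottom-right. These details and all function names are assumptions for illustration; the extension specification is the authority.

```python
# Illustrative only: grouping compute invocations into quads and differencing.

def quad_from_linear_index(local_index):
    """Linear arrangement: return (quad_id, lane) for a flat invocation index."""
    return local_index // 4, local_index % 4

def quad_from_xy(x, y):
    """(x,y) arrangement: return (quad_id, lane) for a 2D invocation ID."""
    return (x // 2, y // 2), (y % 2) * 2 + (x % 2)

def dfdx(quad_values, lane):
    """Horizontal difference within the lane's row of the 2x2 quad."""
    row = lane // 2
    return quad_values[row * 2 + 1] - quad_values[row * 2]

def dfdy(quad_values, lane):
    """Vertical difference within the lane's column of the 2x2 quad."""
    col = lane % 2
    return quad_values[2 + col] - quad_values[col]
```

Differences like these are what make automatic level-of-detail selection possible: the texture unit needs to know how fast the texture coordinates change between neighboring invocations.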

Exclusive Scissor

This extension adds a second per-viewport scissor test that culls fragments inside the specified rectangle (exclusive), unlike the standard scissor test, which culls fragments outside it (inclusive). This can be used to optimize multi-resolution foveated-rendering schemes (in conjunction with Variable Rate Shading), where each raster pass fills a concentric strip of pixels by enabling both the inclusive and the exclusive scissor test.
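Combining the two tests to select a concentric strip can be sketched on the CPU like this. Rectangles are given as (x, y, width, height), and the function names are hypothetical.

```python
# Illustrative only: inclusive plus exclusive scissor selects a ring of pixels.

def inside(rect, x, y):
    """True when pixel (x, y) lies within rect = (x0, y0, width, height)."""
    x0, y0, width, height = rect
    return x0 <= x < x0 + width and y0 <= y < y0 + height

def fragment_passes(x, y, inclusive_rect, exclusive_rect):
    """A fragment survives if it is inside the standard (inclusive) scissor
    and outside the exclusive scissor."""
    return inside(inclusive_rect, x, y) and not inside(exclusive_rect, x, y)

def strip_pixel_count(inclusive_rect, exclusive_rect):
    """Count the pixels of the inclusive rectangle that survive both tests."""
    x0, y0, width, height = inclusive_rect
    return sum(fragment_passes(x, y, inclusive_rect, exclusive_rect)
               for y in range(y0, y0 + height)
               for x in range(x0, x0 + width))
```

With an 8x8 inclusive rectangle and a centered 4x4 exclusive rectangle, 48 of the 64 pixels pass, leaving the inner 16 to a separate pass that can use a different resolution.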