The CUFFT Library now supports double-precision transforms and includes
significant performance improvements for single-precision transforms as
well. See the CUDA Toolkit release notes for details.
The cuda-gdb hardware debugger and CUDA Visual Profiler are now
included in the CUDA Toolkit installer, and cuda-gdb is now
available for all supported Linux distributions.
Each GPU in an SLI group is now enumerated individually, so compute
applications can take advantage of multi-GPU performance even when
SLI is enabled for graphics.
The 64-bit versions of the CUDA Toolkit now support compiling 32-bit
applications. Please note that the installation location of the
libraries has changed, so developers on 64-bit Linux must update
their LD_LIBRARY_PATH to contain either /usr/local/cuda/lib or
/usr/local/cuda/lib64.
New fp16/fp32 conversion intrinsics allow data to be stored in fp16
format while computation is done in fp32. The fp16 format suits
applications that need a wider numerical range than 16-bit integers
but less precision than fp32, and it reduces memory footprint and
bandwidth consumption.
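To make the range and precision trade-off concrete, here is a minimal
host-side sketch in plain C that decodes an IEEE 754 half-precision
(fp16) bit pattern into a float. It is not the toolkit's device
intrinsic and not NVIDIA's implementation; the function name
half_to_float is hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a CUDA API): decode a 16-bit IEEE 754
 * half-precision bit pattern into a 32-bit float on the host. */
float half_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h >> 15) & 0x1u;   /* 1 sign bit      */
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;  /* 5 exponent bits */
    uint32_t mant = (uint32_t)h & 0x3FFu;         /* 10 mantissa bits */
    uint32_t bits;
    float result;

    if (exp == 0) {
        /* Zero or subnormal: value = mant * 2^-24 (exact in fp32). */
        result = (float)mant / 16777216.0f;
        return sign ? -result : result;
    }

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent in fp32 as well. */
        bits = (sign << 31) | 0x7F800000u | (mant << 13);
    } else {
        /* Normal number: rebias the exponent from 15 to 127 and
         * widen the mantissa from 10 to 23 bits. */
        bits = (sign << 31) | ((exp + 112u) << 23) | (mant << 13);
    }

    memcpy(&result, &bits, sizeof result);
    return result;
}
```

With this decoder, the largest finite fp16 pattern 0x7BFF yields
65504.0, which is beyond the 16-bit signed integer maximum of 32767,
while the value after 1.0 (pattern 0x3C01) is 1.0009765625, a step of
2^-10 that is much coarser than fp32.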
The Visual Profiler includes several enhancements:
- All memory transfer API calls are now reported
- Support for profiling multiple contexts per GPU
- Synchronized clocks for requested start time on the CPU and
  start/end times on the GPU for all kernel launches and memory
  transfers
- Global memory load and store efficiency metrics for GPUs with
  compute capability 1.2 and higher
The CUDA Driver for MacOS now has its own installer, and is available
separately from the CUDA Toolkit.
Support for major Linux distros, MacOS X, and Windows:
- MacOS X 10.5.6 and later (32-bit)
- Windows XP/Vista/7 with Visual Studio 8 (VC2005 SP1) and 9 (VC2008)
A new pitchLinearTexture code sample that shows how to efficiently
texture from pitch-linear memory.
A new PTXJIT code sample illustrating how to use cuModuleLoadDataEx()
to load PTX source from memory instead of loading a file.
Two new code samples for Windows, showing how to use the NVCUVID
library to decode MPEG-2, VC-1, and H.264 content and pass frames
to OpenGL or Direct3D for display.
Updated code samples showing how to properly align CUDA kernel
function parameters so the same code works on both 32-bit and 64-bit
systems.
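The alignment idea can be sketched in plain C without any CUDA
headers. When kernel arguments are packed one by one into a parameter
buffer (as the driver API's cuParamSetv() does with explicit byte
offsets), each argument's offset must first be rounded up to that
argument's alignment, and pointer size and alignment differ between
32-bit and 64-bit builds. The helpers align_up and push_param below
are hypothetical names used only for this illustration; they are not
taken from the updated samples.

```c
#include <stddef.h>
#include <string.h>

/* Round offset up to the next multiple of alignment.
 * Assumes alignment is a power of two. */
size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: copy one argument into buf at the next
 * correctly aligned offset and return the offset just past it.
 * In real driver API code, the memcpy would be a parameter-setting
 * call issued at the aligned offset. */
size_t push_param(unsigned char *buf, size_t offset,
                  const void *value, size_t size, size_t alignment)
{
    offset = align_up(offset, alignment);
    memcpy(buf + offset, value, size);
    return offset + size;
}
```

Because the offsets are derived from sizeof and the alignment of each
argument rather than hard-coded, a char followed by a pointer lands
the pointer at offset 4 in a 32-bit build and offset 8 in a 64-bit
build, with no source changes.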