GPU Accelerated Computing with C and C++

Using the CUDA Toolkit, you can accelerate your C or C++ applications by updating the computationally intensive portions of your code to run on GPUs. To accelerate your applications, you can call functions from drop-in libraries or develop custom applications in languages including C, C++, Fortran and Python. Below you will find some resources to help you get started using CUDA.


Install the free CUDA Toolkit on a Linux, Mac or Windows system with one or more CUDA-capable GPUs. Follow the instructions in the CUDA Quick Start Guide to get up and running quickly.

Or, watch the short video below and follow along.

If you do not have a GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

For more detailed installation instructions, refer to the CUDA installation guides. For help with troubleshooting, browse and participate in the CUDA Setup and Installation forum.
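Once the toolkit is installed, one quick way to confirm that the runtime can see a GPU is a small device query. The sketch below is illustrative only and is not taken from the Quick Start Guide; the helper name count_cuda_devices is hypothetical, and it assumes the file is compiled with nvcc (for example, nvcc device_query.cu -o device_query).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices,
// or -1 if the runtime reports an error (e.g. no driver installed).
int count_cuda_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main() {
    int count = count_cuda_devices();
    if (count <= 0) {
        std::printf("No CUDA-capable device found.\n");
        return 1;
    }
    // Print the name and compute capability of each visible device.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        std::printf("Device %d: %s (compute capability %d.%d)\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If this prints at least one device, the driver and toolkit are most likely installed consistently; if not, the installation guides and the forum mentioned above are the next place to look.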


You are now ready to write your first CUDA program. The article Even Easier Introduction to CUDA introduces key concepts through simple examples that you can follow along with.

The video below walks through how to write a program that adds two vectors.

The Programming Guide in the CUDA Documentation introduces key concepts covered in the video, including the CUDA programming model, important APIs and performance guidelines.
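As a rough illustration of the vector addition mentioned above, the following is a minimal sketch and not the code from the video or the article. It assumes unified (managed) memory and a grid-stride loop; the kernel name add, the helper run_vector_add, and the block size of 256 are choices made for this example.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread processes elements index, index + stride, ...
// (a grid-stride loop), so any grid size covers all n elements.
__global__ void add(int n, const float* x, float* y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride) {
        y[i] = x[i] + y[i];
    }
}

// Hypothetical helper: fills x with 1.0 and y with 2.0, adds them on the
// GPU, and returns the maximum absolute error against the expected 3.0.
// Returns -1.0f if any CUDA call fails.
float run_vector_add(int n) {
    float* x = nullptr;
    float* y = nullptr;
    // Managed memory is accessible from both the CPU and the GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    int blockSize = 256;                              // threads per block (assumed)
    int numBlocks = (n + blockSize - 1) / blockSize;  // round up to cover n
    add<<<numBlocks, blockSize>>>(n, x, y);

    // Check the launch, then wait for the GPU before reading y on the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float maxError = -1.0f;
    if (err == cudaSuccess) {
        maxError = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxError = std::fmax(maxError, std::fabs(y[i] - 3.0f));
        }
    }
    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    float maxError = run_vector_add(1 << 20);
    std::printf("Max error: %f\n", maxError);
    return maxError == 0.0f ? 0 : 1;
}
```

Managed memory keeps the sketch short because no explicit host-to-device copies are needed. Explicit cudaMalloc and cudaMemcpy calls are a common alternative when finer control over data movement matters.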


NVIDIA provides hands-on training in CUDA through a collection of self-paced and instructor-led courses. The self-paced online training, powered by GPU-accelerated workstations in the cloud, guides you step by step through editing and running code and working with visual tools. All you need is a laptop and an internet connection to access the complete suite of free courses and certification options.

The CUDA C Best Practices Guide presents established parallelization and optimization techniques and explains programming approaches that can greatly simplify programming GPU-accelerated applications.

Additional Resources

CODE Samples


The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

So, now you’re ready to deploy your application?
Register today for free access to NVIDIA TESLA GPUs in the cloud.

Latest News

Dyndrite Unveils First GPU-Accelerated Geometry Kernel to tackle Data Explosion in Additive Manufacturing

The team at Dyndrite has developed a new GPU-based platform: Accelerated Computation Engine (ACE), the world’s first GPU-accelerated geometry kernel.

PGI Community Edition 19.10 Now Available

New PGI Community Edition supports NVIDIA V100 Tensor Cores in CUDA Fortran, the full C++17 language, PCAST CPU/GPU auto-compare directives, OpenACC 2.6 and more.

Latest Updates to NVIDIA CUDA-X Libraries

Learn what’s new in the latest releases of NVIDIA’s CUDA-X Libraries and NGC

Developer Spotlight: Visualizing High-Resolution Atomic Structures to Simulate Molecular Dynamics

Experimental sciences deliver high-resolution atomic structures for biological complexes, but researchers need to refine those structures, prove their accuracy, and simulate their dynamics while retaining all of the information that makes simulati

Blogs: Parallel ForAll

Int4 Precision for AI Inference

INT4 Precision Can Bring an Additional 59% Speedup Compared to INT8. If there’s one constant in AI and deep learning, it’s never-ending optimization to wring every possible bit of performance out of a given platform.

Introducing Jetson Xavier NX, the World’s Smallest AI Supercomputer

Today NVIDIA announced Jetson Xavier NX, the world’s smallest, most advanced embedded AI supercomputer for autonomous robotics and edge computing devices.

MLPerf Inference: NVIDIA Innovations Bring Leading Performance

New TensorRT 6 Features Combine with Open-Source Plugins to Further Accelerate Inference. Inference is where AI goes to work. Identifying diseases. Answering questions. Recommending products and services.

grCUDA: A Polyglot Language Binding for CUDA in GraalVM

Integrating GPU-accelerated libraries into existing software stacks can be challenging, in particular, for applications that are written in high-level scripting languages.