GPU Accelerated Computing with C and C++

Using the CUDA Toolkit, you can accelerate your C or C++ applications by moving the computationally intensive portions of your code to run on GPUs. You can call functions from drop-in libraries or develop custom applications in languages including C, C++, Fortran and Python. Below you will find some resources to help you get started using CUDA.
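To give a feel for what calling a drop-in library can look like, here is a minimal sketch that uses cuBLAS, one of the libraries shipped with the CUDA Toolkit, to compute y = alpha * x + y on the GPU. The helper name saxpy_on_gpu and the sample data are assumptions made for illustration; they are not part of any NVIDIA API.

```cuda
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical helper (not an NVIDIA API): computes y = alpha * x + y on the
// GPU with cublasSaxpy. Returns true only if every CUDA/cuBLAS call succeeded.
bool saxpy_on_gpu(float alpha, const std::vector<float> &x, std::vector<float> &y) {
    if (x.size() != y.size()) return false;
    const int n = static_cast<int>(x.size());
    const size_t bytes = n * sizeof(float);

    float *d_x = nullptr;
    float *d_y = nullptr;
    cublasHandle_t handle = nullptr;

    // Short-circuit evaluation stops at the first failing step.
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess
           && cudaMalloc(&d_y, bytes) == cudaSuccess
           && cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS
           && cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS
           && cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Release resources whether or not the computation succeeded.
    if (handle) cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    return ok;
}

int main() {
    std::vector<float> x{1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> y{10.0f, 20.0f, 30.0f, 40.0f};
    if (!saxpy_on_gpu(2.0f, x, y)) {
        std::printf("SAXPY failed (is a CUDA-capable GPU available?)\n");
        return 1;
    }
    for (float v : y) std::printf("%.1f ", v);
    std::printf("\n");
    return 0;
}
```

Assuming the file is saved as saxpy.cu (a name chosen here for illustration), it can typically be built with nvcc saxpy.cu -lcublas -o saxpy. The library supplies the GPU implementation; the application only moves data to and from the device.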

1
SET UP CUDA

Install the free CUDA Toolkit on a Linux, Mac or Windows system with one or more CUDA-capable GPUs. Follow the instructions in the CUDA Quick Start Guide to get up and running quickly.

Or, watch the short video below and follow along.

If you do not have a GPU, you can access one of the thousands of GPUs available from cloud service providers including Amazon AWS, Microsoft Azure and IBM SoftLayer. The NVIDIA-maintained CUDA Amazon Machine Image (AMI) on AWS, for example, comes pre-installed with CUDA and is available for use today.

For more detailed installation instructions, refer to the CUDA installation guides. For help with troubleshooting, browse and participate in the CUDA Setup and Installation forum.
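Once the Toolkit is installed, one quick sanity check is to ask the CUDA runtime which GPUs it can see. The sketch below is a minimal example under that assumption; the helper name count_cuda_devices is hypothetical, while cudaGetDeviceCount and cudaGetDeviceProperties are CUDA runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not an NVIDIA API): returns the number of CUDA-capable
// devices, or -1 if the runtime reports an error (for example, no driver).
int count_cuda_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main() {
    const int count = count_cuda_devices();
    if (count <= 0) {
        std::printf("No CUDA-capable GPU detected.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        // Report the name, compute capability, and global memory of each device.
        std::printf("Device %d: %s, compute capability %d.%d, %.1f GiB global memory\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

Assuming the file is saved as device_query.cu (an illustrative name), it can typically be built with nvcc device_query.cu -o device_query. If it reports no devices on a machine that has a GPU, the installation guides and the forum mentioned above are the places to look next.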

2
YOUR FIRST CUDA PROGRAM

You are now ready to write your first CUDA program. The article An Even Easier Introduction to CUDA introduces key concepts through simple examples that you can follow along with.

The video below walks through how to write a program that adds two vectors.

The Programming Guide in the CUDA Documentation introduces the key concepts covered in the video, including the CUDA programming model, important APIs, and performance guidelines.
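For readers who prefer text to video, a vector addition program can be sketched as follows. This is a minimal illustration, not a transcript of the video: it assumes unified memory (cudaMallocManaged) and a grid-stride loop, and the helper name vector_add_max_error, the block size of 256, and the input values are choices made here for the example.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread starts at its global index and strides by the total
// number of threads in the grid, so any grid size covers all n elements.
__global__ void add(int n, const float *x, float *y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride) {
        y[i] = x[i] + y[i];
    }
}

// Hypothetical helper (not an NVIDIA API): fills x with 1.0f and y with 2.0f,
// adds them on the GPU, and returns the maximum absolute deviation from 3.0f.
// Returns -1.0f if any CUDA call fails.
float vector_add_max_error(int n) {
    float *x = nullptr;
    float *y = nullptr;
    // Unified memory is accessible from both the CPU and the GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int blockSize = 256;                              // threads per block (assumed)
    const int numBlocks = (n + blockSize - 1) / blockSize;  // round up to cover n
    add<<<numBlocks, blockSize>>>(n, x, y);

    // Catch launch errors, then wait for the kernel before reading y on the CPU.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float maxError = -1.0f;
    if (err == cudaSuccess) {
        maxError = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxError = std::fmax(maxError, std::fabs(y[i] - 3.0f));
        }
    }
    cudaFree(x);
    cudaFree(y);
    return maxError;
}

int main() {
    const float err = vector_add_max_error(1 << 20);
    if (err < 0.0f) {
        std::printf("A CUDA call failed (is a CUDA-capable GPU available?)\n");
        return 1;
    }
    std::printf("Max error: %f\n", err);
    return 0;
}
```

The three pieces to notice are the __global__ kernel, the <<<numBlocks, blockSize>>> launch configuration, and the synchronization before the CPU reads the result. These correspond to the programming-model concepts the Programming Guide covers in depth.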

3
PRACTICE CUDA

NVIDIA provides hands-on training in CUDA through a collection of self-paced and instructor-led courses. The self-paced online training, powered by GPU-accelerated workstations in the cloud, guides you step by step through editing and running code and working with visual tools. All you need is a laptop and an internet connection to access the complete suite of free courses and certification options.

The CUDA C Best Practices Guide presents established parallelization and optimization techniques and explains programming approaches that can greatly simplify programming GPU-accelerated applications.
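Optimization work usually starts with measurement. As one illustration of the kind of technique such a guide covers, the sketch below times a single kernel launch with CUDA events. The scale kernel, the helper name time_scale_kernel_ms, and the problem size are assumptions made for this example; the cudaEvent calls are CUDA runtime API functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: multiplies each element by a constant.
__global__ void scale(int n, float a, float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;
}

// Hypothetical helper (not an NVIDIA API): returns the GPU time in
// milliseconds for one launch of scale over n elements, or -1.0f on error.
float time_scale_kernel_ms(int n) {
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) {
        cudaFree(d_data);
        return -1.0f;
    }
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        cudaFree(d_data);
        return -1.0f;
    }

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Events are recorded in the same stream as the kernel, so the elapsed
    // time between them brackets the kernel's execution on the GPU.
    cudaEventRecord(start);
    scale<<<grid, block>>>(n, 2.0f, d_data);
    cudaEventRecord(stop);

    float ms = -1.0f;
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaEventSynchronize(stop) == cudaSuccess
           && cudaEventElapsedTime(&ms, start, stop) == cudaSuccess;
    if (!ok) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main() {
    const float ms = time_scale_kernel_ms(1 << 20);
    if (ms < 0.0f) {
        std::printf("Timing failed (is a CUDA-capable GPU available?)\n");
        return 1;
    }
    std::printf("Kernel time: %.3f ms\n", ms);
    return 0;
}
```

A single launch gives only a rough number, since the first launch often includes one-time startup costs. Averaging over several launches after a warm-up run is a common refinement.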

Additional Resources

Code Samples

Availability

The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

So, now you’re ready to deploy your application?
Register today for free access to NVIDIA Tesla GPUs in the cloud.

Latest News

NVIDIA’s Top 10 AI Developer Stories of 2019

These are the top 10 AI developer stories that we covered this year on the NVIDIA Developer News Center.

Multi-GPU Workflows for Training AI Models in Academic Research

We highlight a few research areas in which NVAIL, NVIDIA’s academic partners, are leveraging multi-GPU training.

Dyndrite Unveils First GPU-Accelerated Geometry Kernel to tackle Data Explosion in Additive Manufacturing

The team at Dyndrite has developed a new GPU-based platform: Accelerated Computation Engine (ACE), the world’s first GPU-accelerated geometry kernel.

PGI Community Edition 19.10 Now Available

The new PGI Community Edition supports NVIDIA V100 Tensor Cores in CUDA Fortran, the full C++17 language, PCAST CPU/GPU auto-compare directives, OpenACC 2.6 and more.

Blogs: Parallel ForAll

Bringing HLSL Ray Tracing to Vulkan

DirectX Ray Tracing (DXR) allows you to render graphics using ray tracing instead of the traditional method of rasterization. This API was created by NVIDIA and Microsoft back in 2018.

Learning to Rank with XGBoost and GPU

XGBoost is a widely used machine learning library, which uses gradient boosting techniques to incrementally build a better model during the training phase by combining multiple weak models.

Building a Real-time Redaction App Using NVIDIA DeepStream, Part 2: Deployment

This post is the second in a series (Part 1) that addresses the challenges of training an accurate deep learning model using a large public dataset and deploying the model on the edge for real-time inference using NVIDIA DeepStream.

Building a Real-time Redaction App Using NVIDIA DeepStream, Part 1: Training

Some of the biggest challenges in deploying an AI-based application are the accuracy of the model and being able to extract insights in real time. There’s a trade-off between accuracy and inference throughput.