Examples

| Group | Subgroup | Example | Description |
|---|---|---|---|
| Introduction Examples | | introduction_example | cuBLASDx API introduction example |
| Simple GEMM Examples | Basic Example | simple_gemm_fp32 | Performs fp32 GEMM |
| | | simple_gemm_cfp16 | Performs complex fp16 GEMM |
| | Extra Examples | simple_gemm_leading_dimensions | Performs GEMM with non-default leading dimensions |
| | | simple_gemm_std_complex_fp32 | Performs GEMM with cuda::std::complex as the data type |
| NVRTC Examples | | nvrtc_gemm | Performs GEMM, kernel is compiled using NVRTC |
| GEMM Performance | | single_gemm_performance | Benchmark for a single GEMM |
| | | fused_gemm_performance | Benchmark for 2 GEMMs fused into a single kernel |
| Advanced Examples | Fusion | fused_gemm | Performs 2 GEMMs in a single kernel |
| | | gemm_fft | Performs GEMM and FFT in a single kernel |
| | | gemm_fft_fp16 | Performs GEMM and FFT in a single kernel (half-precision complex type) |
| | | gemm_fft_performance | Benchmark for GEMM and FFT fused into a single kernel |
| | Deep Learning | scaled_dot_prod_attn | Scaled dot product attention using cuBLASDx |
| | | scaled_dot_prod_attn_batched | Multi-head attention using cuBLASDx |
| | Other | multiblock_gemm | Proof-of-concept for a single large GEMM using multiple CUDA blocks |
| | | batched_gemm_fp64 | Manual batching in a single CUDA block |
| | | blockdim_gemm_fp16 | BLAS execution with different block dimensions |
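For orientation, the sketch below shows the pattern the simple GEMM examples (e.g. `simple_gemm_fp32`) follow: a GEMM description is assembled from cuBLASDx operators, the matrices are staged through shared memory, and the block-scoped `execute` method computes C = alpha * A * B + beta * C. This is a minimal sketch rather than a copy of the shipped example; the operator set shown (`Size`, `Precision`, `Type`, `Function`, `TransposeMode`, `Block`, `SM`) and members such as `GEMM::value_type`, `GEMM::block_dim`, and `GEMM::shared_memory_size` are assumed to match the cuBLASDx release these examples belong to.

```cpp
#include <cublasdx.hpp>

using namespace cublasdx;

// Block-level 32x32x32 fp32 GEMM description built from cuBLASDx operators.
// SM<700>() targets Volta; adjust to the architecture you compile for.
using GEMM = decltype(Size<32, 32, 32>()
                    + Precision<float>()
                    + Type<type::real>()
                    + Function<function::MM>()
                    + TransposeMode<transpose_mode::non_transposed,
                                    transpose_mode::non_transposed>()
                    + Block()
                    + SM<700>());

using value_type = typename GEMM::value_type;

// With M = N = K = 32 and default leading dimensions, each tile is 32x32.
constexpr unsigned int tile_size = 32 * 32;

__global__ void gemm_kernel(value_type alpha,
                            const value_type* a,
                            const value_type* b,
                            value_type beta,
                            value_type* c) {
    extern __shared__ value_type smem[];
    value_type* sa = smem;
    value_type* sb = sa + tile_size;
    value_type* sc = sb + tile_size;

    // Stage A, B, and C in shared memory, one strided loop over each tile.
    const unsigned int tid = threadIdx.x + threadIdx.y * blockDim.x
                           + threadIdx.z * blockDim.x * blockDim.y;
    const unsigned int stride = blockDim.x * blockDim.y * blockDim.z;
    for (unsigned int i = tid; i < tile_size; i += stride) {
        sa[i] = a[i];
        sb[i] = b[i];
        sc[i] = c[i];
    }
    __syncthreads();

    // C = alpha * A * B + beta * C, computed cooperatively by the block.
    GEMM().execute(alpha, sa, sb, beta, sc);

    __syncthreads();
    for (unsigned int i = tid; i < tile_size; i += stride) {
        c[i] = sc[i];
    }
}

// Host-side launch (assumed members): one block, with the block dimensions
// and shared memory size suggested by the GEMM description.
// gemm_kernel<<<1, GEMM::block_dim, GEMM::shared_memory_size>>>(
//     alpha, d_a, d_b, beta, d_c);
```

The shipped examples additionally check CUDA error codes and verify results against a host reference; the fused examples in the table above extend this same pattern by keeping intermediate results in shared memory between two `execute` calls.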