AI Pipeline

NVIDIA® Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.

Riva Benchmarks

H100 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
conformer n-gram 1 14.6 1 H100 SXM5-80GB
conformer n-gram 64 71 64 H100 SXM5-80GB
conformer n-gram 128 110 126 H100 SXM5-80GB
conformer n-gram 256 184 249 H100 SXM5-80GB

H100 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 9.4 1 H100 SXM5-80GB
citrinet n-gram 8 13 8 H100 SXM5-80GB
citrinet n-gram 16 17.6 16 H100 SXM5-80GB
citrinet n-gram 32 25 32 H100 SXM5-80GB
citrinet n-gram 48 32 48 H100 SXM5-80GB
citrinet n-gram 64 40 64 H100 SXM5-80GB
conformer n-gram 1 13 1 H100 SXM5-80GB
conformer n-gram 8 23 8 H100 SXM5-80GB
conformer n-gram 16 26 16 H100 SXM5-80GB
conformer n-gram 32 42 32 H100 SXM5-80GB
conformer n-gram 48 54 48 H100 SXM5-80GB

H100 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
conformer n-gram 32 1900 H100 SXM5-80GB

ASR Throughput (RTFX) - Number of seconds of audio processed per second | Riva version: v2.9.0 on H100, L40, T4, A40 and v2.8.0 on other hardware | ASR Dataset - LibriSpeech | Hardware: DGX H100 (1x H100 SXM5-80GB) with Platinum 8480@2.00GHz, GIGABYTE G482-Z54-00 (1x NVIDIA L40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA L4) with EPYC 7763@2.45GHz, DGX A100 (1x A100 SXM4-40GB) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A30) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A10) with EPYC 7763@2.45GHz | Best Streaming Throughput Mode = 800 ms chunk, Best Streaming Latency Mode = 160 ms chunk, Offline Mode = 1600 ms chunk
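
To make the RTFX definition in the note above concrete, here is a minimal Python sketch. The helper names `rtfx` and `per_stream_rtfx` are hypothetical and are not part of Riva; the sample rows are copied from the H100 Best Streaming Throughput table above. Reading a per-stream value close to 1.0 as "each stream keeps pace with real-time audio" is an interpretation, not something the source states.

```python
def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def per_stream_rtfx(total_rtfx: float, streams: int) -> float:
    """Average share of the aggregate RTFX available to each concurrent stream."""
    if streams <= 0:
        raise ValueError("streams must be positive")
    return total_rtfx / streams


# (number of streams, aggregate RTFX) rows from the H100 streaming-throughput table.
H100_THROUGHPUT_ROWS = [(1, 1), (64, 64), (128, 126), (256, 249)]

for streams, total in H100_THROUGHPUT_ROWS:
    # Values slightly below 1.0 suggest the server is starting to fall behind.
    print(streams, round(per_stream_rtfx(total, streams), 3))
```

Under the same definition, the offline H100 figure of 1900 RTFX corresponds to `rtfx(1900, 1)`, that is, 1900 seconds of audio transcribed in one second of wall-clock time.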

L40 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
conformer n-gram 1 12.7 1 NVIDIA L40
conformer n-gram 64 75 64 NVIDIA L40
conformer n-gram 128 113 126 NVIDIA L40
conformer n-gram 256 180 250 NVIDIA L40

L40 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 8.7 1 NVIDIA L40
citrinet n-gram 8 15 8 NVIDIA L40
citrinet n-gram 16 20 16 NVIDIA L40
citrinet n-gram 32 32 32 NVIDIA L40
citrinet n-gram 48 40 48 NVIDIA L40
citrinet n-gram 64 53 64 NVIDIA L40
conformer n-gram 1 11 1 NVIDIA L40
conformer n-gram 8 21 8 NVIDIA L40
conformer n-gram 16 28 16 NVIDIA L40
conformer n-gram 32 40 32 NVIDIA L40
conformer n-gram 48 53 48 NVIDIA L40

L40 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
conformer n-gram 32 2100 NVIDIA L40

L4 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
conformer n-gram 1 15 1 NVIDIA L4
conformer n-gram 64 140 63 NVIDIA L4
conformer n-gram 128 228 125 NVIDIA L4
conformer n-gram 256 455 244 NVIDIA L4

L4 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 11.25 1 NVIDIA L4
citrinet n-gram 8 16 8 NVIDIA L4
citrinet n-gram 16 28 16 NVIDIA L4
citrinet n-gram 32 46 32 NVIDIA L4
citrinet n-gram 48 60 48 NVIDIA L4
citrinet n-gram 64 84 63 NVIDIA L4
conformer n-gram 1 13.5 1 NVIDIA L4
conformer n-gram 8 24 8 NVIDIA L4
conformer n-gram 16 33 16 NVIDIA L4
conformer n-gram 32 56 32 NVIDIA L4
conformer n-gram 48 90 47 NVIDIA L4

L4 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
conformer n-gram 32 920 NVIDIA L4

A100 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 11 1 A100 SXM4-40GB
citrinet n-gram 64 64 63 A100 SXM4-40GB
citrinet n-gram 128 108 126 A100 SXM4-40GB
citrinet n-gram 256 172 248 A100 SXM4-40GB
citrinet n-gram 384 240 367 A100 SXM4-40GB
citrinet n-gram 512 320 482 A100 SXM4-40GB
citrinet n-gram 768 484 703 A100 SXM4-40GB
conformer n-gram 1 16 1 A100 SXM4-40GB
conformer n-gram 64 97 63 A100 SXM4-40GB
conformer n-gram 128 140 126 A100 SXM4-40GB
conformer n-gram 256 230 247 A100 SXM4-40GB
conformer n-gram 384 330 365 A100 SXM4-40GB
conformer n-gram 512 470 478 A100 SXM4-40GB

A100 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 9.91 1 A100 SXM4-40GB
citrinet n-gram 8 14.48 8 A100 SXM4-40GB
citrinet n-gram 16 23 16 A100 SXM4-40GB
citrinet n-gram 32 35 32 A100 SXM4-40GB
citrinet n-gram 48 46 48 A100 SXM4-40GB
citrinet n-gram 64 55 63 A100 SXM4-40GB
conformer n-gram 1 13.92 1 A100 SXM4-40GB
conformer n-gram 8 26.19 8 A100 SXM4-40GB
conformer n-gram 16 37 16 A100 SXM4-40GB
conformer n-gram 32 52 32 A100 SXM4-40GB
conformer n-gram 48 62 48 A100 SXM4-40GB
conformer n-gram 64 76 63 A100 SXM4-40GB

A100 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
citrinet n-gram 32 4000 A100 SXM4-40GB
conformer n-gram 32 1500 A100 SXM4-40GB

A40 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
conformer n-gram 1 15 1 A40
conformer n-gram 64 130 63 A40
conformer n-gram 128 203 126 A40
conformer n-gram 256 354 247 A40

A40 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 10 1 A40
citrinet n-gram 8 17.44 8 A40
citrinet n-gram 16 28 16 A40
citrinet n-gram 32 42 32 A40
citrinet n-gram 48 53 48 A40
citrinet n-gram 64 69 63 A40
conformer n-gram 1 12.8 1 A40
conformer n-gram 8 26 8 A40
conformer n-gram 16 37 16 A40
conformer n-gram 32 60 32 A40
conformer n-gram 48 92.5 48 A40

A40 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
conformer n-gram 32 1200 A40

A30 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 15 1 A30
citrinet n-gram 64 106 63 A30
citrinet n-gram 128 150 125 A30
citrinet n-gram 256 274 245 A30
citrinet n-gram 384 420 359 A30
citrinet n-gram 512 620 467 A30
conformer n-gram 1 21 1 A30
conformer n-gram 64 140 63 A30
conformer n-gram 128 210 125 A30
conformer n-gram 256 374 243 A30

A30 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 13.84 1 A30
citrinet n-gram 8 23 8 A30
citrinet n-gram 16 40.5 16 A30
citrinet n-gram 32 60 32 A30
citrinet n-gram 48 64 48 A30
citrinet n-gram 64 80 63 A30
conformer n-gram 1 18.934 1 A30
conformer n-gram 8 40 8 A30
conformer n-gram 16 53 16 A30
conformer n-gram 32 66 32 A30
conformer n-gram 48 91 47 A30

A30 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
citrinet n-gram 32 2500 A30
conformer n-gram 32 1020 A30

A10 ASR Benchmarks - Best Streaming Throughput Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 12.18 1 A10
citrinet n-gram 64 83 63 A10
citrinet n-gram 128 150 126 A10
citrinet n-gram 256 292 247 A10
citrinet n-gram 384 433 363 A10
citrinet n-gram 512 600 476 A10
conformer n-gram 1 15.66 1 A10
conformer n-gram 64 140 63 A10
conformer n-gram 128 240 125 A10
conformer n-gram 256 440 245 A10

A10 ASR Benchmarks - Best Streaming Latency Mode

Acoustic Model Language Model # of Streams Avg Latency (ms) Throughput (RTFX) GPU Version
citrinet n-gram 1 11.53 1 A10
citrinet n-gram 8 18 8 A10
citrinet n-gram 16 30 16 A10
citrinet n-gram 32 50 32 A10
citrinet n-gram 48 68 48 A10
citrinet n-gram 64 80 63 A10
conformer n-gram 1 13.988 1 A10
conformer n-gram 8 27 8 A10
conformer n-gram 16 39 16 A10
conformer n-gram 32 70 32 A10
conformer n-gram 48 102 47.445 A10

A10 ASR Benchmarks - Offline Mode

Acoustic Model Language Model # of Streams Throughput (RTFX) GPU Version
citrinet n-gram 32 2400 A10
conformer n-gram 32 920 A10

H100 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 20 2.67 150 H100 SXM5-80GB
FastPitch+Hifi-GAN 4 30 3.92 420 H100 SXM5-80GB
FastPitch+Hifi-GAN 6 50 4.6 450 H100 SXM5-80GB
FastPitch+Hifi-GAN 8 60 5.3 510 H100 SXM5-80GB
FastPitch+Hifi-GAN 10 68 5.6 530 H100 SXM5-80GB

TTS Throughput (RTFX) - Number of seconds of audio generated per second | Riva version: v2.8.0 | TTS Dataset - LJSpeech | Hardware: DGX H100 (1x H100 SXM5-80GB) with Platinum 8480@2.00GHz, GIGABYTE G482-Z54-00 (1x NVIDIA L40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA L4) with EPYC 7763@2.45GHz, DGX A100 (1x A100 SXM4-40GB) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A40) with EPYC 7763@2.45GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A30) with EPYC 7742@2.25GHz, GIGABYTE G482-Z52-00 (1x NVIDIA A10) with EPYC 7763@2.45GHz
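
As a rough illustration of what the TTS throughput figures imply, the sketch below converts an aggregate RTFX value into the wall-clock time needed to generate a given amount of audio. `synthesis_seconds` is a hypothetical helper, not a Riva API. The inputs are the 1-stream and 10-stream rows of the H100 TTS table above, and the calculation assumes the reported throughput is sustained for the whole job.

```python
def synthesis_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of speech at a given RTFX."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


ONE_HOUR = 3600.0  # seconds of audio to generate

# Aggregate RTFX values taken from the H100 TTS table (1 stream and 10 streams).
print(synthesis_seconds(ONE_HOUR, 150))            # 24.0
print(round(synthesis_seconds(ONE_HOUR, 530), 2))  # 6.79
```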

L40 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 20 2.54 160 NVIDIA L40
FastPitch+Hifi-GAN 4 40 4 350 NVIDIA L40
FastPitch+Hifi-GAN 6 60 5 400 NVIDIA L40
FastPitch+Hifi-GAN 8 80 5.5 400 NVIDIA L40
FastPitch+Hifi-GAN 10 90 6 430 NVIDIA L40

L4 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 24 3.57 130 NVIDIA L4
FastPitch+Hifi-GAN 4 50 7 250 NVIDIA L4
FastPitch+Hifi-GAN 6 80 9.4 255 NVIDIA L4
FastPitch+Hifi-GAN 8 115 11 260 NVIDIA L4
FastPitch+Hifi-GAN 10 133 12.4 262 NVIDIA L4

A100 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 20 3.14 140 A100 SXM4-40GB
FastPitch+Hifi-GAN 4 43 5 320 A100 SXM4-40GB
FastPitch+Hifi-GAN 6 60 6 360 A100 SXM4-40GB
FastPitch+Hifi-GAN 8 73 7.4 400 A100 SXM4-40GB
FastPitch+Hifi-GAN 10 80 8 420 A100 SXM4-40GB

A30 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 24 4 120 A30
FastPitch+Hifi-GAN 4 50 7 250 A30
FastPitch+Hifi-GAN 6 84 7.6 270 A30
FastPitch+Hifi-GAN 8 103 8.5 300 A30
FastPitch+Hifi-GAN 10 120 9.03 310 A30

A10 TTS Benchmarks

Model # of streams Avg Latency to first audio (sec) Avg Latency between audio chunks (sec) Throughput (RTFX) GPU Version
FastPitch+Hifi-GAN 1 20 3.9 120 A10
FastPitch+Hifi-GAN 4 57 7.5 230 A10
FastPitch+Hifi-GAN 6 94 8.6 240 A10
FastPitch+Hifi-GAN 8 118 10 260 A10
FastPitch+Hifi-GAN 10 140 11 264 A10

View More Performance Data

Training to Convergence

Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. This is the best methodology to test whether AI systems are ready to be deployed in the field to deliver meaningful results.

AI Inference

Real-world inferencing demands high throughput and low latencies with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.
