

NVIDIA’s complete solution stack, from GPUs to libraries to containers on NVIDIA GPU Cloud (NGC), lets data scientists get up and running with deep learning quickly. NVIDIA® Tesla® V100 Tensor Core GPUs use mixed precision to accelerate deep learning training throughput across every framework and every type of neural network. NVIDIA breaks performance records on MLPerf, the first industry-wide AI benchmark, a testament to our GPU-accelerated platform approach.

NVIDIA Performance on MLPerf 0.6 AI Benchmarks

ResNet-50 v1.5 Time to Solution on V100

MXNet | Batch Size: refer to the CNN V100 Training table below | Precision: Mixed | Dataset: ImageNet2012 | Convergence criteria: refer to MLPerf requirements

Training Image Classification on CNNs

ResNet-50 V1.5 Throughput on V100

DGX-1: 8x Tesla V100 32GB, E5-2698 v4 2.2 GHz | Batch Size = 256 for MXNet, PyTorch; 128 for TensorFlow | 19.05-py3 | Precision: Mixed | Dataset: ImageNet2012

ResNet-50 V1.5 Throughput on T4

Supermicro SYS-4029GP-TRT T4: 8x Tesla T4 16GB, Gold 6140 2.3 GHz | Batch Size = 208 for MXNet, 256 for PyTorch, 128 for TensorFlow | 19.05-py3 | Precision: Mixed | Dataset: ImageNet2012

Training Performance

NVIDIA Performance on MLPerf 0.6 AI Benchmarks

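Framework | Network | Network Type | Time to Solution | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version
MXNet | ResNet-50 v1.5 | CNN | 115.22 minutes | 8x V100 | DGX-1 | 0.6-8 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 57.87 minutes | 16x V100 | DGX-2 | 0.6-17 | Mixed | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 52.74 minutes | 16x V100 | DGX-2H | 0.6-19 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 2.59 minutes | 512x V100 | DGX-2H | 0.6-29 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 1.69 minutes | 1040x V100 | DGX-1 | 0.6-16 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1.33 minutes | 1536x V100 | DGX-2H | 0.6-30 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 22.36 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 12.21 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | SSD-ResNet-34 | CNN | 11.41 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 4.78 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.67 minutes | 240x V100 | DGX-1 | 0.6-13 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 2.56 minutes | 240x V100 | DGX-2H | 0.6-24 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.23 minutes | 240x V100 | DGX-2H | 0.6-27 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 207.48 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 101 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 95.2 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 32.72 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 22.03 minutes | 192x V100 | DGX-1 | 0.6-12 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 18.47 minutes | 192x V100 | DGX-2H | 0.6-23 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 20.55 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 10.94 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT | RNN | 9.87 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 2.12 minutes | 256x V100 | DGX-2H | 0.6-25 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 1.99 minutes | 384x V100 | DGX-1 | 0.6-14 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 1.8 minutes | 384x V100 | DGX-2H | 0.6-26 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 20.34 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 11.04 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT17 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 9.8 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.41 minutes | 160x V100 | DGX-2H | 0.6-22 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.05 minutes | 480x V100 | DGX-1 | 0.6-15 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 1.59 minutes | 480x V100 | DGX-2H | 0.6-28 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
TensorFlow | MiniGo | Reinforcement Learning | 27.39 minutes | 8x V100 | DGX-1 | 0.6-10 | Mixed | N/A | V100-SXM2-16GB
TensorFlow | MiniGo | Reinforcement Learning | 13.57 minutes | 24x V100 | DGX-1 | 0.6-11 | Mixed | N/A | V100-SXM2-16GB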
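
One way to read the time-to-solution figures is to convert each run into total GPU time. The sketch below is illustrative only: the helper names are hypothetical, and the four (GPU count, minutes) pairs are copied from the MXNet ResNet-50 v1.5 rows of the MLPerf 0.6 table. It assumes that GPU count multiplied by wall-clock minutes is a fair proxy for cost, which ignores differences between DGX-1 and DGX-2H systems.

```python
# (GPU count, time to solution in minutes), from the ResNet-50 v1.5 MLPerf 0.6 rows.
RESNET50_RUNS = [
    (8, 115.22),    # DGX-1
    (16, 52.74),    # DGX-2H
    (512, 2.59),    # DGX-2H
    (1536, 1.33),   # DGX-2H
]


def gpu_minutes(gpus, minutes):
    """Total GPU time consumed by one training run."""
    return gpus * minutes


def speedup(baseline_minutes, minutes):
    """How many times faster a run finished than the baseline run."""
    return baseline_minutes / minutes


baseline = RESNET50_RUNS[0][1]
for gpus, minutes in RESNET50_RUNS:
    print(f"{gpus:>5} GPUs: {speedup(baseline, minutes):6.1f}x faster than 8 GPUs, "
          f"{gpu_minutes(gpus, minutes):8.1f} GPU-minutes")
```

Under these assumptions, the largest runs finish far sooner but consume more total GPU time than the 8-GPU run, which is the usual tradeoff when scaling out for time to solution.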

V100 Training Performance

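Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 511 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 584 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | Inception V3 | CNN | 4063 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 4520 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 | CNN | 1409 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 1442 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 | CNN | 10380 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 10530 images/sec | 8x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 1419 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 208 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1580 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 9502 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | ResNet-50 v1.5 | CNN | 11056 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 11507 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 537 images/sec | 1x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Inception V3 | CNN | 572 images/sec | 1x V100 | DGX-2 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | Inception V3 | CNN | 4156 images/sec | 8x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 14 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 17 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 84 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | NCF | CNN | 19839034 samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
PyTorch | NCF | CNN | 23398907 samples/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | NCF | CNN | 95524496 samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
PyTorch | NCF | CNN | 106267011 samples/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | ResNet-50 | CNN | 849 images/sec | 1x V100 | DGX-1 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 | CNN | 898 images/sec | 1x V100 | DGX-2 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 | CNN | 6675 images/sec | 8x V100 | DGX-1 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 v1.5 | CNN | 855 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 v1.5 | CNN | 920 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 v1.5 | CNN | 6791 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | SSD | CNN | 257 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | SSD | CNN | 275 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM3-32GB
PyTorch | SSD | CNN | 2017 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 2553 total input tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | Tacotron2 | CNN | 2745 total input tokens/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | Tacotron2 | CNN | 3054 total input tokens/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | Tacotron2 | CNN | 17185 total input tokens/sec | 8x V100 | DGX-1 | 19.04-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 18265 total input tokens/sec | 8x V100 | DGX-2 | 19.04-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | WaveGlow | CNN | 73120 output samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | WaveGlow | CNN | 83975 output samples/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | WaveGlow | CNN | 533364 output samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 537 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 572 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | Inception V3 | CNN | 4078 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | NCF | CNN | 24393781 samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | NCF | CNN | 65260213 samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | ResNet-50 | CNN | 857 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 | CNN | 921 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | ResNet-50 | CNN | 6704 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | ResNet-50 | CNN | 7022 images/sec | 8x V100 | DGX-2H | 19.01-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | ResNet-50 v1.5 | CNN | 785 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 v1.5 | CNN | 846 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | ResNet-50 v1.5 | CNN | 6295 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 123 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 136 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | SSD | CNN | 665 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 770 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 100 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 107 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 16 | DAGM2007 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 517 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 2 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 546 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 2 | DAGM2007 | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 78347 total tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
PyTorch | GNMT V2 | RNN | 83860 total tokens/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 598342 total tokens/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | GNMT V2 | RNN | 24899 total tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
TensorFlow | GNMT V2 | RNN | 25733 total tokens/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM3-32GB-H
TensorFlow | GNMT V2 | RNN | 138216 total tokens/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | BERT | Attention | 29 sentences/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 43748 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT | Attention | 32 sentences/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 43748 | SQuAD v1.1 | V100-SXM3-32GB
TensorFlow | BERT | Attention | 138 sentences/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 43748 | SQuAD v1.1 | V100-SXM2-32GB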
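
The 1x and 8x rows for the same network and server can be compared to estimate how close multi-GPU training comes to linear scaling. The following is a minimal sketch with a hypothetical helper name. The figures are taken from two DGX-1 pairs in the V100 table (MXNet Inception V3 and TensorFlow NCF), and the calculation assumes the 1x and 8x runs are otherwise comparable, which the table does not guarantee when batch size or GPU memory differs between rows.

```python
def scaling_efficiency(single_gpu_rate, multi_gpu_rate, gpus):
    """Fraction of ideal linear scaling achieved; 1.0 means perfectly linear."""
    return multi_gpu_rate / (gpus * single_gpu_rate)


# (label, 1x V100 throughput, 8x V100 throughput), both measured on DGX-1.
PAIRS = [
    ("MXNet Inception V3 (images/sec)", 511, 4063),
    ("TensorFlow NCF (samples/sec)", 24393781, 65260213),
]

for label, single, multi in PAIRS:
    print(f"{label}: {scaling_efficiency(single, multi, 8):.1%} of linear")
```

The two pairs land at opposite ends: the image classification network scales almost linearly across eight GPUs, while the recommender workload reaches roughly a third of linear.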

T4 Training Performance

Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 172 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | Inception V3 | CNN | 1359 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 | CNN | 425 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 | CNN | 3329 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 447 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 4116 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 164 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 1253 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.03-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 6 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 39 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | ImageNet2012 | Tesla T4
PyTorch | NCF | CNN | 7134482 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
PyTorch | NCF | CNN | 25428142 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 | CNN | 252 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 | CNN | 2054 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.01-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 282 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 2241 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | SSD | CNN | 86 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | SSD | CNN | 693 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | Tacotron2 | CNN | 1466 total input tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | Tacotron2 | CNN | 9390 total input tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 33256 output samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 250173 output samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
TensorFlow | Inception V3 | CNN | 181 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | Inception V3 | CNN | 1358 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | NCF | CNN | 9573658 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
TensorFlow | NCF | CNN | 19069773 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 | CNN | 294 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 | CNN | 2262 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 272 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 2151 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | SSD | CNN | 52 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | SSD | CNN | 281 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 29 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 16 | DAGM2007 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 196 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 2 | DAGM2007 | Tesla T4
PyTorch | GNMT V2 | RNN | 25992 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
PyTorch | GNMT V2 | RNN | 184525 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 9654 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 53570 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | BERT | Attention | 8 sentences/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4
TensorFlow | BERT | Attention | 31 sentences/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4

 

NVIDIA® TensorRT™ running on NVIDIA GPUs enables the most efficient deep learning inference performance across multiple application areas and models. This versatility gives data scientists wide latitude to create the optimal low-latency solution. Visit NVIDIA GPU Cloud (NGC) to download any of these containers.

NVIDIA Tesla® V100 Tensor Core GPUs leverage mixed precision to combine high throughput with low latency across every type of neural network. Tesla P4 is an inference GPU designed for optimal power consumption and latency in ultra-efficient scale-out servers. Read the inference whitepaper to learn more about NVIDIA’s inference platform.

Measuring inference performance involves balancing many variables. PLASTER is an acronym that describes the key elements for measuring deep learning performance. Each letter identifies a factor (Programmability, Latency, Accuracy, Size of Model, Throughput, Energy Efficiency, Rate of Learning) that must be considered to arrive at the right set of tradeoffs and to produce a successful deep learning implementation. Refer to NVIDIA’s PLASTER whitepaper for more details.

Inference Image Classification on CNNs with TensorRT

ResNet-50 Throughput

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py2 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 Latency

DGX-1: 1x Tesla V100-SXM2-16GB, Platinum 8168 2.7 GHz | TensorRT 5.1 | Batch Size = 1 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 1 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 Power Efficiency

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic

 

Inference Performance

V100 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1579 images/sec | 14 images/sec/watt | 0.63 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 1 | 1686 images/sec | 12 images/sec/watt | 0.59 | 1x V100 | DGX-2 | 19.05-py3 | INT8 | Synthetic | V100-SXM3-32GB
GoogleNet | CNN | 2 | 2129 images/sec | 18 images/sec/watt | 0.94 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 8 | 5140 images/sec | 35 images/sec/watt | 1.6 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 82 | 11805 images/sec | 44 images/sec/watt | 7 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 128 | 12345 images/sec | 45 images/sec/watt | 10 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 1 | 1737 images/sec | 18 images/sec/watt | 0.58 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 2 | 2886 images/sec | 30 images/sec/watt | 0.69 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 8 | 8848 images/sec | 63 images/sec/watt | 0.9 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 32 | 17649 images/sec | 77 images/sec/watt | 1.8 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 128 | 23262 images/sec | 81 images/sec/watt | 5.5 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 1 | 1118 images/sec | 8.2 images/sec/watt | 0.89 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 1 | 1176 images/sec | 7.1 images/sec/watt | 0.85 | 1x V100 | DGX-2 | 19.05-py3 | INT8 | Synthetic | V100-SXM3-32GB
ResNet-50 | CNN | 2 | 1551 images/sec | 11 images/sec/watt | 1.3 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 8 | 3308 images/sec | 21 images/sec/watt | 2.4 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 39 | 5821 images/sec | 22 images/sec/watt | 6.7 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 128 | 7636 images/sec | 27 images/sec/watt | 17 | 1x V100 | DGX-2 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 1 | 934 images/sec | 7.1 images/sec/watt | 1.1 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 2 | 1396 images/sec | 9.8 images/sec/watt | 1.4 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 8 | 3239 images/sec | 20 images/sec/watt | 2.5 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 128 | 7309 images/sec | 25 images/sec/watt | 18 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 1 | 724 images/sec | 3.8 images/sec/watt | 1.4 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 2 | 1133 images/sec | 5.5 images/sec/watt | 1.8 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 8 | 2044 images/sec | 8 images/sec/watt | 3.9 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 128 | 2855 images/sec | 9.9 images/sec/watt | 45 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NMT | RNN | 1 | 4013 total tokens/sec | - | 13 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 2 | 6290 total tokens/sec | - | 16 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 64 | 56531 total tokens/sec | - | 58 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 128 | 73375 total tokens/sec | - | 89 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
Deep Recommender | Recommender | 1 | 5541 images/sec | 48 images/sec/watt | 0.18 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 2 | 10066 images/sec | 88 images/sec/watt | 0.2 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 64 | 282209 images/sec | 1874 images/sec/watt | 0.23 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 128 | 393347 images/sec | 2461 images/sec/watt | 0.33 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 1 | 21824 images/sec | 310 images/sec/watt | 0.05 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 64 | 1335854 images/sec | 16998 images/sec/watt | 0.05 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 25000 | 91680666 images/sec | 730549 images/sec/watt | 0.27 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 100000 | 114022270 images/sec | 694185 images/sec/watt | 0.88 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB

TensorRT 5.1, except TensorRT 5.0 for MobileNet V2 and Deep Recommender
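
The Latency column in the V100 inference table carries no unit. A quick consistency check suggests it is the time to process one batch in milliseconds, that is, batch size divided by throughput. The sketch below is an illustrative check under that assumption; the helper name is hypothetical, and the three rows are V100 ResNet-50 entries read as batch size, images/sec, and reported latency.

```python
def batch_latency_ms(batch_size, images_per_sec):
    """Time to process one batch, in milliseconds, implied by the throughput."""
    return 1000.0 * batch_size / images_per_sec


# (batch size, throughput in images/sec, latency as reported in the table)
RESNET50_V100_ROWS = [
    (1, 1118, 0.89),
    (8, 3308, 2.4),
    (128, 7636, 17),
]

for batch, rate, reported in RESNET50_V100_ROWS:
    implied = batch_latency_ms(batch, rate)
    print(f"batch {batch:>3}: implied {implied:6.2f} ms, reported {reported}")
```

If the implied and reported values agree after rounding, as they appear to for these rows, the same relation can be used to see how much latency a larger batch size costs in exchange for higher throughput.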

 

T4 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1687 images/sec | 25 images/sec/watt | 0.59 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 2 | 2356 images/sec | 35 images/sec/watt | 0.85 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 8 | 5458 images/sec | 78 images/sec/watt | 1.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 52 | 7335 images/sec | 105 images/sec/watt | 7.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 128 | 7516 images/sec | 108 images/sec/watt | 17 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 1 | 1766 images/sec | 34 images/sec/watt | 0.57 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 2 | 3235 images/sec | 55 images/sec/watt | 0.62 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 8 | 7251 images/sec | 106 images/sec/watt | 1.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 32 | 8869 images/sec | 129 images/sec/watt | 3.6 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 128 | 9059 images/sec | 131 images/sec/watt | 14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 1 | 1065 images/sec | 15 images/sec/watt | 0.94 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 2 | 1725 images/sec | 25 images/sec/watt | 1.2 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 8 | 3778 images/sec | 55 images/sec/watt | 2.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 33 | 4723 images/sec | 68 images/sec/watt | 7 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 128 | 5106 images/sec | 74 images/sec/watt | 25 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 1 | 407 images/sec | 5.8 images/sec/watt | 2.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 2 | 656 images/sec | 9.5 images/sec/watt | 3.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 8 | 1379 images/sec | 20 images/sec/watt | 5.8 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 32 | 1598 images/sec | 23 images/sec/watt | 20 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 128 | 1895 images/sec | 27 images/sec/watt | 68 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
NMT | RNN | 1 | 2511 total tokens/sec | 38 tokens/sec/watt | 20 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 2 | 3768 total tokens/sec | 58 tokens/sec/watt | 27 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 8 | 9975 total tokens/sec | 160 tokens/sec/watt | 41 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 128 | 34124 total tokens/sec | 647 tokens/sec/watt | 192 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
Deep Recommender | Recommender | 1 | 3252 images/sec | 47 images/sec/watt | 0.31 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 2 | 6417 images/sec | 94 images/sec/watt | 0.31 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 8 | 23650 images/sec | 344 images/sec/watt | 0.34 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 128 | 219511 images/sec | 3186 images/sec/watt | 0.58 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
NCF | Recommender | 1 | 7623 images/sec | 274 images/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 64 | 488930 images/sec | 16539 images/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 25000 | 50381198 images/sec | 726308 images/sec/watt | 0.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 100000 | 53374338 images/sec | 726308 images/sec/watt | 0.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4

TensorRT 5.1, except TensorRT 5.0 for MobileNet V2, NMT and Deep Recommender
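
Because the Efficiency column is throughput divided by power, dividing throughput by efficiency gives a rough estimate of the power drawn during the run. The sketch below is a back-of-the-envelope illustration with a hypothetical helper name. It uses the batch-size-128 ResNet-50 rows from the T4 and V100 inference tables, and the result is only approximate because the published efficiency values are rounded and the page does not say which power reading was used.

```python
def implied_power_watts(throughput, throughput_per_watt):
    """Approximate power draw implied by a throughput and an efficiency figure."""
    return throughput / throughput_per_watt


# (GPU, throughput in images/sec, efficiency in images/sec/watt), ResNet-50 at batch size 128.
RESNET50_BATCH128 = [
    ("Tesla T4", 5106, 74),
    ("Tesla V100", 7636, 27),
]

for gpu, rate, efficiency in RESNET50_BATCH128:
    print(f"{gpu}: about {implied_power_watts(rate, efficiency):.0f} W")
```

Read this way, the V100 delivers higher absolute throughput while the T4 delivers more images per watt, which matches the split the ResNet-50 Throughput and Power Efficiency charts draw between the two GPUs.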

 

Last updated: June 19th, 2019