Analyzing genomic data is computationally intensive. Time and cost are significant barriers to using genomics data for precision medicine. The NVIDIA Parabricks Genomics Analysis Toolkit breaks down those barriers by providing GPU-accelerated genomic analysis. Analyses that once took days can now finish in under an hour. You can run individual accelerated tools or complete, commonly used pipelines, with outputs tailored to your requirements.



NVIDIA Parabricks Free for COVID-19 Research

Full access to Parabricks is available to all organizations doing research on the novel coronavirus for 90 days.
Parabricks enables GPU-accelerated GATK and is separate from the Folding@home initiative.


Before requesting access, please review the minimum requirements to install and run the NVIDIA Parabricks Software Suite.


Request Access


Standard Evaluation License

Get full access to Parabricks for one month with unlimited GPU compute for evaluation purposes.

Get Started

Purchase NVIDIA Parabricks

Contact a sales representative for pricing



Request Quote



Fast

Accelerate existing tools by orders of magnitude by efficiently using the resources on your system.

Equivalent

Accelerated software generates results equivalent to baseline tools, such as GATK best practices.

Scalable

Performance scales linearly with the number of GPU resources on the server or in the cloud.

Deterministic

On any GPU platform with any number of GPU resources, Parabricks generates the exact same results with every execution.
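One way a user might check the determinism claim is to run the same input twice and compare the output files byte for byte. The sketch below does this with SHA-256 digests. The function names are hypothetical, and it assumes the outputs carry no run-specific metadata (for example, timestamps in a file header); if they do, a content-aware comparison would be needed instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(first: Path, second: Path) -> bool:
    """True when two output files are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)
```

Calling `outputs_identical(Path("run1.bam"), Path("run2.bam"))` on the outputs of two runs (hypothetical file names) would then return `True` only if the files match exactly.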

Configurable

All steps in the NVIDIA Parabricks pipeline are readily customizable, and new steps can be added effortlessly.



ANALYSIS PIPELINES

NVIDIA Parabricks’ Germline Pipeline is a GPU-accelerated germline variant calling (SNVs and Indels) pipeline that uses the exact same algorithms as the BWA-GATK4 germline variant analysis. By accelerating the existing CPU-only pipelines on GPUs, the NVIDIA Parabricks Germline Pipeline analyzes an individual sample more than 40 times faster. From FASTQ input files, it generates sorted, duplicate-marked BAM/CRAM files and variant call files (VCF or gVCF). This pipeline reduces the time required to analyze a whole human genome from about 30 hours (BWA-GATK4) to 45 minutes on servers with 8 GPUs, while producing results equivalent to those of standard tools. The pipeline was built from the ground up to optimize speed, accuracy, and cost by using the computational power of GPUs. NVIDIA Parabricks’ pipelines have been tested on Dell, HPE, IBM, and NVIDIA servers, and at Amazon Web Services, Google Cloud, and Microsoft Azure.

Steps in the Pipeline

  1. Alignment of Reads with Reference
  2. Coordinate Sorting
  3. Marking Duplicate BAM Entries
  4. Base Quality Score Recalibration of the Sample
  5. Apply BQSR for the Sample
  6. Germline Variant Calling
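The six steps above can be sketched as an ordered list of stages. This is only an illustration of the data flow, not the Parabricks implementation: the `Stage` class is hypothetical, and the CPU tool shown for each step is an assumption based on a typical BWA-GATK4 workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str           # step name as listed above
    baseline_tool: str  # assumed CPU counterpart in a BWA-GATK4 workflow
    output: str         # main artifact handed to the next stage


GERMLINE_STAGES = [
    Stage("Alignment of reads with reference", "BWA-MEM", "aligned reads"),
    Stage("Coordinate sorting", "GATK SortSam", "sorted BAM"),
    Stage("Marking duplicate BAM entries", "GATK MarkDuplicates", "marked BAM"),
    Stage("Base quality score recalibration", "GATK BaseRecalibrator", "recalibration table"),
    Stage("Apply BQSR", "GATK ApplyBQSR", "recalibrated BAM"),
    Stage("Germline variant calling", "GATK HaplotypeCaller", "VCF or gVCF"),
]

# Print the stages in execution order.
for number, stage in enumerate(GERMLINE_STAGES, start=1):
    print(f"{number}. {stage.name}: {stage.baseline_tool} -> {stage.output}")
```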

Performance

NVIDIA Parabricks’ pipeline is generally 35-50 times faster than CPU-only GATK4 solutions. Users can process 30X whole-genome sequencing data in about 45 minutes. The well-known NA12878 sample at 43X coverage takes nearly 65 minutes to process with the NVIDIA Parabricks pipeline. These speeds are achieved using 8 NVIDIA V100 GPUs. As GPUs become faster, so does the NVIDIA Parabricks pipeline.
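The speedup figure follows directly from the runtimes quoted on this page. As a small worked example (the function name is hypothetical), dividing the roughly 30-hour BWA-GATK4 baseline by the 45-minute accelerated runtime gives a factor that falls inside the stated 35-50 range:

```python
def speedup(baseline_hours: float, accelerated_minutes: float) -> float:
    """Ratio of baseline runtime to accelerated runtime."""
    if accelerated_minutes <= 0:
        raise ValueError("accelerated_minutes must be positive")
    return (baseline_hours * 60.0) / accelerated_minutes


# About 30 hours on CPU versus 45 minutes on 8 GPUs.
print(speedup(30, 45))  # 40.0
```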


Accuracy

NVIDIA Parabricks’ pipelines were designed with accuracy as the first goal. The BAM output after marking duplicates matches GATK4 best practices output 100%, which makes evaluating NVIDIA Parabricks’ software convenient for the end user. The concordance of the VCF file with GATK4 across the entire genome, including non-coding regions, is greater than 99.99%. Additionally, NVIDIA Parabricks’ pipelines are 100% reproducible and deterministic across any hardware configuration and across runs. Comprehensive benchmarking data can be found here.
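To make the VCF concordance figure concrete, here is a minimal sketch of one way to compare two call sets. It assumes a simple definition (shared calls divided by the union of calls, keyed on chromosome, position, reference allele, and alternate allele); the page does not state which definition was used, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per alternate allele from VCF data lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def concordance(first: Set[VariantKey], second: Set[VariantKey]) -> float:
    """Shared calls divided by all calls seen in either set."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0
```

In practice the two inputs would be the data lines of the baseline and accelerated VCF files; dedicated comparison tools also normalize variant representation, which this sketch does not attempt.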

Scalability

NVIDIA Parabricks’ pipelines have been engineered to obtain the maximum performance from computing hardware. Performance scales linearly with the number of GPUs, as shown below. Users can start with a small server for a small number of samples, or use full-scale GPU servers to meet their needs. NVIDIA Parabricks’ pipelines are guaranteed to use hardware in the most efficient way possible.
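As a back-of-the-envelope illustration of what linear scaling implies, the sketch below extrapolates runtime from one GPU count to another. It is an idealized model rather than measured data, the function name is hypothetical, and real runs include fixed costs (such as file I/O) that this ignores.

```python
def ideal_runtime_minutes(reference_minutes: float,
                          reference_gpus: int,
                          target_gpus: int) -> float:
    """Runtime on target_gpus if performance scaled perfectly linearly."""
    if reference_gpus <= 0 or target_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return reference_minutes * reference_gpus / target_gpus


# Starting from the 45-minute, 8-GPU figure quoted for a whole genome.
for gpus in (1, 2, 4, 8):
    print(gpus, ideal_runtime_minutes(45, 8, gpus))
```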


NVIDIA Parabricks’ Somatic Pipeline is a GPU-accelerated somatic short-variant calling (SNVs and Indels) pipeline that uses the exact same algorithms as the BWA-GATK4 somatic pipeline for tumor-normal analysis. By accelerating the existing CPU-only pipelines on GPUs, the NVIDIA Parabricks Somatic Pipeline analyzes an individual sample more than 50 times faster. From FASTQ files for the tumor and normal samples, it generates aligned, sorted, duplicate-marked BAM/CRAM files and variant call files (VCF). The pipeline was built from the ground up to optimize speed, accuracy, and cost by using the computational power of GPUs. This pipeline reduces the time required to analyze a whole human genome at 40X coverage for a tumor sample from about 30 hours (BWA-GATK4) to 30 minutes on servers with 8 GPUs, while producing results equivalent to those of standard tools. NVIDIA Parabricks’ pipelines have been tested on Dell, HPE, IBM, and NVIDIA servers, and at Amazon Web Services, Google Cloud, and Microsoft Azure.

Steps in the Pipeline

  1. Alignment of Tumor and Normal Reads with Reference
  2. Coordinate Sorting of Tumor and Normal Reads
  3. Marking Duplicate BAM Entries Independently for Tumor and Normal Samples
  4. Base Quality Score Recalibration Independently for Tumor and Normal Samples
  5. Apply BQSR Independently for Tumor and Normal Samples
  6. Combined Variant Calling for Tumor and Normal BAMs
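The structure of these steps, where the tumor and normal samples are preprocessed separately and only meet at the final calling step, can be sketched as follows. The step labels and function names are hypothetical placeholders that describe the order of work; they are not Parabricks commands.

```python
from typing import List

# Steps 1-5 above, applied to each sample on its own.
PREPROCESSING_STEPS = ["align", "sort", "mark_duplicates", "recalibrate", "apply_bqsr"]


def preprocess(sample: str) -> List[str]:
    """Describe the per-sample preprocessing work for one sample."""
    return [f"{step}({sample})" for step in PREPROCESSING_STEPS]


def somatic_plan(tumor: str, normal: str) -> List[str]:
    """Preprocess both samples, then call variants on the pair (step 6)."""
    plan = preprocess(tumor) + preprocess(normal)
    plan.append(f"call_variants({tumor}, {normal})")
    return plan


for task in somatic_plan("tumor", "normal"):
    print(task)
```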

Performance

NVIDIA Parabricks’ accelerated Somatic Variant Analysis Pipelines are more than 50 times faster than CPU-only pipelines by using the parallel computing power of GPUs. Our customers have run 100X tumor and 40X normal samples through this pipeline in under 4 hours. This level of speedup can help researchers, hospitals, and clinics sequence more deeply and get the best variant calls from their pipelines. The chart below shows the execution time of tumor-only analysis for our pipeline compared to the BWA-GATK4 pipeline.


Accuracy

NVIDIA Parabricks’ somatic variant calling pipeline produces BAM files after marking duplicates that are 100% equivalent to those of the CPU-only BWA-GATK pipelines, and its variant calls have more than 99.99% concordance with them. This significant computing acceleration is achieved without sacrificing accuracy. By making higher coverage practical, which was not possible earlier, the NVIDIA Parabricks pipeline results in better variant calls.

NVIDIA Parabricks Variant Calling Accuracy

Sample (coverage)    BAM    VCF (F1 Scores)
NIST (41X)           1      0.99996
NA12878 (43X)        1      0.999975
Sample3 (41X)        1      0.99998
Sample2 (42X)        1      0.99998
Sample1 (26X)        1      0.99998
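The VCF column above reports F1 scores, which combine precision and recall into a single number. A minimal sketch of the calculation follows; the counts in the example are hypothetical values chosen for illustration, not the data behind the table, and the page does not say which call set was treated as the reference.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * true_positives / denominator


# Hypothetical counts: 99,998 shared calls, 2 extra calls, 2 missed calls.
print(f1_score(99_998, 2, 2))
```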

The NVIDIA Parabricks Copy Number Variation Pipeline provides rapid analysis of FASTQ files to infer copy number variants for tumor and normal samples. From FASTQ files, NVIDIA Parabricks generates sorted, duplicate-marked BAM/CRAM files and variant call files (VCF). This pipeline reduces the time required to analyze copy number variants, based on whole human genomes, from about 20 hours (BWA-CNVKit) to minutes on servers with 8 GPUs, while achieving 100% equivalent results. NVIDIA Parabricks CNV was built from the ground up to optimize speed, accuracy, and cost by using the computational power of GPUs and has been tested on Dell, HPE, IBM, and NVIDIA servers, and at Amazon Web Services, Google Cloud, and Microsoft Azure.

Analytical Steps

The NVIDIA Parabricks CNV Pipeline Includes:

  1. Alignment of Reads with Reference
  2. Coordinate Sorting
  3. Marking Duplicate BAM Entries
  4. Base Quality Score Recalibration of the Sample
  5. Apply BQSR for the Sample
  6. Copy Number Variant Calling

Performance

The steps in this pipeline from alignment to ApplyBQSR are common across several NVIDIA Parabricks pipelines and have been shown to be 20-30 times faster than a CPU-only solution. NVIDIA Parabricks rebuilt the CNVKit algorithm from the ground up for GPU execution, so that it finishes in 4 minutes for 30X whole-genome sequencing data. NVIDIA Parabricks calculates the germline, somatic, and copy number variants of a sample, resulting in a complete analysis.

Accuracy

The NVIDIA Parabricks CNV Caller generates results that conform 100% to standard CPU CNVKit output.

The NVIDIA Parabricks team is accelerating DeepVariant, the deep-learning-based variant caller from Google, which delivers superior accuracy for SNVs and Indels compared to other industry-standard germline solutions. Although it is more accurate, DeepVariant requires significantly more computing time than other methods. You can learn more here.

Steps in the Pipeline

  1. Alignment (BWA-MEM)
  2. Sorting (Picard)
  3. Marking Duplicates (Picard)
  4. Variant Caller (DeepVariant)

ADDITIONAL RESOURCES

Learn more about accelerating genomic analysis from days to hours with NVIDIA Parabricks’ AI-based workflows.