NVIDIA V100 and V100S: The World’s Most Powerful GPU

Custom Configure Your NVIDIA V100S Solution

Every organization wants the data insights that can spark the next technological revolution. From enabling autonomous vehicles to simulating global climate patterns, data scientists can transform industries when given the right tools.

The NVIDIA® V100 Tensor Core GPU is the world’s most powerful accelerator for deep learning, machine learning, high performance computing (HPC), and graphics. Powered by NVIDIA Volta™, a single V100 Tensor Core GPU offers the performance of nearly 32 CPUs—enabling researchers to tackle challenges that were once unsolvable. The V100 won MLPerf, the first industry-wide AI benchmark, validating it as the world’s most powerful, scalable, and versatile computing platform.

Volta Tensor Core GPU Innovations

Tensor Core: The Compute Core Built for AI

Every industry needs AI, but not every data scientist has access to an AI supercomputer. Equipped with 640 Tensor Cores, Volta delivers up to 130 teraFLOPS of deep learning and machine learning performance, more than a five-fold leap over the prior-generation NVIDIA Pascal™ architecture. A single Volta GPU outperforms nearly 60 CPUs in deep learning.

New GPU Architecture: World’s Most Powerful Accelerator Delivering Mixed Precision Computing

Humanity’s biggest challenges need the most powerful computing engine to accelerate both computational science and data science. The V100 Tensor Core GPU fuses CUDA Cores and Tensor Cores in a unified architecture to enable mixed-precision calculations: FP64 and FP32 for scientific computing and simulations, and FP16 and INT8 for AI training, inference, and machine learning. This allows highly precise calculations for HPC and highly efficient processing for deep learning and machine learning.
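The trade-off between these formats can be seen directly in their IEEE properties. A minimal NumPy sketch (NumPy standing in for the GPU’s native arithmetic, which behaves the same way for these formats):

```python
import numpy as np

# Machine epsilon: the smallest relative step each format can resolve.
for dtype in (np.float64, np.float32, np.float16):
    print(dtype.__name__, np.finfo(dtype).eps)
# FP64 resolves ~2.2e-16, FP32 ~1.2e-07, FP16 only ~9.8e-04.

# A small update that FP64 captures is rounded away entirely in FP16 --
# which is why HPC simulations lean on FP64/FP32 while deep learning
# tolerates (and exploits) FP16's higher throughput.
update = 1e-4
print(np.float64(1.0) + np.float64(update) == np.float64(1.0))  # False: captured
print(np.float16(1.0) + np.float16(update) == np.float16(1.0))  # True: lost
```

In mixed-precision training this loss is managed deliberately (e.g. by keeping FP32 master weights), which is exactly what the Tensor Core datapath is built around.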

Next-Generation NVLink: 10X Faster Than PCIe

NVLink, the revolutionary high-speed interconnect designed to scale applications across multiple GPUs, delivers 2X higher throughput compared to its previous generation. Up to eight V100 Tensor Core GPU accelerators can be interconnected at up to 300 GB/s to unleash the highest application performance possible on a single scale-up server.
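The “10X” headline follows directly from the interconnect figures above. A back-of-envelope sketch (the 16 GB payload is an illustrative assumption, not an NVIDIA figure):

```python
# Peak interconnect bandwidths from the spec table (GB/s).
PCIE_GEN3_GBPS = 32
NVLINK_GBPS = 300

payload_gb = 16  # hypothetical gradient exchange / checkpoint transfer

pcie_seconds = payload_gb / PCIE_GEN3_GBPS    # 0.5 s
nvlink_seconds = payload_gb / NVLINK_GBPS     # ~0.053 s

speedup = NVLINK_GBPS / PCIE_GEN3_GBPS        # 9.375, i.e. "nearly 10X"
print(f"PCIe: {pcie_seconds:.3f}s  NVLink: {nvlink_seconds:.3f}s  speedup: {speedup:.1f}x")
```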

32 GB HBM2: Unprecedented Computing Efficiency

Applications often spend more time and energy waiting for data than processing it. The V100 Tensor Core GPU tightly integrates compute and data on the same package with HBM2 technology to deliver unprecedented computational efficiency. It provides a huge leap in application performance by delivering a class-leading 1134 GB/s of memory bandwidth. And with 32 GB of memory, twice that of the prior generation, it accelerates time to solution by efficiently processing the largest of datasets while minimizing precious research cycles spent on memory optimizations.
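For bandwidth-bound workloads, the quoted figures translate directly into kernel time. A quick sketch of one full pass over device memory at each bandwidth (a common lower bound for memory-limited kernels such as stencils and reductions):

```python
# Memory bandwidths from the spec table (GB/s) and full device memory (GB).
V100_GBPS = 900
V100S_GBPS = 1134
MEMORY_GB = 32

# Time for one complete read of device memory.
t_v100 = MEMORY_GB / V100_GBPS      # ~35.6 ms
t_v100s = MEMORY_GB / V100S_GBPS    # ~28.2 ms

uplift = V100S_GBPS / V100_GBPS     # 1.26: V100S streams 26% more data per second
print(f"V100: {t_v100*1000:.1f} ms  V100S: {t_v100s*1000:.1f} ms  uplift: {uplift:.2f}x")
```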

Complete V100 Tensor Core GPU Specifications

SKU                                      | V100 PCIe           | V100 SXM2           | V100S PCIe
GPU Architecture                         | NVIDIA Volta        | NVIDIA Volta        | NVIDIA Volta
CUDA Cores                               | 5120                | 5120                | 5120
Tensor Cores                             | 640                 | 640                 | 640
Form Factor                              | PCIe Gen3           | NVIDIA NVLink       | PCIe Gen3
Optimal Workload                         | Single GPU (1 to 3) | Multi-GPU (4 to 8)  | Single GPU (1 to 3)
Double-Precision Performance (teraFLOPS) | 7                   | 7.8                 | 8.2
Single-Precision Performance (teraFLOPS) | 14                  | 15.7                | 16.4
Tensor Performance (teraFLOPS)           | 112                 | 125                 | 130
HBM2 Memory                              | 32 GB / 16 GB       | 32 GB / 16 GB       | 32 GB
Memory Bandwidth (GB/s)                  | 900                 | 900                 | 1134
Interconnect Bandwidth (GB/s)            | 32                  | 300                 | 32
Max Power Consumption                    | 250 W               | 300 W               | 250 W
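One useful way to read the table is performance per watt; a quick sketch using the tensor performance and power rows:

```python
# Tensor performance (teraFLOPS) and max power (W) from the spec table above.
skus = {
    "V100 PCIe":  {"tensor_tflops": 112, "power_w": 250},
    "V100 SXM2":  {"tensor_tflops": 125, "power_w": 300},
    "V100S PCIe": {"tensor_tflops": 130, "power_w": 250},
}

efficiency = {name: s["tensor_tflops"] / s["power_w"] for name, s in skus.items()}
for name, eff in efficiency.items():
    print(f"{name}: {eff:.3f} teraFLOPS/W")
# The V100S PCIe leads at 0.52 teraFLOPS/W; the SXM2 part trades some
# efficiency for the highest absolute throughput plus NVLink scaling.
```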

NVIDIA Data Center Platform

The NVIDIA Data Center Platform uses optimized CUDA and NVIDIA Deep Learning SDK libraries such as cuDNN, NCCL, NVIDIA TensorRT™, and RAPIDS to accelerate the most important industry frameworks and applications with the power of Volta. Combined with out-of-the-box deep learning models from NVIDIA NGC that are accelerated by Tensor Core capabilities, data scientists and researchers can achieve breakthroughs and discoveries faster than ever before.


NGC is the hub for GPU-optimized software for deep learning, machine learning, and HPC. It’s designed specifically for data scientists, developers, and researchers. Advantages include:

  • Accelerate Time-to-Solution: NGC accelerates productivity with easy-to-deploy, optimized AI frameworks and HPC application containers.
  • Simplify AI Adoption: NGC lowers the barrier to AI adoption by alleviating the need for expertise, time, and compute resources with pre-trained models and workflows that have best-in-class accuracy and performance.
  • Run with NVIDIA GPUs Anywhere: Run software from NGC on premises, in the cloud, at the edge, or in hybrid and multi-cloud deployments. NGC software can be deployed on bare-metal servers or in virtualized environments.
  • Deploy NGC Software with Confidence: Enterprise-grade support for NGC Ready systems provides direct access to NVIDIA's experts, minimizing risk and maximizing system utilization and user productivity.

NGC software runs on a wide variety of NVIDIA GPU-accelerated platforms, including:

  • NGC-Ready servers for edge and data center
  • NVIDIA DGX™ Systems
  • Workstations with NVIDIA TITAN and NVIDIA Quadro® GPUs
  • Virtualized environments with NVIDIA vComputeServer
  • Top cloud platforms

NGC provides a range of options to accommodate various levels of AI expertise. Users can quickly deploy AI frameworks with containers, get a head start with pre-trained models or model training scripts, and use domain specific workflows and Helm charts for the fastest AI implementations.



The RAPIDS suite of open source software libraries and APIs enables end-to-end data science and analytics pipelines entirely on GPUs. Available on NGC and licensed under Apache 2.0, it uses NVIDIA CUDA primitives for low-level compute optimization and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar data frame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs.

  • Accelerated Data Science: The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs.
  • Scale Out On GPUs: Seamlessly scale from GPU workstations to multi-GPU servers and multi-node clusters with Dask.
  • Python Integration: Accelerate your Python data science tool chain with minimal code changes and no new tools to learn.
  • Top Model Accuracy: Increase machine learning model accuracy by iterating on models faster and deploying them more frequently.
  • Reduced Training Time: Drastically improve your productivity with more interactive data science.

RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes.

The RAPIDS suite of software libraries, running on an NVIDIA-powered data science workstation, enables data scientists to load, prepare, and visualize massive datasets, allowing them to quickly understand and extract insights from their data.
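The “minimal code changes” claim rests on RAPIDS cuDF mirroring the pandas DataFrame API. The pandas sketch below (column names and values are illustrative) would run unchanged on a GPU by swapping the import for `import cudf as pd`, assuming RAPIDS is installed on a CUDA-capable machine:

```python
import pandas as pd  # with RAPIDS installed: import cudf as pd

# Illustrative data-preparation pipeline (hypothetical sensor data).
df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "reading": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Typical prep steps: filter rows, derive a feature, aggregate.
df = df[df["reading"] > 1.0]
df["reading_sq"] = df["reading"] ** 2
means = df.groupby("sensor")["reading"].mean()

print(means)  # a -> 3.0, b -> 4.0
```

Because cuDF keeps the same DataFrame semantics, the aggregated result can feed directly into GPU-accelerated machine learning without the serialization costs mentioned above.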

Configure Your NVIDIA V100 and V100S Solution Today


  • NGC Deep Learning Technical Overview
  • NVIDIA Tesla Enterprise Server Line Card