AMD Instinct MI100 GPU

Unleash Intelligence Everywhere

CUSTOM CONFIGURE YOUR
AMD Instinct™ MI100 GPU SYSTEM

All solutions are TAA compliant.

THE AMD INSTINCT™ MI100 GPU SERVER PLATFORMS ARE ALSO AVAILABLE ON OUR GSA IT SCHEDULE 70, NASA SEWP V, AND NITAAC CIO-CS CONTRACTS.

AMD Instinct™ MI100 GPU

The AMD Instinct™ MI100 GPU accelerator is the world’s fastest HPC GPU, engineered from the ground up for the new era of computing. [1] Powered by the first-generation AMD CDNA architecture, the MI100 accelerator delivers a giant leap in compute and interconnect performance, offering a nearly 3.5x boost in HPC (FP32 matrix) performance and a nearly 7x boost in AI throughput (FP16) performance compared to AMD’s prior-generation accelerators.
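For context, the boosts quoted above follow directly from the peak theoretical figures given in End Notes 1 and 2:

\[
\frac{46.1\ \text{TFLOPS (MI100, FP32 matrix)}}{13.25\ \text{TFLOPS (MI50, FP32 matrix)}} \approx 3.5\times,
\qquad
\frac{184.6\ \text{TFLOPS (MI100, FP16)}}{26.5\ \text{TFLOPS (MI50, FP16)}} \approx 7.0\times
\]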

AMD Instinct™ MI100 GPU Features and Benefits

All-new AMD CDNA architecture
  • 2x the density to accelerate HPC & AI workloads compared to AMD’s prior gen [9]
  • As much as 75% higher FP64 performance to accelerate HPC workloads compared to AMD’s prior gen [10]
All-new Matrix Core technology
  • A giant leap in AI matrix performance with FP32 and FP16, introducing new bfloat16 operations
  • FP32 Matrix Core for a nearly 3.5x boost for HPC & AI workloads vs. AMD’s prior gen [2]
  • FP16 Matrix Core for a nearly 7x boost running AI workloads vs. AMD’s prior gen [2]
  • Support for newer ML operators like bfloat16
  • Enables working with large models and enhances memory-bound operation performance
  • Superior performance across the full range of mixed-precision operations
2nd Generation Infinity Architecture
  • Advanced platform connectivity and scalability
  • 3 Infinity Fabric™ links for full quad-GPU hive connectivity
  • Improved GPU-to-GPU connectivity and scalability
  • Fully connected quad-GPU hives with up to 552 GB/s peak peer-to-peer theoretical I/O bandwidth [7]
  • AMD Infinity Fabric™ technology provides ~2x the peak peer-to-peer I/O bandwidth of PCIe® 4.0 for fast data sharing in a quad-GPU hive [7]
  • AMD Infinity Fabric™ technology provides up to 37% more peak GPU peer-to-peer (P2P) aggregate theoretical I/O bandwidth than the prior gen, enabling faster data sharing [11]
Enhanced HBM2 memory interface
  • ~20% more theoretical memory throughput and improved latencies vs. AMD’s prior-gen accelerators [8]
Enhanced RAS and security capabilities
  • Helps keep your data safe and secure
  • Enterprise-grade RAS for consistent performance & uptime, with remote manageability capabilities
ROCm™ 4.0 ecosystem: open, flexible and portable
  • Code once, use it everywhere (see the HIP sketch after this list)
  • Designed to maximize developer productivity
  • Open stack for developer innovation
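To illustrate the "code once, use it everywhere" point, here is a minimal HIP sketch (illustrative only, not an official AMD sample; the kernel, buffer names and sizes are made up). The same single-source SAXPY builds with hipcc for an MI100 under ROCm and can also be compiled for CUDA-capable devices through HIP's portability layer:

    // Minimal HIP SAXPY: y = a*x + y, runnable on an MI100 under ROCm.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        // Allocate device buffers and copy the inputs over.
        float *dx = nullptr, *dy = nullptr;
        hipMalloc((void**)&dx, n * sizeof(float));
        hipMalloc((void**)&dy, n * sizeof(float));
        hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // 256 threads per block, enough blocks to cover all n elements.
        hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                           n, 2.0f, dx, dy);
        hipDeviceSynchronize();

        hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("y[0] = %.1f (expected 4.0)\n", hy[0]);

        hipFree(dx);
        hipFree(dy);
        return 0;
    }

On a ROCm system this can be built with, for example, hipcc saxpy.cpp -o saxpy.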

AMD INSTINCT™ MI100 GPU KEY FEATURES

PERFORMANCE:

  • Compute Units: 120
  • Stream Processors: 7,680
  • Peak BFLOAT16: Up to 92.3 TFLOPS
  • Peak INT4 | INT8: Up to 184.6 TOPS
  • Peak FP16: Up to 184.6 TFLOPS
  • Peak FP32 Matrix: Up to 46.1 TFLOPS
  • Peak FP32: Up to 23.1 TFLOPS
  • Peak FP64: Up to 11.5 TFLOPS (see the worked calculation after this list)
  • Bus Interface: PCIe® Gen 3 and Gen 4 Support
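As a cross-check (assuming the standard 2 FLOPs per fused multiply-add per clock, with FP64 at half the FP32 rate), the peak vector FP32 and FP64 figures above follow from the stream processor count and the 1,502 MHz peak boost clock cited in End Note 1:

\[
7680 \times 2\ \tfrac{\text{FLOP}}{\text{clock}} \times 1.502\ \text{GHz} \approx 23.1\ \text{TFLOPS (FP32)},
\qquad
23.1 / 2 \approx 11.5\ \text{TFLOPS (FP64)}
\]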

MEMORY:

  • Memory Size: 32 GB HBM2
  • Memory Interface: 4,096-bit
  • Memory Clock: 1.2 GHz
  • Memory Bandwidth: Up to 1.2 TB/s (see the worked calculation after this list)
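The bandwidth figure is consistent with the interface width and memory clock listed above, since HBM2 transfers data on both clock edges (see also End Note 8):

\[
4096\ \text{bits} \times 2 \times 1.2\ \text{GHz} \div 8\ \tfrac{\text{bits}}{\text{byte}} \approx 1228.8\ \text{GB/s} \approx 1.2\ \text{TB/s}
\]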

RELIABILITY:

  • ECC (Full-chip): Yes
  • RAS Support: Yes

SCALABILITY:

  • Infinity Fabric™ Links: 3
  • OS Support: Linux® 64-bit
  • AMD ROCm™ Compatible: Yes

BOARD DESIGN:

  • Board Form Factor: Full-Height, Dual Slot
  • Length: 10.5”
  • Thermal: Passively Cooled
  • Max Power: 300W TDP

CONTACT US TO PURCHASE YOUR CUSTOMIZED
AMD Instinct™ MI100 GPU SOLUTION TODAY

End Notes
  • 1.    Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), 46.1 TFLOPS peak theoretical single precision matrix (FP32), 23.1 TFLOPS peak theoretical single precision (FP32), and 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance. Published results on the NVIDIA Ampere A100 (40GB) GPU accelerator resulted in 9.7 TFLOPS peak double precision (FP64), 19.5 TFLOPS peak single precision (FP32), and 78 TFLOPS peak half precision (FP16) theoretical floating-point performance. Server manufacturers may vary configuration offerings yielding different results. MI100-03

  • 2.    Calculations performed by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 accelerator at 1,502 MHz peak boost engine clock resulted in 184.57 TFLOPS peak theoretical half precision (FP16) and 46.14 TFLOPS peak theoretical single precision (FP32) matrix floating-point performance. The results calculated for Radeon Instinct™ MI50 GPU at 1,725 MHz peak engine clock resulted in 26.5 TFLOPS peak theoretical half precision (FP16) and 13.25 TFLOPS peak theoretical single precision (FP32) matrix floating-point performance. Server manufacturers may vary configuration offerings yielding different results. MI100-04

  • 3.    Works with PCIe® Gen 4.0 and Gen 3.0 compliant motherboards. Performance may vary from motherboard to motherboard. Refer to system or motherboard provider for individual product performance and features.

  • 4.    ECC support on AMD Instinct™ GPU cards based on “AMD CDNA” technology includes full-chip ECC, including HBM2 memory and internal GPU structures.

  • 5.    Expanded RAS (reliability, availability and serviceability) attributes have been added to AMD Instinct™ “AMD CDNA” technology-based GPU cards and their supporting ecosystem, including software, firmware and system-level features. AMD’s remote manageability capabilities using advanced out-of-band circuitry allow for easier GPU monitoring via I2C, regardless of the GPU state. For full system RAS capabilities, refer to the system manufacturer’s guidelines for specific system models.

  • 6.    The AMD Instinct™ accelerator products come with a three-year limited warranty. Please visit www.AMD.com/warranty for warranty details on the specific graphics products purchased. Toll-free phone service is available in the U.S. and Canada only; email access is global.

  • 7.    Calculations as of Sep 18th, 2020. AMD Instinct™ MI100 accelerators, built on AMD CDNA technology and supporting PCIe® Gen4, provide up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card. AMD Instinct™ MI100 accelerators include three Infinity Fabric™ links providing up to 276 GB/s peak theoretical GPU-to-GPU or Peer-to-Peer (P2P) transport rate bandwidth performance per GPU card. Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. MI100s have three links: 92 GB/s * 3 links per GPU = 276 GB/s. Four-GPU hives provide up to 552 GB/s peak theoretical P2P performance. Dual 4-GPU hives in a server provide up to 1.1 TB/s total peak theoretical direct P2P performance per server. With AMD Infinity Fabric link technology not enabled, four-GPU hives provide up to 256 GB/s peak theoretical P2P performance with PCIe® 4.0. Server manufacturers may vary configuration offerings yielding different results. MI100-07

  • 8.    Calculations by AMD Performance Labs as of Oct 5th, 2020 for the AMD Instinct™ MI100 accelerator designed with AMD CDNA 7nm FinFET process technology at 1,200 MHz peak memory clock resulted in 1.2288 TB/s peak theoretical memory bandwidth performance. The results calculated for the Radeon Instinct™ MI50 GPU designed with “Vega” 7nm FinFET process technology with 1,000 MHz peak memory clock resulted in 1.024 TB/s peak theoretical memory bandwidth performance. CDNA-04

  • 9.    The AMD Instinct™ MI100 accelerator has 120 compute units (CUs) and 7,680 stream cores at 300W. The Radeon Instinct™ MI50 GPU has 60 CUs and 3,840 stream cores at 300W. CDNA-02

  • 10.    Calculations by AMD Performance Labs as of OCT 5th, 2020 for the AMD Instinct™ MI100 accelerator designed with AMD CDNA 7nm FinFET process technology at 1,502 MHz peak boost engine clock resulted in 11.535 TFLOPS peak theoretical double precision (FP64) floating-point performance @ 300W. The results calculated for Radeon Instinct™ MI50 GPU designed with “Vega” 7nm FinFET process technology with 1,725 MHz peak engine clock resulted in 6.62 TFLOPS peak theoretical double precision (FP64) floating-point performance @ 300W. Performance efficiency is TFLOPS/watts. Server manufacturers may vary configuration offerings yielding different results. MI100-10

  • 11.    Calculations as of SEP 18th, 2020. AMD Instinct™ MI100 accelerators support PCIe® Gen4 providing up to 64 GB/s peak theoretical transport data bandwidth from CPU to GPU per card. AMD Instinct™ MI100 accelerators include three Infinity Fabric™ links providing up to 276 GB/s peak theoretical GPU to GPU or Peer-to-Peer (P2P) transport rate bandwidth performance per GPU card. Combined with PCIe Gen4 support, this provides an aggregate GPU card I/O peak bandwidth of up to 340 GB/s. Server manufacturers may vary configuration offerings yielding different results. MI100-06