
Quick Tutorial on Distributed Data Parallel Training in PyTorch with Multiple GPUs

A quick tutorial on distributed data parallel training in PyTorch with multiple GPUs, so that beginners can start training in just a few minutes.



DP vs. DDP


PyTorch itself provides two implementations for multi-GPU training:


  • DataParallel (DP): Parameter Server mode, where one GPU acts as the reducer; extremely simple to implement, requiring only one line of code.

  • DistributedDataParallel (DDP): All-Reduce mode, intended for multi-node distributed training, but it also works for single-node training with multiple GPUs (a minimal sketch follows this list).
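
To make the contrast concrete, below is a minimal single-node DDP sketch in the All-Reduce style described above. It is illustrative only: the linear model and the random batch are placeholders for your own model and data, and the script is assumed to be launched with torchrun (for example, torchrun --nproc_per_node=NUM_GPUS train.py), which sets the LOCAL_RANK environment variable for each per-GPU process.

# Minimal single-node DDP sketch (illustrative; model and data are placeholders).
# Assumed launch command: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun, one process per GPU
    dist.init_process_group(backend="nccl")       # NCCL is the usual backend for GPU training
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    model = nn.Linear(10, 10).to(device)          # placeholder model
    model = DDP(model, device_ids=[local_rank])   # wrap the model; gradients are all-reduced across processes

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 10, device=device)        # dummy batch for this process
    loss = model(x).sum()
    loss.backward()                               # gradient all-reduce happens during backward
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Each process drives one GPU with its own model replica; gradients are averaged across processes with an all-reduce during the backward pass, so all replicas stay in sync without a central parameter server.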


DataParallel, by contrast, is based on the Parameter Server approach and is relatively simple to use: you only need to add one line to the original single-machine, single-GPU code:


model = nn.DataParallel(model)
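
As a minimal, self-contained sketch of what that looks like in practice (the linear model and random batch below are placeholders for your own model and data):

import torch
import torch.nn as nn

model = nn.Linear(10, 10)              # placeholder model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)     # the single extra line: wrap the model
model = model.cuda()                   # parameters live on the default GPU (cuda:0)

x = torch.randn(32, 10).cuda()         # dummy input batch
out = model(x)                         # DP scatters the batch across GPUs and gathers outputs on cuda:0

Under the hood, DataParallel replicates the model onto every visible GPU on each forward pass, splits the input along the batch dimension, and gathers the outputs back onto the first GPU, which is why that GPU typically becomes the memory and compute bottleneck.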