A quick tutorial on distributed data parallel training in PyTorch with multiple GPUs, so that beginners can start training in just a few minutes.

DP vs. DDP
PyTorch itself provides two implementations for multi-GPU training.
DataParallel (DP): Parameter Server style, with one GPU acting as the reducer. Extremely simple to use: it takes a single line of code.
DistributedDataParallel (DDP): All-Reduce style, designed for multi-node distributed training, but it can also be used for single-node, multi-GPU training.
DataParallel is based on the Parameter Server approach and is very simple to use: starting from the original single-node, single-GPU code, you only need to add one line:
model = nn.DataParallel(model)
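For context, here is a minimal sketch of where that wrapper fits into a training step; the toy model, data, and hyperparameters are placeholders chosen for illustration, not part of the original tutorial.

```python
import torch
import torch.nn as nn

# Toy model and data, just to show where the DP wrapper goes.
model = nn.Linear(10, 2)
model = nn.DataParallel(model)        # replicate the model across all visible GPUs
model = model.cuda()                  # DP expects the model (and inputs) on the default GPU

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(64, 10).cuda()   # each batch is split across GPUs along dim 0
targets = torch.randint(0, 2, (64,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)  # outputs are gathered back onto the default GPU
loss.backward()
optimizer.step()
```

On a machine with several GPUs, DataParallel scatters each input batch along the first dimension, runs the replicas in parallel, and gathers the outputs back onto the default GPU, which is why the rest of the training loop needs no changes.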