MindPipe: High-performance and Carbon-efficient Four-dimensional Parallel Training System for Large AI Models

Principal Investigator: Dr. Heming CUI (Associate Professor, Department of Computer Science)

This project is showcased in the second exhibition, "Digitization", in Innovation Wing Two.

About the scholar

Dr. Heming CUI

Research interests:
• Distributed AI training/serving systems
• Blockchain systems
• Cloud computing systems
Email: heming@cs.hku.hk
Website: https://www.cs.hku.hk/people/academic-staff/heming 

Project information

AI Models are Rising to Unprecedented Complexity
Large deep neural network (DNN) models, with their rising complexity and modeling capacity, have achieved unprecedented success in many digitization areas, including natural language, vision, and audio. However, training a large DNN model can easily exceed the capacity of a single GPU:
– A single GPU's limited memory cannot hold the huge number of model parameters.
– A single GPU's limited computing power cannot finish training in a reasonable time.
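To give a sense of scale for the memory limit, here is a back-of-the-envelope sketch. It assumes a common rule of thumb for mixed-precision training with the Adam optimizer of roughly 16 bytes per parameter (weights, gradients, and optimizer states), and it ignores activations, so the result is only a lower bound. The 175-billion-parameter size (that of GPT-3) and the 80 GB per-GPU memory are illustrative inputs, not figures from MindPipe.

```python
import math

# Assumed rule of thumb for mixed-precision Adam training (bytes per parameter):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights, momentum, variance (4 + 4 + 4)
BYTES_PER_PARAM = 16

def model_state_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Memory (GB) for weights, gradients and optimizer states only; activations ignored."""
    return num_params * bytes_per_param / 1e9

def min_gpus(num_params, gpu_memory_gb, bytes_per_param=BYTES_PER_PARAM):
    """Lower bound on the number of GPUs needed just to hold the model states."""
    return math.ceil(model_state_gb(num_params, bytes_per_param) / gpu_memory_gb)

print(model_state_gb(175e9))  # → 2800.0 (GB)
print(min_gpus(175e9, 80))    # → 35
```

Even before counting activations, such a model needs dozens of GPUs just to store its training state, which is why it must be split across devices.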

MindPipe – 4D Parallel Training System
MindPipe, the first 4D parallel training system for large DNN models, has the following objectives:

  1. Greatly reducing load imbalance among GPU pipeline-parallel stages. [vPipe IEEE TPDS 2021]
  2. Effectively resolving contention among the communication tasks of 3D parallelism (data, tensor, and pipeline parallelism).
  3. Deterministically scheduling multiple subnets to be trained in supernet parallelism, a novel parallel dimension proposed by MindPipe. [NASPipe ACM ASPLOS 2022]
  4. Automatically finding a near-optimal 4D configuration of GPUs, considering both DNN convergence efficiency and GPU utilization.
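Objective 1 matters because, in pipeline parallelism, the slowest stage bounds the throughput of the whole pipeline. The sketch below shows the problem with a classic textbook baseline, not the method used by vPipe or MindPipe: given assumed, fixed per-layer costs (the numbers are hypothetical), dynamic programming splits the layers into contiguous stages so that the most expensive stage is as cheap as possible. The function name `balanced_partition` is illustrative only.

```python
from functools import lru_cache

def balanced_partition(layer_costs, num_stages):
    """Split layers into contiguous stages, minimizing the cost of the slowest stage.

    Returns (bottleneck_cost, stages), where each stage is a list of layer indices.
    Assumes layer costs are known in advance and do not change during training.
    """
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0]
    for c in layer_costs:
        prefix.append(prefix[-1] + c)  # prefix[j] - prefix[i] = cost of layers i..j-1

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best split of layers[i:] into k stages: (bottleneck, cut positions)
        if k == 1:
            return prefix[n] - prefix[i], ()
        result = (float("inf"), ())
        # The first stage is layers i..j-1; leave at least k-1 layers for the rest
        for j in range(i + 1, n - k + 2):
            tail_cost, tail_cuts = best(j, k - 1)
            cost = max(prefix[j] - prefix[i], tail_cost)
            if cost < result[0]:
                result = (cost, (j,) + tail_cuts)
        return result

    bottleneck, cuts = best(0, num_stages)
    bounds = (0,) + cuts + (n,)
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(num_stages)]
    return bottleneck, stages

print(balanced_partition([4, 1, 1, 1, 1, 4], 3))  # → (4, [[0], [1, 2, 3, 4], [5]])
```

For comparison, naively giving each of the three stages two layers yields stage costs of 5, 2, and 5, so the pipeline runs at the pace of a cost-5 stage while the middle GPU sits idle much of the time. A static split like this cannot react to costs that change at runtime, which is part of what makes load balancing in real training systems harder.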
Project images
AI development workflow, in which model training can be the most time-consuming step.
The rapidly growing sizes of DNN models in recent years.
Distributed parallelization techniques for training a 2-layer model on 2 devices. Training systems are critical for leveraging these techniques to boost DNN training efficiency.
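The 2-layer, 2-device setting in the caption above can be sketched with a toy scalar model (an illustrative assumption, not MindPipe code). In data parallelism, each device holds a full copy of both layers and processes half of the batch, and the gradients are then averaged (an all-reduce). In layer-wise model parallelism, on which pipeline parallelism builds, device 0 owns layer 1 and device 1 owns layer 2, exchanging activations forward and activation gradients backward. Both strategies should reproduce the single-device gradients.

```python
import math

# Toy 2-layer scalar model: h = w1 * x, y = w2 * h, loss = mean(0.5 * (y - t)^2)

def grads_single(w1, w2, xs, ts):
    """Reference gradients (dL/dw1, dL/dw2) computed on one device."""
    g1 = g2 = 0.0
    for x, t in zip(xs, ts):
        h = w1 * x
        y = w2 * h
        dy = y - t
        g2 += dy * h
        g1 += dy * w2 * x
    n = len(xs)
    return g1 / n, g2 / n

def grads_data_parallel(w1, w2, xs, ts, num_devices=2):
    """Each device has a full replica and an equal shard of the batch."""
    if len(xs) % num_devices:
        raise ValueError("batch must split evenly across devices")
    shard = len(xs) // num_devices
    local = [grads_single(w1, w2, xs[d * shard:(d + 1) * shard], ts[d * shard:(d + 1) * shard])
             for d in range(num_devices)]
    # All-reduce: average the per-device gradients
    return (sum(g[0] for g in local) / num_devices,
            sum(g[1] for g in local) / num_devices)

def grads_model_parallel(w1, w2, xs, ts):
    """Device 0 owns layer 1 (w1); device 1 owns layer 2 (w2)."""
    n = len(xs)
    hs = [w1 * x for x in xs]                    # forward on device 0, send h to device 1
    ys = [w2 * h for h in hs]                    # forward on device 1
    dys = [(y - t) / n for y, t in zip(ys, ts)]  # loss gradient on device 1
    g2 = sum(dy * h for dy, h in zip(dys, hs))   # backward on device 1
    dhs = [dy * w2 for dy in dys]                # send dL/dh back to device 0
    g1 = sum(dh * x for dh, x in zip(dhs, xs))   # backward on device 0
    return g1, g2

xs, ts = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
ref = grads_single(0.5, 1.5, xs, ts)
for grads in (grads_data_parallel(0.5, 1.5, xs, ts), grads_model_parallel(0.5, 1.5, xs, ts)):
    print(all(math.isclose(a, b) for a, b in zip(ref, grads)))  # → True
```

The trade-off shows up in what crosses between devices: data parallelism exchanges gradients for every parameter but needs each device to fit the whole model, while model parallelism only exchanges activations but leaves one device idle while the other computes, which is the gap that pipelining micro-batches aims to fill.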
MindPipe is compatible with PyTorch, a leading framework for AI research, and with MindSpore, Huawei's new cutting-edge framework for training large AI models.
Enquiry / Feedback

Please feel free to send your enquiries or feedback to the research team by filling in the form (https://forms.gle/JV59N47nTj19ndYz6). Thank you!