TechTalk – Building Multi-dimensional Parallel Training Systems for Large AI Models


About the TechTalk
All members of the HKU community and the general public are welcome to join!
Speaker: Professor Heming Cui, Associate Professor, School of Computing and Data Science, HKU
Moderator: Professor Chenshu Wu, Assistant Professor, School of Computing and Data Science, HKU
Date: 24th June 2025 (Tuesday)
Time: 4:30pm
Mode: Mixed (both face-to-face and online). Seats for on-site participants are limited. A confirmation email will be sent to participants who have successfully registered.
Language: English

The increasing modeling capacity of large DNNs (e.g., Transformer and GPT) has achieved unprecedented successes in various AI areas, including vision and natural language understanding. The high modeling power of a large DNN stems mainly from its increasing complexity (more neuron layers, and more neuron operators in each layer) and dynamicity (frequently activating/deactivating neuron operators in each layer during training, as in Neural Architecture Search, or NAS). Such complexity and dynamicity can easily make a large DNN exceed the computing and memory capacities of a modern GPU, so training a large DNN often requires splitting it across many GPUs along multiple dimensions, including data parallelism, tensor parallelism, and pipeline parallelism. Professor Cui’s talk will present his recent papers (e.g., [PipeMesh, in revision at a journal], [Fold3D TPDS 2023], [NASPipe ASPLOS 2022], and [vPipe TPDS 2021]), which address major limitations in existing multi-dimensional parallel training systems, including GPipe, PipeDream, and Megatron. For instance, vPipe addresses severe load imbalance and low GPU computing utilization; NASPipe presents Supernet parallelism, a new parallel training dimension for highly dynamic large DNNs designed in the Supernet manner (e.g., Evolved Transformer and Neural Architecture Search). Fold3D is now the major thousands-GPU parallel training system on the world-renowned MindSpore AI framework.
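As a rough illustration of the pipeline-parallel dimension mentioned above (not code from any of the systems in the talk), the following stand-alone Python sketch computes a GPipe-style forward schedule: the model is split into stages, the input batch into microbatches, and each time step shows which (stage, microbatch) pairs run concurrently. The function name and interface are illustrative assumptions.

```python
def gpipe_forward_schedule(num_stages, num_microbatches):
    """Return, per time step, the list of (stage, microbatch) pairs
    that are active in the forward pass of a GPipe-style pipeline.

    Microbatch m reaches stage s at time step t = s + m, so the
    forward pass takes num_stages + num_microbatches - 1 steps.
    """
    steps = []
    for t in range(num_stages + num_microbatches - 1):
        active = []
        for s in range(num_stages):
            m = t - s  # microbatch that stage s works on at time t
            if 0 <= m < num_microbatches:
                active.append((s, m))
        steps.append(active)
    return steps

# 3 pipeline stages, 4 microbatches: 3 + 4 - 1 = 6 time steps.
sched = gpipe_forward_schedule(3, 4)
# Idle (stage, time) slots at the start and end of the schedule form
# the pipeline "bubble"; systems like vPipe and Fold3D aim to shrink
# such idle time and balance load across stages.
```

More microbatches relative to stages shrink the bubble's share of total time, which is one reason pipeline-parallel systems split each batch finely.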

Registration
  • The tech talk “Building Multi-dimensional Parallel Training Systems for Large AI Models” will be organized in the Tam Wing Fan Innovation Wing Two (G/F, Run Run Shaw Building, HKU) on 24th June 2025 (Tuesday), 4:30pm.
  • Seats are limited. A Zoom broadcast will be available if the seating quota is full.
  • Registrants on the waiting list will be notified of the arrangement (seating, free-standing, or other) after the registration deadline.
Recording of the Tech Talk
About the speaker

Professor Heming Cui

Professor Heming Cui (cs.hku.hk/people/academic-staff/heming) is an Associate Professor in HKU CS. Professor Cui is interested in building software infrastructures and tools to greatly improve the reliability, security, and performance of real-world software. After receiving his PhD degree from Columbia University in 2015, he joined HKU and independently built a parallel and distributed systems group with about 20 ongoing, full-time PhD students. His recent research has led to a series of open-source projects and publications in top international conferences and journals across broad areas, including SOSP, IEEE S&P, VLDB, TPDS, MICRO, NSDI, ASPLOS, ATC, ICSE, and EuroSys. In the past three years, Professor Cui has served at least once on the program committees of top international systems/networking conferences, including OSDI, SIGCOMM, ASPLOS, NSDI, ATC, and EuroSys. Professor Cui received the ACM ICSE 2025 best paper award and the ACM ACSAC 2017 best paper award. He served as the general chair of ACM APSys 2016 and 2021, and the program chair of ACM ChinaSys 2023. As a research project leader (PI or PC) of 12 research grants at HKU, his research projects have received total funding of about HKD 30 million in the past 10 years, including an RGC Research Impact Fund (RIF) in 2023, five RGC GRF/ECS grants, an HK Innovation & Technology Commission grant (ITF ITSP Platform in 2022), the Croucher Innovation Award (2016), and four major Huawei research grants on tech transfer and licensing.

Professor Cui’s recent secure system papers (e.g., [Uranus AsiaCCS 2020] and [Cronus MICRO 2022]), and their resultant Ubiquitous Trusted Execution Environments (UTEE) project, became the core commercial system of Huawei’s Trusted and Intelligent Cloud Services (https://www.huaweicloud.com/product/tics.html) in 2021. Professor Cui’s parallel big AI model training systems (e.g., [Fold3D TPDS 2023], [NASPipe ASPLOS 2022], and [vPipe TPDS 2021]) are implemented on the PyTorch library and Nvidia’s GPUs, supporting general big AI models (e.g., Transformer, GPT, CPM, and Pan-Gu); his Fold3D work is now the major thousands-GPU parallel training system on the world-renowned MindSpore AI framework. Professor Cui received his bachelor’s and master’s degrees from Tsinghua University, and his PhD from Columbia University, all in Computer Science.

Promotion materials