Computer Science – Innovation Wing

TechTalk – Building Multi-dimensional Parallel Training Systems for Large AI Models

Innovation Wing Two, TechTalk / By Zhu Wanyi

June 24, 2025 (Tuesday) 4:30-5:30pm
The increasing modeling capacities of large DNNs (e.g., Transformer and GPT) have enabled unprecedented successes in various AI areas, including vision and natural language understanding. The high modeling power of a large DNN mainly stems from its increasing complexity (more neuron layers and more neuron operators in each layer) and dynamicity (frequently activating/deactivating neuron operators in each layer during training, as in Neural Architecture Search, or NAS). Dr. Cui’s talk will present his recent papers (e.g., [PipeMesh, under revision at a journal], [Fold3D TPDS 2023], [NASPipe ASPLOS 2022], and [vPipe TPDS 2021]), which address major limitations in existing multi-dimensional parallel training systems, including GPipe, PipeDream, and Megatron. Fold3D is now the major thousands-GPU parallel training system on the world-renowned MindSpore AI framework.

TechTalk – Speeding Up Fairness: The Science of Fast Convergence in Markets and Graphs

Innovation Wing Two, TechTalk / By Zhu Wanyi

April 3, 2025 (Thursday) 4:30-5:30pm
Achieving fairness in resource allocation can be modeled as a graph-based optimization problem, with many efficient algorithms available. This talk explores the connection between market equilibrium and graph density decomposition, showing how fast convergence can be achieved in large-scale systems. We present a unified framework linking hypergraph density decomposition and Fisher market equilibrium through locally verifiable optimality conditions. This symmetry allows repurposing algorithms between domains, significantly accelerating convergence.
We focus on iterative gradient-based methods, including the iterative proportional response process and its momentum-enhanced extensions. Our novel exponential momentum approach refines traditional techniques, delivering near-optimal solutions in distributed settings. Empirical results show that these methods outperform existing algorithms, achieving speedups of several orders of magnitude on large-scale graphs.
By integrating graph theory, market dynamics, and optimization, this talk offers new insights into efficient computation to achieve fairness in networked systems. These methods deepen our understanding of algorithmic principles and open new applications in algorithm design, social networks, and economic modeling.
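The proportional response process named in the abstract can be sketched for a linear Fisher market as follows. This is a minimal illustration of the textbook dynamics only, not the speaker's implementation and not the momentum-enhanced variant; the function names, the uniform initial bids, and the unit supply of each good are assumptions made for the example.

```python
def clear_market(bids):
    """Given bids[i][j], return (prices, allocation) with unit supply per good."""
    n, m = len(bids), len(bids[0])
    # The price of a good is the total money bid on it.
    prices = [sum(bids[i][j] for i in range(n)) for j in range(m)]
    # Each buyer receives a share of the good proportional to its bid.
    alloc = [
        [bids[i][j] / prices[j] if prices[j] > 0 else 0.0 for j in range(m)]
        for i in range(n)
    ]
    return prices, alloc


def proportional_response(budgets, utilities, rounds=100):
    """Run proportional response dynamics on a linear Fisher market.

    budgets[i]      -- money held by buyer i
    utilities[i][j] -- buyer i's value per unit of good j
    """
    n, m = len(budgets), len(utilities[0])
    # Assumption: every buyer starts by splitting its budget evenly.
    bids = [[budgets[i] / m for _ in range(m)] for i in range(n)]
    for _ in range(rounds):
        _, alloc = clear_market(bids)
        for i in range(n):
            # Utility that buyer i currently derives from each good.
            gains = [utilities[i][j] * alloc[i][j] for j in range(m)]
            total = sum(gains)
            if total > 0:
                # Re-bid in proportion to each good's share of the utility.
                bids[i] = [budgets[i] * g / total for g in gains]
    return clear_market(bids)
```

For example, with budgets `[1.0, 2.0]` and utilities `[[1.0, 0.0], [0.0, 1.0]]`, each buyer ends up bidding its whole budget on the only good it values, so the prices settle at the two budgets.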

TechTalk – AI-Assisted Community Legal Information Access

Innovation Wing Two, TechTalk / By Zhu Wanyi

February 20, 2025 (Thursday) 4:30-5:30pm
Legal information, including court judgments and legislation, is now generally accessible online in many countries. However, online availability does not necessarily translate into effective public access to legal knowledge. Individuals without legal expertise face significant challenges in acquiring legal knowledge, for two primary reasons. First, the online content predominantly consists of primary legal sources, such as cases and statutes, which are typically written in formal legal terminology that the general public finds hard to comprehend. Second, the public may not know which legal principles apply to their specific situations. Given the vast number of documents, it is difficult for users to identify the relevant legal sources when seeking solutions to their legal problems. In this presentation, we will demonstrate several AI tools that we have integrated into our online legal information platforms, specifically HKLII and CLIC, and explain how these tools improve public access to legal information.

TechTalk – Optimizing Distributed Large Model Training in AI Clouds

Innovation Wing Two, TechTalk / By Zhu Wanyi

December 12, 2024 (Thursday) 4:30-5:30pm
Distributed training using a large number of devices has been widely adopted for learning large deep learning models. Improving distributed training efficiency is critical for reducing the time, resource and energy consumption of large model learning. In this talk, I will introduce recent research in my group on optimizing distributed training parallelism for effective training acceleration and maximal resource utilization. In particular, we have designed optimized strategies and systems for operator sharding and for computation and communication scheduling under SPMD parallelism (e.g., in Mixture-of-Experts model training) in both homogeneous and heterogeneous AI clusters, as well as dynamic micro-batching and pipelining to tackle sequence length variation in multi-task model training (e.g., Large Language Model training).
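One simple way to picture dynamic micro-batching under sequence length variation is to form micro-batches by a token budget rather than a fixed sequence count. The sketch below only illustrates that general idea and is not the system described in the talk; the function name, the `max_tokens` budget, and the greedy sort-then-pack policy are all assumptions.

```python
def pack_micro_batches(seq_lengths, max_tokens):
    """Greedily group sequence indices into micro-batches.

    Each micro-batch is assumed to be padded to its longest sequence, so its
    cost is (longest length) * (number of sequences); that cost is kept
    within max_tokens.
    """
    # Visit sequences from shortest to longest so similar lengths share a batch.
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i])
    batches, current = [], []
    for idx in order:
        length = seq_lengths[idx]
        if length > max_tokens:
            raise ValueError(f"sequence {idx} alone exceeds the token budget")
        # Lengths are ascending, so `length` would be the longest in the batch.
        padded_cost = length * (len(current) + 1)
        if current and padded_cost > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches
```

With lengths `[100, 10, 12, 90, 11]` and a budget of 200 tokens, the three short sequences share one micro-batch and the two long ones share another, so little of the budget is spent on padding.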

TechTalk – Medical Image Representation Learning via Cross-supervision between Images and Text Reports

Innovation Wing Two, TechTalk / By Zhu Wanyi

December 5, 2024 (Thursday) 4:30-5:30pm
Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully- or self-supervised learning on a source domain; however, supervised pre-training requires a complex and labour-intensive two-stage human-assisted annotation process, whereas self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology called reviewing free-text reports for supervision (REFERS), which acquires free supervision signals from the original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on four well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels; it therefore has the potential to replace canonical pre-training methodologies.

Young Scholar TechTalk – GRAINS: Proximity Sensing of Objects in Granular Materials

Innovation Wing Two, TechTalk / By clarecmy

October 17, 2023 (Tuesday) 4:30-5:30pm
Proximity sensing is a method of detecting the presence of objects without making physical contact. However, this concept has not been widely explored in the context of granular materials, which are materials composed of small particles like sand or gravel. This is because granular materials have complex properties and the sensing needs to work without the aid of vision. In this presentation, I will introduce a system called GRAINS (Granular Material-Embedded Autonomous Proximity Sensing). GRAINS is designed to sense objects buried within granular materials by utilizing fundamental principles related to the behavior of granules, such as how they flow like a fluid and how they can become jammed. GRAINS uses force signals to determine the proximity of buried objects. It achieves this by analyzing force anomalies that occur when granules become jammed due to their proximity to objects. These force anomalies are learned in real time by the system using a mathematical technique called Gaussian process regression. To capture these patterns, a probe is moved along a spiral trajectory within the granular material. The results of our experiments demonstrate that GRAINS can adaptively adjust its parameters to work effectively with different types of granules. It can perceive nearby objects, approximately 0.5 to 7 cm ahead, without the need for direct contact with the buried obstacles.
(project page: https://sites.google.com/view/grains2/home)
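The role of Gaussian process regression in flagging force anomalies can be illustrated with a small one-dimensional example. This is a generic sketch and not the GRAINS implementation: the RBF kernel, the zero-mean prior, the `noise` and `length_scale` values, the scalar probe position as input, and the z-score style `anomaly_score` are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(xs, ys, x_new, noise=1e-2, length_scale=1.0):
    """Predictive mean and variance of a zero-mean GP for a new observation."""
    K = [
        [rbf(a, b, length_scale) + (noise if i == j else 0.0)
         for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    k_star = [rbf(x_new, a, length_scale) for a in xs]
    alpha = solve(K, ys)        # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = 1.0 + noise - sum(k * w for k, w in zip(k_star, v))
    return mean, var


def anomaly_score(xs, ys, x_new, y_new, **kwargs):
    """Distance of a new force reading from the GP prediction, in std devs."""
    mean, var = gp_predict(xs, ys, x_new, **kwargs)
    return abs(y_new - mean) / math.sqrt(var)
```

Given a history of steady force readings, a new reading far from the predicted mean receives a much higher score than one close to it, which is the kind of signal a proximity detector could threshold.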

Young Scholar TechTalk – Secure and High-performance AI Serving: Protecting AI Secrets, Accelerating AI Insights

Innovation Wing Two, TechTalk / By clarecmy

September 19, 2023 (Tuesday) 4:30-5:30pm
Driven by the remarkable success of artificial intelligence (AI) and edge computing, the deployment of well-trained private AI models on third-party edge devices for mission-critical applications has become increasingly prevalent. Safeguarding these private models on untrusted devices, while simultaneously speeding up model serving (i.e., inference) through accelerators like GPUs, has escalated in urgency.
We introduce SOTER, a new AI serving system that, for the first time, achieves both high security and high performance. Harnessing the associativity property of AI operators, SOTER presents an innovative approach—transforming computationally expensive AI operators into parameter-morphed equivalents for secure execution on untrusted but fast GPUs, and losslessly restoring inference results within trusted execution environments (TEEs) in CPUs. Experimental results on six prevalent AI models in the three most popular categories show that, even with stronger model protection, SOTER achieves comparable performance with baselines while retaining the same high accuracy as insecure AI model inference.
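The associativity idea can be illustrated with a single linear operator: scaling the weights by a secret factor commutes with the matrix-vector product, so the scaling can be undone after the untrusted device returns its result. The sketch below is a deliberately simplified toy, not SOTER's actual protocol; the function names, the single scalar `secret`, and the plain-Python stand-ins for the GPU and the TEE are assumptions.

```python
def matvec(weights, x):
    """Plain matrix-vector product, standing in for an expensive AI operator."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def morph(weights, secret):
    """Trusted side: hide the true weights by scaling them with a secret scalar."""
    return [[secret * w for w in row] for row in weights]


def restore(morphed_output, secret):
    """Trusted side: undo the scaling, since (secret * W) x == secret * (W x)."""
    return [y / secret for y in morphed_output]


# Hypothetical end-to-end flow.
weights = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
secret = 4.0                                   # never leaves the trusted side
morphed = morph(weights, secret)               # shipped to the untrusted device
untrusted_result = matvec(morphed, x)          # computed outside the TEE
result = restore(untrusted_result, secret)     # recovered inside the TEE
```

The untrusted side only ever sees the scaled weights and the scaled output, while the restored `result` matches what the unmodified operator would have produced.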

Young Scholar TechTalk – HOF2 – Interact with Device through Simple and Robust Hand-Over-Face Gesture

Innovation Wing Two, TechTalk / By clarecmy

Mobile devices have become like an extension of ourselves, but can we really operate a mobile device as naturally as we control our fingers or body? We present HOF2, a novel input modality that uses simple gestures over your face to interact with your device. Unlike other gesture-based modalities, HOF2 is highly robust and can avoid false triggering caused by many unconscious gestures like scratching or wiping, while still being easy, comfortable and natural to use. Moreover, HOF2 is highly available and can be implemented on any mobile phone, tablet or computer with a single camera and without remote servers. In this TechTalk, we will present a live demo on iOS/iPadOS demonstrating the performance of the HOF2 scheme in practice and explore some real-life use cases such as virtual conferencing, selfies, and TV control. We believe there are far more possibilities waiting to be explored with this novel interaction scheme.

TechTalk – Learning Optimal Auctions from Data

Innovation Wing Two, TechTalk / By clarecmy

The design of optimal auctions for revenue maximization is a central topic in Economics. Classical optimal auction theory assumes that bidders’ values are drawn from a known distribution. In reality, the source of such prior information is past data. Cole and Roughgarden (2014) modeled past data as i.i.d. samples from the value distribution and asked: How many samples are sufficient/necessary to learn a near-optimal auction? This TechTalk will introduce a unified theory that yields sample-efficient algorithms with optimal sample complexity for auctions with homogeneous goods, and state-of-the-art sample complexity for auctions with heterogeneous goods. Unlike conventional statistical learning theory, which focuses on the complexity of hypothesis classes, our new theory relies on the simplicity of data distributions and a monotonicity property of these problems.
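To make the sample-based setting concrete, the sketch below chooses a posted price for a single bidder by maximizing revenue on the empirical distribution of past samples. This is the plain empirical revenue maximization baseline, shown only as an illustration; it is not the algorithm from the talk, and the function name and toy samples are assumptions.

```python
def empirical_best_price(samples):
    """Return (price, revenue) maximizing price * P[value >= price],
    where the probability is taken over the empirical sample distribution."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    best_price, best_revenue = ordered[0], float("-inf")
    for rank, price in enumerate(ordered):
        # In ascending order, the samples at positions rank..n-1 are >= price.
        revenue = price * (n - rank) / n
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue
```

For the samples `[2.0, 4.0, 5.0, 6.0]`, a price of 4.0 sells to three of the four sampled values, which gives the highest empirical revenue of 3.0. How well such a price generalizes to the true distribution is exactly the sample-complexity question the abstract raises.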

Young Scholar TechTalk – Learning to Control and Coordinate Hybrid Traffic Through Robot Vehicles at Complex and Unsignalized Intersections

Innovation Wing Two, TechTalk / By clarecmy

Intersections are essential road infrastructures for traffic in modern metropolises; however, they can also be the bottleneck of traffic flows due to traffic incidents or the absence of traffic coordination mechanisms such as traffic lights. Thus, various control and coordination mechanisms that go beyond traditional control methods have been proposed to improve the efficiency of intersection traffic. Amongst these methods, the control of foreseeable hybrid traffic that consists of human-driven vehicles (HVs) and robot vehicles (RVs) has recently emerged. We propose a decentralized reinforcement learning approach for the control and coordination of hybrid traffic at real-world, complex intersections, a topic that has not been previously explored. Comprehensive experiments are conducted to show the effectiveness of our approach. We show that with 5% RVs, we can prevent congestion formation inside the intersection under the actual traffic demand of 700 vehicles per hour. When RVs make up more than 50% of traffic, our method starts to outperform traffic signals on the average waiting time of all vehicles at the intersection.