May 2024

TechTalk – Learning to Simulate and Understand the 3D World

June 06, 2024 (Thursday) 4:30-5:30pm
Humans live in a 3D world, continually acquiring diverse skills and engaging in various activities by perceiving, understanding, and interacting with it. Our long-term research objective centers on simulating the 3D world and empowering AI systems with 3D spatial understanding. In this talk, I will begin by discussing our recent efforts in creating 3D interactive environments through reconstruction, decomposition, and generation. I will then explore how we can equip machines to comprehend and reason within a 3D environment by adopting a data-centric approach. Finally, I will examine how 3D environment simulation and understanding can be integrated to enable closed-loop active intelligence. In summary, this talk covers our latest work on 3D reconstruction, comprehension, and creation, aiming toward AI systems that can effectively navigate and engage with our 3D world.

TechTalk – Towards Controllable and Compositional Visual Content Generation

May 30, 2024 (Thursday) 4:30-5:30pm
Visual content generation has achieved great success in the past few years, but current visual generation models still lack controllability and compositionality. In real applications, we desire highly controllable generation models that let users control the generated content in a fine-grained manner, as well as models that can compose objects with different attributes and relationships into a complex yet coherent scene. In this talk, I will present several of our works toward controllable and compositional visual content generation. I will introduce T2I-CompBench, a benchmark for compositional text-to-image generation, and then discuss our recent work on drag-based video editing, controllable 3D generation, and training-free massive concept editing in text-to-image diffusion models.