TechTalk – Data Intelligence of Large Language Models
June 18, 2026 (Thursday) 4:00pm-5:00pm
Database systems, which provide various operations for defining and querying data, enable large-scale AI systems and intelligent applications in various domains. Due to recent advances in large language models (LLMs), automating database operations through code generation has become increasingly attainable. This capability of having data intelligence in LLMs has given rise to a new paradigm—Data-Centric Code Generation (DCCG)—which aims to build systems that can automatically understand, manipulate, and reason over data.
To realize DCCG, I will discuss our team’s effort in building benchmarking systems, including BIRD-SQL, a large-scale Text-to-SQL benchmark on real databases, and SWE-SQL, which gauges the ability that an LLM resolves user SQL issues. These benchmarks, widely used in the industry, reveal hallucination and other issues faced by LLMs. To address these challenges, I will present our work in graph-aware reasoning, SQL correction, and multi-turn tabular data analysis. They aim to evolve LLMs from static code generators into autonomous, trustworthy agents, with data intelligence, that can understand and generate data-driven software systems.










