Jia LI

Assistant Professor

The mutual advancement of artificial intelligence and software engineering: AI for SE and SE for AI

Education/Work Experience

Sept 2020–Jul 2025 Ph.D. in Computer Science, School of Computer Science, Peking University

Supervisor: Prof. Zhi Jin

Aug 2025–present Assistant Professor, Institute for Artificial Intelligence, Tsinghua University

Research Directions

Assistant Professor Jia Li works at the intersection of Artificial Intelligence (AI) and Software Engineering (SE). His research falls into two directions:

1. AI for SE: Developing novel, reliable, efficient, and secure AI techniques that improve the efficiency and quality of the entire software engineering lifecycle and raise the level of software automation. Target tasks include core software engineering activities such as code generation, test generation, and code repair.

2. SE for AI: Drawing on the theories and methods of software engineering, developing techniques that improve the performance, efficiency, and security of AI systems and support their practical application. Target problems include large language model hallucination and large language model alignment.


Research Highlights

➢ Advanced the convergence of AI and software engineering, accelerating large-model-based code generation. Led or co-led the training of multiple code-oriented large language models that set new state-of-the-art results on downstream tasks and give the community robust foundation models. Introduced deep-reasoning code-generation techniques that unlock the reasoning ability of large models, enabling them to handle complex development requirements. Established evaluation benchmarks grounded in real-world software projects, driving the adoption of large models in practical development.

➢ Over the past five years, published more than 20 papers in CCF-A top-tier conferences and journals (NeurIPS, ACL, ICSE, ASE, FSE, etc.), including several Oral presentations. These works have been cited over 1,000 times by researchers from institutions such as MIT, Stanford, NTU, and CUHK.

➢ Served on program committees of premier international conferences (e.g., ASE) and was repeatedly invited to give oral presentations. Research outcomes have been featured by mainstream media including China Science Daily, China Daily, and Synced. Honors include Beijing Outstanding Graduate and the “Excellent Ph.D. Student Award” at the ChinaSoft Conference.

Representative Works

➢ Code-oriented Large Language Models

– Co-trained aiXcoder-7B (7B parameters), the first LLM to explicitly inject code-structural priors (syntax, dependencies, etc.) into pre-training. Its novelties span pre-training objectives, data sampling, and data cleaning. aiXcoder-7B outperforms same-scale international baselines (e.g., Meta’s Code Llama-7B, DeepSeek-Coder-6.7B) on eight mainstream benchmarks, has earned 2,271 GitHub stars, and ranked in Hugging Face’s Global Trending Top-30 (May 2024).

– Led the training of aiXcoder-7B-v2, which further boosts long-context capability via reinforcement learning and sets new records on repository-level code completion.

➢ Deep-Reasoning Code Generation

Introduced a four-stage reasoning pipeline (requirement understanding, planning, implementation, and optimization) that mirrors how developers reason in real-world practice. Each stage refines the output of the previous stage, yielding up to an 88.4% relative improvement in Pass@1. The technique has sparked follow-up studies from MIT, Peking University, and other leading groups.

➢ Real-Project-Aligned Evaluation Benchmarks

Proposed and open-sourced DevEval (static) and EvoCodeBench (dynamic), both curated from high-quality open-source projects so that their distributions mirror real-world software. EvoCodeBench is updated automatically to prevent data leakage. These benchmarks have been adopted by researchers from ByteDance, Baidu, NTU, and others, exposing the current limitations of large models and guiding future development.

Email

jia_li@mail.tsinghua.edu.cn

Office

Room 411, Block F, Zhongguancun Intelligent Manufacturing Street

Homepage

https://lj2lijia.github.io/