PI: Jia Li
Research Direction: The cross-empowerment between artificial intelligence and software engineering: AI for SE and SE for AI
Research Group Introduction
The research group is primarily engaged in research at the intersection of Artificial Intelligence (AI) and Software Engineering (SE). Its specific research directions fall into the following two categories:
1. AI for SE: This direction focuses on developing novel, reliable, efficient, and secure AI technologies to enhance the efficiency and quality of the entire software engineering lifecycle, thereby advancing the level of software automation. Research scenarios encompass core software engineering tasks such as code generation, test generation, and code repair.
2. SE for AI: Grounded in the theoretical and methodological frameworks of software engineering, this direction investigates innovative techniques to improve the performance, efficiency, and security of AI technologies, supporting their practical application. Research scenarios cover important issues such as large language model hallucination and large language model alignment.
For more details, please visit our research group's homepage: https://lj2lijia.github.io/
Research Achievements
➢ We have advanced the convergence of artificial intelligence and software engineering, propelling large-model-based code generation to new heights. We led or co-led the training of several code-oriented large language models that achieve state-of-the-art results on downstream tasks, providing the community with robust foundational models. We introduced deep-reasoning code-generation techniques that fully unleash the inferential power of large models, enabling them to tackle complex development requirements. Moreover, we established evaluation benchmarks grounded in real-world software projects to accelerate the adoption of large models in practical software development.
➢ Over the past five years, we have published more than twenty papers in CCF-A top-tier conferences and journals such as NeurIPS, ACL, ICSE, ASE, and FSE, including multiple Oral presentations. These works have been cited over a thousand times by researchers worldwide, including groups at MIT, Stanford University, Nanyang Technological University, the Chinese University of Hong Kong, and other leading institutions.
Group Culture & Mentorship Philosophy
The Intelligent Software Engineering Research Group (T-ISE) at Tsinghua University adheres to the philosophy of "connecting the sky and the ground." Here, "staying grounded" means conducting in-depth research on fundamental issues within the field and achieving high-caliber academic results, while "reaching for the sky" means building effective interdisciplinary applications that address real-world pain points.
The group places great emphasis on students' personal interests. Within the context of aligning with the group's main research directions, students are encouraged to freely explore valuable research questions. Students are actively guided to engage in in-depth exchanges with the industrial sector, identifying meaningful research problems from practical applications. The aim is to cultivate students into research professionals capable of independent thinking and problem-solving.
Regular weekly meetings and one-on-one discussions are held, providing ample computational resources and professional research guidance.
For more information, please visit our homepage: https://lj2lijia.github.io/
Representative Publications
➢ Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li, Bin Gu, and Mengfei Yang. 2024. DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories. In Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), pages 3603–3614. Association for Computational Linguistics.
➢ Jia Li, Ge Li, Xuanming Zhang, Yunfei Zhao, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li. EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations. In the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), pages 57619–57641.
➢ Jia Li, Yunfei Zhao, Yongmin Li, Ge Li, and Zhi Jin. 2024. AceCoder: An Effective Prompting Technique Specialized in Code Generation. ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 33, Issue 8, Pages 1-26.
➢ Jia Li, Ge Li, Yongmin Li, and Zhi Jin. 2025. Structured Chain-of-Thought Prompting for Code Generation. ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 34, Issue 2, Pages 1-23.
➢ Jia Li, Yongmin Li, Ge Li, Zhi Jin, Yiyang Hao, and Xing Hu. 2023. SkCoder: A Sketch-Based Approach for Automatic Code Generation. In the 45th International Conference on Software Engineering (ICSE 2023). IEEE Press, 2124–2135.
➢ Jia Li, Ge Li, Zhuo Li, Zhi Jin, Xing Hu, Kechi Zhang, and Zhiyi Fu. 2023. CodeEditor: Learning to Edit Source Code with Pre-trained Models. ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 32, Issue 6, Pages 1-22.
➢ Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, and Zhi Jin. 2021. EditSum: A Retrieve-and-Edit Framework for Source Code Summarization. In the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021). IEEE Press, 155–166.
➢ Jia Li, Zhuo Li, Huangzhao Zhang, Ge Li, Zhi Jin, Xing Hu, and Xin Xia. 2024. Poison Attack and Poison Detection on Deep Source Code Processing Models. ACM Transactions on Software Engineering and Methodology (TOSEM), Volume 33, Issue 3, Pages 1-31.
➢ Siyuan Jiang*, Jia Li* (co-first authors), He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li. 2025. aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing. In the 47th International Conference on Software Engineering (ICSE 2025). Just Accepted (December 2024).
➢ Yuqi Zhu, Jia Li, Ge Li, Yunfei Zhao, Jia Li, Zhi Jin, and Hong Mei. 2024. Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models. In the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2024), Vol. 38. AAAI Press, Article 50, 437–445.
➢ Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, and Zhi Jin. 2022. Fine-tuning pre-trained language models effectively by optimizing subnetworks adaptively. In the 36th International Conference on Neural Information Processing Systems (NeurIPS 2022). Curran Associates Inc., Red Hook, NY, USA, Article 1558, 21442–21454.
➢ Kechi Zhang, Zhuo Li, Jia Li, Ge Li, and Zhi Jin. 2023. Self-Edit: Fault-Aware Code Editor for Code Generation. In the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), pages 769–787. Association for Computational Linguistics.
➢ Kechi Zhang, Ge Li, Jia Li, Yihong Dong, Jia Li, Zhi Jin. 2025. Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points. In Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Just Accepted (May 2025).
➢ Jia Li, Fang Liu, Jia Li, Yunfei Zhao, Ge Li, and Zhi Jin. 2023. MCodeSearcher: Multi-View Contrastive Learning for Code Search. In the 14th Asia-Pacific Symposium on Internetware (Internetware 2023). Association for Computing Machinery, New York, NY, USA, 270–280.
➢ Jia Li, Chongyang Tao, Jia Li, Ge Li, Zhi Jin, Huangzhao Zhang, Zheng Fang, and Fang Liu. 2025. Large Language Model-Aware In-Context Learning for Code Generation. ACM Transactions on Software Engineering and Methodology (TOSEM). Just Accepted (February 2025).
➢ Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, and Ge Li. 2024. Exploring and Unleashing the Power of Large Language Models in Automated Code Translation. In the ACM International Conference on the Foundations of Software Engineering (FSE 2024), Volume 1, Issue FSE, Pages 1585–1608.
➢ Jia Li, Chongyang Tao, Zhi Jin, Fang Liu, Jia Li, and Ge Li. 2023. ZC3: Zero-Shot Cross-Language Code Clone Detection. In the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023). IEEE Press, 875–887.
➢ Huangzhao Zhang, Kechi Zhang, Zhuo Li, Jia Li, Jia Li, Yongmin Li, Yunfei Zhao, Yuqi Zhu, Fang Liu, Ge Li, and Zhi Jin. 2024. Deep Learning for Code Generation: A Survey. SCIENCE CHINA Information Sciences, Volume 67, Issue 9, Article 191101. ISSN 1674-733X.
➢ Jia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Jia Li, et al. 2025. LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding. In the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Just Accepted (May 2025).
Group Members


News & Updates