Yinpeng DONG

Assistant Professor

Fundamental theories of AI, machine learning, safety and alignment of large models, generative AI, etc.

Education/Work Experience

2017: Bachelor's degree from the Department of Computer Science and Technology, Tsinghua University.

2022: Ph.D. from the Department of Computer Science and Technology, Tsinghua University (Advisor: Professor Jun Zhu).

January 2022-February 2025: Postdoctoral Researcher at the Department of Computer Science and Technology, Tsinghua University.

February 2025-Present: Assistant Professor at the College of AI, Tsinghua University.

Representative Works

1. Efficient Adversarial Attack Methods for Deep Learning:

Developed efficient adversarial attack methods for deep learning that substantially raise attack success rates in black-box settings, where the target model's internals are unknown, thereby exposing the underlying vulnerabilities of deep learning models. The representative work, the Momentum Iterative Method [CVPR'18 Spotlight], has been cited over 3,600 times and is included as a benchmark algorithm in the adversarial attack and defense platforms developed by Google, OpenAI, and IBM. This line of work was also the first to successfully attack commercial multimodal large models (such as GPT-4o and Gemini) and was used by OpenAI in the robustness evaluation of the o1 model.
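For context, the core update of the Momentum Iterative Method (MI-FGSM) can be sketched as follows. This is a minimal pure-Python illustration, not the published implementation; in particular, `grad_fn` is a placeholder standing in for a real model's gradient of the loss with respect to the input, and the default hyperparameters are illustrative.

```python
def mi_fgsm(x, grad_fn, eps=0.3, steps=10, mu=1.0):
    """Sketch of the MI-FGSM update under the L_inf constraint.

    x       -- input as a flat list of floats
    grad_fn -- callable returning the loss gradient w.r.t. x
               (assumption: supplied by the caller, e.g. a model's
               backward pass; here it is just a placeholder)
    eps     -- L_inf perturbation budget
    mu      -- momentum decay factor
    """
    sign = lambda v: (v > 0) - (v < 0)
    alpha = eps / steps                  # per-step size
    g = [0.0] * len(x)                   # accumulated momentum
    x_adv = list(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        # L1-normalize the gradient before accumulating momentum,
        # which stabilizes the update direction across iterations
        l1 = sum(abs(v) for v in grad) + 1e-12
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        # take a sign step, then project back into the eps-ball around x
        x_adv = [min(max(xa + alpha * sign(gi), xo - eps), xo + eps)
                 for xa, gi, xo in zip(x_adv, g, x)]
    return x_adv
```

The momentum term is what distinguishes this from plain iterative FGSM: by accumulating normalized gradients, the attack avoids oscillating between local gradient directions, which is a key reason for its improved transferability to unseen (black-box) models.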

2. Generalizable and Robust Defense Methods:

Focused on developing generalizable and robust defense mechanisms for security enhancement. Proposed the robust diffusion classifier [ICML'24 & NeurIPS'24], which unifies the discriminative and generative modeling paradigms of machine learning and demonstrated for the first time that generative classifiers can achieve optimal robustness. Additionally, introduced an inference-enhanced safety alignment method for large models [ICML'25 Oral], which achieves state-of-the-art safety among commercial models and has been applied to the safety alignment of DeepSeek-R1.

3. Model Security Evaluation Benchmarks and Platforms:

Constructed comprehensive evaluation benchmarks and platforms for model security, including:

1) The first comprehensive robustness evaluation benchmark for deep learning models, ARES [CVPR'20 Oral];

2) The first multimodal large model trustworthiness evaluation benchmark, MultiTrust [NeurIPS'24];

3) The first text-to-video model security evaluation benchmark, T2VSafetyBench [NeurIPS'24].

Based on these algorithmic platforms, organized the "AI Security Challenger Program" competition, which has been covered by media outlets such as the People's Daily.

Email

dongyinpeng@mail.tsinghua.edu.cn

Office

Room 406, Block F, Zhongguancun Intelligent Manufacturing Street