Yao FENG

Assistant Professor (incoming)

Human-Centric Embodied Intelligence; Human-Robot Interaction; Digital Humans; Computer Vision; Computer Graphics; Robotics; Machine Learning

Education/Work Experience

Sept. 2019 – Oct. 2024: Ph.D., jointly at ETH Zürich and the Max Planck Institute for Intelligent Systems. Advisors: Michael J. Black and Marc Pollefeys.

Oct. 2023 – Nov. 2024: Research Scientist, Meshcapade (acquired by Epic Games).

Jan. 2025 – present: Postdoctoral Researcher, Department of Computer Science, Stanford University. Advisors: Karen Liu and Scott Delp.

Research Directions

My research focuses on Human-Centric Embodied Intelligence: building a general-purpose embodied brain that truly understands human behavior, models the interaction between people and the world, and collaborates with humans. It spans three core directions:

  • Human-Centric Perception. Understanding human actions, intentions, and behaviors from videos and multi-modal signals, along with the interactions between humans and objects, scenes, and other humans, to enable fine-grained understanding of human behavior in the real world.

  • Human-Centric Understanding. Deeply integrating large language models with 3D human representations so that machines can reason about, describe, and edit human pose, motion, appearance, and behavior through language. This unifies the language-level understanding of body, hands, garments, and interactions, and bridges LLMs with embodied intelligence.

  • Human-Centric Embodiment. Studying video world models and human-robot collaborative action policies for home, long-term care, and service scenarios, so that a single agent can act across the boundaries of digital humans, virtual characters, and physical robots and truly operate in environments shared with people.

Selected Works

In the early-to-mid stages of my Ph.D., my research focused on 3D digital human representation and multi-modal human reconstruction. First-author works such as DECA, PIXIE, and SCARF established a foundation-model pipeline for human representation, spanning from face to full body and from single image to video. During the later stages of my Ph.D. and my time at Meshcapade, my research extended to the fusion of large language models with digital humans. First-author works such as ChatPose and ChatHuman directly connected language models with 3D human behavioral representations. In my postdoctoral work, my research has shifted toward human-centric embodied intelligence and human-robot collaboration, with works such as GentleHumanoid, which studies safe and compliant physical interaction between humanoid robots and humans in shared environments.

To date, I have published 20+ papers in top-tier international venues across computer graphics (SIGGRAPH / SIGGRAPH Asia), computer vision (CVPR / ICCV / ECCV), machine learning (ICLR / NeurIPS), and robotics (Science Advances), with 3,000+ Google Scholar citations and approximately 15,000 GitHub stars across my open-source projects. Honors include MIT EECS Rising Stars, WiGRAPH Rising Stars in Computer Graphics, and Eurographics PhD Award Candidate.

For more details, please visit: https://yfeng95.github.io

Email

yaofeng@stanford.edu

Office

Block F, Zhongguancun Intelligent Manufacturing Street

Homepage

https://yfeng95.github.io/