I am a third-year Ph.D. student in Computer Science at the Australian Institute for Machine Learning (AIML), Adelaide University (formerly The University of Adelaide), advised by Prof. Javen Qinfeng Shi. I study representation learning, asking how natural language supervision determines which semantic structure is captured, preserved, or lost in vision-language models.
My work combines theoretical analysis with controlled empirical study to understand contrastive learning, cross-modal alignment, and identifiability in representation learning, with the long-term goal of building more interpretable and reliable multimodal AI systems.
Education
- Sep 2023 - Present: Ph.D. in Computer Science, Adelaide University
  Advisor: Prof. Javen Qinfeng Shi
- Sep 2016 - Jun 2019: M.Sc. in Instrument Science and Technology, Wuhan University of Technology
  Advisor: Prof. Xiao Zhou
- Sep 2012 - Jun 2016: B.Eng. in Measurement & Control Technology and Instrument, Wuhan University of Technology
Experience
- Sep 2023 - Present: PhD Student Researcher, AIML, Adelaide University
- Jun 2020 - Aug 2022: AI Engineer, Tellhow Software
- Jul 2019 - Apr 2020: Software Engineer, Huawei Technologies
- May 2018 - Oct 2018: Visiting Student Researcher, California PATH, UC Berkeley
Teaching
- Teaching Assistant - Neural Networks and Deep Learning (ARTI X300), Adelaide University, Semester 1 2026
- Guest Lecturer and Head Tutor - Statistical Machine Learning (COMP SCI 3314), Adelaide University, Semester 2 2025
- Teaching Assistant - Using Machine Learning Tools (COMP SCI 7317), Adelaide University, Trimester 2 2025
- Teaching Assistant - Concepts in AI and ML (COMP SCI 7327), Adelaide University, Semester 1 2025
Honors & Awards
- NeurIPS Scholar Award, 2025
- Adelaide University Research Scholarships, 2023
- Award for Outstanding Graduates, Wuhan University of Technology, 2019
Academic Service
- Reviewer for TMLR, ICLR 2026, ICML 2026, and NeurIPS 2026
Selected Publications
The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-Modal Divergence
Yichao Cai; Zhen Zhang; Yuhang Liu; Javen Q. Shi.
International Conference on Machine Learning (ICML) 2026
A theoretical study of representation geometry in unimodal and multimodal contrastive learning, revealing a geometric bifurcation and identifying object-level causes of the modality gap.
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu; Dong Gong; Yichao Cai; Erdun Gao; Zhen Zhang; Biwei Huang; Mingming Gong; Anton van den Hengel; Javen Q. Shi.
International Conference on Learning Representations (ICLR) 2026
An investigation of whether next-token prediction alone is sufficient for learning human-interpretable concepts from data.
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
Yichao Cai*; Yuhang Liu*; Erdun Gao; Tianjiao Jiang; Zhen Zhang; Anton van den Hengel; Javen Q. Shi. (* equal contribution)
Advances in Neural Information Processing Systems (NeurIPS) 2025, Spotlight
Studies when controlled cross-modal misalignment can improve multimodal representation learning instead of only harming it.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai; Yuhang Liu; Zhen Zhang; Javen Q. Shi.
European Conference on Computer Vision (ECCV) 2024
Explores language-guided disentanglement of style and content through contrastive learning with augmented prompts.