Yichao Cai
Ph.D. Candidate
Australian Institute for Machine Learning
About Me

I am a third-year Ph.D. student in Computer Science at the Australian Institute for Machine Learning (AIML), University of Adelaide, advised by Prof. Javen Qinfeng Shi. My research studies multimodal representation learning, with a particular focus on contrastive learning theory, cross-modal alignment, and identifiable causal representations.

More broadly, I am interested in how language supervision shapes semantic structure in vision-language models, and how this perspective can support interpretable, reliable, and human-aligned AI systems. My work combines theoretical analysis with empirical study to better understand representation formation in modern multimodal models.

Research interests: multimodal learning, contrastive learning theory, identifiability, causal representation learning, and vision-language models.


News
2026
I attended MLSS Melbourne 2026 and enjoyed learning from world-class speakers and connecting with the community.
Jan 26
2025
I served as a guest lecturer in Statistical Machine Learning and presented recent advances in vision-language modeling. Slides.
Oct 15
Our work On the Value of Cross-Modal Misalignment in Multimodal Representation Learning was selected as a Spotlight at NeurIPS 2025.
Sep 19
Education
  • Sep 2023 - Present
    University of Adelaide
    Ph.D. in Computer Science
    Advisor: Prof. Javen Qinfeng Shi
  • Sep 2012 - Jun 2019
    Wuhan University of Technology
    M.Sc. in Instrument Science and Technology, Sep 2016 - Jun 2019
    Advisor: Prof. Xiao Zhou
    B.Eng. in Measurement and Control Engineering, Sep 2012 - Jun 2016
Experience
  • 2025 - Present
    University of Adelaide
    Teaching Assistant - Neural Networks and Deep Learning (ARTI X300), Semester 1 2026
    Guest Lecturer and Head Tutor - Statistical Machine Learning (COMP SCI 3314), Semester 2 2025
    Teaching Assistant - Using Machine Learning Tools (COMP SCI 7317), Trimester 2 2025
    Teaching Assistant - Concepts in AI and ML (COMP SCI 7327), Semester 1 2025
  • May 2018 - Oct 2018
    California PATH, UC Berkeley
    Visiting Student Researcher, under Dr. Ching-Yao Chan
Honors & Awards
  • NeurIPS Scholar Award
    2025
  • University of Adelaide Research Scholarships
    2023
  • Award for Outstanding Graduates, Wuhan University of Technology
    2019
Academic Service
  • Reviewer for TMLR, ICLR 2026, and ICML 2026
    2025 - 2026
Selected Publications
The Geometric Mechanics of Contrastive Learning: Alignment Potentials, Entropic Dispersion, and Modality Gap

Yichao Cai; Zhen Zhang; Yuhang Liu; Javen Q. Shi.

arXiv preprint 2026

A theoretical study of contrastive learning geometry, alignment forces, dispersion, and the emergence of the modality gap.

I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?

Yuhang Liu; Dong Gong; Yichao Cai; Erdun Gao; Zhen Zhang; Biwei Huang; Mingming Gong; Anton van den Hengel; Javen Q. Shi.

International Conference on Learning Representations (ICLR) 2026

An investigation of whether next-token prediction alone is sufficient for learning human-interpretable concepts from data.

On the Value of Cross-Modal Misalignment in Multimodal Representation Learning

Yichao Cai*; Yuhang Liu*; Erdun Gao; Tianjiao Jiang; Zhen Zhang; Anton van den Hengel; Javen Q. Shi. (* equal contribution)

Advances in Neural Information Processing Systems (NeurIPS) 2025, Spotlight

Studies when controlled cross-modal misalignment can improve multimodal representation learning instead of only harming it.

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai; Yuhang Liu; Zhen Zhang; Javen Q. Shi.

European Conference on Computer Vision (ECCV) 2024

Explores language-guided disentanglement of style and content through contrastive learning with augmented prompts.
