Representative papers are highlighted.

2026

The Geometric Mechanics of Contrastive Learning: Alignment Potentials, Entropic Dispersion, and Modality Gap

Yichao Cai; Zhen Zhang; Yuhang Liu; Javen Q. Shi.

arXiv preprint 2026

A theoretical study of contrastive learning geometry, alignment forces, dispersion, and the emergence of the modality gap.

2025

I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?

Yuhang Liu; Dong Gong; Yichao Cai; Erdun Gao; Zhen Zhang; Biwei Huang; Mingming Gong; Anton van den Hengel; Javen Q. Shi.

International Conference on Learning Representations (ICLR) 2026

An investigation of whether next-token prediction alone is sufficient for learning human-interpretable concepts from data.

On the Value of Cross-Modal Misalignment in Multimodal Representation Learning

Yichao Cai*; Yuhang Liu*; Erdun Gao; Tianjiao Jiang; Zhen Zhang; Anton van den Hengel; Javen Q. Shi. (* equal contribution)

Advances in Neural Information Processing Systems (NeurIPS) 2025 (Spotlight)

Studies when controlled cross-modal misalignment can improve multimodal representation learning instead of only harming it.

2024

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai; Yuhang Liu; Zhen Zhang; Javen Q. Shi.

European Conference on Computer Vision (ECCV) 2024

Explores language-guided disentanglement of style and content through contrastive learning with augmented prompts.
