A recipe for scalable attention-based MLIPs: unlocking long-range accuracy with all-to-all node attention

¹FAIR at Meta, ²UC Berkeley, ³LBNL
*Work done at Meta, †Corresponding Author

How can machine learning interatomic potentials (MLIPs) be trained effectively at scale?

Our Answer: All-to-all Node Attention

AllScAIP pairs local neighborhood attention with global all-to-all node attention: every atom attends to every other atom in a single layer. Both stages use standard multi-head self-attention CUDA kernels ⚡⚡⚡

[Interactive figure: architecture diagram]
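The two-stage idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the AllScAIP implementation: the local stage is modeled here as node attention masked by a distance cutoff, and the function names, the 6 Å cutoff, the residual connections, and the random weights are all assumptions made for the example. The point it shows is that both stages can share one standard multi-head self-attention routine, differing only in the mask.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, Wq, Wk, Wv, Wo, n_heads, mask=None):
    """Standard multi-head self-attention over node features h of shape (N, d)."""
    n, d = h.shape
    dh = d // n_heads
    # Project, then split into heads: (n_heads, N, dh)
    q = (h @ Wq).reshape(n, n_heads, dh).transpose(1, 0, 2)
    k = (h @ Wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    v = (h @ Wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (n_heads, N, N)
    if mask is not None:
        # Masked-out pairs receive zero attention weight after the softmax.
        scores = np.where(mask[None], scores, -np.inf)
    weights = softmax(scores, axis=-1)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ Wo, weights

def two_stage_block(h, pos, params, cutoff=6.0, n_heads=4):
    """Hypothetical block: cutoff-masked local attention, then all-to-all attention."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    local_mask = dist <= cutoff  # each atom always sees itself (distance 0)
    h_local, _ = multi_head_attention(h, *params["local"], n_heads, mask=local_mask)
    h = h + h_local
    # Global stage: no mask, so every atom attends to every other atom.
    h_global, weights = multi_head_attention(h, *params["global"], n_heads)
    return h + h_global, weights

rng = np.random.default_rng(0)
n_atoms, d = 8, 16
pos = rng.uniform(0.0, 20.0, size=(n_atoms, 3))  # toy coordinates in Å
h = rng.normal(size=(n_atoms, d))
params = {
    stage: [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4)]
    for stage in ("local", "global")
}
out, weights = two_stage_block(h, pos, params)
```

Because neither stage uses an atom-index positional encoding, relabeling the atoms simply permutes the output rows, which is the behavior one would want from a model of an unordered set of atoms.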

What does node attention actually learn? Click an atom to see: attention reaches far beyond the local cutoff.

[Interactive figure: attention visualization]
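One simple way to put a number on "far beyond the local cutoff" is to sum, for each atom, the attention weight it places on atoms outside the cutoff radius. The helper below is a minimal sketch of that measurement; the function name, the toy geometry, the uniform attention weights, and the 5 Å cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np

def attention_mass_beyond_cutoff(weights, pos, cutoff):
    """Per-atom fraction of attention placed on atoms farther than `cutoff`.

    weights: (N, N) attention matrix whose rows sum to 1.
    pos:     (N, 3) atomic positions.
    """
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    far = dist > cutoff
    return (weights * far).sum(axis=-1)

# Toy input: four atoms on a line, 3 Å apart, with uniform attention.
pos = np.array([[0.0, 0.0, 0.0],
                [3.0, 0.0, 0.0],
                [6.0, 0.0, 0.0],
                [9.0, 0.0, 0.0]])
weights = np.full((4, 4), 0.25)
frac = attention_mass_beyond_cutoff(weights, pos, cutoff=5.0)
print(frac)  # [0.5  0.25 0.25 0.5 ]
```

A strictly local attention layer would score zero everywhere on this measure, so any nonzero value is attention that a cutoff-limited model could not express in a single layer.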

Does it matter? Stretch a molecule and watch how the error changes.

[Interactive figure: distance-scaling explorer]
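To see why stretching can matter, consider a deliberately crude toy: two opposite unit point charges whose reference energy is Coulomb's law, and a hypothetical single-layer model that is exact inside its cutoff and blind beyond it. None of this is taken from the paper; the 6 Å cutoff and the function names are assumptions, and real local models with several message-passing layers see somewhat farther than one cutoff radius.

```python
import numpy as np

COULOMB_EV_ANGSTROM = 14.3996  # e^2 / (4*pi*eps0) in eV*Å

def coulomb_energy(r, q1=1.0, q2=-1.0):
    """Reference pair energy in eV for charges in units of e at separation r (Å)."""
    return COULOMB_EV_ANGSTROM * q1 * q2 / r

def cutoff_model_energy(r, cutoff, q1=1.0, q2=-1.0):
    """Hypothetical strictly local model: exact inside the cutoff, zero beyond it."""
    return np.where(r <= cutoff, coulomb_energy(r, q1, q2), 0.0)

separations = np.array([3.0, 6.0, 12.0, 24.0])  # Å
error = np.abs(coulomb_energy(separations)
               - cutoff_model_energy(separations, cutoff=6.0))
for r, e in zip(separations, error):
    print(f"r = {r:5.1f} Å   |error| = {e:.3f} eV")
```

In this toy the error is zero up to the cutoff and then jumps to the full Coulomb energy, decaying only as 1/r. That is the kind of residual a global attention stage has the capacity to learn away.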

What happens to inductive biases when data and model size scale up?

AllScAIP encodes physics at three levels: Enforced symmetries that never break, Optional geometric priors that help at small scale, and biases left entirely Learnable.

[Interactive figure: inductive bias table]

The Evidence: Component Ablations

[Interactive figure: ablation dashboard]

Results

Open Molecules 2025

AllScAIP ranked at the top of the OMol25 leaderboard at the time of release; see the live leaderboard.

Molecular Dynamics Simulations

NPT simulations recover experimental density and heat of vaporization out of the box: no more over-compression!

NPT molecular dynamics results: density and heat of vaporization parity plots (Fig. 7 from paper).