How can machine-learning interatomic potentials (MLIPs) be trained effectively at scale?
AllScAIP pairs local neighborhood attention with global all-to-all node attention: every atom attends to every other atom in a single layer. Both stages use standard multi-head self-attention CUDA kernels ⚡⚡⚡
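To make the two-stage idea concrete, here is a minimal NumPy sketch, not the AllScAIP implementation. It assumes single-head attention with identity projections, and it models the local stage as attention masked by a distance cutoff; the function names (`attention`, `two_stage_layer`) and the `cutoff` parameter are hypothetical. The real model uses multi-head attention with learned projections and fused CUDA kernels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(h, mask=None):
    """Single-head scaled dot-product self-attention over atoms.

    h:    (n_atoms, d) node features (identity Q/K/V projections, an assumption).
    mask: optional (n_atoms, n_atoms) boolean array; False entries are blocked.
    """
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ h, weights

def two_stage_layer(pos, h, cutoff):
    """Local neighborhood attention followed by global all-to-all node attention."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    local_mask = dist <= cutoff          # each atom always sees itself (dist = 0)
    h_local, w_local = attention(h, local_mask)
    h_global, w_global = attention(h_local)  # no mask: every atom sees every atom
    return h_global, w_local, w_global

# Hypothetical toy system: six atoms on a line, 2 Å apart.
pos = np.array([[2.0 * i, 0.0, 0.0] for i in range(6)])
h = np.random.default_rng(0).normal(size=(6, 8))
out, w_local, w_global = two_stage_layer(pos, h, cutoff=2.5)
```

In this toy setup the local weights between the two end atoms are exactly zero, while the global weights are strictly positive for every pair, which is the sense in which a single node-attention layer reaches beyond the cutoff.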
What does node attention actually learn? Click an atom to see: attention reaches far beyond the local cutoff.
Does it matter? Stretch a molecule and watch how the error changes.
What happens to inductive biases when data and model size scale up?
AllScAIP encodes physics at three levels: Enforced symmetries that never break, Optional geometric priors that help at small scale, and biases left entirely Learnable.
AllScAIP ranked first on the OMol25 leaderboard at the time of release: see the live leaderboard!
NPT simulations recover experimental density and heat of vaporization out of the box: no more over-compression!
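For readers unfamiliar with these two observables, the sketch below shows one common way to estimate them from NPT trajectory averages. It is an assumption about the analysis, not a description of how the AllScAIP results were produced: density is taken from the mean box volume, and heat of vaporization uses the standard estimator ΔH_vap ≈ E_gas − E_liquid/N + RT, which treats the vapor as an ideal gas. The function names and the input numbers are hypothetical placeholders, not reported results.

```python
AVOGADRO = 6.02214076e23      # 1/mol
R_KJ = 8.314462618e-3         # gas constant in kJ/(mol K)

def density_g_per_cm3(n_molecules, molar_mass_g_mol, mean_volume_A3):
    """Mass density from the time-averaged NPT box volume (in cubic angstroms)."""
    mass_g = n_molecules * molar_mass_g_mol / AVOGADRO
    volume_cm3 = mean_volume_A3 * 1e-24   # 1 Å^3 = 1e-24 cm^3
    return mass_g / volume_cm3

def heat_of_vaporization_kj_mol(e_gas, e_liquid_total, n_molecules, temperature_K):
    """Ideal-gas estimate: dH_vap = E_gas - E_liquid/N + R*T.

    e_gas:          mean potential energy of one isolated molecule (kJ/mol)
    e_liquid_total: mean potential energy of the whole liquid box (kJ/mol)
    """
    return e_gas - e_liquid_total / n_molecules + R_KJ * temperature_K

# Placeholder inputs for a water-like box of 1000 molecules (illustrative only).
rho = density_g_per_cm3(1000, 18.015, 30000.0)
dh = heat_of_vaporization_kj_mol(0.0, -41500.0, 1000, 298.15)
```

Over-compression would show up here as a mean volume that is too small, and therefore a density above the experimental value.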