Models That Prove Their Own Correctness

Abstract

This talk introduces Self-Proving models, a new class of models that formally prove the correctness of their outputs via an Interactive Proof system. After reviewing some related literature, I will formally define Self-Proving models and their per-input (worst-case) guarantees. I will then present algorithms for learning these models and explain how the complexity of the proof system affects the complexity of the learning algorithms. Finally, I will show experiments in which Self-Proving models are trained to compute the Greatest Common Divisor of two integers and to prove the correctness of their results to a simple verifier. No prior knowledge of autoregressive models or Interactive Proofs is assumed. This is joint work with Noga Amit, Shafi Goldwasser, and Guy Rothblum.
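
To give a feel for what proving a Greatest Common Divisor to a simple verifier can look like, here is a minimal sketch. It assumes that Bézout coefficients serve as the proof: the prover sends g together with integers x and y such that a*x + b*y = g, and the verifier checks only that identity plus two divisibility conditions. This is an illustration under that assumption, not necessarily the protocol used in the talk's experiments, and the names `extended_gcd` and `verify` are hypothetical. Here the prover is the classical extended Euclidean algorithm rather than a trained model.

```python
def extended_gcd(a, b):
    """Honest prover: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def verify(a, b, g, x, y):
    """Simple verifier: accept iff g is a positive common divisor of a and b
    and the claimed coefficients satisfy a*x + b*y == g.

    Any common divisor of a and b divides a*x + b*y, so an accepted g
    must be the greatest common divisor.
    """
    return g > 0 and a % g == 0 and b % g == 0 and a * x + b * y == g


g, x, y = extended_gcd(48, 18)
print(g, x, y, verify(48, 18, g, x, y))
```

The verifier never runs Euclid's algorithm itself; it performs a handful of multiplications and remainder checks, and it rejects any wrong answer regardless of the coefficients supplied with it. That is the per-input flavor of guarantee the abstract refers to.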

Date

Apr 1, 2025 12:00 PM

Event

Stanford University; FAR.ai; University of Warwick; University of Oxford; Cambridge University; Google DeepMind; École Polytechnique Fédérale de Lausanne (EPFL); Institut de Recherche en Informatique Fondamentale (IRIF); Zuse Institute Berlin (ZIB); Massachusetts Institute of Technology (MIT); Harvard University; Yale University; the Alignment, Trust, Watermarking, and Copyright Issues in LLMs workshop at the Simons Institute for the Theory of Computing; and the CS Theory Seminar at the University of California, Berkeley

Stanford University: February 20th, 2025.
FAR.ai: February 13th, 2025.
University of Warwick: January 14th, 2025.
University of Oxford: January 13th, 2025.
Cambridge University: January 9th, 2025.
Google DeepMind: January 8th, 2025.
École Polytechnique Fédérale de Lausanne (EPFL): December 17th, 2024.
Institut de Recherche en Informatique Fondamentale (IRIF): December 10th, 2024.
Zuse Institute Berlin (ZIB): December 4th, 2024.
Massachusetts Institute of Technology (MIT): November 20th, 2024.
Harvard University: November 18th, 2024.
Yale University: November 14th, 2024.
Alignment, Trust, Watermarking, and Copyright Issues in LLMs workshop at the Simons Institute for the Theory of Computing: October 15th, 2024.
CS Theory Seminar at University of California, Berkeley: September 11th, 2024.

Orr Paradise
