Models That Prove Their Own Correctness

Abstract

This talk introduces Self-Proving models, a new class of models that formally prove the correctness of their outputs via an Interactive Proof system. After reviewing related literature, I will formally define Self-Proving models and their per-input (worst-case) guarantees. I will then present algorithms for learning these models and explain how the complexity of the proof system affects the complexity of the learning algorithms. Finally, I will show experiments in which Self-Proving models are trained to compute the Greatest Common Divisor of two integers and to prove the correctness of their results to a simple verifier. No prior knowledge of autoregressive models or Interactive Proofs is assumed. This is joint work with Noga Amit, Shafi Goldwasser, and Guy Rothblum.
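
To make the idea of a "simple verifier" for the Greatest Common Divisor concrete, here is a minimal sketch. It assumes the proof is a pair of Bézout coefficients, which is one standard certificate for GCD; the abstract does not say which proof format the experiments use, and the names `verify_gcd` and `honest_prover` are hypothetical. The prover here is a classical algorithm standing in for a trained model.

```python
def verify_gcd(a: int, b: int, d: int, x: int, y: int) -> bool:
    """Accept iff (x, y) certifies that d == gcd(a, b) for positive a, b.

    Assumed proof format: Bezout coefficients (x, y) with a*x + b*y == d.
    """
    if a <= 0 or b <= 0 or d <= 0:
        return False
    # d must be a common divisor of a and b.
    if a % d != 0 or b % d != 0:
        return False
    # Every common divisor of a and b divides a*x + b*y, so if that sum
    # equals d, no common divisor can exceed d.
    return a * x + b * y == d


def honest_prover(a: int, b: int) -> tuple[int, int, int]:
    """Extended Euclid: return (d, x, y) with a*x + b*y == d == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


if __name__ == "__main__":
    d, x, y = honest_prover(12, 18)
    print(d, verify_gcd(12, 18, d, x, y))
    # A wrong answer (3 is a common divisor, but not the greatest) is rejected
    # no matter which coefficients accompany it; here we reuse the honest ones.
    print(verify_gcd(12, 18, 3, x, y))
```

The verifier needs only two divisions, two multiplications, and one comparison, so it is much simpler than computing the GCD itself. The guarantee is per input: an incorrect answer cannot be accepted, whichever coefficients are supplied with it.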

Date
Apr 1, 2025 12:00 PM
Event
University of Warwick; University of Oxford; Cambridge University; Google DeepMind; École Polytechnique Fédérale de Lausanne (EPFL); Institut de Recherche en Informatique Fondamentale (IRIF); Zuse Institute Berlin (ZIB); Massachusetts Institute of Technology (MIT); Harvard University; Yale University; the Alignment, Trust, Watermarking, and Copyright Issues in LLMs workshop at the Simons Institute for the Theory of Computing; and the CS Theory Seminar at the University of California, Berkeley
  • University of Warwick: January 14th, 2025.
  • University of Oxford: January 13th, 2025.
  • Cambridge University: January 9th, 2025.
  • Google DeepMind: January 8th, 2025.
  • École Polytechnique Fédérale de Lausanne (EPFL): December 17th, 2024.
  • Institut de Recherche en Informatique Fondamentale (IRIF): December 10th, 2024.
  • Zuse Institute Berlin (ZIB): December 4th, 2024.
  • Massachusetts Institute of Technology (MIT): November 20th, 2024.
  • Harvard University: November 18th, 2024.
  • Yale University: November 14th, 2024.
  • Alignment, Trust, Watermarking, and Copyright Issues in LLMs workshop at the Simons Institute for the Theory of Computing: October 15th, 2024.
  • CS Theory Seminar at University of California, Berkeley: September 11th, 2024.
