Proof-Carrying Materials: Falsifiable Safety Certificates for Machine-Learned Interatomic Potentials
Authors: Abhinaba Basu, Pavan Chakraborty
TL;DR
RSCT Certification: κ=0.778 (certified) | RSN: 0.70/0.75/0.20 | Topics: Mixture of Experts Architectures, Energy-Based Transformers, Diffusion and Generative Models
Overview
One-Sentence Summary
This paper introduces Proof-Carrying Materials (PCM), a framework for auditing and validating machine-learned interatomic potentials (MLIPs) used in high-throughput materials screening, and reports that PCM-audited protocols found 62 additional stable materials missed by single-MLIP screening in a thermoelectric case study.
Key Innovation
The key innovation is the PCM framework, which combines three stages (adversarial falsification, bootstrap envelope refinement, and formal certification) to systematically evaluate the reliability of MLIPs. This is a significant advance over the status quo, in which MLIPs are often used without formal reliability guarantees.
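To make the bootstrap stage concrete, here is a minimal sketch of a generic percentile bootstrap for a 95% confidence interval. It is not the paper's envelope-refinement procedure, whose details are not given in this summary. The function name `bootstrap_ci` and the error values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values).

    Generic textbook procedure (an assumption, not the paper's method):
    resample with replacement, recompute the statistic, take percentiles.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-material absolute energy errors (eV/atom), made up for illustration.
errors = [0.02, 0.05, 0.01, 0.30, 0.04, 0.03, 0.22, 0.06, 0.02, 0.08]
low, high = bootstrap_ci(errors)
print(f"mean error = {statistics.mean(errors):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

A wide interval on a small sample like this one is a reminder that a failure threshold estimated from few adversarial hits should be reported with its uncertainty, not as a point value.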
Should You Read This?
If you work on machine learning for materials science: Yes, this paper is highly relevant. It tackles a critical issue in the field, the lack of formal reliability guarantees for MLIPs, and presents a rigorous solution. Understanding the PCM framework and its applications could directly inform your own work.
If you work on high-throughput materials discovery: Yes, this paper is also relevant to you. The case study demonstrating PCM's ability to discover additional stable materials missed by traditional MLIP-based screening is particularly interesting. Understanding the limitations of current MLIP-based approaches and how PCM can address them could improve your materials discovery workflows.
The Good
- The PCM framework is a well-designed, multi-stage approach to systematically audit and validate MLIPs, addressing a clear gap in the field.
- The extensive evaluation, including adversarial testing, bootstrap analysis, formal certification, and independent validation, provides a high level of confidence in the reliability of the PCM-audited MLIPs.
- The case study demonstrating the ability of PCM to discover additional stable materials missed by traditional MLIP-based screening is a compelling real-world application of the framework.
- The paper is well-written and accessible, with a good balance of background information for readers from different domains.
The Gaps
- The paper does not provide a detailed discussion of the computational cost or scalability of the PCM framework, which could be an important consideration for practical deployment.
- The evaluation is limited to a specific set of MLIP architectures (CHGNet, TensorNet, and MACE), and it's unclear how well the PCM framework would generalize to other MLIP architectures.
- The case study on thermoelectric materials discovery is interesting, but it's a single example, and more validation across different materials domains would strengthen the claims.
How to Read This Paper
If you're from the machine learning for materials science domain: You can likely skip the background sections on materials science and focus on the technical details of the PCM framework, including the adversarial testing, bootstrap analysis, and formal certification.
If you're from the high-throughput materials discovery domain: The background sections on materials science and MLIP-based screening will be most relevant for you, as well as the case study demonstrating PCM's ability to discover additional stable materials.
Must read (everyone): The sections describing the PCM framework and its key components (adversarial falsification, bootstrap envelope refinement, and formal certification) are essential for understanding the core contribution of this work.
Verify: Before building on the results, you should independently validate the performance of the PCM-audited MLIPs on a broader set of materials and architectures.
Bottom Line
This paper presents a valuable and practical framework for rigorously auditing and validating machine-learned interatomic potentials (MLIPs) used in high-throughput materials screening. By combining adversarial testing, bootstrap analysis, and formal certification, the Proof-Carrying Materials (PCM) framework addresses a critical gap in the field and could significantly improve the reliability of MLIP-based materials discovery workflows. The case study demonstrating PCM's ability to uncover additional stable materials missed by traditional MLIP-based screening is a compelling proof of concept, and the broader application of this framework could lead to more robust and trustworthy materials discovery.
Quality Assessment
Trust Level: MODERATE - Verify key results first
What the scores mean:
- 70% signal - This much of the paper directly supports its claims
- 75% context - Background material for readers from other fields (this is a bridge paper, so high context is a feature)
- 20% noise - Content that may mislead if taken at face value
Reliability score: 78% (certified)
Practical interpretation: Good foundation but some gaps. Read critically and verify key claims before building on this work.
Paper Details
- Authors: Abhinaba Basu, Pavan Chakraborty
- Published: 2026-03-12
- Source: arXiv
- Primary Topic: Mixture of Experts Architectures
- Difficulty: Intermediate
Abstract
Machine-learned interatomic potentials (MLIPs) are deployed for high-throughput materials screening without formal reliability guarantees. We show that a single MLIP used as a stability filter misses 93% of density functional theory (DFT)-stable materials (recall 0.07) on a 25,000-material benchmark. Proof-Carrying Materials (PCM) closes this gap through three stages: adversarial falsification across compositional space, bootstrap envelope refinement with 95% confidence intervals, and Lean 4 formal certification. Auditing CHGNet, TensorNet and MACE reveals architecture-specific blind spots with near-zero pairwise error correlations (r ≤ 0.13; n = 5,000), confirmed by independent Quantum ESPRESSO validation (20/20 converged; median DFT/CHGNet force ratio 12×). A risk model trained on PCM-discovered features predicts failures on unseen materials (AUC-ROC = 0.938 ± 0.004) and transfers across architectures (cross-MLIP AUC-ROC ≈ 0.70; feature importance r = 0.877). In a thermoelectric screening case study, PCM-audited protocols discover 62 additional stable materials missed by single-MLIP screening, a 25% improvement in discovery yield.
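Two of the abstract's headline numbers can be unpacked with simple arithmetic. The sketch below shows what a recall of 0.07 means for a stability filter, using toy material IDs that are not the paper's data. It also shows the baseline yield implied by "62 additional materials, a 25% improvement". That second calculation assumes the 25% is measured relative to the single-MLIP yield, which the abstract does not state explicitly.

```python
def recall(predicted_stable, truly_stable):
    """Fraction of truly stable materials that the filter also flags as stable."""
    truly = set(truly_stable)
    if not truly:
        raise ValueError("recall is undefined without any truly stable materials")
    return len(truly & set(predicted_stable)) / len(truly)


# Toy data (hypothetical IDs): 100 DFT-stable materials, 7 recovered by one MLIP filter.
dft_stable = range(100)
mlip_flagged = range(7)
single_recall = recall(mlip_flagged, dft_stable)
print(single_recall)  # 0.07, i.e. 93% of stable materials are missed

# Assumption: the 25% improvement is relative to the single-MLIP discovery yield.
additional = 62
relative_gain = 0.25
implied_baseline = additional / relative_gain
print(implied_baseline)  # 248.0 materials found by single-MLIP screening
```

Under that assumption, the audited protocol would yield roughly 310 stable candidates in total. Check the figure against the paper before relying on it.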
This analysis was automatically generated and certified by the Swarm-It RSCT pipeline. κ-gate score: 0.778 | Quality tier: certified