arXiv:2603.15607v1

Do Metrics for Counterfactual Explanations Align with User Perception?

Authors: Felix Liedeker, Basil Ell, Philipp Cimiano, Christoph Düsing

🥉 Certified (κ=0.78) | Difficulty: Intermediate | Tags: ai-safety-and-alignment, cs-ai, ai-alignment-and-model-safety

RSCT Score Breakdown

  • Relevance (R): 0.42
  • Superfluous (S): 0.46
  • Noise (N): 0.12

TL;DR

Explainability is widely regarded as essential for trustworthy artificial intelligence systems, yet the metrics commonly used to evaluate counterfactual explanations are algorithmic measures that are rarely validated against human judgments of explanation quality. An empirical study across three datasets finds that correlations between these metrics and human ratings are generally weak and strongly dataset-dependent, underscoring the need for more human-centered evaluation.


RSCT Certification: κ=0.778 (certified) | RSN: 0.70/0.75/0.20 | Topics: AI Alignment and Model Safety, AI Safety and Alignment, AI Code Security and Vulnerability

Overview

One-Sentence Summary

This paper empirically tests whether the metrics commonly used to evaluate counterfactual explanations in AI systems align with human perceptions of explanation quality, finding that these metrics often fail to capture aspects of quality that matter to users.

Key Innovation

The key innovation is the direct empirical comparison of standard algorithmic evaluation metrics for counterfactual explanations with human judgments of explanation quality across multiple datasets. Previous work has relied heavily on these algorithmic metrics; this paper systematically assesses whether they actually reflect what users find valuable in explanations.
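
To make the comparison concrete, here is a minimal sketch of the kind of per-dataset analysis described: correlating algorithmic counterfactual metrics with human quality ratings. The metric names, rating dimensions, and data layout are illustrative assumptions, not the paper's actual variables.

```python
# A minimal sketch of the core analysis, assuming tabular data where each row is
# one counterfactual explanation with its metric values and the human ratings it
# received. Metric names and rating dimensions are hypothetical.
import pandas as pd
from scipy.stats import spearmanr

METRICS = ["proximity", "sparsity", "plausibility", "validity"]   # assumed metric names
RATING_DIMS = ["understandability", "usefulness", "trust"]        # assumed rating dimensions

def metric_human_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman correlation of each algorithmic metric with each human rating
    dimension, computed within a single dataset."""
    rows = []
    for metric in METRICS:
        for dim in RATING_DIMS:
            rho, p_value = spearmanr(df[metric], df[dim])
            rows.append({"metric": metric, "dimension": dim, "rho": rho, "p": p_value})
    return pd.DataFrame(rows)

# Running this separately per dataset mirrors the paper's observation that
# correlations are weak and strongly dataset-dependent:
# for name, df in {"dataset_a": df_a, "dataset_b": df_b, "dataset_c": df_c}.items():
#     print(name, metric_human_correlations(df), sep="\n")
```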

Should You Read This?

If you work on explainable AI (XAI) or the evaluation of explanations: Yes, this paper is a must-read. It calls into question the validity of the most commonly used metrics for evaluating counterfactual explanations, which are a key type of explanation in XAI. Understanding the limitations of these metrics is crucial for developing more human-centered approaches to evaluating explanation quality.

If you work on human-AI interaction or the user experience of AI systems: Yes, this paper provides important insights about the disconnect between algorithmic evaluation and actual user perceptions. This can inform the design of AI systems to better meet user needs for transparent and trustworthy decision-making.

The Good

  • The paper takes a rigorous, systematic approach to evaluating explanation quality metrics across multiple datasets.
  • The human evaluation study is well-designed, with participants rating explanations on multiple quality dimensions.
  • The results clearly show that widely used algorithmic metrics often fail to capture key aspects of explanation quality as perceived by users.
  • The authors compile a comprehensive set of standard counterfactual explanation metrics, which serves as a useful reference for the field (a few typical metrics of this kind are sketched below).
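
For readers unfamiliar with these metrics, the sketch below shows simplified versions of three metrics that commonly appear in counterfactual evaluation (proximity, sparsity, validity). The definitions are illustrative assumptions, not necessarily the exact formulations evaluated in the paper.

```python
# Simplified, assumption-laden versions of three commonly used counterfactual
# metrics; not necessarily the exact formulations used in the paper.
import numpy as np

def proximity(x: np.ndarray, x_cf: np.ndarray) -> float:
    """L1 distance between the original instance and its counterfactual
    (smaller is usually read as 'closer', hence better)."""
    return float(np.abs(x - x_cf).sum())

def sparsity(x: np.ndarray, x_cf: np.ndarray) -> float:
    """Fraction of features changed; assumes unchanged features are exactly equal."""
    return float(np.mean(x != x_cf))

def validity(model, x_cf: np.ndarray, target_class: int) -> bool:
    """Whether the counterfactual actually receives the desired prediction;
    assumes a scikit-learn-style classifier with a .predict() method."""
    return int(model.predict(x_cf.reshape(1, -1))[0]) == target_class
```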

The Gaps

  • The paper does not explore potential reasons for the disconnect between algorithmic metrics and human perceptions. More analysis is needed to understand the underlying causes.
  • The human evaluation was limited to a relatively small number of participants. Larger-scale studies would strengthen the conclusions.
  • The paper does not propose alternative metrics or evaluation frameworks that better align with user needs. This is an important next step.
  • The datasets used may not fully represent the range of real-world use cases for counterfactual explanations. Validating the findings on a more diverse set of applications would increase confidence.

How to Read This Paper

If you're from the XAI/explainability field: You can likely skip the background sections on counterfactual explanations and focus on the empirical evaluation, results, and discussion.

If you're from human-AI interaction or user experience: The background on counterfactual explanations and existing evaluation metrics will be valuable context. Pay close attention to the human evaluation methodology and findings.

Must read (everyone): The results and discussion sections contain the core contribution of the paper, demonstrating the limitations of current evaluation metrics.

Verify: The specific claims about the weak correlations between algorithmic metrics and human ratings, as well as the lack of improvement from using more metrics, should be verified through independent replication.
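
As a starting point for such a check, the sketch below shows one way to test whether combining more metrics improves prediction of human ratings, using cross-validated regression over metric subsets. The Ridge regressor, 5-fold cross-validation, and exhaustive subset search are assumptions for illustration, not the paper's protocol.

```python
# One way to probe the "more metrics do not help" claim: cross-validated R^2 of
# the best metric subset at each subset size. Exhaustive search only scales to a
# handful of metrics, which is fine for a sketch.
from itertools import combinations

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def best_score_per_subset_size(X: np.ndarray, y: np.ndarray) -> dict:
    """X: (n_explanations, n_metrics) metric values; y: human quality ratings.
    Returns the best mean CV score for each number of metrics used."""
    n_metrics = X.shape[1]
    results = {}
    for k in range(1, n_metrics + 1):
        best = -np.inf
        for subset in combinations(range(n_metrics), k):
            cols = list(subset)
            scores = cross_val_score(Ridge(alpha=1.0), X[:, cols], y, cv=5, scoring="r2")
            best = max(best, scores.mean())
        results[k] = best
    # A flat curve over k would indicate that adding metrics brings little
    # additional information about human-perceived quality.
    return results
```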

Bottom Line

This paper presents a crucial wake-up call for the field of explainable AI. It demonstrates that the metrics we have been using to evaluate the quality of counterfactual explanations often fail to capture what users actually find valuable. This underscores the need for more human-centered approaches to evaluating AI explanations, moving beyond just algorithmic performance to consider the actual user experience. Researchers and practitioners working on explainable AI systems should carefully consider these findings and work to develop evaluation frameworks that better align with user needs and perceptions.

Quality Assessment

Trust Level: MODERATE - Verify key results first

What the scores mean:

  • 70% signal - This much of the paper directly supports its claims
  • 75% context - Background material for readers from other fields (this is a bridge paper - high context is a feature!)
  • 20% noise - Content that may mislead if taken at face value

Reliability score: 78% (certified)

Practical interpretation: Good foundation but some gaps. Read critically and verify key claims before building on this work.

Paper Details

  • Authors: Felix Liedeker, Basil Ell, Philipp Cimiano, Christoph Düsing
  • Published: 2026-03-16
  • Source: arxiv
  • Primary Topic: AI Alignment and Model Safety
  • Difficulty: Intermediate

Abstract

Explainability is widely regarded as essential for trustworthy artificial intelligence systems. However, the metrics commonly used to evaluate counterfactual explanations are algorithmic evaluation metrics that are rarely validated against human judgments of explanation quality. This raises the question of whether such metrics meaningfully reflect user perceptions. We address this question through an empirical study that directly compares algorithmic evaluation metrics with human judgments across three datasets. Participants rated counterfactual explanations along multiple dimensions of perceived quality, which we relate to a comprehensive set of standard counterfactual metrics. We analyze both individual relationships and the extent to which combinations of metrics can predict human assessments. Our results show that correlations between algorithmic metrics and human ratings are generally weak and strongly dataset-dependent. Moreover, increasing the number of metrics used in predictive models does not lead to reliable improvements, indicating structural limitations in how current metrics capture criteria relevant for humans. Overall, our findings suggest that widely used counterfactual evaluation metrics fail to reflect key aspects of explanation quality as perceived by users, underscoring the need for more human-centered approaches to evaluating explainable artificial intelligence.


This analysis was automatically generated and certified by the Swarm-It RSCT pipeline. κ-gate score: 0.778 | Quality tier: certified

About This Review

This review was auto-generated by the Swarm-It research discovery platform. Quality is certified using RSCT (RSN Certificate Technology) with a κ-gate score of 0.78. RSN scores: Relevance=0.42, Superfluous=0.46, Noise=0.12.