Boris Babic & I. Glenn Cohen, The Algorithmic Explainability "Bait and Switch", available at SSRN (August 20, 2023).

AI is mysterious and important. It’s important because it’s showing up everywhere and doing lots of things. It’s mysterious because we very often don’t know how it works and why it comes to the conclusions it does. Whether AI should be important is hotly debated, but its mystery is widely regarded as a problem, particularly when AI is making inscrutable decisions that matter to people’s lives. And so there are widespread calls in law, policy, and scholarship for explainable AI—that is, ways to explain just why an AI system came to the conclusion it did. In The Algorithmic Explainability “Bait and Switch,” Boris Babic and Glenn Cohen add to the literature on explainable AI by clearly and convincingly arguing that explainable AI is “fool’s gold”—shiny and exciting on the surface, but not what we need, because it’s post hoc, insincere, tough to judge, and can’t be used to effectively guide actions.

So what is explainable AI, and why does it matter? Essentially, the problem is that it’s too hard to understand how AI makes decisions; the models are too complicated for people to follow, so they’re opaque to us. Explainable AI tries to use another, simpler algorithm to approximate a plausible reason the AI might have come to its conclusion; that explanation is typically specific to the particular conclusion being questioned. This happens after the initial system does its thing; it’s a post-hoc approximation, not a true accounting of why the initial system actually did what it did. Babic and Cohen illustrate all of this with an extended example, an admissions model for a hypothetical law school, that shows the pitfalls and why they matter.
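(To make the mechanics concrete, here is a minimal, invented sketch of the surrogate-model idea. It is not the authors’ law-school example; the feature names and data are hypothetical. An interpretable model is fit to the black box’s outputs, and its coefficients serve as the “explanation,” even though the black box’s actual rule is something else.)

```python
# Hypothetical sketch of post-hoc explanation via a surrogate model.
# The "black box" learns a rule with an interaction; the surrogate offers
# a simpler, linear story about the black box's decisions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # invented applicant features: e.g., test score, GPA, essay rating
y = (X[:, 0] + 0.5 * X[:, 1] * X[:, 2] > 0).astype(int)  # the "true" rule includes an interaction

black_box = RandomForestClassifier(random_state=0).fit(X, y)  # the opaque model actually deciding

# Post-hoc surrogate: fit a readable model to the black box's *predictions*.
# Its weights are a plausible story about the decisions, not the rule itself.
surrogate = LogisticRegression().fit(X, black_box.predict(X))
print("surrogate weights:", surrogate.coef_)  # no interaction appears here: the gap is built in
```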

(As an aside, this bit demonstrates a real strength of the piece: its comprehensibility on complex topics. There’s a tension in law review articles: They need to speak to generalist readers (including the law students who do selection and editing, as well as scholars in adjacent fields), but they also need to move the ball forward for expert readers who are already in the conversation. It’s tough to do this well; typical approaches include neglecting one task or writing very long pieces with lots of detailed background to get the nonexpert up to speed. Both can be frustrating. Babic and Cohen smoothly walk this dual path, in part by using a sort of Choose-Your-Own-Adventure structure in the Background. ‘Here’s the math,’ they say, ‘but if you’d like, feel free to skip ahead to the intuitive example where we make it easy to understand.’)

Because AI explanations are simplified post-hoc approximations, they’ve got some real problems. They’re “insincere,” Babic and Cohen argue, in that they’re plausible reasons that the system might have used to make a particular decision (in the example, admitting a prospective student or not). But there’s no guarantee that they’re the actual reason. Indeed, there couldn’t be such a guarantee: the whole point of post-hoc explanations is that they’re simple enough to be understood, while the whole reason we need them in the first place is that the actual AI system being used isn’t simple enough to be understood. There’s a gap by definition. And so these answers aren’t sincere.

That post-hoc insincerity is a real problem for AI explanations for three big reasons. First, if an AI explanation doesn’t tell the actual reason for a decision, the affected party can’t know what to change to alter the outcome for next time (as an alternative, some have suggested systems for playing around with lots of possibilities to try to figure that out). It’s not an “action guiding” explanation if it can’t reliably guide action, something it’s often hoped explanations will do. If you’re trying to find out why your date to the movies is late and you only get a plausible explanation rather than the actual explanation, it’s hard to know whether to bail, buy a ticket for a later show, or get snacks because they’re on their way. (The article is spangled with delightful, intuitive examples that make tough concepts easier to understand, from Maverick and Goose piloting fighter jets to too-short dates to unethical test-ordering doctors; it’s a real strength.) More seriously, if someone gets denied parole and told a plausible reason that might or might not be the real reason, it’s tough to know what to do to improve their chances for next time.
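(The parenthetical “playing around with lots of possibilities” idea, often called counterfactual explanation in the technical literature, can be sketched just as briefly. Again, this is an invented illustration, not the authors’ proposal or any particular system: perturb an individual’s features until the black box flips its decision, and report what changed.)

```python
# Hypothetical sketch of counterfactual search: perturb an input until the
# black box's decision flips, then report the change. `model` can be any
# fitted classifier with a .predict() method (e.g., the black_box above).
import numpy as np

def find_flip(model, x, step=0.1, max_tries=500, seed=0):
    """Randomly perturb x until the model's prediction changes; return the new point or None."""
    rng = np.random.default_rng(seed)
    original = model.predict(x.reshape(1, -1))[0]
    for _ in range(max_tries):
        candidate = x + rng.normal(scale=step, size=x.shape)
        if model.predict(candidate.reshape(1, -1))[0] != original:
            return candidate  # "had your features looked like this, the decision would have flipped"
    return None  # no flip found within the search budget

# Usage (hypothetical): counterfactual = find_flip(black_box, X[0])
```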

The second big problem with insincerity is trust. One touted benefit of explainability is that if people affected by AI systems understand their reasons, they’ll trust the AI systems (and the human systems in which they’re embedded) more. Transparency matters, and that includes knowing how decisions were reached. But if explanations are insincere and inaccurate, that’s likely to destroy trust in the system, not build it. This, Babic and Cohen point out, is especially likely because explainable AI comes up with different explanations for different individual decisions—and if the subjects of those decisions can share stories, they might find pretty quickly that they were given different decision rules.

Third and finally, it’s important to evaluate AI systems’ decision rules, because many rules aren’t OK. If a post-hoc, insincere explanation doesn’t reliably reflect the actual decision rule, it’s not a useful path to evaluate whether that rule is racist or sexist or otherwise unacceptable (which is disturbingly often the case).

The problems Babic and Cohen highlight matter more and more as AI is incorporated into an ever-broader range of contexts and decisions. When they wrote this piece in the hoary days of 2023, generative AI was still relatively new, and they focused accordingly on classification algorithms. But the problems of explanation remain, not only with those older systems but also with generative AI. Indeed, users can ask a chatbot why it said what it said; trusting the answer is another matter. These issues aren’t going away.

So what’s to be done? There’s always the hope for a technological deus ex machina that makes all the black boxes transparent and explicable; that’d be lovely but seems unlikely, at least in the near term, whether because it’s computationally very expensive to peer inside even simple black boxes or because some black-box mechanics simply aren’t explicable. Instead, Babic and Cohen argue, we need to face up to the reality that explainable AI can’t really do all that’s asked of it. In some circumstances, that means we need to rely on interpretable AI or algorithms instead (simpler models we can actually understand); the Fair Credit Reporting Act takes this approach, for instance. Where procedural justice or democratic freedom is at stake, we truly need to understand why decisions are reached. In other contexts, we might be willing to sacrifice understanding in service of better performance; many medical AI systems might fall into this bucket. In any case, we should be clear-eyed about what we’re doing. With Babic and Cohen’s sharp and cogent explanation of explainability, that’s an easier task to undertake.

Cite as: Nicholson Price, The Problem of Insincere, Post-Hoc AI Explanations, JOTWELL (June 20, 2025) (reviewing Boris Babic & I. Glenn Cohen, The Algorithmic Explainability "Bait and Switch", available at SSRN (August 20, 2023)), https://cyber.jotwell.com/the-problem-of-insincere-post-hoc-ai-explanations/.