Technology Law

When Law is Code

Jul 31, 2024 James GrimmelmannAdd a Comment

Sarah B. Lawsky, Coding the Code: Catala and Computationally Accessible Tax Law, 75 SMU L. Rev. 535 (2022).

Sarah B. Lawsky’s Coding the Code: Catala and Computationally Accessible Tax Law offers an exceptionally thoughtful perspective on the automation of legal rules. It provides not just a nuanced analysis of the consequences of translating legal doctrines into computer programs (something many other scholars have done), but also a tutorial in how to do so effectively, with fidelity to the internal structure of law and humility about what computers do and don’t do well.

Coding the Code builds on Lawsky’s previous work on formal logic and its advantages for statutory interpretation. (Formal logic, sometimes called “symbolic” or “mathematical” logic, involves the precise and rigorous analysis of symbolic expressions representing arguments, such as “p & ¬q” to mean “p is true and q is not true”.) In her 2017 A Logic for Statutes, she observed that many statutory provisions have a characteristic structure: rules subject to exceptions. A typical rule says that WHEN certain conditions are satisfied, THEN certain consequences follow, UNLESS one of several exceptions applies. Exceptions have exceptions of their own: interest payments are deductible, unless they are personal, unless they are mortgage payments.

Lawsky’s great insight about law and logic is that this characteristic structure of nested exceptions is most naturally modeled using a branch of formal logic called “default logic.” Default logic, unlike standard “monotonic logic,” allows for tentative conclusions. On the basis of what I know now, this is a nondeductible personal interest payment, but let me investigate further, and oh, I see that this is qualified residence interest, so I am withdrawing my tentative conclusion and replacing it with another tentative conclusion that the payment is deductible. And so on, until there are no more clauses of the statute to check, no more exceptions to explore, and the most recent tentative conclusion becomes a definitive one. It is a process of successive refinement, converging on certainty. Monotonic logic, by sharp contrast, requires ruling out all possibilities before drawing a conclusion, which remains valid for all time once drawn.

Default logic is not more powerful than standard logic, but for some kinds of reasoning it is cleaner, and Lawsky’s point is that the back-and-forth of exceptions and subexceptions in statutory analysis maps naturally onto default logic’s structure of defaults and defeats. A formal logician applying a default logic’s inference rules follows a reasoning process that naturally corresponds to the reasoning process followed by a lawyer working through a statute.

Default logic is also a good tool for programming. (It is a formal logic, after all.) Once a human has translated a natural-language statute into a formal-logic representation, it becomes possible to reason automatically and algorithmically about the statute and how it treats various fact patterns. In 2021, a trio of computer scientists—Denis Merigoux, Nicolas Chataing, and Jonathan Protzenko—published Catala: A Programming Language for the Law, which turned Lawsky’s default-logic analysis of statutes from an abstract formalism useful for pencil-and-paper analysis into a concrete implementation useful for programming. (Lawsky herself is now a co-designer of the Catala language.)

As someone who sank years of his life into programming a body of law, I can say that Catala is the cleanest and most broadly useful advance towards making law programmable I have ever seen. A 29-page paper filled with equations and code blocks may be quite daunting (our research group took several weeks to read through the formalisms together in detail), but the basic idea of what it does is beautifully simple and clear. Catala allows a programmer to write the way that lawyers think: by laying out rules that trigger consequences, together with the exceptions that can prevent the consequences from happening.

Tax law in particular has two advantages that make it well-suited for this kind of formalization. First, it depends on—and attempts to produce—clear and determinate answers. Everything comes down to, or should, a specific amount due. And second, much of tax law is what a programmer would “declarative” rather than “imperative”; instead of telling people what to do, it describes the consequences of what they have already done. The Internal Revenue Code, as Lawsky has shown, comprises declarative provisions that are particularly clean to implement in a Catala-style language that uses defaults and exceptions. Taking advantage of this affinity between tax law and programming languages, Merigoux and Protzenko, along with Raphaël Monat, have been developed a toolchain to help the French tax authority modernize its antiquated systems. Their work puts directly into practice Lawsky-ian ideas about the value of clean formal reasoning to improve the application of tax statutes.

Coding the Code is in many ways the summa of Lawsky’s project over the last decade. After an accessible introduction to default logic, other portions of Coding the Code draw on what Lawsky has been up to lately: actually using Catala to code up tax law. She and her collaborators have approached the task with humility and care—virtues that Lawsky describes the need for in interdisciplinary collaborations in the recently-published Computational Law and Epistemic Trespassing. One approach they use is “pair programming”: a lawyer and a computer scientist sit side by side at one computer, discussing a statutory section and making sure that they agree on its translation into code. Another is “literate programming”, in which code is interwoven with comments that document what each part of it is doing. For statutory translations, these comments can include the statutory text itself, making the isomorphism between specification (the statute) and implementation (the code) wholly explicit. Neither pair programming nor literate programming directly affects what the code-ified version of the law does; instead, they are tools to make sure that the people who do the translation do so faithfully, in a way that others who come later can recognize as correct. (Lawrence Lessig, patron saint of code-as-law, would approve.)

Coding the Code, like the rest of Lawsky’s work, stands out in two ways. First, she is actively making it happen, using her insights as a legal scholar and logician to push forward the state of the art. Her Lawsky Practice Problems site—a hand-coded open source app that can generate as many tax exercises as students have the patience to work through—is a pedagogical gem, because it matches the computer science under the hood to the structure of the legal problem. (Her Teaching Algorithms and Algorithms for Teaching documents the app and why it works the way it does.)

Second, Lawsky’s claims about the broader consequences of formal approaches are grounded in a nuanced understanding of what these formal approaches do well and what they do not. Sometimes formalization leads to insight; her recent Reasoning with Formalized Statutes shows how coding up a statute section can reveal unexpected edge cases and drafting mistakes. At other times, formalization is hiding in plain sight. As she observes in 2020’s Form as Formalization, the IRS already walks taxpayers through tax algorithms; its forms provide step-by-step instruction for making tax computations. In every case, Lawsky links carefully links her systemic claims to specific doctrinal examples. She shows not that computational law will change everything, but rather that it is already changing some things, in ways large and small.

It is unusual for an established law professor to go back to school for a PhD. In philosophy. With a dissertation on formal logic. But Coding the Code, published five years after Lawsky submitted her (highly technical) thesis, shows the great value for legal scholars of the approach she developed in her PhD. It refines her distinctive approach to statutory analysis—which mixes careful legal reading with technical tools from formal logic and computer science—in a way that has great potential to help other lawyers and legal scholars be more precise about what tax laws say. All they need to do is talk to computer scientists, and Lawsky provides a roadmap for how. There is no epistemic trespassing in Sarah Lawsky’s work. Everywhere she goes, she is a welcomed guest.

Cite as: James Grimmelmann, When Law is Code, JOTWELL (July 31, 2024) (reviewing Sarah B. Lawsky, Coding the Code: Catala and Computationally Accessible Tax Law, 75 SMU L. Rev. 535 (2022)), https://cyber.jotwell.com/when-law-is-code/.

Getting Real About Protecting Privacy

Jun 5, 2024 Scott Skinner-ThompsonAdd a Comment

Ignacio Cofone, The Privacy Fallacy: Harm and Power in the Information Economy (2024).

Scott Skinner-Thompson

The power of surveillance capitalists to know us, nudge us, and exploit us continues to expand with alacrity despite Europe’s General Data Protection Regulation (GDPR) and state privacy regulations within the United States. Likewise, consequential and concrete privacy harms (such as sexual privacy violations) and, at first glance, more ethereal but also troublesome privacy losses (such as data breaches) remain under-compensated or under-redressed. Why? And what can be done about it? Ignacio Cofone’s incredible new book, The Privacy Fallacy: Harm and Power in the Information Economy, answers those two questions with a healthy dose of realism about the stakes, the need for substantive command-and-control regulation of the information economy, and the imperative of compensatory liability.

In this beautiful and accessible book Cofone explains that, to date, efforts to rein in data exploitation and redress privacy harms have fallen short because of a few underlying fallacies about how information ecosystems work. One of those misconceptions is law’s failure to understand that many egregious privacy harms, such as online sexual harassment, don’t occur because of just one or two bad actors. Rather, the very existence of data ecosystems makes those harms possible in the first instance and then magnifies them. Glossing over the role of the ecosystem (and the companies that create that ecosystem) in perpetuating these privacy harms means that law-reform efforts targeted toward the initial privacy violator under-deter such conduct and stymie efforts to, in effect, put the cat back in the bag. To address the systematic nature of privacy harms, Cofone boldly—but rightly—suggests that modified private causes of action need to be created through statute or tort law to target the myriad corporations that enable such privacy violations. Importantly, Cofone also explains why class actions are an important part of the solution in order to provide for more efficient and comprehensive liability.

A second fallacy is the belief that procedural justice efforts to amplify users’ control over their data through notice and/or consent regimes will somehow mitigate data losses. Drawing from behavioral economics, Cofone shows how such approaches—embodied in laws like the GDPR—will never be more than a band-aid that lends a false sense of security to users, while allowing surveillance capitalists to harvest information at will. He rightly explains that individuals who “consent” to privacy-invading policies aren’t necessarily apathetic about privacy; rather because of the daunting and incomprehensible nature of privacy notices, unforeseeable risk, and manipulative design choices, they are cajoled into surrendering their privacy. To overcome this fallacy, Cofone grabs the bull by the horns and makes a persuasive case for command-and-control regulation. That is, Cofone explains why simply holding surveillance capitalists to the weak promises contained in the privacy notices/policies won’t ever be sufficient because users don’t have the bandwidth to meaningfully police the content of those polices through notice/choice. As such, the government needs to do the policing for them—needs to dictate the substantive contours of privacy protections, rather than leaving it to the consumers and the market, which will always favor powerful tech companies.

In terms of the substance of those privacy regulations, Cofone also explains why some of the existing privacy regulations have fallen short over time: they focus on a particular technology (video rentals, cookies, etc.) rather than “the underlying relationship intermediated by the technology (surveillance).” As such, Cofone advocates the implementation of rigorous but flexible standards for digital privacy protection that can adapt as technology evolves, helping to prevent law from always playing catch-up to tech.

Make no mistake, Cofone’s proposals for substantive privacy regulation of and remedial causes of action against information capitalists are big. And difficult. They will require significant political will and capital to implement. But to the extent that society values privacy and individual freedom as our lives continue to become digitally mediated, then lawyers, politicians, and voters will need to put their resources behind meaningful substantive privacy laws and meaningful compensatory regimes that target the corporate information ecosystems themselves. In a nutshell, piecemeal procedural interventions and remedies against individual privacy violators are simply not going to cut it. Refreshingly, Cofone’s book gets real about these problems while offering real solutions—if we have the courage to implement them.

Cite as: Scott Skinner-Thompson, Getting Real About Protecting Privacy, JOTWELL (June 5, 2024) (reviewing Ignacio Cofone, The Privacy Fallacy: Harm and Power in the Information Economy (2024)), https://cyber.jotwell.com/getting-real-about-protecting-privacy/.

Can Informed Consent Solve AI Bias?

May 7, 2024 Nicholson Price1 comment

Khiara M. Bridges, Race in the Machine: Racial Disparities in Health and Medical AI, 110 Va. L. Rev. 243 (2024).

Nicholson Price

Artificial intelligence (AI) is moving increasingly rapidly into health care (as indeed into everything else). But it has problems there (as indeed everywhere else!). What’s to be done, in particular, about the deeply embedded biases along racial and other lines that permeate the whole world of health and, as such, are likely to be encoded in AI?

Khiara Bridges gives an answer that seems mild but carries roots of revolution. In Race in the Machine: Racial Disparities in Health and Medical AI, she argues that informed consent is a key lever to pull in fighting these racial disparities. But not because informed consent—at present, mostly a formality, a begrudging nod to autonomy—will fix the problem in its current state. Instead, Bridges argues, informed consent, beefed up and focused on conveying the brutal truth about encoded racial disparities, can form the foundation for revolutionary social changes in health care, health, and beyond. Curious? Read on!

The first half of the article comprises Parts I-III. These parts aren’t breaking too much new ground, but they do an excellent job bringing together the literature, often including a host of data and examples, to make their own cases—each of which is prerequisite to the piece’s second half. Part I covers the landscape of health and health-care bias. Part II does AI. And Part III brings the first two together to describe bias in medical AI. To elaborate a bit: Part I traces the causes of different health outcomes for marginalized groups (e.g., substandard housing, poverty, persistent stress caused by racism) and the different treatment of marginalized groups by the medical system (e.g., doctors offering different treatments to Black patients than to White patients). It’s replete with infuriating examples, most reflecting the endemic bias experienced by Black patients in America and many referring to Black maternal health care (the subject of a prior Bridges opus). Part II provides a basic primer on AI (probably skippable by AI-conversant readers, but otherwise a helpful foundation), a handy overview of AI in medicine (again, skippable if familiar, but that’s far fewer folks), and a discussion of the potential uses of medical AI in prenatal care—this last a bringing together of technologies and the medical literature that is both novel and insightful. In Part III, discussing bias in medical AI, Bridges explores pre-mapped territory, but she walks it carefully and thoughtfully and shines new light on it, including through trenchant examples from prenatal care. She lays out different sources of bias—design choices, inadequate data, data that accurately reflect inequitable systems, and the pernicious encoding of race (often deliberate). After this recounting, Bridges emphasizes an underappreciated point: the problem isn’t just that AI will encode existing biases—it’s that it will wrap them in “the veneer of objectivity,” thus leaving minorities ultimately worse off—because they suffer the same injustice, but this time it’s coded as just the machine being impartially correct (what Ifroma Ajunwa dubs “data objectivity”).

The second half (Part IV) is the heart of the article; it starts out smart and interesting and reasonable and winds up smart and interesting and audacious. In a good way!

Starting with the new and smart but not socks-removing: Bridges presents convincing evidence that people of color may well not want to have medical AI involved in their care. She recounts studies of “algorithmic aversion,” where folks are, well, averse to using or trusting algorithms—and demonstrates how this is likely to be particularly forceful for Black patients and medical AI. It’s not just the legacy of Tuskegee, she recounts; it’s the pervasive and ongoing evidence of bias, inequality, and inequity in the health systems of today that compromises trustworthiness. (Indeed, AI may worsen this dynamic.)

So what’s the intervention? Bridges argues for informed consent—telling all patients, but especially those of color, not only that AI is being used in their care, but that the AI is very likely to be biased, based on the deep multimodal biases embedded in health. Why? Well, for starters, there’s the classic story that informed consent respects autonomy, and patients (especially of color) would want to know, so physicians should tell them. And that’s likely enough.

But there’s more, and here’s where it gets radical. Bridges draws on literature grounding informed consent in the Nuremberg Trials and recasting it “as a rebuke of Nazi ‘medicine,’ eugenics, anti-Semitism, racism, and white supremacy.” (P. 316.) This “rebelliousness” underlying informed consent, Bridges forcefully posits, can be grounds for a broader social revolution—if we tell patients, truly and meaningfully, about the inequities embedded in the system and informing their care, that may plant the seeds for making society actually better—a goal vastly preferable to the Faustian outcome where somehow-improved algorithms paper over unaddressed social inequities.

It’s a provocative, fascinating, and persuasive argument. (I’m primed to resist—I’m part of the benighted crew that’s argued that informed consent probably isn’t specially needed for medical AI—but I’m far less complacent than I used to be.)

And of course Bridges’ argument stretches beyond AI in medicine, to reach medical care generally and indeed areas outside medicine. Bridges mentions this, and if it’s not deeply explored in the piece, that may be because the piece is already just shy of a hundred pages. But one can see the radical, beguiling arguments reaching forward. And that’s a gratifying uneasiness with which to leave this challenging and excellent piece.

A postscript: It’d be a shame to review the article without mentioning Bridges’ engrossing prose. Some sesquipedalian sentences are pure pleasure to peruse: “These defiant, revolutionary origins have been expunged from the perfunctory form that the informed consent process has taken at present.” (P. 250.) Some paragraphs are short and pungent; in others, Bridges deploys an avalanche of data and studies. One crushing paragraph in the introduction has a relentless and irresistible litany of “for examples”’s, each a drumbeat pounding home her point about embedded disparity. It’s a pleasure to move through the weighty arguments with Bridges’ writing carrying you along.

Cite as: Nicholson Price, Can Informed Consent Solve AI Bias?, JOTWELL (May 7, 2024) (reviewing Khiara M. Bridges, Race in the Machine: Racial Disparities in Health and Medical AI, 110 Va. L. Rev. 243 (2024)), https://cyber.jotwell.com/can-informed-consent-solve-ai-bias/.

Breathing Feminism into the Machine

Apr 3, 2024 Ari WaldmanAdd a Comment

Feminist Cyberlaw (Meg Leta Jones & Amanda Levendowski eds., forthcoming 2024).

Ari Waldman

A few years ago, the CBS show 60 Minutes ran a segment about algorithmic bias. Among other things, the segment told the story of a Black man in Detroit who had been misidentified by a facial recognition algorithm as the thief who had been caught on video stealing $4,000 worth of watches. Anderson Cooper interviewed the victim, his wife, the Detroit police chief, a lawyer at Georgetown’s Center on Privacy and Technology, and a computer scientist at the National Institute of Standards and Technology (NIST). The NIST computer scientist, Patrick Grother, was interviewed because “[e]very year, more than a hundred facial recognition developers around the world send his lab prototypes of test for accuracy.” For 60 Minutes that day, Mr. Grother was the expert on these algorithms.

No offense to Mr. Grother, who I’m sure is a smart guy, but the real experts were missing. Mr. Cooper did not interview the Black women who first brought the problem of algorithmic bias in facial recognition to academic and public attention: Joy Buolamwini, Timnit Gebru, and Deborah Raji. The incident captured in just a few minutes the long history of erasing the contributions of women and of women of color in particular. NIST, which had cited Buolamwini, Gebru, and Raji in its report, got its moment in the sun but never mentioned the underlying research; the police chief got his chance to cast the wrongful arrest as an isolated mistake. The Black women who founded the field were ignored, a fact even more remarkable given that 60 Minutes researchers and producers spent time with Buolamwini in preparation! That this happened in 2021 highlights the desperate need for Meg Leta Jones’s and Amanda Levendowski’s edited volume, Feminist Cyberlaw.

With contributions from established and up-and-coming scholars, including, among others, Elizabeth Joh (UC Davis), Ngozi Okidegbe (Boston University), Hannah Bloch-Wehba (Texas A&M), Jasmine McNealy (University of Florida), Alexandra Roberts (Northeastern), and Anjali Vats (University of Pittsburg), Feminist Cyberlaw aims to inform and challenge. It informs by looking at cyberlaw issues like ownership, privacy, the First Amendment, labor, and security through a feminist lens. It challenges its readers to consider not just how feminist theory reconsiders old internet law debates, but how the work of cyberlaw over the last few decades has marginalized feminist perspectives.

In what ways have feminist perspectives been marginalized? The cyberlaw canon, which Professor Jones frames in her introduction as advocating a kind of universality and perfect equality, elided the connective tissue between cyberspace and the physical world. Feminist work, particularly feminist critiques of Science and Technology Studies, inhabits that connective tissue and, thus, gives the readers of Feminist Cyberlaw a new way of looking at technology. It gives voice to the people who may not be in the room when technology is designed but who are most definitely acted upon and act with technologies both designed for them and without them.

So, Feminist Cyberlaw gives us context. It also centers the lived experiences of women, those living with disabilities, queer people, gender variant populations, and women of color. Novelty and generativity may have been the watchwords of the cyberlaw canon, but feminist approaches to technology surface the real-world effects of innovation, law, and institutions that have never been and never will be borne equally. Notice, too, that Feminist Cyberlaw is not limited by narrowness of first wave feminism. The “feminism” in Feminist Cyberlaw is hooksian, that is, indebted to people like bell hooks, who saw recognition of the many different ways different women are subordinated by different systems of oppression, including white supremacy, heteronormativity, ableism, ageism, and more.

What I like most about Feminist Cyberlaw is its diversity of perspectives within an overarching frame. Although broadly clustered into sections on ownership, access, and governance, contributions to the volume range from intellectual property to data security, from the First Amendment to labor, from the Global South to China. The authors are law professors, practitioners, independent researchers, and community organizers. Elizabeth Joh’s chapter on the effects of eliminating a federal right to abortion in a world of ubiquitous surveillance serves as an important reminder that the Supreme Court’s decision in Dobbs v. Jackson Women’s Health does not mark a return to the pre-Roe v. Wade world, but rather places those who can get pregnant in a state of perpetual risk. Michela Meister and Karen Levy go even further in their excellent essay on what feminist lens can teach us about privacy invasions. They show that Dobbs is actually a broader case study in the ways in which women’s loss of control over their bodies is part of a broader pattern of patriarchal surveillance. Gabrielle Rejouis’s essay on taking an intersectional lens to content moderation is a welcome contribution to a content moderation literature that is too often satisfied with elite liberals’ best efforts.

I also appreciate Professor Jones’s eagerness to situate Feminist Cyberlaw in its own historical context. She places the text on the shoulders of a litany of critical scholars, many of them women, who have been challenging patriarchy, oligarchy, and heteronormativity in law and technology for decades. Julie Cohen, Danielle Keats Citron, Catherine MacKinnon, and Anita Allen deserve particular mention here. Professor Allen, who has been writing about cyberlaw broadly defined for nearly forty years, must be considered among the original canonical scholars of cyberlaw. That she is often defined—including in this text—not as canon but as an early critical voice responding to likes of John Perry Barlow and Larry Lessig (Professor Jones leaves out the late Joel Reidenberg)—is itself a testament to how academia and academic culture view the contributions of Black women. Julie Cohen has been offering polymathic articles and treatises on intellectual property and privacy for decades, as well, challenging the tech bro-y insistence that cyberspace is somehow above the law, above the physical world, and above regulation. Professor Cohen’s latest book, Between Truth and Power, is a political, legal, and economic tour through the ways law has been integral to the formation of informational capitalism’s hierarchies. Feminist Cyberlaw takes many of the broad themes of Between Truth and Power and of Professor Citron’s groundbreaking work on intimate privacy and establishes them as Feminist—with a capital F—inspirations.

By the time readers finish Feminist Cyberlaw’s economical and digestible essays, they will come away with an appreciation for traditionally marginalized voices and more questions to pursue in their own research. Feminist Cyberlaw is a call to action to build from critique to change, to use the feminist lens to develop new paths forward in the information age. Whether that means assessing the value of proposals against how they rebalance power or centering the experiences of women as the future of law and technology, Feminist Cyberlaw is a profoundly important step in a positive direction.

Cite as: Ari Waldman, Breathing Feminism into the Machine, JOTWELL (April 3, 2024) (reviewing Feminist Cyberlaw (Meg Leta Jones & Amanda Levendowski eds., forthcoming 2024)), https://cyber.jotwell.com/breathing-feminism-into-the-machine/.

Debunking AI’s Supposed Fairness-accuracy Tradeoff

Mar 7, 2024 Paul OhmAdd a Comment

Emily Black, John Logan Koepke, Pauline T. Kim, Solon Barocas & Mingwei Hsu, Less Discriminatory Algorithms, __ Geo. L.J. __ (forthcoming); Wash. U. Legal Studies Rsch. Paper (forthcoming), available at SSRN (Oct. 2, 2023).

Paul Ohm

We are likely to see as many new law review articles about artificial intelligence and law in the next year as the legal academy has produced since the dawn of AI. Those writing about AI for the very first time (*eyes a bunch of copyright scholars suspiciously*) would do well to engage deeply with the work of bias and discrimination scholars who have been writing some of the best, most insightful articles about AI and law for more than a decade. The frameworks and insights they have developed give us a way of thinking about AI law and policy beyond just considerations about Title VII and Equal Protection. A wonderful place to start is with the best contribution to this scholarship I have seen in years, Less Discriminatory Algorithms, by an interdisciplinary team of technical, policy, and legal experts.

This article is a form of my favorite genre of legal scholarship: “you all are making an important mistake about how the technology works.” In particular, it takes on the received wisdom that there is a vexing tradeoff between fairness and accuracy when training machine learning models. This supposed tradeoff—that in order to make a biased algorithm fairer, you need to sacrifice some of the model’s accuracy—may be true in theory for idealized and never-seen-in-the-wild maximally accurate models. But nobody ever has the time, compute, money, or energy to even approach this ideal, meaning real models out in the world are far from maximally accurate. People instead declare success as soon as their models are just accurate enough for their purposes.

The thing about these less-than-maximally-accurate models is that there are many other possible models the model builders could have trained that would have been nearly as accurate, while doling out different false positives and false negatives. Some of these nearly-as-accurate models would likely have better distributional outcomes for protected classes and other vulnerable populations. This is what two of the article’s authors, Emily Black and Solon Barocas, have dubbed “model multiplicity” in the computer science literature. Model multiplicity means that if those training a model would continue to experiment—tweaking a hyperparameter here, selecting or deselecting a feature there, or preprocessing the data a little bit more—they would probably find many other equally (or nearly as) accurate models that allocated the winners and losers differently.

Importantly, many of these other nearly-as-accurate models might also be more fair, less biased, and less discriminatory than the one actually deployed. Rather than saying that there is a tradeoff between fairness and accuracy, we should instead understand that the tradeoff is between accepting a given level of bias or discrimination versus spending a little more time, money, and (carbon emitting) compute to find an alternative that improves on fairness without sacrificing accuracy.

Happily, Black and Barocas have found a brilliant and interdisciplinary team of coauthors: noted antidiscrimination scholar Pauline Kim and two amazing tech-meets-law researchers from the incredible nonprofit Upturn, Logan Koepke and Mingwei Hsu. The group successfully connects these engineering insights to legal doctrine. Model multiplicity’s recasting of the fairness-accuracy tradeoff has a direct bearing on the so-called third step of Title VII’s disparate impact analysis: the requirement that plaintiffs demonstrate a less discriminatory alternative (LDA). (I’m omitting the article’s careful discussion of how other civil rights statutes, including ECOA and the FHA, handle disparate impact analysis a bit differently, but to the same general effect.) This means that plaintiffs might be able to win in the third step of the test, by citing this paper and bringing on an expert who can help find the fairer, just-as-accurate model-not-taken.

Finding the less discriminatory alternative would still be a daunting path. It would require a plaintiff to have access to the exact training environment the defendant used—including all of the training data, which defendants are sure to resist producing. If plaintiffs clear this formidable discovery challenge, they will still need to spend a lot of time and money to find the less discriminatory alternative. All of this might be insurmountably burdensome for plaintiffs, limiting the change that model multiplicity might bring to civil rights law.

The authors, however, have both technical and legal responses to these evidentiary challenges. If the burden were instead placed on the defendant at step two to show that they looked for less discriminatory alternatives during training, it would avoid the inefficiency, delay, and discovery hassles associated with burdening the plaintiff after the fact. Kim and her coauthors find judicial precedent for imposing this obligation on defendants at step two, while conceding that this interpretation has not yet been broadly adopted. Importantly, once this approach is understood to be required by civil rights law, it will impose on model builders the duty to start looking for these alternatives during training. According to model multiplicity, they will often find them! The result will be more model builders finding and deploying less discriminatory alternatives, meaning the win-win of fewer people suffering from discrimination and less litigation exposure for employers!

I like (lots!) so much about this article. It imports a simple but powerful and elegant technical insight—model multiplicity—into legal scholarship. It does so while also advancing a much-needed reform in the way we interpret civil rights law, putting the burden of looking for less discriminatory alternatives on defendants rather than plaintiffs. It is grounded and specific, explaining exactly why this should happen and how the three-step test and the correlative duty to search for less discriminatory alternatives ought to be interpreted; I can imagine courts across the country implementing this article’s prescriptions directly. It does not shy away from technical and legal detail, taking deep dives into the techniques used to find a less discriminatory alternative, how nearly accurate a less discriminatory alternative must be, or how much effort a model builder must spend in trying to find one, to list only three examples.

Finally, legal scholars must understand that model multiplicity matters beyond civil rights law. An obvious extension is to products liability law. Model multiplicity will provide a pathway for injured plaintiffs looking for a “reasonable alternative design” to support claims of defective products design.

We will soon see AI models that do not live up to our legal and societal ideals in many other ways, such as large language models that spout misinformation, deep fake models used to terrorize women, and facial recognition models that destroy privacy. As we fight over what we should do about the bad effects of models, we should understand that they are often the effects of bad models. Model multiplicity means refusing to let model builders rest on the expediency of having done the bare minimum. Keep trying; keep searching; keep looking, and you just may find models that both do well and do good.

Cite as: Paul Ohm, Debunking AI’s Supposed Fairness-accuracy Tradeoff, JOTWELL (March 7, 2024) (reviewing Emily Black, John Logan Koepke, Pauline T. Kim, Solon Barocas & Mingwei Hsu, Less Discriminatory Algorithms, __ Geo. L.J. __ (forthcoming); Wash. U. Legal Studies Rsch. Paper (forthcoming), available at SSRN (Oct. 2, 2023)), https://cyber.jotwell.com/debunking-ais-supposed-fairness-accuracy-tradeoff/.

Risky Speech Systems: Tort Liability for AI-Generated Illegal Speech

Feb 6, 2024 Margot KaminskiAdd a Comment

Jane Bambauer, Negligent AI Speech: Some Thoughts about Duty, 3 J. Free Speech L. 344 (2023).
Nina Brown, Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation, 3 J. Free Speech L. 389 (2023).

Margot Kaminski

How should we think about liability when AI systems generate illegal speech? The Journal of Free Speech Law, a peer-edited journal, ran a topical 2023 symposium on Artificial Intelligence and Speech that is a must-read. This JOT addresses two symposium pieces that take particularly interesting and interlocking approaches to the question of liability for AI-generated content: Jane Bambauer’s Negligent AI Speech: Some Thoughts about Duty, and Nina Brown’s Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation. These articles evidence how the law constructs technology: the diverse tools in the legal sensemaking toolkit that are important to pull out every time somebody shouts “disruption!”

Each author offers a cogent discussion of possible legal frameworks for liability, moving beyond debates about First Amendment coverage of AI speech to imagine how substantive tort law will work. While these are not strictly speaking First Amendment pieces, exploring the application of liability rules for AI is important, even crucial, for understanding how courts might shape First Amendment law. First Amendment doctrine often hinges on the laws to which it is applied. By focusing on substantive tort law, Bambauer and Brown take the as-yet largely abstract First Amendment conversation to a much-welcomed pragmatic yet creative place.

What makes these two articles stand out is that they each address AI-generated speech that is illegal—that is, speech that is or should be unprotected by the First Amendment, even if First Amendment coverage extends to AI-generated content. Bambauer talks about speech that physically hurts people, a category around which courts have been conducting free-speech line-drawing for decades; Brown talks about defamation, which is a historically unprotected category of speech. While a number of scholars have discussed whether the First Amendment covers AI-generated speech, until this symposium there was little discussion of how the doctrine might adapt to handle liability for content that’s clearly unprotected.

Bambauer’s and Brown’s articles are neatly complimentary. Bambauer addresses duties of care that might arise when AI misrepresentations result in physical harm to a user or third parties. Brown addresses a products-liability approach to AI-generated defamation. Another related symposium piece that squarely takes on the question of liability for illegal speech is Eugene Volokh’s Large Libel Models? Liability for AI Output. The Brown and Bambauer pieces speak more directly to each other in imagining and applying two overlapping foundational liability frameworks, while Volokh’s piece focuses on developing a sui generis version of developer liability called “notice-and-blocking” that he grounds in Brown’s idea of using products liability as a starting point. That is, Bambauer and Brown provide the necessary building blocks; Volokh’s article is an example of how one might further manipulate them.

Bambauer writes of state tort liability, as it might be modified by state courts incorporating free speech values. She explains that she has “little doubt that the output of AI speech programs will be covered by free speech protections” (P. 347) (as do my co-authors and I) but also that “the First Amendment does not create anything like an absolute immunity to regulatory intervention,” especially when it comes to negligence claims for physical harm. (P. 348.) Bambauer convincingly claims that the duty element of negligence is where the rubber will hit the road in state courts when it comes to determining the right balance between preventing physical harms and protecting free speech values. She identifies different categories of duty as an effective way of categorizing existing cases that address analogous problems (from books that mis-identify poisonous mushrooms as edible, to doctors who provide dangerously incorrect information to patients).

Bambauer divides her discussion of duty into three broad categories, followed by additional subcategories: 1) situations where AI systems provide information to a user that causes physical harm to that user; 2) situations where AI systems provide information to a user who then causes physical harm to a third party; and 3) situations where AI systems would have provided accurate information that could have averted harm, had a user consulted them (reminiscent of Ian Kerr’s and Michael Froomkin’s prescient work on the impact of machine learning on physician liability). Throughout, this article is logical, clearly organized, factually grounded, and neatly coherent, even where a reader might depart from its substantive claims.

These categories allow Bambauer to tour the reader through available analogies, comparing AI “to pure speech products, to strangers, or to professional advisors” and more. (P. 360.) If an AI system’s erroneous output is analogized to a book, Bambauer argues that developers will not and should not be found liable, as with a book that misidentified poisonous mushrooms as edible as in the Ninth Circuit’s Winter case. (Eerily, this exact fact pattern has already arisen with AI-generated foraging books.) If, under different factual circumstances, AI-generated content is more appropriately analogized to professional advice in a specialized domain such as law or medicine, there might be a higher duty of care. Or, courts might use a “useful, if strained analogy” of “wandering precocious children,” where parents/developers might be held liable under theories of “negligent supervision” for failing to anticipate where their child/generative AI might be doing dangerous things. (P. 356.) This might, Bambauer muses, nudge courts to focus on what mechanisms an AI developer has put in place to find and mitigate recurring harms. This is a classic “which existing analogy should apply to new things?” article, but done well. Others might take this logic further by pulling analogies from other spaces (I’m thinking here for example of Bryan Choi’s work on car crashes, code crashes, and programmers’ duties of care).

This takes us to Brown’s intervention. Brown examines defamation claims through a products liability lens, asking what interventions a developer might be required to take to mitigate the known risk of defamatory content. Brown starts with a summary of how chatbots work, so the rest of us don’t have to. (I will be citing this section often.) She quickly and clearly explains the defamation puzzle: that the current law focuses largely on the intent of the speaker/publisher of defamatory content. This approach runs into issues when we are talking about the developers of AI systems, who Brown argues will almost never have the requisite intent under current defamation law.

Brown then turns to dismantling hurdles to a products liability approach (is it a product? What’s the role of economic loss doctrine?). Readers may find this part more or less convincing, but resolving the hurdles (it’s a product, she thinks economic loss doctrine is not a problem) allows her to get to the really interesting part of the article: what substantive duties a developer might have, if AI-generated defamation gets framed as a products liability problem. Brown argues that “a design defect could exist if the model was designed in a way that made it likely to generate defamatory statements.” (P. 410.) She provides concrete examples grounded in current developer practices: the use of flawed datasets rife with false content; the prioritization of sensational content over accuracy; a failure to take steps to reduce the likelihood of hallucinations; a failure to test the system.

I’m still not sure a products liability approach will survive the Supreme Court’s recent emphasis on scienter in First Amendment cases, but one can hope. In several recent cases, most prominently in Counterman v. Colorado, the Supreme Court has insisted on a heightened intent standard for unprotected speech in order to protect speakers from a chilling effect that occurs if one cannot clearly determine whether one’s speech is unprotected or protected.¹ In Counterman, the unprotected speech category at issue was true threats, which the Court found could not be determined under an objective standard but required a query of speaker intent. The Court reasoned that a heightened intent standard creates a penumbra of protection for borderline speech that is close to but not unprotected speech—such as opinionated criticism of a public figure bordering on defamation, or vigorous political speech at a rally bordering on incitement. Brown presents the products-liability approach as a sort of hack to get around the specific intent requirement of “actual malice” for defamation of public figures (private figures require only negligence, but arguably a heightened form of it). She does not really inquire about whether this is possible—whether today’s Court, post-Counterman, would accept this move. I personally think there is space in the Court’s reasoning in Counterman for moving away from specific intent, but it would have been nice to know Brown’s thoughts.

Together, these two articles offer a trio of important contributions: foundations for First Amendment debates about unprotected speech and AI systems; creative but grounded ways of imagining duties of care in the context of developer liability (relevant, too, to evolving discussions of platform liability); and an important basis for discussions about the role of tort law in establishing risk mitigation for content-generating AI systems in the U.S. legal context. Regulators have increasingly defaulted to a regulatory approach to risk mitigation for AI systems, including or especially in the EU. If, as is likely, the United States fails to enact its counterpart to, the Digital Services Act (DSA), Europe’s massive new law regulating content moderation, tort law may be where AI risk mitigation plays out in the United States.

Counterman v. Colorado, 600 U.S. 66 (2023).

Cite as: Margot Kaminski, Risky Speech Systems: Tort Liability for AI-Generated Illegal Speech, JOTWELL (February 8, 2024) (reviewing Jane Bambauer, Negligent AI Speech: Some Thoughts about Duty, 3 J. Free Speech L. 344 (2023); Nina Brown, Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation, 3 J. Free Speech L. 389 (2023)), https://cyber.jotwell.com/risky-speech-sys…d-illegal-speech/.

Centering Educational Institutions as Potential Sources of Student Privacy Violations

Jan 5, 2024 Stacy-Ann ElvyAdd a Comment

Fanna Gamal, The Private Life of Education, 75 Stan. L. Rev. 1315 (2023).

Stacy-Ann Elvy

Schools increasingly use various technologies to monitor and collect information about students. The COVID-19 pandemic, which led to a large number of school closures and a transition to online learning, has also raised alarming questions about student privacy. For instance, virtual software used during remote exams to monitor students can scan students’ bedrooms, collect data from the microphones and cameras of students’ computers, and discern students’ keystrokes. In her article, The Private Life of Education, Professor Fanna Gamal makes a noteworthy contribution to scholarship in the privacy law and education law fields by highlighting embedded assumptions and significant shortcomings in privacy law governing student data. In doing so, she advances existing debates on the legal conception of information privacy. Gamal argues that student privacy laws’ immoderate focus on nondisclosure of students’ data outside of the school context fails to effectively consider the various ways in which schools can serve as the primary perpetrators of student privacy violations. She further contends that schools’ data practices may have disproportionate negative implications for members of historically marginalized groups, such as disabled and low-income students.

Gamal expertly critiques the provisions of the Family Educational Rights and Privacy Act (FERPA). She argues that FERPA’s excessive focus on the prohibition of data disclosures outside of schools spuriously assumes that schools should, by default, receive treatment as privacy protectors that act in the best interest of students’ privacy. Gamal aptly acknowledges that FERPA’s heavy reliance on non-disclosure is not unique to American privacy law. However, after unpacking the legal conception of student data privacy, Gamal goes on to convincingly argue that student data privacy law also assumes that students do not have a significant privacy interest in “data creation, collection and recording.” (P. 1319.)

She posits that educational records contain data that “assumes an aura of [uncontestable] truth,” a truth that follows students indefinitely and can impact their lives well beyond the age of majority (P. 1319.) Gamal argues that there is a significant imbalance of power between students and schools. She contends that FERPA grants schools too much power to determine this truth and its life cycle while giving students and parents insufficient mechanisms to contest educational records that contain misleading or false truths. Gamal notes that even when parents have the ability to participate in hearings regarding students’ records, schools have excessive power in those hearings since parents and students have the burden of convincing the educational institution to amend educational records. Gamal suggests that privacy law unnecessarily shelters the internal data practices of educational entities from scrutiny, thereby permitting educational institutions to amass “power over the official archives that shape students’ lives” (P. 1318.) Educational institutions may use this power to infringe on students’ privacy.

Gamal perceptively highlights the impact of student privacy laws’ shortcomings on historically marginalized groups, such as disabled students. She convincingly argues that disability documentation is a “poor proxy for disability” and can further entrench pre-existing inequities (P. 1321.) Gamal admits that disability documentation may help to ensure special education resources go to students in need of such services, but she also notes that heightened documentation requirements may instead stem from “the fear of the ‘disability con,” that is, an irrational fear that some individuals may be dishonest about their disabilities (P. 1321.) She contends that the Individuals with Disabilities Education Act’s (IDEA) data disclosure requirements limit the ability of students who use special education services to obtain privacy from their educational institutions. In contrasting the educational records of non-disabled students and disabled students, Gamal observes that the records of disabled students contain data about their social, medical, physical, and mental diagnoses. Fear of the so-called disability con, Gamal contends, results in requirements that ignore the challenges individuals from marginalized groups may face, such as possible limited access to documentation providers. She also points out that students from racial minorities experience over-representation among special education groups and, as such, disproportionality fall subject to the heightened documentation processes required of students seeking access to special education services.

The well-written article concludes by offering a path forward. Gamal argues for expanding the concept of information privacy in the school setting via a collaborative process that gives voice to various stakeholders. She also proposes several amendments to FERPA, including correcting FERPA’s excessive reliance on non-disclosure outside of the school context, redefining the term “educational records,” and providing students and parents with better tools to amend and delete educational records. She also recommends limits on educational institutions’ power over their internal data practices. Gamal’s convincing description of the limits of the current legal framework regulating student privacy should capture the attention of privacy and educational law scholars interested in learning more about the ways in which narrow conceptions of information privacy can further cement institutional data practices that contribute to existing disparities.

Cite as: Stacy-Ann Elvy, Centering Educational Institutions as Potential Sources of Student Privacy Violations, JOTWELL (January 5, 2024) (reviewing Fanna Gamal, The Private Life of Education, 75 Stan. L. Rev. 1315 (2023)), https://cyber.jotwell.com/centering-educational-institutions-as-potential-sources-of-student-privacy-violations/.

Addressing the Modern Shamanism of Predictive Inferences

Nov 27, 2023 Mireille HildebrandtAdd a Comment

Hideyuki Matsumi & Daniel J. Solove, The Prediction Society: Algorithms and the Problems of Forecasting the Future, GWU Legal Studies Rsch. Paper (forthcoming), available at SSRN (June 5, 2023).

Mireille Hildebrandt

In their draft paper, The Prediction Society: Algorithms and the Problems of Forecasting the Future, Matsumi and Solove distinguish two ways of making predictions: “the first method is prophecy–based on superstition” and “the second is forecasting–based on calculation.” Initially, they seem convinced that the latter, calculative, type of prediction is more accurate and thus capable of transforming society as it shifts control over peoples’ future to those who develop or deploy such systems. Over the course of the paper, however, that distinction between deceptive prophecy and accurate prediction blurs. The authors make the argument that the pervasive and surreptitious use of predictive algorithms that target human behaviour makes a difference for a whole range of human rights beyond privacy, highlighting the societal impact these systems generate, and requiring new ways of regulating the design and deployment of predictive systems. The authors foreground the constitutive impact of predictive inferences on society and human agency, moving beyond utilitarian approaches that require the identification of individual harm, arguing instead that these inferences often create the future they predict.

Most of the points they make have been made before (e.g. here), but the lucid narrative argumentation presented in Matsumi’s and Solove’s paper could open a new conversation in the US as to how legislatures and courts should approach the issue of pre-emptive predictions with regard to constitutional rights beyond privacy. The paper also expands that same discourse beyond individual rights, highlighting the pernicious character of the manipulative choice architectures that build on machine learning, and showing how the use of ‘dark patterns’ is more than merely the malicious deployment of an otherwise beneficial technology.

To make their argument, the authors tease out a set of salient “issues” that merit a brief discussion here, as they are key to the constitutive societal impact of pre-emptive predictions. The first issue concerns the “fossilisation problem” that foregrounds the fact that algorithmic predictions are necessarily based on past data and thus on past behavioural patterns, thereby risking what I have called (in this book) “scaling the past while freezing the future.” The second issue concerns the “unfalsifiability problem” that underscores the fact that data-driven predictions are probabilistic, making it difficult to contest their accuracy, which – according to the authors – sits in a grey zone between true and false data (I should note that under the GDPR personal data need not be true to qualify as such). The third issue concerns the “pre-emptive intervention problem” that zeros in on the fact that measures taken based on these predictions make testing their accuracy even more illusionary as we cannot know how people would have acted without those measures. This relates to the so-called Goodhart effect that foresees that “when using a measure as a target, it ceases to be a good measure.” The fourth issue concerns the “self-fulfilling prophecy” problem that reminds us of the seminal Thomas Theorem that states that “if men define a situation as real it is real in its consequences” which can be translated to our current environment as “if machines define a situation as real it is real in its consequences.”

The paper is all the more interesting because it refrains from framing everything and anything in terms of harm or risk of harm, foregrounding the constitutive impact of predictive inferences on society and human agency. Though the utilitarian framework of harm is part of their argument, the authors manage to dig deeper, thus developing insights outside the scope of cost-benefit analyses. Utilitarianism may in point of fact be part of the problem rather than offering solutions, because the utilitarian calculus cannot deal with the risk to rights unless it can be reduced to a risk of harm. In asserting the specific temporal nature of predictive inferences when used to pre-empt human behaviour, the constitutive impact on individual agency and societal dynamics becomes clear. It is this temporal issue that – according to the authors – distinguishes these technologies from many others, requiring new regulatory ways of addressing their impact.

To further validate their argument, the authors proceed to address a set of use cases, where the nefarious consequences of algorithmic targeting stand out, notably also because of their dubious reliability: credit scoring (now widely used in finance, housing, insurance or education), criminal justice (with a longstanding history of actuarial justice, now routinely used in decisions of bail, probation or sentencing, but also deployed to automate suspicion), employment (continuing surveillance-Taylorism while also targeting recruitment in ways that may exclude people from entering a job based on algorithmic scoring), education (where a focus on standardised testing and ‘early warning systems’ based on quantification of quality criteria may have perverse effects for those already disadvantaged) and insurance (where actuarial methods originated and the chimera of quantified efficiency of data-driven predictions could result in quasi-personalised premiums that charge people based on the statistical group they are deemed to fit). In all these contexts, the use of predictive and pre-emptive targeting restricts or enables future action, thus redefining the space for human agency. The design and deployment of predictive inferences enables corporations and public administration to create the future they predict, due to the performative effects they generate. Even if such creation is imperfect or was not intended, the authors highlight how it changes the dynamics of human society and disempowers those whose life is being predicted.

Matsumi and Solove end with a set of recommendations for legislatures, calling for legal norms that specifically target the use of predictive inferences, requiring scientific testability combined with evaluative approaches grounded in the humanities. They ask that legislatures develop a proper focus, avoiding over- and under-inclusivity, highlighting the relevance of context and stipulating specific requirements for training data in the case of data-driven systems. They call for the possibility to “escape” the consequences of unverifiable predictions and suggest an expiry date for predictive inferences, while emphasizing that individual redress cannot resolve issues that play out at the societal level. As they note, the EU AI Act addresses many of the problems they detect, providing many of the recommended “solutions,” though their current analysis of the Act remains cursory. (This is understandable as the final text was not yet available at the time of the release of this paper draft.)

Whereas the authors start their paper with a distinction between shamanic prophecies and calculated predictions, the distinction crumbles in the course of the paper, and rightly so. The initial assumption of objective and reliable predictive algorithms turns out to be a rhetorical move to call out the shamans of allegedly scientific predictions that may be refuted based on mathematical and empirical testing. It is key for lawyers to come to terms with the claimed functionalities of predictive tools that hold a potentially illusionary promise of reliable objective truth. We need to follow Odysseus’ strategy, when he bound himself to the mast after waxing the ears of his sailors, to avoid giving in to the Sirens of algorithmic temptation. To do so we cannot merely depend on self-binding (as the authors seem to suggest towards the end of their paper) but, as they actually convincingly advocate, we need to institute countervailing powers. That will necessitate legislative interventions beyond privacy and data protection, directly targeting e.g. digital services and ‘AI’ in the broad sense of that term. Matsumi & Solove’s paper holds great promise for an in-depth analysis of what is the key problem here and it should inform the development of well-argued and well-articulated legal frameworks.

Cite as: Mireille Hildebrandt, Addressing the Modern Shamanism of Predictive Inferences, JOTWELL (November 27, 2023) (reviewing Hideyuki Matsumi & Daniel J. Solove, The Prediction Society: Algorithms and the Problems of Forecasting the Future, GWU Legal Studies Rsch. Paper (forthcoming), available at SSRN (June 5, 2023)), https://cyber.jotwell.com/addressing-the-modern-shamanism-of-predictive-inferences/.

Best Laid Plans: The Challenges of Implementing Article 17

Oct 23, 2023 Rebecca TushnetAdd a Comment

Jasmin Brieske & Alexander Peukert, Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms, CREATe Working Paper, available at SSRN (Jan. 25, 2022).

Rebecca Tushnet

The European Union has been busy updating its regulation of online services in a variety of ways. This includes a recent directive that directs Member States to institute a new online copyright regime. Services that host user-generated content will be required to keep unlicensed works off of their sites, and also required to negotiate with copyright owner groups for licensing agreements. In essence, other hosting sites will have to behave like YouTube in its deals with major music and film labels. This new regime was imposed by what’s known as Art. 17 of the 2019 Directive on Copyright in the Digital Single Market (CDSM Directive). (The Digital Services Act further complicates the picture because it overlaps with the laws required by Art. 17 and adds to their requirements, but I will focus here on Art. 17.)

Unlike its content-agnostic counterpart the Digital Services Act, the copyright-specific Art. 17 does not itself have the force of law; it requires transposition into national law, and different countries have taken different approaches to that transposition. Germany’s transposition has been one of the most ambitious and user-oriented. Brieske & Peukert’s working paper Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms explores how the new German regime affected—and didn’t affect—the copyright-related policies and practices of major sites. As it turns out, neither the user protections nor the rightsowner protections seem to have changed the practices of the big sites—giving more evidence that the major impact will be on smaller sites that may not even have had the problems that purportedly justified this new licensing-first regime. The piece is an important reminder that implementation is everything: New legislation is exciting and produces lots of work for lawyers, but that doesn’t mean it produces wider change.

As Brieske and Peukert explain, few EU member states actually met the deadline for transposition, due in part to the inconsistencies in Art. 17 itself: Supposedly, the directive didn’t require the use of automated filtering—but it imposed duties to prevent unauthorized uploads that could not practically be accomplished without filters, to be applied when rightsholders supplied information sufficient to identify their works. Art. 17 was also supposed to preserve some user rights, but current technologies don’t (and likely never will) identify non-copyright-infringing uses of works (“fair dealing” in Germany) such as reviews, quotations, and parodies in an automated way.

Germany’s implementation aimed to thread the needle by limiting automated blocking and creating a category of uses that are presumptively authorized by law and should not be blocked. A presumptively authorized use:

(1) contains less than half of one or several other works or entire images,
(2) combines this third-party content with other content, and
(3) uses the works of third parties only to a minor extent or, in the alternative, is flagged by the user as legally authorized. Minor uses are really minor, however: “uses that do not serve commercial purposes or only serve to generate insignificant income and concern up to 15 seconds of a cinematographic work or moving picture, up to 15 seconds of an audio track, up to 160 characters of a text, and up to 125 kilobytes of a photographic work, photograph or graphic.” Unlike fair use or even traditional fair dealing, this is a rule rather than a standard.

Moreover, providers have a duty to notify rightsowners when their identified works are used in minor or flagged ways, and offer an opportunity for the rightsowners to object, either via a takedown notice or, in cases involving “premium” content like live sports or current movies, via a “red button” that will immediately block access to the upload.

As is evident, this is a complicated system, perhaps rescued by the idea that mostly it won’t be used, since rightsowners have no real incentives to protest truly minor or critical uses. The German implementation also requires services to inform users about the existence of exceptions and (like the DSA) provide a dispute resolution procedure. Article 17 contemplates only an internal dispute resolution process, while the DSA will require the largest sites to provide for external arbitration as well.

Did all this complexity result in changes in the copyright policies of major sites? The authors studied “YouTube, Rumble (a smaller platform with similar functionality), TikTok, Twitter, Facebook, Instagram, SoundCloud and Pinterest.” The sites appeared not to change much or at all in response to the new German law, even when they had Germany-specific versions (as most did), although their policies also varied a fair amount across the entire group. Most notably, all the sites, with the exception of Twitter, were already using automated upload filters before they were required to do so. This result reflects what followers of research on the US DMCA have long known: Big platforms that experienced lots of unauthorized uploads had already transitioned away from reliance on notice and takedown and legal safe harbors, and towards using filtering, and often licensing, in “DMCA Plus” systems. Art. 17 thus didn’t change matters much if at all for those platforms, while potentially imposing expensive new duties on platforms that don’t have significant infringement problems.

The theoretical protections for users don’t seem to have done much. Likewise, the sites all already had internal dispute mechanisms, further indicating that market pressures were already producing some “due process” protections for users even without legal requirements. Larger sites may also be incentivized to do so by legal requirements: under the overlapping obligations of the DSA, very large sites will be required to provide outside arbitrators for appeals. Meanwhile, the sites didn’t seem to implement or tell users about the possibility of flagging an upload as authorized by law, and they also didn’t warn copyright owners of the possible penalties for repeated abuse of the system. The inefficacy of user protections may be a harbinger of the fate of other attempts to inject users’ rights into systems predicated on broad copyright controls.

As the authors point out, the difficulties in passing implementing legislation across the EU made a “wait and see” approach reasonable for many platforms. The European orientation towards accepting good-faith attempts at compliance, unlike the usually more-legalistic American approach, may also play a role. With Content ID or similar filtering mechanisms and internal appeal options already in place, the fact that the details vary somewhat from the formal requirements of the law might readily seem low-risk. There was no widespread noncompliance with the protections for large copyright owners, who are the most likely to sue and the most expensive to defend against. Users whose fair dealing is blocked are more likely to complain online or give up, neither of which are nearly as damaging.

The authors’ results are consistent with a story of regulation lagging behind reality, and also of regulation being designed with only the big players in mind. Websites like Ravelry (focused on the fiber arts) don’t really need filters to prevent them from being hotbeds of copyright infringement; keeping the site on-topic suffices for that even though it allows lots of user-generated content. And it turns out that the DMCA-Plus sites that most people use most of the time already did filter and didn’t bother to change how they filtered just because they were supposed to respect user rights in the process. The results also might support the alternate harder-law approach of the DSA, which doesn’t require transposition into national law. There’s no reason to wait and see what national implementations will look like, and a more limited risk of differing national interpretations (though this could still happen). Moreover, the DSA at least attempts to focus on the largest and thus most “dangerous” platforms, though I have argued elsewhere that its targeting is still relatively poor.

Brieske and Peukert help explain why online content governance is so difficult: Not only are regulators dealing with conflicting and sometimes irreconcilable priorities (pay copyright owners, avoid overblocking) but their solutions have to be translated into working systems. Services aware that they can’t automate fair dealing are easily tempted into sticking with the policies and systems they already put into place. Since the objective of licensing everything except that which need not be licensed can’t be done on an automated, large-scale basis, there is little incentive to improve. That is not a happy lesson, but it is one worth heeding.

Cite as: Rebecca Tushnet, Best Laid Plans: The Challenges of Implementing Article 17, JOTWELL (October 23, 2023) (reviewing Jasmin Brieske & Alexander Peukert, Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms, CREATe Working Paper, available at SSRN (Jan. 25, 2022)), https://cyber.jotwell.com/best-laid-plans-the-challenges-of-implementing-article-17/.

Algorithmic Accountability is Even Harder Than You Thought

Sep 28, 2023 Rebecca CrootofAdd a Comment

Jennifer Cobbe, Michael Veale & Jatinder Singh, Understanding Accountability in Algorithmic Supply Chains (May 22, 2023), available at Arxiv.

Rebecca Crootof

Most proposed regulations for algorithmic accountability mechanisms have a common feature: they assume that there is a regulatory target with the power to control the system’s inputs, structure, or outputs. Maybe it’s the algorithm’s creator, or the vendor, or the deployer—but surely there’s an entity that can be held to account!

In Understanding Accountability in Algorithmic Supply Chains, Jennifer Cobbe, Michael Veale, and Jatinder Singh upend that assumption. In ten tightly but accessibly written pages, they detail how there is often no single entity that may be legitimately held accountable for an algorithmic conclusion. This is partially due to the “many hands” problem that has already spurred arguments for strict liability or enterprise liability for algorithmic systems. But designing a governance regime is also difficult, the authors argue, because of how algorithmic systems are structured. The authors use the “supply chain” metaphor to capture the fact that these systems are comprised of multiple actors with shifting interdependencies and shared control, contributing varied data and changing elements of the infrastructure, all while data flows in multiple directions simultaneously. The difficulty in regulating algorithmic systems is not just that it is hard to identify which of many entities is the cheapest cost avoider or the one that can be fairly held accountable; instead, it may be impossible to identify which entity or even which combination of entities is causally responsible for any given output.

The authors identify four distinct characteristics of algorithmic supply chains, all of which muddle traditional accountability analyses: (1) “production, deployment, and use are split between several interdependent actors”; (2) “supply chain actors and data flows perpetually change”; (3) “major providers’ operations are increasingly integrated across markets and between production and distribution”; and (4) “supply chains are increasingly consolidating around systemically important providers.” The first three elements make it challenging to identify which actor caused a given result; the fourth creates a practical and political impediment to accountability, as certain entities may become “too big to fail.” Refreshingly, the authors’ precise descriptions of these complex systems are interspersed with ruminations on how technological affordances, law, and political economy realities foster elements of the supply chain, while being careful not to slip into technological determinism.

The authors’ first observation is one of those concepts that I had never considered, but which seemed obvious after reading this piece: algorithms often “involve a group of organizations arranged together in a data-driven supply chain, each retaining control over component systems they provide as services to others” (emphasis in original) (I have only one critique of this paper: the authors are extremely fond of italics). It is “no longer the case that software is generally developed by particular teams or organizations.” Rather, “functionality results from the working together of multiple actors across various stages of production, deployment and use of AI technologies.” These various actors are (sometimes unknowingly) interdependent. Each one “may not be aware of the others, nor have consciously decided to work together towards [an] outcome . . . . However, each depends on something done by others.”

This interdependent dynamic is somewhat abstract, so the authors helpfully provide diagrams and concrete examples. Consider their Figure 2 (below) which showcases how one AI service provider (the red dot) might play three different roles in the provision of an algorithmic result, including providing AI as a service infrastructure to one entity, providing AI as a service to second, and providing technical infrastructure for an app to a third:

Figure 2: A representative AI supply chain. The application developer (blue) initiates a series of data flows by sending input data to an AI service provider (grey). One AI service provider (red) appears at multiple key points in the supply chain – providing infrastructure (A) for an AI service offered by (grey); providing an AI service (B) to another cloud service provider (orange); and providing technical infrastructure (C) for application deployment.
© 2023 Copyright Jennifer Cobbe, Michael Veale & Jatinder Singh. Reproduced by permission subject to cc-by-nc license.

The authors’ second observation is that the interdependencies among the various actors are dynamic and unstable: a supply chain “may differ each time it is instantiated,” as it may be comprised of different data, different actors, and different data flows.” And the outputs change as actors introduce new features or retire older ones, employ additional support services, or otherwise tinker with the system.

These dynamic and unstable interdependencies of the algorithmic supply chain raise accountability issues. One is a variant on Charles Perrow’s Normal Accident theory, writ large: “Interdependence helps problems propagate.” If accidents are inevitable in complex systems, they are certainly inevitable in algorithmic supply chains! The other accountability challenge is that, even when a problem is identified, it may be impossible to determine how it arose or what might be done to correct or mitigate it.

That being said, the authors’ third and fourth observations suggest that some actors—namely, ones which have been able to consolidate and entrench power within an algorithmic supply chain—play more stable and predictable roles than others. Some actors are horizontally integrated and operate across markets and sectors, repurposing infrastructural or user-facing technology for a range of services. Amazon Web Services, for example, is a cloud computing service used by newspapers, food companies, and retailers. Others are vertically integrated, controlling multiple stages of production and distribution of a particular algorithmic supply chain. And a few are both horizontally and vertically integrated, rendering them practically inescapable. (For a visceral description of the inescapability of Amazon, Facebook, Google, Microsoft, and Apple, I strongly recommend Kashmir Hill’s 2019 Goodbye Big Five project). Their centralization renders these entities tempting regulatory targets—but it also means they have the power and resources to affect how regulations take shape.

This is hardly the first time legal actors have had to confront the questions of how to create the right incentives for complex systems or hold multiple entities liable. The varied forms of the administrative state and joint-and-several liability, products liability, market share liability, and enterprise liability are all still useful models for constructing governance mechanisms.

But algorithmic accountability proposals that focus on a discrete actor will likely be insufficient and unfair, unless they account for the complicated interrelations of different entities within the supply chain. Meanwhile, proposals that target centralized actors will need to attend to the risks of assisting incumbents in building regulatory moats and otherwise creating barriers to entry. As Cobbe, Veale, and Singh’s excellent article details, both policymakers and scholars will need to wrestle with the complicated reality of how algorithmic supply chains actually operate.

Cite as: Rebecca Crootof, Algorithmic Accountability is Even Harder Than You Thought, JOTWELL (September 28, 2023) (reviewing Jennifer Cobbe, Michael Veale & Jatinder Singh, Understanding Accountability in Algorithmic Supply Chains (May 22, 2023), available at Arxiv), https://cyber.jotwell.com/algorithmic-accountability-is-even-harder-than-you-thought/.

« Older Entries

Next Entries »

When Law is Code

Getting Real About Protecting Privacy

Can Informed Consent Solve AI Bias?

Breathing Feminism into the Machine

Debunking AI’s Supposed Fairness-accuracy Tradeoff

Risky Speech Systems: Tort Liability for AI-Generated Illegal Speech

Centering Educational Institutions as Potential Sources of Student Privacy Violations

Addressing the Modern Shamanism of Predictive Inferences

Best Laid Plans: The Challenges of Implementing Article 17

Algorithmic Accountability is Even Harder Than You Thought

INSIDE JOTWELL

Sponsored By

SECTIONS

Editor in Chief

Section Editors

CONTRIBUTING EDITORS

Student Editors

Feeds & Subscriptions

Get Email Updates

Search