The Journal of Things We Like (Lots)
Select Page

Debunking AI’s Supposed Fairness-accuracy Tradeoff

Emily Black, John Logan Koepke, Pauline T. Kim, Solon Barocas & Mingwei Hsu, Less Discriminatory Algorithms, __ Geo. L.J. __ (forthcoming); Wash. U. Legal Studies Rsch. Paper (forthcoming), available at SSRN (Oct. 2, 2023).

We are likely to see as many new law review articles about artificial intelligence and law in the next year as the legal academy has produced since the dawn of AI. Those writing about AI for the very first time (*eyes a bunch of copyright scholars suspiciously*) would do well to engage deeply with the work of bias and discrimination scholars who have been writing some of the best, most insightful articles about AI and law for more than a decade. The frameworks and insights they have developed give us a way of thinking about AI law and policy beyond just considerations about Title VII and Equal Protection. A wonderful place to start is with the best contribution to this scholarship I have seen in years, Less Discriminatory Algorithms, by an interdisciplinary team of technical, policy, and legal experts.

This article is a form of my favorite genre of legal scholarship: “you all are making an important mistake about how the technology works.” In particular, it takes on the received wisdom that there is a vexing tradeoff between fairness and accuracy when training machine learning models. This supposed tradeoff—that in order to make a biased algorithm fairer, you need to sacrifice some of the model’s accuracy—may be true in theory for idealized and never-seen-in-the-wild maximally accurate models. But nobody ever has the time, compute, money, or energy to even approach this ideal, meaning real models out in the world are far from maximally accurate. People instead declare success as soon as their models are just accurate enough for their purposes.

The thing about these less-than-maximally-accurate models is that there are many other possible models the model builders could have trained that would have been nearly as accurate, while doling out different false positives and false negatives. Some of these nearly-as-accurate models would likely have better distributional outcomes for protected classes and other vulnerable populations. This is what two of the article’s authors, Emily Black and Solon Barocas, have dubbed “model multiplicity” in the computer science literature. Model multiplicity means that if those training a model would continue to experiment—tweaking a hyperparameter here, selecting or deselecting a feature there, or preprocessing the data a little bit more—they would probably find many other equally (or nearly as) accurate models that allocated the winners and losers differently.

Importantly, many of these other nearly-as-accurate models might also be more fair, less biased, and less discriminatory than the one actually deployed. Rather than saying that there is a tradeoff between fairness and accuracy, we should instead understand that the tradeoff is between accepting a given level of bias or discrimination versus spending a little more time, money, and (carbon emitting) compute to find an alternative that improves on fairness without sacrificing accuracy.

Happily, Black and Barocas have found a brilliant and interdisciplinary team of coauthors: noted antidiscrimination scholar Pauline Kim and two amazing tech-meets-law researchers from the incredible nonprofit Upturn, Logan Koepke and Mingwei Hsu. The group successfully connects these engineering insights to legal doctrine. Model multiplicity’s recasting of the fairness-accuracy tradeoff has a direct bearing on the so-called third step of Title VII’s disparate impact analysis: the requirement that plaintiffs demonstrate a less discriminatory alternative (LDA). (I’m omitting the article’s careful discussion of how other civil rights statutes, including ECOA and the FHA, handle disparate impact analysis a bit differently, but to the same general effect.) This means that plaintiffs might be able to win in the third step of the test, by citing this paper and bringing on an expert who can help find the fairer, just-as-accurate model-not-taken.

Finding the less discriminatory alternative would still be a daunting path. It would require a plaintiff to have access to the exact training environment the defendant used—including all of the training data, which defendants are sure to resist producing. If plaintiffs clear this formidable discovery challenge, they will still need to spend a lot of time and money to find the less discriminatory alternative. All of this might be insurmountably burdensome for plaintiffs, limiting the change that model multiplicity might bring to civil rights law.

The authors, however, have both technical and legal responses to these evidentiary challenges. If the burden were instead placed on the defendant at step two to show that they looked for less discriminatory alternatives during training, it would avoid the inefficiency, delay, and discovery hassles associated with burdening the plaintiff after the fact. Kim and her coauthors find judicial precedent for imposing this obligation on defendants at step two, while conceding that this interpretation has not yet been broadly adopted. Importantly, once this approach is understood to be required by civil rights law, it will impose on model builders the duty to start looking for these alternatives during training. According to model multiplicity, they will often find them! The result will be more model builders finding and deploying less discriminatory alternatives, meaning the win-win of fewer people suffering from discrimination and less litigation exposure for employers!

I like (lots!) so much about this article. It imports a simple but powerful and elegant technical insight—model multiplicity—into legal scholarship. It does so while also advancing a much-needed reform in the way we interpret civil rights law, putting the burden of looking for less discriminatory alternatives on defendants rather than plaintiffs. It is grounded and specific, explaining exactly why this should happen and how the three-step test and the correlative duty to search for less discriminatory alternatives ought to be interpreted; I can imagine courts across the country implementing this article’s prescriptions directly. It does not shy away from technical and legal detail, taking deep dives into the techniques used to find a less discriminatory alternative, how nearly accurate a less discriminatory alternative must be, or how much effort a model builder must spend in trying to find one, to list only three examples.

Finally, legal scholars must understand that model multiplicity matters beyond civil rights law. An obvious extension is to products liability law. Model multiplicity will provide a pathway for injured plaintiffs looking for a “reasonable alternative design” to support claims of defective products design.

We will soon see AI models that do not live up to our legal and societal ideals in many other ways, such as large language models that spout misinformation, deep fake models used to terrorize women, and facial recognition models that destroy privacy. As we fight over what we should do about the bad effects of models, we should understand that they are often the effects of bad models. Model multiplicity means refusing to let model builders rest on the expediency of having done the bare minimum. Keep trying; keep searching; keep looking, and you just may find models that both do well and do good.

Cite as: Paul Ohm, Debunking AI’s Supposed Fairness-accuracy Tradeoff, JOTWELL (March 7, 2024) (reviewing Emily Black, John Logan Koepke, Pauline T. Kim, Solon Barocas & Mingwei Hsu, Less Discriminatory Algorithms, __ Geo. L.J. __ (forthcoming); Wash. U. Legal Studies Rsch. Paper (forthcoming), available at SSRN (Oct. 2, 2023)), https://cyber.jotwell.com/debunking-ais-supposed-fairness-accuracy-tradeoff/.

Risky Speech Systems: Tort Liability for AI-Generated Illegal Speech

How should we think about liability when AI systems generate illegal speech? The Journal of Free Speech Law, a peer-edited journal, ran a topical 2023 symposium on Artificial Intelligence and Speech that is a must-read. This JOT addresses two symposium pieces that take particularly interesting and interlocking approaches to the question of liability for AI-generated content: Jane Bambauer’s Negligent AI Speech: Some Thoughts about Duty, and Nina Brown’s Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation. These articles evidence how the law constructs technology: the diverse tools in the legal sensemaking toolkit that are important to pull out every time somebody shouts “disruption!”

Each author offers a cogent discussion of possible legal frameworks for liability, moving beyond debates about First Amendment coverage of AI speech to imagine how substantive tort law will work. While these are not strictly speaking First Amendment pieces, exploring the application of liability rules for AI is important, even crucial, for understanding how courts might shape First Amendment law. First Amendment doctrine often hinges on the laws to which it is applied. By focusing on substantive tort law, Bambauer and Brown take the as-yet largely abstract First Amendment conversation to a much-welcomed pragmatic yet creative place.

What makes these two articles stand out is that they each address AI-generated speech that is illegal—that is, speech that is or should be unprotected by the First Amendment, even if First Amendment coverage extends to AI-generated content. Bambauer talks about speech that physically hurts people, a category around which courts have been conducting free-speech line-drawing for decades; Brown talks about defamation, which is a historically unprotected category of speech. While a number of scholars have discussed whether the First Amendment covers AI-generated speech, until this symposium there was little discussion of how the doctrine might adapt to handle liability for content that’s clearly unprotected.

Bambauer’s and Brown’s articles are neatly complimentary. Bambauer addresses duties of care that might arise when AI misrepresentations result in physical harm to a user or third parties. Brown addresses a products-liability approach to AI-generated defamation. Another related symposium piece that squarely takes on the question of liability for illegal speech is Eugene Volokh’s Large Libel Models? Liability for AI Output. The Brown and Bambauer pieces speak more directly to each other in imagining and applying two overlapping foundational liability frameworks, while Volokh’s piece focuses on developing a sui generis version of developer liability called “notice-and-blocking” that he grounds in Brown’s idea of using products liability as a starting point. That is, Bambauer and Brown provide the necessary building blocks; Volokh’s article is an example of how one might further manipulate them.

Bambauer writes of state tort liability, as it might be modified by state courts incorporating free speech values. She explains that she has “little doubt that the output of AI speech programs will be covered by free speech protections” (P. 347) (as do my co-authors and I) but also that “the First Amendment does not create anything like an absolute immunity to regulatory intervention,” especially when it comes to negligence claims for physical harm. (P. 348.) Bambauer convincingly claims that the duty element of negligence is where the rubber will hit the road in state courts when it comes to determining the right balance between preventing physical harms and protecting free speech values. She identifies different categories of duty as an effective way of categorizing existing cases that address analogous problems (from books that mis-identify poisonous mushrooms as edible, to doctors who provide dangerously incorrect information to patients).

Bambauer divides her discussion of duty into three broad categories, followed by additional subcategories: 1) situations where AI systems provide information to a user that causes physical harm to that user; 2) situations where AI systems provide information to a user who then causes physical harm to a third party; and 3) situations where AI systems would have provided accurate information that could have averted harm, had a user consulted them (reminiscent of Ian Kerr’s and Michael Froomkin’s prescient work on the impact of machine learning on physician liability). Throughout, this article is logical, clearly organized, factually grounded, and neatly coherent, even where a reader might depart from its substantive claims.

These categories allow Bambauer to tour the reader through available analogies, comparing AI “to pure speech products, to strangers, or to professional advisors” and more. (P. 360.) If an AI system’s erroneous output is analogized to a book, Bambauer argues that developers will not and should not be found liable, as with a book that misidentified poisonous mushrooms as edible as in the Ninth Circuit’s Winter case. (Eerily, this exact fact pattern has already arisen with AI-generated foraging books.) If, under different factual circumstances, AI-generated content is more appropriately analogized to professional advice in a specialized domain such as law or medicine, there might be a higher duty of care. Or, courts might use a “useful, if strained analogy” of “wandering precocious children,” where parents/developers might be held liable under theories of “negligent supervision” for failing to anticipate where their child/generative AI might be doing dangerous things. (P. 356.) This might, Bambauer muses, nudge courts to focus on what mechanisms an AI developer has put in place to find and mitigate recurring harms. This is a classic “which existing analogy should apply to new things?” article, but done well. Others might take this logic further by pulling analogies from other spaces (I’m thinking here for example of Bryan Choi’s work on car crashes, code crashes, and programmers’ duties of care).

This takes us to Brown’s intervention. Brown examines defamation claims through a products liability lens, asking what interventions a developer might be required to take to mitigate the known risk of defamatory content. Brown starts with a summary of how chatbots work, so the rest of us don’t have to. (I will be citing this section often.) She quickly and clearly explains the defamation puzzle: that the current law focuses largely on the intent of the speaker/publisher of defamatory content. This approach runs into issues when we are talking about the developers of AI systems, who Brown argues will almost never have the requisite intent under current defamation law.

Brown then turns to dismantling hurdles to a products liability approach (is it a product? What’s the role of economic loss doctrine?). Readers may find this part more or less convincing, but resolving the hurdles (it’s a product, she thinks economic loss doctrine is not a problem) allows her to get to the really interesting part of the article: what substantive duties a developer might have, if AI-generated defamation gets framed as a products liability problem. Brown argues that “a design defect could exist if the model was designed in a way that made it likely to generate defamatory statements.” (P. 410.) She provides concrete examples grounded in current developer practices: the use of flawed datasets rife with false content; the prioritization of sensational content over accuracy; a failure to take steps to reduce the likelihood of hallucinations; a failure to test the system.

I’m still not sure a products liability approach will survive the Supreme Court’s recent emphasis on scienter in First Amendment cases, but one can hope. In several recent cases, most prominently in Counterman v. Colorado, the Supreme Court has insisted on a heightened intent standard for unprotected speech in order to protect speakers from a chilling effect that occurs if one cannot clearly determine whether one’s speech is unprotected or protected.1 In Counterman, the unprotected speech category at issue was true threats, which the Court found could not be determined under an objective standard but required a query of speaker intent. The Court reasoned that a heightened intent standard creates a penumbra of protection for borderline speech that is close to but not unprotected speech—such as opinionated criticism of a public figure bordering on defamation, or vigorous political speech at a rally bordering on incitement. Brown presents the products-liability approach as a sort of hack to get around the specific intent requirement of “actual malice” for defamation of public figures (private figures require only negligence, but arguably a heightened form of it). She does not really inquire about whether this is possible—whether today’s Court, post-Counterman, would accept this move. I personally think there is space in the Court’s reasoning in Counterman for moving away from specific intent, but it would have been nice to know Brown’s thoughts.

Together, these two articles offer a trio of important contributions: foundations for First Amendment debates about unprotected speech and AI systems; creative but grounded ways of imagining duties of care in the context of developer liability (relevant, too, to evolving discussions of platform liability); and an important basis for discussions about the role of tort law in establishing risk mitigation for content-generating AI systems in the U.S. legal context. Regulators have increasingly defaulted to a regulatory approach to risk mitigation for AI systems, including or especially in the EU. If, as is likely, the United States fails to enact its counterpart to, the Digital Services Act (DSA), Europe’s massive new law regulating content moderation, tort law may be where AI risk mitigation plays out in the United States.

  1. Counterman v. Colorado, 600 U.S. 66 (2023).
Cite as: Margot Kaminski, Risky Speech Systems: Tort Liability for AI-Generated Illegal Speech, JOTWELL (February 8, 2024) (reviewing Jane Bambauer, Negligent AI Speech: Some Thoughts about Duty, 3 J. Free Speech L. 344 (2023); Nina Brown, Bots Behaving Badly: A Products Liability Approach to Chatbot-Generated Defamation, 3 J. Free Speech L. 389 (2023)), https://cyber.jotwell.com/risky-speech-sys…d-illegal-speech/.

Centering Educational Institutions as Potential Sources of Student Privacy Violations

Fanna Gamal, The Private Life of Education, 75 Stan. L. Rev. 1315 (2023).

Schools increasingly use various technologies to monitor and collect information about students. The COVID-19 pandemic, which led to a large number of school closures and a transition to online learning, has also raised alarming questions about student privacy. For instance, virtual software used during remote exams to monitor students can scan students’ bedrooms, collect data from the microphones and cameras of students’ computers, and discern students’ keystrokes. In her article, The Private Life of Education, Professor Fanna Gamal makes a noteworthy contribution to scholarship in the privacy law and education law fields by highlighting embedded assumptions and significant shortcomings in privacy law governing student data. In doing so, she advances existing debates on the legal conception of information privacy. Gamal argues that student privacy laws’ immoderate focus on nondisclosure of students’ data outside of the school context fails to effectively consider the various ways in which schools can serve as the primary perpetrators of student privacy violations. She further contends that schools’ data practices may have disproportionate negative implications for members of historically marginalized groups, such as disabled and low-income students.

Gamal expertly critiques the provisions of the Family Educational Rights and Privacy Act (FERPA). She argues that FERPA’s excessive focus on the prohibition of data disclosures outside of schools spuriously assumes that schools should, by default, receive treatment as privacy protectors that act in the best interest of students’ privacy. Gamal aptly acknowledges that FERPA’s heavy reliance on non-disclosure is not unique to American privacy law. However, after unpacking the legal conception of student data privacy, Gamal goes on to convincingly argue that student data privacy law also assumes that students do not have a significant privacy interest in “data creation, collection and recording.” (P. 1319.)

She posits that educational records contain data that “assumes an aura of [uncontestable] truth,” a truth that follows students indefinitely and can impact their lives well beyond the age of majority (P. 1319.) Gamal argues that there is a significant imbalance of power between students and schools. She contends that FERPA grants schools too much power to determine this truth and its life cycle while giving students and parents insufficient mechanisms to contest educational records that contain misleading or false truths.  Gamal notes that even when parents have the ability to participate in hearings regarding students’ records, schools have excessive power in those hearings since parents and students have the burden of convincing the educational institution to amend educational records. Gamal suggests that privacy law unnecessarily shelters the internal data practices of educational entities from scrutiny, thereby permitting educational institutions to amass “power over the official archives that shape students’ lives” (P. 1318.) Educational institutions may use this power to infringe on students’ privacy.

Gamal perceptively highlights the impact of student privacy laws’ shortcomings on historically marginalized groups, such as disabled students. She convincingly argues that disability documentation is a “poor proxy for disability” and can further entrench pre-existing inequities (P. 1321.) Gamal admits that disability documentation may help to ensure special education resources go to students in need of such services, but she also notes that heightened documentation requirements may instead stem from “the fear of the ‘disability con,” that is, an irrational fear that some individuals may be dishonest about their disabilities (P. 1321.) She contends that the Individuals with Disabilities Education Act’s (IDEA) data disclosure requirements limit the ability of students who use special education services to obtain privacy from their educational institutions. In contrasting the educational records of non-disabled students and disabled students, Gamal observes that the records of disabled students contain data about their social, medical, physical, and mental diagnoses. Fear of the so-called disability con, Gamal contends, results in requirements that ignore the challenges individuals from marginalized groups may face, such as possible limited access to documentation providers. She also points out that students from racial minorities experience over-representation among special education groups and, as such, disproportionality fall subject to the heightened documentation processes required of students seeking access to special education services.

The well-written article concludes by offering a path forward. Gamal argues for expanding the concept of information privacy in the school setting via a collaborative process that gives voice to various stakeholders. She also proposes several amendments to FERPA, including correcting FERPA’s excessive reliance on non-disclosure outside of the school context, redefining the term “educational records,” and providing students and parents with better tools to amend and delete educational records. She also recommends limits on educational institutions’ power over their internal data practices. Gamal’s convincing description of the limits of the current legal framework regulating student privacy should capture the attention of privacy and educational law scholars interested in learning more about the ways in which narrow conceptions of information privacy can further cement institutional data practices that contribute to existing disparities.

Cite as: Stacy-Ann Elvy, Centering Educational Institutions as Potential Sources of Student Privacy Violations, JOTWELL (January 5, 2024) (reviewing Fanna Gamal, The Private Life of Education, 75 Stan. L. Rev. 1315 (2023)), https://cyber.jotwell.com/centering-educational-institutions-as-potential-sources-of-student-privacy-violations/.

Addressing the Modern Shamanism of Predictive Inferences

Hideyuki Matsumi & Daniel J. Solove, The Prediction Society: Algorithms and the Problems of Forecasting the Future, GWU Legal Studies Rsch. Paper (forthcoming), available at SSRN (June 5, 2023).

In their draft paper, The Prediction Society: Algorithms and the Problems of Forecasting the Future, Matsumi and Solove distinguish two ways of making predictions: “the first method is prophecy–based on superstition” and “the second is forecasting–based on calculation.” Initially, they seem convinced that the latter, calculative, type of prediction is more accurate and thus capable of transforming society as it shifts control over peoples’ future to those who develop or deploy such systems. Over the course of the paper, however, that distinction between deceptive prophecy and accurate prediction blurs. The authors make the argument that the pervasive and surreptitious use of predictive algorithms that target human behaviour makes a difference for a whole range of human rights beyond privacy, highlighting the societal impact these systems generate, and requiring new ways of regulating the design and deployment of predictive systems. The authors foreground the constitutive impact of predictive inferences on society and human agency, moving beyond utilitarian approaches that require the identification of individual harm, arguing instead that these inferences often create the future they predict.

Most of the points they make have been made before (e.g. here), but the lucid narrative argumentation presented in Matsumi’s and Solove’s paper could open a new conversation in the US as to how legislatures and courts should approach the issue of pre-emptive predictions with regard to constitutional rights beyond privacy. The paper also expands that same discourse beyond individual rights, highlighting the pernicious character of the manipulative choice architectures that build on machine learning, and showing how the use of ‘dark patterns’ is more than merely the malicious deployment of an otherwise beneficial technology.

To make their argument, the authors tease out a set of salient “issues” that merit a brief discussion here, as they are key to the constitutive societal impact of pre-emptive predictions. The first issue concerns the “fossilisation problem” that foregrounds the fact that algorithmic predictions are necessarily based on past data and thus on past behavioural patterns, thereby risking what I have called (in this book)  “scaling the past while freezing the future.” The second issue concerns the “unfalsifiability problem” that underscores the fact that data-driven predictions are probabilistic, making it difficult to contest their accuracy, which – according to the authors – sits in a grey zone between true and false data (I should note that under the GDPR personal data need not be true to qualify as such). The third issue concerns the “pre-emptive intervention problem” that zeros in on the fact that measures taken based on these predictions make testing their accuracy even more illusionary as we cannot know how people would have acted without those measures. This relates to the so-called Goodhart effect that foresees that “when using a measure as a target, it ceases to be a good measure.” The fourth issue concerns the “self-fulfilling prophecy” problem that reminds us of the seminal Thomas Theorem that states that “if men define a situation as real it is real in its consequences” which can be translated to our current environment as “if machines define a situation as real it is real in its consequences.”

The paper is all the more interesting because it refrains from framing everything and anything in terms of harm or risk of harm, foregrounding the constitutive impact of predictive inferences on society and human agency. Though the utilitarian framework of harm is part of their argument, the authors manage to dig deeper, thus developing insights outside the scope of cost-benefit analyses. Utilitarianism may in point of fact be part of the problem rather than offering solutions, because the utilitarian calculus cannot deal with the risk to rights unless it can be reduced to a risk of harm. In asserting the specific temporal nature of predictive inferences when used to pre-empt human behaviour, the constitutive impact on individual agency and societal dynamics becomes clear. It is this temporal issue that – according to the authors – distinguishes these technologies from many others, requiring new regulatory ways of addressing their impact.

To further validate their argument, the authors proceed to address a set of use cases, where the nefarious consequences of algorithmic targeting stand out, notably also because of their dubious reliability: credit scoring (now widely used in finance, housing, insurance or education), criminal justice (with a longstanding history of actuarial justice, now routinely used in decisions of bail, probation or sentencing, but also deployed to automate suspicion), employment (continuing surveillance-Taylorism while also targeting recruitment in ways that may exclude people from entering a job based on algorithmic scoring), education (where a focus on standardised testing and ‘early warning systems’ based on quantification of quality criteria may have perverse effects for those already disadvantaged) and insurance (where actuarial methods originated and the chimera of quantified efficiency of data-driven predictions could result in quasi-personalised premiums that charge people based on the statistical group they are deemed to fit). In all these contexts, the use of predictive and pre-emptive targeting restricts or enables future action, thus redefining the space for human agency. The design and deployment of predictive inferences enables corporations and public administration to create the future they predict, due to the performative effects they generate. Even if such creation is imperfect or was not intended, the authors highlight how it changes the dynamics of human society and disempowers those whose life is being predicted.

Matsumi and Solove end with a set of recommendations for legislatures, calling for legal norms that specifically target the use of predictive inferences, requiring scientific testability combined with evaluative approaches grounded in the humanities. They ask that legislatures develop a proper focus, avoiding over- and under-inclusivity, highlighting the relevance of context and stipulating specific requirements for training data in the case of data-driven systems. They call for the possibility to “escape” the consequences of unverifiable predictions and suggest an expiry date for predictive inferences, while emphasizing that individual redress cannot resolve issues that play out at the societal level. As they note, the EU AI Act addresses many of the problems they detect, providing many of the recommended “solutions,” though their current analysis of the Act remains cursory. (This is understandable as the final text was not yet available at the time of the release of this paper draft.)

Whereas the authors start their paper with a distinction between shamanic prophecies and calculated predictions, the distinction crumbles in the course of the paper, and rightly so. The initial assumption of objective and reliable predictive algorithms turns out to be a rhetorical move to call out the shamans of allegedly scientific predictions that may be refuted based on mathematical and empirical testing. It is key for lawyers to come to terms with the claimed functionalities of predictive tools that hold a potentially illusionary promise of reliable objective truth. We need to follow Odysseus’ strategy, when he bound himself to the mast after waxing the ears of his sailors, to avoid giving in to the Sirens of algorithmic temptation. To do so we cannot merely depend on self-binding (as the authors seem to suggest towards the end of their paper) but, as they actually convincingly advocate, we need to institute countervailing powers. That will necessitate legislative interventions beyond privacy and data protection, directly targeting e.g. digital services and ‘AI’ in the broad sense of that term. Matsumi & Solove’s paper holds great promise for an in-depth analysis of what is the key problem here and it should inform the development of well-argued and well-articulated legal frameworks.

Cite as: Mireille Hildebrandt, Addressing the Modern Shamanism of Predictive Inferences, JOTWELL (November 27, 2023) (reviewing Hideyuki Matsumi & Daniel J. Solove, The Prediction Society: Algorithms and the Problems of Forecasting the Future, GWU Legal Studies Rsch. Paper (forthcoming), available at SSRN (June 5, 2023)), https://cyber.jotwell.com/addressing-the-modern-shamanism-of-predictive-inferences/.

Best Laid Plans: The Challenges of Implementing Article 17

Jasmin Brieske & Alexander Peukert, Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms, CREATe Working Paper, available at SSRN (Jan. 25, 2022).

The European Union has been busy updating its regulation of online services in a variety of ways. This includes a recent directive that directs Member States to institute a new online copyright regime. Services that host user-generated content will be required to keep unlicensed works off of their sites, and also required to negotiate with copyright owner groups for licensing agreements. In essence, other hosting sites will have to behave like YouTube in its deals with major music and film labels. This new regime was imposed by what’s known as Art. 17 of the 2019 Directive on Copyright in the Digital Single Market (CDSM Directive). (The Digital Services Act further complicates the picture because it overlaps with the laws required by Art. 17 and adds to their requirements, but I will focus here on Art. 17.)

Unlike its content-agnostic counterpart the Digital Services Act, the copyright-specific Art. 17 does not itself have the force of law; it requires transposition into national law, and different countries have taken different approaches to that transposition. Germany’s transposition has been one of the most ambitious and user-oriented. Brieske & Peukert’s working paper Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms explores how the new German regime affected—and didn’t affect—the copyright-related policies and practices of major sites. As it turns out, neither the user protections nor the rightsowner protections seem to have changed the practices of the big sites—giving more evidence that the major impact will be on smaller sites that may not even have had the problems that purportedly justified this new licensing-first regime. The piece is an important reminder that implementation is everything: New legislation is exciting and produces lots of work for lawyers, but that doesn’t mean it produces wider change.

As Brieske and Peukert explain, few EU member states actually met the deadline for transposition, due in part to the inconsistencies in Art. 17 itself: Supposedly, the directive didn’t require the use of automated filtering—but it imposed duties to prevent unauthorized uploads that could not practically be accomplished without filters, to be applied when rightsholders supplied information sufficient to identify their works. Art. 17 was also supposed to preserve some user rights, but current technologies don’t (and likely never will) identify non-copyright-infringing uses of works (“fair dealing” in Germany) such as reviews, quotations, and parodies in an automated way.

Germany’s implementation aimed to thread the needle by limiting automated blocking and creating a category of uses that are presumptively authorized by law and should not be blocked. A presumptively authorized use:

(1) contains less than half of one or several other works or entire images,

(2) combines this third-party content with other content, and

(3) uses the works of third parties only to a minor extent or, in the alternative, is flagged by the user as legally authorized. Minor uses are really minor, however: “uses that do not serve commercial purposes or only serve to generate insignificant income and concern up to 15 seconds of a cinematographic work or moving picture, up to 15 seconds of an audio track, up to 160 characters of a text, and up to 125 kilobytes of a photographic work, photograph or graphic.” Unlike fair use or even traditional fair dealing, this is a rule rather than a standard.

Moreover, providers have a duty to notify rightsowners when their identified works are used in minor or flagged ways, and offer an opportunity for the rightsowners to object, either via a takedown notice or, in cases involving “premium” content like live sports or current movies, via a “red button” that will immediately block access to the upload.

As is evident, this is a complicated system, perhaps rescued by the idea that mostly it won’t be used, since rightsowners have no real incentives to protest truly minor or critical uses. The German implementation also requires services to inform users about the existence of exceptions and (like the DSA) provide a dispute resolution procedure. Article 17 contemplates only an internal dispute resolution process, while the DSA will require the largest sites to provide for external arbitration as well.

Did all this complexity result in changes in the copyright policies of major sites? The authors studied “YouTube, Rumble (a smaller platform with similar functionality), TikTok, Twitter, Facebook, Instagram, SoundCloud and Pinterest.” The sites appeared not to change much or at all in response to the new German law, even when they had Germany-specific versions (as most did), although their policies also varied a fair amount across the entire group. Most notably, all the sites, with the exception of Twitter, were already using automated upload filters before they were required to do so. This result reflects what followers of research on the US DMCA have long known: Big platforms that experienced lots of unauthorized uploads had already transitioned away from reliance on notice and takedown and legal safe harbors, and towards using filtering, and often licensing, in “DMCA Plus” systems. Art. 17 thus didn’t change matters much if at all for those platforms, while potentially imposing expensive new duties on platforms that don’t have significant infringement problems.

The theoretical protections for users don’t seem to have done much. Likewise, the sites all already had internal dispute mechanisms, further indicating that market pressures were already producing some “due process” protections for users even without legal requirements. Larger sites may also be incentivized to do so by legal requirements: under the overlapping obligations of the DSA, very large sites will be required to provide outside arbitrators for appeals. Meanwhile, the sites didn’t seem to implement or tell users about the possibility of flagging an upload as authorized by law, and they also didn’t warn copyright owners of the possible penalties for repeated abuse of the system. The inefficacy of user protections  may be a harbinger of the fate of other attempts to inject users’ rights into systems predicated on broad copyright controls.

As the authors point out, the difficulties in passing implementing legislation across the EU made a “wait and see” approach reasonable for many platforms. The European orientation towards accepting good-faith attempts at compliance, unlike the usually more-legalistic American approach, may also play a role. With Content ID or similar filtering mechanisms and internal appeal options already in place, the fact that the details vary somewhat from the formal requirements of the law might readily seem low-risk. There was no widespread noncompliance with the protections for large copyright owners, who are the most likely to sue and the most expensive to defend against. Users whose fair dealing is blocked are more likely to complain online or give up, neither of which are nearly as damaging.

The authors’ results are consistent with a story of regulation lagging behind reality, and also of regulation being designed with only the big players in mind. Websites like Ravelry (focused on the fiber arts) don’t really need filters to prevent them from being hotbeds of copyright infringement; keeping the site on-topic suffices for that even though it allows lots of user-generated content. And it turns out that the DMCA-Plus sites that most people use most of the time already did filter and didn’t bother to change how they filtered just because they were supposed to respect user rights in the process. The results also might support the alternate harder-law approach of the DSA, which doesn’t require transposition into national law. There’s no reason to wait and see what national implementations will look like, and a more limited risk of differing national interpretations (though this could still happen). Moreover, the DSA at least attempts to focus on the largest and thus most “dangerous” platforms, though I have argued elsewhere that its targeting is still relatively poor.

Brieske and Peukert help explain why online content governance is so difficult: Not only are regulators dealing with conflicting and sometimes irreconcilable priorities (pay copyright owners, avoid overblocking) but their solutions have to be translated into working systems. Services aware that they can’t automate fair dealing are easily tempted into sticking with the policies and systems they already put into place. Since the objective of licensing everything except that which need not be licensed can’t be done on an automated, large-scale basis, there is little incentive to improve. That is not a happy lesson, but it is one worth heeding.

Cite as: Rebecca Tushnet, Best Laid Plans: The Challenges of Implementing Article 17, JOTWELL (October 23, 2023) (reviewing Jasmin Brieske & Alexander Peukert, Coming into Force, Not Coming into Effect? The Impact of the German Implementation of Art. 17 CDSM Directive on Selected Online Platforms, CREATe Working Paper, available at SSRN (Jan. 25, 2022)), https://cyber.jotwell.com/best-laid-plans-the-challenges-of-implementing-article-17/.

Algorithmic Accountability is Even Harder Than You Thought

Jennifer Cobbe, Michael Veale & Jatinder Singh, Understanding Accountability in Algorithmic Supply Chains (May 22, 2023), available at Arxiv.

Most proposed regulations for algorithmic accountability mechanisms have a common feature: they assume that there is a regulatory target with the power to control the system’s inputs, structure, or outputs. Maybe it’s the algorithm’s creator, or the vendor, or the deployer—but surely there’s an entity that can be held to account!

In Understanding Accountability in Algorithmic Supply Chains, Jennifer Cobbe, Michael Veale, and Jatinder Singh upend that assumption. In ten tightly but accessibly written pages, they detail how there is often no single entity that may be legitimately held accountable for an algorithmic conclusion. This is partially due to the “many hands” problem that has already spurred arguments for strict liability or enterprise liability for algorithmic systems. But designing a governance regime is also difficult, the authors argue, because of how algorithmic systems are structured. The authors use the “supply chain” metaphor to capture the fact that these systems are comprised of multiple actors with shifting interdependencies and shared control, contributing varied data and changing elements of the infrastructure, all while data flows in multiple directions simultaneously. The difficulty in regulating algorithmic systems is not just that it is hard to identify which of many entities is the cheapest cost avoider or the one that can be fairly held accountable; instead, it may be impossible to identify which entity or even which combination of entities is causally responsible for any given output.

The authors identify four distinct characteristics of algorithmic supply chains, all of which muddle traditional accountability analyses: (1) “production, deployment, and use are split between several interdependent actors”; (2) “supply chain actors and data flows perpetually change”; (3) “major providers’ operations are increasingly integrated across markets and between production and distribution”; and (4) “supply chains are increasingly consolidating around systemically important providers.” The first three elements make it challenging to identify which actor caused a given result; the fourth creates a practical and political impediment to accountability, as certain entities may become “too big to fail.” Refreshingly, the authors’ precise descriptions of these complex systems are interspersed with ruminations on how technological affordances, law, and political economy realities foster elements of the supply chain, while being careful not to slip into technological determinism.

The authors’ first observation is one of those concepts that I had never considered, but which seemed obvious after reading this piece: algorithms often “involve a group of organizations arranged together in a data-driven supply chain, each retaining control over component systems they provide as services to others” (emphasis in original) (I have only one critique of this paper: the authors are extremely fond of italics). It is “no longer the case that software is generally developed by particular teams or organizations.” Rather, “functionality results from the working together of multiple actors across various stages of production, deployment and use of AI technologies.” These various actors are (sometimes unknowingly) interdependent. Each one “may not be aware of the others, nor have consciously decided to work together towards [an] outcome . . . . However, each depends on something done by others.”

This interdependent dynamic is somewhat abstract, so the authors helpfully provide diagrams and concrete examples. Consider their Figure 2 (below) which showcases how one AI service provider (the red dot) might play three different roles in the provision of an algorithmic result, including providing AI as a service infrastructure to one entity, providing AI as a service to second, and providing technical infrastructure for an app to a third:

Figure 2: A representative AI supply chain. The application developer (blue) initiates a series of data flows by sending input data to an AI service provider (grey). One AI service provider (red) appears at multiple key points in the supply chain – providing infrastructure (A) for an AI service offered by (grey); providing an AI service (B) to another cloud service provider (orange); and providing technical infrastructure (C) for application deployment.
© 2023 Copyright Jennifer Cobbe, Michael Veale & Jatinder Singh. Reproduced by permission subject to cc-by-nc license.

The authors’ second observation is that the interdependencies among the various actors are dynamic and unstable: a supply chain “may differ each time it is instantiated,” as it may be comprised of different data, different actors, and different data flows.” And the outputs change as actors introduce new features or retire older ones, employ additional support services, or otherwise tinker with the system.

These dynamic and unstable interdependencies of the algorithmic supply chain raise accountability issues. One is a variant on Charles Perrow’s Normal Accident theory, writ large: “Interdependence helps problems propagate.” If accidents are inevitable in complex systems, they are certainly inevitable in algorithmic supply chains! The other accountability challenge is that, even when a problem is identified, it may be impossible to determine how it arose or what might be done to correct or mitigate it.

That being said, the authors’ third and fourth observations suggest that some actors—namely, ones which have been able to consolidate and entrench power within an algorithmic supply chain—play more stable and predictable roles than others. Some actors are horizontally integrated and operate across markets and sectors, repurposing infrastructural or user-facing technology for a range of services. Amazon Web Services, for example, is a cloud computing service used by newspapers, food companies, and retailers. Others are vertically integrated, controlling multiple stages of production and distribution of a particular algorithmic supply chain. And a few are both horizontally and vertically integrated, rendering them practically inescapable. (For a visceral description of the inescapability of Amazon, Facebook, Google, Microsoft, and Apple, I strongly recommend Kashmir Hill’s 2019 Goodbye Big Five project). Their centralization renders these entities tempting regulatory targets—but it also means they have the power and resources to affect how regulations take shape.

This is hardly the first time legal actors have had to confront the questions of how to create the right incentives for complex systems or hold multiple entities liable. The varied forms of the administrative state and joint-and-several liability, products liability, market share liability, and enterprise liability are all still useful models for constructing governance mechanisms.

But algorithmic accountability proposals that focus on a discrete actor will likely be insufficient and unfair, unless they account for the complicated interrelations of different entities within the supply chain. Meanwhile, proposals that target centralized actors will need to attend to the risks of assisting incumbents in building regulatory moats and otherwise creating barriers to entry. As Cobbe, Veale, and Singh’s excellent article details, both policymakers and scholars will need to wrestle with the complicated reality of how algorithmic supply chains actually operate.

Cite as: Rebecca Crootof, Algorithmic Accountability is Even Harder Than You Thought, JOTWELL (September 28, 2023) (reviewing Jennifer Cobbe, Michael Veale & Jatinder Singh, Understanding Accountability in Algorithmic Supply Chains (May 22, 2023), available at Arxiv), https://cyber.jotwell.com/algorithmic-accountability-is-even-harder-than-you-thought/.

Sexuality’s Promise for Sexual Privacy

Brenda Dvoskin, Speaking Back to Sexual Privacy Invasions, 98 Wash. L. Rev. __ (forthcoming 2023), available at SSRN (March 6, 2023).

Thanks in part to the ardent work of dedicated activists and scholars, there is a growing body of law and industry self-regulation governing violations of individuals’ sexual privacy, such as the unconsented distribution of another’s intimate images online. In her thoughtful piece, Speaking Back to Sexual Privacy Invasions, scholar Brenda Dvoskin powerfully argues that a key example of such regulation—many internet platforms’ self-imposed total ban on nudity—goes too far and is in many ways counterproductive to the goals of sexual privacy. As Dvoskin explains in her effort to deepen sexual privacy legal theory and make its application more consistent with its professed values of fostering (consensual) sexual expression, any effort to completely abate the harms flowing from sexual privacy violations requires not just preventing unconsented disclosures ex ante, “but also transforming the meaning of public representations of sexuality.”

Dvoskin argues that one of the principal harms flowing from unconsented disclosures originates in the social stigma associated with nudity. If self-authorized nudity became more commonplace via deregulation, the social harm of having one’s body seen might be decreased (albeit not eliminated). Put succinctly by Dvoskin, “[p]ublic representations of sex are an essential tool to destabilize the meaning of unwanted exposures and, in turn, reduce the harms experienced by victims of privacy losses.” As conceptualized by Dvoskin, diminishing the negative social meaning ascribed to nudity reduces the power of privacy invaders to inflict any harm and, in that view, is an intervention that more fully captures feminism’s emancipatory potential.

Importantly, Dvoskin acknowledges that unconsented sexual privacy violations cause autonomy harms that are critical to address. She also, however, explains how sexual privacy theory has, at times, incorporated the social harms associated with nudity as the normative justification for prohibiting privacy violations. That is, scholars and lawmakers have relied upon the social interpretation of the nudity as shameful to justify the regulation. Or, as put by Judith Butler when discussing gender performances, “the anticipation conjures its object,” or, in this case, the harm.

Instead, Dvoskin advocates for a regulatory path that creates space for consensual sexual expression, instead of reifying the idea that sexual expression necessarily results in unanswerable harms. This, in turn, will help destigmatize the social harm of sexual privacy violations and, in that way, reduce the power of privacy violators. Dvoskin’s position stands shoulder to shoulder with other social/law reform efforts that have prioritized destigmatization, such as movement calls to “Shout Your Abortion” and “come out” as queer in order to harness the power of social contact theory as a means of changing social attitudes toward marginalized identities/behaviors.

To illustrate her point, Dvoskin aptly draws from the queer theory notion of “scripts” and explains that the harm of sexual privacy is both scripted (meaning that the social interpretation of nudity helps create the harm) and script (meaning that legal/regulatory reaction to the privacy loss further entrenches the meaning of the nudity as somewhat negative). As such, law and policy must be attentive to the way they may be reinforcing the very harms they aim to prevent.

Dvoskin recognizes that in the context of online sexual expression and sexual privacy, striking the right regulatory balance is difficult. She acknowledges that regulatory regimes that require ex ante consent before platforms permit a posting would be onerous and require more work on the part of platforms and regulators. But innovative approaches, including notice-and-takedown regimes, that allow those whose rights are violated to efficiently have the infringing image removed by platforms, are not without precedent such as in the Digital Millennium Copyright Act copyright infringement context. And Dvoskin rightly notes that nuance is a virtue rather than vice and can, perhaps, serve as a model for more calibrated content moderation in other contexts as well. Indeed, as the Supreme Court has noted in the context of prohibited sex discrimination, “administrative convenience” is not a sufficiently compelling justification. Nor is it an idealized form of governance. And Dvoskin’s piece is a powerful riposte in favor of a more bespoke approach to the governance of sexual expression and sexual privacy online.

The article is as superbly written as it is carefully conceptualized and, although focused on a very specific and important context, serves as a cautionary tale across regulatory regimes, reminding us both that laws are discourses that can perpetuate the very harms they are seeking to prevent and that we should be skeptical of bright line rules, which are both over- and under- inclusive in terms of achieving regulatory aims. Moreover, while not Dvoskin’s principal focus, implicit in the article is also a critique of platforms’ approach to sexual expression as an example of, in essence, “privacy washing”: platforms trumpet the puritanism and overregulation of sexual expression of their websites in part to distract from their abysmal approaches to content moderation and privacy in other contexts, such as their failure to meaningfully regulate false political and medical information. All told, as Dvoskin explains, sexual expression should not be sacrificed either on the altar of either sexual privacy or in the name of convenience.

Cite as: Scott Skinner-Thompson, Sexuality’s Promise for Sexual Privacy, JOTWELL (August 30, 2023) (reviewing Brenda Dvoskin, Speaking Back to Sexual Privacy Invasions, 98 Wash. L. Rev. __ (forthcoming 2023), available at SSRN (March 6, 2023)), https://cyber.jotwell.com/sexualitys-promise-for-sexual-privacy/.

Generating Genuine Data Protection

Carleen M. Zubrzycki, The Abortion Interoperability Trap, 132 Yale L.J.F. 197 (2022).

In April 2023, the State of Idaho enacted legislation making it a felony to help a minor obtain an abortion (or medication to induce abortion) by “recruiting, harboring, or transporting the pregnant minor within this state.” With more than a third of U.S. states having severely restricted or outright prohibited access to abortion within state borders, Idaho has now turned its attention to making it more difficult for at least some of its citizens to travel out of state to obtain abortion care. The legislation explicitly rejects as a defense that the provider of abortion services is in another state. Abortion care is not the only type of healthcare service that has raised interjurisdictional conflicts. As of April 2023, at least thirteen states have banned some or all gender affirming care for minors. In some states, government officials have attempted to define gender affirming care as child abuse, which would arguably support removing resident children from parental custody even if the contested care were sought beyond the state’s borders.

In response, other states have enacted legislation intended to shield patients, providers, and others who facilitate care that is lawful within that state from being prosecuted or sued elsewhere. Connecticut, which was the first state to enact such protections, largely prohibits healthcare providers from turning over abortion records in out of state legal proceedings without the patient’s explicit consent and bars state judicial authorities from issuing subpoenas related to reproductive services unless there is an equivalent cause of action under Connecticut law.

Yet, as Carly Zubrzycki demonstrates in her new article The Abortion Interoperability Trap, laws like Connecticut’s “miss[] a crucial piece of the puzzle: medical records are widely shared across state lines to facilitate patient care.” As Zubrzycki explains, these new state laws designed to protect reproductive and gender affirming care “are generally limited to preventing providers and other covered parties from directly sharing information in formal proceedings.” They do not prevent, and indeed often explicitly permit, sharing of patient records across state lines for purposes of patient care. The result is that these statutes largely fail to provide the protection they tout. “The reason is simple: in-state providers subject to a safe-haven law will, in the ordinary course of business as their patients seek care in other states, share medical records with out-of-state providers who are not subject to that law and who can therefore easily be asked to hand over the records in litigation.” This gap between what abortion-protective laws promise and what they genuinely offer is what Zubrzycki calls abortion’s “interoperability trap.” In this timely and insightful article, Zubrzycki offers not just a diagnosis but refreshingly practical solutions. Her work is already having a practical and important impact.

The abortion interoperability trap is not an incidental or minor exception to otherwise robust data protection. Rather, Zubrzycki compellingly demonstrates that it threatens to “swallow the protections the legislation purports to offer.” HIPAA, though widely known for its Privacy Rule, is actually primarily concerned with the portability of medical data. After all, the “P” in HIPAA stands for “portability,” not privacy. The growth of electronic health records and electronic medical-records systems has also facilitated the freer flow of medical data among a single patient’s providers. More recently, expansion in the applicability of the federal Information Blocking Rule, which is designed to limit information hoarding by individual medical providers, has “shifted the incentives for those with access to medical records to begin sharing those records far more widely.” The result (or at least, the intended goal) of this “legal and technological ecosystem of medical records” is to “require providers and others to share patient information seamlessly with other providers and health-information-technology companies.”

This new regulatory framework for sharing health data was not designed to trap abortion patients or others seeking contested care out of state. It was targeted at resolving persistent problems affecting the portability and sharing of medical records. But the seamless sharing of a patient’s medical data among her providers may have unintended consequences, and the abortion interoperability trap is one of them.

Zubrzycki explains that existing legal tools are unlikely to resolve the interoperability trap. The federal regulatory exceptions to medical information sharing are permissive, rather than mandatory, and are narrowly framed such that they are unlikely to provide much protection. Nor will the HIPAA Privacy Rule or the Fourth Amendment resolve this quandary. The HIPAA Privacy Rule, as I have explained elsewhere, broadly authorizes law enforcement access to otherwise protected personal health information. Moreover, the Privacy Rule “expressly permits patient records to be shared whenever the sharing is for treatment purposes.” The Fourth Amendment is equally unavailing. “[E]ven at its remedial maximum, the Fourth Amendment requires only a warrant based on probable cause in order for law enforcement to obtain records.” That is, “even assuming that the Fourth Amendment protects medical records at all . . . law enforcement could still obtain those records with relative ease, as the probable-cause standard is not particularly burdensome” in investigating a suspected unlawful abortion.

But all is not lost! In addition to lucidly and incisively diagnosing the abortion interoperability trap, Zubrzycki also identifies legal and other mechanisms for addressing this problem. She argues that private providers and other stakeholders can take steps to better protect sensitive medical records under their control. Healthcare providers could, for instance, adopt more exacting patient informed consent requirements prior to disclosing an abortion, miscarriage, or stillbirth, so long as that requirement is not so onerous as to run afoul of the Information Blocking Rule. A more extreme, but perhaps more effective, response would involve reverting to paper records instead of electronic ones, as paper records are not covered by the Information Blocking Rule.

Better still, Zubrzycki identifies interventions that state and federal actors could take to better shield sensitive medical data—and her work here has already had an impact. In March 2023, Maryland enacted legislation based in significant part on Zubrzycki’s recommendations for closing the abortion interoperability trap. In introducing the legislation, the sponsoring state senator expressly invoked Zubrzycki’s proposal that “[t]he most effective legislative approach for states may be to prohibit electronic-health-record vendors and health-information exchanges from facilitating the transfer of abortion-related data across state lines.” The U.S. Department of Health and Human Services has also proposed changes to the HIPAA Privacy Rule to better protect reproductive health care privacy that are consistent with Zubrzycki’s recommendations, though the notice of proposed rulemaking does not cite Zubrzycki directly.

Scholarship like Zubrzycki’s not only advances the scholarly conversation about data and health privacy; it also makes concrete and positive change in the real world. Particularly as states act ever more aggressively to regulate contested forms of care, such scholarship is an essential part of successful academic work.

Cite as: Natalie Ram, Generating Genuine Data Protection, JOTWELL (July 25, 2023) (reviewing Carleen M. Zubrzycki, The Abortion Interoperability Trap, 132 Yale L.J.F. 197 (2022)), https://cyber.jotwell.com/generating-genuine-data-protection/.

Words of Wisdom

Samuel R. Bowman, Eight Things to Know About Large Language Models, available at arXiv (Apr. 2, 2023).

Lenin did not actually say, “There are decades when nothing happens, and there are weeks when decades happen,” but if he had, he might have been talking about generative AI. Since November 30, 2022 when OpenAI released ChatGPT, decades have happened every week. It’s not just that generative AI models are now able to emit fluent text on almost any topic imaginable. It’s also that every day now brings news of new models, new uses, and new abuses. Legal scholars are scrambling to keep up, and to explain whether and how these AIs might infringe copyright, violate privacy, commit defamation and fraud, transform the legal profession, or overwhelm the legal system.

Samuel R. Bowman’s preprint Eight Things to Know About Large Language Models is an ideal field guide for the scholar looking to understand the remarkable capabilities and shocking limitations of ChatGPT and other large language models (LLMs). Bowman is a professor of linguistics, data science, and computer science at NYU, and a visiting researcher at the AI startup Anthropic. Eight Things is clear, information-dense, and filled with helpful citations to the recent research literature. It is technically grounded, but not technically focused. And if you are paying attention, it will grab you by the lapels and shake vigorously.

LLMs are syntactic engines (or stochastic parrots). What they do, and all they do, is predict the statistical properties of written text: which words are likely to follow which other words. And yet it turns out that statistical prediction — combined with enough data in a large enough model wired up in the right way — is enough to emulate human creativity, reasoning, and expression with uncanny fluency. LLMs can write memos, program games, diagnose diseases, and compose sonnets. Eight Things is a thoughtful survey of what LLMs can do well, what they can’t, and what they can pretend to.

Bowman’s first two Things to Know are an unsettling matched pair. On the one hand, LLMs predictably get more capable with increasing investment, even without targeted innovation. Simply pouring more time, training data, and computing power into training an LLM seems to work. This means that progress in the field is predictable; at least for now, it doesn’t seem to depend on uncertain scientific breakthroughs. Decades will keep on happening every week. (Indeed, the sixth Thing to Know, human performance on a task isn’t an upper bound on LLM performance, means that there is no necessary limit to this progress. For all we know, it might be able to continue indefinitely.)

But on the other hand, specific important behaviors in LLM[s] tend to emerge unpredictably as a byproduct of increasing investment. The fact of progress (Gozer the Gozerian) is predictable, but not its specific form (the Stay Puft Marshmallow Man). As Bowman explains, part of what makes ChatGPT so powerful and so adaptable is that it displays few-shot learning: “the ability to learn a new task from a handful of examples in a single interaction.” Post-ChatGPT LLMs are not just purpose-built AIs with fixed capacities — they can be coached by users into competence on new tasks. This is why, for example, ChatGPT can produce baseline-competent answers to law-school exams, even though almost no one had “go to law school” on their bingo cards five years ago.

Bowman’s third Thing, LLMs often appear to learn and use representations of the outside world, is also remarkable. Even though they are only syntactic engines, LLMs can give instructions for drawing pictures, draw inferences about the beliefs of a document’s author, and make valid moves in board games, all tasks that are usually thought of as requiring abstract reasoning about a model of the world. Legal doctrine and legal theory will need to decide when to adopt an external perspective (“ChatGPT is an algorithm for generating text, like throwing darts at a dictionary”) and when to adopt an internal one (“ChatGPT acted with actual malice when it asserted that I committed arson”).

Unfortunately, there are no reliable techniques for steering the behavior of LLMs. While Bowman describes widely-used techniques for steering LLM behavior—crafting well-chosen prompts, training on well-chosen examples, and giving human feedback—none of these techniques are reliable in the way that we typically think a well-trained human can be. While LLMs are getting better at learning what humans want, this is not the same as doing what humans want. “This can surface in the form of … sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.”

The seventh Thing—LLMs need not express the values of their creators nor the values encoded in web text—expands on this depressing framing to explore specific ways in which researchers are trying to embed important legal and societal values in LLM outputs. As programmer Simon Willison has argued, it is hard or impossible to put guardrails around an LLM to prevent it from producing specific kinds of outputs. Malicious users with sufficient dedication and creativity can often use “prompt injection” to override the developer’s instructions to the LLM system with their own.

One reason that steering is so difficult is due to Bowman’s fifth Thing: experts are not yet able to interpret the inner workings of LLMs. Legal scholars have been writing thoughtfully about the interpretability problem for AIs. Giving high-stakes decisions over to an AI model offends important rule-of-law values when the AI’s decisions cannot be more intelligibly explained than “computer says no.”  LLMs and other generative AIs compound these problems. The legal system currently depends on an ability to make causal attributions: was a fraudulent statement made with scienter, or was the defendant’s work subconsciously copied from the plaintiff’s? The current state of the art in AI research gives us very little purchase on these questions.

Bowman’s eighth and final point is a further reinforcement of the limits of our current knowledge: brief interactions with LLMs are often misleading. On the one hand, the fact that an LLM currently trips over its own feet trying to answer a math problem doesn’t mean that it’s incapable of answering the problem. Maybe all it needs is to be prompted to “think step by step.” LLMs and high-schoolers both benefit from good academic coaching. On the other hand, the fact that an LLM seems able to execute a task with aplomb might not mean that it can do as well on what humans might consider a simple extension of that task. It turns out, for example, that GPT-4 memorized coding competition questions in its training set: it could “solve” coding questions that had been posted online before its training cutoff date, but not questions posted even just a week later.

LLMs are strikingly powerful, highly unpredictable, prone to surprising failures, hard to control, and changing all the time. The world urgently needs the insights that legal scholars can bring to bear, which means that legal scholars urgently need to understand how LLMs work, and how they go wrong. Samuel Bowman’s insightful essay is table stakes for participating in these debates.

Cite as: James Grimmelmann, Words of Wisdom, JOTWELL (June 20, 2023) (reviewing Samuel R. Bowman, Eight Things to Know About Large Language Models, available at arXiv (Apr. 2, 2023)), https://cyber.jotwell.com/words-of-wisdom/.

What STS Can (and Can’t) Do for Law and Technology

Ryan Calo, The Scale and the Reactor (2022), available at SSRN.

The field of law and technology has come a long way since we last heard the unmistakable squeal of a modem connecting to cyberspace.  Most of us that remember that sound now probably have more grey hair than we used to. We’ve covered a lot of ground since “Lex Informatica” and “Code is Law,” so you’d think our field would have a deeply sophisticated method for understanding the relationship between law, society, and technology, right?

Professor Ryan Calo thinks the field can do better. In this concise and accessible unpublished article that is part of a new book project, Calo highlights how Science and Technology Studies, or STS, has been overlooked and could contribute to the field of law and technology. To Calo, law and tech took decades to wind up where STS would have started. It’s not that law and tech is redundant of STS, rather, the problem is that “law and technology has been sounding similar notes to STS for years without listening to its music.” As a result, our field “does not benefit from the wisdom of scholars who have covered roughly the same ground.” Calo looks to showcase critical STS ideas and debates “for the unfamiliar law and technology reader,” so that we no longer have an excuse to claim ignorance of the field. He accomplishes this in spades with a clear and deeply informed article that is a must read for anyone writing in the field of law and technology.

Calo wrote this article because he believes that “a working knowledge of STS is critical to law and technology scholarship.” He argues that the core insights of STS will help scholars avoid “the pitfalls and errors that attend technology as social fact.” Calo’s contribution has three parts. The first is a brief STS crash course for the uninitiated. If you are unfamiliar with STS and regularly read this journal, stop reading this and check out Calo’s highly efficient summary of STS in Part One (it’s only seven pages!). I imagine the work of Langdon Winner, Bruno Latour, Sheila Jasanoff, and many other STS scholars will resonate with you as it did for me when I first encountered them. This introduction to the field is both informative and enjoyable because of Calo’s palpable enthusiasm for STS. (As I wrote this, I laughed at how I’m writing a review about how much I like Calo’s article, which is about how much he likes STS. It’s like I’m writing a Jot about a Jot. A meta-Jot.)

The second part of this article is an exploration of STS insights that make up the “road not taken by law and technology.” Calo highlights what could have been gained if legal scholars had more explicitly embraced STS earlier, including more nuanced metaphors, more case studies, and fewer redundancies. Calo cites two downsides that arise from law and technology overlooking STS. First, failing to deeply engage with STS denies the field of law and tech wisdom and nuance. Additionally, law and tech scholarship often falls into some of the very traps STS grew up to avoid, such as a strong sense of technological determinism and the idea that technology will shape behavior in one single way and no other.

In the article’s third part, Calo highlights the limitations of STS scholarship for law and technology scholars. First, STS is relatively uncomfortable with normativity, compared with the law’s embrace of it. Additionally, STS sometimes struggles to translate concepts and observations in ways that can influence levers of power.  Calo notes that STS scholarship sometimes gets lost in its own complexity, a critique levied by some STS scholars themselves. But as Julie Cohen has noted, law is relentlessly pragmatic in its identification and attempt to solve real world problems. While other disciplines might hesitate to offer up messy and even internally conflicting prescriptions, legal scholars do it for a living when inaction means injustice. Calo highlights the dangers of law and tech avoiding normativity and pragmatism, including getting stuck in a “constant state of watchful paralysis.” This happens when legal actors wait so long to fully understand the social impacts of technologies that when clarity finally arrives these tools and systems are too entrenched to resist. In STS scholarship this is referred to as the “Collinridge dilemma,” and it gives more nuance to what I’ve heard some law and tech scholars describe as the “avocado ripeness” problem. (Not yet…not yet…not yet……..too late.)

Thus, Calo’s article ends up being part STS-primer and part STS-implementation guide for law and technology scholars. According to Calo, you shouldn’t simply chuck a bunch of STS into every corner of cyberlaw, because “importing STS wholesale…has the potential to undermine what is unique about the [law and technology] field.” In the final part, Calo recommends that legal scholars should be mindful of how technologies have value-laden affordances and social forces behind them while holding firm to legal scholarship’s normativity and pragmatism. I appreciated Calo’s suggestion that one major strength of law and technology scholarship is making ideas and concepts concrete enough for people to act on.

I like this article because it is clear, concise, and even witty. (It wouldn’t be a Calo article without puns and he even managed to work one into the title). And I like this article lots because of its meditation on the virtues, vices, and proper role of “law and technology” as a field of scholarship. This is one of the main aspects of Calo’s forthcoming book. My only complaint in this review is I wish the article had previewed his larger project on law and technology more.

If we are going to take a serious look as the relationship between rules and artifacts, we must have a good sense of both. This article uses STS to show where the field of law and technology can improve, and what it does best.

Cite as: Woodrow Hartzog, What STS Can (and Can’t) Do for Law and Technology, JOTWELL (May 19, 2023) (reviewing Ryan Calo, The Scale and the Reactor (2022), available at SSRN), https://cyber.jotwell.com/what-sts-can-and-cant-do-for-law-and-technology/.