Discrimination by Database

Solon Barocas & Andrew D. Selbst, Big Data's Disparate Impact, available at SSRN (2014).

I have previously written about an NYU School of Internet scholars, led by the philosopher Helen Nissenbaum, whose work is “philosophically careful, intellectually critical, rich in detail, and humanely empathetic.” There is also a Princeton School, which orbits around the computer scientist Ed Felten, and which is committed to technical rigor, clear exposition, social impact, and creative problem-solving. These traditions converge in Big Data’s Disparate Impact by Solon Barocas and Andrew Selbst. The article is an attempt to map Title VII employment discrimination doctrine onto data mining, and it is one of the most interesting discussions of algorithmic prediction I have read.

The pairing of anti-discrimination law and data mining is ideal. They are both centrally concerned with how overall patterns emerge from individual acts; they shift back and forth between the micro and the macro, the stones and the mosaic. Moreover, they are both centrally concerned with making good decisions: each in its own way aspires to replace crude stereotypes with nuanced reason. It would seem, then, that Big Data ought to be an ideal ally in Title VII’s anti-discrimination mission. But Barocas and Selbst give reasons to think that the opposite may be true: that data mining will introduce new forms of bias that Title VII is ill-equipped to remedy.

In any interesting decision problem, there is a gap between the evidence available to a decision-maker and her goals. A recruiter would like to avoid hiring candidates who will stab customers, but the candidates who end up stabbing customers never seem to list that fact on their resumes. Thus, the decision will be mediated through a rule: a prediction about how the observable evidence correlates with a goal. In this context, then, data mining is a discipline of using large datasets, sophisticated statistics, and raw computational power to formulate better, more predictive rules.

The resulting rules are at once intensely automated and intensely human. On the one hand, data mining algorithms can discover surprising rules that humans would not have thought to look for, or complicated rules that humans would not have been able to formulate. In this sense, the algorithmic turn allows the use of rules that really are supported by the data, rather than the biased rules we flawed humans would think to try.

At the same time, as Barocas and Selbst deftly show, data mining requires human craftwork at every step. Humans pick the datasets to use, and they massage that data to make it usable for the learning algorithms (e.g., by imputing ZIP codes for customers who haven’t listed them). Humans do the same thing on the other end, both approximating and constructing the characteristics they wish to select for. To learn who is a good employee, an algorithm needs to train on a dataset in which a human has flagged employees as “good” or “bad,” but that flagging process in a very real sense defines what it means to be a “good” employee. In the gap between evidence and goals, humans specify the set of possible rules the algorithm will choose among, and the algorithm that will choose among them.

Barocas and Selbst circle over this ground three times, each time at a higher level of abstraction: technical, doctrinal, prescriptive. On the first pass, they survey the ways that invidious biases can enter into the automated algorithmic judgments. On the second, they show that Title VII doctrine often fails to catch these biases, even when they would result in serious and unjustified mistreatment. And on the third, they show that it will not be easy to patch Title VII—that the challenges of Big Data go to the heart of the American project of equality.

Injecting algorithms into what was formerly a human decision-making process can undermine accountability by diffusing responsibility. For one thing, the data intensity of data mining makes it easier for bad actors to hide their fingerprints. Take the deeply uncool process of collecting, cleaning, and merging datasets to prepare them for mining. If a data broker redlines a tenant database that is then used as an input to an employment-applicant screening algorithm, the resulting hiring decisions will in a very real sense be racially motivated, but it will be almost impossible for anyone to reconstruct why. Proof problems abound in the land of Big Data, and Big Data’s Disparate Impact is replete with examples. Ring of Gyges, anyone?

It gets worse. Big Data optimists have argued that employers and other decision-makers rely on race as a crude proxy for the characteristics they really care about, so that with better data they will be less racist. Perhaps. But if Bert is a proxy for Ernie, then Ernie can also be a proxy for Bert. In a world where everything predicts everything else, as Paul Ohm has half-jokingly hypothesized, a data-mining algorithm does not need direct access to forbidden criteria like religion or race to make decisions on the basis of them. Indeed, it can find far subtler ones than humans are capable of: perhaps birth year plus car color plus favorite potato chip brand equals national origin. Put another way, data mining can be just as efficient at optimizing discrimination as at avoiding it.
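The proxy point can be made concrete with a toy simulation. The sketch below is purely illustrative and rests on invented assumptions: the data are synthetic, the three "proxy" bits stand in for innocuous-looking fields such as car color, and the 0.8 fidelity figure is arbitrary. It is not drawn from Barocas and Selbst's article. It shows only that several weak proxies, combined by a simple lookup table, can recover a hidden attribute better than any one of them alone.

```python
import random
from collections import Counter, defaultdict

def make_records(n, fidelity=0.8, seed=0):
    """Synthetic people: a hidden protected bit plus three noisy proxy bits."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        protected = rng.randint(0, 1)
        # Each proxy matches the protected bit with probability `fidelity`.
        proxies = tuple(
            protected if rng.random() < fidelity else 1 - protected
            for _ in range(3)
        )
        records.append((proxies, protected))
    return records

def fit_lookup(train):
    """For each proxy combination, remember the most common protected value."""
    counts = defaultdict(Counter)
    for proxies, protected in train:
        counts[proxies][protected] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

def accuracy(predict, test):
    """Fraction of test records whose protected bit is guessed correctly."""
    return sum(predict(p) == y for p, y in test) / len(test)

records = make_records(4000)
train, test = records[:2000], records[2000:]
table = fit_lookup(train)

# Guess the hidden attribute from one proxy, then from all three together.
single_proxy_acc = accuracy(lambda p: p[0], test)
combined_acc = accuracy(lambda p: table.get(p, 0), test)
print(f"one proxy alone: {single_proxy_acc:.2f}")
print(f"three proxies combined: {combined_acc:.2f}")
```

In this toy setting the protected attribute never appears among the inputs used for prediction, yet the combination of proxies reconstructs it most of the time, which is the sense in which "Ernie can also be a proxy for Bert."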

Moreover, on closer inspection, almost every interesting dataset is tainted by the effects of past discrimination. In a classic example, St. George’s Hospital trained an algorithm to replicate its admissions staff’s evaluations of medical-school applicants with 90% to 95% fidelity. Unfortunately, the staff’s past decisions had been racist and sexist, so “the program was not introducing new bias but merely reflecting that already in the system.” That last phrase should be alarming to anyone who has worried about the divide between disparate treatment and disparate impact. “In certain contexts, data miners will never be able to disentangle legitimate and proscribed criteria,” Barocas and Selbst write, because the “legitimate” criteria redundantly encode the “proscribed” ones. But if “the computer did it,” and these patterns seem to emerge from the data as if by magic, Title VII has a hard time explaining who if anyone has done something culpably wrong in relying on them.
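A minimal sketch of the same dynamic, under loudly stated assumptions: the scoring rule, the ten-point penalty, and the threshold-learning "model" below are all hypothetical and are not a reconstruction of the St. George’s program. The point is only that a model fitted to replicate past decisions will reproduce whatever bias those decisions contained, and will do so with high fidelity.

```python
import random

def past_staff_decision(score, group):
    """Hypothetical biased rule: group 1 needs 10 more points than group 0."""
    return score >= 60 + 10 * group

def make_applicants(n, seed=1):
    """Synthetic applicants as (score, group) pairs."""
    rng = random.Random(seed)
    return [(rng.randint(0, 100), rng.randint(0, 1)) for _ in range(n)]

def fit_thresholds(history):
    """Learn, per group, the lowest score the staff ever admitted."""
    thresholds = {}
    for score, group in history:
        if past_staff_decision(score, group):
            thresholds[group] = min(score, thresholds.get(group, score))
    return thresholds

history = make_applicants(4000)
new_applicants = make_applicants(2000, seed=2)
learned = fit_thresholds(history)

def model(score, group):
    """Admit exactly when the score clears the threshold learned for the group."""
    return score >= learned[group]

# How often does the learned model agree with what the staff would have done?
fidelity = sum(
    model(s, g) == past_staff_decision(s, g) for s, g in new_applicants
) / len(new_applicants)
print(f"learned thresholds: {learned}")
print(f"agreement with past staff: {fidelity:.1%}")
```

Nothing in the fitting step is itself discriminatory; the higher bar for one group is simply what the historical labels teach.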

In other words, as Barocas and Selbst observe, data mining brings civil rights law face to face with the unresolved tension between its nondiscrimination and antisubordination missions. On the one hand, individual acts of invidious discrimination dissolve into the dataset; on the other, the dataset itself is permeated by past discrimination. This would be a familiar enough observation about the limits of strictly race-neutral analysis in a world of self-perpetuating patterns of exclusion, but the algorithmic angle makes it new and urgent. Algorithms are not neutral; they make fraught decisions for complex reasons. In all of this, perhaps, Big Data is surprisingly human.


About Fallacies

Neil M. Richards & William Smart, How Should the Law Think About Robots? (2013), available at SSRN.

The article may seem dated for a review here. There are newer ones on the subject, such as Ryan Calo’s “Robotics and the Lessons of Cyberlaw” of 2014. But the Richards & Smart article sticks in my mind. Maybe because, while both are premature (I will come to that immediately), this article makes a fundamental point, or better, the fundamental point, about law and politics in the face of changing technologies in a very simple and clear way.

“Premature” used to be the comment we would receive from the European Commission when we, in the heyday of European cyber regulation, as members of the Legal Advisory Board, an independent expert group long since abolished, would suggest a new initiative outside the Commission’s own agenda. Some readers may have encountered this word when presenting new ideas as legal counsel. I have never taken it as a derogatory term. “Premature” signifies a quality, if not an obligation, of proactive legal comment and advice. In that sense dealing with robotics and law is premature, and so are, by the way, the “We Robot” Conferences (established in 2012) which give context to this article, a conference series in which (disclosure is due) our Editor-in-Chief has been prominently involved.

The fundamental point is slow in coming: Richards & Smart start with a definition of a robot: a “non-biological autonomous agent,” i.e. “a constructed system that displays both physical and mental agency but is not alive in the biological sense.” We all are familiar, as the authors point out, with all sorts of robots. We know them from science fiction and the movies. There is already the small round disk that cleans our sitting rooms. There has been the automated assembly of cars by industrial robots. And lately these cars drive themselves around as robots guided by Google. And robots, the authors argue, will become increasingly multipurpose, gain more autonomy, and turn from lab exhibits into everyday devices communicating with each of us at any time. Law? There is a reference to the Nevada state regulation of 2011 for those car robots. But otherwise the article mentions legal implications only in a very general way; there is no discussion; there is not even a listing of possible legal problems.

And yet, it is exactly this lack that makes this article so special and brings us to that central point. The authors make a notable, an important pause. Before going into the legal details, they insist, we should be aware of how law and society deal with technology in general, and they take Cyberlaw as the example of what may happen to robotics and law: Essential for technology law is the way in which law perceives technology. It does so by analogy to a metaphor already in use, in order to relate the “new” to something law already knows. The example Richards & Smart are presenting from Cyberlaw is the evolving interpretation of the Fourth Amendment with regard to wiretapping: The metaphor chosen decides on the political and legal path the issues will take.

While the importance of metaphor is not new to discussions about law and about Cyberlaw in particular (see, for example, Julie Cohen’s analysis of Cyberspace as space in a 2007 article, 107 Colum. L. Rev. 210), the authors consciously register the moment of the critical turn before it is taken: Heed the warning, they say, “Beware of Metaphors.” They exemplify their premonitions about the way in which politics and law may perceive robots with what they call the “Android Fallacy”: The more robots look and seem to behave like human beings, the more inclined we might be to ascribe free will to them, and the more responsibility will be taken off the shoulders of their designers.

In essence, what this article is asking us—and this may be the real reason why this article sticks in my mind—is to what fallacies of Cyberlaw have we contributed with our writings, making way for what kind of legal policies, legislation and jurisprudence, with what kind of consequences even when we were acting with proactive intent? Shouldn’t we have allowed for more time to discuss the implications of our metaphors before surfing with the technological tide?

(Michael Froomkin took no part in the editing of this essay.)


From Google to Tolstoy Bot: Should the First Amendment Protect Speech Generated by Algorithms?

Stuart Minor Benjamin, Algorithms and Speech, 161 U. Pa. L. Rev. 1445 (2013), available at SSRN.

Information, increasingly, is everywhere. Machines gather information, process it, and automatically communicate it, often in terms humans understand. Bots tweet on Twitter; Fitbits communicate a user’s activity record; Project Tango devices render 3D maps; and IBM’s Watson can now argue. With algorithms increasingly writing, drawing, and even debating, a central question for regulators, courts, and scholars is to what extent the First Amendment protects speech generated by algorithms. If algorithmic communication falls within First Amendment coverage, regulators will have a more difficult time governing it. But if it does not, courts will need to explain how the exclusion can sit comfortably with First Amendment theory and current doctrine.

Stuart Minor Benjamin positions the puzzle of algorithmic speech as part of a larger project in understanding First Amendment jurisprudence and its expansion and contraction. In previous work, Benjamin has asked how hard it would be to expand First Amendment coverage; in Algorithms and Speech, he asks how hard it would be to narrow the existing jurisprudence to exclude a practice that would otherwise be covered. Benjamin recognizes the potential regulatory consequences of First Amendment coverage of algorithmic speech. But he surveys Supreme Court caselaw and concludes that there is no principled way to exclude many algorithmic communications from speech protection without excluding much other communication that we deem squarely within the First Amendment’s coverage.

Algorithms and Speech does valuable work in laying out the current state of expansive First Amendment doctrine, and in identifying the Supreme Court’s reluctance to create new exceptions. Benjamin also clarifies that the coverage of algorithmic speech is not just a matter of making analogies to earlier media. Search engine results may be like editorial decisions, the claim of Eugene Volokh and Donald Falk’s 2012 white paper, but Benjamin is intent on finding an underlying reason why both are covered that goes beyond structural similarities.

The touchstone of First Amendment coverage, according to Benjamin, is the sending of a substantive message. Benjamin points out that when a sendable and receivable message has actually been sent, the Supreme Court has never found that message to be outside First Amendment coverage (a point that is historically untrue, but correct within current jurisprudence). Excluding algorithmically generated speech would upend existing First Amendment caselaw, requiring either the drawing of arbitrary lines or the exclusion of much of what is currently considered to be speech. Benjamin explains that the arguments against covering algorithmic speech, such as distinguishing it as corporate speech or commercial speech, would also leave core First Amendment institutions such as newspapers unprotected. What is most important to Benjamin is consistency, and the article is admirable in trying to craft rules that apply equally to all.

Benjamin carves out several important limitations. First, an algorithm that does not communicate a substantive message will not be protected. Second, because Benjamin hinges First Amendment protection on the communication of a message (but interestingly, not its receipt), some companies may have to indicate that they are editors. Third, as is the case with newspapers, laws of general applicability such as labor laws, tax laws, and most antitrust laws can apply to algorithmic speakers with no First Amendment ramifications. The government just can’t ban or compel substantive communication.

Benjamin makes a convincing case for the protection of search engine results under current First Amendment doctrine. The recent SDNY decision in Zhang v. Baidu, where a district court judge found First Amendment protection for Baidu’s search results, shows that judges are likely to agree. The article is also painstakingly honest in trying to maintain the cohesiveness of the Court’s First Amendment reasoning. But by positioning the question of algorithmic speech within current jurisprudence and around the model of search engine results, Benjamin limits the scope of the article in several ways. Algorithms and Speech does important work and will likely be a foundation that others will build on, but it leaves several more difficult questions for another day.

Benjamin steers away from reasoning from First Amendment theory. He instead navigates the “guideposts” of Supreme Court jurisprudence, accepting Cass Sunstein’s assertion that the First Amendment in practice is incompletely theorized. This leads to an unstated bias towards the jurisprudential status quo. It’s not clear precisely why preserving the guideposts of current jurisprudence is the right approach. The primary explanation offered is that disturbing the status quo of jurisprudence will threaten other media that more clearly rest at the heart of First Amendment interests.

We may soon be at a stage, however, where upending existing First Amendment caselaw is to some extent inevitable. If algorithmic speech does get full First Amendment protection, how will the intent requirements in many of the categories of unprotected speech get implemented? And if algorithmic speech does not get protection, how will we distinguish that content from human speech without threatening many of the values underlying the First Amendment? Algorithms and Speech takes a fascinating first step, while in its caselaw-driven approach leaving a number of important questions on the near horizon.

The second way in which the article is less daring than it could be stems from the model Benjamin chooses for algorithmic speech. The search engine model, that of an algorithm running according to its programmers’ general intent, runs the article into limitations right where the questions get most interesting. Benjamin’s touchstone for First Amendment coverage, based on the Spence test, is the intentional communication of a substantive message by a human being. When the algorithm is no longer a tool for its user, but an artificial intelligence, Benjamin suggests the connection to a human speaker might become sufficiently attenuated that First Amendment coverage might no longer be appropriate. The problem is, as Bruce Boyden has pointed out, that the line between tool and independent message generator is exceedingly difficult to draw.

The flurry of scholarship around algorithmic speech shows the variety of ways a First Amendment problem can be framed: with a focus on the speaker, the message, the medium, or the listener. Benjamin’s focus is in large part on an intentional speaker. The recipient of a message matters to Benjamin to the extent that the recipient can identify the speaker as communicating a message, but the intentional speaker is for Benjamin the core of what turns information into First-Amendment-relevant speech.

Robert Post has taken a different approach to First Amendment coverage, explaining that First Amendment protection extends to a medium of communication because of its status as a social practice (both Tim Wu and Andrew Tutt have applied this to search engines). The question is whether certain kinds of algorithmic speech are sufficiently like protected media or have acquired enough of their own cultural meaning to deserve protection. This is not a question courts usually want to explicitly ponder.

A third way to frame First Amendment interests is to talk about the message itself, and whether its contents reflect high or low value speech. But as Benjamin points out, the Supreme Court has repeatedly rejected this message-oriented approach, at least with respect to speech made by an intentional speaker.

A fourth way to frame First Amendment interests is to look at the broader communications environment, including the listeners, readers, and receivers of communication. And this approach highlights a problem with both Benjamin’s and Boyden’s potential exclusion of algorithmic speech once a human is no longer involved in the selection of a message. Even an autonomous Tolstoy Bot would still be creating works that appear to readers to be identical to speech by humans. At that future phase of algorithmic creativity, censorship of Tolstoy Bot would affect readers the same way as censorship of Tolstoy: they would have access to less information, and would perceive the government’s actions as censorship. Under multiple theories of the First Amendment, this kind of censorship would raise problems—even if there is no human meaningfully involved in the crafting of the substantive message listeners could receive.

The article’s focus on a speaker’s agency as the touchstone of First Amendment protection thus may in practice prove to be both overinclusive (including actions as speech merely because a speaker claims to have a substantive message) and underinclusive (rejecting speech that clearly looks like speech to a reader). The speaker’s agency approach may be most consistent with current jurisprudence, but the more difficult question for a future article is whether it is the right approach to future technologies, and why.


Don’t Restrict My E-book

Angela Daly, E-Book Monopolies and the Law, 18 Media & Arts L. Rev. 350 (2013), available at SSRN.

It’s still fashionable to point to the “cloud” as the solution to all sorts of problems in technology. But can such a shift disturb the carefully worked out compromises between different interests, which are embedded in legislation on topics such as competition and intellectual property?

Angela Daly, a research fellow at the Swinburne Institute of Technology (Australia) who is also about to complete a doctorate at the European University Institute (Florence, Italy), suggests that these clouds may bring little but rain. In her article, “E-book monopolies and the law”, published in the consistently topical Melbourne-based Media and Arts Law Review, she considers two particular features of e-book platforms and content: digital rights management, and competition.

The significance of these sectors is apparent. Daly sets out the successes of various players (notably Amazon and its Kindle and Kindle Store), even in markets that they have not officially entered like Australia. Quickly, however, the problem is apparent: depending on how you define the relevant market, the user can find themselves faced with high ‘switching costs’ and with limited opportunities to take advantage of all that the digital world appears to promise.

The problems of the legislative protection of DRM are well known. While restating the key criticisms (including the replacement of flexible rules with unbalanced stipulations or complex processes and presumptions), Daly also brings to wider attention the particular difficulties encountered in Australia, where the result of trade deals has been the worst of both worlds–greater US-style protection for rightsholders without even the safeguards of the US system. The key argument in this section, though, is about exhaustion (taking in the landmark European decision in the software case of UsedSoft v Oracle and the very different US decision in Capitol Records v ReDigi). It’s argued that the potential for exhaustion to protect competition and consumer interests in the e-book world is being stifled by some decisions, by whittling away exemptions for temporary copies, and above all the move from “goods” to “services” and from “sales” to “licenses”.

One consequence of poorly designed or implemented DRM legislation is the locking in of users to a particular platform. In the second main part of the paper, Daly develops a theme that runs across much of her work–the combination of potential harms to competition (especially for the ‘normal’ user who is not accustomed to complex workarounds) and wider harms to the public interest (e.g., censorship by powerful platform operators). Reviewing the various stages in the US and European Union investigations into the agency models adopted by Apple and publishers and a class action against Amazon, Daly makes some particularly telling points about the wider problems of Apple’s approach to revenue sharing, and the interaction between these models and DRM.

Daly’s proposed solution is an interesting one. She recognises that competition law brings something to the table, but although these cases and investigations may have the result of lowering prices, they do not address the underlying (detrimental) impact of the use of DRM as a tool to protect business models rather than creative works. As such, she calls for greater attention to fair use outside the US and a “fair deal for users” in trade negotiations. When so many questions are seen solely in competition terms, this more subtle approach is particularly welcome.

Since this wide-ranging article was published, the competition matters have rolled on. Apple and major publishers have settled with some of their opponents, and Amazon prevailed in a case brought by bookstores, although Amazon and the publisher Hachette are now caught up in a significant dispute about e-book prices. Disclosures regarding the activities of national security services may have shaken some user confidence in the shift to the cloud, but the reliance on DRM in the e-book sector remains clear. Daly’s deft handling of the range of issues in this sector should be of particular interest to the many committees and bodies investigating ‘copyright reform’ around the world.


Free for the Taking (or Why Libertarians are Wrong about Markets for Privacy)

Have you heard any of these arguments lately? Consumers willingly pay for the wonderful free services they enjoy using the currency of their personal information. We can’t trust surveys that say that consumers despise commercial tracking practices, because the revealed preferences of consumers demonstrate that they are willing to tolerate tracking in return for free social networking services, email, and mobile apps. If privacy law X were implemented, it would kill the free Internet (or more immodestly, the Internet).

Two recent articles take on all of these arguments and more in the context of the privacy of information collected online by private corporations. The articles are similarly entitled (before their subtitle colons), Free and Free Fall. Both are written by excellent interdisciplinary scholars, Free by Chris Hoofnagle and Jan Whittington and Free Fall by Kathy Strandburg. These articles, individually but even more taken together, present a thorough, forceful, and compelling rebuttal to pervasive libertarian paeans to the supposed well-functioning online market for personal information.

Libertarian arguments hold great sway among policymakers, particularly in the United States. Libertarian think tank types exhort policymakers to respect the unprecedented efficiencies of the well-functioning market for personal information, which has created and supports today’s vibrant Internet. For many years, privacy law scholars (myself included), the vast majority of whom do not subscribe to these libertarian beliefs, have treated these arguments as mere distractions and have not focused much attention on responding to them. This is no surprise, as the tools of economics seem not by themselves up to the task of describing the problems we have documented. Yet this inattention has taken a toll, as it has strengthened by want of opposition the libertarian critique of many proposed privacy regulations, which some policymakers have begun to parrot. While legal scholars have done little to respond, these arguments have been rebutted, capably but only incompletely, by scholars from outside the field, like economist Alessandro Acquisti and engineers and computer scientists Jens Grossklags, Lorrie Cranor, Aleecia McDonald, and Ed Felten. Before Free and Free Fall, however, we’ve lacked a thorough and thoroughly economic rebuttal to the libertarian critique.

The core libertarian argument that drives the rebuttal in Free and Free Fall is that people “pay” for free, online services with their data. No they don’t, at least not if “payment” is supposed to represent an accurate measure of consumer preference and definitely not if “payment” means that consumers rationally give up data about themselves in exchange for free services. Both Free and Free Fall carefully marshal arguments for why ordinary economic conceptions of payment and cost and demand and preference do not hold in the “market” for data. The two articles use different economic methodologies: Free relies on the framework of “transaction cost economics” (TCE) and Free Fall speaks in the more traditional language of market failure. But both articles describe in detail the great risks of harm people expose themselves to by allowing companies to collect so much personal information, from identity theft to insecurity to self-censorship to humiliation to unwanted association. Consumers do not “pay” with information, because they do not understand the true costs of allowing their data to be snatched.

But, the libertarians might respond, consumers expose themselves to risk of harm in commercial transactions in other contexts, and in those cases we still consider payment in the face of risk to be an accurate measure of consumer preference. To respond, Free and Free Fall document the many reasons consumers find it impossible to account for the risk of harm from online data collection: the utter lack of transparency into corporate data practices creates insuperable information asymmetries; well-documented network effects give rise to lock-in and other barriers to competitive entry; and bounded rationality prevents consumers from accurately assessing risk. Worst of all, unlike in some consumer transactions, all of these barriers persist even after the commercial transaction takes place, leading Strandburg to compare these online services to “credence goods” like medical treatments and legal services, which tend to be “natural subjects of regulation.” She concludes that “[c]onsumers are doing what amounts to closing their eyes and taking an unknown risk in exchange for a presently salient benefit.”

Of course, many other privacy articles and books have recounted the risks of privacy harm from commercial tracking, but these two articles work a subtle but powerful reframing of how we should account for this harm. Until now, the libertarians and the policymakers they have persuaded have found it easy to discount discussions of privacy harm as separate from and outweighed by the great and unmitigated benefits of economic efficiency and growth found on the other side of the scale. A little identity theft is a small price to pay for free Facebook and Gmail, they have argued. Free and Free Fall explain how these privacy harms themselves work on the “benefits” side of the scale, because they need to be accounted for as economic inefficiencies, which diminish the economic value, measured both individually and societally, of these online services. As Hoofnagle and Whittington’s Free puts it, “[t]he financial consequences of transactions that occur with the press of a button can be of such magnitude and lasting consequence that their implications for parties can easily dwarf those of typical purchases in our economy.”

In other words, the market for personal data is dysfunctional and distorted in ways that cause profound economic inefficiencies in the form of risk of privacy harm, inefficiencies that sensible privacy regulation can help correct. We need new privacy laws not despite what they might do to economic efficiency but because they will allow the market to produce even more economic efficiency.

Both articles also explain how these skewed market forces have been subtly re-architecting the Internet in societally harmful ways. Companies are being pushed to design data extractive services in pursuit of corporate riches, even if consumers would prefer precisely the opposite. Hoofnagle and Whittington’s Free recounts Google’s history with the HTTP Referer header, which has seen the company on more than one occasion intentionally rolling back or weakening pro-privacy, pro-security advances so as not to disrupt the expectations and profits of advertisers. Although the authors do not draw this particular connection, it is fair to say that some of the worst abuses of privacy of the NSA have resulted directly from corporate decisions like these to place the desires of advertisers ahead of the wishes of users.

But it disserves these two articles to lump them together without highlighting a little of what each does that the other doesn’t. Hoofnagle and Whittington’s Free builds on the TCE work of Oliver Williamson and others to propose a rigorous and grounded methodology for taking account of all of an online transaction’s efficiencies. Strandburg’s Free Fall focuses thoroughly on the development of the market for advertising, drawing on a rich and detailed history from economists and marketing experts outside the legal academy.

There is so much more to these long articles, but rather than describe more, I’ll simply urge those in the field to read both. It might be overselling things to say that these two articles have demolished the libertarian critique of privacy law. But they do administer a thorough and long overdue drubbing of some core libertarian arguments.


Good Fences Make Better Data Brokers

Woodrow Hartzog, Chain Link Confidentiality, 46 Georgia L. Rev. 657 (2012) available at SSRN.

Since at least the early 2000s, privacy scholars have illuminated a fatal flaw at the core of many “notice and consent” privacy protections: firms that obtain data for one use may share or sell it to data brokers, who then sell it on to others, ad infinitum. If one can’t easily prevent or monitor the sale of data, what sense does it make to carefully bargain for limits on its use by the original collector? The Federal Trade Commission and state authorities are now struggling with how to address the runaway data dilemma in the new digital landscape.  As they do so, they should carefully consider the insights of Professor Woody Hartzog. His article, Chain Link Confidentiality, offers a sine qua non for the modernization of fair data practices: certain obligations should follow personal information downstream.

After 2013, it is impossible to ignore the concerns of privacy activists. The Snowden revelations portrayed untrammeled data collection by government. Jay Rockefeller’s Senate Commerce Committee portrayed an out-of-control data gathering industry (whose handiwork can often be appropriated by government). America’s patchwork of weak privacy laws is no match for the threats posed by this runaway data, which is used secretly to rank, rate, and evaluate persons, often to their detriment and often unfairly. Without a society-wide commitment to fair data practices, a dark era of digital discrimination is a real and present danger.

As Hartzog notes, “current privacy laws are too limited, subjective, or vague to effectively police the ‘downstream’ use of information by third parties.” This is a glaring weakness in privacy law, since a given bit of data might be redisclosed dozens or even hundreds of times in new digital data markets. Hartzog’s approach would “use contracts to link recipients of personal information,” including in those contracts “(1) obligations and restrictions on the use of the disclosed information; (2) requirements to bind future recipients to the same obligations and restrictions; and (3) requirements to perpetuate the contractual chain.” Like the viral licensing envisioned by Creative Commons, the “chain link confidentiality” approach is designed to effect a system of data transmission that balances the flexibility of private ordering with the stability of public law.

Hartzog’s article highlights the importance of health privacy law for modeling new relationships of responsibility between data collectors, sellers, and subjects.  As he observes, “The HIPAA Privacy Rules provide that, although only covered entities such as healthcare providers are bound to confidentiality, these entities may not disclose information to their business associates without executing a written contract that places the business associate under the same confidentiality requirements as the healthcare providers.” These protections have been strengthened even further by HITECH (and the HIPAA Omnibus Rule of 2013), which impose statutory and regulatory duties on business associates and even their downstream contractors.  The health privacy protections essentially “run with the data.”

What property-like restrictions accomplish in the health data sphere, Hartzog wants to accomplish via contracts that would bind the recipients of data to terms like those imposed on the original collector. Not only would this help individuals get a handle on “runaway data;” it would also help promote the validity of research in the big data field by indicating the provenance of data.  As Sharona Hoffman showed in the article “Big, Bad Data,” if we can’t tell the provenance of data, how can we adjust for or account for potential flaws or biases in it?

Both firms and data brokers increasingly try to integrate thousands of sources of information into profiles. The profiles are actionable, whether inside or outside the firm in which they are compiled. Runaway data can lead to cascading disadvantages. Once one piece of software has classified a person as a bad credit risk, a bad worker, or a poor consumer, that attribute may appear with decision-making clout in other systems all over the economy. And it can dilute or distort findings that are increasingly based on promiscuous correlations within unstructured data sets.  Chain link confidentiality would impose some baseline of order and attribution on the new data economy.

Runaway data poses a stark choice to data policymakers. Given the number of data breaches extant, it’s only a matter of time before breachers start developing dark markets of information more sensitive than credit card numbers online.  Are we going to allow datamongers to essentially act as “fences” for this stolen data? Or are we going to keep tabs on each “hop” of data from collector to broker to user and beyond—a bare predicate for keeping illicit or inaccurate data “fenced in?”  Hartzog’s chain links point us decisively toward the latter choice—a far better future for data practices.


Empirical Link Rot And The Alarming Spectre Of Disappearing Law

Something Rotten in the State of Legal Citation trumpets an important alarm for the entire legal profession, warning us that current modes of citing websites in judicial cases create a very real risk that opinion-supporting citations by courts as important as the United States Supreme Court will disappear, making them inaccessible to future scholars. The authors of this important and disquieting article, Raizel Liebler and June Liebert, both have librarianship backgrounds, and they effectively leverage their expertise to explicate four core premises: legal citations are important; web-based legal citations can and do disappear without notice or reason; disappearing legal citations are particularly problematic in judicial opinions; and finally, to this reader’s vast relief, there are solutions to this problem, if only the appropriate entities would care enough to implement them.

Denoting the disappearing citation phenomenon with the vivid appellation “link rot,” Liebler and Liebert explain that the crucial ability to check and verify citations is badly compromised by link rot, and then demonstrate this with frankly shocking empirical evidence. According to their research:

[T]he Supreme Court appears to have a vast problem with link rot, the condition of internet links no longer working. We found that number of websites that are no longer working cited to by Supreme Court opinions is alarmingly high, almost one-third (29%). Our research in Supreme Court cases also found that the rate of disappearance is not affected by the type of online document (pdf, html, etc) or the sources of links (government or non-government) in terms of what links are now dead. We cannot predict what links will rot, even within Supreme Court cases. (P.278).

They warn that without significant changes to current practices, the information in citations within judicial opinions will be known solely from those citations. When citations lack lengthy parentheticals or detailed explanatory text, it might not even be clear to future readers, critics or researchers why a document was cited, no less the nature of the support or clarifications it offered.

Liebler and Liebert acknowledge that the Internet has improved legal research in many ways, opening up information conduits that had not been easily available before, and that in many respects website citations were an exciting development for the Supreme Court. They note that Justice Souter was the first Justice to cite the Internet in 1996, in a concurrence, and “then in 1998, Justice Ginsburg used the Internet for sources to demonstrate different meanings of the word ‘carry’ in her dissent.” (P. 279). By 2006, all of the Justices then serving had cited at least one website. Internet-based citations continued to blossom, and Liebler and Liebert’s research establishes that between 1996 and 2010, 114 majority opinions of the Supreme Court included links, but that almost one third of them are no longer working. Link rot at the Supreme Court is extant, widespread, and perfidious. Among several arresting examples they offer is the following:

In Scott v. Harris, a video with a dead link was cited extensively by both the majority and minority opinions, serving as the focal point of a serious disagreement in the case. The majority opinion states, “We are happy to allow the videotape to speak for itself.” Additionally, the majority used the citation to the video to disagree with the dissent, stating that “Justice Stevens suggests that our reaction to the videotape is somehow idiosyncratic, and seems to believe we are misrepresenting its contents.” (P. 282).

Even when information cited by the Court remains available on the Supreme Court website, it is often relocated; the old links are not amended to point to the new location, so they are as good as dead if that is what researchers quite reasonably assume them to be. Liebler and Liebert’s findings affirm research which has charted extensive link rot in many other contexts such as law review articles. Even more disturbingly, this research is in accord with “a study of federal appellate opinions [which] found that in 2002, 84.6% of Internet citations in cases from 1997 were inaccessible; moreover, 34% of citations in cases from 2001 were already inaccessible by 2002.” (P. 290-91).

Liebler and Liebert’s stunning revelations are a simple matter to confirm in the context of any subject area. For example, one of the most important copyright cases the Supreme Court has ever decided was Sony Corp. v. Universal Studios in 1984. The long and not particularly well written majority opinion set the balance between content owners like Universal Studios, and companies like Sony that produced new and innovative technologies (in this case the Betamax videocassette recorder) with respect to secondary copyright infringement. Under Sony, a new technology that was capable of substantial non-infringing uses could not be enjoined from distribution on the grounds that it contributed to copyright infringement. Sony was controversial and its convoluted drafting gave lawyers and judges the opportunity to read it into a multitude of meanings. As a copyright law geek of long standing, this author has seen that majority opinion in Sony parsed, diced, and sliced by lower courts, and ultimately repackaged as a shadow of its former self by a unanimous Supreme Court in 2005 in MGM Studios v. Grokster. But at least link rot was not a worry. The same cannot be said of Grokster, wherein Justice Breyer’s concurrence has links and some of the links have already rotted. In fairness he notes that “all Internet materials … are available in Clerk of Court’s case file,” but it is not at all clear how easy it might be for a researcher to access this now, or especially five years from now. According to Liebler and Liebert, the case files are only available to those with sufficient means to go to Washington, DC, and visit the office of the Clerk of the Supreme Court. (P. 300).

Another thing one learns from Something Rotten in the State of Legal Citation is that the Supreme Court often does its own web-based fact-finding. Liebler and Liebert inform readers that Allison Orr Larsen conducted a study of fifteen years of Supreme Court opinions, and “found that of the over one hundred ‘most important Supreme Court cases’ from 2000 to 2010, 56% include mentions of facts the Justices did not find in the record and instead found independently.” (P. 278). Liebler and Liebert quote her stunning findings as follows:

[I]t was quite common for Justices to demonstrate the prevalence of a practice through statistics they found themselves. And, at a fairly high rate these statistics were supported by citations to websites—I found seventy-two such citations in my non-exhaustive search. Importantly, statistics were independently gathered from websites with widely ranging indicia of reliability.1

While it is sort of amusing to picture the Justices surreptitiously googling themselves when they get bored during oral arguments, it’s a little disconcerting to think of them relying even briefly on misinformation-ridden sites like Yahoo Answers. Yahoo has not cornered the market on dumb, because the Internet does not have corners, but Yahoo Answers is rather infamous for exchanges such as:2

Question: Is it wrong to hate a certain race?

Answer: No, because if you are only used to running a 5k, doing a 10k with your jogging group is going to take too long. I hate 10ks myself for this very reason.

Question: Why doesn’t the Earth fall down?

Answer: Because it can fly.

Question: I plan on starting a business selling dognuts, any advice?

Answer: If you want people to eat them, I would call them doughnuts.

Question: Does vodka kill bees and wasps?

Answer: Yes, over time it will destroy their tiny livers, but it is the disruption to the home life that really takes its toll.

One wonders about the quality of the information that the Justices are finding online, and this practice is even more dangerous if link rot means that citations to the Justices’ independent research cannot be assessed or verified. And if Supreme Court Justices are engaging in the dubious practice of doing their own online research about cases before them, one has to assume lower court judges are doing so as well.

Liebler and Liebert conclude their outstanding article by recommending possible solutions to the link rot problem. “Ideally,” they say, “every court should digitally archive all materials cited within an opinion, regardless of the format.” (P. 299). They observe that:

In 2009, the Judicial Conference of the United States created a report titled Internet Materials in Opinions: Citations and Hyperlinking that recommended two primary solutions to the broken Internet link problem: Clerks should download any cited Internet resources and include them with the opinions. The downloaded Internet resources should be included as attachments on a non-fee basis in each court’s Case Management/Electronic Case Files System, such as PACER. (P. 301).

PACER is not without its drawbacks, but there are other alternatives as well, including using the Internet Archive or other internet archiving organizations, or permanent URLs. The main takeaway from this valuable article is that something needs to be done about link rot, and the problem needs to be addressed quickly and expansively. Liebler and Liebert have done a great service to the entire legal profession by bringing link rot to our attention and mapping the gigantic contours of the problem so compellingly.

  1. See Allison Orr Larsen, Confronting Supreme Court Fact Finding, 98 Va. L. Rev. 1255, 1288 (2012) (including discussion of the Justices’ use of websites to conduct research during oral argument and for opinions).
  2. These are representative, edited versions of Yahoo Answers, screen grabs of which are on file with the author.

What Do People Think About Copyright?

Lee Edwards, Bethany Klein, David Lee, Giles Moss & Fiona Philip, Isn’t it just a way to protect Walt Disney’s rights?: Media user perspectives on copyright, 16 New Media & Soc’y (2013).

When it comes to the issue of copyright in the digital age, it is not uncommon to read claims and counter-claims regarding the public’s perception of copyright enforcement and infringement through file-sharing mechanisms. Public policy in the field is often driven by assumptions that tend to be nothing more than guesswork as to their effectiveness and efficiency. While copyright policy has been the subject of several government-funded reviews in the UK in recent years, these have usually failed to be conducted with the end-user of copyright works in mind, which seems to cement the idea that the subject is too complex for the public. It is therefore a very refreshing development when research is conducted to provide us with a better empirical understanding of what the public really thinks with regard to copyright, going beyond mere conjecture and potential biases.

In Isn’t it just a way to protect Walt Disney’s rights?, the authors have set out to engage in an empirically sound exercise in order to ascertain the validity of various statements that are often part of copyright debates. They have put together a series of focus groups designed to get the opinion of “ordinary media users,” as the authors claim that this is a sector that does not often get their opinion represented in the copyright debates. The study’s methodology consisted in carrying out twelve focus groups based in Yorkshire, England, and each of these ranged from three to ten participants, who were recruited as pre-existing groups of media users, varying in age, background and experience with downloading media. The groups were asked to discuss topics relating to copyright, the creative industries, digital media, downloading and piracy for over one hour, and while the groups were directed, they were given a set of open-ended questions to explore the users’ experience, attitudes, and behaviour with regards to copyright.

The results of the research are fascinating. One of the most striking elements is that users seem to be confused as to what constitutes copyright infringement, a confusion that has been corroborated by other surveys in the UK (an OFCOM study found that 44% of respondents cannot identify with certainty legal or illegal content).

Another intriguing result is that, while panel members agree with some of the justifications for copyright, such as the right of creators to derive monetary gain from their works, there was a large disconnect between the creator and the copyright industry. In other words, when faced with the opportunity to download pirated content, users would display a complex array of justifications that combined rationalization and even cynicism with regards to the copyright industry.

The study unearths the very complex relationship between users and content, one that does not respond easily to the simplistic view of lazy and greedy pirates who never pay a penny for copyright works. One of the participants made this clear when describing the way in which they interact with film releases:

A film comes out in the movies. And if it looks really good, I’ll go watch it. And then, I dunno, a couple of weeks go by, and you can get a relatively good quality [copy] online and download it illegally. And, I’ll do that so I can watch it at home. And then another few months go by and I can buy the DVD for a quid.

Participants seemed to be critical at some level of the “legal” alternatives offered by the industry. While older group members seemed more content with the purchasing choices that they are presented with, younger and more technology-oriented users seemed less impressed with options presented by platforms such as Spotify and iTunes.

Another interesting finding is that users tended to describe downloading and file-sharing as something transitory, for example, to be done while there are no legal alternatives, or to be performed while you do not have enough money to purchase content legally. Similarly, the delay between a TV show being distributed in the US and Europe was identified by participants as an important factor driving piracy levels up. Users also seemed to be more comfortable with sharing content with friends and family than with widespread and indiscriminate file sharing online.

The study concludes that:

Focus group discussions demonstrated users as complex, rational and cynical in their approach to copyright, challenging stereotypes of infringers as knowingly criminal or naively ignorant, rescuing the collapse of the public into outdated notions of pirates, and broadening the one-dimensional portrayals that sometimes lurk in the background of less user-grounded frameworks and arguments. The historically embedded criminalization of users may be difficult to dislodge, but it is vulnerable to analysis that situates everyday behaviour and views within wider social, political and cultural contexts and which allows user voices to challenge and critique dominant justifications while contributing justifications of their own.

This is a very welcome addition to copyright literature, one that gives us a hint about the complex relationship between users and copyright.


What’s So Special About Information Security

Andrea M. Matwyshyn, The Law of the Zebra, 28 Berkeley Tech. L.J. 155 (2013).

A debate continues to brew about the proper interpretation of the Computer Fraud and Abuse Act (CFAA), the federal statute that imposes criminal penalties on individuals who access computer networks without authorization.  For at least a decade, scholars and a growing number of courts have wondered whether the owner of a computer network could define “authorization” using form “terms and conditions” of the sort often presented to consumers who purchase or use digital services.  If that strategy were successful, then someone who clicked “I Agree” on a digital form yet failed to comply with all of its terms might be accused – even convicted – of the federal crime specified by the CFAA.

Andrea Matwyshyn uses that apparently technical problem to revisit a much larger question: when, whether, and how should the law treat computers and computer networks as special in any way when dealing with a host of doctrinal and policy issues: commercial law, intellectual property law, telecommunications law, antitrust law, criminal law, and so on? This was the subject of a famous scholarly debate back at the turn of the 21st century between Lawrence Lessig, who argued that considering a “law of cyberspace” offered commentators access to potentially valuable insights about how people interact with each other,1 and Judge Frank Easterbrook, who accused cyberspace promoters of constructing an unworkable and unhelpful “law of the horse.”2 No one “won” the debate in its original form, but in the late 1990s the question was mostly academic, literally. Too few law and policy judgments turned on the answer to make the debate matter in any but a conceptual or theoretical sense.

Matwyshyn’s “The Law of the Zebra” suggests that the answer does matter in a concrete set of cases, and she has the case reports to show it.  Her answer is that both Lessig and Easterbrook were right:  There is something special about computers and computer networks.  But what’s special about them is that judges should not be seduced into treating them as something new and strange.  Most of the time, the common law deals with them, or should deal with them, just fine.  Courts that fail to remember that fact are dealing in a “law of the zebra,” an unusual creature, rather than a law of the horse (of course), an ordinary and more common animal.

In one sense, then, the article resumes a dialogue about metaphorical treatments of the Internet that captured the imaginations of a host of legal scholars a decade or so ago, including me.  Is cyberspace a thing?  A place?  A frontier?  A horse?  A zebra?

That question has no single answer, and Matwyshyn is smart enough not to propose one.  Instead, she wants to show how the alleged specialness of computer networks leads courts astray.  The CFAA and breaches of relevant contracts form the doctrinal backbone of an inquiry into techno-exceptionalism.

She shows how courts have dealt with contract formation and breach of contract questions in computer access contexts in inconsistent ways, and how that inconsistency has affected application of the CFAA.  She identifies her normative baseline – a series of related principles or propositions that define a common law contract framework – and argues that in contract formation questions, a degree of techno-exceptionalism is warranted; in contract interpretation and enforcement contexts, “regular” contract law will do.  Using four paradigmatic examples of “types” of computer hackers who might breach agreements with network providers – the sorts of people the CFAA was arguably drafted to deal with – she shows how her balanced form of “restrained technology exceptionalism” treats CFAA/contract law intersections.  Ordinary contract remedies are sufficient to deal with the harms that result from most types of unauthorized network access linked to bypassing agreed-to terms and conditions.  She argues that adding criminal liability under the CFAA to those remedies amounts to a sort of “weaponized” breach of contract that is warping basic contract law as applied in computer contexts, is bad policy, and arguably conflicts with Constitutional law prohibiting peonage. The proper way to look at the CFAA/contract interface, she argues, is through the prism of private ordering, a framework that is consistent both with Lessig’s view of cyberspace law (in which computer networks present novel forms of private ordering for fresh normative evaluation) and Easterbrook’s (in which existing doctrinal categories were more than adequate to that normative task).

On the doctrinal question, is she right?  Possibly.  But the doctrinal means are less important here than the policy ends.  In effect, Matwyshyn argues that contract remedies should preempt CFAA liability where the two overlap.  That sort of “reverse federalism” (“reverse” because, of course, we rarely think of state law preempting federal law) is, in a perverse way, quite consistent with a heterogeneous, anti-one-size-fits-all view of the Internet.  Matwyshyn is not making an appeal to an idealized “information wants to be free” fantasy.  Instead, she points out that the real policy at stake in interpretations of the CFAA, and in metaphorical debates about horses and zebras, is information security.  Linking criminal liability under the CFAA to breaches of the standardized, form-based terms and conditions that are essentially ubiquitous on the Internet trivializes the idea of access and undermines incentives for network providers to care properly for information that they truly care about.

  1. Lawrence Lessig, The Law of the Horse: What Cyberlaw Might Teach, 113 Harv. L. Rev. 501 (1999).
  2. Frank H. Easterbrook, Cyberspace and the Law of the Horse, 1996 U. Chi. L.F. 207 (1996).

The Cancer of the Internet

Finn Brunton, Spam: A Shadow History of the Internet (MIT Press, 2013).

Technologies do not come with social or legal instruction manuals. There is nothing inherent in rooftop strobe light bars to suggest that police may use them but not civilians, or in thermal imaging cameras to suggest the reverse. The public must figure out what to do with each technology as it becomes available: embrace, ignore, regulate, ban. If we are lucky, the rules distinguishing acceptable from forbidden uses can come, over time, to seem like natural features of the technology itself. But they are not: the rules have to come from somewhere, and someone had to work them out, somehow.

For an example, consider today’s debates on what to do about drones. Or for another, consider spam, the subject of Finn Brunton’s erudite and entertaining Spam: A Shadow History of the Internet. Brunton pushes his history far back before the 1994 advertisement from a pair of immigration lawyers that is usually thought of as spam’s Ground Zero. He notes, for example, a 1971 antiwar message sent to every user of the Compatible Time-Sharing System and a 1978 announcement of a DEC computer demonstration sent to all West Coast ARPANET users–both of which provoked debate around the acceptable boundaries of network use. Brunton argues that well into the 1990s, spamming was considered a primarily social offense, separate and distinct from commercial self-promotion, and of an entirely lesser order than “net abuse” (P. 39) like crashing computers. Spam was a form of free speech, and like other inappropriate speech was to be met with censure rather than censorship.

But this attitude changed, and changed sharply, as the first wave of commercial spammers arrived en masse. Unlike the earlier “spammers,” who could be telephoned and reasoned with, or shamed into silence, or simply identified and ignored by users’ personal message filters, these new operators both flaunted their identity as outsiders to close-knit online communities and aggressively covered their tracks to keep the messages getting through. In the face of these new actors, Brunton shows that spam was effectively redefined as a legal and technical problem rather than a social one. To many antispam activists, the great danger of CAN-SPAM was that it would legitimize spam. But the combination of a legislative framework with reasonably effective filtering had another effect entirely–it “destroyed email spam as a reputable business model,” (P. 143) and “eliminated the mere profit-seeking carpetbaggers and left the business to the criminals.” (P. 144)

Spam is thoughtful about the ontology of its namesake. We are accustomed to thinking of spam as an email phenomenon. But, as Brunton effectively demonstrates, email spam is only one instance of a much larger pattern. Today there are Facebook spam, LinkedIn spam, blog comment spam, Twitter spam–and many more. Indeed, spam’s contested definitions create any number of difficult boundary cases. Gmail’s inbox tabs shunt “Promotions” into a separate folder, even when the recipients have affirmatively opted into receiving these emails. Or, to take one of Brunton’s examples, Demand Media “commissions content from human writers (who are willing to meet very low standards for very little money) on the basis of an algorithm that determines ad revenue over the lifetime of any given article.” (P. 162)

Brunton’s own definition of spam, offered at the end of the picaresque tour, is “the use of information technology infrastructure to exploit existing aggregations of human attention.” (P. 199) Both halves are exactly on point. Spam is medium- and technology-agnostic, but it is inherently a technological phenomenon: without the amplifying power of commodity copying, spam’s characteristic bulk is impossible. And spam is essentially a problem of attention hijacking: the systematic conscription of large and diffuse audiences by abusive speakers.

Much of Brunton’s story of spam is told through the eyes of its enemies, from the vigilantes who tried to burn out commercial spammers’ fax machines to the modern programmers who build increasingly complex filters to identify and delete spam. Significantly, this is history through the eyes of its losers: the story of the tide as related by King Canute. Brunton conveys effectively the sheer frustration felt by anti-spam activists. The network they loved was being abused by outsiders who pointedly rejected their values, but they found themselves unable to stop the abuse. One countermeasure after another fell before the onslaught: killfiles, cancelbots, keyword filters, blackhole lists, and so many others.

Roughly the second half of the book is devoted to the remarkable technical evolution of computer-generated spam. Brunton traces the rise of keyword stuffing, hidden text, Oulipo-esque email generators, spam blogs, content farms, Mechanical Turk-fueled social spam, CAPTCHA crackers, Craigslist bots, malware as a source of spam, and online mercenaries renting out botnets to the highest bidder. This escalation–from a pair of immigration lawyers in over their heads to a “criminal infrastructure” industry (P. 195) in less than two decades–is nothing short of alarming.

Spam is also one of the most nuanced books to unpack what makes the postmodern post-Web 2.0 Internet tick. Borrowing Matt Jones’s concept of “robot-readable” media–“objects meant primarily for the attention of other objects” (Pp. 110-11)–Brunton gives an insightful metaphor of the uneasy coexistence of human and software readers online:

Consider a flower–say, a common marsh marigold, Caltha palustris. A human sees a delightful bloom, a solid and shiny yellow … A bee, meanwhile, sees something very different: the yellow is merely the edging around a deep splash of violet invisible to human eyes–a color out on the ultraviolet end of the spectrum known as “bee violet.” It’s a target meant for the creature that can fly into the flower and gather pollen. The marsh marigold exists in two worlds at once. (P. 110)

The visible language of QR codes and the invisible language of HTML tags are not meant for human consumption. They are there for our computers, not for us. But when we rely on those computers to find interesting things and show us the results, we leave ourselves open to a new kind of vulnerability:

If their points of weakness can be found, it is quite possible to trick our robots, like distracting a bloodhound with a scrap of meat or a squirt of anise–giving it the kind of thing it really wants to find, or the kind of thing that ruins its process of searching. The robot can be tricked, and the human reached: this is the essence of search engine spamming. (P. 113)

Brunton describes the current state of affairs, in which spammers and spam filters are locked in an arms race to master human linguistic patterns, as a parody of the Turing Test, “in which one set of algorithms is constantly trying to convince the other of their acceptable degree of salience–of being of interest and value to the humans.” (P. 150) And in the book’s conclusion, he circles back to spam’s central irony:

Indeed, from a certain perverse perspective … spam can be presented as the Internet’s infrastructure used maximally and most efficiently, for a certain value of “use.” … Spammers will fill every available channel to capacity, use every exploitable resource: all the squandered central processing unit cycles as a computer sits on a desk while its owner is at lunch, or toiling over some Word document, can now be put to use sending polymorphic spam messages–hundreds a minute, and each one unique. So many neglected blogs and wikis and other social spaces: automatic bot-posted spam comments, one after another, will fill the limits of their server space, like barnacles and zebra mussels growing on an abandoned ship until their weight sinks it. (P. 200)

Spam, in other words, is the cancer of the Internet. It is not an alien organism bent only on invasion and destruction. Rather, it takes ordinary healthy communications and extrapolates them until they become grotesque, obscene, deadly parodies of themselves. Spam is constantly mutating, and it cannot be extirpated, not without killing the Internet, because the mechanisms the two rely on to live are one and the same. The email is coming from inside the house.