Separating Fact from Fiction in the Senate IP Subcommittee’s AI Hearing

by Kevin Madigan and Rachel Kim

On July 12, the Senate Judiciary Committee’s Subcommittee on Intellectual Property held a hearing to discuss some of the issues and possible solutions relating to copyright and generative artificial intelligence (AI). While the hearing touched on a variety of AI-related topics, the Subcommittee members were most interested in the legal, business, and technical implications of AI ingestion of copyrighted works, as well as whether existing copyright law can effectively address the concerns of the creative community.

The Pitfalls of a Potential Opt-Out Approach

Throughout much of the hearing, Subcommittee members focused on the effectiveness and implications of an “opt-out” approach that would ostensibly allow copyright owners to prohibit their works from being ingested into AI systems. Some AI developers, like Stability AI, have implemented an opt-out model in response to objections from the creative communities about the indiscriminate scraping of copyrighted works to build these developers’ commercial AI models without consent, credit, or compensation to rights holders.

Under the Copyright Act, a user must seek permission from the copyright owner to use a work, unless there is a relevant exception in the law that obviates the need to seek permission (fair use is the most common of these exceptions). Nowhere in the law does the onus fall on the copyright owner to take on the burden of determining who is using their copyrighted works and how, and then to proactively opt out of any uses they do not approve of. That type of opt-out approach is tremendously inefficient and inequitable, especially since the AI developer provides no direct notice to copyright owners that their works are being used by the AI system. Importantly, it would also require a change in the law.

Unfortunately, in the AI context, a multitude of copyright owners have been forcibly subjected to a disenfranchisement of their copyrights by AI companies. Karla Ortiz, a visual artist and one of the witnesses in the hearing, noted in her response to Senator Hirono’s question of whether Ortiz would license her works to AI developers: “It’s about being able to have that choice. And artists don’t have that right now.” This was echoed by fellow witness, Jeff Harleston, General Counsel for Universal Music Group, who noted: “Some artists just don’t want their art distributed in certain ways.”

In current conditions, artists and other creators are not given this choice in the AI context. More often than not, these artists and other creators have no idea whether their works are being used without their permission. No one is reaching out to notify them, and even if they were to proactively search out who is using their works, in most cases they would not be able to find out because many AI systems do not disclose that information or make it easily accessible. As Ortiz pointed out in the hearing, there is generally a lack of transparency by most AI developers regarding their training data sets, which makes it very difficult (if not impossible) for copyright owners to determine whether and to what extent their works are ingested. In addition to the legal requirement that users of works must acquire affirmative consent to use copyrighted works, there is a practical element at play here as well. In a scenario where only the AI developers know what works have been fed into a system, it is nonsensical to require copyright owners to somehow determine what works of theirs are being ingested and then force them to take steps to opt out.

Testifying on behalf of Stability AI, Head of Public Policy Ben Brooks spoke about developing AI models with appropriate safeguards and providing transparency through “open” models. While transparency and safety are welcome goals, Brooks’ comments on creator consent and “opting out” were less comforting. When asked by Senator Tillis whether creators are made aware that their copyrighted works are being ingested by Stability and whether they are compensated, Brooks would not give a yes or no answer, but instead said that they are working on incorporating a system which would honor opt-out requests by creators so that their works would not be used to train future Stability AI models. He also said that Stability respects robots.txt protocols that disallow web crawlers from accessing material on a particular website. Later in the hearing, when Senator Hirono pressed him over whether Stability AI asks for creators’ consent or compensates them, Brooks said that they do neither.

The opt-out theory promoted by Brooks and other AI developers is flawed by issues of practicality, and as noted above, a general misconstruction of the rights of copyright owners under existing law. First, because creators’ works have already been used without their consent to train Stable Diffusion and other AI tools, the opportunity to opt out has passed, and a particular work cannot be removed from the system—essentially meaning that once a work has been ingested by the AI, the value from and effects of that cannot be undone or excised from the AI. There is no way to put the AI toothpaste back in the AI tube.

Brooks said that while it is true that the tools that have already ingested copyrighted works can’t undo the ingestion process, the development and training of new iterations of a tool can be respectful of opt-out requests. That’s all well and good, but it ignores the key problem that a creator’s work may already be part of a dataset that is used by other AI system developers that either aren’t made aware of a creator’s desire to opt out or simply don’t respect an opt-out approach. Moreover, as a line of questioning from Senator Padilla made clear later in the hearing, processing massive amounts of inputs, filtering for opt outs, and then training or re-training the system is an “overwhelming and unfeasible” situation.

Finally, as to the robots.txt argument, Brooks and other AI developers conveniently ignore the fact that robots.txt is a blunt tool and that implementing it means that websites, and the creative works on sites that use robots.txt, will not show up in search results, which would severely limit a creator’s ability to commercialize their work. The argument also ignores the fact that online platforms are under no obligation to respect robots.txt.
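To make the robots.txt point concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It assumes a hypothetical site-wide `Disallow: /` rule, a placeholder URL, and two made-up crawler names (`ExampleSearchBot` and `ExampleTrainingBot`); none of these come from the hearing testimony. Under that assumption, the rule turns away a well-behaved search crawler just as it turns away a well-behaved training crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: one blanket rule for every crawler.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

# Placeholder URL standing in for a creator's online portfolio.
URL = "https://example.com/portfolio/painting.jpg"


def allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# Both crawler names are invented for illustration.
search_ok = allowed(BLANKET_RULES, "ExampleSearchBot", URL)
trainer_ok = allowed(BLANKET_RULES, "ExampleTrainingBot", URL)

print("search crawler allowed:  ", search_ok)
print("training crawler allowed:", trainer_ok)
```

The check only has an effect when a crawler chooses to run it. A crawler that never consults the file can request the same URL regardless, which is the voluntary-compliance gap described above.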

But again, more troubling than issues of practicality is AI developers’ disregard for the fact that the Copyright Act affirmatively grants creators and copyright owners the right to exert real control and exercise their rights by affirmatively agreeing to have their works ingested by AI systems. To start in a default position where AI developers will use works without any notice, unless told otherwise, is an affront to the rights of copyright owners and creators and to centuries of copyright law jurisprudence. Many AI developers have not been transparent about which and what kinds of copyrighted works were used in their AI training data sets. But from the little information we have thanks to the work of investigative journalists and other technologists, we know that many AI developers have already used massive amounts of copyrighted works without authorization, which implicates (and likely violates) copyright owners’ rights under Section 106 of the Copyright Act. This “use first, ask for permission later” approach is unfortunately common among some companies, but it could result in drastic infringement liability.

Licensing for AI Ingestion? It’s Totally Possible

The Senators’ questions also turned to the feasibility of licensing works for AI ingestion. As we point out in our position paper, licensing copyrighted works for AI use exists, and has existed in various, adaptable forms in various industries for many years. Senators directed most of these questions to Harleston, asking whether precedent in the music industry and its licensing arrangements with digital streaming platforms could inform how AI licensing of copyrighted works should look and who should be involved. Senator Hirono in particular asked about the feasibility of licensing vast quantities of copyrighted works for AI use.

Harleston pointed out that even within the vast complexities of the music licensing system, the music ecosystem exemplifies the abilities of the creative industries to adapt and be flexible for the needs of their audiences and to the technological times. The caveat is that AI developers need to come to the table with copyright owners. Harleston stated: “we need help to make sure everyone understands that there are rights that are affected and that the activity that is happening now is violative . . . We could license, but there has to be an initiative on the side of the companies to reach out.” He later added to this line of thought, noting that at the end of the day, consent is key and artists should not be robbed of their choice to permit their works to be used for AI ingestion.

Ingestion of Copyrighted Works for AI Training is not “Usually” Fair Use

Another important topic that came up during the hearing was whether fair use is an applicable defense to the unauthorized use of copyrighted materials to train generative AI systems. Addressing this issue, Matthew Sag, a Professor of Law at Emory University School of Law, suggested that ingesting copyrighted works usually qualifies as fair use because it is considered a “non-expressive” use. Further, he claimed that generative AI models do not copy inputted materials but rather “learn” from them, which was also a main point of Brooks’ testimony. Sag’s and Brooks’ comparison of the operations of a generative AI machine to the way the human brain works is part of a larger push amongst AI developers to anthropomorphize AI, which allows them to claim that no copies are actually made of anything when AI systems train on copyrighted works. But, simply put, AI models do not “learn” from copyrighted works or “create” new works the same way that humans do, and those who claim that they do devalue human creativity.

When questioned by Senator Blackburn about fair use, Sag made several errors in his legal analysis. For example, he emphasized the importance of transformativeness—even after the Supreme Court made it clear in the recent landmark Warhol v. Goldsmith decision that whether a use is transformative is just one of many considerations under the first factor and is in no way determinative of fair use. He also downplayed the importance of the substitutional nature of AI output under the fourth factor, even though the Supreme Court in Warhol instructs the exact opposite approach.

Professor Sag’s position on fair use and non-expressive copying is similar to arguments presented by OpenAI and other AI developers that insist that (1) no copying occurs when material is ingested for training, (2) even if copying occurs, AI systems only copy “non-expressive” (and thus non-protectable) elements of the ingested works, (3) AI training is a transformative use, and therefore constitutes fair use, and (4) there is only infringement liability if generative output is substantially similar. However, none of these claims are correct, and Congress should recognize that they are being used to validate wrongful acts of massive infringement that have already taken place.

First, it is simply wrong to say that copying doesn’t occur at some point of the ingestion/input stage (or in preparation of a dataset), regardless of whether a copy of a work is stored. At the very least, where a copy of an ingested work is made in the random-access memory (RAM) of the computer system, courts have made clear that that copy qualifies as a reproduction under section 106(1). And while there is an exception in section 117 of the Copyright Act to excuse the making of RAM copies that are made as part of computer maintenance or repair, that exception is very narrow and does not apply to the unauthorized ingestion of copyrighted works by AI systems.

Second, it is inaccurate and insulting to creators to claim that what AI systems copy is merely unprotectable data. This position is based on the assertion that when datasets are created for AI training purposes, expressive works of authorship are reduced into “data” about the “relationships” between elements of a work that is then processed by an algorithm. However, even if works are eventually being converted into a machine-readable format, that would not excuse the fact that infringing copies of the works, as they exist in their original form, are being made. Moreover, simply because a work is converted into a format that can more easily be ingested by an AI system does not mean that the work suddenly ceases to include copyrightable expression or loses copyright protection. The value of generative AI tools is based on the high-quality, expressive works that they ingest, and to characterize a copyrighted work as unprotectable “data” is to strip it of the critical essence by which it avails itself of copyright protection: its expressive value and human creativity. To support his position that AI ingestion is non-expressive copying, Sag referenced cases where courts addressed reverse engineering, search engines, and plagiarism detection software, and found that they constitute fair use. But while these cases may provide insight into how a court would approach similar activities, the ingestion of copyrighted works for generative AI uses is unlike anything courts have considered in the past, and the distinguishable characteristics and novel technology involved cannot be categorically defined as fair use.

Third, arguments in favor of the transformative nature of generative AI (and how heavily that would impact a fair use analysis) are misguided. No court has found that ingestion of copyrighted works by generative AI systems is a transformative use, let alone a use that categorically qualifies for the fair use exception. In many cases, the output of generative AI systems (whether it is a literary work, image, or piece of music) serves the same purpose as the ingested works, and courts would be unlikely to find such use to be transformative. But even if they did, that does not mean that AI ingestion is a fair use. In the past, a finding of transformative use would impact the other three fair use factors and almost certainly result in a finding of fair use. However, in Andy Warhol Foundation v. Goldsmith, the Supreme Court made unequivocally clear that whether a use is transformative does not control a fair use analysis. Rather, transformative purpose is merely one of many considerations that are weighed under the first fair use factor. Therefore, claims by some AI developers that the transformative nature of generative AI means it automatically qualifies as fair use are clearly not supported by the law, and any fair use analysis involving generative AI must be done on a case-by-case basis and involve consideration of all four fair use factors.

Finally, claiming that infringement is only possible if the output of the AI system bears too much resemblance to an ingested work is wrong because it ignores the “input-stage” infringement that occurs when an unauthorized reproduction is made. Section 106(1) of the Copyright Act vests copyright owners with the right to prevent the reproduction of their copyrighted works. When an unauthorized copy is made of a work protected by copyright, there is a violation of the copyright owner’s right to reproduce the work, absent a valid defense. This “stand-alone” right is distinguishable from any output stage infringement that is subject to a substantial similarity analysis.

Infringing Copyright vs…Well…Actually Innovating

Adobe’s witness, Dana Rao, made an interesting point when answering Chairman Coons’ question of the implications or consequences of Adobe’s approach to licensing its training data to develop its generative AI products. He pointed out that while Adobe’s approach of licensing works, using openly licensed works, or using public domain works for AI training indeed resulted in a limited training data set that didn’t initially produce desirable outcomes, Adobe’s technology team responded by focusing on innovating the software behind its generative AI products to produce desirable outputs.

This is a stark contrast to the statements made just moments before, when Brooks from Stability AI said to Senator Coons that the company believed that ingestion of every and any work that ever existed on the Internet was pivotal to reduce bias and to achieve the best output. While Rao acknowledged the benefits of a vast training data set, his response reflected that Adobe’s more careful and considered approach to respecting the copyright implications of AI ingestion drove the technological innovation and ingenuity that make their generative AI models competitive—putting to rest the tired arguments that licensing copyrighted works is a barrier or impediment to AI development. Adobe was able to account for bias and other concerns using technological innovation while maintaining a considered approach to its training data. It’s time other AI companies follow Adobe’s lead by complaining less and innovating more.

Technological Measures: Important, But Are They Adequate Safeguards Against Ingestion?

In the vein of the opt-out conversations, both Brooks and Rao made the point that technological tools like metadata tagging, such as the “do not train” tag promoted by Adobe’s Content Authenticity Initiative, are vital in helping ensure that copyright owners can effectively prohibit the ingestion of their works by AI systems. Indeed, such measures are an important step in the right direction when it comes to making sure copyright owners’ rights and choices are respected. They promise copyright owners the ability to signal the conditions on the use of their works and their choice not to have their works used to train AI models. However, the question is: what are the practical implications and the complete picture of using these tools?

The answer is that the promises of technological measures are counterbalanced or even negated by the practical implications and consequences of actually deploying such measures. As evidenced in this hearing, technological measures are brought up in response to the question of how copyright owners can meaningfully enforce their rights against AI ingestion, with robots.txt being an oft-cited example. However, the reality is that some of these measures, like robots.txt, are often a Hobson’s choice. In the AI context, this potentially looks like a copyright owner deciding between using measures that severely impact their ability to digitally market and distribute their works for their audiences or having potentially their entire repertoire of copyrighted works ingested by AI.

Moreover, Ortiz made the point in the hearing that she is a visual artist and not a programmer or technology expert. It’s an iteration of an unfortunately common story of burden shifting where technological requirements are pushed on creators and copyright owners in lieu of those who are facilitating infringement actually implementing infringement-fighting measures. Despite not being a tech expert, Ortiz shared that according to technical experts in the field, such mechanisms and filters can easily be bypassed anyway.

Conclusion

The Senate hearing, along with other AI hearings in Congress and initiatives by the White House and various federal agencies, shows that lawmakers and the government have a genuine interest in ensuring accountability and responsibility in the generative AI space. Though it’s clear that AI technologies are here to stay, foundational principles of copyright law must not be cast aside in favor of AI developers whose indiscriminate, unlicensed, and unauthorized scraping of copyrighted works has done incredible damage to creators and copyright owners everywhere. It was clear from the witnesses’ testimony that most creators and copyright owners support the advancement of AI technologies; they simply want it done responsibly, respectfully, ethically, and in a way that upholds the underlying goals and purposes of our copyright system.

