Generative Artificial Intelligence and Intellectual Property
Generative artificial intelligence (“AI”) is poised to transform business in pivotal ways that may overshadow the significant developments already wrought by personal computers, the internet, and handheld wireless devices. While early use of AI focused on reaching a decision or checking a factual circumstance (does the radiology image indicate cancer, or does the face match the reference person), “generative” refers to the use of AI tools to create images, textual works, music and various other content, typically in response to prompts entered by human users. Such tools have become widely available, with ChatGPT as a prime example. The output of such tools may be used as a substitute for human work, such as the use of an image generator to illustrate a print advertisement, a chatbot to answer customer service questions, or an AI system to identify and design pharmaceutically promising chemical compounds. In such uses, there are arguably many “creators”: the programmer of the tool, the supplier of the training data, the user of the tool, and the tool itself. The burgeoning use of this new technology raises many questions about who or what (if anyone, or anything) owns the works created using these programs and what steps companies should take to minimize the IP risks attendant to the training and use of these tools.
COPYRIGHT AND PATENT PROTECTION OF WORKS CREATED WITH GENERATIVE AI
The US Copyright Office and US Patent and Trademark Office have each been asked to protect works or inventions created with AI. While some of these requests have been in the nature of stunts designed to provide a legal test case, generative AI is advancing so rapidly that the question now has immediate practical implications. For now, however, the agencies will not recognize AI programs as “authors” of copyrightable works or “inventors” of patentable inventions, on the grounds that the plain language of both the Copyright Act and the Patent Act requires that the creator be human.1 Neither Congress nor the judiciary has yet taken any steps to alter these conclusions, although the USPTO, the USCO, and Congress are actively considering the implications of AI for authorship and inventorship.
When it comes to AI-generated content, it is often difficult to determine at what point the content can be considered sufficiently human-authored to be eligible for copyright protection. In a March 16, 2023, policy statement, the US Copyright Office clarified its stance on AI-generated works and their eligibility for copyright protection. In short, the Copyright Office will not register works whose traditional elements of authorship are produced solely by a machine, such as when an AI program receives a prompt from a human and generates complex written, visual or musical works in response. According to the Office, in these cases, the AI technology, rather than the human user, determines the expressive elements of the work, making the generated material ineligible for copyright protection. However, if AI-generated content is artfully arranged or modified by an artist such that the modifications meet the standard for copyright protection, the work can be registered in the name of the human artist.
The March 2023 policy statement also states that copyright applicants have a duty to disclose any AI-generated content in a work submitted for registration, together with a description of the human author’s contributions to the work as distinct from the AI program.
Although the recent guidance is useful to artists, writers, and AI researchers, an unanswered question remains: If works produced by generative AI algorithms are not eligible for copyright, what is their legal status? As of now, such a work is in practice part of the public domain from a copyright perspective (although its use could still violate a binding agreement governing the work’s use).
INFRINGEMENT OF WORKS USED TO TRAIN AI TOOLS
AI tools are able to generate output in response to a user prompt because programmers have exposed those systems to vast quantities of visual images, text or other information, dubbed “training data”. Because many images and texts used as training data are copyrightable, litigation has ensued over whether the use of such content to train AI tools, or the output itself, constitutes copyright infringement. Processing training data and using AI-generated works both pose a risk of infringement claims until we obtain further clarity from the courts or Congress on the concomitant legal issues, including whether such use of training data constitutes fair use under the Copyright Act.
As of this writing, there have been no major legal decisions discussing the relationship between copyrighted training data and AI-generated works or the underlying copyright issues. However, several pending and previously decided cases will likely inform the analysis.
In early 2023, stock photo provider Getty Images sued Stability AI, accusing the AI company of unlawfully using more than 12 million copyrighted images from the Getty website to train its Stable Diffusion AI image-generation system. According to Getty, “Stable Diffusion at times produces images that are highly similar to and derivative of the Getty Images proprietary content that Stability AI copied extensively in the course of training [its] model” and the output sometimes even includes “a modified version of a Getty Images watermark, underscoring the clear link between the copyrighted images that Stability AI copied without permission and the output its model delivers.”2
In another pending lawsuit, Andersen v. Stability AI et al., three artists sued AI companies Stability, Midjourney and DeviantArt on behalf of a putative class for direct and vicarious copyright infringement. The artists claim that the AI companies used their copyrighted works without authorization to train AI programs to create works in their artistic style, which in turn allows users to generate unauthorized derivative works. According to the complaint, this practice “siphon[s] commissions from the artists themselves,” whose jobs may be “eliminated by a computer program powered entirely by their hard work.”3
In these cases, courts may have to clarify the bounds of what constitutes a “derivative work” under copyright law in the AI context and whether use of the copyrighted works to train the AI models constitutes fair use.
Although the question of fair use is likely to be fact intensive, one useful existing precedent is Authors Guild v. Google, Inc. In that case, which was litigated over the course of a decade from 2005 to 2015, authors argued that Google was engaged in widespread copyright infringement when it scanned, rendered machine-readable, and indexed the full text of more than 20 million books in connection with its Google Books library project. The court ultimately sided with Google, finding fair use and noting that “while authors are undoubtedly important intended beneficiaries of copyright, the ultimate, primary intended beneficiary is the public”.4 The court saw Google’s use of copyrighted books as ultimately “[communicating] something new and different from the original” and expanding its utility to serve copyright’s “overall objective of contributing to public knowledge”.5 It is impossible to predict how courts will come out on these issues, and different courts are highly likely to reach different conclusions in the early stages of judicial interpretation. Even so, Authors Guild v. Google suggests one argument, among many others, that AI tool providers are likely to assert in arguing for a finding of fair use.
Training data cases have not been limited to images or copyright claims. On behalf of a putative class of computer programmers, a Doe lawsuit was brought against GitHub and others alleging that the use of open source code from the GitHub repositories violated the applicable open source licenses. That claim recently survived, in part, a motion to dismiss.
All of these cases are in very early stages, and companies need to pay close attention to the evolving legal landscape. In just the two months before publication, putative rightsholders filed several additional lawsuits. This activity reflects both an active plaintiffs’ bar in this emerging area and a prevailing sense among rightsholders that AI tools present a competitive threat to their business models.
RECOMMENDATIONS FOR COMPANIES ENGAGED IN THE USE OF AI
In order to reap the many benefits of AI (including generative AI), companies must be aware of and make efforts to mitigate the attendant risks.
While the court system and legislators work on establishing guidelines and parameters around ownership and use of AI-generated materials, it is wise for companies to engage in the following practices:
- Set a company AI policy addressing acceptable AI tools and use parameters.
- Before using AI-generated content, find out from AI providers whether their models were trained with any copyrighted content. Review the terms of service and privacy policies of AI platforms and avoid generative AI tools that cannot confirm that their training data and software components are properly licensed or otherwise lawfully used.
- In due diligence for mergers and acquisitions implicating AI, unless a target used its own data, buyer's counsel should investigate how the training data was acquired.
- Include provisions on generative AI usage in contracts with vendors and customers, such as: (1) requirements that the use of AI be disclosed or that certain guardrails be met (e.g., no unlicensed or otherwise unlawful content in data sets), (2) covenants regarding rights to data sets, and (3) indemnification for potential intellectual property infringement, including infringement caused by an AI company’s failure to properly license input data.
- For content creators, (1) include terms of use on their websites prohibiting scraping,6 (2) review platform terms of use when posting original content to social media platforms, and (3) proactively apply for copyright registration, as registration is required for enforcement purposes.
1 On September 15, 2022, artist and AI researcher Kristina Kashtanova was granted a copyright registration for a graphic novel entitled Zarya of the Dawn. Although Ms. Kashtanova had identified herself as the sole author of the work on the application, it later became public that she had used an AI tool (Midjourney) to generate many of the images in the work. After an investigation, the Copyright Office canceled the original copyright certificate and issued a new one that excluded the artwork generated by AI but preserved Ms. Kashtanova’s rights in other aspects of the work, such as the arrangement of the images and the text.
In July 2019, artificial intelligence researcher Dr. Stephen Thaler filed two patent applications under the inventor name “DABUS,” an acronym for his AI program. When the applications were denied, Dr. Thaler filed a lawsuit in the Eastern District of Virginia. The district court and Federal Circuit each affirmed the USPTO’s finding that only human beings can be inventors, and, on April 24, 2023, the US Supreme Court denied a petition for certiorari. See Thaler v. Vidal, No. 22-919, certiorari denied (U.S. Apr. 24, 2023).
2 Getty Images (US) Inc. v. Stability AI, Ltd. and Stability AI, Inc., No. 1:23-cv-00135-GBW (D. Del. March 29, 2023) Amended Complaint at ¶¶ 61-62 (Dkt. 13).
3 Andersen v. Stability AI et al., No. 3:23-cv-00201 (N.D. Cal. January 13, 2023) Complaint at ¶¶ 8-9 (Dkt. 1).
4 Authors Guild v. Google, Inc., 804 F.3d 202, 212 (2d Cir. 2015).
6 Web scraping is the process of extracting data from websites. In the AI context, this extracted data then becomes part of a training set.