September 30, 2024

California Passes New Generative Artificial Intelligence Law Requiring Disclosure of Training Data


On September 28, 2024, Governor Gavin Newsom signed into law AB 2013, a generative artificial intelligence (“AI”) law that requires developers to post information on their websites about the data used to train their AI systems. The key points of the new law are below:

  • Who does the law apply to? The law applies to AI developers, a term defined broadly to mean any person, government agency, or entity that either develops an AI system or service or “substantially modifies” it, i.e., creates “a new version, new release, or other update to a generative artificial intelligence system or service that materially changes its functionality or performance, including the results of retraining or fine tuning.”
  • What does the law regulate? The law regulates “generative artificial intelligence,” which is defined as AI “that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data.” The law also adopts a common definition for AI that we have seen under other laws, such as the EU AI Act, Colorado’s AI law, and the recently passed California AI Transparency Act. AI is “an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.”
  • When does the law go into effect? The law applies to generative AI released on or after January 1, 2022, and developers must comply with its provisions by January 1, 2026.
  • What do developers need to do for compliance? If a developer makes a generative AI system publicly available to Californians, it must post on its website documentation regarding the data used to train the system or service. The elements that the developer must include on the website are:
      • The sources or owners of the datasets;
      • A description of how the datasets further the intended purpose of the AI system or service;
      • The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
      • A description of the types of data points within the datasets (e.g., types of labels used or general characteristics);
      • Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
      • Whether the developer purchased or licensed the datasets;
      • Whether the datasets include “personal information” or “aggregate consumer information” as those terms are defined under the California Consumer Privacy Act;
      • Whether the developer cleaned, processed, or modified the datasets, and the intended purpose of those efforts in relation to the AI system or service;
      • The time period during which the data in the datasets was collected, including a notice if the data collection is ongoing;
      • The dates the datasets were first used during the development of the AI system or service; and
      • Whether the generative AI system or service used or continuously uses synthetic data generation in its development. The developer may include in its answer a description of the synthetic data’s functional need or desired purpose based on the intended purpose of the AI system or service.
  • Are there any exemptions? Yes. The law does not apply to generative AI systems or services (A) whose sole purpose is to help ensure security and integrity, such as AI intended to detect security incidents; resist malicious, deceptive, fraudulent, or illegal actions; and ensure the physical safety of natural persons; (B) whose sole purpose is to operate aircraft in the national airspace; and (C) developed for national security, military, or defense purposes and that are made available only to a federal entity.
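The statute's disclosure elements can be read as a documentation schema. As a minimal illustrative sketch (the field names, structure, and sample values below are our own, not the statute's), a developer might model the required disclosure as a structured record that can be exported to its website:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TrainingDataDisclosure:
    """Hypothetical record mirroring AB 2013's disclosure elements.
    Field names are illustrative, not statutory language."""
    dataset_sources_or_owners: list[str]
    purpose_description: str
    data_point_count: str              # general ranges permitted; estimates for dynamic datasets
    data_point_types: str              # e.g., types of labels or general characteristics
    includes_ip_protected_data: bool   # copyright, trademark, or patent
    entirely_public_domain: bool
    purchased_or_licensed: bool
    includes_ccpa_personal_info: bool
    includes_aggregate_consumer_info: bool
    cleaning_or_processing: Optional[str]  # description and intended purpose, if any
    collection_period: str                 # include a notice if collection is ongoing
    first_used_date: str
    uses_synthetic_data: bool
    synthetic_data_purpose: Optional[str] = None

# Example record with made-up values
disclosure = TrainingDataDisclosure(
    dataset_sources_or_owners=["Example Data Co."],
    purpose_description="General-purpose text generation",
    data_point_count="1-10 billion tokens (estimated; dataset is dynamic)",
    data_point_types="Web text with topic labels",
    includes_ip_protected_data=True,
    entirely_public_domain=False,
    purchased_or_licensed=True,
    includes_ccpa_personal_info=False,
    includes_aggregate_consumer_info=False,
    cleaning_or_processing="Deduplication and filtering prior to training",
    collection_period="2020-2023 (collection ongoing)",
    first_used_date="2023-06",
    uses_synthetic_data=False,
)

# asdict(disclosure) yields a plain dict suitable for publishing, e.g., as JSON
```

Keeping a record like this per system or service, and per substantial modification, would let a developer regenerate its website documentation mechanically as datasets change.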

Takeaways

California’s new AI law underscores the importance of AI developers maintaining a data provenance record that traces the lineage of data used to train AI systems and taking steps to be transparent about how they develop AI, including through trust centers on websites. Developers should consider adopting technology that automates this process in order to operate at scale. Moreover, companies that integrate their AI offerings with a foundation model should consider the impact of this new law because it could apply to developers that fine-tune or retrain AI systems or services.
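The data provenance record described above can be implemented as an append-only event log. The sketch below is one hypothetical structure (nothing in AB 2013 prescribes this format): each event records what happened to a dataset and when, so the lineage behind any model release can be reconstructed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ProvenanceEvent:
    """One step in a dataset's lineage (illustrative, not prescribed by AB 2013)."""
    dataset: str
    action: str    # e.g., "collected", "licensed", "deduplicated", "used-in-training"
    actor: str
    when: date
    detail: str = ""

# An append-only log of events lets a developer trace where training data
# came from and how it was transformed before each model release.
log: list[ProvenanceEvent] = [
    ProvenanceEvent("web-corpus-v1", "collected", "crawler", date(2023, 1, 15)),
    ProvenanceEvent("web-corpus-v1", "deduplicated", "data-eng", date(2023, 2, 1)),
    ProvenanceEvent("web-corpus-v1", "used-in-training", "model-team", date(2023, 6, 10),
                    "fine-tuning run for release v2"),
]

# Reconstruct the lineage of a single dataset in chronological order
lineage = [e.action for e in sorted(log, key=lambda e: e.when)
           if e.dataset == "web-corpus-v1"]
```

Because the events are immutable and time-stamped, the same log can also answer the statute's time-period and first-use questions for each dataset.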
