Generative AI training in the crosshairs as ICO set to examine legality of personal data use
Generative AI training methods have sparked controversy in recent months due to concerns over data privacy and an influx of lawsuits against major industry players. The Information Commissioner’s Office (ICO) is now stepping in to scrutinize the legality of these training methods and their use of personal data.
The development of large language models (LLMs) like ChatGPT, which rely heavily on massive amounts of data collected through web scraping, has garnered significant attention. However, this practice has raised serious questions about data privacy and potential copyright infringement by developers.
The ICO recognizes the need for clarity in the AI space and plans to engage with developers to shed light on various aspects of data protection laws as they pertain to generative AI. Key areas of concern include determining the appropriate lawful basis for training generative AI models, understanding the application of the purpose limitation principle in generative AI development and deployment, ensuring compliance with the accuracy principle, and meeting expectations in terms of data subject rights.
Over the next few months, the ICO will release guidance outlining how the UK GDPR and the Data Protection Act 2018 apply to generative AI training methods. The ICO aims to offer certainty to the industry regarding its obligations while safeguarding individuals’ information rights and freedoms.
Stephen Almond, the ICO’s executive director for regulatory risk, emphasized the transformative potential of generative AI for society. He stated: “This call for views will help the ICO provide industry with certainty regarding its obligations and safeguard people’s information rights and freedoms.”
Under the UK GDPR, data processing must pursue a legitimate interest, be necessary for that purpose, and not be overridden by the interests and rights of the individuals concerned. The ICO’s current stance is that legitimate interests can serve as a valid lawful basis for training generative AI models on web-scraped data, provided developers can pass this three-part test.
Developers can frame their interest either as a business interest in developing the model for commercial gain or as a wider societal interest, as long as they can clearly specify the model’s purpose and use. Necessity poses a challenge, as most generative AI training currently relies on data obtained through large-scale scraping. The ICO emphasizes the need for a balancing test, which becomes more complex depending on whether the model is deployed by the original developer, made available to third parties via an API, or provided directly to third parties.
The ICO plans to engage with various stakeholders throughout the technology industry, including generative AI developers and users, legal advisors, consultants, civil society groups, and public bodies with an interest in generative AI.
The first consultation period will conclude on 1 March, with subsequent consultations anticipated during the first half of this year to address additional concerns, such as the accuracy of generative AI outputs.
As this examination unfolds, the ICO aims to balance fostering the responsible development and deployment of generative AI for the benefit of society with ensuring that data protection remains paramount.
By addressing the legal considerations surrounding generative AI training methods, the ICO aims to provide much-needed clarity and guidance to this rapidly evolving field, fostering responsible practices and upholding individuals’ privacy rights.