Major Newspapers in Talks with OpenAI to Pay for Access to Digital News Stories
OpenAI, the creator of ChatGPT, is in discussions with several major newspapers about paying for access to their digital news stories. The talks reflect growing demands from publishers and data owners for fair compensation as the market for generative artificial intelligence expands rapidly; it is projected to reach $1.3 trillion by 2032.
In response to tech companies freely using news stories to train their AI models, more than 500 news organizations, including the New York Times, Reuters, and The Washington Post, have since August installed blockers to prevent their content from being used as training data for ChatGPT. The current negotiations center on the possibility of paying publishers when links to individual news stories appear in ChatGPT’s responses. Such an arrangement would benefit the newspapers through direct payment and, potentially, increased traffic to their websites.
These conversations also highlight the increasing willingness of other data sources to demand compensation. Reddit, for instance, has held talks with leading generative AI companies about being paid for its data. If an agreement cannot be reached, Reddit is even considering blocking search crawlers from Google and Bing to limit access to its site. Despite the potential consequences, the company believes the trade-off would be worthwhile, as it views fair payment for data as an essential matter.
The urgency and uncertainty surrounding profit distribution from online information are growing as generative AI continues to reshape how users interact with the internet. The launch of GPT-4 by OpenAI in March resulted in a 15% decline in traffic to the coding community Stack Overflow, as programmers turned to AI for answers to their coding questions. This decline in traffic has forced Stack Overflow to make significant layoffs.
Leading AI firms are not only facing demands for payment but are also grappling with numerous copyright lawsuits filed by authors, artists, and software coders seeking damages. Additionally, trade groups are pushing for the right to bargain collectively with tech companies.
OpenAI's decision to negotiate reflects its desire to reach agreements before courts determine what legal obligations apply to content licensing and payment. OpenAI insists that its prior training data was obtained legally and that its practices have not violated copyright law. Its discussions with newspapers focus on future access to content that would otherwise be inaccessible or whose use would exceed fair use.
The immense cost of developing generative AI technology has contributed to an influx of venture capital, with nearly $16 billion invested in the sector in the first three quarters of 2023, according to PitchBook. So far, data has been the only input to building AI models that was freely accessible. Services like Common Crawl, which archives internet text, have provided tech companies with the information needed to train large AI systems. However, as tech companies depend less on information available solely for research purposes, tensions have arisen regarding what constitutes public domain information.
The negotiations between OpenAI and major newspapers mark a significant development in the ongoing debate about fair compensation for data used in generative AI. The outcome of these discussions, along with the resolution of copyright lawsuits and potential changes in regulation, will shape the future of AI development and the distribution of profits from online information.