Study Revealing Declining Performance of OpenAI’s ChatGPT Raises Concerns About AI Reliability
In recent years, OpenAI’s ChatGPT has gained popularity and assisted millions in improving their efficiency with computer tasks. From students using it to draft essays to programmers utilizing it for coding and software development, many have embraced the benefits of Artificial Intelligence (AI). However, amid the ongoing debate about AI’s impact on humanity, some users have noticed a decline in ChatGPT’s performance.
Concerns about intentionally decreasing performance
Several Twitter users have expressed frustration with the model’s performance, speculating that the decline may be a deliberate move by ChatGPT’s creator, OpenAI. They believe the drop in quality is an attempt to push users toward ChatGPT Plus, the company’s paid subscription tier.
A study by researchers from Stanford University and UC Berkeley lends support to these claims. The study found that both versions of ChatGPT, GPT-3.5 and GPT-4, exhibited significant changes in behavior over time, with performance deteriorating on several tasks.
Deteriorating performance
The study compared the models’ performance between March and June 2023 across four tasks: solving math problems, answering sensitive questions, code generation, and visual reasoning.
The results indicated that GPT-4 performed poorly, particularly in solving math problems: its accuracy dropped from 97.6% in March to a mere 2.4% in June. GPT-3.5, on the other hand, improved on the same task, with accuracy rising from 7.4% in March to 86.8% in June.
In March, both models gave longer, more substantive answers when asked sensitive questions. By June, however, they often simply replied, “Sorry, but I can’t assist with that.” A similar decline was observed in code generation, while slight improvements were noted in visual reasoning.
Potential causes of model collapse
The study authors did not speculate on the reasons behind the decline in ChatGPT’s performance. However, other researchers have predicted a phenomenon called model collapse as newer GPT models are developed.
Research suggests that even if the models are trained on unbiased human-generated data, they may still learn biases and mistakes, which get amplified over time. This can lead to a decrease in their intelligence and capabilities.
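The feedback loop described above can be illustrated with a toy simulation. The sketch below is purely illustrative and is not drawn from the study: it repeatedly fits a Gaussian to samples drawn from the previous generation’s fit, standing in for a model trained on its predecessor’s outputs. Because each fit is made from a finite sample, estimation error compounds across generations, and the learned distribution drifts and narrows over time.

```python
import random
import statistics

def collapse_demo(generations=1000, n_samples=100, seed=0):
    """Toy model-collapse simulation: each generation is 'trained'
    (a Gaussian fit) on data generated by the previous generation.
    Small estimation errors accumulate instead of averaging out."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "ground truth" distribution
    for _ in range(generations):
        # Data produced by the current generation's model.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # The next generation is fit only to that synthetic data.
        mu = statistics.mean(samples)
        sigma = statistics.pstdev(samples)
    return mu, sigma

mu, sigma = collapse_demo()
print(mu, sigma)  # sigma ends up well below the starting value of 1.0
```

After many generations the fitted spread shrinks far below the original distribution’s, a simple analogue of how diversity and accuracy can erode when models learn from machine-generated rather than human-generated data.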
Potential solutions
To prevent further deterioration, experts suggest two potential solutions. The first is to ensure that human-generated data is used for training AI models. Platforms like Amazon Mechanical Turk already incentivize workers to produce original content, although even these workers sometimes rely on machine-learning tools for content creation.
The second solution involves changing the learning procedures for newer language models. OpenAI has shown a focus on using prior data and making minor adjustments to existing models. However, it remains unclear whether these changes effectively address the issue.
OpenAI’s response
In response to claims of ChatGPT’s declining performance, OpenAI’s VP of Product & Partnerships, Peter Welinder, stated that GPT-4 is not getting dumber but is, in fact, smarter than its predecessor. He argues that the more heavily users rely on the system, the more issues they notice, creating a perception of worsening performance.
Critical perspectives on AI’s reliability
While AI offers numerous benefits, this study and user feedback raise concerns about the reliability of AI systems like ChatGPT. Scholars warn that model collapse and diminishing performance are inherent risks of how these models are trained.
Moving forward, it is crucial for industry leaders such as OpenAI to address these concerns effectively. Ensuring the quality and reliability of AI systems will be essential for maintaining trust and fostering the continued growth of AI technology.