Researchers have made a groundbreaking discovery in the field of AI chatbots. A team of computer scientists from Nanyang Technological University (NTU) in Singapore has found a way to bypass the restrictions that prevent chatbots from responding to banned or sensitive topics. By using a unique training method involving multiple AI chatbots, the researchers have successfully unlocked ChatGPT and other similar models.
The method, unofficially referred to as a jailbreak by the NTU team, is formally known as the Masterkey process. It involves training two chatbots, such as ChatGPT, Google Bard, and Microsoft Bing Chat, to learn each other’s models. With this new knowledge, the chatbots are able to divert any commands related to banned topics.
The research team, led by Professor Liu Yang and including NTU Ph.D. students Mr. Deng Gelei and Mr. Liu Yi, developed a proof-of-concept attack method similar to a bad actor hack. They first reverse-engineered one large language model (LLM) to understand its defense mechanisms, which prevented it from answering certain prompts due to violent, immoral, or malicious intent.
Once the defense mechanisms were exposed, they trained a different LLM to create a bypass. By using the reverse-engineered LLM as a reference, the second model was able to express itself more freely. The team named this process the Masterkey because it should continue to work even if LLM chatbots are strengthened with additional security measures or patched in the future.
According to Professor Lui Yang, the Masterkey process demonstrates how easily LLM AI chatbots can learn and adapt. The team claims that their process is three times more effective at bypassing banned topics than traditional prompt methods. This finding challenges the notion that glitches experienced by certain LLMs, like GPT-4, are indicators of them becoming lazier and less advanced.
The emergence of AI chatbots, particularly with the popularity of OpenAI’s ChatGPT, has raised concerns regarding their safety and inclusivity. OpenAI has implemented safety warnings and periodic updates to address unintentional language slipups. However, spinoffs of various chatbots have allowed offensive language up to a certain extent.
Unfortunately, bad actors have already taken advantage of the demand for AI chatbots. Campaigns promoting ChatGPT, Google Bard, and other chatbots on social media platforms often included malware attachments or other cyberattacks. This highlights the fact that AI has become the new frontier of cybercrime.
The NTU research team has shared their proof-of-concept data with the AI chatbot service providers involved in the study, confirming the reality of jailbreaking chatbots. They will also present their findings at the Network and Distributed System Security Symposium in San Diego in February.
In summary, the breakthrough achieved by the NTU team in unlocking AI chatbots has significant implications for the development and application of these technologies. While it raises concerns about the potential misuse of chatbots, it also highlights the need for continuous advancements in safety and security measures. As AI continues to evolve, researchers and developers must work together to ensure the responsible and ethical use of these powerful tools.