Tricking AI language models into aiding in scams and cyberattacks is now easier than ever, according to a recent study conducted by IBM. The tech giant’s researchers have discovered that even individuals with limited coding knowledge can manipulate large language models (LLMs) like ChatGPT to generate malicious code and offer poor security advice.
To explore the potential security risks posed by these advancements, IBM attempted to hypnotize popular LLMs and gauge their ability to deliver misleading and potentially harmful responses and recommendations, including security actions. Chenta Lee, Chief Architect of Threat Intelligence at IBM, explained, We were able to successfully hypnotize five LLMs, with varying levels of persuasiveness, leading us to examine the likelihood of hypnosis being utilized for malicious attacks.
The study found that the English language has essentially become a programming language for malware. With LLMs, hackers no longer need to rely on specialized coding languages like Go, JavaScript, or Python; they simply need to understand how to effectively command and prompt an LLM using English.
By using hypnosis techniques, security experts were able to manipulate LLMs into leaking confidential financial information, generating vulnerable and malicious code, as well as providing weak security recommendations. In one instance, AI chatbots were convinced they were playing a game and purposely shared incorrect answers to prove their supposed ethical and fair behavior.
For example, a user asked an AI chatbot whether it was normal to receive an email from the IRS requesting a money transfer for a tax refund. The LLM incorrectly responded with a Yes, illustrating the vulnerability of these models.
The study further highlighted that OpenAI’s GPT-3.5 and GPT-4 models were more susceptible to being deceived into offering inaccurate information or engaging in endless games compared to Google’s Bard. GPT-4 stood out as the only model capable of giving incorrect cyber incident response advice, such as advising victims to pay a ransom. In contrast, Google’s Bard, GPT-3.5, and GPT-4 were easily manipulated into generating malicious code when prompted by users.
These findings underscore the need for improved security measures and oversight to prevent the exploitation of AI language models. English has become the new medium through which malware and cyberattacks can be orchestrated, emphasizing the importance of protective measures throughout the development and deployment of large language models. As AI technology continues to evolve, it is crucial to address these vulnerabilities to safeguard users and maintain a secure digital environment.