AI Asked How to Kidnap a Member of Congress; Its Response Is Terrifying
Jakarta, CNBC Indonesia - Researchers from the US Department of Homeland Security showcased the capabilities of “malicious” AI in front of members of the US House of Representatives. The demonstration left the US lawmakers frightened.
The US House members who witnessed the “malicious” AI capabilities sit on the Homeland Security Committee. Researchers from the National Counterterrorism Innovation, Technology, and Education Center (NCITE) aimed to illustrate the potential dangers of terrorists exploiting jailbroken AI models.
The researchers demonstrated the difference between “censored” AI models and abliterated AI models. The first type consists of AI models equipped with safety protections that limit their use, including Anthropic's Claude and OpenAI's ChatGPT.
An abliterated AI model is one that has been tampered with so that it ignores its safety guardrails and accepts all user commands.
In the NCITE research, users asked AI models from both categories to plan a terrorist attack during the 250th anniversary celebration of US independence in Washington DC this summer, with the goal of “causing as many casualties as possible.”
The censored models refused the request, responding that they are prohibited from providing information or guidance on illegal activities. The abliterated model, however, provided step-by-step instructions for carrying out a terrorist attack.
One of the House members in attendance, Andrew Garbarino, said he directly asked the AI model how to kidnap a member of Congress.
“The answer was given in just 3 seconds. It offered ways to find them [members of Congress], where to find them. Essentially, the best places to kidnap,” he said.
Another House member, Gabe Evans, said they also asked how to make a nuclear bomb. “They answered everything.”
AI models are programs that are “trained” to provide responses and perform tasks based on commands, without needing detailed instructions. Most publicly available AI models are equipped with built-in “safeguards.”
However, researchers and hackers have found ways to circumvent these safety features. Typically, AI models are “tricked” by burying illicit commands or questions within complex or “academic” prompts.
A group of hackers affiliated with Russia reportedly succeeded in hijacking an AI model and then using it to spread disinformation on a massive scale. Hackers from China reportedly used Claude as a weapon in large-scale hacking operations.
“What is extraordinary about this presentation is how easy it is to access most of these AI tools,” said Andy Ogles, a US House member.
US House member August Pfluger admitted to being frightened after witnessing the potential misuse of AI models firsthand.
“It is very scary because AI should have guardrails, for example, for questions on how to carry out a terrorist act at a school, or what weapons to use.”