Researchers create AI that can ‘jailbreak’ other chatbots

January 3, 2024

206

Researchers at the Nanyang Technology University (NTU) in Singapore have created an artificial intelligence (AI) chatbot that can circumvent protections on chatbots such as ChatGPT and Google Bard, coaxing them to generate forbidden content, reports Tom’s Hardware.

Because generative AI such as the large language models (LLMs) behind popular chatbots are trained on such vast quantities of data, they will inevitably contain dangerous information that should not be easily accessible – how to make explosives or drugs for example. So they have protections in place to prevent users from accessing this information.

However, the NTU researchers have developed a technique called ‘Masterkey’, allowing them to bypass the guardrails and access data not intended for public access. The team started by reverse-engineering the protections target chatbots had in place. They did this using methods that get around keyword filtering, such as adding extra spaces between letters; and by doing things like asking the chatbots to take on the persona of a hacker or a research assistant – this allowed it to share information it might otherwise not have done, generating prompt suggestions to help jailbreak other chatbots.

After gathering this data, the team of researchers, led by Professor Liu Yang, used it to teach their own LLM the methods to jailbreak the targeted chatbots. Because LLMs are so capable of adapting to new information and expanding their knowledge, the Masterkey AI can work to get around any new protections that are implemented, using the techniques it has been taught.

Yang’s team claims that Masterkey is three times more effective in penetrating the defenses of a chatbot than a human user with the same intent using prompts generated by an LLM. It is also around 25 times faster.

Why create an AI that jailbreaks AI?

Speaking to Scientific American, study co-author Soroush Pour said “We want, as a society, to be aware of the risks of these models. We wanted to show that it was possible and demonstrate to the world the challenges we face with this current generation of LLMs.” Pour is the founder of the AI safety company Harmony Intelligence.

The intent behind this research is to equip LLM developers with information about their weaknesses so they can better work towards robust prevention in the future.

Featured image credit: AI-generated image from DALL-E

Ali Rees

Ali Rees is a freelance journalist and mature student based in Scotland.

Source link

Researchers create AI that can ‘jailbreak’ other chatbots

Why create an AI that jailbreaks AI?

Ali Rees

LEAVE A REPLY Cancel reply

MUST READ

Google announces Gemma 3 270M, a compact model designed for task-specific...

Master the Kindle Paperwhite with these 18 handy tips and tricks

What to Expect When Visiting Rome in 2025: A Jubilee Year...

Tesla bumps UK prices by at least £3K – and you’ll...

EVEN MORE NEWS

Microsoft releases urgent Office patch. Russian-state hackers pounce.

Amazon MGM Studio plans to develop new AI tools via its...

Trump Goes Into Hiding As His Approval Rating Crashes

POPULAR CATEGORY