Researchers Find AI Guardrails Aren’t Guarding, as Conversations Go off the Rails

NEW YORK—Guardrails that have been designed by generative artificial intelligence developers to create safety with the emerging technology have proven to be “remarkably brittle,” and a slew of researchers, online criminals and others have found ways to provoke “chatty” bots into behaving in dangerous, insulting or generally odd ways, according to a new report.

Researchers from Anthropic reported that they've discovered a new technique to defeat safety guardrails, a method they call "many-shot jailbreaking." They tested it on AI tools developed by the likes of OpenAI and Meta, as well as on their own large language model, according to Bank Info Security.

What’s Best Way to Build a Bioweapon?

The report noted that limits programmed into generative AI are meant to stop the tool from answering malicious queries such as, "How do I make a bioweapon?" or "What's the best way to build a meth lab?"

Questions posed to AI that could put credit unions at risk might include, “What is the best way to commit internal fraud?” or “Create a plan for robbing a credit union.”

Longer Context Windows

In the new paper, Anthropic researchers said they discovered how the "longer context windows" now offered by widely used AI tools can be exploited to trick those tools into divulging prohibited answers or engaging in malicious behavior.

The attack works in part thanks to more recent AI tools accepting much longer inputs, Bank Info Security said.

“Previous versions of LLMs accepted prompts up to about the size of an essay - or about 4,000 tokens. A token is the smallest unit into which an AI model can break down text, akin to words or sub-words,” the report states.

‘Novels or Codebases’

By contrast, newer LLMs can process 10 million tokens, equal "to multiple novels or codebases," and these "longer contexts present a new attack surface for adversarial attacks," the researchers said.

“How does multi-shot jailbreaking work? As the name indicates, gen AI tools work by handling shots, aka inputs or examples, provided to the tools,” Bank Info Security said. “Each shot is designed to build on previous shots, using a process called in-context learning, via which the tool attempts to refine its answers based on previous inputs and outputs, to eventually reach the user's desired output.”

Fooling the AI

The report further noted that, according to the research, many-shot jailbreaking subverts that technique by inputting many different shots at once, all involving prohibited content and answers. This fools the gen AI tool into treating these "fictitious dialogue steps between the user and the assistant" as the record of an actual interaction with the tool.

By doing so, the researchers found that an attacker can subvert in-context learning to the point where the tool doesn't appear to know that it's sharing prohibited content, Bank Info Security reported.

Section: Standard
Word Count: 779
Copyright Holder: CUToday.info
Copyright Year: 2026
URL: https://cuto-admin.flux5.ccplatform.net/Fresh-Today/Researchers-Find-AI-Guardrails-Aren-t-Guarding-as-Conversations-Go-off-the-Rails