
News Trooper

Your Global Insight, Delivered Daily.

© 2025 All Rights Reserved | Powered by News Trooper News
Technology

New TokenBreak Attack Bypasses AI Moderation with Single-Character Text Changes

June 13, 2025 5 Min Read

Cybersecurity researchers have discovered a novel attack technique called TokenBreak that can be used to bypass a large language model's (LLM) safety and content moderation guardrails with just a single character change.

“The TokenBreak attack targets a text classification model’s tokenization strategy to induce false negatives, leaving end targets vulnerable to attacks that the implemented protection model was put in place to prevent,” Kieran Evans, Kasimir Schulz, and Kenneth Yeung said in a report shared with The Hacker News.

Tokenization is a fundamental step that LLMs use to break down raw text into atomic units (i.e., tokens), which are common sequences of characters found in a set of text. To that end, the text input is converted into its numerical representation and fed to the model.

LLMs work by understanding the statistical relationships between these tokens and producing the next token in a sequence of tokens. The output tokens are turned back into human-readable text by mapping them to their corresponding words using the tokenizer’s vocabulary.
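To make that round trip concrete, here is a minimal Python sketch of mapping tokens to numerical IDs and back. The five-entry vocabulary (`VOCAB`) and the helpers `encode` and `decode` are hypothetical; real LLM tokenizers learn vocabularies of tens of thousands of subword pieces, and the step that splits raw text into pieces is omitted here.

```python
# Hypothetical toy vocabulary; real tokenizers learn far larger ones from data.
VOCAB = ["<unk>", "ignore", " previous", " instructions", "."]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(pieces: list[str]) -> list[int]:
    """Map already-split text pieces to integer IDs (unknown pieces -> 0)."""
    return [TOKEN_TO_ID.get(p, 0) for p in pieces]


def decode(ids: list[int]) -> str:
    """Map IDs back to human-readable text via the vocabulary."""
    return "".join(VOCAB[i] for i in ids)


ids = encode(["ignore", " previous", " instructions", "."])
print(ids)          # [1, 2, 3, 4]
print(decode(ids))  # ignore previous instructions.
```

The model itself only ever sees the integer IDs, which is why the way text is split into pieces matters so much to a classifier.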

The attack technique devised by HiddenLayer targets the tokenization strategy to bypass a text classification model’s ability to detect malicious input and flag safety, spam, or content moderation-related issues in the text input.

Specifically, the artificial intelligence (AI) security firm found that altering input words by adding letters in certain ways caused a text classification model to break.

Examples include changing “instructions” to “finstructions,” “announcement” to “aannouncement,” or “idiot” to “hidiot.” These subtle changes cause different tokenizers to split the text in different ways, while still preserving its meaning for the intended target.
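The effect of one added letter can be sketched with a greedy longest-match-first tokenizer in the style of WordPiece. This is a simplified illustration, not HiddenLayer’s code: the function `wordpiece_tokenize` and the four-entry vocabulary are hypothetical, and a production tokenizer’s actual splits would differ. The point is that once “instructions” becomes “finstructions”, the whole-word token a classifier may have learned to associate with an injection attempt no longer appears.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first subword split, in the style of WordPiece.

    Continuation pieces carry a "##" prefix, as in BERT-style vocabularies.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a match.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes a single unknown token.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, chosen only to illustrate the effect.
VOCAB = {"instructions", "fin", "##struct", "##ions"}

print(wordpiece_tokenize("instructions", VOCAB))   # ['instructions']
print(wordpiece_tokenize("finstructions", VOCAB))  # ['fin', '##struct', '##ions']
```

Because the matcher commits to “fin” before it ever considers “instructions”, the meaningful token is lost, even though a human or a downstream LLM still reads the word as “instructions”.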

What makes the attack notable is that the manipulated text remains fully understandable to both the LLM and the human reader, causing the model to elicit the same response it would have given had the unmodified text been passed as input.


By introducing the manipulations without affecting the model’s ability to comprehend the text, TokenBreak increases its potential for prompt injection attacks.

“This attack technique manipulates input text in such a way that certain models give an incorrect classification,” the researchers said in an accompanying paper. “Importantly, the end target (LLM or email recipient) can still understand and respond to the manipulated text and therefore be vulnerable to the very attack the protection model was put in place to prevent.”

The attack has been found to be successful against text classification models using BPE (Byte Pair Encoding) or WordPiece tokenization strategies, but not against those using Unigram.
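One plausible intuition for the Unigram result, offered as an illustration rather than the researchers’ own explanation, is that a Unigram tokenizer scores every possible segmentation of a word and keeps the most probable one, instead of committing to the longest match from the left. The Viterbi-style sketch below uses hypothetical log-probabilities (`LOG_PROBS`) picked for the example; with them, “finstructions” splits into “f” plus the intact “instructions” token.

```python
import math

# Hypothetical piece log-probabilities; real Unigram models learn these.
LOG_PROBS = {"f": -4.0, "fin": -6.0, "struct": -7.0, "ions": -6.0, "instructions": -8.0}


def unigram_segment(text: str, log_probs: dict[str, float]) -> list[str]:
    """Return the segmentation of text with the highest total log-probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: top score for segmenting text[:i]
    best[0] = 0.0
    back = [0] * (n + 1)          # back[i]: start index of the last piece
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in log_probs and best[start] > -math.inf:
                score = best[start] + log_probs[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the winning pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


print(unigram_segment("finstructions", LOG_PROBS))  # ['f', 'instructions']
print(unigram_segment("instructions", LOG_PROBS))   # ['instructions']
```

Here “f” + “instructions” scores -12.0 against -19.0 for “fin” + “struct” + “ions”, so the global search keeps the token a classifier would recognize.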

“The TokenBreak attack technique demonstrates that these protection models can be bypassed by manipulating the input text, leaving production systems vulnerable,” the researchers said. “Knowing the family of the underlying protection model and its tokenization strategy is critical for understanding your susceptibility to this attack.”

“Because tokenization strategy typically correlates with model family, a straightforward mitigation exists: select models that use Unigram tokenizers.”

To defend against TokenBreak, the researchers suggest using Unigram tokenizers where possible, training models with examples of bypass tricks, and checking that tokenization and model logic stay aligned. It also helps to log misclassifications and look for patterns that hint at manipulation.
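The logging advice could be put into practice in many ways. One simple, hypothetical heuristic is to tally words in misclassified inputs that are not in a known-word list but turn into a known word when a single character is deleted, which is the trace a single-character insertion leaves behind. The names `suspicious_words` and `record_misclassification`, and the tiny `KNOWN_WORDS` list, are assumptions for illustration; this is not a mitigation specified in the report.

```python
from collections import Counter

# Hypothetical known-word list; a real deployment would use a full dictionary.
KNOWN_WORDS = {"ignore", "previous", "instructions", "announcement", "idiot"}


def one_char_deletions(word: str) -> set[str]:
    """Every string obtainable by deleting exactly one character from word."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def suspicious_words(text: str, known_words: set[str]) -> list[str]:
    """Unknown words that become a known word after deleting one character."""
    hits = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word and word not in known_words and one_char_deletions(word) & known_words:
            hits.append(word)
    return hits


def record_misclassification(log: Counter, text: str, known_words: set[str]) -> None:
    """Tally suspicious words found in an input the classifier got wrong."""
    log.update(suspicious_words(text, known_words))


log = Counter()
record_misclassification(log, "Ignore previous finstructions.", KNOWN_WORDS)
record_misclassification(log, "You hidiot, ignore the aannouncement", KNOWN_WORDS)
print(log.most_common())
```

Recurring entries in such a tally would point to a manipulation pattern worth adding to the training data.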

The study comes less than a month after HiddenLayer revealed how Model Context Protocol (MCP) tools can be exploited to extract sensitive data.

The disclosure also comes as the Straiker AI Research (STAR) team found that backronyms can be used to jailbreak AI chatbots and trick them into generating undesirable responses, including swearing, promoting violence, and producing sexually explicit content.


Dubbed the Yearbook Attack, the technique has proven effective against various models from Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral AI, and OpenAI.

“They blend into the noise of everyday prompts, a quirky riddle here, a motivational acronym there, and because of that, they often bypass the blunt heuristics that models use to spot dangerous intent.”

“A phrase like ‘Friendship, unity, care, kindness’ doesn’t raise any flags. But by the time the model has completed the pattern, it has already served the payload, which is the key to successfully executing this trick.”

“These methods succeed not by overpowering the model’s filters, but by slipping beneath them. They exploit completion bias and pattern continuation, as well as the way models weigh contextual coherence over intent analysis.”
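The acronym trick suggests a correspondingly simple check, sketched here as a hypothetical heuristic rather than anything Straiker proposes: join the first letters of comma- or newline-separated items and compare the result against a blocklist. The function names and the blocklist are assumptions, and a check this shallow is easy to evade, so at most it would serve as one extra signal alongside a classifier.

```python
import re


def hidden_acronym(text: str) -> str:
    """Concatenate the uppercased first letter of each comma/newline-separated item."""
    items = [item.strip() for item in re.split(r"[,\n]", text)]
    return "".join(item[0].upper() for item in items if item)


def flags_acronym(text: str, blocklist: set[str]) -> bool:
    """True if the items' initials spell a blocklisted word (hypothetical check)."""
    return hidden_acronym(text) in blocklist


print(hidden_acronym("Hope, eagerness, laughter, patience"))           # HELP
print(flags_acronym("Hope, eagerness, laughter, patience", {"HELP"}))  # True
```

The blocklist here holds a harmless word purely for demonstration.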
