ChatGPT Bypassed to Generate Violent, Sexualized Images
UK researchers discovered a simple prompt modification that forced OpenAI's chatbot to create graphic content despite safety guardrails.
Researchers expose ChatGPT content safety gaps
OpenAI's latest ChatGPT model can be manipulated to generate graphic violent and sexualized imagery through minor modifications to a widely circulated prompt, according to findings from British AI security startup Mindgard.
The researchers, who specialize in red-teaming AI systems to identify vulnerabilities, adapted an instruction originally designed for humorous outputs. The modified prompt caused ChatGPT's GPT-5.4 model to produce what Mindgard founder Peter Garraghan described as "very gruesome, sometimes sexualised, sometimes both together" images, even though users had not specified the subject matter.
Garraghan, who is also a professor in Lancaster University's computing department, said one aspect was particularly concerning: the AI generated disturbing content "of its own volition" from an innocuous-looking instruction.
OpenAI responds after BBC inquiry
After the BBC contacted OpenAI about the findings, the company implemented additional safeguards to block the problematic prompt. "After investigating this trend, we've introduced additional safeguards against this type of prompt," OpenAI stated, adding that it maintains multiple protection layers to prevent policy violations.
However, Mindgard reported that further small adjustments to the prompt continued producing concerning material even after OpenAI's initial fix.
The researchers documented ChatGPT generating images depicting graphic injuries, crime scenes suggesting sexual violence, and restrained individuals in distressing scenarios. The AI system itself assigned titles like "Grim crime scene aftermath" and "abandoned in fear and restraint" to these outputs.
Mindgard's AI safety researcher Jim Nightingale, who uncovered the vulnerability, said he was "shaken, and in tears" by the generated images. The firm noted that while the depicted individuals were AI-generated adults, its previous research demonstrated that ChatGPT could also be tricked into creating nude deepfakes of real people through face-swapping techniques.
The cat-and-mouse challenge
Mindgard first alerted OpenAI to the issue in May but received only an automated response. Mindgard believes OpenAI attempted to block the original prompt, though the block proved easy to circumvent.
Dr. Rumman Chowdhury, CEO of Humane Intelligence and an expert in AI model evaluation, characterized the challenge as "a game of cat and mouse" where increasingly sophisticated bypass methods emerge as protections improve.
The fundamental problem, Chowdhury explained, is that AI models lack human comprehension. "Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong," she told BBC News.
Last year, the UK's AI Security Institute discovered jailbreaks that overrode safeguards across harmful request categories in every AI system tested.
Why it matters
This vulnerability exposes a critical gap between AI companies' content policies and their ability to enforce them at scale. AI models trained on millions of internet images can inadvertently reproduce harmful content patterns from their training data. For organizations deploying AI tools, the findings underscore that even leading models with extensive safety measures remain vulnerable to relatively simple prompt manipulation, a risk that extends beyond content generation to potential reputational and legal exposure.
OpenAI's policies explicitly prohibit sexual violence, non-consensual intimate content, and attempts to bypass safeguards. The company says it combines automated systems and human review to identify harmful material, while continuing to monitor and deploy additional protections.
The UK's Department for Science, Innovation and Technology acknowledged that "safeguards in AI models are improving, but there is more to do," noting the AI Security Institute will continue working with developers to strengthen security before model releases.
These findings were first reported by the BBC.
This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
