Researchers slip past ChatGPT's image filters to force banned output

A British AI security firm says it repeatedly pushed ChatGPT's image generator past its safety filters using nothing more than a lightly modified version of a viral prompt. The finding, published by researchers at Mindgard, adds to mounting evidence that the guardrails on commercial image tools are far easier to defeat than their marketing suggests.

The technique abused a "restore this image" prompt. By convincing the model that a harmless source picture was extremely graphic, researcher Jim Nightingale got ChatGPT to drop its content filters and generate violent and sexual imagery that the user had never explicitly requested. Mindgard says it reported the issue to OpenAI in May. OpenAI later told the BBC that it uses multiple safeguards, including text classifiers meant to block harmful requests before generation and a downstream model that reviews output before it reaches the user. None of those layers stopped the modified prompt.

Not an isolated trick

This is the second time Mindgard has demonstrated this class of bypass. Earlier in 2026 the firm coaxed ChatGPT into producing "tasteful" nudes, then escalated to explicit images and swapped the faces of public figures onto them. When OpenAI said it had fixed that flaw, the researchers tweaked the wording and kept getting prohibited output. The pattern is familiar to anyone who follows jailbreak research: filters catch casual misuse but rarely stop a determined attacker who iterates on the prompt.

Rivals fare worse

OpenAI is not the worst offender. In the same body of testing, xAI's Grok produced sexualized imagery in response to 45 of 55 relevant prompts (about 82 percent), and continued to do so even when testers stated that the subjects had not consented. The non-profit AI Forensics gathered roughly 20,000 Grok-generated images and found that 53 percent contained explicit content, the large majority depicting women. It also reported images it suspected of being child sexual abuse material to French regulators under the Digital Services Act. A policy study cited by Mindgard warns that some vendors reserve the right to loosen their own safeguards to match competitors, a dynamic that could ratchet protections down across the whole market.

Why it matters

For defenders and ordinary users alike, the practical takeaway is blunt: treat the safety guarantees on commercial image generators as best effort, not a wall. If your face is online, assume it can be repurposed into content you never agreed to. The same abuse of generative AI is already fueling fraud; IntelFusions has tracked the rise of deepfake-as-a-service operations and malicious tools that harvest AI accounts and prompts. Anyone who finds non-consensual imagery of themselves can use platform takedown channels and specialist bodies such as the US National Center for Missing and Exploited Children's Take It Down service or the UK's Internet Watch Foundation.

Public summaries of the disclosure did not link to the original write-up, citing the disturbing nature of the redacted images.

This briefing is provided by IntelFusions for informational and defensive purposes only. It is based on sources assessed to be reliable at the time of writing, and analytic judgments carry the confidence levels indicated. Indicators of compromise are defanged; re-arm them only in controlled environments. IntelFusions is not affiliated with the organizations named and makes no warranty as to completeness or accuracy.