news

OpenAI admits prompt injection may never be solved

OpenAI admits prompt injection attacks may never be fully solved. This article explores what’s next for Atlas and AI security.

Patrecia Meliana

24 Dec 2025 • 5 min read

OpenAI is making bold claims about improving AI browser security, but it’s also being unusually honest about the long-term risk that prompt injection poses to systems like ChatGPT Atlas.

In a detailed new blog post, the company revealed that prompt injection—where attackers embed hidden commands into websites, emails, or docs to hijack AI agents—is not a fix-once-and-done problem. Like phishing or social engineering, OpenAI says this category of attack will likely persist indefinitely, especially as AI agents gain more autonomy and access.

This article explores OpenAI’s latest security measures, how they’re deploying reinforcement learning to simulate attacks, and what this means for marketers and developers building AI-driven workflows in the browser.

Short on time?

Here’s a table of contents for quick access:

What is prompt injection and why it matters

Prompt injection is a technique that manipulates large language models into following unintended instructions. Instead of targeting code vulnerabilities or user mistakes, attackers embed malicious prompts in everyday content—emails, web pages, even shared docs—to override user intent.

OpenAI ChatGPT Atlas Prompt Injection Problem - Embed malicious prompts

ChatGPT Atlas, which allows AI to interact with your browser directly by clicking, reading, and performing actions, opens a much broader attack surface. If a malicious email tells the agent to forward sensitive files or send a resignation message, it might just follow through.

This isn’t a fringe risk. The UK’s National Cyber Security Centre recently warned that prompt injection attacks may never be fully mitigated. Others, including Brave and Perplexity, have flagged similar risks in their own AI browsers.

How OpenAI is using reinforcement learning to fight back

To get ahead of future attacks, OpenAI has built an automated attacker trained through reinforcement learning. This agent mimics a real hacker, attempting to trick ChatGPT Atlas by crafting increasingly complex prompt injections and testing them in a simulated environment.

In one internal demo, a malicious email was planted in a user’s inbox. When the user later asked Atlas to draft an out-of-office message, the agent instead followed the hidden prompt and sent a resignation letter. OpenAI says its latest update now catches and flags this kind of manipulation before it causes damage.

OpenAI ChatGPT Atlas Prompt Injection Problem - Update now flags manipulation

The advantage of reinforcement learning is that it allows the attacker to learn and improve over time, just like a real-world adversary would. OpenAI’s internal systems can simulate hundreds of scenarios and patch vulnerabilities before external threats appear.

This proactive loop sets OpenAI apart from rivals like Google and Anthropic, who are also working on layered defenses but may not be testing with this level of automated adversarial pressure.

Recommendations for safer agent usage

While OpenAI is improving its backend defenses, the company also recommends practical steps for users and developers:

Limit logged-in access

Use Atlas in logged-out mode whenever possible, especially for tasks that do not require access to personal accounts or sensitive data.

Avoid broad commands

Rather than asking agents to “handle your inbox,” give specific instructions like “summarize the top three unread emails.” This narrows the opportunity for hidden prompts to redirect the agent.

Manually review agent actions

Atlas will often ask for confirmation before taking key actions. Always double-check what it plans to do before you approve, especially when it involves sending messages or transferring information.

These steps won’t eliminate risk entirely, but they can meaningfully reduce exposure, especially when paired with OpenAI’s system-level safeguards.

Why this matters

Marketers and martech teams building workflows around browser agents need to treat prompt injection as a real operational risk. These agents can automate email responses, CRM updates, and even financial actions—but their access makes them an attractive target.

As Rami McCarthy from cybersecurity firm Wiz puts it, “A useful way to reason about risk in AI systems is autonomy multiplied by access.” Agentic browsers like Atlas are especially risky because they combine both.

Here’s how to plan ahead:

Treat browser agents like junior staff

Don’t give them unrestricted access or vague directions. Set up reviews and confirmation steps just like you would for new team members.

Audit critical flows

If agents are touching sensitive workflows—like email, payments, or content publishing—set up logs and alerts to track what’s happening.

Balance innovation with caution

Use Atlas to speed up workflows, not to replace judgment. Automate repetitive tasks, but keep humans in the loop for anything that could cause real damage.

OpenAI is right to treat prompt injection as a long-term security challenge. Their automated attacker and reinforcement learning strategy show promising progress, but no one is pretending this problem is going away.

If your team is exploring AI agents that browse, click, or send messages on your behalf, you need to think about risk the same way security teams think about phishing. It’s not a bug to patch. It’s a human-like weakness that needs ongoing vigilance.

This article is created by humans with AI assistance, powered by ContentGrow. Ready to explore full-service content solutions starting at $2,000/month? Book a discovery call today.