Anthropic, the company behind the Claude AI models, has publicly reported that a Chinese state‑sponsored hacking group manipulated its Claude Code tool to carry out an automated cyber espionage campaign targeting around 30 organisations worldwide in mid‑September 2025.
In a detailed case study, Anthropic said it detected suspicious activity and, over a ten‑day investigation, banned accounts as they were identified, notified affected entities, and coordinated with authorities while gathering actionable intelligence. The company assesses with “high confidence” that the threat actor was state‑linked and said a “small number” of intrusions succeeded.
Anthropic believes this is the first documented case of a large‑scale cyberattack executed largely without human intervention, with AI agents performing 80–90 per cent of the campaign. The operation reportedly focused on large technology companies, financial institutions, chemical manufacturers, and government agencies, and leveraged recent advances in agentic AI – systems that chain tasks, make decisions autonomously, and interact with external tools.
According to Anthropic’s account, human operators chose targets and built an attack framework, then jailbroke Claude’s safeguards by decomposing malicious goals into seemingly benign subtasks and prompting the system to role‑play as a legitimate cybersecurity tester.
Claude Code was used to conduct reconnaissance, spot high‑value databases, research and write exploit code, harvest credentials, establish backdoors, and categorise exfiltrated data by intelligence value. In the final phase, the actors had Claude produce documentation of the operation to support further campaigns.
Anthropic cautioned that Claude did not always perform flawlessly: the model sometimes hallucinated credentials or claimed to have obtained secret information that was in fact public, highlighting obstacles to fully autonomous attacks. The firm has expanded detection capabilities and developed classifiers to flag malicious activity, warning that AI‑powered attacks are “likely to grow” in effectiveness.
Independent reporting has both echoed and scrutinised the claims. Coverage in The Guardian noted Anthropic’s assertion of a “handful of successful intrusions” and the 80–90 per cent automation figure, while quoting experts who argued the episode may reflect sophisticated automation rather than true intelligence – and who urged a focus on organisations’ security hygiene as a primary defence.
The BBC similarly reported the Chinese attribution as Anthropic’s assessment, and flagged scepticism in the security community around the extent of autonomy and the absence of shareable technical indicators to verify details.
Anthropic’s disclosure follows broader industry concern about AI’s dual‑use trajectory. In August, the firm’s threat intelligence team described disrupting criminal “vibe hacking” operations that used Claude Code to scale extortion campaigns, and argued that agentic AI has lowered barriers to sophisticated cybercrime.
Other major AI vendors have also reported attempts by state‑affiliated actors to use their services for reconnaissance and code assistance, while multiple research efforts have found that today’s models still make reliability errors that hamper end‑to‑end autonomy.
The company did not name victims or provide technical indicators in its public post. The Chinese embassy has denied any involvement. Anthropic says it will continue publishing analyses to support industry and government in strengthening defences as agentic capabilities evolve.