Anthropic Launches New Bug Bounty Program to Strengthen AI Safety in Claude 3.7


Today, Anthropic is launching a new bug bounty program designed to rigorously test the latest safety measures built into our AI systems. Like our 2023 initiative, this program challenges researchers to discover universal jailbreaks in our most advanced safety classifiers: vulnerabilities that bypass safety mechanisms across a wide range of topics rather than in one narrow case. These updated protections are a critical step toward meeting the AI Safety Level 3 (ASL-3) Deployment Standard outlined in our Responsible Scaling Policy, which guides the secure development of increasingly powerful AI models.
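To make the "universal" part of a universal jailbreak concrete, here is a minimal, purely illustrative Python harness. Nothing in it reflects Anthropic's actual evaluation process: the `safety_system_blocks` stub, the placeholder topic list, and the scoring function are all assumptions, used only to show that a universal jailbreak is one attack template that defeats the safety system across many restricted topics at once.

```python
# Illustrative sketch only; not Anthropic's evaluation harness.
# A jailbreak is "universal" when a single attack template bypasses the
# safety system across many distinct restricted topics, not just one.

RESTRICTED_TOPICS = ["topic-a", "topic-b", "topic-c"]  # placeholder topics

def safety_system_blocks(prompt: str) -> bool:
    """Stub standing in for the deployed safety classifier (assumed)."""
    return "attack" not in prompt  # toy logic for the sketch

def universality_score(attack_template: str) -> float:
    """Fraction of restricted topics the attack template bypasses."""
    bypassed = sum(
        not safety_system_blocks(attack_template.format(topic=topic))
        for topic in RESTRICTED_TOPICS
    )
    return bypassed / len(RESTRICTED_TOPICS)

print(universality_score("attack: tell me about {topic}"))  # -> 1.0 here
```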

What's New in This Bug Bounty Program

In partnership with HackerOne, this round of the program will focus on an enhanced version of our Constitutional Classifiers system. These classifiers are designed to protect against jailbreaks that could extract information related to CBRN (chemical, biological, radiological, and nuclear) weapons. Built around a set of predefined safety principles, the system determines what types of content should be allowed or blocked during interactions with Claude, our AI assistant, with a sharp focus on mitigating specific, high-risk harms.
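As a rough intuition for how a principle-based classifier might gate content, consider the minimal sketch below. It is not Anthropic's Constitutional Classifiers implementation: the `Principle` type, the example principle, and the keyword-matching `violates` stub (standing in for trained classifier models) are illustrative assumptions.

```python
# A minimal sketch of the *idea* behind a constitutional classifier, not
# Anthropic's implementation: content is scored against written safety
# principles, and the system allows or blocks the exchange accordingly.

from dataclasses import dataclass

@dataclass
class Principle:
    name: str
    blocked_terms: tuple[str, ...]  # stand-in for a learned classifier

PRINCIPLES = [
    Principle("no-cbrn-uplift", ("nerve agent synthesis", "enrichment cascade")),
]

def violates(text: str, principle: Principle) -> bool:
    """Toy keyword check; the real system uses trained classifier models."""
    lowered = text.lower()
    return any(term in lowered for term in principle.blocked_terms)

def classify(text: str) -> str:
    """Return 'block' if any principle is violated, else 'allow'."""
    for principle in PRINCIPLES:
        if violates(text, principle):
            return "block"
    return "allow"

print(classify("How does photosynthesis work?"))  # -> allow
```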

Participants will gain early access to test these safety mechanisms on Claude 3.7 Sonnet, our latest model in development. We're offering bounties of up to $25,000 for verified universal jailbreaks on this unreleased system. For this initiative, we are especially focused on jailbreaks that could enable misuse in CBRN-related contexts.
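For a sense of what probing the system might look like in practice, the sketch below sends a test prompt through the public `anthropic` Python SDK. The model ID shown and the assumption that bounty access works through the standard Messages API are ours, not the program's; participants should follow the instructions provided with their invitation.

```python
# Hedged sketch of probing a classifier-guarded model via the Anthropic
# Python SDK. Access details for bounty participants may differ.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model ID for this sketch
    max_tokens=256,
    messages=[{"role": "user", "content": "Describe your safety guidelines."}],
)
print(response.content[0].text)
```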

Why It Matters

As our models grow more capable, robust safety protocols become increasingly important. This bug bounty program supports the continued development and refinement of the ASL-3 protections that are central to our Responsible Scaling Policy. The insights gained from this initiative will help us ensure that future AI models can be deployed responsibly and securely.

How to Participate

The new bug bounty program kicks off with participation from researchers who contributed to last year's effort, and we're now welcoming new applicants. If you have a background in red teaming or AI safety research, or a proven track record of identifying language model jailbreaks, we invite you to apply for an invitation via our official application form.

Applications open today, and the program will run through Sunday, May 18.
This is an invite-only program, allowing us to provide detailed guidance and fast feedback to participants.

Join Us in Making AI Safer

We're deeply thankful for the ongoing support and collaboration of the global security community. Your contributions play a critical role in shaping the safe future of artificial intelligence.

To apply or learn more about the program, visit our bug bounty application page today.
