ofensiva
AI bug bounty
OpenAI
Anthropic

AI bug bounty: programs and rewards for AI vulnerabilities 2026

AI bug bounty programs 2026: OpenAI, Anthropic, Google, Microsoft. How to participate, scope, rewards, prompt injection and how they differ from classic bounty.

SecraJune 8, 202613 min read

AI bug bounty programs are formal initiatives through which AI labs, hyperscalers and companies that deploy models in production reward external researchers for reporting vulnerabilities in their artificial intelligence systems. The category is quite recent: although some traditional programs accepted AI related findings years ago, programs formally labelled as AI bug bounty start to appear in 2023 and become consolidated between 2024 and 2026. They cover a different attack surface than the classic bug bounty, they assess severity differently and they raise reproducibility challenges that are specific to probabilistic systems.

The essentials

  • AI bug bounty rewards the responsible disclosure of vulnerabilities in AI models and applications.
  • OpenAI, Anthropic, Google, Microsoft, Hugging Face and AWS Bedrock all run formal programs in 2026.
  • Typical scope includes infrastructure, filter bypasses with real security impact and model behavior issues; pure hallucinations and content policy disagreements are usually out of scope.
  • The highest rewards go to authorization bypass, training data extraction and prompt injection with material consequences.
  • Reproducibility is the main challenge: probabilistic systems require several runs before a finding is considered confirmed.

Why AI bug bounty is a category of its own

A traditional bug bounty program is designed for a deterministic surface: web applications, APIs, binaries, infrastructure. Vulnerability classes are well typed (XSS, SQLi, RCE, IDOR, SSRF), severity criteria are easy to understand under CVSS and reproducibility is demonstrated with a deterministic test case. The triager verifies the bug, assesses impact and assigns a reward with a reasonable margin of objectivity.

The LLM surface breaks several of those assumptions. An application built on a language model adds vectors that do not fit classic taxonomies: direct and indirect prompt injection, jailbreak, system prompt extraction, leakage of data memorised during training, abuse of tools connected to the model and moderation filter bypass. Some categories have clear technical impact. Others sit in an ambiguous zone between security, alignment and content policy.

On top of that, models change. The same version may behave differently two weeks apart if the vendor applies alignment adjustments, reinforces filters or rolls out a new checkpoint. A finding valid on Monday may stop reproducing on Friday without the reporter doing anything. For these reasons the industry has created specific programs instead of stuffing AI inside existing bug bounty programs.

Main programs active in 2026

OpenAI runs its program through Bugcrowd. It accepts reports on the platform, the APIs and commercial products. The purely model-behavior component is handled through separate channels, with specific guidance about what counts as a reportable vulnerability versus what belongs in usage policy. It publishes indicative reward bands and maintains a hall of fame.

Anthropic manages responsible disclosure through HackerOne, alongside a specific program for adversarial model research. It has invited external teams on selected occasions to participate in red team exercises ahead of significant releases, with compensation and explicit rules. Its Responsible Scaling Policy provides context for how these programs fit within internal governance.

Google maintains an AI Vulnerability Reward Program as an extension of the historical VRP. It covers Gemini, Workspace products with AI features, Cloud AI and other surfaces. The rules distinguish between classic software bugs (standard VRP) and AI specific categories, each with its own severity rubric.

Microsoft runs the AI Bounty inside the Microsoft Security Response Center, with focus on Copilot, Azure AI and related services. Covered categories include inference, manipulation, sensitive information disclosure and access control violations. Maximum rewards for critical cases are among the highest in the market.

Hugging Face keeps a responsible disclosure program focused on the platform, the model hub and associated services, including risks like malicious models being published, abuse of inference endpoints and issues across the Spaces ecosystem. AWS integrates its generative AI services, including Bedrock, into its overall security program; relevant findings in model flows are eligible under the existing rules.

Neutral platforms such as HackerOne, Bugcrowd and Yeswehack triage and manage many of these programs. Companies that deploy AI and want their own bug bounty usually lean on that infrastructure and community.

Typical scope: what is in and what is out

What is in scope is the intersection between model behavior and real security consequences: filter bypasses that allow extracting sensitive information the system should protect, situations where the model discloses data from other users or from its own infrastructure, instruction injection that executes unauthorised actions in connected tools, partial extraction of memorised training data when it is sensitive and issues in the infrastructure that serves the models (orchestration, gateways, embedding storage).

What typically remains out of scope are pure hallucinations without an exploitable vector, disagreements with the content policy, jailbreaks without real impact, output quality issues (bias, tone, style) and complaints about expected product behavior.

The boundary is not always crisp. A jailbreak that lets a minor receive self-harm instructions can move from complaint to reportable vulnerability when the model is deployed in a child protection context with legal obligations. A prompt injection that only changes the assistant's tone is not the same as one that triggers a call to a connected API with the user's privileges. Serious programs document specific cases in their rules to cut down disputes.

Reward categories and indicative ranges

Specific figures vary across programs and get updated, so it is worth checking the official pages before investing time. As a qualitative reference, these are the relative bands seen across the industry in 2026.

Authorization bypass and cross tenant exposure: critical severity. The highest rewards in the program are typically reserved for cases where a user manages to access data or capabilities belonging to another tenant or to circumvent the product's access control. These cases are treated like any classic critical bug, regardless of whether the vector goes through an AI component.

Training data extraction with sensitive data: high severity. Getting a model to disclose verbatim fragments of the training corpus is especially valued when the extracted data is personal, proprietary or subject to confidentiality. The quality of the evidence (volume, fidelity, persistence across runs) affects the reward.

Prompt injection with material impact: medium to high severity. An indirect injection that gets an agent connected to tools to execute unintended actions (send an email, modify a file, place a call) is valued well above an injection that only changes visible output. Impact on real systems is the main multiplier.

Denial of service against the model infrastructure: medium severity. Query patterns that cause disproportionate resource consumption or service degradation for other users fall into the traditional DoS category, adapted to the economics of inference.

Content filter bypass without clear security impact: low severity or out of scope, depending on the program. They are only accepted when an additional factor is present (use in sensitive contexts, verifiable potential harm to third parties).

How it differs from classic bug bounty in practice

The fundamental difference is the probabilistic nature. The same prompt, executed twice, can produce different outputs. Programs ask for reproducibility across several runs (typically three or more) and value it when the reporter provides statistical analysis of the trigger's reliability. Jailbreak techniques based on adversarial optimisation require documenting the exact chain used, model parameters, temperature and any other relevant factor.

Severity assessment also changes. CVSS does not map well to all AI categories, and although some programs still use it as a reference, triagers tend to apply their own rubrics that weight third party impact, likelihood of exploitation in real production and platform reputation.

Model version is essential information. A finding against a three months old version of GPT-4-turbo may no longer reproduce on the current model with no change from the reporter. Programs record exact version, date and, where applicable, deployment configuration.

Finally, the time factor is delicate. Some vulnerabilities require deep changes in the model (a new round of fine-tuning, alignment adjustments) and are not fixed by a patch in hours or days. Responsible disclosure timelines are negotiated case by case.

Workflow for a reporter

A researcher who wants to participate seriously follows a structured sequence.

Identification. Try known techniques (prompt injection variants, system prompt extraction, attempts at data leakage) over the public surface of the product. Structured curiosity is more productive than random poking.

Impact confirmation. Before investing time writing up the report, validate that the behavior has material consequences and that it fits the published scope. An interesting finding that is out of scope will not be rewarded and consumes triage time.

Reproducibility. Run the same technique several times. Document success rate, model and version, parameters, date and time. If the attack depends on prior context, capture the full conversation.

Write up. A good report includes executive summary, technical description, exact prompts, observed outputs (screenshots and text), impact analysis, suggested mitigations where applicable and an explicit statement that no third party data was accessed beyond what was strictly necessary.

Responsible disclosure. Submit through the program's official channel. Respect embargoes. Do not publish details before the vendor confirms the fix or authorises disclosure. Long term reputation in bug bounty depends as much on technical quality as on conduct during disclosure.

Useful tooling

PyRIT (Microsoft) is focused on red teams working on generative AI, with multi-turn orchestration and probe catalogues. Garak (NVIDIA) is a probe suite with hundreds of prebuilt tests, practical for initial sweeps over a new surface. Promptfoo is oriented to evaluation and regression, useful when automating verification of a finding across versions.

Reference implementations available on GitHub (GCG, AutoDAN and derivatives) help reproduce known techniques and build custom variants. Beyond tooling, experienced reporters maintain their own prompt catalogues tailored to specific products and to categories that have paid off in past programs. It is craft work with real value.

Public disclosure cases from 2024 to 2026

The community has continuously documented jailbreaks against ChatGPT since 2023. OpenAI has closed many variants and published general mitigations across attack classes (DAN, extreme role playing, prompts in languages with weaker alignment coverage). The pattern confirms that alignment is not robust against sustained adversarial effort.

In 2024 indirect prompt injection scenarios were published against Microsoft Copilot, where external content (emails, shared documents) induced unintended behavior when the assistant processed those materials. Microsoft introduced reinforced guardrails and explicit user warnings. Google has handled disclosures around Gemini related to context handling and output filters within its VRP extended to AI. Anthropic publishes post mortems analysing attack classes found during internal red team and by external collaborators.

These cases share a pattern: no commercial platform claims to ship a model immune to adversarial abuse, all have established formal disclosure channels and specific bug bounty programs, and the dynamic with the researcher community has professionalised significantly between 2023 and 2026.

Implementing an AI bug bounty for your own company

Not every organisation is ready to launch a formal program. There are prerequisites without which the program creates more friction than value.

Minimum security maturity. Bug bounty assumes an internal process able to absorb, triage and fix findings in reasonable timelines. Without a functional security team, agile deployment pipelines and product coordination, reports pile up unresolved and the program loses credibility.

AI incident response runbook. An AI bounty program raises categories that classic IR does not cover well: how to escalate a jailbreak with regulatory consequences, who decides whether an attack class forces degrading product capabilities, how to communicate internally and to the regulator. These processes must exist before inviting the community.

Legal alignment. The rules are an implicit contract. You need to define safe harbor for researchers acting in good faith, what data they can touch, which accounts to use, what tools are allowed, with specialised legal counsel validation.

Platform choice. HackerOne, Bugcrowd and Yeswehack are the most common options, with managed professional triage, active researcher communities and coordination tooling. Running it internally reduces cost per report but requires a dedicated team.

Progressive scope. Start with a narrow scope (one product, high and critical severity only) and expand as the team proves response capacity. Opening full scope on day one usually overwhelms the program.

Frequently asked questions

How much do AI bug bounty programs typically pay?

Rewards vary considerably. Critical findings (authorization bypass, cross tenant exposure) in hyperscaler programs can reach five to six figure USD amounts. Medium severity findings (prompt injection with bounded impact, minor disclosure) usually sit between hundreds and a few thousand. The exact figure depends on the program, the quality of the report and the impact demonstrated. Each program publishes its own bands.

Is it viable to freelance on AI red teaming and bug bounty?

For someone with consolidated experience in offensive security and applied AI knowledge, yes, but it is competitive. The reporters who monetise most combine participation in several programs, specialisation in specific classes and, often, complement income with consulting and training. The learning curve is steep and the income during the first months is usually low.

Does the EU AI Act make bug bounty mandatory?

Not explicitly. The regulation requires robustness, accuracy and cybersecurity evaluations for high risk systems, and red teaming for general purpose models with systemic risk. A bug bounty program is a reasonable way to complement these obligations, but it is not the only accepted mechanism. For many providers it ends up being advisable as a matter of risk management, not because the text demands it.

Is a hallucination a bug?

Generally not, unless it causes material harm to third parties in a regulated context. A hallucination that affects output quality without legal consequences is a product issue, not a vulnerability. A hallucination that induces a connected agent to call the wrong API with sensitive data can fall in scope because there is real technical impact.

Is a prompt injection without proven impact reportable?

It depends on the program. Some accept pure prompt injection demonstrations as proof of concept with a low reward, others only pay when the reporter provides an exploitation chain that reaches material action. Before investing time, read the rules of the specific program.

Does my AI startup need a bug bounty from day one?

No. If it is still being built, the priorities are internal threat modelling, basic guardrails, adversarial evaluation with tooling like Garak or PyRIT and a one off external exercise before opening the product to significant volume. A formal bug bounty makes sense once there is a material user base, an incident response process in place and willingness to absorb external reports on a continuous basis.

Audit your AI with Secra

At Secra we evaluate LLM applications, autonomous agents and integrated models with adversarial exercises aligned with MITRE ATLAS, OWASP Top 10 for LLM Applications and the EU AI Act requirements. We work with teams that want to prepare before opening a formal AI bug bounty program, validate the initial scope with an independent external exercise or reinforce guardrails after an incident.

If your organisation has deployed generative AI in material flows and needs a serious adversarial evaluation, you can reach us at secra.es/contact for an initial conversation and scope definition.

About the author

Secra Solutions team

Ethical hackers with OSCP, OSEP, OSWE, CRTO, CRTL and CARTE certifications, 7+ years of experience in offensive cybersecurity, and authors of CVE-2025-40652 and CVE-2023-3512.

Share article