Introducing Hacktron AI: An autonomous penetration test of Gumroad

TL;DR

Gumroad was the very first audit we conducted at Hacktron a few months ago, and the results blew us away.

We ran our swarm of specialized agents using Gemini 2.5 Pro. Some were purely LLM-based, some AST-based, and others purpose-built for niche vulnerability classes. Our cutting-edge source code review agents were each laser-focused on a different weakness.

We’re proud to publicly release the pentest report, which you can view below. This is a first-of-its-kind demonstration of AI-powered penetration testing delivering equivalent results to a human pentest team.

👉 Gumroad Pentest Report

Intro

At Hacktron, we’re building collaborative AI agents that act as autonomous security researchers¹, capable of working across the entire software development lifecycle to find, triage, and fix vulnerabilities automatically.

These agents think like human hackers, collaborate across tasks, and take real action. This is made possible by distilling the expertise of our world-class security research team into our agents.

We’re also building standardized security benchmarks to evaluate AI capabilities, and backing it all with zero-day research.

Manifesto

Our goal is to solve 90%² of current software security problems.

As security researchers, we live for zero-days³ and novel exploits. But the vast majority of real findings are not intellectually challenging research problems that take years to materialize. They are repetitions of the same mistakes in different places. These are the vulnerabilities that constitute more than 90% of the software security problems we face today, showing up year after year in the OWASP Top 10 and in the yearly reports from HackerOne, Bugcrowd, Synack, and Cobalt. Pentests and audits only find variants of already known vulnerabilities. We don’t want to patch one hole only for another to show up months later. The only way to break this cycle is to flip the penetration testing model on its head.

Just as AI has enabled an entire generation of non-technical users to build software, we see a real opportunity to bring the expertise of security researchers to ambitious teams who don’t have the time or expertise to think deeply about security. The fundamental difference here is that AI allows us to do this at scale and with much higher continuity compared to traditional security audits.

We are not just another static code analyser or security scanner. We are building a full-stack security engineer. This means going through the entire process of finding, triaging, and patching vulnerabilities autonomously.

These principles guide everything we build.

PoC || GTFO: Most security scanners are just noise. We find and confirm vulnerabilities that are actually exploitable, and we do this by working across the entire software development lifecycle.
Fostering open-source security: Our long-term goal is to secure the backbone of the internet — Chromium, Linux, Apache, and everything that keeps the world online.
Continuous evaluation of AI capabilities: We believe capable AI agents pose both an opportunity and a threat. That’s why we commit to open, continuous evaluation through real-world benchmarks — for the benefit of builders, policymakers, and the wider technology community. We’ve laid out our approach in detail at Hackbench.ai

The Gumroad Pentest

Saying “PoC || GTFO” then dropping a vague launch post with no real outcomes would’ve been just more noise. So instead, we’re presenting our work — an AI-powered pentest on a real product, with real findings. We pointed Hacktron at Gumroad, and it delivered.

This was part of a broader wave of assessments we’ve been running — targeting open source products, bug bounty targets, and enterprise software.

Among the various systems hacked by our agents, the Gumroad team demonstrated exceptional responsiveness. From initial disclosure to triage and resolution, their handling of issues was fast and effective. We would like to thank Sahil and Ershad for their collaboration throughout the process. We are now working with Antiwork to integrate Hacktron into their CI/CD pipeline for ongoing, autonomous security monitoring.

👉 Gumroad Pentest Report

How Hacktron Works

To initiate the engagement, we connected Hacktron to the Gumroad codebase and triggered a full-scope assessment.

Scan

Hacktron began by cloning the repository and analyzing the application’s architecture. It performed a hybrid of static and dynamic analysis — reasoning through frontend and backend flows, mapping attack surfaces, and identifying critical data flows and control points.

Agent

For Gumroad, when vulnerabilities were identified, Hacktron triaged them autonomously, prioritizing issues based on exploitability, reachability, and security impact.
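
To make the prioritization step concrete, here is a minimal sketch of ranking findings on the three axes named above. It is not Hacktron's actual triage logic, which this post does not describe: the `Finding` class, the 0 to 3 scales, the multiplicative score, and the `EXAMPLE-*` identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the 0-3 scales are an assumption."""
    issue_id: str
    exploitability: int  # 0 = theoretical only, 3 = working exploit path
    reachability: int    # 0 = dead code, 3 = reachable without authentication
    impact: int          # 0 = informational, 3 = full data compromise


def priority(finding: Finding) -> int:
    # Multiplying the axes means a zero on any one of them (for example,
    # unreachable code) pushes the finding to the bottom of the queue.
    return finding.exploitability * finding.reachability * finding.impact


def triage(findings: list[Finding]) -> list[Finding]:
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("EXAMPLE-1", exploitability=1, reachability=3, impact=1),
    Finding("EXAMPLE-2", exploitability=3, reachability=3, impact=3),
    Finding("EXAMPLE-3", exploitability=2, reachability=0, impact=3),
]
ordered = [f.issue_id for f in triage(findings)]
print(ordered)
```

A multiplicative score is only one possible design. A weighted sum would instead keep unreachable but severe findings visible in the queue, which may be preferable when reachability analysis is itself uncertain.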

Issue

Each issue included full technical context: source traces, affected files, and remediation strategies. These were surfaced through a dedicated dashboard for both our team and Gumroad’s engineers. Raw tickets can be found on Hacktron’s hacktivity.

PoC

Our human researchers then reviewed and validated Hacktron’s findings, producing production-ready PoCs. In its latest version, Hacktron now integrates PoC generation and automated triaging, streamlining the process even further.

Reporting

For anyone in the security industry, these findings speak for themselves. These weren’t theoretical bugs or generic checklist items. They were exploitable, high-impact vulnerabilities. The SQL injection alone posed a serious risk, and Gumroad moved quickly to remediate it. A traditional pentest for the same scope would cost between $20,000 and $30,000; we achieved it for roughly one-third of the price.

Issue ID	Severity	Title
GUM-01-001 WP1	High	DOM XSS via Unsafe `innerHTML` Assignment in Tiptap Raw Node
GUM-01-003 WP1	High	DOM-based XSS via iframe.ly Embed Handling in `MediaEmbed.tsx`
GUM-01-004 WP1	High	Stored XSS via Product Description Rendering
GUM-01-005 WP1	High	Stored XSS via Seller Display Name in Receipt Generation
GUM-01-007 WP1	Critical	SQL Injection in `ORDER BY` Clause via Unvalidated `sort_direction`
GUM-01-008 WP1	Low	IDOR in Email Unsubscribe Functionality
GUM-01-009 WP1	Low	IDOR/BOLA in Affiliate Request Approval
GUM-01-010 WP1	Low	IDOR in Mobile Preorder Attributes API with Hardcoded Mobile Token
Miscellaneous
GUM-01-002 WP1	Info	Weak Host Validation in `isValidHost` Function
GUM-01-006 WP1	Info	Stored XSS via Unsanitized Third-Party Analytics Snippets
GUM-01-011 WP1	Low	Unauthenticated Purchase Unsubscribe via IDOR in `PurchasesController`
GUM-01-012 WP1	Low	Potential XSS via Arbitrary HTML Upload to `files.gum`
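
For readers unfamiliar with the vulnerability class behind GUM-01-007, the sketch below shows why an unvalidated sort direction is dangerous and how an allowlist closes the hole. It is an illustration in Python with `sqlite3`, not Gumroad's code or the patch that was merged. The `products` table, `safe_direction`, and `list_products` are hypothetical names.

```python
import sqlite3

# Only these two keywords may ever appear after ORDER BY.
ALLOWED_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def safe_direction(sort_direction: str) -> str:
    """Map user input onto a fixed keyword; anything else falls back to ASC."""
    return ALLOWED_DIRECTIONS.get(sort_direction.strip().lower(), "ASC")


def list_products(conn: sqlite3.Connection, sort_direction: str) -> list[str]:
    # Bound parameters cannot stand in for SQL keywords, so the direction has
    # to be interpolated. Only an allowlisted constant reaches the query text.
    query = (
        "SELECT name FROM products "
        f"ORDER BY price {safe_direction(sort_direction)}"
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("ebook", 10), ("course", 50)],
)

cheapest_first = list_products(conn, "asc")
priciest_first = list_products(conn, "DESC")
# Concatenated directly, this input would append attacker-chosen SQL to the
# ORDER BY clause. With the allowlist it is discarded in favour of ASC.
hostile = list_products(conn, "ASC, (SELECT 1)")
print(cheapest_first, priciest_first, hostile)
```

The allowlist is used here because placeholders only bind values, not keywords or identifiers, so parameterization alone does not protect an `ORDER BY` direction.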

Patching

For confirmed issues, Hacktron generated suggested code-level patches and submitted Pull Requests, which were then merged by the Gumroad team.

Diff

About Us

Our security research team is world-class: top-ranked CTF competitors, DEF CON-published researchers, acclaimed creators, and leading bug bounty hunters. We’ve hacked everything from browsers and operating systems to mobile apps, desktop software, and massive web platforms.

Chances are, you’ve used something we’ve helped make more secure.

We’re now channeling that expertise into Hacktron: AI agents that bring real offensive capability into every stage of the software lifecycle.

Meet our team at https://www.hacktron.ai/#team.

Hacktron is currently in private beta. We’re running pilots with select partners to refine our AI agents and build out our platform.

Please contact us at app.hacktron.ai/contact or hello@hacktron.ai to discuss how we can help secure your product. You can also join our waitlist to be notified when we open up access more broadly.

¹ Read more about our approach to building hacking agents at https://www.hacktron.ai/blog/posts/how-ai-can-hack/.
² Pentesting today mostly revolves around finding variants of already known vulnerabilities. If we look at the top 10 reports from HackerOne, Bugcrowd, or Synack over the past decade, the same issues keep repeating, just in different places. Cobalt’s 2024 State of Pentesting report backs this up: the numbers on page 10 strongly support the estimate that around 90% of current web security issues are repetitions. And even within the remaining 10%, about half are trickle-down issues from new variants first uncovered by security researchers.
³ Our founding team has a proven track record of impactful security research, with work published across platforms like httpvoid.com, Electrovolt, s1r1us.ninja, analogue.computer and liveoverflow. Collectively, we’ve uncovered numerous high-impact vulnerabilities, including CVE-2023-38693, CVE-2023-32079, CVE-2023-32078, CVE-2023-32077, CVE-2020-11053, CVE-2022-25763, CVE-2022-28129, CVE-2022-1705, CVE-2022-32213, CVE-2022-32214, CVE-2022-32215, CVE-2022-24790, CVE-2022-24801, CVE-2022-24766, CVE-2022-24761, CVE-2023-3431, CVE-2023-3432, CVE-2021-41097, CVE-2022-29247, and CVE-2021-43908.
