Keynote: Discovering the Power of AI Pentesting with Pedro Conde (Ethiack)

The Talsec Mobile App Security Conference in Prague was a two-day, invite-only event on fraud, malware, and API abuse in modern mobile apps, held at Chateau St. Havel on November 3–4, 2025, and hosted by Talsec, freeRASP, and partners. It brought together leading experts and practitioners to strengthen the mobile AppSec community, connect engineers with attackers and defenders, and share practical techniques for high‑stakes sectors like banking, fintech, and e‑government.

Why AI Pentesting Now?

Pedro Conde, an AI Scientist at Ethiack specializing in autonomous ethical hacking, delivered a compelling presentation on the power of AI pentesting, outlining three key objectives: to demystify AI pentesting, to demonstrate the current capabilities of these systems, and to emphasize that AI systems are already very capable and "different from human beings".

Conde provided historical context for the rise of AI pentesting, noting the progression from classical machine learning to deep learning, then to Large Language Models (LLMs), and finally to Agentic AI, the category into which AI pentesting systems fall. Agentic AI systems often use LLMs as a base but can also interact with the environment, going beyond reasoning, prediction, and generation alone. These fully autonomous ethical hacking systems, which Ethiack calls "hackbots," can perform a complete pentesting session, including finding vulnerabilities, without human intervention.

This autonomy offers advantages such as continuous 24/7 testing, high scalability through parallelization, and the ability to dynamically adapt to targets.

How Hackbots Work Under the Hood

Conde detailed the four main building blocks of a robust hackbot system:

- Brains: multiple interacting LLMs that handle central reasoning, planning, and decision-making.
- Structure: the skeleton that hosts and coordinates the agents, manages memory, and keeps the system efficient.
- Prompts: the layer that translates human objectives into agent behavior and keeps goals aligned.
- Tools: extensions that let agents interact with the environment, perform actions such as running scripts, and validate outputs.
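Ethiack did not share implementation details, but the way these four blocks typically compose into an agentic loop can be sketched roughly as follows; every name, the prompt wording, and the `call_llm` placeholder below are illustrative assumptions, not Ethiack's code.

```python
# Hypothetical sketch of an agentic pentesting loop assembled from the four
# blocks Conde described: brains (LLM reasoning), structure (coordination and
# memory), prompts (goal alignment), and tools (environment interaction).
# `call_llm`, the tool functions, and all names here are placeholders.

SYSTEM_PROMPT = (            # 'prompts': translate human objectives into behavior
    "You are an ethical-hacking agent. Stay in scope, plan one step at a time, "
    "and report every finding together with the evidence that supports it."
)

TOOLS = {                    # 'tools': concrete actions against the target
    "http_probe": lambda url: f"(probe result for {url})",
    "run_script": lambda code: f"(output of {code})",
}

def call_llm(system: str, messages: list[dict]) -> dict:
    """Placeholder for the 'brains': one or more LLMs doing planning and decisions."""
    raise NotImplementedError

def agent_loop(objective: str, max_steps: int = 20) -> list[dict]:
    memory: list[dict] = [{"role": "user", "content": objective}]    # 'structure'
    findings: list[dict] = []
    for _ in range(max_steps):
        decision = call_llm(SYSTEM_PROMPT, memory)        # plan the next action
        if decision.get("done"):
            break
        observation = TOOLS[decision["tool"]](decision["argument"])  # act on the target
        memory.append({"role": "tool", "content": observation})      # remember the result
        if decision.get("finding"):
            findings.append(decision["finding"])
    return findings
```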

A major limitation of AI systems, especially in pentesting, is 'AI hallucinations,' particularly false positives. Ethiack combats this by using deterministic tools and a specialized 'verifier' agent. The verifier takes a step back to reflect on the hackbot's reasoning, challenges and rechecks conclusions, and filters out weak or flawed inferences, which significantly decreases the false positive rate and increases precision.
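The mechanics of the verifier were not shown, but conceptually it is a second, skeptical pass over each candidate finding before it is reported. A rough sketch under that assumption (the prompt wording and the `call_llm` placeholder are invented for illustration):

```python
# Sketch of a second-pass verifier that challenges a candidate finding.
# Prompt wording and the `call_llm` placeholder are assumptions.

VERIFIER_PROMPT = (
    "You are a skeptical reviewer. Re-check every inference step the agent made "
    "for this suspected vulnerability. Answer 'confirmed' only if the raw "
    "evidence actually supports the conclusion."
)

def call_llm(system: str, messages: list[dict]) -> dict:
    """Same placeholder LLM call as in the loop sketch above."""
    raise NotImplementedError

def verify_finding(finding: dict, evidence: str) -> bool:
    """Reflect on the hackbot's reasoning and filter weak or flawed inferences."""
    verdict = call_llm(VERIFIER_PROMPT, [{
        "role": "user",
        "content": f"Finding: {finding}\nEvidence:\n{evidence}",
    }])
    return verdict.get("content", "").strip().lower() == "confirmed"

# Only findings that survive this check are reported, which is what drives the
# false-positive rate down:
# reported = [f for f in findings if verify_finding(f, f["evidence"])]
```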

Additionally, to prevent destructive behavior, a three-layered guardrail system is used: a prompt-level guardrail shaping model behavior with clear instructions, a deterministic filter for rule-based checks on environmental interactions, and a third-layer LLM agent for contextual judgment on complex cases.
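As a hedged sketch of how such layering might be wired together (the rule list, the judge prompt, and the `call_llm` placeholder are all assumptions, not Ethiack's actual guardrails):

```python
# Sketch of a three-layer guardrail in front of every environment interaction.

DESTRUCTIVE_PATTERNS = ("rm -rf", "DROP TABLE", "shutdown")   # layer 2 rules (assumed)

JUDGE_PROMPT = (
    "Decide whether this proposed action could damage the target or leave it in "
    "a worse state. Answer 'allow' or 'block'."
)

def call_llm(system: str, messages: list[dict]) -> dict:
    """Placeholder LLM call, as in the sketches above."""
    raise NotImplementedError

def action_allowed(action: str) -> bool:
    """Return True if the proposed action may be executed against the target."""
    # Layer 1 (prompt-level) lives in the agent's system prompt, so any action
    # reaching this function has already been shaped by those instructions.

    # Layer 2: deterministic, rule-based filter over the concrete action.
    if any(pattern in action for pattern in DESTRUCTIVE_PATTERNS):
        return False

    # Layer 3: an LLM judge for the ambiguous cases the fixed rules cannot cover.
    verdict = call_llm(JUDGE_PROMPT, [{"role": "user", "content": action}])
    return verdict.get("content", "").strip().lower().startswith("allow")
```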

Hackian: Real‑World Demo

The presentation featured Ethiack's hackbot, "Hackian," and the story of how it "absolutely demolished" a genetics research platform called Genequest during a DEF CON challenge. Hackian achieved a full system compromise in under four hours, finding two critical vulnerabilities, including one that neither the human pentesters nor the challenge organizers were aware of.

Hackian first bypassed front-end registration restrictions by hitting the register endpoint directly, mapped the microservices ecosystem, and then exploited a debug endpoint in the DNA analysis service that was vulnerable to command execution.
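The exact requests were not published, but the first step is the classic lesson that front-end restrictions are not a security boundary: whatever the UI hides or blocks, the API will accept a direct call unless the server validates it. A minimal illustration with Python's requests library against a made-up host and payload:

```python
import requests

# Illustrative only: the real Genequest host, endpoint, and fields were not
# published. The point is that a UI-level restriction does nothing once the
# API is called directly; only server-side validation counts.
resp = requests.post(
    "https://target.example/api/register",          # assumed endpoint
    json={"username": "researcher", "password": "example-only", "role": "user"},
    timeout=10,
)
print(resp.status_code, resp.text)
```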

The second critical bug allowed Hackian to read arbitrary files on the system (such as /etc/passwd) by sending file paths to the /analyze endpoint, which passed them to Clojure's slurp function without validation. Conde concluded that the core message is not that AI systems are or will be better than humans, but that they are different and find different types of vulnerabilities, sometimes catching "quirks that humans may disregard". Organizations should therefore test their assets with these systems before the "bad guys" exploit them.
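To make that file-read pattern concrete: Clojure's slurp returns the contents of whatever path it is handed, so passing request input to it unchecked becomes an arbitrary-file-read primitive. The same anti-pattern, and one way to constrain it, sketched in Python with an assumed upload directory:

```python
from pathlib import Path

ALLOWED_DIR = Path("/srv/genequest/uploads").resolve()   # assumed data directory

def analyze_vulnerable(user_path: str) -> str:
    # Anti-pattern: the path comes straight from the request, so input like
    # "../../etc/passwd" or an absolute path reads arbitrary files.
    return Path(user_path).read_text()

def analyze_safe(user_path: str) -> str:
    # Resolve the candidate path and refuse anything that escapes the allowed
    # directory before reading it.
    resolved = (ALLOWED_DIR / user_path).resolve()
    if not resolved.is_relative_to(ALLOWED_DIR):          # Python 3.9+
        raise ValueError("path escapes the upload directory")
    return resolved.read_text()
```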

Currently, Ethiack's hackbot is focused on web applications, though future development may include mobile applications.

Thank you Pedro, Ethiack, and Hackian for showcasing how agentic AI can transform penetration testing and uncover vulnerabilities that traditional approaches miss. Your work pushes the boundaries of what ethical hacking can achieve and highlights why defenders must start thinking in terms of AI-native offensive capabilities as well!
