Headless Chrome bots powered by Puppeteer are a popular choice among bot developers. The Puppeteer API’s ease of use, combined with the lightweight nature of Headless Chrome, makes it a preferred tool over its full-browser counterpart. It is commonly used for web scraping, credential stuffing attacks, and the creation of fake accounts.
In this article, we discuss how to detect headless Chrome bots instrumented with Puppeteer using HTTP headers and JavaScript fingerprinting signals.
TL;DR detection techniques:
If you are just interested in the code of the detection techniques, you can have a look at the code snippet below. The remainder of this article goes into the details of these techniques and explains how some of them can be bypassed by attackers. The 3 detection techniques work as follows:
- Using the user agent, collected from the HTTP headers or with navigator.userAgent in JS, to detect user agents linked to Headless Chrome: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36
- Detecting if navigator.webdriver == true in JavaScript
- Detecting the side effects of CDP (Chrome DevTools Protocol), the protocol used by Puppeteer to instrument Headless Chrome
// 3 techniques to detect Headless Chrome bots instrumented with Puppeteer:
// Detection technique 1 based on the user agent
if (navigator.userAgent.includes("HeadlessChrome")) {
  console.log('Headless Chrome detected!');
}
// Detection technique 2 based on navigator.webdriver
if (navigator.webdriver) {
  console.log('Headless Chrome detected!');
}
// Detection technique 3 based on a serialization side effect of the Chrome DevTools Protocol
var e = new Error();
Object.defineProperty(e, 'stack', {
  get() {
    console.log('Headless Chrome detected!');
  }
});
// This is part of the detection; the console.log shouldn't be removed!
console.log(e);
Technique 1: Detecting unmodified Headless Chrome using the user agent
To illustrate our detection techniques, we create a local server that listens on port 4006. We use it both to collect information about HTTP headers and to serve pages that run JS browser fingerprinting challenges.
We create a simple Puppeteer bot that uses Headless Chrome, visits localhost:4006/http_headers, and prints its own HTTP headers returned by the server.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost:4006/http_headers');
  await new Promise(resolve => setTimeout(resolve, 2000));
  const httpHeaders = await page.evaluate(() => {
    return document.body.textContent;
  });
  console.log(JSON.stringify(JSON.parse(httpHeaders), null, 2));
  await browser.close();
})();
We obtain the following HTTP headers:
{
"host": "localhost:4006",
"connection": "keep-alive",
"sec-ch-ua": "\"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"macOS\"",
"accept-language": "en-US,en;q=0.9",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"sec-fetch-site": "none",
"sec-fetch-mode": "navigate",
"sec-fetch-user": "?1",
"sec-fetch-dest": "document",
"accept-encoding": "gzip, deflate, br, zstd"
}
We notice that by default, Headless Chrome bots instrumented with Puppeteer have a user agent that contains the HeadlessChrome substring:
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/131.0.0.0 Safari/537.36",
Thus, the first detection technique is to check whether the user agent contains the HeadlessChrome substring. Note that the user agent can be collected both on the server side (it is an HTTP header) and in the browser using JavaScript:
if (navigator.userAgent.includes("HeadlessChrome")) {
  console.log('Headless Chrome detected!');
}
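The same check can run on the server side against the user-agent header. A minimal sketch for a Node server follows; the helper name isHeadlessUserAgent is ours, not from the article:

```javascript
// Hypothetical server-side counterpart of the check above: inspect the
// User-Agent HTTP header before serving the page.
function isHeadlessUserAgent(userAgent) {
  return typeof userAgent === 'string' && userAgent.includes('HeadlessChrome');
}

// Example usage inside an HTTP request handler:
// if (isHeadlessUserAgent(req.headers['user-agent'])) { /* block or challenge */ }
```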
Technique 2: Detecting modified Headless Chrome using navigator.webdriver
When it comes to bot detection, the user agent is often the first fingerprinting attribute modified by attackers to hide their presence. In the case of Puppeteer, the user agent can be modified as follows:
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36');
Once changed, the user agent no longer contains the HeadlessChrome substring:
navigator.userAgent
// -> returns 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36'
However, we can still detect the presence of a bot using the navigator.webdriver attribute. It is a standard JavaScript property, available in modern browsers, that indicates whether a browser is being controlled by automation software (such as Selenium, Puppeteer, or other web automation tools).
if (navigator.webdriver) {
  console.log('Headless Chrome detected!');
}
Technique 3: Detecting modified Headless Chrome bots using CDP
As you can imagine, most attackers also try to get rid of the navigator.webdriver = true signal to avoid being detected as a bot.
The simplest way to erase this discriminating signal is to use the --disable-blink-features=AutomationControlled Chrome command-line argument.
In Puppeteer, the argument can be passed when creating the Headless Chrome browser instance:
const browser = await puppeteer.launch({args: ['--disable-blink-features=AutomationControlled']});
With this argument, the navigator.webdriver property doesn't return true anymore:
navigator.webdriver
// -> returns false
Thus, to detect bots that modify their fingerprint, we need another approach. One of the most popular approaches in 2025 is called CDP detection. It leverages the fact that under the hood, Puppeteer uses CDP (Chrome DevTools Protocol) to communicate with Headless Chrome. During this communication, Puppeteer needs to serialize data to send it over WebSocket, which may have unintended side effects. Thus, the purpose of the JavaScript challenge below is to trigger an observable side effect when the CDP serialization occurs.
var e = new Error();
Object.defineProperty(e, 'stack', {
  get() {
    console.log('Headless Chrome detected!');
  }
});
// This is part of the detection; the console.log shouldn't be removed!
console.log(e);
If the console.log contained in the get function is called, it means that the error object was serialized, which happens only when CDP is used. Thus, it can be used to detect bot automation frameworks like Puppeteer that use CDP under the hood. Note that one side effect of this detection technique is that it will also flag human users who have DevTools open. The console.log(e) statement is part of the challenge, since it is what triggers the serialization in CDP; it shouldn't be removed.
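One way to package this technique as a reusable check is to wrap the getter trap in a promise that resolves when the serialization side effect fires. This is a sketch under our own assumptions (the function name and timeout value are illustrative, not from the article):

```javascript
// Sketch: wrap the CDP serialization side effect in a promise. It resolves
// true if the 'stack' getter fires (something serialized the error), and
// false after a short timeout otherwise. Note that anything that inspects
// the error (e.g. an open DevTools console) can also trigger the getter.
function detectCDP(timeoutMs = 100) {
  return new Promise((resolve) => {
    const e = new Error();
    Object.defineProperty(e, 'stack', {
      get() {
        resolve(true);
        return '';
      },
    });
    // Triggers the serialization; must not be removed.
    console.log(e);
    setTimeout(() => resolve(false), timeoutMs);
  });
}

// Example usage:
// detectCDP().then((isBot) => { if (isBot) console.log('Headless Chrome detected!'); });
```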
Bot detection: A never-ending cat and mouse game?
When it comes to credential stuffing, carding, and creating fake accounts, bot developers don't give up easily. They try to remain undetected by lying about all the attributes commonly used for fraud and bot detection. In particular, they don't stop at the user agent, navigator.webdriver, or CDP detection. They lie about their canvas fingerprint, generate human-like mouse movements to appear more human, and leverage residential proxies to get a better IP reputation.
Moreover, attackers no longer need to be bot experts to develop sophisticated bots. They have access to open-source frameworks, such as Nodriver and Selenium Driverless, that provide near-perfect human fingerprints. They can also benefit from residential proxy services that give them access to millions of residential IP addresses, and from AI-based CAPTCHA farms that can automatically solve CAPTCHAs at scale for a few cents!
Thus, detecting sophisticated bots in 2025 is a full-time job that goes beyond fingerprinting. It's key to continuously adapt to the latest techniques used by attackers and to leverage all available signals, such as user behavior, IP reputation, proxy detection, and contextual signals, using real-time machine learning.