Research · 11 min read

Roll your own bot detection: server-side detection (part 2)

This is the second part of our series on building a lightweight, vendor-free anti-bot system to protect your login endpoint.

In Part 1, we focused on the client side: we designed a fingerprinting script that collects various signals from the browser, obfuscates the code, encrypts the payload, and injects it into a login request. That setup lays the groundwork, but on its own, it doesn’t improve security. It’s just instrumentation.

To actually defend against bots, we now need to do something with the fingerprint once it reaches the server.

That’s the focus of this article. We’ll show how to use the fingerprint for two practical defenses:

  1. Bot detection rules: We define simple but effective heuristics to identify suspicious fingerprints, such as inconsistencies, signs of automation, or known headless environments.
  2. Fingerprint-based rate limiting: We go beyond traditional IP-based rate limiting and use the fingerprint as a more resilient key to track abusive behavior, even when attackers rotate IPs.

This part of the system remains simple on purpose (it's still a toy project), but the ideas behind it mirror real-world production practices. We’ll highlight where things can break down, what assumptions are reasonable, and where to be cautious.

The full source code for this article and Part 1 is available on GitHub: castle/castle-code-examples.

Creating the server

We continue building on the toy website introduced in Part 1. This time, we set up a basic Express.js server to receive and process login requests. The choice of Express is incidental: our goal is to focus on the detection logic, not the framework. Everything presented here can be adapted to other backends, whether you use Python, Go, or another language.

The server exposes just two routes:

  1. A GET / route that serves the login page
  2. A POST /login route that handles login submissions, including the encrypted fingerprint
// server.js

const express = require('express');
const path = require('path');
const { sanitizeLoginData, loginRateLimiter, detectBot } = require('./lib/middlewares');

const app = express();
const PORT = 3010;

app.use(express.json());
app.use(express.urlencoded({ extended: true }));

app.use(express.static(path.join(__dirname, 'static')));

// Basic route for the root path - serve the login page
app.get('/', (req, res) => {
  res.sendFile(path.join(__dirname, 'static', 'login.html'));
});

// POST /login route 
// with middleware chain: 
// 1. sanitize -> 2. detectBot -> 3. loginRateLimiter -> 4. route handler
app.post('/login', 
    sanitizeLoginData, 
    detectBot, 
    loginRateLimiter, 
    async (req, res) => {
        const { email, password } = req.sanitizedData;

        // We always return success for this demo.
        // In a real implementation, you would validate credentials here.
        const isValidLogin = true || (email === 'test@test.com' && password === 'test');
        if (!isValidLogin) {
            return res.status(400).json({
                success: false,
                message: 'Invalid login attempt'
            });
        }
          
        // Set a session cookie
        res.cookie('session', 'fake-session-token-' + Date.now(), {
            httpOnly: true,
            secure: false, // Set to true in production with HTTPS
            sameSite: 'strict',
            maxAge: 24 * 60 * 60 * 1000 // 24 hours
        });
        
        
        // Return success response
        res.json({
            success: true,
            message: 'Login successful',
            fingerprintProcessed: true,
        });
    }
);

// Start the server
app.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
  console.log(`Static files are served from: ${path.join(__dirname, 'static')}`);
});

Middleware breakdown

When a POST request hits the /login route, we chain three middleware before executing the login logic:

  1. sanitizeLoginData: Validates the request payload. It checks for the presence of email and password, and attempts to decrypt the fingerprint. If successful, it attaches the parsed fingerprint to req.sanitizedData. If not, it rejects the request with a generic 400 error.
    • This step ensures we don't waste time or logs on clearly broken or malformed input.
    • Note: both client and server must use the same encryption key for decryption to work.
  2. detectBot: Applies fingerprint-based bot detection rules (detailed in the next section). If the fingerprint matches known automation patterns or shows signs of tampering, the request is rejected.
    • Returning 400 without any detail helps avoid leaking signal to attackers.
  3. loginRateLimiter: Implements rate limiting keyed off the fingerprint (rather than IP). This helps mitigate distributed attacks that rotate IPs but reuse the same device fingerprint. We'll go deeper into this below.
  4. Login handler: This is where credential validation would happen in a real app. Here, we simulate success unconditionally for demonstration purposes.

This flow gives us a layered defense: we sanitize, detect, and limit before we ever touch login logic or database queries.

Bot detection middleware

In this section, we focus on the detectBot middleware. Its role is to analyze the decrypted fingerprint attached to a login request and decide whether the environment shows signs of automation, spoofing, or inconsistency.

This middleware runs after payload sanitization and before rate limiting. At this stage, we assume the fingerprint is valid and decrypted, and we want to assess its trustworthiness using simple heuristic rules.

Here’s the core logic:

// in lib/middlewares.js

const detectBot = (req, res, next) => {
    console.log('Bot detection middleware executing...');
    
    const { fingerprint } = req.sanitizedData;
    
    // Perform bot detection
    const botDetection = isBot(fingerprint);
    req.isBot = botDetection.isBot;
    req.isOutdatedPayload = botDetection.isOutdatedPayload;
    
    if (botDetection.isBot || botDetection.isOutdatedPayload) {
        console.log('Bot detection: Bot detected');
        return res.status(400).json({
            success: false,
            message: 'Invalid login attempt'
        });
    }
    
    console.log('Bot detection: Human user detected');
    next();
};

This middleware delegates detection to the isBot function defined in lib/botDetection.js. That function applies a series of checks to the fingerprint and returns a verdict.

The goal here is not to provide a full taxonomy of detection methods, but to show where such logic can live and how it can evolve. You can define your own heuristics and plug them into this system.

Separately, we also check for staleness using isOutdatedPayload. If the payload is older than a defined threshold, we reject it, which helps mitigate replay attacks or delayed replays. Here are some of the tests we use:

function hasBotUserAgent(fingerprint) {
    const uaLower = fingerprint.userAgent.toLowerCase();
    return uaLower.includes('headless') || uaLower.includes('bot') || uaLower.includes('crawler') || uaLower.includes('spider');
}

function hasWebDriverTrue(fingerprint) {
    return fingerprint.webdriver;
}

function hasHeadlessChromeScreenResolution(fingerprint) {
    return (fingerprint.screen.width === 800 && fingerprint.screen.height === 600) || 
           (fingerprint.screen.availWidth === 800 && fingerprint.screen.availHeight === 600);
}

function hasPlaywright(fingerprint) {
    return fingerprint.playwright;
}

function hasCDPAutomation(fingerprint) {
    const cdpInMainContext = fingerprint.cdp;
    const cdpInWorker = fingerprint.worker.cdp;
    return cdpInMainContext || cdpInWorker;
}

function hasOSInconsistency(fingerprint) {
    return fingerprint.userAgent.includes('Win') && fingerprint.platform.includes('Mac');
}

function hasHighCPUCoresCount(fingerprint) {
    return fingerprint.cpuCores > 90;
}

function hasWorkerInconsistency(fingerprint) {
    if (!fingerprint.worker || fingerprint.worker.userAgent === 'NA') {
        return false;
    }

    const hasInconsistency = !(
        fingerprint.worker.webGLVendor === fingerprint.webgl.unmaskedVendor &&
        fingerprint.worker.webGLRenderer === fingerprint.webgl.unmaskedRenderer &&
        fingerprint.worker.userAgent === fingerprint.userAgent &&
        fingerprint.worker.languages === fingerprint.languages &&
        fingerprint.worker.platform === fingerprint.platform &&
        fingerprint.worker.hardwareConcurrency === fingerprint.cpuCores
    );

    return hasInconsistency;
}

function isOutdatedPayload(fingerprint, maxMinutes) {
    if (!fingerprint.timestamp) return true;
    const timestamp = new Date(fingerprint.timestamp);
    const now = new Date();
    const diff = now.getTime() - timestamp.getTime();
    return diff > 1000 * 60 * maxMinutes;
}
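These helpers are easy to extend with your own heuristics. As one illustrative example (not part of the original rule set): some headless setups have been observed reporting an empty navigator.languages, which can serve as a weak automation signal.

```javascript
// Illustrative extra rule (hypothetical, not in the original code):
// flag fingerprints that report no browser languages at all.
function hasEmptyLanguages(fingerprint) {
    return !fingerprint.languages || fingerprint.languages.length === 0;
}
```

Plugging it in is just one more entry in the botDetectionChecks object shown below, e.g. `emptyLanguages: safeEval(hasEmptyLanguages, fingerprint)`. Treat it as a weak signal: legitimate privacy tools can also strip languages, so it works best combined with other checks rather than as a standalone block rule.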

All detection functions are wrapped in safeEval to prevent a single faulty value from crashing the logic:

function isBot(fingerprint) {
    const safeEval = (fn, ...args) => {
        try {
            return fn(...args);
        } catch (e) {
            return false;
        }
    };

    const botDetectionChecks = {
        botUserAgent: safeEval(hasBotUserAgent, fingerprint),
        webdriver: safeEval(hasWebDriverTrue, fingerprint),
        headlessChromeScreenResolution: safeEval(hasHeadlessChromeScreenResolution, fingerprint),
        playwright: safeEval(hasPlaywright, fingerprint),
        cdp: safeEval(hasCDPAutomation, fingerprint),
        osInconsistency: safeEval(hasOSInconsistency, fingerprint),
        workerInconsistency: safeEval(hasWorkerInconsistency, fingerprint),
        highCPUCoresCount: safeEval(hasHighCPUCoresCount, fingerprint),
    };

    return {
        isBot: Object.values(botDetectionChecks).some(check => check),
        numChecks: Object.values(botDetectionChecks).filter(check => check).length,
        checks: botDetectionChecks,
        isOutdatedPayload: safeEval(isOutdatedPayload, fingerprint, 15)
    };
}

module.exports = {
    isBot
};

The isBot function returns a structured object: the overall isBot verdict, the number of checks that fired (numChecks), the per-check results (checks), and a separate isOutdatedPayload flag.

If isBot or isOutdatedPayload is true, we stop the request and return a generic error. This avoids giving feedback that could help attackers tune their spoofing.

This setup gives you a foundation that’s easy to extend: you can add more rules, refine your thresholds, or change your verdict logic, all without touching the rest of your login flow.

Fingerprint based rate limiter

Our fingerprint-based rate limiter builds on the express-rate-limit package. By default, this package limits traffic using the IP address as the aggregation key—but that isn’t sufficient when facing credential stuffing or bot attacks that rotate IPs. Fortunately, express-rate-limit exposes a keyGenerator option, which allows us to use a custom key instead. That’s where the fingerprint comes in.

Why not rely on IP alone?

IP-based rate limiting is still useful and should remain part of your defense stack. It makes attackers pay more to scale their operation, since they need access to residential or proxy IPs. But once they rotate IPs, which they often do, IP-based limits lose their effectiveness. A fingerprint-based rate limiter adds an additional layer: instead of counting attempts per IP, we count them per device fingerprint. This helps catch distributed attacks that reuse the same environment while hopping across IPs.

Implementation

Here’s how our fingerprint-based limiter is configured. We apply a threshold of 50 attempts per fingerprint within a 15-minute window. When the limit is exceeded, we reject the request with a 400 response.

const loginRateLimiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    limit: 50, // Limit each fingerprint to 50 login attempts per 15 minutes
    keyGenerator: (req) => {
        // Compute fingerprint hash directly here
        if (req.sanitizedData && req.sanitizedData.fingerprint) {
            const rateLimitHash = computeRateLimitFingerprintHash(req.sanitizedData.fingerprint);
            return rateLimitHash;
        }
        // Return a default key if no fingerprint is available
        return 'default-key';
    },
    handler: (req, res) => {
        console.log('Login route handler: Rate limit exceeded');
        return res.status(400).json({
            success: false,
            message: 'Invalid login attempt'
        });
    },
    skip: (req) => {
        // Skip rate limiting if it's a bot (bots are handled separately)
        return req.isBot === true;
    },
    requestPropertyName: 'rateLimit', // Attach rate limit info to req.rateLimit
    standardHeaders: true, // Enable RateLimit headers
    legacyHeaders: false, // Disable X-RateLimit headers
});

Tuning the thresholds

Rate limiting always involves tradeoffs. A short window with a low threshold can block bursty attacks quickly, but may miss low-and-slow ones. A long window with a higher threshold catches slower attempts but increases the risk of false positives.

There’s no universal answer here; you’ll need to calibrate your limits based on real traffic data. A common strategy is to use multiple limiters in parallel, for example a short window with a low threshold to absorb bursts alongside a longer window with a higher threshold to catch sustained abuse.
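The parallel-limiter idea can be sketched with a minimal in-memory counter (fixed windows, illustrative thresholds, no eviction). In the real app you would simply chain two express-rate-limit instances with different windowMs/limit values on the route.

```javascript
// Minimal sketch of layered limits. Window sizes and limits are
// made-up values for illustration; tune them against real traffic.
const buckets = new Map(); // `${key}:${windowMs}:${windowIndex}` -> count

function allow(key, windowMs, limit, now = Date.now()) {
    const bucketId = `${key}:${windowMs}:${Math.floor(now / windowMs)}`;
    const count = (buckets.get(bucketId) || 0) + 1;
    buckets.set(bucketId, count);
    return count <= limit;
}

// A request passes only if it fits in BOTH windows.
function allowLayered(key, now = Date.now()) {
    const burstOk = allow(key, 60 * 1000, 10, now);            // 10 per minute
    const sustainedOk = allow(key, 60 * 60 * 1000, 100, now);  // 100 per hour
    return burstOk && sustainedOk;
}
```

The short window stops bursts quickly, while the long window eventually catches low-and-slow attacks that stay under the per-minute limit.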

How we hash the fingerprint

Let’s revisit the keyGenerator logic. It calls computeRateLimitFingerprintHash to transform the raw fingerprint into a stable, spoof-resistant key:

keyGenerator: (req) => {
    // Compute fingerprint hash directly here
    if (req.sanitizedData && req.sanitizedData.fingerprint) {
        const rateLimitHash = computeRateLimitFingerprintHash(req.sanitizedData.fingerprint);
        console.log('Rate limiter: Hash computed:', rateLimitHash);
        return rateLimitHash;
    }
    // Return a default key if no fingerprint is available
    return 'default-key';
}

Now, why not just hash the entire fingerprint with JSON.stringify? Because in practice, attackers randomize attributes to evade detection—especially the user agent, which is one of the easiest values to spoof.

If we included the entire stringified fingerprint, changing a single character in the user agent would completely change the hash. That would make the rate limiter trivial to bypass.

Instead, we want to build a resilient aggregation key: one that ignores noisy or attacker-controlled attributes, but still captures enough structure to link similar environments.

Strategy

We apply a simple principle when selecting fields for the hash: exclude attributes that are noisy or trivially randomized (like the user agent, or the canvas hash when a randomizing extension is detected), and keep attributes that are stable across sessions and costly to forge consistently (hardware, screen, and worker-level signals).

This helps ensure that devices with slightly different but forged environments still map to the same rate-limiting bucket.

function safeConvertToString(value) {
    if (value === null || value === undefined) {
        return 'NA';
    }
    return value.toString();
}
		
function computeRateLimitFingerprintHash(fingerprint) {
    const dataHash = [
        // We don't use the user agent since it can be spoofed too easily

        safeConvertToString(fingerprint.cpuCores),
        safeConvertToString(fingerprint.deviceMemory),
        safeConvertToString(fingerprint.language),
        safeConvertToString(fingerprint.languages),
        safeConvertToString(fingerprint.timezone),
        safeConvertToString(fingerprint.platform),
        safeConvertToString(fingerprint.maxTouchPoints),
        safeConvertToString(!!fingerprint.webdriver),
        safeConvertToString(fingerprint.webgl.unmaskedRenderer),
        safeConvertToString(fingerprint.webgl.unmaskedVendor),

        // Screen-related signals
        safeConvertToString(fingerprint.screen.width),
        safeConvertToString(fingerprint.screen.height),
        safeConvertToString(fingerprint.screen.colorDepth),
        safeConvertToString(fingerprint.screen.availWidth),
        safeConvertToString(fingerprint.screen.availHeight),   
        safeConvertToString(fingerprint.playwright),
        safeConvertToString(fingerprint.cdp),

        // Worker signals
        safeConvertToString(fingerprint.worker.webGLVendor),
        safeConvertToString(fingerprint.worker.webGLRenderer),
        safeConvertToString(fingerprint.worker.languages),
        safeConvertToString(fingerprint.worker.platform),
        safeConvertToString(fingerprint.worker.hardwareConcurrency),
        safeConvertToString(fingerprint.worker.cdp),

        // If the canvas is randomized, we don't use the hash, we just ignore it to make the fingerprint more stable
        fingerprint.canvas.hasAntiCanvasExtension || fingerprint.canvas.hasCanvasBlocker ? 'IGNORE' : fingerprint.canvas.hash,
    ];
    const hash = crypto.createHash('sha256').update(dataHash.join('')).digest('hex');

    return hash;
}

Of course, this field selection is subjective. Everything client-side can be modified. But in practice, certain attributes (like the user agent or languages) are modified far more often than others, and thus make poor keys for long-lived tracking or rate limiting.

Possible improvements

The techniques introduced across these two articles (client-side fingerprinting, payload encryption, bot heuristics, and fingerprint-based rate limiting) are designed to be practical foundations for real-world bot detection. While the implementation itself is a proof of concept, the concepts are production-relevant and can serve as a lightweight first layer of protection.

Used correctly, this layer can help block obvious automated traffic before handing off requests to more expensive third-party detection systems. This not only reduces operational cost but also filters low-effort attacks early.

That said, there’s plenty of room to harden and extend this setup.

Strengthening client-side defenses

Basic obfuscation isn’t enough. The POC uses obfuscator.io via Webpack. While this helps deter casual analysis, it’s not robust against skilled reverse engineers. Tools like deobfuscate.io are designed specifically to unravel common obfuscation patterns, and will likely succeed against ours. For production, you’d need deeper protections, runtime integrity checks, and potentially VM-based obfuscation.

Static logic is a weakness. Our current script behaves the same for every user. The encryption key is hardcoded and constant, and the payload structure is predictable. An attacker could hook the encryption logic, replay it with forged values, and produce a "valid" payload without executing the real signal collection. A more resilient system would rotate keys per session or per user, ideally tying encryption to server-issued tokens or session secrets.

Fingerprint depth and tamper signals are limited. The current signal set is narrow: mostly browser- and hardware-level attributes. A more complete implementation would collect a broader set of signals and include explicit tamper-detection checks on the collection script itself.

Hardening server-side logic

Use multiple rate-limiting windows. The current fingerprint-based rate limiter operates with a single window and threshold. In practice, a layered rate limiter is more effective: a strict short-window limit to absorb bursts, combined with a looser long-window limit to catch low-and-slow attacks.

Limit based on failed attempts. Right now, rate limits are applied to all login attempts. A more forgiving approach would only count the failed ones. This allows for repeated legitimate logins without penalty while still catching brute force patterns. For example, a fingerprint with many failed attempts across rotating IPs could be temporarily blocked without affecting valid users.
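A failure-only counter can be sketched like this (the threshold is a hypothetical value):

```javascript
// Count only failed attempts per fingerprint; reset on success.
const failedAttempts = new Map(); // fingerprintHash -> consecutive failures
const MAX_FAILED = 10;            // illustrative threshold

function recordLoginResult(fingerprintHash, success) {
    if (success) {
        failedAttempts.delete(fingerprintHash); // legitimate logins never accumulate
        return true;
    }
    const count = (failedAttempts.get(fingerprintHash) || 0) + 1;
    failedAttempts.set(fingerprintHash, count);
    return count <= MAX_FAILED; // false -> temporarily block this fingerprint
}
```

In a real deployment this would live in the login handler (where the credential check outcome is known) rather than in a pre-handler middleware, and the counter would need a TTL so blocks eventually expire.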

Tune thresholds based on fingerprint popularity. Not all fingerprints are equally rare. Many iPhones, for example, share near-identical environments. A static threshold might block those users too aggressively. Ideally, rate limits should be adaptive: fingerprints that appear rarely can be rate-limited more aggressively than ones that are common and tied to legitimate traffic.
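An adaptive threshold could be sketched like this (the popularity tiers and multipliers are made-up values for illustration):

```javascript
// Scale the rate limit by how common a fingerprint is in recent traffic.
const seenCounts = new Map(); // fingerprintHash -> sightings in recent window

function adaptiveLimit(fingerprintHash, baseLimit = 50) {
    const popularity = seenCounts.get(fingerprintHash) || 0;
    // Common fingerprints (e.g. stock iPhones) get more headroom;
    // rare ones keep the strict base limit.
    if (popularity > 1000) return baseLimit * 4;
    if (popularity > 100) return baseLimit * 2;
    return baseLimit;
}
```

This plugs naturally into express-rate-limit's `limit` option, which also accepts a function of the request, so the threshold can be computed per fingerprint at request time.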

Expand the detection rules. The isBot function is deliberately minimal, but a production system should go further, layering additional signals rather than relying on any single check.

These aren’t about absolute accuracy; they’re about layering heuristics to increase confidence without overfitting.

Broader architectural needs

Lack of visibility is dangerous. One major gap in the current system is observability. In production, you need to understand why requests were blocked, especially for debugging or tuning purposes. This means logging which rules fired on each blocked request and tracking block rates over time.

Even basic dashboards can provide early warning signs of misclassifications.

No risk-based context. Every user in the current system is treated the same. But user context matters: a login from a device the user has signed in with before carries far less risk than a first-seen fingerprint on a brand-new account.

A system with adaptive risk scoring would treat new or risky contexts more cautiously, while allowing known users some leeway.
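As a sketch of what adaptive risk scoring might look like (signals and weights are purely illustrative):

```javascript
// Simple additive risk score over contextual signals.
// All field names and weights are hypothetical.
function riskScore(ctx) {
    let score = 0;
    if (ctx.newDevice) score += 2;            // fingerprint never seen for this user
    if (ctx.newCountry) score += 2;           // geo differs from the user's history
    if (ctx.recentFailedLogins > 3) score += 3;
    if (ctx.accountAgeDays < 1) score += 1;   // brand-new account
    return score;
}

// Callers pick thresholds, e.g. score >= 4 -> require step-up verification.
```

The point is not the specific weights but the structure: known-good context subtracts friction, risky context adds it, and enforcement decisions key off the combined score instead of any single signal.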

Detection logic should be decoupled. Currently, all detection logic is embedded in route middleware. This makes deployment risky: one logic error could block all logins. A better approach is to externalize detection logic into a decision engine or policy layer. Ideally, it should support dry runs, logging, and staged rollout so you can measure impact before enforcing rules.
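A minimal sketch of such a policy layer with a dry-run mode (the rule names and policy shape are hypothetical):

```javascript
// Decouple rules from enforcement with a per-rule mode flag.
// 'monitor' rules are logged but never block; 'enforce' rules block.
function evaluatePolicies(fingerprint, policies, log = console.log) {
    for (const policy of policies) {
        if (!policy.test(fingerprint)) continue;
        log(`policy matched: ${policy.name} (mode=${policy.mode})`);
        if (policy.mode === 'enforce') return { blocked: true, policy: policy.name };
    }
    return { blocked: false };
}

// New rules start in 'monitor' and graduate to 'enforce' after review:
const policies = [
    { name: 'webdriver', mode: 'enforce', test: fp => !!fp.webdriver },
    { name: 'empty-languages', mode: 'monitor', test: fp => !fp.languages?.length },
];
```

With this structure, shipping a new rule is safe by default: it produces logs you can review for false positives before you flip its mode, and a bad rule can be demoted without redeploying the login route.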

These improvements are not exhaustive. But they highlight the difference between a basic anti-bot filter and a production-grade detection system. Moving toward the latter means not just better rules, but safer deployments, better observability, and a system that can evolve with attacker behavior.
