In movies, hackers seemingly get into any system by manually guessing the password; in real life, bot-generated traffic is behind much of the fraud on the internet. It used to be easy to detect bots. Real users navigate your site with a mouse and a keyboard (or nowadays, a finger and a touch device), but bots have neither arms nor fingers. If a web event, say a login attempt, was made without any type of mouse or keyboard interaction, then that event was most likely generated by a bot.
Things became more complicated with the advent of the headless browser: a scriptable browser that can interact with a web page just like a regular user. Even though it's easy to instruct a headless browser to generate mouse interactions, it's not straightforward to generate human-looking interactions. This changed the bot-detection problem from simply having to check whether there were any mouse interactions to trying to detect if the interactions looked human or bot-like.
In this post, we'll take a look at three types of bot-generated mouse interactions and we'll discuss how these can be automatically detected. To have something to compare against, we asked the Castle team to generate some real, human mouse interactions, and in each pair of mouse patterns below one is from a bot and one is from a human. See if you can figure out which is which!
Here is our first pair of mouse paths: one was made by a human and one by a bot. We track these mouse movements with the mousemove event, a built-in event fired by all modern browsers. Each point in the paths below marks the location of the mouse pointer when a mousemove event was triggered. Can you guess if path 1A or 1B was generated by a bot?
Yeah, this was too easy! Unless you're a bot yourself, I think you'll agree that path 1A looks suspicious. However, many simple bots do generate mouse paths that look like 1A. This is because, when programming a bot, the easiest way of moving the mouse around is simply to give it the coordinates it should be at. This results in "impossible" mouse movements where the mouse pointer instantly jumps between two locations. A human, on the other hand, moves the mouse pointer by pushing it around, and this generates relatively smooth paths which trigger many mousemove events.
A simple way of detecting bots like the one that generated 1A is to count the number of triggered mousemove events and flag interactions where the count is impossibly low. A count that is not in the hundreds is generally suspicious and, in the paths above, the human mouse movement triggered 378 mousemove events while the bot triggered only four.
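As a rough illustration, the counting rule could be sketched like this in Python. The `MouseEvent` record layout, the function name, and the threshold of 100 events are assumptions made for this example, not a description of any production detector, and the "human" path is just a synthetic stand-in with 378 events.

```python
from typing import NamedTuple, Sequence


class MouseEvent(NamedTuple):
    """One recorded mousemove event (hypothetical record layout)."""
    t_ms: float  # timestamp in milliseconds
    x: float     # pointer X position in pixels
    y: float     # pointer Y position in pixels


def has_suspiciously_few_events(events: Sequence[MouseEvent],
                                min_events: int = 100) -> bool:
    """Flag an interaction whose mousemove count is below the threshold.

    min_events=100 is an assumed cut-off standing in for
    "not in the hundreds"; it would need tuning on real traffic.
    """
    return len(events) < min_events


# A bot that sets the pointer to four coordinates directly.
bot_path = [MouseEvent(0, 10, 10), MouseEvent(50, 400, 80),
            MouseEvent(100, 120, 300), MouseEvent(150, 500, 420)]

# Synthetic stand-in for a human path: 378 closely spaced events.
human_path = [MouseEvent(i * 8, 10 + i, 10 + i * 0.5) for i in range(378)]

print(has_suspiciously_few_events(bot_path))    # True
print(has_suspiciously_few_events(human_path))  # False
```

A fixed threshold like this only catches the laziest bots, since any bot that emits enough events passes it.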
Which of the two paths below was generated by a bot?
Ah, easy again! At least the bot that made 2B is making an effort this time, if only barely. The most straightforward way to program a bot to move the mouse around is to have it move in, you guessed it, straight lines. And, while straight lines are easy for you to identify visually, it is harder to come up with an automated detection rule. The old trick of counting the number of mousemove events won't work here, as both 2A and 2B have around 120 events each.
By doing a bit of feature engineering we can get closer to detecting straight lines. But first, what does it mean to move in a straight line? It's not a philosophical question, but a question about physics. For a simple bot, moving in a straight line means moving at a constant velocity (the same speed in the same direction), which in turn means moving with zero acceleration. The plot below shows the X-axis position, the X-axis speed, and the X-axis acceleration for the human (2A) and the bot (2B).
Here, one can see that as the bot moves in a straight line, the speed is constant and the acceleration is zero, except when the bot changes direction. This is very different from the human's acceleration, which is almost never close to zero. This observation gives us another simple detection rule: if the acceleration of a mouse path is almost always zero, then it was likely generated by a bot.
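To make the rule concrete, here is a minimal Python sketch that estimates X-axis acceleration with finite differences and flags paths where it is almost always near zero. The function names, the tolerance `eps` (in pixels per millisecond squared), and the 90% cut-off are all assumptions chosen for illustration, and both example paths are synthetic. Real recordings have integer pixel positions and jittery timestamps, so the tolerance would need tuning.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (time in ms, X position in px); assumed layout


def x_acceleration(samples: Sequence[Sample]) -> List[float]:
    """Estimate X-axis acceleration from time-ordered samples."""
    # Speed between consecutive samples, stamped at the interval midpoint.
    speeds = []
    for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip duplicate timestamps to avoid dividing by zero
            speeds.append(((t0 + t1) / 2, (x1 - x0) / (t1 - t0)))
    # Acceleration between consecutive speed estimates.
    return [(v1 - v0) / (t1 - t0)
            for (t0, v0), (t1, v1) in zip(speeds, speeds[1:])]


def looks_like_straight_lines(samples: Sequence[Sample],
                              eps: float = 1e-3,
                              min_zero_fraction: float = 0.9) -> bool:
    """True if acceleration is within eps of zero for most of the path."""
    acc = x_acceleration(samples)
    if not acc:
        return False  # too few samples to judge
    near_zero = sum(1 for a in acc if abs(a) <= eps)
    return near_zero / len(acc) >= min_zero_fraction


# Synthetic bot: constant speed to the right, then constant speed back.
bot = [(10 * i, 20 * i if i < 60 else 1200 - 10 * (i - 60))
       for i in range(120)]

# Synthetic stand-in for a human: speed changes continuously.
human = [(10 * i, 300 + 200 * math.sin(10 * i / 150)) for i in range(120)]

print(looks_like_straight_lines(bot))    # True
print(looks_like_straight_lines(human))  # False
```

The sketch only looks at the X axis, matching the plot above; applying the same check to the Y axis as well would be a natural extension.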
The final test. Which of the paths below was generated by a bot?
This time it's harder! Both 3A and 3B consist of many mousemove events, and both feature smooth curves rather than straight lines. Maybe too smooth?
While 3B is zig-zagging erratically in a human-like fashion, the curves of 3A are generated by a bot drawing Bézier curves. Made popular by the French engineer Pierre Bézier, these curves are commonly used in computer graphics and, when used to create mouse paths, are much harder to differentiate from human mouse movements, both visually and by using detection rules. However, using a combination of more clever feature engineering and machine learning, it's possible to detect these kinds of bots too.
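For readers who haven't met Bézier curves before, the sketch below evaluates the standard cubic form, B(t) = (1-t)³P0 + 3(1-t)²tP1 + 3(1-t)t²P2 + t³P3, in Python. The function names and control points are made up for this example, and we are not claiming this is how the bot behind 3A was written. It only shows why such a path is smooth and densely sampled, so that neither counting events nor looking for zero acceleration is enough on its own.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point,
                 t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein weights for the four control points; they sum to 1.
    b0, b1, b2, b3 = u**3, 3 * u**2 * t, 3 * u * t**2, t**3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def bezier_path(p0: Point, p1: Point, p2: Point, p3: Point,
                n_points: int = 100) -> List[Point]:
    """Sample n_points evenly spaced in t along the curve."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    return [cubic_bezier(p0, p1, p2, p3, i / (n_points - 1))
            for i in range(n_points)]


# Hypothetical control points: start, two "handles", and end.
path = bezier_path((0, 0), (0, 100), (100, 100), (100, 0))

print(len(path))   # 100
print(path[0])     # (0.0, 0.0)
print(path[-1])    # (100.0, 0.0)
print(cubic_bezier((0, 0), (0, 100), (100, 100), (100, 0), 0.5))  # (50.0, 75.0)
```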
To protect your site from spam registrations and account takeover attempts, it's important to be able to filter out web events that are generated by bots rather than by your flesh-and-blood users. While bots are constantly getting more sophisticated, one can still make it harder for them to masquerade as real users through a combination of feature engineering and machine learning.