Amazon’s Simple Way of Detecting Robotic Ad Clicks using Deep Learning

Addressing a billion dollar problem with Machine Learning

Jun 25, 2024

Yes, it’s a cliche, but don’t underestimate the importance of good data. Take Amazon, for example. They solve a multi-billion-dollar problem using a pretty simple model. Let’s talk about how.

Amazon has to detect robotic clicks on its platforms to maintain its search. This is a very important problem where accuracy is a must: incorrectly labeling a robotic click as human causes advertisers to lose money, and incorrectly labeling a human click as robotic eats into Amazon’s profits.

Their method of accomplishing it is brilliantly simple: they combine data from various dimensions into one input point, which is then fed to a simple model for classification. The data relies on the following dimensions:

User-level frequency and velocity counters compute volumes and rates of clicks from users over various time periods. These enable identification of emergent robotic attacks that involve sudden bursts of clicks.
User entity counters keep track of statistics such as number of distinct sessions or users from an IP. These features help to identify IP addresses that may be gateways with many users behind them.
Time of click tracks hour of day and day of week, which are mapped to a unit circle. Although human activity follows diurnal and weekly activity patterns, robotic activity often does not.
Logged-in status differentiates between customers and non-logged-in sessions as we expect a lot more robotic traffic in the latter.
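
The four feature families above can be sketched in a few lines of Python. This is a minimal illustration, not Amazon’s pipeline: the function names, the window lengths (1 minute, 1 hour, 1 day), and the layout of the final vector are all assumptions made for the example.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def encode_time(ts):
    """Map hour-of-day and day-of-week onto unit circles as (sin, cos) pairs."""
    hour_angle = 2 * math.pi * (ts.hour + ts.minute / 60) / 24
    day_angle = 2 * math.pi * ts.weekday() / 7
    return [math.sin(hour_angle), math.cos(hour_angle),
            math.sin(day_angle), math.cos(day_angle)]

def click_counts(click_times, now, windows_s=(60, 3600, 86400)):
    """Count one user's clicks in each trailing window ending at `now`.

    The window lengths are hypothetical; the source only says
    "various time periods".
    """
    ages = [(now - t).total_seconds() for t in click_times]
    return [sum(1 for age in ages if 0 <= age <= w) for w in windows_s]

def distinct_users_per_ip(events):
    """Entity counter: number of distinct users seen behind each IP."""
    users = defaultdict(set)
    for ip, user in events:
        users[ip].add(user)
    return {ip: len(seen) for ip, seen in users.items()}

def build_feature_vector(click_times, now, users_on_ip, logged_in):
    """Combine every dimension into a single input point for a classifier."""
    counts = [float(c) for c in click_counts(click_times, now)]
    return counts + [float(users_on_ip)] + encode_time(now) + [1.0 if logged_in else 0.0]
```

The unit-circle encoding is what keeps 23:00 and 01:00 close together; a raw hour value would place them 22 units apart even though they are only two hours from each other.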

The data is supplemented using a technique called Manifold Mixup. The team relies on this technique because the data is not very high-dimensional, so carelessly mixing raw inputs would lead to high mismatch and information loss. Instead, they “leverage ideas from Manifold Mixup for creating noisy representations from the latent representations of hidden states.” This part is not simple, but as you can see, it’s only one component of a much larger setup.
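
To make the idea concrete, here is a minimal sketch of the generic Manifold Mixup step: take the hidden-layer representations of two examples and blend them (and their labels) with a single weight drawn from a Beta distribution. The source only says the team borrows ideas from Manifold Mixup, so this shows the general technique rather than their exact variant; `mix_hidden`, `alpha`, and the plain-list vectors are assumptions made for illustration.

```python
import random

def mix_hidden(h_a, h_b, y_a, y_b, alpha=2.0, rng=random):
    """Blend two hidden-state vectors and their labels with one shared weight.

    h_a, h_b: hidden-layer activations of two training examples (same length)
    y_a, y_b: their labels, e.g. 1.0 for robotic and 0.0 for human
    alpha:    Beta(alpha, alpha) shape parameter (hypothetical default)
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    h_mix = [lam * a + (1 - lam) * b for a, b in zip(h_a, h_b)]
    y_mix = lam * y_a + (1 - lam) * y_b
    return h_mix, y_mix, lam
```

In training, the mixed representation would be passed through the remaining layers and scored against the mixed label. The key point is that the blending happens on latent states, not on the raw low-dimensional inputs.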

I love this approach because it highlights two key things:

Good data/inputs are more than enough, even in complex real-world challenges. Instead of tuning a model to death, focus on improving the quality of the data.
Domain knowledge is key (look at how much of it the feature engineering requires). Too many AI teams arrogantly believe that they can ML-engineer their way to a solution without studying the underlying domain. This is a good way to waste your time and money.

For more insight into how Amazon detects robotic ad clicks, read the following:

How Amazon tackles a multi-billion dollar bot problem [Breakdowns]: How to detect Robotic Ad Clicks (artificialintelligencemadesimple.substack.com)

That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.

I put a lot of effort into creating work that is informative, useful, and independent from undue influence. If you’d like to support my writing, please consider becoming a paid subscriber to this newsletter. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.

PS: We follow a “pay what you can” model, which allows you to support within your means. Check out this post for more details and to find a plan that works for you.

I regularly share mini-updates on what I read on the microblogging sites X (https://twitter.com/Machine01776819), Threads (https://www.threads.net/@iseethings404), and TikTok (https://www.tiktok.com/@devansh_ai_made_simple), so follow me there if you’re interested in keeping up with my learnings.