Ad fraud comes in many shapes and is continually changing. At AdExchanger’s recent Clean Ads I/O conference, I heard two of the speakers, Michael Tiffany, co-founder and CEO of White Ops, and Douglas de Jager, engineering lead and manager of Google Ad Traffic Quality, describe the difference between “interesting” fraud and “non-interesting” fraud.
“Interesting” fraud is where hardworking adversaries are carefully trying to pass off fake traffic as legitimate. For example, they may limit the amount of traffic coming from a particular robot, and take many other steps to try to make the traffic look similar to typical human traffic.
“Non-interesting” fraud is where the fraudster or source of the traffic is not being careful and leaves obvious indications that the traffic is not legitimate. In some cases, the non-human traffic may be serving some non-advertising purpose and loading webpages (and thus ads) for a legitimate reason. In these cases, the person running the robot is not intending to generate non-human ad impressions, but if ad sellers, exchanges and buyers aren’t careful, they may accidentally buy or sell ads that will never be shown to a human. For example, web crawlers (frequently referred to as spiders) analyze many sites and pages on the Internet to provide high-quality search results. These crawlers load each page and may often load the part of the page that contains an ad.
As fraudulent players continue to exploit the rapid, continued growth and investment in digital advertising with both “interesting” and “non-interesting” fraud, the industry has taken great strides to fight back. Organizations across the ad tech ecosystem have developed protocols and technologies to combat fraud as nefarious players continually evolve their methods. At Quantcast, we have a dedicated Inventory Quality team of specialized engineers devoted to tackling this industry challenge. We continuously monitor fraudulent behavior and advance techniques that detect constantly evolving non-human behavioral patterns in order to protect both our advertiser and publisher partners.
In this blog post, I will share a couple of basic approaches that are widely used in the industry to identify certain forms of “non-interesting” fraud, and how to apply the findings. By analyzing bid requests and cookie behavior on real-time bidding exchanges, we will look at how non-human traffic (NHT) presents itself and how it can change over time.
Time of Day Analysis
First, let’s look at the number of bid requests happening throughout the day. I have plotted the percent of bid requests across several exchanges by hour of the day in the graph below. Starting at hour 0 (midnight to 1 a.m.), the number of bid requests steadily decreases until around 4 a.m., when most people are sleeping. Then in the middle of the day, when most people are awake, the exchanges see the largest number of bid requests. While one of the exchanges is particularly active during work hours, they all follow the general trend of serving more ads when more people are awake and fewer ads when more people are asleep.
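As a rough sketch, this kind of hourly breakdown can be computed from a stream of bid-request timestamps. The `hourly_distribution` helper and the sample data below are hypothetical illustrations, not Quantcast's production code:

```python
from collections import Counter
from datetime import datetime

def hourly_distribution(timestamps):
    """Percent of bid requests falling in each hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    return {hour: 100.0 * counts.get(hour, 0) / total for hour in range(24)}

# Hypothetical sample: three requests around 2 a.m., one at 2 p.m.
requests = [
    datetime(2015, 6, 1, 2, 15),
    datetime(2015, 6, 1, 2, 40),
    datetime(2015, 6, 1, 2, 55),
    datetime(2015, 6, 1, 14, 5),
]
dist = hourly_distribution(requests)
# dist[2] is 75.0 and dist[14] is 25.0
```

At scale this aggregation would run over billions of bid requests in a distributed job, but the per-hour percentage is the same idea.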
Let’s dig into the data a bit further. Rather than just looking at the exchanges as a whole, we’ll segment the bid requests by the ISP (Internet Service Provider) of the user that is going to view the ad. The ISP is the business that is responsible for connecting your computer to the Internet. If you use the Internet at home, your ISP will likely be a large organization such as AT&T, Comcast, Time Warner or Verizon. Larger businesses may operate their own ISP.
Next, I plot the percent of bid requests from each ISP by hour of the day. As we see, many of these ISPs follow the same pattern we saw with the exchanges: more bid requests during waking hours and fewer during sleeping hours. The light blue organization (“org”) has a much sharper pattern, with most bid requests occurring during working hours. When we investigated this org, we found that it was an ISP specializing in providing Internet service to businesses.
However, some orgs have very different time-of-day patterns. In the next graph, I show the time-of-day activity for several ISPs that have a flatter pattern. These ISPs tend to serve a more equal number of bid requests regardless of the time of day, showing only slightly more bid requests at 4 p.m. than at 4 a.m. For contrast, the black line is the average activity of the larger orgs. So why would the users at an ISP be browsing the web just as frequently at night as they are during the day? When we investigate the ISPs with this flat pattern, we find that many of them are data centers: businesses that run automated machines to browse the Internet. If we serve ads to these machines, the ads will not be viewed by humans.
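One simple way to operationalize this observation is to compare each ISP's overnight activity against its afternoon activity and flag ISPs whose ratio is close to 1.0. The function names, hour windows and threshold below are illustrative assumptions, not the actual detection logic:

```python
def night_day_ratio(hourly_pct, night_hours=range(1, 6), day_hours=range(13, 18)):
    """Ratio of overnight to afternoon activity; near 1.0 suggests automation."""
    night = sum(hourly_pct.get(h, 0.0) for h in night_hours)
    day = sum(hourly_pct.get(h, 0.0) for h in day_hours)
    return night / day if day else float("inf")

def flag_flat_isps(isp_hourly, threshold=0.8):
    """Flag ISPs whose overnight traffic nearly matches their afternoon traffic."""
    return [isp for isp, dist in isp_hourly.items()
            if night_day_ratio(dist) >= threshold]

# Hypothetical hourly percentages: a uniform "data center" ISP and a
# day-heavy "residential" ISP.
flat = {h: 100.0 / 24 for h in range(24)}
human = {h: (8.0 if 9 <= h <= 17 else 1.0) for h in range(24)}
flagged = flag_flat_isps({"dc_isp": flat, "home_isp": human})
# flagged is ["dc_isp"]
```

A real system would combine a signal like this with many others before blocking traffic, since a single flat-looking day is weak evidence on its own.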
Which exchanges are serving these bid requests? In the graph below, I plot the percentage of bid requests per exchange that come from these suspicious ISPs. The blue bars show the percentage of bid requests from a few months ago, and the red bars show the analysis from a few days ago.
From this graph, we see that 1) some exchanges have several times as much suspicious data center traffic as other exchanges, and 2) the pattern of suspicious traffic changes over time. Additionally, we can see that some exchanges have had a low rate of suspicious data center traffic for months, some exchanges have reduced the amount of suspicious traffic, and some exchanges are still sending high rates of suspicious traffic.
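The per-exchange breakdown itself is a straightforward aggregation: given (exchange, ISP) pairs for each bid request and a set of ISPs already flagged as suspicious, compute each exchange's suspicious share. All names and sample data here are hypothetical:

```python
from collections import defaultdict

def suspicious_share_by_exchange(bid_requests, suspicious_isps):
    """Percent of each exchange's bid requests that come from suspicious ISPs.

    bid_requests: iterable of (exchange, isp) pairs.
    suspicious_isps: set of ISP identifiers flagged by earlier analysis.
    """
    totals = defaultdict(int)
    suspect = defaultdict(int)
    for exchange, isp in bid_requests:
        totals[exchange] += 1
        if isp in suspicious_isps:
            suspect[exchange] += 1
    return {ex: 100.0 * suspect[ex] / totals[ex] for ex in totals}

# Hypothetical sample: exchange A has 1 of 4 requests from a data center,
# exchange B has 1 of 2.
shares = suspicious_share_by_exchange(
    [("A", "dc1"), ("A", "home"), ("A", "home"), ("A", "home"),
     ("B", "dc1"), ("B", "home")],
    suspicious_isps={"dc1"},
)
# shares is {"A": 25.0, "B": 50.0}
```

Re-running the same aggregation on snapshots from different dates is what produces the blue-versus-red comparison described above.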
Cookie Behavior Analysis
Some cookies also behave in very suspicious ways – for example, a cookie that is consistently active for very long periods of time, or one that visits many more webpages than a human would be able to view. There are many other ways of classifying cookie behavior as suspicious, but for the purpose of our analysis, I restricted the criteria to these simple, conservative behaviors. In the graph below, I plot the percentage of bid requests per exchange that come from highly suspicious cookies. Again we see a dramatic difference in the percent of suspicious bid requests by exchange; some exchanges have over 10 times the percent of suspicious bid requests as other exchanges.
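These two cookie signals (long continuous activity, and implausibly many page views) can be sketched as a session-based check over a cookie's page-load timestamps. The thresholds below (18-hour sessions, 1,000 pages, 30-minute gaps) are illustrative assumptions, not Quantcast's actual criteria:

```python
from datetime import datetime, timedelta

def is_suspicious_cookie(events, max_session_hours=18, max_pages=1000,
                         gap=timedelta(minutes=30)):
    """Flag a cookie whose browsing activity is implausible for a human.

    events: chronologically sorted page-load timestamps for one cookie.
    A pause of more than `gap` ends a session; any session that is too
    long or contains too many page loads marks the cookie as suspicious.
    """
    if not events:
        return False
    start = prev = events[0]
    pages = 1
    for ts in events[1:]:
        if ts - prev > gap:
            # Session ended; check it before starting a new one.
            if _session_suspicious(start, prev, pages, max_session_hours, max_pages):
                return True
            start, pages = ts, 0
        prev = ts
        pages += 1
    return _session_suspicious(start, prev, pages, max_session_hours, max_pages)

def _session_suspicious(start, end, pages, max_hours, max_pages):
    hours = (end - start).total_seconds() / 3600.0
    return hours > max_hours or pages > max_pages
```

For example, a cookie that loads a page every ten minutes for 24 hours straight would be flagged, while a cookie with a handful of page loads over an hour would not. Keeping the criteria this conservative trades recall for precision, which matches the intent of the analysis above.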
Even in these simpler, clear-cut examples of non-human traffic on real-time bidding exchanges, we see varying degrees of suspicious traffic among the different exchanges. This may indicate several things. First, that each exchange uses different algorithms and techniques to monitor and block bad bid requests, and their approaches have varying degrees of effectiveness. Second, that the thresholds and tolerance for suspicious inventory varies by exchange. And third, that as fraudulent activity evolves, the amount of suspicious activity in a given exchange can increase or decrease.
This basic analysis of the more easily detected forms of fraud highlights how fraud is ever-changing and why we need to continuously monitor inventory for it:
- There is a lot of variability in the levels of even the most obvious forms of non-human traffic on exchanges. Media buying partners need to screen inventory and have robust fraud prevention methods.
- Fraud levels on exchanges can change drastically in a short period of time. Fraud continues to take new forms and even as some forms of fraud are caught others emerge. Fraud prevention needs to be an ongoing process.
- There is no one-size-fits-all approach to tackling fraud. Different exchanges and inventory sources may experience different types of fraud. Fraud prevention methods need to be agile and adaptive.
Fraud is a continuous and evolving problem, and adversaries will continue to try to find new ways to avoid detection. Fraud prevention requires everyone — suppliers, buyers and all others — to work vigilantly to fight fraud every day. This is what the Inventory Quality team at Quantcast does day in, day out. Through monitoring fraud and testing new fraud detection approaches we learn the intricacies of fraudulent behavior and uncover new forms of both the “interesting” and “non-interesting” fraud.
Written by Durban Frazer on behalf of the Inventory Quality team at Quantcast
Durban Frazer leads the Brand and Inventory Quality team at Quantcast. He has worked at Quantcast for five years, building and improving Performance and Brand models and software for both real-time bidding and publisher products. He greatly enjoys tackling massive data problems and discovering interesting patterns in these human and non-human datasets.