Quantcast Bot is the name of a web crawler used by Quantcast for advertisement quality assurance and to understand page content for Interest-Based Audiences.
The Quantcast Bot identifies itself with a specific User Agent (referred to as Quantcastbot) in its HTTP requests so that third parties, such as ad servers, analytics providers, webmasters, publishers and inventories, can distinguish these requests from regular internet traffic.
The request header used is one of the following:
User-Agent: Quantcastbot/1.0 (+https://www.quantcast.com/bot)
User-Agent: Quantcastbot/2.0 (+https://www.quantcast.com/bot)
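As a minimal sketch of how a server might recognize these headers, the following Python function extracts the version from a User-Agent value. The function name and the loose version pattern (any digits.digits) are assumptions made for illustration; only versions 1.0 and 2.0 are documented here.

```python
import re

# Matches User-Agent values that start with "Quantcastbot/<major>.<minor>".
# Assumption: the pattern accepts any numeric version, not just 1.0 and 2.0.
QUANTCASTBOT_UA = re.compile(r"^Quantcastbot/(\d+\.\d+)\b", re.IGNORECASE)


def quantcastbot_version(user_agent):
    """Return the claimed Quantcastbot version, or None for other agents."""
    match = QUANTCASTBOT_UA.match(user_agent.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(quantcastbot_version("Quantcastbot/2.0 (+https://www.quantcast.com/bot)"))
    print(quantcastbot_version("Mozilla/5.0 (X11; Linux x86_64)"))
```

A match only shows that the request claims to be Quantcastbot; the header alone can be spoofed.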
Use case: Quality Assurance
Quantcast uses automated Quality Assurance (QA) technology to ensure that the advertisements we deliver meet our creative specifications. Advertisements are regularly and autonomously checked by Quantcast’s QA technology while being delivered to website visitors.
For ad servers and analytics providers
Quantcast’s QA technology will trigger impression and click trackers while checking an advertisement. The HTTP requests generated will include the Quantcastbot User Agent, and should be discounted from impression and click counts as non-human traffic. You should not alter the behaviour of an advertisement when it is requested by the Quantcastbot User Agent.
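One way an ad server or analytics provider might discount this traffic is sketched below in Python. The event structure (dicts with "type" and "user_agent" keys) and the function name are hypothetical; a real tracking pipeline will have its own schema.

```python
def count_human_events(events):
    """Count impressions and clicks, skipping requests that identify as Quantcastbot.

    `events` is a hypothetical iterable of dicts with "type" and "user_agent" keys.
    """
    counts = {"impression": 0, "click": 0}
    for event in events:
        if event.get("user_agent", "").startswith("Quantcastbot/"):
            # QA traffic: leave it out of the totals. The advertisement itself
            # should still be served unchanged to this User Agent.
            continue
        if event["type"] in counts:
            counts[event["type"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"type": "impression", "user_agent": "Mozilla/5.0"},
        {"type": "impression", "user_agent": "Quantcastbot/1.0 (+https://www.quantcast.com/bot)"},
        {"type": "click", "user_agent": "Mozilla/5.0"},
        {"type": "click", "user_agent": "Quantcastbot/2.0 (+https://www.quantcast.com/bot)"},
    ]
    print(count_human_events(sample))
```

The filter only affects reporting; it does not change what is returned to the bot.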
Quantcast’s QA technology will not access your website unless Quantcast is trafficking advertisements that click through to your website. Advertisements are checked regularly, so you may see several visits from Quantcastbot while advertisements are being delivered. The bot will only visit the landing page of an advertisement; the bot does not crawl your website content. Because the bot is following a click on an advertisement, it does not obey the rules specified in your website’s robots.txt file. This ensures that the landing page, as well as the originating advertisement, meets our guidelines.
Use case: Interest-Based Audiences
Quantcast crawls a daily selection of 2-5 million websites to understand page content. We process each page’s header to extract the title and description fields, and its body to extract text fields such as headings and paragraphs. This data supports our Interest-Based Audiences (IBA) technology and helps us categorize audience interests and deliver relevant online advertising.
Quantcast’s IBA technology may access our publishers’ sites and other websites we deliver ads on. When Quantcast crawls websites to understand page content, we honor robots.txt Disallow rules, whether they cover a specific URL or the full domain. For any domain, we rate limit our requests to about 1,000 URLs per day, spread evenly over a period of several hours.
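To estimate what this rate limit means for a single domain, the arithmetic can be sketched in Python. The text says only "several hours", so the 6-hour default window below is an assumption, as is the function name.

```python
def request_interval_seconds(urls_per_day=1000, window_hours=6.0):
    """Average spacing between requests when they are spread evenly over a window.

    Assumption: `window_hours` is a guess at "several hours"; adjust as needed.
    """
    return window_hours * 3600 / urls_per_day


if __name__ == "__main__":
    # About 1,000 URLs over an assumed 6-hour window.
    print(request_interval_seconds())
    # The same volume over an assumed 4-hour window.
    print(request_interval_seconds(window_hours=4.0))
```

Under the 6-hour assumption, this works out to roughly one request every 21.6 seconds for a domain that receives the full 1,000 requests.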
Verifying the Quantcastbot User Agent
You can verify if a request accessing your server really is Quantcastbot by checking if the originating IP is one of the following:
This is useful if you are concerned that spammers are accessing your site while claiming to be Quantcastbot.
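A check of this kind could look like the following Python sketch. The address ranges in it are placeholders taken from the reserved documentation blocks (RFC 5737), not Quantcast's real addresses, and the function name is hypothetical; substitute the published Quantcastbot IP list before using it.

```python
import ipaddress

# PLACEHOLDER ranges from the RFC 5737 documentation blocks, NOT Quantcast's
# real addresses. Replace them with the published Quantcastbot IP list.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_verified_quantcastbot(user_agent, remote_ip):
    """True only if the request claims to be Quantcastbot AND comes from a listed IP."""
    if not user_agent.startswith("Quantcastbot/"):
        return False
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat the request as unverified
    return any(address in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    ua = "Quantcastbot/1.0 (+https://www.quantcast.com/bot)"
    print(is_verified_quantcastbot(ua, "192.0.2.10"))   # inside a placeholder range
    print(is_verified_quantcastbot(ua, "203.0.113.5"))  # outside the placeholder ranges
```

Requests that carry the Quantcastbot User Agent but fail this check can be treated like any other unverified traffic.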
Blocking the Quantcastbot User Agent
The Quantcast Bot reads and parses the robots.txt file at the root of your site, e.g. http://www.example.com/robots.txt. See the examples below showing how to allow or block Quantcastbot:
# Disallow crawling of the entire website for Quantcastbot only:
# Allow access to all crawlers:
# Allow access to Quantcastbot only:
# Disallow crawling of specific directories and the pages in those directories:
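The four cases listed above can be sketched and checked with Python's standard urllib.robotparser module. The robots.txt bodies below are illustrative rules written in standard robots.txt syntax, not text from Quantcast; the directory names /private/ and /tmp/ and the helper name are hypothetical. Python's parser is used only to demonstrate the intent of each rule set, and Quantcastbot's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt bodies, one per case. /private/ and /tmp/ are
# hypothetical directory names chosen for the example.
ROBOTS_EXAMPLES = {
    "block_quantcastbot_only": """
User-agent: Quantcastbot
Disallow: /
""",
    "allow_all_crawlers": """
User-agent: *
Disallow:
""",
    "allow_quantcastbot_only": """
User-agent: Quantcastbot
Allow: /

User-agent: *
Disallow: /
""",
    "block_specific_directories": """
User-agent: Quantcastbot
Disallow: /private/
Disallow: /tmp/
""",
}


def allowed(robots_txt, user_agent, url):
    """Return whether `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.strip().splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/page.html"
    for name, body in ROBOTS_EXAMPLES.items():
        print(name, allowed(body, "Quantcastbot", page), allowed(body, "OtherBot", page))
```

Each body can be saved as robots.txt at the root of a site; the loop prints, for each case, whether Quantcastbot and a generic crawler would be allowed to fetch the sample page.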