New Delhi: A fresh dispute has broken out in the AI world, this time between internet infrastructure company Cloudflare and AI-powered search startup Perplexity. On August 4, Cloudflare published a blog post accusing Perplexity of using "stealth crawling" tactics to bypass no-crawl rules set by website owners, a charge that Perplexity strongly denies.
According to Cloudflare, its tests showed that Perplexity’s web crawlers were accessing content from sites that had specifically blocked them through robots.txt and firewall rules. This included newly created test websites that had never been indexed or linked publicly.
Cloudflare explained that Perplexity’s declared bots, called "PerplexityBot" and "Perplexity-User", were correctly being blocked. But despite that, Perplexity was still able to answer queries with detailed content from those sites.
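To illustrate the mechanism at issue, here is a minimal sketch of how a robots.txt rule is matched against a crawler's declared name, using Python's standard `urllib.robotparser`. The robots.txt contents, the `may_fetch` helper, and the example URL are assumptions made for illustration; they are not taken from Cloudflare's test sites. Only the two bot names come from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not from Cloudflare's tests).
# It blocks the two declared Perplexity agents and allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in (
        "PerplexityBot",
        "Perplexity-User",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",  # generic browser string
    ):
        print(agent, "->", may_fetch(agent, url))
```

In this sketch the two named agents are refused, while a request that identifies itself with a generic browser string falls through to the catch-all rule. Robots.txt is a voluntary convention that depends on a crawler declaring itself honestly, which is why site owners often pair it with firewall rules.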
Cloudflare alleges the company was using undeclared bots that pretended to be Chrome browsers on macOS. These bots, the report says, switched IP addresses and even entire networks (ASNs) to avoid detection. Cloudflare stated, “Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity.”
Cloudflare said it observed 20 to 25 million requests a day from Perplexity’s declared bots, and another 3 to 6 million a day from the undeclared, stealth crawler.
Cloudflare has now removed Perplexity’s bots from its verified list and added rules to block what it calls “stealth crawling behavior.”
Perplexity did not take the accusation lightly. In a lengthy response, the startup said Cloudflare had completely misunderstood how its AI agents work. Perplexity said it does not crawl websites to build giant databases. Instead, it said, its system works more like a digital assistant that fetches content only when a user asks something specific.
“When Perplexity fetches a webpage, it's because you asked a specific question requiring current information,” the company said. “The content isn't stored for training, it's used immediately to answer your question.”
The startup also said that Cloudflare had attributed to Perplexity web traffic that actually came from BrowserBase, a third-party cloud browser service it occasionally uses. Perplexity claimed that its own use of BrowserBase accounts for fewer than 45,000 requests a day.
On Cloudflare’s technical analysis, Perplexity said, “Cloudflare fundamentally misattributed 3–6 million daily requests from BrowserBase’s automated browser service to Perplexity.” It added, “This is a basic traffic analysis failure that's particularly embarrassing for a company whose core business is understanding and categorizing web traffic.”
This clash reflects deeper tensions over how AI companies collect data from the open internet. Tools like ChatGPT, Perplexity, and others rely on fresh content to answer real-time queries. But many websites, including news publishers and software firms, are now blocking crawlers out of fear their content is being used without permission.
By contrast, Cloudflare praised OpenAI for respecting web crawling rules and clearly explaining what its bots do. “When we ran the same test with ChatGPT, we found that ChatGPT-User fetched the robots.txt file and stopped crawling when it was disallowed,” Cloudflare noted.
It remains to be seen whether this public falling-out will lead to stricter web crawling standards, new policies, or lawsuits.