TV9

Cloudflare accuses Perplexity of stealth web scraping, Perplexity hits back

Cloudflare has accused AI startup Perplexity of using hidden bots to access websites that had blocked them, even bypassing robots.txt files and firewalls. Perplexity denies the charges, claiming its tool only fetches live content when users ask questions and doesn't store or train on it.

Cloudflare says Perplexity secretly scraping sites, sparks AI data fight
Updated on: Aug 05, 2025, 12:03 PM

New Delhi: A fresh dispute has broken out in the AI world. This time, it's between internet infrastructure company Cloudflare and AI-powered search startup Perplexity. On August 4, Cloudflare published a blog post accusing Perplexity of using "stealth crawling" tactics to bypass no-crawl rules set by website owners, a charge that Perplexity strongly denies.

According to Cloudflare, its tests showed that Perplexity’s web crawlers were accessing content from sites that had specifically blocked them through robots.txt and firewall rules. This included newly created test websites that had never been indexed or linked publicly.

What did Cloudflare find about Perplexity?

Cloudflare explained that Perplexity’s declared bots, called "PerplexityBot" and "Perplexity-User", were correctly being blocked. But despite that, Perplexity was still able to answer queries with detailed content from those sites.

Cloudflare alleges the company was using undeclared bots that pretended to be Chrome browsers on macOS. These bots, the report says, switched IP addresses and even entire networks (ASNs) to avoid detection. Cloudflare stated, “Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity.”
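To see why a changed user agent matters, consider a minimal Python sketch of how robots.txt rules are matched. It assumes a hypothetical robots.txt that blocks only the two declared bot names, and it uses an illustrative Chrome-on-macOS user agent string, not the exact one from Cloudflare's report. The standard library's `urllib.robotparser` stands in for whatever matching logic a real crawler would use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that blocks Perplexity's declared bots.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
"""

# Illustrative Chrome-on-macOS user agent (an assumption, not taken from the report).
GENERIC_CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("PerplexityBot", "Perplexity-User", GENERIC_CHROME_UA):
    print(agent[:20], "->", "allowed" if is_allowed(agent) else "blocked")
```

Under these rules the two declared names are blocked, while the browser-like string matches no rule and is allowed by default. Robots.txt is also only advisory: it works when a crawler identifies itself and chooses to obey, which is why site owners pair it with firewall rules.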

Cloudflare said it observed 20 to 25 million daily requests from Perplexity’s declared bots, and another 3 to 6 million from the undeclared, stealth crawler.

Cloudflare has now removed Perplexity’s bots from its verified list and added rules to block what it calls “stealth crawling behavior.”

Perplexity responds: "We’re not a bot"

Perplexity did not take the accusation lightly. In a long response, the startup said Cloudflare had completely misunderstood how its AI agents work. It said Perplexity doesn’t crawl websites to build giant databases. Instead, its system works more like a digital assistant that fetches content only when a user asks something specific.

“When Perplexity fetches a webpage, it's because you asked a specific question requiring current information,” the company said. “The content isn't stored for training, it's used immediately to answer your question.”

The startup also said that Cloudflare had attributed to Perplexity some web traffic that actually came from BrowserBase, a third-party cloud browser service it occasionally uses. Perplexity claimed its own use of BrowserBase represented fewer than 45,000 daily requests.

On Cloudflare’s technical analysis, the company said, “Cloudflare fundamentally misattributed 3–6 million daily requests from BrowserBase’s automated browser service to Perplexity.” They added, “This is a basic traffic analysis failure that's particularly embarrassing for a company whose core business is understanding and categorizing web traffic.”

Why fight over websites?

This clash reflects deeper tensions over how AI companies collect data from the open internet. Tools like ChatGPT, Perplexity, and others rely on fresh content to answer real-time queries. But many websites, including news publishers and software firms, are now blocking crawlers out of fear their content is being used without permission.

In contrast to Perplexity, Cloudflare praised OpenAI for respecting web crawling rules and clearly explaining what its bots do. “When we ran the same test with ChatGPT, we found that ChatGPT-User fetched the robots.txt file and stopped crawling when it was disallowed,” Cloudflare noted.
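The behavior Cloudflare describes, checking robots.txt first and stopping when disallowed, can be sketched in a few lines of Python. This is a simplified illustration and not OpenAI's implementation: `polite_fetch`, `fake_fetch`, and the `FAKE_SITE` pages are hypothetical names invented for the example, and the network is replaced with an in-memory dictionary so the sketch runs offline.

```python
from typing import Callable, List, Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def polite_fetch(url: str, user_agent: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Fetch url only if the site's robots.txt allows user_agent; otherwise return None."""
    parts = urlsplit(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"

    # Step 1: request robots.txt before touching the page itself.
    parser = RobotFileParser()
    parser.parse(fetch(robots_url).splitlines())

    # Step 2: stop without requesting the page if the rules disallow this agent.
    if not parser.can_fetch(user_agent, url):
        return None
    return fetch(url)


# Hypothetical in-memory "site" standing in for real HTTP requests.
FAKE_SITE = {
    "https://example.com/robots.txt": "User-agent: ChatGPT-User\nDisallow: /\n",
    "https://example.com/post": "<html>hello</html>",
}
requested: List[str] = []  # log of every URL the crawler actually asked for


def fake_fetch(url: str) -> str:
    requested.append(url)
    return FAKE_SITE[url]
```

Calling `polite_fetch("https://example.com/post", "ChatGPT-User", fake_fetch)` returns `None` here, and only the robots.txt URL ends up in `requested`, so the page itself is never fetched.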

It remains to be seen whether this public dispute will lead to stricter web crawling standards, new policies, or lawsuits.
