Cloudflare accuses Perplexity of stealth web scraping, Perplexity hits back

Cloudflare has accused AI startup Perplexity of using hidden bots to access websites that had blocked them, even bypassing robots.txt files and firewalls. Perplexity denies the charges, claiming its tool only fetches live content when users ask questions and doesn't store or train on it.

Cloudflare says Perplexity secretly scraping sites, sparks AI data fight

Siddharth Shankar | Updated on: Aug 05, 2025 | 12:03 PM

New Delhi: A fresh dispute has broken out in the AI world. This time, it's between internet infrastructure company Cloudflare and AI-powered search startup Perplexity. On August 4, Cloudflare published a blog post accusing Perplexity of using "stealth crawling" tactics to bypass no-crawl rules set by website owners, a charge that Perplexity strongly denies.

According to Cloudflare, its tests showed that Perplexity’s web crawlers were accessing content from sites that had specifically blocked them through robots.txt and firewall rules. This included newly created test websites that had never been indexed or linked publicly.

What did Cloudflare find about Perplexity?

Cloudflare explained that Perplexity’s declared bots, called "PerplexityBot" and "Perplexity-User", were correctly being blocked. But despite that, Perplexity was still able to answer queries with detailed content from those sites.
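For readers unfamiliar with how a robots.txt block on named bots works, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper and the example.com URL are assumptions made for illustration; this is not the file or tooling Cloudflare used in its tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test site (an assumption for illustration,
# not the file Cloudflare used). It disallows the two declared bot names.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User", "SomeOtherAgent"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

Note that robots.txt is only a request: it works when the crawler identifies itself honestly and chooses to obey the rules.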

Cloudflare alleges the company was using undeclared bots that pretended to be Chrome browsers on macOS. These bots, the report says, switched IP addresses and even entire networks (ASNs) to avoid detection. Cloudflare stated, “Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity.”
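To see why a block keyed on a bot's declared name would miss such traffic, consider this minimal sketch of user-agent matching. The token list, the `matches_declared_bot` helper and both user-agent strings are illustrative assumptions; they are not Cloudflare's actual rules or the exact strings it recorded.

```python
# Bot names taken from the article; the matching logic is a simplified
# assumption, not how Cloudflare's firewall actually works.
DECLARED_BOT_TOKENS = ("PerplexityBot", "Perplexity-User")

# Illustrative user-agent strings (assumed for this example).
DECLARED_UA = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
GENERIC_CHROME_MACOS_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def matches_declared_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the declared bot names."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_BOT_TOKENS)


if __name__ == "__main__":
    print(matches_declared_bot(DECLARED_UA))              # blocked by a name-based rule
    print(matches_declared_bot(GENERIC_CHROME_MACOS_UA))  # slips past a name-based rule
```

A request that presents itself as an ordinary browser carries none of the declared names, so detecting it requires other signals beyond the user-agent header.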

Cloudflare said it observed 20 to 25 million requests a day from Perplexity’s declared bots, and another 3 to 6 million a day from the stealth crawler.

Cloudflare has now removed Perplexity’s bots from its verified list and added rules to block what it calls “stealth crawling behavior.”

Perplexity responds: "We’re not a bot"

Perplexity did not take the accusation lightly. In a lengthy response, the startup said Cloudflare had completely misunderstood how its AI agents work. Perplexity said it does not crawl websites to build giant databases. Instead, its system works more like a digital assistant that fetches content only when a user asks something specific.

“When Perplexity fetches a webpage, it's because you asked a specific question requiring current information,” the company said. “The content isn't stored for training, it's used immediately to answer your question.”

The startup also said that Cloudflare had attributed to Perplexity web traffic that actually came from BrowserBase, a third-party cloud browser service it occasionally uses. Perplexity claimed its own use of the service accounted for fewer than 45,000 daily requests.

On Cloudflare’s technical analysis, Perplexity said, “Cloudflare fundamentally misattributed 3–6 million daily requests from BrowserBase’s automated browser service to Perplexity.” It added, “This is a basic traffic analysis failure that's particularly embarrassing for a company whose core business is understanding and categorizing web traffic.”

Why fight over websites?

This clash reflects deeper tensions over how AI companies collect data from the open internet. Tools such as ChatGPT and Perplexity rely on fresh content to answer real-time queries. But many websites, including news publishers and software firms, are now blocking crawlers out of fear that their content is being used without permission.

In contrast to Perplexity, Cloudflare praised OpenAI for respecting web crawling rules and clearly explaining what its bots do. “When we ran the same test with ChatGPT, we found that ChatGPT-User fetched the robots.txt file and stopped crawling when it was disallowed,” Cloudflare noted.

It remains to be seen if this public fallout will lead to stricter web crawling standards, new policies, or lawsuits.
