Perplexity AI Accused of Sneaky Web Scraping Tactics

Perplexity, an AI search startup, is reportedly bypassing established restrictions designed to keep its web crawlers from accessing various websites. This allegation comes from a detailed analysis by Cloudflare, which asserts that the company alters its crawling identity to evade website preferences when it encounters access blocks.

The findings amplify worries regarding Perplexity’s practice of collecting content without consent, harking back to incidents last year when the firm was found to be circumventing paywalls and disregarding robots.txt directives. At that time, Perplexity’s CEO, Aravind Srinivas, attributed the behavior to third-party crawlers utilized by the service.

Cloudflare, a leading internet infrastructure provider, reported receiving multiple complaints from clients who said Perplexity’s bots continued to access their sites even after they had added directives to their robots.txt files and set up Web Application Firewall (WAF) rules to block these AI crawlers.

To verify these claims, Cloudflare created new domains with similar restrictions against the AI scrapers from Perplexity. The company discovered that Perplexity’s initial attempt to access these sites involved identifying itself under the names of its crawlers, which included “PerplexityBot” or “Perplexity-User.”
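As a rough illustration of why a change of identity matters here, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt file against different user agent strings. The robots.txt content, the `is_allowed` helper, and the example URL are assumptions made for illustration; they are not taken from Cloudflare's test domains.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out of
# Perplexity's declared crawlers while allowing everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A generic browser-style user agent string (illustrative only).
CHROME_ON_MACOS = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(is_allowed("PerplexityBot", "https://example.com/article"))
print(is_allowed(CHROME_ON_MACOS, "https://example.com/article"))
```

The rules are keyed on the name a client declares, and robots.txt is advisory: a crawler that presents a browser-like user agent falls through to the wildcard rule, and nothing in the file itself can stop it.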

However, when faced with restrictions against AI scraping, Cloudflare asserts that Perplexity alters its user agent—a piece of information indicating the type of browser or device in use—to impersonate Google Chrome on macOS. This “undeclared crawler” reportedly employs “rotating” IP addresses not listed in the documentation provided by Perplexity for its bots.
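To make the role of documented IP addresses concrete, here is a minimal sketch of how a site might cross-check a request's declared user agent against a bot operator's published IP ranges. The ranges (reserved documentation addresses), the `classify_request` helper, and the labels it returns are all hypothetical; this is not Cloudflare's detection method or Perplexity's actual IP list.

```python
import ipaddress

# Hypothetical "published" ranges, using reserved documentation addresses.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
# Crawler names mentioned in the article, lowercased for matching.
DECLARED_AGENTS = ("perplexitybot", "perplexity-user")


def classify_request(user_agent: str, source_ip: str) -> str:
    """Label a request by comparing its declared name with its source IP."""
    ua = user_agent.lower()
    declared_agent = any(name in ua for name in DECLARED_AGENTS)
    ip = ipaddress.ip_address(source_ip)
    declared_ip = any(ip in network for network in DECLARED_RANGES)

    if declared_agent and declared_ip:
        return "declared-crawler"   # name and address both match
    if declared_agent:
        return "unverified-claim"   # name matches, address does not
    return "unidentified"           # looks like any other browser


print(classify_request("PerplexityBot/1.0", "192.0.2.10"))
print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0.0.0", "203.0.113.5"))
```

Under these assumptions, a request with a browser-style user agent from an undocumented address is indistinguishable from ordinary visitor traffic by these two signals alone, which is why header and IP checks on their own would not catch the behavior Cloudflare describes.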

Moreover, Cloudflare claims that Perplexity switches between autonomous system numbers (ASNs), the identifiers assigned to groups of IP networks controlled by a single operator, to get around blocks. This activity has reportedly been recorded across tens of thousands of domains and involves millions of requests daily.

In response to the allegations, Perplexity spokesperson Jesse Dwyer labeled Cloudflare’s report a “publicity stunt,” suggesting that it contained numerous inaccuracies.

Additionally, Perplexity has issued a statement on its website, asserting that Cloudflare mischaracterized 20 to 25 million user agent requests as being associated with AI scrapers. Perplexity states that user-driven agents act only in response to specific user requests, fetching only the content needed to fulfill them. The company also argued that Cloudflare mistakenly conflated its operations with “3-6M daily requests of unrelated traffic from BrowserBase,” a cloud browser for AI agents that Perplexity claims it uses only sporadically.

Following these developments, Cloudflare has removed Perplexity’s status as a verified bot and implemented measures to thwart the startup’s stealth crawling efforts.

Matthew Prince, CEO of Cloudflare, has previously expressed concerns regarding the potential threat posed by AI to publishers. Recently, Cloudflare began allowing websites to request payment from AI companies for crawling their content and has instituted default blocks against AI crawlers.

Update, August 5th: Included statement from Perplexity.
