Why Cloudflare Blocked Perplexity AI
Cloudflare has taken the unusual step of de-listing Perplexity AI as a verified bot and actively blocking it from crawling websites, after discovering the company was using stealth tactics to bypass robots.txt files (the standard web protocol that tells crawlers which content they can and cannot access). The move highlights growing tensions between AI companies and website owners over data usage and web standards.
This article covers:
- What Perplexity did wrong
- Why this matters for business websites
- Cloudflare’s response
- What this means for UK businesses
- Perplexity’s defence
- The real issue
- Looking forward
What Perplexity Did Wrong
Cloudflare’s investigation revealed that Perplexity was repeatedly attempting to circumvent website access restrictions. When websites explicitly blocked Perplexity’s declared crawlers through robots.txt files and firewall rules, the company allegedly responded by:
- Disguising its bots as regular Chrome browsers on Mac systems to appear like human users
- Switching to undeclared IP addresses outside Perplexity’s official range to avoid detection systems
- Rotating through different ASNs (network identifiers that group IP addresses) across tens of thousands of domains
To verify this behaviour, Cloudflare created test domains with restrictive robots.txt files that had never been indexed by search engines or made publicly accessible. When queried about these domains, Perplexity was still able to provide detailed information, proving it was accessing blocked content.
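To make this concrete, a restrictive robots.txt of the kind Cloudflare describes might look like the sketch below. PerplexityBot and Perplexity-User are the crawler identities Perplexity publicly declares; the specific directives here are illustrative rather than a reproduction of Cloudflare’s actual test files.

```
# Illustrative robots.txt: block Perplexity's declared crawlers
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# All other crawlers may access the site as normal
User-agent: *
Allow: /
```

A crawler that honours the protocol fetches this file first and skips any path matching a Disallow rule for its user agent. The allegation is that, once its declared identities were blocked, Perplexity fetched the content anyway under a disguised identity.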
Why This Matters for Business Websites
For over 30 years, the web has operated on an honour system. Websites put up a robots.txt file, essentially a “No Trespassing” sign, and companies respected it voluntarily. Search engines index your content and send you traffic in return. It’s a fair trade that benefits everyone. Perplexity’s alleged violations represent a fundamental breach of web etiquette.
This creates three key problems for businesses:
- AI tools significantly reduce website traffic when they scrape content and provide answers directly to users. Research shows that traditional search engines like Bing generate one human visitor for every 11 pages scraped; for OpenAI the ratio is 179 scrapes per visitor, and for Perplexity it is 369.
- You invest considerable resources in creating valuable content to attract customers and build authority. If AI companies can simply ignore your access restrictions, it undermines your entire content marketing strategy.
- Legal experts suggest that whilst scraping is generally acceptable when websites permit it, actively circumventing restrictions could expose AI companies to lawsuits.
Cloudflare's Response
Cloudflare took decisive action, removing Perplexity from its verified bots programme and implementing new rules to block the stealth activity. The company demonstrated the difference in behaviour by testing other AI systems under identical conditions.
When ChatGPT encountered the same blocks, it respected the robots.txt files and stopped crawling immediately. No follow-up attempts were made using alternative identities or IP addresses, which is exactly how ethical crawlers should behave.
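For comparison, the polite behaviour described above is simple to implement. The sketch below uses Python’s standard-library robotparser to check robots.txt before fetching; the crawler name and URLs are hypothetical placeholders.

```python
from urllib import robotparser

# Hypothetical crawler identity and target; swap in real values.
USER_AGENT = "ExampleBot"
TARGET_URL = "https://example.com/articles/some-page"

# Fetch and parse the site's robots.txt before requesting any content.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

if rp.can_fetch(USER_AGENT, TARGET_URL):
    print("Allowed: fetch the page as normal.")
else:
    # An ethical crawler stops here: no retries under a different
    # user agent, no switching to undeclared IP addresses.
    print("Blocked by robots.txt: skipping this URL.")
```

The important part is the else branch: respecting a block means stopping, not re-identifying.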
What This Means for UK Businesses
This development is particularly relevant for UK businesses using Cloudflare’s services, which protect millions of websites globally. As we previously covered, Cloudflare recently became the first infrastructure provider to block AI crawlers by default, requiring explicit permission for access. The Perplexity situation demonstrates exactly why such protective measures are necessary.
If you’re concerned about AI crawling:
- Check your robots.txt file to ensure it clearly specifies which bots can access your content
- Review your Cloudflare settings if you use their services; the platform now offers more granular controls over AI crawler access
- Monitor your analytics for unusual bot traffic that might indicate unauthorised scraping (a simple log-analysis sketch follows this list)
- Consider the implications for your content strategy and business model
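One low-tech way to start the monitoring suggested above is to count requests per user agent in your web server’s access logs. The sketch below is a minimal example, assuming logs in the common “combined” format (where the user agent is the final quoted field); the log path is hypothetical.

```python
import re
from collections import Counter

# In the combined log format, each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')

def top_user_agents(log_path: str, n: int = 10):
    """Return the n most frequent user agents in an access log."""
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            match = UA_PATTERN.search(line)
            if match:
                counts[match.group("ua")] += 1
    return counts.most_common(n)

# Hypothetical path; point this at your real access log.
for agent, hits in top_user_agents("/var/log/nginx/access.log"):
    print(f"{hits:8d}  {agent}")
```

A sudden spike from a generic “Chrome on Mac” user agent, especially from IP ranges that don’t match your normal visitor patterns, is exactly the kind of signal worth a closer look.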
The situation also highlights the importance of working with partners who understand both technical implementation and ethical web standards. As AI companies become more aggressive in their data collection practices, businesses need protection mechanisms that actually work.
Perplexity's Defence
Perplexity has defended its actions by reframing the entire issue. Rather than addressing the stealth tactics, the company argues that Cloudflare is wrongly targeting AI assistants and claims that blocking them would be equivalent to blocking email clients or web browsers.
The company made a particularly pointed rebuttal: “When companies like Cloudflare mischaracterise user-driven AI assistants as malicious bots, they’re arguing that any automated tool serving users should be suspect—a position that would criminalise email clients and web browsers.”
Perplexity also questioned Cloudflare’s technical competence, suggesting their systems cannot adequately distinguish between legitimate AI assistants and actual threats.
This defence sidesteps the core issue of why Perplexity’s systems allegedly switched to stealth tactics when blocked, rather than simply respecting the website owner’s wishes. The distinction between “AI assistants” and traditional crawlers becomes less relevant when the system actively circumvents access restrictions.
The Real Issue
The technical details matter less than the principle at stake. Web standards like robots.txt exist to give website owners control over their content. When companies ignore these directives and use sophisticated evasion tactics, it undermines the collaborative foundation that has made the web successful.
Perplexity’s argument that they’re providing a service to users doesn’t change the fact that website owners have legitimate reasons for restricting access, and that those wishes should be respected.
Looking Forward
Cloudflare anticipates that bot operators will continue developing new evasion techniques, suggesting an ongoing arms race between content protectors and AI companies seeking training data.
The controversy raises bigger questions about the future of the web. Should AI companies pay for the content they use? How can website owners maintain control over their intellectual property? These questions will likely shape regulations and industry standards in the coming years.
Priority Pixels understands the balance between accessibility and protection, helping you implement technical safeguards whilst maintaining visibility for legitimate search engines and AI tools.