Your hosting provider may be blocking GPTBot, ClaudeBot, and PerplexityBot from your site by default. Here's how to check, and the one-line robots.txt fix.
You can have the clearest writing in your industry, perfect Schema.org markup, and a site that loads in under a second, and still get zero citations from ChatGPT, Claude, or Perplexity.
Why? Because before any of that matters, AI engines have to be allowed in. And right now, most sites are not letting them in. Most owners do not know.
According to Otterly's AI Citations Report 2026, about 73% of sites have at least one AI crawler blocked at the robots.txt or CDN layer. That is roughly three out of four sites. We see the same shape in our own data: AI Bot Accessibility blocks keep turning up across every industry we look at, on sites whose owners never set the rule themselves.
Otterly tracks who cites you. friendly4AI tells you why the crawler is not reaching you in the first place — the layer underneath the citation. Both numbers matter; you cannot move the first without first fixing the second.
That number caught me off guard. For most of our scoring history we treated "is your robots.txt blocking AI?" as an edge case. It is not. It is the default failure mode. If you are an SEO manager explaining to a stakeholder why AI search matters this quarter, this is also the number you bring into the meeting — three out of four sites in your category are doing this wrong by default.
That is why Score v2.1 now ships a dedicated AI Bot Accessibility section in every report. It checks each major AI crawler against your robots.txt and tells you who can read your site, and who can't.
This is the part worth saying plainly. There is no SEO trick, no schema upgrade, no content rewrite that fixes an AI crawler block. If GPTBot is disallowed, OpenAI's training corpus does not see your page. If PerplexityBot is disallowed, Perplexity has nothing to cite. There is no second chance further down the funnel.
It is the single highest-leverage technical fix for AI Visibility, and it costs almost nothing. A one-line change in a file most people already have.
Here is where it gets uncomfortable. The block is usually not something you did.
Throughout 2025 and into 2026, several large managed hosting platforms (WP Engine, Squarespace, Wix, and others) added default robots.txt rules that disallow GPTBot, ClaudeBot, and similar AI training agents. The intent was reasonable: protect customers from uncompensated scraping. The effect, especially for owners who never opened their robots.txt, is that they are silently absent from AI answers.
I see this pattern in our inbox almost every week. A small business owner re-scans, sees a Critical verdict, and writes in asking what they did wrong. Answer: nothing. Their host did. They now have to override a default they did not know existed, on a platform they pay precisely so they do not have to make that kind of decision.
When our scanner detects you are on one of these platforms, the report surfaces a "Why is this happening?" callout naming your host and linking to the platform's override instructions. The block is fixable from your side. Your host's default does not have the final word on your robots.txt.
Open this URL in your browser:
https://yourdomain.com/robots.txt
Scan it for any of these user-agent strings followed by Disallow: /. If you see a match, that crawler cannot read your site, and you lose what is in the right-most column.
| User-agent token | Company | Type | What you lose if blocked |
|---|---|---|---|
| GPTBot | OpenAI | Training | Your pages are excluded from future GPT training data; long-term ChatGPT recall about your brand erodes. |
| OAI-SearchBot | OpenAI | Real-time search | ChatGPT cannot cite your page when it searches the web mid-answer. |
| ClaudeBot / anthropic-ai | Anthropic | Training | Your content is excluded from Claude's training corpus. |
| PerplexityBot | Perplexity | Index / search | Perplexity has nothing to cite when answering questions in your topic area. |
| Google-Extended | Google | AI training opt-out | Your content is excluded from Gemini training and grounding. (Separate from Googlebot; does not affect Google Search ranking.) |
| CCBot | Common Crawl | Training feed | Many AI training pipelines start from Common Crawl. A block here propagates to multiple downstream models. |
| Bytespider | ByteDance | Training | You are absent from ByteDance AI products such as Doubao. |
Table: Major AI crawlers, what they do, and what a block costs you (2026).
You want both training and real-time crawlers allowed. The first feeds the model itself; the second fetches you when someone asks a question right now.
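If you would rather not read the file by eye, the manual check can be sketched in a few lines of Python. This is a minimal sketch, not how the friendly4AI scanner works: the function name `blocked_agents` and the sample robots.txt are illustrative assumptions, and the agent list is copied from the table above. It relies on the standard library's `urllib.robotparser`, whose matching rules are simpler than what each crawler may implement, and it cannot see blocks applied at the CDN layer.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table above.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
    "Bytespider",
]


def blocked_agents(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the AI agents that this robots.txt text forbids from fetching `url`.

    Hypothetical helper: it only evaluates the robots.txt rules as Python's
    parser reads them, so treat the result as a first-pass check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Illustrative robots.txt of the kind a host might ship by default (assumed, not real).
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(sample))  # ['GPTBot', 'ClaudeBot']
```

To run it against your own site, paste the contents of your robots.txt into `sample`, or load the text from a saved copy of the file.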
Easier path: run a free scan at friendly4.ai. The AI Bot Accessibility section gives you each crawler's verdict in one view, plus the score impact and the copy-paste fix.
If you are on Squarespace, Wix, or WP Engine, skip the snippet below. Those hosts ship a managed robots.txt you cannot replace by upload. A direct file edit will not stick. Instead, open your friendly4AI report, find the "Why is this happening?" callout for your host, and follow the platform-specific override instructions. Then come back here for the verification step.
For self-hosted sites — your own WordPress, Next.js, static site, custom stack — add this block to your robots.txt and replace any existing AI-crawler rules:
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
That covers training crawlers and real-time fetchers in one pass. For a deeper breakdown of which bot does what, see Understanding AI Crawlers.
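If you maintain robots.txt from a build script, the same block can be generated and sanity-checked locally before you deploy. The sketch below is an assumption-laden illustration: `allow_block` and `all_allowed` are hypothetical helper names, the `/private/` rule is a made-up example of an unrelated rule you might already have, and the check uses Python's `urllib.robotparser`, which may not match every crawler's own interpretation.

```python
from urllib.robotparser import RobotFileParser

# The agents from the snippet above.
AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Google-Extended",
    "CCBot",
]


def allow_block(agents: list[str]) -> str:
    """Build one 'User-agent / Allow: /' group per agent, separated by blank lines."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def all_allowed(robots_txt: str, agents: list[str]) -> bool:
    """Return True if every agent may fetch the site root under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, "/") for agent in agents)


block = allow_block(AGENTS)
# Hypothetical existing rule for all other crawlers, kept alongside the new block.
robots_txt = block + "\nUser-agent: *\nDisallow: /private/\n"

print(block)
print(all_allowed(robots_txt, AGENTS))  # True
```

A named group takes precedence over the `*` group for that crawler, which is why unrelated wildcard rules can stay in the file without re-blocking the agents listed here.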
Once your robots.txt is updated, re-scan your site. In your friendly4AI report, the AI Bot Accessibility section should update from "Critical" or "Mixed" to "All allowed."
If your previous report showed a capped score (two numbers, for example "AI-Readiness: 55 / Potential after fix: 79"), the cap lifts on the next scan once the block is resolved. The "Potential after fix" number becomes your headline score.
We argued about the cap value internally for a week. Sixty was the number that felt honest — clearly below the "good" threshold, not so low it obscures the rest of the work. When AI cannot reach your site, a score of 80 is misleading, no matter how good the rest of your setup looks. The dual-number view shows you both where you are today and where you will be once the crawler block is gone.
If you have been tracking GEO news, you may also have heard that llms.txt does not move the needle. Google publicly confirmed in 2025 that llms.txt is not used in Search or AI Overviews ranking. Independent studies of 94,000+ URLs show no measurable citation effect. We have re-weighted it accordingly in Score v2.1.
We still detect llms.txt on your site and surface it in an Experimental signals section for completeness. Its weight in the AI-Readiness score is 0 points. We did not delete the check; we just stopped recommending it as a priority action.
The reason these two changes ship together is simple. We removed a low-impact recommendation and added a high-impact one, in the same release, so the "Critical" findings band in your report becomes more honest about which actions actually move AI Visibility.
Scan your site at friendly4.ai if you have not in the last week, and look at the AI Bot Accessibility section. If you see any Critical or Mixed verdict, copy the robots.txt snippet from your report (it lists only the agents you currently block) and apply it. Re-scan to confirm the fix landed, then move on.
Stuck? If our scanner says you are blocked and you cannot apply the fix yourself, reply to your report email or write to ai@friendly4.ai. We are tagging these so we know which hosts are causing the most pain.
The free scan tells you who can read your site. Tracking who actually cites you across ChatGPT, Claude, Gemini, Grok, and Perplexity is on Starter.
Once AI can reach your site, the rest of your AI-readiness work (structured data, semantic HTML, internal linking) starts compounding. Before that, the score is mostly diagnostic.
If you are in the 27% whose AI Bot Accessibility is already clean, you are ahead of most of the web. The next fixes are in How to Improve Your AI-Readiness Score.
Google on llms.txt (July 2025): public statement that llms.txt is not used in Search or AI Overviews ranking.

