How do I check if my site is blocking AI crawlers?

Open https://yourdomain.com/robots.txt in a browser and look for lines like `User-agent: GPTBot` or `User-agent: ClaudeBot` followed by `Disallow: /`. If you see any of those, that crawler is blocked. The fastest way to get a full picture across every major AI agent is to run a free scan at friendly4.ai — the AI Bot Accessibility section lists each crawler's verdict in one view.

Why is my managed host blocking AI crawlers by default?

Several large hosting platforms (WP Engine, Squarespace, Wix, and others) added default robots.txt rules during 2025 and into 2026 to block AI training crawlers. The intent was to protect customers from uncompensated content scraping. The side effect is that sites which want to appear in AI-generated answers are blocked from doing so unless the owner overrides the default.

Will allowing GPTBot hurt my Google ranking?

No. GPTBot is OpenAI's crawler, not Google's. Allowing or disallowing it has no effect on how Googlebot crawls or ranks your site. The same applies to ClaudeBot, PerplexityBot, and Google-Extended — none of these change your traditional search position.

What is the score cap and why is it 60?

In Score v2.1, when all major AI training crawlers are blocked on your site, the AI-Readiness Score is capped at 60/100 regardless of other factors. A site AI systems cannot crawl will not appear in AI-generated answers, so a high score for that site would be misleading. Your report shows both the capped AI-Readiness number and the 'Potential after fix' number you would achieve once the block is removed.

How long does it take for AI engines to see the change?

Once your robots.txt allows the crawlers, AI bots typically discover the change within hours to a few days. You can verify our parser sees the fix immediately by re-scanning at friendly4.ai — the AI Bot Accessibility verdict updates on the next scan.

Why Your Site Is Invisible to AI (And How to Fix It in 5 Minutes)

Marina, friendly4AI Team
19 May 2026

Last updated: 04 Jul 2026

Your hosting provider may be blocking GPTBot, ClaudeBot, and PerplexityBot from your site by default. Here's how to check, and the one-line robots.txt fix.

TL;DR

About 73% of sites have at least one AI crawler blocked, often by a managed host that added the rule by default. If GPTBot or PerplexityBot cannot reach your pages, no schema or content fix gets you cited. Open your robots.txt, allow the crawlers, and re-scan to confirm the block is gone.

You can have the clearest writing in your industry, perfect Schema.org markup, and a site that loads in under a second, and still get zero citations from ChatGPT, Claude, or Perplexity.

Why? Because before any of that matters, AI engines have to be allowed in. And right now, most sites are not letting them in. Most owners do not know.

The finding that triggered this

According to Otterly's AI Citations Report 2026, about 73% of sites have at least one AI crawler blocked at the robots.txt or CDN layer. That is roughly three out of four sites. We see the same shape in our own data: AI Bot Accessibility blocks, including network-level blocks a WAF or CDN applies before robots.txt is even read, keep turning up across every industry we look at, on sites whose owners never set the rule themselves.

Otterly tracks who cites you. friendly4AI tells you why the crawler is not reaching you in the first place, the layer underneath the citation. Both numbers matter; you cannot move the first without first fixing the second.

That number caught me off guard. For most of our scoring history we treated "is your robots.txt blocking AI?" as an edge case. It is not. It is the default failure mode. If you are an SEO manager explaining to a stakeholder why AI search matters this quarter, this is also the number you bring into the meeting — three out of four sites in your category are doing this wrong by default.

That is why Score v2.1 now ships a dedicated AI Bot Accessibility section in every report. It checks each major AI crawler against your robots.txt and tells you who can read your site, and who can't.

A site AI cannot read is a zero-citation site

This is the part worth saying plainly. There is no SEO trick, no schema upgrade, no content rewrite that fixes an AI crawler block. If GPTBot is disallowed, OpenAI's training corpus does not see your page. If PerplexityBot is disallowed, Perplexity has nothing to cite. There is no second chance further down the funnel.

It is the single highest-leverage technical fix for AI Visibility, and it costs almost nothing. A one-line change in a file most people already have.

The managed-host trap

Here is where it gets uncomfortable. The block is usually not something you did.

Throughout 2025 and into 2026, several large managed hosting platforms (WP Engine, Squarespace, Wix, and others) added default robots.txt rules that disallow GPTBot, ClaudeBot, and similar AI training agents. The intent was reasonable: protect customers from uncompensated scraping. The effect, especially for owners who never opened their robots.txt, is that they are silently absent from AI answers — their site broadcasts an AI opt-out signal they never chose to send.

I see this pattern in our inbox almost every week. A small business owner re-scans, sees a Critical verdict, and writes in asking what they did wrong. Answer: nothing. Their host did. They are now in the position of having to override a default they did not know existed, on a platform they pay for to avoid exactly that kind of decision.

When our scanner detects you are on one of these platforms, the report surfaces a "Why is this happening?" callout naming your host and linking to the platform's override instructions. The block is fixable from your side. Your host's default does not have the final word on your robots.txt.

How to check in 30 seconds

Open this URL in your browser:

https://yourdomain.com/robots.txt

Scan it for any of these user-agent strings followed by Disallow: /. If you see a match, that crawler cannot read your site, and you lose what is in the right-most column. This is the same check our robots.txt accessibility parameter runs automatically: it reads the file and reports a verdict for each AI agent, so you do not have to do it by hand.

User-agent token	Company	Type	What you lose if blocked
`GPTBot`	OpenAI	Training	Your pages are excluded from future GPT training data; long-term ChatGPT recall about your brand erodes.
`OAI-SearchBot`	OpenAI	Real-time search	ChatGPT cannot cite your page when it searches the web mid-answer.
`ClaudeBot` / `anthropic-ai`	Anthropic	Training	Your content is excluded from Claude's training corpus.
`PerplexityBot`	Perplexity	Index / search	Perplexity has nothing to cite when answering questions in your topic area.
`Google-Extended`	Google	AI training opt-out	Your content is excluded from Gemini training and grounding. (Separate from Googlebot; does not affect Google Search ranking.)
`CCBot`	Common Crawl	Training feed	Many AI training pipelines start from Common Crawl. A block here propagates to multiple downstream models.
`Bytespider`	ByteDance	Training	You are absent from ByteDance AI products, including its Doubao assistant.

Table: Major AI crawlers, what they do, and what a block costs you (2026).

You want both training and real-time crawlers allowed. The first feeds the model itself; the second fetches you when someone asks a question right now.
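If you would rather script the manual check than read the file by eye, here is a minimal sketch using Python's standard-library robots.txt parser. The function name `ai_crawler_verdicts` and the sample file are made up for illustration, and the standard-library parser may not match every crawler's own interpretation of the rules, so treat the output as a first pass rather than a definitive verdict.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the table above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "Bytespider",
]


def ai_crawler_verdicts(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI user-agent token to True if it may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}


# Hypothetical robots.txt: GPTBot is blocked, everyone else is allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in ai_crawler_verdicts(sample).items():
    print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

To run it against a live site, paste the contents of your robots.txt into `sample`, or use the parser's own `set_url()` and `read()` methods in place of `parse()`. This only reads robots.txt; it says nothing about blocks applied at the WAF or CDN layer.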

Easier path: run a free scan at friendly4.ai. The AI Bot Accessibility section gives you each crawler's verdict in one view, plus the score impact and the copy-paste fix.

The 5-minute fix

If you are on Squarespace, Wix, or WP Engine, skip the snippet below. Those hosts ship a managed robots.txt you cannot replace by upload. A direct file edit will not stick. Instead, open your friendly4AI report, find the "Why is this happening?" callout for your host, and follow the platform-specific override instructions. Then come back here for the verification step.

For self-hosted sites — your own WordPress, Next.js, static site, custom stack — add this block to your robots.txt and replace any existing AI-crawler rules:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

That covers training crawlers and real-time fetchers in one pass. For a deeper breakdown of which bot does what, see Understanding AI Crawlers.
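The "replace any existing AI-crawler rules" part matters more than it looks. The sketch below generates the same allow block from a list of agents, then shows what can happen if a stale Disallow group is left in the file above it. The helper names `build_allow_block` and `all_allowed` are illustrative, and the check uses Python's standard-library parser, which takes the first group that matches an agent. Real crawlers may resolve duplicate groups differently, which is one more reason to delete stale rules rather than rely on ordering.

```python
from urllib.robotparser import RobotFileParser

# Tokens from the snippet above.
ALLOWED_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "anthropic-ai", "PerplexityBot", "Google-Extended", "CCBot",
]


def build_allow_block(agents: list[str]) -> str:
    """Render one 'User-agent' / 'Allow: /' group per agent."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def all_allowed(robots_txt: str, agents: list[str]) -> bool:
    """True if every agent may fetch '/' according to Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, "/") for agent in agents)


block = build_allow_block(ALLOWED_AGENTS)

# Hypothetical leftover rule that was never removed.
stale_rule = "User-agent: GPTBot\nDisallow: /\n\n"

print("clean file passes:     ", all_allowed(block, ALLOWED_AGENTS))
print("stale rule left on top:", all_allowed(stale_rule + block, ALLOWED_AGENTS))
```

Running a check like this before you deploy is cheap, but it is not a substitute for the re-scan in the next step.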

Re-scan to verify the fix

Once your robots.txt is updated, re-scan your site. In your friendly4AI report, the AI Bot Accessibility section should update from "Critical" or "Mixed" to "All allowed."
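A clean robots.txt does not rule out the network-level blocks mentioned earlier, where a WAF or CDN refuses the request before robots.txt is ever read. The re-scan is the check to rely on. As a rough manual cross-check, you can compare the response a browser-like User-Agent gets with the response a bot-like one gets. Everything in this sketch is an assumption for illustration: the User-Agent strings are simplified stand-ins, not the crawlers' real ones, and the function names are made up. Requests come from your own IP, so a pass does not prove the real crawler gets through, and a refusal may just be anti-spoofing protection.

```python
import urllib.error
import urllib.request

# Simplified, illustrative User-Agent strings (not the crawlers' real ones).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36"
BOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.0)"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions


def looks_ua_blocked(browser_status: int, bot_status: int) -> bool:
    """True when the browser UA gets through but the bot UA is refused."""
    return browser_status < 400 and bot_status >= 400


def probe(url: str) -> bool:
    """Fetch `url` with both User-Agents and compare the outcomes."""
    return looks_ua_blocked(fetch_status(url, BROWSER_UA), fetch_status(url, BOT_UA))
```

Calling `probe("https://yourdomain.com/")` returns True when only the bot-like request is refused, which is a hint worth raising with your host or CDN settings.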

If your previous report showed a capped score (two numbers, for example "AI-Readiness: 55 / Potential after fix: 79"), the cap lifts on the next scan once the block is resolved. The "Potential after fix" number becomes your headline score.

We argued about the cap value internally for a week. Sixty was the number that felt honest — clearly below the "good" threshold, not so low it obscures the rest of the work. When AI cannot reach your site, a score of 80 is misleading, no matter how good the rest of your setup looks. The dual-number view shows you both where you are today and where you will be once the crawler block is gone.

One quick note on llms.txt

If you have been tracking GEO news, you may also have heard that llms.txt does not move the needle. Google publicly confirmed in 2025 that llms.txt is not used in Search or AI Overviews ranking. An independent analysis of 94,000+ URLs found no measurable citation effect. We have re-weighted it accordingly in Score v2.1.

We still detect llms.txt on your site and surface it in an Experimental signals section for completeness. Its weight in the AI-Readiness Score is 0 points. We did not delete the check; we just stopped recommending it as a priority action.

The reason these two changes ship together is simple. We removed a low-impact recommendation and added a high-impact one, in the same release, so the "Critical" findings band in your report becomes more honest about which actions actually move AI Visibility.

What to do next

Scan your site at friendly4.ai if you have not in the last week, and look at the AI Bot Accessibility section. If you see any Critical or Mixed verdict, copy the robots.txt snippet from your report (it lists only the agents you currently block) and apply it. Re-scan to confirm the fix landed, then move on.

Stuck? If our scanner says you are blocked and you cannot apply the fix yourself, reply to your report email or write ai@friendly4.ai. We are tagging these so we know which hosts are causing the most pain.

The free scan tells you who can read your site. Tracking who actually cites you across ChatGPT, Claude, Gemini, Grok, and Perplexity is on Starter.

Once AI can reach your site, the rest of your AI-readiness work (structured data, semantic HTML, internal linking) starts compounding. Before that, the score is mostly diagnostic.

If you are in the 27% whose AI Bot Accessibility is already clean, you are ahead of most of the web. The next fixes are in How to Improve Your AI-Readiness Score.

Keep reading

Understanding AI Crawlers — training bots, search bots, user fetchers
How LLMs Choose Which Websites to Recommend — what gets cited and why
What Is AI Visibility? — the outcome the AI Bot Accessibility fix protects
How to Improve Your AI-Readiness Score — the full prioritized checklist

Sources

Otterly, AI Citations Report 2026 (May 2026) — 73% of sites have at least one AI crawler blocked at the robots.txt or CDN layer
Google / Gary Illyes on llms.txt (July 2025) — public statement that llms.txt is not used in Search or AI Overviews ranking
ALM Corp robots.txt strategy analysis (2025) — independent analysis of 94,000+ URLs finds no measurable citation lift from llms.txt
OpenAI GPTBot documentation — official user-agent and IP-range reference
Anthropic crawler documentation — official ClaudeBot reference

