robots.txt Accessibility

stable

Category: crawlability · Methodology v4.4

Signal Source

Source: https://{domain}/robots.txt
Kind: http_response

Score Bands

Pass: robots.txt is present with no blanket Disallow: / under User-agent: *, OR robots.txt is absent and no blocking meta robots / X-Robots-Tag directive is present
Partial: robots.txt is absent and the page's meta robots / X-Robots-Tag carries only weak negatives (e.g. noarchive) with no hard crawl/index block
Fail: robots.txt contains a blanket Disallow: / under User-agent: *, OR robots.txt is absent and the page carries an explicit noindex / none / nofollow via meta robots or X-Robots-Tag

Description

What this parameter measures

This parameter checks whether the major AI crawlers are allowed to fetch your site. friendly4AI reads your robots.txt and evaluates the rules that apply to the wildcard (User-agent: *) and to known AI crawler tokens such as GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google). If robots.txt is absent or unreachable, it falls back to the meta robots tag and X-Robots-Tag header your page sends.
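A minimal sketch of a per-token check, using Python's standard-library urllib.robotparser rather than friendly4AI's own parser (which this page does not describe). The AI_TOKENS list and the crawler_access helper are illustrative assumptions. Note that robotparser applies the first matching rule within a group, which is simpler than the most-specific-match precedence Google documents, so results can differ on complex files.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# Hypothetical token list: the wildcard plus the AI crawlers named on this page.
AI_TOKENS = ["*", "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def crawler_access(robots_txt: str, path: str = "/") -> Dict[str, bool]:
    """Return, for each token, whether robots_txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return {token: parser.can_fetch(token, path) for token in AI_TOKENS}


example = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

print(crawler_access(example))
# {'*': True, 'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True, 'Google-Extended': True}
```

Swap in your own robots.txt text (for example, the body served at https://{domain}/robots.txt) to see which of the named crawlers a change would shut out.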

Why it matters for AI-readiness

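Compliant AI crawlers check robots.txt before fetching your pages, so its rules decide what they can read. Blocking GPTBot keeps your content out of OpenAI's crawl for model training (ChatGPT's search and user-initiated browsing use separate tokens, OAI-SearchBot and ChatGPT-User); blocking PerplexityBot keeps your pages out of Perplexity's search results; disallowing Google-Extended opts your content out of training and grounding for Google's Gemini models and Vertex AI. A blanket block under User-agent: * shuts out every compliant crawler that has no more specific group of its own, which makes it the single most damaging misconfiguration for AI visibility.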

How we score it

Under the v4.4 methodology (v2.2 scoring), this Crawlability parameter is scored in two tiers. When robots.txt is present, it passes unless a blanket Disallow: / under User-agent: * blocks crawling, in which case it fails; this tier has no Partial. When robots.txt is absent, the scan falls back to your meta robots and X-Robots-Tag directives: noindex, none, or nofollow fails; weak negatives such as noarchive, with no hard block, score Partial; anything else passes. An unreachable robots.txt is not a failure: it is handled like an absent file and routed to the fallback tier.
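The two tiers can be expressed as a small decision function. This is a sketch of the rules as written on this page, not friendly4AI's code: the score and has_blanket_wildcard_disallow names are hypothetical, robots_txt=None stands for an absent or unreachable file, and the weak-negative set contains only noarchive because that is the only example given.

```python
from typing import Iterable, List, Optional

# Directive sets taken from the score bands; WEAK_NEGATIVES is an assumption
# limited to the one example the page names.
HARD_BLOCKS = {"noindex", "none", "nofollow"}
WEAK_NEGATIVES = {"noarchive"}


def has_blanket_wildcard_disallow(robots_txt: str) -> bool:
    """True if a group that includes User-agent: * contains Disallow: /."""
    agents: List[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


def score(robots_txt: Optional[str], directives: Iterable[str]) -> str:
    """Return "Pass", "Partial", or "Fail" for this parameter."""
    if robots_txt is not None:
        # Tier 1: robots.txt was fetched; there is no Partial here.
        return "Fail" if has_blanket_wildcard_disallow(robots_txt) else "Pass"
    # Tier 2: robots.txt absent or unreachable; fall back to page directives.
    found = {d.strip().lower() for d in directives}
    if found & HARD_BLOCKS:
        return "Fail"
    if found & WEAK_NEGATIVES:
        return "Partial"
    return "Pass"
```

Only a group that names User-agent: * is inspected, mirroring the bands; a narrower rule such as Disallow: /private/ leaves the result at Pass, and page directives are ignored whenever robots.txt is present.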

How to fix common issues

  • Do not add Disallow: / under User-agent: * (or under a specific AI crawler token like GPTBot, ClaudeBot, PerplexityBot, or Google-Extended) unless you intend to block that crawler.
  • Validate your file's syntax against Google Search Central's robots.txt guide; a stray rule can block more than you expect.
  • Confirm the exact crawler token each engine uses (see OpenAI's GPTBot documentation, for example), then allow the crawlers you want to reach your site.
  • If you rely on the meta robots fallback, keep noindex, none, and nofollow off pages you want surfaced.
  • Re-scan after any change to confirm crawlers are still allowed.
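If you depend on the fallback tier, it helps to see which directives a page actually sends. The sketch below, built on Python's standard html.parser, collects the comma-separated values of a meta robots tag and of an X-Robots-Tag header value you pass in; page_directives and RobotsMetaParser are hypothetical helpers, not part of friendly4AI. It deliberately ignores bot-specific variants such as a meta tag named googlebot and user-agent-prefixed X-Robots-Tag values, which a real scanner may treat differently.

```python
from html.parser import HTMLParser
from typing import List, Optional


def split_directives(value: str) -> List[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            self.directives.extend(split_directives(values.get("content", "")))


def page_directives(html: str, x_robots_tag: Optional[str] = None) -> List[str]:
    """Merge meta robots directives with an optional X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = list(parser.directives)
    if x_robots_tag:
        found.extend(split_directives(x_robots_tag))
    return found


print(page_directives('<meta name="robots" content="noarchive">', "noindex"))
# ['noarchive', 'noindex']
```

Run it against the pages you want surfaced after each change: any noindex, none, or nofollow in the result would fail the fallback tier, while a lone noarchive would score Partial.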

Version History

Introduced: v4.0
Last changed: v4.2