Search-Bot Network Reachability
stable · Category: crawlability · Methodology v4.5
Signal Source
- Source: https://{domain}
- Kind: http_headers
Score Bands
| Verdict | Condition |
|---|---|
| Pass | all declared search-bot user-agents (OAI-SearchBot, Claude-SearchBot, PerplexityBot) are reachable — no 403 or network block detected at the WAF/CDN layer |
| Partial | at least one search bot is confirmed network-blocked while at least one is reachable, or reachability is mixed or uncertain for one or more bots |
| Fail | all probed search-bot user-agents are confirmed network-blocked — every engine is invisible at the network layer |
Description
What this parameter measures
A robots.txt Allow rule only works if the bot's HTTP request actually reaches your origin. This parameter checks the layer beneath robots.txt — the WAF and CDN — to determine whether declared AI search-bot user-agents are reachable at the network level. friendly4AI probes three principal search-bot identities: OAI-SearchBot (OpenAI), Claude-SearchBot (Anthropic), and PerplexityBot (Perplexity). Each probe is classified as REACHABLE (HTTP 200/30x with normal content), BLOCKED (HTTP 403 or a WAF/CDN challenge page), or INCONCLUSIVE (timeout or ambiguous response). Bot IP ranges are cross-referenced against each engine's published JSON (OpenAI: openai.com/searchbot.json, Anthropic: claude.com/crawling/bots.json, Perplexity: perplexity.ai/perplexitybot.json). When all probes are inconclusive or time out, the result degrades to an advisory UNKNOWN — excluded from the score denominator so you are not penalised for network ambiguity. A confirmed network block on any search-bot tier can also cap your composite score in the same way a robots.txt Disallow would for tier-A bots.
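The three-way classification described above can be sketched as a small function. This is a minimal illustration, not friendly4AI's actual scanner: the `classify_probe` name, the `Reachability` enum, and the `CHALLENGE_MARKERS` strings are all assumptions made for the example, and a real scanner would likely use more robust challenge-page detection.

```python
from enum import Enum
from typing import Optional


class Reachability(Enum):
    REACHABLE = "REACHABLE"
    BLOCKED = "BLOCKED"
    INCONCLUSIVE = "INCONCLUSIVE"


# Hypothetical substrings that suggest a WAF/CDN challenge page.
# These are illustrative assumptions, not a list taken from the scanner.
CHALLENGE_MARKERS = ("just a moment", "attention required", "captcha")


def classify_probe(status: Optional[int], body: str = "") -> Reachability:
    """Classify one probe; status=None stands for a timeout or no response."""
    if status is None:
        return Reachability.INCONCLUSIVE
    text = body.lower()
    # A 403 or a challenge page counts as a network-level block.
    if status == 403 or any(marker in text for marker in CHALLENGE_MARKERS):
        return Reachability.BLOCKED
    # 200 or any 3xx redirect with no challenge markers counts as reachable.
    if status == 200 or 300 <= status < 400:
        return Reachability.REACHABLE
    # Anything else (for example a 500) is treated as ambiguous.
    return Reachability.INCONCLUSIVE
```

Checking for challenge markers before checking for a 200 matters in this sketch, because some interstitial pages are served with a success status.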
Why it matters for AI-readiness
WAF and CDN bot-management products often block generic bot user-agents or unfamiliar IP ranges by default. If OAI-SearchBot, Claude-SearchBot, or PerplexityBot is blocked at the network layer, that engine cannot fetch your pages regardless of how permissive your robots.txt is. This is a silent failure: your robots.txt says "allowed," but the crawler sees a 403 and stops. The content then never enters the engine's index and is not cited in AI-generated answers on ChatGPT, Claude.ai, or Perplexity. This parameter makes that failure visible and scoreable, complementing robots-txt-accessibility, which checks only the robots.txt declaration and not network reachability.
How we score it
Under the v4.5 methodology, this Crawlability parameter scores on a three-tier gradient based on confirmed probe outcomes. It passes (100) when all probed search-bot UAs are REACHABLE — no 403 or cloaking signals detected at the WAF/CDN layer. It scores partial (50) when at least one bot is confirmed BLOCKED but at least one is REACHABLE, or when reachability is mixed or uncertain across bots. It fails (0) when every probed bot UA is confirmed BLOCKED — all AI search engines are network-invisible. When all probes are INCONCLUSIVE or time out, the result is UNKNOWN (advisory) — excluded from the score denominator, not treated as a scored 0. A confirmed block also activates a score cap: a search bot that is network-blocked is treated equivalently to a Disallow: / in a tier-A robots.txt for the purpose of the composite cap calculation.
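The gradient above can be expressed as a short function over per-bot outcomes. This is a sketch under stated assumptions: the function names `score_reachability` and `cap_applies` are hypothetical, outcomes are passed as plain strings, and an empty outcome list is treated as UNKNOWN, which the methodology text does not specify.

```python
from typing import Optional, Sequence


def score_reachability(outcomes: Sequence[str]) -> Optional[int]:
    """Return 100, 50, or 0, or None for the advisory UNKNOWN result.

    Each outcome is "REACHABLE", "BLOCKED", or "INCONCLUSIVE".
    """
    # Assumption: no probes at all is handled like all-inconclusive.
    if not outcomes or all(o == "INCONCLUSIVE" for o in outcomes):
        return None  # UNKNOWN: excluded from the score denominator
    if all(o == "REACHABLE" for o in outcomes):
        return 100  # Pass
    if all(o == "BLOCKED" for o in outcomes):
        return 0  # Fail
    return 50  # Partial: mixed or uncertain reachability


def cap_applies(outcomes: Sequence[str]) -> bool:
    """A confirmed block on any bot activates the composite score cap."""
    return any(o == "BLOCKED" for o in outcomes)
```

For example, `score_reachability(["REACHABLE", "BLOCKED", "INCONCLUSIVE"])` falls through to the partial band, and `cap_applies` on the same input is true because one block is confirmed.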
How to fix common issues
- Open your WAF or CDN bot-management console (Cloudflare Bot Management, AWS WAF Managed Rules, Akamai Bot Manager, or equivalent) and create explicit allow-list rules for the OAI-SearchBot, Claude-SearchBot, and PerplexityBot user-agent strings.
- Add the official IP ranges from each engine's published JSON to your allowlist to prevent IP-level blocking from interfering with requests that carry the correct UA.
- If you use Cloudflare's "Block bots" or "Managed Challenge" rules, verify that AI search crawlers are excluded from the managed-challenge scope — challenges that return a JavaScript interstitial register as BLOCKED to the scanner.
- Test each bot UA manually: `curl -A "OAI-SearchBot/1.0" https://yourdomain.com` should return a 200 with your normal page content, not a Cloudflare or WAF error page.
- Re-scan after updating WAF rules to confirm all three bots are now classified as REACHABLE.
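The manual check can also be scripted for all three bots at once. The sketch below uses only Python's standard library. The helper names `fetch_status` and `probe_all` are hypothetical, and the shortened user-agent strings for Claude-SearchBot and PerplexityBot are assumptions; the real crawlers send longer strings, so consult each engine's documentation if your WAF matches on the full value.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# Simplified UA strings (assumption); only the OAI-SearchBot value
# mirrors the curl example in the fix list.
SEARCH_BOT_UAS = {
    "OAI-SearchBot": "OAI-SearchBot/1.0",
    "Claude-SearchBot": "Claude-SearchBot",
    "PerplexityBot": "PerplexityBot",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status, or None on timeout or network error.

    urlopen follows redirects, so a 30x is reported as the final status.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 from a WAF rule
    except OSError:
        return None  # covers URLError, timeouts, and connection failures


def probe_all(
    url: str,
    fetch: Callable[[str, str], Optional[int]] = fetch_status,
) -> Dict[str, Optional[int]]:
    """Probe the URL once per search-bot UA and collect the statuses."""
    return {bot: fetch(url, ua) for bot, ua in SEARCH_BOT_UAS.items()}


# Usage (performs real network requests):
# for bot, status in probe_all("https://yourdomain.com").items():
#     print(bot, status)
```

A script like this runs from your own IP address, so it exercises only user-agent rules. It cannot confirm that requests from the engines' published IP ranges are allowed, which is why the re-scan step is still needed.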
Version History
- Introduced: v4.5
- Last changed: v4.5
Key takeaways
- Signal: https://{domain}
- Category: Crawlability & Access
- Passes when: all declared search-bot user-agents (OAI-SearchBot, Claude-SearchBot, Perplex…