Paywall and Login Detection
stable · Category: crawlability · Methodology v4.5
Is your content freely reachable, or does it sit behind an access barrier that AI crawlers cannot pass?
Signal Source
- Source: https://{domain}
- Kind: html_dom
Score Bands
| Verdict | Condition |
|---|---|
| Pass | no strong access barrier and fewer than two weak signals — the page reads as openly accessible (no login form, no CAPTCHA, no paywall phrasing) |
| Partial | exactly one strong signal, OR two or more weak signals — access looks partially or ambiguously gated (e.g. a single 'subscribe to continue' marker, or a sign-in entry combined with a registration wall) |
| Fail | two or more strong signals — a hard access barrier such as a login form plus a paywall marker, a CAPTCHA plus 'members-only', or two distinct paywall phrases |
Description
What this parameter measures
friendly4AI scans the page HTML for strong signals: a login form containing a password field; a CAPTCHA widget (reCAPTCHA, hCaptcha, Cloudflare Turnstile, or a "verify you are human" challenge); and hard paywall phrases such as "paywall", "subscribe to continue", "premium content", "members-only", "sign in to continue", "log in to continue", "start your subscription", and "already a subscriber". It also tracks weak signals, such as "member access" phrasing, or an authentication entry point combined with registration-wall language. A higher score means the page is more open, not more restricted.
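The detection step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration under stated assumptions, not friendly4AI's actual scanner: the strong paywall phrases are taken from the list above, but the CAPTCHA class names (`g-recaptcha`, `h-captcha`, `cf-turnstile`), the weak-signal phrase lists, and the names `_PageScan` and `count_signals` are hypothetical. Matching is plain case-insensitive substring search over text plus `class`/`id` values.

```python
from html.parser import HTMLParser

# Strong paywall phrases, as listed in the methodology description.
STRONG_PHRASES = (
    "paywall", "subscribe to continue", "premium content", "members-only",
    "sign in to continue", "log in to continue", "start your subscription",
    "already a subscriber",
)
# Assumption: CAPTCHA widgets are recognised by common class names or challenge text.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile", "verify you are human")
# Assumption: hypothetical weak-signal and cookie phrase lists for illustration only.
MEMBER_PHRASES = ("member access",)
AUTH_ENTRY = ("sign in", "log in")
REGISTRATION_WALL = ("create a free account", "register to read")
COOKIE_MARKERS = ("cookie policy", "accept all cookies")


class _PageScan(HTMLParser):
    """Collects text plus class/id values, and notes password fields inside forms."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.has_login_form = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_depth += 1
        # A password input nested in a <form> counts as a login form.
        if tag == "input" and self.form_depth and (attrs.get("type") or "").lower() == "password":
            self.has_login_form = True
        # Keep class/id values so widget class names like "g-recaptcha" can match.
        self.chunks.extend(v for k, v in attrs.items() if k in ("class", "id") and v)

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth:
            self.form_depth -= 1

    def handle_data(self, data):
        self.chunks.append(data)


def count_signals(html: str) -> tuple[int, int, bool]:
    """Return (strong_count, weak_count, has_cookie_banner) for one HTML page."""
    scan = _PageScan()
    scan.feed(html)
    scan.close()
    # Collapse whitespace so phrases split across inline tags still match.
    text = " ".join(" ".join(scan.chunks).split()).lower()

    strong = int(scan.has_login_form)                       # login form: one signal
    strong += int(any(m in text for m in CAPTCHA_MARKERS))  # CAPTCHA: one signal
    strong += sum(p in text for p in STRONG_PHRASES)        # one per matched phrase

    weak = int(any(p in text for p in MEMBER_PHRASES))
    weak += int(any(a in text for a in AUTH_ENTRY)
                and any(r in text for r in REGISTRATION_WALL))

    has_cookie_banner = any(c in text for c in COOKIE_MARKERS)
    return strong, weak, has_cookie_banner


print(count_signals('<form><input type="password"></form><p>Subscribe to continue reading.</p>'))
# (2, 0, False)
```

In this sketch a login form plus one paywall phrase already yields two strong signals, which is the "hard access barrier" case in the score bands.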
Why it matters for AI-readiness
AI crawlers cannot authenticate or solve challenges. Anthropic's system card states that the Claude crawler cannot access password-protected pages, sign-in pages, or CAPTCHA-protected content, and the same constraint applies to the ChatGPT, Perplexity, and Gemini crawlers. If your primary content sits behind a login wall, a hard paywall, or a CAPTCHA, AI crawlers will not read it: your most important pages become invisible to every AI engine, however well-structured the rest of the page is.
How we score it
Under the v4.4 methodology, this Crawlability parameter is scored in three tiers by counting access signals. A login form adds one strong signal, a CAPTCHA adds one, and each matched strong paywall phrase adds one more. Two or more strong signals score fail (0). Exactly one strong signal, or two or more weak signals, scores partial (50). No strong signals and fewer than two weak signals score pass (100). Cookie-consent markers (for example "cookie policy" or "accept all cookies") suppress one weak count, because a cookie banner alone does not block AI crawlers and should not be mistaken for a content wall.
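The three-tier rule reduces to a small function over the signal counts. This is a sketch, not the production scorer: `score_access` is a hypothetical name, and it assumes cookie-consent suppression works by subtracting one from the weak count (never going below zero) when a cookie marker is present.

```python
def score_access(strong: int, weak: int, has_cookie_banner: bool) -> tuple[str, int]:
    """Map access-signal counts to a (verdict, score) pair."""
    if has_cookie_banner:
        # Assumption: a cookie banner cancels one weak signal, floored at zero.
        weak = max(0, weak - 1)
    if strong >= 2:
        return "fail", 0          # hard access barrier
    if strong == 1 or weak >= 2:
        return "partial", 50      # partially or ambiguously gated
    return "pass", 100            # openly accessible


print(score_access(1, 0, False))  # ('partial', 50)
print(score_access(0, 2, True))   # ('pass', 100)
```

Checking the fail condition first mirrors the score bands: a page with two strong signals fails regardless of its weak count, and a cookie banner can only ever lift a page out of partial when weak signals alone put it there.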
How to fix common issues
- Keep your primary content publicly readable. Your value proposition, key facts, and pricing should render without login or payment.
- If you run a paywall, adopt a metered model so the first view is open, or keep key landing pages outside the wall.
- Remove CAPTCHA gates from public content pages; reserve challenges for suspicious traffic, not every visitor.
- Avoid hard-gating phrases ("subscribe to continue", "members-only", "sign in to continue") on pages you want AI systems to read.
- Cookie consent banners alone are fine: they do not block AI crawlers, and the scan discounts them.
- Re-scan after changes to confirm the barriers have cleared.
Version History
- Introduced: v4.2
- Last changed: v4.4
Key takeaways
- Signal: https://{domain}
- Category: Crawlability & Access
- Passes when: no strong access barrier and fewer than two weak signals — the page reads as …