Beyond traditional search engine bots like Googlebot, a new generation of crawlers is visiting your website. These bots support AI products in three main ways: training, search/indexing, and user-triggered retrieval.
If your goal is discoverability, your baseline should be a robots.txt that allows the major AI crawler tokens (shown below).
If you want a practical improvement plan, start here: How to Improve Your AI-Readiness Score.
OpenAI publishes separate tokens for different use cases:

- **GPTBot**: Training data collection.
- **OAI-SearchBot**: Search/indexing for OpenAI search experiences.
- **ChatGPT-User**: User-triggered fetches (when a user asks ChatGPT to open a page).

Anthropic also separates training vs. user-triggered vs. search:

- **ClaudeBot**: Training data collection.
- **Claude-User**: User-triggered fetches.
- **Claude-SearchBot**: Crawling to improve Claude search results.

Perplexity uses two tokens:

- **PerplexityBot**: Powers Perplexity AI's search and answer engine.
- **Perplexity-User**: User-triggered fetches. According to Perplexity, this fetcher generally ignores robots.txt rules.

Google publishes a control token:

- **Google-Extended**: A token you can use to control whether content may be used for training and grounding for Gemini products. It does not impact inclusion in Google Search and is not used as a ranking signal in Google Search.

| Aspect | Search Bots | AI Crawlers |
|---|---|---|
| Goal | Index for search results | Understand for conversations |
| Frequency | Regular intervals | Less predictable |
| Depth | All pages | Focus on quality content |
| Usage | Display in SERPs | Generate responses |
Reality check: many “AI crawlers” still behave like classic bots (HTTP requests, HTML parsing). The difference is how the content is reused downstream.
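Because well-behaved crawlers apply the same robots.txt matching as classic bots, you can preview what a given token is allowed to fetch with Python's standard `urllib.robotparser`. A minimal sketch, assuming a hypothetical robots.txt that blocks GPTBot but allows OAI-SearchBot:

```python
from urllib import robotparser

# Hypothetical robots.txt: opt out of OpenAI training, allow OpenAI search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))         # False
print(rp.can_fetch("OAI-SearchBot", "https://example.com/blog/post"))  # True
```

This is the same per-token group matching a compliant crawler performs before requesting a page.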
AI crawlers prioritize:
They also evaluate:
To maximize AI visibility:
```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /
```
If you only want to allow user-triggered fetchers, you can allow those tokens specifically (where supported) and still disallow training bots.
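For example, a robots.txt along these lines (a sketch; whether a user-triggered fetcher honors it varies by vendor, and Perplexity-User generally ignores robots.txt) would permit on-demand fetches while opting out of training:

```
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```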
If you prefer to opt out:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
Allow public content, block sensitive areas:
```
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Disallow: /admin/
Disallow: /private/
```
Important nuance: robots.txt controls crawling, not indexing or downstream use. To keep content out of results entirely, use noindex or access controls.

To identify these crawlers in your server logs, look at the User-Agent header and match tokens like GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, and Google-Extended.
Will blocking AI crawlers keep your content out of AI answers? Not necessarily. AI assistants can still discover and cite your pages via search indexes or user-triggered fetches. But blocking may reduce the breadth of how your content is used.
As AI systems become more sophisticated, expect crawler behavior, tokens, and opt-out controls to keep evolving.
Stay ahead by making your website AI-friendly today.