ChatGPT, Claude, and Gemini each pick sources differently. Learn how training data and real-time retrieval work, why brands disappear, and what actually improves AI Visibility.
In the previous article we defined AI visibility — whether AI assistants mention your site when users ask relevant questions. Now let's look at the mechanics: how does an LLM decide which brands to include in its answer?
The short version: there are two knowledge pathways, each platform uses different sources, and brand strength alone doesn't guarantee you'll show up.
Every major LLM uses two fundamentally different ways to find information.
This is what the model "learned" during pre-training. Every LLM was trained on a massive dataset — web pages, books, articles, Wikipedia, forums, documentation. Brands mentioned frequently across authoritative sources develop stronger representations in the model's weights.
If your brand wasn't well-represented in the training data, the model may simply not know you exist. No amount of real-time optimization changes what's already baked into the weights.
When the model does search the web — because the query requires fresh information or the model is uncertain — it uses Retrieval Augmented Generation (RAG). The query gets converted to a vector, matched against indexed pages, and the top results are injected into the LLM's context as source material.
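The retrieval step described above can be sketched in a few lines. This is an illustrative toy, not any platform's actual pipeline: it uses a bag-of-words vector and cosine similarity where real systems use dense neural embeddings, and the page URLs are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, pages: dict[str, str], k: int = 2) -> list[str]:
    """Rank indexed pages by similarity to the query and return the top-k URLs."""
    q = embed(query)
    ranked = sorted(pages, key=lambda url: cosine(q, embed(pages[url])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, pages: dict[str, str]) -> str:
    """Inject the top retrieved pages into the LLM's context as source material."""
    sources = "\n".join(f"[{url}] {pages[url]}" for url in retrieve(query, pages))
    return f"Answer using these sources:\n{sources}\n\nQuestion: {query}"

# Hypothetical mini-index of two pages
pages = {
    "example.com/crm-guide": "best crm tools for small business compared",
    "example.com/recipes": "easy weeknight pasta recipes",
}
print(build_prompt("which crm should a small business use", pages))
```

The key point for visibility: only pages that rank highly at the retrieval step ever reach the model's context, so content the retriever can't match is content the model can't cite.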
The catch: each platform retrieves from a different search engine.
| LLM | Search engine | How it works |
|---|---|---|
| ChatGPT | Bing | Queries Bing when browsing is enabled. 87% of citations match Bing's top 10 organic results |
| Claude | Brave Search | Autonomously decides when a search is needed. Cites with URL, title, and snippets |
| Gemini | Google | Pulls from Google's index but increasingly keeps users inside Google's interface |
| Perplexity | Own index | Searches on every query against an index of 200+ billion URLs |
This creates a fragmented landscape. Only 11% of domains are cited by both ChatGPT and Perplexity. Being visible on one platform doesn't guarantee visibility on another.
Each model has also carved out a different audience:
| LLM | Strength | Primary audience |
|---|---|---|
| ChatGPT | General-purpose recommendations, broad reach | Consumers, general professionals |
| Claude | Deep document analysis, strategic reasoning | Enterprise, regulated industries, developers |
| Gemini | Google ecosystem integration, mobile | Google Workspace users, Android users |
| Perplexity | Real-time research with citations | Researchers, finance professionals |
GPTrends research found only 25% overlap between ChatGPT and Perplexity recommendations — meaning each platform surfaces different brands for the same query. A multi-platform visibility strategy isn't optional anymore.
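Overlap figures like the 25% above can be measured as the share of brands cited by both platforms relative to all brands cited by either (Jaccard overlap). A minimal sketch, using hypothetical brand sets:

```python
def citation_overlap(platform_a: set[str], platform_b: set[str]) -> float:
    """Share of all cited brands that both platforms cite (Jaccard overlap)."""
    union = platform_a | platform_b
    return len(platform_a & platform_b) / len(union) if union else 0.0

# Hypothetical citation sets for the same query on two platforms
chatgpt_brands = {"BrandA", "BrandB", "BrandC", "BrandD"}
perplexity_brands = {"BrandC", "BrandE", "BrandF"}

print(f"{citation_overlap(chatgpt_brands, perplexity_brands):.0%}")
```

Running the same comparison across your own tracked queries is a quick way to see how platform-specific your current visibility is.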
McKinsey found that in major categories — credit cards, hotels, electronics, apparel — top brands can be completely absent from AI answers. Why?
Your website is only 5–10% of what AI references. LLMs pull from affiliates, publishers, user-generated content, industry publications, Reddit threads, and reviews. If the broader web doesn't describe your brand in ways that AI can extract and verify, you're invisible — even if your own website is perfect.
The strongest single predictor of whether an LLM cites a brand? Brand search volume — a 0.334 correlation, stronger than backlinks, domain authority, or any traditional SEO signal. Brands that people already search for by name are the ones LLMs mention most. Backlinks, surprisingly, show weak or neutral correlation with AI citations.
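A correlation figure like 0.334 is a Pearson coefficient: the covariance of two variables divided by the product of their standard deviations, ranging from −1 to 1. The sketch below shows how such a figure is computed; the brand data is invented for illustration, not from the cited research.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance of x and y over the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: monthly brand searches vs. AI citations across six brands
search_volume = [1200, 8000, 300, 15000, 4500, 900]
ai_citations = [9, 22, 6, 14, 18, 5]

print(round(pearson(search_volume, ai_citations), 3))
```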
The Princeton GEO (Generative Engine Optimization) study tested optimization strategies across 10,000 queries and found they can boost AI Visibility by 30–40%. Two categories of action matter.
Making your content easy for LLMs to extract, adding structured data, and ensuring AI crawlers can access your pages — these are covered in detail in our existing guides.
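On crawler access specifically, a robots.txt sketch like the following allows the major AI bots. The user-agent tokens shown (GPTBot and OAI-SearchBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot, Google-Extended for Google's AI products) are the published names at the time of writing — verify each vendor's current documentation before relying on them.

```txt
# robots.txt — explicitly allow the major AI crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
```

Blocking these bots removes your pages from the retrieval pathway entirely, whatever your content quality.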
One finding worth highlighting: adding statistics to content increases AI Visibility by 22%, quotations from experts by 37%, and citations to sources by 115% for lower-ranked sites. Concrete evidence makes your content more citable.
Your own site is a small fraction of what LLMs reference. The broader web — review sites, publisher coverage, community discussions like Reddit — matters more.
Does blocking AI crawlers hurt your AI visibility? For most businesses, yes. You lose the real-time retrieval pathway entirely — though your brand might still appear from training data. See our AI Crawlers guide for details on which bots to allow.
Why do different LLMs recommend different brands? Each uses different training data, different retrieval systems, and different ranking logic. ChatGPT relies heavily on Bing for web searches; Claude uses Brave Search; Gemini uses Google's index. Only 11% of domains are cited by both ChatGPT and Perplexity, so variation is the norm, not the exception.
Does optimizing for one platform carry over to the others? To some extent. Optimizing for Bing helps ChatGPT citations. Getting indexed quickly via IndexNow helps Microsoft Copilot. Publishing comprehensive documentation helps Claude (which favors long-context analysis). But the best strategy is platform-agnostic: clear, structured, factual content with strong entity presence works across all models.
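IndexNow itself is an open protocol: you host a verification key file on your domain and POST a JSON payload of changed URLs to a participating endpoint. The sketch below only builds that payload — the hostname and key are placeholders, and the actual HTTP POST (to `https://api.indexnow.org/indexnow` with `Content-Type: application/json`) is omitted so the example runs offline.

```python
import json

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_payload(host: str, key: str, urls: list[str]) -> str:
    """JSON body for an IndexNow batch submission.

    The key must also be hosted at https://<host>/<key>.txt so the
    endpoint can verify that you own the domain.
    """
    return json.dumps({"host": host, "key": key, "urlList": urls})

payload = build_indexnow_payload(
    "example.com",
    "a1b2c3d4e5f6",  # hypothetical verification key
    ["https://example.com/new-guide"],
)
print(payload)
```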


