ChatGPT, Claude, and Gemini each pick sources differently. Learn how training data and real-time retrieval work, why brands disappear, and what actually improves AI Visibility.
In the previous article we defined AI visibility — whether AI assistants mention your site when users ask relevant questions. Now let's look at the mechanics: how does an LLM decide which brands to include in its answer?
The short version: there are two knowledge pathways, each platform uses different sources, and brand strength alone doesn't guarantee you'll show up.
Every major LLM uses two fundamentally different ways to find information.
This is what the model "learned" during pre-training. Every LLM was trained on a massive dataset — web pages, books, articles, Wikipedia, forums, documentation. Brands mentioned frequently across authoritative sources develop stronger representations in the model's weights.
If your brand wasn't well-represented in the training data, the model may simply not know you exist. No amount of real-time optimization changes what's already baked into the weights.
When the model does search the web — because the query requires fresh information or the model is uncertain — it uses Retrieval Augmented Generation (RAG). The query gets converted to a vector, matched against indexed pages, and the top results are injected into the LLM's context as source material.
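The retrieval step described above can be sketched in a few lines. This is an illustrative toy, not any platform's actual pipeline: it uses a bag-of-words vector and cosine similarity where real systems use dense neural embeddings, and the page URLs are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, pages: dict[str, str], k: int = 2) -> list[str]:
    """Rank indexed pages by similarity to the query and return the top-k URLs."""
    q = embed(query)
    ranked = sorted(pages, key=lambda url: cosine(q, embed(pages[url])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, pages: dict[str, str]) -> str:
    """Inject the top retrieved pages into the LLM's context as source material."""
    sources = "\n".join(f"[{url}] {pages[url]}" for url in retrieve(query, pages))
    return f"Answer using these sources:\n{sources}\n\nQuestion: {query}"

# Hypothetical mini-index of two pages
pages = {
    "example.com/crm-guide": "best crm tools for small business compared",
    "example.com/recipes": "easy weeknight pasta recipes",
}
print(build_prompt("which crm should a small business use", pages))
```

The key point for visibility: only pages that rank highly at the retrieval step ever reach the model's context, so content the retriever can't match is content the model can't cite.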
The catch: each platform retrieves from a different search engine.
| LLM | Search engine | How it works |
|---|---|---|
| ChatGPT | Bing | Queries Bing when browsing is enabled. 87% of citations match Bing's top 10 organic results |
| Claude | Brave Search | Autonomously decides when a search is needed. Cites with URL, title, and snippets |
| Gemini | Google | Pulls from Google's index but increasingly keeps users inside Google's interface |
| Perplexity | Own index | Searches on every query against an index of 200+ billion URLs |
This creates a fragmented landscape. Only 11% of domains are cited by both ChatGPT and Perplexity. Being visible on one platform doesn't guarantee visibility on another.
Each model has also carved out a different audience:
| LLM | Strength | Primary audience |
|---|---|---|
| ChatGPT | General-purpose recommendations, broad reach | Consumers, general professionals |
| Claude | Deep document analysis, strategic reasoning | Enterprise, regulated industries, developers |
| Gemini | Google ecosystem integration, mobile | Google Workspace users, Android users |
| Perplexity | Real-time research with citations | Researchers, finance professionals |
GPTrends research found only 25% overlap between ChatGPT and Perplexity recommendations — meaning each platform surfaces different brands for the same query. A multi-platform visibility strategy isn't optional anymore.
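Overlap figures like the 25% above can be measured as the share of brands cited by both platforms relative to all brands cited by either (Jaccard overlap). A minimal sketch, using hypothetical brand sets:

```python
def citation_overlap(platform_a: set[str], platform_b: set[str]) -> float:
    """Share of all cited brands that both platforms cite (Jaccard overlap)."""
    union = platform_a | platform_b
    return len(platform_a & platform_b) / len(union) if union else 0.0

# Hypothetical citation sets for the same query on two platforms
chatgpt_brands = {"BrandA", "BrandB", "BrandC", "BrandD"}
perplexity_brands = {"BrandC", "BrandE", "BrandF"}

print(f"{citation_overlap(chatgpt_brands, perplexity_brands):.0%}")
```

Running the same comparison across your own tracked queries is a quick way to see how platform-specific your current visibility is.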
McKinsey found that in major categories — credit cards, hotels, electronics, apparel — top brands can be completely absent from AI answers. Why?
Your website is only 5–10% of what AI references. LLMs pull from affiliates, publishers, user-generated content, industry publications, Reddit threads, and reviews. If the broader web doesn't describe your brand in ways that AI can extract and verify, you're invisible — even if your own website is perfect.
The strongest single predictor of whether an LLM cites a brand? Brand search volume — a 0.334 correlation, stronger than backlinks, domain authority, or any traditional SEO signal. Brands that people already search for by name are the ones LLMs mention most. Backlinks, surprisingly, show weak or neutral correlation with AI citations.
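A correlation figure like 0.334 is a Pearson coefficient: the covariance of two variables divided by the product of their standard deviations, ranging from −1 to 1. The sketch below shows how such a figure is computed; the brand data is invented for illustration, not from the cited research.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance of x and y over the product of their std devs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: monthly brand searches vs. AI citations across six brands
search_volume = [1200, 8000, 300, 15000, 4500, 900]
ai_citations = [9, 22, 6, 14, 18, 5]

print(round(pearson(search_volume, ai_citations), 3))
```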
The Princeton GEO (Generative Engine Optimization) study tested optimization strategies across 10,000 queries and found they can boost AI Visibility by 30–40%. Two categories of action matter.
Making your content easy for LLMs to extract, adding structured data, and ensuring AI crawlers can access your pages — these are covered in detail in our existing guides.
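On crawler access specifically, a robots.txt sketch like the following allows the major AI bots. The user-agent tokens shown (GPTBot and OAI-SearchBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot, Google-Extended for Google's AI products) are the published names at the time of writing — verify each vendor's current documentation before relying on them.

```txt
# robots.txt — explicitly allow the major AI crawlers
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
```

Blocking these bots removes your pages from the retrieval pathway entirely, whatever your content quality.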
One finding worth highlighting: adding statistics to content increases AI Visibility by 22%, quotations from experts by 37%, and citations to sources by 115% for lower-ranked sites. Concrete evidence makes your content more citable.
Your own site is a small fraction of what LLMs reference. The broader web — review sites, publisher coverage, community discussions like Reddit — matters more.
Does blocking AI crawlers hurt your AI visibility? For most businesses, yes. You lose the real-time retrieval pathway entirely — though your brand might still appear from training data. See our AI Crawlers guide for details on which bots to allow.
Why do different LLMs recommend different brands? Each uses different training data, different retrieval systems, and different ranking logic. ChatGPT relies heavily on Bing for web searches; Claude uses Brave Search; Gemini uses Google's index. Only 11% of domains are cited by both ChatGPT and Perplexity, so variation is the norm, not the exception.
Does optimizing for one platform carry over to the others? To some extent. Optimizing for Bing helps ChatGPT citations. Getting indexed quickly via IndexNow helps Microsoft Copilot. Publishing comprehensive documentation helps Claude (which favors long-context analysis). But the best strategy is platform-agnostic: clear, structured, factual content with strong entity presence works across all models.
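IndexNow itself is an open protocol: you host a verification key file on your domain and POST a JSON payload of changed URLs to a participating endpoint. The sketch below only builds that payload — the hostname and key are placeholders, and the actual HTTP POST (to `https://api.indexnow.org/indexnow` with `Content-Type: application/json`) is omitted so the example runs offline.

```python
import json

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_payload(host: str, key: str, urls: list[str]) -> str:
    """JSON body for an IndexNow batch submission.

    The key must also be hosted at https://<host>/<key>.txt so the
    endpoint can verify that you own the domain.
    """
    return json.dumps({"host": host, "key": key, "urlList": urls})

payload = build_indexnow_payload(
    "example.com",
    "a1b2c3d4e5f6",  # hypothetical verification key
    ["https://example.com/new-guide"],
)
print(payload)
```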


