A practical, prioritized checklist to improve AI-readiness: crawlability, content clarity, structured data, and internal linking.
So you've scanned your site and the score isn't what you hoped for. We get it. We've been there with our own site too.
The good news: most issues we see are fixable in a few hours, not weeks. After analyzing thousands of sites, we've noticed the same problems come up again and again. This guide covers what actually moves the needle.
If you're short on time, work in this order: crawlability first, then content clarity, then structured data, then internal linking.
Not sure where you stand? Scan your site first.
Think of your score (0–100) as a rough proxy for how easily AI can work with your site. It's not magic: it checks whether AI systems can crawl your pages, pull clean answers out of them, parse your structured data, and discover your content through internal links.
We've seen sites go from 40 to 75+ after a weekend of fixes. The payoff? When someone asks ChatGPT or Perplexity about your product category, you're more likely to show up—and show up accurately, not as some hallucinated version of yourself.
Want to check if AI assistants already mention you? Learn about AI Visibility—a complementary metric that measures whether LLMs actually recommend your site. To understand the mechanics behind AI recommendations, see How LLMs Choose Which Websites to Recommend.
This sounds obvious, but we see it constantly: sites block AI crawlers without realizing it, then wonder why they're invisible to AI search.
Before touching anything else, check these:
- Pages return 200, not redirect chains or soft 404s
- Don't use robots.txt to hide things; it's a suggestion, not a lock

Different AI systems use different crawler names, and if you've copied some generic robots.txt from 2015, you might be blocking half of them.
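A quick way to spot redirect chains and soft errors before AI crawlers do is to request each key page without following redirects. A minimal Python sketch, using only the standard library (the URLs are placeholders; swap in your own pages):

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so we see the first status code."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def direct_status(url: str) -> int:
    """HEAD-request `url` and return its raw HTTP status, redirects included."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 3xx/4xx/5xx land here when redirects aren't followed


def needs_attention(status: int) -> bool:
    """Anything other than a plain 200 is worth a look."""
    return status != 200


if __name__ == "__main__":
    # Placeholder URLs: substitute your own key pages.
    for url in ("https://example.com/", "https://example.com/pricing"):
        status = direct_status(url)
        print(url, status, "needs attention" if needs_attention(status) else "ok")
```

A 301 here isn't fatal, but every hop is a chance for a crawler to give up, so point crawlers at the final URL directly.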
A simple "allow everything public" setup looks like this:
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
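If you want to sanity-check a policy like this without waiting for crawlers to show up in your logs, Python's standard `urllib.robotparser` can parse it locally. A minimal sketch; the two bots shown stand in for the full list above:

```python
import urllib.robotparser

# The "allow everything public" policy from above, trimmed to two bots.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "PerplexityBot"):
    print(bot, "allowed:", parser.can_fetch(bot, "/pricing"))
```

The same check works against your live file if you call `set_url("https://yoursite.com/robots.txt")` and `read()` instead of `parse()`.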
Want to know what each of these bots actually does? We wrote a whole piece on it: Understanding AI Crawlers.
AI doesn't read your page top-to-bottom like a human might. It's scanning for fragments it can use—definitions, answers, facts.
We call this "quotability." Can an AI pull a clean 2-sentence answer from your page? If your homepage is all vibes and no substance, the answer is no.
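There's no precise formula for quotability, but even a toy heuristic can flag the worst offenders. This sketch is our own rough assumption, not how any AI system actually scores pages: it just checks whether the opening sentence or two carry enough words to be a real answer. `Acme Widgets` is a made-up example:

```python
import re


def quotable_opening(text: str, max_sentences: int = 2, min_words: int = 12) -> bool:
    """Toy heuristic: does the page open with a short, substantive answer?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    opening = sentences[:max_sentences]
    words = sum(len(s.split()) for s in opening)
    return bool(opening) and words >= min_words


vague = "We help businesses succeed. Join us on the journey."
clear = ("Acme Widgets makes industrial-grade conveyor rollers for food-processing lines. "
         "Rollers ship within five days and carry a two-year warranty.")

print("vague passes:", quotable_opening(vague))
print("clear passes:", quotable_opening(clear))
```

Word count is a crude proxy for substance; the real test is whether those two sentences would make sense quoted on their own, with no page around them.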
For pages that matter (home, product, pricing, docs), make sure the first paragraph states plainly what the page is about and can stand alone as an answer.
AI systems are getting better at evaluating source credibility. We've noticed pages with clear authorship and dates tend to get cited more often than anonymous content.
You might think Schema.org is just for Google rich snippets. It's not: AI systems use it too, especially to understand who you are, what your pages are about, and how they relate to each other.
Our advice: start with Organization and WebSite on every page. Add Article to blog posts. Don't overthink it—basic structured data beats none.
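As a sketch of what "basic structured data" can look like, here's Python emitting a minimal Organization + WebSite JSON-LD block. The names and URLs are placeholders; adjust the fields to your site:

```python
import json


def organization_jsonld(name: str, url: str) -> str:
    """Minimal Organization + WebSite JSON-LD with placeholder values."""
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "name": name, "url": url},
            {"@type": "WebSite", "name": name, "url": url},
        ],
    }
    return json.dumps(graph, indent=2)


snippet = organization_jsonld("Example Co", "https://example.com")
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

Drop the resulting `<script>` tag into your page `<head>` (most site generators have a slot for this), then validate it with the Schema.org validator before shipping.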
We cover this in detail here: Structured Data for AI: A Practical Guide.
Orphan pages are invisible pages. If nothing links to your best content, AI has no way to find it or understand that it matters.
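One way to find orphans is to build a simple link graph and look for pages with no inbound internal links. A toy sketch with made-up paths:

```python
# Toy link graph: page -> set of internal pages it links to.
links = {
    "/": {"/pricing", "/blog"},
    "/pricing": {"/"},
    "/blog": {"/blog/post-1"},
    "/blog/post-1": set(),
    "/old-landing": set(),  # nothing links here: orphan
}

# Union of every link target = the set of pages something points to.
linked_to = set().union(*links.values())

# Pages never linked to (the homepage is reachable by definition).
orphans = sorted(p for p in links if p not in linked_to and p != "/")
print("Orphan pages:", orphans)  # → Orphan pages: ['/old-landing']
```

A real audit would crawl your sitemap and parse each page's anchors to build the graph, but the inbound-link check at the end is the same.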
Make sure every page you care about is reachable through internal links: from your navigation, your homepage, or related posts.
We learned this the hard way: you can't improve what you don't measure.
After making changes, re-scan and compare scores so you can see which fixes actually moved yours.
Don't expect perfection on the first pass. We've iterated on our own site dozens of times.
After reviewing thousands of scans, these come up constantly:
Using robots.txt as a security tool: it's not one. Bots can ignore it. If something should be private, put it behind auth.

Landing pages with zero substance: "We help businesses succeed" tells AI nothing. Add specifics: what exactly you do, for whom, with what constraints.

Orphan pages nobody can find: if your best content isn't linked from anywhere, it might as well not exist. Audit your internal links.
You don't need to fix everything at once. Pick one thing from this list, ship it today, and scan again tomorrow.
We've watched sites jump 10-15 points from a single afternoon of work. The trick is starting.
Fix, measure, repeat. That's the whole game.