Chunk Extractability — Methodology v4.5

Score Bands

Verdict	Condition
Pass	80% or more of H2 blocks pass at least 2 of the 3 self-containment checks (no leading pronoun, no anaphora in the first 20 words, noun-phrase heading) — score 80-100
Partial	40-79% of H2 blocks pass at least 2 of 3 self-containment checks; some lean on pronouns from earlier sections or use vague headings
Fail	Fewer than 40% of H2 blocks pass; anaphoric pronouns and vague headings dominate, or no H2 blocks are found (score 0)

Description

Chunk Extractability measures whether each H2 section of a page can stand on its own as a quotable chunk after an AI retriever lifts it out of page order. friendly4AI scores every H2 block against three deterministic self-containment checks. A block counts as self-contained when at least 2 of the 3 pass, and the score is the percentage of blocks that clear that bar (0-100).

What does Chunk Extractability check?

Each H2 block runs through three checks:

- No leading pronoun: the first sentence does not start with an anaphoric pronoun.
- No anaphora in the first 20 words: none of "it", "this", "that", "they", "these", "those", "them", "their", or "its" appears in the opening 20 words.
- Noun-phrase heading: the heading is a real noun phrase, not a generic placeholder like "Overview", "Details", or "Introduction".

Pass at least 2 of the 3 and the block is self-contained. The page score is the percentage of H2 blocks that qualify.
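
The three checks and the 2-of-3 rule can be sketched in a few lines of Python. This is a minimal illustration, not the friendly4AI processor: the tokenizer, the `check_block` name, and the three-word vague-heading set are assumptions made for the example, and the heading test is deliberately reduced to "not a lone placeholder".

```python
import re

# Pronoun set copied from the check description above.
ANAPHORIC = {"it", "this", "that", "they", "these", "those", "them", "their", "its"}
# Assumption: only the three placeholders named above count as vague.
VAGUE_HEADINGS = {"overview", "details", "introduction"}


def words(text):
    """Lowercased alphabetic tokens (assumed tokenizer; "it's" becomes "it", "s")."""
    return re.findall(r"[a-z]+", text.lower())


def check_block(heading, body):
    """Run the three self-containment checks on one H2 block."""
    tokens = words(body)
    heading_tokens = words(heading)
    checks = {
        # Check 1: the first word of the block is not an anaphoric pronoun.
        "no_leading_pronoun": not (tokens and tokens[0] in ANAPHORIC),
        # Check 2: no anaphoric pronoun anywhere in the first 20 words.
        "no_early_anaphora": not any(t in ANAPHORIC for t in tokens[:20]),
        # Check 3 (simplified): the heading is not empty and not a lone placeholder.
        "noun_phrase_heading": bool(heading_tokens)
        and " ".join(heading_tokens) not in VAGUE_HEADINGS,
    }
    # A block is self-contained once at least 2 of the 3 checks pass.
    return {"checks": checks, "self_contained": sum(checks.values()) >= 2}


if __name__ == "__main__":
    print(check_block("Overview", "It does this for every page."))
    print(check_block("Overview", "Chunk Extractability scores every H2 block."))
```

In this sketch, a block that opens with a pronoun fails both pronoun checks at once, so a descriptive heading alone cannot rescue it. A clean opening under a vague heading still passes 2 of 3.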

Why does Chunk Extractability matter for AI-readiness?

LLM retrieval pulls one section at a time, out of page order. A section that opens with "It does this…" or "They also offer…" falls apart the moment it leaves the text it pointed back to, because the model has lost the antecedent. A heading like "Overview" gives the retriever nothing to match against a query. Name the subject explicitly in each H2, give it a descriptive heading, and the chunk reads cleanly on its own. That is what makes it citable in an AI answer.

How is Chunk Extractability scored?

The v4.5 Content Structure methodology scores this parameter as a gradient rather than pass/fail. The processor prefers an article snapshot, runs the three checks per H2 block, and computes round(100 * selfContainedCount / blocksEvaluated), where a block is self-contained once at least 2 checks pass. The heading check passes when the heading carries a capitalised proper noun or brand name after the first word, or holds at least two non-vague word tokens.
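
As a rough illustration of the formula and the heading rule in the paragraph above, the sketch below computes the page score and approximates the heading check. The function names, the vague-word set, and the use of "capitalised word after the first word" as a stand-in for proper-noun detection are all assumptions. The processor's actual rounding mode and tokenization are not documented here.

```python
# Assumption: the vague-word list is limited to the placeholders named in this document.
VAGUE_WORDS = {"overview", "details", "introduction"}


def heading_passes(heading):
    """Approximate the heading check described above."""
    tokens = [t.strip(".,:;!?\"'()") for t in heading.split()]
    tokens = [t for t in tokens if any(c.isalpha() for c in t)]
    non_vague = [t for t in tokens if t.lower() not in VAGUE_WORDS]
    # Rule 1 (proxy): a capitalised, non-vague word after the first word.
    has_proper_noun = any(
        t[0].isupper() for t in tokens[1:] if t.lower() not in VAGUE_WORDS
    )
    # Rule 2: at least two non-vague word tokens.
    return has_proper_noun or len(non_vague) >= 2


def page_score(self_contained_count, blocks_evaluated):
    """round(100 * selfContainedCount / blocksEvaluated), with 0 for an empty page."""
    if blocks_evaluated == 0:
        return 0
    # Python's round() sends exact .5 ties to the nearest even integer,
    # which may differ from the processor's rounding.
    return round(100 * self_contained_count / blocks_evaluated)


if __name__ == "__main__":
    print(heading_passes("Overview"))
    print(heading_passes("Pricing for Acme"))
    print(page_score(3, 4))
```

The capitalisation proxy is crude: a title-cased heading would satisfy it even without a real proper noun, so treat the result as a hint rather than a verdict.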

The published rubric maps that proportional score onto bands:

Pass (80-100): 80% or more of H2 blocks are self-contained.
Partial (40-79): between 40% and 79% of blocks are self-contained.
Fail (0-39): fewer than 40% of blocks are self-contained.

A page with no H2 blocks, or no content-analysis cache, returns 0 or is skipped — so a page without real article sections cannot pass. Note that the processor's anaphoric-pronoun set runs slightly wider than the spec's: it also catches that, them, their, and its.
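
The band thresholds and the pronoun-set difference can be written down as a small sketch. The `verdict` function and the constant names are hypothetical. The five-word spec set is inferred by subtracting the four extra pronouns named above from the nine-word list in the check description, so treat it as an assumption.

```python
# Inferred spec set: the nine listed pronouns minus the four processor-only ones.
SPEC_PRONOUNS = {"it", "this", "they", "these", "those"}
# Extra pronouns the processor is said to catch.
PROCESSOR_EXTRA = {"that", "them", "their", "its"}
PROCESSOR_PRONOUNS = SPEC_PRONOUNS | PROCESSOR_EXTRA


def verdict(score):
    """Map a 0-100 proportional score onto the published bands."""
    if score >= 80:
        return "Pass"
    if score >= 40:
        return "Partial"
    return "Fail"


if __name__ == "__main__":
    for s in (100, 80, 79, 40, 39, 0):
        print(s, verdict(s))
```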

How do you fix Chunk Extractability issues?

- Open each H2 with the explicit entity name instead of a pronoun that points at an earlier section.
- Keep "it", "this", "they", and similar pronouns out of the first 20 words of every block.
- Swap generic headings like "Overview" or "Details" for descriptive noun phrases that name the subject.
- Write each section so it reads on its own, as if it were the only thing in front of the reader.
- Re-scan, then read the selfContainedCount, pronounDriftBlocks, and vagueHeadings evidence fields to pinpoint the blocks still failing.
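
One way to turn the evidence fields into a to-do list after a re-scan is sketched below. Only the field names come from the text above. The payload shape (lists of heading strings), the sample values, and the `blocks_to_fix` helper are hypothetical and may not match what a real scan returns.

```python
# Hypothetical evidence payload: field names from the text, values invented for illustration.
sample_evidence = {
    "selfContainedCount": 3,
    "blocksEvaluated": 5,
    "pronounDriftBlocks": ["Pricing", "Overview"],  # assumed: headings of drifting blocks
    "vagueHeadings": ["Overview"],                  # assumed: headings flagged as vague
}


def blocks_to_fix(evidence):
    """Group the suggested fixes by block heading."""
    todo = {}
    for heading in evidence.get("pronounDriftBlocks", []):
        todo.setdefault(heading, []).append("name the subject in the first 20 words")
    for heading in evidence.get("vagueHeadings", []):
        todo.setdefault(heading, []).append("replace the heading with a descriptive noun phrase")
    return todo


if __name__ == "__main__":
    for heading, fixes in blocks_to_fix(sample_evidence).items():
        print(heading, "->", "; ".join(fixes))
```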
