Chunk Extractability
stable · Category: content-structure · Methodology v4.5
Can each H2 block stand alone as a quotable chunk?
Signal Source
- Source: https://{domain}
- Kind: html_dom
Score Bands
| Verdict | Condition |
|---|---|
| Pass | 80% or more of H2 blocks pass at least 2 of the 3 self-containment checks (no leading pronoun, no anaphora in the first 20 words, noun-phrase heading) — score 80-100 |
| Partial | 40-79% of H2 blocks pass at least 2 of 3 self-containment checks; some lean on pronouns from earlier sections or use vague headings |
| Fail | Fewer than 40% of H2 blocks pass; anaphoric pronouns and vague headings dominate, or no H2 blocks are found (score 0) |
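As a minimal sketch of the band table above, assuming the score is already an integer percentage from 0 to 100, the mapping could look like this in Python. The function name `verdict_for` is hypothetical and not part of friendly4AI.

```python
def verdict_for(score: int) -> str:
    """Map a 0-100 percentage of self-contained H2 blocks onto a verdict.

    Hypothetical helper: thresholds are taken from the band table,
    the function itself is only an illustration.
    """
    if score >= 80:
        return "pass"
    if score >= 40:
        return "partial"
    # Below 40, including the score of 0 returned when no H2 blocks are found.
    return "fail"
```

Under these assumptions a page where 4 of 5 blocks are self-contained (score 80) lands in the pass band, while 2 of 5 (score 40) is partial.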
Description
What this parameter measures
This parameter measures whether each H2 block can stand alone as a quotable chunk. friendly4AI evaluates every H2 block against three deterministic checks: the first sentence does not start with an anaphoric pronoun; no anaphoric pronouns (it, this, that, they, these, those, them, their, its) appear in the first 20 words; and the heading text is a real noun phrase rather than a generic placeholder like Overview, Details, or Introduction. A block is self-contained when at least 2 of the 3 checks pass, and the score is the percentage of self-contained blocks.
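The three checks can be sketched in Python as follows. This is an illustration under stated assumptions, not the friendly4AI processor: the function names, the regex tokenizer, and the simplified heading check (exact match against the three placeholder headings named above) are all hypothetical.

```python
import re

# Pronoun list as given in the description above.
ANAPHORIC = {"it", "this", "that", "they", "these", "those", "them", "their", "its"}
# Assumption: only the three example placeholders are treated as vague.
VAGUE_HEADINGS = {"overview", "details", "introduction"}


def words(text: str) -> list[str]:
    """Lowercase alphabetic tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def run_checks(heading: str, body: str) -> dict[str, bool]:
    """Run the three self-containment checks on one H2 block."""
    tokens = words(body)
    return {
        # Check 1: the block does not open with an anaphoric pronoun.
        "no_leading_pronoun": not tokens or tokens[0] not in ANAPHORIC,
        # Check 2: no anaphoric pronoun anywhere in the first 20 words.
        "no_early_anaphora": not any(t in ANAPHORIC for t in tokens[:20]),
        # Check 3 (simplified): the heading is not a generic placeholder.
        "descriptive_heading": heading.strip().lower() not in VAGUE_HEADINGS,
    }


def is_self_contained(heading: str, body: str) -> bool:
    """A block is self-contained when at least 2 of the 3 checks pass."""
    return sum(run_checks(heading, body).values()) >= 2
```

One consequence of the rules is visible in this sketch: a block that opens with a pronoun fails the first two checks together, so it cannot reach 2 of 3 regardless of its heading.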
Why it matters for AI-readiness
LLM retrieval pulls one section at a time, out of page order. A section that opens with "It does this…" or "They also offer…" is meaningless once lifted away from the text it refers back to, because the model has no antecedent. Vague headings like "Overview" give the retriever nothing to match against a query. When each H2 names its subject explicitly and carries a descriptive heading, every chunk is independently meaningful, which is what makes it citable in an AI answer.
How we score it
The v4.4 methodology scores this Content Structure parameter as a gradient. The processor prefers an article snapshot, then runs the three checks per H2 block and scores round(100 * selfContainedCount / blocksEvaluated), where a block is self-contained when at least 2 checks pass. The heading check passes when the heading carries a capitalised proper noun or brand name after the first word, or has at least two non-vague word tokens. The published rubric maps the proportional score onto bands: pass at 80% or more self-contained blocks, partial between 40% and 79%, and fail below 40%. A page with no H2 blocks, or no content-analysis cache, returns 0 or is skipped, so a page without real article sections cannot pass. The processor's anaphoric-pronoun set is slightly wider than the spec's, also catching that, them, their, and its.
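A rough Python sketch of the heading rule and the score formula described above follows. The names `heading_passes` and `chunk_score` are hypothetical, the vague-word set contains only the three headings named in this page, and "proper noun or brand name" is approximated by a capitalised token after the first word, which is an assumption rather than the processor's actual logic.

```python
import re

# Assumption: only the placeholder headings named in this page count as vague.
VAGUE_WORDS = {"overview", "details", "introduction"}


def heading_passes(heading: str) -> bool:
    """Approximate the heading check.

    Passes when a capitalised token appears after the first word (a stand-in
    for proper-noun or brand detection), or when the heading has at least two
    non-vague word tokens.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", heading)
    has_capitalised_later_word = any(t[0].isupper() for t in tokens[1:])
    non_vague_count = sum(t.lower() not in VAGUE_WORDS for t in tokens)
    return has_capitalised_later_word or non_vague_count >= 2


def chunk_score(self_contained_count: int, blocks_evaluated: int) -> int:
    """round(100 * selfContainedCount / blocksEvaluated), or 0 with no H2 blocks."""
    if blocks_evaluated == 0:
        return 0
    # Note: Python rounds exact halves to the nearest even integer, which may
    # differ from the rounding used by the real processor.
    return round(100 * self_contained_count / blocks_evaluated)
```

Because capitalisation stands in for proper-noun detection, this sketch would accept any title-case heading; a real implementation presumably needs a stricter test.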
How to fix common issues
- Open each `H2` with the explicit entity name rather than a pronoun referring to an earlier section.
- Keep `it`, `this`, `they`, and similar pronouns out of the first 20 words of each block.
- Replace generic headings like "Overview" or "Details" with descriptive noun phrases that name the subject.
- Make each section make sense when read in isolation, as if it were the only thing the reader saw.
- Re-scan and check the `selfContainedCount`, `pronounDriftBlocks`, and `vagueHeadings` evidence fields to find the blocks still failing.
Version History
- Introduced: v4.2
- Last changed: v4.4
Key takeaways
- Signal: https://{domain}
- Category: Content Structure
- Passes when: 80% or more of H2 blocks pass at least 2 of the 3 self-containment checks (no…