Name: ChatGPT Retrieval & Citation Dataset: 16,851 Queries, 353,799 Pages
Creator: AirOps
Published: April 13, 2026

TL;DR

AirOps partnered with Kevin Indig (Growth Memo) to map what happens between a user's query and an AI citation, analyzing 16,851 queries and 353,799 pages across ChatGPT's full retrieval pipeline.

Retrieval rank is the #1 signal. A page at position 1 in ChatGPT's retrieval results has a 58% citation rate vs. 14% at position 10: a 4x gap that content quality alone won't close. Great SEO is your advantage in AI search.
Your headings are what get you cited. Pages with headings that closely match the user's query are cited 41% of the time vs. 29% for weak matches. Heading structure is the primary on-page lever for AI citation, more impactful than word count, topical breadth, or body copy.
Focused pages beat comprehensive guides. Pages covering 26–50% of ChatGPT's fan-out sub-queries outperform pages covering 100%. The "ultimate guide" playbook that dominated traditional SEO actually hurts citation rates when query relevance is held constant.
Domain authority doesn't translate. DA and backlinks show no positive correlation with AI citation, and are slightly inversely correlated. ChatGPT evaluates content directly based on relevance and structure, not authority signals.

The full report breaks down every signal we tested: retrieval rank, heading match, content length, readability, schema markup, freshness and more, with controlled comparisons and actionable benchmarks.

Retrieval rank is the strongest predictor of whether a page gets cited in a ChatGPT answer.

A page at the top position in ChatGPT's web search results has a 58% chance of citation; by position 10, that drops to 14%. Among content signals, pages whose headings closely match the original query are cited more consistently than pages covering a broad set of fan-out subtopics. Moderate coverage of 2-3 subtopics outperforms exhaustive coverage when primary query relevance is held constant.

Domain authority and backlinks show no positive correlation with citation, and are slightly inversely correlated. The one exception is Wikipedia, which achieves the highest citation rate in the dataset (59%) through extreme content density (4,383 average words, 31 lists per page, 6.6 tables per page) despite the worst retrieval rank. For everyone else: be findable, match the query, structure your content well. Broad fan-out coverage is overrated.

Methodology

This study measures how well web pages cover the topics that ChatGPT searches for when answering user queries. We scraped ChatGPT’s UI for the data, not the API.

Scale: 16,851 unique queries across 10 categories (Publishing, E-commerce, Travel, Health, SaaS, Real Estate, Finance, Legal, Marketing, Business Services) and 4 query types (commercial, informational, transactional, local).

Process: Each query was sent to ChatGPT 3 separate times (runs 1, 2, and 3). For each run, the pipeline captured:

The full ChatGPT response, parsed into answer sections
Every fan-out query ChatGPT issued internally (sub-searches it used to gather information)
Every URL returned by those fan-out searches (search results), and every URL cited in the answer (citations)
The full HTML and extracted text of every page ChatGPT retrieved while generating the answer (both pages that were only retrieved and pages that were also cited)

Coverage Scoring: Each page's H1-H4 headings were embedded using BAAI/bge-base-en-v1.5 (768 dimensions), and cosine similarity was computed between query embeddings and heading embeddings. A page "covers" a query or fan-out subtopic when heading similarity exceeds a defined threshold (0.80 for the primary analysis. Thresholds of 0.60 and 0.70 produced the same pattern; see Finding 2.)
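
The coverage rule reduces to a cosine similarity plus a threshold check. The sketch below is illustrative only: it assumes the embeddings already exist (the study used BAAI/bge-base-en-v1.5; producing the vectors is outside this sketch), uses short toy vectors in place of 768-dimensional ones, and its function names are hypothetical. Counting a score exactly at the threshold as a match mirrors the ">= 0.8" notation used in later tables.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def covers(query_vec: list[float], heading_vecs: list[list[float]], threshold: float = 0.80) -> bool:
    """True when at least one heading embedding reaches the similarity threshold."""
    return any(cosine_similarity(query_vec, h) >= threshold for h in heading_vecs)


# Toy 3-dimensional vectors standing in for 768-dimensional bge embeddings.
query = [1.0, 0.0, 0.0]
headings = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(covers(query, headings))           # True (first heading scores about 0.99)
print(covers(query, [[0.0, 1.0, 0.0]]))  # False (orthogonal heading scores 0.0)
```

Raising or lowering `threshold` is how the 0.60 and 0.70 sensitivity checks would be run against the same vectors.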

Key numbers:

Metric	Count
Unique queries	16,851
ChatGPT responses (queries x 3 runs)	50,553
Pages scraped	353,799
Coverage scoring rows (query x page x run)	815,484
Fan-out detail rows	1,511,251

Fan-out behavior: 88.6% of queries generate exactly 2 fan-out sub-queries. Only 8.8% generate zero (typically simple product or entity queries), and 2.5% generate 4 or more (complex comparative or review queries).

Retrieval rank is the dominant signal

A page's position in ChatGPT's web search results is the single strongest predictor of citation. This held across every control we tested.

When ChatGPT answers a query, it issues web searches (fan-out queries) through its search tool and gets back a ranked list of URLs. Position 0 is the first result returned, position 1 is the second, and so on. The underlying search provider is not confirmed in the data; ChatGPT is publicly known to use Bing, but whether it also pulls from Google or other sources is unclear.

Web Search Position	Citation Rate
1 (first result)	58.4%
2	54.4%
3	35.5%
4	29.9%
6	24.6%
10	14.2%

A page in the first position is 4x more likely to be cited than a page at position 10.
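
Each rate in the table is a grouped ratio: citations divided by appearances at a given search position, pooled across runs. A minimal sketch, assuming hypothetical row-level data of `(position, was_cited)` pairs; the report does not describe its row format, and `citation_rate_by_position` is an illustrative name.

```python
from collections import defaultdict


def citation_rate_by_position(rows):
    """rows: iterable of (position, was_cited) pairs, one per retrieved URL per run.

    Returns {position: citations / appearances}, ordered by position.
    """
    appearances = defaultdict(int)
    citations = defaultdict(int)
    for position, was_cited in rows:
        appearances[position] += 1
        citations[position] += int(bool(was_cited))
    return {p: citations[p] / appearances[p] for p in sorted(appearances)}


# Hypothetical rows for two positions.
rows = [(1, True), (1, True), (1, False), (10, False), (10, True), (10, False), (10, False)]
rates = citation_rate_by_position(rows)
print({p: round(r, 2) for p, r in rates.items()})  # {1: 0.67, 10: 0.25}
print(round(58.4 / 14.2, 1))  # 4.1, the lift between the table's first and tenth positions
```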

Retrieval rank predicts citation consistency

Pages cited in all 3 runs (the most reliable sources) have dramatically better retrieval positions than pages never cited:

Group	N	Avg Best Rank	Median Rank
Cited all 3 runs	6,124	6.2	2.5
Cited 2 of 3	21,287	8.8	5.0
Cited 1 of 3	77,285	11.6	8.0
Never cited	172,190	14.6	13.0
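
The grouping can be reproduced by counting, for each page-query pair, how many of the 3 runs cited it, then summarizing the best retrieval rank within each group. A sketch, assuming hypothetical per-pair records with `runs_cited` and `best_rank` fields, and assuming the "Median Rank" column is the median of the best rank (the report does not say).

```python
from statistics import mean, median

GROUP_LABELS = {3: "Cited all 3 runs", 2: "Cited 2 of 3", 1: "Cited 1 of 3", 0: "Never cited"}


def consistency_groups(pages):
    """pages: dicts with 'runs_cited' (0-3) and 'best_rank' (best position across runs).

    Returns {label: (count, mean best rank, median best rank)} for non-empty groups.
    """
    ranks_by_group = {label: [] for label in GROUP_LABELS.values()}
    for page in pages:
        ranks_by_group[GROUP_LABELS[page["runs_cited"]]].append(page["best_rank"])
    return {
        label: (len(ranks), mean(ranks), median(ranks))
        for label, ranks in ranks_by_group.items()
        if ranks
    }


sample = [
    {"runs_cited": 3, "best_rank": 1},
    {"runs_cited": 3, "best_rank": 4},
    {"runs_cited": 0, "best_rank": 13},
]
print(consistency_groups(sample))
```

A median such as 2.5 arises naturally when a group has an even number of pairs, as in the sample above.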

Rank matters even when content relevance is high

Among pages with headings that match the query (primary similarity >= 0.8), retrieval rank still drives a 58pp gap in citation rate:

Rank Bucket	Citation Rate (primary sim >= 0.8)
Rank 1	79.6%
Rank 2	74.7%
Rank 3-4	47.8%
Rank 5-6	37.0%
Rank 7-10	25.5%
Rank 11+	21.5%

A page with a strong heading match (primary sim >= 0.8) at rank 11+ (21.5%) is still outperformed by a top-ranked page with a weak heading match (55.9% for ranks 0-2 with primary sim < 0.60).

Implication: Retrievability is the first optimization target for ChatGPT citation. Content quality amplifies the signal, but without retrieval, there is nothing to amplify.

Key takeaways:

Position in the retrieval system is the single strongest citation predictor, with a 4x gap between the first position (58%) and position 10 (14%).
Citation consistency tracks retrieval rank: pages cited in all 3 runs have a median rank of 2.5 vs 13.0 for never-cited pages.
Even pages with strong heading matches (>= 0.8 similarity) drop from 80% to 22% citation rate as rank falls from the top position to 11+.
A weakly matched page at the top of the results (56% cite rate for ranks 0-2) outperforms a strongly matched page at rank 7-10 (26% cite rate). Rank overrides content quality.

Query match beats topical breadth

The strongest content signal is how well a page's headings match the original query. How many fan-out subtopics the page covers barely registers.

Methodology note: This analysis measures heading-level similarity only (H1-H4 text vs query embeddings). Full page body text was not vectorized against the query, so pages with strong body content but weak headings may be underrepresented in this signal. The heading-based approach captures structural relevance (what the page is organized around) rather than total textual overlap.

Primary similarity drives citation

Primary similarity measures the cosine similarity between the query embedding and the best-matching heading on a page. Citation rate is flat below 0.70 (28.6-30.2%) and rises steadily above it, reaching 41% at 0.90+:

Primary Similarity	Citation Rate	Avg Runs Cited
< 0.50	30.2%	0.43
0.50-0.59	29.8%	0.44
0.60-0.69	28.6%	0.42
0.70-0.79	31.0%	0.49
0.80-0.89	34.5%	0.62
0.90+	41.0%	0.81

Even controlling for retrieval rank (only pages ranked 0-2), higher primary similarity adds +19pp to citation rate: from 55.9% at < 0.60 to 75.3% at 0.90+.
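
Primary similarity is a maximum over a page's heading scores, and the table rows are fixed-width bins over that value. A sketch under stated assumptions: the query-to-heading scores are precomputed, a page with no headings scores 0.0, and a score sitting exactly on a bin edge goes into the higher bin. None of these conventions are specified in the report, and the function names are hypothetical.

```python
import bisect

# Lower edges of the similarity buckets in the primary similarity table.
EDGES = [0.50, 0.60, 0.70, 0.80, 0.90]
LABELS = ["< 0.50", "0.50-0.59", "0.60-0.69", "0.70-0.79", "0.80-0.89", "0.90+"]


def primary_similarity(heading_sims: list[float]) -> float:
    """Best query-to-heading cosine score; pages with no headings get 0.0 (assumption)."""
    return max(heading_sims, default=0.0)


def similarity_bucket(sim: float) -> str:
    """Map a similarity score to its table row; an edge value falls into the higher bucket."""
    return LABELS[bisect.bisect_right(EDGES, sim)]


page_scores = [0.42, 0.91, 0.67]  # hypothetical scores for one page's headings
best = primary_similarity(page_scores)
print(best, similarity_bucket(best))  # 0.91 0.90+
```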

Fan-out coverage is a weak signal

Fan-out coverage ratio measures what share of the fan-out subtopics a page covers, scored against H2-H4 subheadings at a 0.80 cosine similarity threshold. We tested this at 0.60 and 0.70 as well; the pattern is identical at all thresholds (moderate coverage outperforms exhaustive coverage when query match is held constant).

Fan-out Coverage (H2-H4 @ 0.80)	N	Citation Rate	Avg Runs Cited
0%	591,662	30.6%	0.48
1-50%	112,706	33.3%	0.56
51-100%	107,943	35.2%	0.64

High fan-out coverage (51-100%) adds only +4.6pp over zero coverage, and even that gap is misleading: pages with high fan-out coverage also tend to have higher query match scores (0.834 vs 0.680), so the two signals travel together. The controlled test below isolates them and shows density adds little on its own.
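
The coverage ratio is the fraction of a query's fan-out sub-queries that at least one H2-H4 subheading matches at the chosen threshold. A sketch, assuming a hypothetical precomputed matrix where `sims[i][j]` is the similarity of fan-out query i to subheading j; how the study handled queries with zero fan-outs is not stated, so returning 0.0 for them is an assumption.

```python
def fanout_coverage_ratio(sims: list[list[float]], threshold: float = 0.80) -> float:
    """Share of fan-out queries matched by at least one H2-H4 subheading.

    sims[i][j] is the cosine similarity between fan-out query i and subheading j.
    Queries with no fan-outs return 0.0 (an assumption; the report does not say).
    """
    if not sims:
        return 0.0
    covered = sum(1 for row in sims if any(s >= threshold for s in row))
    return covered / len(sims)


# Hypothetical page: 2 fan-out queries scored against 2 subheadings.
sims = [[0.85, 0.20], [0.50, 0.65]]
for t in (0.60, 0.70, 0.80):
    print(t, fanout_coverage_ratio(sims, t))
```

Rerunning at several thresholds mirrors the sensitivity check described above: this toy page scores 1.0 at 0.60 but 0.5 at 0.70 and 0.80.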

The controlled test: moderate coverage beats exhaustive coverage

When we hold primary similarity constant (>= 0.8), the density advantage disappears and even reverses:

Fan-out Coverage (primary sim >= 0.8)	N	Citation Rate
0%	79,024	35.5%
26-50%	28,785	38.2%
100%	120,572	34.0%

Pages covering 26-50% of fan-out subtopics outperform pages covering 100%. This suggests that exhaustive coverage may signal "generalist" content that addresses many topics without depth, while moderate coverage paired with strong primary relevance signals focused expertise.

Heading spread reinforces the pattern

We also measured how many distinct H2-H4 headings on a page match fan-out queries (at 0.70 threshold), controlling for primary similarity >= 0.8:

Distinct Subheading Matches	N	Citation Rate	Cited All 3 Runs
0 headings	79,024	35.5%	3.77%
1 heading	101,430	35.2%	3.87%
2 headings	48,150	33.9%	3.60%
3-4 headings	635	29.8%	2.68%

Matching 1 subheading performs identically to matching 0 (meaning the query-to-heading match alone is enough to drive citation, without any subtopic coverage). Matching 3-4 subheadings drops citation by 6pp. More heading matches do not help and may indicate diluted content.
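
Where fan-out coverage counts matched sub-queries, this metric counts matched subheadings, so the same kind of similarity matrix is read along its other axis. A sketch, assuming a hypothetical precomputed matrix where `sims[i][j]` is the similarity of fan-out query i to subheading j, with the 0.70 threshold from the table above.

```python
def distinct_subheading_matches(sims: list[list[float]], threshold: float = 0.70) -> int:
    """Count distinct H2-H4 subheadings that match at least one fan-out query.

    sims[i][j] is the similarity of fan-out query i to subheading j (hypothetical layout).
    """
    if not sims:
        return 0
    n_subheadings = len(sims[0])
    return sum(
        1 for j in range(n_subheadings)
        if any(row[j] >= threshold for row in sims)
    )


# One fan-out query matched by two different subheadings counts as 2 distinct matches.
print(distinct_subheading_matches([[0.90, 0.90]]))  # 2
```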

Implication: Write content that directly answers the query you're targeting. A page that nails one question outperforms a page that adequately addresses five. The fan-out subtopics are not a content checklist.

Key takeaways:

Match the query directly in your primary heading, then use 4-10 subheadings to structure the answer, not to chase every related subtopic. Pages that match the query well get cited up to 41% of the time. Spreading across too many subtopics dilutes the signal and drops citation by 6pp.

Query match (heading similarity to the original query) is the strongest content signal, scaling from 30% to 41% citation rate across similarity buckets.
Even at top retrieval ranks, higher query match adds +19pp to citation rate.
Fan-out coverage adds only +4.6pp uncontrolled, and the signal disappears when query match is held constant.
Moderate subtopic coverage (26-50%) outperforms exhaustive coverage (100%) among pages with strong query match, at every threshold tested (0.60, 0.70, 0.80).
Matching 3-4 distinct subheadings drops citation by 6pp vs matching 0-1. Breadth dilutes.

Content structure has a supporting role

Structural signals help at the margins but don't override retrieval rank or query relevance. The following analysis excludes low-content pages (word count < 100).

Word Count	N	Citation Rate
< 500	44,451	30.5%
500-999	104,690	34.3%
1,000-1,499	127,128	32.9%
1,500-1,999	113,538	33.5%
2,000-2,999	154,509	32.6%
3,000-4,999	130,397	30.2%
5,000+	74,170	28.6%

The sweet spot is 500-2,000 words. Pages over 5,000 words underperform pages under 500 words. Length works against you in ChatGPT citation.

Heading structure: 7-20 subheadings is optimal overall

H2-H4 Subheadings	N	Citation Rate
0	31,656	30.1%
1-3	66,522	28.0%
4-6	56,088	32.1%
7-10	82,397	33.5%
11-20	190,733	33.6%
21+	321,487	31.9%

Articles need enough structure to organize content but not so much that they become diluted. The 1-3 heading range (28.0%) performs worst, worse than zero headings (30.1%). This varies by page type:

Page type	0 H2-H4 subheadings	1-3	4-10	11-20	21+
Article	30.4%	31.8%	33.2%	32.6%	33.0%
Product	43.2%	33.8%	31.9%	33.8%	25.0%
Other (forums, landing pages, etc.)	29.8%	27.6%	32.9%	33.8%	31.8%

For articles, the pattern is straightforward: more headings help up to a point, with 4-10 performing best (33.2%). For product pages, zero headings has the highest cite rate (43.2%), likely because product pages are already focused on a single item and don't need editorial structure. The "other" bucket (forums, homepages, landing pages) drives most of the zero-heading volume and peaks at 11-20 subheadings (33.8%), with 4-10 close behind (32.9%).

Schema markup: meaningful boost

JSON-LD Status	N	Citation Rate
Has JSON-LD	13,341	38.5%
No JSON-LD	735,542	32.0%

Pages with JSON-LD schema markup have a +6.5pp citation advantage.

The top-performing schema types:

JSON-LD Type	Citation Rate
MedicalWebPage	47.0%
BreadcrumbList	46.2%
FAQPage	45.6%
Organization	44.3%
WebSite	40.6%

We checked whether JSON-LD pages differ on other signals that could explain the gap. They don't: JSON-LD pages have similar word counts (2,634 vs 2,627), similar heading counts (23.7 vs 23.0), similar DA (60.2 vs 59.4), and similar query match scores (0.745 vs 0.739). The schema markup boost appears to be an independent signal, possibly because structured data helps the retrieval system parse and categorize page content.
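
Detecting JSON-LD means finding `<script type="application/ld+json">` blocks and reading their `@type` values. The report does not say how its pipeline did this; the sketch below is one hypothetical approach using only Python's standard-library `html.parser` and `json`, and it handles single objects, top-level arrays, and `@graph` containers.

```python
import json
from html.parser import HTMLParser


class JsonLdTypeParser(HTMLParser):
    """Collects @type values from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.types: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script bodies may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                payload = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed blocks rather than failing the whole page
            self._collect(payload)

    def _collect(self, node):
        # JSON-LD may be a single object, an array, or an object with a @graph array.
        if isinstance(node, list):
            for item in node:
                self._collect(item)
        elif isinstance(node, dict):
            node_type = node.get("@type")
            if isinstance(node_type, str):
                self.types.append(node_type)
            elif isinstance(node_type, list):
                self.types.extend(t for t in node_type if isinstance(t, str))
            self._collect(node.get("@graph", []))


def jsonld_types(html: str) -> list[str]:
    parser = JsonLdTypeParser()
    parser.feed(html)
    parser.close()
    return parser.types


page = '<script type="application/ld+json">{"@type": "FAQPage"}</script>'
print(jsonld_types(page))  # ['FAQPage']
```

A page would count as "Has JSON-LD" when the returned list is non-empty; whether the study also counted blocks without a `@type` is not stated.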

Lists and tables: modest signal

Structure	N	Citation Rate
Both lists + tables	178,469	33.9%
Tables only	6,787	32.9%
Lists only	499,992	31.6%
Neither	63,635	31.0%

Pages with both lists and tables earn a +2.9pp advantage over pages with neither. Split by page type:

Page type	Both	Lists only	Neither
Article	34.0%	32.2%	33.7%
Product	37.8%	28.0%	24.4%
Other	33.9%	31.5%	30.4%

The list+table signal is strongest for product pages (+13pp vs neither). For articles, it barely matters. For the "other" bucket, the pattern tracks the overall average.
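
The four structure buckets can be assigned by counting list and table elements in a page's HTML. A hypothetical sketch using the standard-library `html.parser`; it assumes a "list" means any `<ul>` or `<ol>` element (nested lists count separately), which the report does not define. It also counts H2-H4 tags, the unit used in the heading tables.

```python
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Counts list (<ul>/<ol>), table, and H2-H4 elements in a page's HTML."""

    def __init__(self):
        super().__init__()
        self.lists = 0
        self.tables = 0
        self.subheadings = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.lists += 1
        elif tag == "table":
            self.tables += 1
        elif tag in ("h2", "h3", "h4"):
            self.subheadings += 1


def structure_bucket(html: str) -> str:
    """Assign a page to one of the four rows in the lists/tables breakdown."""
    counter = StructureCounter()
    counter.feed(html)
    counter.close()
    if counter.lists and counter.tables:
        return "Both lists + tables"
    if counter.tables:
        return "Tables only"
    if counter.lists:
        return "Lists only"
    return "Neither"


print(structure_bucket("<ol><li>step</li></ol><table><tr><td>1</td></tr></table>"))  # Both lists + tables
```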

Readability: higher grade level performs better

Flesch-Kincaid Grade	N	Citation Rate
< 8 (Elementary/Middle School)	50,286	29.6%
8-9 (Middle/High School)	173,374	32.1%
10-11	263,130	31.7%
12-13	159,258	32.3%
14-15	56,586	33.3%
16-17 (College)	20,692	35.9%
18-19 (Post Grad)	8,781	34.3%
20+ (Academic)	16,776	33.2%

The FK 16-17 range performs best at 35.9%, consistent with prior AEO research that found FK 16 optimal for AI citation. The signal peaks at college-level writing and tapers above 18.

ChatGPT favors more sophisticated writing, peaking at a college grade level. This likely reflects that expert-written content tends to use higher-grade vocabulary and more complex sentence structure.
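
The Flesch-Kincaid grade level is a fixed formula: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The report does not name the tool it used to score pages, and tools differ mainly in how they split sentences and count syllables. The sketch below uses a deliberately naive vowel-group syllable heuristic, so its scores are approximate; the function names are illustrative.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, drop a silent trailing 'e' (but keep '-le')."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le"):
        groups -= 1
    return max(groups, 1)


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text: str) -> float:
    """Approximate FK grade of a text using simple regex sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


# 100 words in 5 sentences with 150 syllables: 20 words per sentence, 1.5 syllables per word.
print(round(fk_grade_from_counts(100, 5, 150), 2))  # 9.91
```

The formula shows why FK 16-17 implies long sentences and polysyllabic vocabulary: both ratios must rise well above the levels in the example.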

Implication: Structure your content with 7-20 subheadings (4-10 for articles), include lists and tables where appropriate, add JSON-LD schema markup, and write at FK grade 14-17. These are table stakes for AI visibility, not differentiators. None of these signals can overcome poor retrieval rank or weak query relevance.

Key takeaways:

Word count sweet spot is 500-2,000 words. Pages over 5,000 words underperform pages under 500.
4-10 H2-H4 subheadings is the sweet spot for articles. Product pages perform best with zero headings.
JSON-LD schema adds +6.5pp citation advantage independent of other content signals. FAQPage, MedicalWebPage, and BreadcrumbList lead.
FK readability peaks at 16-17 (35.9%), consistent with prior AEO research. College-level writing outperforms both simple and overly academic text.
Lists + tables matter most for product pages (+13pp). For articles, the effect is negligible.

Authority signals don't predict citation

Traditional SEO authority metrics (domain authority, backlink count) show no positive correlation with citation rate in AI-generated answers.

Group	Avg DA	Avg Backlinks
Always cited (79K pages)	53.0	1.1M
Mixed (57K pages)	57.8	752K
Never cited (182K pages)	55.7	3.2M

Pages that are always cited have lower domain authority and fewer backlinks than pages that are never cited.

DA doesn't help at any similarity level

Primary Sim Level	DA Q1 (lowest) Cite%	DA Q4 (highest) Cite%
Low (< 0.7)	31.3%	27.3%
Mid (0.7-0.8)	33.2%	30.2%
High (0.8+)	35.2%	35.0%

At every level of content relevance, the lowest DA quartile performs equal to or better than the highest.

High-authority platforms underperform

Platform	Domain Authority	Citation Rate
YouTube	100	2.4%
Reddit	92	29.9%
Major News	94	32.0%
Health Publishers	90	46.4%
Wikipedia	95	59.2%

The five highest-DA site types in the dataset (YouTube 100, Wikipedia 95, Major News 94, Reddit 92, Health Publishers 90) produce citation rates ranging from 2.4% to 59.2%. Nearly identical authority, wildly different outcomes. DA tells you nothing about citation likelihood.

Implication: ChatGPT appears to evaluate content directly based on relevance, structure, and coverage. Domain authority carries no observable weight. Brands should evaluate their AEO strategy based on content quality, not link profiles.

Key takeaways:

Always-cited pages have lower DA (53) than never-cited pages (56). Backlinks show a 3x inverse gap (1.1M vs 3.2M).
At every level of query match, the lowest DA quartile performs equal to or better than the highest.
The five highest-DA site types (DA 90-100) range from 2.4% to 59.2% citation rate. DA tells you nothing about citation likelihood.
The signal that matters is content relevance at the page level, not domain-level authority.

Site type analysis

Citation rate, retrieval rank, and content profiles vary significantly by site type:

Site Type	Pages	Cite%	Avg Rank	Median Rank	% Top 3	Primary Sim
Wikipedia	5,342	59.2%	25.1	24.0	3.6%	0.576
Health Publishers	3,484	46.4%	10.3	7.0	29.3%	0.734
Travel Platforms	2,147	42.3%	10.3	8.0	21.8%	0.764
Education	4,031	41.2%	10.0	8.0	17.5%	0.659
Government	5,234	34.4%	10.0	8.0	19.0%	0.649
Article (other)	53,688	32.6%	12.3	10.0	14.0%	0.754
Major News	2,686	32.0%	14.5	12.0	20.2%	0.720
Product Page	3,627	30.4%	10.5	9.0	15.9%	0.757
Reddit	12,765	29.9%	15.5	11.0	15.0%	0.743
Marketplace	1,301	15.9%	13.2	12.0	12.5%	0.726
YouTube	2,647	2.4%	16.4	15.0	10.7%	0.714

The Wikipedia exception

Wikipedia achieves the highest citation rate in the dataset (59.2%) despite having the worst retrieval rank (median 24.0, only 3.6% in the top 3) and the lowest primary similarity (0.576). It is the only site type where density clearly overcomes poor retrieval position.

What makes Wikipedia different is its content profile:

Metric	Wikipedia	Health Publishers	Reddit	All Pages Avg
Avg words	4,383	2,111	1,194	2,324
Avg H2-H4	13.6	17.2	1.4	22.1
Avg lists	31.0	18.8	1.8	10.7
Avg tables	6.6	0.2	0.0	0.8
JSON-LD	0.0%	1.5%	0.0%	1.0%
DA	95	90	92	51

Wikipedia pages are longer, have more lists per page than any other site type, and carry 8x the average number of tables. They have no JSON-LD schema, and their high domain authority (95) cannot explain the result, since YouTube (DA 100) is cited only 2.4% of the time. Wikipedia appears to win on content density: encyclopedic coverage, rich structure within the content (lists and tables), and exhaustive topic treatment.

No other site type replicates this pattern. This is not a scalable playbook for most sites.

Health publishers: the query match + rank model

Health publishers (Healthline, WebMD, Mayo Clinic, Cleveland Clinic, Verywell Health, Medical News Today) achieve the second-highest citation rate (46.4%) through a different strategy than Wikipedia. They have the best retrieval rank among all site types (median 7.0, 29.3% in top 3) combined with high primary similarity (0.734) and the highest fan-out coverage ratio (0.918).

Their content is focused (2,111 avg words), well-structured (17.2 H2-H4, 18.8 lists), and highly relevant to the queries they surface for. They also have the highest "mixed" rate (35.1% of their pages are sometimes-cited), reflecting intense competition across health queries.

Reddit: high authority, low value

Reddit has a DA of 92 but a citation rate of only 29.9% and the lowest citation consistency in the dataset (only 0.59% of Reddit pages are cited in all 3 runs). Reddit pages have almost no content structure (1.4 H2-H4 headings, 1.8 lists, 0 tables) and relatively short text (1,194 words).

There is no structural difference between always-cited and never-cited Reddit pages (1,111 vs 1,212 avg words, 1.0 vs 1.6 headings). For Reddit, citation appears to depend on whether the thread happens to contain the specific information ChatGPT is looking for.

Major news: surfaced often, cited inconsistently

Major news outlets (Forbes, NYT, The Guardian, BBC, CNN, Reuters, Washington Post) have the third-highest DA (94, behind YouTube and Wikipedia) but a below-average citation rate (32.0%) and a high "mixed" rate (28.1%). They get surfaced across many queries but rarely own a topic.

Within major news, always-cited pages have significantly more structure than never-cited pages (28.4 vs 20.4 H2-H4 headings, 2,904 vs 2,268 words). This is one of the few site types where content structure meaningfully separates winners from losers.

Government: focused beats exhaustive

Government pages (.gov) show an unexpected within-type pattern. Never-cited government pages are longer (6,292 vs 4,091 avg words) and have more headings (26.3 vs 21.4) than always-cited government pages. For government content, shorter and more focused outperforms longer and broader.

Government pages do show a citation boost beyond what content signals explain. Controlling for query match level:

Query match	Government cite%	Non-government cite%	Gap
Low (< 0.7)	30.1%	29.4%	+0.7pp
Mid (0.7-0.8)	39.0%	31.6%	+7.4pp
High (0.8+)	49.1%	35.2%	+13.9pp

At high query match, government pages get cited 49% of the time vs 35% for non-government pages. The gap widens as content relevance increases, suggesting ChatGPT may apply a source-trust signal for .gov domains.
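
The government comparison is a cross-tab: citation rate by query-match level, split by whether the domain is `.gov`, with the gap expressed in percentage points. A sketch, assuming hypothetical `(domain, primary_sim, was_cited)` rows and a plain `.gov` suffix test (which would miss government domains such as gov.uk); the report does not describe its domain classifier.

```python
from collections import defaultdict

LEVELS = ("Low (< 0.7)", "Mid (0.7-0.8)", "High (0.8+)")


def sim_level(sim: float) -> str:
    """Query-match levels from the government table (0.7 and 0.8 belong to the higher level)."""
    if sim < 0.7:
        return LEVELS[0]
    if sim < 0.8:
        return LEVELS[1]
    return LEVELS[2]


def gov_gap_by_level(rows):
    """rows: (domain, primary_sim, was_cited). Returns {level: gov minus non-gov cite rate, in pp}."""
    counts = defaultdict(lambda: [0, 0])  # (level, is_gov) -> [cited, total]
    for domain, sim, was_cited in rows:
        key = (sim_level(sim), domain.lower().endswith(".gov"))
        counts[key][0] += int(bool(was_cited))
        counts[key][1] += 1
    gaps = {}
    for level in LEVELS:
        gov = counts.get((level, True))
        non_gov = counts.get((level, False))
        if gov and non_gov:  # a gap needs both groups at this level
            gaps[level] = 100 * (gov[0] / gov[1] - non_gov[0] / non_gov[1])
    return gaps


rows = [("cdc.gov", 0.85, True), ("www.irs.gov", 0.90, True),
        ("example.com", 0.85, True), ("example.com", 0.82, False)]
print(gov_gap_by_level(rows))  # {'High (0.8+)': 50.0}
```

The same `sim_level` split underlies the DA and freshness tables; only the grouping key changes.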

YouTube and marketplaces: structurally disadvantaged

YouTube (2.4% citation rate) and marketplace pages are structurally disadvantaged. YouTube pages have minimal extractable text (600 avg words), and marketplace pages, despite having rich content (3,349 words, 40.4 H2-H4 headings), serve product listing formats that don't align well with informational queries.

Amazon does skew the marketplace numbers down. Per-marketplace breakdown:

Marketplace	Pages	Citation rate
Amazon	752	12.2%
Walmart	249	17.4%
Etsy	108	23.0%
Target	166	26.7%
eBay	52	18.3%

Amazon's 12% rate (likely affected by bot-blocking) pulls down the group average. Target and Etsy perform closer to the overall dataset average, suggesting the marketplace format itself is the primary constraint, with Amazon's crawl restrictions adding a secondary penalty.

Key takeaways:

Wikipedia wins through density alone (59% cite rate) despite worst retrieval rank (median 24) and lowest query match (0.576). No other site type shows this pattern.
Health publishers win through the opposite strategy: best retrieval rank (median 7), strong query match (0.734), focused content (2,111 words).
Government pages get a citation boost beyond content signals (+14pp at high query match), suggesting a source-trust factor.
Despite a DA of 92, only 0.59% of Reddit pages are cited in all 3 runs. Authority without structure is unreliable.
Amazon's bot-blocking drops its cite rate to 12%, pulling down marketplace averages.

The bimodal reality of citation

The citation distribution is bimodal, meaning pages tend to either get cited by ChatGPT or not. There is little middle ground.

Page Citation Rate	Pages	% of All Pages
0% (never cited)	205,334	58.0%
1-10%	1,371	0.4%
11-25%	9,714	2.7%
26-50%	38,208	10.8%
51-75%	10,335	2.9%
76-99%	1,333	0.4%
100% (always cited)	87,504	24.7%

58% of pages are never cited in any query they appear for. 25% are cited every time they appear in ChatGPT’s web search. Only 17% fall in between.
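
Each page's citation rate is its citations divided by its appearances, and the three headline groups fall out of the endpoints. A sketch, assuming "appearance" means the page showing up in search results for one query run (the report does not define the denominator) and assuming a rate on a bucket boundary belongs to the lower bucket; both function names are hypothetical.

```python
def rate_bucket(cited: int, appeared: int) -> str:
    """Assign a page to one of the citation-rate rows in the distribution table."""
    if appeared <= 0:
        raise ValueError("a page must appear at least once")
    pct = 100 * cited / appeared
    if pct == 0:
        return "0% (never cited)"
    if pct == 100:
        return "100% (always cited)"
    for upper, label in ((10, "1-10%"), (25, "11-25%"), (50, "26-50%"), (75, "51-75%")):
        if pct <= upper:
            return label
    return "76-99%"


def headline_group(cited: int, appeared: int) -> str:
    """Collapse the rows into the never / mixed / always split."""
    if cited == 0:
        return "never"
    if cited == appeared:
        return "always"
    return "mixed"


print(rate_bucket(1, 3), headline_group(1, 3))  # 26-50% mixed
```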

On-page signals don't explain the split

Metric	Always Cited	Never Cited
Pages	79,273	182,056
Avg word count	2,172	2,365
Avg H2-H4 headings	20.3	21.1
Avg readability (FK)	12.0	12.2
Avg lists	11.0	10.6
Avg tables	0.9	0.8
Avg DA	53.0	55.7

The profiles are nearly identical. Word count, headings, readability, lists, tables, and domain authority do not differentiate always-cited pages from never-cited pages. The differentiator, as established in Finding 1, is retrieval position.

The "mixed" pages tell a story

The 17% of pages that are sometimes-cited have a distinct profile:

Metric	Always Cited	Mixed	Never Cited
Pages	79,273	57,450	182,056
Median queries appeared in	1	3	1
Avg word count	2,172	2,573	2,365
Avg H2-H4 headings	20.3	23.5	21.1
Avg DA	53.0	57.8	55.7

Mixed pages appear across the most queries (median 3, up to 372 for a single page), have the longest content, the most headings, and the highest domain authority. These are broad, authoritative resources that get surfaced often but don't reliably win. They represent the "cover everything, earn links, hope to be cited" strategy. The data suggests this approach produces inconsistent results.

By page type, 80% of mixed pages are classified as "other" and 19% as articles. Breaking mixed pages down further by pattern (percentages are of all mixed pages):

Pattern	Mixed pages	%
Misc deep pages	41,672	72.5%
Blog pages	6,555	11.4%
Product/shop pages	2,062	3.6%
Reddit	1,580	2.8%
Health publishers	1,224	2.1%
Wiki/Encyclopedia	1,003	1.7%
Government	970	1.7%
Education	887	1.5%
Major news	754	1.3%

The top domains among mixed pages: Reddit (1,582), Wikipedia (965), Alibaba (572), Forbes (537), Vogue (512), TechRadar (509), Healthline (505), Tom's Guide (434), Consumer Reports (400). Most are editorial, review, and lifestyle publishers that cover many topics broadly. Mixed citation appears to be a byproduct of breadth-first content strategies across verticals.

The bimodal split by site type

Site Type	Pages	% Never Cited	% Always Cited	% Mixed
Wikipedia	5,342	25.9%	55.4%	18.8%
Travel Platforms	2,147	51.7%	32.1%	16.2%
Education	4,031	46.5%	31.5%	22.0%
Reddit	12,765	58.4%	29.3%	12.4%
Health Publishers	3,484	36.3%	28.6%	35.1%
Article (other)	53,688	56.2%	24.8%	19.0%
Government	5,234	57.0%	24.5%	18.5%
Major News	2,686	54.1%	17.8%	28.1%
Marketplace	1,301	80.6%	13.5%	5.8%
YouTube	2,647	95.9%	1.6%	2.5%

Wikipedia has the most favorable distribution: 55% of its pages are always-cited. Health publishers have the highest "mixed" rate (35.1%), reflecting a competitive health query space. Major news also has a high "mixed" rate (28.1%), consistent with pages that cover many topics but rarely dominate.

Implication: Consistent AI citation comes from being the retrievable, well-matched answer to a specific query. Broad coverage strategies produce "mixed" results at best. The highest-performing pages are narrowly focused resources that surface for few queries and win every time they appear.

Key takeaways:

The citation distribution is bimodal: 58% of pages are never cited, 25% are always cited. Only 17% are in between.
On-page signals (word count, headings, readability) and DA are nearly identical between always-cited and never-cited pages. Retrieval position is the separator.
"Mixed" pages have the longest content, the most headings, and the highest DA. They are the "ultimate guides" and they perform the least reliably.
Mixed pages are broadly distributed across verticals, not concentrated in one site type.
Wikipedia has the most favorable split: 55% always-cited. Marketplaces have the worst: 81% never-cited.

Citations are front-loaded and follow retrieval rank

ChatGPT's answers cite 5-7 sources on average. Those citations are concentrated in the first third of the answer and follow the retrieval rank order.

Citation distribution across the answer

Answer position	% of citations
First third	40.7%
Middle third	34.8%
Last third	24.5%

41% of all citations appear in the opening section of the answer. The last third accounts for only 25%.
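
The thirds split assigns each citation to a segment of its answer and then pools the shares across answers. A sketch, assuming each citation is located by its character offset within the answer text; the report may instead have used answer sections or citation order, and the function names are hypothetical.

```python
from collections import Counter

THIRDS = ("First third", "Middle third", "Last third")


def answer_third(offset: int, answer_length: int) -> str:
    """Map a citation's character offset to the third of the answer it falls in."""
    if answer_length <= 0:
        raise ValueError("answer_length must be positive")
    fraction = offset / answer_length
    if fraction < 1 / 3:
        return THIRDS[0]
    if fraction < 2 / 3:
        return THIRDS[1]
    return THIRDS[2]


def third_shares(citations: list[tuple[int, int]]) -> dict[str, float]:
    """citations: (offset, answer_length) pairs pooled across answers."""
    if not citations:
        return {label: 0.0 for label in THIRDS}
    counts = Counter(answer_third(offset, length) for offset, length in citations)
    total = sum(counts.values())
    return {label: counts[label] / total for label in THIRDS}


print(third_shares([(10, 300), (20, 300), (160, 300), (290, 300)]))
# {'First third': 0.5, 'Middle third': 0.25, 'Last third': 0.25}
```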

Search rank predicts citation position

Pages that rank higher in ChatGPT's web search get cited earlier in the answer:

Search rank	Avg citation position	Median
Rank 0-2	2.2	2
Not in search results (from memory)	3.3	3
Rank 3-5	3.9	3
Rank 6-10	4.9	5
Rank 11+	6.2	6

Pages ranked 0-2 are cited at position 2.2 on average. Pages ranked 11+ appear at position 6.2. Pages cited "from memory" (not found in any search result) appear early too (position 3.3), suggesting ChatGPT treats its training-data references as high-confidence sources.

Density doesn't earn repeat citation

Each cited page appears exactly one time per response regardless of fan-out coverage. Pages with 100% subtopic coverage get cited once, same as pages with 0%. Density does not earn a page multiple citations within a single answer.

Key takeaways:

41% of citations land in the first third of the answer. Early position correlates with search rank.
Search rank 0-2 pages are cited at position 2 on average. Rank 11+ pages are cited at position 6.
Pages cited from ChatGPT's training data (without appearing in search results) are treated as high-confidence, appearing at position 3.3 on average.
No page gets cited more than once per response, regardless of density.

Freshness matters, but only with relevance

41% of pages in the dataset have a detectable publish date. Among those, page age shows a clear relationship with citation.

Page age vs citation rate

Page age	N	Citation rate	Cited all 3 runs
< 30 days	33,922	25.3%	1.91%
30-89 days	41,528	32.8%	2.36%
90-179 days	40,892	32.4%	2.16%
180-364 days	57,591	31.5%	1.98%
1-2 years	54,500	32.0%	2.14%
2-5 years	77,160	27.5%	1.94%
5+ years	37,234	27.6%	1.80%

The sweet spot is 30-89 days old (32.8%). Very fresh content (< 30 days) underperforms at 25.3%, possibly because brand-new pages haven't been fully indexed or established retrieval signals yet. Pages older than 2 years decline to ~27.5%.
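
Page age here is the gap between a detected publish date and the date the page was observed. A sketch, assuming a 365-day year and that the observation date is the scrape date (the report specifies neither); `age_bucket` is a hypothetical helper whose labels follow the age table.

```python
from datetime import date

# Exclusive upper bounds in days, assuming a 365-day year.
AGE_BUCKETS = [
    (30, "< 30 days"),
    (90, "30-89 days"),
    (180, "90-179 days"),
    (365, "180-364 days"),
    (2 * 365, "1-2 years"),
    (5 * 365, "2-5 years"),
]


def age_bucket(published: date, observed: date) -> str:
    """Map a page's age on the observation date to a row of the age table."""
    age_days = (observed - published).days
    if age_days < 0:
        raise ValueError("publish date is after the observation date")
    for upper, label in AGE_BUCKETS:
        if age_days < upper:
            return label
    return "5+ years"


print(age_bucket(date(2025, 12, 1), date(2026, 1, 15)))  # 30-89 days (45 days old)
```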

Freshness by industry

Category	<30 days	30-89 days	90-364 days	1-2 years	2-5 years	5+ years
SaaS	36.3%	39.3%	38.0%	36.1%	34.7%	28.5%
Finance	46.5%	50.2%	49.1%	39.1%	40.4%	35.1%
E-commerce	24.4%	35.9%	37.3%	37.4%	33.5%	36.7%
Legal	--	--	49.2%	38.7%	44.4%	32.1%
Real estate	36.0%	38.3%	44.1%	40.7%	35.2%	36.5%

Key patterns by vertical:

Finance has one of the strongest freshness signals: 50.2% at 30-89 days, dropping to 35.1% at 5+ years (15pp gap). That fits a category where rates, regulations, and products change frequently.
SaaS peaks at 30-89 days (39.3%) and declines steadily to 28.5% at 5+ years (11pp gap). Software content ages fast.
Travel shows a sharp freshness curve: 44.8% at 30-89 days, down to 26.2% at 5+ years (19pp gap, the largest in the dataset).
E-commerce is the exception: beyond the first month, freshness barely matters. The 5+ year bucket (36.7%) performs slightly better than 30-89 days (35.9%). Evergreen product content holds up.
Health is unusual: 1-2 year old content (32.3%) slightly outperforms fresh content (25.6-29.9%). Established medical content may carry more trust.

Freshness matters most when content is relevant

Controlling for query match level:

Page age	High sim (0.8+)	Mid sim (0.7-0.8)	Low sim (< 0.7)
< 1 year	35.4%	30.9%	25.9%
1-5 years	31.2%	28.9%	27.2%
5+ years	31.7%	28.6%	23.9%

Among pages with strong query match, fresher pages have a +4.2pp advantage (35.4% vs 31.2%). Among pages with weak query match, the age effect is negligible or slightly reversed. Freshness amplifies content relevance; it does not substitute for it.

Key takeaways:

The citation sweet spot is 30-89 days old (32.8%). Very fresh (< 30 days) underperforms, possibly due to incomplete indexing.
Pages older than 2 years see a ~5pp drop in citation rate from the 30-89 day peak (27.5% vs 32.8%).
Freshness matters most when query match is already strong (+4.2pp). With weak query match, age barely matters.

ChatGPT cites from memory, and those pages look like everything else

6,371 page-query combinations in the dataset were cited by ChatGPT without the page appearing in any search result for that query. These are pages ChatGPT referenced from its training data.
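
Identifying these is a set difference: for each query, the URLs cited in the answer minus the URLs returned by any fan-out search for that query, pooled across runs. A sketch, assuming URLs are already normalized (scheme, trailing slashes, tracking parameters), which matters in practice but is not described in the report; `memory_citations` is a hypothetical name.

```python
def memory_citations(cited_by_query: dict[str, set[str]],
                     retrieved_by_query: dict[str, set[str]]) -> set[tuple[str, str]]:
    """(query, url) pairs cited in the answer but absent from every search result for that query."""
    pairs: set[tuple[str, str]] = set()
    for query, cited in cited_by_query.items():
        retrieved = retrieved_by_query.get(query, set())
        pairs.update((query, url) for url in cited - retrieved)
    return pairs


cited = {"q1": {"a.com/x", "b.com/y"}, "q2": {"c.com/z"}}
retrieved = {"q1": {"a.com/x", "d.com/w"}}
print(sorted(memory_citations(cited, retrieved)))
# [('q1', 'b.com/y'), ('q2', 'c.com/z')]
```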

What gets cited from memory?

Site type	Pages	%
Misc domains	1,363	79.5%
Reddit	190	11.1%
Health publishers	57	3.3%
Major news	42	2.4%
Wikipedia	37	2.2%
Education	14	0.8%
Government	12	0.7%

‍

Reddit is the largest identifiable source of memory citations (11.1%), followed by health publishers and news. The majority (79.5%) come from miscellaneous domains. Top individual domains: Reddit (190 pages), Wikipedia (37), Forbes (32), Healthline (25), Tom's Hardware (22).

Memory-cited pages look identical to search-cited pages

Metric	Cited from memory	Cited from search
Avg word count	2,418	2,518
Avg H2-H4	22.8	22.7
Avg DA	59.1	58.0
Avg query match	0.748	0.747


The content profiles are nearly identical. Memory-cited pages have marginally higher DA (59.1 vs 58.0) and a slightly lower average word count (2,418 vs 2,518), but the differences are trivial. ChatGPT does not appear to apply a different quality bar for memory citations vs search-surfaced citations.

Key takeaways:

6,371 citations come from ChatGPT's training data, not from search results.
Reddit accounts for 11% of memory citations, the largest identifiable source.
Memory-cited pages have the same content profile as search-cited pages: similar word count, headings, DA, and query match.
Memory citations appear at position 3.3 in the answer on average, earlier than most search-surfaced citations.

Implications for your AI visibility strategy

1. Retrievability first

If ChatGPT can't find your page in its web search results, nothing else matters. Rank position is the strongest citation predictor (58% at the top position vs 14% at position 10). Optimization for AEO starts with being discoverable by the retrieval system.

2. Query match over breadth

Match the query directly in your headings. A focused 800-word page outperforms a 5,000-word guide. When primary query relevance is high, moderate subtopic coverage (26-50%) outperforms exhaustive coverage (100%). Write content that is the best answer to one specific question.

3. Structure supports, doesn't save

Use 4-10 H2-H4 subheadings for articles (the optimal range varies by page type; product pages perform best with fewer or no subheadings). Include lists and tables where appropriate (especially for product pages, where the effect is strongest). Add JSON-LD schema markup (especially FAQPage, Article, or MedicalWebPage). Write at a professional grade level. Structural signals add 2-6pp to citation rates but cannot overcome poor retrieval or weak relevance.
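One way to check an article against the 4-10 H2-H4 guideline is to count those elements in its HTML. This is a minimal sketch using Python's standard-library html.parser; the class and function names are hypothetical, it only sees headings present in the static HTML (not JS-rendered ones), and the default range simply mirrors the recommendation above. For product pages, where fewer or no subheadings performed best, the range would need adjusting.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SubheadingCollector(HTMLParser):
    """Collects (tag, text) for every H2, H3, and H4 element."""

    TRACKED = {"h2", "h3", "h4"}

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[str, str]] = []
        self._open: Optional[str] = None  # heading tag currently being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" and "h2" both match.
        if tag in self.TRACKED:
            self._open = tag
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            # Collapse internal whitespace so "Setup\n  steps" becomes "Setup steps".
            self.headings.append((tag, " ".join("".join(self._text).split())))
            self._open = None


def audit_subheadings(html: str, low: int = 4, high: int = 10) -> Tuple[int, bool]:
    """Return the H2-H4 count and whether it falls inside [low, high]."""
    collector = SubheadingCollector()
    collector.feed(html)
    collector.close()
    count = len(collector.headings)
    return count, low <= count <= high


if __name__ == "__main__":
    article = "<h1>Guide</h1>" + "".join(
        f"<h2>Step {i}</h2><p>Details.</p>" for i in range(1, 6)
    )
    print(audit_subheadings(article))  # (5, True)
```

The H1 is deliberately ignored, matching the report's choice to treat H2-H4 as the content subheadings.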

4. Rethink authority as a proxy

Domain authority and backlink volume don't translate to AI citation. The lowest DA quartile performs as well as the highest at every relevance level. Evaluate your AEO strategy on content merit, not link profiles.
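The "at every relevance level" comparison is a stratified analysis: group page-query pairs by query match first, then compare DA quartiles within each group, so relevance is held roughly constant. A minimal sketch, assuming a hypothetical flat record of (primary similarity, DA, cited) per pair and reusing the 0.7/0.8 similarity bands from the freshness table; the quartile method (Python's statistics.quantiles default) is an assumption, not necessarily the one used in the report.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import quantiles
from typing import Dict, List, Tuple

# Hypothetical record shape: (primary_similarity, domain_authority, was_cited)
Record = Tuple[float, float, bool]

SIM_BANDS = [("low", float("-inf"), 0.7), ("mid", 0.7, 0.8), ("high", 0.8, float("inf"))]


def sim_band(sim: float) -> str:
    """Map a similarity score to the low / mid / high band."""
    for name, lo, hi in SIM_BANDS:
        if lo <= sim < hi:
            return name
    raise ValueError(f"unexpected similarity: {sim}")


def citation_rate_by_da_quartile(records: List[Record]) -> Dict[Tuple[str, int], float]:
    """Citation rate for each (similarity band, DA quartile 1-4) cell."""
    cuts = quantiles([da for _, da, _ in records], n=4)  # three cut points
    cited: Dict[Tuple[str, int], int] = defaultdict(int)
    total: Dict[Tuple[str, int], int] = defaultdict(int)
    for sim, da, was_cited in records:
        key = (sim_band(sim), bisect_right(cuts, da) + 1)
        total[key] += 1
        cited[key] += int(was_cited)
    return {key: cited[key] / total[key] for key in sorted(total)}
```

If DA carried no signal, the four quartile rates within each band would be roughly equal, which is the pattern the paragraph above reports.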

5. The Wikipedia playbook only works at Wikipedia scale

Wikipedia-level density is a real signal, but it requires encyclopedic coverage: 4,000+ words, dozens of lists, multiple tables, exhaustive topic treatment. No other site type replicates this pattern. For most publishers, the better strategy is to be the well-matched, well-structured, findable answer to a specific query.

6. Freshness is a lever to pull

Pages aged 30-89 days hit the highest citation rate (32.8%). Very fresh content (< 30 days) underperforms at 25.3%, likely because new pages haven't built retrieval signals yet. Pages older than 2 years drop to 27.5%. The freshness effect is strongest when query match is already high (+4.2pp for pages under 1 year vs 1-5 years). For publishers with relevant content aging past 2 years, refreshing is a lever worth testing: pages in that range trail the 30-89 day peak by roughly 5pp.
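Reproducing the age analysis on your own pages requires bucketing each page by its age when the query was run. A minimal sketch with hypothetical names: the "< 30 days", "30-89 days", "1-2 years", and "5+ years" labels appear in the report, but the 90-364 day and 2-5 year buckets and the 365-day year are assumptions made to cover the remaining ranges.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Upper bounds in days (exclusive). Edges for "90-364 days" and
# "2-5 years" are assumptions, not confirmed by the report.
AGE_BUCKETS = [
    (30, "< 30 days"),
    (90, "30-89 days"),
    (365, "90-364 days"),
    (730, "1-2 years"),
    (1825, "2-5 years"),
]


def age_bucket(published: date, observed: date) -> str:
    """Label a page by its age (in days) on the observation date."""
    age_days = (observed - published).days
    if age_days < 0:
        raise ValueError("published date is after the observation date")
    for upper, label in AGE_BUCKETS:
        if age_days < upper:
            return label
    return "5+ years"


def citation_rate_by_age(rows: Iterable[Tuple[date, date, bool]]) -> Dict[str, float]:
    """rows: (published, observed, was_cited) for each page-query pair."""
    cited: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for published, observed, was_cited in rows:
        label = age_bucket(published, observed)
        total[label] += 1
        cited[label] += int(was_cited)
    return {label: cited[label] / total[label] for label in total}
```

Which date counts as "published" (original publication or last update) is another assumption to settle before comparing results with the figures above.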

Methodology notes and caveats

Scope: This study measures ChatGPT's search and citation behavior specifically. ChatGPT and Google AI Mode use a similar fan-out query pattern (expanding a single query into multiple sub-queries), but they use different retrieval systems. Content signal findings (query match, density, structure) may transfer to other AI answer engines. Retrieval rank findings are specific to ChatGPT's search system.

Similarity thresholds: The default 0.5 cosine similarity threshold used in the pre-computed fan-out_coverage_ratio column creates a ceiling effect (P25 = median = P75 = 1.0): at that threshold, at least three-quarters of pages register as covering every fan-out sub-query, so the metric cannot separate them. All density analysis in this report therefore uses stricter thresholds (0.70 and 0.80) applied to H2-H4 subheading matches via the coverage_fan-out_detail table.
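The ceiling effect is easiest to see with a toy example. The sketch below assumes coverage is the share of fan-out sub-queries whose best-matching H2-H4 heading meets a cosine-similarity threshold; that aggregation rule, the function names, and the 2-dimensional vectors are illustrative assumptions (real scores would come from 768-dimensional embeddings).

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fanout_coverage(subqueries: Sequence[Sequence[float]],
                    headings: Sequence[Sequence[float]],
                    threshold: float) -> float:
    """Share of sub-queries whose best heading match reaches `threshold`."""
    if not subqueries:
        return 0.0
    covered = sum(
        1
        for q in subqueries
        if max((cosine(q, h) for h in headings), default=0.0) >= threshold
    )
    return covered / len(subqueries)


if __name__ == "__main__":
    headings = [(1.0, 0.0)]
    # Best-match similarities: 1.0, ~0.71, 0.6
    subqueries = [(1.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
    for t in (0.5, 0.7, 0.8):
        print(t, round(fanout_coverage(subqueries, headings, t), 2))
```

With these vectors, coverage is 1.0 at the 0.5 threshold but falls to about 0.67 at 0.70 and 0.33 at 0.80, which is why the stricter thresholds can discriminate between pages.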

H1 vs H2-H4: Primary similarity compares the original query against all headings (H1-H4). Since H1 is typically the page title and tends to match broadly, the fan-out coverage analysis uses H2-H4 only (h234_heading_sim) to measure whether the article's content subheadings address the subtopics.

Data exclusions: 806 queries have no coverage data (cited/seen URLs had no fetchable HTML). 35,020 pages (10%) flagged as is_low_content (word count < 100) were excluded from content quality analysis. These are primarily paywalled articles, Facebook posts, and JS-rendered tool pages.

Embedding model: All similarity scores use BAAI/bge-base-en-v1.5 (768 dimensions). Scores are directly comparable across all queries and pages.

Consistency measurement: Each query was sent to ChatGPT 3 times. Citation consistency (total_runs_cited) measures how many of the 3 runs cited a given page for a given query. Only 2.3% of page-query combinations are cited in all 3 runs, making consistent citation the most stringent quality bar in the dataset.
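The consistency metric is a per-pair count across runs. A minimal sketch, assuming each run's citations for a query are available as a set of URLs; only the field name total_runs_cited comes from the dataset, while the function names and example URLs are hypothetical. Which page-query pairs enter the denominator is also an assumption here.

```python
from typing import Dict, List, Set, Tuple

RUNS_PER_QUERY = 3  # each query was sent to ChatGPT 3 times


def total_runs_cited(runs: List[Set[str]], url: str) -> int:
    """Number of runs (0-3) whose citation set includes `url`."""
    return sum(url in cited for cited in runs)


def share_cited_every_run(pairs: Dict[Tuple[str, str], int]) -> float:
    """Share of (query, url) pairs whose total_runs_cited equals RUNS_PER_QUERY."""
    if not pairs:
        return 0.0
    return sum(n == RUNS_PER_QUERY for n in pairs.values()) / len(pairs)


if __name__ == "__main__":
    # Hypothetical citation sets from three runs of one query.
    runs = [{"a.example", "b.example"}, {"a.example"}, {"a.example", "c.example"}]
    seen = ["a.example", "b.example", "c.example", "d.example"]
    pairs = {("example query", url): total_runs_cited(runs, url) for url in seen}
    print(pairs)
    print(share_cited_every_run(pairs))  # 0.25
```

In this toy case only one of four pairs is cited in all three runs; the report's dataset-wide figure for that bar is 2.3%.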
