Jekyll-AEO
Make your Jekyll site visible to AI search engines, assistants, and LLMs. One gem, zero config, eight features.
Traditional search traffic is projected to decline 25% by 2026, while AI-referred sessions grew 527% in early 2025. When someone asks ChatGPT, Perplexity, or Claude about a topic you’ve written about, your content needs to be in a format they can actually read. Most websites don’t offer one.
Jekyll-AEO fixes that in a single bundle install. It hooks into Jekyll’s build lifecycle and produces everything AI systems need to discover, parse, and cite your content — clean markdown copies, a site-wide llms.txt index, structured JSON-LD, crawler policies, and domain identity metadata. No external services, no API calls, no runtime dependencies.
Why This Matters Now
No AI crawler except Google renders JavaScript. Most AI systems see your raw HTML source — navigation menus, tracking scripts, layout divs, and framework boilerplate — not your rendered page. A typical web page is 85-95% markup and 5-15% actual content.
The math is brutal: AI systems have finite context windows, and every token wasted on HTML noise is a token that can’t be used to understand your content. Markdown reduces that overhead by 20-30%. A well-structured llms-full.txt file achieves 90%+ token reduction overall.
Sites with schema.org structured data get 2.5x more AI citations. FAQPage markup specifically gives a 3.2x higher chance of appearing in AI Overviews. And only ~10% of sites have adopted llms.txt — the early-mover window is open.
Features
Jekyll-AEO ships eight features. Three are on by default, one is available via a Liquid tag, and four are opt-in.
Generate .md Files
Every HTML page gets a companion .md file — your content stripped of Liquid tags, kramdown annotations, and layout noise. Just clean, structured markdown that LLMs can ingest directly.
/about/index.html → /about.md
/blog/my-post/ → /blog/my-post.md
/products/widget/ → /products/widget.md
Developer comments are stripped as well. The content itself is preserved, and the title, description, and last-modified date from front matter are prepended.
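The URL mapping shown above can be sketched in a few lines of Ruby. This is an illustration only, not the plugin's actual code: `markdown_path_for` is a hypothetical helper, and the homepage mapping to `/index.md` is an assumption the page does not confirm.

```ruby
# Hypothetical helper: maps a page URL to the path of its markdown companion.
def markdown_path_for(url)
  # Treat "/about/index.html" the same as the directory-style URL "/about/".
  path = url.sub(%r{/index\.html\z}, "/")
  # Drop a trailing ".html" so "/pricing.html" becomes "/pricing".
  path = path.sub(/\.html\z/, "")
  # Drop the trailing slash of directory-style URLs.
  path = path.chomp("/")
  # Assumption: the homepage ("/") maps to "/index.md".
  path.empty? ? "/index.md" : "#{path}.md"
end

puts markdown_path_for("/about/index.html") # prints /about.md
puts markdown_path_for("/blog/my-post/")    # prints /blog/my-post.md
```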
Inject Link Tags
Every HTML page automatically gets a <link rel="alternate"> tag in the <head> pointing to its markdown copy. AI crawlers discover the machine-readable version of each page without needing to know your URL scheme. Works in auto-inject mode (default) or data mode for manual template placement.
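As a rough sketch of what auto-inject mode amounts to, the tag can be built and placed just before the closing head tag. The helper names are hypothetical, and the `type="text/markdown"` attribute is an assumption; the page only confirms `rel="alternate"`.

```ruby
require "cgi"

# Hypothetical helper: builds the alternate link tag for a page's markdown copy.
# Assumption: the tag carries type="text/markdown".
def alternate_link_tag(markdown_path)
  href = CGI.escapeHTML(markdown_path)
  %(<link rel="alternate" type="text/markdown" href="#{href}">)
end

# Inserts the tag just before </head>. The HTML is returned unchanged
# when it contains no </head>.
def inject_link_tag(html, markdown_path)
  html.sub("</head>") { "#{alternate_link_tag(markdown_path)}\n</head>" }
end
```

Data mode would presumably expose the same path to the template instead of editing the rendered HTML, which is useful when a layout needs the tag in a specific position.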
Generate llms.txt
Following the llms.txt specification by Jeremy Howard (Answer.AI), Jekyll-AEO generates:
- /llms.txt: A structured index of your entire site, organized by collection, with titles, descriptions, and links to markdown copies
- /llms-full.txt: Every page’s markdown content concatenated into a single file
# My Website
> A website about building great products
## Pages
- [About](/about.md): Learn about our company and mission
- [Pricing](/pricing.md): Plans and pricing for all tiers
## Blog Posts
- [Launching v2.0](/blog/launching-v2.md): Our biggest release yet
Sections auto-generate from your Jekyll collections, or you can define custom sections. Adoption is accelerating — Mintlify has rolled it out across thousands of documentation sites including Anthropic and Cursor.
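To make the file layout concrete, here is a minimal sketch that assembles an index like the sample above from plain Ruby data. The `build_llms_txt` name and the hash shape are hypothetical, and separating blocks with blank lines is a formatting choice made for this sketch.

```ruby
# Hypothetical builder: sections is a Hash of heading => list of page hashes,
# each with :title, :url, and :description keys.
def build_llms_txt(title:, description:, sections:)
  blocks = ["# #{title}", "> #{description}"]
  sections.each do |heading, pages|
    links = pages.map { |page| "- [#{page[:title]}](#{page[:url]}): #{page[:description]}" }
    blocks << "## #{heading}" << links.join("\n")
  end
  # Blank lines between blocks keep the output valid markdown.
  blocks.join("\n\n") + "\n"
end

puts build_llms_txt(
  title: "My Website",
  description: "A website about building great products",
  sections: {
    "Pages" => [
      { title: "About", url: "/about.md", description: "Learn about our company and mission" }
    ]
  }
)
```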
Generate JSON-LD
Add {% aeo_json_ld %} to your layout and get schema.org structured data automatically:
| Schema | Trigger | Impact |
|---|---|---|
| BreadcrumbList | Every page except homepage | Helps AI understand site hierarchy |
| Organization | Homepage | Brand identity signal |
| FAQPage | faq: array in front matter | 3.2x higher AI Overview appearance |
| HowTo | howto: object in front matter | Step-by-step content extraction |
| Speakable | speakable: true in front matter | Voice assistant discovery |
| Article | Dated pages (skips when jekyll-seo-tag is installed) | Authorship and date signals |
Add FAQ or HowTo structured data to any page through simple front matter arrays — no manual JSON-LD required.
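The sketch below shows one way a front matter array could be turned into schema.org FAQPage markup. It assumes each entry has `question` and `answer` keys, which this page does not confirm (check the README for the actual front matter shape), and the helper names are hypothetical.

```ruby
require "json"

# Hypothetical converter: faq_items is an Array of Hashes with
# "question" and "answer" keys (an assumed front matter shape).
def faq_json_ld(faq_items)
  questions = faq_items.map do |item|
    {
      "@type" => "Question",
      "name" => item["question"],
      "acceptedAnswer" => { "@type" => "Answer", "text" => item["answer"] }
    }
  end
  { "@context" => "https://schema.org", "@type" => "FAQPage", "mainEntity" => questions }
end

# Wraps the data in a script tag. "</" is escaped as "<\/" (valid JSON)
# so that answer text cannot close the script element early.
def json_ld_script_tag(data)
  json = JSON.generate(data).gsub("</") { "<\\/" }
  %(<script type="application/ld+json">#{json}</script>)
end
```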
Generate robots.txt
Control which AI bots can access your site. Jekyll-AEO generates a robots.txt that separates search bots (allowed) from training bots (blocked):
User-agent: OAI-SearchBot # Allow search
Allow: /
User-agent: GPTBot # Block training
Disallow: /
Llms-txt: https://yoursite.com/llms.txt
Automatically includes Sitemap: and Llms-txt: directives, covers all major AI companies, and steps aside if you already have a robots.txt in your source directory.
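A minimal sketch of the allow/block split, assuming the policy is driven by two lists of user agents. The bot names are taken from the crawler table on this page; the constant names, the `build_robots_txt` helper, and the `/sitemap.xml` path are assumptions made for illustration.

```ruby
# Assumed grouping of user agents into search and training bots.
SEARCH_BOTS = %w[OAI-SearchBot Claude-SearchBot PerplexityBot].freeze
TRAINING_BOTS = %w[GPTBot ClaudeBot Meta-ExternalAgent].freeze

# Hypothetical builder: one group per bot, followed by the two directives.
def build_robots_txt(site_url:, search_bots: SEARCH_BOTS, training_bots: TRAINING_BOTS)
  lines = []
  search_bots.each { |bot| lines << "User-agent: #{bot}" << "Allow: /" << "" }
  training_bots.each { |bot| lines << "User-agent: #{bot}" << "Disallow: /" << "" }
  # Assumption: the sitemap lives at /sitemap.xml (the jekyll-sitemap default).
  lines << "Sitemap: #{site_url}/sitemap.xml"
  lines << "Llms-txt: #{site_url}/llms.txt"
  lines.join("\n") + "\n"
end

puts build_robots_txt(site_url: "https://yoursite.com")
```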
Generate Domain Profile
Publish a /.well-known/domain-profile.json file following the AI Domain Data specification. This gives AI assistants authoritative metadata about your site’s identity — reducing hallucination and improving how your brand appears in AI-generated answers. Auto-populated from your _config.yml site settings.
Generate URL Map
A structured markdown table of every page on your site — with URLs, layouts, redirect mappings, markdown copy paths, and skip reasons. Written to your source directory so it can be committed to version control as a content audit tool.
Validate Output
Verify your AEO output after every build:
bundle exec jekyll aeo:validate
Checks that llms.txt exists and is properly formatted, all referenced markdown files exist, and domain-profile.json (if present) has valid structure with all required fields.
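To illustrate the kind of checks described, here is a simplified sketch of two of them: that llms.txt opens with a title and that every linked markdown file exists in the output directory. It is not the validator's actual implementation, and both function names are hypothetical.

```ruby
# Extracts site-relative markdown links such as "/about.md" from llms.txt.
def markdown_links(llms_txt)
  llms_txt.scan(%r{\]\((/[^)\s]+\.md)\)}).flatten
end

# Hypothetical check: returns a list of human-readable problems (empty when valid).
def llms_txt_problems(llms_txt, site_dir)
  problems = []
  problems << "missing H1 title" unless llms_txt.lines.first.to_s.start_with?("# ")
  markdown_links(llms_txt).each do |link|
    problems << "missing file: #{link}" unless File.file?(File.join(site_dir, link))
  end
  problems
end
```

Calling `llms_txt_problems(File.read("_site/llms.txt"), "_site")` after a build would return an empty array when both checks pass, assuming the default `_site` output directory.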
AEO and GEO Deep Dive
How AI Platforms Find Your Content
Each AI platform has different content preferences — only 21-25% of cited domains overlap between platforms. Optimizing for ChatGPT alone misses 75-79% of the opportunity.
| Platform | Primary Signals | Top Source or Citation Pattern | Search Index |
|---|---|---|---|
| ChatGPT | Referring domains, domain traffic | Wikipedia (47.9% of top-10 citations) | Bing (87% match) |
| Perplexity | Credibility, recency, semantic relevance | Reddit (47% of responses) | Bing + own index |
| Claude | Brand authority, content quality | Mentions brands in 97.3% of responses | Brave Search (86.7% match) |
| Google AI Overviews | Schema markup, E-E-A-T | Most conservative (48.5% brand mention rate) | Google Search |
A cross-platform technical foundation — llms.txt, clean markdown, domain profiles, proper robots.txt — is the highest-leverage investment because it works across all platforms simultaneously.
AI Crawler Landscape
Understanding which bots are crawling your site — and why — is essential. Jekyll-AEO’s robots.txt generator gives you granular control.
| Company | Search/Retrieval Bot | Training Bot | Notes |
|---|---|---|---|
| OpenAI | OAI-SearchBot, ChatGPT-User | GPTBot | ChatGPT-User no longer respects robots.txt |
| Anthropic | Claude-SearchBot, Claude-User | ClaudeBot | Respects robots.txt + Crawl-delay |
| Perplexity | PerplexityBot, Perplexity-User | — | Perplexity-User ignores robots.txt |
| Google | Googlebot | Google-Extended | Only platform that renders JavaScript |
| Microsoft | Bingbot | — | Powers ChatGPT citations (87% match) |
| Apple | Applebot | Applebot-Extended | Apple Intelligence |
| Meta | — | Meta-ExternalAgent | Training only |
| Amazon | — | Amazonbot | Training / Alexa |
The strategy: Allow search/retrieval bots so your content appears in AI answers. Block training bots so it doesn’t end up in training datasets. You get the citation benefits without contributing to model training.
Content Optimization Tips
Jekyll-AEO makes your content technically accessible to AI. How often it gets cited also depends on the content itself.
What the data shows:
| Signal | Impact |
|---|---|
| Pages with 19+ statistics | 5.4 avg citations vs. 2.8 without |
| Pages with expert quotes | 4.1 avg citations vs. 2.4 without |
| FAQPage schema markup | 3.2x higher chance of appearing in AI Overviews |
| Schema.org structured data | 2.5x higher citation chance overall |
| Brand presence on 4+ channels | Significantly more AI citations |
| Brand authority | 0.334 correlation — the single strongest predictor |
Actionable principles:
- Answer-first structure — Lead every section with a direct, concise answer. AI systems extract the first 50-150 words of a topically relevant section most frequently.
- Fact density — Include concrete statistics, data points, and cited sources. This is the highest-impact content optimization.
- Clear hierarchy — Use H2/H3 headings as semantic boundaries. Keep sections at 120-180 words — the optimal chunk size for RAG retrieval.
- Tables and lists — AI systems excel at extracting structured tabular data and numbered steps.
- Content freshness — Visibility drops 2-3 days after publication without updates. A 30-90 day refresh cadence is recommended. Jekyll-AEO adds last-modified dates automatically.
- Neutral tone — Encyclopedic, authoritative prose is 30% more likely to appear in AI answers than opinion-heavy content.
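The section-length guideline above can be checked mechanically. The sketch below splits a markdown document at H2/H3 headings and flags sections outside the 120-180 word range. The function names are hypothetical and the word count is a plain whitespace split, so treat the numbers as approximate.

```ruby
# Counts words per H2/H3 section. Text before the first H2/H3 is ignored,
# and sections that share a heading text are not distinguished.
def section_word_counts(markdown)
  counts = {}
  current = nil
  markdown.each_line do |line|
    if (match = line.match(/\A[#]{2,3}\s+(.+)/))
      current = match[1].strip
      counts[current] = 0
    elsif current
      counts[current] += line.split.size
    end
  end
  counts
end

# Returns the sections whose word count falls outside the given range.
def sections_outside_range(markdown, range = 120..180)
  section_word_counts(markdown).reject { |_heading, words| range.cover?(words) }
end
```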
The foundational academic paper “GEO: Generative Engine Optimization” (Princeton, Georgia Tech, Allen AI, IIT Delhi — ACM SIGKDD 2024) tested 9 optimization methods and found citing sources, adding quotations, and adding statistics each improve visibility 30-40%. Keyword stuffing is explicitly ineffective for GEO.
How It Works
Jekyll-AEO hooks into Jekyll’s standard build lifecycle. Add the gem, run jekyll build, and all outputs are generated alongside your existing site — no separate build step, no external services, no post-processing scripts.
Every feature shares the same skip logic: redirect pages, non-HTML outputs, and excluded paths are handled consistently. Per-page outputs (markdown copies, link tags) are generated during the build. Site-wide outputs (llms.txt, robots.txt, domain profile) are generated in a single pass after all pages are written.
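The shared skip logic can be pictured as a single predicate that every feature consults. This is a sketch under assumptions: the page is modeled as a plain Hash with `:url`, `:output_ext`, and `:redirect` keys, and `skip_reason` is a hypothetical name rather than the plugin's API.

```ruby
# Hypothetical predicate: returns a skip reason as a String,
# or nil when the page should be processed.
def skip_reason(page, excluded_prefixes: [])
  return "redirect" if page[:redirect]
  return "non-HTML output" unless page[:output_ext] == ".html"
  return "excluded path" if excluded_prefixes.any? { |prefix| page[:url].start_with?(prefix) }
  nil
end

page = { url: "/drafts/idea/", output_ext: ".html", redirect: false }
puts skip_reason(page, excluded_prefixes: ["/drafts/"]) # prints excluded path
```

Returning the reason rather than a boolean lets the same function feed a report such as the URL map's skip-reason column.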
For a technical deep dive into the build pipeline, see the README.
Proven Results
Companies investing in AEO and GEO are seeing measurable returns:
| Company | Result | Method |
|---|---|---|
| LS Building Products | 540% boost in AI Overview mentions | Content restructuring + schema |
| Go Fish Digital | AI traffic converts at 25x traditional search | GEO optimization |
| Ramp | Citation share 8.1% to 12.2% in one month | Monitoring + optimization |
| Runpod | 4x customer growth in 90 days | Content architecture redesign |
| NerdWallet | 35% revenue growth despite 20% traffic decrease | AI-first content strategy |
B2B SaaS companies report 6-27x higher conversion rates from AI-referred traffic vs. traditional search. The traffic numbers are smaller, but visitors who arrive through AI recommendations arrive with higher intent, better context, and more confidence — because an AI system already evaluated the options and recommended you specifically.
Get Started
Quick Start
Three steps, under five minutes:
1. Install
# Gemfile
gem "jekyll-aeo"
bundle install
2. Build
bundle exec jekyll build
Check your output directory. You’ll find .md companions for every page, plus llms.txt and llms-full.txt at the root.
3. Validate
bundle exec jekyll aeo:validate
That’s it. Your site is now AI-readable. Enable advanced features like robots.txt, domain profiles, and URL maps through _config.yml when you’re ready — see the README for all configuration options.
Designed to Coexist
Jekyll-AEO works alongside the plugins you already use. It complements jekyll-seo-tag (different layer — AI outputs vs. HTML meta tags), cooperates with jekyll-sitemap (priority ordering prevents robots.txt conflicts), and respects jekyll-redirect-from (redirect pages are automatically skipped).
Nothing to configure. Install Jekyll-AEO next to any of these plugins and they cooperate automatically.
Ruby >= 3.0. Jekyll >= 4.0. MIT licensed. Built and maintained by ZAAI.
Highlights
- Generates a clean .md copy of every page — 20-30% fewer tokens than HTML
- Produces llms.txt and llms-full.txt following the llmstxt.org spec — 90%+ token reduction
- Injects <link rel="alternate"> tags for AI crawler discovery
- Outputs schema.org JSON-LD: FAQPage, HowTo, BreadcrumbList, Organization, Speakable, Article
- Generates a robots.txt that allows search bots and blocks training bots
- Publishes /.well-known/domain-profile.json for authoritative site identity
- Ships a CLI validator: bundle exec jekyll aeo:validate
- Zero config to start. Works with jekyll-seo-tag, jekyll-sitemap, jekyll-feed, and jekyll-redirect-from
Tech Stack
- Ruby
- Jekyll
- Liquid
- Schema.org JSON-LD