Optimize for LLM citation

ChatGPT, Claude, Perplexity, Bing Chat, and Google’s AI Mode all browse the open web and cite a small set of pages per answer. Getting cited there, known as Generative Engine Optimization (GEO), is its own distribution channel, and the patterns that win it differ from traditional SEO. This page covers what Essel already does on the writing side and what to add on your own site so the blogs you publish from our consumer API are as citable as possible.

What Essel already does for you

Every released blog runs through a GEO rule category enforced by the draft and revision agents. You don’t need to brief any of this; it is automatic on every article of ≥ 800 words:

  • Named source. At least one specific study, report, framework, standards body, or quoted expert by its actual name. Generic “studies show” or “experts agree” phrasing is stripped by the revision agent.
  • Attributed statistic. At least one specific number with a real, named source and year, formatted either inline, as in 72% (Source, 2024), or as a GFM footnote with a URL.
  • Recency anchor. An explicit “as of <Month Year>” or dated lead in the intro. LLM browsing agents de-prioritize content that reads as undated.
  • Extractable answers. Each H2 opens with a self-contained sentence that works as a standalone quote — the structural pattern LLMs lift most reliably.
  • Key takeaways block. Articles ≥ 1,200 words include a 3–5 bullet “Key takeaways” block right after the intro, where each bullet is a complete declarative sentence that can be cited in isolation.
  • No fabricated sources. Inventing the number, the source, or the date is a critical failure that blocks release.

The rest of this guide is the render-and-distribution side — the patterns that only matter once the markdown is published to a real URL. Those live on your domain, so they’re yours to add.

What to add on your site

1. JSON-LD structured data

LLM browsing agents read schema.org markup as a high-confidence signal of what a page is and what it claims. The Essel consumer API already returns everything you need to emit Article markup — add it to the <head> of every blog page.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Title from blog.title",
  "description": "Excerpt from blog.excerpt",
  "datePublished": "2026-05-28T09:00:00Z",
  "dateModified": "2026-05-28T09:00:00Z",
  "author": { "@type": "Person", "name": "blog.author.name" },
  "publisher": {
    "@type": "Organization",
    "name": "Your company",
    "logo": { "@type": "ImageObject", "url": "https://yoursite.com/logo.png" }
  },
  "image": "blog.coverImage.url",
  "mainEntityOfPage": "https://yoursite.com/blog/blog.slug"
}
</script>
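If you render pages in TypeScript, a small builder keeps this markup in sync with the API fields. The sketch below is illustrative, not an official Essel helper: the Blog interface is a hypothetical shape, and publishedAt / updatedAt are assumed field names, since this guide only names title, excerpt, slug, author.name, and coverImage.url. It also escapes "<" so a title containing </script> cannot close the tag early.

```typescript
// Hypothetical shape of the consumer-API record. Only title, excerpt, slug,
// author.name, and coverImage.url come from this guide; the date fields are assumed.
interface Blog {
  title: string;
  excerpt: string;
  slug: string;
  publishedAt: string; // assumed field name, ISO 8601
  updatedAt?: string; // assumed field name, ISO 8601
  author: { name: string };
  coverImage?: { url: string };
}

interface Publisher {
  name: string;
  logoUrl: string;
}

// Builds the Article object from the example above out of one blog record.
function buildArticleJsonLd(
  blog: Blog,
  siteUrl: string,
  publisher: Publisher,
): Record<string, unknown> {
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: blog.title,
    description: blog.excerpt,
    datePublished: blog.publishedAt,
    // Fall back to the publish date when the post has never been edited.
    dateModified: blog.updatedAt ?? blog.publishedAt,
    author: { "@type": "Person", name: blog.author.name },
    publisher: {
      "@type": "Organization",
      name: publisher.name,
      logo: { "@type": "ImageObject", url: publisher.logoUrl },
    },
    mainEntityOfPage: `${siteUrl}/blog/${blog.slug}`,
  };
  // Omit image entirely rather than emitting an empty value.
  if (blog.coverImage) jsonLd["image"] = blog.coverImage.url;
  return jsonLd;
}

// Serializes for the <head>. Escaping "<" as \u003c keeps the JSON valid while
// preventing a string such as "</script>" from terminating the tag.
function toJsonLdScript(jsonLd: Record<string, unknown>): string {
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Pass siteUrl without a trailing slash, and inject the returned string into the page’s <head>.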

When the article shape warrants it, layer additional schemas on the same page — they compound rather than conflict:

| Article shape | Add this schema | Why |
|---|---|---|
| tutorial | HowTo with each H2 step as HowToStep | Bing Chat lifts HowTo steps verbatim. |
| faq_driven or any article with a FAQ section | FAQPage with each Q/A pair | Perplexity reads FAQPage for quick-answer extraction. |
| explainer with a defined term in the intro | DefinedTerm on the first paragraph | LLM definition-lookups frequently cite DefinedTerm pages. |
| Anything cited by voice assistants | speakable selector pointing at the Key takeaways block | Marks the highest-confidence answer span. |

Use the Essel blog fields directly — the agents structure the markdown so the H2/H3 hierarchy maps cleanly to HowToStep or Question blocks without rewriting.
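A minimal sketch of that mapping, assuming the blog body is plain markdown with "## " headings. splitH2Sections, toHowTo, and toFaqPage are hypothetical helpers, and the FAQ builder only keeps headings that end in a question mark. It leaves H3s inside the step text and does not skip "##" lines inside fenced code, so treat it as a starting point rather than a parser.

```typescript
// One H2 section of the article body: its heading and the text under it.
interface Section {
  heading: string;
  text: string;
}

// Splits markdown on "## " headings; content before the first H2 (the intro) is dropped.
// "### " lines do not match because "##" must be followed by whitespace.
function splitH2Sections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const line of markdown.split("\n")) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      if (current) sections.push(current);
      current = { heading: match[1], text: "" };
    } else if (current) {
      current.text += line + "\n";
    }
  }
  if (current) sections.push(current);
  return sections.map((s) => ({ heading: s.heading, text: s.text.trim() }));
}

// tutorial shape: each H2 becomes a numbered HowToStep.
function toHowTo(name: string, sections: Section[]) {
  return {
    "@context": "https://schema.org",
    "@type": "HowTo",
    name,
    step: sections.map((s, i) => ({
      "@type": "HowToStep",
      position: i + 1,
      name: s.heading,
      text: s.text,
    })),
  };
}

// FAQ shape: only question-style headings become Question/Answer pairs.
function toFaqPage(sections: Section[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: sections
      .filter((s) => /\?$/.test(s.heading))
      .map((s) => ({
        "@type": "Question",
        name: s.heading,
        acceptedAnswer: { "@type": "Answer", text: s.text },
      })),
  };
}
```

Serialize either object the same way as the Article markup and emit it as an additional script tag on the same page.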

2. Recency markup the parsers can read

A literal “as of 2026” in the prose helps — but machines also want machine-readable timestamps. Two cheap wins:

<!-- Published date in the article header -->
<time datetime="2026-05-28T09:00:00Z">May 28, 2026</time>

<!-- Open Graph for crawlers that don't parse schema -->
<meta property="article:published_time" content="2026-05-28T09:00:00Z" />
<meta property="article:modified_time" content="2026-05-28T09:00:00Z" />

If you republish an Essel blog with edits, update dateModified in the JSON-LD and the article:modified_time meta tag. LLM crawlers boost recently modified pages.
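One way to keep the <time> element and both meta tags agreeing is to derive all three from a single pair of timestamps. This is a hypothetical helper, not part of the consumer API: it normalizes to UTC with toISOString() (which adds milliseconds, still valid ISO 8601), uses the publish date when no modified date is given, and falls back to it when the modified date is unparseable or earlier than publication.

```typescript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Human-readable label for the <time> element, e.g. "May 28, 2026", computed in UTC.
function humanDate(d: Date): string {
  return `${MONTHS[d.getUTCMonth()]} ${d.getUTCDate()}, ${d.getUTCFullYear()}`;
}

// Emits the <time> element plus both Open Graph timestamps from one source of truth.
function recencyTags(publishedIso: string, modifiedIso?: string): string {
  const published = new Date(publishedIso);
  if (isNaN(published.getTime())) {
    throw new Error(`Invalid publish date: ${publishedIso}`);
  }
  let modified = modifiedIso ? new Date(modifiedIso) : published;
  // Fall back to the publish date if modified is unparseable or predates publication.
  if (isNaN(modified.getTime()) || modified.getTime() < published.getTime()) {
    modified = published;
  }
  const pub = published.toISOString();
  const mod = modified.toISOString();
  return [
    `<time datetime="${pub}">${humanDate(published)}</time>`,
    `<meta property="article:published_time" content="${pub}" />`,
    `<meta property="article:modified_time" content="${mod}" />`,
  ].join("\n");
}
```

Feeding the same modified timestamp into dateModified in the JSON-LD keeps every machine-readable date consistent after an edit.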

3. Raw markdown access

LLM crawlers prefer clean markdown over rendered HTML — fewer tokens, no parsing ambiguity. Exposing a raw-markdown variant of every blog is one of the highest-leverage GEO moves you can make and takes a single route:

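// Example: Next.js App Router (15+, where route params arrive as a Promise)
// File: app/blog-md/[slug]/route.ts. Dynamic segments must span the whole folder
// name, so map the public /blog/<slug>.md URL here with a rewrite in next.config:
//   async rewrites() { return [{ source: "/blog/:slug.md", destination: "/blog-md/:slug" }]; }
export async function GET(
  _req: Request,
  { params }: { params: Promise<{ slug: string }> },
) {
  const { slug } = await params;
  const res = await fetch(
    `https://api.contentpilot.uixlabs.co/api/consumer/blogs?slug=${encodeURIComponent(slug)}`,
    { headers: { "x-api-key": process.env.CONTENT_PILOT_API_KEY! } },
  );
  // Pass a 404 through; treat any other upstream failure as a bad gateway.
  if (!res.ok) return new Response(null, { status: res.status === 404 ? 404 : 502 });
  const { data } = await res.json();
  if (!data?.body?.markdown) return new Response(null, { status: 404 });
  return new Response(data.body.markdown, {
    headers: { "Content-Type": "text/markdown; charset=utf-8" },
  });
}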

Then link to the markdown variant from the HTML page via a <link rel="alternate">:

<link
  rel="alternate"
  type="text/markdown"
  href="https://yoursite.com/blog/your-slug.md"
/>

Crawlers that respect alternates (ChatGPT, Perplexity) will fetch the markdown form when it’s cheaper than the HTML.

4. llms.txt

llms.txt is an emerging convention: a file at the root of your domain that lists what you want LLMs to read, in priority order. It’s not a standard yet, but several indexing pipelines already read it, and it costs nothing to add.

Place this file at https://yoursite.com/llms.txt:

# Your Company

> One-sentence description of what you do.

## Blog

- [Title of post one](https://yoursite.com/blog/slug-one.md): one-line summary
- [Title of post two](https://yoursite.com/blog/slug-two.md): one-line summary

## Docs

- [Getting started](https://yoursite.com/docs.md): one-line summary

Two patterns to follow:

  • Link to the markdown variant of each blog (from §3), not the HTML. The whole point of llms.txt is to give LLMs the clean form.
  • Order entries by what you most want surfaced, not chronologically. Crawlers truncate. Pin your highest-value evergreen content at the top.
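If your post list is large, generating llms.txt keeps both rules from drifting. A sketch, assuming you assign each entry a numeric priority yourself (lower means more important); buildLlmsTxt and the entry shape are hypothetical, and the output mirrors the example file above.

```typescript
// Hypothetical entry shape; priority is your own ranking (lower = surfaced first).
interface LlmsEntry {
  title: string;
  url: string; // the .md variant of the page, not the HTML
  summary: string;
  priority: number;
}

interface LlmsSection {
  heading: string; // e.g. "Blog" or "Docs"
  entries: LlmsEntry[];
}

function buildLlmsTxt(name: string, description: string, sections: LlmsSection[]): string {
  const lines = [`# ${name}`, "", `> ${description}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    // Sort by priority, not date, so crawlers that truncate still see the top entries.
    const sorted = [...section.entries].sort((a, b) => a.priority - b.priority);
    for (const e of sorted) lines.push(`- [${e.title}](${e.url}): ${e.summary}`);
  }
  return lines.join("\n") + "\n";
}
```

Serve the returned string as text/plain at /llms.txt, regenerating it whenever you publish.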

A larger llms-full.txt can include the full text of every blog inline — Anthropic’s documentation publishes one, and it works well for domains under ~500 pages.
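A sketch of generating llms-full.txt from the same posts. buildLlmsFullTxt is hypothetical and expects the raw markdown returned by the consumer API; the per-post header (title, source URL) and the "---" separators are an assumed layout chosen for readability, so adjust them to taste.

```typescript
interface FullPost {
  title: string;
  url: string; // canonical URL of the post
  markdown: string; // full body, e.g. data.body.markdown from the consumer API
}

// Concatenates every post's full text into one file, in the order given,
// so pass posts already sorted by priority as for llms.txt.
function buildLlmsFullTxt(name: string, posts: FullPost[]): string {
  const parts = [`# ${name}`];
  for (const p of posts) {
    parts.push(`## ${p.title}\n\nSource: ${p.url}\n\n${p.markdown.trim()}`);
  }
  // Blank lines around "---" keep it a thematic break rather than a setext heading.
  return parts.join("\n\n---\n\n") + "\n";
}
```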

5. robots.txt for LLM crawlers

If your robots.txt uses an explicit allowlist (rather than the default-allow User-Agent: *), name the LLM bots directly so they don’t get caught by a generic catch-all deny. The relevant agents as of 2026:

# Explicit allows for major LLM crawlers
User-Agent: GPTBot
Allow: /blog/
Allow: /docs/

User-Agent: ClaudeBot
Allow: /blog/
Allow: /docs/

User-Agent: PerplexityBot
Allow: /blog/
Allow: /docs/

User-Agent: OAI-SearchBot
Allow: /blog/
Allow: /docs/

User-Agent: Google-Extended
Allow: /blog/
Allow: /docs/

# Always include your sitemap
Sitemap: https://yoursite.com/sitemap.xml

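GPTBot is OpenAI’s training crawler. OAI-SearchBot is the dedicated search-index agent that surfaces pages in ChatGPT search, and ChatGPT-User fetches pages when a user asks ChatGPT to browse. Google-Extended controls whether Google may use your content for Gemini training and grounding, without affecting your regular Google Search rank; AI Overviews and AI Mode in Google Search rely on the regular Googlebot crawl instead.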

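If your stance is “yes to citation, no to training,” OpenAI’s split is the cleanest: disallow GPTBot and keep OAI-SearchBot allowed. Other vendors also publish training-specific tokens (Google-Extended for Gemini, ClaudeBot for Anthropic), but these lists change often, so confirm against each vendor’s current crawler documentation before relying on the split.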
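A sketch of generating the robots.txt above from one policy object, with the citation-versus-training choice as a flag. Two assumptions to call out: trainingAgents is the list of tokens you choose to treat as training-only (the example below passes only GPTBot, following the OpenAI split; extend it after checking each vendor’s docs), and each allowed group ends with Disallow: /. That last line is what makes the group a true allowlist: robots.txt treats unmatched paths as allowed, so a group with only Allow lines permits the whole site, while under RFC 9309’s longest-match rule Allow: /blog/ still beats Disallow: / for blog URLs.

```typescript
interface RobotsOptions {
  agents: string[]; // crawler tokens to name explicitly
  allowedPaths: string[]; // e.g. ["/blog/", "/docs/"]
  trainingAgents: string[]; // tokens blocked entirely when opting out of training
  allowTraining: boolean;
  sitemapUrl: string;
}

function buildRobotsTxt(o: RobotsOptions): string {
  const groups: string[] = [];
  for (const agent of o.agents) {
    const lines = [`User-Agent: ${agent}`];
    if (!o.allowTraining && o.trainingAgents.indexOf(agent) !== -1) {
      // Training-only crawler and training is opted out: block everything.
      lines.push("Disallow: /");
    } else {
      for (const path of o.allowedPaths) lines.push(`Allow: ${path}`);
      // Longest match wins, so the Allow lines above override this for their paths.
      lines.push("Disallow: /");
    }
    groups.push(lines.join("\n"));
  }
  groups.push(`Sitemap: ${o.sitemapUrl}`);
  return groups.join("\n\n") + "\n";
}
```

For example, calling it with agents ["GPTBot", "OAI-SearchBot"], trainingAgents ["GPTBot"], and allowTraining false yields a GPTBot group that disallows everything and an OAI-SearchBot group limited to the allowed paths.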

Quick checklist

When you publish an Essel blog to your domain, the GEO surface area you control is:

  1. JSON-LD Article schema in <head>, plus HowTo / FAQPage / DefinedTerm where the shape warrants.
  2. <time datetime> and article:published_time / article:modified_time meta tags.
  3. A raw-markdown route (/blog/[slug].md) plus a <link rel="alternate"> on the HTML page.
  4. llms.txt at the domain root pointing at the markdown variants.
  5. Explicit GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Google-Extended allow rules in robots.txt.
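To spot-check a deployed page against this list, a naive audit can look for each signal by pattern. The helpers below are hypothetical and regex-based: they confirm only that the tags and User-Agent lines exist (not that the JSON-LD is valid or the dates are current), and they do not check llms.txt.

```typescript
// Checklist items 1 to 3: returns a label for each signal missing from the rendered HTML.
function missingPageSignals(html: string): string[] {
  const checks: [string, RegExp][] = [
    ["JSON-LD script", /<script[^>]+type=["']application\/ld\+json["']/i],
    ["<time datetime>", /<time[^>]+datetime=/i],
    ["article:published_time", /property=["']article:published_time["']/i],
    ["article:modified_time", /property=["']article:modified_time["']/i],
    // [^>]+ also spans newlines, so multi-line <link> tags still match.
    ["markdown alternate link", /<link[^>]+type=["']text\/markdown["']/i],
  ];
  return checks.filter(([, re]) => !re.test(html)).map(([label]) => label);
}

// Checklist item 5: returns the tokens that robots.txt never names in a User-Agent line.
function unnamedAgents(robotsTxt: string, agents: string[]): string[] {
  const named = robotsTxt
    .split("\n")
    .map((line) => /^\s*user-agent:\s*(\S+)/i.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1].toLowerCase());
  return agents.filter((a) => named.indexOf(a.toLowerCase()) === -1);
}
```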

The writing side — citations, statistics, recency, extractable answers, Key takeaways — is already enforced by the draft and revision agents. Items 1–5 above are the rest of the loop.

Further reading

  • Princeton GEO paper — “GEO: Generative Engine Optimization” (Aggarwal et al., 2024). The original study identifying citations, statistics, and quotes as the highest-leverage tactics.
  • Schema.org — the canonical reference for Article, HowTo, FAQPage, DefinedTerm, and Speakable markup.
  • llms.txt specification: llmstxt.org.
  • OpenAI’s bot documentation: platform.openai.com/docs/bots, for the up-to-date list of GPTBot / OAI-SearchBot user agents.
  • Anthropic’s crawler documentation: support.anthropic.com, for the current ClaudeBot and Claude-User details.