April 4, 2026 · 7 min read

How I Built an AI Content Pipeline for Every Writing Post

Using Claude Haiku for topic tags, Supabase vector embeddings for related posts, and build-time scripts to turn 18 blog posts into a connected content product

Yield: AI-generated tags on all posts, 2 related post suggestions per page, verified reading times, dynamic tag filtering
Difficulty: 2 / 5 — three scripts, two components, one CSS block
Total Cook Time: ~1 hour in one session

Ingredients

What I Was Trying to Solve

After 18 posts, the Writing section was a flat list. Every post had a title, a subtitle, and a date. Five manual tags — Website, Features, Automation, Headless Linux, Games — were the only way to filter. No related posts. No way to discover connections between builds. If you finished reading about the market briefing bot, there was nothing pointing you toward the Garmin automation that uses the same architecture.

I wanted three things: richer topic tags generated by AI from a proper taxonomy, related post suggestions based on actual content similarity, and verified reading times. All static. All generated at build time. All following the same pattern I’d already established with TL;DR by Goose.

The Foundation That Already Existed

This build didn’t start from zero. Two earlier projects laid the groundwork: TL;DR by Goose, which established the build-time pattern of sending post content to Claude and writing the result to static JSON, and the vector search build, which left a Supabase table of semantic embeddings for every post.

The architecture decision was already made. I just needed to follow it.

The Build

Afternoon, April 4 — ~1 hour

Part 1: AI Tag Generation

I built scripts/generate-tags.ts following the exact same pattern as the TL;DR script. It reads each post’s JSX file, strips the markup to extract prose, and sends the content to Claude Haiku with a fixed taxonomy of 11 tags:

AI Tools, Backend, Frontend, Automation, Product Thinking, Game Dev, Data, DevOps, Security, API Design, Linux

Haiku picks 3–5 per post. The prompt is strict: return only a JSON array of strings from the approved list. No explanations, no creativity. The model is a classifier here, not a writer.

Terminal
user@MacBook-Air joseandgoose-site-main % npx tsx scripts/generate-tags.ts --all
Generating tags for 18 posts...

✓ how-i-upgraded-search-to-vectors: [Backend, AI Tools, Data]
✓ how-i-built-api-server: [Backend, API Design, Linux, Security, DevOps]
✓ how-i-built-cron-ops: [Linux, Automation, DevOps]
✓ how-i-built-numerator: [AI Tools, Frontend, Game Dev]
... 14 more posts

Done.

18 posts tagged in under a minute. Results saved to app/lib/tags.json.

The script also counts words in each post and computes a reading time at 230 words per minute. The original reading times were manually estimated — some were close, some were off by a few minutes. The automated counts replaced all of them.
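The reading-time calculation is simple enough to sketch in a few lines. This is a minimal version of the idea, not the script itself — the helper names and the markup-stripping regexes are mine; the 230 words-per-minute constant is from the post:

```typescript
// Strip JSX/HTML markup to approximate the prose of a post, then
// estimate reading time at 230 words per minute (minimum 1 minute).
const WORDS_PER_MINUTE = 230;

function stripMarkup(source: string): string {
  return source
    .replace(/<[^>]+>/g, " ")   // drop JSX/HTML tags
    .replace(/\{[^}]*\}/g, " ") // drop inline JSX expressions
    .replace(/\s+/g, " ")
    .trim();
}

function readingTimeMinutes(source: string): number {
  const words = stripMarkup(source).split(" ").filter(Boolean).length;
  return Math.max(1, Math.round(words / WORDS_PER_MINUTE));
}
```

Because the count comes from the actual file contents, re-running the script after an edit keeps the estimate honest with no manual step.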

🔧 Developer section: tag generation
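The part of the tag script worth showing is the validation around the model call. Since the prompt demands a bare JSON array from a fixed list, the parser can enforce that contract and fail loudly on anything else. A sketch, with hypothetical function names (the taxonomy and 3–5 range are from the post):

```typescript
// Fixed taxonomy the model must choose from.
const TAXONOMY = [
  "AI Tools", "Backend", "Frontend", "Automation", "Product Thinking",
  "Game Dev", "Data", "DevOps", "Security", "API Design", "Linux",
] as const;

type Tag = (typeof TAXONOMY)[number];

// Parse the model's reply and reject anything outside the approved list
// or the 3-5 tag range, so a bad completion fails loudly instead of
// polluting tags.json.
function parseTagResponse(raw: string): Tag[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  const tags = parsed.filter(
    (t): t is Tag =>
      typeof t === "string" && (TAXONOMY as readonly string[]).includes(t)
  );
  if (tags.length !== parsed.length) throw new Error("tag outside taxonomy");
  if (tags.length < 3 || tags.length > 5) throw new Error("expected 3-5 tags");
  return tags;
}
```

Treating the model as a classifier only works if the code refuses to accept anything that isn’t a classification.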

Part 2: Related Posts via Vector Similarity

The site already had semantic embeddings for every post stored in Supabase, generated by the generate-embeddings.ts script from the vector search build. Those embeddings use the MiniLM-L6-v2 model — each post is represented as a 384-dimensional vector based on its title, description, and TL;DR summary.

I built scripts/generate-related.ts to query all post embeddings from Supabase, compute cosine similarity between every pair, and pick the top 2 most similar posts for each. The results are written to app/lib/related.json — same static JSON pattern as everything else.

The similarity scores make intuitive sense. The cron ops post is most related to the server alerts post (0.600). The market daily briefing maps to the Garmin recaps (0.532) — both are automated email pipelines on the Alienware. The Gemini Grades post maps to the original site build post (0.675) — they’re literally about the same project.

🔧 Developer section: related posts
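The core of the related-posts script is pairwise cosine similarity plus a top-2 pick. A minimal sketch of that logic (function names are mine; the real embeddings are 384-dimensional, but any equal-length vectors work):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For each post, rank every other post by similarity and keep the top N.
function topRelated(
  embeddings: Record<string, number[]>,
  perPost = 2
): Record<string, string[]> {
  const slugs = Object.keys(embeddings);
  const related: Record<string, string[]> = {};
  for (const slug of slugs) {
    related[slug] = slugs
      .filter((other) => other !== slug)
      .map((other) => ({
        other,
        score: cosineSimilarity(embeddings[slug], embeddings[other]),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, perPost)
      .map((r) => r.other);
  }
  return related;
}
```

At 18 posts this is 153 pairs — brute force is fine, and the result is just a JSON object keyed by slug.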

Part 3: The UI Components

Two new components, both following the TLDRBadge pattern: client components that read from static JSON, no API calls, no loading states.

PostTags renders tag pills below the post meta (date and read time). Each pill is a small rounded badge in the site’s forest-green-on-pale-green color scheme. The tags come directly from the post metadata in posts.ts.

RelatedPosts appears at the bottom of each post, above the back navigation. Two cards with the related post’s title and reading time, styled as bordered links that highlight on hover. It reads from related.json and cross-references posts.ts for the reading time.

The Writing index page also got an upgrade: the tag filter pills are now derived dynamically from the posts array instead of being hardcoded. Adding a new tag to any post automatically adds it to the filter bar — no manual list to maintain.

🔧 Developer section: components
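The dynamic filter bar is a one-liner in spirit: collect every tag used by at least one post, de-duplicate, sort. A sketch under assumed types (the `Post` shape here is hypothetical; the real metadata lives in posts.ts):

```typescript
// Hypothetical shape of an entry in posts.ts.
interface Post {
  slug: string;
  title: string;
  tags: string[];
}

// Derive the filter bar from the posts themselves: every tag used by at
// least one post appears exactly once, alphabetized, with "All" first.
// Adding a tag to any post automatically adds a pill.
function filterTags(posts: Post[]): string[] {
  const unique = new Set(posts.flatMap((p) => p.tags));
  return ["All", ...[...unique].sort()];
}
```

The payoff is that there is no hardcoded tag list left anywhere in the UI to drift out of sync with the content.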

The Full Pipeline

Here’s what the content pipeline looks like now, from writing a post to deploying it:

  1. Write the post as a TSX file in app/writing/[slug]/page.tsx
  2. Add metadata to app/lib/posts.ts
  3. Run npx tsx scripts/generate-tldr.ts [slug] — AI summary
  4. Run npx tsx scripts/generate-tags.ts [slug] — AI topic tags + reading time
  5. Run npx tsx scripts/generate-embeddings.ts — vector embeddings
  6. Run npx tsx scripts/generate-related.ts — related post suggestions
  7. Deploy with vercel --prod

Steps 3–6 are all build-time, all idempotent, all writing to static JSON or Supabase. The live site never touches an API at read time. Every piece of AI-generated content is baked into the build.

What This Unlocks

The Writing section went from a flat blog to a connected content product. Readers can filter by 11 topic tags, see related posts at the bottom of every article, and get accurate reading times. Every post is now enriched with AI-generated metadata that would have taken hours to create manually.

More importantly: all of this data — the TL;DR summaries, the topic tags, the vector embeddings, the related post graph — feeds directly into Ask Goose, the conversational AI assistant coming to the site. When Ask Goose answers a question about what I’ve built, it won’t be searching raw text. It’ll be retrieving semantically similar content from a curated, tagged, summarized knowledge base.

The content pipeline isn’t just a feature. It’s the retrieval layer for everything that comes next.
