April 4, 2026 · 7 min read

How I Built an AI Content Pipeline for Every Writing Post

Using Claude Haiku for topic tags, Supabase vector embeddings for related posts, and build-time scripts to turn 18 blog posts into a connected content product

Yield: AI-generated tags on all posts, 2 related post suggestions per page, verified reading times, dynamic tag filtering
Difficulty: 2 / 5 — three scripts, two components, one CSS block
Total Cook Time: ~1 hour in one session

Ingredients

What I Was Trying to Solve

After 18 posts, the Writing section was a flat list. Every post had a title, a subtitle, and a date. Five manual tags — Website, Features, Automation, Headless Linux, Games — were the only way to filter. No related posts. No way to discover connections between builds. If you finished reading about the market briefing bot, there was nothing pointing you toward the Garmin automation that uses the same architecture.

I wanted three things: richer topic tags generated by AI from a proper taxonomy, related post suggestions based on actual content similarity, and verified reading times. All static. All generated at build time. All following the same pattern I’d already established with TL;DR by Goose.

The Foundation That Already Existed

This build didn’t start from zero. Two earlier projects laid the groundwork: TL;DR by Goose, which established the build-time pattern of sending post content to Claude and writing the result to static JSON, and the vector search build, which left a Supabase table of semantic embeddings for every post.

The architecture decision was already made. I just needed to follow it.

The Build

Afternoon, April 4 — ~1 hour

Part 1: AI Tag Generation

I built scripts/generate-tags.ts following the exact same pattern as the TL;DR script. It reads each post’s JSX file, strips the markup to extract prose, and sends the content to Claude Haiku with a fixed taxonomy of 11 tags:

AI Tools, Backend, Frontend, Automation, Product Thinking, Game Dev, Data, DevOps, Security, API Design, Linux

Haiku picks 3–5 per post. The prompt is strict: return only a JSON array of strings from the approved list. No explanations, no creativity. The model is a classifier here, not a writer.

Terminal
user@MacBook-Air joseandgoose-site-main % npx tsx scripts/generate-tags.ts --all
Generating tags for 18 posts...

✓ how-i-upgraded-search-to-vectors: [Backend, AI Tools, Data]
✓ how-i-built-api-server: [Backend, API Design, Linux, Security, DevOps]
✓ how-i-built-cron-ops: [Linux, Automation, DevOps]
✓ how-i-built-numerator: [AI Tools, Frontend, Game Dev]
... 14 more posts

Done.

18 posts tagged in under a minute. Results saved to app/lib/tags.json.

The script also counts words in each post and computes a reading time at 230 words per minute. The original reading times were manually estimated — some were close, some were off by a few minutes. The automated counts replaced all of them.
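The reading-time calculation is simple enough to sketch in a few lines. This is a minimal version of the idea, not the script itself — the helper names and the markup-stripping regexes are mine; the 230 words-per-minute constant is from the post:

```typescript
// Strip JSX/HTML markup to approximate the prose of a post, then
// estimate reading time at 230 words per minute (minimum 1 minute).
const WORDS_PER_MINUTE = 230;

function stripMarkup(source: string): string {
  return source
    .replace(/<[^>]+>/g, " ")   // drop JSX/HTML tags
    .replace(/\{[^}]*\}/g, " ") // drop inline JSX expressions
    .replace(/\s+/g, " ")
    .trim();
}

function readingTimeMinutes(source: string): number {
  const words = stripMarkup(source).split(" ").filter(Boolean).length;
  return Math.max(1, Math.round(words / WORDS_PER_MINUTE));
}
```

Because the count comes from the actual file contents, re-running the script after an edit keeps the estimate honest with no manual step.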

🔧 Developer section: tag generation
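The part of the tag script worth showing is the validation around the model call. Since the prompt demands a bare JSON array from a fixed list, the parser can enforce that contract and fail loudly on anything else. A sketch, with hypothetical function names (the taxonomy and 3–5 range are from the post):

```typescript
// Fixed taxonomy the model must choose from.
const TAXONOMY = [
  "AI Tools", "Backend", "Frontend", "Automation", "Product Thinking",
  "Game Dev", "Data", "DevOps", "Security", "API Design", "Linux",
] as const;

type Tag = (typeof TAXONOMY)[number];

// Parse the model's reply and reject anything outside the approved list
// or the 3-5 tag range, so a bad completion fails loudly instead of
// polluting tags.json.
function parseTagResponse(raw: string): Tag[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  const tags = parsed.filter(
    (t): t is Tag =>
      typeof t === "string" && (TAXONOMY as readonly string[]).includes(t)
  );
  if (tags.length !== parsed.length) throw new Error("tag outside taxonomy");
  if (tags.length < 3 || tags.length > 5) throw new Error("expected 3-5 tags");
  return tags;
}
```

Treating the model as a classifier only works if the code refuses to accept anything that isn’t a classification.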

Part 2: Related Posts via Vector Similarity

The site already had semantic embeddings for every post stored in Supabase, generated by the generate-embeddings.ts script from the vector search build. Those embeddings use the MiniLM-L6-v2 model — each post is represented as a 384-dimensional vector based on its title, description, and TL;DR summary.

I built scripts/generate-related.ts to query all post embeddings from Supabase, compute cosine similarity between every pair, and pick the top 2 most similar posts for each. The results are written to app/lib/related.json — same static JSON pattern as everything else.

The similarity scores make intuitive sense. The cron ops post is most related to the server alerts post (0.600). The market daily briefing maps to the Garmin recaps (0.532) — both are automated email pipelines on the Alienware. The Gemini Grades post maps to the original site build post (0.675) — they’re literally about the same project.

🔧 Developer section: related posts
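The core of the related-posts script is pairwise cosine similarity plus a top-2 pick. A minimal sketch of that logic (function names are mine; the real embeddings are 384-dimensional, but any equal-length vectors work):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// For each post, rank every other post by similarity and keep the top N.
function topRelated(
  embeddings: Record<string, number[]>,
  perPost = 2
): Record<string, string[]> {
  const slugs = Object.keys(embeddings);
  const related: Record<string, string[]> = {};
  for (const slug of slugs) {
    related[slug] = slugs
      .filter((other) => other !== slug)
      .map((other) => ({
        other,
        score: cosineSimilarity(embeddings[slug], embeddings[other]),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, perPost)
      .map((r) => r.other);
  }
  return related;
}
```

At 18 posts this is 153 pairs — brute force is fine, and the result is just a JSON object keyed by slug.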

Part 3: The UI Components

Two new components, both following the TLDRBadge pattern: client components that read from static JSON, no API calls, no loading states.

PostTags renders tag pills below the post meta (date and read time). Each pill is a small rounded badge in the site’s forest-green-on-pale-green color scheme. The tags come directly from the post metadata in posts.ts.

RelatedPosts appears at the bottom of each post, above the back navigation. Two cards with the related post’s title and reading time, styled as bordered links that highlight on hover. It reads from related.json and cross-references posts.ts for the reading time.

The Writing index page also got an upgrade: the tag filter pills are now derived dynamically from the posts array instead of being hardcoded. Adding a new tag to any post automatically adds it to the filter bar — no manual list to maintain.

🔧 Developer section: components
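The dynamic filter bar is a one-liner in spirit: collect every tag used by at least one post, de-duplicate, sort. A sketch under assumed types (the `Post` shape here is hypothetical; the real metadata lives in posts.ts):

```typescript
// Hypothetical shape of an entry in posts.ts.
interface Post {
  slug: string;
  title: string;
  tags: string[];
}

// Derive the filter bar from the posts themselves: every tag used by at
// least one post appears exactly once, alphabetized, with "All" first.
// Adding a tag to any post automatically adds a pill.
function filterTags(posts: Post[]): string[] {
  const unique = new Set(posts.flatMap((p) => p.tags));
  return ["All", ...[...unique].sort()];
}
```

The payoff is that there is no hardcoded tag list left anywhere in the UI to drift out of sync with the content.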

The Full Pipeline

Here’s what the content pipeline looks like now, from writing a post to deploying it:

  1. Write the post as a TSX file in app/writing/[slug]/page.tsx
  2. Add metadata to app/lib/posts.ts
  3. Run npx tsx scripts/generate-tldr.ts [slug] — AI summary
  4. Run npx tsx scripts/generate-tags.ts [slug] — AI topic tags + reading time
  5. Run npx tsx scripts/generate-embeddings.ts — vector embeddings
  6. Run npx tsx scripts/generate-related.ts — related post suggestions
  7. Deploy with vercel --prod

Steps 3–6 are all build-time, all idempotent, all writing to static JSON or Supabase. The live site never touches an API at read time. Every piece of AI-generated content is baked into the build.

What This Unlocks

The Writing section went from a flat blog to a connected content product. Readers can filter by 11 topic tags, see related posts at the bottom of every article, and get accurate reading times. Every post is now enriched with AI-generated metadata that would have taken hours to create manually.

More importantly: all of this data — the TL;DR summaries, the topic tags, the vector embeddings, the related post graph — feeds directly into Ask Goose, the conversational AI assistant coming to the site. When Ask Goose answers a question about what I’ve built, it won’t be searching raw text. It’ll be retrieving semantically similar content from a curated, tagged, summarized knowledge base.

The content pipeline isn’t just a feature. It’s the retrieval layer for everything that comes next.
