writing / field notes

Real problems I hit in production, and how I fixed them

I write about the stuff that actually breaks at 2 AM: Kubernetes pods stuck in CrashLoopBackOff, Terraform state files that drifted into chaos, CI pipelines that pass locally and die in staging. If you run infrastructure for a living, you will recognize these problems.

You will also find posts about building AI products -- not the hype-cycle kind, but the practical side: shipping features that use LLMs without burning through your API budget, choosing the right model for the job, and keeping latency under control when your users need real-time responses.

Every article comes from something I actually shipped or debugged. No theoretical hot takes, no ragebait. Just the problem, the context, and the fix. If you want tools to go with the reading, check out the free developer tools for AI workflows.

topics

What I write about

Claude Code & LLM tooling -- building developer tools on top of large language models, prompt engineering that works in production, and keeping AI costs under control.

DevOps & cloud infrastructure -- Kubernetes, Terraform, CI/CD pipelines, and the operational side of keeping services running at scale.

Developer productivity -- terminal workflows, editor setups, automation scripts, and small process changes that compound over time.

AI workflows -- practical patterns for integrating AI into real products, from retrieval-augmented generation to agent orchestration.

all posts

2026-02-22 · 6 min read

Azure Costs Are Spiking and Nobody Knows Why. This Skill Finds the Waste.

Microsoft ships an official Azure cost optimization skill for GitHub Copilot that any agent can use. Here is how it works in production, what it misses, and how to do the same thing without Copilot.

Azure · Cost Optimization · GitHub Copilot · azure-copilot-skills

2026-02-17 · 5 min read

Claude Code Was Hallucinating. The Fix Was a Progress Bar.

When Claude Code's context window fills up, output quality tanks: hallucinated imports, phantom files, broken logic. I fixed it by adding a statusline that shows context usage in real time, and built a tool so you can do the same.

Claude Code · Developer Tools · Bash · Productivity

2025-06-15 · 3 min read

My Kubernetes Cluster Looked Healthy. Production Wasn't.

Green dashboards, running pods, low CPU, and yet production felt broken. A story about why Kubernetes metrics can lie about user experience, and what to watch instead.

Kubernetes · Observability · Monitoring · Production · DevOps

2025-05-20 · 4 min read

Automating Kubernetes Deployments with ArgoCD Image Updater on GKE

How I implemented ArgoCD Image Updater to automate container image deployments on Google Kubernetes Engine, with Helm charts, Pub/Sub-based autoscaling, External Secrets, and Slack notifications.

GitOps · ArgoCD · Kubernetes · GKE · Helm · CI/CD

2025-04-10 · 4 min read

Multi-Environment Infrastructure on GCP with Terraform and Terragrunt

Building a scalable IaC pipeline for dev, staging, and production on Google Cloud Platform using Terraform modules, Terragrunt environments, and automated CI/CD with GitHub Actions.

Terraform · Terragrunt · GCP · Infrastructure as Code · GKE