LLMOps Notes

Practical notes on LLMOps, AI agents, MCP, Langfuse, Cloudflare, and related experiments.
https://llm-lab.dev/

When You Build a Minimal API Loop, You Stop Designing Prompts and Start Designing Stop Conditions
https://llm-lab.dev/en/posts/llm-loop-engineering-minimal-api/
I implemented a minimal generate-evaluate-feedback-regenerate loop in a verification script. This post organizes the stop conditions and evaluation units that actually matter when stabilizing AI output.
Sun, 28 Jun 2026 00:00:00 GMT

When Using OpenAI-Compatible APIs in Flue, Check the Model Specifier First
https://llm-lab.dev/en/posts/flue-openai-compatible-provider-note/
A note on getting stuck with 'Unknown model specifier' in Flue 1.0 Beta by mixing up the actual model ID and the provider-id/model-id format.
Sat, 27 Jun 2026 00:00:00 GMT

Before Growing Hermes Agent: Creating Synthetic Support Triage Scenarios
https://llm-lab.dev/en/posts/hermes-agent-002-support-triage-scenarios/
As a preparatory step before delegating support triage to Hermes Agent, I built three evaluation scenarios using synthetic data—fixing decision criteria and safety constraints in advance without relying on real customer data.
Thu, 25 Jun 2026 00:00:00 GMT

Observing the Sakana Fugu API with Langfuse: Understanding Hidden Costs in Multi-Agent Systems
https://llm-lab.dev/en/posts/sakana-fugu-langfuse-experiment/
A hands-on report instrumenting Sakana Fugu's OpenAI-compatible API with Langfuse, measuring how latency, token consumption, and TTFT change across Level 1–3 tasks.
Wed, 24 Jun 2026 00:00:00 GMT

Eve's TUI vs HTTP Event Streams: A Side-by-Side Look at Tool Calling
https://llm-lab.dev/en/posts/vercel-eve-http-stream-events/
A hands-on log comparing how the same weather tool call looks in Vercel's Eve agent framework when observed from the TUI versus the HTTP API, separating the developer-friendly display from the integration-friendly event stream.
Tue, 23 Jun 2026 00:00:00 GMT

Observing the Black Box of a Multi-Agent API with Sakana Fugu and Langfuse
https://llm-lab.dev/en/posts/sakana-fugu-langfuse-observability-plan/
I subscribed to Sakana Fugu to understand its nature as an OpenAI-compatible API and to plan how to observe its black-box cooperative reasoning from the outside.
Tue, 23 Jun 2026 00:00:00 GMT

Streaming Flue Observe Events to Langfuse: Monitoring an Issue Triage Agent
https://llm-lab.dev/en/posts/flue-langfuse-observability-issue-triage/
An experiment log where I redact Flue 1.0 Beta observe events before sending them to Langfuse, tracking the issue triage workflow's runId, model, and results.
Mon, 22 Jun 2026 00:00:00 GMT

Observing Eve TUI Execution and Tool Calls with Langfuse
https://llm-lab.dev/en/posts/vercel-eve-langfuse-observability/
A hands-on comparison of two ways to send Eve tool-calling executions to Langfuse as trace/span/generation data.
Sun, 21 Jun 2026 00:00:00 GMT

Building a Tool-Calling Agent with Vercel's Eve and Running It from the TUI
https://llm-lab.dev/en/posts/vercel-eve-deep-dive/
A follow-up to my first look: adding tools and evals to Eve, configuring models via the Vercel AI Gateway, invoking tools from the TUI, and exploring the info and eval commands.
Sat, 20 Jun 2026 00:00:00 GMT

Moving Issue Triage into CI: Running a Flue Workflow from GitHub Actions
https://llm-lab.dev/en/posts/flue-github-actions-issue-triage-workflow/
A verification log of dry-running a GitHub Issue triage workflow built with Flue 1.0 Beta from GitHub Actions' issues.opened, instead of a persistent webhook server.
Sat, 20 Jun 2026 00:00:00 GMT

Building a GitHub Issue Triage Agent with Flue 1.0 Beta
https://llm-lab.dev/en/posts/flue-1-0-beta-issue-triage-agent/
An experimental log of building a triage agent with Flue 1.0 Beta's Agent, Skill, and Workflow features that returns structured severity, reproducibility, and label suggestions for GitHub issues.
Fri, 19 Jun 2026 00:00:00 GMT

A Quick Look at Eve, Vercel's New Agent Framework
https://llm-lab.dev/en/posts/vercel-eve-first-look/
A quick validation log of running Vercel's open-source agent framework Eve locally through init, dev startup, and the first session.
Fri, 19 Jun 2026 00:00:00 GMT

How to Use GLM-5.2 on Cloudflare Workers AI: Model ID, Pricing, and TypeScript Setup
https://llm-lab.dev/en/posts/cloudflare-worker-ai-glm-5-2/
A technical note covering the model ID, pricing, context length, Wrangler configuration, and TypeScript implementation for calling GLM-5.2 on Cloudflare Workers AI.
Thu, 18 Jun 2026 00:00:00 GMT

Before Using Flue: Figuring Out What This Framework Actually Is
https://llm-lab.dev/en/posts/flue-framework-overview/
A rough summary of how Flue thinks about harnesses, agents, workflows, skills, tools, sandboxes, and persistence — before actually running anything.
Thu, 18 Jun 2026 00:00:00 GMT

What Is Flue 1.0 Beta? New Features and Quickstart Caveats from Local Testing
https://llm-lab.dev/en/posts/flue-1-0-beta-local-check/
Hands-on notes on running the Astro team's agent framework Flue 1.0 Beta locally, covering init, build, and run behavior plus the rough edges I hit.
Thu, 18 Jun 2026 00:00:00 GMT

Pre-Implementation Notes on Vercel's Agent Framework, eve
https://llm-lab.dev/en/posts/eve-vercel-agent-framework-survey/
A summary of my pre-implementation research into Vercel's eve agent framework: directory-based design, the difference between tools and skills, sandboxing, durable execution, and more.
Thu, 18 Jun 2026 00:00:00 GMT

Moving Cloudflare Pages Deploys to npm Scripts
https://llm-lab.dev/en/posts/cloudflare-pages-deploy-script-note/
A short operations note on adding deploy scripts to package.json so I don't have to remember wrangler commands every time I redeploy to Cloudflare Pages.
Wed, 17 Jun 2026 00:00:00 GMT

Training Hermes Agent as a Business Decision Partner
https://llm-lab.dev/en/posts/hermes-agent-001-support-triage-agent-start/
An experiment in growing Hermes Agent into a business-ready support agent for decision-making.
Sun, 14 Jun 2026 00:00:00 GMT

Natural Language Data Analysis with ClickHouse and Claude MCP — Is the Era of Writing SQL Coming to an End?
https://llm-lab.dev/en/posts/clickhouse-001-claude-mcp/
Building an environment where Claude Desktop can operate ClickHouse through natural language using the ClickHouse MCP.
Fri, 12 Jun 2026 00:00:00 GMT

Why Cloudflare KV and a Small Admin Panel Fit a Parking Lot Site
https://llm-lab.dev/en/posts/local-site-admin-kv-note/
A short design note on handling frequently changing public values—like parking availability and phone numbers—using Cloudflare KV and a simple admin panel instead of environment variables.
Thu, 11 Jun 2026 00:00:00 GMT

Building a Langfuse Morning Briefing for Slack
https://llm-lab.dev/en/posts/langfuse-morning-briefing-trial/
A practical experiment that aggregates Langfuse traces in a Cloudflare Worker and sends only token, cost, and latency anomalies to Slack.
Mon, 08 Jun 2026 00:00:00 GMT

I Want to Build a Langfuse Morning Briefing That Pushes Only Anomalies to Slack
https://llm-lab.dev/en/posts/langfuse-morning-briefing/
Manually checking traces every day is unsustainable. A personal GenAIOps plan to aggregate Langfuse failures, token spikes, and low-score outputs into a single Slack morning briefing.
Sat, 06 Jun 2026 00:00:00 GMT

Why I Stopped Polishing Prompts and Started Using Feedback Loops
https://llm-lab.dev/en/posts/llm-loop-engineering-first-step/
I explain why output quality stays unstable even with careful prompt design, and how I switched to a generate-evaluate-feedback-regenerate loop. Includes the smallest manual steps to start today.
Wed, 03 Jun 2026 00:00:00 GMT

How Much of Your LLM Traffic Can Cloudflare AI Gateway Actually Log?
https://llm-lab.dev/en/posts/cloudflare-ai-gateway-llm-request-response-logging/
Using Cloudflare AI Gateway as an OpenAI-compatible endpoint, I walk through its logging, payload controls, metadata, cost estimation, and OTel integration to see how far it can serve as an entry point for LLM observability.
Sat, 16 May 2026 00:00:00 GMT

What to Know First When Introducing Cloudflare AI Gateway
https://llm-lab.dev/en/posts/cloudflare-ai-gateway-introduction/
The minimum steps to start using Cloudflare AI Gateway from Workers AI, and what to understand about its role, settings, and checkpoints before moving to the OpenAI-compatible API.
Fri, 15 May 2026 00:00:00 GMT

What Engineers Should Design After AI Makes Coding Faster
https://llm-lab.dev/en/posts/aidd-after-code-generation-design/
As tools like Claude Code and Codex accelerate code generation, the scope of what engineers must design expands from the code itself to problems, constraints, operations, and validation.
Sun, 03 May 2026 00:00:00 GMT

From Work Logs to Blog Notes: Extract One Decision at a Time
https://llm-lab.dev/en/posts/work-log-to-blog-note/
How I turn stuck points and decisions from Codex work logs into short blog notes.
Fri, 01 May 2026 00:00:00 GMT

Stop Repeating Yourself in AIDD: Turn Repeat Work into Standard Entry Points
https://llm-lab.dev/en/posts/aidd-standardization-repeat-work/
How to move beyond personal AI prompt tricks by turning them into standard commands, templates, and review checklists that the whole team can reuse.
Mon, 27 Apr 2026 00:00:00 GMT

OpenUI: A Framework for Rapid Generative UI Development
https://llm-lab.dev/en/posts/generative-ui-fast-development-openui/
Implementing Generative UI from scratch means dealing with messy component management and streaming control. OpenUI is a framework that neatly abstracts and hides all of that.
Tue, 21 Apr 2026 00:00:00 GMT

Taming Thumbnail White Space in an Astro Blog
https://llm-lab.dev/en/posts/astro-thumbnail-contain-note/
A quick UI note on switching article card thumbnails from `object-cover` to `object-contain` to prevent cropping, and why I rolled the background back to near-white with a light border.
Fri, 17 Apr 2026 00:00:00 GMT

Publishing a Next.js Static Site on Cloudflare Pages for Free
https://llm-lab.dev/en/posts/cloudflare-pages-static-site-deploy/
An experiment log verifying deployment of a Next.js static site to Cloudflare Pages, from local verification and build to production deployment and custom domain setup.
Sat, 11 Apr 2026 00:00:00 GMT

Tips for Reliably Extracting Structured JSON from LLMs
https://llm-lab.dev/en/posts/generative-ui-correct-json-output/
I summarize approaches and tips for extracting structured data from LLMs reliably, accurately, and with as little latency as possible.
Sun, 18 Jan 2026 00:00:00 GMT

AgentOps Sounds New, but the Problems Are Familiar
https://llm-lab.dev/en/posts/agentops-old-automation-note/
How old automation failures — batch jobs, notification bots, admin UIs, outdated runbooks — raise the same questions for AI agent operations.
Thu, 23 Oct 2025 00:00:00 GMT

Renaming the Blog: Tsurezure Agent OPS
https://llm-lab.dev/en/posts/rename-blog-tsurezure-agentops/
Why I moved from Field Ops Notes to Tsurezure Agent OPS — a space for operations, automation, and AI agent topics, rooted in small daily frictions.
Mon, 18 Aug 2025 00:00:00 GMT

What to Check When Asked 'Can AI Do This?'
https://llm-lab.dev/en/posts/can-ai-do-this-check-note/
A note on evaluating whether a task can be handled by AI based on input variability, failure impact, and human review cost rather than model performance alone.
Thu, 12 Jun 2025 00:00:00 GMT

Why Success Logs Alone Aren't Enough for Operations
https://llm-lab.dev/en/posts/success-log-not-enough-note/
A short note on log design: clean success logs alone don't help you diagnose failures or improve recovery.
Tue, 04 Mar 2025 00:00:00 GMT

Write Down Manual Decision Criteria Before Delegating to an AI Agent
https://llm-lab.dev/en/posts/manual-decision-before-agent-note/
A note on why you need to audit what operators actually look at before you start agentifying a workflow.
Thu, 16 Jan 2025 00:00:00 GMT

Small Admin Panels Are Where Audit Logs Get Left Behind
https://llm-lab.dev/en/posts/admin-audit-log-later-note/
Notes on how losing track of who changed what and when in a small internal admin panel makes later investigations surprisingly painful.
Thu, 07 Nov 2024 00:00:00 GMT

What to Decide Before Letting AI Summarize Inquiries
https://llm-lab.dev/en/posts/ai-inquiry-summary-before-note/
A note on why you should define who reviews AI summaries and how to trace back to the original message before prioritizing convenience when summarizing inquiry emails or Slack threads with AI.
Wed, 18 Sep 2024 00:00:00 GMT

GitLab Duo Goes Free for All Users: Overview and Impact on Development Workflows
https://llm-lab.dev/en/posts/gitlab-duo-release/
GitLab announced something developers cannot afford to miss. The AI assistance feature 'GitLab Duo', previously limited to paid plans (Ultimate and Premium) or requiring an additional add-on license, will now be provided by default to all GitLab users, including those on the free plan.
Thu, 22 Aug 2024 00:00:00 GMT

Documentation Isn't Ignored — It's Left Unupdated
https://llm-lab.dev/en/posts/docs-are-not-updated-note/
A note on why procedures lose trust on the ground: not because they go unread, but because operational gaps never make it back into the docs.
Tue, 02 Jul 2024 00:00:00 GMT

Building a Notification Bot Turned Me Into Its Support Desk
https://llm-lab.dev/en/posts/notification-bot-owner-note/
A note on how small notification bots blur the lines around accuracy and responsibility as they become more useful.
Tue, 14 May 2024 00:00:00 GMT

Why CSV Import Edge Cases Escape Your Runbook
https://llm-lab.dev/en/posts/csv-import-exception-note/
A note on how small divergences in CSV imports—column shifts, encoding issues, end-of-month exceptions—gradually drift outside the runbook.
Thu, 21 Mar 2024 00:00:00 GMT

Remotion Fundamentals: A Field Note on Using React Components for Video
https://llm-lab.dev/en/posts/react-remotioni-movie-test/
A write-up on how the powerful React and TypeScript ecosystem can be applied directly to video production through Remotion, and where this approach shines.
Wed, 14 Feb 2024 00:00:00 GMT

Rethinking Nightly Batch Failure Alerts as an Operations Entry Point
https://llm-lab.dev/en/posts/nightly-batch-alert-agentops-note/
A short operations note on treating nightly batch failure alerts as more than simple warnings—breaking them down into detection, diagnosis, retry decisions, and human handoff.
Thu, 08 Feb 2024 00:00:00 GMT

Starting This Blog
https://llm-lab.dev/en/posts/start-blog-self-introduction/
The first post from an engineer who maintains business systems and builds small automations: documenting what gets stuck in operations and keeping technical notes.
Fri, 12 Jan 2024 00:00:00 GMT