Live · May 25–29, 2026 · 5 days · 5 drops · Live demos at ClickHouse OpenHouse
Langfuse · May 25–29, 2026

Launch Week #5

Five days. Five drops. New building blocks for taking AI applications from prototype to production. Unveiled live at ClickHouse OpenHouse.

5 days: 5 feature drops
Live demos: ClickHouse OpenHouse
Open source: MIT, self-host anytime
Monday → Friday: May 25–29, 2026
The Week · May 25–29, 2026

One drop a day, every day.
Monday through Friday.

We'll unwrap a new feature each day and update this page as each one ships. Subscribe to the newsletter or follow us so you don't miss a drop.

Day 01 (live) · Monday, May 25, 2026 · Experiments in CI/CD
Day 02 (live) · Tuesday, May 26, 2026 · Langfuse agent skill
Day 03 · Wednesday, May 27, 2026 · Find anything
Day 04 · Thursday, May 28, 2026 · Evals as code
Day 05 · Friday, May 29, 2026 · Never miss a thing
Day 02 · Tuesday, May 26, 2026

Langfuse agent skill.

Building an agent is easy. Getting it to production is hard. You set up tracing and evaluators, but how do you know what your agent's real failure modes are? How do you know your LLM-as-a-judge is actually calibrated against your human annotators?

The Langfuse Skill lets you hand your AI coding agent a playbook for working with Langfuse. It teaches coding agents such as Claude Code, Cursor, and Codex how to instrument an app, query traces, manage prompts, and set up evaluators. Drop it into your editor, describe the job in plain language, and the agent runs with it.

In the video below, Marlies uses the LLM-as-a-Judge calibration skill with Codex to produce a full analysis with accuracy, F1, precision, recall, and cost, all graphed directly in the new Langfuse Experiments view.
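To make the calibration metrics concrete, here is a minimal sketch of how judge verdicts can be compared against human annotations. It is not the skill's implementation and does not call the Langfuse SDK. The names `CalibrationReport` and `calibrate` are hypothetical, and the sketch assumes binary pass/fail labels with "pass" as the positive class.

```python
from dataclasses import dataclass


@dataclass
class CalibrationReport:
    accuracy: float
    precision: float
    recall: float
    f1: float


def calibrate(human: list[bool], judge: list[bool]) -> CalibrationReport:
    """Compare judge verdicts to human labels (hypothetical helper).

    Assumes binary labels where True means "pass" and is the positive class.
    """
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")

    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)          # both say pass
    fp = sum(1 for h, j in pairs if not h and j)      # judge passes, human fails
    fn = sum(1 for h, j in pairs if h and not j)      # judge fails, human passes
    tn = sum(1 for h, j in pairs if not h and not j)  # both say fail

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return CalibrationReport(accuracy, precision, recall, f1)
```

A low recall here would mean the judge rejects outputs your annotators accepted. A low precision would mean it lets through outputs they rejected.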

Day 01 · Monday, May 25, 2026

Experiments in CI/CD.

Run your Langfuse experiments inside GitHub Actions. The new action tests every pull request against a Langfuse dataset, fails the workflow when scores drop below the threshold you set, and posts the result back to the PR as a comment. Every run is tracked in Langfuse so you can dig into regressions later.
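The gating idea can be sketched in a few lines. This is an illustration only, not the action's actual code or configuration. The function names `failing_scores` and `gate`, the score names, and the threshold format are all assumptions made for the example.

```python
import sys


def failing_scores(
    scores: dict[str, float], thresholds: dict[str, float]
) -> list[str]:
    """Return one message per score below its minimum (hypothetical helper)."""
    failures = []
    for name, minimum in thresholds.items():
        value = scores.get(name)
        if value is None:
            # A missing score counts as a failure so regressions cannot hide.
            failures.append(f"{name}: missing score (minimum {minimum:.2f})")
        elif value < minimum:
            failures.append(f"{name}: {value:.2f} < {minimum:.2f}")
    return failures


def gate(scores: dict[str, float], thresholds: dict[str, float]) -> int:
    """Return a process exit code: 0 if all thresholds pass, 1 otherwise."""
    failures = failing_scores(scores, thresholds)
    for message in failures:
        print(message, file=sys.stderr)
    return 1 if failures else 0
```

A CI step running a script like this would end with `sys.exit(gate(scores, thresholds))`, since a non-zero exit code is what marks a workflow step as failed.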