Distributed Systems on Euijun's Personal Blog

「Agentic AI Dev Note」 Observing Agent Quality with Evaluation

Mon, 15 Jun 2026 20:00:00 +0900

Across Part 1 and Part 2 of this 「Agentic AI Dev Note」 series, we made our agent operable in the cloud. Now it’s time to check whether that agent actually behaves the way we expect. An agent’s quality can be measured along three dimensions.

Code quality: Most of it can be verified with unit/integration/e2e tests. The same input produces exactly the same output, so it can be verified precisely.
System quality: Measured through runtime error rates, latency, dependency failures, and the like, via alarms and monitoring. Canary tests can monitor system quality too.
LLM response quality: Measures whether the LLM produced the expected answer to the user’s input.

But how do you guarantee the response quality of a non-deterministic LLM, where the same input can yield a different output every time? For the first two, the same input gives the same output, so you can prove the result is “correct.” LLM responses, however, cannot be proven correct in that way. And if you can’t prove it, the only option left is to observe it: watch the trend of behavior across many responses instead of checking any single answer. This post focuses on that.
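To make "observe the trend" concrete, here is a minimal sketch that runs the same prompt many times and reports the fraction of responses that pass a check. Everything in it is an assumption for illustration: `pass_rate`, `make_fake_agent`, and the canned answers are hypothetical stand-ins, not the evaluation setup used in the post, and a real agent call would replace the fake one.

```python
import random
from typing import Callable


def pass_rate(
    agent: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],
    runs: int = 20,
) -> float:
    """Call the agent `runs` times with the same prompt and return
    the fraction of responses that satisfy `check`."""
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


def make_fake_agent(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a non-deterministic LLM call:
    it picks one of several canned answers at random."""
    rng = random.Random(seed)
    answers = ["Paris", "Paris.", "The capital is Paris", "I am not sure"]

    def agent(prompt: str) -> str:
        return rng.choice(answers)

    return agent


# Same input, many runs: the result is a rate, not a single verdict.
rate = pass_rate(
    make_fake_agent(seed=0),
    "What is the capital of France?",
    check=lambda response: "Paris" in response,
    runs=50,
)
print(f"pass rate over 50 runs: {rate:.2f}")
```

A single run says little here; what matters is how the rate moves over time, for example whether it drops after a prompt or model change.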

「Agentic AI Dev Note」 Managing Agent Memory in Distributed Systems

Mon, 18 May 2026 06:22:00 +0900

In Part 1, we discussed stateless containers for running Agents in a distributed system. However, the Agents we interact with seem to behave statefully. When a user asks an Agent a question, it answers, and as the conversation progresses, the Agent keeps the context of the earlier exchanges. To achieve this, the system separates the request processing area from the memory area, and the request processor fetches the conversation history from memory whenever it is needed.
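The separation described above can be sketched as follows. This is only an illustration under stated assumptions: `MemoryStore` and `StatelessHandler` are hypothetical names, and an in-process dictionary stands in for whatever external memory service a real deployment would use.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (role, text)


class MemoryStore:
    """Hypothetical memory area. A plain dict stands in for an
    external store shared by all containers."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Turn]] = {}

    def load(self, session_id: str) -> List[Turn]:
        # Return a copy so handlers cannot mutate stored history directly.
        return list(self._history.get(session_id, []))

    def append(self, session_id: str, role: str, text: str) -> None:
        self._history.setdefault(session_id, []).append((role, text))


class StatelessHandler:
    """Hypothetical request processing area. It keeps no conversation
    state of its own; every request starts by loading history."""

    def __init__(self, name: str, memory: MemoryStore) -> None:
        self.name = name
        self.memory = memory

    def handle(self, session_id: str, user_text: str) -> str:
        history = self.memory.load(session_id)
        turn_number = len(history) // 2 + 1
        reply = f"[{self.name}] turn {turn_number}: you said {user_text!r}"
        self.memory.append(session_id, "user", user_text)
        self.memory.append(session_id, "agent", reply)
        return reply


# Two different handlers serve the same session and still share context.
memory = MemoryStore()
first_reply = StatelessHandler("container-a", memory).handle("s1", "hello")
second_reply = StatelessHandler("container-b", memory).handle("s1", "again")
print(first_reply)
print(second_reply)
```

Because the handler holds nothing between requests, it does not matter which container receives the next message; the conversation appears stateful only because the history lives in the memory area.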

「Agentic AI Dev Note」 Developing and Operating Agents in Distributed Systems

Wed, 13 May 2026 17:32:42 +0900

We imagine it like this: a user sends a request to an Agent, an Agent floating somewhere processes the request, and returns the result. In the big picture, this is correct. However, a real distributed environment is not quite this simple.

When a request comes in, a container is provisioned and the Agent inside it processes the request. If the container then stays idle for a certain period, it is destroyed, so the next request might be received by a completely different container. This mechanism gives a distributed system its scalability and efficiency: when traffic increases, more containers are added (scale out); when it decreases, idle containers are removed (scale in).
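The lifecycle above can be simulated in a few lines. This is a simplified sketch, not how any particular platform works: `ContainerPool` and `idle_timeout` are hypothetical names, time is passed in explicitly as a number of seconds, and each request is treated as finishing instantly, so concurrent scale-out is not modeled.

```python
import itertools
from typing import Dict, List


class ContainerPool:
    """Hypothetical pool: provisions a container on demand and
    destroys containers that stay idle for `idle_timeout` seconds."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self._next_id = itertools.count(1)
        self._idle_since: Dict[int, float] = {}  # container id -> idle start

    def reap(self, now: float) -> List[int]:
        """Destroy containers whose idle time has reached the timeout."""
        expired = [
            cid
            for cid, since in self._idle_since.items()
            if now - since >= self.idle_timeout
        ]
        for cid in expired:
            del self._idle_since[cid]
        return expired

    def handle(self, now: float) -> int:
        """Serve one request at time `now`; return the container id used."""
        self.reap(now)
        if self._idle_since:
            cid, _ = self._idle_since.popitem()  # reuse a warm container
        else:
            cid = next(self._next_id)  # provision a new one
        self._idle_since[cid] = now  # request done; container is idle again
        return cid


pool = ContainerPool(idle_timeout=60)
first = pool.handle(now=0)    # nothing exists yet, so a container is provisioned
second = pool.handle(now=30)  # still within the idle window, so it is reused
third = pool.handle(now=200)  # idle too long, so it was destroyed and replaced
print(first, second, third)
```

The third request lands on a different container than the first two, which is why nothing that must survive between requests can live inside the container itself.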