Details
- Event: Voxxed Days Amsterdam 2026
- Type: Conference Talk
- Location: Amsterdam, The Netherlands
- Speakers: Jettro Coenradie, Daniël Spee
- Schedule: Thursday, April 2, 2026, 14:25-15:10, Zaal 2
- Recording: Available on YouTube
Description
Would you let a stranger handle your customer data? Would you let a new hire talk to a client on their first day? Would you put your kid in a self-driving car and just say, “Have fun at school”?
Then why do we trust our shiny new AI agents to behave correctly in production without testing them?
In this talk, we share our journey of exploring how to evaluate agentic systems before and after deployment. We’ll walk through how to move from “it works in the demo” to trustworthy, observable systems that you can confidently run in production.
We’ll show practical examples of building evaluation pipelines and of how we experiment with simple, measurable ways to understand an agent’s behavior over time. We’ll share what we’ve learned so far: where things go wrong, what helps, and what is still an open challenge as we build toward more mature evaluation practices.
Expect real experiences rather than just theory, live examples, and ideas you can take home to build trust into your own agents.
Key Takeaways
- Why testing AI agents is different from traditional software testing
- How to design evaluation frameworks that fit your use case
- How to combine offline testing with live production observation
Target Audience
Developers, architects, and AI practitioners who are experimenting with or building agent-based systems and want to learn how to evaluate and test them effectively.