When I started in quality engineering ten years ago, no one was talking about artificial intelligence inside testing teams. Our tools were Selenium, JIRA, and a few brittle Python scripts. Quality was treated as a phase that came after development — a few days before release, with a checklist that almost no one read end to end.

Today, everything has changed. AI agents now write test cases, run them, and file bug reports autonomously. The traditional QA team is no longer the group that clicks buttons until something breaks. We’ve become designers of trust: we build the systems that watch the AI and verify that it does what it promised.

Why my QA background matters

People who have spent years in quality teams understand three truths that the average developer can’t fully appreciate until they have survived a production failure:

  • Systems fail in ways designers never anticipate.
  • Users don’t behave according to the happy path.
  • Trust is built over years and lost in seconds.

When you carry those truths into the world of intelligent agents, the questions change. How do I know the agent actually got it right? When should I trust an LLM’s answer? What does “correct” even mean when the answer changes on every invocation? These are quality engineering questions wearing modern clothes — they’re not new at all.

“AI doesn’t eliminate the QA engineer. It elevates them: from tester to trust engineer.”

Three shifts worth watching

Over the last twelve months, I’ve seen three shifts in the role of the quality team that are worth pausing on. The first is that teams now write more evals than unit tests. The second is that verification has stopped being binary: a result is no longer a simple pass or fail but a probability distribution over outcomes.
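To make the first two shifts concrete, here is a minimal sketch of what an eval can look like next to a classic unit test. Everything in it is hypothetical and only for illustration: `make_flaky_agent` is a stand-in for a real agent, and the 0.9 threshold is an assumed release gate, not a recommendation. The point is that the same prompt is run many times and judged by its pass rate rather than by a single assertion.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 200) -> float:
    """Run the agent repeatedly and return the fraction of outputs that pass the check."""
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


def make_flaky_agent(success_prob: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a real agent: answers correctly with a fixed probability."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable

    def agent(prompt: str) -> str:
        return "4" if rng.random() < success_prob else "5"

    return agent


def eval_passes(rate: float, threshold: float = 0.9) -> bool:
    """Assumed release gate: the eval passes when the observed rate meets the threshold."""
    return rate >= threshold


agent = make_flaky_agent(success_prob=0.95)
rate = pass_rate(agent, "What is 2 + 2?", lambda out: out.strip() == "4")
print(f"pass rate: {rate:.1%}, gate passed: {eval_passes(rate)}")
```

A real eval would call an actual model and would probably report a confidence interval around the rate, but the shape is the same: many runs, one distribution, one explicit threshold that the team has agreed on.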

The third shift — and the most important

In meetings, we’ve started describing model behavior in the vocabulary of language itself rather than the vocabulary of system behavior. This subtle shift in vocabulary changes the entire shape of the team: suddenly the role of “quality engineer” morphs into “behavior engineer.”

And that’s exactly why, in 2026, I find myself talking more to Scrum and product teams than to the testing team. The question is no longer “Does it work?” but “When does it fail, and how will we know?”