When Your Safety System Fails, You Need More Than a Fix

We had a bug this week. A user triggered Okaya's safety system — the part designed to identify at-risk individuals who mention harm to themselves or others.

We had been using a smaller LLM for harm detection because it gave us the fastest response times. But our detection prompts had grown more nuanced over time, and the smaller model was missing signals it should have caught.

Fixing the model choice was straightforward. Knowing we could trust the fix — that was the harder problem.

This is why we have a validation system. Over 150 scenarios — some with real warning signals, others designed to trick the system with false positives and false negatives. Each scenario has expected results reviewed by licensed professionals.

When we swapped models, the framework told us exactly what improved and what regressed. Instead of days of manual testing, we could iterate quickly — adjust a prompt, run the scenarios, and evaluate the results in minutes.

If you're embedding an LLM into your product, this kind of tooling is what lets you move with confidence instead of guessing.

Originally published on LinkedIn — view the original post for comments and reactions.

← All posts