Evidence-Led Agent Workflows

Agent workflows tend to be evaluated only by their final answer. That is the wrong unit of evaluation. The unit that matters is the chain of evidence that produced the answer. ### Artifacts Over Assertions Each meaningful step an agent takes should produce an artifact: a structured record of inputs, decisions, and outputs. Operators should be able to inspect these artifacts as easily as they would inspect a pull request. ### Evidence as Governance Surface Once evidence is first-class, governance becomes tractable. Reviewers can scan artifacts, recover from failure points, and audit behavior after the fact. Cerberus consumes these artifacts to support policy review; StrongHold archives them as durable records. The workflow becomes legible because the evidence is.