Ops Agent System Card · v1.0
The agent behind every dispatch decision.
Published in line with the NIST AI Risk Management Framework (AI RMF) and the system-card conventions used by OpenAI and Anthropic. Read this before integrating, auditing, or relying on agent outputs for consequential decisions.
Intended use
Dispatch work-order intake, not-to-exceed (NTE) quoting, contractor routing, talent-application scoring, and security-scan initiation across US commercial field-service accounts. Operated by the platform under human ownership.
Out-of-scope
Defense contracting, export-controlled technologies (ITAR), high-risk medical decisions, financial instrument advice, child-directed services, or any use case on the EU AI Act's list of prohibited practices.
Training & tuning data
Base models are proprietary frontier LLMs (Anthropic, OpenAI, Google) accessed via API. SteadyWrk does not train foundation models. Contextual data for routing comes from our own operational history plus public reference corpora.
Eval methodology
Rolling 30-day window across 8 evals (see /evals). Each eval reports a Wilson score confidence interval and a 95% bootstrap interval. Ground truth is the contractor outcome and the payment disposition. Drift is monitored weekly.
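As an illustration of the interval reporting described above, here is a minimal sketch of a Wilson score interval for a pass/fail eval. The function name, argument names, and the default z = 1.96 (a two-sided 95% level) are assumptions made for this example; the card does not state which level the Wilson interval uses or how /evals computes it.

```typescript
// Wilson score interval for a binomial proportion (illustrative sketch).
// Assumption: z = 1.96, i.e. a two-sided 95% interval.
function wilsonInterval(
  successes: number,
  trials: number,
  z: number = 1.96,
): { lower: number; upper: number } {
  const validCounts =
    Number.isInteger(successes) &&
    Number.isInteger(trials) &&
    trials > 0 &&
    successes >= 0 &&
    successes <= trials;
  if (!validCounts) {
    throw new RangeError("expected integers with 0 <= successes <= trials and trials > 0");
  }
  const p = successes / trials; // observed proportion
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  // The interval is centred on a value pulled from p toward 0.5.
  const center = (p + z2 / (2 * trials)) / denom;
  const half =
    (z / denom) * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials));
  // Clamp to [0, 1] to absorb floating-point error at the extremes.
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}

// Hypothetical eval: 90 correct outcomes out of 100 graded cases.
const interval = wilsonInterval(90, 100);
console.log(interval.lower.toFixed(3), interval.upper.toFixed(3));
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and remains usable when the observed rate is close to 0% or 100%, which matters for small 30-day samples.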
Capability limits
Quote accuracy is ~90% within NTE bands. Routing confidence correlates with data completeness. The agent defers to a human operator when confidence is below 70% (QANAT rule 4).
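The deferral rule can be sketched as a small gate. Everything named here (`gate`, `Decision`, the `reason` strings) is hypothetical; only the 70% threshold comes from the card. Sending an invalid confidence value to the human queue is an assumption of this sketch, chosen so that malformed scores fail safe.

```typescript
// Illustrative confidence gate; names are hypothetical, threshold is from the card.
type Decision =
  | { kind: "auto"; confidence: number }
  | { kind: "human_queue"; confidence: number; reason: string };

const CONFIDENCE_THRESHOLD = 0.7; // defer when confidence is below 70%

function gate(confidence: number): Decision {
  // Assumption: NaN or out-of-range scores are treated as "not confident".
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    return { kind: "human_queue", confidence, reason: "invalid confidence" };
  }
  if (confidence < CONFIDENCE_THRESHOLD) {
    return { kind: "human_queue", confidence, reason: "below threshold" };
  }
  return { kind: "auto", confidence };
}

console.log(gate(0.69).kind); // human_queue
console.log(gate(0.7).kind);  // auto
```

The comparison is strict, so a score of exactly 0.70 proceeds automatically, matching the "< 70%" wording.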
Safety mitigations
Claims-based authorization. Zod schema validation on all inputs. Upstash rate limits. Fingerprint tracking for anomalous request patterns. Kill switch via feature flag at the edge.
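To show how two of these mitigations might compose, the sketch below checks a kill-switch flag before applying a rate limit. It is a dependency-free stand-in: it does not use the Upstash or Zod APIs, and `FixedWindowLimiter`, `admit`, and the `agentEnabled` flag are hypothetical names. The fixed-window algorithm is an assumption; the card does not say which limiting strategy is configured.

```typescript
// In-memory fixed-window limiter (stand-in only; not the Upstash SDK).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    if (limit < 1 || windowMs <= 0) {
      throw new RangeError("limit must be >= 1 and windowMs must be > 0");
    }
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.counts.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

type Admission = "ok" | "disabled" | "rate_limited";

// The kill switch is checked first so a disabled agent does no further work.
function admit(
  flags: { agentEnabled: boolean },
  limiter: FixedWindowLimiter,
  key: string,
  nowMs: number,
): Admission {
  if (!flags.agentEnabled) return "disabled";
  return limiter.allow(key, nowMs) ? "ok" : "rate_limited";
}

const limiter = new FixedWindowLimiter(2, 1000);
console.log(admit({ agentEnabled: true }, limiter, "acct-1", 0));  // ok
console.log(admit({ agentEnabled: true }, limiter, "acct-1", 10)); // ok
console.log(admit({ agentEnabled: true }, limiter, "acct-1", 20)); // rate_limited
```

Passing the clock in as `nowMs` keeps the sketch deterministic; a real deployment would rely on the shared store's clock rather than process memory.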
Failure modes
Known: overlapping NTE bands produce ambiguous routing; new accounts without historical data route conservatively; PDF parse errors on non-standard work-order formats fall back to manual review.
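The overlapping-band failure mode can be checked for ahead of time. The sketch below assumes a band is an inclusive dollar range with an identifier; the `NteBand` shape and the `findOverlaps` helper are hypothetical, since the card does not define how bands are stored.

```typescript
// Hypothetical band shape: an inclusive [minUsd, maxUsd] range.
interface NteBand {
  id: string;
  minUsd: number;
  maxUsd: number;
}

// Returns every pair of band ids whose ranges overlap.
// Assumption: bounds are inclusive, so bands sharing an endpoint count as overlapping.
function findOverlaps(bands: NteBand[]): Array<[string, string]> {
  const sorted = [...bands].sort((a, b) => a.minUsd - b.minUsd);
  const overlaps: Array<[string, string]> = [];
  sorted.forEach((a, i) => {
    for (const b of sorted.slice(i + 1)) {
      // Sorted by lower bound: once b starts past a's end, no later band overlaps a.
      if (b.minUsd > a.maxUsd) break;
      overlaps.push([a.id, b.id]);
    }
  });
  return overlaps;
}

const bands: NteBand[] = [
  { id: "standard", minUsd: 0, maxUsd: 500 },
  { id: "extended", minUsd: 400, maxUsd: 1000 },
  { id: "major", minUsd: 1001, maxUsd: 2000 },
];
console.log(findOverlaps(bands)); // [ [ 'standard', 'extended' ] ]
```

A quote of $450 would fall in both "standard" and "extended" here, which is the ambiguous-routing case listed above.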
Escalation & human oversight
Confidence < 70% → human queue. Target override rate: < 3%. Every decision emits an audit event to an append-only log with 7-year retention (QANAT rule 5).
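A minimal sketch of how audit events and the override-rate target could fit together. The event shapes, the `AuditLog` class, and the definition of override rate as override events divided by decision events are all assumptions of this example. It keeps events in memory and does not model durable storage or the 7-year retention.

```typescript
// Hypothetical audit event shapes; field names are illustrative.
type AuditEvent =
  | { type: "decision"; decisionId: string; confidence: number }
  | { type: "override"; decisionId: string; operator: string };

type StoredEvent = AuditEvent & { seq: number };

class AuditLog {
  private readonly events: StoredEvent[] = [];

  // Append is the only write path: there is no update or delete.
  append(event: AuditEvent): number {
    const seq = this.events.length;
    this.events.push(Object.freeze({ ...event, seq }));
    return seq;
  }

  // Callers receive a copy, so they cannot mutate the log's own array.
  all(): readonly StoredEvent[] {
    return this.events.slice();
  }

  // Assumed definition: override events / decision events.
  overrideRate(): number {
    const decisions = this.events.filter((e) => e.type === "decision").length;
    const overrides = this.events.filter((e) => e.type === "override").length;
    return decisions === 0 ? 0 : overrides / decisions;
  }
}

const OVERRIDE_TARGET = 0.03; // from the card: override rate below 3%

const log = new AuditLog();
for (let i = 0; i < 50; i++) {
  log.append({ type: "decision", decisionId: `d-${i}`, confidence: 0.9 });
}
log.append({ type: "override", decisionId: "d-7", operator: "ops-1" });
console.log(log.overrideRate() < OVERRIDE_TARGET); // true (1 override in 50 decisions)
```

Recording an override as its own event, rather than editing the original decision, is what keeps the log append-only.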