If every run is only success or failure, the system hides the part that actually needs judgment
An automation job is rarely a single instantaneous action. It usually moves through states such as waiting to start, queued, running, waiting for an external callback, partially completed, retrying, under manual handling, terminated, or completed. If the system records only the final outcome, the team sees the ending but not how the job arrived there.
That directly affects both operations and product decisions. Did the failed run come from a timeout, an idempotency conflict, incomplete input, or a manual state change upstream? If every exception is flattened into “failure,” alerts become misleading, retries fire where they cannot help, and people end up guessing from raw logs.
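One way to keep the path visible is to append each state change to a history instead of overwriting a single status field. The following is a minimal sketch in Python, not a description of any particular system: the names `Transition`, `RunRecord`, and `move_to`, the state strings, and the run ID are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Transition:
    state: str    # hypothetical state name, e.g. "queued" or "failed"
    reason: str   # why the run entered this state
    at: datetime  # when the transition was recorded (UTC)


@dataclass
class RunRecord:
    run_id: str
    history: list[Transition] = field(default_factory=list)

    def move_to(self, state: str, reason: str = "") -> None:
        # Append rather than overwrite, so the route to the outcome survives.
        self.history.append(Transition(state, reason, datetime.now(timezone.utc)))

    @property
    def current(self) -> str:
        # The latest transition is the current state.
        return self.history[-1].state if self.history else "not_started"

    def path(self) -> list[str]:
        # The ordered list of states the run has passed through.
        return [t.state for t in self.history]


# Illustrative run: the final state alone would only say "failed".
run = RunRecord("run-001")
run.move_to("queued")
run.move_to("running")
run.move_to("waiting_for_callback", reason="request sent to external system")
run.move_to("failed", reason="timeout: no callback received")
```

With a record like this, the question "did it fail from a timeout or from bad input?" can be answered from `run.history[-1].reason`, and `run.path()` shows that the run got as far as waiting for a callback before it failed.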
At minimum, distinguish queued, running, waiting for callback, retryable failure, non-retryable failure, and manual handover
State names should reflect handling meaning, not just technical result codes
If operations still need to ask engineering what kind of failure happened, the state model is too weak
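The points above can be sketched as a small state model. This is an illustration under stated assumptions rather than a recommended policy: the six states come from the minimum set listed above, while the cause names, the cause-to-state mapping, and the `NEXT_ACTION` wording are hypothetical and would differ per system.

```python
from enum import Enum


class RunState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING_FOR_CALLBACK = "waiting_for_callback"
    RETRYABLE_FAILURE = "retryable_failure"
    NON_RETRYABLE_FAILURE = "non_retryable_failure"
    MANUAL_HANDOVER = "manual_handover"


# Assumed mapping from failure cause to state. Which cause belongs in which
# state is a policy decision; these assignments are only examples.
CAUSE_TO_STATE = {
    "timeout": RunState.RETRYABLE_FAILURE,
    "idempotency_conflict": RunState.NON_RETRYABLE_FAILURE,
    "incomplete_input": RunState.NON_RETRYABLE_FAILURE,
    "upstream_manual_change": RunState.MANUAL_HANDOVER,
}

# Each state names what should happen next, so the state itself carries
# handling meaning instead of a bare result code. Wording is illustrative.
NEXT_ACTION = {
    RunState.QUEUED: "wait for a worker",
    RunState.RUNNING: "no action",
    RunState.WAITING_FOR_CALLBACK: "wait for the external system",
    RunState.RETRYABLE_FAILURE: "retry automatically",
    RunState.NON_RETRYABLE_FAILURE: "fix the input or request, then resubmit",
    RunState.MANUAL_HANDOVER: "assign to an operator",
}


def classify_failure(cause: str) -> RunState:
    # Unknown causes go to a person rather than being retried blindly.
    return CAUSE_TO_STATE.get(cause, RunState.MANUAL_HANDOVER)


state = classify_failure("timeout")
print(state.value, "->", NEXT_ACTION[state])
```

Sending unrecognized causes to manual handover is a deliberate choice in this sketch: it keeps automatic retries limited to failures that are known to be safe to repeat.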