Evaluation to Deployment Loop

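The Orlo Platform loop runs from task versioning through evaluation and deployment to live serving, then feeds production data back into evaluation.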

Task versioning

Tasks are mutable heads backed by immutable task versions. Deployments and evaluations bind to a specific task version rather than to the head, so their results stay reproducible after the task is edited.
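
A minimal sketch of the head/version split, assuming a task is identified by a string ID and carries a single prompt field. The names `Task`, `TaskVersion`, `update`, `head`, and `get` are hypothetical illustrations, not Orlo's API. Editing a task appends a new frozen version, so anything holding a reference to an older version keeps seeing exactly what it was bound to.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TaskVersion:
    """Immutable snapshot of a task definition (hypothetical fields)."""
    task_id: str
    version: int
    prompt: str


@dataclass
class Task:
    """Mutable head: each edit appends a new version instead of changing old ones."""
    task_id: str
    versions: List[TaskVersion] = field(default_factory=list)

    def update(self, prompt: str) -> TaskVersion:
        # Create a new immutable version; earlier versions are never modified.
        v = TaskVersion(self.task_id, len(self.versions) + 1, prompt)
        self.versions.append(v)
        return v

    @property
    def head(self) -> TaskVersion:
        # The head is simply the most recent version.
        return self.versions[-1]

    def get(self, version: int) -> TaskVersion:
        # Versions are numbered from 1 in this sketch.
        return self.versions[version - 1]
```

An evaluation pinned to the result of the first `update` call still refers to version 1 after the task is edited again, even though `head` has moved to version 2.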

Evaluations run asynchronously and within a set budget. Results are uncertainty-aware: if two models are statistically too close to call, Orlo reports that instead of declaring a winner.
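
One way to produce a "too close to call" verdict is a paired comparison with a confidence interval on the score difference. The sketch below assumes both models were scored on the same examples, caps the work at `budget` examples as a stand-in for a spend limit, and uses a normal-approximation 95% interval. The function `compare` and its parameters are hypothetical; this shows the statistical idea only, not Orlo's actual test or its async scheduling.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def compare(scores_a: Sequence[float], scores_b: Sequence[float],
            budget: int, z: float = 1.96) -> str:
    """Paired comparison of two models scored on the same examples.

    Only the first `budget` examples are used (a stand-in for a spend cap).
    Returns "A", "B", or "too close to call".
    """
    n = min(budget, len(scores_a), len(scores_b))
    if n < 2:
        # Not enough paired data to estimate variance.
        return "too close to call"
    diffs = [a - b for a, b in zip(scores_a[:n], scores_b[:n])]
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    lo, hi = d - z * se, d + z * se
    if lo > 0:
        return "A"  # whole interval favors A
    if hi < 0:
        return "B"  # whole interval favors B
    return "too close to call"  # interval includes zero
```

With per-example scores `[1, 1, 1, 1, 0]` for A and all zeros for B, the interval on the difference stays above zero and A wins; with alternating wins and losses the interval straddles zero and the verdict is "too close to call".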

Deployments freeze the task version, model, and strategy into a reproducible snapshot. Activation switches the active deployment for a task.
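
A minimal sketch of frozen deployments plus a per-task active pointer, assuming deployments are keyed by a string ID and that a strategy can be named by a string. `Deployment`, `DeploymentRegistry`, `create`, `activate`, and `active` are hypothetical names. The design point is that activation moves only the pointer; the snapshot itself never changes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Deployment:
    """Frozen snapshot of task version, model, and strategy (hypothetical fields)."""
    deployment_id: str
    task_id: str
    task_version: int
    model: str
    strategy: str


class DeploymentRegistry:
    def __init__(self) -> None:
        self._deployments: Dict[str, Deployment] = {}
        self._active: Dict[str, str] = {}  # task_id -> active deployment_id

    def create(self, dep: Deployment) -> None:
        self._deployments[dep.deployment_id] = dep

    def activate(self, deployment_id: str) -> None:
        # Switch which deployment is active for its task; the snapshot is untouched.
        dep = self._deployments[deployment_id]
        self._active[dep.task_id] = deployment_id

    def active(self, task_id: str) -> Optional[Deployment]:
        dep_id = self._active.get(task_id)
        if dep_id is None:
            return None
        return self._deployments[dep_id]
```

Because old snapshots stay in the registry, rolling back is just another `activate` call on an earlier deployment ID.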

Live requests can now be served in three modes:

explicit deployment selection
routing-policy selection from evaluated candidates
active-deployment fallback when no routing policy decision can be applied cleanly
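
The three modes can be read as a resolution order: an explicit selection wins, otherwise a routing policy may pick among evaluated candidates, and otherwise the task's active deployment serves the request. The sketch below assumes a routing policy is a callable that returns a candidate ID or `None` when it cannot decide cleanly. `Request`, `RoutingPolicy`, and `resolve` are hypothetical, and rejecting an explicit ID that does not belong to the task is an assumed design choice, not documented Orlo behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Request:
    task_id: str
    deployment_id: Optional[str] = None  # explicit selection, if the caller set one


# Hypothetical policy signature: pick one candidate ID, or None if undecided.
RoutingPolicy = Callable[[Request, Sequence[str]], Optional[str]]


def resolve(req: Request,
            deployments: Dict[str, str],  # deployment_id -> task_id
            candidates: Sequence[str],    # evaluated candidate deployment IDs
            policy: Optional[RoutingPolicy],
            active: Dict[str, str]        # task_id -> active deployment_id
            ) -> Tuple[str, str]:
    """Return (deployment_id, mode) for a live request."""
    # 1. Explicit deployment selection.
    if req.deployment_id is not None:
        if deployments.get(req.deployment_id) != req.task_id:
            raise ValueError(f"unknown deployment for task: {req.deployment_id}")
        return req.deployment_id, "explicit"
    # 2. Routing policy; accept only a pick that is one of the candidates.
    if policy is not None and candidates:
        choice = policy(req, candidates)
        if choice in candidates:
            return choice, "routing"
    # 3. Fall back to the task's active deployment.
    if req.task_id in active:
        return active[req.task_id], "active"
    raise LookupError(f"no deployment available for task {req.task_id}")
```

Treating an out-of-set or `None` policy result as "no clean decision" keeps a misbehaving policy from routing traffic to a deployment that was never evaluated.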

Orlo is not limited to offline dataset refresh. Teams can also turn reviewed trace samples and governed agent trajectories into new evaluation data.
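
A minimal sketch of the reviewed-trace path only (agent trajectories are not modeled), assuming each trace records a review verdict and an optional reviewer correction. `Trace`, `EvalExample`, `traces_to_examples`, and the `"approved"` label are hypothetical. Only approved traces become examples, and a reviewer's correction takes precedence over the logged output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Trace:
    """A logged request/response pair (hypothetical fields)."""
    trace_id: str
    prompt: str
    output: str
    review: Optional[str] = None            # e.g. "approved", "rejected", or None
    corrected_output: Optional[str] = None  # reviewer-supplied fix, if any


@dataclass(frozen=True)
class EvalExample:
    prompt: str
    expected: str
    source_trace: str  # keep provenance back to the original trace


def traces_to_examples(traces: Iterable[Trace]) -> List[EvalExample]:
    examples: List[EvalExample] = []
    for t in traces:
        if t.review != "approved":
            continue  # unreviewed or rejected traces never become eval data
        expected = t.corrected_output or t.output
        examples.append(EvalExample(t.prompt, expected, t.trace_id))
    return examples
```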