Evaluation to Deployment Loop
The Orlo Platform loop is:
- create a task
- upload a dataset
- run an evaluation
- review the recommendation
- create and activate a deployment
- send inference traffic
- collect feedback, traces, and approvals
- promote useful traces or reviewed samples back into evaluation data
Task versioning
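A task is a mutable head that points to a series of immutable task versions. Deployments and evaluations bind to a specific task version rather than to the head, so later edits to the task do not change what they ran against and results stay reproducible.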
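The head-and-versions relationship can be illustrated with a minimal in-memory sketch. This is not Orlo's API: the class names `Task` and `TaskVersion`, the `prompt` field, and the `update` method are all hypothetical, chosen only to show how a consumer that pins a version is unaffected by later edits to the head.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskVersion:
    """Immutable snapshot of a task definition (hypothetical fields)."""
    task_id: str
    number: int
    prompt: str


@dataclass
class Task:
    """Mutable head that tracks the list of immutable versions."""
    task_id: str
    versions: list = field(default_factory=list)

    def update(self, prompt: str) -> TaskVersion:
        # Editing the task never rewrites an old version; it appends a new one.
        version = TaskVersion(self.task_id, len(self.versions) + 1, prompt)
        self.versions.append(version)
        return version

    @property
    def head(self) -> TaskVersion:
        # The head is simply the most recent version.
        return self.versions[-1]


task = Task("summarize-tickets")
pinned = task.update("Summarize the ticket in one sentence.")
# An evaluation or deployment would hold `pinned`, not `task.head`.
task.update("Summarize the ticket in two sentences.")
print(pinned.number, task.head.number)
```

Because `TaskVersion` is a frozen dataclass, any attempt to assign to a field of `pinned` raises an error, which is the property that makes a bound evaluation or deployment reproducible in this sketch.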
Evaluation posture
Evaluations run asynchronously and are budget-bounded. Results are uncertainty-aware: when two models are statistically too close to call, Orlo reports that rather than naming a winner.
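One way to picture a "too close to call" result is a confidence interval on the paired score difference between two models. The sketch below is an assumption for illustration only: the source does not say which statistical method Orlo uses, and the function name `compare`, the normal-approximation interval, and the 1.96 multiplier are choices made here, not documented behavior.

```python
import math
import statistics


def compare(scores_a, scores_b, z=1.96):
    """Compare two models on paired per-sample scores.

    Returns "A", "B", or "too close to call" using an approximate 95%
    confidence interval for the mean paired difference (A minus B).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    low, high = mean - z * stderr, mean + z * stderr
    if low > 0:
        return "A"  # the whole interval favors model A
    if high < 0:
        return "B"  # the whole interval favors model B
    return "too close to call"  # the interval includes zero


clear = compare([0.9, 0.8, 0.95, 0.85], [0.5, 0.4, 0.55, 0.45])
unclear = compare([0.8, 0.6, 0.7, 0.9], [0.7, 0.8, 0.6, 0.85])
print(clear, "|", unclear)
```

With only a handful of samples a normal approximation is rough; a real evaluation would likely need more data or a different interval, which is where a budget bound and an explicit "undecided" outcome interact.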
Deployment posture
Deployments freeze the task version, model, and strategy into a reproducible snapshot. Activation switches the active deployment for a task.
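The snapshot-and-activate behavior can be sketched as a frozen record plus a per-task pointer. Everything named here (`Deployment`, `DeploymentRegistry`, the field names, and the example model and strategy strings) is hypothetical and stands in for whatever Orlo actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Frozen snapshot of task version, model, and strategy (hypothetical fields)."""
    deployment_id: str
    task_id: str
    task_version: int
    model: str
    strategy: str


class DeploymentRegistry:
    def __init__(self):
        self._deployments = {}
        self._active = {}  # task_id -> deployment_id

    def create(self, deployment):
        self._deployments[deployment.deployment_id] = deployment

    def activate(self, deployment_id):
        # Activation only moves the task's pointer; snapshots are never edited.
        deployment = self._deployments[deployment_id]
        self._active[deployment.task_id] = deployment_id

    def active_for(self, task_id):
        return self._deployments[self._active[task_id]]


registry = DeploymentRegistry()
registry.create(Deployment("dep-1", "summarize-tickets", 1, "model-a", "single-shot"))
registry.create(Deployment("dep-2", "summarize-tickets", 2, "model-b", "single-shot"))
registry.activate("dep-1")
registry.activate("dep-2")  # switches the active deployment for the task
print(registry.active_for("summarize-tickets").deployment_id)
```

In this sketch the earlier deployment remains in the registry unchanged after the switch, so activating it again would restore exactly the same snapshot.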
Runtime posture
Live requests can be served in three modes:
- explicit deployment selection
- routing-policy selection from evaluated candidates
- active-deployment fallback when no routing-policy decision can be applied cleanly
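The three modes above can be sketched as a single resolution function. The precedence shown (explicit selection first, then the routing policy, then the fallback) is an assumption, as is the reading of "cleanly" as "the policy returned a choice without raising". The function name, parameters, and deployment IDs are hypothetical.

```python
from typing import Callable, Optional, Tuple


def resolve_deployment(
    explicit_id: Optional[str],
    routing_policy: Optional[Callable[[dict], Optional[str]]],
    active_id: str,
    request: dict,
) -> Tuple[str, str]:
    """Return (deployment_id, mode) for one request."""
    if explicit_id is not None:
        return explicit_id, "explicit"
    if routing_policy is not None:
        try:
            choice = routing_policy(request)
        except Exception:
            choice = None  # a failing policy counts as "no clean decision"
        if choice is not None:
            return choice, "routing-policy"
    return active_id, "active-fallback"


def by_length(request: dict) -> Optional[str]:
    # Hypothetical policy: route short inputs to a cheaper candidate,
    # and decline to decide for anything else.
    return "dep-small" if len(request["text"]) < 20 else None


print(resolve_deployment("dep-9", by_length, "dep-2", {"text": "hi"}))
print(resolve_deployment(None, by_length, "dep-2", {"text": "hi"}))
print(resolve_deployment(None, by_length, "dep-2", {"text": "x" * 50}))
```

Returning the mode alongside the deployment ID makes it possible to record in a trace why a given deployment served the request.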
Improvement posture
Orlo is not limited to offline dataset refresh. Teams can also turn reviewed trace samples and governed agent trajectories into new evaluation data.
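Promoting reviewed traces into evaluation data might look like the following sketch. The `Trace` fields, the `approved` flag, the row layout, and the `promote` function are all hypothetical; the source only states that reviewed trace samples can become evaluation data, not how they are stored or filtered.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """A recorded inference with review metadata (hypothetical fields)."""
    trace_id: str
    input_text: str
    output_text: str
    reviewed: bool
    approved: bool


def promote(traces, dataset):
    """Append reviewed, approved traces to a dataset, skipping duplicates.

    Returns the number of rows added.
    """
    seen = {row["source_trace"] for row in dataset}
    added = 0
    for trace in traces:
        if trace.reviewed and trace.approved and trace.trace_id not in seen:
            dataset.append({
                "input": trace.input_text,
                "expected": trace.output_text,
                "source_trace": trace.trace_id,
            })
            seen.add(trace.trace_id)
            added += 1
    return added


dataset = []
traces = [
    Trace("t1", "Printer offline", "Printer is offline.", True, True),
    Trace("t2", "Login fails", "User cannot log in.", True, False),
    Trace("t3", "Slow VPN", "VPN is slow.", False, False),
]
first = promote(traces, dataset)
second = promote(traces, dataset)  # re-running adds nothing new
print(first, second, len(dataset))
```

Keeping the source trace ID on each row makes the promotion idempotent and leaves a link from evaluation data back to the live request it came from.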