Inference
Orlo exposes two public inference paths:
- OpenAI-compatible chat completions
- task-native governed execution
POST /v1/chat/completions
OpenAI-compatible path.
Required headers:
X-Orlo-Org-IdX-Orlo-Task-Id
Key request fields:
messages- optional
model - optional
temperature - optional
max_tokens
Orlo responds with an OpenAI-style chat.completion object and sets these headers:
x-orlo-modelx-orlo-trace-idx-orlo-validationx-orlo-task-versionx-orlo-routing-mode
x-orlo-routing-mode tells you how the final deployment was selected:
explicit— your request named a deployment directlypolicy— Orlo selected a deployment using task routing policy and evaluated candidatesactive_fallback— Orlo served the task's active deployment because no routing policy decision was usable
POST /v1/tasks/:task_id/run
Task-native inference path.
Key request fields:
input- optional
deployment_id - optional
explain:default,debug, oraudit
debug and audit
When explain is debug or audit, Orlo includes:
- latency
- token usage
- deployment metadata
- validation result
- retrieval attribution
- routing metadata
Routing metadata includes:
- route mode
- selected deployment ID
- fallback deployment ID when present
- ranked model candidates when available
- abstain and escalation flags when the policy layer determined the request should not auto-route cleanly
Example
bash
curl https://api.useorlo.com/v1/tasks/<task-id>/run \
-H 'Content-Type: application/json' \
-H 'X-Orlo-Org-Id: <org-uuid>' \
-d '{
"input": { "question": "What are the reporting deadlines?" },
"explain": "audit"
}'
Notes
- The chat path is useful when you want OpenAI-compatible client behavior.
- The task-native path is better when you want explicit Orlo explainability controls.
- Routing policy only applies when the request does not pin a deployment explicitly.