shared/managed-agents-outcomes.md
# Managed Agents — Outcomes

An **outcome** elevates a session from *conversation* to *work*: you state what "done" looks like, and the harness runs an iterate → grade → revise loop until the artifact meets the rubric, hits `max_iterations`, or is interrupted. A separate **grader** (independent context window) scores each iteration against your rubric and feeds per-criterion gaps back to the agent.

The SDK sets the `managed-agents-2026-04-01` beta header automatically on all `client.beta.sessions.*` calls; no additional header is required for outcomes.

---

## The `user.define_outcome` event

Outcomes are not a field on `sessions.create()`. You create a normal session, then send a `user.define_outcome` event. The agent starts working on receipt — **do not also send a `user.message`** to kick it off.

```python
session = client.beta.sessions.create(
    agent=AGENT_ID,
    environment_id=ENVIRONMENT_ID,
    title="Financial analysis on Costco",
)

client.beta.sessions.events.send(
    session_id=session.id,
    events=[
        {
            "type": "user.define_outcome",
            "description": "Build a DCF model for Costco in .xlsx",
            "rubric": {"type": "text", "content": RUBRIC_MD},
            # or: "rubric": {"type": "file", "file_id": rubric.id}
            "max_iterations": 5,  # optional; default 3, max 20
        }
    ],
)
```

| Field | Type | Notes |
|---|---|---|
| `type` | `"user.define_outcome"` | |
| `description` | string | The task. This is what the agent works toward — no separate `user.message` needed. |
| `rubric` | `{type: "text", content}` \| `{type: "file", file_id}` | **Required.** Markdown with explicit, independently gradeable criteria. Upload once via `client.beta.files.upload(...)` (beta `files-api-2025-04-14`) to reuse across sessions. |
| `max_iterations` | int | Optional. Default **3**, max **20**. |
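The payload above has two easy-to-miss constraints: `rubric` takes exactly one of its two shapes, and `max_iterations` is bounded. A small client-side builder can catch both before the event is sent. This is a sketch, not part of the SDK; the helper name and the validation are ours:

```python
from typing import Optional


def define_outcome_event(
    description: str,
    rubric_text: Optional[str] = None,
    rubric_file_id: Optional[str] = None,
    max_iterations: Optional[int] = None,
) -> dict:
    """Build a `user.define_outcome` event dict for `sessions.events.send`.

    Exactly one of `rubric_text` / `rubric_file_id` must be provided,
    mirroring the two rubric shapes in the table above.
    """
    if (rubric_text is None) == (rubric_file_id is None):
        raise ValueError("pass exactly one of rubric_text or rubric_file_id")
    event = {
        "type": "user.define_outcome",
        "description": description,
        "rubric": (
            {"type": "text", "content": rubric_text}
            if rubric_text is not None
            else {"type": "file", "file_id": rubric_file_id}
        ),
    }
    if max_iterations is not None:
        if not 1 <= max_iterations <= 20:  # server-side max per the table above
            raise ValueError("max_iterations must be in 1..20")
        event["max_iterations"] = max_iterations
    return event
```

Pass the result straight into `events=[...]` on `client.beta.sessions.events.send(...)`.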

The event is echoed back on the stream with a server-assigned `outcome_id` and `processed_at`.

> **Writing rubrics.** Use explicit, gradeable criteria ("CSV has a numeric `price` column"), not vibes ("data looks good") — the grader scores each criterion independently, so vague criteria produce noisy loops. If you don't have a rubric, have Claude analyze a known-good artifact and turn that analysis into one.

---

## Outcome-specific events

These appear on the standard event stream (`sessions.events.stream` / `.list`) alongside the usual `agent.*` / `session.*` events.

| Event | Payload highlights | Meaning |
|---|---|---|
| `span.outcome_evaluation_start` | `outcome_id`, `iteration` (0-indexed) | Grader began scoring iteration *N*. |
| `span.outcome_evaluation_ongoing` | `outcome_id` | Heartbeat while the grader runs. Grader reasoning is opaque — you see *that* it's working, not *what* it's thinking. |
| `span.outcome_evaluation_end` | `outcome_evaluation_start_id`, `outcome_id`, `iteration`, `result`, `explanation`, `usage` | Grader finished one iteration. `result` drives what happens next (table below). |

### `span.outcome_evaluation_end.result`

| `result` | Next |
|---|---|
| `satisfied` | Session → `idle`. Terminal for this outcome. |
| `needs_revision` | Agent starts another iteration. |
| `max_iterations_reached` | No further grader cycles. Agent may run one final revision, then session → `idle`. |
| `failed` | Session → `idle`. Rubric fundamentally doesn't match the task (e.g. description and rubric contradict). |
| `interrupted` | Only emitted if `_start` had already fired before a `user.interrupt` arrived. |
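The `result` column above is what client logic keys on: four of the five values are terminal for the outcome, and only `needs_revision` means the loop continues. A one-line predicate makes that explicit (a sketch; the set and function names are ours):

```python
# Per the table above: every result except `needs_revision` ends the outcome.
TERMINAL_RESULTS = {"satisfied", "max_iterations_reached", "failed", "interrupted"}


def outcome_is_terminal(event: dict) -> bool:
    """True if this stream event (as a plain dict) ends the current outcome."""
    return (
        event.get("type") == "span.outcome_evaluation_end"
        and event.get("result") in TERMINAL_RESULTS
    )
```

Note this tells you the *outcome* is done, not that the session is idle; keep the idle-break gate from the pitfalls section when draining the stream.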

```json
{
  "type": "span.outcome_evaluation_end",
  "id": "sevt_01jkl...",
  "outcome_evaluation_start_id": "sevt_01def...",
  "outcome_id": "outc_01a...",
  "result": "satisfied",
  "explanation": "All 12 criteria met: revenue projections use 5 years of historical data, ...",
  "iteration": 0,
  "usage": { "input_tokens": 2400, "output_tokens": 350, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1800 },
  "processed_at": "2026-03-25T14:03:00Z"
}
```

---

## Checking status & retrieving deliverables

**Status** — either watch the stream for `span.outcome_evaluation_end`, or poll the session and read `outcome_evaluations`:

```python
session = client.beta.sessions.retrieve(session.id)
for ev in session.outcome_evaluations:
    print(f"{ev.outcome_id}: {ev.result}")  # outc_01a...: satisfied
```

**Deliverables** — the agent writes to `/mnt/session/outputs/`. Once idle, fetch via the Files API with `scope_id=session.id`. This is the same session-outputs mechanism documented in `shared/managed-agents-environments.md` → Session outputs (including the dual-beta-header requirement on `files.list`).

---

## Interaction rules & pitfalls

- **One outcome at a time.** Chain by sending the next `user.define_outcome` only after the previous one's terminal `span.outcome_evaluation_end` (`satisfied` / `max_iterations_reached` / `failed` / `interrupted`).
  The session retains history across chained outcomes.
- **Steering is allowed but optional.** You *may* send `user.message` events mid-outcome to nudge direction, but the agent already knows to keep working until terminal — don't send "keep going" prompts.
- **`user.interrupt` pauses the current outcome** — it marks `result: "interrupted"` and leaves the session `idle`, ready for a new outcome or conversational turn.
- **After terminal, the session is reusable** — continue conversationally or define a new outcome.
- **Outcome ≠ session-create field.** Don't put `outcome`, `rubric`, or `description` on `sessions.create()` — outcomes are always sent as a `user.define_outcome` event.
- **Idle-break gate is unchanged.** In your drain loop, keep using `event.type === 'session.status_idle' && event.stop_reason?.type !== 'requires_action'` — do **not** gate on `span.outcome_evaluation_end` alone (on `needs_revision` the session keeps running). See `shared/managed-agents-client-patterns.md` Pattern 5.

For the raw HTTP shapes and per-language SDK bindings beyond Python, WebFetch `https://platform.claude.com/docs/en/managed-agents/define-outcomes.md` (see `shared/live-sources.md`).
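Taken together, the "one outcome at a time" rule and the unchanged idle-break gate mean chaining is safe only once you have seen *both* a terminal `span.outcome_evaluation_end` *and* a non-`requires_action` `session.status_idle`. A sketch of that combined gate over collected event dicts; field names follow the payloads shown earlier, and re-arming the gate on `user.define_outcome` is our assumption about how to reuse it across chained outcomes:

```python
TERMINAL_RESULTS = {"satisfied", "max_iterations_reached", "failed", "interrupted"}


def safe_to_chain(events: list) -> bool:
    """True once it is safe to send the next `user.define_outcome`:
    the current outcome hit a terminal grader result AND the session
    went idle without `stop_reason.type == "requires_action"`."""
    terminal = idle = False
    for ev in events:
        kind = ev.get("type")
        if kind == "user.define_outcome":
            terminal = idle = False  # a new outcome re-arms the gate (our assumption)
        elif kind == "span.outcome_evaluation_end":
            terminal = ev.get("result") in TERMINAL_RESULTS
        elif kind == "session.status_idle":
            stop = ev.get("stop_reason") or {}
            idle = stop.get("type") != "requires_action"
    return terminal and idle
```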