Loading source
Pulling the file list, source metadata, and syntax-aware rendering for this listing.
Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
Files
Skill
Size
Entrypoint
Format
Open file
Syntax-highlighted preview of this file as included in the skill package.
researcher/benchmarks/effectiveness/tasks/001-filesystem-context-offload/README.md
1# 001 - Filesystem context offload23## Hypothesis45An agent equipped with the `filesystem-context` skill will:671. Write the simulated tool output to a file under `scratch/` instead of returning it inline.82. Use targeted retrieval (grep + read with line ranges) to answer the follow-up question without re-loading the full payload.93. Use noticeably fewer total tokens than the control condition.1011A control agent (no skills) is expected to dump the full payload back into context or otherwise inflate token usage.1213## Setup1415The `starting/` directory contains:1617- `tool_output.txt`: ~5,000 lines of synthetic agent-trace data with one targeted fact buried at line 4321.18- `instructions.md`: brief reminder of what files are present.1920The agent receives `task.md` as its prompt.2122## Grading2324`verify.sh` checks:25261. `scratch/` directory exists (skill behavior expected).272. At least one file in `scratch/` contains lines copied from `tool_output.txt` (the agent actually offloaded).283. The agent's final response contains the targeted fact (`API_RATE_LIMIT=8475`).2930A run passes when all three checks pass. Token cost and wall time are recorded regardless and reported as effect sizes against the `control` condition.3132## Categories of behavior we expect to differentiate3334- `control`: agent likely returns the full output inline or fails to find the fact; high tokens.35- `target` (filesystem-context loaded): agent should offload and retrieve targeted; lower tokens, success.36- `negative` (bdi-mental-states loaded, filesystem-context absent): equivalent to control.37- `full` (all skills): success rate should match `target`; tokens may be slightly higher from extra context.38