Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.

Source repo: muratcankoylan (GitHub)

researcher/benchmarks/effectiveness/tasks/001-filesystem-context-offload/README.md

# 001 - Filesystem context offload

## Hypothesis

An agent equipped with the `filesystem-context` skill will:

1. Write the simulated tool output to a file under `scratch/` instead of returning it inline.
2. Use targeted retrieval (grep + read with line ranges) to answer the follow-up question without re-loading the full payload.
3. Use noticeably fewer total tokens than the control condition.

A control agent (no skills) is expected to dump the full payload back into context or otherwise inflate token usage.
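The offload-then-retrieve pattern in the hypothesis can be sketched in Python as follows. This is a minimal illustration, not the skill's implementation: the helper names `offload`, `grep`, and `read_range` are hypothetical, and the default file name is an assumption.

```python
from pathlib import Path


def offload(payload: str, scratch_dir: Path, name: str = "tool_output.txt") -> Path:
    """Write a large payload under scratch/ and return only its path."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / name
    target.write_text(payload, encoding="utf-8")
    return target


def grep(path: Path, needle: str) -> list[int]:
    """Return the 1-based numbers of the lines that contain `needle`."""
    with path.open(encoding="utf-8") as fh:
        return [n for n, line in enumerate(fh, start=1) if needle in line]


def read_range(path: Path, start: int, end: int) -> list[str]:
    """Return lines start..end (1-based, inclusive) and stop reading after `end`."""
    selected = []
    with path.open(encoding="utf-8") as fh:
        for n, line in enumerate(fh, start=1):
            if n > end:
                break
            if n >= start:
                selected.append(line.rstrip("\n"))
    return selected
```

An agent following this pattern would call `grep(path, "API_RATE_LIMIT")` and then `read_range` on a small window around the hit, so only a handful of lines enter the context instead of the whole payload.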

## Setup

The `starting/` directory contains:

- `tool_output.txt`: ~5,000 lines of synthetic agent-trace data with one targeted fact buried at line 4321.
- `instructions.md`: brief reminder of what files are present.

The agent receives `task.md` as its prompt.
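A fixture with the shape described above could be generated along these lines. This is a sketch under assumptions: the filler line format is invented for illustration, the total is fixed at exactly 5,000 lines, and the function names are hypothetical. Only the fact string and its line position come from this README.

```python
from pathlib import Path

TOTAL_LINES = 5000            # README says "~5,000 lines"; exact count is assumed
FACT_LINE = 4321              # 1-based position of the targeted fact
FACT = "API_RATE_LIMIT=8475"  # the fact that grading looks for


def build_tool_output(total: int = TOTAL_LINES,
                      fact_line: int = FACT_LINE,
                      fact: str = FACT) -> str:
    """Return synthetic trace text with `fact` on line `fact_line`."""
    lines = []
    for n in range(1, total + 1):
        if n == fact_line:
            lines.append(fact)
        else:
            # Hypothetical filler; the real fixture's format is not shown here.
            lines.append(f"step={n:05d} event=trace status=ok")
    return "\n".join(lines) + "\n"


def write_fixture(starting_dir: Path) -> Path:
    """Write the synthetic payload to starting_dir/tool_output.txt."""
    starting_dir.mkdir(parents=True, exist_ok=True)
    path = starting_dir / "tool_output.txt"
    path.write_text(build_tool_output(), encoding="utf-8")
    return path
```

Keeping the fact on a single, unique line makes the task answerable with one grep hit, which is what lets targeted retrieval beat loading the whole file.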

## Grading

`verify.sh` checks:

1. `scratch/` directory exists (skill behavior expected).
2. At least one file in `scratch/` contains lines copied from `tool_output.txt` (the agent actually offloaded).
3. The agent's final response contains the targeted fact (`API_RATE_LIMIT=8475`).

A run passes when all three checks pass. Token cost and wall time are recorded regardless and reported as effect sizes against the `control` condition.
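The three checks can be expressed in Python roughly as below. This is not the contents of `verify.sh`, which is a shell script not reproduced here. The function name `check_run`, its parameters, and the rule that a single verbatim non-blank line counts as "copied" are all assumptions.

```python
from pathlib import Path


def check_run(workdir: Path, source: Path, final_response: str,
              fact: str = "API_RATE_LIMIT=8475") -> dict:
    """Evaluate the three grading checks and an overall pass flag."""
    scratch = workdir / "scratch"

    # Check 1: the scratch/ directory exists.
    scratch_exists = scratch.is_dir()

    # Check 2: some file under scratch/ shares at least one non-blank line
    # with the original payload (assumed threshold: one matching line).
    offloaded = False
    if scratch_exists and source.is_file():
        source_lines = {
            line for line in source.read_text(encoding="utf-8").splitlines()
            if line.strip()
        }
        for candidate in scratch.rglob("*"):
            if not candidate.is_file():
                continue
            copied = candidate.read_text(encoding="utf-8", errors="ignore").splitlines()
            if any(line in source_lines for line in copied):
                offloaded = True
                break

    # Check 3: the final response states the targeted fact.
    fact_found = fact in final_response

    results = {
        "scratch_exists": scratch_exists,
        "offloaded": offloaded,
        "fact_found": fact_found,
    }
    results["passed"] = all(results.values())
    return results
```

Returning the individual check results alongside `passed` makes it possible to see which behavior failed, for example a run that found the fact but never offloaded.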

## Categories of behavior we expect to differentiate

- `control`: agent likely returns the full output inline or fails to find the fact; high token usage.
- `target` (`filesystem-context` loaded): agent should offload the payload and retrieve the fact with targeted reads; lower token usage, success.
- `negative` (`bdi-mental-states` loaded, `filesystem-context` absent): expected to behave like `control`.
- `full` (all skills): success rate should match `target`; token usage may be slightly higher from the extra context.
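Token usage for each condition is compared against `control` as an effect size. This README does not say which statistic the benchmark uses; the sketch below assumes Cohen's d with a pooled standard deviation, and the function name `cohens_d` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(condition: list[float], control: list[float]) -> float:
    """Standardized mean difference (condition minus control), pooled SD.

    Assumed statistic; each sample needs at least two observations.
    """
    n1, n2 = len(condition), len(control)
    s1, s2 = stdev(condition), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled == 0:
        raise ValueError("pooled standard deviation is zero; effect size undefined")
    return (mean(condition) - mean(control)) / pooled
```

With token counts as input, a negative value means the condition used fewer tokens than `control`. Under the expectations listed above, `target` would show a clearly negative value and `negative` a value near zero.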