Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
Source repo: muratcankoylan (GitHub)
Files: 339
Skill: n/a
Size: 4.3 MB
Entrypoint: SKILL.md
Format: git-repo
researcher/claims/index.jsonl

{"claim_id":"claim-evaluation-browsecomp-variance","claim_text":"BrowseComp-style browsing performance is dominated by token usage, with tool calls and model choice as secondary drivers.","owning_skill":"evaluation","section":"Core Concepts / Performance Drivers","source_url":"docs/blogs.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"high","last_reviewed":"2026-05-15"}
{"claim_id":"claim-multi-agent-token-multiplier","claim_text":"Multi-agent systems can cost substantially more tokens than single-agent chat and should be justified by context isolation or parallel exploration.","owning_skill":"multi-agent-patterns","section":"Core Concepts / Token Economics","source_url":"docs/blogs.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"high","last_reviewed":"2026-05-15"}
{"claim_id":"claim-context-optimization-tool-output-dominance","claim_text":"Tool outputs frequently dominate agent trajectory tokens, so observation masking often yields the largest context-capacity gain.","owning_skill":"context-optimization","section":"Core Concepts / Observation masking","source_url":"docs/claude_research.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-memory-locomo-filesystem-baseline","claim_text":"Filesystem-style memory baselines can outperform more specialized memory tooling on some long-conversation memory benchmarks.","owning_skill":"memory-systems","section":"Core Concepts","source_url":"skills/memory-systems/references/implementation.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"high","last_reviewed":"2026-05-15"}
{"claim_id":"claim-advanced-evaluation-position-swap","claim_text":"Pairwise LLM evaluation should mitigate position bias by judging both response orders and treating disagreement as lower confidence.","owning_skill":"advanced-evaluation","section":"Pairwise Comparison Implementation","source_url":"examples/llm-as-judge-skills/src/tools/evaluation/pairwise-compare.ts","retrieved_at":"2026-05-15","evidence_strength":"derived","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-harness-locked-evaluator","claim_text":"Autonomous loops need locked evaluators and narrow editable surfaces to prevent agents from approving their own weakened metrics.","owning_skill":"harness-engineering","section":"Core Concepts / Harness Boundary","source_url":"https://github.com/karpathy/autoresearch/blob/master/program.md","retrieved_at":"2026-05-15","evidence_strength":"primary","volatility":"low","last_reviewed":"2026-05-15"}
{"claim_id":"claim-context-compression-factory-benchmark","claim_text":"Structured, anchored compression preserves agent task continuity better than generic compression in a production-session probe evaluation, while artifact tracking remains weak across methods.","owning_skill":"context-compression","section":"Core Concepts / Artifact Trail","source_url":"docs/compression.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-context-degradation-lost-middle-ruler","claim_text":"Long-context systems show middle-position recall degradation and advertised context length does not guarantee task performance at that length.","owning_skill":"context-degradation","section":"Core Concepts / Lost-in-Middle","source_url":"docs/claude_research.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-context-degradation-distractor-shuffled","claim_text":"Distractors and context ordering can materially affect retrieval behavior; some shuffled haystack setups outperform coherent ordering for specific retrieval tasks.","owning_skill":"context-degradation","section":"Detailed Topics / Counterintuitive Findings","source_url":"docs/claude_research.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-tool-design-vercel-d0-reduction","claim_text":"Vercel's d0 case study reports better measured outcomes after reducing an agent from many specialized tools to a small primitive tool set.","owning_skill":"tool-design","section":"Core Concepts / Consolidation Principle","source_url":"docs/vercel_tool.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-project-development-vercel-d0-reduction","claim_text":"Vercel's d0 case study shows architectural reduction can improve agent success, latency, token usage, and step count when the underlying data layer is well documented.","owning_skill":"project-development","section":"Detailed Topics / Architectural Reduction","source_url":"docs/vercel_tool.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"medium","last_reviewed":"2026-05-15"}
{"claim_id":"claim-latent-briefing-public-results","claim_text":"Public Latent Briefing results report substantial worker-token reduction, material total-token savings, and low-single-digit-second compaction overhead on long-document QA workloads.","owning_skill":"latent-briefing","section":"Core Concepts / Reference result shape","source_url":"skills/latent-briefing/references/attention-matching-formulation.md","retrieved_at":"2026-05-15","evidence_strength":"secondary","volatility":"high","last_reviewed":"2026-05-15"}
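
The records above share one flat schema of nine string fields. As a minimal sketch of how such an index could be read and checked, the Python below parses JSONL text, rejects records missing any of those fields, and lists the claims marked high volatility. The helper names `parse_claims` and `high_volatility` are hypothetical and are not part of the repository; the two sample records reuse field values from the index, with `claim_text` shortened for the example.

```python
import json
from collections import Counter

# Field names copied from the records in the index above.
REQUIRED_FIELDS = {
    "claim_id", "claim_text", "owning_skill", "section", "source_url",
    "retrieved_at", "evidence_strength", "volatility", "last_reviewed",
}


def parse_claims(jsonl_text):
    """Parse JSONL text into a list of dicts, skipping blank lines.

    Hypothetical helper: raises ValueError if a record lacks a required field.
    """
    claims = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        claims.append(record)
    return claims


def high_volatility(claims):
    """Return the claim_id of every record whose volatility is 'high'."""
    return [c["claim_id"] for c in claims if c["volatility"] == "high"]


# Two sample records; claim_text is shortened here for illustration only.
SAMPLE = "\n".join(json.dumps(r) for r in [
    {
        "claim_id": "claim-evaluation-browsecomp-variance",
        "claim_text": "(shortened for the example)",
        "owning_skill": "evaluation",
        "section": "Core Concepts / Performance Drivers",
        "source_url": "docs/blogs.md",
        "retrieved_at": "2026-05-15",
        "evidence_strength": "secondary",
        "volatility": "high",
        "last_reviewed": "2026-05-15",
    },
    {
        "claim_id": "claim-harness-locked-evaluator",
        "claim_text": "(shortened for the example)",
        "owning_skill": "harness-engineering",
        "section": "Core Concepts / Harness Boundary",
        "source_url": "https://github.com/karpathy/autoresearch/blob/master/program.md",
        "retrieved_at": "2026-05-15",
        "evidence_strength": "primary",
        "volatility": "low",
        "last_reviewed": "2026-05-15",
    },
])

if __name__ == "__main__":
    claims = parse_claims(SAMPLE)
    print(high_volatility(claims))
    print(Counter(c["evidence_strength"] for c in claims))
```

To run the same check against the real file, read `researcher/claims/index.jsonl` as text and pass its contents to `parse_claims`. Treating every field as required is an assumption of this sketch; the repository may define the schema differently.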