Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/fixtures/activation-cases.jsonl
{"case_id":"activation-evaluation-vs-advanced-judge","prompt":"Create a pairwise LLM-as-judge rubric with position-bias mitigation for comparing two model outputs.","expected_primary_skill":"advanced-evaluation","acceptable_secondary_skills":["evaluation"],"rejected_skills":["context-optimization","project-development"],"reason":"Judge design and pairwise bias mitigation belong to advanced-evaluation, not general evaluation."}
{"case_id":"activation-evaluation-general-quality-gate","prompt":"Build a deterministic quality gate and regression test suite for an agent pipeline before deployment.","expected_primary_skill":"evaluation","acceptable_secondary_skills":["advanced-evaluation","harness-engineering"],"rejected_skills":["tool-design"],"reason":"General quality gates and regression suites belong to evaluation unless judge-specific calibration or autonomous control-surface design dominates."}
{"case_id":"activation-compression-vs-optimization","prompt":"Summarize a long agent session into a compact handoff that preserves files, decisions, risks, and next actions.","expected_primary_skill":"context-compression","acceptable_secondary_skills":["filesystem-context"],"rejected_skills":["context-optimization"],"reason":"The work is compaction and handoff preservation rather than broader token-budget optimization."}
{"case_id":"activation-optimization-vs-compression","prompt":"Reduce token cost by masking tool outputs, partitioning context across subagents, and improving cache hit rate.","expected_primary_skill":"context-optimization","acceptable_secondary_skills":["context-compression","multi-agent-patterns"],"rejected_skills":["memory-systems"],"reason":"The primary goal is context efficiency through masking, partitioning, and caching."}
{"case_id":"activation-filesystem-vs-memory","prompt":"Store large tool outputs and terminal logs in searchable files so agents can retrieve only relevant slices later.","expected_primary_skill":"filesystem-context","acceptable_secondary_skills":["context-optimization"],"rejected_skills":["memory-systems"],"reason":"This is file-backed context offloading, not long-term entity memory."}
{"case_id":"activation-memory-vs-filesystem","prompt":"Design cross-session entity memory with temporal validity, graph traversal, and retrieval updates for a personal assistant.","expected_primary_skill":"memory-systems","acceptable_secondary_skills":["filesystem-context"],"rejected_skills":["context-compression"],"reason":"The work is persistent memory architecture with temporal/entity semantics."}
{"case_id":"activation-harness-vs-project","prompt":"Design an autonomous research loop with locked rubrics, editable drafts, rollback, novelty gates, and human merge approval.","expected_primary_skill":"harness-engineering","acceptable_secondary_skills":["project-development","evaluation"],"rejected_skills":["hosted-agents"],"reason":"The core problem is control surfaces and governance around autonomy."}
{"case_id":"activation-project-vs-harness","prompt":"Decide whether an LLM batch pipeline is appropriate, estimate cost, and structure acquire prepare process parse render stages.","expected_primary_skill":"project-development","acceptable_secondary_skills":["evaluation"],"rejected_skills":["harness-engineering"],"reason":"The core problem is project fit and pipeline shape, not autonomous loop governance."}
{"case_id":"activation-fundamentals-vs-degradation","prompt":"Explain why context windows degrade as they fill, and how attention mechanics make middle-of-context information less recoverable.","expected_primary_skill":"context-fundamentals","acceptable_secondary_skills":["context-degradation","context-optimization"],"rejected_skills":["project-development"],"reason":"Foundational explanation of attention mechanics is context-fundamentals; degradation and optimization are adjacent operational skills."}
{"case_id":"activation-fundamentals-onboarding","prompt":"Explain to a new team member what context is in an agent system and why context quality matters more than raw token count.","expected_primary_skill":"context-fundamentals","acceptable_secondary_skills":[],"rejected_skills":["context-degradation","context-optimization","project-development","evaluation"],"reason":"Onboarding and conceptual explanation of context is context-fundamentals; not an operational debugging or optimization task."}
{"case_id":"activation-fundamentals-vs-optimization","prompt":"Explain the conceptual trade-off between including more context up-front versus retrieving smaller targeted context, and why this trade-off matters for agent design.","expected_primary_skill":"context-fundamentals","acceptable_secondary_skills":["context-optimization","project-development"],"rejected_skills":["memory-systems","bdi-mental-states"],"reason":"Conceptual trade-off explanation is context-fundamentals; context-optimization owns the actual optimization technique selection."}
{"case_id":"activation-tool-vs-project-structured-output","prompt":"Outline why structured output design improves downstream parsing and recommend prompt patterns for it.","expected_primary_skill":"project-development","acceptable_secondary_skills":["tool-design"],"rejected_skills":["context-degradation"],"reason":"Structured output design at the pipeline-shape level is project-development; tool-design owns the per-tool schema, project-development owns the pipeline contract."}
{"case_id":"activation-tool-individual-tool","prompt":"Help me make my agent's individual tool schemas, names, and error messages clearer so the agent picks the right tool more often.","expected_primary_skill":"tool-design","acceptable_secondary_skills":["context-optimization","project-development"],"rejected_skills":["memory-systems","latent-briefing"],"reason":"Per-tool description quality, response formats, and error messages are tool-design; project-development handles project-level decisions, not individual tool tuning."}
{"case_id":"activation-tool-consolidation","prompt":"My agent has 17 specialized tools and I want to consolidate them down to a smaller, more general set without losing capability.","expected_primary_skill":"tool-design","acceptable_secondary_skills":["project-development"],"rejected_skills":["context-degradation","memory-systems"],"reason":"Tool-set consolidation is the tool-design pattern; project-development can advise on architectural reduction at project scope but the unit of work here is the tool catalog."}
{"case_id":"activation-bdi-vs-memory","prompt":"Transform RDF triples about a scheduled meeting into beliefs, desires, intentions, justifications, validity intervals, and an action plan that can be queried with SPARQL.","expected_primary_skill":"bdi-mental-states","acceptable_secondary_skills":["memory-systems"],"rejected_skills":["filesystem-context","context-compression"],"reason":"Formal belief-desire-intention modeling over RDF belongs to bdi-mental-states, not ordinary persistent memory or summarization."}
{"case_id":"activation-degradation-poisoning","prompt":"My agent keeps repeating an incorrect retrieved fact even after I correct it. Diagnose whether this is context poisoning, clash, or a prompt issue and propose recovery.","expected_primary_skill":"context-degradation","acceptable_secondary_skills":["evaluation"],"rejected_skills":["context-compression","context-fundamentals"],"reason":"An active context failure diagnosis belongs to context-degradation; compression may be a later mitigation."}
{"case_id":"activation-hosted-vs-harness","prompt":"Design remote sandboxes, warm pools, snapshots, session teardown, user-authored PR creation, and streaming clients for background coding agents.","expected_primary_skill":"hosted-agents","acceptable_secondary_skills":["harness-engineering","tool-design"],"rejected_skills":["context-optimization","bdi-mental-states"],"reason":"Hosted runtime infrastructure belongs to hosted-agents, while harness-engineering owns governance around the loop."}
{"case_id":"activation-latent-briefing-vs-memory","prompt":"Workers in my orchestrator-worker agent need task-relevant orchestrator trajectory state without replaying all text, and I control the worker model KV cache.","expected_primary_skill":"latent-briefing","acceptable_secondary_skills":["multi-agent-patterns","context-optimization"],"rejected_skills":["memory-systems","context-compression"],"reason":"Representation-level KV cache compaction for worker handoff belongs to latent-briefing, not ordinary memory or text summaries."}
{"case_id":"activation-multi-agent-topology","prompt":"Choose between supervisor, swarm, and hierarchical agents, define explicit handoff protocols, and decide how to validate worker outputs before aggregation.","expected_primary_skill":"multi-agent-patterns","acceptable_secondary_skills":["evaluation","hosted-agents"],"rejected_skills":["project-development"],"reason":"Agent topology and handoff protocols belong to multi-agent-patterns; hosted-agents is adjacent only when remote runtime infrastructure dominates."}
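
Each record in the fixture pairs a prompt with one expected primary skill, a list of acceptable secondary skills, and a list of rejected skills. The sketch below shows one way such records might be parsed and sanity-checked. It is an illustration under stated assumptions, not the repository's own harness: the function names `load_cases`, `check_case`, and `grade`, and the four outcome labels returned by `grade`, are hypothetical. The two sample records are copied from the fixture; only the field names are taken from the source.

```python
import json

# Two records copied from the fixture; a real run would read the whole file instead.
SAMPLE = """\
{"case_id":"activation-evaluation-vs-advanced-judge","prompt":"Create a pairwise LLM-as-judge rubric with position-bias mitigation for comparing two model outputs.","expected_primary_skill":"advanced-evaluation","acceptable_secondary_skills":["evaluation"],"rejected_skills":["context-optimization","project-development"],"reason":"Judge design and pairwise bias mitigation belong to advanced-evaluation, not general evaluation."}
{"case_id":"activation-fundamentals-onboarding","prompt":"Explain to a new team member what context is in an agent system and why context quality matters more than raw token count.","expected_primary_skill":"context-fundamentals","acceptable_secondary_skills":[],"rejected_skills":["context-degradation","context-optimization","project-development","evaluation"],"reason":"Onboarding and conceptual explanation of context is context-fundamentals; not an operational debugging or optimization task."}
"""

# Field names as they appear in the fixture records.
REQUIRED_KEYS = {
    "case_id",
    "prompt",
    "expected_primary_skill",
    "acceptable_secondary_skills",
    "rejected_skills",
    "reason",
}


def load_cases(text):
    """Parse JSONL text (one JSON object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_case(case):
    """Return a list of consistency problems for one record (empty if none)."""
    missing = REQUIRED_KEYS - case.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    primary = case["expected_primary_skill"]
    secondary = set(case["acceptable_secondary_skills"])
    rejected = set(case["rejected_skills"])
    if primary in secondary:
        problems.append("primary skill is also listed as secondary")
    if primary in rejected:
        problems.append("primary skill is also listed as rejected")
    if secondary & rejected:
        problems.append(f"skills both acceptable and rejected: {sorted(secondary & rejected)}")
    return problems


def grade(case, activated_skill):
    """Hypothetical grading: classify the skill a router actually activated."""
    if activated_skill == case["expected_primary_skill"]:
        return "primary"
    if activated_skill in case["acceptable_secondary_skills"]:
        return "secondary"
    if activated_skill in case["rejected_skills"]:
        return "rejected"
    return "other"


if __name__ == "__main__":
    for case in load_cases(SAMPLE):
        print(case["case_id"], check_case(case))
```

Keeping the structural checks in `check_case` separate from `grade` lets a malformed record be reported on its own, before any routing result is scored against it.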