skills/context-optimization/references/optimization_techniques.md
# Context Optimization Reference

This document provides a detailed technical reference for context optimization techniques and strategies.

## Compaction Strategies

### Summary-Based Compaction

Summary-based compaction replaces verbose content with concise summaries while preserving key information. The approach works by identifying sections that can be compressed, generating summaries that capture the essential points, and replacing the full content with those summaries.

The effectiveness of compaction depends on what information is preserved. Critical decisions, user preferences, and current task state should never be compacted. Intermediate results and supporting evidence can be summarized more aggressively. Boilerplate, repeated information, and exploratory reasoning can often be removed entirely.

### Token Budget Allocation

Effective context budgeting requires understanding how different context components consume tokens and allocating budget strategically:

| Component | Typical Range | Notes |
|-----------|---------------|-------|
| System prompt | 500-2000 tokens | Stable across session |
| Tool definitions | 100-500 per tool | Grows with tool count |
| Retrieved documents | Variable | Often largest consumer |
| Message history | Variable | Grows with conversation |
| Tool outputs | Variable | Can dominate context |

### Compaction Thresholds

Trigger compaction at appropriate thresholds to maintain performance:

- Warning threshold at 70% of the effective context limit
- Compaction trigger at 80% of the effective context limit
- Aggressive compaction at 90% of the effective context limit

The exact thresholds depend on model behavior and task characteristics. Some models show graceful degradation while others exhibit sharp performance cliffs.

## Observation Masking Patterns

### Selective Masking

Not all observations should be masked equally.
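A simple way to decide is to score each observation's relevance, combining recency with overlap against the current task. The sketch below is illustrative only: the weights, the `"turn"` and `"content"` field names, and the keyword-overlap heuristic are assumptions, not a prescribed implementation.

```python
def observation_relevance(obs: dict, current_task_keywords: set,
                          current_turn: int) -> float:
    """Score an observation's relevance for masking decisions (0.0-1.0).

    Heuristic sketch: blends recency with keyword overlap against the
    current task. Field names ("turn", "content") are illustrative.
    """
    # Observations from the most recent turn are always kept.
    if obs["turn"] == current_turn:
        return 1.0

    # Recency decays as the observation ages.
    age = current_turn - obs["turn"]
    recency = 1.0 / (1.0 + age)

    # Overlap between observation content and current task keywords.
    words = set(obs["content"].lower().split())
    overlap = len(words & current_task_keywords) / max(len(current_task_keywords), 1)

    # Equal weights are an arbitrary starting point to tune.
    return 0.5 * recency + 0.5 * overlap
```

A score near 1.0 keeps the observation verbatim; low scores mark it as a candidate for masking.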
Consider masking observations that have served their purpose and are no longer needed for active reasoning. Keep observations that are central to the current task, observations from the most recent turn, and observations that may be referenced again.

### Masking Implementation

```python
from typing import Dict, List

# calculate_relevance, store_for_reference, and summarize_content are
# assumed helpers provided elsewhere in the codebase.

def selective_mask(observations: List[Dict], current_task: Dict) -> List[Dict]:
    """
    Selectively mask observations based on relevance.

    Returns observations with a "masked" field indicating masked content.
    """
    masked = []

    for obs in observations:
        relevance = calculate_relevance(obs, current_task)

        if relevance < 0.3 and obs["age"] > 3:
            # Low relevance and old - mask
            masked.append({
                **obs,
                "masked": True,
                "reference": store_for_reference(obs["content"]),
                "summary": summarize_content(obs["content"])
            })
        else:
            masked.append({
                **obs,
                "masked": False
            })

    return masked
```

## KV-Cache Optimization

### Prefix Stability

KV-cache hit rates depend on prefix stability. Stable prefixes enable cache reuse across requests; dynamic prefixes invalidate the cache and force recomputation.

Elements that should remain stable include system prompts, tool definitions, and frequently used templates. Elements that may vary include timestamps, session identifiers, and query-specific content.

### Cache-Friendly Design

Design prompts to maximize cache hit rates:

1. Place stable content at the beginning
2. Use consistent formatting across requests
3. Avoid dynamic content in prompts when possible
4. Use placeholders for dynamic content

```python
from datetime import datetime

# Cache-unfriendly: dynamic timestamp in the prompt
system_prompt = f"""
Current time: {datetime.now().isoformat()}
You are a helpful assistant.
"""

# Cache-friendly: stable prompt with the dynamic time supplied separately
system_prompt = """
You are a helpful assistant.
Current time is provided separately when relevant.
"""
```

## Context Partitioning Strategies

### Sub-Agent Isolation

Partition work across sub-agents to prevent any single context from growing too large. Each sub-agent operates with a clean context focused on its subtask.

### Partition Planning

```python
from typing import Dict

# estimate_task_context and decompose_task are assumed helpers
# provided elsewhere in the codebase.

def plan_partitioning(task: Dict, context_limit: int) -> Dict:
    """
    Plan how to partition a task based on context limits.

    Returns the partitioning strategy and subtask definitions.
    """
    estimated_context = estimate_task_context(task)

    if estimated_context <= context_limit:
        return {
            "strategy": "single_agent",
            "subtasks": [task]
        }

    # Plan a multi-agent approach
    subtasks = decompose_task(task)

    return {
        "strategy": "multi_agent",
        "subtasks": subtasks,
        "coordination": "hierarchical"
    }
```

## Optimization Decision Framework

### When to Optimize

Consider context optimization when context utilization exceeds 70%, when response quality degrades as conversations extend, when costs increase due to long contexts, or when latency increases with conversation length.

### What Optimization to Apply

Choose optimization strategies based on context composition:

If tool outputs dominate the context, apply observation masking. If retrieved documents dominate, apply summarization or partitioning. If message history dominates, apply compaction with summarization.
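These rules can be sketched as a small dispatcher over per-component token counts, with a fallback that combines strategies for the largest contributors when no single component dominates. The component keys and the 50% dominance threshold are illustrative assumptions:

```python
def choose_optimizations(component_tokens: dict, dominance: float = 0.5) -> list:
    """Map context composition to optimization strategies.

    component_tokens: token counts keyed by "tool_outputs",
    "retrieved_documents", and "message_history" (illustrative keys).
    A component "dominates" when it exceeds the dominance fraction of
    total tokens; otherwise strategies are combined for the top contributors.
    """
    total = sum(component_tokens.values()) or 1
    strategy_for = {
        "tool_outputs": "observation_masking",
        "retrieved_documents": "summarization_or_partitioning",
        "message_history": "compaction_with_summarization",
    }
    chosen = [
        strategy_for[name]
        for name, tokens in component_tokens.items()
        if name in strategy_for and tokens / total >= dominance
    ]
    if chosen:
        return chosen
    # No single dominant component: combine strategies for the two
    # largest contributors.
    ranked = sorted(component_tokens, key=component_tokens.get, reverse=True)
    return [strategy_for[n] for n in ranked[:2] if n in strategy_for]
```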
If multiple components contribute, combine strategies.

### Evaluation of Optimization

After applying an optimization, evaluate its effectiveness:

- Measure the token reduction achieved
- Measure quality preservation (output quality should not degrade)
- Measure latency improvement
- Measure cost reduction

Iterate on optimization strategies based on the evaluation results.

## Common Pitfalls

### Over-Aggressive Compaction

Compacting too aggressively can remove critical information. Always preserve task goals, user preferences, and recent conversation context. Test compaction at increasing aggressiveness levels to find the optimal balance.

### Masking Critical Observations

Masking observations that are still needed can cause errors. Track observation usage and only mask content that is no longer referenced. Consider keeping references to masked content so it can be retrieved if needed.

### Ignoring Attention Distribution

The lost-in-the-middle phenomenon means that information placement matters. Place critical information at attention-favored positions (the beginning and end of the context). Use explicit markers to highlight important content.

### Premature Optimization

Not all contexts require optimization. Adding optimization machinery has overhead.
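One way to keep this discipline is to gate all optimization behind the utilization thresholds given earlier (warn at 70%, compact at 80%, compact aggressively at 90%). A minimal sketch; the tier names are illustrative:

```python
def optimization_action(context_tokens: int, context_limit: int) -> str:
    """Map context utilization to an action tier.

    Below the 70% warning threshold, no optimization runs at all,
    avoiding the machinery's overhead on short conversations.
    """
    utilization = context_tokens / context_limit
    if utilization >= 0.9:
        return "aggressive_compaction"
    if utilization >= 0.8:
        return "compaction"
    if utilization >= 0.7:
        return "warning"
    return "none"
```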
Optimize only when context limits actually constrain agent performance.

## Monitoring and Alerting

### Key Metrics

Track these metrics to understand optimization needs:

- Context token count over time
- Cache hit rates for repeated patterns
- Response quality metrics by context size
- Cost per conversation by context length
- Latency by context size

### Alert Thresholds

Set alerts for:

- Context utilization above 80%
- Cache hit rate below 50%
- Quality score drop of more than 10%
- Cost increase above baseline

## Integration Patterns

### Integration with Agent Framework

Integrate optimization into the agent workflow:

```python
from typing import Dict

# ContextOptimizer and the _call_model method are assumed to be
# provided elsewhere in the codebase.

class OptimizingAgent:
    def __init__(self, context_limit: int = 80000):
        self.context_limit = context_limit
        self.optimizer = ContextOptimizer()

    def process(self, user_input: str, context: Dict) -> Dict:
        # Check whether optimization is needed
        if self.optimizer.should_compact(context):
            context = self.optimizer.compact(context)

        # Process with the optimized context
        response = self._call_model(user_input, context)

        # Track metrics
        self.optimizer.record_metrics(context, response)

        return response
```

### Integration with Memory Systems

Connect optimization with memory systems:

```python
from typing import Dict

# remove_from_context is an assumed helper provided elsewhere.

class MemoryAwareOptimizer:
    def __init__(self, memory_system, context_limit: int,
                 importance_threshold: float = 0.5):
        self.memory = memory_system
        self.limit = context_limit
        # Importance cutoff below which content is offloaded to memory;
        # the default is an illustrative starting point.
        self.importance_threshold = importance_threshold

    def optimize_context(self, current_context: Dict, task: str) -> Dict:
        # Check whether the information is already in memory
        relevant_memories = self.memory.retrieve(task)

        # Move information to memory if it is not needed in context
        for mem in relevant_memories:
            if mem["importance"] < self.importance_threshold:
                current_context = remove_from_context(current_context, mem)
                # Keep a reference so the memory can be retrieved later

        return current_context
```

## Performance Benchmarks

### Compaction Performance

Compaction should reduce token count while preserving quality. Targets:

- 50-70% token reduction for aggressive compaction
- Less than 5% quality degradation from compaction
- Less than 10% latency increase from compaction overhead

### Masking Performance

Observation masking should reduce token count significantly:

- 60-80% reduction in masked observations
- Less than 2% quality impact from masking
- Near-zero latency overhead

### Cache Performance

KV-cache optimization should improve cost and latency:

- 70%+ cache hit rate for stable workloads
- 50%+ cost reduction from cache hits
- 40%+ latency reduction from cache hits
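The compaction targets above can be checked mechanically after each compaction pass. A sketch, with metric names chosen for illustration:

```python
def evaluate_compaction(tokens_before: int, tokens_after: int,
                        quality_before: float, quality_after: float) -> dict:
    """Compare one compaction pass against the benchmark targets:
    50-70% token reduction, under 5% quality degradation.
    Quality scores are assumed to be on a 0.0-1.0 scale.
    """
    token_reduction = 1.0 - tokens_after / tokens_before
    quality_drop = max(0.0, (quality_before - quality_after) / quality_before)
    return {
        "token_reduction": token_reduction,
        "quality_drop": quality_drop,
        "meets_reduction_target": 0.5 <= token_reduction <= 0.7,
        "meets_quality_target": quality_drop < 0.05,
    }
```

Feeding these results back into threshold tuning closes the evaluation loop described in the Optimization Decision Framework.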