Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.

Source repo: muratcankoylan (GitHub)

examples/llm-as-judge-skills/prompts/evaluation/pairwise-comparison-prompt.md

# Pairwise Comparison Prompt

## Purpose

System prompt for comparing two LLM responses and selecting the better one.

## Prompt Template

````markdown
# Pairwise Comparison Evaluation

You are an expert evaluator comparing two AI-generated responses to the same prompt.

## Your Task

Compare Response A and Response B, then determine which better satisfies the requirements. You must:
1. Analyze each response independently first
2. Compare them directly on each criterion
3. Make a final determination with confidence level

## Important Guidelines

- Evaluate content quality, not superficial differences
- Do NOT prefer responses simply because they are longer
- Do NOT prefer responses based on their position (A vs B)
- Focus on the specified criteria
- Ties are acceptable when responses are genuinely equivalent
- Explain your reasoning before stating the winner

## Original Prompt/Task

<task>
{{original_prompt}}
</task>

{{#if context}}
## Additional Context

<context>
{{context}}
</context>
{{/if}}

## Response A

<response_a>
{{response_a}}
</response_a>

## Response B

<response_b>
{{response_b}}
</response_b>

## Comparison Criteria

{{#each criteria}}
- **{{this}}**
{{/each}}

## Your Evaluation

### Step 1: Independent Analysis

First, briefly analyze each response:

**Response A Analysis:**
- Key strengths:
- Key weaknesses:
- Notable features:

**Response B Analysis:**
- Key strengths:
- Key weaknesses:
- Notable features:

### Step 2: Head-to-Head Comparison

For each criterion, compare the responses:

{{#each criteria}}
**{{this}}:**
- Response A: [assessment]
- Response B: [assessment]
- Winner for this criterion: [A / B / TIE]
{{/each}}

### Step 3: Final Determination

Based on your analysis:
- **Winner**: [A / B / TIE]
- **Confidence**: [0.0-1.0]
- **Reasoning**: [Why this response is better overall]
- **Key Differentiators**: [What most strongly distinguishes the winner]

Format your response as structured JSON:
```json
{
  "analysis": {
    "responseA": {
      "strengths": ["...", "..."],
      "weaknesses": ["...", "..."]
    },
    "responseB": {
      "strengths": ["...", "..."],
      "weaknesses": ["...", "..."]
    }
  },
  "comparison": [
    {
      "criterion": "{{criterion}}",
      "aAssessment": "...",
      "bAssessment": "...",
      "winner": "A" | "B" | "TIE",
      "reasoning": "..."
    }
  ],
  "result": {
    "winner": "A" | "B" | "TIE",
    "confidence": 0.85,
    "reasoning": "...",
    "differentiators": ["...", "..."]
  }
}
```
````
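
The JSON requested in Step 3 can be checked before it is used downstream. The sketch below is one possible way to do that, under stated assumptions: `parseJudgeResult` and `JudgeResult` are hypothetical names, the JSON object is assumed to span from the first `{` to the last `}` in the judge's output, and only the `result` fields named in the template are validated.

```typescript
type Winner = "A" | "B" | "TIE";

// Mirrors the `result` object in the template's JSON format.
interface JudgeResult {
  winner: Winner;
  confidence: number;
  reasoning: string;
  differentiators: string[];
}

// Hypothetical helper: extracts and validates the `result` object from the
// judge's raw text output. Throws an Error when the output is malformed.
function parseJudgeResult(raw: string): JudgeResult {
  // Assumption: the JSON object spans from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in judge output");
  }

  const parsed = JSON.parse(raw.slice(start, end + 1));
  const result = parsed === null || typeof parsed !== "object" ? undefined : parsed.result;
  if (result === null || typeof result !== "object") {
    throw new Error("Missing `result` object");
  }

  const winner = result.winner;
  if (winner !== "A" && winner !== "B" && winner !== "TIE") {
    throw new Error("Winner must be A, B, or TIE");
  }

  const confidence = result.confidence;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("Confidence must be a number between 0.0 and 1.0");
  }

  return {
    winner,
    confidence,
    // Optional text fields fall back to empty values instead of failing.
    reasoning: typeof result.reasoning === "string" ? result.reasoning : "",
    differentiators: Array.isArray(result.differentiators)
      ? result.differentiators.map(String)
      : [],
  };
}
```

Rejecting malformed verdicts early keeps an unparseable answer from being silently counted as a win for either response.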

## Variables

| Variable | Description | Required |
|----------|-------------|----------|
| original_prompt | The prompt both responses address | Yes |
| context | Additional context | No |
| response_a | First response | Yes |
| response_b | Second response | Yes |
| criteria | List of comparison criteria | Yes |
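
The Required column can be enforced before the template is rendered. A minimal sketch follows; `PromptVariables` and `findMissingVariables` are hypothetical names, and treating an empty string or an empty criteria list as missing is an assumption, not something the table specifies.

```typescript
// Variable names and optionality follow the table above.
interface PromptVariables {
  original_prompt: string;
  context?: string;
  response_a: string;
  response_b: string;
  criteria: string[];
}

// Hypothetical helper: returns the names of required variables that are
// missing or empty, so a caller can fail fast before rendering the prompt.
function findMissingVariables(vars: Partial<PromptVariables>): string[] {
  const missing: string[] = [];
  const requiredText = ["original_prompt", "response_a", "response_b"] as const;

  for (const name of requiredText) {
    const value = vars[name];
    // Assumption: whitespace-only text counts as missing.
    if (typeof value !== "string" || value.trim() === "") {
      missing.push(name);
    }
  }

  // Assumption: an empty criteria list counts as missing.
  if (!Array.isArray(vars.criteria) || vars.criteria.length === 0) {
    missing.push("criteria");
  }

  return missing;
}

// Example: two required variables were not supplied.
const missingVars = findMissingVariables({
  original_prompt: "Explain the benefits of regular exercise",
  response_a: "Regular exercise offers numerous benefits.",
});
console.log(missingVars); // ["response_b", "criteria"]
```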
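
## Position Bias Mitigation

When using this prompt in production, implement position swapping: run the comparison twice with the responses in opposite slots, map the second verdict back to the original labels, and accept a winner only when both runs agree.

```typescript
type Winner = "A" | "B" | "TIE";

// Shape of the `result` object in the judge's JSON output.
interface Evaluation {
  winner: Winner;
  confidence: number;
}

interface EvaluateInput {
  response_a: string;
  response_b: string;
  criteria: string[];
}

// Assumed helper (not defined in this file): renders the prompt template,
// including original_prompt, calls the judge model, and returns `result`.
declare function evaluate(input: EvaluateInput): Promise<Evaluation>;

async function compareWithPositionSwap(a: string, b: string, criteria: string[]) {
  // First evaluation: A first, B second
  const eval1 = await evaluate({
    response_a: a,
    response_b: b,
    criteria
  });

  // Second evaluation: B first, A second
  const eval2 = await evaluate({
    response_a: b,
    response_b: a,
    criteria
  });

  // Map eval2 result back to the original labels (swap winner)
  const eval2Winner: Winner =
    eval2.winner === "A" ? "B" : eval2.winner === "B" ? "A" : "TIE";

  // Check consistency
  if (eval1.winner === eval2Winner) {
    return {
      winner: eval1.winner,
      confidence: (eval1.confidence + eval2.confidence) / 2,
      consistent: true
    };
  } else {
    // Inconsistent: the responses are likely close, so report a TIE
    // with a neutral confidence instead of trusting either run
    return {
      winner: "TIE",
      confidence: 0.5,
      consistent: false,
      note: "Evaluation inconsistent across positions"
    };
  }
}
```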
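
If the per-criterion `comparison` array from the swapped run is also kept, its labels need the same mapping as the overall winner. A minimal sketch, assuming the entry shape from the JSON format in the prompt template; `CriterionComparison` and `unswapComparison` are hypothetical names:

```typescript
type Winner = "A" | "B" | "TIE";

// Mirrors one entry of the `comparison` array in the template's JSON format.
interface CriterionComparison {
  criterion: string;
  aAssessment: string;
  bAssessment: string;
  winner: Winner;
  reasoning: string;
}

// Hypothetical helper: relabels results from the swapped run so that
// "A" and "B" refer to the original responses again.
function unswapComparison(swapped: CriterionComparison[]): CriterionComparison[] {
  return swapped.map((entry): CriterionComparison => ({
    ...entry,
    // In the swapped run, slot A held the original B, and vice versa.
    aAssessment: entry.bAssessment,
    bAssessment: entry.aAssessment,
    winner: entry.winner === "A" ? "B" : entry.winner === "B" ? "A" : "TIE",
  }));
}
```

The free-text `reasoning` field is left untouched, so it may still refer to the responses by their swapped labels.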

## Example Usage

### Input
```json
{
  "original_prompt": "Explain the benefits of regular exercise",
  "response_a": "Regular exercise offers numerous benefits including improved cardiovascular health, stronger muscles, better mental health, and increased energy levels. Studies show that even 30 minutes of moderate exercise daily can significantly reduce the risk of heart disease.",
  "response_b": "Working out is great for you. It helps your heart, makes you stronger, and improves your mood. You should try to exercise most days of the week.",
  "criteria": ["accuracy", "specificity", "actionability", "engagement"]
}
```
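
For this input, each `{{#each criteria}}` block in the template expands once per criterion. The sketch below reproduces that expansion for the Comparison Criteria list without a template engine; `renderCriteriaList` is a hypothetical helper, not part of the skill package.

```typescript
// Hypothetical helper mirroring the template block:
//   {{#each criteria}}
//   - **{{this}}**
//   {{/each}}
function renderCriteriaList(criteria: string[]): string {
  return criteria.map((criterion) => `- **${criterion}**`).join("\n");
}

const exampleCriteria = ["accuracy", "specificity", "actionability", "engagement"];
console.log(renderCriteriaList(exampleCriteria));
// - **accuracy**
// - **specificity**
// - **actionability**
// - **engagement**
```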

## Best Practices

1. **Independent First**: Analyze each response before comparing them
2. **Criterion by Criterion**: Don't jump to an overall conclusion
3. **Justify Before Deciding**: Explain the reasoning before stating the winner
4. **Acknowledge Tradeoffs**: Note when responses excel in different areas
5. **Calibrate Confidence**: Report higher confidence only when the difference is clear
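
Practices 2 and 5 can be spot-checked after the fact. The sketch below tallies the per-criterion winners and reports whether the overall winner matches the side that won more criteria. The name `checkVerdict` and the equal weighting of criteria are assumptions; a mismatch is a cue for review, not proof of an error, since one criterion can legitimately outweigh the others.

```typescript
type Winner = "A" | "B" | "TIE";

interface VerdictCheck {
  wins: Record<Winner, number>;
  majority: Winner;
  agreesWithOverall: boolean;
}

// Hypothetical helper: counts per-criterion winners and compares the
// majority side with the overall verdict. Assumes equal criterion weights.
function checkVerdict(criterionWinners: Winner[], overall: Winner): VerdictCheck {
  const wins: Record<Winner, number> = { A: 0, B: 0, TIE: 0 };
  for (const winner of criterionWinners) {
    wins[winner] += 1;
  }

  // The side with more criterion wins is the majority; equal counts are a TIE.
  const majority: Winner = wins.A > wins.B ? "A" : wins.B > wins.A ? "B" : "TIE";

  return { wins, majority, agreesWithOverall: majority === overall };
}

const check = checkVerdict(["A", "A", "TIE", "B"], "A");
console.log(check.majority, check.agreesWithOverall); // A true
```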