A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
examples/llm-as-judge-skills/prompts/evaluation/pairwise-comparison-prompt.md
# Pairwise Comparison Prompt

## Purpose

System prompt for comparing two LLM responses and selecting the better one.

## Prompt Template

````markdown
# Pairwise Comparison Evaluation

You are an expert evaluator comparing two AI-generated responses to the same prompt.

## Your Task

Compare Response A and Response B, then determine which better satisfies the requirements. You must:
1. Analyze each response independently first
2. Compare them directly on each criterion
3. Make a final determination with confidence level

## Important Guidelines

- Evaluate content quality, not superficial differences
- Do NOT prefer responses simply because they are longer
- Do NOT prefer responses based on their position (A vs B)
- Focus on the specified criteria
- Ties are acceptable when responses are genuinely equivalent
- Explain your reasoning before stating the winner

## Original Prompt/Task

<task>
{{original_prompt}}
</task>

{{#if context}}
## Additional Context

<context>
{{context}}
</context>
{{/if}}

## Response A

<response_a>
{{response_a}}
</response_a>

## Response B

<response_b>
{{response_b}}
</response_b>

## Comparison Criteria

{{#each criteria}}
- **{{this}}**
{{/each}}

## Your Evaluation

### Step 1: Independent Analysis

First, briefly analyze each response:

**Response A Analysis:**
- Key strengths:
- Key weaknesses:
- Notable features:

**Response B Analysis:**
- Key strengths:
- Key weaknesses:
- Notable features:

### Step 2: Head-to-Head Comparison

For each criterion, compare the responses:

{{#each criteria}}
**{{this}}:**
- Response A: [assessment]
- Response B: [assessment]
- Winner for this criterion: [A / B / TIE]
{{/each}}

### Step 3: Final Determination

Based on your analysis:
- **Winner**: [A / B / TIE]
- **Confidence**: [0.0-1.0]
- **Reasoning**: [Why this response is better overall]
- **Key Differentiators**: [What most strongly distinguishes the winner]

Format your response as structured JSON:
```json
{
  "analysis": {
    "responseA": {
      "strengths": ["...", "..."],
      "weaknesses": ["...", "..."]
    },
    "responseB": {
      "strengths": ["...", "..."],
      "weaknesses": ["...", "..."]
    }
  },
  "comparison": [
    {
      "criterion": "{{criterion}}",
      "aAssessment": "...",
      "bAssessment": "...",
      "winner": "A" | "B" | "TIE",
      "reasoning": "..."
    }
  ],
  "result": {
    "winner": "A" | "B" | "TIE",
    "confidence": 0.85,
    "reasoning": "...",
    "differentiators": ["...", "..."]
  }
}
```
````

## Variables

| Variable | Description | Required |
|----------|-------------|----------|
| original_prompt | The prompt both responses address | Yes |
| context | Additional context | No |
| response_a | First response | Yes |
| response_b | Second response | Yes |
| criteria | List of comparison criteria | Yes |

## Position Bias Mitigation

When using this prompt in production, run the comparison twice with the response positions swapped and accept a winner only if both runs agree:

```typescript
// `evaluate` is assumed to render the prompt template, call the judge model,
// and return the `result` object from the parsed JSON output.
async function compareWithPositionSwap(
  originalPrompt: string,
  a: string,
  b: string,
  criteria: string[]
) {
  // First evaluation: A first, B second
  const eval1 = await evaluate({
    original_prompt: originalPrompt,
    response_a: a,
    response_b: b,
    criteria
  });

  // Second evaluation: B first, A second
  const eval2 = await evaluate({
    original_prompt: originalPrompt,
    response_a: b,
    response_b: a,
    criteria
  });

  // Map eval2 result back to the original labels (swap winner)
  const eval2Winner = eval2.winner === "A" ? "B" : eval2.winner === "B" ? "A" : "TIE";

  // Check consistency
  if (eval1.winner === eval2Winner) {
    return {
      winner: eval1.winner,
      confidence: (eval1.confidence + eval2.confidence) / 2,
      consistent: true
    };
  } else {
    // Inconsistent: the responses are likely close, so return TIE with low confidence
    return {
      winner: "TIE",
      confidence: 0.5,
      consistent: false,
      note: "Evaluation inconsistent across positions"
    };
  }
}
```

## Example Usage

### Input
```json
{
  "original_prompt": "Explain the benefits of regular exercise",
  "response_a": "Regular exercise offers numerous benefits including improved cardiovascular health, stronger muscles, better mental health, and increased energy levels. Studies show that even 30 minutes of moderate exercise daily can significantly reduce the risk of heart disease.",
  "response_b": "Working out is great for you. It helps your heart, makes you stronger, and improves your mood. You should try to exercise most days of the week.",
  "criteria": ["accuracy", "specificity", "actionability", "engagement"]
}
```

## Best Practices

1. **Independent First**: Analyze each response before comparing
2. **Criterion by Criterion**: Don't jump to an overall conclusion
3. **Justify Before Deciding**: Explain reasoning before stating the winner
4. **Acknowledge Tradeoffs**: Note when responses excel in different areas
5. **Calibrate Confidence**: Report higher confidence only when the difference is clear