examples/llm-as-judge-skills/tools/evaluation/generate-rubric.md
# Generate Rubric Tool

## Purpose

Automatically generates a scoring rubric for a given evaluation criterion, with detailed descriptions for each score level to ensure consistent evaluation.

## Tool Definition

```typescript
import { tool } from "ai";
import { z } from "zod";

export const generateRubric = tool({
  description: `Generate a scoring rubric for an evaluation criterion.
Creates detailed descriptions for each score level.
Use when you need to establish consistent evaluation standards.`,

  parameters: z.object({
    criterionName: z.string()
      .describe("Name of the criterion (e.g., 'Factual Accuracy')"),

    criterionDescription: z.string()
      .describe("What this criterion measures"),

    scale: z.enum(["1-3", "1-5", "1-10"]).default("1-5")
      .describe("Scoring scale to use"),

    domain: z.string().optional()
      .describe("Domain context (e.g., 'medical writing', 'code review')"),

    includeExamples: z.boolean().default(true)
      .describe("Include example text for each score level"),

    strictness: z.enum(["lenient", "balanced", "strict"]).default("balanced")
      .describe("How strictly to define score boundaries")
  }),

  execute: async (input) => {
    // Delegates to an LLM-backed generator defined elsewhere in the skill.
    return generateRubricWithLLM(input);
  }
});
```

## Input Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| criterionName | string | Yes | Name of the criterion |
| criterionDescription | string | Yes | What the criterion measures |
| scale | enum | No | Scoring scale (default: 1-5) |
| domain | string | No | Domain for context |
| includeExamples | boolean | No | Include examples (default: true) |
| strictness | enum | No | Score boundary strictness (default: balanced) |

## Output Schema

```typescript
interface GeneratedRubric {
  success: boolean;

  criterion: {
    name: string;
    description: string;
  };

  scale: {
    min: number;
    max: number;
    type: string;
  };

  levels: {
    score: number;
    label: string;              // e.g., "Excellent", "Poor"
    description: string;        // Detailed description
    characteristics: string[];  // Key characteristics
    example?: string;           // Example text at this level
  }[];

  scoringGuidelines: string[];

  edgeCases: {
    situation: string;
    guidance: string;
  }[];

  metadata: {
    domain: string | null;
    strictness: string;
    generationTimeMs: number;
  };
}
```

## Usage Example

```typescript
const rubric = await generateRubric.execute({
  criterionName: "Code Readability",
  criterionDescription: "How easy the code is to read and understand",
  scale: "1-5",
  domain: "code review",
  includeExamples: true,
  strictness: "balanced"
});

// Result:
// {
//   criterion: {
//     name: "Code Readability",
//     description: "How easy the code is to read and understand"
//   },
//   scale: { min: 1, max: 5, type: "1-5" },
//   levels: [
//     {
//       score: 1,
//       label: "Poor",
//       description: "Code is extremely difficult to understand...",
//       characteristics: [
//         "No meaningful variable names",
//         "Deeply nested logic without explanation",
//         "No comments on complex sections"
//       ],
//       example: "function x(a,b,c){return a?b+c:c-b;}"
//     },
//     {
//       score: 5,
//       label: "Excellent",
//       description: "Code is immediately understandable...",
//       characteristics: [
//         "Self-documenting variable and function names",
//         "Appropriate comments explaining 'why'",
//         "Clear logical structure"
//       ],
//       example: "function calculateShippingCost(weight, distance, expedited) {\n  // Base rate plus per-mile charge\n  const baseCost = weight * BASE_RATE_PER_KG;\n  ..."
//     },
//     ...
//   ],
//   scoringGuidelines: [
//     "Focus on clarity for someone unfamiliar with the codebase",
//     "Consider both naming and structure",
//     "Comments should explain 'why', not 'what'"
//   ],
//   edgeCases: [
//     {
//       situation: "Code uses domain-specific abbreviations",
//       guidance: "Accept if abbreviations are standard in the domain"
//     }
//   ]
// }
```

## Rubric Templates

### Factual Accuracy (1-5)

```
5: All claims factually correct, properly sourced
4: Minor factual issues, non-critical
3: Some factual errors, main points correct
2: Multiple factual errors affecting reliability
1: Fundamentally incorrect or misleading
```

### Clarity (1-5)

```
5: Immediately understandable, well-structured
4: Clear with minor ambiguities
3: Generally clear, some confusing sections
2: Difficult to follow, unclear organization
1: Incomprehensible or incoherent
```

### Completeness (1-5)

```
5: Addresses all aspects comprehensively
4: Covers main points, minor gaps
3: Addresses core requirements, notable gaps
2: Missing significant required elements
1: Fails to address the question
```

## Implementation Notes

1. **Domain Adaptation**: Rubrics should reflect domain-specific expectations.
2. **Boundary Clarity**: Adjacent scores need clearly distinguishable definitions.
3. **Example Quality**: Examples should be realistic, not strawmen.
4. **Edge Case Coverage**: Anticipate common ambiguous situations.
5. **Calibration**: Test the rubric against known samples before use.
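The calibration note above implies a structural check before a generated rubric is trusted: every score on the declared scale should have exactly one level description. This is a minimal sketch of such a check; `parseScale` and `validateRubricLevels` are hypothetical helpers, not part of the tool's API, and they assume the `GeneratedRubric` shape shown in the Output Schema.

```typescript
// Hypothetical pre-use validation for a generated rubric.
interface RubricLevel {
  score: number;
  label: string;
  description: string;
}

// Parse a scale string such as "1-5" into numeric bounds.
function parseScale(type: string): { min: number; max: number } {
  const [min, max] = type.split("-").map(Number);
  return { min, max };
}

// Verify every score from min..max appears exactly once in `levels`.
function validateRubricLevels(type: string, levels: RubricLevel[]): string[] {
  const { min, max } = parseScale(type);
  const errors: string[] = [];
  const seen = new Set(levels.map((l) => l.score));

  for (let s = min; s <= max; s++) {
    if (!seen.has(s)) errors.push(`Missing description for score ${s}`);
  }
  for (const level of levels) {
    if (level.score < min || level.score > max) {
      errors.push(`Score ${level.score} is outside the ${type} scale`);
    }
  }
  if (seen.size !== levels.length) {
    errors.push("Duplicate score levels found");
  }
  return errors;
}
```

Running a check like this before caching or reusing a rubric catches LLM generation slips (a skipped level, a duplicated score) that would otherwise surface as inconsistent judge scores later.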