examples/llm-as-judge-skills/tools/evaluation/generate-rubric.md
# Generate Rubric Tool

## Purpose

Automatically generates a scoring rubric for a given evaluation criterion, with detailed descriptions for each score level to ensure consistent evaluation.

## Tool Definition

```typescript
import { tool } from "ai";
import { z } from "zod";

export const generateRubric = tool({
  description: `Generate a scoring rubric for an evaluation criterion.
Creates detailed descriptions for each score level.
Use when you need to establish consistent evaluation standards.`,

  parameters: z.object({
    criterionName: z.string()
      .describe("Name of the criterion (e.g., 'Factual Accuracy')"),

    criterionDescription: z.string()
      .describe("What this criterion measures"),

    scale: z.enum(["1-3", "1-5", "1-10"]).default("1-5")
      .describe("Scoring scale to use"),

    domain: z.string().optional()
      .describe("Domain context (e.g., 'medical writing', 'code review')"),

    includeExamples: z.boolean().default(true)
      .describe("Include example text for each score level"),

    strictness: z.enum(["lenient", "balanced", "strict"]).default("balanced")
      .describe("How strictly to define score boundaries")
  }),

  execute: async (input) => {
    // Delegates to an LLM-backed generator defined elsewhere in the skill.
    return generateRubricWithLLM(input);
  }
});
```

## Input Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| criterionName | string | Yes | Name of the criterion |
| criterionDescription | string | Yes | What the criterion measures |
| scale | enum | No | Scoring scale (default: 1-5) |
| domain | string | No | Domain for context |
| includeExamples | boolean | No | Include examples (default: true) |
| strictness | enum | No | Score boundary strictness (default: balanced) |

## Output Schema

```typescript
interface GeneratedRubric {
  success: boolean;

  criterion: {
    name: string;
    description: string;
  };

  scale: {
    min: number;
    max: number;
    type: string;
  };

  levels: {
    score: number;
    label: string;              // e.g., "Excellent", "Poor"
    description: string;        // Detailed description
    characteristics: string[];  // Key characteristics
    example?: string;           // Example text at this level
  }[];

  scoringGuidelines: string[];

  edgeCases: {
    situation: string;
    guidance: string;
  }[];

  metadata: {
    domain: string | null;
    strictness: string;
    generationTimeMs: number;
  };
}
```

## Usage Example

```typescript
const rubric = await generateRubric.execute({
  criterionName: "Code Readability",
  criterionDescription: "How easy the code is to read and understand",
  scale: "1-5",
  domain: "code review",
  includeExamples: true,
  strictness: "balanced"
});

// Result:
// {
//   criterion: {
//     name: "Code Readability",
//     description: "How easy the code is to read and understand"
//   },
//   scale: { min: 1, max: 5, type: "1-5" },
//   levels: [
//     {
//       score: 1,
//       label: "Poor",
//       description: "Code is extremely difficult to understand...",
//       characteristics: [
//         "No meaningful variable names",
//         "Deeply nested logic without explanation",
//         "No comments on complex sections"
//       ],
//       example: "function x(a,b,c){return a?b+c:c-b;}"
//     },
//     {
//       score: 5,
//       label: "Excellent",
//       description: "Code is immediately understandable...",
//       characteristics: [
//         "Self-documenting variable and function names",
//         "Appropriate comments explaining 'why'",
//         "Clear logical structure"
//       ],
//       example: "function calculateShippingCost(weight, distance, expedited) {\n  // Base rate plus per-mile charge\n  const baseCost = weight * BASE_RATE_PER_KG;\n  ..."
//     },
//     ...
//   ],
//   scoringGuidelines: [
//     "Focus on clarity for someone unfamiliar with the codebase",
//     "Consider both naming and structure",
//     "Comments should explain 'why', not 'what'"
//   ],
//   edgeCases: [
//     {
//       situation: "Code uses domain-specific abbreviations",
//       guidance: "Accept if abbreviations are standard in the domain"
//     }
//   ]
// }
```

## Rubric Templates

### Factual Accuracy (1-5)

```
5: All claims factually correct, properly sourced
4: Minor factual issues, non-critical
3: Some factual errors, main points correct
2: Multiple factual errors affecting reliability
1: Fundamentally incorrect or misleading
```

### Clarity (1-5)

```
5: Immediately understandable, well-structured
4: Clear with minor ambiguities
3: Generally clear, some confusing sections
2: Difficult to follow, unclear organization
1: Incomprehensible or incoherent
```

### Completeness (1-5)

```
5: Addresses all aspects comprehensively
4: Covers main points, minor gaps
3: Addresses core requirements, notable gaps
2: Missing significant required elements
1: Fails to address the question
```

## Implementation Notes

1. **Domain Adaptation**: Rubrics should reflect domain-specific expectations.
2. **Boundary Clarity**: Adjacent scores need clearly distinguishable definitions.
3. **Example Quality**: Examples should be realistic, not strawmen.
4. **Edge Case Coverage**: Anticipate common ambiguous situations.
5. **Calibration**: Test the rubric against known samples before use.
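The calibration note above implies a structural check before a generated rubric is trusted: every score on the declared scale should have exactly one level description. This is a minimal sketch of such a check; `parseScale` and `validateRubricLevels` are hypothetical helpers, not part of the tool's API, and they assume the `GeneratedRubric` shape shown in the Output Schema.

```typescript
// Hypothetical pre-use validation for a generated rubric.
interface RubricLevel {
  score: number;
  label: string;
  description: string;
}

// Parse a scale string such as "1-5" into numeric bounds.
function parseScale(type: string): { min: number; max: number } {
  const [min, max] = type.split("-").map(Number);
  return { min, max };
}

// Verify every score from min..max appears exactly once in `levels`.
function validateRubricLevels(type: string, levels: RubricLevel[]): string[] {
  const { min, max } = parseScale(type);
  const errors: string[] = [];
  const seen = new Set(levels.map((l) => l.score));

  for (let s = min; s <= max; s++) {
    if (!seen.has(s)) errors.push(`Missing description for score ${s}`);
  }
  for (const level of levels) {
    if (level.score < min || level.score > max) {
      errors.push(`Score ${level.score} is outside the ${type} scale`);
    }
  }
  if (seen.size !== levels.length) {
    errors.push("Duplicate score levels found");
  }
  return errors;
}
```

Running a check like this before caching or reusing a rubric catches LLM generation slips (a skipped level, a duplicated score) that would otherwise surface as inconsistent judge scores later.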