Agent Skills for Context Engineering

A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.

Source repo: muratcankoylan on GitHub

skills/advanced-evaluation/references/evaluation-pipeline.md

# Evaluation Pipeline Diagram

Visual layout of a production evaluation pipeline.

```
┌───────────────────────────────────────────────────┐
│                Evaluation Pipeline                │
├───────────────────────────────────────────────────┤
│                                                   │
│  Input: Response + Prompt + Context               │
│             │                                     │
│             ▼                                     │
│  ┌─────────────────────┐                          │
│  │   Criteria Loader   │ ◄── Rubrics, weights     │
│  └──────────┬──────────┘                          │
│             │                                     │
│             ▼                                     │
│  ┌─────────────────────┐                          │
│  │   Primary Scorer    │ ◄── Direct or Pairwise   │
│  └──────────┬──────────┘                          │
│             │                                     │
│             ▼                                     │
│  ┌─────────────────────┐                          │
│  │   Bias Mitigation   │ ◄── Position swap, etc.  │
│  └──────────┬──────────┘                          │
│             │                                     │
│             ▼                                     │
│  ┌─────────────────────┐                          │
│  │ Confidence Scoring  │ ◄── Calibration          │
│  └──────────┬──────────┘                          │
│             │                                     │
│             ▼                                     │
│  Output: Scores + Justifications + Confidence     │
│                                                   │
└───────────────────────────────────────────────────┘
```

## Pipeline Stages

1. **Criteria Loader**: Loads rubrics and criterion weights from configuration
2. **Primary Scorer**: Applies direct scoring or pairwise comparison
3. **Bias Mitigation**: Runs position swaps, length normalization, and other debiasing
4. **Confidence Scoring**: Calibrates confidence based on position consistency and evidence strength
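
The four stages can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the names `Criterion`, `load_criteria`, `compare_with_swap`, `evaluate`, and `toy_judge` are hypothetical, the judge is a stand-in callable rather than a real model call, only pairwise comparison with a position swap is shown, and confidence is reduced to the weight share of criteria whose verdict survives the swap. Justifications and evidence strength are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float


# Hypothetical judge signature: (criterion, first_response, second_response)
# returns "first" or "second". In practice this would wrap a model call.
Judge = Callable[[str, str, str], str]


def load_criteria(config: Dict[str, float]) -> List[Criterion]:
    """Stage 1: build criteria and normalize weights so they sum to 1."""
    total = sum(config.values())
    if total <= 0:
        raise ValueError("criterion weights must sum to a positive number")
    return [Criterion(name, weight / total) for name, weight in config.items()]


def compare_with_swap(judge: Judge, criterion: Criterion,
                      a: str, b: str) -> Tuple[str, bool]:
    """Stages 2 and 3: pairwise comparison, repeated with positions swapped."""
    original = judge(criterion.name, a, b)  # "first" refers to response A
    swapped = judge(criterion.name, b, a)   # "first" refers to response B
    winner_original = "A" if original == "first" else "B"
    winner_swapped = "B" if swapped == "first" else "A"
    consistent = winner_original == winner_swapped
    # A verdict that flips with position is treated as a tie.
    return (winner_original if consistent else "tie"), consistent


def evaluate(judge: Judge, config: Dict[str, float], a: str, b: str) -> dict:
    """Run all four stages and return weighted scores plus a confidence value."""
    scores = {"A": 0.0, "B": 0.0}
    verdicts: Dict[str, str] = {}
    consistent_weight = 0.0
    for criterion in load_criteria(config):
        winner, consistent = compare_with_swap(judge, criterion, a, b)
        verdicts[criterion.name] = winner
        if consistent:
            scores[winner] += criterion.weight
            consistent_weight += criterion.weight
    # Stage 4 (simplified): confidence is the weight share of criteria
    # whose verdict did not change when the positions were swapped.
    return {"scores": scores, "verdicts": verdicts, "confidence": consistent_weight}


def toy_judge(criterion: str, first: str, second: str) -> str:
    """Deterministic stand-in judge, position-biased on 'style'."""
    if criterion == "style":
        return "first"  # always prefers whichever response is shown first
    return "first" if len(first) >= len(second) else "second"


result = evaluate(toy_judge, {"accuracy": 3.0, "style": 1.0},
                  "a long detailed answer", "short")
```

Here the position-biased `style` verdict flips under the swap, so it is recorded as a tie and contributes nothing to either score, while the `accuracy` verdict is stable and carries its full normalized weight. A production pipeline would likely add length normalization and a calibrated confidence model on top of this consistency check.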