Source from repo
A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
researcher/benchmarks/effectiveness/tasks/001-filesystem-context-offload/verify.sh
#!/usr/bin/env bash
# Verifier for task 001-filesystem-context-offload.
# Runs inside the temp workspace built by the SDK runner. Exit 0 = task passed.
set -uo pipefail

EXPECTED_VALUE="8475"
EXPECTED_LINE="API_RATE_LIMIT=${EXPECTED_VALUE}"

# Check 1: the agent actually located the right value in its final response.
# The runner writes the agent's final assistant text to .runner/final.txt before invoking verify.
if [ ! -f .runner/final.txt ]; then
  echo "verify: missing .runner/final.txt (runner did not stage final response)" >&2
  exit 11
fi

if ! grep -q "${EXPECTED_LINE}" .runner/final.txt; then
  echo "verify: final response does not contain ${EXPECTED_LINE}" >&2
  exit 12
fi

# Check 2 (skill-behavior signal): scratch directory exists.
if [ ! -d scratch ]; then
  echo "verify: no scratch/ directory; agent did not offload (still counts as task pass on response, but logged)" >&2
  echo "scratch_dir_missing" > .runner/notes.txt
  exit 0
fi

# Check 3 (skill-behavior signal): something in scratch/ contains lines copied from tool_output.txt.
shopt -s nullglob
if compgen -G "scratch/*" > /dev/null; then
  if grep -F -l -m 1 -q "API_RATE_LIMIT" scratch/* 2>/dev/null; then
    echo "scratch_used" > .runner/notes.txt
  else
    echo "scratch_empty_or_unrelated" > .runner/notes.txt
  fi
fi

exit 0
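
The verifier's comments describe a small contract with the runner: the final assistant text is staged at .runner/final.txt, and the exit status reports the outcome (0 for a pass, 11 and 12 for the two failure paths in the listing). The sketch below shows one way to exercise that contract by hand. The helper names `stage_workspace` and `describe_exit` and the labels they print are hypothetical and are not part of the skill package or the SDK runner.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for trying verify.sh manually; not part of the skill package.

# Lay out a workspace the way the verifier expects it:
# the agent's final response text lives in .runner/final.txt.
stage_workspace() {
  local dir="$1" final_text="$2"
  mkdir -p "${dir}/.runner"
  printf '%s\n' "${final_text}" > "${dir}/.runner/final.txt"
}

# Map the verifier's exit status to a short label.
# 11 and 12 are the two failure exits in the listing; the labels are illustrative.
describe_exit() {
  case "$1" in
    0)  echo "pass" ;;
    11) echo "final_response_not_staged" ;;
    12) echo "expected_line_missing" ;;
    *)  echo "unexpected_exit_$1" ;;
  esac
}

# Example usage (assumes verify.sh is available at the path you supply):
#   ws="$(mktemp -d)"
#   stage_workspace "${ws}" "The value is API_RATE_LIMIT=8475"
#   (cd "${ws}" && bash /path/to/verify.sh); describe_exit "$?"
#   cat "${ws}/.runner/notes.txt"   # scratch_dir_missing, scratch_used, or scratch_empty_or_unrelated
```

When the script exits 0, the task counts as passed regardless of whether scratch/ was used; the contents of .runner/notes.txt, when the file is written, are the separate signal for offloading behavior.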