A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems.
skills/hosted-agents/SKILL.md
---
name: hosted-agents
description: This skill should be used when the user asks to "build background agent", "create hosted coding agent", "set up sandboxed execution", "implement multiplayer agent", or mentions background agents, sandboxed VMs, agent infrastructure, Modal sandboxes, self-spawning agents, or remote coding environments.
---

# Hosted Agent Infrastructure

Hosted agents run in remote sandboxed environments rather than on local machines. When designed well, they provide unlimited concurrency, consistent execution environments, and multiplayer collaboration. The critical insight is that session speed should be limited only by model provider time-to-first-token, with all infrastructure setup completed before the user starts their session.

## When to Activate

Activate this skill when:
- Building background coding agents that run independently of user devices
- Designing sandboxed execution environments for agent workloads
- Implementing multiplayer agent sessions with shared state
- Creating multi-client agent interfaces (Slack, Web, Chrome extensions)
- Scaling agent infrastructure beyond local machine constraints
- Building systems where agents spawn sub-agents for parallel work

## Core Concepts

Move agent execution to remote sandboxed environments to eliminate the fundamental limits of local execution: resource contention, environment inconsistency, and single-user constraints. Remote sandboxes unlock unlimited concurrency, reproducible environments, and collaborative workflows because each session gets its own isolated compute with a known-good environment image.

Design the architecture in three layers because each layer scales independently. Build sandbox infrastructure for isolated execution, an API layer for state management and client coordination, and client interfaces for user interaction across platforms.
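
As a rough illustration of the three layers, the sketch below uses hypothetical names throughout (`Sandbox`, `EchoSandbox`, `SessionAPI`, `slack_client`); none of them come from a real framework. It assumes the client talks only to the API layer, and the API layer talks to the sandbox only through a narrow interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Sandbox(Protocol):
    """Sandbox layer: isolated execution (hypothetical interface)."""

    def run(self, prompt: str) -> str: ...


class EchoSandbox:
    """Stand-in sandbox used only for this sketch; a real one would run an agent."""

    def run(self, prompt: str) -> str:
        return f"ran: {prompt}"


@dataclass
class SessionAPI:
    """API layer: owns session state and coordinates clients."""

    sandbox: Sandbox
    history: Dict[str, List[str]] = field(default_factory=dict)

    def prompt(self, session_id: str, text: str) -> str:
        # Delegate execution to the sandbox, then record the result per session.
        result = self.sandbox.run(text)
        self.history.setdefault(session_id, []).append(result)
        return result


def slack_client(api: SessionAPI, session_id: str, text: str) -> str:
    """Client layer: formats output for one surface and calls only the API."""
    return f"[slack] {api.prompt(session_id, text)}"
```
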
Keep these layers cleanly separated so sandbox changes do not ripple into clients.

## Detailed Topics

### Sandbox Infrastructure

**The Core Challenge**
Eliminate sandbox spin-up latency because users perceive anything over a few seconds as broken. Development environments require cloning repositories, installing dependencies, and running build steps -- do all of this before the user ever submits a prompt.

**Image Registry Pattern**
Pre-build environment images on a regular cadence (every 30 minutes works well) because this makes synchronization with the latest code a fast delta rather than a full clone. Include in each image:
- Cloned repository at a known commit
- All runtime dependencies installed
- Initial setup and build commands completed
- Cached files from running app and test suite once

When starting a session, spin up a sandbox from the most recent image. The repository is at most 30 minutes out of date, making the remaining git sync fast.

**Snapshot and Restore**
Take filesystem snapshots at key points to enable instant restoration for follow-up prompts without re-running setup:
- After initial image build (base snapshot)
- When agent finishes making changes (session snapshot)
- Before sandbox exit for potential follow-up

**Git Configuration for Background Agents**
Configure git identity explicitly in every sandbox because background agents are not tied to a specific user during image builds:
- Generate GitHub app installation tokens for repository access during clone
- Set git config `user.name` and `user.email` when committing and pushing changes
- Use the prompting user's identity for commits, not the app identity

**Warm Pool Strategy**
Maintain a pool of pre-warmed sandboxes for high-volume repositories because cold starts are the primary source of user frustration:
- Keep sandboxes ready before users start sessions
- Expire and recreate pool entries as new image builds complete
- Start warming a sandbox as soon as a user begins typing (predictive warm-up)

### Agent Framework Selection

**Server-First Architecture**
Structure the agent framework as a server first, with TUI and desktop apps as thin clients, because this prevents duplicating agent logic across surfaces:
- Multiple custom clients share one agent backend
- Consistent behavior across all interaction surfaces
- Plugin systems extend functionality without client changes
- Event-driven architectures deliver real-time updates to any connected client

**Code as Source of Truth**
Select frameworks where the agent can read its own source code to understand behavior. Prioritize this because having code as source of truth prevents the agent from hallucinating about its own capabilities -- an underrated failure mode in AI development.

**Plugin System Requirements**
Require a plugin system that supports runtime interception because this enables safety controls and observability without modifying core agent logic:
- Listen to tool execution events (e.g., `tool.execute.before`)
- Block or modify tool calls conditionally
- Inject context or state at runtime

### Speed Optimizations

**Predictive Warm-Up**
Start warming the sandbox as soon as a user begins typing their prompt, not when they submit it, because the typing interval (5-30 seconds) is enough to complete most setup:
- Clone latest changes in parallel with user typing
- Run initial setup before user hits enter
- For fast spin-up, sandbox can be ready before user finishes typing

**Parallel File Reading**
Allow the agent to start reading files immediately even if sync from latest base branch is not complete, because in large repositories incoming prompts rarely touch recently-changed files:
- Agent can research immediately without waiting for git sync
- Block file edits (not reads) until synchronization completes
- This separation is safe because read-time data staleness of 30 minutes rarely matters for research

**Maximize Build-Time Work**
Move everything possible to the image build step because build-time duration is invisible to users:
- Full dependency installation
- Database schema setup
- Initial app and test suite runs (populates caches)

### Self-Spawning Agents

**Agent-Spawned Sessions**
Build tools that allow agents to spawn new sessions because frontier models are capable of decomposing work and coordinating sub-tasks:
- Research tasks across different repositories
- Parallel subtask execution for large changes
- Multiple smaller PRs from one major task

Expose three primitives: start a new session with specified parameters, read status of any session (check-in capability), and continue main work while sub-sessions run in parallel.

**Prompt Engineering for Self-Spawning**
Engineer prompts that guide when agents should spawn sub-sessions rather than doing work inline:
- Research tasks that require cross-repository exploration
- Breaking monolithic changes into smaller PRs
- Parallel exploration of different approaches

### API Layer

**Per-Session State Isolation**
Isolate state per session (SQLite per session works well) because cross-session interference is a subtle and hard-to-debug failure mode:
- Dedicated database per session
- No session can impact another's performance
- Architecture handles hundreds of concurrent sessions

**Real-Time Streaming**
Stream all agent work in real-time because high-frequency feedback is critical for user trust:
- Token streaming from model providers
- Tool execution status updates
- File change notifications

Use WebSocket connections with hibernation APIs to reduce compute costs during idle periods while maintaining open connections.

**Synchronization Across Clients**
Build a single state system that synchronizes across all clients (chat interfaces, Slack bots, Chrome extensions, web interfaces, VS Code instances) because users switch surfaces frequently and expect continuity. All changes sync to the session state, enabling seamless client switching.

### Multiplayer Support

**Why Multiplayer Matters**
Design for multiplayer from day one because it is nearly free to add with proper synchronization architecture, and it unlocks high-value workflows:
- Teaching non-engineers to use AI effectively
- Live QA sessions with multiple team members
- Real-time PR review with immediate changes
- Collaborative debugging sessions

**Implementation Requirements**
Build the data model so sessions are not tied to single authors because multiplayer fails silently if authorship is hardcoded:
- Pass authorship info to each prompt
- Attribute code changes to the prompting user
- Share session links for instant collaboration

### Authentication and Authorization

**User-Based Commits**
Use GitHub authentication to open PRs on behalf of the user (not the app) because this preserves the audit trail and prevents users from approving their own AI-generated changes:
- Obtain user tokens for PR creation
- PRs appear as authored by the human, not the bot

**Sandbox-to-API Flow**
Follow this sequence because it keeps sandbox permissions minimal while letting the API handle sensitive operations:
1. Sandbox pushes changes (updating git user config)
2. Sandbox sends event to API with branch name and session ID
3. API uses user's GitHub token to create PR
4. GitHub webhooks notify API of PR events

### Client Implementations

**Slack Integration**
Prioritize Slack as the first distribution channel for internal adoption because it creates a virality loop as team members see others using it:
- No syntax required, natural chat interface
- Build a classifier (fast model with repo descriptions) to determine which repository to work in
- Include hints for common repositories; allow "unknown" for ambiguous cases

**Web Interface**
Build a web interface with these features because it serves as the primary power-user surface:
- Real-time streaming of agent work on desktop and mobile
- Hosted VS Code instance running inside sandbox
- Streamed desktop view for visual verification
- Before/after screenshots for PRs
- Statistics page: sessions resulting in merged PRs (primary metric), usage over time, live "humans prompting" count

**Chrome Extension**
Build a Chrome extension for non-engineering users because DOM and React internals extraction gives higher precision than raw screenshots at lower token cost:
- Sidebar chat interface with screenshot tool
- Extract DOM/React internals instead of raw images
- Distribute via managed device policy (bypasses Chrome Web Store)

## Practical Guidance

### Follow-Up Message Handling

Choose between queueing and inserting follow-up messages sent during execution. Prefer queueing because it is simpler to manage and lets users send thoughts on next steps while the agent works.
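
A minimal sketch of the queueing option, assuming a hypothetical `FollowUpQueue` class (the class and its method names are illustrative, not taken from any framework named here). Messages sent while the agent is busy are held in arrival order and handed over one at a time as each run finishes.

```python
from collections import deque
from typing import Deque, Optional


class FollowUpQueue:
    """Hypothetical per-session queue for prompts sent while the agent is busy."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.busy = False

    def submit(self, message: str) -> Optional[str]:
        """Return the message if the agent can start it now, otherwise queue it."""
        if self.busy:
            self._pending.append(message)
            return None
        self.busy = True
        return message

    def finish_current(self) -> Optional[str]:
        """Mark the current run done and return the next queued message, if any."""
        if self._pending:
            # The agent stays busy and moves straight on to the next message.
            return self._pending.popleft()
        self.busy = False
        return None
```

In this sketch the caller starts a run whenever `submit` or `finish_current` returns a message, so follow-ups never interrupt the run in progress.
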
Build a mechanism to stop the agent mid-execution when needed, because without it users feel trapped.

### Metrics That Matter

Track these metrics because they indicate real value rather than vanity usage:
- Sessions resulting in merged PRs (primary success metric)
- Time from session start to first model response
- PR approval rate and revision count
- Agent-written code percentage across repositories

### Adoption Strategy

Drive internal adoption through visibility rather than mandates because forced usage breeds resentment:
- Work in public spaces (Slack channels) for visibility
- Let the product create virality loops
- Do not force usage over existing tools
- Build to people's needs, not hypothetical requirements

## Guidelines

1. Pre-build environment images on regular cadence (30 minutes is a good default)
2. Start warming sandboxes when users begin typing, not when they submit
3. Allow file reads before git sync completes; block only writes
4. Structure agent framework as server-first with clients as thin wrappers
5. Isolate state per session to prevent cross-session interference
6. Attribute commits to the user who prompted, not the app
7. Track merged PRs as primary success metric
8. Build for multiplayer from the start; it is nearly free with proper sync architecture

## Gotchas

1. **Cold start latency**: First sandbox spin-up takes 30-60s and users perceive this as broken. Use warm pools and predictive warm-up on keystroke to eliminate perceived wait time.
2. **Image staleness**: Infrequent image rebuilds mean agents run with outdated dependencies or code. Set a 30-minute rebuild cadence and monitor image age; alert if builds fail silently.
3. **Sandbox cost runaway**: Long-running agents without timeout or budget caps accumulate unexpected costs. Set hard timeout limits (default 4 hours) and per-session cost ceilings.
4. **Auth token expiration mid-session**: Long tasks fail when GitHub tokens expire partway through. Implement token refresh logic and check token validity before sensitive operations like PR creation.
5. **Git config in sandboxes**: Missing `user.name` or `user.email` causes commit failures in background agents. Always set git identity explicitly during sandbox configuration, never assume it carries over from the image.
6. **State loss on sandbox recycle**: Agents lose completed work if the sandbox is recycled or times out before results are extracted. Always snapshot before termination and extract artifacts (branches, PRs, files) before letting the sandbox die.
7. **Oversubscribing warm pools**: Maintaining too many warm sandboxes wastes money during low-traffic periods. Scale pool size based on traffic patterns and time-of-day; use autoscaling rather than fixed pool sizes.
8. **Missing output extraction**: Agents complete work inside the sandbox but results never get pulled out to the user. Build explicit extraction steps (push branch, create PR, return file contents) into the session teardown flow.

## Integration

This skill builds on multi-agent-patterns for agent coordination and tool-design for agent-tool interfaces.
It connects to:

- multi-agent-patterns - Self-spawning agents follow supervisor patterns
- tool-design - Building tools for agent spawning and status checking
- context-optimization - Managing context across distributed sessions
- filesystem-context - Using filesystem for session state and artifacts

## References

Internal reference:
- [Infrastructure Patterns](./references/infrastructure-patterns.md) - Read when: implementing sandbox lifecycle, image builds, or warm pool logic for the first time

Related skills in this collection:
- multi-agent-patterns - Read when: designing self-spawning or supervisor coordination patterns
- tool-design - Read when: building tools for agent session management or status checking
- context-optimization - Read when: context windows fill up across distributed agent sessions

External resources:
- [Ramp](https://builders.ramp.com/post/why-we-built-our-background-agent) - Read when: evaluating whether to build vs. buy background agent infrastructure
- [Modal Sandboxes](https://modal.com/docs/guide/sandbox) - Read when: choosing a cloud sandbox provider or comparing isolation models
- [Cloudflare Durable Objects](https://developers.cloudflare.com/durable-objects/) - Read when: designing per-session state management with WebSocket hibernation
- [OpenCode](https://github.com/sst/opencode) - Read when: selecting a server-first agent framework or studying plugin architectures

---

## Skill Metadata

**Created**: 2026-01-12
**Last Updated**: 2026-03-17
**Author**: Agent Skills for Context Engineering Contributors
**Version**: 1.1.0