How Docker’s Virtual Agent Fleet Accelerates Shipping with Autonomous AI Roles

Published: 2026-05-01 18:24:21 | Category: AI & Machine Learning

Introduction

At Docker, the Coding Agent Sandboxes (sbx) team has reimagined how software development and testing can be automated. They created a fleet of seven AI agents—each with a distinct role—that autonomously test products, triage issues, post release notes, and even fix bugs. Running entirely within CI and on developers’ laptops, this virtual team marks a shift from scripted automation to intelligent, judgment-driven workflows.

Source: www.docker.com

The Fleet Approach: Roles, Not Scripts

The Fleet is built on Claude Code skills—markdown files that define a persona, responsibilities, and allowed tools. Unlike traditional automation scripts that execute fixed steps, a skill file acts as a role description: “You are the build engineer. Here’s what you know and how you make decisions.” This distinction is crucial because agents require judgment, not just instructions. When a test fails unexpectedly, a script stops; a role investigates.
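To make the idea of a role description concrete, here is a minimal Python sketch of what such a file might contain and how its parts could be read. The file content, the header fields (`name`, `description`, `allowed-tools`), and the `parse_skill` helper are illustrative assumptions; the article does not show the sbx team's actual skill files.

```python
from dataclasses import dataclass

# Hypothetical skill file; the real sbx skills are not shown in the article.
SKILL_TEXT = """\
---
name: build-engineer
description: Builds the CLI and investigates failed builds.
allowed-tools: Bash, Read, Grep
---
You are the build engineer. Here's what you know and how you make decisions.

When a build fails, investigate the cause before reporting.
"""


@dataclass
class Skill:
    name: str
    description: str
    allowed_tools: list[str]
    persona: str  # free-form role description the agent reads


def parse_skill(text: str) -> Skill:
    """Split a skill file into its header fields and its persona body."""
    # Expected layout: '---', header lines, '---', then the body.
    _, header, body = text.split("---\n", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    tools = [t.strip() for t in fields.get("allowed-tools", "").split(",")]
    return Skill(
        name=fields["name"],
        description=fields["description"],
        allowed_tools=[t for t in tools if t],
        persona=body.strip(),
    )


skill = parse_skill(SKILL_TEXT)
print(skill.name, skill.allowed_tools)
```

The point of the sketch is the split itself: a short machine-readable header that limits what the agent may do, followed by prose that tells it who it is and how to decide.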

One skill file defines the agent's behavior whether it runs on a local machine or in CI: the same file, the same logic, and no environment-specific adaptations.

Building Agent Skills: Iterative, Personal, and Transparent

Creating an agent skill is an iterative process. The team begins by running the skill locally, watching the agent think, make decisions, and report results. They tweak the skill file until it behaves correctly in their terminal. Only then is it integrated into a CI pipeline.

This local-first approach avoids the painful cycle of commit-push-wait-read-logs. Debugging takes seconds instead of minutes. Developers see where the agent gets confused and fix the skill file immediately. The same skill later runs on GitHub Actions runners for macOS, Linux, and Windows, with the workflow simply setting up the environment and invoking the skill.

Local First, CI Second: Empowering Rapid Iteration

The Coding Agent Sandboxes CLI (sbx) manages sandbox lifecycles on macOS, Linux, and Windows. Each release demands testing across platforms, upgrade paths, and sustained load to catch resource leaks. The team also needs daily visibility into shipped changes and a way to manage a growing issue backlog—without it becoming a full-time job.
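Catching a resource leak under sustained load usually comes down to checking whether a usage metric keeps climbing. The following is a minimal Python sketch, assuming memory samples in megabytes taken at regular intervals; the `looks_like_leak` helper and its threshold are hypothetical and not taken from the sbx tooling.

```python
def looks_like_leak(samples_mb: list[float], max_growth_mb: float = 0.5) -> bool:
    """Flag a run whose memory trends upward faster than max_growth_mb per sample."""
    n = len(samples_mb)
    if n < 2:
        return False  # not enough data to estimate a trend

    # Least-squares slope of memory against the sample index.
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx > max_growth_mb


steady = [100.0, 101.0, 99.5, 100.5, 100.0, 100.2]
leaking = [100.0, 102.0, 104.5, 106.0, 108.5, 110.0]
print(looks_like_leak(steady), looks_like_leak(leaking))
```

A fitted slope is less sensitive to a single noisy sample than comparing only the first and last readings, which is why the sketch uses it.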

Rather than writing traditional test scripts, they built agent roles that handle these tasks autonomously. The design principle is simple: every skill runs on your machine first, then in CI. This eliminates translation layers and ensures that the CI version is identical to the local version.
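One way to picture "no translation layer" is a launcher whose command does not depend on where it runs. The Python sketch below is an assumption-laden illustration: `claude -p` is Claude Code's non-interactive mode, but whether the sbx workflows invoke skills this way, and the exact prompt form, are guesses. The `build_command` and `describe_run` helpers are hypothetical. The `CI` variable is the one GitHub Actions sets to `true` on its runners.

```python
import os
import shlex


def build_command(skill_name: str) -> list[str]:
    """Return the agent invocation; it deliberately ignores the environment."""
    # Assumption: the skill is addressed as a slash command in the prompt.
    return ["claude", "-p", f"/{skill_name}"]


def describe_run(skill_name: str, env: dict[str, str]) -> str:
    """Label where we are running; the command itself never changes."""
    where = "CI" if env.get("CI") == "true" else "local"
    return f"[{where}] {shlex.join(build_command(skill_name))}"


print(describe_run("cli-tester", dict(os.environ)))
```

Because only the label varies, anything that works in a developer's terminal is, by construction, what the workflow runs after it has set up the environment.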

For instance, the /cli-tester skill—the Fleet’s exploratory tester—was developed entirely locally. The developer watched it build binaries, exercise CLI commands, find issues, and report back. Only after the skill worked correctly was it wired into a nightly workflow that runs on all platforms.

A Closer Look at the Agent Roles

The Fleet comprises seven distinct AI agent roles, each with a specific responsibility:

Exploratory Tester – Probes the CLI for unexpected behavior and edge cases.
Release Notes Writer – Compiles changelogs from recent commits and issue resolutions.
Bug Triage Specialist – Categorizes incoming issues, assigns severity, and routes to the right team member.
Build Engineer – Manages compilation, packaging, and dependency updates across platforms.
Load Tester – Simulates sustained usage to detect memory leaks and performance regressions.
Documentation Auditor – Reviews docs for accuracy and completeness against code changes.
Quality Gatekeeper – Runs before each release to verify all criteria are met.

These roles operate autonomously in CI, but they can also be summoned locally for manual testing or debugging. The agent’s decisions are transparent—developers can inspect logs, see the reasoning, and even intervene if the agent gets stuck.

Results and Impact: Shipping Faster with Confidence

Since deploying the Fleet, the sbx team reports gains in both productivity and reliability. The agents run nightly and catch regressions that would otherwise surface only through manual testing. Release notes are generated automatically, and the issue backlog is triaged continuously. The agents also propose bug fixes, which shortens the time from discovery to fix.

By using the same skill files locally and in CI, the team eliminates the disconnect between development and automation. Any developer can invoke any agent from their terminal, test changes, and commit improved skills—all without touching the CI configuration. This accelerates the iteration loop and empowers every team member to refine the Fleet.

Conclusion: The Future of Autonomous Development

The Docker Coding Agent Sandboxes team's virtual agent fleet demonstrates a different approach to software lifecycle automation. Instead of brittle scripts, the team built roles that exercise judgment, adapt to context, and behave consistently across environments. The local-first, CI-second approach reduces friction and keeps the loop for developing agents short.

As AI coding agents become more capable, teams like sbx show that the real power lies not in replacing humans, but in creating intelligent, autonomous teammates that work alongside them—shipping faster, with fewer bugs, and with greater clarity.

Jeribah