Session Grounding

Orchestrating AI

The most effective AI users aren't writing more code — they're writing less. They've shifted from typing to orchestration: defining problems, setting constraints, and reviewing outputs.

1-hour live session
4 sections
continuous demo
01

How do I give an agent enough context to succeed?

"I can't pay attention to everything at once. And there's also the risk of two different agents modifying things that are slightly related."
— User research participant
The quality of your outcome is decided before the first line of code. A clear plan is the starting line — sloppy input yields sloppy output, regardless of model.
Show the difference between "build me X" and a plan-first approach. The agent orchestrates parallel subagents under the hood — research, codebase scan, doc lookup — you just give it a clear target.
MCPs connect planning to where real work lives — pull from GitHub issues, Work IQ meeting summaries, Jira. Don't copy-paste context; let the agent pull it.
Breaking down plans practically: UI/Scaffold/Tests-first as a forcing function. Tests-first gives the agent a verifiable contract. Scaffold-first prevents architectural drift.
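The tests-first contract can be sketched in miniature. This is a toy illustration, not part of the session material: `slugify` is a hypothetical target function, and the tests are what you would hand the agent before any implementation exists. A minimal body is included only so the sketch runs.

```python
# Tests-first as a verifiable contract: in practice these tests are written
# first, and the agent's job is to produce an implementation that passes them.
# `slugify` is a hypothetical target function with a minimal stand-in body.

def slugify(title: str) -> str:
    return "-".join(title.lower().split())

def test_lowercases_and_hyphenates():
    assert slugify("Plan Mode Basics") == "plan-mode-basics"

def test_idempotent():
    # Running the function on its own output changes nothing.
    assert slugify(slugify("Plan Mode Basics")) == "plan-mode-basics"

test_lowercases_and_hyphenates()
test_idempotent()
```

The point of the contract is that "done" becomes checkable: the agent iterates until the tests pass instead of until the output looks plausible.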
Features to demo
Plan mode
Start with a clear plan before implementation
Clarifying questions
Agent asks instead of assumes
MCPs for context
GitHub issues and Work IQ inputs
Parallel subagents during plan mode
Mermaid diagrams in chat
MCP Apps / Excalidraw (spatial planning)
02

How do I stop repeating myself across sessions?

"There's a lot to sort of process in my brain."
— User research participant
Context engineering is the skill. Not prompting — engineering the persistent world the agent operates in. The best orchestrators build better environments, not better prompts.
Start with the easiest unlock: "just ask the agent." It can scaffold its own instruction files, generate skill files, write its own .agent.md.
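A sketch of the kind of instruction file the agent can scaffold for itself. The contents below are hypothetical, not a prescribed format — AGENTS.md is plain markdown the agent reads as repo-level convention:

```markdown
# AGENTS.md — repo conventions (hypothetical example)

## Build & test
- Install: `npm ci`
- Run tests with `npm test` before proposing a commit

## Conventions
- TypeScript strict mode; avoid `any`
- Keep changes scoped to one issue per branch
```

Because it is just markdown in the repo, it is versioned, reviewable, and shared by every session — set-and-forget context instead of a prompt you retype.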
Instructions
Repo-level conventions. Always loaded. Set-and-forget.
Skills
Domain expertise. Loaded when relevant. Shareable like linter configs.
Custom Agents
Full personas with tools, skills, and grounding docs. Trust you can commit to the repo.
Memory
Session memory preserves decisions. Repo memory compounds learning.
Hooks
Deterministic guardrails. Not prompts — actual code that executes.
Features to demo
"Ask the agent"
Generate instruction and skill files in-workspace
Layered customization
Instructions → Skills → Custom agents
Memory
Session + repo memory — persist and reuse context
Hooks
Linter-on-edit, deny patterns, org policy
AGENTS.md / .prompt.md layers
Skills from extensions (chatSkills)
03

When should I run more than one agent?

"They're helping me get a week's worth of work done in a day. But it also feels like I'm having to process a week's worth of information in a day as well."
— User research participant
The answer isn't "always" — it's about matching agent type to task shape. Three types, three decision criteria.
The operating model: stay hands-on locally for visual work, delegate scoped tasks to background, send explorative work to cloud. This is the judgment call — a practice, not a feature.
The daily rhythm: your actual morning/evening pattern. When do you use plan mode? When do you skip it? When do you go straight to background? This is orchestration in practice.
Local
Visual, interactive, needs your judgment mid-flight. Stay hands-on.
your workspace
Background
Well-scoped, deterministic. Runs in an isolated worktree with terminal sandbox.
isolated worktree
Cloud
Explorative, can take time. Changes land as PRs with CI gates.
lands as PR
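The three rows above fold into one routing heuristic. This is a toy sketch of the judgment call, not a real API — the task attributes (`visual`, `well_scoped`, `explorative`) are illustrative labels:

```python
# Toy heuristic: match agent type to task shape.
# Attribute names are illustrative, not a product API.

def route(visual: bool, well_scoped: bool, explorative: bool) -> str:
    if visual:
        return "local"       # interactive; needs your judgment mid-flight
    if well_scoped and not explorative:
        return "background"  # isolated worktree, terminal sandbox
    if explorative:
        return "cloud"       # can take time; lands as a PR with CI gates
    return "local"           # default: stay hands-on until the task is scoped
```

The heuristic is deliberately conservative: anything that is neither scoped nor explorative stays local until you have shaped it enough to delegate.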
Features to demo
Agent Sessions view
Mission control — status and check-ins
Background + worktrees
Run scoped tasks in isolation
Cloud agents
Explorative work → PR landing
Copilot in PRs
Assign issue → wake up to drafts
Terminal sandbox (fs/network controls)
Multi-model (Claude/Codex/Copilot)
Subagents (internal parallelism)
Terminal-first CLI flows
04

How do I know when to trust the output?

"The anxiety was not worth the few seconds I saved... I would still stick to one task at a time."
— User research participant
Trust isn't binary — it's a spectrum you calibrate over time. The anxiety is real and valid. What resolves it isn't blind confidence — it's layered controls.
Start tight, loosen as you learn. Side project? Approve-all. Production code? Full sandbox + hooks + code review. The controls exist so you go faster with confidence.
During execution
Prevention
Hooks fire on every tool call. Terminal sandbox restricts filesystem and network. Deterministic code, not prompts.
Hooks · Sandbox · Auto-approve
At review time
Review
Copilot Code Review on PRs with risk-prioritized highlighting. Grounded in your custom instructions.
CCR · Risk flags · Summaries
After completion
Verification
Custom agents that encode what "good" looks like. Dogfooding agent with persona, browser MCP, product brief.
.agent.md · Playwright · Self-review
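"Start tight, loosen as you learn" can be sketched as an auto-approval rule set. The tiers and patterns below are hypothetical, not a specific product's configuration format:

```python
import re

# Illustrative auto-approval rules: start tight, loosen as trust calibrates.
ALLOW = [r"^git (status|diff|log)\b", r"^npm test\b"]   # read-only / verifiable
DENY = [r"\brm -rf\b", r"\bgit push --force\b"]          # never auto-run

def decide(command: str) -> str:
    """Classify a proposed terminal command: deny, auto-approve, or ask."""
    if any(re.search(p, command) for p in DENY):
        return "deny"
    if any(re.search(p, command) for p in ALLOW):
        return "auto-approve"
    return "ask"  # anything unrecognized escalates to the human
```

Loosening trust then means moving patterns from "ask" into the allow list as the agent earns it, while the deny list stays fixed.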
Features to demo
Hooks + sandbox
Prevention layer during execution
Copilot Code Review
Risk-prioritized review layer
Dogfooding agent
.agent.md + Playwright MCP + skills
Auto-approval rules
Agent self-review patterns
Vision support (screenshot review)

Plan, direct, multiply, ship.

Start with two agents, not five. You'll know when you're ready for more.

1. Use plan mode for one task
2. Send one subtask to background
3. Review, commit, ship