Developer documents

Use SAI through Conductor.

Conductor exposes SAI's vision-and-action loop as MCP tools. Agents observe with describe, resolve targets with locate, act through OS input, then re-describe to verify progress.

Contents

Core loop · Setup · Model stack · Tools · Knowledge base · Dashboard · Config reference · Patterns · Examples · Troubleshooting

Core Loop

Describe

Call describe with a focused hint, such as "foreground browser window".

Locate

Call locate to resolve the visible target to screen coordinates before acting.

Act

Use click, type_text, scroll, drag, key, or hotkey through OS input.

Verify

Re-describe the affected area and check whether state changed.

Retry

If the result is wrong, re-describe, re-locate, and retry with a more specific target.
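The five steps above can be sketched as a small Python driver. This is a minimal sketch, not Conductor's client API: `call_tool` is a hypothetical callable standing in for however your MCP client sends a tool call, and `succeeded` is a hypothetical check you supply, because the shape of a describe result depends on the deployment. The stub at the end only exercises the control flow offline.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical: sends one MCP tool call to Conductor and returns its result.
ToolCaller = Callable[..., Any]


def act_and_verify(
    call_tool: ToolCaller,
    hint: str,
    targets: list[str],
    succeeded: Callable[[Any], bool],
) -> bool:
    """Describe once, then locate -> click -> re-describe for each target.

    `targets` runs from least to most specific phrasing, so each retry
    uses a more specific target, as the Retry step recommends.
    """
    call_tool("describe", hint=hint)                 # Describe: observe first
    for target in targets:
        call_tool("locate", target=target)           # Locate: resolve before acting
        call_tool("click", target=target)            # Act: OS input
        observed = call_tool("describe", hint=hint)  # Verify: re-describe
        if succeeded(observed):
            return True
        # Retry: a fresh description is in hand; re-locate with the next target.
    return False


# Offline demo with a stub caller that "succeeds" only after the second click.
calls: list[str] = []


def stub(name: str, **kwargs: Any) -> str | None:
    calls.append(name)
    if name == "describe":
        return "switch on" if calls.count("click") >= 2 else "switch off"
    return None


ok = act_and_verify(
    stub,
    hint="network settings panel",
    targets=["Wi-Fi toggle", "actual Wi-Fi on/off switch, not the sidebar row"],
    succeeded=lambda text: text == "switch on",
)
print(ok, calls)
```

In a real session the same sequence of calls goes to Conductor through your MCP client; only `call_tool` and `succeeded` change.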

Setup

On macOS, Conductor runs as a native app and needs Accessibility permission to control mouse and keyboard input. The grounder endpoint is configurable and can be local, self-hosted, or remote.

Step 1: clear the macOS quarantine flag on first run

xattr -dr com.apple.quarantine /Applications/conductor-mcp.app

Step 2: register with Claude Code

claude mcp add conductor-desktop --scope user \
  --env CONDUCTOR_MCP_MOBILE_ANDROID="true" \
  --env CONDUCTOR_MCP_SCREENSHOT_MAX_WIDTH="1280" \
  --env CONDUCTOR_MCP_BACKEND="resident" \
  --env CONDUCTOR_MCP_SCREENSHOT_MODE="file" \
  --env CONDUCTOR_MCP_AUTO_SCREENSHOT="false" \
  --env CONDUCTOR_MCP_GROUNDER_URL="https://your-grounder.example.com" \
  -- /Applications/conductor-mcp.app/Contents/MacOS/conductor-mcp

Model Stack

Conductor uses two separate models with distinct roles. They do not share a context window.

Orchestrating LLM

Your agent model

The model driving the task: Claude, GPT-4o, or any other LLM running in an MCP-capable client. In the normal loop it receives prose, labels, coordinates, and state from the grounder rather than raw image pixels. It decides what to do next and calls Conductor tools.

Grounder VLM

Vision model

A vision-language model that turns the screen into structured descriptions and coordinates. It powers describe and locate, and its endpoint is set with CONDUCTOR_MCP_GROUNDER_URL. Reference model: Qwen3-VL-4B-Instruct (Q4_K_M quantization, GGUF format) served via llama.cpp. The grounder_family config key controls prompt formatting.

The grounder can run locally on-device, on a self-hosted GPU server, or at a remote endpoint. Do not assume a specific model, fixed latency, or cloud-free processing unless the deployment confirms it.

Tool Surface

Perception

describe	Read the visible screen and return prose, labels, state, and context.
locate	Resolve a natural-language target to screen coordinates.
crop	Inspect a focused visual region when the full screen is too broad.
wait_for	Wait until a named UI element appears before continuing.
list_windows	Read window titles, focus, bounds, z-order, and modality.
get_scene_graph	Inspect the window topology tree.

OS Input

click / double_click / right_click	Mouse actions against semantic targets.
click_at / drag_at	Coordinate-based precision actions.
type_text	Type into the currently focused field.
key / hotkey	Send keyboard keys and shortcuts.
drag	Drag from one semantic target to another.
scroll	Scroll within a visible region.
mouse_move	Move without clicking, usually for hover UI.

Browser

web_list_tabs	List Chrome tabs when the Chrome DevTools Protocol (CDP) endpoint is reachable.
web_eval	Run JavaScript in a tab and return a result.
web_crop	Capture a DOM element by selector, text, or JavaScript expression.
web_mark	Outline an element on the page so the visual tools (describe, locate, click) can target it.
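To see which tabs the browser tools could reach, you can query Chrome's DevTools HTTP endpoint directly. This is a minimal sketch, assuming Chrome was started with `--remote-debugging-port=9222` and using the standard `/json/list` endpoint. The case-insensitive URL/title substring match in `filter_tabs` is an assumption about how tab filters such as cdp_default_tab_filter behave, not Conductor's documented matching rule.

```python
from __future__ import annotations

import json
import urllib.request


def list_pages(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0) -> list[dict]:
    """Return DevTools targets of type "page" (browser tabs)."""
    with urllib.request.urlopen(f"http://{host}:{port}/json/list", timeout=timeout) as resp:
        targets = json.load(resp)
    return [t for t in targets if t.get("type") == "page"]


def filter_tabs(tabs: list[dict], needle: str = "") -> list[dict]:
    """Keep tabs whose URL or title contains `needle`, ignoring case.

    An empty needle matches every tab, like an empty default filter.
    """
    needle = needle.lower()
    return [
        tab for tab in tabs
        if needle in tab.get("url", "").lower() or needle in tab.get("title", "").lower()
    ]
```

For example, `filter_tabs(list_pages(), "example.com")` lists the tabs you would expect a web tool with that filter to consider.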

Mobile

mobile_list_devices	List connected Android devices.
describe(device_id)	Observe a specific mobile device screen.
locate(device_id)	Resolve a mobile UI target to coordinates.
mobile_tap / mobile_swipe	Touch input on the device.
mobile_type / mobile_key	Text and key input on the device.
mobile_app_launch / mobile_shell	Launch Android apps or run adb shell commands.
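Before calling mobile_list_devices, it can help to confirm that ADB itself sees the device. This is a minimal sketch that parses the output of the standard `adb devices` command. Treating the ADB serial as the device_id Conductor expects is an assumption, so compare it against what mobile_list_devices returns.

```python
from __future__ import annotations

import subprocess


def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Turn `adb devices` output into (serial, state) pairs."""
    devices = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip the header and daemon start-up notices such as "* daemon started".
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_serials() -> list[str]:
    """Serials in the "device" state; "unauthorized" or "offline" ones cannot take input."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return [serial for serial, state in parse_adb_devices(result.stdout) if state == "device"]
```

An "unauthorized" state usually means the USB debugging prompt has not been accepted on the device yet.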

Knowledge Base

Conductor includes a persistent knowledge base (SQLite) that agents can read and write across sessions. It stores three types of entries: Skills (repeatable action patterns), Facts (discovered environment details), and Experiences (completed task outcomes). Enable it by setting CONDUCTOR_MCP_KB_PATH.

# Add to your claude mcp add command:
--env CONDUCTOR_MCP_KB_PATH="/path/to/brain.db" \
--env CONDUCTOR_MCP_KB_WRITE_ENABLED="true"

kb_record_skill	Record a repeatable action pattern the agent has learned.
kb_record_fact	Store a discovered fact about the environment or UI.
kb_record_experience	Log a completed task outcome for future reference.
kb_search	Search the KB with a natural-language query.
kb_get_skill	Retrieve a specific skill by name.
kb_mark_contradicted	Mark a KB entry as outdated or incorrect.
kb_brief	Summarise KB contents relevant to the current task.

kb_write_enabled defaults to false: the agent can read and search the KB but cannot write new entries unless writing is explicitly enabled. The dashboard's KB tab shows all stored entries across three subtabs.

Dashboard

When the resident backend is running, a local dashboard is available at http://127.0.0.1:8765/dashboard. The port can be changed via the port config key.
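A quick way to confirm the dashboard is up is to build its URL from the host and port keys and request it. This is a minimal sketch, assuming the CONDUCTOR_MCP_HOST and CONDUCTOR_MCP_PORT names that follow from the prefix rule in the Config Reference. Conductor reads these from its own environment, not your shell's, so pass in the values you registered. A 2xx reply only shows that something is listening.

```python
from __future__ import annotations

import os
import urllib.request
from typing import Mapping


def dashboard_url(env: Mapping[str, str] | None = None) -> str:
    """Build the dashboard URL from the host/port keys, falling back to defaults."""
    env = os.environ if env is None else env
    host = env.get("CONDUCTOR_MCP_HOST", "127.0.0.1")
    port = env.get("CONDUCTOR_MCP_PORT", "8765")
    return f"http://{host}:{port}/dashboard"


def dashboard_reachable(url: str, timeout: float = 2.0) -> bool:
    """True if the dashboard answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False
```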

Overview

State pill	IDLE / RUNNING (live agent state)
Role	holder or subordinate agent mode
Backend	resident or stdio
Grounder URL	Active grounder endpoint
KB attached	Whether a knowledge base file is mounted
Transport	MCP transport in use
Input paused	Whether desktop input is currently suspended
PID / Uptime	Process ID and running time
Re-run permission test	Re-validates macOS Accessibility access

Tool calls

Live history of every tool call, most recent first: name, klass (obs for perception tools, act for input/action tools), status (ok / err), duration, timestamp, and truncated args. The system tray icon also shows the last 5 calls.
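The obs/act split and the five-entry tray view can be modelled with a bounded deque. This is an illustrative sketch of the data the Tool calls view presents, not Conductor's implementation. `OBS_TOOLS` lists the perception tools from the Tool Surface section. Classing every other tool (including browser and mobile tools) as act, and the 40-character argument cutoff, are simplifying assumptions.

```python
from __future__ import annotations

import time
from collections import deque
from dataclasses import dataclass, field

# Perception tools from the Tool Surface list; everything else is treated as "act".
OBS_TOOLS = {"describe", "locate", "crop", "wait_for", "list_windows", "get_scene_graph"}


@dataclass
class ToolCall:
    name: str
    status: str        # "ok" or "err"
    duration_ms: float
    args: str
    timestamp: float = field(default_factory=time.time)

    @property
    def klass(self) -> str:
        return "obs" if self.name in OBS_TOOLS else "act"


class CallHistory:
    def __init__(self, tray_size: int = 5, max_arg_len: int = 40) -> None:
        self._all: list[ToolCall] = []
        self._tray: deque[ToolCall] = deque(maxlen=tray_size)  # oldest entries fall off
        self._max_arg_len = max_arg_len

    def record(self, call: ToolCall) -> None:
        if len(call.args) > self._max_arg_len:
            call.args = call.args[: self._max_arg_len - 3] + "..."  # truncate long args
        self._all.append(call)
        self._tray.append(call)

    def history(self) -> list[ToolCall]:
        return self._all[::-1]  # most recent first

    def tray(self) -> list[ToolCall]:
        return list(self._tray)[::-1]  # last five, most recent first
```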

Knowledge base

Browse KB entries across three subtabs: Skills, Facts, Experiences. Requires CONDUCTOR_MCP_KB_PATH to be set.

Config

Live view of all config keys split into hot-reloadable (take effect on next tool call) and restart-required. See the Config reference section below for the full key list.

Config Reference

All keys are set as environment variables on the claude mcp add command: uppercase the key and add the CONDUCTOR_MCP_ prefix (e.g. text_only becomes CONDUCTOR_MCP_TEXT_ONLY=true). Hot-reloadable keys take effect on the next tool call without restarting Conductor.
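The naming rule is mechanical, so the `--env` flags can be generated from a plain config dict. This is a minimal sketch that applies the uppercase-plus-prefix rule described above and writes booleans in lowercase, as the Setup example does. The grounder address `http://127.0.0.1:8080` is a placeholder (llama.cpp's llama-server listens on port 8080 by default); use whatever your grounder exposes.

```python
from __future__ import annotations

import shlex


def env_var(key: str) -> str:
    """Config key -> environment variable, e.g. text_only -> CONDUCTOR_MCP_TEXT_ONLY."""
    return "CONDUCTOR_MCP_" + key.upper()


def env_flags(config: dict[str, object]) -> list[str]:
    """Expand a config dict into repeated --env KEY=value arguments."""
    flags: list[str] = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase booleans
        flags += ["--env", f"{env_var(key)}={value}"]
    return flags


argv = [
    "claude", "mcp", "add", "conductor-desktop", "--scope", "user",
    *env_flags({"text_only": True, "grounder_url": "http://127.0.0.1:8080"}),
    "--", "/Applications/conductor-mcp.app/Contents/MacOS/conductor-mcp",
]
print(shlex.join(argv))  # paste the printed command into a shell
```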

Hot-reloadable (effective on next tool call)

Key	Default	Description
text_only	false	Return text descriptions only — no image payload. Eliminates image tokens from the agent loop.
auto_screenshot	false	Capture a screenshot after every tool call.
screenshot_max_width	1280	Maximum pixel width of screenshots sent to the agent.
payload_max_width	540	Maximum width of inline image payloads in tool results.
tool_timing	false	Append execution duration to every tool result.
deltas_enabled	false	Include structural UI delta events (navigations, focus changes, node additions) in tool results.

Restart-required (needs a conductor-mcp restart)

Key	Default	Description
backend	resident	Transport backend. resident keeps the process alive; stdio restarts per call.
grounder_url	—	URL of the VLM grounder endpoint. Supports local (llama.cpp), self-hosted, or remote.
grounder_family	qwen	Grounder model family — determines prompt formatting. Options: qwen, openai.
grounder_timeout	30	Seconds before a grounder request times out.
host	127.0.0.1	Host the dashboard and resident backend bind to.
port	8765	Port the dashboard serves on.
transport	stdio	MCP transport layer. stdio for Claude Code; sse for other clients.
kb_path	~/.conductor-mcp/brain.db	Path to the SQLite knowledge base file. Required to enable the KB tab and KB tools.
kb_write_enabled	false	Allow the agent to write new entries to the KB. False = read-only.
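When changing several keys at once, it helps to know whether a restart is due. This is a minimal sketch that diffs two config dicts against the hot-reloadable list above. The reference does not say whether the advanced keys hot-reload, so treating every unlisted key as restart-required is a conservative assumption.

```python
from __future__ import annotations

# Keys listed as hot-reloadable; anything else is treated as restart-required.
HOT_RELOADABLE = frozenset({
    "text_only", "auto_screenshot", "screenshot_max_width",
    "payload_max_width", "tool_timing", "deltas_enabled",
})


def keys_needing_restart(old: dict[str, object], new: dict[str, object]) -> set[str]:
    """Return changed keys that will not apply until conductor-mcp restarts."""
    changed = {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}
    return changed - HOT_RELOADABLE
```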

Advanced (less commonly tuned)

Key	Default	Description
coord_system	norm1000	Coordinate space used by locate. norm1000 normalises to 0–1000 on each axis.
cdp_host / cdp_port	127.0.0.1 / 9222	Chrome DevTools Protocol endpoint for web_eval and web_crop.
cdp_default_tab_filter	(empty)	Default tab substring filter when no match_url/match_title is passed to web tools.
mobile_android_enabled	true	Enable ADB-based Android device support.
mobile_ios_enabled	false	Enable iOS device support via WebDriverAgent.
mobile_ios_wda_url	(empty)	WebDriverAgent URL for iOS device control.
qdrant_url	(empty)	Qdrant vector DB URL for semantic KB search. Leave empty to use SQLite FTS only.
qdrant_collection_prefix	conductor	Prefix for Qdrant collection names.
tei_dense_url	(empty)	Text Embeddings Inference URL for dense vector embeddings (KB semantic search).
tei_sparse_doc_url / tei_sparse_query_url	(empty)	TEI URLs for sparse SPLADE embeddings.
screenshot_mode	file	How screenshots are returned: file writes to disk; inline sends base64.
screenshot_dir	~/.conductor-mcp/screenshots	Directory where screenshot files are saved.
wait_for_poll_interval	0.5	Seconds between polls when wait_for is watching for an element.
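With coord_system set to norm1000, a locate result must be scaled before it can be compared with raw pixels, for example when annotating a saved screenshot. This is a minimal sketch of that scaling. The reference does not say what the coordinates are relative to (the screenshot, the display, logical or Retina physical pixels), or whether 1000 denotes the far edge. Both are assumptions here, and the point is clamped to the last pixel.

```python
from __future__ import annotations


def norm1000_to_pixels(x: float, y: float, width: int, height: int) -> tuple[int, int]:
    """Scale a norm1000 point (0-1000 on each axis) to pixel coordinates.

    Assumes the point is relative to an image of width x height pixels and
    that 1000 means the far edge, so it is clamped to the last pixel.
    """
    if not (0 <= x <= 1000 and 0 <= y <= 1000):
        raise ValueError(f"({x}, {y}) is outside the 0-1000 range")
    px = min(int(x * width / 1000), width - 1)
    py = min(int(y * height / 1000), height - 1)
    return px, py
```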

Agent Pattern

Prompt rule

Observe with describe before acting.
Use locate for semantic targets.
Prefer click, type_text, scroll, drag, and hotkey through OS input.
After each action, re-describe the affected area and verify progress.
If a click misses, re-describe, re-locate, and retry with a more specific target.

Deployment wording

The agent receives descriptions, labels, coordinates, and state in the normal loop. The grounder can run locally, self-hosted, or at a configured endpoint. Do not claim fixed latency, no-cloud processing, or a specific model unless the deployment proves it.

Workflow Examples

Click a visible button

describe(hint="foreground browser window")
locate(target="Get notified button in the SAI hero")
click(target="Get notified button in the SAI hero")
describe(hint="dialog opened by Get notified")

Scroll a page

describe(hint="browser page content")
scroll(target="browser page content", direction="down", amount=5)
describe(hint="newly visible section after scrolling")

Fill a form

describe(hint="email capture dialog")
click(target="email input in the dialog")
type_text(text="developer@example.com")
click(target="submit button in the dialog")
wait_for(target="success message in the dialog")

Recover from a misclick

describe(hint="settings panel")
locate(target="Wi-Fi toggle in the network settings panel")
click(target="Wi-Fi toggle in the network settings panel")
describe(hint="network settings panel")
locate(target="actual Wi-Fi on/off switch, not the sidebar row")
click(target="actual Wi-Fi on/off switch, not the sidebar row")
describe(hint="network settings panel")

Troubleshooting

Input lock

If click, scroll, or type actions fail with an input-lock error, another Conductor client owns desktop input. Read-only tools may still work until that client exits.

CDP unavailable

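If the Chrome DevTools Protocol endpoint (cdp_host / cdp_port, default 127.0.0.1:9222) is not reachable, CDP-backed tools such as web_eval and web_crop cannot run. Use the visual path instead: describe, locate, click, scroll, and type_text.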
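The choice between the two paths can be made up front by probing the endpoint. This is a minimal sketch using Chrome's standard `/json/version` endpoint, assuming Chrome was launched with `--remote-debugging-port`. It only shows whether the endpoint answers from where the script runs, which may differ from what Conductor can reach. The returned tool lists are illustrative.

```python
from __future__ import annotations

import json
import urllib.request


def cdp_available(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """True if a DevTools endpoint answers /json/version with JSON."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
            json.load(resp)
        return True
    except (OSError, ValueError):  # connection/HTTP errors, or a non-JSON reply
        return False


def tools_for_browser_task(host: str = "127.0.0.1", port: int = 9222) -> list[str]:
    """Pick the CDP tools when the endpoint answers, else the visual path."""
    if cdp_available(host, port):
        return ["web_list_tabs", "web_eval", "web_crop"]
    return ["describe", "locate", "click", "scroll", "type_text"]
```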

Target not found

Rephrase with more visual context, for example "blue Save button in the top-right toolbar" or "actual Wi-Fi switch, not the sidebar row".

UI still loading

Use wait_for with a named element instead of repeatedly clicking while the screen is changing.
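wait_for does the waiting inside Conductor, polling every wait_for_poll_interval seconds (0.5 by default). The same wait-then-act idea applies to any client-side readiness check. This is a minimal sketch of such a poll. The 10-second timeout is an assumption, since the reference does not state wait_for's timeout, and `check` is any zero-argument function you supply, for instance one that wraps a describe call and looks for the element name.

```python
from __future__ import annotations

import time
from typing import Callable


def poll_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the wait.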