Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
npm install -g agent-browser
agent-browser install # Download Chromium

git clone https://github.com/browserbase/agent-browser
cd agent-browser
pnpm install
pnpm build
agent-browser install

On Linux, install system dependencies:
agent-browser install --with-deps
# or manually: npx playwright install-deps chromium

agent-browser open example.com
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref from snapshot
agent-browser fill @e3 "test@example.com" # Fill by ref
agent-browser get text @e1 # Get text by ref
agent-browser screenshot page.png
agent-browser close

agent-browser click "#submit"
agent-browser fill "#email" "test@example.com"
agent-browser find role button click --name "Submit"

agent-browser open <url> # Navigate to URL
agent-browser click <sel> # Click element
agent-browser dblclick <sel> # Double-click element
agent-browser focus <sel> # Focus element
agent-browser type <sel> <text> # Type into element
agent-browser fill <sel> <text> # Clear and fill
agent-browser press <key> # Press key (Enter, Tab, Control+a)
agent-browser keydown <key> # Hold key down
agent-browser keyup <key> # Release key
agent-browser hover <sel> # Hover element
agent-browser select <sel> <val> # Select dropdown option
agent-browser check <sel> # Check checkbox
agent-browser uncheck <sel> # Uncheck checkbox
agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
agent-browser scrollintoview <sel> # Scroll element into view
agent-browser drag <src> <tgt> # Drag and drop
agent-browser upload <sel> <files> # Upload files
agent-browser screenshot [path] # Take screenshot (--full for full page)
agent-browser pdf <path> # Save as PDF
agent-browser snapshot # Accessibility tree with refs (best for AI)
agent-browser eval <js> # Run JavaScript
agent-browser close # Close browser

agent-browser get text <sel> # Get text content
agent-browser get html <sel> # Get innerHTML
agent-browser get value <sel> # Get input value
agent-browser get attr <sel> <attr> # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count <sel> # Count matching elements
agent-browser get box <sel> # Get bounding box

agent-browser is visible <sel> # Check if visible
agent-browser is enabled <sel> # Check if enabled
agent-browser is checked <sel> # Check if checked

agent-browser find role <role> <action> [value] # By ARIA role
agent-browser find text <text> <action> # By text content
agent-browser find label <label> <action> [value] # By label
agent-browser find placeholder <ph> <action> [value] # By placeholder
agent-browser find alt <text> <action> # By alt text
agent-browser find title <text> <action> # By title attr
agent-browser find testid <id> <action> [value] # By data-testid
agent-browser find first <sel> <action> [value] # First match
agent-browser find last <sel> <action> [value] # Last match
agent-browser find nth <n> <sel> <action> [value] # Nth match

Actions: click, fill, check, hover, text
Examples:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "test@test.com"
agent-browser find first ".item" click
agent-browser find nth 2 "a" text

agent-browser wait <selector> # Wait for element
agent-browser wait <ms> # Wait for time
agent-browser wait --text "Welcome" # Wait for text
agent-browser wait --url "**/dash" # Wait for URL pattern
agent-browser wait --load networkidle # Wait for load state
agent-browser wait --fn "window.ready === true" # Wait for JS condition

Load states: load, domcontentloaded, networkidle
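The wait commands above are typically chained after an action that triggers navigation. The following is a minimal sketch of that pattern, not an official recipe. `login_and_wait` and the `AB` variable are hypothetical names introduced here, the `#email` and `#submit` selectors and the `**/dash` pattern are placeholders borrowed from the examples in this README, and the `&&` chain assumes the CLI exits non-zero when a command fails, which this README does not state.

```shell
# AB is a variable of this sketch (the CLI does not read it); override it to use a stub.
: "${AB:=agent-browser}"

# login_and_wait URL EMAIL
# Hypothetical helper: selectors and URL pattern are placeholders.
# Assumes each command exits non-zero on failure so the chain stops early.
login_and_wait() {
  url=$1
  email=$2
  "$AB" open "$url" &&
  "$AB" wait "#email" &&            # wait for the field to appear
  "$AB" fill "#email" "$email" &&
  "$AB" click "#submit" &&
  "$AB" wait --url "**/dash" &&     # wait for the post-submit URL
  "$AB" wait --load networkidle     # then for the network to go quiet
}
```

A call such as `login_and_wait example.com "test@example.com"` would run the six commands in order. Waiting on a URL pattern or load state is usually more robust than a fixed `wait <ms>` delay.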
agent-browser mouse move <x> <y> # Move mouse
agent-browser mouse down [button] # Press button (left/right/middle)
agent-browser mouse up [button] # Release button
agent-browser mouse wheel <dy> [dx] # Scroll wheel

agent-browser set viewport <w> <h> # Set viewport size
agent-browser set device <name> # Emulate device ("iPhone 14")
agent-browser set geo <lat> <lng> # Set geolocation
agent-browser set offline [on|off] # Toggle offline mode
agent-browser set headers <json> # Extra HTTP headers
agent-browser set credentials <u> <p> # HTTP basic auth
agent-browser set media [dark|light] # Emulate color scheme

agent-browser cookies # Get all cookies
agent-browser cookies set <name> <val> # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local <key> # Get specific key
agent-browser storage local set <k> <v> # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Same for sessionStorage

agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body <json> # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests

agent-browser tab # List tabs
agent-browser tab new [url] # New tab (optionally with URL)
agent-browser tab <n> # Switch to tab n
agent-browser tab close [n] # Close tab
agent-browser window new # New window

agent-browser frame <sel> # Switch to iframe
agent-browser frame main # Back to main frame

agent-browser dialog accept [text] # Accept (with optional prompt text)
agent-browser dialog dismiss # Dismiss

agent-browser trace start [path] # Start recording trace
agent-browser trace stop [path] # Stop and save trace
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight <sel> # Highlight element
agent-browser state save <path> # Save auth state
agent-browser state load <path> # Load auth state

agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page

agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)

Run multiple isolated browser instances:
# Different sessions
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
# Or via environment variable
AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
# List active sessions
agent-browser session list
# Show current session
agent-browser session

Each session has its own:
- Browser instance
- Cookies and storage
- Navigation history
- Authentication state
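Because each session is isolated, several can be driven at once from one script. The sketch below is one way that might look. `in_session`, `open_in_parallel`, and the `AB` variable are hypothetical names introduced for illustration; only the `--session` flag and the `open` command come from this README.

```shell
# AB is a variable of this sketch (the CLI does not read it); override it to use a stub.
: "${AB:=agent-browser}"

# in_session NAME ARGS...
# Runs one command inside the named session via the documented --session flag.
in_session() {
  name=$1
  shift
  "$AB" --session "$name" "$@"
}

# open_in_parallel SESSION=URL...
# Hypothetical helper: opens each URL in its own session concurrently,
# then waits for every background job to finish.
open_in_parallel() {
  for pair in "$@"; do
    # ${pair%%=*} is the text before the first "=", ${pair#*=} the text after it.
    in_session "${pair%%=*}" open "${pair#*=}" &
  done
  wait
}
```

For example, `open_in_parallel agent1=site-a.com agent2=site-b.com` starts both sessions side by side, after which `in_session agent1 snapshot -i` would address only the first one.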
The snapshot command supports filtering to reduce output size:
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3 levels
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options

| Option | Description |
|---|---|
| `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
| `-c, --compact` | Remove empty structural elements |
| `-d, --depth <n>` | Limit tree depth |
| `-s, --selector <sel>` | Scope to CSS selector |

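When a snapshot has to fit a size budget (an LLM context window, for instance), the filters can be applied progressively. This is a sketch of that idea under stated assumptions: `snapshot_within` and `AB` are hypothetical names, the order in which filters are tried is an arbitrary choice, and size is measured in characters of the text output.

```shell
# AB is a variable of this sketch (the CLI does not read it); override it to use a stub.
: "${AB:=agent-browser}"

# snapshot_within MAX_CHARS
# Tries progressively stricter documented filters and prints the first
# snapshot that fits; if none fits, prints the strictest one anyway.
snapshot_within() {
  max=$1
  for flags in "" "-i" "-i -c" "-i -c -d 3"; do
    # $flags is deliberately unquoted so "-i -c" splits into separate arguments.
    out=$("$AB" snapshot $flags) || return 1
    if [ "${#out}" -le "$max" ]; then
      break
    fi
  done
  printf '%s\n' "$out"
}
```

A call like `snapshot_within 4000` returns the full tree when it is small and falls back to interactive-only, compact, and depth-limited output as needed.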
| Option | Description |
|---|---|
| `--session <name>` | Use isolated session (or AGENT_BROWSER_SESSION env) |
| `--json` | JSON output (for agents) |
| `--full, -f` | Full page screenshot |
| `--name, -n` | Locator name filter |
| `--exact` | Exact text match |
| `--headed` | Show browser window (not headless) |
| `--debug` | Debug output |

Refs provide deterministic element selection from snapshots:
# 1. Get snapshot with refs
agent-browser snapshot
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# - link "Learn more" [ref=e4]
# 2. Use refs to interact
agent-browser click @e2 # Click the button
agent-browser fill @e3 "test@example.com" # Fill the textbox
agent-browser get text @e1 # Get heading text
agent-browser hover @e4 # Hover the link

Why use refs?
- Deterministic: Ref points to exact element from snapshot
- Fast: No DOM re-query needed
- AI-friendly: Snapshot + ref workflow is optimal for LLMs
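In a plain shell script, a ref can be looked up from the text snapshot by role and name. The helper below is a rough sketch that assumes snapshot lines look like the sample output above (`- role "name" [ref=eN]`); `ref_for` is a hypothetical name, and the real output may carry details this parser does not handle.

```shell
# ref_for ROLE NAME
# Reads a text snapshot on stdin and prints the first matching ref as "@eN".
# Uses a literal substring match, so the name is not treated as a regex.
ref_for() {
  awk -v want="- $1 \"$2\" [ref=" '
    {
      pos = index($0, want)            # find: - role "name" [ref=
      if (pos == 0) next
      rest = substr($0, pos + length(want))
      stop = index(rest, "]")          # the ref id ends at the closing bracket
      if (stop == 0) next
      print "@" substr(rest, 1, stop - 1)
      exit
    }'
}
```

Piping a snapshot through it, as in `agent-browser snapshot -i | ref_for button Submit`, would print `@e2` for the sample page. The output is empty when nothing matches, so a caller should check for that before passing it to `click`.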
agent-browser click "#id"
agent-browser click ".class"
agent-browser click "div > button"

agent-browser click "text=Submit"
agent-browser click "xpath=//button"

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

Use --json for machine-readable output:
agent-browser snapshot --json
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
agent-browser get text @e1 --json
agent-browser is visible @e2 --json

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i --json # AI parses tree and refs
# 2. AI identifies target refs from snapshot
# 3. Execute actions using refs
agent-browser click @e2
agent-browser fill @e3 "input text"
# 4. Get new snapshot if page changed
agent-browser snapshot -i --json

Show the browser window for debugging:
agent-browser open example.com --headed

This opens a visible browser window instead of running headless.
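For scripted agent loops, each step of the snapshot, act, re-snapshot workflow can check the `--json` envelope before moving on. The sketch below assumes the `{"success":true,...}` shape shown earlier and that a failed command reports something other than `"success":true`, which this README does not spell out. `json_ok`, `step`, and `AB` are hypothetical names.

```shell
# AB is a variable of this sketch (the CLI does not read it); override it to use a stub.
: "${AB:=agent-browser}"

# json_ok: reads one JSON response on stdin; succeeds if it reports success.
# This is a crude text match on the documented envelope, not a JSON parser.
json_ok() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# step ARGS...
# Hypothetical helper: runs one command with --json, prints the raw
# response, and returns non-zero unless the envelope reports success.
step() {
  resp=$("$AB" "$@" --json) || return 1
  printf '%s\n' "$resp"
  printf '%s\n' "$resp" | json_ok
}
```

A loop could then read `step open example.com && step snapshot -i`, stopping at the first failed step. Extracting fields such as `data.refs` is better left to a real JSON tool than to text matching.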
agent-browser uses a client-daemon architecture:
- Rust CLI (fast native binary) - Parses commands, communicates with daemon
- Node.js Daemon - Manages Playwright browser instance
- Fallback - If native binary unavailable, uses Node.js directly
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
| Platform | Binary | Fallback |
|---|---|---|
| macOS ARM64 | ✅ Native Rust | Node.js |
| macOS x64 | ✅ Native Rust | Node.js |
| Linux ARM64 | ✅ Native Rust | Node.js |
| Linux x64 | ✅ Native Rust | Node.js |
| Windows | - | Node.js |
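To see which row of the table applies to a given machine, a script can map `uname` output onto it. This is only a sketch: `expected_mode` is a hypothetical helper, the `uname` spellings it recognizes are assumptions about common systems rather than anything agent-browser documents, and treating unlisted platforms as Node.js follows the fallback rule described above rather than an explicit table row.

```shell
# expected_mode [OS] [ARCH]
# Prints "native" or "nodejs" following the support table.
# Defaults to the current machine's uname values when no arguments are given.
expected_mode() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  case "$os" in
    Darwin|Linux)
      case "$arch" in
        arm64|aarch64|x86_64|amd64) echo native ;;
        *) echo nodejs ;;   # assumption: unlisted architectures use the fallback
      esac ;;
    *) echo nodejs ;;       # Windows and anything else: Node.js fallback
  esac
}
```

Running `expected_mode` with no arguments reports the expected mode for the current host; `expected_mode Linux aarch64` prints `native`.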
Apache-2.0