Step API

The Step API is our library of step functions: high-level browser actions, wrapping the Chrome API, that you call against a cloud browser session one at a time, choosing each step based on what the previous one returned. Open a session, call goto, click, enterText, scrape, and so on, then close the session when you're done. Use it when you want to drive a browser dynamically from your code rather than run a pre-built automation end to end.

Under the hood, the step functions run as HTTP calls against the cloud browser's pod. Each call goes to POST /api/v5/step, or to /api/v5/browser/open and /api/v5/browser/close for session lifecycle. You have three ways to drive a session; pick whichever fits your stack.

How to call them


axiom-api Node library

The canonical wrapper. Install it with npm, instantiate the client, and call the named methods: axiom.goto(), axiom.click(), axiom.enterText(), axiom.scrape(), and so on. It is Node-only and optional, but it saves you from writing the HTTP boilerplate and transparently retries when a long-running step's request times out.

npm install axiom-api

See Start a session for the full instantiate-and-open flow.

Puppeteer or any CDP client

If you'd rather have full Chrome DevTools Protocol control than the high-level step-function helpers, point any CDP-speaking client at the cloud browser WebSocket directly. You get every Puppeteer method, just running on our infrastructure instead of your own.

wss://cdp-lb.axiom.ai/?token=YOUR_API_KEY

See Authentication for how the key is passed, and Endpoints for the canonical list.
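As a rough illustration of the CDP route, the sketch below builds the WebSocket endpoint shown above and hands it to Puppeteer's connect() call. The helper names cdpEndpoint and connectToCloudBrowser are made up for this example, and passing the key as a URL-encoded token query parameter is an assumption based on the URL above; check Authentication for the authoritative form.

```javascript
// Build the WebSocket endpoint quoted above.
// Assumption: the API key travels as the `token` query parameter.
function cdpEndpoint(apiKey) {
  if (!apiKey) throw new Error("An API key is required");
  return `wss://cdp-lb.axiom.ai/?token=${encodeURIComponent(apiKey)}`;
}

// `puppeteer` is passed in so either `puppeteer` or `puppeteer-core` works.
// puppeteer.connect({ browserWSEndpoint }) attaches to an already-running
// browser instead of launching a local one.
async function connectToCloudBrowser(puppeteer, apiKey) {
  return puppeteer.connect({ browserWSEndpoint: cdpEndpoint(apiKey) });
}

// Usage (after `npm install puppeteer-core`):
//   import puppeteer from "puppeteer-core";
//   const browser = await connectToCloudBrowser(puppeteer, process.env.AXIOM_API_KEY);
//   const page = await browser.newPage();
//   await page.goto("https://example.com");
//   await browser.disconnect();
```

Injecting the Puppeteer module keeps the helper free of a hard dependency, which also makes it easy to exercise with a stub.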

MCP server

Expose step functions as MCP tools so Claude or another LLM client can drive a session step by step. See Build your own MCP server for the patterns and reference implementations.
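To give a feel for the idea, here is a minimal, SDK-free sketch of wrapping a few step methods as tool handlers. The names makeStepTools and callTool, the tool names, and the argument shapes are all hypothetical; the only thing borrowed from MCP is the tool-result shape ({ content: [{ type: "text", text }] }, with isError on failure). The reference implementations linked above remain the source of truth.

```javascript
// Wrap a few step methods as tool handlers. Each handler takes a plain
// arguments object and returns an MCP-style tool result.
function makeStepTools(axiom) {
  const asText = (value) => ({
    content: [{ type: "text", text: JSON.stringify(value ?? null) }],
  });
  return {
    goto: async ({ url }) => asText(await axiom.goto(url)),
    click: async ({ selector }) => asText(await axiom.click(selector)),
    enter_text: async ({ selector, text }) =>
      asText(await axiom.enterText(selector, text)),
  };
}

// Dispatch a tool call by name; unknown tools produce an error result
// rather than throwing, so the LLM client can see what went wrong.
async function callTool(tools, name, args) {
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    return {
      isError: true,
      content: [{ type: "text", text: `Unknown tool: ${name}` }],
    };
  }
  return tools[name](args ?? {});
}
```

In a real server, each entry would be registered with your MCP library of choice along with an input schema; that wiring is left out here.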

When to use this vs other layers


You want to                                                           Use
Run a pre-built No-Code Tool automation from start to finish          /trigger
Run a pre-authored Puppeteer / CDP script in the cloud                Run a Code Dashboard automation
Drive a session step by step, choosing each action based on the page  This section

/trigger and the Code Dashboard path are fully pre-authored: the workflow is baked in. The Step API keeps the scripted, high-level actions but lets you choose each call at runtime.
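Choosing a call at runtime might look like the sketch below: scrape, inspect the result, and only then decide whether to pause and try again. The helper name scrapeWhenReady and its options are invented for this example, and it assumes axiom.scrape() resolves to an array of records, which this page does not confirm. The scrape() argument order follows the end-to-end example on this page.

```javascript
// Scrape the current page, retrying after a pause if nothing matched yet.
// Assumption: axiom.scrape() resolves to an array of records.
async function scrapeWhenReady(
  axiom,
  selector,
  { attempts = 3, pauseMs = 2000, maxResults = 50 } = {}
) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // null url = stay on the current page; null pager = no pagination.
    const rows = await axiom.scrape(null, selector, null, maxResults, {});
    if (Array.isArray(rows) && rows.length > 0) return rows;

    // Nothing rendered yet: decide at runtime to pause and try again.
    if (attempt < attempts) await axiom.wait(pauseMs);
  }
  return [];
}

// Usage, inside an open session:
//   const rows = await scrapeWhenReady(axiom, ".product-card");
```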

Session lifecycle


A session is an isolated cloud browser. Open one with axiom.browserOpen(), drive it with the action methods, then close it with axiom.browserClose(). Sessions left open consume runtime quota, so always close yours.

new AxiomApi(key)   →   axiom.browserOpen()
                              │
                              ├─ axiom.goto(url)
                              ├─ axiom.click(selector)
                              ├─ axiom.enterText(selector, text)
                              ├─ axiom.pressKeys("Enter")
                              ├─ axiom.scrape(url, selector, ...)
                              ├─ axiom.wait(2000)
                              │
                              ▼
                         axiom.browserClose()
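One way to make sure a session is always closed is to wrap the open/close pair in a small helper. The sketch below uses only browserOpen() and browserClose() from the lifecycle above; the name withSession is hypothetical and not part of axiom-api.

```javascript
// Open a session, run `work`, and close the session even if `work` throws.
async function withSession(axiom, work) {
  await axiom.browserOpen();
  try {
    return await work(axiom);
  } finally {
    // Runs on success and on failure, so the session never lingers.
    await axiom.browserClose();
  }
}

// Usage:
//   import { AxiomApi } from "axiom-api";
//   const axiom = new AxiomApi(process.env.AXIOM_API_KEY);
//   await withSession(axiom, async (session) => {
//     await session.goto("https://example.com");
//   });
```

If browserOpen() itself fails, browserClose() is not called, since there is no session to release.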

Available step methods


Method                                               Purpose
browserOpen() / browserClose()                       Open / close the session.
goto(url, ...)                                       Navigate the page to a URL.
click(select, ...)                                   Click an element.
clickMultiple(select, ...)                           Click every matching element up to a max.
clickEngagementButton(select, ...)                   Toggle a like/follow/subscribe-style button only if it isn't already in the target state.
hover(select)                                        Hover the mouse over an element.
clickAndDrag(start, end)                             Mouse-press at one coordinate, release at another.
enterText(selectTextField, text, ...)                Enter text into an input.
pressKeys(key, ...)                                  Fire keyboard events (Enter, Tab, arrow keys, …).
selectList(select, text)                             Pick an option in a <select> dropdown.
datePicker(...)                                      Navigate a calendar widget and pick a date.
getClipboardContents()                               Read the cloud browser's clipboard (after a copy step).
switchBrowserTab(selectTab)                          Switch the active tab in the session.
scrape(url, selector, pager, max_results, settings)  Smart-scrape a list of records, optionally paginating.
scrapeMetadata(metadata)                             Pull structured fields (title, description, OG tags, …) from the current page.
integrateAI(aiOptions)                               Run an LLM call inline (summarise, classify, extract).
solveCaptcha(apiKey)                                 Hand the current page's captcha to a solver.
wait(time)                                           Pause the session on the pod for time milliseconds (keeps the session alive).
restartBrowser()                                     Restart the cloud browser within the same session.

End-to-end example


A canonical "log in, scrape, store" flow:

import { AxiomApi } from 'axiom-api';

const axiom = new AxiomApi(process.env.AXIOM_API_KEY);

await axiom.browserOpen();
try {
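  // Log in; the password comes from the PW environment variable.
  await axiom.goto("https://example.com/login");
  await axiom.enterText("#email", "user@example.com");
  await axiom.enterText("#password", process.env.PW);
  await axiom.click("button[type=submit]");
  await axiom.wait(2000);       // fixed pause while the post-login page renders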

  const rows = await axiom.scrape(
    null,                       // stay on the current page
    ".product-card",            // record selector
    null,                       // no pagination
    50,                         // max results
    {}                          // default settings
  );
  // ... persist `rows` somewhere
} finally {
  await axiom.browserClose();
}

Synchronous-feeling, with an async safety net


Each step method makes a single HTTP request that blocks until the pod returns the step's result. If the request times out at the network layer or the pod reports that a step is already in flight, the library transparently polls POST /api/v5/step/result with exponential backoff until the step finishes (default deadline: 1 hour). You write straight-line code; the library handles long-running steps and flaky connections.
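The polling behaviour described above can be pictured with a generic exponential-backoff loop. This is an illustrative sketch, not the library's actual code: pollUntilDone, the { done, value } result shape, and the initial and maximum delays are all assumptions; only the 1-hour default deadline comes from the text. fetchResult stands in for a single POST /api/v5/step/result request.

```javascript
// Poll until `fetchResult` reports completion, doubling the delay each time.
// Assumption: fetchResult() resolves to { done: boolean, value?: any }.
async function pollUntilDone(
  fetchResult,
  {
    initialDelayMs = 500,           // hypothetical starting delay
    maxDelayMs = 30000,             // hypothetical cap on the delay
    deadlineMs = 60 * 60 * 1000,    // 1 hour, matching the stated default
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    now = () => Date.now(),
  } = {}
) {
  const startedAt = now();
  let delay = initialDelayMs;
  while (true) {
    const result = await fetchResult();
    if (result.done) return result.value;

    // Give up if the next sleep would carry us past the deadline.
    if (now() - startedAt + delay > deadlineMs) {
      throw new Error("Step did not finish before the deadline");
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

The sleep and now parameters exist only so the loop can be driven by a fake clock in tests; with the defaults it waits in real time.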

What's not exposed yet


The Step API surface is intentionally focused on common interaction primitives. Things you'd need to work around today:

  • Raw page text / HTML readout. No getText() / getHtml() method. Use axiom.scrape() for record extraction or axiom.scrapeMetadata() for page-level fields. For anything more bespoke, fall back to /trigger with a Get data step.
  • Screenshots. Not exposed. Use a No-Code Tool automation with Save screenshot locally and trigger via /trigger.
  • File upload / download. Not exposed.
  • Iframe traversal. Selectors operate on the top-level document only.
  • Direct JS evaluation. No evaluate() method. For arbitrary JS, use /trigger with a Write javascript step inside an automation, or drop down to the CDP socket and call Runtime.evaluate yourself.
  • Cookie / storage management. Sessions are stateless across browserOpen() calls. The doNotShareLocalstorage flag on axiom.goto() isolates a single navigation; for finer-grained cookie handling, fall back to /trigger.

If you need any of the above, the typical workaround is to author a small No-Code Tool automation that includes the missing capability and call it via /trigger instead.
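For the CDP fallback mentioned under direct JS evaluation, a sketch of calling Runtime.evaluate through a CDP session follows. The helper name evaluateViaCdp is hypothetical; Runtime.evaluate and its expression, returnByValue, and awaitPromise parameters are standard CDP. It assumes a browser you are driving over the CDP socket; whether a CDP client can attach to a session opened with browserOpen() is not covered on this page.

```javascript
// Evaluate a JavaScript expression in the page and return its value.
// `cdpSession` is any object with a CDP-style send(method, params), such as
// the session returned by Puppeteer's page.createCDPSession().
async function evaluateViaCdp(cdpSession, expression) {
  const { result, exceptionDetails } = await cdpSession.send("Runtime.evaluate", {
    expression,
    returnByValue: true, // serialise the result instead of returning a handle
    awaitPromise: true,  // wait for a promise result to settle first
  });
  if (exceptionDetails) {
    throw new Error(`Evaluation failed: ${exceptionDetails.text}`);
  }
  return result.value;
}

// Usage with Puppeteer connected to the cloud browser WebSocket:
//   const page = await browser.newPage();
//   await page.goto("https://example.com");
//   const session = await page.createCDPSession();
//   const html = await evaluateViaCdp(session, "document.documentElement.outerHTML");
```

The same call covers the raw text or HTML readout gap, since any expression that returns a serialisable value can be evaluated this way.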

In this section


Start a session

Install axiom-api, instantiate the AxiomApi class with your API key, then open a cloud browser session.

Go to URL

Send the cloud browser session to a URL using axiom.goto() and wait for the page to load.

Click

Click a button, link, or any other element in a cloud browser session by passing a CSS selector to axiom.click().

Enter text

Enter text into an input, textarea, or other focusable element in a cloud browser session using axiom.enterText().

Click multiple

Click every element matching a CSS selector, up to a maximum, using axiom.clickMultiple().

Click engagement button

Click a like, follow, subscribe, or similar toggle only when it isn't already in the target state, using axiom.clickEngagementButton().

Hover

Move the mouse over an element to trigger hover-only UI (dropdown menus, tooltips, lazy-loaded content) using axiom.hover().

Click and drag

Press the mouse at one coordinate, drag to another, and release using axiom.clickAndDrag(). Useful for sliders, range pickers, drag-and-drop UIs, and slider captchas.

Press keys

Send keyboard key presses (Enter, Tab, arrow keys, modifier combinations) to the currently focused element using axiom.pressKeys().

Select list

Pick an option in a native HTML <select> element by visible text using axiom.selectList().

Date picker

Navigate a calendar widget month by month and pick a target day using axiom.datePicker().

Switch browser tab

Switch the cloud browser session's focus to another tab using axiom.switchBrowserTab().

Get clipboard contents

Read the contents of the cloud browser's clipboard, useful for pages that put their copy output on the clipboard rather than in the DOM.

Scrape

Smart-scrape a list of records from one or more pages, with optional pagination and a maximum-results cap, using axiom.scrape().

Scrape metadata

Extract page-level metadata (title, analytics IDs, schema.org structured data, or any meta-tag content) using axiom.scrapeMetadata().

Integrate AI

Run an LLM call inline within a step-function session for prompt completion, classification, or extraction using axiom.integrateAI().

Solve captcha

Send the current page's captcha to a third-party solver and submit the result using axiom.solveCaptcha().

Wait

Pause the session for a fixed duration on the pod, keeping the session alive while you wait for content to render or for a server-side process to finish.

Restart browser

Restart the cloud browser within the current session to recover from a wedged state without losing the session itself, using axiom.restartBrowser().

Close a session

Close the cloud browser session to free its resources and stop consuming runtime quota.

Step function vs No-Code step

Find the axiom-api method equivalent of every No-Code Tool step, useful when porting a visual automation to code.
