Step API
The Step API is a library of step functions that wrap the Chrome API in high-level browser actions. You call them one at a time against a cloud browser session, choosing each step based on what the previous one returned. Open a session, call goto, click, enterText, scrape, and so on, then close the session when you're done. Use it when you want to drive a browser dynamically from your code rather than run a pre-built automation end to end.
Under the hood, the step functions run as HTTP calls against the cloud browser's pod. Each call is a POST to /api/v5/step, or to /api/v5/browser/open and /api/v5/browser/close for session lifecycle. There are three ways to drive a session; pick whichever fits your stack.
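To make the routing concrete, here is a minimal sketch of which path each kind of call uses. The helper name pathFor is hypothetical, and the host, authentication, and request body are deliberately left out because this page does not specify them.

```javascript
// Paths quoted on this page. The host is omitted on purpose.
const STEP_PATH = "/api/v5/step";
const LIFECYCLE_PATHS = new Map([
  ["browserOpen", "/api/v5/browser/open"],
  ["browserClose", "/api/v5/browser/close"],
]);

// Hypothetical helper: lifecycle methods get their own path,
// every other step method shares the single step endpoint.
function pathFor(method) {
  return LIFECYCLE_PATHS.get(method) ?? STEP_PATH;
}

console.log(pathFor("browserOpen"));
console.log(pathFor("click"));
```

If you use the axiom-api library you never build these paths yourself; the sketch only illustrates the split between lifecycle calls and step calls.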
How to call them
axiom-api Node library
The canonical wrapper: install it from npm, instantiate the client, and call the named methods: axiom.goto(), axiom.click(), axiom.enterText(), axiom.scrape(), and so on. It is Node-only and optional, but it saves you from writing the HTTP boilerplate and transparently retries when a long-running step's request times out.
npm install axiom-api
See Start a session for the full instantiate-and-open flow.
Puppeteer or any CDP client
If you'd rather have full Chrome DevTools Protocol control than the high-level step-function helpers, point any CDP-speaking client at the cloud browser WebSocket directly. You get every Puppeteer method, just running on our infrastructure instead of your own.
wss://cdp-lb.axiom.ai/?token=YOUR_API_KEY
See Authentication for how the key is passed, and Endpoints for the canonical list.
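As a rough illustration, connecting with Puppeteer might look like the sketch below. It assumes Puppeteer's standard connect({ browserWSEndpoint }) call. The helper names cdpEndpoint and connectToCloudBrowser are hypothetical, and the puppeteer module is passed in as an argument so the sketch does not assume which package (for example puppeteer-core) you install.

```javascript
// Build the WebSocket endpoint shown above, URL-encoding the API key.
function cdpEndpoint(apiKey) {
  const url = new URL("wss://cdp-lb.axiom.ai/");
  url.searchParams.set("token", apiKey);
  return url.toString();
}

// Hypothetical helper: attach a Puppeteer client to the cloud browser.
// `puppeteer` is injected by the caller rather than imported here.
async function connectToCloudBrowser(puppeteer, apiKey) {
  return puppeteer.connect({ browserWSEndpoint: cdpEndpoint(apiKey) });
}
```

A caller would then do something like `const browser = await connectToCloudBrowser(puppeteer, process.env.AXIOM_API_KEY)` and use the returned browser as usual.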
MCP server
Expose step functions as MCP tools so Claude or another LLM client can drive a session step by step. See Build your own MCP server for the patterns and reference implementations.
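The core of such a server is a mapping from tool names to step calls. The sketch below shows only that dispatch layer and is framework-agnostic: it does not use any MCP SDK. The tool names, the argument shapes, and the callTool helper are all hypothetical, and `axiom` stands for an already-open session.

```javascript
// Hypothetical tool table: each entry maps a tool name to one step call.
// Argument names (url, selector, text) are illustrative only.
const TOOLS = {
  goto: (axiom, { url }) => axiom.goto(url),
  click: (axiom, { selector }) => axiom.click(selector),
  enter_text: (axiom, { selector, text }) => axiom.enterText(selector, text),
};

// Look up the requested tool and run it against the session.
async function callTool(axiom, name, args) {
  const known = Object.prototype.hasOwnProperty.call(TOOLS, name);
  if (!known) throw new Error(`Unknown tool: ${name}`);
  return TOOLS[name](axiom, args);
}
```

An MCP server would register one tool per table entry and forward each incoming tool call to callTool, returning the step's result to the LLM client.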
When to use this vs other layers
| You want to | Use |
|---|---|
| Run a pre-built No-Code Tool automation from start to finish | /trigger |
| Run a pre-authored Puppeteer / CDP script in the cloud | Run a Code Dashboard automation |
| Drive a session step by step, choosing each action based on the page | This section |
/trigger and the Code Dashboard path are fully pre-authored (the workflow is baked in). The Step API gives you the same kind of scripted high-level actions, but your code chooses each call at runtime.
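Choosing a call at runtime can be sketched as follows. This is a minimal illustration, not a recommended pattern: it assumes `axiom` is an already-open session, that scrape() resolves to an array of records, and that passing null as the URL scrapes the current page. The selectors and the scrapeWithFallback helper are placeholders.

```javascript
// Hypothetical helper: scrape, and only if nothing came back,
// dismiss a banner and try once more.
async function scrapeWithFallback(axiom) {
  // Argument order: url, selector, pager, max_results, settings.
  let rows = await axiom.scrape(null, ".product-card", null, 50, {});
  if (!Array.isArray(rows) || rows.length === 0) {
    // The next step is decided by what the previous one returned.
    await axiom.click("#accept-cookies");
    await axiom.wait(2000);
    rows = await axiom.scrape(null, ".product-card", null, 50, {});
  }
  return rows;
}
```

The branch is the point: a pre-authored automation would run the same steps every time, whereas here the click only happens when the first scrape comes back empty.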
Session lifecycle
A session is an isolated cloud browser. Open one with axiom.browserOpen(), drive it with the action methods, then close it with axiom.browserClose(). Sessions left open consume runtime quota, so always close yours.
new AxiomApi(key) → axiom.browserOpen()
│
├─ axiom.goto(url)
├─ axiom.click(selector)
├─ axiom.enterText(selector, text)
├─ axiom.pressKeys("Enter")
├─ axiom.scrape(url, selector, ...)
├─ axiom.wait(2000)
│
▼
axiom.browserClose()
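One way to guarantee the close step is to wrap the lifecycle in a small helper. The sketch below is an assumption about how you might structure your own code, not part of axiom-api: withSession is a hypothetical name, and `axiom` is an instantiated client that has not been opened yet.

```javascript
// Hypothetical helper: open a session, run `work`, and always close,
// even if `work` throws.
async function withSession(axiom, work) {
  await axiom.browserOpen();
  try {
    return await work(axiom);
  } finally {
    await axiom.browserClose();
  }
}
```

Usage would look like `await withSession(axiom, (a) => a.goto("https://example.com"))`. Because the close sits in a finally block, a failed step cannot leave the session open and consuming quota.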
Available step methods
| Method | Purpose |
|---|---|
| browserOpen() / browserClose() | Open / close the session. |
| goto(url, ...) | Navigate the page to a URL. |
| click(select, ...) | Click an element. |
| clickMultiple(select, ...) | Click every matching element up to a max. |
| clickEngagementButton(select, ...) | Toggle a like/follow/subscribe-style button only if it isn't already in the target state. |
| hover(select) | Hover the mouse over an element. |
| clickAndDrag(start, end) | Mouse-press at one coordinate, release at another. |
| enterText(selectTextField, text, ...) | Enter text into an input. |
| pressKeys(key, ...) | Fire keyboard events (Enter, Tab, arrow keys, …). |
| selectList(select, text) | Pick an option in a <select> dropdown. |
| datePicker(...) | Navigate a calendar widget and pick a date. |
| getClipboardContents() | Read the cloud browser's clipboard (after a copy step). |
| switchBrowserTab(selectTab) | Switch the active tab in the session. |
| scrape(url, selector, pager, max_results, settings) | Smart-scrape a list of records, optionally paginating. |
| scrapeMetadata(metadata) | Pull structured fields (title, description, OG tags, …) from the current page. |
| integrateAI(aiOptions) | Run an LLM call inline (summarise, classify, extract). |
| solveCaptcha(apiKey) | Hand the current page's captcha to a solver. |
| wait(time) | Pause the session on the pod for time milliseconds (keeps the session alive). |
| restartBrowser() | Restart the cloud browser within the same session. |
End-to-end example
A canonical "log in, scrape, store" flow:
import { AxiomApi } from 'axiom-api';

const axiom = new AxiomApi(process.env.AXIOM_API_KEY);

await axiom.browserOpen();
try {
  await axiom.goto("https://example.com/login");
  await axiom.enterText("#email", "user@example.com");
  await axiom.enterText("#password", process.env.PW);
  await axiom.click("button[type=submit]");
  await axiom.wait(2000);

  const rows = await axiom.scrape(
    null,            // stay on the current page
    ".product-card", // record selector
    null,            // no pagination
    50,              // max results
    {}               // default settings
  );
  // ... persist `rows` somewhere
} finally {
  await axiom.browserClose();
}
Synchronous-feeling, with async safety net
Each step method makes a single HTTP request that blocks until the pod returns the step's result. If the request times out at the network layer or the pod reports that a step is already in flight, the library transparently polls POST /api/v5/step/result with exponential backoff until the step finishes (default deadline: 1 hour). You write straight-line code; the library handles long-running steps and flaky connections.
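The polling behaviour can be pictured with the generic sketch below. It is not the library's actual implementation: pollUntilDone and fetchResult are hypothetical names, fetchResult stands in for one call to the result endpoint and is assumed to resolve to { done, value }, and the initial delay and delay cap are illustrative. Only the 1-hour default deadline comes from the description above.

```javascript
// Generic poll-with-exponential-backoff loop under stated assumptions.
async function pollUntilDone(fetchResult, options = {}) {
  const {
    initialDelayMs = 500,          // illustrative starting delay
    maxDelayMs = 30000,            // illustrative cap on the delay
    deadlineMs = 60 * 60 * 1000,   // 1 hour, the documented default
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    now = () => Date.now(),
  } = options;

  const startedAt = now();
  let delay = initialDelayMs;
  while (now() - startedAt < deadlineMs) {
    const outcome = await fetchResult();
    if (outcome.done) return outcome.value;
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // double the wait, up to the cap
  }
  throw new Error("Step did not finish before the deadline");
}
```

Doubling the delay keeps early polls responsive for steps that finish quickly while avoiding a flood of requests for steps that run for minutes.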
What's not exposed yet
The Step API surface is intentionally focused on common interaction primitives. Things you'd need to work around today:
- Raw page text / HTML readout. No getText() / getHtml() method. Use axiom.scrape() for record extraction or axiom.scrapeMetadata() for page-level fields. For anything more bespoke, fall back to /trigger with a Get data step.
- Screenshots. Not exposed. Use a No-Code Tool automation with Save screenshot locally and trigger via /trigger.
- File upload / download. Not exposed.
- Iframe traversal. Selectors operate on the top-level document only.
- Direct JS evaluation. No evaluate() method. For arbitrary JS, use /trigger with a Write javascript step inside an automation, or drop down to the CDP socket and call Runtime.evaluate yourself.
- Cookie / storage management. Sessions are stateless across browserOpen() calls. The doNotShareLocalstorage flag on axiom.goto() isolates a single navigation; for finer-grained cookie handling, fall back to /trigger.
If you need any of the above, the typical workaround is to author a small No-Code Tool automation that includes the missing capability and call it via /trigger instead.
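For the CDP route mentioned under direct JS evaluation, a minimal sketch with Puppeteer could look like this. It relies on Puppeteer's standard connect(), pages(), evaluate(), and disconnect() calls. The readPageText helper is hypothetical, the puppeteer module is injected rather than imported, and this page does not state whether a CDP connection attaches to the same session your step calls are driving, so treat that as an open assumption.

```javascript
// Hypothetical helper: read the visible text of the first open page
// by running JavaScript in it over the CDP socket.
async function readPageText(puppeteer, apiKey) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://cdp-lb.axiom.ai/?token=${encodeURIComponent(apiKey)}`,
  });
  try {
    const pages = await browser.pages();
    const page = pages[0] ?? (await browser.newPage());
    // The callback runs inside the page, not in Node.
    return await page.evaluate(() => document.body.innerText);
  } finally {
    // Detach the client; disconnect() does not close the remote browser.
    await browser.disconnect();
  }
}
```

The same shape works for HTML readout by returning document.documentElement.outerHTML from the callback instead.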
In this section
Start a session
Install axiom-api, instantiate the AxiomApi class with your API key, then open a cloud browser session.
Go to URL
Send the cloud browser session to a URL using axiom.goto() and wait for the page to load.
Click
Click a button, link, or any other element in a cloud browser session by passing a CSS selector to axiom.click().
Enter text
Enter text into an input, textarea, or other focusable element in a cloud browser session using axiom.enterText().
Click multiple
Click every element matching a CSS selector, up to a maximum, using axiom.clickMultiple().
Click engagement button
Click a like, follow, subscribe, or similar toggle only when it isn't already in the target state, using axiom.clickEngagementButton().
Hover
Move the mouse over an element to trigger hover-only UI (dropdown menus, tooltips, lazy-loaded content) using axiom.hover().
Click and drag
Press the mouse at one coordinate, drag to another, and release using axiom.clickAndDrag(). Useful for sliders, range pickers, drag-and-drop UIs, and slider captchas.
Press keys
Send keyboard key presses (Enter, Tab, arrow keys, modifier combinations) to the currently focused element using axiom.pressKeys().
Select list
Pick an option in a native HTML <select> element by visible text using axiom.selectList().
Date picker
Navigate a calendar widget month by month and pick a target day using axiom.datePicker().
Switch browser tab
Switch the cloud browser session's focus to another tab using axiom.switchBrowserTab().
Get clipboard contents
Read the contents of the cloud browser's clipboard, useful for pages that put their copy output on the clipboard rather than in the DOM.
Scrape
Smart-scrape a list of records from one or more pages, with optional pagination and a maximum-results cap, using axiom.scrape().
Scrape metadata
Extract page-level metadata (title, analytics IDs, schema.org structured data, or any meta-tag content) using axiom.scrapeMetadata().
Integrate AI
Run an LLM call inline within a step-function session for prompt completion, classification, or extraction using axiom.integrateAI().
Solve captcha
Send the current page's captcha to a third-party solver and submit the result using axiom.solveCaptcha().
Wait
Pause the session for a fixed duration on the pod, keeping the session alive while you wait for content to render or for a server-side process to finish.
Restart browser
Restart the cloud browser within the current session to recover from a wedged state without losing the session itself, using axiom.restartBrowser().
Close a session
Close the cloud browser session to free its resources and stop consuming runtime quota.
Step function vs No-Code step
Find the axiom-api method equivalent of every No-Code Tool step, useful when porting a visual automation to code.