Extracting content from a web page
# How to use the web scraper
To start scraping data, add the "Get data from a webpage" step: click "Add a step" and select "Get data from a webpage" from the list.
The step will automatically fill in the current page as the target page to scrape from.
To choose what data to extract, click the "Select Data" button. This will then open the selector tool.
Axiom lets you add scraped data into a table structure, made up of columns and rows. You select the data you want to go in each column directly from the page.
A column is a single type of data that you want to scrape. For example, if you were trying to select names, addresses and phone numbers from a list of contact details, this should be created as three columns of data in Axiom: one for name, one for address, and one for phone numbers.
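As a rough mental model (this is not Axiom's internal format), the scraped table can be pictured as a list of rows, each holding one value per column. The column names and sample values below are invented purely for illustration.

```python
# Hypothetical illustration of a three-column scrape; names and values are made up.
columns = ["name", "address", "phone"]

# Each row is one contact; each key is one column.
rows = [
    {"name": "Ada Example", "address": "1 Sample Street", "phone": "555-0100"},
    {"name": "Bob Example", "address": "2 Sample Street", "phone": "555-0101"},
]


def column_values(rows, column):
    """Return every value scraped into one column, in row order."""
    return [row[column] for row in rows]


print(column_values(rows, "phone"))
```

Thinking of the data this way makes it easier to decide what belongs in a column: every value in a column should be the same kind of thing.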
To select data for a column, click something you want to scrape on the page. Anything selected is highlighted in orange and appears in the preview table at the bottom of the page.
If something you want is still missing, click it as well. The tool tries to detect a pattern in what you have clicked and automatically selects similar elements. It usually takes only 2 or 3 clicks to establish the pattern, but sometimes it takes more. Keep selecting until all the data you want is highlighted in orange.
If the tool selects too much, you can also remove items from a column by clicking on them. This will exclude them from the column's data.
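The idea of building a pattern from a few clicks, then excluding unwanted matches, can be sketched as follows. This is only an assumed, simplified model and not how Axiom's selector tool is actually implemented: each element is described by a hypothetical path of (tag, position) steps, and any position that differs between two clicks is treated as "any position".

```python
def generalize(clicked_paths):
    """Build a pattern from clicked element paths.

    A position that differs between clicks becomes None, meaning "any".
    """
    first, *rest = clicked_paths
    pattern = list(first)
    for path in rest:
        if len(path) != len(pattern):
            raise ValueError("this sketch only handles paths of equal depth")
        for i, (tag, position) in enumerate(path):
            pattern_tag, pattern_position = pattern[i]
            if tag != pattern_tag:
                raise ValueError("this sketch only handles paths with the same tags")
            if position != pattern_position:
                pattern[i] = (tag, None)
    return tuple(pattern)


def matches(pattern, path):
    """Check whether an element path fits the pattern."""
    return len(pattern) == len(path) and all(
        tag == p_tag and (p_pos is None or p_pos == pos)
        for (p_tag, p_pos), (tag, pos) in zip(pattern, path)
    )


# Hypothetical page: element path -> visible text.
PAGE = {
    (("ul", 1), ("li", 1), ("span", 1)): "Ada",
    (("ul", 1), ("li", 2), ("span", 1)): "Bob",
    (("ul", 1), ("li", 3), ("span", 1)): "Advert",
    (("ul", 1), ("li", 4), ("span", 1)): "Cy",
    (("ul", 1), ("li", 4), ("span", 2)): "555-0102",
}

# Two clicks are enough here to generalize over the list items.
clicked = [
    (("ul", 1), ("li", 1), ("span", 1)),
    (("ul", 1), ("li", 2), ("span", 1)),
]
pattern = generalize(clicked)
selected = [path for path in PAGE if matches(pattern, path)]

# Clicking a selected item again excludes it from the column.
excluded = {(("ul", 1), ("li", 3), ("span", 1))}
column = [PAGE[path] for path in selected if path not in excluded]
print(column)
```

In this model the first two clicks pull in every first span in the list, including the unwanted "Advert" entry, which is then removed by the exclusion.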
NOTE: when you select data that has more than one column and more than one row, Axiom tries to group the elements so that the relationships between them are kept; it assumes that the data you are scraping is a list of entries with a common format. For example, if column A contains names and column B contains phone numbers, the tool tries to match each name to the phone number it is associated with on the web page.
If the tool cannot find a good grouping between the elements you have selected, you may get unexpected results. When choosing which elements to put into columns, make sure that each column is related to the others in the data set.
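To see why grouping matters, here is a minimal sketch that assumes each entry sits in its own container element (an `li` in this invented markup). Reading each column inside its entry keeps a name and its phone number on the same row, even when one entry has no phone number. The markup, class names and function are hypothetical, and Axiom's own grouping logic may work differently.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed markup standing in for a contact list.
PAGE = """
<ul>
  <li><span class="name">Ada</span><span class="phone">555-0100</span></li>
  <li><span class="name">Bob</span></li>
  <li><span class="name">Cy</span><span class="phone">555-0102</span></li>
</ul>
"""


def scrape_rows(markup, columns):
    """Build one row per entry, looking up each column inside that entry."""
    root = ET.fromstring(markup.strip())
    rows = []
    for entry in root.findall("li"):
        row = {}
        for column, css_class in columns.items():
            cell = entry.find(f"span[@class='{css_class}']")
            # A missing cell stays empty instead of shifting later values up.
            row[column] = cell.text if cell is not None else None
        rows.append(row)
    return rows


rows = scrape_rows(PAGE, {"name": "name", "phone": "phone"})
for row in rows:
    print(row)
```

If the two columns were collected independently and simply zipped together, Cy's number would end up next to Bob. That is the kind of unexpected result that unrelated columns can produce.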
You can change the result type of each column by clicking on its column header. Currently you can extract text (the default), links, or HTML.
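The difference between the three result types can be illustrated on a single element. This sketch uses Python's standard `xml.etree.ElementTree` on an invented link element; the `extract` function and its type names are assumptions for illustration, not part of Axiom.

```python
import xml.etree.ElementTree as ET

# Invented element: a link whose label contains nested markup.
element = ET.fromstring('<a href="https://example.com/item/1">Item <b>one</b></a>')


def extract(element, result_type="text"):
    """Return the element's visible text, its link target, or its markup."""
    if result_type == "text":
        return "".join(element.itertext())
    if result_type == "link":
        return element.get("href")
    if result_type == "html":
        return ET.tostring(element, encoding="unicode")
    raise ValueError(f"unknown result type: {result_type}")


for result_type in ("text", "link", "html"):
    print(result_type, "->", extract(element, result_type))
```

Text is usually what you want for display values, links for pages you intend to visit later, and HTML when you need the surrounding markup as well.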
Once you have one column set up, click the "+" button on the preview table to add a new column.
You can switch between columns at any time by clicking on the column header.
You can also remove any column by clicking the cross in its top corner.
Once you are happy with your selections, click the "Confirm" button to save them. If something goes wrong, click the "Reset" button, which clears all selections and columns so you can start again. If you don't want to save the selections you have made, click "Cancel" to return to the builder without saving.
If the data you wish to scrape is split across several pages by a pager, click the "Find Pager" button and select the "Next" element that the scraper should click to advance to the next page. Do not select the link to the second page itself (for example, a "2" in the pager): the bot would then keep returning to the second page instead of cycling through all the pages.
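The difference between selecting "Next" and selecting the second page can be shown with a small simulation. The three-page site below is invented, and both functions are illustrative sketches, not Axiom's pager logic.

```python
# Hypothetical site: each page lists some rows and says where "Next" leads.
PAGES = {
    1: {"rows": ["a", "b"], "next": 2},
    2: {"rows": ["c", "d"], "next": 3},
    3: {"rows": ["e"], "next": None},
}


def scrape_with_next(start=1):
    """Follow the "Next" element until there is no further page."""
    rows, page = [], start
    while page is not None:
        rows.extend(PAGES[page]["rows"])
        page = PAGES[page]["next"]
    return rows


def scrape_with_fixed_link(start=1, target=2, max_clicks=3):
    """Click a link that always leads to the same page, as a "2" link would."""
    rows = list(PAGES[start]["rows"])
    for _ in range(max_clicks):
        rows.extend(PAGES[target]["rows"])
    return rows


print(scrape_with_next())
print(scrape_with_fixed_link())
```

Following "Next" reaches every page exactly once, while the fixed link collects the second page's rows over and over and never reaches the third page.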
By default, the scraper is limited to a maximum of either 20 results or 1 result, depending on the context in which the step was added. To remove the limit, click the toggle under "Max Results"; all results will then be scraped.
A "result" here is a single row of data, so a result can span many columns and still count as one. For example, a scrape that returns one name, one address and one phone number for a single contact is one result, because it is one row long.
Limiting the number of results lets Axiom process the information more quickly, which is useful when testing your automation. If no upper limit is set, Axiom waits for a while to see whether anything new loads in before continuing.
The "Max Results" field applies only to its own scrape step, not to the bot as a whole. If you are scraping only a single record from a page, set it to 1 for best results. A default of 1 is set automatically for scrapers added as a sub-step of an "Interact with a page's interface" step.
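The effect of a row limit can be sketched with a lazy row source. Here `load_rows` is a hypothetical stand-in for a page that keeps producing rows, and `scrape` is an assumed helper, not an Axiom function. The limit counts rows, not cells, and stops the loading early.

```python
from itertools import islice

loaded = []  # records how many rows were actually loaded


def load_rows():
    """Hypothetical page that yields rows one at a time."""
    for i in range(1, 101):
        loaded.append(i)
        yield {"name": f"Person {i}", "phone": f"555-{i:04d}"}


def scrape(rows, max_results=None):
    """Return at most max_results rows; None means no limit."""
    return list(islice(rows, max_results))


limited = scrape(load_rows(), max_results=20)
print(len(limited), "rows scraped,", len(loaded), "rows loaded")

single = scrape(load_rows(), max_results=1)
print(single)
```

With a limit, the scrape can finish as soon as enough rows are available. Without one, it has to keep going until it is sure nothing more will appear.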
# How to loop through a list of pages
Commonly, you might want to perform a set of steps on a series of pages rather than just one. Examples include going through a list of accounts in a CRM and updating account details, or going through a series of product pages and retrieving product information.
You can do this with Axiom in three simple steps:
Firstly, scrape the links to the pages you want to visit. To quickly scrape links, add the "Get a list of links to pages" step. This step works exactly as described in How to use the web scraper. You can usually find these links on an app's search or listing page.
Secondly, add an "Interact with a page's interface" step, as described in How to automate the UI.
Thirdly, you need to tell Axiom to visit your scraped links. To do this, you can select the scraped data from the yellow dropdown labelled "Insert data". This will be called "scraped-data" - you can see the name of the data you can use in the header of any step that outputs data.
A preview will now appear, showing you the scraped data. Select the column which contains the links you scraped in the previous step, and click "Close and save":
Now, when you run this Axiom, the bot will visit each page whose link you scraped.
You can perform any actions you like on each page by adding sub-steps to the "Interact with a page's interface" step. To do this, click the "Add step to UI interaction" button and choose the appropriate action. For example, you might scrape further data, as described in How to use the web scraper, or automate some UI interactions, as described in How to automate the UI.
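The overall flow (scrape links, visit each one, run sub-steps on every page) can be summarised in a short simulation. The site data, URLs and function names below are all invented for illustration and say nothing about how Axiom runs a bot internally.

```python
# Hypothetical site: a listing page with links, plus the pages they point to.
SITE = {
    "https://example.com/products": {
        "links": ["https://example.com/p/1", "https://example.com/p/2"],
    },
    "https://example.com/p/1": {"title": "Widget", "price": "9.99"},
    "https://example.com/p/2": {"title": "Gadget", "price": "19.99"},
}


def get_links(listing_url):
    """Step 1: scrape the links from the listing page."""
    return SITE[listing_url]["links"]


def scrape_product(url):
    """A sub-step: scrape further data from one visited page."""
    page = SITE[url]
    return {"url": url, "title": page["title"], "price": page["price"]}


def run(listing_url, sub_steps):
    """Steps 2 and 3: visit each scraped link and run every sub-step on it."""
    results = []
    for link in get_links(listing_url):
        for step in sub_steps:
            results.append(step(link))
    return results


for row in run("https://example.com/products", [scrape_product]):
    print(row)
```

Each sub-step runs once per link, so adding a second sub-step to the list would run it on every visited page as well.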