Get data from a webpage step

# What to use the Get data from a webpage step for

The 'Get data from a webpage' step is axiom.ai's multi-purpose web scraping tool. This versatile tool can scrape data from tables, pages, and listing pages from virtually any web application or website. It works with pagination and infinite scroll and comes with a point-and-click selector tool that allows you to choose content to scrape right off the page without coding. Whether you want to scrape some data for a report or thousands of pages to populate a database, this tool will do the job.

If you want to build a large-scale scraper, we recommend a design pattern called 'batch scraping'. A batch scraping template is also available. If you only want to scrape links, use the link scraper instead.

You can use this step to scrape:

  • Amazon product pages
  • Social media platforms like Instagram
  • CRMs like Apollo or HubSpot
  • Government websites
  • Startup databases
  • Google Maps

# How to configure the Get data from a webpage step

# Select

Click 'Select' to choose the data you wish to scrape. The display switches to the selector tool, which guides you through selecting data from the webpage.

The Multi-selector tool has several useful features, which you access by clicking 'Custom':

  • Ability to use custom CSS selectors
  • 'Use element text' allows you to click buttons based on their text, e.g. 'Submit'
  • Pass CSS selectors in from data sources

Watch these video guides to learn more about the selector tool.
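
To make the idea of a custom CSS selector more concrete, here is a minimal Python sketch of what a selector such as `span.price` picks out of a page. It is an illustration only, not how axiom.ai works internally: the `select_text` helper, the sample HTML, and the restriction to simple `tag.class` selectors are all assumptions made for this example.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of elements matching a simple `tag.class` selector."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def select_text(html, selector):
    """Hypothetical helper: return the text of every `tag.class` match."""
    tag, css_class = selector.split(".", 1)
    collector = ClassTextCollector(tag, css_class)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.results]


SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span> <span class="price">$25</span></li>
  <li class="product"><span class="name">Toaster</span> <span class="price">$40</span></li>
</ul>
"""

print(select_text(SAMPLE_HTML, "span.price"))  # ['$25', '$40']
print(select_text(SAMPLE_HTML, "span.name"))   # ['Kettle', 'Toaster']
```

The same selector string matches every repeated element on a listing page, which is why one selector can return a whole column of data.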

# Find pager (optional)

Select the pager's 'Next' button, if there is one. If the button contains text such as 'Next', you can use the 'Use element text' method instead: click 'Custom' on the selector toolbar, then click 'Use element text'.
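
Conceptually, scraping with a pager means collecting rows from a page, following 'Next', and repeating until there is no next page or a result cap is reached. The Python sketch below illustrates that loop under simplifying assumptions: `FAKE_SITE` is an in-memory stand-in for a paginated website, and `scrape_paginated` and its parameters are hypothetical names, not part of axiom.ai.

```python
def scrape_paginated(pages, start_url, max_results):
    """Follow 'next' links from start_url, collecting up to max_results rows."""
    rows = []
    visited = set()  # guards against pagers that loop back on themselves
    url = start_url
    while url is not None and url not in visited and len(rows) < max_results:
        visited.add(url)
        page = pages[url]
        rows.extend(page["rows"])
        url = page.get("next")  # None when there is no 'Next' button
    return rows[:max_results]


# Hypothetical three-page listing; each page knows where 'Next' leads.
FAKE_SITE = {
    "/list?page=1": {"rows": ["a", "b"], "next": "/list?page=2"},
    "/list?page=2": {"rows": ["c", "d"], "next": "/list?page=3"},
    "/list?page=3": {"rows": ["e"], "next": None},
}

print(scrape_paginated(FAKE_SITE, "/list?page=1", max_results=3))    # ['a', 'b', 'c']
print(scrape_paginated(FAKE_SITE, "/list?page=1", max_results=100))  # ['a', 'b', 'c', 'd', 'e']
```

A low cap ends the run after only as many pages as it needs, which is why short test runs finish quickly.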

# Max results

By default, max results is set to 1 to speed up testing while you build your bot. Keep runs short while testing, and raise the limit once the bot works as expected.

# Wait time between scrolls (ms)

Adjust the wait time between scrolls to give content more or less time to load. This setting is particularly useful when scrolling down listing pages with slow-loading content. If the wait is too short, content may not have loaded by the time it is scraped, so test any changes on short runs.
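
The trade-off can be illustrated with a small simulation. In the Python sketch below, `FakeFeed` is a hypothetical stand-in for an infinite-scroll page whose next batch of items needs `load_ms` to appear after each scroll, and `scrape_with_scroll` stops as soon as a scroll reveals nothing new. All names and timings are assumptions chosen for illustration; no real browser or clock is involved.

```python
class FakeFeed:
    """Simulated infinite-scroll page: each scroll starts loading one batch."""

    def __init__(self, batches, load_ms):
        self.pending = list(batches)
        self.load_ms = load_ms   # time a batch needs before it is visible
        self.visible = []
        self.loading = False
        self.waited_ms = 0       # simulated time elapsed since the last scroll

    def scroll(self):
        self.loading = bool(self.pending)
        self.waited_ms = 0

    def wait(self, ms):
        self.waited_ms += ms
        if self.loading and self.waited_ms >= self.load_ms:
            self.visible.extend(self.pending.pop(0))
            self.loading = False


def scrape_with_scroll(feed, wait_ms, max_scrolls=50):
    """Scroll, wait, and stop once a scroll brings in no new items."""
    seen = 0
    for _ in range(max_scrolls):
        feed.scroll()
        feed.wait(wait_ms)
        if len(feed.visible) == seen:
            break                # nothing new appeared: assume the end
        seen = len(feed.visible)
    return list(feed.visible)


batches = [["a", "b"], ["c"], ["d", "e"]]
print(scrape_with_scroll(FakeFeed(batches, load_ms=300), wait_ms=500))  # ['a', 'b', 'c', 'd', 'e']
print(scrape_with_scroll(FakeFeed(batches, load_ms=300), wait_ms=100))  # []
```

With a 100 ms wait, the simulated page has not finished loading when the scraper checks, so the run ends early with nothing collected.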

# No. of retry attempts when results not found

To speed up your runs, reduce the number of retry attempts. Keep in mind that content could be missed as a result, so do some test runs first.
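
As a rough illustration of why fewer retries can miss content, the Python sketch below retries a scrape until it returns results or the attempts run out. `scrape_with_retries` and `make_slow_page` are hypothetical helpers written for this example; they simulate a page that returns nothing on its first few checks and do not reflect axiom.ai's actual retry logic.

```python
import time


def scrape_with_retries(fetch, retries, wait_ms, sleep=time.sleep):
    """Call fetch() once, then up to `retries` more times while it is empty."""
    for attempt in range(retries + 1):
        results = fetch()
        if results:
            return results
        if attempt < retries:
            sleep(wait_ms / 1000)  # pause before the next attempt
    return []


def make_slow_page(empty_calls, rows):
    """Simulated page that returns no rows for the first `empty_calls` checks."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        return [] if calls["n"] <= empty_calls else rows

    return fetch


# The page needs three checks before its content appears.
print(scrape_with_retries(make_slow_page(2, ["row"]), retries=1, wait_ms=0))  # []
print(scrape_with_retries(make_slow_page(2, ["row"]), retries=3, wait_ms=0))  # ['row']
```

With only one retry the run finishes sooner but comes back empty. With three retries the content is found on the third check.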

# Minimum wait before scraping (ms)

To speed up your runs, reduce this wait time. Keep in mind, however, that the wait exists because some content may not have finished loading when scraping starts.

# Page number to start scraping on

For paginated pages, you can specify a starting page. However, not all pages support this.

Configuration settings

    Step type

    Web scraping