How to scrape large volumes of pages

This template uses a Google Sheet to store URLs, then scrapes each URL one by one.

If you need to run large-scale scrapes across thousands of pages, the best approach is to batch your scraping. Why? There are many reasons, from rate limiting to page breakages. We also recommend this method because scraped data is written to the Google Sheet on each loop, so progress is saved as the bot runs.
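The batching idea can be sketched in plain Python. This is not Axiom's implementation, just a model of the behaviour described above; the helper names (`fetch_page`, `extract_rows`, `append_to_sheet`) are hypothetical stand-ins for the template's steps.

```python
# Sketch of batch scraping: results are persisted after every page, so a
# failure part-way through loses at most one page of work.

def run_batch(urls, fetch_page, extract_rows, append_to_sheet):
    """Scrape each URL in turn, writing results out after every page."""
    done = 0
    for url in urls:
        try:
            html = fetch_page(url)           # may hit rate limits or time out
        except Exception:
            continue                         # skip broken pages, keep the batch going
        append_to_sheet(extract_rows(html))  # persist immediately, not at the end
        done += 1
    return done
```

Because each page's results are written out before the next page is fetched, a breakage on page 900 of 1,000 still leaves 899 pages of data in the sheet.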

# Install the batch page scraper bot template

To install this web scraper in Axiom, click 'Install template'. If you are a new user, you will need to create an Axiom.ai account before you can edit the template.

# Set up a Google Sheet

  1. Create a new Google Sheet. You can do this in your Chrome browser with the shortcut 'sheet.new', assuming you already have a Google account.
  2. Name your sheet something like 'Scraper'.
  3. Set up two tabs titled 'Links' and 'Scraped Data'.
  4. Add your links to the 'Links' tab, one link per row.

# Set up the 'Read data from Google Sheet' step

  1. Spreadsheet - In the field called 'Spreadsheet', you can search for the Google Sheet you created. Once found, click to select.
  2. Sheet name - Choose the sheet tab called 'Links'.
  3. First cell - If the links are in column A, set this to 'A1'.
  4. Last cell - Enter 'A1'; the bot will then pass only a single row of data per cycle.
  5. You should now see a preview of the data.
[Image: setting up a Google Sheet read data step in Axiom.ai]
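Setting both the first and last cell to 'A1' means only the top row of the 'Links' tab is read on each cycle. A minimal model of that behaviour, with the sheet represented as a list of rows (a sketch, not the Google Sheets API):

```python
# Model the 'Read data from Google Sheet' step for a single-column range
# like A1:A1. Only single-letter columns are handled in this sketch.

def read_range(sheet_rows, first_cell="A1", last_cell="A1"):
    """Return the rows covered by a range such as 'A1'..'A1' (1-indexed)."""
    start = int(first_cell[1:]) - 1   # 'A1' -> row index 0
    end = int(last_cell[1:])          # inclusive last row
    return sheet_rows[start:end]
```

With the defaults, only the first link is returned; widening the range (for example 'A1' to 'A2') would pass more rows through at once.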

# Configure the 'Interact with a page's interface' step and select the content to scrape

  1. Enter URL - Click 'Insert Data', select 'google-sheet-data', then choose the column containing the links.
  2. Get data from a webpage - Select the content on the page you wish to scrape. It's as easy as point and click; this video shows you how.
  3. Find pager (optional) - If the page has a pager, select the next button. If the page scrolls instead, do nothing.
  4. Max Results - Set to 'All'.
[Image: scraping made simple with Axiom.ai]

# Set up the 'Write Data to a Google Sheet' step

  1. Spreadsheet - In the field called 'Spreadsheet', you can search for the Google Sheet you created. Once found, click to select.
  2. Sheet name - Choose the 'Scraped Data' tab you created.
  3. Data - Select the scraped data from the 'Get data from a webpage' step.
  4. Clear data before writing | Add to existing data - Set this option to 'Add to existing data'.
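'Add to existing data' matters here: it appends each loop's results below the previous ones, whereas clearing before writing would wipe earlier loops. A sketch with the tab modeled as a list of rows:

```python
# Model the write step's two modes: 'clear' wipes the tab first,
# 'add' appends below whatever is already there.

def write_data(sheet_rows, new_rows, mode="add"):
    if mode == "clear":
        sheet_rows.clear()       # 'Clear data before writing'
    sheet_rows.extend(new_rows)  # 'Add to existing data' keeps earlier loops
    return sheet_rows
```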

# Set up the 'Delete rows from a Google Sheet' step

  1. Spreadsheet - In the field called 'Spreadsheet', you can search for the Google Sheet you created. Once found, click to select.
  2. Sheet name - Choose the 'Links' tab you created.
  3. First row - Set to '1'.
  4. Last row - Set to '1'. The bot will now delete each link after it has been scraped.
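Deleting row 1 after each cycle makes the 'Links' tab behave like a queue: on the next loop, 'A1' holds a fresh link. Again modeling the sheet as a list of rows (a sketch, not the Sheets API):

```python
# Model the 'Delete rows from a Google Sheet' step. With first and last row
# both set to 1, the just-scraped link is removed from the top of the queue.

def delete_rows(sheet_rows, first_row=1, last_row=1):
    """Delete rows first_row..last_row (1-indexed, inclusive) in place."""
    del sheet_rows[first_row - 1:last_row]
    return sheet_rows
```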

# Jump to another step

  1. Jump to step - Set the step the bot should jump back to so that it loops; in this case, step one.
  2. Maximum cycles - Set the number of loops the bot should perform.
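Putting the steps together, the whole template is one loop: read the top link, scrape it, append the results, delete the link, and jump back until the links run out or 'Maximum cycles' is reached. A sketch, with `scrape_url` as a hypothetical stand-in for the scraping and writing steps:

```python
# Sketch of the full template loop, with the two sheet tabs modeled as lists.

def run_template(links, scraped, scrape_url, max_cycles=10):
    cycles = 0
    while links and cycles < max_cycles:
        url = links[0][0]                # step 1: read A1 from the Links tab
        scraped.extend(scrape_url(url))  # steps 2-3: scrape and write the data
        del links[0]                     # step 4: delete the scraped link
        cycles += 1                      # step 5: jump back, counting the cycle
    return cycles
```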

# Test run

We always recommend doing a test run: click 'Run', then check the output from the scraper, in this case in the sheet. For the test, set 'Maximum cycles' in the 'Jump to another step' step to a maximum of 3.

If your bot is not working, try the following. We also recommend watching the video to troubleshoot.

  • The links come out as text - Set the selector type to 'Link' rather than 'Text'.
  • Selectors fail and the bot stops - Try making a new selection.
  • The pager is not working - Re-select the next button or try a custom selector.
  • No output in the sheet - In 'Write Data to a Google Sheet', check that the data step is connected.
  • Selectors fail repeatedly - You may need to use custom selectors.
  • The bot loops through the same page - Check that the delete step is removing the scraped row.

# More web scraping templates