How to extract content from a web page
To get started building a web scraper, you have four great starting points: the Quick Builder, a snippet, a template, or building from scratch. All four let you create your own scraper without writing a single line of code.
# Get started with our new Quick Builder, now in beta
The best way to get started is to use our new "Quick Builder". Choose "Scrape data from a website", answer a couple of quick questions to specify the type of scraper you want, and Axiom.ai will scaffold the web scraping bot for you. All you need to do then is follow the instructions to complete the setup.
How
- Click "New automation".
- Then click "Give it a go" in the top right.
- Finally, select "Scrape data from a website" and answer the questions.
# Start scraping with a Snippet
Snippets are step combinations used to create common web automations, such as scrapers. We offer three snippets for building different types of web scrapers. To access them, follow these instructions:
How
- Click "New automation".
- Then click "Add first step".
- Finally, select the scraping snippet you wish to use.
These are the three available snippets:
- Scrape data from a page to a Google Sheet.
- Scrape links from a website and scrape each page.
- Scrape pages from a list of links in a Google Sheet.
You can easily adapt snippets to meet your requirements by adding or removing steps.
# Start scraping with a Template
We also offer pre-made web scraper templates with video guides to help get you started. Each template provides a base set of steps you can build on to complete your automation. You can add templates from the builder or from our website.
# Start from blank and make your own web scraping bot
You will also find it straightforward to build your own web scraper from scratch by adding just a couple of steps. With Axiom.ai, you can make almost any type of scraper, from bots that log in to websites to bots that loop through thousands of web pages. Here are a few examples to get you started; just click "Add first step" to begin.
Below are two tutorials: one for building a simple scraper that writes data to a CSV file and another for creating a scraper that loops through URLs to extract content.
# A simple scraper to get data from a page
To create a simple web scraper that extracts content from a webpage, from the dashboard click "New automation" and then click "Add first step" to open the step finder.
How
- Open the step finder and search for "Get data".
- Click and add the step "Get data from a URL".
- In the URL field, insert the URL to be scraped.
- To choose the data to be scraped, click "Select", then point and click to select. To learn more about our selector tool, click here.
- Find pager - Use this only if there is a pager or a "load more" button; Axiom.ai will auto-scroll down the pages.
- Max results - Set the maximum amount of data you wish to scrape.
- Open the step finder and search for the "Export CSV" step.
- Configure the step: click "Insert Data" and select the scraper step's token.
Design pattern: Simple scraper + export
- 1 Get data from a URL
- 2 Write to CSV
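For readers curious what this "get data, then export CSV" pattern amounts to under the hood, here is a minimal plain-Python sketch of the same logic using only the standard library. The page markup, CSS classes, and field names are made up for illustration; Axiom.ai handles all of this for you without code.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup standing in for the page you would scrape.
PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">4.50</span></li>
</ul>
"""

class ItemParser(HTMLParser):
    """Collects the text of the name and price spans inside each item."""
    def __init__(self):
        super().__init__()
        self.rows, self.current, self.field = [], None, None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "item":
            self.current = {}          # start a new row
        elif tag == "span" and cls in ("name", "price"):
            self.field = cls           # remember which column the text belongs to

    def handle_data(self, data):
        if self.current is not None and self.field:
            self.current[self.field] = data.strip()
            self.field = None

    def handle_endtag(self, tag):
        if tag == "li" and self.current is not None:
            self.rows.append(self.current)  # row complete
            self.current = None

def scrape_to_csv(html):
    """Step 1: get data from the page. Step 2: write the rows to CSV."""
    parser = ItemParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()
```

Calling `scrape_to_csv(PAGE)` returns a CSV with a header row followed by one line per scraped item.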
# A looping scraper to scrape multiple pages from a list of URLs
A scraper that loops through a list of URLs, scraping each page in turn, can be built with just a few steps.
# First, add URLs to loop through
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
To build this scraper, we need a list of URLs to loop through. For this example, let's assume the URLs are listed in a column of a Google Sheet. We'll add a "Read data from a Google Sheet" step to fetch the URLs from that column, so have a Google Sheet ready.
How 1 of 6
- Open the step finder and search for "Read" then add the "Read data from a Google Sheet" step.
- Configure the step, pointing it at the sheet containing your URLs.
# Add loop step
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
- 2 Loop through data
Now add a "Loop through data" step. This step will loop through the Google Sheet one row at a time.
How 2 of 6
- Open the step finder and search for "Loop" then add the "Loop through data" step.
# In the loop, first add a "Go to page" step
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
- 2 Loop through data
- 2.1 Go to page
How 3 of 6
- Open the step finder and search for "Go to page" and add the step.
- In the URL field, click "Insert Data" and choose the URLs from the Google Sheet.
# Next add the scraper step in the loop
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
- 2 Loop through data
- 2.1 Go to page
- 2.2 Get data from current page
How 4 of 6
- Open the step finder and search for "Get data".
- Click and add the step "Get data from current page".
- To choose the data to be scraped, click "Select", then point and click to select. To learn more about our selector tool, click here.
- Find pager - Use this only if there is a pager or a "load more" button.
# Insert a "Write data to a Google Sheet" step
We are now going to output the data to a Google Sheet. We will insert the write step inside the loop; this matters because the data is then written on every loop iteration.
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
- 2 Loop through data
- 2.1 Go to page
- 2.2 Get data from current page
- 2.3 Write data to a Google Sheet
How 5 of 6
- Open the step finder and search for "Write" and add the step.
- Select a Google Sheet.
- Click "Data" and pass the data from the scrape step.
# Finally, in the loop, add a "Delete data from a Google Sheet" step
Finally, we add a "Delete data from a Google Sheet" step. This removes the row that has just been processed, so on the next iteration the automation scrapes a new URL.
Design pattern: Simple looping scraper
- 1 Read data from a Google Sheet
- 2 Loop through data
- 2.1 Go to page
- 2.2 Get data from current page
- 2.3 Write data to a Google Sheet
- 2.4 Delete data from a Google Sheet
How 6 of 6
- Open the step finder and search for "Delete" and add the step.
- Select a Google Sheet to delete from, or paste its URL here.
- Optionally specify a sheet name to delete from.
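To see why the delete step makes the loop terminate, here is a plain-Python sketch of the full read, loop, scrape, write, delete pattern. The two lists stand in for the input and output Google Sheets, and `fetch_title()` is a hypothetical placeholder for the "Go to page" and "Get data from current page" steps; none of these names come from Axiom.ai itself.

```python
# Stand-ins for the two Google Sheets in the design pattern above.
input_sheet = [                     # 1. Read data from a Google Sheet
    ["https://example.com/a"],
    ["https://example.com/b"],
    ["https://example.com/c"],
]
output_sheet = []

def fetch_title(url):
    # Placeholder for a real page visit and scrape.
    return "Title of " + url.rsplit("/", 1)[-1]

while input_sheet:                  # 2. Loop through data, one row at a time
    row = input_sheet[0]
    url = row[0]                    # 2.1 Go to page (the next unprocessed URL)
    data = fetch_title(url)         # 2.2 Get data from current page
    output_sheet.append([url, data])  # 2.3 Write data to a Google Sheet
    del input_sheet[0]              # 2.4 Delete the processed row

# Once the input list is empty, every URL has been scraped exactly once.
```

Deleting the processed row each time is what guarantees the loop both advances to a fresh URL and eventually stops: the input shrinks by one row per iteration.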