How to extract data from HTML with an AI
Learn how to build a web scraper that loops through URLs in a Google Sheet, scrapes each page's HTML, and then uses ChatGPT to extract data. It takes only a few steps and needs no CSS selectors. Get started quickly with this ChatGPT web scraper template. To learn more, visit our pages on getting started with extracting data and using our [builder](/docs/no-code-tool/the-builder/builder). A ChatGPT subscription is required to run this bot.
Start from blank, adding the following steps
In the axiom.ai Chrome extension dashboard, click "New Automation" and then select "Add first step". Use the step finder to add the steps outlined below.
Prepare your Google Sheet
Put one URL per row, all in the same column.
| Col A | Col B |
|---|---|
| Insert your URLs like this | --- |
| Insert your URLs like this | --- |
| Insert your URLs like this | --- |
Create your bot
- 1.0 Read data from a Google Sheet
  - Spreadsheet: Search for the Google Sheet you created. Once found, click to select.
  - Sheet name: Choose a sheet tab or leave blank to use the first tab.
  - First cell: Set to "A1".
  - Last cell: Set to "AB1".
- 2.0 Loop through data
  - Loop source: Click "Insert data", then select google-sheet-data.
- 2.1 Go to page
  - Enter URL: Click "Insert data", then select google-sheet-data.
- 2.2 Get data from bot's current page
  - Select: Click and choose the outermost HTML element to scrape.
  - Data type: Set to "Select HTML".
- 2.3 Extract data with ChatGPT
  - ChatGPT API key: Enter your API key.
  - Data: Insert [scrape-data].
  - Extract values: Enter the fields you want to extract, e.g. "name, email, job title".
- 2.4 Write data to a Google Sheet
  - Spreadsheet: Select your Google Sheet or paste the URL.
  - Sheet name: Choose a sheet tab or leave blank for the first tab.
  - Data: Click "Insert data" and select scraped-link-data.
  - Write options: Set to "Add to existing data".
- 2.5 Delete rows from a Google Sheet
  - Spreadsheet: Select the same Google Sheet.
  - Sheet name: Choose the same tab or leave blank.
  - First row to delete: Set to 1.
  - Last row to delete: Set to 1.
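If you think in code, the loop in steps 2.0 to 2.4 can be sketched roughly as follows. This is only an illustration of the flow, not how axiom.ai implements the bot. The names `run_bot`, `fake_fetch`, and `fake_extract` are hypothetical, and the fetcher and extractor are passed in as plain functions so the sketch runs without a browser, a network connection, or an API key.

```python
from typing import Callable, Dict, List


def run_bot(
    urls: List[str],
    fetch_html: Callable[[str], str],
    extract_fields: Callable[[str, List[str]], Dict[str, str]],
    fields: List[str],
) -> List[List[str]]:
    """Loop over URLs, scrape each page's HTML, extract fields, collect rows."""
    rows: List[List[str]] = []
    for url in urls:                           # 2.0 Loop through data
        html = fetch_html(url)                 # 2.1 + 2.2 Go to page, get HTML
        values = extract_fields(html, fields)  # 2.3 Extract data with an AI model
        # 2.4 One output row per URL; missing fields become empty cells.
        rows.append([values.get(field, "") for field in fields])
    return rows


# Stand-ins for the real page load and the real model call (assumptions).
def fake_fetch(url: str) -> str:
    return "<html><body><h1>Ada</h1><p>ada@example.com</p></body></html>"


def fake_extract(html: str, fields: List[str]) -> Dict[str, str]:
    return {"name": "Ada", "email": "ada@example.com"}


rows = run_bot(
    ["https://example.com/a"],
    fake_fetch,
    fake_extract,
    ["name", "email", "job title"],
)
print(rows)  # [['Ada', 'ada@example.com', '']]
```

Passing the fetcher and extractor in as arguments keeps the loop itself independent of any particular browser or model, which mirrors how the bot's steps can be swapped in the builder.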
Wrapping up
In just seven steps, you can build a simple web scraper that extracts data from a website with ChatGPT and writes it to a Google Sheet. Because the AI reads the HTML itself, the scraper does not depend on CSS selectors, which tend to break when a site's markup changes. The same design pattern can therefore be reused on other websites without writing new selectors.
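To illustrate why no selectors are needed, here is a minimal sketch of the general idea: the field list and the raw HTML are handed to the model together, and the model is asked to reply with structured values. The helpers `parse_fields`, `build_prompt`, and `parse_reply` are hypothetical, the prompt wording is illustrative only, and the JSON reply format is an assumption; none of this is what axiom.ai actually sends to ChatGPT.

```python
import json
from typing import Dict, List


def parse_fields(spec: str) -> List[str]:
    """Split a field list such as "name, email, job title" into clean names."""
    return [part.strip() for part in spec.split(",") if part.strip()]


def build_prompt(html: str, fields: List[str]) -> str:
    """Combine the requested fields and the scraped HTML into one prompt."""
    wanted = ", ".join(fields)
    return (
        "Extract the following fields from the HTML below and reply with a "
        f"single JSON object using exactly these keys: {wanted}.\n"
        "Use an empty string for any field that is not present.\n\n"
        f"HTML:\n{html}"
    )


def parse_reply(reply: str, fields: List[str]) -> Dict[str, str]:
    """Turn the model's JSON reply into a dict with one entry per field."""
    data = json.loads(reply)
    return {field: str(data.get(field, "")) for field in fields}


fields = parse_fields("name, email, job title")
prompt = build_prompt("<p>Ada, ada@example.com</p>", fields)
# A hand-written reply standing in for a real model response.
print(parse_reply('{"name": "Ada", "email": "ada@example.com"}', fields))
```

Nothing in the prompt refers to the page's structure, so the same field list can be reused when a site's markup changes.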