How to simply scrape ASIN data from Amazon
Welcome! In this guide we are going to cover how to scrape ASINs from a list of products, where each product has a number of variants that all need scraping.
This can be a daunting task, but never fear - by using some clever techniques you can hugely simplify the problem and get the data you need for your business.
# Why it's hard
The first idea that comes to mind is to visit the listing pages, scrape the link to each product page, and then use those links to visit the product pages and scrape each variant's ASIN.
Unfortunately, it quickly becomes apparent that this is not going to work. There are two major problems that present themselves:
- Each product can have its own unique collection of variants. How can we tell the bot to select them without hardcoding hundreds of clicks?
- Clicking each variant can often alter the layout of the page. This means a single scraper won't work - we'd need to figure out a way to dynamically switch the scraper setup.
Multiply this by the number of products we have to deal with and the automation becomes unmanageable. You'd spend more time trying to build this than you save by running it. So is there a better way?
# How to use Amazon's coding to our advantage
When using Axiom, you can get a lot of mileage out of observing how a site is built and using this to your advantage.
In this case, when performing an Amazon search we can see that each of the product variants has its own listing in the search.
This is useful, because now it looks like we can avoid the problem of figuring out how to cycle through each variant; Amazon's dev team have already written this code for us. All we have to do is type the product name into the search bar, and all the variants will appear in the search listing.
However, we're still left with our second problem above, about how to select which scraper we need for a particular page. Can we bypass this as well?
It turns out that the answer is yes, we can! Let's take a look at the anatomy of an Amazon page URL:
What's that code that comes straight after "dp/" in the URL? It's the ASIN we're looking for! So if we can make a bot that grabs the URLs and extracts the ASIN, we're home and dry.
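To make the URL anatomy concrete, here is a minimal Python sketch of the same idea outside Axiom. The function name `asin_from_url` and the example URL (including the placeholder ASIN `B000000000`) are made up for illustration; the only thing taken from this guide is that the ASIN is the path segment that follows "dp".

```python
from urllib.parse import urlparse


def asin_from_url(url):
    """Return the path segment that follows 'dp', or None if there isn't one."""
    # Break the URL path into its non-empty segments.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "dp" in segments:
        idx = segments.index("dp")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    return None


# Hypothetical product URL; B000000000 is a placeholder, not a real ASIN.
example_url = (
    "https://www.amazon.com/Example-Product-Name/dp/B000000000/"
    "ref=sr_1_1?keywords=example"
)
print(asin_from_url(example_url))  # B000000000
```

Because the query string is not part of the URL path, anything after the "?" is ignored automatically.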
There's one final fly in the ointment: sponsored content. Amazon adds this related content to the top of all search pages, which poses a problem if you're after specific products. Luckily, Axiom provides tools to deal with this case; we can make a selection that excludes these promoted products.
Now that we have all the pieces we need, let's build a bot!
# How to build a bot to scrape Amazon ASIN data
# 1. Create a Google Sheet
Create a Google Sheet containing two tabs. We'll call them "Search" and "Results". Add all the product names you want to scrape ASINs for in column A of the tab called "Search".
# 2. Create a new Axiom
Click on the "+ New Automation" button to make a new automation! Exciting.
# 3. Start from blank
We're building this one from scratch by adding our own steps.
# 4. Read data from a Google Sheet
Add a "Read data from a Google Sheet" step and select the tab called "Search".
# 5. Interact with Amazon's web page
Add an "Interact with a page's interface" step. This step contains all the sub steps we need to interact with Amazon's webpage.
# 6. Go to URL
Enter the URL of the Amazon search page in the "Go to URL" sub step.
# 7. Enter text into Amazon's search field
Add an "Enter Text" sub step. Click "Select" and select the search bar's input field. Then click "Insert Data" to add google-sheet-data, and from the popup select the column that contains your product names.
# 9. Trigger the search on Amazon
Add a "Click Element" sub step. This step clicks the search button and updates the search results.
That's the first part of the bot done - only a few more steps to add. Feel free to do a test run!
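If you're curious what the search steps amount to, here is a rough Python sketch that builds a search URL for each product name. It assumes Amazon's search results live at `/s` with the query in a `k` parameter; that pattern, the `search_url` helper and the sample product name are assumptions for illustration and not something the Axiom steps depend on.

```python
from urllib.parse import urlencode


def search_url(product_name, base="https://www.amazon.com/s"):
    """Build a search URL for one product name (assumed '/s?k=' pattern)."""
    # urlencode escapes spaces and special characters in the product name.
    return f"{base}?{urlencode({'k': product_name})}"


# Hypothetical product name, as it might appear in column A of "Search".
print(search_url("usb c cable 2m"))  # https://www.amazon.com/s?k=usb+c+cable+2m
```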
# 10. Add a step to scrape the data from Amazon
Add a "Get data from a webpage" step. First, let's scrape the product titles.
Select a title (ignoring any sponsored content), then select a second title to create a repeating selection. Add a new column, and click the dropdown to select the "Link" data type. Then select the titles again to grab the links to the product pages.
Set the Max Results setting to 10 for the first few runs when you are testing the bot - we can turn this off later when we're happy everything is working.
# 11. Write the data to the Google Sheet
Add a "Write data to a Google Sheet" step and set "Sheet name" to "Results". The "Data" dropdown should already be set correctly to use the interact-data variable from our 'Interact' step, but double check it's looking OK.
# 12. Read the scraped data from the Google Sheet
Now add another "Read data from a Google Sheet" step. This time select the sheet called "Results".
# 14. Extract the ASIN from the URL, part 1
Add a new step called "Split by character". In the "Data" field select the "google-sheet-data__1" step - that's our second "Read Data from a Google Sheet" step - and choose column B, the one with the URL.
In the field called "Character" add "dp/" (without the quotation marks!).
# 15. Extract the ASIN from the URL, part 2
Add one more "Split by character" step and set the "Character" to "/" (again, no quotation marks). In "Data" select "split-by-character" - what we want here is to pass the result of the first split into the second split, thus chaining them together. When the preview appears, select column A.
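The two chained splits can be sketched in a few lines of Python, which may help if you want to check the logic by hand. The function name `extract_asin` and the example URL are hypothetical; the split characters "dp/" and "/" are the ones used in the two steps above.

```python
def extract_asin(url):
    """Mimic the two chained "Split by character" steps on one URL."""
    if "dp/" not in url:
        return None  # No product path in this URL, so nothing to extract.
    # First split: cut at "dp/" and keep everything after it.
    after_dp = url.split("dp/", 1)[1]
    # Second split: cut at "/" and keep the first piece (column A).
    return after_dp.split("/", 1)[0]


# Hypothetical product URL; B000000000 is a placeholder, not a real ASIN.
url = "https://www.amazon.com/Example-Product-Name/dp/B000000000/ref=sr_1_1"
print(extract_asin(url))  # B000000000
```

Note that this approach relies on a "/" following the ASIN. If a URL ends with something like "dp/B000000000?th=1", the query string would stay attached to the result.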
# 16. Write the data to the Google Sheet
We're almost done! Let's add a final "Write data to a Google Sheet" step.
In the "Spreadsheet url" field, find the sheet you previously created. Set "Sheet name" to "Results" and make sure "Data" is set to "split-by-character__1", the data variable for our last "Split by character" step.
Toggle "Clear data before writing | Add to existing data" to display "Add to existing data" and set the starting cell option to "C1".
# 17. The bot is complete and ready to scrape Amazon
Woo, we're done! You have just built a bot for scraping ASINs from Amazon's website, all without a single line of code. Go ahead and run the bot.
Some websites don't make it easy to scrape data, but by analysing a site's structure and observing its behaviour we can often work out how to scrape the data we want.
Always remember that the more you can try and simplify the approach, the easier it will be to build things. It's worth taking a little time to think through the best way to solve your problem before diving in. A small saving in complexity can reap much bigger rewards than you might think!