How to simply scrape ASIN data from Amazon

Welcome! In this guide we are going to cover how to scrape ASINs from a list of products, where each product has a number of variants that all need scraping.

This can be a daunting task, but never fear - by using some clever techniques you can hugely simplify the problem and get the data you need for your business.

# Why it's hard

The first idea that comes to mind is to visit the listing pages, scrape the link to each page, and then use those links to visit the product pages to scrape each variant's ASIN.

Unfortunately, it quickly becomes apparent that this is not going to work. There are two major problems that present themselves:

  • Each product can have its own unique collection of variants. How can we tell the bot to select them without hardcoding hundreds of clicks?
  • Clicking each variant can often alter the layout of the page. This means a single scraper won't work - we'd need to figure out a way to dynamically switch the scraper setup.

Multiply this by the number of products we have to deal with and the automation becomes unmanageable. You'd spend more time trying to build this than you save by running it. So is there a better way?

# How to use Amazon's coding to our advantage

When using Axiom, you can get a lot of mileage out of observing how a site is built and using this to your advantage.

In this case, when performing an Amazon search we can see that each of the product variants has its own listing in the search.

This is useful, because now it looks like we can avoid the problem of figuring out how to cycle through each variant; Amazon's dev team have already written this code for us. All we have to do is type the product name into the search bar, and all the variants will appear in the search listing.

However, we're still left with our second problem above, about how to select which scraper we need for a particular page. Can we bypass this as well?

It turns out that the answer is yes, we can! Let's take a look at the anatomy of an Amazon page URL:

Axiom.ai can scrape Amazon's ASINs from the URL string

What's that number highlighted in green? It's the ASIN we're looking for! So if we can make a bot that grabs the URLs and extracts the ASIN, we're home and dry.
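For readers who like to see the idea in code, here is a minimal Python sketch of pulling the ASIN out of a product URL. It is not part of the Axiom bot. The URL and the ASIN `B000000000` are made-up placeholders, and the helper name `asin_from_url` is hypothetical. It assumes the ASIN is the path segment that comes straight after `dp`, as described above.

```python
from typing import Optional
from urllib.parse import urlparse


def asin_from_url(url: str) -> Optional[str]:
    """Return the path segment that follows 'dp', or None if there isn't one."""
    # Break the URL path into its non-empty segments.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "dp" in segments:
        idx = segments.index("dp")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    return None


# Hypothetical product URL with a placeholder ASIN.
example = "https://www.amazon.com/Example-Product-Name/dp/B000000000/ref=sr_1_1?keywords=example"
print(asin_from_url(example))
```

Because the ASIN comes from the URL alone, the layout of the product page no longer matters, which is exactly why this approach sidesteps the second problem.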

There's one final fly in the ointment: sponsored content. Amazon adds this related content to the top of all search pages, which poses a problem if you're after specific products. Luckily, Axiom provides tools to deal with this case; we can make a selection that excludes these promoted products.

Now we have all the pieces we need, let's build a bot!

# How to build a bot to scrape Amazon ASIN data

# 1. Create a Google Sheet

Create a Google Sheet containing two tabs. We'll call them "Search" and "Results". Add all the product names you want to scrape ASINs for in column A of the tab called "Search".

Create a Google Sheet

# 2. Create a new Axiom

Click on the "+ New Automation" button to make a new automation! Exciting.

# 3. Start from blank

We're building this one from scratch by adding our own steps.

Start from blank - build a bot to automate amazon with Axiom.ai

# 4. Read data from a Google Sheet

Add a "Read data from a Google Sheet" step and select the tab called "Search".

# 5. Interact with Amazon's web page

Add an "Interact with a page's interface" step. This step contains all the sub steps we need to interact with Amazon's webpage.

Add an interact step in Axiom.ai

# 6. Go to URL

Set the "Go to URL" sub step to the Amazon search page.

# 7. Enter text into Amazon's search field

Add an "Enter Text" substep. Click "Select" and select the search bar's input field. Then click "Insert Data" to add google-sheet-data, and from the popup select the column that contains your product names.

Enter data into Amazon's search bar with Axiom.ai

# 9. Trigger the search on Amazon

Add a "Click Element" sub step. This step clicks the search button and updates the search results.

That's the first part of the bot done - only a few more steps to add. Feel free to do a test run!
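As a point of comparison, the steps so far amount to running one search per product name from the "Search" tab. The Python sketch below shows that loop by building a search URL for each name instead of typing into the search bar. This is an assumption-laden illustration, not what the bot does: the `https://www.amazon.com/s?k=` URL pattern is assumed rather than taken from this guide, and the product names are invented.

```python
from urllib.parse import urlencode

# Assumed search endpoint; adjust for your Amazon region if needed.
SEARCH_BASE = "https://www.amazon.com/s"


def search_url(product_name: str) -> str:
    """Build a search URL for one product name (query parameter 'k' is assumed)."""
    return f"{SEARCH_BASE}?{urlencode({'k': product_name})}"


# Stand-in for column A of the "Search" tab.
product_names = ["example widget", "example gadget 2-pack"]

for name in product_names:
    print(search_url(name))
```

In the bot itself, the "Enter Text" and "Click Element" sub steps do the same job through the page's own search bar, so no URL building is required.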

# 10. Add a step to scrape the data from Amazon

Add a "Get data from a webpage" step. First, let's scrape the product titles.

Select a title (ignoring any sponsored content), then select a second title to create a repeating selection. Add a new column, and click the dropdown to select the "Link" data type. Then select the titles again to grab the links to the product pages.

Set the Max Results setting to 10 for the first few runs when you are testing the bot - we can turn this off later when we're happy everything is working.

Selecting data to scrape on Amazon with Axiom.ai's no code selector tool

# 11. Write the data to the Google Sheet

Add a "Write data to a Google Sheet" step and set "Sheet name" to "Results". The "Data" dropdown should already be set correctly to use the interact-data variable from our 'Interact' step, but double check it's looking OK.

# 12. Read the scraped data from the Google Sheet

Now add another "Read data from a Google Sheet" step. This time select the sheet name called "Results".

Reading Amazon product data from a Google Sheet in Axiom.ai

# 14. Extract the ASIN from the url part 1

Add a new step called "Split by character". In the "Data" field select "google-sheet-data__1" (that's the data from our second "Read data from a Google Sheet" step) and choose column B, the one with the URL.

Select data to split

In the field called "Character", add "dp/" (without the quotation marks!).

Axiom.ai extracting ASINs from Amazon

# 15. Extract the ASIN from the url part 2

Add one more "Split by character" step and set the "Character" to "/" (again, no quotation marks). In "Data" select "split-by-character". What we want here is to pass the result of the first split into the second split, thus chaining them together. When the preview appears, select column A.

Split character step extracting ASIN from Amazon using Axiom.ai's no code bot builder
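To make the chaining concrete, the two "Split by character" steps can be mirrored in a few lines of Python. This is only a sketch of the logic, not Axiom's implementation. The helper `split_and_pick` is hypothetical, the URL and ASIN are placeholders, and it assumes the text after "dp/" is the second piece produced by the first split.

```python
def split_and_pick(value: str, separator: str, column: int) -> str:
    """Split value on separator and return one piece, or '' if it is missing."""
    parts = value.split(separator)
    return parts[column] if column < len(parts) else ""


# Hypothetical link as it might appear in column B of the "Results" tab.
url = "https://www.amazon.com/Example-Product-Name/dp/B000000000/ref=sr_1_1"

# First split: on "dp/", keeping everything after it (assumed to be the second piece).
after_dp = split_and_pick(url, "dp/", 1)

# Second split: on "/", keeping the first piece (column A in the preview).
asin = split_and_pick(after_dp, "/", 0)

print(after_dp)
print(asin)
```

Splitting twice keeps each step simple: the first split throws away everything before the ASIN, and the second throws away everything after it.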

# 16. Write the data to the Google Sheet

We're almost done! Let's add a final "Write data to a Google Sheet" step.

In the "Spreadsheet url" field, find the sheet you previously created. Set "Sheet name" to "Results", and make sure "Data" is set to "split-by-character__1", the data variable for our last "Split by character" step.

Toggle "Clear data before writing | Add to existing data" to display "Add to existing data" and set the starting cell option to "C1".

Writing data to a Google Sheet in Axiom.ai

# 17. The bot is complete and ready to scrape Amazon

Woo, we're done! You have just built a bot for scraping ASINs from Amazon's website, all without a single line of code. Go ahead and run the bot.

web scraping a list of ASINs from Amazon into Google Sheets

# Conclusion

Some websites don't make it easy to scrape data, but by analysing a site's structure and observing its behaviour we can often work out how to scrape the data we want.

Always remember that the more you can try and simplify the approach, the easier it will be to build things. It's worth taking a little time to think through the best way to solve your problem before diving in. A small saving in complexity can reap much bigger rewards than you might think!
