
Amazon Review Scraper 2026: 3 Methods + Which One Fits You

Scrape Amazon reviews by ASIN in 2026: Python script, no-code tool, or managed delivery. Real code, costs, anti-bot pitfalls, and when each method breaks.

Elliot

Author

Key takeaways

  • βœ“ Three paths exist to extract Amazon reviews in 2026: DIY Python (cheapest, most fragile), no-code scrapers (fastest to start), managed delivery (fastest to results). Most tutorials only cover path one.
  • βœ“ Amazon caps public review pages at ~100 per product (roughly 1,000 reviews). Anything deeper needs the seller-side API (only for listing owners) or reverse-engineering the mobile app endpoints.
  • βœ“ The real enemy is not HTML parsing, it is anti-bot: datacenter IPs get captcha'd within 30 requests, residential proxies cost $5 to $15 per GB, and headless browsers trigger fingerprint checks.
  • βœ“ Cost benchmarks for 10,000 reviews: DIY $15-40 in proxies, ScraperAPI ~$80, Apify actor ~$8, managed service $150-300.
  • βœ“ For one-shot extractions under 1,000 reviews, use a Chrome extension. For 10,000 to 100,000 recurring, use Apify. For weekly pipelines without maintenance, skip the tooling and buy the output.

An Amazon review scraper pulls structured data (rating, author, date, body, verified-purchase flag) from product review pages into a CSV or JSON feed. In 2026 the tooling has split into three distinct categories, and the choice depends almost entirely on your use case, not on the tools themselves.

This guide covers the three methods, the anti-bot reality Amazon throws at you, the real cost per 10,000 reviews, legal considerations (yes, it is legal), and four concrete use cases with a verdict for each.

What you can and cannot extract in 2026

Amazon product review pages expose more than people think, but less than they used to. Here is what is public and what is gated.

What you can scrape

  • βœ“ Star rating (1 to 5)
  • βœ“ Review title and body text
  • βœ“ Reviewer display name and profile URL
  • βœ“ Review date and verified-purchase badge
  • βœ“ Helpful vote count
  • βœ“ Variant attributes (size, color, bundle)
  • βœ“ Image and video thumbnails (URLs)
  • βœ“ "Top positive" and "Top critical" highlights

What you cannot

  • βœ— Reviewer email or phone
  • βœ— Historical reviews beyond page 100 (~1,000 per ASIN)
  • βœ— Private Vine pre-release reviews
  • βœ— Seller responses (visible only on some marketplaces)
  • βœ— Return / refund metadata
  • βœ— Cross-ASIN reviewer history unless profile is public
  • βœ— Deleted or moderated reviews
  • βœ— A-to-Z guarantee claims

One detail most guides skip: Amazon serves different reviews by marketplace (amazon.com, amazon.co.uk, amazon.fr, amazon.de). Reviews on US and UK listings do not overlap. If you want a global view of a product, you scrape each marketplace separately.
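The marketplace split is easy to handle in code. A minimal helper sketch, assuming the same /product-reviews/{asin} path used on amazon.com also resolves on the other marketplace domains (worth verifying per marketplace before relying on it):

```python
# Build review-page URLs for one ASIN across several Amazon marketplaces.
# Assumption: the /product-reviews/{asin} path is valid on each domain.
MARKETPLACES = {
    "us": "www.amazon.com",
    "uk": "www.amazon.co.uk",
    "de": "www.amazon.de",
    "fr": "www.amazon.fr",
}

def review_urls(asin: str, page: int = 1) -> dict[str, str]:
    """One review-page URL per marketplace for a given ASIN and page."""
    return {
        code: f"https://{host}/product-reviews/{asin}"
              f"?pageNumber={page}&reviewerType=all_reviews"
        for code, host in MARKETPLACES.items()
    }
```

Scrape each URL as a separate job; the review sets do not overlap, so merging them is a plain concatenation.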

Method 1 β€” Python + Requests (DIY)

The DIY path works fine for small volumes and short-lived scripts. It breaks fast on production use because Amazon's anti-bot tightens every quarter.

Minimal viable scraper with requests + BeautifulSoup:

import requests
from bs4 import BeautifulSoup
import time, random, json

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Accept-Language": "en-US,en;q=0.9",
}

def txt(node) -> str:
    # Guard against Amazon's A/B layouts: return "" when a selector misses
    # instead of crashing on a missing node.
    return node.text.strip() if node else ""

def fetch_reviews(asin: str, max_pages: int = 10) -> list[dict]:
    reviews = []
    for page in range(1, max_pages + 1):
        url = (
            f"https://www.amazon.com/product-reviews/{asin}"
            f"/ref=cm_cr_arp_d_paging_btm_next_{page}"
            f"?pageNumber={page}&reviewerType=all_reviews"
        )
        r = requests.get(url, headers=HEADERS, timeout=15)
        if r.status_code != 200:
            print(f"page {page} -> {r.status_code}, stopping")
            break
        soup = BeautifulSoup(r.text, "html.parser")
        blocks = soup.select('div[data-hook="review"]')
        if not blocks:  # captcha page or stripped listing: no review blocks
            break
        for b in blocks:
            rating = txt(b.select_one('i[data-hook="review-star-rating"]'))
            reviews.append({
                "rating": rating.split()[0] if rating else None,
                "title": txt(b.select_one('a[data-hook="review-title"]')),
                "author": txt(b.select_one('span.a-profile-name')),
                "date": txt(b.select_one('span[data-hook="review-date"]')),
                "verified": bool(b.select_one('span[data-hook="avp-badge"]')),
                "body": txt(b.select_one('span[data-hook="review-body"]')),
            })
        time.sleep(random.uniform(1.5, 3.5))  # jittered pacing between pages
    return reviews

if __name__ == "__main__":
    data = fetch_reviews("B08N5WRWNW", max_pages=5)
    print(json.dumps(data, indent=2, ensure_ascii=False))
    print(f"{len(data)} reviews")
    print(f"{len(data)} reviews")

This script works for the first 20 to 40 requests from a clean residential IP. After that, Amazon serves one of three responses: the captcha page ("To discuss automated access..."), a 503, or a stripped-down listing without reviews. None of them are parseable.
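All three failure modes can be detected before you try to parse. A small heuristic check, assuming the captcha interstitial still contains Amazon's "automated access" phrase (re-verify that string periodically, since the wording can change):

```python
def is_blocked(status_code: int, html: str) -> bool:
    """Heuristic: True when Amazon served a captcha, a throttle, or a stripped page."""
    if status_code in (429, 503):
        return True                          # explicit throttle / unavailable
    if "automated access" in html.lower():
        return True                          # captcha interstitial
    if 'data-hook="review"' not in html:
        return True                          # stripped listing, no review blocks
    return False
```

Calling this right after each fetch lets you stop or rotate IPs early instead of silently collecting empty pages.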

What breaks in production

Anti-bot reality check

From a single datacenter IP (AWS, GCP, DigitalOcean), Amazon typically captchas within 10 to 30 requests. From a single residential IP, 200 to 500 requests before rate limiting kicks in. From rotating residential proxies, throughput depends on your proxy provider's pool quality. Expect 5 to 15% failed requests even with premium proxies.

To push past the captcha wall you need rotating residential proxies (Bright Data, Oxylabs, Smartproxy, Soax), a realistic header set (not just a UA), sometimes TLS fingerprint spoofing (curl_cffi instead of plain requests), and backoff logic. At that point the "simple script" is 400 lines of Python with a retry queue.
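The backoff logic is the part worth writing first. A sketch of exponential backoff with jitter, written around an injected fetch callable so the same wrapper works whether you pass plain requests.get or curl_cffi's Chrome-impersonating equivalent (the fetch signature here is an assumption for illustration, not a library API):

```python
import random
import time

def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=2.0):
    """Call fetch(url) until it returns HTTP 200, backing off exponentially.

    `fetch` is any callable returning an object with a .status_code attribute,
    e.g. requests.get, or curl_cffi's get with impersonate="chrome".
    """
    for attempt in range(max_attempts):
        resp = fetch(url)
        if resp.status_code == 200:
            return resp
        # 429/503 are throttle signals: wait 2, 4, 8... seconds plus jitter
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    return None  # caller decides: rotate proxy, park the URL, or give up
```

Pair this with a queue of failed URLs and you have the skeleton of that 400-line production script.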

When DIY makes sense: one-time extraction under 1,000 reviews, your team has a Python developer, you do not need recurring updates.

When it breaks: any production workflow where data freshness matters and someone on the team has to fix the script when Amazon changes selectors (usually every 8 to 12 weeks).

Method 2 β€” No-code scrapers

No-code tools do the proxy + parsing work for you. You paste ASINs, you get a CSV. The tradeoffs are cost per review and flexibility on edge cases.

Chrome extensions (single ASIN)

Tools like Amazon Review Exporter or the Helium 10 Chrome extension work on whatever product page you open. Click, wait, download. Good for one ASIN at a time, useless for batch work. Pricing ranges from free (with ads and limits) to $29/month for unlimited exports.

Apify β€” Amazon Reviews Scraper

Apify's Amazon Reviews Scraper actor is the most battle-tested no-code option for volume. You paste a list of ASINs or product URLs, set how many pages per product, and run. Output is JSON, CSV, or Excel. Pricing sits around $0.80 per 1,000 reviews, proxies included. At 10,000 reviews you pay roughly $8 to $12. Apify handles captcha, proxy rotation, and selector updates. When Amazon ships a page change, Apify's maintainer team patches the actor within days, and your runs keep working.
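The actor takes a JSON input along these lines. The field names below are illustrative assumptions, not the actor's actual schema; check the input schema on the actor's Apify page before running:

```json
{
  "productUrls": [
    { "url": "https://www.amazon.com/dp/B08N5WRWNW" }
  ],
  "maxReviews": 1000,
  "proxyConfiguration": { "useApifyProxy": true }
}
```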

Outscraper

Outscraper offers a similar batch endpoint with a free tier (500 reviews) and paid tiers around $30 per 10,000. Less flexible than Apify on configuration but easier for non-technical users. The CSV columns are opinionated, not configurable.

ScraperAPI / Zyte / Oxylabs

These are API-first: you send a URL, they return the HTML (with proxy + captcha solving handled). You still parse the HTML yourself. Makes sense if you already have the parser but do not want to maintain proxy infrastructure. Costs around $49/month for 100,000 API calls, which translates to roughly 100,000 to 300,000 reviews depending on pagination.
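The integration is essentially a URL swap: your parser stays the same, only the fetch target changes. A sketch in the ScraperAPI style, where the endpoint shape and the render flag are taken from their public docs at the time of writing and should be verified against the current documentation:

```python
from urllib.parse import urlencode

def proxied_url(api_key: str, target_url: str, render_js: bool = False) -> str:
    """Wrap a target URL for an API-first fetching service (ScraperAPI-style)."""
    params = {"api_key": api_key, "url": target_url}
    if render_js:
        params["render"] = "true"  # ask the service to execute JS challenges
    return "https://api.scraperapi.com/?" + urlencode(params)
```

You then pass the wrapped URL to your existing fetch code and parse the returned HTML exactly as in the DIY method.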

| Tool | Price per 10k reviews | Best for | Weak spot |
|---|---|---|---|
| Chrome extension | ~$0 to $10 | One product, manual | No batch, no API |
| Apify Reviews actor | ~$8 to $12 | Recurring pipelines | Requires setup |
| Outscraper | ~$30 | Non-technical users | Output schema fixed |
| ScraperAPI | ~$15 to $30 | You have a parser | You parse HTML yourself |
| DIY Python + proxies | ~$15 to $40 | One-time, small | Breaks every 2 months |

Method 3 β€” Managed delivery (done-for-you)

The third path skips the tool question entirely. You brief the target (ASINs, marketplaces, review count, freshness), a team runs the extraction, you receive the CSV. No proxies to buy, no selectors to debug, no actor quotas to manage.

This method wins on three specific scenarios: your use case is one-shot (competitor analysis, product due diligence, market entry), you do not have a developer who wants to own the scraping infrastructure, or your budget for the data is lower than the budget for the engineering time to build it.

It loses when you need real-time updates (sub-hour freshness), when you run dozens of extractions per week and volume discounts kick in on platforms, or when you already have the scraping infrastructure.

Done-for-you Amazon data

Get the reviews, skip the proxies

Brief the ASINs and marketplaces you need. Receive a clean CSV in 48 to 72 hours with rating, author, date, body, verified-purchase flag, and helpful votes. No accounts to create, no infrastructure to rent.

Request a quote β†’

Anti-bot: what actually blocks you on Amazon

Amazon runs one of the most sophisticated anti-bot stacks in e-commerce. These are the defenses that trip up naive scrapers, ordered by how aggressive they are.

IP reputation. Datacenter IPs (AWS, GCP, Azure, OVH, Hetzner) are blocklisted aggressively. Residential IPs from consumer ISPs pass by default. Mobile IPs (4G/5G) are the cleanest but bandwidth is expensive.

Rate limiting. Even clean residential IPs get throttled past ~2 requests per second per session. Past 200 requests without a cookie reset, Amazon flags the session and starts returning empty review blocks.

Captcha. Once flagged, you get the "Enter the characters you see below" page. It is not reCAPTCHA β€” it is Amazon's own image-distortion captcha, and public solvers (2Captcha, CapSolver) handle it at roughly $0.001 to $0.003 per solve.

TLS fingerprinting. Amazon inspects your TLS handshake. Python's requests library has a recognizable fingerprint that Amazon flags on repeat visits from the same IP. curl_cffi (which impersonates Chrome's TLS stack) bypasses this.

JavaScript challenges. On some marketplaces (notably amazon.co.jp), Amazon serves a JS challenge that executes before the page renders. Plain requests cannot solve this β€” you need a headless browser (Playwright) or a managed fetching service that handles it for you.

Fingerprinting. If you use Playwright or Puppeteer with default settings, Amazon detects headless mode via navigator.webdriver, missing chrome.runtime, and canvas fingerprint anomalies. Stealth plugins (playwright-stealth, puppeteer-extra) cover most of this but not all.

Legal and ToS considerations

Public Amazon reviews are public data. Scraping them is legal in the United States under the hiQ v. LinkedIn precedent (9th Circuit, 2022), which confirmed that accessing publicly visible data without authentication does not violate the Computer Fraud and Abuse Act.

That said, scraping violates Amazon's Conditions of Use, which prohibit "any data gathering and extraction tools". This is a civil matter, not a criminal one. Amazon's enforcement path is to block your IP, terminate your seller or buyer account if tied, or file a civil suit if the scraping causes damage (rare in practice for review data).

For EU operations, GDPR considerations kick in because reviewer names are personal data. Under Article 6(1)(f), you need a legitimate interest basis, and the reviewer retains the right to request deletion. Most teams handle this by anonymizing names at ingestion and keeping only the text + rating for analysis.
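Stripping names at ingestion takes a few lines. A sketch that replaces the display name with a salted hash, so the same reviewer stays linkable within your dataset without the name being stored (strictly speaking this is pseudonymization, not anonymization; drop the ID entirely if you do not need reviewer linkage, and treat the salt value here as a placeholder):

```python
import hashlib

def anonymize(review: dict, salt: str = "rotate-me") -> dict:
    """Remove the reviewer name, keep a stable pseudonymous ID for analysis."""
    out = dict(review)
    name = out.pop("author", "")
    out["author_id"] = hashlib.sha256((salt + name).encode()).hexdigest()[:16]
    return out
```

Run this on every record before it touches your warehouse, and a deletion request reduces to dropping rows by ID.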

For commercial use of the extracted data (selling it, republishing it, training LLMs), copyright applies to individual review text. Fair use covers aggregation and analysis; direct republication does not.

4 concrete use cases with verdict

Picking the right method depends on the shape of your problem. Here are four real scenarios with the call.

Use case 1 β€” Competitor product due diligence (one-time, 500 reviews per competitor, 10 competitors)

5,000 reviews total. One-shot. Verdict: managed delivery or Apify single run. DIY Python is overkill for a one-week project. Chrome extension is too slow for 10 products. Apify actor at ~$5 is cheapest if you can set up the input list; managed delivery at $150 to $250 is faster if you cannot.

Use case 2 β€” Amazon seller monitoring own listings (daily, 50 ASINs, ~200 reviews each per week)

~10,000 reviews per week. Verdict: Apify actor with a scheduled run + webhook to your warehouse. Cost lands around $40/month. Built-in freshness windows mean you are not re-scraping already-indexed reviews. DIY is possible but the maintenance burden exceeds the $40.
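If you do build this in-house instead, the dedup step is where monitoring pipelines leak money: a re-scraped page returns mostly known reviews. A sketch keyed on (author, date, title); if you also capture the review block's id attribute from the HTML, prefer that as the key:

```python
def dedupe_new(fetched: list[dict], seen: set[tuple]) -> list[dict]:
    """Return only reviews not already in `seen`; updates `seen` in place."""
    fresh = []
    for r in fetched:
        key = (r.get("author"), r.get("date"), r.get("title"))
        if key not in seen:
            seen.add(key)
            fresh.append(r)
    return fresh
```

Persist `seen` between runs (a table of keys is enough) and each weekly run only writes the genuinely new reviews.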

Use case 3 β€” Academic research on product sentiment (one-time, 100,000 reviews across 50 ASINs)

Large volume, single extraction. Verdict: ScraperAPI + custom parser, or Apify bulk run. Expected cost $80 to $150. Managed delivery is overkill at this volume unless timeline is tight. DIY is possible if the researcher has Python skills and 2 weeks of runway.

Use case 4 β€” Market entry research for a new category (recurring, 500 ASINs, monthly snapshot)

~50,000 reviews per month. Verdict: managed delivery with monthly contract, or an internal data engineering setup with ScraperAPI. The break-even point between buy vs build is around $500/month β€” below that, managed wins on maintenance overhead; above that, in-house wins on unit economics.
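The break-even comes from weighing the managed price against in-house unit cost plus the recurring engineering time to keep the pipeline alive. A toy model, where the unit cost, maintenance hours, and hourly rate are illustrative assumptions to plug your own numbers into:

```python
def monthly_cost_build(reviews: int, unit_cost_per_1k: float = 2.0,
                       maint_hours: float = 4.0, hourly_rate: float = 75.0) -> float:
    """In-house monthly cost: API/proxy spend plus maintenance time."""
    return reviews / 1000 * unit_cost_per_1k + maint_hours * hourly_rate

def cheaper_to_buy(reviews: int, managed_price: float, **kw) -> bool:
    """True when the managed quote undercuts the estimated build cost."""
    return managed_price < monthly_cost_build(reviews, **kw)
```

With these defaults, 50,000 reviews cost about $400/month to run in-house, which is why quotes near $500 sit right at the decision boundary.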

FAQ

Is it legal to scrape Amazon reviews? Public review data is legal to scrape in the US (hiQ precedent). It violates Amazon's Terms of Service, which means account-level consequences, not criminal ones.

What is the best Amazon review scraper? No single tool is "best". Apify wins on recurring volume, Chrome extensions win on one-off small extractions, managed services win on non-technical teams.

How many reviews can I pull per product? Amazon caps public access at ~1,000 reviews per ASIN (100 pages Γ— 10 reviews). Higher depth requires the seller-side API (listing owners only).

Do I need proxies? Yes, past ~30 requests from a single IP. Residential proxies ($5 to $15 per GB) are the minimum for any serious volume.

How often does Amazon change its HTML? Major selector changes roughly every 8 to 12 weeks on the main marketplace. Smaller A/B tests happen weekly and can break narrow CSS selectors if you are not defensive.

Can I scrape reviews in languages other than English? Yes. Amazon serves reviews in the marketplace's primary language. For a product sold in amazon.de, reviews are in German. You scrape each marketplace separately.

