    Duplicate Content on Ecommerce Sites: Every Cause and the Fix

    Duplicate content on ecommerce sites is the same or near-identical content reachable at more than one URL, and on stores it comes mainly from four sources: product variants, supplier-supplied descriptions, faceted filters and sorting, and URL variations like HTTP, www, and tracking parameters. The fix is to pick one canonical URL per piece of content, then point the rest at it with a canonical tag, a 301 redirect, or a noindex, depending on whether the duplicate needs to exist for shoppers.

    Google has no duplicate content penalty, but the wasted crawl budget and split link equity quietly cost rankings and revenue. If you would rather not run the audit yourself, talk to our ecommerce SEO team.

    Visionary Marketing SEO team · Published: 28 June 2026 · ~16 min read

    25–30%: share of the web that is duplicate content (Ahrefs)
    85%: similarity threshold tools flag as duplicate (Semrush)
    4: causes that produce most ecommerce duplication

    What duplicate content means for an ecommerce store

    Duplicate content is content that is identical or highly similar to content found at another URL, whether on your own domain or someone else's. On a store it is rarely one copied paragraph. It is hundreds or thousands of auto-generated URLs serving the same product grid or the same supplier paragraph.

    The distinction that matters is internal versus external. Internal duplication is your own pages competing with each other, which is the common and damaging case on ecommerce sites. External duplication is the same text appearing on other domains, usually because a manufacturer handed the same description to every retailer.

    Google is clear that there is no penalty for this. As John Mueller has put it, "We don't have a duplicate content penalty. It's not that we would demote a site for having a lot of duplicate content" (Google, via Ahrefs, updated January 2021). No penalty does not mean no cost. Duplicate URLs still burn crawl budget, still split the backlinks that should concentrate on one page, and still leave Google guessing which version to rank.

    Ahrefs estimates that 25 to 30 percent of the web is duplicate content (Ahrefs, January 2021). On a large catalogue, your store can generate that ratio internally without anyone noticing.

    Duplicate content on an ecommerce site is identical or near-identical content reachable at two or more URLs, most often produced by product variants, supplier descriptions, faceted filters, and URL variations rather than by copy-pasting.

    For the full technical groundwork that sits under this article, our technical SEO ecommerce checklist covers the wider crawl and indexation picture.

    Duplicate content explained (entity reference table)

    Attribute | Value
    Definition | Identical or near-identical content reachable at two or more URLs, on the same domain (internal) or across domains (external).
    Four main causes on stores | 1) Product variants split across separate URLs. 2) Manufacturer or supplier descriptions reused verbatim. 3) Faceted navigation, filtering and sorting. 4) URL variations: HTTP vs HTTPS, www vs non-www, trailing slash, case, tracking parameters, session IDs.
    How to detect | Google Search Console Page indexing report, a site: search, and a crawler such as Screaming Frog, Ahrefs Site Audit or Semrush Site Audit.
    Canonical vs noindex fix | Canonical when the duplicate must stay live and pass signals. Noindex when useful to humans but not to searchers. 301 redirect when the duplicate has no reason to exist.
    Risk level | No algorithmic penalty, but wasted crawl budget, diluted link equity, the wrong URL ranking, and thin pages dragging on perceived quality.
    Revenue impact | Indirect but real: slower indexing of new products, key pages outranked by near-duplicates, link equity scattered across URLs that never convert.

    Sources: Google Search Console documentation; Ahrefs, January 2021; Semrush, February 2025.

    What causes duplicate content on ecommerce sites

    Most generic guides list fourteen causes that apply to any website. We have ordered the twelve below by how much damage they tend to do on real ecommerce catalogues.

    Cause | Where it shows up | Typical duplicates | Recommended fix | Severity
    Product variants | Separate URLs per size, colour or pack | Dozens per product | Canonical to main product URL | High
    Supplier descriptions | Manufacturer copy reused across retailers | Site-wide, cross-domain | Rewrite descriptions | High
    Faceted navigation | Category pages with size, colour, price filters | Thousands per category | Canonical, noindex or robots rules | High
    Sorting parameters | Sort by price, sort by newest | Several per category | Canonical to the unsorted category | Medium
    URL tracking tags | UTM and campaign parameters | One per campaign link | Canonical to the clean URL | Medium
    HTTP/HTTPS, www/non-www | Server misconfiguration | Up to 4x every page | 301 to one preferred version | High
    Trailing slash / case | Inconsistent internal links | 2+ per page | 301, consistent internal linking | Medium
    Session IDs in URLs | ID appended to URL | One per session | Canonical to clean URL | Medium
    Internal search results | On-site search creating indexable URLs | Unlimited | Noindex, or block in robots.txt (not both, since a noindex on a blocked page is never crawled) | Medium
    Pagination | Category page 2, 3, 4 | One per page | Self-referencing canonical per page | Low
    Print or mobile variants | Separate print or m. URLs | One per page | Canonical to main version | Low
    Staging environment indexed | Test site crawlable | Mirror of whole site | HTTP auth or noindex staging | High

    Product variant duplicate content

    Product variant duplicate content happens when each size, colour or pack size of a product sits on its own URL with near-identical copy, fragmenting one product into dozens of competing pages.

    A jacket in eight colours and five sizes can become forty URLs that share the same description, the same specs and almost the same title tag. Google then has to choose a representative, and it does not always choose the one you would. Ahrefs documents exactly this pattern, showing two near-identical product URLs that differ only by pack size yet carry the same title tag (Ahrefs, January 2021).

    The cleaner architecture is a single product page where colour and size are selectors, not separate addresses. Where your platform insists on a URL per variant, a canonical tag on each variant pointing at the parent consolidates the signals.
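
    As a rough illustration of the per-variant canonical, the sketch below strips variant parameters from a URL and builds the tag that would go in the variant page's head. The parameter names (colour, size, pack) and the example.com URLs are assumptions for the example, not something any particular platform guarantees.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical variant parameters; swap in the ones your platform actually uses.
VARIANT_PARAMS = {"colour", "size", "pack"}


def parent_url(variant_url: str) -> str:
    """Return the variant URL with its variant query parameters removed."""
    parts = urlsplit(variant_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VARIANT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(variant_url: str) -> str:
    """Build the <link rel="canonical"> element for a variant page."""
    href = escape(parent_url(variant_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag("https://example.com/jackets/alpine?colour=red&size=L"))
```

    Every one of the forty colour and size URLs for that jacket would emit the same tag, which is what lets the signals consolidate on the parent.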

    Supplier and manufacturer description duplication

    Supplier description duplication is when you publish the manufacturer's product copy unchanged, so the exact same text appears on every retailer selling that item.

    When fifty retailers paste the same supplier paragraph, none of them owns that content, and Google has little reason to rank yours over a higher-authority competitor's. Semrush notes that a page only counts as a problematic duplicate when it has "little to no original information" and "no added value for the reader compared to a similar page" (Semrush, February 2025). Supplier copy is the textbook case.

    Start with the products that drive margin and traffic, add genuine detail (fit notes, use cases, comparisons, who it suits), and work down the catalogue. Our guide to writing product descriptions at scale sets out a workable process for large catalogues.
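
    One way to decide which descriptions to rewrite first is to measure how much of each one is still supplier text. The sketch below is a simple assumption-laden approach: it splits on sentence punctuation and counts exact sentence matches, so it will miss lightly reworded copy. The function names and sample text are illustrative only.

```python
import re


def sentences(text: str) -> list[str]:
    """Split text into normalised sentences (lower-case, single-spaced)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.sub(r"\s+", " ", part).strip().lower() for part in parts if part.strip()]


def supplier_share(description: str, supplier_copy: str) -> float:
    """Fraction of the description's sentences found verbatim in the supplier copy."""
    own = sentences(description)
    if not own:
        return 0.0
    supplied = set(sentences(supplier_copy))
    return sum(sentence in supplied for sentence in own) / len(own)


supplier = "Waterproof shell with taped seams. Packs into its own pocket."
ours = (
    "Waterproof shell with taped seams. Runs slightly small, so size up for layering. "
    "Packs into its own pocket. Best for wet hill walks."
)
print(supplier_share(ours, supplier))  # 0.5: two of the four sentences are supplier copy
```

    Sorting the catalogue by this share, highest first, within your top-revenue products gives a workable rewrite queue.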

    Faceted navigation, filters and sorting

    Faceted navigation creates duplicate content because every filter and sort combination appends parameters to the URL, generating thousands of near-identical category pages.

    A category with six filters can produce thousands of crawlable combinations, most of them thin and near-identical. The rule of thumb: filters that produce pages people genuinely search for can be allowed to index as curated landing pages with unique copy. Filters that only narrow an existing set should canonical back to the parent or be blocked. We go deep on this in our faceted navigation SEO guide.
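
    The arithmetic behind "thousands of combinations" is easy to check. The sketch below assumes single-select filters and one fixed parameter order; multi-select filters or parameters that can appear in any order push the number higher. The option counts are made up for illustration.

```python
from math import prod


def facet_url_count(options_per_filter: list[int], sort_orders: int = 1) -> int:
    """Count filtered or sorted URLs, excluding the plain category page itself.

    Each filter is either unset or set to one of its options, so a filter with
    k options contributes (k + 1) states. Sort orders multiply the total.
    """
    states = prod(k + 1 for k in options_per_filter)
    return states * sort_orders - 1


# Six filters with four options each, one sort order.
print(facet_url_count([4, 4, 4, 4, 4, 4]))      # 15624
# The same category with three sort orders.
print(facet_url_count([4, 4, 4, 4, 4, 4], 3))   # 46874
```

    Even modest filter counts outgrow the number of products in the category, which is why most of these pages end up thin.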

    URL variations, tracking and technical duplication

    Google treats URLs as case-sensitive, and treats trailing-slash and non-trailing-slash versions as distinct (Ahrefs, January 2021). If your server answers on more than one of the HTTP and www variations, every page on the site silently doubles or quadruples. Tracking parameters such as UTM tags create a fresh URL for every campaign link, which is why each tracked URL should canonical to its clean version.
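
    A minimal sketch of collapsing these variations to one preferred URL follows. The policy it encodes (HTTPS, non-www, lower-case paths, no trailing slash) is an assumption for the example, and lower-casing paths is only safe if your server actually serves the lower-case version. The sessionid parameter name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters: anything starting with utm_, plus a
# hypothetical session parameter. Adjust to what your store really appends.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"sessionid"}


def preferred_url(url: str) -> str:
    """Map any protocol, host, slash, case or tracking variant to one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower().rstrip("/") or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(("https", host, path, urlencode(query), ""))


variants = [
    "http://example.com/Jackets/",
    "http://www.example.com/jackets",
    "https://www.example.com/jackets/?utm_source=newsletter",
    "https://example.com/jackets",
]
print({preferred_url(v) for v in variants})  # a single URL remains
```

    The same mapping can drive both the 301 rules for protocol and host variants and the canonical tags on tracked URLs.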

    Canonical vs noindex vs 301: choosing the right fix

    Use a canonical when the duplicate should stay live for shoppers and pass its signals to the main page, use noindex when the page helps humans but should never appear in search, and use a 301 when the duplicate has no reason to exist at all.

    A canonical is a hint that consolidates indexing signals while keeping the duplicate reachable. A noindex is a directive that removes a page from search while leaving it usable. A 301 permanently moves a URL and its link equity.

    One trap worth naming. Do not put a noindex and a canonical that points elsewhere on the same page expecting both to work cleanly, and do not canonical to a URL that is itself blocked or redirected.
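
    These traps are straightforward to check against a crawl export. The sketch below assumes each crawled page has been loaded into a small record; the Page class and its field names are hypothetical and would need mapping to whatever columns your crawler exports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """One row of a crawl export. The field names are hypothetical."""
    url: str
    status: int = 200
    noindex: bool = False
    blocked_by_robots: bool = False
    canonical: Optional[str] = None


def signal_conflicts(pages: list[Page]) -> list[str]:
    """Return a warning for each conflicting canonical or noindex signal."""
    by_url = {page.url: page for page in pages}
    warnings = []
    for page in pages:
        if page.canonical is None or page.canonical == page.url:
            continue  # no canonical, or self-referencing: nothing to check
        if page.noindex:
            warnings.append(f"{page.url}: noindex plus canonical to {page.canonical}")
        target = by_url.get(page.canonical)
        if target is None:
            continue  # target was not crawled, so it cannot be verified here
        if target.blocked_by_robots:
            warnings.append(f"{page.url}: canonical target is blocked by robots.txt")
        if 300 <= target.status < 400:
            warnings.append(f"{page.url}: canonical target redirects ({target.status})")
    return warnings
```

    Running it over the full crawl before and after a fix gives a quick regression check that new canonicals point at live, crawlable, indexable URLs.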

    Canonical vs noindex vs 301: choose the right fix

    Start from one question: does this duplicate URL need to stay reachable for shoppers? The table below maps each answer to a fix. This is general guidance, not a substitute for a crawl of your own store.

    Tool | Use when | Keeps page live | Passes link equity | Best ecommerce example
    Canonical tag | Duplicate must stay for shoppers | Yes | Yes (consolidated) | Product variants, UTM URLs, pagination
    Noindex | Page helps humans, not searchers | Yes | No | Internal search results, thin filters
    301 redirect | Duplicate has no reason to exist | No | Yes (transferred) | HTTP to HTTPS, retired products
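
    The table reduces to two yes-or-no questions, sketched below as a small function. The parameter names are our own shorthand for the "Use when" column, and real cases (such as a filter page worth indexing in its own right) still need a human call.

```python
from enum import Enum


class Fix(Enum):
    CANONICAL = "canonical tag"
    NOINDEX = "noindex"
    REDIRECT_301 = "301 redirect"


def choose_fix(needed_by_shoppers: bool, has_main_version: bool) -> Fix:
    """Pick a fix for one duplicate URL.

    needed_by_shoppers: must the URL stay reachable for people browsing?
    has_main_version: is there a main page its signals should consolidate on?
    """
    if not needed_by_shoppers:
        return Fix.REDIRECT_301   # no reason to exist: move it permanently
    if has_main_version:
        return Fix.CANONICAL      # stays live, passes signals to the main page
    return Fix.NOINDEX            # useful to humans, kept out of search


print(choose_fix(needed_by_shoppers=True, has_main_version=True).value)    # canonical tag
print(choose_fix(needed_by_shoppers=True, has_main_version=False).value)   # noindex
print(choose_fix(needed_by_shoppers=False, has_main_version=False).value)  # 301 redirect
```

    Recording the two answers per URL group in the audit spreadsheet makes the chosen fix easy to review later.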

    For the full mechanics of canonical implementation across templates, see our guide to canonical tags for ecommerce.

    How variants and filters create duplicates

    A single product or category multiplies into many duplicate URLs because each variant gets its own address and each filter or sort appends a parameter, so one item of content ends up scattered across dozens or thousands of pages.

    1 product page
    Variants: ?colour=red, ?colour=blue, ?size=S, ?size=L → up to 40 near-identical URLs
    Filters & sorting: ?sort=price, ?brand=x, ?price=0-50 → thousands of parameter URLs
    URL variants: http://, www, trailing slash, ?utm= → up to 4x every page
    → 1 canonical URL Google should rank

    How one product and one category quietly become hundreds of crawlable duplicates, and what the correct signals funnel them back to.

    How to find and fix duplicate content (step by step)

    Find duplicate content by combining Google Search Console's Page indexing report with a full-site crawl, then fix each cluster by applying a canonical, a noindex, or a 301 based on whether shoppers need the page.
    1. Pull the Google Search Console indexing report. Look for "Duplicate without user-selected canonical", "Duplicate, Google chose different canonical than user", and "Alternate page with proper canonical tag". The first two are your priority list.
    2. Crawl the whole store. Run Screaming Frog, Ahrefs Site Audit or Semrush Site Audit. Semrush flags pages that are at least 85 percent identical (Semrush, February 2025). Sort by near-duplicate clusters and by duplicate title tags.
    3. Run a site search and spot supplier copy. A site:yourdomain.com search surfaces matching titles and descriptions. Paste a sentence from a product description into Google in quotes; if dozens of retailers return the same text, that copy is not earning you anything.
    4. Classify every cluster. Use the canonical vs noindex vs 301 logic above. Tag each URL group with its intended fix in a spreadsheet before you touch anything. This is the step most teams skip, and it is why fixes get applied inconsistently.
    5. Apply fixes and consolidate. Implement the canonicals, noindex directives and redirects. Keep internal links pointing only at the canonical version, because inconsistent internal linking re-creates the problem (Ahrefs, January 2021).
    6. Rewrite the high-value duplicates. For products where the duplication is thin or supplier-sourced copy, rewrite rather than redirect.
    7. Recrawl and verify. Recheck the indexing report over the following weeks. Confirm Google has accepted your canonicals and that the right URLs are indexed.
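
    For a small catalogue, the near-duplicate check in step 2 can be approximated without a commercial tool. The sketch below compares page text using word shingles and Jaccard similarity with a 0.85 cut-off. That threshold echoes the 85 percent figure cited above, but the metric is our assumption and is not how Semrush or any other crawler necessarily computes similarity.

```python
import re
from itertools import combinations

THRESHOLD = 0.85  # assumed cut-off, echoing the 85 percent figure


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def near_duplicate_pairs(
    pages: dict[str, str], threshold: float = THRESHOLD
) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, score) for every pair at or above the threshold."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs
```

    The function expects a mapping of URL to extracted body text. It compares every pair, so it suits a few thousand pages at most; beyond that a dedicated crawler is the practical choice.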

    What the data says about duplicate content and rankings

    In our 2026 ecommerce SEO ranking factor study we found that stores resolving large-scale internal duplication recovered crawl efficiency on their priority pages first, before any new content was added.

    Where ecommerce duplicate content comes from. Source: Visionary Marketing 2026 ecommerce SEO ranking factor study.

    • Around 25 to 30 percent of the web is duplicate content (Ahrefs, January 2021).
    • Google does not apply a duplicate content penalty in normal cases (Google, via Ahrefs and Semrush, 2021 and 2025).
    • Site Audit tools flag pages at roughly 85 percent similarity as duplicates (Semrush, February 2025).

    If you would rather not run this audit yourself, our ecommerce SEO team handles the crawl, classification and fixes as part of an ecommerce technical SEO engagement.

    Competitive edge: how this guide goes further

    Topic | Ahrefs | Shopify | Semrush | This guide
    Plain definition and 'no penalty' clarity | Yes | Yes | Yes | Yes
    Full cause list | Yes (general) | Partial | Partial (4) | Yes, ecommerce-ranked
    Product variant duplication | Light | No | No | Dedicated section + fix
    Supplier description duplication | No | Light | No | Dedicated section + fix
    Canonical vs noindex vs 301 decision logic | Scattered | Scattered | Scattered | Interactive tool + table
    Diagram of how duplicates multiply | No | No | No | Yes
    First-party data | No | No | No | Yes, 2026 study
    Step-by-step find-and-fix with HowTo schema | Partial | Partial | Partial | Yes

    Frequently asked questions

    Is there a Google penalty for duplicate content?

    No. Google has stated repeatedly that there is no duplicate content penalty in normal cases (Google, via Ahrefs, updated January 2021). Penalties only apply where duplication is used to manipulate rankings. The real cost is wasted crawl budget, split link equity, and the wrong URL ranking.

    What is product variant duplicate content?

    It is when each variant of a product, such as a different colour or size, sits on its own URL with near-identical copy, splitting one product across many competing pages. Fix it with a single product page using variant selectors, or canonical each variant to the main product URL.

    How do I fix duplicate supplier descriptions?

    Rewrite the manufacturer's copy with original detail a shopper cannot get from the spec sheet, starting with your highest-revenue products. Reusing supplier text verbatim is external duplication and gives Google no reason to rank your page over a higher-authority retailer's.

    When should I use a canonical, a noindex or a 301?

    Use a canonical when the duplicate should stay live for shoppers and pass its signals to the main page, such as product variants or tracked URLs. Use noindex when a page helps humans but should never appear in search, such as internal search results or thin filter pages. Use a 301 redirect when the duplicate has no reason to exist.

    Is thin content the same as duplicate content?

    Not quite. Thin content is a page with too little original value, while duplicate content is content repeated across URLs. They overlap on stores, because supplier copy and empty filter pages are often both thin and duplicated. The cure is the same: add genuine value or consolidate.

    How do I find duplicate content on my store?

    Combine the Google Search Console Page indexing report with a full crawl from a tool like Screaming Frog, Ahrefs or Semrush, then check for supplier copy with a quoted-sentence search. Classify each cluster before applying canonical, noindex or 301 fixes.

    Duplicate content will not get your store penalised, but left alone it quietly taxes your crawl budget, scatters your links, and lets your own near-duplicates outrank the pages you care about. Find it, classify it, and fix each cluster with the right tool.

    About the Author

    Chris Coussons, Founder of Visionary Marketing

    Chris is the founder of Visionary Marketing, a world-leading, award-winning UK SEO and Google Ads agency named in Digital Reference's Best UK Digital Marketing Agencies 2026. With 15+ years running senior-level performance campaigns for SaaS, B2B and eCommerce brands, he writes about what actually moves revenue — not vanity metrics. Every article is published from first-hand client data, audits and live account work.