Duplicate Content on Ecommerce Sites: Every Cause and the Fix
Duplicate content on ecommerce sites is the same or near-identical content reachable at more than one URL, and on stores it comes mainly from four sources: product variants, supplier-supplied descriptions, faceted filters and sorting, and URL variations like HTTP, www, and tracking parameters. The fix is to pick one canonical URL per piece of content, then point the rest at it with a canonical tag, a 301 redirect, or a noindex, depending on whether the duplicate needs to exist for shoppers.
Google has no duplicate content penalty, but the wasted crawl budget and split link equity quietly cost rankings and revenue. If you would rather not run the audit yourself, talk to our ecommerce SEO team.
- 25–30%: share of the web that is duplicate content (Ahrefs)
- 85%: similarity threshold tools flag as duplicate (Semrush)
- 4: causes that produce most ecommerce duplication
What duplicate content means for an ecommerce store
Duplicate content is content that is identical or highly similar to content found at another URL, whether on your own domain or someone else's. On a store it is rarely one copied paragraph. It is hundreds or thousands of auto-generated URLs serving the same product grid or the same supplier paragraph.
The distinction that matters is internal versus external. Internal duplication is your own pages competing with each other, which is the common and damaging case on ecommerce sites. External duplication is the same text appearing on other domains, usually because a manufacturer handed the same description to every retailer.
Google is clear that there is no penalty for this. As John Mueller has put it, "We don't have a duplicate content penalty. It's not that we would demote a site for having a lot of duplicate content" (Google, via Ahrefs, updated January 2021). No penalty does not mean no cost. Duplicate URLs still burn crawl budget, still split the backlinks that should concentrate on one page, and still leave Google guessing which version to rank.
Ahrefs estimates that 25 to 30 percent of the web is duplicate content (Ahrefs, January 2021). On a large catalogue, your store can generate that ratio internally without anyone noticing.
Duplicate content on an ecommerce site is identical or near-identical content reachable at two or more URLs, most often produced by product variants, supplier descriptions, faceted filters, and URL variations rather than by copy-pasting.
For the full technical groundwork that sits under this article, our technical SEO ecommerce checklist covers the wider crawl and indexation picture.
Duplicate content explained (entity reference table)
| Attribute | Value |
|---|---|
| Definition | Identical or near-identical content reachable at two or more URLs, on the same domain (internal) or across domains (external). |
| Four main causes on stores | 1) Product variants split across separate URLs. 2) Manufacturer or supplier descriptions reused verbatim. 3) Faceted navigation, filtering and sorting. 4) URL variations: HTTP vs HTTPS, www vs non-www, trailing slash, case, tracking parameters, session IDs. |
| How to detect | Google Search Console Page indexing report, a site: search, and a crawler such as Screaming Frog, Ahrefs Site Audit or Semrush Site Audit. |
| Canonical vs noindex fix | Canonical when the duplicate must stay live and pass signals. Noindex when useful to humans but not to searchers. 301 redirect when the duplicate has no reason to exist. |
| Risk level | No algorithmic penalty, but wasted crawl budget, diluted link equity, the wrong URL ranking, and thin pages dragging on perceived quality. |
| Revenue impact | Indirect but real: slower indexing of new products, key pages outranked by near-duplicates, link equity scattered across URLs that never convert. |
Sources: Google Search Console documentation; Ahrefs, January 2021; Semrush, February 2025.
What causes duplicate content on ecommerce sites
Most generic guides list fourteen causes that apply to any website. We have ordered the twelve below by how much damage they tend to do on real ecommerce catalogues.
| Cause | Where it shows up | Typical duplicates | Recommended fix | Severity |
|---|---|---|---|---|
| Product variants | Separate URLs per size, colour or pack | Dozens per product | Canonical to main product URL | High |
| Supplier descriptions | Manufacturer copy reused across retailers | Site-wide, cross-domain | Rewrite descriptions | High |
| Faceted navigation | Category pages with size, colour, price filters | Thousands per category | Canonical, noindex or robots rules | High |
| Sorting parameters | Sort by price, sort by newest | Several per category | Canonical to the unsorted category | Medium |
| URL tracking tags | UTM and campaign parameters | One per campaign link | Canonical to the clean URL | Medium |
| HTTP/HTTPS, www/non-www | Server misconfiguration | Up to 4x every page | 301 to one preferred version | High |
| Trailing slash / case | Inconsistent internal links | 2+ per page | 301, consistent internal linking | Medium |
| Session IDs in URLs | ID appended to URL | One per session | Canonical to clean URL | Medium |
| Internal search results | On-site search creating indexable URLs | Unlimited | Noindex (page must stay crawlable so Google can see the directive), or block crawling in robots.txt | Medium |
| Pagination | Category page 2, 3, 4 | One per page | Self-referencing canonical per page | Low |
| Print or mobile variants | Separate print or m. URLs | One per page | Canonical to main version | Low |
| Staging environment indexed | Test site crawlable | Mirror of whole site | HTTP auth or noindex staging | High |
Product variant duplicate content
Product variant duplicate content happens when each size, colour or pack size of a product sits on its own URL with near-identical copy, fragmenting one product into dozens of competing pages.
A jacket in eight colours and five sizes can become forty URLs that share the same description, the same specs and almost the same title tag. Google then has to choose a representative, and it does not always choose the one you would. Ahrefs documents exactly this pattern, showing two near-identical product URLs that differ only by pack size yet carry the same title tag (Ahrefs, January 2021).
The cleaner architecture is a single product page where colour and size are selectors, not separate addresses. Where your platform insists on a URL per variant, a canonical tag on each variant pointing at the parent consolidates the signals.
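As a rough illustration of the per-variant canonical, the sketch below strips variant-only parameters from a URL and builds the tag that would sit in the variant page's head. The parameter names in `VARIANT_PARAMS` and the `example.com` URLs are assumptions for the example, not a list any platform guarantees; swap in whatever your store actually emits.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical variant parameter names; adjust to your platform's URLs.
VARIANT_PARAMS = {"colour", "color", "size", "pack"}


def canonical_for_variant(url: str) -> str:
    """Return the parent product URL by dropping variant-only parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in VARIANT_PARAMS
    ]
    # Rebuild the URL without the variant parameters and without any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a variant page."""
    return f'<link rel="canonical" href="{escape(canonical_for_variant(url))}">'


print(canonical_link_tag("https://example.com/jacket?colour=red&size=L"))
```

Parameters that are not in the variant set are left in place, so a URL that mixes a variant with something meaningful is not silently collapsed.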
Supplier and manufacturer description duplication
Supplier description duplication is when you publish the manufacturer's product copy unchanged, so the exact same text appears on every retailer selling that item.
When fifty retailers paste the same supplier paragraph, none of them owns that content, and Google has little reason to rank yours over a higher-authority competitor's. Semrush notes that a page only counts as a problematic duplicate when it has "little to no original information" and "no added value for the reader compared to a similar page" (Semrush, February 2025). Supplier copy is the textbook case.
Start with the products that drive margin and traffic, add genuine detail (fit notes, use cases, comparisons, who it suits), and work down the catalogue. Our guide to writing product descriptions at scale sets out a workable process for large catalogues.
Faceted navigation, filters and sorting
Faceted navigation creates duplicate content because every filter and sort combination appends parameters to the URL, generating thousands of near-identical category pages.
A category with six filters can produce thousands of crawlable combinations, most of them thin and near-identical. The rule of thumb: filters that produce pages people genuinely search for can be allowed to index as curated landing pages with unique copy. Filters that only narrow an existing set should canonical back to the parent or be blocked. We go deep on this in our faceted navigation SEO guide.
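One way to express that rule of thumb in code is an allowlist: a filter URL is indexable only if it uses a single facet you have chosen to curate, and everything else points back at the parent category. This is a minimal sketch, and the `INDEXABLE_FACETS` set is a hypothetical choice; which facets deserve landing pages depends on your own search demand.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical allowlist: facets that people search for and that get unique copy.
INDEXABLE_FACETS = {"brand"}


def facet_policy(url: str) -> tuple[str, str]:
    """Return (action, target URL) for a category or filtered category URL."""
    parts = urlsplit(url)
    keys = {key.lower() for key, _ in parse_qsl(parts.query)}
    parent = f"{parts.scheme}://{parts.netloc}{parts.path}"

    if not keys:
        return ("index", url)  # the unfiltered category itself
    if len(keys) == 1 and keys <= INDEXABLE_FACETS:
        return ("index", url)  # a single curated facet landing page
    # Sorts, stacked filters and non-curated facets consolidate to the parent.
    return ("canonical", parent)


print(facet_policy("https://example.com/jackets?brand=acme"))
print(facet_policy("https://example.com/jackets?brand=acme&sort=price"))
```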
URL variations, tracking and technical duplication
Google treats URLs as case-sensitive, and treats trailing-slash and non-trailing-slash versions as distinct (Ahrefs, January 2021). If your server answers on more than one of the HTTP and www variations, every page on the site silently doubles or quadruples. Tracking parameters such as UTM tags create a fresh URL for every campaign link, which is why each tracked URL should canonical to its clean version.
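To make the clean version concrete, here is a small normalisation sketch. It assumes the preferred version is HTTPS on `www.example.com`, that the store uses lowercase paths without a trailing slash, and that the listed tracking keys are the ones in play; all of those are illustrative choices, so check them against your own setup before reusing the logic for redirects or canonicals.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for this example only.
PREFERRED_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid", "sessionid"}


def clean_url(url: str) -> str:
    """Map scheme, host, case, slash and tracking variants onto one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in KNOWN_HOSTS:
        host = PREFERRED_HOST
    # Lowercasing is only safe if the store never relies on mixed-case paths.
    path = parts.path.lower().rstrip("/") or "/"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "http://example.com/Shoes/",
    "http://www.example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
    "https://www.example.com/shoes",
]
print({clean_url(u) for u in variants})
```

All four inputs collapse to a single URL, which is the behaviour the 301 rules and canonical tags should reproduce on the live site.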
Canonical vs noindex vs 301: choosing the right fix
Use a canonical when the duplicate should stay live for shoppers and pass its signals to the main page, use noindex when the page helps humans but should never appear in search, and use a 301 when the duplicate has no reason to exist at all.
A canonical is a hint that consolidates indexing signals while keeping the duplicate reachable. A noindex is a directive that removes a page from search while leaving it usable. A 301 permanently moves a URL and its link equity.
One trap worth naming. Do not put a noindex and a canonical that points elsewhere on the same page expecting both to work cleanly, and do not canonical to a URL that is itself blocked or redirected.
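A crawl export can be checked for exactly these conflicts. The sketch below assumes you have already collected, per URL, the canonical target, the noindex state, the robots.txt state and any redirect; the `PageSignals` structure and its field names are hypothetical, not the output format of any particular crawler.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageSignals:
    """Indexing signals for one crawled URL (hypothetical structure)."""
    url: str
    canonical: Optional[str] = None
    noindex: bool = False
    blocked_by_robots: bool = False
    redirects_to: Optional[str] = None


def find_conflicts(page: PageSignals, pages: Dict[str, PageSignals]) -> List[str]:
    """List the conflicting-signal problems for a single page."""
    problems: List[str] = []
    target = page.canonical
    if target is None or target == page.url:
        return problems  # no canonical, or a self-referencing one
    if page.noindex:
        problems.append("noindex combined with a canonical pointing elsewhere")
    target_page = pages.get(target)
    if target_page is not None:
        if target_page.blocked_by_robots:
            problems.append("canonical target is blocked by robots.txt")
        if target_page.redirects_to:
            problems.append("canonical target redirects")
    return problems


pages = {
    "/jacket": PageSignals("/jacket", redirects_to="/jackets/rain-jacket"),
    "/jacket?colour=red": PageSignals("/jacket?colour=red", canonical="/jacket", noindex=True),
}
print(find_conflicts(pages["/jacket?colour=red"], pages))
```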
General guidance, not a substitute for a crawl of your own store.
| Tool | Use when | Keeps page live | Passes link equity | Best ecommerce example |
|---|---|---|---|---|
| Canonical tag | Duplicate must stay for shoppers | Yes | Yes (consolidated) | Product variants, UTM URLs, pagination |
| Noindex | Page helps humans, not searchers | Yes | No | Internal search results, thin filters |
| 301 redirect | Duplicate has no reason to exist | No | Yes (transferred) | HTTP to HTTPS, retired products |
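The same decision logic can be written as a short function, which is handy when tagging URL groups in bulk. This is a simplified sketch of the rule described in this section, with two yes/no inputs chosen for the example; real clusters sometimes need a judgment call that two flags cannot capture.

```python
def choose_fix(needed_by_shoppers: bool, should_pass_signals: bool) -> str:
    """Pick a fix for one duplicate URL using the canonical/noindex/301 rule."""
    if not needed_by_shoppers:
        # The duplicate has no reason to exist: move it and its link equity.
        return "301 redirect"
    if should_pass_signals:
        # Stays live for shoppers and consolidates signals on the main page.
        return "canonical"
    # Useful to humans, but should not appear in search.
    return "noindex"


examples = {
    "product variant URL": choose_fix(True, True),
    "internal search results page": choose_fix(True, False),
    "old HTTP version of a page": choose_fix(False, True),
}
for label, fix in examples.items():
    print(f"{label}: {fix}")
```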
For the full mechanics of canonical implementation across templates, see our guide to canonical tags for ecommerce.
How variants and filters create duplicates
A single product or category multiplies into many duplicate URLs because each variant gets its own address and each filter or sort appends a parameter, so one item of content ends up scattered across dozens or thousands of pages.
Variants
?colour=red, ?colour=blue, ?size=S, ?size=L → up to 40 near-identical URLs
Filters & sorting
?sort=price, ?brand=x, ?price=0-50 → thousands of parameter URLs
URL variants
http://, www, trailing slash, ?utm= → up to 4x every page
How one product and one category quietly become hundreds of crawlable duplicates, and how the correct signals funnel them back to a single URL.
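The multiplication is easy to reproduce. The sketch below enumerates the variant URLs for the jacket example (eight colours, five sizes) and then the scheme and hostname versions each one can answer on if nothing redirects. The colour names, paths and `example.com` hostnames are placeholders.

```python
from itertools import product

# Illustrative values only: eight colours and five sizes, as in the jacket example.
colours = [f"colour{i}" for i in range(1, 9)]
sizes = ["XS", "S", "M", "L", "XL"]

variant_urls = [f"/jacket?colour={c}&size={s}" for c, s in product(colours, sizes)]

# Without redirects, every path can resolve on two schemes and two hostnames.
origins = [
    f"{scheme}://{host}"
    for scheme, host in product(("http", "https"), ("example.com", "www.example.com"))
]

all_urls = [origin + path for origin, path in product(origins, variant_urls)]

print(len(variant_urls))  # 40 variant URLs for one product
print(len(origins))       # 4 versions of every URL
print(len(all_urls))      # 160 crawlable addresses if both problems coexist
```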
How to find and fix duplicate content (step by step)
Find duplicate content by combining Google Search Console's Page indexing report with a full-site crawl, then fix each cluster by applying a canonical, a noindex, or a 301 based on whether shoppers need the page.
- Pull the Google Search Console indexing report. Look for "Duplicate without user-selected canonical", "Duplicate, Google chose different canonical than user", and "Alternate page with proper canonical tag". The first two are your priority list.
- Crawl the whole store. Run Screaming Frog, Ahrefs Site Audit or Semrush Site Audit. Semrush flags pages at least 85 percent identical (Semrush, February 2025). Sort by near-duplicate clusters and by duplicate title tags.
- Run a site search and spot supplier copy. A site:yourdomain.com search surfaces matching titles and descriptions. Paste a sentence from a product description into Google in quotes; if dozens of retailers return the same text, that copy is not earning you anything.
- Classify every cluster. Use the canonical vs noindex vs 301 logic above. Tag each URL group with its intended fix in a spreadsheet before you touch anything. This is the step most teams skip, and it is why fixes get applied inconsistently.
- Apply fixes and consolidate. Implement the canonicals, noindex directives and redirects. Keep internal links pointing only at the canonical version, because inconsistent internal linking re-creates the problem (Ahrefs, January 2021).
- Rewrite the high-value duplicates. For products where the duplication is thin or supplier-sourced copy, rewrite rather than redirect.
- Recrawl and verify. Recheck the indexing report over the following weeks. Confirm Google has accepted your canonicals and that the right URLs are indexed.
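For a quick first pass before a full crawl, near-duplicate pairs can be flagged with the standard library. This sketch compares page text pairwise with `difflib.SequenceMatcher` and uses 0.85 as the cut-off to echo the 85 percent figure above; the similarity measure here is not the one Semrush uses, so treat the scores as a rough signal only. The URLs and copy are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Echoes the 85 percent figure; SequenceMatcher is a stand-in, not Semrush's metric.
THRESHOLD = 0.85


def near_duplicate_pairs(
    pages: Dict[str, str], threshold: float = THRESHOLD
) -> List[Tuple[str, str, float]]:
    """Return (url_a, url_b, score) for every pair at or above the threshold."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        score = SequenceMatcher(None, text_a, text_b).ratio()
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs


pages = {
    "/jacket?colour=red": "Waterproof shell jacket with taped seams and a packable hood.",
    "/jacket?colour=blue": "Waterproof shell jacket with taped seams and a packable hood.",
    "/boots": "Leather walking boots, resoled by hand.",
}
for url_a, url_b, score in near_duplicate_pairs(pages):
    print(url_a, url_b, score)
```

Pairwise comparison grows quadratically with the number of pages, so this suits a sample of a few hundred URLs rather than a whole catalogue; a crawler's own near-duplicate report is the better tool at scale.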
What the data says about duplicate content and rankings
In our 2026 ecommerce SEO ranking factor study we found that stores resolving large-scale internal duplication recovered crawl efficiency on their priority pages first, before any new content was added.
Where ecommerce duplicate content comes from. Source: Visionary Marketing 2026 ecommerce SEO ranking factor study.
- Around 25 to 30 percent of the web is duplicate content (Ahrefs, January 2021).
- Google does not apply a duplicate content penalty in normal cases (Google, via Ahrefs and Semrush, 2021 and 2025).
- Site Audit tools flag pages at roughly 85 percent similarity as duplicates (Semrush, February 2025).
If you would rather not run this audit yourself, our ecommerce SEO team handles the crawl, classification and fixes as part of an ecommerce technical SEO engagement.
Competitive edge: how this guide goes further
| Topic | Ahrefs | Shopify | Semrush | This guide |
|---|---|---|---|---|
| Plain definition and 'no penalty' clarity | Yes | Yes | Yes | Yes |
| Full cause list | Yes (general) | Partial | Partial (4) | Yes, ecommerce-ranked |
| Product variant duplication | Light | No | No | Dedicated section + fix |
| Supplier description duplication | No | Light | No | Dedicated section + fix |
| Canonical vs noindex vs 301 decision logic | Scattered | Scattered | Scattered | Interactive tool + table |
| Diagram of how duplicates multiply | No | No | No | Yes |
| First-party data | No | No | No | Yes, 2026 study |
| Step-by-step find-and-fix with HowTo schema | Partial | Partial | Partial | Yes |
Duplicate content will not get your store penalised, but left alone it quietly taxes your crawl budget, scatters your links, and lets your own near-duplicates outrank the pages you care about. Find it, classify it, and fix each cluster with the right tool.
Work With Visionary Marketing
Clean up duplicates, recover the rankings you already earned
We crawl, classify and fix duplicate content clusters across UK ecommerce catalogues. The right canonical, the right noindex, the right redirect, on the URLs that matter to revenue.
Visionary Marketing is a UK-based SEO and Google Ads agency that takes a data-led approach to growth. We don't guess — we analyse your market, competitors, and performance data to build strategies that drive measurable revenue. Every campaign is grounded in real numbers, not assumptions.
Related Services
How We Can Help
Ecommerce SEO Agency
Crawl, indexation and content work on stores with thousands of URLs.
Learn More
Technical SEO
Canonicalisation, redirects, schema and Core Web Vitals as one programme.
Learn More
Technical SEO Ecommerce Checklist
The pillar this article supports.
Learn More
Faceted Navigation SEO
Decide which facets to index, canonical or block.
Learn More