What is crawl budget for an ecommerce site?

Crawl budget is the number of URLs Googlebot is willing and able to crawl on your store in a given window. It is set by crawl capacity limit (how hard Google can hit your server safely) and crawl demand (how much Google wants your pages). On a large catalogue, filter and variant URLs eat into it. Source: Google, Crawl Budget Management, updated 2025.

Does crawl budget matter for small stores?

Usually not. Google says crawl budget management is for sites with over a million pages that change weekly, sites over ten thousand pages that change daily, or sites with many URLs stuck in 'Discovered, currently not indexed'. A small store crawled the same day it publishes does not need to worry. Source: Google, 2025.

What wastes the most crawl budget on ecommerce?

Faceted navigation. Filters, sort orders and variant combinations spin up huge numbers of near-duplicate URLs that soak up crawl and leave less for product and category pages. In our 2026 study, sites with unmanaged faceted navigation wasted 2.7x more crawl on parameter URLs than sites that managed it. Source: Visionary 2026 Ecommerce SEO Ranking Factor Study.

How do I check what Google is crawling?

Server log file analysis. Filter your logs to verified Googlebot, then group requests by URL type to see how much crawl lands on product pages versus filter and search URLs. Cross-reference with the Crawl Stats report in Google Search Console for total requests and server response time. Source: Google, 2025.

Should I use noindex or robots.txt to control crawl budget?

Use robots.txt for crawl control. Google still has to request a noindexed page before it can read the tag, which spends budget. To keep junk URLs out of the crawl entirely, disallow them in robots.txt. Source: Google, 2025.

What is crawl depth and why does it matter?

Crawl depth is the number of clicks from the homepage to a page. Pages buried deep tend to get crawled less often and to rank worse. Keep every revenue page within three or four clicks of the homepage.

How long does it take to see results after fixing crawl budget?

It varies by catalogue size and crawl rate. Re-pull your logs a few weeks after the fix and check whether the share of crawl spent on parameter URLs has dropped and the share on revenue pages has risen. That shift is the leading indicator, before rankings move.

Crawl Budget for Large Ecommerce Sites: A Practical Playbook

Crawl budget is the number of URLs Google is willing and able to crawl on your site in a given window. On a large store it gets eaten alive by faceted navigation, filter and sort parameters, and variant URLs, so your money pages go uncrawled. Large stores fix it by blocking junk URLs in robots.txt, canonicalising filters, flattening crawl depth, and proving the gains with log files.

Source: Google, Crawl Budget Management, updated 2025.

Chris Coussons | Visionary Marketing • Published: 28 June 2026 • 14 min read

40%: share of URLs Google crawls on an unoptimised store (Botify)

Share crawled on one 10m-page marketplace (Botify)

+0.35 / -0.34: rank correlation, managed vs unmanaged facets (VM 2026)

What crawl budget actually is (and why only big stores need to care)

Crawl budget is the set of URLs Google can and wants to crawl on a host, set by two things: crawl capacity limit (how hard Google can hit your server without slowing it down) and crawl demand (how much Google wants your pages, driven by popularity, freshness and perceived inventory). Source: Google, Crawl Budget Management, updated 2025.

Google itself is clear that most sites do not need to worry about this. The guidance is aimed at sites with over a million pages that change weekly, sites with over ten thousand pages that change daily, or sites with a large share of URLs sitting in "Discovered, currently not indexed". Large catalogues can hit all three.

This article is the playbook that sits under our wider technical SEO for ecommerce checklist, focused entirely on getting Google to spend its crawl on the pages that earn.

The crawl budget reference table

A one-screen summary of the concept, the drains and the fix levers before we go deeper.

Attribute	Value
Definition	The set of URLs Googlebot can and will crawl on a host in a given window, governed by crawl capacity limit and crawl demand. Source: Google, 2025.
What consumes it on ecommerce	Faceted navigation, filter and sort parameters, internal search result pages, session IDs, infinite pagination, and a separate URL per variant.
How you detect it	Server log file analysis (Googlebot hits by URL type), plus the Crawl Stats report in Google Search Console.
Primary fix levers	robots.txt disallow rules for parameters, canonical tags on filtered pages, flatter crawl depth, clean XML sitemaps, 404/410 for dead URLs, faster server response.
Revenue impact	Uncrawled product and category pages cannot be indexed, so they cannot rank or earn. On an unoptimised store Google crawls roughly 40% of URLs. Source: Botify.
Thresholds by catalogue size	Under ~10k URLs: rarely an issue. 10k to 1m with frequent change: monitor. 1m+ URLs: active management required. Source: Google, 2025.

What drains crawl budget on a large catalogue

The top culprit is faceted navigation. Filter, sort and variant combinations multiply your URL count into the millions without adding a single new product.

A store with 5,000 products and twenty filterable attributes can spin up millions of addresses, most near-duplicate. The six biggest drains we see on real catalogues:

Filter and sort parameters (colour, size, price brackets, sort order).
Faceted navigation combinations (multiple filters applied together).
Internal search result pages left crawlable and indexable.
Session IDs and tracking parameters appended to URLs.
Infinite or unbounded pagination chains.
A unique URL per variant (every colour and size of one product).
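
To make the multiplication concrete, here is a rough Python sketch that counts the filtered URLs a single category page could expose. It assumes every attribute is either unset or set to exactly one value, and that each combination resolves to its own URL. The function name and the sample attribute counts are illustrative, not figures from a real store.

```python
from math import prod

def filtered_url_count(values_per_attribute):
    """Count filtered URLs for one category under the assumptions above."""
    # Each attribute offers (values + 1) states: one per value, plus "not applied".
    combinations = prod(v + 1 for v in values_per_attribute)
    # Subtract the single all-unset state, which is the clean category URL.
    return combinations - 1

# Hypothetical small category: 3 colours, 4 sizes, 5 price brackets.
print(filtered_url_count([3, 4, 5]))

# Hypothetical larger case: twenty simple on/off attributes on one category.
print(f"{filtered_url_count([1] * 20):,}")
```

Under these assumptions, twenty on/off filters alone produce just over a million filtered URLs for one category, before sort orders or multi-select values are counted.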

On the facet point, the deep treatment is in our faceted navigation SEO guide. On the near-duplicate URL point, see ecommerce index bloat.

How to detect crawl waste with log file analysis

The only way to see what Google actually crawls, rather than what you think it crawls, is your server log files.

A log line records the requesting user agent, the URL, the status code, the timestamp and the bytes returned. Verify it really is Googlebot with reverse DNS rather than trusting the user-agent string (Google, verifying Googlebot guidance, 2025).
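
A minimal sketch of that check in Python, assuming the usual forward-confirmed reverse DNS pattern: look up the hostname for the IP, check the domain, then resolve the hostname back and confirm it returns the same IP. The accepted domain suffixes below are an assumption to check against Google's current verification guidance, and the function name is our own. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket

# Assumed crawler domains; confirm against Google's current guidance.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IP claiming to be Googlebot."""
    # Default resolvers use the standard library; both are IPv4-oriented here.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).lower()
        # Step 1: the reverse lookup must land on a Google crawler domain.
        if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the original IP.
        return ip in forward_lookup(hostname)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Calling `is_verified_googlebot("66.249.66.1")` with the default resolvers performs live DNS lookups, so on a large log it is worth caching the result per IP.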

The practical method: pull a representative window of logs, isolate verified Googlebot, group requests by URL type (product, category, filter/parameter, search, other), and calculate the share of crawl spent on each. The headline metric is the percentage of crawl landing on parameter and search URLs that should never be crawled.

Pair this with the Crawl Stats report in Search Console, which shows total crawl requests, average response time and host availability problems. That is your capacity-limit signal. Site architecture also shapes what gets crawled, which we cover in ecommerce site architecture.

66.249.66.1 "GET /shop/jackets/oxford-jacket HTTP/1.1" 200 → wanted: product

66.249.66.1 "GET /shop/jackets/category HTTP/1.1" 200 → wanted: category

66.249.66.1 "GET /shop/jackets?colour=red&sort=price HTTP/1.1" 200 → wasted: filter

66.249.66.1 "GET /shop/jackets?colour=red&size=L&sort=price HTTP/1.1" 200 → wasted: facet combo

66.249.66.1 "GET /search?q=red+jacket HTTP/1.1" 200 → wasted: internal search

Verify Googlebot by reverse DNS, not the user-agent string. Source: Google, 2025.
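
The grouping step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full log parser: it starts from request paths already extracted from verified Googlebot hits, the parameter names and the `/shop/` and `/search` path rules are hypothetical, and the sample paths are simplified versions of the lines above.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed filter and sort parameter names; replace with your store's own.
FILTER_PARAMS = {"colour", "size", "sort"}

def classify(url):
    """Assign a request path to a URL type using simple, assumed rules."""
    parts = urlsplit(url)
    if parts.path.startswith("/search"):
        return "search"
    if set(parse_qs(parts.query)) & FILTER_PARAMS:
        return "filter"
    segments = [s for s in parts.path.split("/") if s]
    if segments[:1] == ["shop"]:
        # Assumed layout: /shop/<category> and /shop/<category>/<product>.
        return "category" if len(segments) <= 2 else "product"
    return "other"

def crawl_share(urls):
    """Return the share of requests that landed on each URL type."""
    counts = Counter(classify(u) for u in urls)
    total = sum(counts.values())
    return {url_type: n / total for url_type, n in counts.items()}

# Hypothetical request paths from verified Googlebot hits.
hits = [
    "/shop/jackets/oxford-jacket",
    "/shop/jackets",
    "/shop/jackets?colour=red&sort=price",
    "/shop/jackets?colour=red&size=L&sort=price",
    "/search?q=red+jacket",
]
shares = crawl_share(hits)
wasted = shares.get("filter", 0) + shares.get("search", 0)
print(shares)
print(f"Share of crawl on filter and search URLs: {wasted:.0%}")
```

The last figure is the headline metric: the share of crawl landing on URLs that should never be crawled.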

The crawl budget calculator

Plug in your catalogue size and Googlebot's daily rate, then see how long Google needs to crawl every URL at your current waste level versus a healthier 10%.

Crawl Budget Calculator: how long to crawl your whole catalogue

Total crawlable URLs: 250,000
Average Googlebot crawl rate: 5,000 URLs/day
Share of crawl wasted on filter / parameter URLs: 60%

Days to full crawl as-is: 125 days
After fixing waste to 10%: 56 days
Days saved: 69 days

At this crawl rate, cutting waste from 60% to 10% gets your full catalogue crawled 69 days sooner.

Indicative estimate. Real crawl rate varies by server response, site authority and demand. Crawl-rate concept and capacity limit per Google, Crawl Budget Management, 2025.
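
The arithmetic behind the calculator can be reproduced with a short Python sketch. It assumes the model stated in this article (total URLs divided by the crawl rate left over after waste) and rounds up to whole days, which is our assumption about how the figures were rounded. The 250,000-URL input is the catalogue size used elsewhere in this article, and the function name is illustrative.

```python
import math

def days_to_full_crawl(total_urls, crawl_rate_per_day, waste_percent):
    """Days for Googlebot to reach every URL at a given waste level."""
    # Useful crawl per day, kept in integer hundredths to avoid float drift.
    useful_per_day_x100 = crawl_rate_per_day * (100 - waste_percent)
    # Round up: a partial day still counts as a day of crawling.
    return math.ceil(total_urls * 100 / useful_per_day_x100)

as_is = days_to_full_crawl(250_000, 5_000, 60)
fixed = days_to_full_crawl(250_000, 5_000, 10)
print(f"As-is: {as_is} days, after fix: {fixed} days, saved: {as_is - fixed} days")
```

With these inputs the sketch returns 125 days as-is and 56 days after the fix, a saving of 69 days.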

Crawl depth in plain terms is the number of clicks from the homepage to a page. Pages buried six clicks deep get crawled rarely. Our rule: every revenue page within three or four clicks of the homepage.
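
Click depth can be measured with a breadth-first search over your internal link graph. The sketch below assumes you already have a mapping of each page to the pages it links to, for example from a site crawl export. The sample graph, the page paths and the function name are hypothetical.

```python
from collections import deque

def click_depths(links, start="/"):
    """Return the minimum number of clicks from the start page to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # First visit in a breadth-first search is the shortest click path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/shop"],
    "/shop": ["/shop/jackets"],
    "/shop/jackets": ["/shop/jackets/oxford-jacket"],
    "/shop/jackets/oxford-jacket": [],
    "/shop/jackets/old-parka": [],  # no page links here
}
depths = click_depths(links)
too_deep = [page for page, depth in depths.items() if depth > 4]
orphans = [page for page in links if page not in depths]
print(depths)
print("Deeper than four clicks:", too_deep)
print("Orphan pages:", orphans)
```

Pages missing from the result are unreachable by internal links from the homepage, which is one way to surface orphan pages alongside those that sit too deep.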

How to reclaim crawl budget in seven steps

You fix crawl budget by removing junk from the crawl, then concentrating it on revenue pages.

1. Block parameter and sort URLs in robots.txt. Disallow the filter, sort and session parameters that create near-duplicate URLs. Google will not shift freed budget elsewhere unless it is already hitting your serving limit, so do this to stop waste, not as a magic boost. Source: Google, 2025.
2. Canonicalise filtered pages to the parent category. Point every filtered or sorted version at the clean category URL so signals consolidate on one page.
3. Do not rely on noindex for crawl control. Google still has to request a noindexed page before it sees the tag, which spends budget. Use robots.txt to keep it out of the crawl entirely. Source: Google, 2025.
4. Return 404 or 410 for dead URLs, and flatten redirect chains. A 404 is a strong signal to stop crawling a URL. Long redirect chains waste crawl and bleed PageRank at every hop. Source: Google, 2025.
5. Flatten crawl depth and fix orphan pages. Get revenue pages within three or four clicks of the homepage and make sure every important page has internal links pointing at it.
6. Keep XML sitemaps to clean, indexable, canonical URLs only, with accurate lastmod values so Google can prioritise what changed.
7. Speed up server response and re-crawl to confirm. Faster responses raise the crawl capacity limit, so Google can read more. Then re-pull logs and confirm the parameter-crawl share has dropped. Source: Google, 2025.
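
For the robots.txt step, it helps to test disallow patterns against real paths before deploying them. The Python sketch below builds a hypothetical rule set and checks paths against it using the `*` wildcard and `$` end anchor that Google's robots.txt matching supports. It is a simplified matcher written for illustration, not Google's parser: it ignores Allow rules and rule precedence, and the parameter names are assumptions.

```python
import re

# Hypothetical disallow patterns for filter, sort and session parameters.
DISALLOW = ["/*?*colour=", "/*?*size=", "/*?*sort=", "/*?*sessionid=", "/search"]

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path_and_query, rules=DISALLOW):
    """True if any disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)

def render_robots_txt(rules=DISALLOW):
    """Render the rule set as robots.txt text for all user agents."""
    return "User-agent: *\n" + "\n".join(f"Disallow: {rule}" for rule in rules)

print(render_robots_txt())
for path in ["/shop/jackets/oxford-jacket",
             "/shop/jackets?colour=red&sort=price",
             "/search?q=red+jacket"]:
    print(path, "->", "blocked" if is_disallowed(path) else "crawlable")
```

Running candidate rules against a sample of paths from your logs is a cheap way to confirm that product and category URLs stay crawlable before the file goes live.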

First-party data: what the numbers say

Faceted navigation handling correlates with ecommerce rank at +0.35 when managed and -0.34 when left unmanaged, and unmanaged faceted navigation wastes 2.7x more crawl budget on parameter URLs. Source: Visionary 2026 Ecommerce SEO Ranking Factor Study, 100,000 pages crawled Q1 2026. Set that alongside the Botify figure of roughly 40% of URLs crawled on unoptimised stores.

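Days to crawl a 250,000-URL catalogue, by crawl rate and waste level. Computed as total URLs divided by effective crawl rate, where effective crawl rate is the daily crawl rate multiplied by the share of crawl that is not wasted. Crawl-rate model per Google, 2025.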

How this playbook beats the standard advice

What you get	Google's own doc	Typical SEO tool guide	This playbook
Definition of crawl budget	Yes	Yes	Yes
Ecommerce-specific drains named	Partial	Partial	Yes, all six
Log file analysis walkthrough	No	Sometimes	Yes, step by step
Interactive crawl budget calculator	No	No	Yes
First-party correlation data	No	No	Yes, 100k-page study
UK ecommerce framing and GBP	No	No	Yes

Where this fits, and where to get help

Crawl budget is the first technical fix to make on any large store, ahead of speed and schema work. A fast page Google never crawls earns nothing.

This is exactly the work we do inside our ecommerce SEO services. Log-led crawl audits are the first step on large catalogues.

Methodology and sources

Google, Crawl Budget Management, Google Search Central, updated 2025.
Google, verifying Googlebot guidance, Google Search Central, 2025.
Botify, crawl budget research (carried via our technical SEO for ecommerce pillar).
Visionary 2026 Ecommerce SEO Ranking Factor Study, 100,000 pages crawled Q1 2026.

