How to Exclude Bot Traffic in GA4: Filtering Known Bots and Spiders
Bot traffic distorts every metric that matters in Google Analytics 4. Conversion rates get diluted, page views inflate artificially, average session duration collapses, and the bounce rate becomes meaningless. If you base spending decisions on that data, you are optimising against pollution. Reliable analytics is the foundation of every campaign call in our SEO services for UK businesses, and the first job is always getting the bots out of the numbers.
GA4 ships with a filter that promises to do this automatically. It does part of the job. But it misses the type of bot pollution most UK websites are seeing in 2026, which is referral-style and direct traffic spam from a handful of countries that has nothing to do with crawlers. Below you will find the built-in filter, the GA4 admin tools that catch what it misses, the exact pattern we are seeing on our own analytics this month, and the segment recipe we use to strip bots out of reports before sending them to clients.
Why bot traffic distorts your GA4 data
Real users and bots leave wildly different fingerprints. A genuine visitor arrives, reads, scrolls, clicks, and sometimes converts. A bot opens a page in a fraction of a second, fires zero engagement events, and disappears. When both get counted equally, every metric you rely on becomes harder to trust.
Bounce rate is the obvious casualty. If a bot lands on one page and leaves, that single-page session inflates the bounce rate. If a bot navigates through dozens of pages at machine speed, average session duration crashes and pages-per-session looks artificially healthy. Neither pattern matches the behaviour of a buyer evaluating your services.
The damage shows up most clearly in attribution. If you are running a targeted campaign through our Facebook advertising service and a burst of referral spam lands on the same landing page, the campaign’s measured conversion rate drops even though the actual buyer behaviour was fine. Worst case, you pause a profitable campaign because the dashboard told you it had stopped converting.
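The dilution effect is easy to see with a few lines of arithmetic. The sketch below uses made-up figures (200 real sessions, 10 conversions, 50 spam sessions) purely for illustration; none of them come from a real campaign.

```python
def conversion_rate(conversions: int, sessions: int) -> float:
    """Return conversions as a fraction of sessions (0.0 when there are no sessions)."""
    return conversions / sessions if sessions else 0.0


# Hypothetical figures, chosen only to illustrate the effect.
real_sessions = 200
conversions = 10
spam_sessions = 50  # bot sessions add to the denominator but never convert

true_rate = conversion_rate(conversions, real_sessions)
measured_rate = conversion_rate(conversions, real_sessions + spam_sessions)

print(f"True conversion rate:     {true_rate:.1%}")
print(f"Measured conversion rate: {measured_rate:.1%}")
```

Buyer behaviour is identical in both lines; only the denominator changed. That is why a spam burst can make a healthy campaign look as if it has stalled.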
How GA4’s built-in bot filter works
GA4’s “Exclude all hits from known bots and spiders” filter compares the user agent string on every hit against a known-bot list maintained by Google. When the user agent matches, the hit is dropped before it lands in your reports.
It is on by default for every GA4 property. You cannot turn it off through the standard interface, and you do not need to. There is no setting for it because Google treats this as table stakes.
The two things to understand about it:
- It only filters hits where the user agent self-identifies as a known bot. Bots that present a normal browser user agent string slip through.
- It does not retroactively clean historical data. If your June 2025 reports were contaminated, they stay contaminated.
That second point matters when you are diagnosing a sudden traffic spike. Once a bot has been counted, the only way to exclude it from a comparison view is to use a segment, an audience, or a filter inside Explorations.
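To make the user-agent limitation concrete, here is a minimal sketch of list-based matching. Google does not publish its matching logic, so the substring check and the shortened `KNOWN_BOT_TOKENS` list are assumptions for illustration only.

```python
# Hypothetical, heavily shortened stand-in for a known-bot list.
# The real list is far longer and is not reproduced here.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "ahrefsbot", "semrushbot", "pingdom")


def is_known_bot(user_agent: str) -> bool:
    """Return True when the user agent contains a token from the known-bot list."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


# A crawler that declares itself is caught.
declared_crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# An automated browser presenting an ordinary Chrome string is not.
disguised_bot = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(is_known_bot(declared_crawler))
print(is_known_bot(disguised_bot))
```

Any filter built this way can only be as good as the honesty of the user agent it is given.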
Which bots GA4’s built-in filter catches
Google bases its bot list on the IAB and ABC International Spiders and Bots List. This is the same list used by most ad-verification platforms, and it is updated monthly.
In practical terms, it catches:
- Search engine crawlers (Googlebot, Bingbot, DuckDuckBot, Baidu, Yandex)
- SEO tool crawlers (Ahrefsbot, Semrushbot, Mj12bot, Dotbot)
- Social platform link previewers (Facebook External Hit, Twitterbot, LinkedInBot)
- Major uptime monitors (Pingdom, Uptime Robot, StatusCake)
- Most archive and research crawlers (Common Crawl, Internet Archive)
What it does not catch:
- Headless browsers running automation frameworks (Puppeteer, Playwright, Selenium) when configured to present a normal Chrome user agent
- Scraping farms running residential proxies with rotating, browser-like user agents
- Click-fraud botnets simulating organic user behaviour
- The referral-spam pattern most UK websites see, which we cover in detail below
The list is updated, but updates lag the bot ecosystem by weeks at minimum. Anything new gets through until Google catches up.
How to enable GA4’s bot filter (2026 step-by-step)
GA4’s bot filter is enabled by default on every property created since launch, so for most people there is nothing to enable. Where confusion arises is around the related Internal Traffic and Developer Traffic filters, which are separate, off by default, and where most of the useful filtering work happens.
If you want to confirm the bot filter is doing its job:
- Open Admin (gear icon, bottom left)
- Under the Property column, click Data Streams
- Open your website data stream
- Scroll to Google tag and click Configure tag settings
- Open Show all under Settings
- Look for Define internal traffic and List unwanted referrals (covered below)
There is no toggle for the bots filter itself on this screen. That is normal. The GA4 data filters documentation confirms it runs without configuration.
What you do want to set up on this screen are two filters Google does not turn on for you: an Internal Traffic data filter and an Unwanted Referrals exclusion. Both are covered below.
The bot pattern most guides miss: direct traffic from non-target countries
The biggest pollution problem on UK B2B websites in 2026 is not crawlers. The bot list catches those. The problem is referral-style spam and direct traffic spoofing from a small set of countries, almost always presenting as Direct or (not set) source, with zero engagement.
Here is the pattern from our own GA4 data in the first half of June 2026. Source: Direct, country not the UK, engagement rate zero, session duration close to zero:
| Date | Singapore | United States | China | Vietnam | Engagement |
|---|---|---|---|---|---|
| 5 June 2026 | 32 | 8 | 3 | 2 | 0% |
| 6 June 2026 | 21 | 6 | 2 | 1 | 0% |
| 7 June 2026 | 14 | 1 | 4 | 1 | 0% |
| 8 June 2026 | 27 | 10 | 5 | 0 | 0% |
| 11 June 2026 | 40 | 15 | 6 | 0 | 0% |
Across those five days, that is between twenty and roughly sixty sessions of pure noise per day. None of it comes from a user agent on the IAB list, so the built-in filter cannot touch it. Most of it presents as Direct traffic with no referrer, which is a clue that something is forging a Measurement Protocol payload directly into GA4 rather than loading the website at all.
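Summing each row of the table gives the daily totals. The sketch below just re-keys the figures from the table and adds them up; the variable names are our own.

```python
# Zero-engagement Direct sessions per country, copied from the table above.
daily_sessions = {
    "5 June 2026": {"Singapore": 32, "United States": 8, "China": 3, "Vietnam": 2},
    "6 June 2026": {"Singapore": 21, "United States": 6, "China": 2, "Vietnam": 1},
    "7 June 2026": {"Singapore": 14, "United States": 1, "China": 4, "Vietnam": 1},
    "8 June 2026": {"Singapore": 27, "United States": 10, "China": 5, "Vietnam": 0},
    "11 June 2026": {"Singapore": 40, "United States": 15, "China": 6, "Vietnam": 0},
}

# Total noise sessions per day.
daily_totals = {date: sum(counts.values()) for date, counts in daily_sessions.items()}

for date, total in daily_totals.items():
    print(f"{date}: {total} sessions")

print("Quietest day:", min(daily_totals.values()))
print("Noisiest day:", max(daily_totals.values()))
```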
There are three reliable signals that flag this kind of traffic:
- Source is Direct and engagement rate is exactly zero across the session
- Country is one you do not market in
- Average session duration is under two seconds and pages per session is exactly one
Any one of those alone is not conclusive. All three together is a near-certain bot signature, and that is the basis of the segment recipe further down.
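The three-signal rule can be written as a small check over exported session rows. This is a sketch under assumptions: the `Session` fields and the `TARGET_COUNTRIES` set are hypothetical names, not GA4 export columns.

```python
from dataclasses import dataclass

# Assumption: the countries you actively market in.
TARGET_COUNTRIES = {"United Kingdom"}


@dataclass
class Session:
    source_medium: str
    country: str
    engaged: bool
    duration_seconds: float
    page_views: int


def signal_flags(s: Session) -> tuple:
    """Return one boolean per signal, in the order listed above."""
    return (
        s.source_medium == "(direct) / (none)" and not s.engaged,
        s.country not in TARGET_COUNTRIES,
        s.duration_seconds < 2 and s.page_views == 1,
    )


def is_likely_bot(s: Session) -> bool:
    """Flag a session only when all three signals fire together."""
    return all(signal_flags(s))


spam = Session("(direct) / (none)", "Singapore", False, 0.0, 1)
uk_bounce = Session("(direct) / (none)", "United Kingdom", False, 1.0, 1)

print(is_likely_bot(spam))
print(is_likely_bot(uk_bounce))
```

Requiring all three signals is the design choice that matters: the UK visitor who bounces quickly trips two signals but is not flagged.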
Set up an Unwanted Referrals exclusion in five minutes
Unwanted Referrals are the easiest single fix. They tell GA4 that traffic appearing to come from specific domains should not be attributed as referral traffic. Useful when you see your own checkout provider, your own payment gateway, or a known spam domain showing up in your acquisition reports.
To configure:
- Open Admin and click Data Streams under the Property column
- Open your website data stream
- Scroll to Google tag and click Configure tag settings
- Click List unwanted referrals
- Add each domain you want excluded (one rule per domain, or use a regex match for a pattern)
Common domains worth adding for UK websites:
- Your payment processor (stripe.com, paypal.com, klarna.com) so checkout returns are not counted as referrals
- Your CRM or marketing automation tool when it sends traffic back to your website
- Known spam referrer patterns (semalt.com, buttons-for-website.com, free-share-buttons.com and similar legacy referral-spam domains)
This will not stop the traffic appearing in your reports altogether. It just stops it being mis-attributed. That distinction matters when you are evaluating channel performance.
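If you go the regex route, it is worth testing the pattern before pasting it into GA4. The pattern below is a hypothetical example covering the legacy spam domains named above plus their subdomains. GA4's regex dialect and anchoring behaviour may differ from Python's `re` module, so treat this as a sanity check rather than a guarantee.

```python
import re

# Hypothetical pattern: each spam domain, optionally preceded by any subdomain.
SPAM_REFERRER_PATTERN = re.compile(
    r"(.+\.)?(semalt\.com|buttons-for-website\.com|free-share-buttons\.com)"
)


def is_spam_referrer(domain: str) -> bool:
    """Return True when the whole domain matches the spam pattern."""
    return SPAM_REFERRER_PATTERN.fullmatch(domain.lower()) is not None


for candidate in ("semalt.com", "cdn.semalt.com", "notsemalt.com", "stripe.com"):
    print(candidate, is_spam_referrer(candidate))
```

Escaping the dots and matching the full string stops the pattern catching unrelated domains that merely contain the same letters.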
Create a Data Filter for Internal Traffic
The Internal Traffic filter is the most underused tool in GA4. Once active, it removes traffic from a defined IP range (or a defined traffic_type parameter) from your reports completely. It does not touch data that was collected before it was switched on.
This is the right place to exclude your own office and your client’s office. It does the same job that view-level IP filters used to do in Universal Analytics, and it works.
Configure in two steps. First, define what counts as internal:
- Admin then Data Streams then your website data stream
- Configure tag settings then Define internal traffic
- Add a rule: set traffic_type to internal, pick a match type (IP address equals, or IP address is in range using CIDR notation), and enter the office IP
Second, turn the filter on:
- Back in Admin under the Property column, click Data Filters
- The Internal Traffic filter is listed but in Testing mode by default
- Click into it and switch it from Testing to Active
The Testing-to-Active step is the one most people forget. While it is in Testing mode, the filter does nothing to your standard reports. Switching to Active strips internal traffic from every standard report going forward. Read the Google Analytics data filter documentation before going live with this, because Active filters cannot be applied retroactively to data collected while the filter was in Testing.
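If CIDR notation is unfamiliar, the sketch below shows what a range rule covers, using Python's standard `ipaddress` module. The ranges are documentation-reserved example addresses, not real office IPs, and `traffic_type_for` is a hypothetical helper that only mimics the effect of the rule.

```python
import ipaddress
from typing import Optional

# Hypothetical office ranges, using addresses reserved for documentation.
INTERNAL_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole office block of 256 addresses
    ipaddress.ip_network("198.51.100.42/32"),  # a single fixed IP
]


def traffic_type_for(ip: str) -> Optional[str]:
    """Return "internal" when the IP falls inside a defined range, else None."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in INTERNAL_RANGES):
        return "internal"
    return None


for ip in ("203.0.113.77", "198.51.100.42", "198.51.100.43"):
    print(ip, traffic_type_for(ip))
```

A /24 covers every address sharing the first three octets, while a /32 matches exactly one address. Checking the range this way before saving the rule helps avoid tagging more traffic as internal than you intended.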
Build a real-users-only segment in Explorations
For ad hoc bot exclusion in a specific report, the Explorations area is the right tool. Create a segment that defines real users and apply it to any analysis. The segment we use combines the three signals from the bot pattern section above.
- Open Explorations in the left navigation, then create a Blank exploration
- Under Segments, click the plus and choose New segment
- Pick Session segment
- Add three conditions joined with AND: Country exactly matches United Kingdom; Engagement rate is greater than zero; Session source / medium does not exactly match (direct) / (none)
- Save the segment as Real UK Users and apply it to any exploration
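Adjust the country to wherever you market. Be aware that this recipe is deliberately strict: it drops every Direct session and every non-UK session, not only those matching all three bot signals. If that costs you real buyers, use an exclude group in the segment builder instead, removing only sessions that are Direct, outside your markets, and have zero engagement. The point is to use one segment definition across every analysis so all of your reporting compares on a clean baseline. Without that, two reports built in different weeks can look contradictory simply because one accidentally included bot traffic and the other did not.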
If your bot tail comes from a specific country that is also a legitimate market, lean on the engagement signal more heavily. Most real users register at least some engagement; pure forge-the-payload bots register none.
Save the segment definition somewhere you can re-create it quickly, because GA4 segments are scoped to the exploration they were built in. Sharing the exploration with colleagues is the standard way to make the segment portable.
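Before committing to a segment definition, it can help to run it over a handful of sample rows and see what it keeps. The sketch below compares the AND-joined segment described above with an exclusion-based alternative; the rows and field names are invented for illustration.

```python
DIRECT = "(direct) / (none)"

# Hypothetical sample rows, not real data.
sessions = [
    {"country": "United Kingdom", "source_medium": "google / organic", "engaged": True},
    {"country": "United Kingdom", "source_medium": DIRECT, "engaged": True},  # bookmark visit
    {"country": "United States", "source_medium": "google / cpc", "engaged": True},
    {"country": "Singapore", "source_medium": DIRECT, "engaged": False},  # bot signature
]


def strict_segment(s: dict) -> bool:
    """Keep a session only if it passes all three inclusion conditions."""
    return (
        s["country"] == "United Kingdom"
        and s["engaged"]
        and s["source_medium"] != DIRECT
    )


def exclusion_segment(s: dict) -> bool:
    """Keep everything except sessions showing all three bot signals at once."""
    is_bot_signature = (
        s["country"] != "United Kingdom"
        and not s["engaged"]
        and s["source_medium"] == DIRECT
    )
    return not is_bot_signature


strict_kept = [s for s in sessions if strict_segment(s)]
exclusion_kept = [s for s in sessions if exclusion_segment(s)]

print("Strict segment keeps:", len(strict_kept))
print("Exclusion segment keeps:", len(exclusion_kept))
```

Both versions remove the bot row. They differ in how many genuine sessions survive, which is the trade-off to settle before standardising on one definition.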
When to add Cloudflare or server-side bot filtering
GA4 filtering cleans up your analytics. It does nothing to stop the bots hitting your website in the first place. For most UK businesses, that is fine: a Direct-(none) bot does not load the page, does not consume server resources, and is just a payload sent straight to GA4. Strip it from analytics and you are done.
When you do need to go further is when you see one of these signals:
- Server logs show the same bot pattern at the application layer (high request volume from suspect IPs, not just GA4 hits)
- Form spam volume is rising despite the GA4 noise being filtered
- Origin server CPU or bandwidth costs are climbing without a matching rise in real traffic
At that point, an upstream layer becomes worth the effort. The Cloudflare bot management documentation covers the options at edge level. WordPress websites can use a security plugin to apply application-layer rate limits and bot scoring. Whichever path you take, the GA4 filtering work above still pays for itself because it gives you a clean baseline to measure the upstream change against.
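A quick way to check the first signal is to count requests per IP in your access logs. The sketch below assumes the common Apache/Nginx log layout; the sample lines, the documentation-range IPs, and the threshold are all made up.

```python
import re
from collections import Counter

# Matches the leading fields of a common-log-format line; only the client IP is captured.
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "')


def requests_per_ip(lines):
    """Count requests per client IP, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def suspect_ips(counts, threshold):
    """Return the IPs whose request count meets or exceeds the threshold."""
    return sorted(ip for ip, total in counts.items() if total >= threshold)


# Hypothetical sample lines.
sample_log = [
    '203.0.113.9 - - [11/Jun/2026:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.9 - - [11/Jun/2026:10:00:01 +0000] "GET /contact HTTP/1.1" 200 2048',
    '203.0.113.9 - - [11/Jun/2026:10:00:02 +0000] "GET /about HTTP/1.1" 200 3072',
    '198.51.100.7 - - [11/Jun/2026:10:03:15 +0000] "GET / HTTP/1.1" 200 8192',
    "malformed line",
]

counts = requests_per_ip(sample_log)
print(suspect_ips(counts, threshold=3))
```

If the IPs that dominate your logs do not line up with the noise in GA4, the analytics spam is probably not loading your pages, and upstream blocking will do little for your reports.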
Common bot-filtering mistakes to avoid
The biggest mistake is over-filtering. We have seen businesses build segment conditions so aggressive that they accidentally exclude high-intent buyers who happen to browse fast, users on screen readers, and users on slower connections whose sessions log as brief.
Specific patterns to be careful with:
- Filtering on session duration alone catches some real users. Always combine it with another signal (country, engagement, or source).
- Filtering on browser type catches accessibility tools, kiosk browsers, and embedded browsers used in some B2B portals.
- Country filters need to match your reach. Some UK businesses do legitimate trade with US prospects or Singapore enterprise buyers. A blanket country exclusion costs you those signals.
- Custom segments built once and never reviewed drift out of date. Bots change. Real-user behaviour changes. Audit the segment quarterly.
The other recurring trap is filtering inconsistently across tools. If GA4 reports show 100 sessions a day after filtering and your CRM attribution tool shows 140 because it does not have the same filter applied, every report comparing them looks wrong. Document the filter rules once and apply them to every attribution surface, including your content marketing tracking and your LinkedIn advertising reporting.
Bot traffic will not stop. New patterns emerge every quarter and the IAB list, however well-maintained, lags reality. Treat bot filtering as ongoing maintenance rather than a one-off setup, and your reporting will hold up under scrutiny.
FAQs
Does GA4 automatically filter out bot traffic?
GA4’s built-in bot filter is on by default and removes traffic from any user agent on the IAB and ABC International Spiders and Bots List. That covers search crawlers, SEO tools, social link previewers, and major uptime monitors. It does not cover headless browsers presenting a normal Chrome user agent, scraping farms on residential proxies, or the Direct-source bot pattern most UK websites see today. You will need additional filters for the bots the list does not catch.
How can I tell if bot traffic is affecting my GA4 data?
Look for three signals together: Direct or (not set) source, a country outside your normal markets, and an engagement rate of exactly zero. Any one of those alone is not conclusive, but all three together are a near-certain bot signature. Sudden traffic spikes from one country, zero-duration sessions, and pages per session of exactly one are the most reliable indicators.
Will enabling GA4 bot filters clean up historical data?
No. GA4 filters apply only to data collected after the filter is active. Historical data stays as it was recorded. The workaround is to build a segment in Explorations that excludes the bot pattern and apply it to historical comparisons; you cannot strip the data out of standard reports retroactively.
What is the difference between GA4’s bot filter and the Internal Traffic filter?
The bot filter is automatic, applies to every property, and excludes traffic from user agents on Google’s known-bot list. The Internal Traffic filter is one you configure yourself, off by default, and excludes traffic matching a defined IP range or traffic_type parameter. Set up Internal Traffic to exclude your office and client offices from reports.
Why is my GA4 Direct traffic so high?
Two main causes. One is legitimate Direct traffic from buyers typing your URL or returning via bookmarks. The other is bot pollution presenting as Direct because something is forging a Measurement Protocol payload to GA4 without loading your website. The fix is the same as for any other bot signal: build a segment in Explorations that excludes sessions which are Direct, from a non-target country, and have zero engagement. Standard reports cannot be cleaned this way after the fact.
Will Cloudflare stop GA4 bot traffic?
Cloudflare can block bots before they reach your website but most GA4 bot pollution does not load your website at all. It hits the Measurement Protocol endpoint directly. So Cloudflare alone will not fix GA4 numbers unless the bot is loading pages. Pair Cloudflare bot management with the GA4 filters above for the cleanest possible analytics.