AI Bots Are Visiting Your Website — And You Can’t See Them

Key Takeaways

  • AI crawlers from ChatGPT, Claude, Perplexity, and Meta AI now account for over 4% of all web requests globally, yet Google Analytics records none of this traffic.
  • Server-side middleware tracking reveals the true scale: one website logged 754 AI bot requests from 9 different AI systems in a single 24-hour window.
  • Ahrefs Brand Radar detects approximately 2-3% of actual AI mentions, missing around 97% because it cannot see the hidden sub-queries AI platforms use to retrieve content.
  • ChatGPT sources its URLs from Bing’s index, which means stale or duplicate Bing entries send ChatGPT to dead pages on your site and waste your crawl budget.
  • Pages with self-referencing canonical tags are 1.92 times more likely to get cited by AI, making canonical hygiene one of the highest-return technical fixes available today.
  • AI visibility and Google search ranking are not the same channel: data shows 191 pages being crawled by AI bots with zero Google search traffic, and 14 pages ranking in Google that AI bots had never visited.

General Summary

AI systems are already crawling your website, reading your content, and deciding whether to recommend your brand every time a potential customer asks them a relevant question. For Amazon sellers generating significant revenue, this is not a future concern — it is happening right now, at scale, entirely outside the visibility of standard analytics tools. Over 51% of all internet traffic is now automated bots, a threshold crossed for the first time in 2024. AI crawlers specifically account for over 4% of all web requests globally. Standard tools like Google Analytics 4 cannot detect any of this activity because they rely on client-side JavaScript that AI bots never execute. The brands that understand this gap and start measuring AI visibility today will be the ones showing up when their customers ask AI what to buy. The ones that wait will be recommending their competitors without knowing it.

Extractive Summary

Google Analytics 4 cannot record AI bot visits because AI systems send raw server requests that never load JavaScript. Ahrefs Brand Radar detects roughly 2-3% of actual AI brand mentions due to a structural limitation called the dark query problem. Server-side middleware is the only tracking method that logs every request to your site deterministically, whether from a human or a bot. ChatGPT alone generated 517 hits on one website in 24 hours, all from live retrieval events where real users were getting answers sourced from that site’s content. URL fragmentation and stale Bing index entries are the two most common technical faults reducing AI crawl efficiency. ChatGPT and Claude fetch your live page content, while Perplexity and Gemini rely entirely on cached search index snapshots, which means optimisation must run on two separate tracks. AI citation and Google search ranking overlap but are not interchangeable: the two channels serve different content from different pages in many cases.

Abstractive Summary

The rise of AI-driven search represents a structural shift in how brands get discovered online. For the past decade, visibility meant ranking in Google. That model is not disappearing, but a second visibility layer has quietly appeared on top of it: AI recommendation. Unlike search rankings, which a brand can track and optimise in near-real time, AI recommendation has been essentially invisible. The measurement gap is the core problem. Without data, there is no optimisation. Without optimisation, there is no presence. The brands that solve the measurement problem first will have a meaningful head start over those still relying on tools designed for a different era.

For Amazon sellers, the implications go further. Amazon’s own AI system, Amazon Q, is already crawling external websites focused on Amazon strategy and advertising. This suggests that AI influence on e-commerce extends beyond consumer-facing search engines and into the tools Amazon itself uses to evaluate content and credibility. Sellers who treat their website as an AI-readable asset — structured, technically clean, and regularly updated — are building a channel that most competitors do not yet understand. The sellers who moved early on Amazon PPC when it launched had a lasting advantage. The same dynamic is likely to play out here.

Why Can’t Your Analytics Tool See AI Bot Traffic?

Google Analytics 4 cannot record AI bot visits because AI systems send raw server requests that never load the JavaScript tracking code GA4 depends on. When a real person visits your site, their browser executes a small tracking script. That script fires and logs the session. AI bots skip this entirely.

AI systems like ChatGPT and Claude send an HTTP request directly to your server, download the HTML, extract what they need, and move on. No browser. No JavaScript execution. No cookie. No pageview. From GA4’s perspective, that visit did not happen.

The problem compounds when real people arrive via AI. When a user clicks a link in a ChatGPT answer and lands on your site, GA4 often logs that session as Direct traffic. Platforms like Claude and Perplexity do not consistently pass referrer data. So you are not just missing bot visits. You are also miscounting the human visits that AI generated.

Over 51% of all internet traffic is now automated bots, according to 2024 web traffic analysis by Imperva. AI crawlers specifically account for over 4% of all web requests globally. That is not a rounding error. It is an entire traffic channel, and your current tools cannot see any of it.

How Big Is the Blind Spot in Practice?

Server-side middleware deployed on one website logged 754 AI bot requests in a single 24-hour window, from 9 different AI systems, with zero of those visits appearing in Google Analytics. ChatGPT alone accounted for 517 of those hits. Every one of them was a live retrieval event: a real person had asked ChatGPT a question, and ChatGPT fetched that website’s content to build its answer.

This was not a high-traffic enterprise website. It was a specialist agency site. If 754 invisible AI visits happened there in 24 hours, the same pattern is almost certainly playing out on your site right now. The question is not whether AI is reading your content. The question is whether you know what it is reading and what it is deciding.

What Does Ahrefs Brand Radar Actually Measure?

Ahrefs Brand Radar measures an estimated sample of brand mentions across six AI platforms by running automated queries and reading the text responses, but it detects approximately 2-3% of actual mentions. An independent comparison found that where Ahrefs reported 3 brand mentions on ChatGPT, the real number was 123. On Perplexity, Ahrefs found 6. The actual number was 212.

Ahrefs Brand Radar is still useful for benchmarking your brand against competitors and tracking directional trends. The problem comes when sellers treat it as a reliable measure of AI visibility for optimisation decisions. At 2-3% detection accuracy, it is not capable of driving meaningful optimisation work.

What Is the Dark Query Problem?

The dark query problem is the structural reason keyword-sampling tools miss most AI retrieval events. When a user types a question into ChatGPT, the system breaks that question into multiple hidden sub-queries and retrieves content for each one separately; around 88% of actual retrieval events come from sub-queries that no human ever types. Ahrefs queries the AI platforms using the same visible queries humans type. It cannot access the sub-queries because they are generated internally by the AI and never exposed.

The dark query problem is not unique to Ahrefs. No keyword-based tool can solve it. It is a structural feature of how large language models retrieve information. The only way to see what AI systems are actually reading from your site is to log it at the server level, before the AI has processed anything.

What Are the Three Levels of AI Visibility Tracking?

The three levels of AI visibility tracking are Google Analytics, Ahrefs Brand Radar, and server-side middleware, and they differ significantly in what they can detect, what they miss, and what decisions they can reliably support.

What Does Level 1 — Google Analytics — Show You?

Google Analytics 4 tracks human visitors accurately and handles standard traffic analysis well, but it shows zero AI bot traffic because it was designed for human behaviour analytics and relies on JavaScript that AI bots never execute. Cost: free. What it misses: all AI bot traffic and a meaningful portion of AI-referred human traffic logged as Direct.

GA4 is not broken. It is doing its job. The limitation is that its job does not include the new traffic layer that AI systems represent. Continuing to rely on GA4 alone for visibility decisions is like measuring your total sales using only one of your marketplaces.

What Does Level 2 — Ahrefs Brand Radar — Show You?

Ahrefs Brand Radar provides estimated brand mention counts across six AI platforms and is useful for competitive benchmarking, but it misses around 97% of actual mentions and provides no page-level data about which content AI systems are reading. Cost: approximately 800 to 1,100 pounds per month for full coverage.

Brand Radar answers one useful question: how does your brand’s estimated AI mention share compare to competitors? For that narrow purpose, it has value. For understanding what content AI reads, which pages get cited, or where the funnel breaks, it provides nothing actionable.

What Does Level 3 — Server-Side Middleware — Show You?

Server-side middleware intercepts every request to your site at the server level, logging the exact bot, the exact page, the exact timestamp, and whether the request was a training crawl or a live retrieval event — with 100% deterministic accuracy and no sampling. This is the only tracking method that captures the complete picture.

The practical value goes beyond counting. With server-side tracking in place, you can run live tests. Type a query into ChatGPT that your site should answer. Check the server log immediately. Did ChatGPT fetch your page to build that answer? If yes, you know which page it pulled and can optimise it. If no, the funnel is broken and you know exactly where to investigate. That feedback loop does not exist at any other tracking level.
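A minimal sketch of such middleware, assuming a Python WSGI stack. The user-agent signatures and their training-versus-retrieval classification below are assumptions drawn from publicly documented crawler names; verify them against your own access logs before relying on them.

```python
# Minimal WSGI middleware sketch for server-side AI bot logging.
import logging
from datetime import datetime, timezone

# Assumed signatures: substring of the User-Agent header -> (bot, request type)
AI_BOT_SIGNATURES = {
    "ChatGPT-User": ("ChatGPT", "live-retrieval"),
    "GPTBot": ("ChatGPT", "training-crawl"),
    "ClaudeBot": ("Claude", "live-retrieval"),
    "PerplexityBot": ("Perplexity", "training-crawl"),
    "meta-externalagent": ("Meta AI", "training-crawl"),
}

def identify_ai_bot(user_agent: str):
    """Return (bot_name, request_type), or None for human/unknown traffic."""
    for signature, info in AI_BOT_SIGNATURES.items():
        if signature in user_agent:
            return info
    return None

class AIBotLoggingMiddleware:
    """Logs every AI bot request before the application handles it."""
    def __init__(self, app):
        self.app = app
        self.log = logging.getLogger("ai-bots")

    def __call__(self, environ, start_response):
        hit = identify_ai_bot(environ.get("HTTP_USER_AGENT", ""))
        if hit:
            bot, kind = hit
            self.log.info(
                "%s %s %s %s",
                datetime.now(timezone.utc).isoformat(),
                bot, kind, environ.get("PATH_INFO", "/"),
            )
        return self.app(environ, start_response)
```

The same pattern ports to Express, Django middleware, or an edge function; the essential property is that it runs at the server, before any JavaScript would, so it sees every request.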

Which AI Systems Are Actually Crawling Your Site?

The AI systems most actively crawling websites in 2025 include ChatGPT, Meta AI, Amazon Q, TikTok AI, Claude, Apple AI, Perplexity, and Gemini, with ChatGPT and Meta AI accounting for the largest share of AI crawl traffic on most sites. The breakdown differs significantly from what most sellers would expect.

Why Is ChatGPT the Largest Source of AI Crawl Traffic?

ChatGPT generates the highest volume of live retrieval crawls because it fetches real-time content to answer user queries, rather than relying solely on its training data. In the 24-hour tracking window examined here, ChatGPT accounted for 517 of 754 total AI hits — 68.6% of all AI traffic.

The user agent for these requests was ChatGPT-User, which identifies live retrieval as opposed to training crawls. Every one of those 517 requests represents a real person asking ChatGPT a question and ChatGPT reading that website to build its answer. That is 517 opportunities to appear in a response, or to be absent from one.

Why Does Meta AI Crawl So Much More Than Most Sellers Realise?

Meta AI generates over half of all AI crawler traffic globally by volume, making it the most aggressive AI crawler on the web even though it receives far less attention than ChatGPT in discussions about AI search. In the tracking data examined, Meta AI produced 107 hits — the second largest category after ChatGPT.

Meta AI’s crawls appear to be broad discovery passes rather than targeted live retrievals. For sellers with product content, brand stories, or advertising guides on their sites, Meta AI is reading it. Whether that reading translates into recommendations is a separate question, but the crawl activity is real and significant.

What Is Amazon Q Doing on Your Website?

Amazon Q is Amazon’s own enterprise AI system, and it was found crawling an Amazon advertising agency’s website — specifically hitting PPC strategy pages, ad type breakdowns, budget management guides, and team member profiles. Amazon Q accounted for 86 hits, or 11.4% of total AI traffic in that 24-hour window.

The pattern of pages Amazon Q targeted is notable. It was not crawling the homepage or general brand content. It was reading technical advertising strategy pages. And it was reading team profiles, which suggests it is evaluating organisational credibility alongside content depth. For Amazon sellers with educational content about Amazon advertising on their websites, Amazon’s own AI is likely already reading it.

What Technical Problems Does AI Bot Tracking Expose?

Server-side AI tracking exposes technical problems that standard SEO audits typically miss, including URL fragmentation, stale index entries in Bing, and misrouted crawl budget — all of which reduce the accuracy and efficiency of how AI systems read your content. Volume data alone is the least valuable output of this tracking.

What Is URL Fragmentation and How Does It Hurt AI Citation?

URL fragmentation occurs when the same piece of content is accessible at multiple URL variants, splitting the authority signal that AI systems use to evaluate that content’s credibility. In the 24-hour tracking data, the top-performing page on one site was being crawled at three different URL variants: with a trailing slash, without a trailing slash, and as a lowercase variant.

AI platforms treat each URL as a separate page. A page with 38 total AI crawl hits split across three URLs sends a weaker authority signal than the same 38 hits concentrated on one canonical URL. The fix is direct: choose one URL format, implement 301 redirects from all variants, and add self-referencing canonical tags to every page. Research from Authoritas shows that pages with self-referencing canonical tags are 1.92 times more likely to be cited by AI systems.

Why Does Your Bing Index Affect ChatGPT’s Behaviour?

ChatGPT sources the URLs it fetches from Bing’s search index, which means any stale, broken, or duplicate URLs that Bing has indexed for your site will be visited by ChatGPT as if they are live content pages. In the tracking data, ChatGPT was fetching WordPress parameter URLs such as ?p=11077 — internal shortlinks that serve no user purpose and waste crawl budget.

Cleaning your Bing index is no longer just a secondary SEO task. For ChatGPT visibility specifically, Bing index hygiene is a direct optimisation lever. Submit a clean sitemap to Bing Webmaster Tools. Block parameter URLs in robots.txt. Remove outdated entries manually where needed. Each stale URL that ChatGPT fetches is a wasted retrieval opportunity that could have gone to a content page.
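As a sketch, robots.txt rules along these lines block the WordPress parameter patterns seen in the tracking data. The paths and sitemap URL are placeholders to adapt to your own CMS, and note that a Disallow stops compliant bots from fetching those URLs at all.

```
# Illustrative only - adapt patterns to your own CMS
User-agent: *
Disallow: /*?p=
Disallow: /*?replytocom=

Sitemap: https://www.example.com/sitemap.xml
```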

How Do Different AI Systems Actually Retrieve Content?

Different AI systems retrieve content in fundamentally different ways: ChatGPT and Claude fetch your live page at the moment of answering a query, while Perplexity and Gemini rely entirely on cached snapshots from Google and Bing and never visit your actual page. This distinction determines what you need to optimise and where.

What Matters for Live-Fetch AI Systems Like ChatGPT and Claude?

Live-fetch AI systems like ChatGPT (user-agent: ChatGPT-User) and Claude (user-agent: ClaudeBot for live queries) read the actual HTML of your page at the moment they build their answer, which means page content quality, heading structure, load speed, and technical accessibility all directly affect what they extract and cite. If your page takes more than a few seconds to respond, or if your content is buried in JavaScript components that render slowly, the AI may receive an incomplete or empty response.

For these platforms, the optimisation target is the real page: structured headings, direct answers near the top of each section, clean HTML, fast server response. The content the AI reads is the content your visitors see. Optimising for AI citation and optimising for human readability are the same task here.
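A quick self-audit for this failure mode, sketched below under the assumption that what matters is the raw HTML of a single GET: strip the scripts and tags from the response and check that the key answer text is still there. Anything that only appears after client-side rendering fails the check.

```python
# Sketch: does a phrase survive in the raw HTML a live-fetch bot receives?
import re
import urllib.request

def raw_html_contains(html: str, phrase: str) -> bool:
    """True if the phrase is present in the HTML's visible text,
    ignoring script blocks, tags, and whitespace differences."""
    text = re.sub(r"<script.*?</script>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)   # strip remaining tags
    text = re.sub(r"\s+", " ", text).lower()
    return phrase.lower() in text

def fetch_like_a_bot(url: str) -> str:
    """One plain GET, no JavaScript execution."""
    req = urllib.request.Request(url, headers={"User-Agent": "self-audit/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Content written into the page by JavaScript fails the check even though a human visitor would see it, which is precisely the gap between what your browser renders and what a live-fetch bot reads.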

What Matters for Index-Dependent AI Systems Like Perplexity and Gemini?

Index-dependent AI systems like Perplexity and Gemini read cached snapshots stored in Google's and Bing's indexes, not your live page, which means your SERP presence, metadata quality, and snippet optimisation are the primary levers for influencing how these platforms discover and represent your content. Improving your actual page content helps only once Google or Bing recrawls and updates the cached version.

This two-track reality is important for Amazon sellers running content programmes. Time and budget spent optimising page structure for ChatGPT has limited effect on Perplexity or Gemini visibility. And improvements to your Google rankings and snippet quality have limited effect on ChatGPT’s retrieval behaviour. Both tracks matter. Neither is a substitute for the other.

How Do AI Citation and Google Search Ranking Differ?

AI citation and Google search ranking are distinct channels that overlap but are not interchangeable: data shows that AI bots crawl many pages that Google ignores, and Google ranks many pages that AI bots have never visited. Treating them as equivalent leads to optimisation decisions that improve performance on one channel while doing nothing for the other.

In the tracking data reviewed, 191 pages were being actively crawled by AI bots with zero recorded Google search traffic. These were pages that Google had either not ranked or not indexed prominently, yet AI systems had found and were reading them. The reverse was also true: 14 pages with solid Google rankings had received zero AI bot visits.

ChatGPT's citations match Google's top three search results exactly only around 7.8% of the time, according to analysis by SparkToro. The two channels share some overlap, but the majority of AI citation activity happens outside Google's top results entirely. A page can rank on page two of Google and still be a primary source for ChatGPT answers. A page can rank in Google's top three and never appear in a ChatGPT response.

For Amazon sellers, this means the content programme that supports your AI visibility is not the same as your SEO programme, though the two reinforce each other. Content written for AI extraction — structured, direct, factually grounded, technically accessible — tends to rank well in Google too. But the priorities are different, and the measurement systems need to be separate.

What Should You Do First to Improve Your AI Visibility?

The highest-return starting point for improving AI visibility is implementing server-side request logging, which gives you the data needed for every subsequent decision. Without measurement, optimisation is guesswork. The fixes that follow from having real data are specific, testable, and achievable without large budgets.

What Are the Quick Technical Wins?

The three technical fixes with the highest immediate impact are: canonical tag standardisation across all pages, Bing Webmaster Tools cleanup and sitemap submission, and robots.txt updates to block parameter URLs. These changes directly affect how ChatGPT retrieves your content and how AI systems evaluate your pages’ authority signals.

Canonical standardisation is the fastest: choose one URL format for your entire site, implement 301 redirects from all variants, and add self-referencing canonical tags. Pages with correct canonical tags are 1.92 times more likely to get cited by AI, according to Authoritas research. That is a concrete, measurable improvement available to any site within a single sprint.

The Bing cleanup is equally direct. Log into Bing Webmaster Tools, review the indexed URLs for your domain, identify parameter URLs and outdated pages, and submit a clean sitemap. Block parameter URL patterns in robots.txt. The effect is that ChatGPT stops wasting retrieval budget on dead ends and reaches your actual content pages instead.

How Should You Structure Content for AI Extraction?

Content structured for AI extraction uses question-format headings, direct answers in the first two sentences of each section, short paragraphs of two to four sentences, and standalone factual statements that make sense without surrounding context. AI systems extract from the opening of each section and from sentences that are self-contained enough to quote.

For Amazon sellers, this means converting existing pillar content — ad type guides, budget management explainers, listing optimisation frameworks — into structures where every section header is a question a potential customer might ask, and the first sentence answers it directly. The content you already have may be valuable. The structure may be the problem.

Every statistic should include a named source. Every claim about platform behaviour should reference specific data or named research. AI systems cross-reference claims against multiple sources and deprioritise unsupported assertions. Content that cites Amazon’s Q4 earnings report or a named industry study performs better in AI extraction than content that says the same thing without attribution.

What Happens to the Brands That Start Measuring Now?

The brands that implement AI visibility tracking in 2025 gain a compounding advantage: each month of data reveals new optimisation opportunities, while competitors are still making decisions based on analytics tools that cannot see the channel at all. The gap between measured and unmeasured brands widens over time rather than staying constant.

754 invisible visits in 24 hours on one website. That is not a future problem waiting to arrive. It is the current state. AI systems are already choosing which brands to recommend when your customers ask questions about your category. The brands showing up consistently in those answers have not necessarily done anything radically different. Some have simply fixed their canonical tags, cleaned up their Bing index, and made their content readable at the server level.

The measurement problem is solvable. The optimisation work that follows is concrete. And the window to build an early advantage over competitors who are still relying on GA4 alone is open right now.
