Tracking published crawler IP ranges
Simon Thompson
A key part of ensuring a site's online visibility is making sure that good crawlers (e.g. Googlebot, Bingbot) aren't being blocked by rate limits or bot protections in your CDNs or other security systems.
Alongside a reliable regular expression to verify distinct user agents, it's important to maintain an up-to-date list of IP ranges that each crawler operates from. These ranges change frequently, so keeping track of them all can prove tricky.
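As a rough illustration of that first check, here's a minimal user-agent test for Googlebot. The pattern is illustrative rather than exhaustive; a production rule set should cover every crawler you care about, and user-agent strings are trivially spoofable on their own, which is exactly why the IP-range verification below matters:

```python
import re

# Illustrative pattern: matches documented Googlebot user-agent strings such as
# "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".
GOOGLEBOT_UA = re.compile(r"\bGooglebot/\d+\.\d+\b")

def looks_like_googlebot(user_agent: str) -> bool:
    """Cheap first-pass check; pair it with IP-range verification,
    since the user-agent header can be set to anything."""
    return bool(GOOGLEBOT_UA.search(user_agent))

print(looks_like_googlebot(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # True
```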
Fortunately, keeping track of these ranges is becoming simpler thanks to an emerging de-facto standard: the major crawlers now publish their IP prefixes in a common JSON format. For instance, here's a sample of what Google provides for Googlebot:
```json
{
  "creationTime": "2025-07-18T14:46:17.000000",
  "prefixes": [
    { "ipv6Prefix": "2001:4860:4801:10::/64" },
    { "ipv4Prefix": "192.178.4.0/27" }
    /** Truncated **/
  ]
}
```
These feeds can be consumed periodically by your security systems or tooling to generate a fresh allowlist, so good bots don't get blocked.
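As a sketch of what that consumption step might look like, here's a minimal Python example that fetches Google's published Googlebot feed (the other vendors' feeds share the same shape), parses the prefixes, and checks an incoming IP against them. Error handling, caching, and scheduling are left out:

```python
import ipaddress
import json
import urllib.request

# Google's published Googlebot feed; the other feeds use the same JSON shape.
FEED_URL = "https://developers.google.com/static/search/apis/ipranges/googlebot.json"

def load_allowlist(url: str = FEED_URL):
    """Fetch a feed and return its prefixes as ipaddress network objects."""
    with urllib.request.urlopen(url) as resp:
        feed = json.load(resp)
    return [
        ipaddress.ip_network(entry.get("ipv4Prefix") or entry.get("ipv6Prefix"))
        for entry in feed["prefixes"]
    ]

def is_allowed(ip: str, allowlist) -> bool:
    """Check whether a client IP falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in allowlist)

allowlist = load_allowlist()
print(is_allowed("192.178.4.1", allowlist))  # True for the sample prefix above
```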
To help keep track of changes to these feeds, I've set up a repo that uses Simon Willison's git scraping technique: it records any change to each feed in the ./src/ directory and also generates a combined feed covering all sources.
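At its core, the git scraping setup is just a scheduled job that fetches each feed, writes it into the repo, and commits only when something changed. Here's a stripped-down Python sketch of that loop; the feed list is a one-entry placeholder rather than the repo's full set, and in practice a scheduler such as cron or GitHub Actions runs it on an interval:

```python
import pathlib
import subprocess
import urllib.request

# Placeholder subset; the real repo tracks one file per feed in the table below.
FEEDS = {
    "src/googlebot.json": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
}

def scrape() -> None:
    for path, url in FEEDS.items():
        target = pathlib.Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as resp:
            target.write_bytes(resp.read())
    # Stage everything, then commit only if the staged tree differs;
    # `git diff --cached --quiet` exits non-zero when changes are staged.
    subprocess.run(["git", "add", "-A"], check=True)
    if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
        subprocess.run(["git", "commit", "-m", "Update crawler IP feeds"], check=True)

if __name__ == "__main__":
    scrape()
```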
Below is a list of the feeds being tracked at the time of writing:
| Vendor | Source |
|---|---|
| Google | googlebot.json |
| Google | special-crawlers.json |
| Google | user-triggered-fetchers.json |
| Google | user-triggered-fetchers-google.json |
| OpenAI | searchbot.json |
| OpenAI | chatgpt-user.json |
| OpenAI | gptbot.json |
| Perplexity | perplexitybot.json |
| Perplexity | perplexity-user.json |
| Microsoft | bingbot.json |
| DuckDuckGo | duckduckbot.json |
| DuckDuckGo | duckassistbot.json |
| Apple | applebot.json |
| Mistral | mistralai-user-ips.json |
| CommonCrawl | ccbot.json |
You can view the repository using the link below:
If you have any questions, please feel free to reach out!