Sitemap Keyword Filter
Pull every URL from a sitemap, keep only the ones that match your keywords, and get a Search Console regex for free.
GSC uses RE2 and caps filters at ~4,096 characters. For very large result sets, narrow your keywords first.
The Sitemap Keyword Filter turns a sprawling XML sitemap into a focused list of the URLs you actually care about, and hands you a Search Console regex to match them. It’s the quickest way to scope a content audit, a migration, or a performance deep-dive to a single section of a site.
Step by step
How to use it
- Find your sitemap
  Most sites expose one at /sitemap.xml or /sitemap_index.xml. Check robots.txt, where the Sitemap directive lists the canonical location. Large sites usually publish an index that points at many child sitemaps.
- Add your source
  Paste a single sitemap index URL and the tool expands every child sitemap for you, or drop in up to six individual sitemap URLs, one per line. No network handy? Switch to the Paste XML tab and the tool runs entirely in your browser.
- Enter keywords
  Type comma-separated keywords or path fragments, for example "/blog/, guide, 2026". Toggle "Match all" to require every keyword (AND) instead of any (OR), and "Case-sensitive" when casing matters.
- Filter and export
  Run the tool to get the matching URLs in the output panel. Switch to the GSC regex tab for a regular expression that selects exactly those pages, then copy either output with one click.
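The keyword rules from the "Enter keywords" step can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the names `parse_keywords` and `filter_urls` are hypothetical, matching is assumed to be a plain substring search against the full URL, and returning nothing for an empty keyword list is also an assumption.

```python
def parse_keywords(raw):
    """Split a comma-separated keyword string and drop empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def filter_urls(urls, raw_keywords, match_all=False, case_sensitive=False):
    """Keep URLs that contain any keyword (OR) or every keyword (AND).

    Assumption: a keyword matches when it appears anywhere in the URL.
    """
    keywords = parse_keywords(raw_keywords)
    if not keywords:
        return []  # assumption: no keywords means no matches
    if not case_sensitive:
        keywords = [k.lower() for k in keywords]
    combine = all if match_all else any
    matches = []
    for url in urls:
        haystack = url if case_sensitive else url.lower()
        if combine(k in haystack for k in keywords):
            matches.append(url)
    return matches
```

With the example keywords "/blog/, guide, 2026", the default OR mode keeps any URL containing at least one of the three fragments, while `match_all=True` keeps only URLs containing all of them.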
Background
What makes this hard
- Browsers block cross-origin requests, so a pure client-side tool can't fetch another site's sitemap directly.
  How the tool handles it: Requests are routed through a lightweight same-origin proxy (a Cloudflare Pages Function in production) that fetches the XML and returns it, with private/loopback hosts blocked and an 8 MB cap.
- Sitemap indexes nest. An index points at sitemaps that may point at more sitemaps, so a naive fetch only sees a list of files, not pages.
  How the tool handles it: Give it the index URL and it walks the tree, de-duplicating as it goes, until it has every <loc> URL. A safety budget stops runaway crawls.
- Search Console's regex filter uses RE2 and treats unescaped special characters (such as . and ?) as operators rather than literal text, so a hand-built pattern often silently matches nothing.
  How the tool handles it: The generated regex escapes every reserved character and alternates exact path strings, so it pastes into GSC and works first time.
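The escape-and-alternate idea in the last item can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the tool's actual generator: `escape_re2`, `build_gsc_regex`, and `fits_gsc_limit` are hypothetical names, the reserved-character set is the usual regex metacharacter list, and the `^https?://[^/]+` prefix and `$` suffix are one possible way to pin the pattern to exact pages.

```python
from urllib.parse import urlsplit

# Characters treated here as regex operators that need a backslash.
RE2_RESERVED = set(r"\.^$*+?()[]{}|")
GSC_REGEX_LIMIT = 4096  # approximate cap quoted in this guide


def escape_re2(text):
    """Backslash-escape every reserved character in text."""
    return "".join("\\" + ch if ch in RE2_RESERVED else ch for ch in text)


def build_gsc_regex(urls):
    """Alternate the escaped path (plus query) of each URL, de-duplicated."""
    paths = []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        if path not in paths:
            paths.append(path)
    if not paths:
        return ""
    alternation = "|".join(escape_re2(p) for p in paths)
    # Assumption: anchor on any host so only these exact paths match.
    return "^https?://[^/]+(" + alternation + ")$"


def fits_gsc_limit(pattern):
    """True when the pattern is short enough for the quoted GSC cap."""
    return len(pattern) <= GSC_REGEX_LIMIT
```

Escaping matters most for the dot in file extensions and the question mark in query strings: unescaped, they change what the pattern means instead of matching themselves.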
Options & methods
Ways to feed it
Single sitemap index
Point the tool at one index URL and let it expand everything underneath. Best for most sites.
https://example.com/sitemap_index.xml
Up to six specific sitemaps
When an index lists dozens of child sitemaps but you only care about a few sections, paste just those, one per line.
https://example.com/post-sitemap.xml
https://example.com/product-sitemap.xml
https://example.com/category-sitemap.xml
Paste raw XML
No proxy, no network. Paste a <urlset> document and the filtering happens entirely client-side. Handy behind a firewall or for ad-hoc exports.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.com/blog/post/</loc></url>
</urlset>
Pitfalls
Common mistakes
| Mistake | How to fix it |
|---|---|
| Filtering on a sitemap index URL and getting nothing. | An index contains <sitemap> entries, not <url> entries. Use the From URLs tab so the tool expands the index, or paste a child <urlset> directly. |
| Pasting a Search Console regex that matches zero rows. | GSC uses RE2, which is case-sensitive by default and needs special characters escaped. Use this tool's generated regex, which handles the escaping for you. |
| A huge result set exceeds the GSC filter limit. | Search Console caps regex filters near 4,096 characters. Narrow your keywords, or filter by a path prefix, so the alternation stays short. |
| Keywords matching the domain instead of the path. | Keyword matching runs against the full URL. Include a leading slash (/blog/) to pin matches to the path and avoid hitting the hostname. |
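The first row of the table turns on the difference between a sitemap index and a urlset. The Python sketch below shows one way to tell them apart and to expand an index into page URLs. It is an illustration under assumptions, not the tool's code: `parse_sitemap` and `collect_page_urls` are hypothetical names, `fetch` is a caller-supplied function that returns the XML text for a URL, and the `max_fetches` budget of 50 is an arbitrary stand-in for the safety budget.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, locs), where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError("unexpected root element: " + root.tag)
    locs = []
    for entry in root.findall(NS + child):
        loc = entry.find(NS + "loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return kind, locs


def collect_page_urls(start_url, fetch, max_fetches=50):
    """Walk an index tree and return de-duplicated page URLs in order.

    Stops once max_fetches sitemap documents have been fetched.
    """
    queue = [start_url]
    seen_sitemaps = set()
    seen_pages = set()
    pages = []
    fetches = 0
    while queue and fetches < max_fetches:
        sitemap_url = queue.pop(0)
        if sitemap_url in seen_sitemaps:
            continue
        seen_sitemaps.add(sitemap_url)
        fetches += 1
        kind, locs = parse_sitemap(fetch(sitemap_url))
        if kind == "index":
            queue.extend(locs)  # child sitemaps, possibly nested indexes
        else:
            for loc in locs:
                if loc not in seen_pages:
                    seen_pages.add(loc)
                    pages.append(loc)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any network layer, so it can be tried against a dictionary of XML strings before it is pointed at real sitemaps.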
Use cases
When you need this
- Auditing which URLs in a section are actually submitted in the sitemap.
- Building a Search Console regex to analyse performance for one content type.
- Pulling a clean list of product or blog URLs for a crawl or a content audit.
- Spotting orphaned or unexpected URLs before a migration.
- Handing a developer an exact list of pages to redirect or noindex.
FAQ
Questions
Is the Sitemap Keyword Filter free?
Yes. It is completely free: no account, no login, and no limits beyond a sensible 8 MB per-sitemap cap. It runs in your browser, with a thin proxy used only to fetch the XML.
Do you store the sitemaps or URLs I enter?
No. The proxy fetches the sitemap and streams it back without writing it anywhere, and the filtering happens in your browser. Nothing is logged or retained.
How many sitemap URLs can I add?
Up to six individual sitemap URLs, or a single sitemap index URL which the tool expands automatically into all of its child sitemaps.
Why does my Search Console regex match nothing?
Search Console uses RE2, which needs special characters escaped and is case-sensitive by default. Copy the regex this tool generates, which escapes everything and alternates exact paths, so it works on the first paste.
Can I use it without an internet connection?
Yes. Switch to the Paste XML tab and paste a urlset document. Filtering and regex generation then run entirely client-side with no network request at all.