Sitemap Keyword Filter

Pull every URL from a sitemap, keep only the ones that match your keywords, and get a Search Console regex for free.

The Sitemap Keyword Filter turns a sprawling XML sitemap into a focused list of the URLs you actually care about, and hands you a Search Console regex to match them. It’s the quickest way to scope a content audit, a migration, or a performance deep-dive to a single section of a site.

Step by step

How to use it

  1. Find your sitemap

    Most sites expose one at /sitemap.xml or /sitemap_index.xml. If neither exists, check robots.txt, where each Sitemap directive lists a sitemap location the site declares. Large sites usually publish an index that points at many child sitemaps.

  2. Add your source

    Paste a single sitemap index URL and the tool expands every child sitemap for you, or drop in up to six individual sitemap URLs, one per line. No network handy? Switch to the Paste XML tab and the tool runs entirely in your browser.

  3. Enter keywords

    Type comma-separated keywords or path fragments, for example "/blog/, guide, 2026". Toggle "Match all" to require every keyword (AND) instead of any (OR), and "Case-sensitive" when casing matters.

  4. Filter and export

    Run the tool to get the matching URLs in the output panel. Switch to the GSC regex tab for a regular expression that selects exactly those pages, then copy either output with one click.
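As a rough illustration of steps 3 and 4, the matching logic can be sketched in a few lines of Python. This is not the tool's actual source: the function name filter_urls and its parameters are hypothetical, and the sketch assumes plain substring matching against the full URL.

```python
def filter_urls(urls, keywords, match_all=False, case_sensitive=False):
    """Keep URLs containing any (or all) of the comma-separated keywords."""
    # Split the comma-separated input and drop empty fragments.
    terms = [k.strip() for k in keywords.split(",") if k.strip()]
    if not terms:
        return []  # assumption: no keywords means no matches
    if not case_sensitive:
        terms = [t.lower() for t in terms]
    combine = all if match_all else any  # "Match all" = AND, default = OR
    kept = []
    for url in urls:
        haystack = url if case_sensitive else url.lower()
        if combine(t in haystack for t in terms):
            kept.append(url)
    return kept


urls = [
    "https://example.com/blog/seo-guide-2026/",
    "https://example.com/blog/news/",
    "https://example.com/shop/Guide-book/",
]
any_match = filter_urls(urls, "/blog/, guide, 2026")                   # all three URLs
all_match = filter_urls(urls, "/blog/, guide, 2026", match_all=True)  # first URL only

# Matching runs against the whole URL, so a bare keyword can hit the hostname.
hostname_hit = filter_urls(["https://blog.example.com/about/"], "blog")  # one match
path_only = filter_urls(["https://blog.example.com/about/"], "/blog/")   # no match
```

The last two calls show why a leading slash is worth typing: "blog" matches the blog.example.com hostname, while "/blog/" only matches a path segment.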

Important. The tool fetches sitemaps through a small server-side proxy so your browser never hits a CORS wall. Nothing you enter is stored or logged. The proxy reads the sitemap and streams it straight back.
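For readers curious what a fetch proxy like this has to guard against, here is a minimal Python sketch of two checks this page mentions: blocking private and loopback hosts, and capping the response at 8 MB. It is illustrative only and is not the proxy's real code. The names is_allowed_target and read_capped are hypothetical, and the host check only inspects literal IP addresses and localhost names, whereas a real proxy would also need to validate the addresses a DNS name resolves to.

```python
import io
import ipaddress
from urllib.parse import urlsplit

MAX_BYTES = 8 * 1024 * 1024  # the 8 MB cap mentioned on this page


def is_allowed_target(url):
    """Return True if the URL looks like a public http(s) target."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name. Assumption: resolved
        # addresses are checked elsewhere before the request is made.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_unspecified
    )


def read_capped(stream, limit=MAX_BYTES):
    """Read at most `limit` bytes and raise if the body is larger."""
    data = stream.read(limit + 1)  # one extra byte reveals an oversized body
    if len(data) > limit:
        raise ValueError("response exceeds the size cap")
    return data


small_body = read_capped(io.BytesIO(b"<urlset/>"))  # well under the cap
```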

Background

What makes this hard

  1. Browsers block cross-origin reads unless the target server opts in with CORS headers, which sitemap hosts often don't send, so a pure client-side tool usually can't fetch another site's sitemap directly.

    How the tool handles it: Requests are routed through a lightweight same-origin proxy (a Cloudflare Pages Function in production) that fetches the XML and returns it. The proxy blocks private and loopback hosts and caps each response at 8 MB.

  2. Sitemap indexes nest. An index points at sitemaps that may point at more sitemaps, so a naive fetch only sees a list of files, not pages.

    How the tool handles it: Give it the index URL and it walks the tree, de-duplicating as it goes, until it has every <loc> URL. A safety budget stops runaway crawls.

  3. Search Console's regex filter uses RE2 syntax, where characters such as ., ? and + are operators rather than literals, so a hand-built pattern containing unescaped URLs often silently matches nothing, or the wrong pages.

    How the tool handles it: The generated regex escapes every reserved character and alternates exact path strings, so it pastes into GSC and works first time.
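The escaping step in point 3 can be sketched as follows. This is an assumption about the general approach, not the tool's real generator: the helper names escape_re2 and build_gsc_regex are hypothetical, and the sketch alternates full URLs between ^ and $ anchors so the pattern matches those pages and nothing else.

```python
import re

# Characters treated as operators in RE2-style syntax, escaped with a backslash.
RE2_SPECIAL = set(r"\.+*?()|[]{}^$")


def escape_re2(text):
    """Backslash-escape every regex operator so the text matches literally."""
    return "".join("\\" + ch if ch in RE2_SPECIAL else ch for ch in text)


def build_gsc_regex(urls):
    """Build one anchored alternation that matches exactly the given URLs."""
    unique = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
    return "^(" + "|".join(escape_re2(u) for u in unique) + ")$"


urls = [
    "https://example.com/blog/post/",
    "https://example.com/search?q=a+b",
    "https://example.com/blog/post/",  # duplicate, dropped
]
pattern = build_gsc_regex(urls)
# pattern is:
# ^(https://example\.com/blog/post/|https://example\.com/search\?q=a\+b)$

# Python's re module is not RE2, but for this simple pattern it serves as a
# quick local sanity check before pasting into Search Console.
matches_exact = re.search(pattern, "https://example.com/search?q=a+b") is not None
matches_other = re.search(pattern, "https://example.com/blog/post/extra/") is not None
```

Without the escaping, the ? in the second URL would make the preceding h optional and the + would repeat the a, so the pattern would never match the literal URL.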

Options & methods

Ways to feed it

Most common

Single sitemap index

Point the tool at one index URL and let it expand everything underneath. Best for most sites.

https://example.com/sitemap_index.xml
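Expanding an index amounts to a small tree walk. The sketch below is one hedged way to do it in Python, not the tool's implementation: collect_urls, fake_doc, FAKE_SITE and the max_fetches budget are hypothetical names, and the fetch function is passed in so the example runs against in-memory documents instead of the network.

```python
import xml.etree.ElementTree as ET

XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = "{" + XMLNS + "}"


def collect_urls(start_url, fetch, max_fetches=50):
    """Walk a sitemap (or nested index) and return de-duplicated page URLs."""
    queue = [start_url]
    seen_sitemaps = set()
    pages = {}  # dict used as an insertion-ordered set
    fetches = 0
    while queue and fetches < max_fetches:  # budget stops runaway crawls
        sitemap_url = queue.pop(0)
        if sitemap_url in seen_sitemaps:
            continue
        seen_sitemaps.add(sitemap_url)
        fetches += 1
        # Standard-library parser; untrusted XML may call for a hardened one.
        root = ET.fromstring(fetch(sitemap_url))
        is_index = root.tag == NS + "sitemapindex"
        for loc in root.iter(NS + "loc"):
            value = (loc.text or "").strip()
            if not value:
                continue
            if is_index:
                queue.append(value)  # child sitemap, visit it later
            else:
                pages.setdefault(value, None)  # page URL
    return list(pages)


def fake_doc(root, child, locs):
    """Build a tiny sitemap document for the demo below."""
    items = "".join(f"<{child}><loc>{loc}</loc></{child}>" for loc in locs)
    return f'<{root} xmlns="{XMLNS}">{items}</{root}>'


BASE = "https://example.com"
FAKE_SITE = {
    f"{BASE}/sitemap_index.xml": fake_doc(
        "sitemapindex", "sitemap",
        [f"{BASE}/post-sitemap.xml", f"{BASE}/page-sitemap.xml"]),
    f"{BASE}/post-sitemap.xml": fake_doc(
        "urlset", "url", [f"{BASE}/blog/a/", f"{BASE}/blog/b/"]),
    f"{BASE}/page-sitemap.xml": fake_doc(
        "urlset", "url", [f"{BASE}/about/", f"{BASE}/blog/a/"]),
}
all_pages = collect_urls(f"{BASE}/sitemap_index.xml", FAKE_SITE.__getitem__)
```

Here all_pages holds three URLs: /blog/a/ appears in both child sitemaps but is kept once.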

Large sites

Up to six specific sitemaps

When an index lists dozens of child sitemaps but you only care about a few sections, paste just those, one per line.

https://example.com/post-sitemap.xml
https://example.com/product-sitemap.xml
https://example.com/category-sitemap.xml

Offline

Paste raw XML

No proxy, no network. Paste a <urlset> document and the filtering happens entirely client-side. Handy behind a firewall or for ad-hoc exports.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post/</loc></url>
</urlset>
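Reading a pasted document needs nothing beyond an XML parser. The following is a minimal sketch using Python's standard library. The function name urls_from_pasted_xml is hypothetical, the tool's own in-browser parser is not shown on this page, and the sketch assumes the document declares the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
URLSET_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}urlset"


def urls_from_pasted_xml(xml_text):
    """Return every <loc> value from a pasted <urlset> document."""
    root = ET.fromstring(xml_text)
    if root.tag != URLSET_TAG:
        # A <sitemapindex> lists child sitemaps, not pages, so reject it here.
        raise ValueError("expected a <urlset> document")
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


pasted = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post/</loc></url>
</urlset>"""
page_urls = urls_from_pasted_xml(pasted)  # one URL from the sample above
```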

Pitfalls

Common mistakes

Mistake: Filtering on a sitemap index URL and getting nothing.
How to fix it: An index contains <sitemap> entries, not <url> entries. Use the From URLs tab so the tool expands the index, or paste a child <urlset> directly.

Mistake: Pasting a Search Console regex that matches zero rows.
How to fix it: GSC uses RE2 syntax, which is case-sensitive by default, matches anywhere in the URL unless you anchor the pattern with ^ or $, and needs special characters escaped. Use this tool's generated regex, which handles the escaping for you.

Mistake: A huge result set exceeds the GSC filter limit.
How to fix it: Search Console caps regex filters near 4,096 characters. Narrow your keywords, or filter by a path prefix, so the alternation stays short.

Mistake: Keywords matching the domain instead of the path.
How to fix it: Keyword matching runs against the full URL. Include a leading slash (/blog/) to pin matches to the path and avoid hitting the hostname.
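The length limit can be checked before pasting. The sketch below shows one possible fallback, written in Python under stated assumptions: regex_within_limit and escape_re2 are hypothetical helpers, the 4,096 figure is the cap cited above, and the fallback swaps the exact alternation for a shared path prefix, which is shorter but also matches pages that were not in the filtered list.

```python
import os

GSC_REGEX_LIMIT = 4096  # character cap cited on this page
RE2_SPECIAL = set(r"\.+*?()|[]{}^$")


def escape_re2(text):
    """Backslash-escape regex operators so the text matches literally."""
    return "".join("\\" + ch if ch in RE2_SPECIAL else ch for ch in text)


def regex_within_limit(urls, limit=GSC_REGEX_LIMIT):
    """Return an exact alternation if it fits, else a broader prefix pattern."""
    exact = "^(" + "|".join(escape_re2(u) for u in urls) + ")$"
    if len(exact) <= limit:
        return exact
    # Character-wise common prefix, trimmed back to the last slash so it ends
    # on a path boundary. If the URLs share little, this can be very broad.
    prefix = os.path.commonprefix(urls)
    prefix = prefix[: prefix.rfind("/") + 1]
    return "^" + escape_re2(prefix)


few = regex_within_limit(["https://example.com/a/"])
many = regex_within_limit(
    [f"https://example.com/blog/post-{i}/" for i in range(500)]
)
# `few` keeps the exact form; `many` falls back to the /blog/ prefix.
```

Check the fallback pattern against the URLs you meant to exclude before relying on it, since a prefix selects the whole section rather than the filtered subset.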

Use cases

When you need this

  • Auditing which URLs in a section are actually submitted in the sitemap.
  • Building a Search Console regex to analyse performance for one content type.
  • Pulling a clean list of product or blog URLs for a crawl or a content audit.
  • Spotting orphaned or unexpected URLs before a migration.
  • Handing a developer an exact list of pages to redirect or noindex.

FAQ

Questions

Is the Sitemap Keyword Filter free?

Yes. It is completely free, with no account and no login. The only limits are the ones described on this page: up to six sitemap URLs per run and an 8 MB cap per sitemap. Filtering runs in your browser, and a thin proxy is used only to fetch the XML.

Do you store the sitemaps or URLs I enter?

No. The proxy fetches the sitemap and streams it back without writing it anywhere, and the filtering happens in your browser. Nothing is logged or retained.

How many sitemap URLs can I add?

Up to six individual sitemap URLs, or a single sitemap index URL, which the tool expands automatically into all of its child sitemaps.

Why does my Search Console regex match nothing?

Search Console uses RE2, which needs special characters escaped and is case-sensitive by default. Copy the regex this tool generates, which escapes everything and alternates exact paths, so it works on the first paste.

Can I use it without an internet connection?

Yes. Switch to the Paste XML tab and paste a urlset document. Filtering and regex generation then run entirely client-side with no network request at all.
