Robots.txt Generator

Create robots.txt files to control how search engine crawlers access your website. Set rules for specific user agents, allow or disallow paths, and add your sitemap URL.

Generated robots.txt

User-agent: *
Disallow:

Instructions

Copy the generated content above
Create a file named robots.txt
Paste the content and save
Upload to your website's root directory

Robots.txt is a polite request, not a lock. Write it like you mean it.

Robots.txt is the most misunderstood file in SEO. It does not hide pages from Google. It does not stop indexing. It does not protect anything sensitive. All it does is ask well-behaved crawlers to skip certain paths. Google, Bing, and most legitimate bots honor it. Malicious scrapers and a long tail of AI training crawlers either ignore it or read it for reconnaissance. If you treat robots.txt as security, you will eventually find your "blocked" admin URLs in someone's leaked dataset.

This tool builds a syntactically correct robots.txt with the rules you actually need: per-user-agent Allow and Disallow lines, a Sitemap directive, and optional blocks for AI training crawlers like GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and CCBot. The goal is a file that's short, intentional, and easy for the next person to read. Robots files rot when they're inherited: the rule from a 2014 plugin nobody remembers is usually the rule causing the current crawl problem.

The single most important thing to understand: Disallow blocks crawling, not indexing. A URL blocked in robots.txt can still appear in search results if other sites link to it, because Google indexes the URL based on those external signals even though it never fetched the page. The result is a search listing with no description and a frustrated site owner. If you actually want a page out of the index, use a meta robots noindex tag and let Google crawl the page so it can see the tag. Blocking crawling and applying noindex are not interchangeable.

The other mistake people inherit from old advice: blocking JavaScript and CSS. There was a brief era around 2010-2013 when blocking /wp-includes/ or asset directories looked clever. Google now renders pages, and blocking the assets it needs to render means Google sees a broken layout, sometimes interpreting your responsive site as non-mobile-friendly. Leave JS and CSS crawlable unless you have a specific, documented reason not to.

AI crawler control is the new layer. You can block GPTBot, ClaudeBot, PerplexityBot, CCBot, and Google-Extended from training on your content while still letting Googlebot crawl normally for search. Whether you should block them is a business decision, not a technical one. Blocking AI training crawlers keeps compliant bots from collecting your content for model training, but it doesn't affect whether your site shows up in AI search answers, which often use live retrieval bots with different user-agent strings. Pick a policy on purpose, not by copying someone's blog post.

When the Robots.txt Generator is the right tool

You have private or staging paths to keep out of crawls

Admin areas, internal search, checkout funnels, draft URLs: these waste crawl budget and add no organic value. A Disallow keeps cooperative bots away. Pair it with proper authentication if the content is actually sensitive.

You want to opt out of AI training on your content

Add User-agent blocks for GPTBot, ClaudeBot, Google-Extended, CCBot, and PerplexityBot. They mostly honor it. The smaller AI scrapers don't, and that's a separate problem you solve with WAF rules, not robots.txt.
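As a rough illustration of what such an opt-out looks like, here is a minimal Python sketch. The helper names (`ai_opt_out_group`, `robots_with_ai_opt_out`) are hypothetical and are not part of this tool. The sketch assumes you want a blanket `Disallow: /` for the listed agents and no restrictions for everyone else.

```python
# Hypothetical helpers, for illustration only.
AI_TRAINING_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]


def ai_opt_out_group(agents):
    """Return one robots.txt group: several User-agent lines sharing one rule."""
    lines = [f"User-agent: {name}" for name in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


def robots_with_ai_opt_out(agents=AI_TRAINING_AGENTS):
    # An empty Disallow means "nothing is blocked" for all other crawlers.
    default_group = "User-agent: *\nDisallow:\n"
    return ai_opt_out_group(agents) + "\n" + default_group


print(robots_with_ai_opt_out())
```

Stacking several User-agent lines over one rule keeps the file short. If you would rather audit each bot separately, one group per agent expresses the same policy.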

You're reorganizing a large site and want to throttle crawl pressure

During a migration, you can temporarily block low-value sections so Google spends its crawl budget on the URLs that actually matter. Remove the rules once the dust settles. Don't leave temporary blocks in place for years.

You need to advertise your sitemap

Adding a Sitemap: line at the bottom is the most universally supported way to point any crawler at your XML sitemap, including the ones without Search Console-style submission flows.

You inherited a robots.txt and have no idea what it does

Generate a clean replacement from scratch with only the rules you can explain out loud. If a Disallow exists but nobody can defend it, it shouldn't survive the rewrite.

How to use the Robots.txt Generator

Create a valid robots.txt tailored to your site in under a minute.

Pick crawler rules

Choose which bots to allow or block and specify path-level rules.

Add your sitemap URL

Reference your sitemap.xml so crawlers can discover it via robots.txt.

Copy and deploy

Copy the generated file and upload it to /robots.txt at the root of your domain.
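If you prefer to script the same steps, the following Python sketch shows one way the output could be assembled. It is an assumption about the shape of the file, not the generator's actual implementation. The function name `build_robots_txt` and the dictionary keys are hypothetical.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Assemble robots.txt text.

    groups: list of dicts with a 'user_agent' key and optional
            'allow' / 'disallow' lists of paths (hypothetical input shape).
    """
    blocks = []
    for group in groups:
        lines = [f"User-agent: {group['user_agent']}"]
        allow = group.get("allow", [])
        disallow = group.get("disallow", [])
        lines += [f"Allow: {path}" for path in allow]
        lines += [f"Disallow: {path}" for path in disallow]
        if not allow and not disallow:
            lines.append("Disallow:")  # empty Disallow = no restrictions
        blocks.append("\n".join(lines))
    text = "\n\n".join(blocks) + "\n"
    if sitemap_url:
        text += f"\nSitemap: {sitemap_url}\n"
    return text


print(build_robots_txt(
    [{"user_agent": "*", "disallow": ["/admin/"], "allow": ["/admin/help/"]}],
    sitemap_url="https://example.com/sitemap.xml",
))
```

Write the returned text to a file named robots.txt and upload it to the root of the host it should govern.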

Mistakes we see all the time

Treating Disallow as a way to hide pages

Disallow blocks crawling. The URL can still get indexed from external links, appearing as a bare title with no description. If you want a page out of the index, allow it to be crawled and use meta robots noindex on the page itself.

Blocking /wp-content/, /assets/, or any JS/CSS directory

Modern Google renders pages. Blocking assets gives it a broken view of your site, which can hurt mobile-friendliness scores and structured data parsing. This is leftover advice from a different decade. Delete it.

Disallowing /, then wondering why nothing is indexed

A single User-agent: * with Disallow: / blocks the entire site from cooperative crawlers. It happens more often than it should, usually because a staging robots.txt got pushed to production. Always diff before deploying.
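A pre-deploy check can catch this before it ships. The sketch below is a simplified parser written for illustration; `blocks_everything` is a hypothetical name. It only looks for a bare `Disallow: /` in a group that includes `User-agent: *`, and it does not account for Allow exceptions in the same group.

```python
def blocks_everything(robots_txt):
    """True if a group containing 'User-agent: *' has a bare 'Disallow: /'."""
    agents, in_rules = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"
print(blocks_everything(staging), blocks_everything(production))
```

Running a check like this in your deploy pipeline against the production file is one way to make "always diff" automatic.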

Trying to noindex via robots.txt

Google removed support for the unofficial Noindex: directive in robots.txt in 2019. If you see it in your file, it does nothing. Move the directive to a meta tag or X-Robots-Tag header on the affected URLs.
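To find leftover lines of this kind, a small scan is enough. The sketch below is illustrative and the function name `stale_noindex_lines` is hypothetical. The two constants show the supported replacements: a meta tag in the page's HTML, or an X-Robots-Tag response header.

```python
# Supported ways to express noindex, set on the affected URLs themselves.
NOINDEX_META = '<meta name="robots" content="noindex">'
NOINDEX_HEADER = ("X-Robots-Tag", "noindex")


def stale_noindex_lines(robots_txt):
    """Return (line_number, text) for each 'Noindex:' line in robots.txt."""
    found = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if line.lower().startswith("noindex:"):
            found.append((number, line))
    return found


sample = "User-agent: *\nNoindex: /old/\nDisallow: /tmp/\n"
print(stale_noindex_lines(sample))
```

Remember that the affected URLs must stay crawlable, otherwise Google never sees the meta tag or header.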

Robots.txt Generator — Frequently Asked Questions

What does this robots.txt generator do?

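It builds a syntactically valid robots.txt with Allow/Disallow rules, sitemap references, and Crawl-delay directives. Keep in mind that Googlebot ignores Crawl-delay; it only matters for crawlers that support it, such as Bingbot.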

Can I target specific bots like Googlebot or GPTBot?

Yes. Add a separate User-agent block for each bot you want to target: Googlebot, Bingbot, GPTBot, ClaudeBot, and others.

Will it validate my output?

Yes. The generator flags conflicting rules and unreachable paths before you copy the file.

Does the order of rules in robots.txt matter?

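For Google, the most specific matching rule wins, meaning the one with the longest matching path, not the first one listed. If an Allow and a Disallow match equally well, the less restrictive Allow wins. For some older crawlers, order still matters. The safest approach is to keep rules per User-agent block clear and non-overlapping so behavior is predictable across every crawler.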
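The longest-match behavior can be sketched in a few lines of Python. This is a simplified model for plain path prefixes only, with no wildcard handling. It assumes that an Allow wins when an Allow and a Disallow match with equal length, which is how Google documents conflict resolution. The function name `is_allowed` is hypothetical.

```python
def is_allowed(path, rules):
    """Decide whether a path may be crawled.

    rules: list of ('allow' | 'disallow', path_prefix) for one user agent.
    The longest matching prefix wins; on a tie, allow wins.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values match nothing
        length = len(prefix)
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, (kind == "allow")
    return allowed


rules = [("disallow", "/shop/"), ("allow", "/shop/sale/")]
print(is_allowed("/shop/sale/item", rules))  # the longer Allow rule applies
print(is_allowed("/shop/cart", rules))       # only the Disallow rule matches
```

Reversing the order of `rules` gives the same answers, which is the point: under this model, position in the file does not decide the outcome.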

Can I have multiple User-agent blocks for the same bot?

You can, but you shouldn't. Consolidate rules under a single User-agent block per bot. Fragmenting rules makes the file harder to audit and increases the chance of contradictions that different crawlers resolve differently.

What happens if robots.txt returns a 5xx error?

Google interprets a sustained 5xx as "site temporarily unavailable" and may stop crawling until it resolves. A 404 is treated as "no restrictions, crawl everything." Make sure your robots.txt is reliably reachable and returns a 200.
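The status handling can be summarized as a small lookup. The sketch below reflects Google's documented behavior as we understand it, including the detail that a 429 is handled like a server error rather than like other 4xx codes. The function name and the returned labels are hypothetical, and other crawlers may behave differently.

```python
def crawl_policy_for_status(status):
    """Map the HTTP status of /robots.txt to a rough crawler reaction."""
    if 200 <= status < 300:
        return "parse rules"
    if status == 429 or 500 <= status < 600:
        return "pause crawling"   # treated as temporarily unavailable
    if 400 <= status < 500:
        return "no restrictions"  # treated as if no robots.txt exists
    return "follow redirect or retry"


for code in (200, 404, 429, 503):
    print(code, crawl_policy_for_status(code))
```

A monitoring check that alerts on anything other than "parse rules" is a cheap way to notice when the file becomes unreachable.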

Does robots.txt apply to subdomains?

No. Each subdomain needs its own robots.txt. The file at example.com/robots.txt does not govern blog.example.com. This trips up a lot of teams running marketing sites and apps on separate subdomains.
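To see which file governs a given page, derive the robots.txt location from the page's scheme and host. This minimal Python sketch uses only the standard library; `robots_url_for` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that applies to page_url's scheme and host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/pricing"))
print(robots_url_for("https://blog.example.com/posts/1?ref=home"))
```

The two calls return different URLs because the hosts differ, so each host needs its own file deployed.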

Should I block AI bots like GPTBot and ClaudeBot?

That's a content licensing call, not an SEO one. Blocking them asks those crawlers to stop collecting your content for training the major LLMs; it doesn't remove anything they've already collected. It also doesn't stop your site from appearing in AI search answers, which usually use different live-retrieval user agents.

What's the maximum file size for robots.txt?

Google enforces a 500 KiB limit. Anything beyond that gets truncated and the rest is ignored. If your robots.txt is anywhere near that size, you've almost certainly got rules that belong somewhere else, like a Disallow list that should be a noindex pattern.
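A quick size check makes the limit concrete. The sketch below measures the UTF-8 encoded length against 500 KiB (512,000 bytes). The function name `size_report` and the returned keys are hypothetical.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB


def size_report(robots_txt):
    """Report the encoded size and how many bytes fall past the limit."""
    size = len(robots_txt.encode("utf-8"))
    return {
        "bytes": size,
        "over_limit": size > GOOGLE_LIMIT_BYTES,
        "ignored_bytes": max(0, size - GOOGLE_LIMIT_BYTES),
    }


print(size_report("User-agent: *\nDisallow:\n"))
```

Rules that land in the ignored portion simply never apply, so an oversized file fails silently.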

Can I use wildcards and end-of-URL matching?

Yes. The * matches any sequence of characters and $ anchors to the end of the URL. So Disallow: /*.pdf$ blocks PDFs. Most major crawlers including Googlebot and Bingbot support both. Older or obscure bots may not, so don't rely on wildcards for anything critical.
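To check a pattern before you deploy it, you can translate it into a regular expression. This Python sketch is a simplified model of the matching described above, not any crawler's actual implementation; the function names are hypothetical. It treats `*` as "any characters" and a trailing `$` as "end of URL", and matches from the start of the path.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal pieces, then rejoin them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def matches(pattern, path):
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/docs/report.pdf"))
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))
```

The second call shows why the `$` matters: with the anchor, a PDF URL that carries a query string no longer matches the rule.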

Robots.txt is a small file with big consequences when it goes wrong. Keep it short, keep it intentional, and review it any time you change CMS, hosting, or site structure. Every line should be one you could explain to a teammate without checking a blog post. If you can't defend a rule, it doesn't belong in the file.