Robots.txt Generator

Create robots.txt files to control how search engine crawlers access your website. Set rules for specific user agents, allow or disallow paths, and add your sitemap URL.

Generated robots.txt

User-agent: *
Disallow:

Instructions

  1. Copy the generated content above
  2. Create a file named robots.txt
  3. Paste the content and save
  4. Upload to your website's root directory

Robots.txt is a polite request, not a lock. Write it like you mean it.

Robots.txt is the most misunderstood file in SEO. It does not hide pages from Google. It does not stop indexing. It does not protect anything sensitive. All it does is ask well-behaved crawlers to skip certain paths. Google, Bing, and most legitimate bots honor it. Malicious scrapers and a long tail of AI training crawlers either ignore it or read it for reconnaissance. If you treat robots.txt as security, you will eventually find your "blocked" admin URLs in someone's leaked dataset.

This tool builds a syntactically correct robots.txt with the rules you actually need: per-user-agent Allow and Disallow lines, a Sitemap directive, and optional blocks for AI-related crawlers and tokens like GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and CCBot. The goal is a file that's short, intentional, and easy for the next person to read. Robots files rot when they're inherited: the rule from a 2014 plugin nobody remembers is often the rule causing the current crawl problem.

The single most important thing to understand: Disallow blocks crawling, not indexing. A URL blocked in robots.txt can still appear in search results if other sites link to it, because Google can index the URL from those external signals even though it never fetched the page. The result is a search listing with no description and a frustrated site owner. If you actually want a page out of the index, use a meta robots noindex tag (or an X-Robots-Tag: noindex response header) and leave the page crawlable so Google can see the directive. Blocking crawling and setting noindex are not interchangeable, and combining them backfires: a crawler that is blocked from a page never reads its noindex.

The other mistake people inherit from old advice: blocking JavaScript and CSS. There was a brief era around 2010-2013 when blocking /wp-includes/ or asset directories looked clever. Google now renders pages, so blocking the assets it needs means Google sees a broken layout and may judge a responsive site as not mobile-friendly. Leave JS and CSS crawlable unless you have a specific, documented reason not to.

AI crawler control is the new layer. You can block GPTBot, ClaudeBot, PerplexityBot, CCBot, and Google-Extended while still letting Googlebot crawl normally for search. Google-Extended is a control token rather than a separate crawler: Google's existing crawlers do the fetching, and the token tells Google whether that content may be used for its generative AI models. Whether you should block these agents is a business decision, not a technical one. Blocking training crawlers asks compliant bots to stop collecting your content for model training, but it doesn't decide whether your site shows up in AI search answers, which often use live retrieval bots with different user-agent strings. Pick a policy on purpose, not by copying someone's blog post.
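One way to sanity-check a policy like this before deploying it is to run it through a parser. The sketch below uses Python's standard-library urllib.robotparser, which is only an approximation of how any real crawler reads the file. The policy text, the example.com URL, and the helper name allowed are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the AI-related agents, leave every other bot unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the stdlib parser says user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "GPTBot", "ClaudeBot", "Google-Extended"):
        print(agent, allowed(agent, "https://example.com/pricing"))
```

A check like this only tells you what the file asks for. Whether a given bot honors the request is up to the bot.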

How to use the Robots.txt Generator

Create a valid robots.txt tailored to your site in under a minute.

  1. Pick crawler rules

    Choose which bots to allow or block and specify path-level rules.

  2. Add your sitemap URL

    Reference your sitemap.xml so crawlers can discover it via robots.txt.

  3. Copy and deploy

    Copy the generated file and upload it to /robots.txt at the root of your domain.
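The steps above can be sketched as a small builder. This is a minimal illustration, not this tool's actual implementation: the function name build_robots_txt, the shape of the groups argument, and the example.com sitemap URL are all assumptions.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Render robots.txt text.

    groups: list of (user_agents, rules) pairs, where user_agents is a list of
    strings and rules is a list of (directive, path) pairs with directive set
    to "Allow" or "Disallow". An empty path on Disallow means "no restrictions".
    """
    lines = []
    for user_agents, rules in groups:
        for agent in user_agents:
            lines.append(f"User-agent: {agent}")
        for directive, path in rules:
            if directive not in ("Allow", "Disallow"):
                raise ValueError(f"unsupported directive: {directive!r}")
            # rstrip() avoids a trailing space when the path is empty.
            lines.append(f"{directive}: {path}".rstrip())
        lines.append("")  # blank line between groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    text = build_robots_txt(
        [(["*"], [("Disallow", "/cart/"), ("Allow", "/cart/help")])],
        sitemap_url="https://example.com/sitemap.xml",
    )
    print(text, end="")
```

Save the returned text as robots.txt and upload it so it is served at /robots.txt on the root of the host it should govern.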

Robots.txt Generator — Frequently Asked Questions

What does this robots.txt generator do?
It builds a syntactically valid robots.txt with allow/disallow rules, sitemap references, and crawl-delay directives. Note that Googlebot ignores Crawl-delay; some other crawlers, such as Bingbot, honor it.
Can I target specific bots like Googlebot or GPTBot?
Yes. Add per-user-agent blocks for Googlebot, Bingbot, GPTBot, ClaudeBot, and others.
Will it validate my output?
Yes. The generator flags conflicting rules and unreachable paths before you copy the file.
Does the order of rules in robots.txt matter?
For Google, the most specific matching rule (the one with the longest matching path) wins, not the first one listed. If an Allow and a Disallow match equally, Google applies the less restrictive Allow. For some older crawlers, order still matters. The safest approach is to keep rules per User-agent block clear and non-overlapping so behavior is predictable across every crawler.
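To make "most specific rule wins" concrete, here is a minimal sketch of longest-match precedence for plain path prefixes (no wildcards). It also applies Google's documented tie-break, where an Allow beats a Disallow of equal length. The function name is_allowed and the sample paths are hypothetical, and this is not a full robots.txt parser.

```python
def is_allowed(path, rules):
    """Decide whether path may be crawled under longest-match precedence.

    rules: list of (directive, prefix) pairs from a single User-agent group,
    with directive set to "Allow" or "Disallow". Order does not matter.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        # An empty prefix (for example a bare "Disallow:") restricts nothing.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and is_allow
        if longer or tie_goes_to_allow:
            best_len, allowed = len(prefix), is_allow
    return allowed


if __name__ == "__main__":
    rules = [("Disallow", "/shop/"), ("Allow", "/shop/sale/")]
    print(is_allowed("/shop/sale/boots", rules))
    print(is_allowed("/shop/cart", rules))
```

A crawler that uses first-match ordering instead could give a different answer for the same file, which is why non-overlapping rules are the safer choice.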
Can I have multiple User-agent blocks for the same bot?
You can, but you shouldn't. Consolidate rules under a single User-agent block per bot. Fragmenting rules makes the file harder to audit and increases the chance of contradictions that different crawlers resolve differently.
What happens if robots.txt returns a 5xx error?
Google interprets a sustained 5xx as "site temporarily unavailable" and may stop crawling until it resolves. A 404 is treated as "no restrictions, crawl everything." Make sure your robots.txt is reliably reachable and returns a 200.
Does robots.txt apply to subdomains?
No. Each subdomain needs its own robots.txt. The file at example.com/robots.txt does not govern blog.example.com. This trips up a lot of teams running marketing sites and apps on separate subdomains.
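A quick way to see which robots.txt governs a given page is to rebuild the URL from the page's own scheme and host. This is a small sketch using Python's urllib.parse; the helper name robots_url and the example.com hosts are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/pricing"))
    print(robots_url("https://blog.example.com/posts/1?utm=x"))
```

The two calls return different URLs, which is the point: rules published on one host say nothing about the other.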
Should I block AI bots like GPTBot and ClaudeBot?
That's a content licensing call, not an SEO one. Blocking them asks those crawlers to stop collecting your content for training, which only works for bots that honor robots.txt and does nothing about content already collected. It doesn't stop your site from appearing in AI search answers, which usually use different live-retrieval user agents.
What's the maximum file size for robots.txt?
Google enforces a 500 KiB limit. Anything beyond that gets truncated and the rest is ignored. If your robots.txt is anywhere near that size, you've almost certainly got rules that belong somewhere else, like a Disallow list that should be a noindex pattern.
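If you generate or concatenate robots.txt in a build step, a size guard is cheap to add. The sketch below measures the UTF-8 encoded size against the 500 KiB figure mentioned above; the constant and function names are hypothetical.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit cited above


def exceeds_google_limit(robots_txt: str) -> bool:
    """Return True if the UTF-8 encoded file is larger than 500 KiB."""
    return len(robots_txt.encode("utf-8")) > GOOGLE_LIMIT_BYTES


if __name__ == "__main__":
    print(exceeds_google_limit("User-agent: *\nDisallow:\n"))
```

The check counts bytes rather than characters because non-ASCII paths take more than one byte each once encoded.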
Can I use wildcards and end-of-URL matching?
Yes. * matches any sequence of characters and $ anchors to the end of the URL. So Disallow: /*.pdf$ blocks URLs that end in .pdf, but not /file.pdf?download=1, because the query string comes after .pdf. Most major crawlers including Googlebot and Bingbot support both. Older or obscure bots may not, so don't rely on wildcards for anything critical.
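To test a wildcard pattern before shipping it, you can translate it into a regular expression. This is a minimal sketch of the * and $ semantics described above, not the matcher any particular crawler uses; the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end of the
    URL. Without "$", the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if path (including any query string) matches the pattern."""
    # re.match anchors at the start of the string, as robots.txt rules do.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(matches("/*.pdf$", "/docs/report.pdf"))
    print(matches("/*.pdf$", "/docs/report.pdf?download=1"))
```

Dropping the $ turns the rule into a prefix-style match, so /*.pdf would also catch the query-string variant.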

Robots.txt is a small file with big consequences when it goes wrong. Keep it short, keep it intentional, and review it any time you change CMS, hosting, or site structure. Every line should be one you could explain to a teammate without checking a blog post. If you can't defend a rule, it doesn't belong in the file.
