Test your robots.txt rules and see which crawlers are allowed on which paths.
Build a valid robots.txt from scratch, or validate any site's existing file for errors and Googlebot issues.
Robots.txt is a plain text file at the root of a domain that tells crawlers which paths they are allowed to access. Every major search engine and AI crawler checks this file before fetching pages. A crawler will not fetch a path that matches a Disallow rule. An Allow rule overrides a broader Disallow: when both match the same URL, Google (and the RFC 9309 standard) follows the most specific rule, meaning the one with the longest matching path.
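The precedence rule can be sketched in a few lines of Python. This is a simplified illustration, not a full parser: it assumes the rules for one crawler have already been selected, and it uses plain path-prefix matching only, ignoring the `*` and `$` wildcards and percent-encoding that real parsers handle. The function name `is_allowed` and the sample paths are hypothetical. The longest matching rule wins, and an Allow beats a Disallow of equal length.

```python
from typing import List, Tuple

# Each rule is ("allow" | "disallow", path_prefix), taken from one user-agent group.
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    Simplified sketch: plain prefix matching, no `*` or `$` wildcards.
    """
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix:
            # An empty Allow or Disallow value matches nothing.
            continue
        if path.startswith(prefix):
            allow = kind == "allow"
            # A longer (more specific) match wins; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len = len(prefix)
                best_allow = allow
    return best_allow


rules = [("disallow", "/private/"), ("allow", "/private/press-kit/")]
print(is_allowed(rules, "/private/invoices.html"))       # False
print(is_allowed(rules, "/private/press-kit/logo.png"))  # True
print(is_allowed(rules, "/blog/"))                       # True
```

Because the function compares match lengths rather than list positions, the order in which the rules appear does not change the result.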
The file is not a security measure. It controls crawler behaviour by convention and does not enforce any access restrictions; malicious bots simply ignore it. For legitimate crawlers, it is the primary signal for what to crawl and what to skip.
| Crawler | Company | What it affects |
|---|---|---|
| Googlebot | Google | Google Search rankings and AI Overviews |
| Bingbot | Microsoft | Bing Search and Copilot answers |
| GPTBot | OpenAI | Training data for OpenAI's models |
| OAI-SearchBot | OpenAI | ChatGPT real-time search |
| ClaudeBot | Anthropic | Claude training data |
| Claude-SearchBot | Anthropic | Claude real-time search |
| PerplexityBot | Perplexity | Perplexity AI answer citations |
Blocking the entire site with Disallow: / under User-agent: * is the most common and destructive error. It happens during site migrations when developers enable a maintenance-mode robots.txt and forget to remove it after launch.
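A pre-launch check for this mistake can be sketched with Python's standard-library `urllib.robotparser`, parsing the file text directly instead of fetching it. The helper name `blocks_whole_site` is hypothetical. Note that `urllib.robotparser` applies rules in file order (first match wins) rather than Google's longest-match rule, and it does not support wildcards, so treat this as a rough check rather than a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` is barred from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# The maintenance-mode file that is easy to forget after launch.
maintenance = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value permits every path.
fixed = """\
User-agent: *
Disallow:
"""

print(blocks_whole_site(maintenance, "Googlebot"))  # True
print(blocks_whole_site(fixed, "Googlebot"))        # False
```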
The second most common issue is blocking CSS and JavaScript files. Google needs to render pages to understand them. If Googlebot cannot load your stylesheets or scripts, it sees a broken page and may rank it lower as a result.
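To check for this, test a few stylesheet and script URLs against the rules Googlebot would follow. The sketch below uses Python's standard-library `urllib.robotparser`; the `/assets/` paths and the `blocked_assets` helper are hypothetical. Because that parser uses the first matching rule, the Allow lines are placed before the broader Disallow, an order that gives the same result under Google's longest-match rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: blocking /assets/ also hides the CSS and JavaScript.
blocking = """\
User-agent: *
Disallow: /assets/
"""

# Fix: re-allow the stylesheet and script folders ahead of the broader block.
fixed = """\
User-agent: *
Allow: /assets/css/
Allow: /assets/js/
Disallow: /assets/
"""

ASSETS = ["/assets/css/site.css", "/assets/js/app.js"]


def blocked_assets(robots_txt: str, user_agent: str = "Googlebot") -> list:
    """Return the asset paths that `user_agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in ASSETS if not parser.can_fetch(user_agent, path)]


print(blocked_assets(blocking))  # ['/assets/css/site.css', '/assets/js/app.js']
print(blocked_assets(fixed))     # []
```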
Not declaring your sitemap is a quieter problem. A Sitemap: line in robots.txt gives every crawler that reads the file a direct pointer to your sitemap. Without it, crawlers have to find your sitemap some other way, which can slow down the discovery of new pages.
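Whether a Sitemap line is present can be checked with `RobotFileParser.site_maps()` from Python's standard library (available since Python 3.8), which returns the declared sitemap URLs, or `None` if there are none. The `declared_sitemaps` helper is hypothetical and the example.com URLs are placeholders.

```python
from typing import List, Optional
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str) -> Optional[List[str]]:
    """Return the Sitemap URLs declared in robots.txt, or None if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()


with_sitemap = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

without_sitemap = """\
User-agent: *
Disallow: /admin/
"""

print(declared_sitemaps(with_sitemap))     # ['https://example.com/sitemap.xml']
print(declared_sitemaps(without_sitemap))  # None
```

The Sitemap directive is independent of any User-agent group, so it can sit anywhere in the file.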
The tool fetches your robots.txt file and tests a URL against the rules for each major crawler. It shows whether Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, and others are allowed or blocked for the path you specify. It also checks whether your sitemap is declared in the robots.txt file.
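A local approximation of that per-crawler check is sketched below. It is not the tool's implementation: it parses a robots.txt string with Python's standard-library `urllib.robotparser` instead of fetching one, takes the user-agent tokens from the crawler table, and inherits that module's simplifications (rules apply in file order and wildcards are not supported). The `crawler_report` name and the sample file are hypothetical.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# User-agent tokens from the crawler table.
CRAWLERS = [
    "Googlebot", "Bingbot", "GPTBot", "OAI-SearchBot",
    "ClaudeBot", "Claude-SearchBot", "PerplexityBot",
]


def crawler_report(robots_txt: str, path: str) -> Dict[str, bool]:
    """Map each crawler to True (allowed) or False (blocked) for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in CRAWLERS}


robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

for bot, allowed in crawler_report(robots_txt, "/blog/post").items():
    print(f"{bot:<17} {'allowed' if allowed else 'blocked'}")
```

With this sample file, GPTBot is reported as blocked on /blog/post because it has its own group, while every other crawler falls back to the `User-agent: *` group, which only blocks /private/.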
Free, with no account required. Enter any domain and URL path to test crawler access. No usage limits.
That depends on your goals. Blocking an AI company's training crawler stops it from collecting your content for model training, while blocking its search crawler keeps your pages out of the answers it cites. If visibility in tools like Perplexity or ChatGPT matters to your business, allowing the search crawlers is the right call. If your content is proprietary and you want to keep it out of AI systems, blocking is reasonable. Either way, make sure your robots.txt reflects your actual intent.
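One way to make that intent explicit is to generate the file from a short policy. The sketch below shows one possible approach, not a recommendation for any particular site: the hypothetical `build_robots_txt` helper writes a blocking group for each listed crawler, a catch-all group that allows everything else, and an optional Sitemap line. Here it blocks the two training crawlers from the crawler table, and the result is parsed back with Python's standard-library `urllib.robotparser` as a sanity check. The sitemap URL is a placeholder.

```python
from typing import Iterable, Optional
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_agents: Iterable[str],
                     sitemap_url: Optional[str] = None) -> str:
    """Build robots.txt text that blocks `blocked_agents` and allows everyone else."""
    groups = []
    for agent in blocked_agents:
        groups.append(f"User-agent: {agent}\nDisallow: /")
    # Catch-all group: an empty Disallow value permits every path.
    groups.append("User-agent: *\nDisallow:")
    text = "\n\n".join(groups) + "\n"
    if sitemap_url:
        text += f"\nSitemap: {sitemap_url}\n"
    return text


robots_txt = build_robots_txt(["GPTBot", "ClaudeBot"],
                              "https://example.com/sitemap.xml")
print(robots_txt)

# Round-trip check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in ["GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-SearchBot", "Googlebot"]:
    print(bot, parser.can_fetch(bot, "/pricing"))
```

GPTBot and ClaudeBot come back blocked, while OAI-SearchBot, Claude-SearchBot and Googlebot fall through to the catch-all group and are allowed.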
Robots.txt controls which pages crawlers can fetch; it does not set rankings. Disallowing a page does not directly lower its ranking signal, but a crawler that cannot fetch a page cannot index its content either, and content that is not indexed cannot rank. Misconfigured robots.txt files are a common cause of sudden ranking drops, usually because a developer accidentally blocked important paths during a site update.
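A guard against this is a check that runs before each deploy and compares the live file with the candidate one. The sketch below assumes both files are available as text and that `PATHS` lists the pages that matter to you; the paths and the `newly_blocked` helper are hypothetical, and the check relies on Python's standard-library `urllib.robotparser`, which uses simplified first-match, no-wildcard matching.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical list of paths that must stay crawlable.
PATHS = ["/", "/products/", "/blog/", "/pricing", "/cart/"]


def newly_blocked(current: str, candidate: str, paths: List[str],
                  user_agent: str = "Googlebot") -> List[str]:
    """Return paths that `current` allows but `candidate` would block."""
    def parser_for(text: str) -> RobotFileParser:
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        return parser

    old, new = parser_for(current), parser_for(candidate)
    return [p for p in paths
            if old.can_fetch(user_agent, p) and not new.can_fetch(user_agent, p)]


current = """\
User-agent: *
Disallow: /cart/
"""

candidate = """\
User-agent: *
Disallow: /cart/
Disallow: /products/
"""

problems = newly_blocked(current, candidate, PATHS)
if problems:
    # In a CI job, exiting with a non-zero status here would stop the deploy.
    print("Candidate robots.txt newly blocks:", problems)
```

Comparing against the current file means paths that were already blocked on purpose, such as /cart/ here, do not trigger a warning.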
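Those crawlers will stop fetching your pages, and the effect depends on which ones you block. Blocking training crawlers such as GPTBot and ClaudeBot keeps your content out of those companies' future training data. Blocking search crawlers such as OAI-SearchBot, Claude-SearchBot and PerplexityBot means those assistants cannot fetch your pages to cite them in their answers, which for most businesses means lower visibility in AI search results like Perplexity and ChatGPT. Google's AI features (AI Overviews) use Googlebot, not GPTBot, so blocking GPTBot does not affect Google's own AI summaries.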