Is Your robots.txt Blocking AI Crawlers?

Enter any domain to see which AI bots can access your content.

What Is robots.txt and How Do AI Crawlers Use It?

A robots.txt file is a plain text file at the root of your website (yourdomain.com/robots.txt) that tells web crawlers which pages they can and can't visit. It was originally built for search engine bots like Googlebot, but it now controls access for a growing number of AI crawlers — including GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and many others.

When a compliant AI crawler visits your site, the first thing it does is read your robots.txt file. If it finds a Disallow: / rule for its user-agent, it skips your content. That means your pages won't show up in AI-generated answers, won't be included in model training data, and won't appear in AI-powered search results.
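This lookup is mechanical enough that you can reproduce it locally with Python's standard-library robots.txt parser. A minimal sketch, using a hypothetical robots.txt that blocks GPTBot but allows everyone else:

```python
from urllib import robotparser

# Hypothetical robots.txt: block OpenAI's training bot, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot matches its own group and is blocked site-wide;
# PerplexityBot falls through to the wildcard group and is allowed.
print(parser.can_fetch("GPTBot", "https://example.com/article"))         # False
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))  # True
```

This is the same decision every compliant crawler makes before fetching a page: match my user-agent against the groups in the file, then apply that group's rules to the URL.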

The problem is that most website owners have no idea what their robots.txt says about AI bots — or whether it says anything at all. Many sites are either accidentally blocking AI crawlers they want to allow, or wide open to bots they'd rather keep out. This AI robots.txt checker lets you find out in seconds which AI bots can currently access your site.

AI Training Bots vs. AI Search Bots — The Key Difference

Not all AI crawlers do the same thing. Before you decide what to block or allow, it helps to understand the two main types.

Training Bots

Training bots scrape your content to build datasets for AI model training. Once your content enters the training data, it gets blended into the model's knowledge — future answers are generated from it without direct attribution or a link back to your page.

Common training bots include GPTBot (OpenAI), CCBot (Common Crawl), Google-Extended (Google), Bytespider (ByteDance/TikTok), and Meta-ExternalAgent (Meta).

AI Search Bots

AI search bots fetch your content in real time when a user asks a question. They pull relevant information from your page and typically cite you as the source with a direct link. This is essentially referral traffic from AI search engines.

Common AI search bots include PerplexityBot (Perplexity), OAI-SearchBot (ChatGPT Search), and ChatGPT-User (activated when a user explicitly opens a link in ChatGPT). Note that Applebot-Extended, despite appearing in many bot lists as a search crawler, is actually a control token: it governs whether content crawled by Applebot can be used to train Apple's foundation models.

What This Means for Your Strategy

For most websites, the smart move is to allow AI search bots (so you get cited and linked when AI platforms answer questions your content covers) while making an informed decision about training bots based on your content licensing and business goals. Some publishers block all training bots but keep search bots open. Others allow everything to maximize exposure.

There's no universally right answer — but you need to know where you stand. Use this checker to see exactly which bots your current robots.txt blocks or allows.

Complete List of AI Crawlers Checked

This tool checks your robots.txt against the following AI bots. The list is updated as new crawlers emerge.

Bot Name | Operator | Type | What It Does
GPTBot | OpenAI | Training | Collects data for training GPT models
ChatGPT-User | OpenAI | Search/Browse | Fetches pages when users click links in ChatGPT
OAI-SearchBot | OpenAI | Search | Indexes content for ChatGPT Search results
ClaudeBot | Anthropic | Training | Collects data for training Claude models
anthropic-ai | Anthropic | Training | Older Anthropic training crawler
PerplexityBot | Perplexity | Search | Fetches content for real-time AI search answers
Google-Extended | Google | Training | Controls use of content for Gemini/AI training
Applebot-Extended | Apple | Training | Controls use of Applebot-crawled content for training Apple's foundation models
CCBot | Common Crawl | Training | Open dataset used by many AI companies for training
Bytespider | ByteDance | Training | Collects data for TikTok/ByteDance AI models
Meta-ExternalAgent | Meta | Training | Collects data for Meta's AI products
Amazonbot | Amazon | Search/Training | Powers Alexa and Amazon AI features
cohere-ai | Cohere | Training | Collects data for Cohere language models
DuckAssistBot | DuckDuckGo | Search | Powers DuckDuckGo's AI-assisted answers
YouBot | You.com | Search | Indexes content for You.com AI search

How to Block or Allow Specific AI Crawlers

Once you've checked your robots.txt with this tool, you might want to make changes. Here's how.

Block a Specific AI Bot

Add these lines to your robots.txt file to prevent a specific crawler from accessing your site:

User-agent: GPTBot
Disallow: /

Replace GPTBot with any bot name from the list above. Each User-agent line names a single bot: you can't combine multiple bot names on one line.
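The Robots Exclusion Protocol (RFC 9309) does, however, let consecutive User-agent lines share one rule group, so several bots can be blocked with a single Disallow. For example:

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

Both bots read this as a full block; grouped user-agent lines are part of the standard and supported by major crawlers.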

Allow AI Search Bots but Block Training Bots

This is the most popular configuration for publishers who want AI visibility without contributing training data:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# Allow AI search crawlers (for citations and referral traffic)
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Applebot-Extended is a training control rather than a search crawler,
# but allowing it keeps your content available to Apple's AI features
User-agent: Applebot-Extended
Allow: /

Block All AI Crawlers

If you want to prevent any AI bot from crawling your content, you'll need to list each one individually. There is no single wildcard that targets only AI bots. See the full list above and add a Disallow: / block for each.
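If you maintain the bot list in code, the per-bot blocks can be generated mechanically. A quick sketch using the bot names from the table above:

```python
# AI crawler user agents from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "CCBot",
    "Bytespider", "Meta-ExternalAgent", "Amazonbot", "cohere-ai",
    "DuckAssistBot", "YouBot",
]

# One "User-agent: <bot>" / "Disallow: /" group per crawler.
robots_txt = "\n".join(f"User-agent: {bot}\nDisallow: /\n" for bot in AI_BOTS)
print(robots_txt)
```

Paste the output into your robots.txt, and re-run the generator whenever the bot list changes.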

After Making Changes

Save your robots.txt file and come back to this tool to verify the changes took effect. Bots will respect the updated rules on their next visit — there's no way to force an immediate recrawl.

Checking Your AI Visibility Beyond robots.txt

Your robots.txt file is the first thing AI crawlers check, but it's not the only factor that determines whether your content shows up in AI-generated answers. A complete AI visibility audit also looks at:

  • Structured data (Schema markup): AI systems rely on structured data to understand what your page is about. Proper schema markup helps bots extract accurate information from your content.
  • Semantic HTML and heading structure: Clear, hierarchical headings (H1, H2, H3) help AI crawlers parse the structure of your content and pull relevant sections for answers.
  • Content clarity and answer-readiness: AI search engines prefer content that directly answers questions in a clear, concise format. Pages structured around specific questions tend to get cited more often.
  • Crawl accessibility: Beyond robots.txt, issues like slow load times, JavaScript rendering requirements, or authentication walls can prevent AI bots from accessing your content.

These factors together determine your AEO (Answer Engine Optimization) score — a measure of how ready your site is to appear in AI-generated answers across platforms like ChatGPT, Perplexity, Google AI Overviews, and Claude.

Want a full AI visibility check that covers all of this? Install the AI Visibility Tool Chrome extension for a complete analysis.

Frequently Asked Questions

What is an AI robots.txt checker?

An AI robots.txt checker is a tool that analyzes your website's robots.txt file specifically for AI crawler directives. Unlike general robots.txt testers that focus on search engine bots like Googlebot, an AI-focused checker shows you which AI bots — such as GPTBot, ClaudeBot, and PerplexityBot — are allowed or blocked from crawling your content.

How do I know if my site is visible to AI?

The quickest way is to enter your domain in the checker above. It will fetch your robots.txt file and show you the status of every major AI crawler. If a bot shows as "blocked," your content won't appear in that platform's AI-generated answers. For a deeper check that goes beyond robots.txt, use the AI Visibility Tool Chrome extension.

Does blocking AI crawlers affect my Google rankings?

No. Blocking AI-specific crawlers like GPTBot, ClaudeBot, or CCBot has no impact on your traditional Google search rankings; these bots are separate from Googlebot, which handles search indexing. Google-Extended only controls whether your content is used to train Gemini models, and according to Google's documentation, blocking it does not affect your inclusion in Search or AI Overviews.

Should I block all AI crawlers?

It depends on your goals. If you want maximum exposure and don't mind your content being used for AI training, allow everything. If you want to be cited in AI search results but don't want to contribute training data, allow search bots (PerplexityBot, OAI-SearchBot) and block training bots (GPTBot, CCBot). If you want to keep your content entirely out of AI systems, block all of them — but know that you'll miss out on a growing source of referral traffic.

Do AI crawlers respect robots.txt?

Major AI crawlers from established companies — OpenAI, Anthropic, Google, Perplexity, Apple — do respect robots.txt directives. However, robots.txt is a voluntary protocol, not a technical barrier. Some lesser-known or rogue scrapers may ignore it. For stronger protection, you can combine robots.txt rules with server-level blocking using IP ranges or a service like Cloudflare.
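Server-level enforcement depends on your stack (firewall rules, a CDN setting, or request middleware), but the core check is usually just a case-insensitive substring match on the incoming User-Agent header. A minimal sketch; the blocklist and header strings here are illustrative, and real crawler user-agent formats vary:

```python
# Hypothetical blocklist of AI crawler tokens to reject at the server.
BLOCKED_AI_AGENTS = ("GPTBot", "CCBot", "Bytespider")

def is_blocked_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header names a blocked AI crawler."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in BLOCKED_AI_AGENTS)

# Example User-Agent strings in the style crawlers send (exact format varies).
print(is_blocked_ai_crawler(
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))  # True
print(is_blocked_ai_crawler(
    "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # False
```

Unlike robots.txt, a check like this enforces the block at your server instead of relying on the bot's cooperation. Keep in mind that user-agent strings can be spoofed, which is why IP-range verification or a CDN-level bot filter is stronger still.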

How often should I check my robots.txt for AI bots?

At least quarterly. The AI crawler landscape changes fast — new bots launch regularly, and companies sometimes rename or restructure their crawlers (OpenAI alone operates three different bots). A configuration that was correct six months ago might be missing newly introduced crawlers.

By Ruslan S., Senior Software Engineer

Want a Full AI Visibility Check?

Go beyond robots.txt. Analyze structured data, semantic headings, crawlability, and more.

Add to Chrome — It's Free