Robots.txt
Traditional SEO
Definition
Robots.txt is a plain text file placed at the root of a website that tells search engine crawlers which pages or sections they may or may not crawl, giving site owners control over where bots spend their time. It is found at https://yourdomain.com/robots.txt and is typically the first thing a search engine crawler requests before visiting any other page on a site. Used correctly, robots.txt keeps crawlers out of non-public or low-value sections and preserves crawl budget for the pages that actually matter.
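The location rule above can be sketched in a few lines of Python. This is a minimal illustration, not a crawler implementation: the function name robots_txt_url and the sample page URLs are hypothetical, and the sketch assumes the usual convention that each host (including a staging subdomain) serves its own file at /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; path, query, and fragment are discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URLs, used only for illustration.
print(robots_txt_url("https://yourdomain.com/services/roofing?ref=ad"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://staging.yourdomain.com/about"))
# https://staging.yourdomain.com/robots.txt
```

Because the file is looked up per host, a staging subdomain and the live domain can carry different rules.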
How It Works
Robots.txt uses a simple directive syntax. Rules are grouped under a User-agent line that names the crawler they apply to. Under User-agent: *, the rule Disallow: /admin/ tells all crawlers not to access pages under the admin directory, while Allow: / permits access to everything. You can write rules for specific crawlers (like Googlebot) or apply rules to all bots.
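To make the directive syntax concrete, here is a small sketch that feeds a sample file to Python's standard-library urllib.robotparser and asks which URLs each crawler may fetch. The sample rules, the /drafts/ path, and the example.com URLs are assumptions made for illustration. Python's parser is only a stand-in for a real search engine: its matching can differ from a given crawler's in edge cases, such as overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for all bots, one specific to Googlebot.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A crawler without its own group falls back to the "*" rules.
print(parser.can_fetch("Bingbot", "https://example.com/admin/users"))    # False
print(parser.can_fetch("Bingbot", "https://example.com/blog/"))          # True

# A crawler with its own group follows that group instead of "*".
print(parser.can_fetch("Googlebot", "https://example.com/drafts/post"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/admin/users"))  # True
```

The last line shows a common surprise: once a crawler has its own group, the rules under User-agent: * no longer apply to it, so shared rules have to be repeated in the specific group.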
Robots.txt is not a security tool. It relies on bots to comply voluntarily, and most legitimate search engines do, but a malicious scraper or bad actor can simply ignore it. The file is also publicly readable, so listing a path in it reveals that the path exists. For truly private content, authentication is required. Robots.txt is for managing crawler behavior, not access control.
A misconfigured robots.txt is one of the most damaging technical SEO mistakes possible. If someone accidentally adds Disallow: / under User-agent: * on a production site, Google stops crawling it, and pages can lose their rankings or drop out of search results, sometimes within days.
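A blanket block of this kind is easy to catch before it ships. The sketch below is one possible pre-launch check: it assumes the robots.txt text is already available as a string, and the function name homepage_is_blocked is hypothetical. It tests only the root path, so treat it as a quick proxy for "the whole site is blocked" rather than a full audit.

```python
from urllib.robotparser import RobotFileParser


def homepage_is_blocked(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if user_agent is not allowed to fetch the root path "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Rules left over from a staging environment: everything is disallowed.
staging_rules = "User-agent: *\nDisallow: /\n"
# Rules that block only an admin area.
production_rules = "User-agent: *\nDisallow: /admin/\n"

print(homepage_is_blocked(staging_rules))     # True
print(homepage_is_blocked(production_rules))  # False
```

Running a check like this as part of a deployment script is one way to stop a staging file from reaching the live domain unnoticed.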
Why It Matters
A well-configured robots.txt keeps internal tools, staging environments, duplicate pages, and low-value URL patterns from wasting crawl budget. It also discourages crawlers from fetching pages that have no place in search results, such as order confirmation, thank-you, or account management pages. Because a blocked URL can still be indexed if it is linked from elsewhere, pages that must stay out of results are better protected with a noindex directive on a crawlable page or with a login. For small businesses using common CMS platforms, checking robots.txt for accidental blocks is a standard part of any technical SEO audit.
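The audit step mentioned above can be sketched as a comparison between what the file does and what the site owner expects. Everything specific here is an assumption for illustration: the function name audit_robots, the sample rules, and the two path lists. The check uses Python's standard-library parser, whose matching may differ slightly from a particular search engine's.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt, should_crawl, should_block, user_agent="Googlebot"):
    """Return a list of mismatches between robots.txt and the expected behavior."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for path in should_crawl:
        if not parser.can_fetch(user_agent, path):
            problems.append(f"blocked but should be crawlable: {path}")
    for path in should_block:
        if parser.can_fetch(user_agent, path):
            problems.append(f"crawlable but should be blocked: {path}")
    return problems


# Hypothetical file with one accidental block (/services/) and one missing rule.
rules = """\
User-agent: *
Disallow: /account/
Disallow: /thank-you
Disallow: /services/
"""

issues = audit_robots(
    rules,
    should_crawl=["/", "/services/roofing"],
    should_block=["/account/settings", "/thank-you", "/order-confirmation/123"],
)
for issue in issues:
    print(issue)
# blocked but should be crawlable: /services/roofing
# crawlable but should be blocked: /order-confirmation/123
```

Keeping the two expectation lists alongside the site's other configuration makes the audit repeatable after each CMS or template change.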
Example
A local contractor's website is migrated from a staging subdomain to the live domain, but the developer forgets to update the robots.txt. The live site still has Disallow: / carried over from staging, blocking all crawlers, including Google's. The site does not appear in search results for three weeks. Once the rule is discovered and corrected, Google re-crawls the site and rankings return within two weeks.
Related Terms
Technical SEO, Crawl Budget, XML Sitemap, Canonical Tag, On-Page SEO