What is robots.txt?
A robots.txt file tells search engine crawlers which pages or files they may or may not request from your site. It is mainly used to manage crawler traffic so search engines don't overload your site with requests. Note that compliance is voluntary: well-behaved crawlers honor it, but it is not an access-control mechanism.
Common Directives
- User-agent: Specifies which crawler the rules apply to
- Disallow: Tells crawlers which pages they shouldn't access
- Allow: Permits access to a path, typically as an exception within an otherwise disallowed section
- Sitemap: Specifies the location of your sitemap
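The directives above combine into a plain-text file. Here is a minimal illustrative example (the paths and sitemap URL are placeholders, not recommendations for any specific site):

```
User-agent: *
Disallow: /private/
Allow: /private/help

User-agent: BadBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

The first group applies to all crawlers, the second blocks a specific crawler entirely, and the Sitemap line can appear anywhere in the file.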
Best Practices
- Place robots.txt in your website's root directory
- Use Disallow to keep crawlers out of low-value or duplicate sections; do not rely on it to hide sensitive pages (admin, login, etc.), since the file is publicly readable and blocked URLs can still be indexed or accessed directly. Protect sensitive pages with authentication or a noindex directive instead.
- Include your sitemap URL for better indexing
- Test your robots.txt with Google Search Console
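Besides Google Search Console, you can sanity-check robots.txt rules locally with Python's standard-library parser. This sketch parses a hypothetical rule set directly (no network fetch) and asks whether given URLs are fetchable; the paths are made up for illustration:

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
# parse() accepts the file's lines, so we can test without fetching
rp.parse(rules.splitlines())

# Check whether a generic crawler ("*") may fetch each URL
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```

For a live site, calling `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` fetches and parses the real file the same way.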