What is a Robots.txt File?
The robots.txt file tells search engines which pages of your site they can or cannot crawl.
The robots.txt file is a plain text file placed at the root of your website (e.g., h1site.com/robots.txt). It acts as a guide for the crawlers of search engines like Google, Bing, and Yahoo.
What does it do?
- Control crawling: You can block access to certain sections of your site (admin pages, sensitive files, duplicate pages).
- Save crawl budget: Search engines allocate a limited number of requests per site. By blocking unimportant pages, you direct bots to your priority content.
- Point to the sitemap: You can specify the location of your sitemap.xml file.
Example robots.txt file
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Allow: /
Sitemap: https://h1site.com/sitemap.xml
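
You can sanity-check rules like these locally before a crawler ever sees them. Here is a minimal sketch using Python's standard-library urllib.robotparser; the h1site.com URLs are simply the placeholders from the example above, not real endpoints.

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Allow: /
Sitemap: https://h1site.com/sitemap.xml
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Public pages stay crawlable, blocked sections do not
print(parser.can_fetch("*", "https://h1site.com/blog/"))        # True
print(parser.can_fetch("*", "https://h1site.com/admin/login"))  # False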
Common mistakes
- Accidentally blocking the entire site with Disallow: / (see the sketch below)
- Forgetting to reference the sitemap
- Thinking robots.txt prevents indexing (it prevents crawling, not indexing)
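
To illustrate the first mistake: a single stray slash shuts compliant crawlers out of everything. The hypothetical sketch below again uses Python's urllib.robotparser to show the effect.

import urllib.robotparser

# The classic mistake: this blocks the whole site, not just the homepage
broken = ["User-agent: *", "Disallow: /"]

parser = urllib.robotparser.RobotFileParser()
parser.parse(broken)

print(parser.can_fetch("*", "https://h1site.com/"))       # False
print(parser.can_fetch("*", "https://h1site.com/blog/"))  # False
# Every URL is now off-limits to compliant crawlers.
# Note: even a blocked page can still be indexed if other sites link to it;
# to remove a page from the index, serve a noindex directive
# (meta robots tag or X-Robots-Tag header) rather than relying on robots.txt.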
SEO Impact
A well-configured robots.txt improves crawl efficiency and ensures Google focuses its resources on your most important pages.
Related Terms
What is a Meta Description?
The meta description is a short HTML summary that appears below the title in Google search results.
What is the Title Tag?
The title tag is the most important HTML element for SEO. It's the blue clickable title in Google.
What is a Canonical Tag?
The canonical tag tells Google which is the preferred version of a page when duplicate content exists.
What is Hreflang?
Hreflang is an HTML attribute that tells Google the language and region a page targets for multilingual sites.
What is X-Default Hreflang?
X-default is a special hreflang value that designates the default page for users whose language isn't specifically targeted.
What is an XML Sitemap?
An XML sitemap is a file that lists all important pages of your site to help Google discover them.