A robots.txt file is a text file used to instruct search engine robots (also known as spiders or crawlers) how to crawl and index the pages of a website. It is placed in the root directory of the website and tells robots which pages or files they can and cannot request. This helps prevent the site from being overloaded with requests and discourages search engines from indexing sensitive or private pages.
Examples
User-agent: *
Disallow: /
The example above tells all search engine robots that they are not allowed to access any page on the website.
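If you want to confirm how a crawler would interpret these rules, you can test them with Python's standard-library urllib.robotparser. The sketch below is only an illustration; the example.com URLs and the "AnyBot" user-agent name are placeholders, not part of the file above.

from urllib.robotparser import RobotFileParser

# Parse the "block everything" rules shown above.
rules = [
    "User-agent: *",
    "Disallow: /",
]
parser = RobotFileParser()
parser.parse(rules)

# Every URL is disallowed for every user agent.
print(parser.can_fetch("AnyBot", "https://example.com/"))           # False
print(parser.can_fetch("AnyBot", "https://example.com/page.html"))  # False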
The robots.txt file can also be used to allow certain robots to access certain pages, or to block certain robots from certain pages.
Examples
User-agent: Googlebot
Disallow: /private/
Allow: /public/
User-agent: *
Disallow: /
In the example above, Googlebot is allowed to access pages in the public directory but not pages in the private directory, while all other robots are not allowed to access any pages on the website.
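You can verify this behavior the same way, by running the rules through urllib.robotparser. Again, example.com, the page names, and "OtherBot" are placeholders used only for the test.

from urllib.robotparser import RobotFileParser

# The per-agent rules from the example above.
rules = [
    "User-agent: Googlebot",
    "Disallow: /private/",
    "Allow: /public/",
    "User-agent: *",
    "Disallow: /",
]
parser = RobotFileParser()
parser.parse(rules)

# Googlebot may fetch the public directory but not the private one.
print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False

# Every other robot is blocked from the whole site.
print(parser.can_fetch("OtherBot", "https://example.com/public/page.html"))    # False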
To create a robots.txt file for your website, create a plain text file named robots.txt, add your rules, and upload it to the root directory of your site. For example:
User-agent: *
Disallow: /private/
Allow: /public/
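Once the file is uploaded, crawlers look for it at a fixed location: the root of the site (for example, https://www.example.com/robots.txt). The sketch below shows how that lookup can be reproduced with urllib.robotparser; www.example.com and "MyCrawler" stand in for your own domain and crawler name, and the expected results assume the server returns the sample rules above.

from urllib.robotparser import RobotFileParser

# Crawlers fetch robots.txt from the root of the site before crawling.
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

# With the sample rules above, the public directory would be allowed
# and the private directory blocked.
print(parser.can_fetch("MyCrawler", "https://www.example.com/public/page.html"))
print(parser.can_fetch("MyCrawler", "https://www.example.com/private/page.html"))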
Some common uses of the robots.txt file are as follows:
Example 1
User-agent: *
Disallow: /
This code blocks all search robots from crawling the entire website.
Example 2
User-agent: *
Disallow:
This code allows all robots to crawl the entire website.
Example 3
User-agent: BadBot
Disallow: /
This code blocks the specific robot named "BadBot" from crawling any part of the website.
Example 4
User-agent: *
Disallow: /private/
This code blocks all robots from crawling any file or page within the "private" folder on the website.
Example 5
User-agent: *
Disallow: /contact.html
This code blocks all robots from crawling the specific page named "contact.html" on the website.
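The same kind of check works for the one-page rule in Example 5. The sketch below parses those two lines directly; example.com and about.html are placeholders.

from urllib.robotparser import RobotFileParser

# The rules from Example 5: block a single page for every robot.
rules = [
    "User-agent: *",
    "Disallow: /contact.html",
]
parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("AnyBot", "https://example.com/contact.html"))  # False
print(parser.can_fetch("AnyBot", "https://example.com/about.html"))    # True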
It's important to note that the robots.txt file is only a suggestion to search robots, and not all robots will follow its instructions. It does not provide any security or protection for the website; it is simply a way to communicate with web robots and give them guidance on how to crawl the site.