A robots.txt file is a text file used to instruct search engine robots (also known as spiders or crawlers) how to crawl and index the pages of a website. It is placed in the root directory of the website and tells robots which pages or files they can and cannot request. This helps prevent the site from being overloaded with requests and discourages search engines from indexing sensitive or private pages.
Examples
User-agent: *
Disallow: /
The example above tells all search engine robots that they are not allowed to access any page on the website.
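If you want to confirm how a crawler would interpret these rules, you can test them with Python's standard-library urllib.robotparser. The sketch below is only an illustration; the example.com URLs and the "AnyBot" user-agent name are placeholders, not part of the file above.

from urllib.robotparser import RobotFileParser

# Parse the "block everything" rules shown above.
rules = [
    "User-agent: *",
    "Disallow: /",
]
parser = RobotFileParser()
parser.parse(rules)

# Every URL is disallowed for every user agent.
print(parser.can_fetch("AnyBot", "https://example.com/"))           # False
print(parser.can_fetch("AnyBot", "https://example.com/page.html"))  # False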
The robots.txt file can also be used to allow certain robots to access certain pages, or to block certain robots from certain pages.
Examples
User-agent: Googlebot
Disallow: /private/
Allow: /public/
User-agent: *
Disallow: /
In the example above, Googlebot is allowed to access pages in the public directory but not pages in the private directory, while all other robots are not allowed to access any pages on the website.
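You can verify this behavior the same way, by running the rules through urllib.robotparser. Again, example.com, the page names, and "OtherBot" are placeholders used only for the test.

from urllib.robotparser import RobotFileParser

# The per-agent rules from the example above.
rules = [
    "User-agent: Googlebot",
    "Disallow: /private/",
    "Allow: /public/",
    "User-agent: *",
    "Disallow: /",
]
parser = RobotFileParser()
parser.parse(rules)

# Googlebot may fetch the public directory but not the private one.
print(parser.can_fetch("Googlebot", "https://example.com/public/page.html"))   # True
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False

# Every other robot is blocked from the whole site.
print(parser.can_fetch("OtherBot", "https://example.com/public/page.html"))    # False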
To create a robots.txt file for your website, create a plain text file named robots.txt, add your rules, and upload it to the root directory of your site. For example:
User-agent: *
Disallow: /private/
Allow: /public/
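Once the file is uploaded, crawlers look for it at a fixed location: the root of the site (for example, https://www.example.com/robots.txt). The sketch below shows how that lookup can be reproduced with urllib.robotparser; www.example.com and "MyCrawler" stand in for your own domain and crawler name, and the expected results assume the server returns the sample rules above.

from urllib.robotparser import RobotFileParser

# Crawlers fetch robots.txt from the root of the site before crawling.
parser = RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the live robots.txt

# With the sample rules above, the public directory would be allowed
# and the private directory blocked.
print(parser.can_fetch("MyCrawler", "https://www.example.com/public/page.html"))
print(parser.can_fetch("MyCrawler", "https://www.example.com/private/page.html"))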
Some common uses of the robots.txt file are as follows:
Example 1
User-agent: *
Disallow: /
This code blocks all search robots from crawling the entire website.
Example 2
User-agent: *
Disallow:
This code allows all robots to crawl the entire website.
Example 3
User-agent: BadBot
Disallow: /
This code blocks the specific robot named "BadBot" from crawling any part of the website.
Example 4
User-agent: *
Disallow: /private/
This code blocks all robots from crawling any file or page within the "private" folder on the website.
Example 5
User-agent: *
Disallow: /contact.html
This code blocks all robots from crawling the specific page named "contact.html" on the website.
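The same kind of check works for the one-page rule in Example 5. The sketch below parses those two lines directly; example.com and about.html are placeholders.

from urllib.robotparser import RobotFileParser

# The rules from Example 5: block a single page for every robot.
rules = [
    "User-agent: *",
    "Disallow: /contact.html",
]
parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("AnyBot", "https://example.com/contact.html"))  # False
print(parser.can_fetch("AnyBot", "https://example.com/about.html"))    # True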
It's important to note that the robots.txt file is only a suggestion to search robots, and not all robots will follow its instructions. It does not provide any security or protection for the website; it is simply a way to communicate with web robots and give them guidance on how to crawl the site.