Web robot page enumeration
robots.txt (defined by the robots exclusion standard) is a file that websites use to communicate with crawlers and bots. Let's first look at two common examples of its syntax, and then see how the enumeration is done:
- To block a subfolder from Googlebot, we will use the following syntax:
User-agent: Googlebot
Disallow: /example-subfolder/
- To tell all bots not to crawl the website, we can put the following data in the text file:
User-agent: *
Disallow: /
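If you want to inspect a robots.txt file manually before involving Metasploit, any HTTP client will do. A quick check with curl might look like the following (the hostname is purely illustrative):
curl -s https://www.example.com/robots.txt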
In this section, we will use the robots_txt auxiliary module to fetch the contents of a website's robots.txt file (a console-based equivalent is sketched after the steps below):
- Start by searching for the module with the robots_txt keyword:
- Clicking on the module redirects us to its options page, where we can set Target Addresses, RPORT, PATH, VHOST, and so on. In our case, we have used www.packtpub.com as the VHOST:
- Upon clicking Run Module, a new task will be created and we will be able to see the status of the running module in the Tasks window:
- Once the task is complete, we can go back to the Analysis tab and click on the Notes section of our target host to see all the directories listed in the website's robots.txt file, as shown in the following screenshot:
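If you are working from msfconsole rather than the web interface, the same enumeration can be performed with the auxiliary/scanner/http/robots_txt module. The following is a minimal sketch; the RHOSTS value is a placeholder, and the VHOST mirrors the example above:
search robots_txt
use auxiliary/scanner/http/robots_txt
set RHOSTS <target address>
set VHOST www.packtpub.com
run
Once the module finishes, the paths it discovers are stored as notes against the host, which is the same information the Notes section of the web UI displays.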
Next, let's find some misconfigured Git repos on a given website.