

Week 9 Lab – Open Web Information Gathering






Students

Student ID:
Name:
Notes


· This seminar can be performed without a virtual machine; however, it's written for Kali (set up in previous weeks). If you are running it without Kali, then google an alternative command for your OS.


· You can perform this exercise in groups over your online meeting tool of choice or individually if you would prefer.


· If you are performing the exercise as a group, please make sure you've joined the same group in the group selection tool on iLearn.



Background



Cyber criminals and hackers spend a lot of time browsing the web, looking for background information about their target organisation. Things that they will be interested in are: What does their target organisation/individual do? How do they interact with the world? Do they have a sales department? Are they hiring? Cyber criminals will browse the organisation’s website, looking for general information such as contact information, phone and fax numbers, emails, company structure etc. They will also look for sites that link to the target site, or for company emails floating around the web.



A lot of the time, the smallest details can give an attacker the most information. For example, how well designed is the target website? How clean is their HTML code? These things might give an attacker a clue about the organisation’s web development budget, which may reflect on their security budget.



Google is a hacker’s best friend, especially when it comes to information gathering.



Enumerating with Google



Google supports various search operators, which allow a user to narrow down and pinpoint search results. For example, the ‘site’ operator will limit Google search results to a single domain. Say we want to know the approximate web presence of an organisation: we can use ‘site:microsoft.com’ to show only results for the microsoft.com domain. Figure 1 below shows that on 22nd March 2017, Google indexed around 34.5 million pages from the microsoft.com domain. These targeted queries are referred to as “Google Dorks”.





Figure 1: The Google ‘site’ operator in action











Activity 1:
Practice with the ‘site’ operator


Use the ‘site’ operator to run a Google search on 3 companies of your choice; small or medium-sized organisations work best. Record in the box below the companies that you have selected and the number of pages that Google indexed for each.




Company 1: No of pages:





Company 2: No of pages:





Company 3: No of pages:




In the Microsoft example shown in Figure 1, you will notice how most of the results originate from the
www.microsoft.com
subdomain. Now let’s filter those out to see what other subdomains may exist at microsoft.com. We can do this using the following command:



site:microsoft.com -site:www.microsoft.com



These two simple queries have revealed quite a lot of background information about the microsoft.com domain, such as their Internet presence and a list of their web accessible subdomains.
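When you have several target domains, these queries can also be built programmatically so each company gets the same consistent pair. The sketch below is a minimal Python helper; the function name, parameters, and the microsoft.com example are our own illustration, not part of the lab tooling:

```python
# Minimal sketch: build the two 'site' operator queries used above for any
# target domain, ready to paste into Google. Helper name and defaults are
# illustrative only.

def site_queries(domain, main_sub="www"):
    """Return (whole-domain query, query excluding the main subdomain)."""
    footprint = f"site:{domain}"                          # overall web presence
    other_subdomains = f"site:{domain} -site:{main_sub}.{domain}"  # filter main site
    return footprint, other_subdomains

q1, q2 = site_queries("microsoft.com")
print(q1)  # site:microsoft.com
print(q2)  # site:microsoft.com -site:www.microsoft.com
```

Running it once per target company gives you both queries for the boxes above.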



Use this simple query on your selected 3 companies and record the number of results returned for each and three subdomains for each in the box below:




Company 1: No of pages:


Subdomains:





Company 2: No of pages:


Subdomains:





Company 3: No of pages:


Subdomains:
















Activity 2: Research

Perform some research and provide 3 Google Dorks that can be used to find sensitive information.










Dork 1:

Purpose:




Dork 2:
Purpose:




Dork 3:
Purpose:









Activity 3: DNS lookups


We're going to perform DNS lookups; this can be done using dedicated tools, but there are some websites that will let us do it too. We're going to use https://www.ultratools.com/tools/dnsLookup. Perform a lookup on the 3 domains you have chosen above.
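If you prefer to organise the results yourself, the sketch below groups DNS records by type into the three fields the worksheet asks for. The function name is our own, and the sample records are hard-coded from the zoom.us example (illustrative, not a live lookup); in practice you would paste in the records shown by the website or a tool such as dig:

```python
# Sketch: organise DNS lookup results (copied from a lookup page or dig
# output) into the mail server / name server / web server fields.

def summarise_records(records):
    """Group (name, type, value) tuples into the worksheet's three fields."""
    summary = {"mail_servers": [], "name_servers": [], "web_ips": []}
    for name, rtype, value in records:
        if rtype == "MX":                      # mail exchanger records
            summary["mail_servers"].append(value)
        elif rtype == "NS":                    # authoritative name servers
            summary["name_servers"].append(value)
        elif rtype in ("A", "AAAA"):           # web server IP addresses
            summary["web_ips"].append(value)
    return summary

# Illustrative records matching the zoom.us example above
sample = [
    ("zoom.us", "MX", "aspmx.l.google.com."),     # Google-hosted mail
    ("zoom.us", "NS", "ns-1169.awsdns-18.org."),  # AWS name server
    ("zoom.us", "A", "52.202.62.196"),            # web server IP
]
print(summarise_records(sample))
```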





Example:
zoom.us




Mail server:
Google




Name server:
AWS




Web server IP:
52.202.62.196






Domain 1:




Mail server:




Name server:




Web server IP:





Domain 2:




Mail server:




Name server:




Web server IP:





Domain 3:




Mail server:




Name server:




Web server IP:











Activity 4: Robots.txt


Robots.txt is a publicly available file found in the root directory of a website. It gives instructions to web robots (search engine crawlers) about what should and should not be crawled, using the Robots Exclusion Protocol. A Disallow: statement tells a crawler not to visit the listed path. Disallow entries can give an attacker intelligence on what a target hopes not to disclose to the public.



Go to your web browser and type in the following address:
http://www.facebook.com/robots.txt. Your search should return something like Figure 3 below.









Figure 3: Results of a robots.txt lookup on Facebook


Robots can ignore the disallow directives in /robots.txt, especially malware robots that scan the web for security vulnerabilities; email address harvesters used by spammers will also pay no attention to them. Anyone can see which sections of the server the organisation doesn't want robots to use or see, and this often points to content the company wants to keep private. In other words, robots.txt gives potential malicious actors a lot of intelligence about the structure of the website (and therefore, potential targets).
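A quick way to pull out just the Disallow entries is to parse the file yourself. The sketch below works on a hard-coded sample (the function name and the paths shown are illustrative); in practice you would first fetch https://[target]/robots.txt, e.g. with urllib.request:

```python
# Minimal sketch: extract the Disallow entries from a robots.txt body.

def disallowed_paths(robots_txt):
    """Return the paths the site asks crawlers not to visit."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                            # an empty Disallow allows everything
                paths.append(path)
    return paths

# Illustrative sample, not a real fetched file
sample = """User-agent: *
Disallow: /ajax/
Disallow: /album.php
Allow: /
"""
print(disallowed_paths(sample))  # ['/ajax/', '/album.php']
```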





Enter the address of a popular website into your web browser and add /robots.txt to the end of the address (see above). Record, in the box below, web pages or folders that the organisation doesn't want crawled.






Domain 1


[Insert robots file here]







Domain 2


[Insert robots file here]









Domain 3


[Insert robots file here]






















Activity 5: Email harvesting

Email harvesting is an effective way of finding emails, and possibly usernames, belonging to an organisation. These emails are useful in many ways, such as providing a potential list for client-side attacks (such as phishing), revealing the naming convention used in the organisation, or mapping out users in the organisation.
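Once a few harvested addresses reveal the naming convention, candidate addresses for other known employees can be generated mechanically. The sketch below is our own illustration (the function name, patterns, and example person are assumptions); it produces the most common corporate formats:

```python
# Sketch: generate candidate email addresses for a person once the
# organisation's naming convention is suspected.

def candidate_emails(first, last, domain):
    """Return common corporate address formats, e.g. first.last@domain."""
    first, last = first.lower(), last.lower()
    patterns = [
        f"{first}.{last}",       # jane.doe
        f"{first}{last}",        # janedoe
        f"{first[0]}{last}",     # jdoe
        f"{first}_{last}",       # jane_doe
    ]
    return [f"{p}@{domain}" for p in patterns]

# Hypothetical person and domain, for illustration only
print(candidate_emails("Jane", "Doe", "example.com"))
```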



Open Kali Linux and navigate to the ‘theharvester’ tool. You can do this by clicking on:


1. Applications > Kali Linux > Information Gathering > OSINT Analysis > The Harvester


2. Next, enter the following syntax into theharvester command line:



theharvester -d microsoft -l 200 -b linkedin


3. Record the first five lines of what is returned in the box below:

[Record the output here]

4. Now try a different company and a different search engine using the following syntax: theharvester -d sixthstartech.com -l 300 -b google



5. Record what is returned in the box below:

[Record the output here]

6. Using the following syntax, enumerate email addresses belonging to one or more of the organisations you chose in Activity 1:


theharvester -d [organisation] -l 300 -b [search engine name]

-d [organisation] is the domain of the organisation you want to fetch information about

-l limits the search to the specified number of results

-b specifies the search engine to use (for example, google, yahoo, bing etc.)


Record in the box below the information that you have been able to find about your chosen organisation(s). You can experiment with different search engines and limit searches to various numbers.

[Record the output here]

Activity 6: Research


Look into other open-source intelligence (OSINT) techniques and describe 3 of them below.




Technique 1


[Describe here]







Technique 2


[Describe here]









Technique 3


[Describe here]










Activity 7: Research


Perform some research into how this information could be used maliciously and describe your findings below.

[Describe here]
