

Week 9 Lab – Open Web Information Gathering






Students

Student ID:
Name:
Notes


· This seminar can be performed without a virtual machine; however, it's written for Kali (set up in previous weeks). If you are running it without Kali, then google an alternative command for your OS.


· You can perform this exercise in groups over your online meeting tool of choice or individually if you would prefer.


· If you are performing the exercise as a group, please make sure you've joined the same group in the group selection tool on iLearn.



Background



Cyber criminals and hackers spend a lot of time browsing the web, looking for background information about their target organisation. Things that they will be interested in are: What does their target organisation/individual do? How do they interact with the world? Do they have a sales department? Are they hiring? Cyber criminals will browse the organisation’s website, looking for general information such as contact information, phone and fax numbers, emails, company structure etc. They will also look for sites that link to the target site, or for company emails floating around the web.



A lot of the time, the smallest details can give an attacker the most information. For example, how well designed is the target website? How clean is their HTML code? These things might give an attacker a clue about the organisation’s web development budget, which may reflect on their security budget.



Google is a hacker’s best friend, especially when it comes to information gathering.



Enumerating with Google



Google supports various search operators, which allow a user to narrow down and pinpoint search results. For example, the ‘site’ operator will limit Google search results to a single domain. Say we want to know the approximate web presence of an organisation: we can use ‘site:microsoft.com’ to show only results for the microsoft.com domain. Figure 1 below shows that on 22nd March 2017, Google indexed around 34.5 million pages from the microsoft.com domain. These targeted queries are referred to as “Google Dorks”.





Figure 1: The Google ‘site’ operator in action











Activity 1:
Practice with the ‘site’ operator


Use the ‘site’ operator to run a Google search on 3 companies of your choice; small or medium-sized organisations work best. Record in the box below the companies that you have selected and the number of pages that Google indexed for each.




Company 1: No of pages:





Company 2: No of pages:





Company 3: No of pages:




In the Microsoft example shown in Figure 1, you will notice how most of the results originate from the
www.microsoft.com
subdomain. Now let’s filter those out to see what other subdomains may exist at microsoft.com. We can do this using the following command:



site:microsoft.com -site:www.microsoft.com



These two simple queries have revealed quite a lot of background information about the microsoft.com domain, such as their Internet presence and a list of their web accessible subdomains.
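When you have several target domains, these queries can also be built programmatically so each company gets the same consistent pair. The sketch below is a minimal Python helper; the function name, parameters, and the microsoft.com example are our own illustration, not part of the lab tooling:

```python
# Minimal sketch: build the two 'site' operator queries used above for any
# target domain, ready to paste into Google. Helper name and defaults are
# illustrative only.

def site_queries(domain, main_sub="www"):
    """Return (whole-domain query, query excluding the main subdomain)."""
    footprint = f"site:{domain}"                          # overall web presence
    other_subdomains = f"site:{domain} -site:{main_sub}.{domain}"  # filter main site
    return footprint, other_subdomains

q1, q2 = site_queries("microsoft.com")
print(q1)  # site:microsoft.com
print(q2)  # site:microsoft.com -site:www.microsoft.com
```

Running it once per target company gives you both queries for the boxes above.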



Use this simple query on your selected 3 companies and record the number of results returned for each and three subdomains for each in the box below:




Company 1: No of pages:


Subdomains:





Company 2: No of pages:


Subdomains:





Company 3: No of pages:


Subdomains:
















Activity 2: Research

Perform some research and provide 3 Google Dorks that can be used to find sensitive information.










Dork 1:

Purpose:




Dork 2:
Purpose:




Dork 3:
Purpose:









Activity 3: DNS lookups


We're going to perform DNS lookups; this can be done using dedicated tools, but there are some websites that will let us do it too. We're going to use https://www.ultratools.com/tools/dnsLookup. Perform a lookup on the 3 domains you have chosen above.
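If you prefer to organise the results yourself, the sketch below groups DNS records by type into the three fields the worksheet asks for. The function name is our own, and the sample records are hard-coded from the zoom.us example (illustrative, not a live lookup); in practice you would paste in the records shown by the website or a tool such as dig:

```python
# Sketch: organise DNS lookup results (copied from a lookup page or dig
# output) into the mail server / name server / web server fields.

def summarise_records(records):
    """Group (name, type, value) tuples into the worksheet's three fields."""
    summary = {"mail_servers": [], "name_servers": [], "web_ips": []}
    for name, rtype, value in records:
        if rtype == "MX":                      # mail exchanger records
            summary["mail_servers"].append(value)
        elif rtype == "NS":                    # authoritative name servers
            summary["name_servers"].append(value)
        elif rtype in ("A", "AAAA"):           # web server IP addresses
            summary["web_ips"].append(value)
    return summary

# Illustrative records matching the zoom.us example above
sample = [
    ("zoom.us", "MX", "aspmx.l.google.com."),     # Google-hosted mail
    ("zoom.us", "NS", "ns-1169.awsdns-18.org."),  # AWS name server
    ("zoom.us", "A", "52.202.62.196"),            # web server IP
]
print(summarise_records(sample))
```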





Example:
zoom.us




Mail server:
Google




Name server:
AWS




Web server IP:
52.202.62.196






Domain 1:




Mail server:




Name server:




Web server IP:





Domain 2:




Mail server:




Name server:




Web server IP:





Domain 3:




Mail server:




Name server:




Web server IP:











Activity 4: Robots.txt


Robots.txt is a publicly available file found in the root directory of a website. It gives instructions to web robots (search engine crawlers) about what should and should not be crawled, using the Robots Exclusion Protocol. A Disallow: statement tells a crawler not to visit the listed path. Disallow entries can give an attacker intelligence on what a target hopes not to disclose to the public.



Go to your web browser and type in the following address:
http://www.facebook.com/robots.txt. Your search should return something like Figure 3 below.









Figure 3: Results of a robots.txt lookup on Facebook


Robots can ignore the disallow directives in /robots.txt, especially malware robots that scan the web for security vulnerabilities; email address harvesters used by spammers will also pay no attention to them. Anyone can see which sections of the server the organisation doesn't want robots to use or see, and this often points to content the company wants to keep private. In other words, robots.txt gives potential malicious actors a lot of intelligence about the structure of the website (and therefore, potential targets).
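A quick way to pull out just the Disallow entries is to parse the file yourself. The sketch below works on a hard-coded sample (the function name and the paths shown are illustrative); in practice you would first fetch https://[target]/robots.txt, e.g. with urllib.request:

```python
# Minimal sketch: extract the Disallow entries from a robots.txt body.

def disallowed_paths(robots_txt):
    """Return the paths the site asks crawlers not to visit."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                            # an empty Disallow allows everything
                paths.append(path)
    return paths

# Illustrative sample, not a real fetched file
sample = """User-agent: *
Disallow: /ajax/
Disallow: /album.php
Allow: /
"""
print(disallowed_paths(sample))  # ['/ajax/', '/album.php']
```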





Enter the address of a popular website into your web browser and add /robots.txt to the end of the address (see above). Record, in the box below, web pages or folders that the organisation doesn't want crawled.






Domain 1


[Insert robots file here]







Domain 2


[Insert robots file here]









Domain 3


[Insert robots file here]






















Activity 5: Email harvesting

Email harvesting is an effective way of finding emails, and possibly usernames, belonging to an organisation. These emails are useful in many ways, such as providing a potential list for client-side attacks (such as phishing), revealing the naming convention used in the organisation, or mapping out users in the organisation.
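Once a few harvested addresses reveal the naming convention, candidate addresses for other known employees can be generated mechanically. The sketch below is our own illustration (the function name, patterns, and example person are assumptions); it produces the most common corporate formats:

```python
# Sketch: generate candidate email addresses for a person once the
# organisation's naming convention is suspected.

def candidate_emails(first, last, domain):
    """Return common corporate address formats, e.g. first.last@domain."""
    first, last = first.lower(), last.lower()
    patterns = [
        f"{first}.{last}",       # jane.doe
        f"{first}{last}",        # janedoe
        f"{first[0]}{last}",     # jdoe
        f"{first}_{last}",       # jane_doe
    ]
    return [f"{p}@{domain}" for p in patterns]

# Hypothetical person and domain, for illustration only
print(candidate_emails("Jane", "Doe", "example.com"))
```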



Open Kali Linux and navigate to the ‘theharvester’ tool. You can do this by clicking on:


1. Applications > Kali Linux > Information Gathering > OSINT Analysis > The Harvester


2. Next, enter the following syntax into theharvester command line:



theharvester -d microsoft -l 200 -b linkedin


3. Record the first five lines of what is returned in the box below:

[Record the output here]

4. Now try a different company and a different search engine using the following syntax: theharvester -d sixthstartech.com -l 300 -b google



5. Record what is returned in the box below:

[Record the output here]

6. Using the following syntax, enumerate email addresses belonging to one or more of the organisations you chose in Activity 1:


theharvester -d [organisation] -l 300 -b [search engine name]

-d [organisation] is the domain of the organisation you want to fetch information about

-l limits the search to the specified number of results

-b specifies the search engine to use (for example, google, yahoo, bing etc.)


Record in the box below the information that you have been able to find about your chosen organisation(s). You can experiment with different search engines and limit searches to various numbers.

[Record the output here]

Activity 6: Research


Look into other open-source intelligence (OSINT) techniques and describe 3 of them below.




Technique 1


[Describe here]







Technique 2


[Describe here]









Technique 3


[Describe here]










Activity 7: Research


Perform some research into how this information could be used maliciously and describe your findings below.

[Describe here]
