Let me begin this post by saying that I am not writing this so that you can read this and become a haCkEr. I am writing this post so you can learn to identify a vulnerability and try to avoid an embarrassment.
Google is an amazing search engine. The problem is that sometimes it is too good at what it does. Here are some ways that Google can reveal vulnerabilities on your website by mistake.
You allowed Google to index a critical file:
This happens more often than you think. WordPress, for example, houses important files under the wp-* folders, and it is no one’s business except yours to look at these files. Other files like .htaccess and .htpasswd are critical to your site’s security (if you are using Apache with AllowOverride enabled). Do not allow Google to index them. You can prevent that by placing a robots.txt file on the root path of your website. More on that here.
The better option is to put in place a configuration that will not allow the sensitive file to be served in the first place. Not all robots obey what you instruct using robots.txt, and because robots.txt is itself publicly readable, it also tells a curious visitor exactly which paths you would rather hide. The FilesMatch directive on Apache can help you protect your site.
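A quick way to confirm that such a rule is doing its job is to request the protected files yourself and look at the status code that comes back. Here is a minimal Python sketch of that idea, not a full scanner. The path list, the function names and the example.com address are my own assumptions for illustration.

```python
import urllib.error
import urllib.request

# Illustrative list (an assumption); extend it with whatever is sensitive on your site.
SENSITIVE_PATHS = ["/.htaccess", "/.htpasswd"]


def http_status(url, timeout=5.0):
    """Return the HTTP status code the server sends for a GET of url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 404 and similar arrive as exceptions; the code is what we want.
        return err.code


def exposed_paths(base_url, paths=SENSITIVE_PATHS, get_status=http_status):
    """Return the paths that the server answers with 200 OK."""
    base = base_url.rstrip("/")
    return [path for path in paths if get_status(base + path) == 200]
```

Calling exposed_paths("http://yoursitename.com") against your own site should give you an empty list. Anything it returns is a file the server is willing to hand out. Connection failures are not caught here, so an unreachable host raises an error instead of quietly looking safe.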
You can double-check that Google can read your robots.txt using Google Search Console (formerly Google Webmaster Tools). You can check the files that Google has indexed using the query site:yoursitename.com.
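You can also test your robots.txt rules locally before you upload them. The sketch below uses Python's standard urllib.robotparser module to ask which paths a well-behaved bot would stay away from. The rules and paths are made-up examples, not a recommended WordPress setup, and blocked_paths is a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# Example rules only (an assumption); use the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that user_agent is told not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```

Keep in mind that this only tells you what a polite robot will do. It says nothing about whether the file can still be downloaded.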
Google indexed a service page that is being served on a non-standard port:
Examples of this are login pages or services that do not require a password. Searching for such pages can be done using the "inurl" keyword in searches. Here is an example: inurl:8080. There are ways to tweak that search string to reveal more information about services on other ports. When you complement inurl:something_unique_in_the_url with a search using quotes, like inurl:1234 intitle:"Administration blah", it can yield some very interesting results. Pick your favorite admin tool and replace the port and title with the admin home page equivalent. The search works on many major application and web servers.
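If you want to run these searches against your own site on a regular basis, it helps to generate the query strings instead of typing them. This is a small sketch under my own assumptions: the function names are invented, and it simply combines the site:, inurl: and intitle: keywords described above.

```python
from urllib.parse import urlencode


def audit_queries(site, port, title):
    """Build search strings that only look at pages from your own site."""
    return [
        f"site:{site}",
        f"site:{site} inurl:{port}",
        f'site:{site} inurl:{port} intitle:"{title}"',
    ]


def search_url(query):
    """Turn a search string into a URL you can paste into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, audit_queries("yoursitename.com", 8080, "Administration blah") gives you three queries, from broad to narrow, all restricted to your own domain.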
Remember that Google indexes your page. Even if you correct the problem, the damage is done and is still being done. With cached pages, a service that does not ask for a username and password (yes, there are important services that do not require a username/password) will be completely indexed. Yikes! The data that your service exposes is cached and indexed for everyone to see. Not what we want.
To avoid this, simply shut down services you do not need. If you need a service but you want that service to be private, block the port with a firewall.
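Once the firewall rule is in place, it is worth checking from a machine outside your network that the port really is closed. Here is a minimal sketch using Python's socket module. The function names are mine, and a plain TCP connect like this is only a rough check, not a replacement for a proper port scanner.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (filtered) or unreachable all count as closed.
        return False


def open_ports(host, ports):
    """Return the subset of ports that accept connections on host."""
    return [port for port in ports if port_is_open(host, port)]
```

Running open_ports("yoursitename.com", [8080]) from outside should return an empty list if the firewall is doing its job. Only point it at hosts you are responsible for.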
You can optionally tell Googlebot and other bots not to index the page in question, but that is not really a solution. Be proactive and secure the service. A cached page can end up earning you some DoS attacks.
Google cracks MD5:
I realized from this post that Google could be used to crack weak passwords. MD5 is a hash, not encryption, and if the hashing is done without a salt, the same password results in the same hash every time. The hashes of common passwords are published all over the web, so searching for a hash can turn up the password behind it. A weak password can be guessed easily using this technique.
The lesson here is to use a strong password that no one will guess. The other lesson is to ensure that the links on your site do not pass along sensitive information. Here is the Google search in case it interests you.
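To see why the salt matters, compare a plain MD5 hash with a salted one. The sketch below uses Python's hashlib. PBKDF2 with SHA-256 and the iteration count are my own choices for the salted side, picked as a reasonable illustration and not taken from the post mentioned above.

```python
import hashlib
import hmac
import os


def unsalted_md5(password):
    """Plain MD5 with no salt: the same input always gives the same hash."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def salted_hash(password, salt=None, iterations=200_000):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify(password, salt, expected, iterations=200_000):
    """Check a password attempt against a stored salt and digest."""
    _, digest = salted_hash(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Two users with the same password get identical output from unsalted_md5, which is exactly what makes the hash searchable. With salted_hash each of them gets a different digest, so a hash copied from your database will not match anything a search engine has seen.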
Cached directory pages:
Your web server is quite capable of displaying a directory listing. What this means is that besides displaying HTML, if I were to request a directory name instead, your web server will reveal the contents of the directory to me. Why is this bad? It helps find more vulnerable files that are housed inside those directories. You can ask Apache not to serve directory content by configuring the same in httpd.conf. The line of configuration will look something like this:
Options Indexes FollowSymLinks
# More stuff here
Remove the word Indexes, then reload Apache so that the change takes effect.
The related search query in Google is intitle:"index of /". Tweaking it, for example by adding site:yoursitename.com, will provide more targeted results.
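You do not have to wait for Google to find a listing. If you already fetch pages from your own site with a script, you can check each one for the telltale title. This is a small sketch of that check. It assumes the listing is generated by Apache, whose automatic index pages carry a title beginning with "Index of /", and the function name is mine.

```python
import re

# Pull the text out of the page's <title> element, ignoring case and newlines.
_TITLE = re.compile(r"<title>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)


def looks_like_directory_listing(html):
    """Return True if the page title starts with "Index of /"."""
    match = _TITLE.search(html)
    return bool(match) and match.group(1).lower().startswith("index of /")
```

Other web servers word their listing pages differently, so treat a False result as "not an Apache-style listing" and not as proof that nothing is exposed.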
Before you make any configuration changes, always make a backup. Read about the changes you are making and understand what you are doing before you do it. Try these tricks on your site and check if it is secure. Be creative. Think about other sensitive terms like jsessionid, username, passwd, password, id, and so on.
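As a starting point for that last check, here is a sketch that looks for those sensitive terms in the parameters of a URL, such as a link taken from your own pages or access logs. It is only an illustration: the function name is mine, and it matches parameter names exactly, so it will not catch variations like user_name.

```python
from urllib.parse import parse_qsl, urlparse

# The terms listed above.
SENSITIVE_TERMS = {"jsessionid", "username", "passwd", "password", "id"}


def sensitive_params(url):
    """Return the sensitive parameter names that appear in url, sorted."""
    parsed = urlparse(url)
    # Names from the query string, e.g. ?username=bob
    names = [name.lower() for name, _ in parse_qsl(parsed.query, keep_blank_values=True)]
    # Names from path parameters, e.g. /login;jsessionid=ABC123
    for part in parsed.params.split(";"):
        if part:
            names.append(part.split("=", 1)[0].lower())
    return sorted(set(names) & SENSITIVE_TERMS)
```

Any URL for which this returns a non-empty list is one you probably do not want showing up in a search result.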