The Hidden Face of Google Caching

Google is quite possibly the most powerful search engine in use today; some of us even use it just to check our connectivity. It has become a source of concern for many, and if it isn't one for you yet, it should be, and we will see why!
Surprisingly, it can be used to find much more than we should be able to find: sensitive data, web vulnerabilities, and so on, which means these things are exposed to everyone on the internet.
In this post, we will discuss Google indexing and caching from a technical point of view, and also how Google Search can be used to find sensitive data.
I am a freelance JavaScript developer, and I’ll explain how to fix these problems based on my experience.
Google Indexing and Caching
Google Search, also referred to as Google Web Search, is the most-used search engine, handling more than three billion searches each day with a 90.46% market share. This much we already know.
From a technical point of view, the main purpose of Google Search is to hunt for text in publicly accessible resources offered by web servers. Google indexes these resources and caches copies of them. The catch is that any sensitive data that gets cached is exposed to the public by Google, and this is where attackers love to start when they want to dig up sensitive data about your company. In other words, Google caching can lead to the disclosure of sensitive data.
Let me explain with an example.
Let’s assume you want to find out whether a user’s password reset link has been cached by Google. A link like this one is a good example of sensitive data:
https://example.in/Users/showChangePassword?resetPassword=8a76sfsw3e89ere9r794h9erer0
Go to Google Search and type the following in the search field:
site:example.in
This query tells the search engine to find all the resources that belong to this domain, and you will receive plenty of results. That’s cool. Now, to refine the search for the password reset link, add the inurl filter to the query. This keeps only the results that have resetPassword in the URL. To learn more about Google dork queries, check out this GitHub gist.
site:example.in inurl:resetPassword
Now, if you’re lucky, you will receive some results containing users’ cached password reset links. Bingo!!
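If you audit several of your own domains, it can help to assemble these queries in a small script instead of typing them by hand. The following is only a minimal sketch: the helper names build_dork and search_url are made up for this post, and it does nothing more than build the query string and a search URL that you can open in a browser.

```python
from typing import Optional
from urllib.parse import urlencode


def build_dork(domain: str, url_keyword: Optional[str] = None) -> str:
    """Compose a `site:` query, optionally narrowed with `inurl:`."""
    parts = [f"site:{domain}"]
    if url_keyword:
        parts.append(f"inurl:{url_keyword}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Return a Google search URL for the query, URL-encoding it first."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical audit of the example domain used in this post.
query = build_dork("example.in", "resetPassword")
print(query)
print(search_url(query))
```

Run this only against domains you own or are authorized to test.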
Some real-world samples:





Countermeasures:
Let me now discuss how to fix these problems. We do not want Google to index or cache these sensitive pages. One simple fix is to add the following meta tag to the pages that we don’t want Google to index.
<meta name="robots" content="noindex,nofollow">
You can also prevent Google from caching your pages by using this tag:
<meta name="Googlebot" content="noarchive">
If you want to know more about this tag, check out this blog post by Google Webmasters.
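To confirm that a sensitive page really ships these tags, you can parse its HTML and list the directives it declares. This is a rough sketch using Python's standard html.parser module; the class RobotsMetaParser and the function robots_directives are names invented for this example, and it only looks at meta tags named robots or googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            values = {d.strip().lower() for d in attr.get("content", "").split(",")}
            values.discard("")  # ignore empty entries such as a trailing comma
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html: str) -> dict:
    """Return the robots directives declared in the given HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = (
    '<html><head>'
    '<meta name="robots" content="noindex,nofollow">'
    '<meta name="Googlebot" content="noarchive">'
    '</head></html>'
)
print(robots_directives(page))
```

A page that is supposed to stay out of search results should report noindex here; an empty result means no robots meta tag was found.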
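Another method is blocking crawlers using robots.txt. This tells crawlers, including Google's, not to crawl these paths. Keep in mind that robots.txt controls crawling rather than indexing: a disallowed URL can still appear in search results if other pages link to it, and a crawler that is blocked from a page never gets to see that page's noindex tag. The file is also public, so avoid listing paths that reveal too much.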
User-agent: *
Disallow: /sensitive-urls
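Before deploying a robots.txt file, you can check that the rules block what you expect. Here is a small sketch using Python's standard urllib.robotparser module; the helper is_crawl_allowed and the sample URLs are assumptions made for this example. Python's parser may not match Google's own rule matching in every edge case (wildcards in paths, for instance), so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sensitive-urls
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on the example domain.
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.in/sensitive-urls/reset"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.in/about"))
```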
The examples of exposed data shown earlier may look scary, but they can be resolved with proper countermeasures. This non-exhaustive list of solutions should help you protect yourself against search engines, and especially against Google. Be very careful, though, with how you control what search engine crawlers can see on your website, or your pages may disappear completely from the search results!