How Hackers Are Using Google To Pwn Your Site

October 31st, 2007 admin Posted in Web Development

As most of you know, my site was hacked a few months back. What many people don’t know is that this was actually the first of two times the box was hacked. The first time, I had made the mistake of leaving the web files on the server writable by the web server. Since this server (the one my blog sits on) is used for hardly any commercial activity, I was a lot less security focused than I would be on something I would call “production” ready. Afterwards I implemented mod_security and some other logging tools, and offloaded the server logs to a different server (yeah, the logs were owned by the apache user too).

So basically, when I got owned, the person found a web-accessible file on my server through which he could execute commands as the web user. Because the site files and log files were owned by this user, he could write to them and even delete them. Lucky for me, this guy just wanted to put up his Turkish political statement and try to infect visitors with his virus. So all he did was search the box for any index.* files and copy his index file over them. Then he deleted all files matching *log.

So it was pretty obvious how the person did it, but I was not sure which file was the hole in my system. This is the point where you have to weigh catching the hacker against running a box that has been compromised. Since I really only have blogs and a few low-traffic forums running on this box, I thought it would be a good chance to see what was vulnerable.
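
The permissions mistake described above (web files that the web server user can write to) is easy to check for. Here is a minimal sketch in Python, not something I actually ran on the box; the function name and the idea of passing in the web user's numeric uid are my own assumptions. It flags regular files that are owned by that uid or are world writable.

```python
import os
import stat


def find_risky_files(root, web_uid):
    """Return regular files under root that the web server user could modify.

    A file counts as risky if it is owned by web_uid (the owner can always
    change its permissions) or if it is world writable. Group permissions
    are ignored here to keep the sketch short.
    """
    risky = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets and other special files
            owned_by_web_user = info.st_uid == web_uid
            world_writable = bool(info.st_mode & stat.S_IWOTH)
            if owned_by_web_user or world_writable:
                risky.append(path)
    return sorted(risky)
```

You would call it with your document root and the uid your web server runs as, then change ownership or permissions on whatever it reports.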

So I installed mod_security and ran it pretty hardcore. Over the next couple of weeks I learned more about adjusting its rulesets to allow possibly exploitable code but log it. Nothing happened for many weeks, then one morning I got a page that my box was not responding. I quickly connected to my remote server via its DRAC (Dell Remote Access Controller) card. The DRAC card lets me take control of the server as if I were sitting right in front of it. I could see the box was sitting in a “kernel panic” state and that it had crashed. I rebooted the box remotely but kept most services down so I could investigate what had happened.

Sure enough, I figured out that the hacker had been back and had downloaded some files to the /tmp directory (which was world writable). Only this time I had changed ownership of all index.* files so he could not write to them. I guess he realized that in order to take over my web server he was going to need to be a bit more aggressive, so he downloaded a rootkit to my /tmp directory and tried to run it. Fortunately for me, that made the kernel panic and left the server in a frozen, crashed state.

I was able to figure this out, and also exactly which file he used to execute commands on my box, very quickly because it was pretty much the last thing in the web logs before the box crashed. (yay!)

So now here is where it gets interesting… Now that I had figured out how the person was hacking into my box, I was curious how in the hell he found the file. It was in a subdirectory that I had not used in YEARS. There was no link to it from anywhere on my site. The directory structure it was in was like … html/oldforums/oldstuff/badfile.php. How in the hell did this person find this file? Well, after going through the logs, grepping for the IP range that hacked my box, I found that the person found my site from Google! Specifically, using Google Code Search. Now while this was interesting, it still did not explain how the page was even indexed… ohh wait, I use Google Sitemaps and I had it set to index everything (the default setting). OOPS!!
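
The log digging described above was just grep, but the same idea can be sketched in Python. This is a rough illustration that assumes Apache's "combined" log format; the regular expression and the function name are mine, not anything from my actual setup. It pulls out every request from a given IP range along with its referrer, which is how the search referral showed up.

```python
import ipaddress
import re

# Apache "combined" format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_from_network(lines, network):
    """Yield (ip, request, referer) for log lines whose client is in network."""
    net = ipaddress.ip_network(network)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a combined-format line
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # the log holds a hostname instead of an address
        if ip in net:
            yield match["ip"], match["request"], match["referer"]
```

Feed it an open log file and a range written in CIDR form, then look at the referrer column of whatever comes back.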

Now to be honest… this is my fault. I in no way blame Google whatsoever. I had old exploitable code on my server and I told Sitemaps to index it, so… my fault.

I have since been working with the Sitemaps team, and I made some suggestions: leave some files off by default (like .inc and .func), or only allow common web files with extensions like .php, .html, .asp, etc. I hope they do this, cause as Sitemaps gets more popular it’s only going to expose more idiot webmasters like me who run with the default settings.
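
To make that suggestion concrete, here is a small Python sketch of an extension allowlist. It is only an illustration of the idea, not how Sitemaps works; the function name is made up and the allowlist is just the three extensions mentioned above.

```python
from pathlib import PurePosixPath

# Assumption: the "common web files" named in the text above.
ALLOWED_EXTENSIONS = {".php", ".html", ".asp"}


def sitemap_candidates(paths, allowed=ALLOWED_EXTENSIONS):
    """Keep only the paths whose file extension is on the allowlist."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in allowed]
```

Note that an allowlist like this keeps .inc and .func files out of the index, but it would not have saved me from a forgotten .php file. Old code still has to be removed from the server.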

OK, so just for shits I thought I would run some queries on Google Code Search to see what kind of exploits I could find. Keep in mind this probably will not show your site, but it will show code and versions that you might be running… so once someone locates an exploitable version of some code, they could then just search for “Powered By X” or whatever fingerprint the exploitable program/version puts on a site.

Hmm, I wonder if we could find some XSS exploits…

lang:php (ECHO|PRINT) .*\$_(GET|POST|COOKIE|REQUEST|FILES)

100,000+ results

How about some SQL injection exploits?

lang:php query\(.*\$_(GET|POST|COOKIE|REQUEST|FILES).*\)

3000 results
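
The two searches above can just as well be pointed at your own code before someone else does it for you. Below is a rough Python sketch that applies the same two patterns line by line. The labels and the function name are my own, and a match only means "go look at this line"; it does not prove the line is exploitable.

```python
import re

# The same two patterns as the searches above, written for Python's re module.
PATTERNS = {
    "possible XSS": re.compile(
        r"(?i:echo|print) .*\$_(GET|POST|COOKIE|REQUEST|FILES)"
    ),
    "possible SQL injection": re.compile(
        r"query\(.*\$_(GET|POST|COOKIE|REQUEST|FILES).*\)"
    ),
}


def scan_source(text):
    """Return (line_number, label) pairs for lines that match a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

Run it over every PHP file in your web root and review each hit by hand.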

Hrmm, I wonder how easy it is to find the host, user, and pass for MySQL databases… Let’s try:

lang:php mysql_connect\(("|')[a-zA-Z0-9_.]+("|'),("|')[a-zA-Z0-9]+("|') -localhost -127.0.0.1 -192.168

100 results found.

This query might be a little puzzling for those who are not Google ninjas like me, so I will explain. Basically, lang:php limits the search to PHP files. Then we search each file for mysql_connect. If it contains that, we look for the pattern of a connection string with a hard-coded host and user. Lastly, we use the minus sign to throw out results that mention localhost, 127.0.0.1, or 192.168 addresses (cause we can’t reach those databases from outside).
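
The same check can be run against your own code. Here is a minimal Python sketch of the query explained above; the function name is made up. One difference to be aware of: the search drops any result that mentions a local address anywhere, while this sketch only looks at the host inside the mysql_connect call.

```python
import re

# Host and user as quoted string literals, same character classes as the query.
CONNECT = re.compile(
    r"""mysql_connect\(["']([a-zA-Z0-9_.]+)["'],["']([a-zA-Z0-9]+)["']"""
)

# Prefixes treated as local, mirroring the three minus terms in the query.
LOCAL_HOSTS = ("localhost", "127.0.0.1", "192.168")


def remote_db_hosts(text):
    """Return (host, user) pairs for hard-coded connections to non-local hosts."""
    found = []
    for host, user in CONNECT.findall(text):
        if not host.startswith(LOCAL_HOSTS):
            found.append((host, user))
    return found
```

It deliberately returns only the host and user, so the report itself does not become one more place where passwords leak.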

So did we find anything interesting? Well…

Let’s just look at the first 10 results:

www.ubio.org/downloads/XID.TAR.gz - Unknown License - PHP
connect.php

$connection = mysql_connect("RANSOM","GlobalWebUser","goober8") or die("Couldn't connect.");
$db_name = "dwf";

Now in this case RANSOM is probably a local box…

Ohh, what’s this:

$f = mysql_connect("zeus.mbl.edu","tns","");
if (empty($limit)) $limit=50;

Hrmm, interesting…

more?

$db=mysql_connect("62.149.150.11","Sql43254","M9dKTz3M");
$selezione=mysql_select_db("Sql43254_4", $db);

I could post tons of other examples, but I think I have made my point. Watch your logs for people coming from Google Code Search, and always make sure you’re running the latest version of your software.

Also keep in mind my searches were only looking for .php files. That is a small percentage of all the different languages and file types out there.

Be scared. Be very scared.

By Shoemoney


Robots.txt Files: Fence off Sections of Your Web Site from Search Engines.

September 12th, 2007 admin Posted in Web Development

Do you ever forget to do something that’s really simple? It’s easy to overlook some of the simple things when you’re worried about the more complex issue of Search Engine Optimization (SEO) or getting traffic to your website, etc. Robots.txt files fall into that category. Do you even have a robots.txt file on your site? It’s very simple and can help with your site’s ranking in the search engines a couple of ways.

What is a robots.txt file?
A robots.txt file is a small text file that you place in the root directory of your web site. You can list directories that robots (search engine spiders) should not visit. You can get specific if you’d like and specify different things for different robots (search engines) by targeting specific user-agents, but generally that’s not necessary.

Here’s a sample robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/

Some Rules:

  • Specify one subdirectory per line.
    • The above example would stop robots from crawling the cgi-bin, print-friendly, and ~john directories.
  • You can only have one robots.txt file and it has to be in the root directory of your site.
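
One way to confirm that the sample file above does what the rules say is to run it through a parser. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the agent name ExampleBot are illustrative assumptions, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /print-friendly/
Disallow: /~john/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a well-behaved robot may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

With the sample rules, a path under /cgi-bin/ or /print-friendly/ is reported as blocked, while an ordinary page elsewhere on the site is allowed.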

Other ways to do it:
There is also a META tag that has just about the same meaning.
Use this meta tag in the header of a page you don’t want crawled (indexed by a search engine).

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Important:
Because not all robots support or respect the robots.txt file or the META tag, your best bet is to use both.
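
If you rely on the META tag, it helps to verify that your templates really emit it. Below is a small sketch using Python's standard html.parser module that reads the robots directives out of a page. The class and function names are hypothetical, and this only checks the markup; it says nothing about whether a given robot honors it.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for part in attributes.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives
```

A printer friendly page that carries the tag shown above should yield both noindex and nofollow; a page without the tag yields an empty set.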

SEO:
You may be wondering exactly how this could affect SEO (Search Engine Optimization). The biggest way is with duplicate content. Search engines do not want to find duplicate content. If you have a printer friendly version of your blog entries, then you want to stop the duplicate printer layout versions of your entries from being crawled. I have my templates and publishing parameters set up to put all the printer friendly pages in a specific folder, and I list this folder in my robots.txt file. I also configured the publishing template for the printer friendly pages to use the META tag shown above. This stops most search engines from indexing the printer friendly versions of the pages and therefore eliminates a possible problem caused by having duplicate content.

The second way this helps is to stop the printer friendly pages from showing up in the search engine result pages at all. I want people finding my site by searching to land on the regular versions of my pages, not the printer friendly versions. My printer friendly template strips off the left and right columns and therefore removes most navigation. By only having the regular pages listed in the search engines the experience of a visitor to my site is better.

Of course there are other reasons to stop certain subdirectories from being crawled. You may have products such as ebooks, training videos, or scripts and test pages that you do not want showing up in the results of a search. Because the robots.txt file is not respected by every spider crawling around out there, you should always secure sensitive data in subdirectories that are password and username protected.

The robots.txt file is just a small, easy to create text file, but small things like this can add up to make a big difference.

Learn more about robots.txt files here: www.robotstxt.org/wc/robots.html.

Fred Black


Author Adsense Wordpress Plugin

September 10th, 2007 admin Posted in Web Development

This plugin allows blog authors to enter their Google Adsense Publisher ID and have ads displayed on their own posts, generating revenue. What this means is that blogs with multiple authors can now divide revenue on the site on a fixed ratio, since the authors’ Adsense ads will be loaded as well.

Installing the plugin is easy. You can set the admin publisher ID, and individual authors can set their IDs in WP-Admin itself, which ensures that there is no messy editing of files.

Admins can set the ratio of how much their ads are displayed to that of author ads. However, one feature lacking is to customize this ratio by user. Currently you’re stuck with a fixed ratio, so you can’t give your better performing users higher preference over others.
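
To illustrate what a fixed ratio means in practice, here is a minimal Python sketch of picking between the admin's and the author's publisher ID on each page view. This is not the plugin's code (the plugin is written for WordPress and its internals are not described here); the function name, the admin_share parameter, and the placeholder IDs in the usage note are all assumptions.

```python
import random


def pick_publisher_id(admin_id, author_id, admin_share, rng=random):
    """Choose whose ad to show for one page view.

    admin_share is the fraction of impressions (0.0 to 1.0) that should go
    to the admin; the rest go to the post's author.
    """
    if not 0.0 <= admin_share <= 1.0:
        raise ValueError("admin_share must be between 0 and 1")
    return admin_id if rng.random() < admin_share else author_id
```

For example, pick_publisher_id("pub-admin", "pub-author", 0.3) would show the admin's ad on roughly 30% of views. A per-user ratio, the missing feature mentioned above, would amount to looking up admin_share by author instead of using one global value.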
