The other day I was working on my file server after a couple of hard drives failed within about an hour of each other. Bizarre and unfortunate as it may sound, I was not in too much of a panic, as I had the two drives mirrored onto another large hard drive daily using Delta Copy. While I could potentially have lost nearly 200GB of data on the two drives, the fact that they were mirrored meant that I only lost around 10GB of files, mainly DVD rips (I will have to rip them again).
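Delta Copy is essentially a Windows front end for rsync, so the nightly mirror job boils down to something like the command below. The paths here are only placeholders for illustration; the real job points at whatever your data and backup drives are.

    # Mirror the data drive onto the backup drive;
    # --delete removes files on the backup that were deleted at the source
    rsync -a --delete /mnt/data/ /mnt/backup/data/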
To cut a long story short, I was waiting for Windows to install on my server when I noticed that the response time for my web server seemed to have increased considerably. While it is not the fastest web server around, it has been doing its job well for nearly five years with a few reboots in between. Usually it responds to web requests quite speedily, but for some reason it was taking its time. For a very brief period I felt a little panicked about the possibility of the server’s main hard drive failing, but after seeing half of my website load I figured something else was up.
I ran netstat, which showed quite a few open connections, and on reading the server's log file live (via tail -f) I realised that not only did I have Googlebot crawling my site, I also had Slurp from Yahoo and another bot from Webalta, a Russian domain, all crawling over my web server at the same time. This kind of crawling gave my web server the creeps.
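Roughly speaking, the sort of thing I was doing looked like this; the access log path will depend on your own setup.

    # Count established connections to the web server on port 80
    netstat -an | grep ':80 ' | grep ESTABLISHED | wc -l

    # Follow the access log and pick out the crawlers by user agent
    tail -f /var/log/apache2/access.log | grep -Ei 'googlebot|slurp|webalta'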
Can’t these bots at least work out a roster for who crawls which site at any given time? What is the point of being indexed by the search engines if no one can reach your site without patiently waiting for nearly a minute?
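In the meantime, the best I can do is ask the better-behaved bots to slow down via robots.txt. Yahoo's Slurp honours the Crawl-delay directive (Googlebot ignores it; its rate has to be adjusted through Google's webmaster tools instead), so something along these lines might at least stagger the visits. The delay value here is just a guess, not a recommendation.

    # robots.txt - ask crawlers that support Crawl-delay to pace themselves
    User-agent: Slurp
    Crawl-delay: 10

    User-agent: *
    Crawl-delay: 10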