A couple of months ago, Brad came to me to discuss problems with the search on our site. Since our site features a ton of text, it’s imperative that users can find documents quickly and easily. Until recently we had been using MySQL’s FULLTEXT feature, which I’ve covered extensively in articles and at conferences, but FULLTEXT simply wasn’t scaling the way we needed it to. So I set out looking for something that would.
I think it’s rather telling that MySQL themselves don’t use this feature to index their own site; they use Mnogo Search instead. While looking for a solution, I decided against open source options such as ht://Dig and Mnogo Search for two reasons:
- We would still, essentially, be rolling our own search solution, which would have to be maintained and supported internally. With only two full-time developers (including myself), this simply wasn’t an option. We needed a “plug and play” solution.
- Support is non-existent for ht://Dig. It is available for Mnogo Search, but that would still have left us installing, configuring and supporting the hardware ourselves.
This left us with two options: a hosted solution (e.g., atoms.com) or an appliance (e.g., the Google Search Appliance). In the end we went with the GSA GB-1001 for a few reasons:
- Even with its hefty $30,000 (USD) price tag, it was still cheaper than a hosted solution.
- It’s supported for two years, and after the two years are up we still get to keep the hardware.
- Since we host it internally, we can quickly change the XSLT stylesheets and other settings without having to call up an ASP (application service provider) to make the changes.
So the GSA arrived and there was much rejoicing, until we booted up the machine and discovered that one of its hard drives was dead. Google says it’s perfectly okay if a single drive fails and that they normally don’t replace a GSA over a single dead drive. Okay, that’s fine once it’s in service, but this thing arrived with a dead hard drive. For that much money, I’d expect it to arrive in pristine working condition.
To top off my frustrations, the box locked up twice within a 24-hour period. Obviously, I wasn’t putting this thing into production anytime soon. It turns out the lock-up is a known issue with a working patch. I flat out asked my Google rep why the box wasn’t shipped with the patch, and they said it was because the issue only affected some GSAs. Great. In the end, we shipped our GSA back after they sent out a replacement.
The second one arrived with four working hard drives but also suffered from the lock-up issue, which Google quickly patched by logging in to the box via SSH. SSH? Yes, the Google Search Appliance runs Red Hat Linux.
So what exactly does the GSA run on? Here are the specs:
- Quad 2.66GHz Intel Xeon
- 12GB of RAM
- Five 250GB Western Digital EIDE drives (two 250GB RAID1 mirrors on a 3ware ATA RAID card, plus one hot spare)
In other words, it’s a really beefy Linux box (in a really dorky-looking yellow case). Of course, the box is locked down so you can’t look inside, and I wasn’t about to void our support and warranty by opening it up.
So how does it perform? After much tweaking of the XML interface, it’s pretty amazing. According to Google, the GB-1001 can index 500,000 documents and handle 300 queries per second.
Setting aside the initial problems, I have two major complaints about the GSA. The first is that support is inadequate: for $15,000 per year, I expect better than email-only support, Monday through Friday, during business hours. The second is that it doesn’t ship with a SOAP interface. The main Google site has one, so why doesn’t the GSA? Sure, I can get the response back in XML, but a SOAP interface would have been much appreciated.
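For anyone wondering what working with the XML response looks like in place of SOAP, here is a minimal Python sketch that builds a query URL and pulls ranks, titles and URLs out of a result document. The query parameters (`output=xml_no_dtd`, `client`, `site`) and element names (`GSP`, `RES`, `R`, `U`, `T`, `S`) are assumptions based on the GSA’s XML output format as commonly documented; the host name `gsa.example.com` and the sample response are hypothetical. Check your appliance’s own documentation before relying on any of them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query_url(host, query, start=0, num=10):
    # Assumed GSA query parameters; verify against your appliance's docs.
    params = {
        "q": query,
        "output": "xml_no_dtd",        # assumed: raw XML results, no DTD reference
        "client": "default_frontend",  # assumed default front end name
        "site": "default_collection",  # assumed default collection name
        "start": start,                # offset of the first result
        "num": num,                    # number of results to return
    }
    return f"http://{host}/search?{urlencode(params)}"


def parse_results(xml_text):
    # Each <R> element is assumed to be one hit: N = rank, U = URL,
    # T = title, S = snippet.
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "rank": int(r.get("N")),
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        })
    return results


# Hypothetical sample response, trimmed to the elements parsed above.
SAMPLE = """<GSP VER="3.2">
<Q>mysql fulltext</Q>
<RES SN="1" EN="2">
<M>2</M>
<R N="1"><U>http://www.example.com/a.html</U><T>First page</T><S>Some snippet</S></R>
<R N="2"><U>http://www.example.com/b.html</U><T>Second page</T><S>Another snippet</S></R>
</RES>
</GSP>"""

if __name__ == "__main__":
    print(build_query_url("gsa.example.com", "mysql fulltext"))
    for hit in parse_results(SAMPLE):
        print(hit["rank"], hit["title"], hit["url"])
```

Without a SOAP interface, this sort of plain HTTP request plus XML parsing is the glue you end up writing yourself.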
Other than those two issues and a few minor quirks, I give the GB-1001 a high score. If you’re simply doing a site-wide search, I don’t think you’ll find a more brain-dead simple solution.