It’s inevitable. That home-brew, built-from-scratch search engine code will start to show its age sooner or later, and after many years and a good run, mine had. I ran a SQL Profiler trace on MSSQL 2005 and found several hundred searches daily that took over 10 seconds to execute. I don’t know about you, but if my search doesn’t come back in a few seconds, I go elsewhere. I’d been farting around with the idea of integrating a Google Mini appliance for a few years, and I figured its time may have finally come.
Google Mini Features
We went with the top-of-the-line Google Mini appliance, which can index 300,000 pages, for a retail price of ~$10,000 US. The website doesn’t show much about the actual physical hardware besides pointing out that this 10k gem has NO RAID support. For companies that need mission-critical availability but can’t fork over the minimum to get into the real Google Appliance offering, they suggest buying 2 Google Minis. Uhm, what? OK, so I dropped 10k on this server and I can’t throw in a $150 RAID card and another hard drive? That’s just annoying.
Auto Spell Checking / Suggestions
The appliance has some kind of black magic spell check thing going on, which isn’t too horribly helpful. There seems to be no way to add to the dictionary, which is pretty annoying. Even more annoying is the fact that even if you supply synonyms for words, it WILL NOT give you the suggestion every time. In my testing I had to perform the exact same search with my misspelling 4-5 times before it gave me the synonym suggestion. As quoted from Google:
“The Google Mini uses sophisticated algorithms to generate spelling suggestions from the content in your index. It is especially good at providing relevant spelling suggestions for proper nouns such as the names of employees and product names. The spelling system is fully automated; so it is not possible to manually edit the spelling dictionary.”
Query Expansion / Keyword Stemming
This is kind of a bummer. The Mini doesn’t support this feature. Ideally, as an admin, I should be able to say “Hey Mr. Appliance, I want you to consider these 3 words as equals”. There are many cases where I want to serve relevant search results based on a small set of keywords instead of having the results scattered across multiple keywords. For instance, I had a ton of content that related to “Tonnea Covers”, “Truck Bed Covers” or “Tonno Covers”. All 3 are the exact same thing, yet it’s possible that if the visitor searches for one, they won’t get the content behind the others. Of course there are ways “around” this, but ideally I should be able to tell the appliance these terms are the same and interchangeable.
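One of those ways “around” it is to expand the query yourself before it ever reaches the appliance. The following is only a rough sketch: the EQUIVALENT_TERMS table and the expand_query function are made-up names for illustration, and it assumes the appliance honors an uppercase OR between quoted phrases in the query string, which you should verify against your own box.

```python
# Hypothetical table of terms the site treats as interchangeable.
# Each set is one group of "equal" search phrases (stored lowercase).
EQUIVALENT_TERMS = [
    {"tonnea covers", "tonno covers", "truck bed covers"},
]


def expand_query(query, groups=EQUIVALENT_TERMS):
    """Return the query OR-ed with every phrase in its equivalence group.

    Queries that do not exactly match a known phrase are returned unchanged.
    """
    # Normalize case and whitespace so "Tonno  Covers" still matches.
    normalized = " ".join(query.lower().split())
    for group in groups:
        if normalized in group:
            # Quote each phrase so multi-word terms stay together,
            # and sort so the output is deterministic.
            return " OR ".join('"%s"' % term for term in sorted(group))
    return query


if __name__ == "__main__":
    print(expand_query("Tonno Covers"))
    print(expand_query("floor mats"))
```

The trade-off is that the equivalence table lives in the presentation code instead of on the appliance, so it has to be maintained by hand.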
Synonyms
This is a way for us to suggest another term to search for based on a keyword in the original search. This is *kind* of a workaround for the Keyword Stemming issue, but the gist is that you’re still forcing the visitor to click around to see the other search results instead of serving them up all in one group the first time.
Meta Data Fetching / Searching
This is actually a pretty sweet feature. We can expose meta data on our pages, filter our search results by it, and also fetch the full meta data itself. This would allow us to cache data right on the appliance and pull it through on the presentation side instead of being forced to do a lookup per data row at presentation time.
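To make that concrete, here is a minimal sketch of building a search request that both filters by meta data and asks for the meta data back. The parameter names (getfields, requiredfields, output=xml_no_dtd, site, client) are as I recall them from Google's search protocol documentation, so treat them as assumptions and check them against your appliance's docs, including how it expects meta tag values to be encoded. The host name, the collection and front end values, and the build_search_url function are all hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host, query, meta_filters=None, fields="*"):
    """Build a search URL that filters by meta tags and returns them.

    host         -- appliance host name (hypothetical in this example)
    query        -- the visitor's search terms
    meta_filters -- dict of meta tag name -> required value
    fields       -- meta tags to return; "*" is assumed to mean "all"
    """
    params = {
        "q": query,
        "site": "default_collection",   # assumed collection name
        "client": "default_frontend",   # assumed front end name
        "output": "xml_no_dtd",         # ask for raw XML results
        "getfields": fields,
    }
    if meta_filters:
        # Assumption: name:value pairs joined with "." are AND-ed together.
        params["requiredfields"] = ".".join(
            "%s:%s" % (name, value)
            for name, value in sorted(meta_filters.items())
        )
    return "http://%s/search?%s" % (host, urlencode(params))


if __name__ == "__main__":
    print(build_search_url("mini.example.com", "truck bed covers",
                           {"category": "covers"}))
```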
Integration Options for the Google Mini
There are basically 2 different mechanisms that we can use to integrate with the beast:
Fully Formatted Search Output From the Appliance
XML Search Output
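For the XML route, the presentation side boils down to parsing the result document and rendering it yourself. Below is a minimal sketch under stated assumptions: the sample XML is hand-written for illustration (not captured from the appliance), and the element names (R for a result, U for the URL, T for the title, S for the snippet, MT for a meta tag with N and V attributes) are as I remember them from Google's XML results format, so verify them against real output from your box. The parse_results function is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed shape of the appliance's XML output.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>tonno covers</Q>
  <RES SN="1" EN="1">
    <M>1</M>
    <R N="1">
      <U>http://www.example.com/covers/1</U>
      <T>Truck Bed Covers</T>
      <S>Hard and soft covers for most trucks.</S>
      <MT N="category" V="covers"/>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Turn the XML search output into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
            # Meta tags come back as name/value attribute pairs.
            "meta": {mt.get("N"): mt.get("V") for mt in r.findall("MT")},
        })
    return results


if __name__ == "__main__":
    for hit in parse_results(SAMPLE_XML):
        print(hit["title"], hit["url"], hit["meta"])
```

Going this way means more work than letting the appliance format the page, but the results arrive as plain data that can be laid out however the site needs.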
Issues / Bugs with the Google Mini Search Appliance
Broken Preview Feature and Images
OK, it’s probably not a good sign when the first thing you see when you log into the appliance after setup is a bunch of broken images. It wasn’t a huge deal... but after dropping 10 large on a machine I would hope for more. There is also a preview section that allows you to modify the XSL stylesheet and see the result before saving; that was just simply broken.
Temporary server error. Try again in a minute.
Just over a month out of the box, we started having issues. We’re indexing only a fraction of our max page limit and I would summarize our overall search load as “low”. We started getting the following ominous error: “Temporary server error. Try again in a minute.” The Mini now comes with tech support automatically, so it was time to get them in the loop to see what the deal was. Now, if you’re one of those shops that didn’t pay for the super-uber tech support, get ready to wait. Support was only via email and took about a day per email to work through things. All said and done, the tech thought we were experiencing a documented “Issue” and recommended we update our OS software and server software to the latest versions. Now here’s where things start to get fun. There are 2 different types of updates: one for the Linux OS they have running and another for the actual search engine software running on it. As it turns out, the Mini stopped spitting out the error, so we decided to address it later should it arise again. Little did we know what was in store!
Google Mini Appliance Crash / Dead In the Water
About a month after the issue above, we had a hard crash. The appliance was horribly slow on the admin side and only 10% or less of the search requests were actually getting fulfilled. Those that WERE getting fulfilled were very slow (10-45 seconds each). After a full reboot, it was fine... for 1 minute... then the same crap started again. At this point we decided to push forward with our full OS/server software updates. That took about 2 hours all said and done, and all issues were finally resolved after that.
coming from the XML output from the box.