January 09, 2009

Have they broken Google Search?
It seems that Google has made a major change to how search queries work, so much so that for "power users" of Google Search, it may have effectively "broken" its search engine. Researchers, scientists, marketers and other web users rely on Google's boolean search capabilities every day.

I've been using Google since it was first in "beta," and for as long as I can remember it has always done a boolean "ANDed" search by default. This means that if you type in multiple keywords or phrases, it looks for pages that include ALL of your search terms.

Recently I've noticed some strange behavior: on some occasions, adding more search terms actually INCREASES the number of results I get, which is the opposite of what should happen. Let me illustrate with some example queries:

1. A search for auto dealer service hazmat charges estimate yielded 25,500 results. 
[For context, imagine my unpleasant surprise on getting my dealer's repair bill after a front-end alignment.]

2. A search for auto dealer service hazmat charges estimate -anatomy yielded 54,500 results.
[I kept finding the same article copied on multiple sites. The article used the word "anatomy," so I was trying to eliminate that duplicate content from my results.]

Playing around with different keywords yields this surprise:
3. auto dealer service hazmat charges estimate wrecking yields 5,320 results, while
4. auto dealer service hazmat charges estimate -wrecking yields 56,600 results.

In my understanding of boolean logic, if you add the number of results from searches #3 and #4, they should total the results from search #1, because #3 and #4 simply split the pages matching #1 into those that do and don't contain the extra term. But search #1's 25,500 results are less than half of the 61,920 you get from #3 and #4 together.
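To make that arithmetic concrete, here is a minimal Python sketch of strict boolean matching over a tiny, made-up set of pages. The page data, the `search` function, and the shortened queries are all hypothetical, and it assumes search #4 is search #1's query with the extra term excluded (`-wrecking`). It only illustrates the logic; it says nothing about how Google's engine actually works.

```python
# Hypothetical toy index (made-up pages): page name -> words on the page.
PAGES = {
    "p1": {"auto", "dealer", "hazmat", "estimate"},
    "p2": {"auto", "dealer", "hazmat", "estimate", "wrecking"},
    "p3": {"auto", "dealer", "hazmat", "estimate", "anatomy"},
    "p4": {"auto", "service", "hazmat"},  # missing "dealer" and "estimate"
}


def search(query, pages=PAGES):
    """Strict boolean AND: plain terms are required, '-term' excludes a word."""
    terms = query.split()
    required = {t for t in terms if not t.startswith("-")}
    excluded = {t[1:] for t in terms if t.startswith("-")}
    return {name for name, words in pages.items()
            if required <= words and not (excluded & words)}


base = "auto dealer hazmat estimate"    # shortened stand-in for search #1
n1 = len(search(base))                  # like search #1
n3 = len(search(base + " wrecking"))    # like search #3: extra term required
n4 = len(search(base + " -wrecking"))   # like search #4 (assumed): extra term excluded
# Every page matching #1 either contains "wrecking" or doesn't, so the counts split exactly.
assert n3 + n4 == n1
print(n1, n3, n4)  # 3 1 2
```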

What is Google up to? It's not exactly obvious.

It doesn't seem to be simply a matter of using too many search terms: the searches below (#5) show the expected refinement and reduction in results.

5. A search for auto dealer service hazmat charges estimate -out yields only 5,660 results, and auto dealer service hazmat charges estimate -for only gets 86 results.

Looking at the results from search #4 above yields at least a partial answer:  Google is including results that don't actually match the query.

One of the results returned for search #4 was this page:
Pulling up Google's cached version shows that it INCLUDES the words "auto," "service," "hazmat" and "estimate," but DOESN'T INCLUDE either "dealer" or "charges," both of which were specified in the query.

I'm guessing Google is trying to "help" the user by returning partially matching results rather than a "no results" page, which you can still get with a query like #6.

6. A search for auto dealer service hazmat charges estimate lyle123 yields a page saying "Your search - auto dealer service hazmat charges estimate lyle123 - did not match any documents."

Now for the really SAD (or FUNNY) example:
If you scroll to the bottom of the search page, click "Search within results," and then search for "dealer" within the results, what should you get? If search #4 worked predictably as a proper "AND" boolean search, you should get all 56,600 results. Given that Google is including results that don't have the word "dealer" in them, you'd expect to get a subset of the 56,600 results when you "search within results." Somehow Google finds it proper to return a mind-numbing 207,000 results!!! Yes, searching *within* 56,600 results yields nearly 4 times as many results. How's that for "search refinement"?
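A small toy model shows why "search within results" should never grow the result set under strict AND logic: refining just intersects the earlier hits with the pages matching the new term. As before, the pages and the `search`/`search_within` functions are hypothetical, a sketch of strict boolean semantics rather than Google's actual feature, and search #4 is assumed to be the original query with `-wrecking` appended.

```python
# Hypothetical toy index (made-up pages): page name -> words on the page.
PAGES = {
    "p1": {"auto", "dealer", "service", "hazmat", "charges", "estimate"},
    "p2": {"auto", "dealer", "service", "hazmat", "charges", "estimate", "wrecking"},
    "p3": {"auto", "service", "hazmat", "estimate"},  # lacks "dealer" and "charges"
}


def search(query, pages=PAGES):
    """Strict boolean AND: plain terms are required, '-term' excludes a word."""
    terms = query.split()
    required = {t for t in terms if not t.startswith("-")}
    excluded = {t[1:] for t in terms if t.startswith("-")}
    return {name for name, words in pages.items()
            if required <= words and not (excluded & words)}


def search_within(previous_results, query, pages=PAGES):
    """'Search within results': keep only earlier hits that also match the query."""
    return previous_results & search(query, pages)


previous = search("auto dealer service hazmat charges estimate -wrecking")  # like #4
refined = search_within(previous, "dealer")
# Refining can only keep or drop pages, never add new ones.
assert refined <= previous
print(sorted(previous), sorted(refined))
```

Because "dealer" was already required by the #4-style query, refining on it returns exactly the same pages in this model, which is the behavior the post expected from Google.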

If anyone can explain this strange behavior to me, I'd really appreciate it. I can imagine it may have something to do with how Google eliminates (or doesn't eliminate) "duplicate" results, but somehow that just doesn't seem to explain it all, at least to my way of thinking.

Not sure if I've uncovered something new here.  The hardcore SEO folks have probably noticed this already. I tried a few searches for things like "Google boolean broken" but then again I was using Google, so evidently I can't trust the results it brings up anymore. Sigh.


2 comments:

Bill said...

Hi Lyle,

I've been a little frustrated by this too. I expect that as I exclude more words, the number of results should get smaller. I've been seeing Google do this for a good number of months. I've been trying to understand why this is happening as well.

Here are some things I've been looking at:

1. The result counts you see are estimates based upon a small percentage of Google's index, so the number of results shown for a query is somewhat suspect to begin with. By itself, that's an unsatisfactory answer.

2. Google may be performing query broadening when they believe that there is Inadequate Search Content.

3. Google now uses Stemming when "appropriate." I'm not sure what triggers the use of stemming yet, but it's possible that it might be appropriate for one string of queries and not for others.

4. Google may be expanding queries based upon the inclusion of synonyms, within context. See: Machine Translation for Query Expansion.

There are a number of other possibilities. It's this kind of odd behavior that makes things interesting when watching what the search engines do.

Althea said...

Hi,

This is a nice post. I've been a follower of your blog.

Pls also visit http://www.thecontentannex.com.

Sincerely,
Jeremy Byrnes