Croc o' Lyle
Proving getting serious about usability can be fun...since 2001.
About
This is a place for things I find interesting or have to say related to usability, web design, information architecture and user experience practices. I sometimes just ramble about other stuff as well...

Lyle Kantrovich


Try these blogs
Bloug
Lou Rosenfeld co-wrote the book on Information Architecture
UsabilityBlog
Paul Sherman - smart guy, blogger, UPA leader
Keith Instone's blog
Keith's one of the smartest and nicest IAs I know.
Product Experience Blog
DeeDee DeMulling - good friend and UCD consultant
Elegant Hack: Gleanings
Christina Wodtke's IA thoughts with style and insight
IA Slash
Information Architecture galore
Joel on Software
Joel Spolsky has excellent sensibilities about software development
How to Change the World
Guy Kawasaki
43 Folders
GTD Productivity tips
Try these sites
Usability Professionals' Association
THE organization for folks who are serious about usability.
UXmatters
Online magazine about usability & user experience
Boxes and Arrows
Excellent online magazine about design, IA and usability
About the name

The name Croc O' Lyle comes from people at a previous job calling me "crocodile", as in the famous children's book "Lyle, Lyle, Crocodile". The nickname went from "crocodile" to "croc" and then someone morphed it into Crocolyle.

It's also a play on the phrase "Crock O' Gold" -- showing the Irish in my Heinz 57 hybrid genetics.

...and some people will probably say this whole thing is simply a crock.


January 09, 2009
Have they broken Google Search?
It seems that Google has made a major change to how search queries work. So much of a change that for "power users" of Google search, they may have just "broken" their search engine. Google's boolean search capabilities are relied on every day by researchers, scientists, marketers and other web users.

I've been using Google since it was first in "beta" and as long as I can remember, it has always done a boolean "ANDed" search by default. This means that it looks for pages that include ALL your search terms if you type in multiple keywords or phrases.
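
To make that expectation concrete, here is a minimal sketch of a default-AND search over a toy inverted index. The documents, the `build_index` and `and_search` names, and the whitespace tokenization are all illustrative assumptions; this says nothing about how Google actually implements search.

```python
from collections import defaultdict

# Toy corpus: hypothetical documents, invented purely for illustration.
DOCS = {
    1: "auto dealer service estimate",
    2: "auto dealer hazmat charges estimate",
    3: "auto service hazmat estimate",
    4: "auto dealer service hazmat charges estimate",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def and_search(index, query):
    """Return ids of documents that contain ALL terms in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Intersecting with another term's postings can only shrink the set.
        result &= index.get(term, set())
    return result

INDEX = build_index(DOCS)
broad = and_search(INDEX, "auto dealer")
narrow = and_search(INDEX, "auto dealer hazmat")
print(len(broad), len(narrow))
```

Because each extra term is applied as a set intersection, the narrower query can never return more documents than the broader one. That is the property the examples below appear to violate.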

Recently I've noticed some strange behavior in that on some occasions adding more search terms actually INCREASES the number of results I get - which is the opposite of what should happen. Let me illustrate with some example queries:

1. A search for auto dealer service hazmat charges estimate yielded 25,500 results. 
[For context, imagine an unpleasant surprise when getting my dealer's repair bill after a front end alignment.]  

2. A search for auto dealer service hazmat charges estimate -anatomy yielded 54,500 results
[I kept finding the same article copied on multiple sites - the article used the word anatomy - so I was trying to eliminate that same content from my results.]

Some playing around with different keywords yields this surprise
3. auto dealer service hazmat charges estimate wrecking yields 5,320 results, while 


In my understanding of boolean logic, if you add the number of results from searches #3 and #4, they should total the results from search #1.  But search #1's 25,500 results are less than half of the 61,920 you get from #3 and #4 together.
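
The arithmetic behind that expectation can be sketched in a few lines. One assumption is labeled loudly here: the sketch treats search #4 as search #1 with the extra keyword excluded, and takes its count to be 56,600, which is the figure quoted for search #4 further down and also what 61,920 minus 5,320 works out to. The `partition_gap` name is just a hypothetical helper.

```python
# Result counts as reported in this post.
base = 25_500          # search #1
with_term = 5_320      # search #3: search #1 plus the extra keyword
without_term = 56_600  # search #4 (assumed: search #1 minus that keyword)

def partition_gap(base_count, with_count, without_count):
    """Under strict boolean AND, pages matching a query split cleanly into
    those that contain an extra term and those that do not, so the two
    counts should add up to the base count. Return the discrepancy."""
    return (with_count + without_count) - base_count

combined = with_term + without_term
gap = partition_gap(base, with_term, without_term)
print(combined, gap, round(combined / base, 2))
```

With consistent counts the gap would be zero. Here the two halves add up to well over twice the whole.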

What is Google up to? It's not exactly obvious.

It doesn't seem to be a simple factor of using too many search terms, as the searches below (#5) show the expected refinement and reduction of results. 

5. A search for auto dealer service hazmat charges estimate -out yields only 5,660 results, and auto dealer service hazmat charges estimate -for only gets 86 results.

Looking at the results from search #4 above yields at least a partial answer:  Google is including results that don't actually match the query.

One of the results returned for search #4 was this page:
Pulling up Google's cached version shows that it INCLUDES the words auto service hazmat estimate, but DOESN'T INCLUDE either "dealer" or "charges", which were specified in the query. 

I'm guessing Google is trying to "help" the user by returning partially-matching results rather than a "no results" page, which you can still get if you do something like #6.

6. A search for auto dealer service hazmat charges estimate lyle123 yields a page saying "Your search - auto dealer service hazmat charges estimate lyle123 - did not match any documents."

Now for the really SAD (or FUNNY) example:
If you scroll to the bottom of the search page and click "Search within results" and then search for "dealer" within the results, what should you get?  If search #4 worked predictably as a proper "AND" boolean search, then it should be all 56,600 results.  Given that Google is including results that don't have the word "dealer" in them, you'd expect to get a subset of the 56,600 results when you "search within results."  Somehow Google finds it proper to return a mind-numbing 207,000 results!!!  Yes, searching *within* 56,600 results yields nearly 4 times as many results.  How's that for "search refinement?"
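
Why should a search within results never return more than the original set? A small sketch, using a hypothetical `search_within` helper and made-up page texts: filtering can only keep or discard items from the list it is given.

```python
def search_within(results, term):
    """Keep only the results that contain the term as a whole word
    (case-insensitive)."""
    needle = term.lower()
    return [page for page in results if needle in page.lower().split()]

# Hypothetical result texts, invented for illustration.
results = [
    "auto dealer service hazmat charges estimate",
    "auto service hazmat estimate",
    "dealer charges for hazmat disposal",
]
refined = search_within(results, "dealer")
print(len(refined), len(results))

# Counts reported in this post: 56,600 before refining, 207,000 after.
ratio = 207_000 / 56_600
print(round(ratio, 2))
```

A true filter keeps the ratio at or below 1. The reported counts give a ratio of about 3.66.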

If anyone can explain this strange behavior to me, I'd really appreciate it.  I can imagine it may have something to do with how Google may eliminate (or not eliminate) "duplicate" results, but somehow that just doesn't seem to explain it all in my train of thinking.

Not sure if I've uncovered something new here.  The hardcore SEO folks have probably noticed this already. I tried a few searches for things like "Google boolean broken" but then again I was using Google, so evidently I can't trust the results it brings up anymore. Sigh.




Comments:
Hi Lyle,

I've been a little frustrated by this too. I expect that as I exclude more words, the number of results should get smaller. I've been seeing Google do this for a good number of months. I've been trying to understand why this is happening as well.

Here are some things I've been looking at:

1. The result counts you see are estimates, based upon a small percentage of Google's index, so the number shown in response to a query is somewhat suspect to begin with. By itself, that's an unsatisfactory answer.

2. Google may be performing query broadening when they believe that there is Inadequate Search Content.

3. Google now uses Stemming when "appropriate." I'm not sure what triggers the use of stemming yet, but it's possible that it might be appropriate for one string of queries and not for others.

4. Google may be expanding queries based upon the inclusion of synonyms, within context. See: Machine Translation for Query Expansion.

There are a number of other possibilities. It's this kind of odd behavior that makes things interesting when watching what the search engines do.
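
As a rough illustration of how the broadening described in points 2 through 4 could raise a result count, the sketch below compares exact matching with matching after a deliberately crude suffix-stripping step. The `crude_stem` function, the documents, and the matching rule are all hypothetical; nothing here reflects how Google actually implements stemming or query expansion.

```python
def crude_stem(word):
    """Very rough stemmer: strip a trailing "ing" or "s".
    Hypothetical stand-in, not a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(doc, query, stem=False):
    """True if every query term appears in the document, optionally
    comparing stemmed forms instead of exact words."""
    norm = crude_stem if stem else (lambda w: w)
    doc_words = {norm(w) for w in doc.lower().split()}
    return all(norm(t) in doc_words for t in query.lower().split())

# Hypothetical documents, invented for illustration.
DOCS = [
    "dealer charges for hazmat disposal",
    "dealers charge hazmat fees",
    "auto service estimate",
]
QUERY = "dealer charges hazmat"

exact = sum(matches(d, QUERY) for d in DOCS)
stemmed = sum(matches(d, QUERY, stem=True) for d in DOCS)
print(exact, stemmed)
```

If broadening like this is switched on for some queries and not others, two closely related queries could report counts that no longer obey strict AND logic.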
 
 