28 June 2011
In the last letter I promised not to mention Google for a bit. Well, I have more to say, so as a compromise I decided to write a one-off Google special, plus my usual monthly rantings, which will appear shortly.
Before I start: sometimes I use the word Google to mean the company, and at other times to mean the search algorithm(s). I am sure you can work out which is which.
Right, cutting to the chase, I strongly suspect that Google has, or is developing, eyes, or should I say 'Panda Eyes'. In other words, I suspect its engineers are applying machine learning to what a site really looks like, using images of its pages.
If you have read any of my other pages, you will probably have deduced that I can, on occasion, be as mad as a box of frogs. But this time, please bear with me; by the time I finish, you might end up thinking there is at least a grain of truth in what I am saying.
I know that I will have missed some things and also got others slightly wrong, but the 'signals' all point one way.
Google recently introduced a new search algorithm, which they nicknamed 'Panda'. As a user you probably didn't notice much difference, but your search results became much cleaner. It was a great improvement, removing a lot of spam results. Many sites saw their traffic fall, and for the majority of them it was richly deserved. Conversely, some sites benefited from the change, and mine certainly did :).
Since this release there has been a lot of turmoil within the website world. Guesses and speculation are rife about how it all now works.
One thing I believe many people get wrong is that they think of Google's search results as being produced by one major program. It is not like that. There are many different threads running, each doing its own thing.
The majority run the whole time, sussing out different things about webpages, but a few others are run now and then at the push of a button. The suite of programs is always being tweaked, and maybe not always by humans, as we will discover.
I have recently started paying attention to optimizing my sites to get more traffic, or at least to retain what I have. I recently posted on the Google Webmasters forum asking why my video sitemap was not being actioned. The problem is still not resolved, but I did receive a reply from a Google employee suggesting:
"You might try moving the location of the video on the page. When I look at your pages the video is off to the bottom some - not great placement. Could you move the video up on the page - that might help"
Now, this was good general advice which I followed. As I suspected, it made no difference to my problem, but it started me thinking:
"What if part of the recent changes mean that Google is now 'seeing' pages?"
I am sure it was not the way the comment was meant, but it planted a seed.
Hot on the heels of this rather bizarre thought, Matt Cutts, whom I spoke about in my June post, was interviewed on a radio show, and the interview was covered by Search Engine Roundtable. The article quotes Matt as stating:
"Think about something like an Apple product, when you buy an Apple product you open it up, the box is beautiful, the packaging is beautiful, the entire experience is really wonderful."
with the interviewer summarizing:
"Google wants the web to look like a brand new Apple iPhone and he wants every entrepreneur to think as hard of the look of their site as hard as they do about the service they provide."
This caused a minor ripple of speculation about what it meant and how Google could accomplish it. Now, there is no doubt that many signals can be picked up from the page markup, images, etc. But to appreciate an Apple product you have to touch it and see it! Google's recent push for fast-loading webpages could be part of the search for slickness, but it doesn't seem to be what Matt is indicating.
Plus, I have noticed that in recent videos on the Google Webmaster Central channel, Matt is gently pushing for pages that 'look good' and 'engage'.
The seed of thought had begun sprouting.
So, let's follow this thought process along a bit. To spot what looks cool, apart from analysing the code and the images within the page, Google would need to take an image of the whole page and see what it looks like.
Oh blimey! Guess what emerged a while back? Google Instant Preview, which is available from the spyglass on the search results page. This means a lot of extra processing work and storage. Creating and storing that image could easily have doubled the internal storage needed for the raw data alone: page, images, etc.
As an aside, I am not sure how the preview image is created. The obvious thought is that it is a 'screenshot' taken while the spider is crawling the page. But thinking about it, maybe the shot is taken after the page is recreated on Google's servers from the downloaded page, CSS, images, etc. If it were done this way, there might be a lot more for Google to learn about the site, and less overhead if an item or two changed? But I digress...
Anyway, that is a lot of work and machine storage just to show you a preview picture. Don't you think I am right about this?
Now my rather mad idea had started developing a leaf or two. Plus, in the last few weeks Google have added a tool to Google Webmaster Tools that lets webmasters preview the preview screen snapshot, if that makes sense :).
At this point in the thought process I was mulling over whether it was possible to "see" the internet. I thought of the recent privacy kerfuffle when Facebook introduced facial recognition for uploaded pictures. The idea was that, for example, the guests in your party picture would automatically be tagged if Facebook recognized them.
Well, if the developers at Facebook can implement that, then those at Google can easily go a few rungs better without breaking a sweat. They are a rather smart bunch!
This crossed my mind, but I thought it was just a fun and funky name Google had given the project.
Then I watched a thought-provoking Whiteboard Friday video on the SEOmoz website.
The video's argument is that Panda is intelligent: it is learning what we like and don't like. The days of crafting search-engine-friendly titles, headings and content are drawing to a close; to rank highly, you have to appeal to people, not search engines.
During the presentation, it was mentioned that Panda was named after a Google engineer. Intrigued, I found this quote from Amit Singhal in an article in Wired.
"Well, we named it internally after an engineer, and his name is Panda. So internally we called a big Panda. He was one of the key guys. He basically came up with the breakthrough a few months back that made it possible."
Amit Singhal is a Google Fellow and the head of Google's core ranking team. I do not quite know what that means, apart from the fact that he is one of the smartest of the smart, and so too is Navneet Panda.
Yes, it is true: Google's Panda is a person, not an iconic and endangered animal. So what was that breakthrough?
Guess what? Navneet Panda is a key figure in machine learning algorithms. He likes to make computers learn!
Now, for my final little tidbit. On Mr Panda's CV (resume for my American cousins) is a joint patent for: Learning Concept Templates from Web Images to Query Personal Image Database.
Um, I think this is about machines reading images and learning about them! If not the same, then at least in the same ballpark as:
"Look at this: it is an Apple product, and they are cool. Now learn how to find me others that are just as good as this one."
Well, what do you think? My one doubt is this: if it is happening now, why can I not find even a sniff of it?
But whatever the truth, the clues suggest that Google has, or is getting, Panda Eyes, and even if it is not happening now, I'd wager it is closing in fast.
That, after reading all of what Google would call 'signals', is my theory, even if it belongs in a box of frogs!