Celtnet: how to make money online
Internet Marketing Make Money Online

Tuesday, July 11, 2006

Has Google Lost the Plot?


Part 2



In yesterday's first part of this article you learnt a little about Google's business model and how business analysts view the company. Today's article looks at Google from the viewpoint of website owners and internet marketers.

Google (as well as all the other search engines) is in a very peculiar position, as they're delivering value on top of other people's content. The value lies in delivering that content to the user in a way that's useful (which usually means the ranking of the results returned). To remain in top position the search engines must convince their users that they're delivering content that is 'the best of the web'.

Admittedly, Google comes in for a fair amount of stick in articles such as this one; mostly this is because Google is the dominant player in the web indexing business, and because of this a small change in the way that Google indexes or caches websites can make a web-based company succeed or fail. To be fair, Google, along with all the other search engines, has a problem. The web is expanding month on month, and though some of the new sites provide good and useful content, many of the others are spam sites whose sole function is to promote other people's products. Others are 'scraper' sites, stealing content and presenting it as their own. Yet others are 'clone' sites: sites built around Amazon or eBay marketing areas that duplicate other sites, using the same code and/or pages. There are also those websites that take pre-existing articles from article websites and republish them as new content.

Including multiple copies of such websites would both degrade the results of any search engine and swamp any good, information-rich website in a sea of junk. Over the past eight months Google have been busy tweaking their algorithms and delivering a system called Bigdaddy. The few messages that have come out of Google have been very 'up' about this, but the news from webmasters and a number of recent disasters indicate that Google may well have dropped the ball over these updates.

Part of the problem with all this is the difficulty in actually getting any information out of Google. All their systems are automated, and there are pages where you can send questions to Google, but all you ever get back from these is a standard automated response. Google can also blacklist websites because of content or practices that Google deem to be bad, but they don't tell you that this has happened. The only way to find out if you have one of these Google penalties is to use a form to tell Google that you think you've had a penalty, that you've corrected the problem and that you'll never, ever, do it again. There's an implicit assumption of guilt, and that all webmasters are somehow 'bad'. The only other indication of a Google penalty is that the number of pages indexed from a site (as well as the site's overall rankings) drops dramatically.

Google also change their algorithms and indexing practices at will without making much of a general announcement about it, and it's this implicit environment of secrecy that annoys many webmasters. Especially as any update to Google's indexing strategies may make a site fall in the rankings or may dramatically change the number of pages from a site that are indexed. A change in Google's algorithms may have the same effect as gaining a Google penalty, which obviously and dramatically increases the stress levels for many web authors.

Despite their slogan of 'Don't be evil', Google come across as a cold and uncaring company with an almost religious belief in their indexing algorithm. Tweaking this algorithm is what they do, and it sometimes appears to outsiders as if Google firmly believes that their algorithm can never be wrong.

I can understand why Google have automated everything in this way. If they didn't, they'd be inundated by umpteen million e-mails every day. However, because of this they've painted themselves as cold and aloof: uncaring of the needs of the very people who create the content from which they make their money.

The current upheaval started about seven months ago when Google began to roll out their new Bigdaddy datacentres. These were, according to Matt Cutts (the 'human voice of Google'), new data centre infrastructure rather than a major change to the indexing algorithm. Bigdaddy is, in effect, a brand-new data centre that Google uses to perform core search engine tasks like cataloguing websites or serving up localized features. Google operates thousands of data centres across the globe.

However, as the Bigdaddy datacentres were rolled out during January and February, many website designers and authors reported very dramatic reductions in the number of pages from their sites that were indexed. What seems to have happened is that as Google rolled out new technology to fix a few things, they broke other things. But it wasn't until May 10th that Google's co-founder, Larry Page, addressed the issue, saying the problems took the company by surprise and that it was now investigating what was going on. "We have a team studying it now," he told an audience of journalists at Google's annual Press Day event. It looked as if, in upgrading to Bigdaddy, Google was erasing its existing data caches and replacing them completely with new indexes. During this process many web pages that were previously indexed simply disappeared from Google.

At about the same time Google CEO Eric Schmidt told the New York Times about a 'server crisis' in which the machines holding their indexes were completely clogged.

Another problem was that real web pages were being replaced with 'supplemental results', often from six months or so previously. These supplemental results are pretty useless in terms of searching, though they do artificially inflate the number of pages indexed from a given site.

Then on May 16th Matt Cutts offered a very long post on the timeline of Google's new indexing and the Bigdaddy roll-out. This said that many of the supplemental results would be refreshed. Then came the scary stuff: if a site doesn't have enough incoming links it may be considered as having 'low confidence' and won't get crawled as deeply. Even worse, depending on the number of links to your site, Google may be crawling your site every day to see if there's interesting content while none of that content gets indexed (this is happening to me now). So Google is basically stealing my bandwidth without giving me anything for it.
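If you suspect the same is happening to your site, your server access logs will show you the crawling half of the picture, even though they can't show you what ends up indexed. Below is a minimal sketch that counts Googlebot requests per URL from an Apache-style 'combined' log; the log file name and format are my assumptions, so adjust them for your own host. Any URL that Googlebot fetches regularly but that never shows up in a site: search is being crawled without being indexed.

```python
# Count how often Googlebot requests each URL, using an Apache-style
# "combined" access log. The log file name below is an assumption;
# point it at wherever your host actually writes its logs.
import re
from collections import Counter

LOG_FILE = "access.log"                     # hypothetical log location
BOT = re.compile(r"googlebot", re.IGNORECASE)
# combined format: ip - - [date] "METHOD /path HTTP/1.x" status bytes "referer" "agent"
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if BOT.search(line):                # only count crawler requests
            match = REQUEST.search(line)
            if match:
                hits[match.group(1)] += 1   # tally by requested URL

# URLs fetched often by Googlebot but absent from a site: search are
# being crawled without ever being indexed.
for url, count in hits.most_common(20):
    print(f"{count:5d}  {url}")
```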

Maybe Google doesn't want the good content of the web indexed. After all, if you're not getting any visitors from the search engines you're going to have to resort to advertising, and one of the big sources of advertising out there is Google's AdWords program. The truth is that Google is no longer reflecting the internet as it is, but the internet as Google thinks it should be. Moreover, other search engines are catching up with Google and are indexing more of the web than Google are.

In their recent update it seems as if Google are only really counting the number of uni-directional (i.e. non-reciprocal) links coming into your site from related sites as a measure of a site's worth. However, if you do a search for your site's URL on Google and then perform a links search with links:www.mysite.com, you'll see that the links: search only returns about a third of the total number of links coming into your site. So Google may well be underestimating the number of sites linking to you, and your ranking and the number of pages indexed from your site will suffer as a result. Also, if you're in a specialized niche you may not have many inbound links, but those you do have come from natural links built up in your area of interest. However, as you have few incoming links, most of your pages won't be indexed (as will be the case for many of the sites that link to you), so even if you have good links the non-indexing of these linking pages means that neither you nor the sites linking to you are given the credit you should be.
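One rough cross-check I find useful (my own habit, nothing Google provides, and the log path and domain below are just placeholders) is to tally the distinct referring domains in your own access log and set that figure beside what links: reports. It only sees a link when a visitor actually clicks it, but at least it's independent of Google's data.

```python
# Count the distinct referring domains in an Apache-style "combined"
# access log as a rough, click-based estimate of who links to you.
# Both the log path and the domain below are placeholders.
from urllib.parse import urlparse

LOG_FILE = "access.log"          # hypothetical log location
OWN_HOST = "www.mysite.com"      # your own domain, to exclude internal traffic

referrers = set()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        fields = line.split('"')
        if len(fields) < 4:
            continue
        host = urlparse(fields[3]).netloc.lower()   # referer is the second quoted field
        if host and host != OWN_HOST:
            referrers.add(host)

print(f"Distinct referring domains seen in this log: {len(referrers)}")
```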

The upshot of this is that a perfectly good, clean website cannot now get all of its pages indexed if it doesn't have a sufficiently good score in terms of both inbound and outbound links. Too few inbound links and the site won't be fully indexed. As a result Google is effectively editorializing a site's content because Google doesn't think it's 'good enough'. In reality Google's users are being deprived of many good resources because of this. Users are only being exposed to a portion of the internet: a portion that Google defines for them. Does this really mean that Google is short of indexing space and that they can only keep going by 'pruning' the web? It's certainly looking more and more as if this is the case.

Added to this, Google have had other debacles of late. One of the most notable was the spate of reports in a number of SEO fora that Google had suddenly and very mysteriously added a few billion pages to their index. This might have been a cause for celebration: had Google finally relented and indexed the pages that they'd dropped or excluded from their index? Analysis of my own site indicated not, for Google still only indexed 85 pages (MSN was indexing 1100+) [as an aside, since I wrote this Google have added a further 40 pages to my site's index]. Rather, all those new pages were blatant spam and contained only pay-per-click (PPC) advertising as well as scraped content. Even worse, many of these pages were displaying well in the search results. A Google representative responded via forums by calling it a "bad data push", something that met with various groans throughout the SEO community.

What seems to have happened is that a Moldovan had the enterprising idea of exploiting the way that Google handles subdomains (in reality these are sub-components of a website, but Google handles them as if they were totally independent sites). This allowed the enterprising individual to craft scripts that would generate an endless number of subdomains as soon as the googlebot dropped by his site. As Google indexed these subdomains on an essentially 'no questions asked' basis, he was able to deliver single pages containing keyword-rich scraped content, keyworded links, and PPC ads for those keywords. Spambots were then sent out to put GoogleBot on the scent via referral and comment spam to tens of thousands of blogs around the world. The spambots provided the broad setup, and it didn't take much to get the dominoes to fall.
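To see why this scales so easily, remember that with a wildcard DNS entry every possible subdomain points at the same machine, so a single script can invent a 'site' for whatever Host header the crawler sends. The toy sketch below is purely my own illustration of that mechanism (not the spammer's actual code, and the page it serves is deliberately empty of anything useful):

```python
# Toy illustration of wildcard-subdomain handling: with a wildcard DNS
# record (*.example.com) every subdomain resolves to the same server,
# so one handler can return a different page for every Host header.
# This only demonstrates the mechanism described above.
from http.server import BaseHTTPRequestHandler, HTTPServer

class WildcardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The "topic" of the page is simply whatever subdomain was requested.
        host = self.headers.get("Host", "unknown.example.com")
        keyword = host.split(".")[0]
        body = (f"<html><head><title>{keyword}</title></head>"
                f"<body><h1>{keyword}</h1></body></html>").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Any subdomain pointed at this server gets its own generated page.
    HTTPServer(("", 8080), WildcardHandler).serve_forever()
```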

As a result this individual found himself with 5+ million indexed pages on Google. Word of this spread like wildfire in the SEO communities, though the general public seems essentially ignorant of Google's latest snafu. Google responded by saying that only 'millions' of pages had been indexed and that they were working on tweaking their algorithms to remove the possibility of this happening again, though close examination strongly indicates that the domains were actually being removed manually from Google's index.

What stuck in many webmasters' craws is not the fact that Google were caught out, but the fact that this scamster managed to get millions of pages into Google whilst Google manages to ignore many thousands of websites with perfectly good content, leaving them out of its index.

Then came the story at the end of June that Google had dropped 'amazon.com' from its entire search index (and it's not the only big player to go missing from Google). If Google can't get it right for the big players, then what hope have we, the minnows in the Internet pool?

Will I be sticking with Google? The answer is that, for the moment, I will. Google is generating some of my best income through the AdSense program. However, bizarrely, most of the traffic that creates my AdSense income comes from non-Google search engines!

If you want to see the entire article, you can find it here.
