Search engines love human-written content. This is true for all of them, but especially for Google. It is one of the reasons Google recruits text analysis specialists and develops context analysis tools. It is assumed that synonyms and the overall context surrounding a link give that link greater weight. Google can recognize sets of words that are used in the same context and are similar in meaning. You can easily test this by going to google.com (the English version) and typing “remove spyware” in the query box. You will see not only the query words highlighted, but the word “removal” as well.
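To make the “remove spyware” test concrete, here is a toy sketch in Python of how related word forms could end up highlighted together. It is not Google's method, which is not public. The suffix list and the names crude_stem and highlighted_words are assumptions made up for this illustration, and a real engine would use far richer language models.

```python
import re

# Toy suffix list: an assumption for illustration, not a real stemming algorithm.
SUFFIXES = ("ing", "al", "ed", "er", "es", "s", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlighted_words(query, text):
    """Return the words in text that share a crude stem with a query word."""
    query_stems = {crude_stem(w) for w in re.findall(r"[A-Za-z]+", query)}
    return [w for w in re.findall(r"[A-Za-z]+", text)
            if crude_stem(w) in query_stems]


# Hypothetical search result snippet.
print(highlighted_words("remove spyware", "Free spyware removal tool"))
```

In this sketch “remove” and “removal” both reduce to the same stem, so “removal” is highlighted even though it was never typed into the query box.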
That is the simplest use of context analysis, and there is much more. The whole search algorithm is one large formula of filters, weight factors, context and links. Google's patent covers only part of the factors used for ranking pages, but it clarifies some tendencies. The first is that Google is looking for authoritative sites and content. The second is that Google is trying to reduce the number of spam pages in the SERPs.
This brings us to the question of the day: how can spam pages be distinguished? There are many ways, and some are more overlooked than others. One kind is scraper pages, that is, pages with content copied from other sites. Google is successfully eliminating them from the index. They are easy to identify when the search engine keeps the page history in its cache.
However, there is another way of spamming: automatically generated pages. Depending on the generation algorithm, their detection is harder, but solvable with text analysis. Google assumes that automatically generated pages should always rank lower than other pages, and perhaps even penalizes sites that use such approaches. But there is a catch: sometimes such pages can be fully legitimate and interesting for searchers. Let's imagine a simple scenario: a web site owner has a database of component parts, for example 1000 of them. Each part has only a few parameters. The owner puts them on the web, each one on a separate page. There is no way to write an additional description for each component, as they are quite similar, perhaps differing in a couple of numbers. As a result, each of these pages will rank lower for its component name than pages that merely ask about the component (e.g. forum threads). The more interest there is in the data, the smaller the chance that the answer will be found using Google.
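The component scenario can be sketched in a few lines of Python. The example below measures how similar two pages are by comparing their overlapping three-word sequences (Jaccard similarity over word shingles), a common text analysis technique. Whether Google uses anything like it is an assumption, and the page texts, the resistor names and the function names are all hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a), shingles(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalogue pages that differ only in a couple of numbers.
page_a = "Resistor R-100. Resistance 100 ohm. Tolerance 5 percent. Power rating 0.25 watt."
page_b = "Resistor R-220. Resistance 220 ohm. Tolerance 5 percent. Power rating 0.25 watt."
# Hypothetical forum post asking about one of the components.
forum = "Does anyone know where I can buy a R-100 resistor and what its tolerance is?"

print("catalogue page vs catalogue page:", round(jaccard(page_a, page_b), 2))
print("catalogue page vs forum post:", round(jaccard(page_a, forum), 2))
```

The two catalogue pages score much closer to each other than either does to the forum post. A filter built on a measure like this would treat the 1000 legitimate component pages as template-generated near-duplicates, while the forum thread that only asks the question looks unique.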
So there is an issue with Google's spam filter today. Hopefully it will be solved.

Categories: SEM

Giedrius Majauskas

I am an internet company owner and project manager living in Lithuania. I am interested in computer security, health and technology topics.
