
Sunday, September 11, 2011

Semantics, History And User Experience Data For The Perfect Website

Posted on 9/11/2011 11:53:00 AM in BLOG
We often read SEO theories that are based on simple intuition, with no real experimental evidence behind them. After all, the scientific method we use in other disciplines needs empirical, measurable evidence subject to specific principles of reasoning, and that is hard to obtain here because search engines are reluctant to disclose explicit details about their algorithms. To complicate matters further, the search engine environment changes frequently, and many of the features involved cannot be observed at all. In this scenario, it seems reasonable to look for SEO hints in three interesting Google patent documents dated 2005, 2009 and 2011.
Google does not seem to rely on exact matching between the query and the content of indexed pages. This becomes clear if we search, e.g., for "rent car", which also returns results for keyword variations such as "renting car", "car hire", "rental cars at low rates" and so on. This means that Google not only tries to map every relevant keyword to a set of synonyms, but also uses semantics to guess whether the user is referring to a real-world entity (e.g., a famous brand) or to a well-known concept. Query rewriting with entity detection was invented in 2003 by Google Inc.; the patent describes a method that determines whether a received search query includes an entity name, and whether that entity name is associated with a common word or phrase. In the latter case, a link to a rewritten query is generated and suggested to the user; otherwise, the query is narrowed with a restrict identifier, associated in some way with the entity name, which is used to select the results shown to the user. As we can easily see today, this is part of the mechanism behind the well-known "Did you mean..." feature, which helps users who misspell a keyword find what they want. The association between the wrong and the right keyword is made through a dedicated lookup table, which holds the search engine's knowledge base. This is useful for SEO because we could exchange links with big brands' websites as part of some commercial agreement, and this should give us a little authority in the eyes of both users and search engines.
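To make the idea more concrete, here is a minimal Python sketch of the branching described in the query-rewriting patent, plus a "Did you mean..." lookup. It is not Google's code: the tables `ENTITY_RESTRICTS`, `COMMON_WORDS` and `SPELLING_TABLE`, restrict identifiers such as `ENT_HERTZ`, and the functions `rewrite_query` and `did_you_mean` are all hypothetical, chosen only to show the logic (suggest a rewritten query when the entity name is also a common word, narrow the query otherwise).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical knowledge base: entity name -> restrict identifier.
# Entries and identifiers are illustrative, not real search engine data.
ENTITY_RESTRICTS = {"apple": "ENT_APPLE", "hertz": "ENT_HERTZ"}

# Hypothetical set of entity names that are also common words.
COMMON_WORDS = {"apple"}

# Hypothetical "Did you mean..." lookup table: misspelling -> correction.
SPELLING_TABLE = {"hretz": "hertz", "rentl": "rental"}


@dataclass
class Rewrite:
    query: str
    restrict: Optional[str] = None            # applied: results are narrowed
    suggested_restrict: Optional[str] = None  # only offered to the user as a link


def rewrite_query(query: str) -> Rewrite:
    """Detect an entity name in the query and decide how to use it."""
    for term in query.lower().split():
        restrict = ENTITY_RESTRICTS.get(term)
        if restrict is None:
            continue
        if term in COMMON_WORDS:
            # Ambiguous name: keep the original query, suggest the rewritten one.
            return Rewrite(query, suggested_restrict=restrict)
        # Unambiguous entity: narrow the query with the restrict identifier.
        return Rewrite(query, restrict=restrict)
    return Rewrite(query)


def did_you_mean(query: str) -> Optional[str]:
    """Return a corrected query if any term appears in the lookup table."""
    terms = query.lower().split()
    fixed = [SPELLING_TABLE.get(t, t) for t in terms]
    return " ".join(fixed) if fixed != terms else None


print(rewrite_query("Hertz car rental"))
print(rewrite_query("apple pie recipe"))
print(did_you_mean("hretz car"))  # hertz car
```

A real engine would use far larger tables and match multi-word entity names; single-word matching is a deliberate simplification here.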
The patent application on information retrieval based on historical data gives importance to the past history of every page, grouping pages by similar topic and assigning each a specific ranking weight. To some extent, the common rumor about the sandbox effect (perhaps meant to keep new sites from climbing Search Engine Result Pages too fast) seems to be confirmed by the high importance given to the size of the site, the number of stable links (from both peers and authoritative sites) and the number of updates to every page. Other SEO factors include: the domain's inception date (taken from public DNS information and detected by the Google crawler); the number of updates to each content page (which seems very indicative for quality scoring); the number of updates to the links on each page (which substantially affect page ranking); and a minimum number of pages (a website that is too small is not considered at all). The patent also gives importance to the number of new pages that appear over time linked from at least one source; the presence of durable pages with fixed URLs; and an appropriate degree of staleness of the documents (for time-invariant queries, e.g., searching for recipes). Finally, Google seems to evaluate the consistency between a link's anchor text and the actual content of the linked page, and it considers the ranking history of the site and its pages. User preferences about a resource (e.g., bookmarking) also contribute to that content's final score. All of this suggests creating well-linked, regularly updated documents with many authoritative links.
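The factors listed above can be pictured as a weighted score. The following is a rough Python sketch under loud assumptions: the patent application does not disclose weights or thresholds, so `MIN_PAGES`, `WEIGHTS`, the 5-year age cap, the saturation caps and the Jaccard measure used for anchor-text consistency are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass, field
from datetime import date

# Hypothetical parameters: the patent application does not publish them.
MIN_PAGES = 10
WEIGHTS = {"age": 0.3, "updates": 0.2, "links": 0.3, "anchor": 0.2}  # sums to 1


@dataclass
class PageHistory:
    domain_created: date        # inception date, e.g. from public DNS data
    site_page_count: int        # number of pages on the whole site
    content_updates: int        # times this page's content changed
    stable_inbound_links: int   # inbound links that persisted over time
    anchor_terms: set[str] = field(default_factory=set)  # words in inbound anchors
    page_terms: set[str] = field(default_factory=set)    # words on the page


def saturate(value: float, cap: float) -> float:
    """Map a non-negative count onto [0, 1] with diminishing returns."""
    return min(math.log1p(value) / math.log1p(cap), 1.0)


def anchor_consistency(p: PageHistory) -> float:
    """Jaccard overlap between inbound anchor text and page content."""
    union = p.anchor_terms | p.page_terms
    return len(p.anchor_terms & p.page_terms) / len(union) if union else 0.0


def history_score(p: PageHistory, today: date) -> float:
    """Combine the historical signals into a single score in [0, 1]."""
    if p.site_page_count < MIN_PAGES:
        return 0.0  # a site that is too small is not considered at all
    age_years = (today - p.domain_created).days / 365.25
    return (
        WEIGHTS["age"] * min(max(age_years, 0.0) / 5.0, 1.0)
        + WEIGHTS["updates"] * saturate(p.content_updates, 50)
        + WEIGHTS["links"] * saturate(p.stable_inbound_links, 100)
        + WEIGHTS["anchor"] * anchor_consistency(p)
    )


today = date(2011, 9, 11)
common = dict(site_page_count=40, content_updates=12, stable_inbound_links=30,
              anchor_terms={"car", "rental"}, page_terms={"car", "rental", "cheap"})
old_site = PageHistory(domain_created=date(2004, 1, 1), **common)
new_site = PageHistory(domain_created=date(2011, 6, 1), **common)
print(history_score(old_site, today) > history_score(new_site, today))  # True
```

With everything else equal, the younger domain gets less age credit, which is one way a sandbox-like effect could emerge from historical signals; the logarithmic saturation keeps a flood of updates or links from dominating the score.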
Finally, U.S. Patent Application 20110179023 describes a simple scoring method for filtering a set of query results, based on the distribution over time of impressions and of unique visitors, specifically excluding those associated with automated agents. The geographical position of each visitor also has some influence on this ranking dimension. Each crawled page is assigned a score based on this usage information, and the results are organized according to the assigned scores, giving a very user-oriented view of the best data. This theme is always connected with what many consider the true essence of the art of SEO: making sure users find the most useful content, responsive to their needs.
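The usage-based scoring can be sketched in a few lines of Python as well. This is an illustration, not the method claimed in the application: the `Visit` record, the assumption that automated agents are already flagged upstream via `is_automated`, the 7-day half-life used to weight the time distribution of visits, and the `region_boost` factor are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Visit:
    url: str
    visitor_id: str
    timestamp: datetime
    region: str
    is_automated: bool  # hypothetical: assumed to be flagged upstream


def usage_scores(visits, now, half_life_days=7.0, user_region=None, region_boost=1.5):
    """Score each URL by recency-weighted unique human visitors."""
    best = defaultdict(dict)  # url -> {visitor_id: weight}
    for v in visits:
        if v.is_automated:
            continue  # exclude crawlers and other automated agents
        age_days = max((now - v.timestamp).total_seconds() / 86400, 0.0)
        weight = 0.5 ** (age_days / half_life_days)  # older visits count less
        if user_region is not None and v.region == user_region:
            weight *= region_boost  # favour visitors near the searcher
        # Each unique visitor counts once per URL, at their best weight.
        best[v.url][v.visitor_id] = max(best[v.url].get(v.visitor_id, 0.0), weight)
    return {url: sum(ws.values()) for url, ws in best.items()}


def rank_results(urls, scores):
    """Order a result set by usage score; unscored URLs go last."""
    return sorted(urls, key=lambda u: scores.get(u, 0.0), reverse=True)


now = datetime(2011, 9, 11)
visits = [
    Visit("a.html", "u1", now - timedelta(days=1), "IT", False),
    Visit("a.html", "u1", now - timedelta(days=2), "IT", False),  # same visitor
    Visit("b.html", "u2", now - timedelta(days=1), "US", False),
    Visit("b.html", "u3", now - timedelta(days=3), "US", False),
    Visit("c.html", "bot", now, "US", True),  # automated agent, ignored
]
scores = usage_scores(visits, now)
print(rank_results(["a.html", "b.html", "c.html"], scores))  # ['b.html', 'a.html', 'c.html']
```

Counting each visitor once per URL keeps a single heavy user from inflating a page's score, while the exponential decay lets recent interest outweigh old traffic.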
References: U.S. Patent 7536382, "Query Rewriting with Entity Detection" (2003); U.S. Patent Application 20050071741, "Information Retrieval Based on Historical Data" (2005); U.S. Patent Application 20110179023, "Methods and Apparatus for Employing Usage Statistics in Document Retrieval" (2011).
Salvatore Capolupo is a computer engineer, web developer, affiliate marketer and SEO expert; he writes about his SEO experiments at SEObynight.
Article Source: http://EzineArticles.com/?expert=Salvatore_Capolupo

Article Source: http://EzineArticles.com/6532150
