What is Google’s Penguin 2.0 all about?

By Andreas Voniatis

On the 19th May, Google rolled out Penguin 2.0, the latest update to its algorithm for combating unethical SEO techniques. As a result, many sites, including Compare The Market, lost rankings on high-traffic search phrases, with losses estimated at up to 75% of their traffic from Google. The Search Engine Optimisation (SEO) industry has reacted by citing the following practices as targets of Penguin 2.0:

● Overuse of keywords in clickable hyperlinks

● Links in blog comments

● Guest posts on questionable sites

● Article marketing sites

● Sites hacked or infected with malware

However, none of these educated guesses has been statistically verified by the SEO industry, and Google has remained quiet on exactly what the latest algorithm update was targeting. It has stated only that around 2.3% of English-US queries were affected and that its efforts would go further to target unethical SEO practices.

The challenge facing the SEO industry

With the advent of scalable technology and data analysis capabilities, the search engines are now deploying algorithms that are able to learn from new data. This is known as ‘machine learning’. As a result, the SEO industry increasingly struggles to understand (and to explain to its clients) how the search engines work, as it is incredibly difficult to pinpoint how changes to the following elements affect search engine traffic:

● web design

● site architecture

● content

● user experience metrics such as web page loading speeds

● inbound links from external sites

Several enterprise-level software tool providers have performed correlation studies in an attempt to explain what is behind search engine algorithm updates. However, these have been met with scepticism and are inconclusive, offering no actionable insights, given that correlation is not causation.

Meeting the challenge of Penguin 2.0

Considering the search engines’ new capabilities to process large amounts of data, and the growing complexity of the dynamic challenge that is SEO, the SEO industry needs a similarly data-driven approach.

The first requirement is to gather websites that were affected by Penguin 2.0 by looking at their traffic levels before and after the update on the 19th May. To verify whether a site was affected by Penguin 2.0, a mathematical test would be applied to estimate the probability that a step change in traffic (positive or negative) was in fact due to the update and not something incidental. This would ensure that the dataset for the Penguin analysis is valid.
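One way to picture this check is a placebo-date comparison: measure the shift in average daily traffic around the update date, then see how often a shift of that size appears around other dates in the same period. The sketch below is only an illustration under stated assumptions. It is not necessarily the method used in the study, the function names and the 7-day window are hypothetical, and the traffic figures are made up.

```python
from statistics import mean

def step_change(visits, split, window=7):
    """Mean of the `window` days starting at `split`, minus the mean of the `window` days before it."""
    before = visits[split - window:split]
    after = visits[split:split + window]
    return mean(after) - mean(before)

def placebo_p_value(visits, update_day, window=7):
    """Share of candidate split days whose absolute shift is at least as large as the one on update_day."""
    observed = abs(step_change(visits, update_day, window))
    # Every day that has a full window of data on both sides is a candidate.
    candidates = range(window, len(visits) - window + 1)
    shifts = [abs(step_change(visits, day, window)) for day in candidates]
    as_extreme = sum(1 for shift in shifts if shift >= observed)
    return as_extreme / len(shifts)

# Made-up daily visits: steady at around 1,000, then a drop from day 21 onwards.
visits = [1000, 1020, 990, 1010, 1005, 995, 1015] * 3 + [600, 610, 590, 605, 595, 600, 615] * 3

shift = step_change(visits, 21)
p = placebo_p_value(visits, update_day=21)
print(f"shift on update day: {shift:.1f} visits/day, placebo p-value: {p:.3f}")
```

A small p-value here means that few other dates show a shift as large as the one on the update date, which makes an incidental cause less plausible. It does not by itself prove that the update caused the change.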

The second requirement is to spot SEO patterns that were common to winning and losing web pages across all of the websites included in the dataset. This would be done by looking at differences in statistical means between the two groups to see which patterns ‘stick out’.
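As a rough illustration of looking for patterns that ‘stick out’, the sketch below ranks page features by the difference between the winners’ and losers’ means, scaled by the pooled standard deviation so that features measured in different units can be compared. The feature names and values are invented for the example, and the scaling is an assumption rather than something the study describes.

```python
from statistics import mean, stdev

def standardised_difference(winners, losers):
    """Difference in group means divided by the pooled standard deviation."""
    n_w, n_l = len(winners), len(losers)
    pooled_var = (
        (n_w - 1) * stdev(winners) ** 2 + (n_l - 1) * stdev(losers) ** 2
    ) / (n_w + n_l - 2)
    return (mean(winners) - mean(losers)) / pooled_var ** 0.5

# Made-up page-level features: (values for winning pages, values for losing pages).
features = {
    "anchor_text_keyword_ratio": ([0.10, 0.12, 0.08, 0.11], [0.11, 0.09, 0.12, 0.10]),
    "avg_sentence_length": ([14.0, 15.5, 13.0, 16.0], [24.0, 22.5, 26.0, 23.0]),
}

# Largest absolute difference first.
ranked = sorted(
    ((name, standardised_difference(w, l)) for name, (w, l) in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, diff in ranked:
    print(f"{name}: {diff:+.2f}")
```

Features at the top of this ranking are only candidates. Whether a difference could have arisen by chance still needs to be tested.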

Although some patterns would emerge from this, the final step would be to evaluate the probability that each pattern was statistically significant. If the probability was high, say 90% or higher, then there is a very high chance that the pattern was indeed targeted by Penguin 2.0.
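A permutation test is one simple way to put a number on this. It repeatedly shuffles the ‘winner’ and ‘loser’ labels and counts how often chance alone produces a gap in means as large as the observed one. This is a minimal sketch using invented values, not the study’s own procedure (the links at the end of this post point to ANOVA and Bayesian probability instead), and `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean

def permutation_p_value(winners, losers, n_shuffles=5000, seed=0):
    """Estimate how often shuffled group labels give a mean gap at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(winners) - mean(losers))
    pooled = list(winners) + list(losers)
    n_w = len(winners)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_w]) - mean(pooled[n_w:]))
        if gap >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Made-up values of one feature (average sentence length) for each group.
winners = [14.0, 15.5, 13.0, 16.0, 14.5, 15.0]
losers = [24.0, 22.5, 26.0, 23.0, 25.0, 21.5]

p = permutation_p_value(winners, losers)
print(f"p-value: {p:.4f}")
```

In this framing, a threshold of 90% corresponds roughly to a p-value of 0.10 or lower. Strictly speaking, the p-value measures how surprising the gap would be if the labels were arbitrary, not the probability that the update targeted the feature.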

So what is Penguin 2.0 all about?

A recent study using the above techniques found that Penguin 2.0 was targeting websites with poor readability. Although we could speculate on why Google chose to target sites with poor readability, the data showed conclusive evidence that sites that had a Dale Chall readability score of 5 or worse lost traffic as a result of the algorithm update.

The actionable insight for business owners, and for any institution or organisation that depends on Google for website traffic, is to ensure that their copywriters produce highly readable content and check it using formulas such as Dale Chall.
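For readers who want to try this, here is a simplified sketch of the Dale Chall calculation. The published formula combines the percentage of ‘difficult’ words (those not on the Dale Chall list of roughly 3,000 familiar words) with the average sentence length, and adds a fixed adjustment when more than 5% of the words are difficult. The tiny `FAMILIAR_WORDS` set below is a placeholder and not the real list, and the sentence and word splitting is deliberately naive.

```python
import re

# Placeholder only: the real Dale Chall list contains roughly 3,000 familiar words.
FAMILIAR_WORDS = {"the", "cat", "sat", "on", "a", "mat", "it", "was", "happy", "and", "is", "dog"}

def dale_chall_score(text, familiar_words=FAMILIAR_WORDS):
    """Dale Chall score of `text`; higher scores indicate harder text."""
    # Naive splitting: sentences end at ., ! or ?; words are runs of letters and apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    difficult = sum(1 for word in words if word not in familiar_words)
    pct_difficult = 100 * difficult / len(words)
    avg_sentence_length = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_length
    if pct_difficult > 5:
        score += 3.6365  # adjustment applied when over 5% of words are difficult
    return score

easy = "The cat sat on the mat. It was happy."
hard = "The cat sat on the mat. Idiosyncratic algorithmic perturbations proliferate."
print(round(dale_chall_score(easy), 2), round(dale_chall_score(hard), 2))
```

With the full word list in place of the placeholder, a copywriter could run draft pages through a function like this before publishing. Under the standard formula, higher scores mean harder text.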

About the author

Andreas Voniatis has always been passionate about designing and building websites, and it was this that fuelled his curiosity about online marketing and search engine optimisation. In 2013, he co-founded MathSight after identifying a need to apply engineering-level mathematics and scientific methods to evaluate search engine algorithms and further the industry’s knowledge of SEO.

About the company

MathSight was launched in March 2013. It demystifies the search engine algorithms using machine learning and big data. The platform analyses both the qualitative and stylistic aspects of content, web design, and site architecture, their inter-relationships, traffic data and other key performance indicators. This enables MathSight to determine the cause of changes in search engine traffic, be it a change in the algorithm or in the SEO (onsite and offsite) of a client or its competitors.


Relevant links

Bayesian probability (http://mathworld.wolfram.com/BayesianAnalysis.html)

ANOVA (http://mathworld.wolfram.com/ANOVA.html)

Dale Chall readability (http://www.impact-information.com/scales.pdf)

Compare the Market traffic losses (http://econsultancy.com/uk/blog/62887-penguin-2-0-who-were-the-winners-and-losers-2)

Penguin 2.0 correlation studies (http://searchengineland.com/searchmetrics-releases-their-seo-ranking-factors-post-penguin-2-0-164902)
