Lies, damn lies & big data

By Ingrid Froelich

Organisations now see big data and data analysis as increasingly important tools to optimise business operations, identify risk and predict future opportunities. New data types, such as social media, clickstreams, log files and real-time feeds, are constantly emerging. The key aim is to achieve a competitive advantage, with more sales, by making better business decisions and by providing better choices to customers.

Putting aside the hype, what are the ongoing challenges? Big data really has the potential to open up a world of possibilities – as long as you appreciate that there are risks. In his famous book, How to Lie with Statistics, author Darrell Huff writes, “The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, opinion polls, the census.”

Since big data combines technology and analysis, marketers need to be aware (or beware) of its potential misuse, and should not place excessive faith in the results of any analysis. The technology that gathers, links and analyses data relies on patterns, and using it well requires an understanding of those patterns and the variables behind them. It is not simply a matter of inserting information and getting automatic results: success hinges on data accuracy, data selection, analytical capability, and the right combination of technology and expertise.

Data accuracy – your data sources aren’t necessarily reliable

The issue here is the inaccuracy of customer data itself. A survey by Customer Commons indicates that 92% of customers hide, lie, refuse to install or decline to click – at least some of the time. This reflects a growing distrust of how their data will be used, especially when a site or service doesn’t seem to need that data in the first place. It is therefore important to respect this sentiment and provide the option to opt out of answering intrusive questions.

To take this a bit further, a recent report by the Pew Research Center says, “86% of internet users have taken steps online to remove or mask their digital footprints—ranging from clearing cookies to encrypting their email. Also, 55% of internet users have taken steps to avoid observation by specific people, organisations, or the government.” This research calls into question the accuracy and representativeness of data gathered from internet sources. The ability to purge and discard faulty data, while retaining ‘clean’ data, therefore remains a major challenge.
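In practice, purging faulty records while retaining clean ones usually means running every record through plausibility checks before analysis. The sketch below illustrates the idea in Python; the field names and validation rules are hypothetical, chosen only for illustration.

```python
# Minimal sketch: split records into clean and suspect sets using
# simple plausibility checks (hypothetical fields and thresholds).

def is_clean(record):
    """Return True if the record passes basic plausibility checks."""
    age_ok = isinstance(record.get("age"), int) and 0 < record["age"] < 120
    email_ok = "@" in record.get("email", "")
    return age_ok and email_ok

def partition(records):
    """Split records into (clean, suspect) lists."""
    clean = [r for r in records if is_clean(r)]
    suspect = [r for r in records if not is_clean(r)]
    return clean, suspect

records = [
    {"age": 34, "email": "a@example.com"},   # plausible
    {"age": 999, "email": "b@example.com"},  # masked or false age
    {"age": 28, "email": "none"},            # refused a real email
]
clean, suspect = partition(records)
```

Real pipelines would use far richer rules, but the principle is the same: suspect records are quarantined rather than silently mixed into the analysis.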

Data selection – you’re misrepresenting your variables

Dan Ness, principal research analyst at MetaFacts, states, “A lot of big data today is biased and missing context, as it’s based on convenience samples or subsets.”

This means that with data, research, statistics and calculations, error is an inevitable reality. Cherry-picking variables and discarding ‘inconvenient’ data points results in bias and misrepresentation – what you want to prove versus what is actually proven. Data tools therefore need to take into account not just relationships, trends and causation, but also the very selection and use of the variables employed.
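The convenience-sample problem Ness describes is easy to demonstrate with a toy example. The numbers below are hypothetical, but they show how a subset that was convenient to collect (customers who opted in to a survey) can tell a very different story from the full population.

```python
# Toy illustration of convenience-sample bias (hypothetical scores):
# satisfied customers opt in to the survey; unhappy ones stay silent.

population = [
    # (responded_to_survey, satisfaction_score_out_of_10)
    (True, 9), (True, 8), (True, 9),     # happy customers respond
    (False, 3), (False, 4), (False, 2),  # unhappy customers do not
]

def mean(xs):
    return sum(xs) / len(xs)

convenience_sample = [s for opted_in, s in population if opted_in]
full_picture = [s for _, s in population]

biased_estimate = mean(convenience_sample)  # flattering
true_mean = mean(full_picture)              # much less so
```

Here the biased estimate sits nearly three points above the true mean – exactly the “missing context” that makes a convenient subset look like proof of whatever you hoped to find.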

Analytical capabilities – when you’re not in control

Many organisations, rather than building the expertise in-house, choose external firms to analyse their data. Unfortunately, this reduces a company’s understanding of, and control over, the quality of the data itself, the types of variables used in the analysis, and the indicators of how accurate the results are.

Every set of data needs to be regarded as at least somewhat flawed, and viewed as a tool rather than the ultimate goal. Even though we now have the potential to measure almost everything, results should be read as trends and patterns, not certainties. Analysing this data accurately depends on both technology and human expertise – and on the knowledge that the patterns are continually changing.

There are many examples of false causality, data dredging and data manipulation. Not all misuse of data is deliberate – but it can certainly seem that sleeping with your shoes on causes headaches, unless you factor in that you went to bed drunk!
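The shoes-and-headaches example can be made concrete with a hypothetical dataset. In the sketch below (all counts invented for illustration), a naive comparison makes shoes look strongly linked to headaches, but once you stratify by the confounder – going to bed drunk – the apparent effect vanishes.

```python
# Hypothetical nights: (drunk, slept_in_shoes, woke_with_headache).
# Drunk nights produce headaches (and shoes); sober nights produce neither.
nights = (
    [(True, True, True)] * 4      # drunk, shoes on, headache
    + [(True, False, True)] * 2   # drunk, shoes off, headache anyway
    + [(False, True, False)] * 1  # sober, shoes on, no headache
    + [(False, False, False)] * 5 # sober, shoes off, no headache
)

def rate(rows, cond, outcome):
    """Fraction of rows matching cond for which outcome holds."""
    matching = [r for r in rows if cond(r)]
    return sum(1 for r in matching if outcome(r)) / len(matching)

headache = lambda n: n[2]

# Naive comparison: shoes appear to "cause" headaches.
naive_shoes = rate(nights, lambda n: n[1], headache)        # 0.8
naive_no_shoes = rate(nights, lambda n: not n[1], headache) # ~0.29

# Controlling for the confounder: among drunk nights, shoes are irrelevant.
drunk_shoes = rate(nights, lambda n: n[0] and n[1], headache)
drunk_no_shoes = rate(nights, lambda n: n[0] and not n[1], headache)
```

Stratifying by the confounder is the simplest defence against false causality; without it, the headline numbers look damning and are entirely misleading.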

When you can broadcast error in the blink of an eye

Marcia Richards Suelzer, senior analyst at Wolters Kluwer, says, “We can now make catastrophic miscalculations in nanoseconds and broadcast them universally.” So while big data offers massive potential, in many respects it is still not fully controlled. Many organisations simply do not have the internal expertise or processes in place to properly control and analyse the influx of customer and market information, or to make intelligent decisions based on it. Add to this the inability of marketers to react to analysis quickly enough, and it is clear that big data is not yet the ‘magic tablet’ whose advantages organisations can simply reach out and take.

The potential of big data is unquestionable. However, your current ability to harness this data, so that the results accurately inform business decisions, should be regarded with appropriate caution and wisdom.

About the author

Ingrid Froelich works in SDL’s Content Management Technologies division.

SDL enables global businesses to enrich their customers’ experience through the entire customer journey. SDL’s technology and services help brands to predict what their customers want and engage with them across multiple languages, cultures, channels and devices. SDL has over 1,500 enterprise customers, 400 partners and a global infrastructure of 70 offices in 38 countries. 42 of the top 50 brands work with SDL.
