Translating the big data language problem

By Keith Laska

The emergence of the digital age has created an explosion of data across numerous languages and channels. Every day, 175 million Tweets are sent, 2 million blog posts are written, 68.1 million posts are created on Tumblr, and 60,000 new websites are added. Every 60 seconds, Facebook sees 293,000 new status updates. This rise in online communication and interaction spans 2 billion Internet users speaking 6,000 languages worldwide; only 20% speak English as their first language.

This huge rise in data, coupled with the variety of languages that it spans, has turned what was once a ‘big data’ problem into a ‘big language’ problem for text analytics companies, such as Ipsos, as well as CIOs and marketing executives in large organisations. From sentiment analysis to predictive analytics, search and e-discovery solutions, organisations must now address multilingual, ‘big language’ challenges if they want to deliver the most accurate and intelligent results to their global customers.

While many text analytics companies and end-users have explored using human translation or other translation technology, they have found these approaches difficult to scale to big data demands. With the data trends described above, machine translation – the automatic translation of human languages by the computer – is a critical technology for meeting the big language challenges of today.

The most feasible and economical strategy is to optimise an analytics solution by first translating content from all data feeds into a single language – such as English – using a machine translation (MT) solution. Once the information is in a single language, it can be mined for sentiment analysis, predictive insights, search or e-discovery. Below is a summary of four key areas in which organisations can benefit from machine translation’s linguistic capabilities:

Sentiment Analysis

The burgeoning amount of web and social content created by end users requires organisations to analyse sentiment so they can track trends and respond as needed to improve customer experience. Relevant content contained in billions of Tweets, Facebook posts, blog entries, and websites is significantly more meaningful if analysed across all applicable languages.

By translating all data sources into a single language using machine translation, organisations can analyse sentiment globally, to discover how consumers are using products by region and language. They can also assess what improvements can be made at a targeted level to improve customer sentiment about a company and its brand.
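The translate-first approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `translate_to_english` is a hypothetical stand-in for a real machine translation engine (here it is just a lookup table), and the word lists are a toy sentiment lexicon rather than a production model. The sample posts and region codes are invented for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for a machine translation service: a real system
# would call an MT engine here. The lookup table is illustrative only.
TOY_TRANSLATIONS = {
    "j'adore ce produit": "i love this product",
    "das produkt ist schlecht": "the product is bad",
}

# Tiny illustrative sentiment lexicon (an assumption, not a real model).
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"bad", "poor", "terrible"}


def translate_to_english(text: str) -> str:
    """Return lower-cased English text; unknown text passes through unchanged."""
    key = text.lower()
    return TOY_TRANSLATIONS.get(key, key)


def sentiment_score(english_text: str) -> int:
    """Count positive words minus negative words."""
    words = english_text.split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_region(posts):
    """Average sentiment per region after translating every post to English."""
    scores = defaultdict(list)
    for region, text in posts:
        scores[region].append(sentiment_score(translate_to_english(text)))
    return {region: sum(vals) / len(vals) for region, vals in scores.items()}


# Invented sample posts as (region, text) pairs.
posts = [
    ("FR", "J'adore ce produit"),
    ("DE", "Das Produkt ist schlecht"),
    ("UK", "Great product"),
]
result = sentiment_by_region(posts)
print(result)
```

Because every post is scored in the same language, a single lexicon and a single team can compare regions directly; swapping the lookup table for a real MT call would not change the rest of the pipeline.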

Predictive Analytics

Predictive analytics is about analysing large amounts of past data to predict future events. For example, if past data suggests that a customer is likely to buy a vacation package in July and you want to retain this customer, you should offer them a discount on the next deal.

Textual data is playing an increasingly important role in this process, and global enterprises now require text analytics solutions to accommodate information in many languages to effectively analyse a broader and more representative set of data. Machine translation can be utilised to translate all information into one language. This enables a single team, within an organisation, to decipher the information globally – or for specific regions and demographics.

Search

Finding the right information, at the right time, is critical for organisations, whether they’re monitoring social media streams or searching for key documents. To best serve customers and evolve, global enterprises need to access this information, no matter the original language.

Adding machine translation to the search process helps return more relevant results: when keywords are translated, documents can be searched in their native language, so matches are not missed because of a language mismatch. Additionally, integrated machine translation enables “on the fly” translation of foreign language search results for easy access by the researcher.
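As a rough sketch of keyword translation in search, the Python below translates a query term into each document's language before matching. Everything here is an assumption made for illustration: `translate_keyword` is a hypothetical helper backed by a small lookup table where a real MT engine would sit, and the three sample documents are invented.

```python
# Hypothetical keyword translations that a real MT engine would supply.
KEYWORD_TRANSLATIONS = {
    "invoice": {"en": "invoice", "fr": "facture", "de": "rechnung"},
}

# Invented sample documents, each tagged with its native language.
DOCUMENTS = [
    {"id": 1, "lang": "en", "text": "Invoice for March is attached"},
    {"id": 2, "lang": "fr", "text": "La facture de mars est jointe"},
    {"id": 3, "lang": "de", "text": "Der Bericht ist fertig"},
]


def translate_keyword(keyword, target_lang):
    """Look up the keyword in the target language; fall back to the original."""
    return KEYWORD_TRANSLATIONS.get(keyword, {}).get(target_lang, keyword)


def cross_language_search(keyword, documents):
    """Return ids of documents whose native-language text contains the keyword."""
    hits = []
    for doc in documents:
        term = translate_keyword(keyword.lower(), doc["lang"])
        if term in doc["text"].lower().split():
            hits.append(doc["id"])
    return hits


matches = cross_language_search("invoice", DOCUMENTS)
print(matches)
```

An English-only search for "invoice" would find only the first document; translating the keyword also surfaces the French one, which could then be translated on the fly for the researcher.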

E-Discovery

The last decade has seen e-discovery regulations that penalise even the smallest accidental omissions. Without a robust text analytics solution that can find all relevant documents, legal departments pale at the thought of trying to find every last document requested, especially when those documents span multiple languages.

Machine translation extends the capabilities of e-discovery solutions and makes all enterprise content more easily accessible. By translating all information up front, teams can quickly uncover relevant data (the needle in the haystack!) and route critical information for a complete translation.

Conclusion

Integrating machine translation directly into a text analytics solution enables the translation of all information into a single language for analysis and creates a global, language-agnostic view of customer data and marketplace dynamics.

With the ability to translate multilingual content, such as social media, web content and enterprise data in real-time, businesses can identify and quickly respond to critical global trends and customer insights. This will ultimately improve sales and enhance the customer experience.

About the author

Keith Laska is CEO of the SDL Language Technologies Division.

http://www.sdl.com
