Big Data: The Technologies You Need (And Where The Skills Are)


February 21, 2014 at 12:30 pm

While much focus and discussion of the so-called “Big Data revolution” has been on the data itself and the exciting new applications it is enabling — from Google’s self-driving cars through to CSIRO and University of Tasmania’s better information systems for oyster farmers — less focus has been on the underpinning technologies and the talent driving these technologies. At the heart of the Big Data movement is a range of next generation database technologies that enable data to be amassed and analysed on a scale and speed hitherto unseen.


Global online services such as Google, Amazon and Facebook that serve billions of people around the world in real time have been made possible due to new technologies that divide tasks and files across banks of thousands of distributed computers.
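One simple way to picture dividing files across many machines is hash partitioning: each file name is hashed, and the hash decides which server holds it. The sketch below is only an illustration of that idea. The function name `shard_for`, the file names and the server count are all made up for this example, and it is not a description of how Google, Amazon or Facebook actually place their data.

```python
import hashlib


def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server number using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the server count.
    return int.from_bytes(digest[:8], "big") % num_servers


# Hypothetical file names spread across four hypothetical servers.
files = ["photo-001.jpg", "photo-002.jpg", "video-17.mp4", "song-9.mp3"]
placement = {name: shard_for(name, 4) for name in files}
print(placement)
```

Because the hash is stable, any machine can work out where a file lives without consulting a central directory. Real systems add replication and rebalancing on top of this basic idea.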

Storing the data

Traditional database technologies are built around many tables of information like spreadsheets with rows and columns and a way of asking questions of these tables in a structured way.

The structured way of asking a question of these data collections was originally named SEQUEL (Structured English Query Language), later shortened to SQL. It was developed at IBM in the 1970s, and Oracle released the first commercial product based on it in 1979, an early lead that has helped keep the company at the top of the database market ever since.
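To make the idea of tables and structured questions concrete, here is a minimal sketch using the SQLite database that ships with Python. The `sales` table, its columns and the figures in it are invented purely for illustration.

```python
import sqlite3

# In-memory database; the table name and figures are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("NSW", "Q1", 120.0),
        ("NSW", "Q2", 150.0),
        ("TAS", "Q1", 40.0),
        ("TAS", "Q2", 55.0),
    ],
)

# A structured question: what is the total for each region?
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('NSW', 270.0), ('TAS', 95.0)]
```

The query describes what is wanted (totals per region) rather than how to loop over the rows, which is the essence of SQL.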

If you are familiar with Excel, you’d be familiar with the type of information this kind of technology is suited to representing. Company accounts, marketing and sales figures over time are of course perfect.

But there are other types of data that aren’t so easily stored in this way, such as the relationships in a social network (Facebook), an index of documents stored on the web (Google), or large collections of digital music and video (Netflix).

Fortunately, there are ways to store information other than in tables, such as in trees, graphs, or lists with an index. Some of these approaches are much better suited to very large data sets and to data sets that don’t naturally fit into a series of tables.
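As a rough illustration of why a graph can suit social data better than a table, the sketch below stores a tiny friendship network as a mapping from each person to the people they know, then finds "friends of friends". The names and the helper `friends_of_friends` are hypothetical, and real graph databases are far more sophisticated than this.

```python
# Hypothetical friendship graph: each person maps to the set of people they know.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


suggestions = friends_of_friends(friends, "alice")
print(sorted(suggestions))  # ['dave']
```

Following links from person to person is a natural operation on a graph, whereas the same question asked of tables would typically need the friendship table to be joined to itself.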

The growing demand to store and analyse very large bodies of information, and information that is not readily suited to storing in tables (unstructured data), has led to a rapid growth in the popularity of these alternative types of database technologies.

Collectively they’ve become known as NoSQL technologies. Many of the leading technologies in this category are not developed by one company, such as Oracle or Microsoft, but instead are open source — developed by an open network of companies and independent developers and contributors akin to the way Wikipedia or Linux is developed.

Next-generation database technology

There are five key types of next-generation NoSQL data technologies. They are:

Document Store: suitable for storing large collections of documents
Wide Column Store: for very rapid access to structured or semi-structured data
Search Engine: suitable for full-text indexing of documents
Key-Value Store: suitable for rapid access to unstructured data
Graph Database: suitable for storing graph-type data such as social networks
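To give a feel for how some of these categories differ, here is a toy sketch of three of them (key-value, document and search engine) using plain Python data structures. Every name and record in it is invented for illustration, and none of it reflects the API of any particular product.

```python
from collections import defaultdict

# Key-value store: opaque values looked up by a single key.
kv_store = {}
kv_store["session:42"] = b"raw bytes the store does not interpret"

# Document store: self-describing records whose fields can vary between documents.
documents = [
    {"id": 1, "title": "Oyster farming sensors", "tags": ["csiro", "sensors"]},
    {"id": 2, "title": "Self-driving cars", "body": "Cars use sensors and maps"},
]


# Search engine: an inverted index from each word to the documents containing it.
def build_index(docs):
    index = defaultdict(set)
    for doc in docs:
        text = doc.get("title", "") + " " + doc.get("body", "")
        for word in text.lower().split():
            index[word].add(doc["id"])
    return index


index = build_index(documents)
print(sorted(index["sensors"]))  # [1, 2]
```

The key-value store knows nothing about what it holds, the document store keeps whole records with flexible fields, and the search index is organised around words rather than records.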

And the leading technologies in each of these categories respectively are:

Note that Apache Hadoop, which is also a leading technology, is not included in this list because it is a framework and file system rather than a database technology (though it can support many of these).

Where there’s talent there’s fire

By looking at the companies around the world that have the most employees with skills in each of these frontier technologies, we can get a unique insight into the organisations at the forefront of next-generation big data applications.

Based on a more extensive study, below is a map covering 40 leading global organisations that have the greatest number of specialists in each of the top five next-gen database technologies.


The more detailed country-by-country analysis has revealed that some organisations, such as Sky in London and Goldman Sachs in NYC, are leaders in the number of people they have with skills in these emerging areas.

Author’s note: The idea for this article came from the realisation that SIRCA may employ more specialists in a new-generation database technology known as Cassandra than any other site in Australia. On further investigation this turned out to be true, and I thought it would be a fascinating way to discover other leading companies at the forefront of Big Data technology.

Paul McCarthy is Adjunct Professor at the University of New South Wales. Professor McCarthy has a senior executive role with SIRCA Limited, which is mentioned in this article.

This article was originally published on The Conversation. Read the original article.
