Common barriers to big data adoption, and how to adapt and overcome those challenges

Big data is often hyped, but I encourage taking a more realistic stance. I've seen many organizations attempt to adopt big data solutions and ultimately fail, and I fear these missteps may eventually sour the market on big data altogether. That would be unfortunate, because I view big data as a transformational capability and an essential part of a new IT infrastructure. To improve the odds of success, this post presents common barriers to adoption that I've observed and offers insight into how to adapt and overcome them.

One of the primary barriers to big data success is the lack of a data preparation strategy. Data preparation includes all the steps necessary to acquire, prepare, curate, and manage the data assets of the organization. Sound data is the foundation for the actionable insights delivered by advanced analytic applications. If the data is tainted, conclusions based on it become questionable, and big data that is not backed by accurate information can add to confusion and organizational turmoil. It's not unlike the Hippocratic oath: "First, do no harm." The worst outcome of a big data undertaking is to make poor decisions based on bad information, and to make them with great confidence.

Interestingly, most companies contemplating big data, and unfortunately the vendors selling such solutions, rarely consider the implications of data preparation. Building the hardware infrastructure and software to support a big data lake can be complex and expensive, leading adopters to conclude that this is the most challenging element of the big data equation. Once the infrastructure is in place, however, they are often dismayed to discover that it is merely the tip of the iceberg. Collecting and managing trusted data can be far more expensive, especially if the big data project begins with a poorly understood idea of what data will ultimately be required.

So, what constitutes a good data preparation strategy? As I have noted previously, the following six-step process will help.

1. Identify your decision set

Knowing the context of a company's decision-making is an essential first step. It defines the data sets you will use to support a decision, how the data will be manipulated, and ultimately the analytical process that will generate insight. Many assume that if data is simply cleansed and curated effectively, any analytic process can be supported. This is not true. Organizational leaders need to define the end game first; data preparation then becomes much simpler.

2. Select the data sources to support the desired decisions

Granted, you cannot know in advance every possible data source that might be needed, but you can identify the primary data sources that will be used. These not only define the types of data available but also largely determine the kinds of data cleansing that will need to be done.

3. Choose the right vendor of data cleansing technology

You will want technology that not only accommodates your initially identified data types but also provides a platform that feeds your existing cross-organizational analytic tools. In an analytics-driven company, people at many different levels of the organization will be using tools to inform decisions. It is essential that the data preparation tool provides a platform, accessible by all, that delivers curated and trusted data. Only by starting from the same basic data set will decisions be consistent across the organization.
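To make the idea of a single curated, trusted data set concrete, here is a minimal sketch in Python using pandas. The file names and column names are hypothetical, and it is not meant to represent any particular data preparation product; the point is simply that cleansing happens once, and every downstream analytic tool reads the published result rather than the raw extract.

```python
# Minimal sketch of a shared data-preparation step (hypothetical files/columns).
import pandas as pd

def curate_customers(raw_path: str, curated_path: str) -> pd.DataFrame:
    """Cleanse a raw extract once, then publish it for all analytics teams."""
    df = pd.read_csv(raw_path)

    # Normalize obvious inconsistencies so every team starts from the same values.
    df["email"] = df["email"].str.strip().str.lower()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

    # Drop exact duplicates and rows missing the key field.
    df = df.drop_duplicates().dropna(subset=["customer_id"])

    # Publish the curated version; downstream tools read this file, not the raw one.
    df.to_csv(curated_path, index=False)
    return df

# curated = curate_customers("raw_customers.csv", "curated_customers.csv")
```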
4. Assess and ingest additional data sets

As noted in step two, it is not possible to predefine every data set that might be necessary to support a decision; additional data sets that might be useful are constantly being discovered. Assessing new data sources is an ongoing and important part of decision-making.

5. Identify any new analytic tools that will produce the desired insights

There are many excellent analytic packages on the market, from simple statistical tools all the way to very advanced machine-learning-based applications. Each provides different insights and requires a different degree of data cleansing. For example, a machine learning application may be able to handle data in an essentially native form, while a statistical tool might need very clean data, with every field reconciled.

6. Extend data preparation to incorporate new data into existing data sets

Many data sets are dynamic and evolving. As new data is discovered or becomes available, data preparation must be repeated so that the new data is readily available alongside the existing curated data (a brief sketch of this kind of incremental preparation appears at the end of this post).

By recognizing that data preparation is a necessary first step to big data value, an enterprise can significantly reduce the costs of big data while accelerating the delivery of actionable insights that drive good decision-making. Fortunately, the data preparation market is growing rapidly, with Frost & Sullivan projecting it to exceed $9 billion globally by 2025, so solutions and expertise to support a data preparation strategy are readily available.
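As an illustration of the reconciliation and incremental incorporation described in steps 5 and 6, here is a minimal sketch, again using pandas. The column names, file paths, and rename rule are hypothetical assumptions for the example; a real pipeline would apply whatever reconciliation the chosen data preparation platform supports.

```python
# Minimal sketch of steps 5 and 6: reconcile a newly discovered extract,
# then fold it into the existing curated data set (hypothetical files/columns).
import pandas as pd

def incorporate_new_data(curated_path: str, new_path: str) -> pd.DataFrame:
    curated = pd.read_csv(curated_path, parse_dates=["signup_date"])
    new = pd.read_csv(new_path)

    # Reconcile fields so the new extract matches the curated schema --
    # the kind of cleansing a statistical tool typically requires.
    new = new.rename(columns={"cust_id": "customer_id"})
    new["signup_date"] = pd.to_datetime(new["signup_date"], errors="coerce")
    # Keep only the columns the curated set already defines (missing ones become NaN).
    new = new.reindex(columns=curated.columns)

    # Append and de-duplicate so every decision still starts from one data set.
    combined = (
        pd.concat([curated, new], ignore_index=True)
        .drop_duplicates(subset=["customer_id"], keep="last")
    )
    combined.to_csv(curated_path, index=False)
    return combined

# combined = incorporate_new_data("curated_customers.csv", "new_partner_extract.csv")
```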