Common barriers to big data adoption, and how to adapt and overcome those challenges

Big data is often hyped, but I encourage taking a more realistic stance. I've seen many organizations attempt to adopt big data solutions and ultimately fail, and I fear these missteps may eventually sour the market on big data altogether. That would be unfortunate, because I view big data as a transformational capability and an essential part of a new IT infrastructure. To improve the odds of success, this post presents common barriers to adoption that I've observed and offers insight into how to adapt and overcome them.

One of the primary barriers to big data success is the lack of a data preparation strategy. Data preparation includes all the steps necessary to acquire, prepare, curate, and manage the data assets of the organization. Sound data is the foundation for the actionable insights delivered by advanced analytic applications. If the data is tainted, conclusions based on it become questionable, and big data that is not backed by accurate information can add to confusion and organizational turmoil. It's not unlike the Hippocratic oath: "First, do no harm." The worst outcome of a big data undertaking is to make poor decisions based on bad information, and to make them with great confidence.

Interestingly, most companies contemplating big data, and unfortunately the vendors selling such solutions, rarely consider the implications of data preparation. Building the hardware infrastructure and software to support a big data lake can be complex and expensive, leading adopters to conclude that this is the most challenging element of the big data equation. Once the infrastructure is in place, however, they are often dismayed to discover that it is merely the tip of the iceberg. Collecting and managing trusted data can be far more expensive, especially if the big data project begins with a poorly understood idea of what data will ultimately be required.

So, what constitutes a good data preparation strategy? As I have noted previously, the following six-step process will help.

1. Identify your decision set

Knowing the context of a company's decision-making is an essential first step. It defines the data sets you will use to support a decision, how the data will be manipulated, and ultimately the analytical process that will generate insight. Many assume that if data is simply cleansed and curated effectively, any analytic process can be supported. This is not true. Organizational leaders need to define the end game first; data preparation then becomes much simpler.

2. Select the data sources to support the desired decisions

Granted, you cannot know in advance every possible data source that might be needed, but you can identify the primary data sources that will be used. These not only define the types of data available but also largely determine the kinds of data cleansing that will need to be done.

3. Choose the right vendor of data cleansing technology

You will want technology that not only accommodates your initially identified data types but also provides a platform that feeds your existing cross-organizational analytic tools. In an analytics-driven company, people at many different levels of the organization will be using tools to inform decisions. It is essential that the data preparation tool provides a platform, accessible by all, that delivers curated and trusted data. Only by starting from the same basic data set will decisions be consistent across the organization.
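To make the idea of a single curated, trusted data set concrete, here is a minimal sketch in Python using pandas. The file names and column names are hypothetical, and it is not meant to represent any particular data preparation product; the point is simply that cleansing happens once, and every downstream analytic tool reads the published result rather than the raw extract.

```python
# Minimal sketch of a shared data-preparation step (hypothetical files/columns).
import pandas as pd

def curate_customers(raw_path: str, curated_path: str) -> pd.DataFrame:
    """Cleanse a raw extract once, then publish it for all analytics teams."""
    df = pd.read_csv(raw_path)

    # Normalize obvious inconsistencies so every team starts from the same values.
    df["email"] = df["email"].str.strip().str.lower()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

    # Drop exact duplicates and rows missing the key field.
    df = df.drop_duplicates().dropna(subset=["customer_id"])

    # Publish the curated version; downstream tools read this file, not the raw one.
    df.to_csv(curated_path, index=False)
    return df

# curated = curate_customers("raw_customers.csv", "curated_customers.csv")
```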
4. Assess and ingest additional data sets

As noted in step two, it is not possible to predefine every data set that might be necessary to support a decision; additional data sets that might be useful are constantly being discovered. Assessing new data sources is an ongoing and important part of decision-making.

5. Identify any new analytic tools that will produce the desired insights

There are many excellent analytic packages on the market, from simple statistical tools all the way to very advanced machine-learning-based applications. Each provides different insights and requires a different degree of data cleansing. For example, a machine learning application may be able to handle data in an essentially native form, while a statistical tool might need very clean data, with every field reconciled.

6. Extend data preparation to incorporate new data into existing data sets

Many data sets are dynamic and evolving. As new data is discovered or becomes available, data preparation must be repeated so that the new data is readily available alongside the existing curated data (a brief sketch of this kind of incremental preparation appears at the end of this post).

By recognizing that data preparation is a necessary first step to big data value, an enterprise can significantly reduce the costs of big data while accelerating the delivery of actionable insights that drive good decision-making. Fortunately, the data preparation market is growing rapidly, with Frost & Sullivan projecting it to exceed $9 billion globally by 2025, so solutions and expertise to support a data preparation strategy are readily available.
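As an illustration of the reconciliation and incremental incorporation described in steps 5 and 6, here is a minimal sketch, again using pandas. The column names, file paths, and rename rule are hypothetical assumptions for the example; a real pipeline would apply whatever reconciliation the chosen data preparation platform supports.

```python
# Minimal sketch of steps 5 and 6: reconcile a newly discovered extract,
# then fold it into the existing curated data set (hypothetical files/columns).
import pandas as pd

def incorporate_new_data(curated_path: str, new_path: str) -> pd.DataFrame:
    curated = pd.read_csv(curated_path, parse_dates=["signup_date"])
    new = pd.read_csv(new_path)

    # Reconcile fields so the new extract matches the curated schema --
    # the kind of cleansing a statistical tool typically requires.
    new = new.rename(columns={"cust_id": "customer_id"})
    new["signup_date"] = pd.to_datetime(new["signup_date"], errors="coerce")
    # Keep only the columns the curated set already defines (missing ones become NaN).
    new = new.reindex(columns=curated.columns)

    # Append and de-duplicate so every decision still starts from one data set.
    combined = (
        pd.concat([curated, new], ignore_index=True)
        .drop_duplicates(subset=["customer_id"], keep="last")
    )
    combined.to_csv(curated_path, index=False)
    return combined

# combined = incorporate_new_data("curated_customers.csv", "new_partner_extract.csv")
```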