In today’s complex and rapidly evolving business environment, the path from raw data to actionable insights mirrors the meticulous craftsmanship of a master artisan. Consider a scenario where a company makes a significant investment in a state-of-the-art data lake, aiming to establish a flexible, scalable repository for all its data requirements. The vision is to centralize data from various sources—structured and unstructured—into a single location, making it readily available for analysis. However, without stringent governance and thoughtful curation, this well-intentioned data lake can swiftly deteriorate into a chaotic and unusable swamp, where data is difficult to locate, analyze, or trust.
The significance of this process cannot be overstated. In today’s economy, where companies increasingly seek to monetize their data, the strategic value of data curation is immense. If a company aims to elevate its data as part of its valuation—whether for internal use or external sale—it must ensure that this data is not just collected but curated. Properly curated data, with well-defined labels and attributes, is more valuable because it is easier to analyze, more reliable, and ultimately more actionable. Conversely, data that is merely collected but not organized or enriched holds limited utility and is less attractive to potential investors.
The Bottomless Data Lake
This scenario is more common than one might think. Many companies embark on their data initiatives with ambitious goals, only to find themselves overwhelmed by the sheer volume and disorganization of their data. Initially, they adopt a warehouse mentality, storing data away for future use. Yet, as data accumulates, it shifts from being an asset to a liability. Without careful management, these lakes turn into swamps where data is stored haphazardly and often duplicated, making storage and retrieval unnecessarily expensive and slow.
The crux of the issue lies in the mistaken belief that data, once stored, will inherently become useful. In truth, without proper curation, data remains largely untapped and undervalued. Just as a museum curator carefully selects, organizes, and presents artifacts to create a meaningful experience, a data curator must organize and enhance data to make it accessible and valuable to the organization. This process involves more than merely storing data; it requires deliberate labeling, the creation of meaningful attributes, structuring the data in a manner that aligns with the organization's strategic objectives, and staging it for efficient storage and retrieval.
Data Governance vs. Data Curation
The distinction between data governance and data curation is pivotal here. Data governance provides the essential foundation: the rules, policies, and procedures that dictate how data is collected, stored, accessed, and utilized within an organization. Governance efforts frequently fall short of these goals and can even get in the way of progress, but when done right, governance is crucial for maintaining data quality, ensuring security, and meeting regulatory requirements. However, governance alone often manifests as bureaucracy: rigid rules that can hinder innovation. Data curation, on the other hand, extends beyond control and oversight. It is about enhancing the data so that product-focused teams can quickly experiment and ultimately create valuable insights or products.
A museum is not a building full of art. A DJ's playlist is not just the most popular songs. A reporter's story is not just a list of the facts. Just like a museum, a playlist, or a Pulitzer-winning article, a well-curated dataset is much greater than the sum of its parts. And the curator is not a database administrator. Like all experience creators, the curator requires a deep understanding of the business, increasingly a deep understanding of the analytics engines that will consume the data, and a foundation in solution design.
A Few Things To Think About
“We have more data than we know what to do with; we must be able to use it for x.” A common refrain, and the first half is often more true than not: the organization does not know what to do with its data. At the same time, many organizations have crossed the tipping point from not storing data to trying to store everything in the hope that one day it will be useful. They are now paying too much to store data that no longer has any value.
For a lot of forecasting and pricing problems, the reality is that the amount of data most organizations have stored is tiny compared to the data sets used to serve online ads, train self-driving cars, or diagnose medical images. And when you turn your attention to solving a specific problem, it gets even “smaller.” For example, if you have seasonal sales, conventional wisdom says that you need at least three seasons’ worth of data to estimate the seasonal effects; that means three years of data to estimate the Christmas effect. The truth is, a lot of products do not last three years. At face value, you may have 78 weeks of data for 20,000 products at 500 store locations (780 million records) and still not have enough data to run traditional algorithms to forecast at the SKU-store level. The good news is that if you have stored the right data for other products from past years, data curation and effective modeling can in fact help you solve this problem, as the sketch below illustrates.
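To make that concrete, here is a minimal sketch in Python of how curated attributes make pooling possible: if each SKU carries a well-defined category label, a shared seasonal shape can be estimated from the category as a whole and applied to products with short histories. The column names, category, and numbers are illustrative assumptions for this example, not a real schema or a prescribed method.

```python
import pandas as pd

# Hypothetical weekly sales records; the columns (sku, category, week_of_year, units)
# are placeholder names, not an actual data model.
sales = pd.DataFrame({
    "sku":          ["A", "A", "B", "B", "C", "C"],
    "category":     ["outerwear"] * 6,
    "week_of_year": [50, 51, 50, 51, 50, 51],
    "units":        [120, 180, 40, 65, 15, 22],
})

# Step 1: normalize each SKU's sales by its own average so that short-lived and
# long-lived products are comparable.
sales["rel_units"] = sales["units"] / sales.groupby("sku")["units"].transform("mean")

# Step 2: pool the normalized sales across the curated category label to estimate a
# shared weekly seasonal index, even though no single SKU has three years of history.
seasonal_index = sales.groupby(["category", "week_of_year"])["rel_units"].mean()

# Step 3: apply the pooled index to a new SKU's baseline forecast.
baseline_weekly_units = 30  # hypothetical deseasonalized forecast for a new SKU
forecast_week_51 = baseline_weekly_units * seasonal_index.loc[("outerwear", 51)]
print(seasonal_index)
print(f"Week 51 forecast for the new SKU: {forecast_week_51:.1f} units")
```

The point of the sketch is the role of curation, not the arithmetic: without a trustworthy category attribute attached to every SKU, there is nothing to pool over.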
We also hear the common refrain that “my data is not good enough.” I used to accept that as a reason not to start, but the combination of effective data curation and machine learning techniques leaves me strongly of the opinion that curating the data and applying algorithms will not only help you overcome these challenges and deliver value, but will also be an effective tool for identifying and rectifying data issues. The point is that an effective data curation capability helps us take the shortcomings of our data and make it usable.
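As a hedged illustration of that last point, the sketch below flags likely data-quality issues during curation with a simple robust outlier check. The data, column names, and threshold are assumptions made for the example, not a prescribed technique.

```python
import pandas as pd

# Hypothetical curated transaction feed; the column names are illustrative only.
prices = pd.DataFrame({
    "sku": ["A"] * 5 + ["B"] * 5,
    "unit_price": [9.99, 10.05, 9.95, 99.90, 10.10,   # 99.90 looks like a decimal-point error
                   4.50, 4.55, 4.40, 4.60, 0.45],     # 0.45 looks like a missing digit
})

# Robust z-score within each SKU: distance from the median, scaled by the median
# absolute deviation, which is less distorted by the bad values themselves.
grouped = prices.groupby("sku")["unit_price"]
median = grouped.transform("median")
mad = grouped.transform(lambda s: (s - s.median()).abs().median())
prices["robust_z"] = (prices["unit_price"] - median) / mad.replace(0, 1)

# Flag candidates for review rather than silently dropping them; the threshold of 5
# is an assumption to be tuned per data set.
suspect = prices[prices["robust_z"].abs() > 5]
print(suspect)
```

Checks like this are cheap to run as data lands in the lake, and the records they surface are exactly the ones a curator should label, correct, or document before anyone builds on them.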
As we advance further into the digital age, the importance of data curation will only continue to grow. Organizations that invest in this critical capability today will reap significant benefits tomorrow, transforming their data into a true competitive advantage. It is not enough to merely collect and store data; companies must actively curate it to unlock its full potential. The stakes are high, but the choice is clear: curate your data or be left behind.
About the Author
Colin Kessinger is an Executive Partner at Ethos Capital and works with the investment team members and other Executive Partners to identify, analyze, and assess potential investment opportunities. He has spent the last 30 years in thought leadership and business leadership roles focused on applying quantitative techniques to supply chains, pricing, trade-promotion, customer insights, and risk management. Colin has consulted extensively in the data center, semiconductor, life sciences, capital equipment, high-tech, electronics, telecommunications, consumer electronic, CPG, and automotive sectors. He periodically serves as an adjunct professor of Operations Management at Stanford University and at U.C. Berkeley.