The data virtualization paradox says as much about the current data landscape as it does about the needs of the modern enterprise. On the one hand, the technology hasn’t gone anywhere: aside from a couple of key additions, it hasn’t deviated from its chief value proposition of providing an abstraction layer over distributed data assets.
On the other hand, it’s everywhere.
Data virtualization is the underlying technology for some of the most progressive architectures today, including that of the data mesh, data lake house, and data fabric. Although it’s still regarded as a desirable, dynamic means of integrating data, it’s silently reshaping itself into something that encompasses this attribute but, ultimately, is much more.
It’s becoming the very framework for data management itself.
“By being a data platform, data virtualization encompasses a lot of the capabilities of data management,” noted Denodo CMO Ravi Shankar. “It has the data quality capabilities and data preparation. It can do certain styles of Master Data Management. It is used for data governance by the customers. It is a metadata engine, so it contains all the information about the data everywhere. And by the same virtue, it contains the data catalog.”
Moreover, as Shankar implied, organizations can obtain all this functionality from a single platform with the flexibility for implementing the most advanced data management architectures today.
The virtual abstraction layer furnished by data virtualization serves sundry purposes. However, most of these elements are readily compartmentalized into three categories. “There’s the integration; there’s the management, and the delivery,” Shankar explained. “And, [data virtualization] automates across all three spectrums.” The nucleus of each of these aspects, however, is arguably the semantic layer supplied by competitive data virtualization solutions.
That layer enables organizations to integrate, manage, and deliver data to consumers in business terminology that end users comprehend—without having to understand the complexity of the respective data sources. Such source information may contain technical details about tables or unstructured data. “But, the business wants to consume [that] information more like objects,” Shankar pointed out. “There needs to be a translation that happens between the first technical layer to the business layer. That’s where data virtualization has always been really good, in the semantic ability to convert the technical structure to a business structure.”
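The translation Shankar describes can be sketched in a few lines. This is a hypothetical illustration, not Denodo's implementation: the column names (`c_id`, `c_nm`, `ord_amt_usd`) and the mapping dictionary are invented stand-ins for a technical source schema and the business vocabulary a semantic layer exposes over it.

```python
# Minimal sketch of a semantic layer: map cryptic technical column
# names from a source system to business-friendly terms, so consumers
# query in business language without knowing the underlying schema.
# All names here are illustrative assumptions.

TECHNICAL_TO_BUSINESS = {
    "c_id": "customer_id",
    "c_nm": "customer_name",
    "ord_amt_usd": "order_amount",
}

def to_business_view(row: dict) -> dict:
    """Translate one source record into business terminology."""
    return {TECHNICAL_TO_BUSINESS.get(col, col): val for col, val in row.items()}

source_row = {"c_id": 42, "c_nm": "Acme Corp", "ord_amt_usd": 1975.0}
print(to_business_view(source_row))
# {'customer_id': 42, 'customer_name': 'Acme Corp', 'order_amount': 1975.0}
```

In a real platform this mapping lives in virtual views defined once and reused by every consuming tool, which is what makes the semantic layer the shared nucleus of integration, management, and delivery.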
The delivery of different data sources as consumable products via a semantic layer is integral to data meshes, data products and, to a lesser extent, data fabrics and data lake houses. In this respect, the semantic aptitude of data virtualization is ideal for successfully managing data across numerous points of differentiation. So is the automation Shankar alluded to, specifically in relation to metadata management and the tenet of active metadata championed by the analyst community. “What makes it active is acting upon that metadata so you understand who’s using what, when, and how,” Shankar remarked. “Or how the data sources are accessed and how they’re transformed. How the datasets are combined, and what are the consumption tools used, and who’s using them.”
Metadata engines can monitor all of this activity to understand everything from data integration best practices to apropos datasets for specific users. When paired with facets of cognitive computing, they can then automate these and other tasks—which is one of the numerous attributes of a comprehensive data fabric. The addition of what Shankar termed an AI algorithm enhances such metadata capabilities, which can “actually be the eyes and ears, and then make changes based on that and provide recommendations. Either if it’s a data engineer that’s actually working the tool itself, or a business consumer who’s trying to consume the data. For a data engineer, it can say these are the things that you need to do. So, you can automate a lot of the capabilities instead of manually having to do it.” This same capacity is equally viable when applied to data meshes, data products, and data lake houses. It also behooves data management in general.
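The “who’s using what, when, and how” idea behind active metadata can be sketched as a simple access log plus a function that acts on it. Everything here is an illustrative assumption—the event fields, the function names, and the recommendation logic are invented for the example, not drawn from any product.

```python
# Hypothetical sketch of "active" metadata: capture who accessed which
# dataset, when, and with what tool, then act on that log to surface
# the most-used datasets (a toy stand-in for AI-driven recommendations).

from collections import Counter
from datetime import datetime, timezone

access_log: list[dict] = []

def record_access(user: str, dataset: str, tool: str) -> None:
    """Capture one access event as metadata."""
    access_log.append({
        "user": user,
        "dataset": dataset,
        "tool": tool,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def most_used_datasets(n: int = 3) -> list[tuple[str, int]]:
    """Acting on the metadata: rank datasets by how often they're used."""
    return Counter(event["dataset"] for event in access_log).most_common(n)

record_access("ana", "sales_by_region", "BI dashboard")
record_access("ben", "sales_by_region", "notebook")
record_access("ana", "churn_scores", "notebook")
print(most_used_datasets())
# [('sales_by_region', 2), ('churn_scores', 1)]
```

A production metadata engine would enrich such events with lineage and transformation history, but the pattern—passively collected metadata made active by code that reads it and recommends—is the same.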
The inclusiveness of data virtualization—and its downstream data management capabilities—is well aligned with the data fabric, data mesh, data product, and data lake house architectures. Each of those paradigms includes data of any variation, regardless of source or data model. Data virtualization technologies are applicable to structured, semi-structured, and even elusive unstructured data, which presents difficulty for traditional data management and integration methods.
Specifically, competitive data virtualization options can parse text, web pages, and PDFs to “convert them into structured information,” Shankar commented. “[Virtualization] is a way to get the information, whether structured, semi-structured, or unstructured, in a structured form which would be easy to use.”
Additionally, these and the other data management benefits of data virtualization work in cloud native settings suitable for accessing non-relational data for Artificial Intelligence deployments—which is one of the virtues of the data lake house. Any type of cloud is simply another source to be virtualized and integrated into data virtualization platforms.
Moreover, virtualization avails itself of fundamental cloud characteristics, such as elasticity, when managing data from such environments. “It uses the compute infrastructure of the native cloud offerings, whether that’s AWS, Google, or Microsoft, to be able to automatically do the management of the instances themselves,” Shankar revealed.
Across The Ecosystem
Data virtualization has always been foundational to expediting data integration by allowing users to consolidate and query across sources while leaving their data in place. As such, it forsakes the traditional hassle and costs of creating elaborate data pipelines for physically, instead of logically, integrating data. Now, it has added sophisticated metadata management and a host of other functions, like data cataloging and data governance, to its semantic layer capabilities to become a utilitarian solution for data management in general.
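The logical-versus-physical distinction can be made concrete with a toy example: two sources queried where they live and combined only at request time, with no persisted intermediate copy. The generator functions below are invented stand-ins for, say, a CRM system and a billing system.

```python
# Hypothetical sketch of logical integration: join two sources at
# query time instead of copying both into a pipeline first. The
# "sources" are in-memory stand-ins; names are illustrative.

def crm_customers():
    """Simulates querying source 1 in place."""
    yield {"customer_id": 1, "name": "Acme Corp"}
    yield {"customer_id": 2, "name": "Globex"}

def billing_balances():
    """Simulates querying source 2 in place."""
    yield {"customer_id": 1, "balance": 120.0}
    yield {"customer_id": 2, "balance": 0.0}

def virtual_join():
    """Combine results on the fly; nothing is persisted in between."""
    balances = {row["customer_id"]: row["balance"] for row in billing_balances()}
    for cust in crm_customers():
        yield {**cust, "balance": balances.get(cust["customer_id"])}

print(list(virtual_join()))
# [{'customer_id': 1, 'name': 'Acme Corp', 'balance': 120.0},
#  {'customer_id': 2, 'name': 'Globex', 'balance': 0.0}]
```

A physical pipeline would instead extract both sources into a warehouse before the join; the virtual approach trades that staging cost for computation at query time.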
“It is not a specific tool for each of these things, such as data quality or data preparation tools,” Shankar clarified. “But, it encompasses these data management principles as you integrate the data and deliver the data for consumption. So, that’s how data virtualization has kind of evolved from being a core data integration style to a much broader cloud-based delivery, as well as data management principles.”
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.