Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Synthetic Data: The Key to Faster and More Efficient AI in a Recession. Commentary by Yashar Behzadi, CEO and Founder of Synthesis AI
Collecting and labeling the real data needed to train AI/ML systems is expensive and time-intensive. In the case of complicated computer vision systems like autonomous vehicles, robotics, or satellite imagery, building the hardware to acquire data may be prohibitively expensive. A single dataset may contain tens of millions of elements, and human labeling may cost dollars an image depending on the complexity of the labels. This is simply too much for organizations, especially during a looming recession. Synthetic data is a promising approach that delivers labeled training data at a fraction of the resources and time of current human labeling approaches. Synthetic data aims to simulate real-world scenarios through generative AI and cinematic visual effect technologies to programmatically create labeled data sets. Leading computer vision teams are already embracing synthetic data to reduce costs and accelerate the development of production models. Organizations that embrace new, radically more efficient synthetic data technologies will thrive even in the midst of an economic downturn.
Need data for your AI? Try Synthetic. Commentary by John Larson, Senior Vice President at Booz Allen
Your AI is only as smart as the data it’s given. The quality of data used to train AI affects how well it performs, and the old adage of “garbage in, garbage out” still rings true today. However, there are challenges with data that are further compounded by the unique self-learning aspect of AI/ML, which amplifies potential data bias as algorithms learn and reinforce their decisions with increasing autonomy and speed inside of an AI system. If the initial data quality is poor, the accuracy and performance of the model will gradually decrease, while bias becomes more pervasive. However, we’re finding new ways around data limitations with methods such as the use of synthetic data: artificial data that mimics authentic observations and can be used to enhance the training of machine learning models when an organization lacks sufficient real-world data or needs to protect privacy in the training data by using synthetic data clones. Not only can synthetic data supplement an existing dataset’s volume, it can also be used to round out data to make it more representative of a population to further mitigate bias. This is done by synthetically stitching together data to fill in data gaps, for example, when a dataset includes a disproportionately high sample of respondents of a certain demographic. Reliable AI/ML starts with good data, and innovative approaches such as the use of synthetic data are key to the future of the technology.
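To make the idea of filling demographic gaps concrete, here is a minimal sketch in Python. It is an illustration only, not the commentator's method: the function name `balance_with_synthetic`, the `group` and `score` fields, and the sample survey are all hypothetical, and the generator is deliberately naive (a normal distribution fitted per group). Production synthetic data tools use far richer generative models.

```python
import random
import statistics
from collections import defaultdict


def balance_with_synthetic(records, group_key, value_key, seed=0):
    """Top up under-represented groups with synthetic rows.

    Each synthetic value is drawn from a normal distribution fitted to the
    real values of that group (an assumed, deliberately simple generator).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in records:
        by_group[row[group_key]].append(row[value_key])

    # Every group is topped up to the size of the largest group.
    target = max(len(values) for values in by_group.values())
    synthetic = []
    for group, values in by_group.items():
        mean = statistics.fmean(values)
        # stdev needs at least two points; fall back to zero spread otherwise.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        for _ in range(target - len(values)):
            synthetic.append({
                group_key: group,
                value_key: rng.gauss(mean, spread),
                "synthetic": True,  # keep generated rows distinguishable
            })
    return records + synthetic


# Hypothetical survey in which group "B" is under-sampled.
survey = (
    [{"group": "A", "score": s} for s in (61, 64, 70, 72, 75, 79)]
    + [{"group": "B", "score": s} for s in (55, 68)]
)
balanced = balance_with_synthetic(survey, "group", "score")

counts = defaultdict(int)
for row in balanced:
    counts[row["group"]] += 1
print(dict(counts))
```

Flagging generated rows with a `synthetic` marker keeps them separable from real observations, which matters when auditing a model for the kind of bias described above.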
Relationships are really predictive signals into ML. Commentary by Neo4j CEO, Emil Eifrem
Data scientists are leveraging graphs to build their models and use history to predict the future. So much of our life currently relies on machine learning, and Gartner predicts that by 2025, graph-based machine learning models will replace 60 percent of all existing models built on traditional data.
Embracing BI through a vendor-agnostic, open lakehouse approach. Commentary by Billy Bosworth, CEO of Dremio
Anyone familiar with the potential of big data analytics in the cloud is likely a fan of data lakehouses. Lakehouses are shaping the future because they combine the scalability of data lakes and the quality of data warehouses. However, ensuring an open architecture is key to unlocking value in data and enabling organizations to effectively deliver insights using SQL and best-of-breed tools for years to come. Without an open approach, developers are locked into vendor-specific approaches, often with costly contracts that add time and complexity. In comparison, an open model enables reliable insights from a single source of truth, and provides maximum flexibility for future technology decisions. Not only does an open data architecture allow for easy access and the ability to analyze data without moving or duplicating it, but it’s vendor-agnostic. This enables enterprises to future-proof their data architecture and choose leading technologies as they see fit. Done correctly, cost savings are also realized due to the elimination of data copies and expensive data movement.
The Big Shift to Data-as-a-Service? Commentary by Varun Villait, Chief Product Officer at People Data Labs
We’re in the middle of a major shift across industries. Every business is either becoming – or has already become – a data business. Data-as-a-service (DaaS) companies will be critical to this transformation, empowering both legacy businesses and a rising generation of data-enabled startups. They have already begun to make themselves essential by building and delivering the data that almost every business will need to create data-driven tools, processes, and insights. The value of DaaS will also become even more apparent in an economic downturn. Businesses looking for greater efficiency can turn to data to better understand the dynamics of a changing market, evaluate competitors, and conduct due diligence on potential investment targets. In addition, the DaaS model will empower businesses to build data-driven processes and automation and lower the lift required from data-science teams, all of which will be valuable in a period of belt-tightening.
Emotion AI & Voice Tech: It’s Not What You Say, But How You Say It. Commentary by Sean Austin, CEO & Co-Founder, Helios Life Enterprises
Emotional intelligence helps humans make better decisions and has been shown to be more helpful than IQ in predicting success. However, emotional intelligence has declined globally over the past 50 years with the rise of technology and a user experience that neglects emotion. AI algorithms have significantly improved in analyzing everything from tone to body language due to advances in emotion detection, NLP as well as a greater combination with linguistics and psychology. AI has the potential to improve emotional intelligence, communication skills, and soft skills, but falls flat when vital components of human communication are excluded. So, why is it that software can understand human commands but not hear the frustration, excitement or trepidation in the human voice? While the capabilities exist, the majority of today’s technologies are, in essence, tone-deaf. Commands may be heard, but tone is not. Tone of the voice is the number one passive indicator of what someone is thinking and accounts for nearly 40% of human communication. A new frontier of emotional understanding through tonal analytics will become a standard necessity for automated analyses of any future voice communication. By incorporating tonal AI with other forms of emotion AI, a more comprehensive voice technology and understanding can be created. Together these technologies can intelligently connect all facets of complex and unstructured alternative data to generate a clearer macro picture. It is likely that such tonal analyses will mirror the development of automatic speech recognition (ASR) and natural language processing (NLP)—toolsets that have been in development for decades, and are integrated with most voice platforms that support billions of voice interactions globally. 
As a result, over the next decade, the world will witness a Cambrian explosion in growth at the intersection of AI ($1,581.70 billion by 2030), voice technology ($27.16 billion), and alternative data ($143.31 billion by 2030).
A New Threat Landscape. Commentary by Josh Stella, chief architect at Snyk and founding CEO of Fugue
Developers and engineers are increasingly using infrastructure as code (IaC) that operates against the cloud provider’s application programming interfaces (APIs) to build and modify their cloud infrastructure, including security-critical configurations, in real time as they work. Change in the cloud is a constant, and every change brings risk of a misconfiguration vulnerability that attackers can exploit quickly using automated detection. The control plane is the API surface that configures and operates the cloud. For example, you can use the control plane to build a container, modify a network route, and gain access to data in databases or snapshots of databases (which are a more popular target for hackers than breaking into live production databases). In other words, the API control plane is the collection of APIs used to configure and operate the cloud. Minimizing the potential blast radius of any successful cloud penetration event means protecting against control plane compromise in architectural design of the environment.
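As a rough illustration of catching misconfigurations before a change reaches the control plane, the Python sketch below checks a handful of resource definitions against three simple rules. Everything here is an assumption made for the example: the dictionary schema, the resource names, and the `find_misconfigurations` helper are hypothetical and do not correspond to any real IaC format or vendor tool.

```python
# Hypothetical, simplified resource definitions; real IaC formats differ.
resources = [
    {"name": "app-db-snapshot", "type": "db_snapshot",
     "public": True, "encrypted": False},
    {"name": "app-db", "type": "database",
     "public": False, "encrypted": True},
    {"name": "deploy-role", "type": "iam_role", "actions": ["*"]},
]


def find_misconfigurations(resources):
    """Return (resource name, issue) pairs for a few illustrative rules."""
    findings = []
    for res in resources:
        # Rule 1: nothing should be reachable by the public.
        if res.get("public"):
            findings.append((res["name"], "publicly accessible"))
        # Rule 2: databases and their snapshots must be encrypted.
        is_data_store = res["type"] in {"database", "db_snapshot"}
        if is_data_store and not res.get("encrypted", False):
            findings.append((res["name"], "not encrypted at rest"))
        # Rule 3: wildcard permissions widen the potential blast radius.
        if "*" in res.get("actions", []):
            findings.append((res["name"], "wildcard control plane permissions"))
    return findings


findings = find_misconfigurations(resources)
for name, issue in findings:
    print(f"{name}: {issue}")
```

Running checks like these on every proposed change, rather than after deployment, is one way to act on the point that each change carries misconfiguration risk.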
Not all cloud-native apps belong in the cloud. Commentary by Jon Toor, CMO, Cloudian
Cloud-native app adoption continues to take off. By 2025, Gartner estimates that over 95% of new digital workloads will be deployed on cloud-native platforms, up from 30% in 2021. However, not all cloud-native workloads will go to the public cloud. Organizations want to run cloud-native apps on their growing on-prem data sets to get more value and insights from that data. For example, they want to leverage on-prem data to support AI, ML and advanced analytics use cases. Because of data gravity, it’s not economically feasible to move all that data to the public cloud – and there are also security, compliance and performance considerations. As a result, many cloud-native workloads will exist on-prem. Public cloud providers are all launching services to support such workloads (e.g., AWS Outposts, Microsoft Azure Stack and Google Distributed Cloud Edge). MSPs are also taking measures to support this trend. These workloads employ cloud APIs that aren’t compatible with traditional IT platforms. Instead, they require modern, cloud-native storage infrastructure that provides the benefits associated with both the public cloud (high-level APIs, limitless scalability, container support) and on-prem environments (cost efficiency, security, performance).
Data Privacy Concerns In AI & Machine Learning. Commentary by Dr. Rami Hashish, Founder of pareIT
There is increasing concern regarding data privacy with AI and machine learning, and frankly, the concern is valid. Regardless of where you stand on the topic, one thing is certain: unethical collection and processing of data can have damaging societal consequences. Stronger, more accurate models require large data sets, resulting in potentially excessive data collection. Meanwhile, varying (and sometimes limited or nebulous) data regulation policies in certain countries can allow for a free-for-all. Indeed, considering that further regulations are inevitable (and undoubtedly required), some may view the current state as effectively promoting expedited misuse until regulations are put in place. And thus, while governments will need to expedite regulations, the onus is ultimately on companies to take responsibility and neither collect data improperly nor use data maliciously. The problem is that we know that dollars and cents sometimes blur the lines between what’s right and wrong. I am thus hopeful – perhaps naively so – that some sort of governing body is developed, requiring any AI company or product to go through an ethical audit prior to deployment.
International Women in Engineering Day. Commentary by Cindi Howson, Chief Data Strategy Officer, ThoughtSpot
In a world ruled by those who can dominate with data, society needs diversity of experience and talent so that we can all enjoy the most successful outcomes. Women are obviously half the population yet make up a mere 16.5% of engineering talent, according to EngineeringUK. Across all STEM fields and physical and software engineering, such proportions mean that how we design and build simply cannot account for the distinct needs of half the population – nor take advantage of their skills. We need inventors and innovators who can #ImagineTheFuture from all angles. From creating cars with child safety in mind to creating data and AI products that do not accidentally discriminate between male and female job applicants, diversity of thought minimizes bias at scale. Businesses must push for greater diversity to design better products, and more diverse teams also contribute to higher financial performance. As someone who began her career in tech assembling computers, administering a local area network, and coding reports on a mainframe, I’d like to see more women in tech overall, but also more in leadership positions deciding the future of tech for a better world.
Inventory overstock. Commentary by Verte
While a lot of attention is currently focused on product shortages, that’s only one part of the equation. We’re also seeing retailers like Target and Abercrombie trying to get their bloated inventory under control by heavily discounting products. It’s clear this is the year of inventory management, meaning it’s never been more important to leverage data and technology to better manage retail. Of course, these are not tools built overnight. The retailers and brands that invest in data and AI today will be those that rise to the top in the next few years. Retailers should also be aware of Gen Z’s conscious shopping habits. For Gen Z, it’s all about the product journey and its environmental and social impact, from pollution and waste to fair labor practices. Building a foundation of sustainable, ethical operations is crucial to a brand’s story.