Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
Court Rules Web Scraping Legal. Commentary by Denas Grybauskas, Head of Legal at Oxylabs
In truth, the Court ruled that scraping of publicly available data (at least data stored the way it is in LinkedIn’s case) is legal in light of the Computer Fraud and Abuse Act (CFAA). Nothing more, nothing less. It’s definitely a great decision for the scraping industry; however, it just reaffirmed what the majority of those in the tech industry probably already knew: scraping of public data and hacking shouldn’t be treated as the same, because these actions are completely different and should have entirely different legal implications. The US Court of Appeals for the Ninth Circuit once again came to the only logical conclusion here: scraping of publicly available data does not breach the CFAA. Many other scraping-related questions in this battle between hiQ Labs and LinkedIn remain to be answered (such as the alleged breach of LinkedIn’s terms of service, privacy-law implications, etc.). We are happy with this reasonable decision, as a different ruling could have brought terrible consequences to the whole industry. But this decision does not mean that those who scrape data should go berserk from now on. One must always first evaluate the type of data they are planning to scrape, and consider what other legal questions might need to be answered before starting to gather the data. And, as always, scraping ethics and the well-being of the scraped targets should be taken into account.
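The ethics point above, respecting a scraped target's published rules and well-being, can be made concrete in code. Below is a minimal Python sketch (the robots.txt text and URLs are hypothetical, and this is an illustration rather than legal guidance) that parses a site's robots.txt, keeps only the URLs it permits, and plans a polite delay between requests:

```python
from urllib.robotparser import RobotFileParser

def make_policy(robots_txt: str) -> RobotFileParser:
    # Build a robots.txt policy object from raw text (no network call needed).
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

def polite_fetch_plan(policy: RobotFileParser, urls, delay_seconds: float = 1.0):
    # Keep only the URLs the site's published policy allows; callers should
    # sleep delay_seconds between requests so the target isn't overloaded.
    allowed = [u for u in urls if policy.can_fetch("*", u)]
    return allowed, delay_seconds
```

A scraper built this way drops disallowed paths before ever contacting the site, which is the kind of target-aware restraint the commentary argues for.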
Web Scraping/9th Circuit Ruling. Commentary by Bright Data CEO Or Lenchner
Public web scraping lets leaders make better-informed decisions that strongly impact their organizational strategies and business outcomes. It is the power that fuels the free market. The 9th Circuit’s ruling reaffirms the foundation on which the internet, the largest database ever created, was built: democratizing information for everyone. There are plenty of compelling academic and business use cases that highlight the importance of collecting and analyzing public web data, which clearly benefits all sectors. For example, it can create benefits in the hiring process by promoting diversity. It does so by uncovering hidden and unconscious bias, flagging exclusionary language in job descriptions and other written communication, such as gender-biased terms and other keywords that promote exclusion. And the uses for public web scraping expand beyond this one case. Researchers, academics, investors, and journalists all use public web scraping in their data strategies to gain real-time insights into public sentiment and well-being, organizational team structures, growth prospects, and the competitive landscape for target audience engagement.
The Next Generation DBA Powered by Technology. Commentary by Kyle McNabb, Vice President of Solutions & Partner Marketing at Rocket Software.
Over the last few years, DBAs – or more specifically, DBAs with experience with Db2 on the IBM mainframe – have been dwindling in number as a result of an aging workforce and the role being increasingly automated. There’s so much data in Db2 on mainframes – and it keeps growing – that it is hard for firms to unlock additional insights and value without knowledgeable resources. With data increasingly distributed across clouds, applications and machines, experienced DBAs are critical for organizations to be able to capitalize on the growth of their data and manage it effectively. One way businesses can address this challenge is to invest in software that enables them to virtualize the data on the mainframe, in Db2 and beyond, to make it easier to push to the cloud, into analytics, and more. Firms need to start thinking about abstracting away the complexities of how their data is embedded and managed in z/OS apps, and push to make it more accessible for insights and analytics needs.
How AI can be Trained Against Bias. Commentary by Ricardo Amper, CEO, Incode
AI mechanisms operate as blank canvases and are trained on what to recognize when verifying digital identities. Digital authentication technology can only work when AI is fed gender-neutral and diverse identities, so that it can effectively recognize a person’s biometric features. Unbiased recognition starts with the way the technology is trained, and that means enabling the technology to evaluate all genders and ethnicities from its conception.
What to Do With the Data You’re Collecting. Commentary by Nitin Mathur, SVP Customer Experience for Privacera
Regulatory bodies are constantly working to establish what data can or should be collected. There is growing evidence that some of the proposed items that should not be collected are freely shared by the majority of consumers via things like their social media accounts or email signatures. Today, finding someone’s email is as easy as a Google search. And clearly, different age groups have very different expectations. Regulatory bodies need to stay current with consumer expectations and demands, and not try to “protect” things no one believes are private anymore. A lot more dialogue is required to establish acceptable privacy. In terms of using the collected information, companies need to consider two things. First, establish clear board- and executive-level commitment to the responsible use of data. Regulations like GDPR, CCPA, and others are to be respected and followed. It is clear that leading brands like Apple – and others – are viewing privacy as a differentiator and a source of competitive advantage. Second, move beyond the old style of privacy and security postures, which essentially determined access on very coarse-grained attributes. Modern fine-grained access controls allow you to provision access to data with, for instance, sensitive columns masked or removed. There are many technical solutions to this, but they require a standardized approach to data access, security, and privacy across all the varied data sources out there.
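The fine-grained, column-level masking described above can be sketched in a few lines. This is an illustrative Python example, not any particular vendor's API; the roles, column names, and policy table are hypothetical:

```python
# Hypothetical policy table: which roles may see which columns in the clear.
COLUMN_POLICY = {
    "analyst": {"region", "order_total"},              # PII stays masked
    "compliance": {"region", "order_total", "email"},  # full visibility
}

MASK = "***"

def mask_row(row: dict, role: str) -> dict:
    # Return a copy of the row with every column the role may not see
    # replaced by a mask, instead of denying access to the row outright.
    visible = COLUMN_POLICY.get(role, set())
    return {col: (val if col in visible else MASK) for col, val in row.items()}
```

The design point is that access is decided per column rather than per table: an analyst still gets the row, just with the sensitive fields obscured, which is exactly the finer-grained provisioning the commentary contrasts with old coarse-grained postures.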
Treating ML Models as Software is a Path to Accelerating AI Innovation. Commentary by Luis Ceze, CEO of OctoML
One of the biggest—and often overlooked—problems in software development today is building reliable and performant AI-powered applications. There is a massive shift towards democratizing model creation—Hugging Face’s open source community is a case in point. But what the industry is lacking is a way to package and deliver production-quality models that fit into existing DevOps skill sets and workflows, and that can integrate with DevOps tooling. This is because models are, in the end, software that is part of a full-stack application, yet they are treated as a separate, bespoke process and workflow, making them inaccessible to application developers and operations teams. Bottom line: in order to break this complex cycle and accelerate AI application development, machine learning has to align with software best practices. The way to do that is to find a way to automatically generate and sustain trained models as agile, performance-portable and reliable software.
The Role of AI Analytics in Creating Safer Work Environments. Commentary by Adam Logan, Vice President, Application & Data at ISN
Safety continues to be a core value for leading organizations across all industries. Maintaining a safe environment for all stakeholders who enter a company’s facilities or worksites is critical. Especially as the social pillar of the environmental, social, and governance (ESG) movement becomes a top priority, organizations are being held responsible for quantifying their efforts to keep employees and visitors safe. AI analytics can be an extremely useful tool for unlocking data and insights that enable organizations to profile their operations and ultimately improve workplace safety. AI/ML technologies can provide safety officers and executives with previously unavailable data points that can be analyzed to mitigate risk. For example, in a three-year analysis of injury and incident data from over 8,500 OSHA logs, ISN identified over 13,000 additional serious injuries and fatalities (SIFs) that were previously not self-reported. To generate these insights, ISN leveraged machine learning, natural language processing and custom models validated by ISN’s HSE team to ensure accuracy. By applying AI/ML methods to further identify performance and incident trends in the data collected, safety officers and executives can make smarter business decisions and create safer work sites.
Responsible AI will Play a Key Role in the Fight Against Climate Change. Commentary by Shashin Mishra, Vice President of EMEA at AiDash
A study conducted by the Harris Poll recently revealed that 68% of CEOs and C-suite executives are guilty of greenwashing. As the climate crisis worsens and companies face growing pressure to be transparent when it comes to meeting sustainability goals, responsible AI will play a key role in ensuring that businesses can actually live up to their lofty climate promises. Responsible AI encompasses fairness, bias detection and mitigation, and accountability. By integrating responsible AI with existing operations, business leaders can track progress and measure metrics such as carbon emissions, ensuring that data is not manipulated or altered to appear more favorable. Responsible AI also means that the tools in your sustainability toolbox matter. You don’t need to burn more carbon to measure carbon — and a responsible AI framework takes that into consideration. In the next few years, we can expect to see AI adoption soar across industries and more regulatory structure established at the government level as the fight against climate change intensifies.
The Open Data Revolution: 3 Innovations that Wouldn’t Exist Without Open Data APIs. Commentary by Renjith Babu, VP of Solutions Engineering at Cloudentity
Open Data has enormous potential to drive technological innovation and provide significant benefits to consumers and businesses. Open Data application programming interfaces (APIs) allow data sets to be standardized and exchanged between organizations that produce and consume data with explicit consent of data owners. The data sets allow service providers to create more innovative products and services leading to competition between service providers, which in turn gives consumers more choice of products and services. Three key innovations that wouldn’t exist without Open Data are: Open Banking, Open Healthcare, and Accessible Public Sector Data.
Identifying the Right Speech-tech Solutions and Provider. Commentary by Edward Miller, Founder & CEO at LumenVox
Many companies don’t realize the benefits that speech technologies can unlock, such as increased productivity and efficiency, personalized customer communications and secure authentication. To realize these benefits, businesses must look for providers who deliver speech technologies such as automatic speech recognition, call progress analysis and voice biometrics. However, the speech-tech market is incredibly competitive. Many modern speech-tech providers deliver advanced software that’s highly flexible for specific use cases, which makes it difficult for businesses to differentiate between market offerings and determine which solution is best suited for their company. I encourage decision-makers to first identify their most pressing needs or objectives and then conduct a thorough head-to-head comparison when evaluating which speech technologies to employ. Doing so can save significant time and money by avoiding investment in the wrong technology. Many solutions are alluring on the surface but won’t properly integrate with company workflows. Before selecting a provider, research what current and past customers say about the provider’s solutions and services. Gauge if they are a collaborative partner that offers flexible deployment options. Ask the following questions: What is the system’s accuracy rate? How does the provider utilize tuning to improve pronunciation and clarity? Is the pricing flexible and able to be based on a variety of factors like usage, subscription, sessions, or seats? What is the hosting environment? Can it run in containers? Can it auto-scale? Is the technology self-healing? Is the system easy to use? Will the solution integrate with your existing tech stack? Will it align with how your organization operates? In addition to providing the technology that meets your goals and exceeds customer expectations, you’ll want a true business partner who can provide expert counsel in delivering modern customer experiences.
Why Data Connectivity Shouldn’t Fall Solely on the Shoulders of IT. Commentary by Jerod Johnson, Technology Evangelist at CData Software
There is often a disconnect between data owners and owners of the systems that house that data. Business users know what data they want – leads, opportunities, orders, invoices, shipments, etc. – but IT teams are the gatekeepers of that data. Historically, business users went through IT to get the data they wanted, which both delays their access to time-sensitive data and taxes IT resources. But with good reason: business users can’t be expected to learn the skills necessary to develop custom data integration pipelines. Real-time data connectivity solutions lift the burden from the shoulders of IT by putting data access and management in the hands of the business users. With a cloud-hosted data connectivity tool, business users are empowered to create direct connections to their applications and systems without writing code or needing IT assistance at every turn. Instead of designing, provisioning, and maintaining a cumbersome data store, IT can democratize data access by enabling lines of business to work with their data in real-time, directly from the applications they use every day. This frees up time and resources for IT and transforms the way organizations work with their data.
How AI Technology is Helping Farmers Today. Commentary by Carlos Gaitan, CEO of Benchmark Labs
AI technology is helping farmers implement more sustainable practices in many different ways, from providing recommendations for seed selection, and pest and disease modeling, to helping them save water, energy, and CO2 emissions with farm-specific analytics and forecasts. AI technology is also helping farmers process and understand aerial imagery, and to deliver alerts that improve farm management operations. Specific applications include post-processing and quality checking of satellite data; classification, detection and segmentation of different types of land cover and soil characteristics to improve growing decisions; high-precision weather forecasting to deliver farm-specific rainfall and evapotranspiration estimates that are essential for water-balance models; and even AI-based recommendations that take into account local environmental conditions to suggest drought-tolerant or high-yielding seeds for specific years. On the other hand, AI also helps farmers manage and schedule labor more effectively, and enables alerts in the presence of adverse environmental conditions, like heat spells affecting labor and plant health, potentially devastating frost events, or even forest-fire-inducing conditions. These alerts enable proactive farm management and can be the difference between marginal and significant financial losses.
Data Integrity Will be Critical to Support a Successful ESG Data Governance Framework. Commentary by Pat McCarthy, Chief Revenue Officer at Precisely
ESG is quickly taking a front seat for corporate enterprises around the world. As business partners and governing bodies begin to place more accountability on organizations to invest in practices that support positive ESG outcomes, accurate and consistent reporting becomes an imperative. What we’re seeing now is organizations turning to the same data governance framework they use for business operations, and applying its capabilities to track, measure, and record the underlying data elements required for consistent, accurate, and context-based ESG reporting. When assessing whether a data governance framework is ready to support ESG success, leaders must first look at what they need to report on, then understand the granular details required. This starts with quality data – data that can demonstrate compliance against changing regulations and standards. Data must be ready to deliver outcomes by tracking and scoring metrics; data must be trusted to support optimal quality and health; and data must be easily found and understood so that it can be scalable and have a meaningful impact. Most companies need to enrich their datasets with data from third parties, in addition to making educated assumptions when granular details are not available. Without the basic foundation of data with integrity, organizations will struggle to build a credible and compliant ESG program.
Unlocking Value through Convergence of AI and IoT. Commentary by Martin Thunman, CEO and co-founder of Crosser
To fully harness the power of AI and IoT convergence, a data-driven enterprise must adopt a hybrid-first approach to data processing and intelligent integrations. In this approach, the edge acts as the data origin for the real-time execution of AI models, the on-premise data center applies AI models to aggregated data, and the cloud serves as the center for AI model training and big data analytics. The convergence of AI and IoT will redefine the way organizations operate. This hybrid-first approach enables the edge, on-premise data centers and the cloud to work in a continuous data cycle of never-ending insights and actions — unlocking value like never before.
Now is the time to Spring Clean Your Data. Commentary by Andy Palmer, co-founder and CEO of Tamr
Data mastering and cleaning have always been challenging for many organizations. Now that organizations are trying to use their data as a strategic asset, they are finding that mastering their data is the most time-consuming and least-rewarding task for data scientists and data engineers. Traditional rules-based master data management has become untenable. Because of the sheer volume and variety of data from different sources, by the time you figure out the thousands of rules needed, a new data source is introduced and invalidates the rules. Human-guided machine learning is the only way that today’s organizations can solve data mastering problems to deliver the comprehensive, high-quality data necessary to answer important business questions in a timely, accurate, and scalable manner. Benefits include the ease of integrating multiple data sources, higher accuracy, and much less manual effort. Having clean data will ultimately increase overall productivity, allowing for the highest quality information in your decision-making.
Why SAP Users May be Running into Difficulties when Migrating to the S/4 HANA Platform. Commentary by Kevin Campbell, CEO of Syniti
Organizations are beginning to realize the impact of proper data management on their businesses, but the successful ones have known this and are using data as a differentiator. Without clean, quality data, regular processes involving customer transactions, supply chain lines and more get slowed down – the end result being lost time and money. Having an experienced data partner is critical for complex data migrations.
Multilingual Support for Chatbots. Commentary by Vaibhav Nivargi, CTO & Co-Founder, Moveworks
Despite many organizations attempting to use chatbots to support their global, multilingual workforce, most chatbots fail at supporting multiple languages. Advancements in conversational AI have helped to overcome some language complexity, but few have cracked the code on true multilingual capabilities. The problem is that traditional approaches to multilingual NLU are primarily focused on translation, and based on data that is typically long-form prose. On the other hand, the nature of human conversation is often colloquial and ambiguous, with mixed languages, short utterances, typos, and abbreviations. This necessitates extra focus on context and robustness in model training and evaluation, and dynamic flow in conversation handling. Given that most chatbots follow a fixed script, it’s nearly impossible for these models to follow the natural flow of the user, regardless of how proper the translation is. ML practitioners simply lack the training data and model robustness to get a successful end-to-end pipeline that detects and understands phrases, spelling and intent. True multilingual support requires more than a fixed-script chatbot solution – it’s only possible with an extensive network of machine learning models working in unison.
AI and the Evolution of Data. Commentary by Lars Selsås, CEO of Boost.ai
Whether it’s being implemented across a customer service team, monitoring for cybersecurity threats, or doing predictive maintenance, artificial intelligence continues to redefine the ways companies leverage data to solve organizational challenges. Whereas in the past an enterprise may have sat on silos of actionable data with little clue as to how to realize its value, AI programs of today make more seamless data integration possible. In an environment like customer service, for example, AI solutions run in tandem with human counterparts to handle high-volume inquiries, and provide data about the types of questions customers ask in order to enhance an AI’s responses and live engagement. That kind of data-driven, continual improvement is seen across many AI use cases. With solutions that both source data and suggest courses of action based on that data, enterprises can remain more responsive and agile. Throughout the pandemic, businesses have invested heavily in digital transformation. While that’s necessary to remain competitive, it’s also likely that the data silos they so often accumulate won’t be eradicated. However, we continue to find more efficient means of managing and leveraging data from multiple sources. It’s very possible the next generation of data maintenance is on the horizon, and AI will continue to be determinative in shaping what that looks like.
How Democratization of AI Could Prevent the Great Resignation. Commentary by O’Reilly VP Mike Loukides
The rise of low-code and no-code has democratized AI, spreading AI development to a wider user base that includes those without specialized knowledge of the technology. Given that hiring is tight, particularly in areas like AI, companies may be better off training their own people for new positions, rather than assuming they can hire AI experts from the outside. This includes training employees on subjects like data collection and preparation, so they can learn what the job entails and how automated tools can be used to support them in their roles. The same is true for AI deployment roles; operations staff will need to understand how AI applications are developed and what kinds of support they require. And both groups will need basic statistics, an understanding of cloud computing, and other applied disciplines. With remote positions growing and the pandemic still raging on, employers will need to take special note of how they deliver this training in a way that doesn’t require assembling large groups of people in small spaces.
How to leverage your existing data to anticipate and mitigate the consequences of crises. Commentary by Dustin Radtke, CTO at OnSolve
According to Forrester research, there has been a proliferation of persistent threats in the last two years, and 99% of organizations experienced a critical event in the past 18 months. While these events can be costly, more than 50% of leaders say their risk responses are less than effective. It’s complicated to predict a crisis. With internal and external data sources, however, organizations can anticipate how a crisis will occur and prepare for outcomes of the next major disaster. With artificial intelligence, organizations can identify real threats quickly and efficiently activate their response plans. In real-time, AI can scan data sources like news articles, social media, weather reports, police reports and more, to identify and verify potential threats and how they could impact your operations – within minutes. Correlating internal data to your organization’s people, assets and operations can then provide actionable risk intelligence about how the evolving threat could impact your business and what steps to take to protect it as that threat unfolds.
Combining a business’s critical data with other external resources empowers decision-makers. Commentary by Crux CEO Will Freiberg
Think about all the content you consume in one day – things like social media, trending topics, breaking news, etc. What do all of these have in common? They’re the threads that tie together the world we live in today, and each one leaves a lasting impact on every industry it touches, from retail to supply chain to finance. Every single thing in that list results in a flurry of new data created every second of every day, whether people are Googling, purchasing, or investing as a result of what’s happening. That data can be critical for helping your organization make smarter business decisions – but how do you harness and use trillions of data points effectively? How do you use information around you to make the right sustainable investments? How do you make sure what you need is fit for purpose? The answer is simple: external data integration. While the answer is simple, the execution of it is not. An external data pipeline is critical for businesses to make smarter decisions for their organizations and the community surrounding them, but this new world of data doesn’t play by the rules – it is highly fragmented, unstructured, and unstandardized, forcing companies to go through a gauntlet of hurdles to ingest and make use of it. Determining the most efficient and stable way to allow that external data to flow into the hands of your analysts and data scientists is the first step – and arguably the most critical – in successfully harnessing external data for your organization. And when it’s used to inform business decisions, it can give you a competitive edge by giving you an augmented look at how your customers think and feel, creating gold mines where there were once blind spots. External data is the future of businesses, and using it effectively will be critical to succeed in today’s data-driven world.
CDC: Let’s do better with health data. Commentary by Todd Bellemare, SVP, Strategic Solutions at Definitive Healthcare
Throughout the pandemic, the CDC has struggled with the country’s antiquated public health data infrastructure. Now, it’s asking staff to improve how it collects and sifts through information to get to better outcomes. The truth is, there’s often a big difference between what the CDC sees in the data and the reality our health system is experiencing. Nuances in categorization and terminology can quickly skew the data – potentially causing critical errors in strategic direction. When working with government agencies, it’s important to have clear definitions of the data points being collected and to ensure that staff understand them. For example, when hospitals were at max capacity, bed data was reported but not always labeled by the different types of beds (ICU vs. acute care, for example), which can lead to operational inefficiencies and poor outcomes. The answers to critical questions are often buried under disorganized data. When providers send clear, labeled data to the CDC, they need to be able to combine data points into the format the CDC needs and have the SQL queries in place to extract this data. They also need to balance patient privacy under HIPAA. The ability to have a 360-degree view of the healthcare market can help organizations, including the CDC, understand the various nuances of the healthcare ecosystem and make informed, data-driven decisions. Healthcare commercial intelligence can provide that clarity, as it combines data, analytics and expertise from multiple datasets with advanced data science to create a holistic picture of the healthcare landscape.
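The bed-type labeling problem described above comes down to normalizing free-text categories before aggregation. Here is a minimal Python sketch of that idea; the label map and feed values are hypothetical illustrations, not actual CDC reporting fields:

```python
# Hypothetical mapping from raw hospital-feed labels to canonical bed
# categories; a real reporting standard would define these centrally.
BED_TYPE_MAP = {
    "icu": "ICU",
    "intensive care": "ICU",
    "acute": "acute care",
    "acute care": "acute care",
}

def normalize_bed_counts(records):
    # records: iterable of (raw_label, count) pairs from hospital feeds.
    # Returns totals keyed by canonical category, plus any labels that
    # couldn't be classified and need human review.
    totals, unknown = {}, []
    for raw_label, count in records:
        key = BED_TYPE_MAP.get(raw_label.strip().lower())
        if key is None:
            unknown.append(raw_label)
        else:
            totals[key] = totals.get(key, 0) + count
    return totals, unknown
```

Flagging unmapped labels instead of silently dropping or miscounting them is the point: it surfaces exactly the terminology nuances that the commentary warns can skew the data.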
How Conversational AI is Transforming the Healthcare Sector. Commentary by Gary Fingerhut, Kore.ai Advisor
The global health crisis has quickly shifted consumer expectations across multiple sectors, most notably in healthcare, where companies are continuing to find ways of embracing digital transformation post-COVID. Telehealth services are experiencing massive adoption and market growth, and several new AI-driven technologies have become the critical backbone for digital support systems and their ability to provide real-time patient diagnosis, real-time evaluation, and treatment. The adoption of this technology on the operational side of the business has experienced massive expansion and created a timely solution for organizations struggling with staff satisfaction, retention and utilization. Enterprise conversational AI technology is helping fill ongoing workforce gaps while improving experiences and stakeholder satisfaction, and increasing access to care when healthcare systems need it most. Conversational AI is an emerging technology that uses natural language processing (NLP), machine learning (ML), speech recognition, and other language technologies to process, contextualize, and determine the best way to respond to user input. It can deliver a human-like conversational experience through voice and digital interactions with patients, members, caregivers, providers, agents, employees and consumers across a healthcare enterprise. These new advancements in AI are showing they can optimize and boost patient experiences, drive call center efficiencies, and relieve critical staffing shortages. Big health companies need to meet new expectations and implement platforms that are scalable and easy to use, and that support both internal and external use cases as well as multiple channels and languages.
Elevating the employee experience with an intelligent data core. Commentary by Jim Stratton, Chief Technology Officer, Workday
Data is key to elevating the employee experience – which has become a business imperative amid the Great Resignation – but there remain challenges in bringing together internal and third-party data at scale, in a way that ensures consistency, security and integrity of that data. The office of the CIO is becoming the enabler – if not the outright driver – for businesses to achieve that elevated employee experience. Given the pace of change both in the nature of work and the enterprise technology landscape, CIOs need to make certain that their technology architecture is capable of bringing together HR system data, sentiment data, and external data to derive insights that will drive more informed employee strategies, programs, and decision-making. To facilitate this, CIOs need to embrace an architecture that is open and connected across their enterprise application portfolio. An intelligent data core model can help facilitate this effort. At the center should be an adaptable data pipeline that supports large-scale external operational data ingestion, blends it with internal system data, and enables it to be continuously managed, up-to-date and secure. This then provides the foundation on which composite, cross-functional applications can be built and served, providing users with timely, contextually relevant experiences within their enterprise applications. And finally, these experiences should be served across channels, meeting the end user (employee) directly in their preferred digital workspace.
How VCs Use Data Science To Identify Investments with 2x Better Returns than Human Judgment Alone. Commentary by Mark Sherman, Managing Partner at Telstra Ventures
In an ironic twist, venture investors – responsible for catalyzing many of the data science, AI and ML innovations driving industry disruption – have been among the slowest to adopt them (chest-pounding notwithstanding). Today, roughly 90-95% of decisions in venture investing are human, but by 2030, we anticipate that will drop significantly, to 50-60%. Why? Because as AI becomes more sophisticated, it will be a key differentiator for how VC firms operate. We’re already seeing measurable, improved investment outcomes using data science, proprietary algorithms and systems, and AI versus using human judgment alone. Consider that 7% of deals Telstra Ventures sourced during 2020 using data science raised a round in the 12 months after being sourced, versus 33% of all other deals sourced by traditional channels. Meanwhile, deals we sourced using data science saw a 4.0x increase in reported valuation in the next funding round, versus a 2.4x increase for deals sourced through traditional channels. As we look forward, the dominance of analogue, purely intuitive approaches to early-stage investing might be at an end. With the tools at our disposal, even now, we can eliminate major areas of uncertainty that currently – frankly – require guesswork, and which cloud decision-making unnecessarily.