Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
WormGPT & Cyber Attacks. Commentary by Aaron Mendes, CEO & Co-Founder of PrivacyHawk
“It’s not just businesses we need to worry about. This technology is available to criminals too. WormGPT, PoisonGPT, and others are just the beginning. An important action individuals and businesses can take is preventive: Reduce the amount of information these models have access to about you. Reduce your individual or corporate footprint so these models have less data to train on about how to exploit you. Use automation to reduce personal and corporate data easily available on the open web. Reduce the number of databases your data is in so you’re in fewer breaches. These models feed and train on your data. Make it harder for them to get it.”
Unlocking the ESG-Driven Digital Revolution – How Big Data and Active Archiving Converge to create a Sustainable, Profitable Tomorrow. Commentary by Steve Santamaria, CEO, Folio Photonics
“The fusion of ESG (Environmental, Social, and Governance) principles, big data analytics, and active archive storage has the potential to revolutionize the digital landscape by creating an ethical, sustainable foundation for data management. Embracing ESG values in big data practices shifts the focus from solely delivering profitable insights to doing so in a responsible manner that safeguards our planet. Being able to easily and quickly access historical data emerges as a critical component in this context. Active archives deliver remarkable efficiency gains while minimizing energy consumption and ensuring adherence to stringent data protection laws.
When these three elements intertwine, they form a powerful alliance that drives meaningful change, transforming the way we manage and utilize data in a manner that benefits businesses, society, and the environment alike. In an era of rapid digital transformation, embracing the synergy between ESG principles, big data, and active archiving is one of the most crucial and impactful decisions you, as an IT leader, can make. Ultimately, this decision paves the way for a more sustainable, responsible, and profitable future.”
Google accuses Azure of anti-competitive practices. Commentary by Mark Boost, CEO of Civo
“It is disheartening to see the persisting issue of anti-competitive practices in the cloud computing industry. The recent developments involving Google, the FTC, and Microsoft have again illustrated the urgent requirement for the needs of cloud users to be put first.
As operating costs increase across the board, companies are in dire need of better services at more affordable prices. Licensing restrictions are not the only method for restricting customer movement, with the cost of data egress fees used by many hyperscalers amounting to vendor lock-in. While these tactics may lead to greater revenues for the likes of Microsoft, they ultimately restrict business growth and hinder innovation, negatively affecting the potential of the entire cloud industry. If these anti-competitive practices continue, we will continue to see the shift towards the new breed of cloud providers with fair business philosophies and services.”
Open source improves the daily developer experience. Commentary by Adam Frank, SVP of Product and Marketing at Armory
“Open-source code is skyrocketing in use, but some leaders still harbor reservations: they fear losing control of their developers and product and believe they might open their organization to security threats or even expose their code to intrepid members of the public. While perhaps rooted in early open-source reality, these fears are from a bygone era; now, the benefits of open source far outweigh its drawbacks. Open source enables greater workplace flexibility, allowing developers to engage with various community-based tools and technologies, adapt them to their unique needs and contribute to the coding community. As we all know, greater flexibility breeds creativity and continuous learning, ultimately improving developer satisfaction and productivity. Open source also drives innovation by accelerating the development lifecycle. Developers leveraging ready-made components and libraries achieve a faster time-to-market, a pivotal competitive advantage in today’s digital economy. Finally, open source fosters a collaborative workplace culture based on open communication and symbiotic learning. These benefits are net positives for individual developers, organizations and the industry at large.”
Optimizing Systems of Engagement to Enhance Collaboration. Commentary by Jeff Robbins, Founder and CEO, LiveData
“System of engagement is a catch-all term that can describe any decentralized IT components that incorporate technologies ranging from email, instant messaging, and social media to enterprise platforms for data integration, collaboration, and comprehensive analytics, with the goal of encouraging interaction. For some vertical markets, such as healthcare, analysts suggest the industry should focus on applications that enhance usability and simplify data sharing. For example, systems of engagement that integrate structured data from electronic health records with unstructured data from other sources and then serve that data in intuitive and user-friendly formats would best meet those needs.”
AI and ML Quietly Transforming Bill Payment. Commentary by John Minor, Chief Product Officer, PayNearMe
“While ChatGPT, OpenAI and DALL-E are getting all the buzz for their ability to do everything from creating reports to generating artwork, other AI applications continue to quietly transform one ordinary but important area of modern life: bill payment and collections.
Payments technology companies are using AI and machine learning to help their clients identify patterns in their payers’ payment behaviors and compare them to companies of comparable size, type or regional location. This type of anonymized data comparison offers organizations a wide lens to view trends and patterns taking place with their customers’ bill payment behavior to reveal outliers and identify trends or potential problems. Armed with this information, AI and ML can make decisions autonomously and act on insights that directly improve performance.
AI applications can identify late payers and send each one a customized plan for how to catch up on payments, basing recommendations on millions of data points in the payment platform provider’s data warehouse. Taking it one step further, the technology can be used to forecast payment shifts so billers can better plan for the future. AI applications can make logic-based predictions about changes in ACH returns or credit card declines, based on both internal and external data sources, giving billers time to prepare for a potential drop in income. AI also can be deployed to scan large data sets and flag subtle changes in payments activity that could indicate fraud, getting the job done in a fraction of the time it would take humans to accomplish the same task.”
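As a rough illustration of the fraud-flagging idea described above, the sketch below marks days whose payment totals deviate sharply from the norm. It is only a minimal statistical stand-in, not the method any payments provider is known to use: the function name `flag_unusual_days`, the sample totals, and the 3.5 threshold are all assumptions made for this example.

```python
from statistics import median


def flag_unusual_days(daily_totals, threshold=3.5):
    """Return indices of days whose total deviates strongly from the median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    which is less distorted by the outliers themselves than a score based
    on the mean and standard deviation.
    """
    med = median(daily_totals)
    mad = median(abs(x - med) for x in daily_totals)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i
        for i, x in enumerate(daily_totals)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical daily payment totals in dollars; day 5 shows a sudden drop.
totals = [1020, 980, 1005, 995, 1010, 310, 990, 1000]
print(flag_unusual_days(totals))  # [5]
```

A production system would likely combine many signals and a trained model; the point here is only that a simple, explainable baseline can surface the kind of "subtle change" a human reviewer would then investigate.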
Should companies trust ‘trusted’ generative AI solutions? Commentary by Jean-Claude Kuo, from Talend/Qlik
“As organizations prepare for generative AI, they need a way to address the privacy and security risks that come with it. To combat these, they should establish a “privacy by design” approach to achieve full data sovereignty. Organizations will need to build a trusted network that emphasizes the importance of data observability. This network should understand the organization’s primary concerns and risk posture and understand the role they play in creating an environment where data can be shared effectively and securely. There should be an understanding of the most critical assets for an organization and new ways to control these assets. Organizations need to ask and understand what is being done with the data being collected and consider what can happen if AI shuts down.
With new regulations emerging and privacy top of mind for consumers, the task of maintaining data sovereignty seems daunting. However, data sovereignty does not have to be a complete overhaul of your organization’s processes. ‘Privacy by design’ starts with organizations creating cross-functional teams that tackle issues of data sovereignty, including data residency, localization, privacy and security through collaboration and transparency. Given the rise in privacy concerns and regulations, the lack of a comprehensive, consistent approach to data governance and quality is resulting in significant risk exposure.”
What AI companies can do to raise awareness and provide solutions to thwart the damage AI innovation is causing the planet. Commentary by Anna Daugherty, Director of Product Marketing at Armory
“AI tools take a massive toll on CPUs. This issue may come as no surprise to developers, but let’s frame the problem differently: AI-based solutions are so resource-intensive they impact our environment substantially. In fact, how we interact with AI tools is entirely unsustainable. Arguably, AI — at least in its current manifestation — should be considered a non-renewable resource. DevOps leaders must take ownership of and bring awareness to this problem by taking steps to address it today.
Promising strategies include optimizing deployment processes, promoting the use of virtual environments, leveraging serverless architecture and prioritizing on-demand resource allocation. Above all, developers must maintain a more strategic deployment workflow aided by intelligent automation tools that preserve precious resources.”
Why is everyone so fixated on training AI on GPUs? Commentary by Anshu Shrivastava, CEO of ThirdAI
“The tech industry believes GPUs hold the ‘key’ to the future of AI training (with ‘key’ spelled G-P-U). Just look at Nvidia’s stock. And there’s some truth to that. The primary building blocks of AI have been the existing software stack and tools like PyTorch and TensorFlow. These tools were built when the common assumption was that dense matrix multiplication is the primary building block of AI and will remain so. The software co-evolved with GPUs, the most promising deep learning hardware of the time, and they have been optimized together for dense matrix multiplications for over a decade now.
Even our current choices of neural network architectures are biased toward this ecosystem. For example, most networks that are being studied and experimented with, including transformers, are deep and narrow, perfect for a GPU. We don’t see a lot of wide networks, even when many studies have shown that they are likely better. The primary reason is that wide neural networks are hard to train in the current GPU ecosystem. The community has tried its best to escape the memory limitations of GPUs by preferring one kind of neural architecture over others, a perfect example of co-evolution.
The explosion of AI as we see it today happened because these software stacks eliminated all the technical prerequisites needed to be an AI contributor. At this point, the AI ecosystem is more about the software, algorithms, data, and even neural architectures, all coupled with the hardware. They all co-evolved with each other over the years. Having only powerful GPUs, maybe even better than NVIDIA’s, will not give you any advantage. It is like saying my team has the best player in the world, but as a team it won’t beat the existing ecosystem.
There was a time when embedding dimensions of more than 128 to 256 were prohibitive because GPUs could not accommodate them. We all know the best-performing GPT model uses 12,000-dimensional embedding models (much more comprehensive). Only a handful of companies can train models of that size, and we now know it is needed.
We are at the point where the fundamental dense matrix multiplications are becoming prohibitive, even with the co-evolved software and hardware ecosystem, for the size of models and datasets. The AI community sees “dynamic” sparsity as the hope for the future, and many papers are written around it, including from my group and ThirdAI. However, accelerating “dynamic” sparsity is fundamentally the opposite of accelerating dense matrix multiplications. As a result, there is a need to rewire the whole ecosystem’s foundation, likely affecting every AI application out there. So it will take time. The good news is that the field is moving faster than we can imagine, and the community is more than eager to try new alternatives so long as they are in easy-to-use API forms.”
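To make the contrast between dense and "dynamic" sparse computation concrete, here is a small pure-Python sketch. It is not ThirdAI's algorithm or any library's API: the helper names, the layer sizes, and especially the random choice of active neurons are assumptions for illustration only. A real system would select the active set with some cheap, input-dependent rule.

```python
import random


def dense_layer(weights, x):
    """Compute every neuron's activation: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sparse_layer(weights, x, active):
    """Compute activations only for the neurons listed in `active`.

    The active set changes from input to input ("dynamic" sparsity), so the
    work becomes scattered row lookups rather than one large, regular
    matrix multiplication.
    """
    return {i: sum(w * v for w, v in zip(weights[i], x)) for i in active}


random.seed(0)
n_neurons, n_inputs = 1000, 64
weights = [[random.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
x = [random.gauss(0, 1) for _ in range(n_inputs)]

# Hypothetical selection step: a random 1% sample stands in for a real,
# input-dependent choice of which neurons to evaluate.
active = random.sample(range(n_neurons), k=10)

dense = dense_layer(weights, x)
sparse = sparse_layer(weights, x, active)

# Multiply-accumulate operations needed by each approach.
dense_macs = n_neurons * n_inputs
sparse_macs = len(active) * n_inputs
print(dense_macs, sparse_macs)  # 64000 640
```

The sparse path does 1% of the arithmetic here, but its memory access pattern is irregular and differs per input, which is the opposite of what hardware tuned for dense matrix multiplication is built to accelerate.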
Don’t Fear the Bots: Embrace Generative AI & Automation. Commentary by Raghu Ravinutala, CEO & Co-founder, Yellow.ai
“The rapid progress of generative AI is revolutionizing automation possibilities for enterprises, introducing new avenues for enhanced efficiency. However, despite these remarkable advancements, many enterprises exhibit caution when it comes to integrating generative AI into their business processes, including functions like customer support. This hesitancy is understandable considering that while generalized large language models (LLMs) offer broad opportunities, they often lack the depth and nuance required for specific enterprise use cases, domains, and functions. To effectively harness the potential of generative AI, enterprises need to embrace domain-specific LLMs that capture the essence of particular industries or use cases, incorporating their unique jargon, context, and intricacies.
Generalized LLMs, in their attempt to cater to a wide range of potential end-user needs, increasingly risk producing instances of hallucinations or providing irrelevant information in responses. By leveraging domain-specific LLMs, enterprises can limit the scope of responses to specific use cases, thereby reducing the likelihood of inaccurate replies. To ensure a positive and productive impact of generative AI in areas like customer service, enterprises must prioritize focused, faster, and efficient models that strike a balance between intelligence and performance.”
The AI-Energy Use Conundrum We Should All Be Talking About. Commentary by Nilesh Patel, Chief Product Officer, WEKA
“It is estimated that the world’s data centers consume ~3% of global energy today – more than double what it was just 10 years ago.
There are more problems ahead: With the explosion of generative AI, which requires a tremendous amount of energy to train and run, analysts predict that number will continue to grow exponentially. Without intervention, these next-generation AI workloads are expected to drive a more than threefold increase in energy demand by 2025, eclipsing the power consumption of the world’s entire human workforce combined.
While much has been written about the energy requirements to cool the power-hungry datacenters that process AI, we should instead be focused on prevention. Rather than waste any more time treating the symptom – heat – companies should instead address the source – inefficient energy consumption in the datacenter, which increases the need for cooling.
If companies can get the foundational elements of their data architecture and infrastructure right, they can increase their efficiency, reducing energy waste and their data’s carbon footprint from the get-go. Even architecting AI data pipelines in a more efficient way can increase GPU stack efficiency up to 50%.
Let’s refocus our time and start shifting the conversation to techniques that can curb inefficiencies in the first place, upstream in the stack. It’s not too late.”
Simplifying ESG Reporting Through AI-Derived Data. Commentary by Sunil Senan – Infosys SVP and Business Head – Data & Analytics
“According to Infosys’ ESG Radar 2023, 90% of executives surveyed said that ESG initiatives show positive financial returns. ESG is no longer optional, but a business necessity, with success and transparency being rooted in data. To build sustainable operations, organizations must rely on data and analytics, as well as artificial intelligence (AI), to manage processes efficiently and measure ESG progress. We are at an exciting and critical time, where the explosion of AI has the potential to deliver world-saving insights. Companies are collecting and sharing a growing amount of sustainability data. However, businesses often fail to take advantage of the benefits that their data can provide. This critical cycle of collection and sharing can lead to insights that both improve ESG outcomes and help the enterprise achieve its business goals.
By leveraging AI-powered data tools, organizations can smoothly enable data collection, as well as coordinated integration, governance and curation at scale. This opens the door for seamless self-service ESG reporting, where goals can now be tracked through the integration of self-serve reporting of key indicators, such as carbon footprint, greenhouse gas emissions and sentiment scoring. As the amount of sustainability data available grows, it is imperative that enterprises unlock efficiencies through autonomous AI-powered solutions to share insights that can improve business outcomes, while fighting climate change, inequality and other societal issues.”
AI Responsibility: Don’t Just Join the Hype. Commentary by Leonid Belkind, Co-Founder and CTO of Torq
“With the AI boom, executives across industries are tasked with developing AI adoption plans and leading teams through this new terrain. Rather than just joining in on the AI hype, IT teams must be diligent and evaluate how the technology can help relieve pain points within their company, while setting up appropriate guardrails. More than ever, AI responsibility is critical as these integrations will leave a lasting impact. AI capabilities can boost productivity and reduce burnout for teams, but will only prove to be beneficial if integrated responsibly and strategically.”
Big Data/ML in slower moving consumer categories. Commentary by Chad Pinkston, VP of Product Management at TraQline
“CPG categories have long had the benefit of product moving quickly off the shelves. This frequency in inventory turns fueled opportunities for Big Data to predict consumer behavior. However, certain industries, such as consumer durables, have lagged behind the standard of CPG data operations – until now. Big Data has played a pivotal role and will play a greater role in the durables space going forward, for three main reasons: (i) Data Quality – in the Machine Learning world the quality of data trumps all. AI and ML models will continue to be pushed further and further to the front lines, creating a world where those that execute queries do not need technical ability – they just need to know what questions they have for the data to answer, and off-the-shelf algorithms will help them facilitate that through natural language query. But – and this is a big but – the data must be harmonized and structured in ways that allow the ML to learn and the algorithms to run queries. So in the future, those with clean harmonized data will win – but structured first-party data is not enough. You will need Big Data harmonized across first-party data (for example, manufacturing shipment data) coupled with third-party data elements like transactions, web analytics and social media mentions to generate the type of insights you need to run your business; (ii) Data Portability – combinations of first-party data and third-party data are needed to get the depth and breadth of insights to operate your business. Data sharing through data collaboration platforms will continue to rise because of the need to combine data sets. No longer can you live in a silo using your data alone.
Ingestion of third-party data that is instantly harmonized and translated to your data standards will be the default operating system; (iii) Modeling Capabilities – higher confidence levels of predictive consumer behavior will be reached even when the purchase incidents are less frequent because of the ability to harmonize vast upstream data sets, securely share data across multiple providers and use bespoke ML to derive high confidence insights from traditionally smaller sample sizes.”
From Giant AI to Smarter AI. Commentary by Shivani Shukla, PhD, Associate Professor, University of San Francisco School of Management, Department of Business Analytics and Information Systems
“OpenAI is burning through $700,000 every day to keep ChatGPT running based on figures from a research firm, SemiAnalysis. Not to mention that the platform cost more than $100 million to develop. A majority of this cost is attributed to hardware infrastructure, and as Sam Altman pointed out, “the giant AI models are reaching a point of diminishing returns”. Practical scalability being a challenge, this calls for the next stage of evolution where these models ingest less data and compute more efficiently. Data privacy and regulations, copyright issues and monetization concerns would further prevent data accessibility. Stack Overflow and Reddit now demand compensation for training algorithms and platforms like ChatGPT on their data. We simply cannot afford to build and run models that get better with more data.
The focus will have to shift to constructing mathematically and computationally efficient models. Few-shot and zero-shot learning models have long been explored in computer vision where a model observes and identifies samples based on auxiliary descriptions and textual cues from a class it wasn’t trained on. In the large language model space, the new LIMA language model that was launched on May 22, 2023 reached GPT-4 and Bard’s performance levels with a lot fewer parameters, and fewer prompts and responses. Essentially, it is able to generalize to unobserved tasks. These fields of inquiry are rapidly moving towards lower input quantity requirements and at-par outputs.
With these advancements and many more on the horizon, it is highly likely that generative AI and the definition of Big Data will undergo a significant transformation in the future.”
AI and SaaS: Simplifying and Complicating Things, All at Once. Commentary by Ben Pippenger, CSO at Zylo
“No one is immune to the AI hype right now. If you’re charged with managing your organization’s SaaS, AI has the potential to make your life easier… but also a little more complex.
More and more IT and procurement professionals recognize the major benefit of integrating AI into existing workflows to streamline processes and drive efficiency with managing SaaS. With the help of AI, leaders can better navigate the complex landscape of SaaS products, leveraging these tools to process mountains of information in a matter of minutes to get a more comprehensive understanding of providers’ offerings, product versions and pricing structures.
But AI also introduces new complexities. There are a ton of new SaaS companies focused on AI and large language models that are getting funded. On top of that, nearly every SaaS vendor out there today is working on introducing AI into their tools, which means more financial and security risks. With new features and new tools come new costs, so you must stay on top of whether these new AI tools are incurring additional fees or creating changes in pricing models for your SaaS tools.
Equally important is addressing security. Identify which of your vendors rely on third-party AI services and thoroughly investigate their data storage, encryption and usage practices. You need to understand where your organization’s data is going. This knowledge will become even more crucial as SaaS vendors introduce more AI capabilities.”