
ElasticGPT — Powered by Elastic, for Elastic
ElasticGPT is our internal generative AI (GenAI) assistant built on a retrieval augmented generation (RAG) framework. It is meticulously crafted using Elastic’s own technology stack to deliver secure, scalable, and context-aware knowledge discovery for Elasticians.
At its heart lies SmartSource, a private, internally built, and fine-tuned RAG model that retrieves and passes the most relevant context from our internal data sources to an OpenAI large language model (LLM) using Elasticsearch for vector search and data storage. This generative AI application also delivers private, secure access to OpenAI’s GPT-4o and GPT-4o-mini models hosted on a dedicated Azure tenant through Elastic Cloud. This architecture exemplifies the seamless integration and raw power of the Elastic AI Ecosystem — from the backend to the frontend — all monitored and optimized through Elastic’s observability tools.
In this post, we’ll peel back the layers of ElasticGPT’s architecture, spotlighting SmartSource, our internal RAG-based LLM framework and model that transforms how we retrieve and discover information from our proprietary internal data sources. We’ll also explore how private access to OpenAI’s GPT-4o and GPT-4o-mini models extends ElasticGPT’s capabilities beyond RAG, enabling broader generative tasks — all while staying true to the customer zero ethos.

The Elastic on Elastic story: Building GenAI apps with our platform
As customer zero, we built ElasticGPT not just to validate our generative AI capabilities and gain key efficiencies in this era of AI, but also to provide feedback to our product teams and share best practices with our customers on how to build a sustainable, future-proof generative AI platform that scales as your business grows.
ElasticGPT isn’t just another tool — it’s a living showcase of Elastic’s technologies working in unison. We built it from the ground up using solutions and capabilities within the Elastic Search AI Platform:
- Elasticsearch powers the data and search backbone
- Elastic Cloud ensures effortless scalability
- EUI, Elastic's own design library, delivers a polished frontend
- Elastic Observability provides real-time insights
This approach ensures that we can provide a secure, performant, and seamless application using the integrated capabilities of Elastic to meet our team’s current demands and future expectations.

Backend architecture: Elasticsearch as the mighty core
The backbone of ElasticGPT is Elasticsearch, a versatile powerhouse that serves as both the vector database for SmartSource’s RAG capabilities and a robust repository for chat data across all models. Hosted on Elastic Cloud, this setup offers the flexibility, scalability, and performance needed to support a growing internal user base.
Vector database for SmartSource
SmartSource, our name for our internal model, taps into Elasticsearch’s vector database to store embeddings — numerical representations of our internal data sourced from Elastic’s Wiki, ServiceNow Knowledge Articles, ServiceNow News Articles, and beyond. Using Elastic’s Enterprise Connectors, we ingest this data effortlessly, break it into searchable chunks, and generate embeddings for semantic search. When a user asks “What’s our Q1 sales target?” SmartSource performs a lightning-fast vector search in Elasticsearch to retrieve the most relevant context — perhaps a snippet from a sales report or meeting notes — and feeds it to GPT-4o for a polished response.
This setup empowers SmartSource to deliver precise, context-aware answers grounded in our proprietary data, all thanks to Elastic’s unparalleled search capabilities.
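To make the retrieval step concrete, here is a minimal sketch of what a kNN vector query against Elasticsearch might look like using the Python client. The index name, field names, and the `embed()` helper are hypothetical; the real pipeline generates embeddings at ingest time via Elastic's tooling.

```python
# Minimal sketch of the retrieval step (hypothetical index and field names).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.us-east-1.aws.found.io:443",
                   api_key="<api-key>")

def embed(text: str) -> list[float]:
    """Placeholder for the embedding model used at ingest time."""
    raise NotImplementedError

def retrieve_context(question: str, k: int = 5) -> list[str]:
    """kNN vector search for the chunks most relevant to the question."""
    resp = es.search(
        index="smartsource-docs",          # hypothetical index name
        knn={
            "field": "embedding",          # dense_vector field on each chunk
            "query_vector": embed(question),
            "k": k,
            "num_candidates": 50,
        },
        source=["content", "title", "url"],
    )
    return [hit["_source"]["content"] for hit in resp["hits"]["hits"]]
```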
Chat data storage for all models
Every interaction — whether with SmartSource, GPT-4o, or GPT-4o-mini — is meticulously logged in Elasticsearch. This includes user messages, timestamps, feedback, and metadata.
Storing this data in Elasticsearch isn't just about record-keeping; it's about continuous improvement. With Elastic's analytics, we can track usage patterns, pinpoint common queries, and identify areas for refinement. Meanwhile, within Elastic Observability, application performance monitoring (APM) keeps tabs on performance, reliability, and resource utilization, ensuring response times stay lightning-fast as adoption scales. User chat data is deleted every 30 days, with only metrics saved, enabling us to retain the most relevant data cost-effectively.
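As an illustration, a logged interaction might be indexed as a simple document like the one below. The index name and fields are hypothetical, and in practice an index lifecycle management (ILM) policy could enforce the 30-day retention automatically.

```python
# Sketch of logging one chat interaction (hypothetical index and fields).
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.us-east-1.aws.found.io:443",
                   api_key="<api-key>")

es.index(
    index="elasticgpt-chat-logs",
    document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "user": "jdoe",                 # resolved from SSO in practice
        "model": "smartsource",         # or "gpt-4o" / "gpt-4o-mini"
        "message": "What's our Q1 sales target?",
        "response_ms": 1840,
        "feedback": "thumbs_up",
    },
)
```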
Frontend architecture: React and EUI for a seamless experience
ElasticGPT’s frontend is a sleek blend of React and EUI, Elastic’s own UI framework, ensuring it feels like a natural extension of our ecosystem. Hosted on Kubernetes within Elastic Cloud, it’s built to scale dynamically and integrate effortlessly with our backend.
Why EUI? A lesson in flexibility
Early on, we toyed with Hugging Face’s Chat UI for a quick start, but its limitations became clear when users demanded custom features. Switching to EUI was a no-brainer — it’s purpose-built for Elastic’s products, aligning perfectly with our design system and backend. Now, ElasticGPT’s interface mirrors tools like Kibana, offering a consistent experience while letting us iterate rapidly as generative AI evolves.
Key features: Real-time and secure
The frontend streams responses in real time, so users see answers unfold naturally: think of it like a conversation, not a loading screen. Source attribution and linking build trust, while simple feedback buttons let users rate answer quality. Security is ironclad, with Okta single sign-on (SSO) for authentication and end-to-end encryption for data protection. Thanks to Elastic Cloud's Kubernetes orchestration, we can deploy updates without downtime, keeping the user experience smooth and reliable.
API: The glue between frontend and backend
ElasticGPT’s API is the unsung hero, bridging the React frontend and Elasticsearch backend with a stateless, streaming design. It’s engineered for efficiency, delivering fast, accurate responses to users in real time.
How it works for SmartSource
When a user queries SmartSource, the API triggers a vector search in Elasticsearch to fetch relevant context, sends it to GPT-4o (hosted on Azure), and streams the generated response back to the frontend.
For GPT-4o and GPT-4o-mini, the API bypasses the RAG pipeline, routing queries directly to the Azure-hosted models for non-contextual tasks like brainstorming or general Q&A.
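A simplified sketch of this routing logic might look like the following, assuming FastAPI for the API layer and the openai SDK's AzureOpenAI client. Endpoint paths, deployment names, and the `retrieve_context()` helper (from the earlier retrieval sketch) are illustrative, not the actual implementation.

```python
# Sketch of the stateless, streaming API (illustrative names throughout).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AzureOpenAI

app = FastAPI()
llm = AzureOpenAI(azure_endpoint="https://example.openai.azure.com",
                  api_key="<api-key>", api_version="2024-06-01")

def stream_completion(deployment: str, prompt: str):
    """Yield response tokens as they arrive from the Azure-hosted model."""
    stream = llm.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

@app.post("/chat/{model}")
def chat(model: str, prompt: str):
    if model == "smartsource":
        # RAG path: ground the prompt in retrieved context, then use GPT-4o.
        context = "\n".join(retrieve_context(prompt))  # see earlier sketch
        prompt = f"Context:\n{context}\n\nQuestion: {prompt}"
        model = "gpt-4o"
    # Direct path: GPT-4o / GPT-4o-mini queries skip retrieval entirely.
    return StreamingResponse(stream_completion(model, prompt),
                             media_type="text/plain")
```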
Monitoring with Elastic APM
Elastic APM tracks every API transaction — query latency, error rates, and more — ensuring we can resolve issues before they affect users. Kibana dashboards provide a bird’s-eye view of API performance, model usage, and system health, reinforcing the advantage of a platform approach.
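Instrumenting the service is largely a matter of attaching the Elastic APM agent. A sketch using the Python agent's Starlette/FastAPI middleware might look like this, reusing the `app` from the API sketch above; service and server names are placeholders.

```python
# Sketch: attach the Elastic APM Python agent to the FastAPI app above.
import elasticapm
from elasticapm.contrib.starlette import ElasticAPM, make_apm_client

apm = make_apm_client({
    "SERVICE_NAME": "elasticgpt-api",              # placeholder service name
    "SERVER_URL": "https://apm.example.found.io",  # placeholder APM server
})
app.add_middleware(ElasticAPM, client=apm)

def rag_lookup(question: str):
    # Custom spans break a transaction down further, e.g. around retrieval,
    # so query latency shows up per step in the Kibana APM views.
    with elasticapm.capture_span("vector-search"):
        return retrieve_context(question)  # see the earlier retrieval sketch
```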
LangChain: Orchestrating the RAG pipeline
LangChain is the orchestration layer behind SmartSource’s RAG capabilities, tying together Elastic’s vector search with GPT-4o’s generation to deliver accurate, context-rich responses.
1. What it does: LangChain manages the RAG pipeline end-to-end: chunking ingested data, generating embeddings, retrieving context from Elasticsearch, and crafting prompts for GPT-4o. For instance, when a user asks about Q1 sales, LangChain pulls the exact chunk from a sales report — not the entire document — keeping answers concise and relevant.
2. Why it fits with Elastic: LangChain’s flexibility pairs perfectly with the Elastic Stack. Elasticsearch delivers fast, scalable vector search, while Elastic Cloud ensures the infrastructure scales with demand. Plus, Kibana lets us monitor LangChain’s performance alongside the rest of the system, creating a cohesive observability strategy.
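To illustrate the shape of such a pipeline, here is a minimal sketch of a LangChain retrieval chain over Elasticsearch. It assumes the langchain-elasticsearch and langchain-openai packages, Azure credentials in environment variables, and hypothetical index and deployment names; the real SmartSource pipeline is more involved.

```python
# Minimal sketch of a LangChain RAG chain over Elasticsearch (names illustrative).
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

store = ElasticsearchStore(
    es_url="https://my-deployment.es.us-east-1.aws.found.io:443",
    es_api_key="<api-key>",
    index_name="smartsource-docs",   # hypothetical index
    embedding=AzureOpenAIEmbeddings(model="text-embedding-3-small"),
)
retriever = store.as_retriever(search_kwargs={"k": 5})

def format_docs(docs):
    """Join only the retrieved chunks, not whole documents."""
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01")

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What's our Q1 sales target?"))
```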
Extended capabilities: Private access to GPT-4o and GPT-4o-mini
Beyond SmartSource, ElasticGPT offers secure access to LLMs such as OpenAI's GPT-4o and GPT-4o-mini models, hosted on a private Azure tenant. These models shine for tasks that don't require internal data retrieval: think general queries, content drafting, or creative brainstorming. And because everything runs in a secure environment, employees can work with private company data while staying within company policy.
- Secure and compliant: Hosting these models on Azure ensures all interactions meet Elastic's stringent security and compliance standards. Elasticians can use them with confidence, knowing their data stays private and protected.
- Tracked in Elasticsearch: Every GPT-4o and GPT-4o-mini interaction is logged in Elasticsearch. This unified tracking lets us monitor usage, collect feedback, and maintain consistent observability across all ElasticGPT features.
Elastic’s IT team is reducing the potential impact of shadow AI by delivering secure access to multiple LLMs. The team is currently expanding coverage to other LLMs, such as Anthropic’s Claude models and Google’s Gemini models.
Why a platform approach is a winning formula
Building ElasticGPT on our own platform isn’t just practical — it’s a strategic triumph. Here’s why:
- Seamless integration: Every piece — Enterprise Connectors, Elasticsearch, EUI, APM — fits together like a puzzle, eliminating friction and compatibility issues.
- Scalability on demand: Elastic Cloud’s auto-scaling ensures ElasticGPT grows with us, handling hundreds or thousands of users without missing a beat.
- Security built in: SSO, encryption, and Elastic’s security features lock down internal data, ensuring compliance and trust.
- Monitoring in real time: Elasticsearch, Elastic Observability, and Kibana analytics dashboards reveal how ElasticGPT is used, where it excels, and where it can evolve — fueling continuous enhancement.
This platform approach has delivered a v1 that's already slashing redundant IT queries and improving employee efficiency. And because it's built on our Elastic Search AI Platform, we're poised to iterate as fast as generative AI advances.
What’s next?
As we advance ElasticGPT, we aim to make extensive use of our own stack, adopting new features like the semantic_text field type, inference endpoints, and LLM observability so we can keep exercising and testing our latest capabilities as customer zero.
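For example, the semantic_text field type lets Elasticsearch handle chunking and embedding at index time through an inference endpoint. A hypothetical mapping might look like the sketch below; the index and endpoint names are placeholders.

```python
# Sketch: create an index whose "content" field is embedded automatically
# via an inference endpoint (hypothetical index and endpoint IDs).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.us-east-1.aws.found.io:443",
                   api_key="<api-key>")

es.indices.create(
    index="smartsource-docs-v2",
    mappings={
        "properties": {
            "content": {
                "type": "semantic_text",
                "inference_id": "my-embedding-endpoint",
            }
        }
    },
)
```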
In parallel, with the increasing prominence of agentic AI, ElasticGPT will evolve to incorporate specialized AI agents designed to streamline workflows, significantly boost productivity, and enhance the daily experience for all Elasticians.
Build generative AI applications today
Ready to build your own? Check out our free AI playground to start building today.
The release and timing of any features or functionality described in this post remain at Elastic’s sole discretion. Any features or functionality not currently available may not be delivered on time or at all.
In this blog post, we may have used or referred to third-party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third-party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.
Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.