Oak Ridge National Laboratory, tucked in the hills of Tennessee, was once a top-secret government facility working to unlock the secrets of atomic energy.
These days, the lab, while still maintaining programs researching nuclear science, has a much broader mandate, studying everything from biological and environmental systems to clean energy to the structure of the COVID-19 virus.
Underpinning nearly everything Oak Ridge studies is its supercomputing program, where world-class researchers push the limits of computational power in service of scientific advancement.
And underpinning Oak Ridge’s supercomputers, helping to keep them stable and performant, is technology from Elastic. Oak Ridge’s latest supercomputer, Summit, was deployed in 2018 and has a peak performance of 200 petaFLOPS, or 200 quadrillion calculations per second. While impressive for its time, that’s nothing compared to the lab’s forthcoming supercomputer, Frontier, due to come fully online later this year.
Frontier will have a peak performance of 1.5 exaFLOPS — a 650% increase from Summit. As the first exascale computer in the United States, it will help scientists achieve previously impossible breakthroughs in energy and national security research.
Frontier will occupy the space of nearly two football fields and require 40 megawatts of power to run. That is roughly three times Summit’s 13-megawatt power load, so even small tweaks can translate into large gains in operating efficiency. Those gains in turn translate into better economics and faster breakthroughs for researchers using Frontier to solve previously unsolvable problems.
All of this makes speed and performance critical for the team building Frontier, which is why that team has turned to Elastic to monitor and optimize the system’s performance.
The analytics and monitoring team at Oak Ridge recently discussed how they use Elastic logging to help keep a complex system like Frontier stable and Kibana data visualization to pinpoint infrastructure efficiencies. Here, we share some of their insights, which are useful to anyone running Elastic at any scale, in any size organization.