If you’re looking for big data, just look up. Weather analysis requires petascale-class pipelines, with seven-day forecasts needing approximately 2000 time-steps for prediction, and each time-step requiring trillions of computer operations. Weather forecasting and climate simulations can encompass quadrillions of operations for a single forecast run – which needs to be done several times a day.
The logistical storage and compute cycle challenges in meteorological analysis grow to potentially ungovernable levels with each increase in resolution, or as new or additional sources of raw data become available. For the UK’s Met Office, it is a scenario which requires constant innovation in provisioning.
In October the Met struck a new agreement for HPC solutions with SGI, DataDirect Networks and Bright Computing to consolidate the resource base for its Scientific Processing and Intensive Compute Environment (SPICE) initiative. Speaking to The Stack, SGI’s Vice President and General Manager of High Performance Computing, Gabriel Broner, explained that the biggest challenge is moving the data in and out of the system.
“Current HPC systems are helping with forecasting numerical models,” he says, “which solve several simultaneous differential equations on a three-dimensional grid. The advanced global models usually have a horizontal distance between grid points of about 10km, and close to 100 vertical layers, resulting in over 200M points. Out of those points, a couple hundred values are computed for each.”
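As a rough check on the figures Broner quotes, the point count can be estimated by dividing the Earth's surface into 10km cells and multiplying by the number of layers. This is only a back-of-envelope sketch: the uniform spacing, the mean Earth radius of 6,371km and the figure of 200 values per point are assumptions made for illustration, and operational grids are laid out differently, so the result indicates an order of magnitude rather than the size of any real model grid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumed for this estimate


def estimate_grid_points(spacing_km: float, vertical_layers: int) -> int:
    """Estimate grid points for an idealised, uniformly spaced global grid."""
    surface_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    columns = surface_km2 / spacing_km ** 2  # one column per spacing x spacing cell
    return int(columns * vertical_layers)


points = estimate_grid_points(spacing_km=10.0, vertical_layers=100)
values_per_point = 200  # "a couple hundred", taken as 200 for illustration

print(f"grid points:     {points:.2e}")
print(f"values per step: {points * values_per_point:.2e}")
```

Under these assumptions the estimate lands in the hundreds of millions of points, the same order of magnitude as the figure quoted.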
Broner explains that forecast quality is improved by the ‘ensemble’ methodology, wherein multiple forecasts, each with slightly different initial conditions to account for the non-linear nature of the equations, run in parallel. Forecast quality improves as both the resolution and the total number of ensemble members increase. This approach is crucial for the accuracy of forecasts that extend beyond 15 days.
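Why slightly different initial conditions matter can be shown with a toy system. The sketch below is not a weather model: it uses the logistic map, a standard textbook example of non-linear chaotic behaviour, as a stand-in, and the function names, perturbation size and member count are all illustrative assumptions.

```python
def logistic_step(x: float, r: float = 3.9) -> float:
    """One step of the logistic map, which behaves chaotically for r = 3.9."""
    return r * x * (1.0 - x)


def run_ensemble(x0: float, members: int, perturbation: float, steps: int):
    """Evolve members from perturbed starts; return the spread at each step."""
    states = [x0 + i * perturbation for i in range(members)]
    spread = [max(states) - min(states)]
    for _ in range(steps):
        states = [logistic_step(x) for x in states]
        spread.append(max(states) - min(states))
    return spread


spread = run_ensemble(x0=0.4, members=5, perturbation=1e-6, steps=60)
print(f"initial spread: {spread[0]:.1e}")
print(f"largest spread: {max(spread):.1e}")
```

Members that start almost identically drift apart as the run proceeds, which is why a single run says little about uncertainty and a set of runs says more.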
Upstream and downstream data
The aim behind obtaining or generating high-resolution meteorological data is similar to that of GPS innovations: granular and increasingly geospecific data. Broner says that higher resolution requires smaller time-steps, with grid spacing expected to reach five kilometres soon, and perhaps even one kilometre as the system develops. “Doubling resolution adds an order of magnitude, a factor of ten, of compute and data movement to the process.”
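Broner's factor of ten can be approximated with a simple scaling argument. The sketch below assumes that cost grows with the number of grid points and that the time-step must shrink in proportion to the grid spacing (the usual stability constraint for explicit schemes). Both are simplifying assumptions, and `relative_cost` is a hypothetical helper written for this illustration.

```python
def relative_cost(refinement: float, refined_dims: int) -> float:
    """Cost multiplier when grid spacing shrinks by `refinement` in
    `refined_dims` spatial dimensions and the time-step shrinks to match."""
    more_points = refinement ** refined_dims
    more_steps = refinement  # smaller spacing -> proportionally smaller time-step
    return more_points * more_steps


print(relative_cost(2, refined_dims=2))   # doubling, horizontal only
print(relative_cost(2, refined_dims=3))   # doubling, horizontal and vertical
print(relative_cost(10, refined_dims=2))  # 10km -> 1km, horizontal only
```

Under these assumptions, doubling resolution costs between 8 and 16 times as much, which brackets the order of magnitude quoted, and a move from 10km to 1km spacing multiplies the cost by roughly a thousand.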
An additional metric for necessary resources in the Met’s calculations is the volume of data under analysis. ‘Upstream’ data is observational, and chiefly provided by satellites, whilst ‘downstream’ data is simulated and comes from both forecasting and climate research.
Such data is measured in petabytes. For perspective, NCAR’s ‘Cheyenne’ supercomputer holds 50 petabytes of historical data – and SPICE is capable of outputting new data at a much faster rate.
Storing the weather
The goal for SPICE was to increase scientists’ productivity following the significant investment in the wider Met Office HPC environment and the deployment of the Cray XC40 system. This has already been largely achieved with the SGI system.
The Met uses SPICE to examine data computed by research models on the large HPC system, which helps it gain insight and a better understanding of trends and implications to be derived from the model’s raw output. Prior to SPICE, Met Office researchers used shared distributed systems that required the user to manually seek out available resources across the estate on which to run analysis work.
SGI’s input to the new system comes in the form of a new cluster and DDN storage, in addition to OpenStack resources. “Users are able to rely on the system to manage available resources, and workload turnaround has improved significantly. Activities that previously were only possible to turn around in a single day can now be run several times during the same period.”
Since the forecasting workflow is time-critical, the Met models need the most up-to-date observational data, which is mapped to a three-dimensional grid and processed within a very restrictive time window.
Another innovation of SPICE is how it makes sense of the raw output of the computational models, effectively repurposing data flows which originated in the service of climate studies, or even of improvements to previous modelling systems.
The Bright outlook
In adopting Bright, the Met Office was seeking a solution for a hybrid HPC environment over both bare metal and virtualised private cloud resources. This had already been attempted by a local team, but the result proved inadequate to the proposed HPC workloads.
“They had also experimented with OpenStack,” Broner says, “and although this was better suited to their needs, the learning curve and skills required had caused the project to drag out. The close relationship SGI held with the Met Office enabled them to understand the challenge the Met Office team faced, leading them to approach Bright Computing and work together to propose a winning hybrid solution based on Bright technologies.”
The Met has used Bright Cluster Manager for HPC to deploy the new SPICE cluster on bare metal, and has also implemented Bright OpenStack for easier deployment and provisioning of a private cloud infrastructure. “The fact that Bright’s solutions can be administered from a single point of control was a consideration in the Met Office’s decision-making process.”
Edlin Browne of DataDirect Networks notes that SPICE was developed as part of a notable investment in HPC capacity at the Met Office. “The deployment of the Cray XC40 system represented a step change in HPC capability,” he says, “and it was clear that downstream analysis systems would need investment to scale to meet the requirements of Met Office research staff, supporting the realization of the benefits of the HPC program. Previously research staff used a combination of high-powered workstations and distributed x86 servers with local storage on which to perform this work.”
Browne observes that SPICE is designed to make maximum use of the DDN storage environment. “The solution enables efficient processing of compute and memory intensive workloads through high-speed interconnect between servers and DDN storage, high performance and scalable DDN storage and file system, ability to support future visualization, and advanced hardware and configuration management software toolsets.”
SLURM workload management software also enables users to schedule jobs without having to locate available resources themselves, which Browne observes is a boon to productivity and turnaround.
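As an illustration of scheduling without choosing a machine, the sketch below builds a Slurm batch script that states what a job needs (CPUs, memory, wall time) and leaves node selection to the scheduler. The job name, resource figures and `analyse.py` command are hypothetical placeholders, not details of the Met Office's configuration.

```python
def build_batch_script(job_name: str, cpus: int, mem_gb: int,
                       time_limit: str, command: str) -> str:
    """Return a Slurm batch script that requests resources, not specific nodes."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",  # HH:MM:SS wall-clock limit
        command,
    ]
    return "\n".join(lines) + "\n"


script = build_batch_script("ensemble-stats", cpus=8, mem_gb=32,
                            time_limit="01:00:00",
                            command="python analyse.py")
print(script)
```

A file like this would normally be submitted with `sbatch`. No hostname appears anywhere in it, so the scheduler is free to place the job wherever capacity exists.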
Looking to the future of a sector which was data-hungry and cycle-dependent long before the current trends, Browne sees accelerators in various forms as a key factor for the continuing evolution of meteorological analysis, with increased usage of non-volatile flash memory. “The convergence of data analytics and HPC will have a positive impact on the climate research workflow – mostly at the post-processing stage.”