Energy-Aware Autonomous Management for Cloud-Edge Infrastructures
Research project
The Swedish Energy Markets Inspectorate (Ei) specifically highlights data centers as a cause of capacity problems in the Swedish national grid. For the first time, this project addresses the real challenges that the huge energy consumption of Cloud-Edge infrastructures poses for operators and national power grids.
A single large-scale data center can consume over 100 MW of power, equivalent to more than 80,000 homes, while the cloud as a whole is projected to consume as much as 8% of the global electricity supply by 2030. An autonomous resource manager that can reason about and control resource allocation at a high level would introduce additional intelligence to optimize energy use across the system.
Wallenberg AI, Autonomous Systems and Software Program
Project description
Driven by the bandwidth, processing and latency requirements of modern applications and devices, cloud infrastructure is rapidly evolving from centralized systems into geographically distributed federations of edge devices, fog nodes and clouds. Large volumes of data move back and forth between the network edge, intermediate fog nodes and remote cloud data centers, while latency-sensitive devices connect to nearby edge resources. These federations of nodes and devices, often called Cloud-Edge infrastructures, are the critical infrastructure on which most modern digital systems depend, and they are seen as a key strategic technology for Europe's digital transformation.
"Clouds" are complex and dynamic because they must process a variety of software workloads, each with potentially changing requirements and service level agreements, across geographically dispersed nodes. The nodes themselves are often heterogeneous, differing in hardware, pricing models and energy sources.
Managing such systems is therefore a major challenge. Their dynamics and complexity are too great for human control, which makes autonomous resource management mechanisms necessary. However, these mechanisms typically operate at the level of a single node and, importantly, do not consider energy constraints, policies and optima across the wider federated infrastructure. The issue is highly topical and is becoming increasingly important as society's energy challenges grow.
One infrastructure – multiple data centers
The enormous amounts of energy consumed by Cloud-Edge infrastructures have a profound impact on the environment and society. An infrastructure consists of multiple data centers, and each data center is a complex system of systems: servers of many different configurations and quantities housed in racks, combined with enterprise-class cooling and power distribution systems. By some estimates, a single large-scale data center can consume over 100 MW of power, equivalent to more than 80,000 homes, while the cloud as a whole may consume as much as 8% of the global electricity supply by 2030.
Cause of capacity issues
Beyond environmental concerns, this energy consumption creates significant challenges for national power infrastructures, which must balance the needs of the clouds against those of other users. Sweden is an acute example: the Swedish Energy Markets Inspectorate (Ei) specifically highlights data centers as a cause of capacity problems in several parts of the Swedish electricity grid, leading to power limitations in some regions that affect other companies.
New opportunities
An autonomous resource manager that can reason about and control high-level resource allocation across a federated Cloud-Edge infrastructure would introduce additional intelligence to optimize energy use across the entire system. For example, workloads can be directed towards nodes that currently draw a large share of their energy from sustainable sources, such as solar power. Conversely, workloads can be migrated away from nodes in regions where demand from other users and businesses is high, freeing capacity in the local power grid. This helps balance the load on regional and national power grids. These scheduling decisions can be weighed against energy prices in different regions, as well as the individual requirements and service level agreements of the software workloads. Jobs that require extremely low latency, for instance, can still be assigned to an on-premises Edge data center even if it uses energy from an unsustainable source, while less latency-critical workloads can be scheduled at more sustainable locations.
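The trade-off described above can be illustrated with a small placement heuristic. This is a hypothetical sketch, not the project's method: it assumes each node reports its latency to the workload's users, the share of its current supply from sustainable sources, its regional energy price and a normalized grid-stress indicator. All names (`Node`, `Workload`, `placement_cost`, `choose_node`) and the weights are illustrative assumptions. The latency requirement acts as a hard constraint, while sustainability, price and grid stress are traded off against each other through a weighted cost.

```python
# Hypothetical sketch of energy-aware workload placement (illustrative only).
# Assumes every node exposes latency, renewable share, price and grid stress.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    latency_ms: float       # network latency from the workload's users
    renewable_share: float  # fraction of current supply from sustainable sources, 0..1
    price_per_kwh: float    # current regional energy price (assumed unit: EUR/kWh)
    grid_stress: float      # 0 = ample local grid capacity, 1 = grid under strain


@dataclass
class Workload:
    name: str
    max_latency_ms: float   # latency requirement taken from the workload's SLA


def placement_cost(node: Node, w_green: float = 1.0,
                   w_price: float = 0.5, w_stress: float = 1.0) -> float:
    """Lower is better: penalize unsustainable energy, high prices and grid stress."""
    return (w_green * (1.0 - node.renewable_share)
            + w_price * node.price_per_kwh
            + w_stress * node.grid_stress)


def choose_node(workload: Workload, nodes: List[Node]) -> Optional[Node]:
    """Return the lowest-cost node that still meets the latency requirement."""
    feasible = [n for n in nodes if n.latency_ms <= workload.max_latency_ms]
    if not feasible:
        return None  # no node satisfies the SLA; a real system would need a fallback
    return min(feasible, key=placement_cost)


# Illustrative nodes: an on-premises edge site on a strained, fossil-heavy grid
# and a remote cloud region with mostly sustainable, cheaper energy.
nodes = [
    Node("edge-onprem", latency_ms=2, renewable_share=0.1,
         price_per_kwh=0.30, grid_stress=0.8),
    Node("cloud-north", latency_ms=40, renewable_share=0.9,
         price_per_kwh=0.10, grid_stress=0.2),
]

for w in (Workload("ar-render", max_latency_ms=5),
          Workload("nightly-batch", max_latency_ms=200)):
    chosen = choose_node(w, nodes)
    print(w.name, "->", chosen.name if chosen else None)
```

With these made-up numbers, the latency-critical job stays on the edge site because it is the only feasible node, while the batch job moves to the more sustainable region. Treating latency as a filter and everything else as a score keeps SLA violations out of the trade-off; the weights are where regional policy, prices or grid signals would be tuned.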
Four major challenges
An efficient energy-aware resource management mechanism for federated Cloud-Edge infrastructures requires solving four distinct scientific challenges. These are as follows:
A Formal Layered Model of a Cloud-Edge Infrastructure and Its Associated Workloads
Cloud-Edge infrastructures are extremely complex, dynamic and heterogeneous, as are the software workloads submitted to them. To reason about such a system of systems, one must identify its most important components together with their relevant properties and interactions, from the energy sources through the physical layer up to the application layer. Despite the popularity of Cloud-Edge infrastructures, no such layered model has yet been created.
Scalable Monitoring and Prediction Architecture for Cloud-Edge
Monitoring Cloud-Edge infrastructures is a demanding process. It requires innovations in scalable monitoring of software and hardware metrics within and between data centers, along with rapid prediction of usage, energy consumption and sustainability.
Arbitration of Service Level Objectives (SLOs) with Local and Global Energy Demands
An energy-aware Cloud-Edge infrastructure has local and global optima at many different levels of the stack, for example SLOs at the workload, data center, regional and national levels. How best to arbitrate between these levels and optimize holistically across them is not yet fully understood.
Validation of Energy-Aware Resource Management for Cloud-Edge Infrastructures
Even once a model-based, energy-aware resource management system has been developed, testing and validating it remains a significant challenge, especially with regard to how sustainability and energy consumption are represented and monitored within the system.
Vision
The ultimate vision of this project is to develop an autonomous management system that for the first time addresses the real challenges faced by operators and national power grids due to the huge energy consumption of Cloud-Edge infrastructures.