Intelligent Service Mesh for an Autonomous Network-Compute Fabric
Research project
Performance requirements are becoming more stringent than state-of-the-art clouds and connectivity can meet. In this project, we aim to improve the intelligence and autonomy of the core of the edge computing infrastructure.
The digitisation of industry, business and society, for example in the automotive and manufacturing industries, leads to performance requirements that exceed what state-of-the-art clouds and connectivity can offer today. We propose to meet these requirements by improving the intelligence and autonomy of the so-called network-compute fabric, the core of the edge computing infrastructure, through unified management of applications over highly distributed and dynamic network-compute resources.
The network-compute fabric vision aims to provide a continuous execution environment spanning device, deep edge, and central cloud. This fabric plays a key role both for emerging applications and for the cloudification of the network itself. To take advantage of its distributed nature, applications are being disaggregated into smaller components that are deployed across a distributed infrastructure. This disaggregation and distribution, combined with a more heterogeneous infrastructure, will make applications more complex to manage. Hence the network-compute fabric introduces new opportunities to optimize application performance, but at the cost of growing complexity that must be handled by a higher degree of automation.
Optimized handling of traffic
Our overarching research question is how intelligence and autonomy can be employed in a network-compute fabric to meet the stringent performance demands of future applications. We aim to investigate and propose intelligent and unified mechanisms, based on the so-called service mesh, for optimized handling of traffic across distributed application components. To ensure the desired performance, we will introduce both awareness of and control over the traffic forwarding logic across all components of a distributed cloud-native application, providing concerted management of network-compute resources.
Optimized traffic handling for distributed cloud-native applications in the network-compute fabric will involve an interplay between application requirements, application placement and scaling decisions, and the availability of network-compute resources, all of which may be subject to dynamic changes. A major challenge is how to introduce automation into application traffic handling in a way that meets performance, robustness and scalability requirements. Another challenge is the coordination between cloud-native applications and network workloads in a tightly integrated network-compute fabric. Such integration is expected to open opportunities for new forms of optimization and automation.
Cloud-native approach
The Service Mesh (SM) concept was developed as a standard approach to traffic management and observability for cloud-native workloads, with support for very fine-grained control. SM is becoming more widely adopted not only for application workloads, e.g. with Istio, but also for network functions, e.g. with Network Service Mesh (NSM). SM represents a fully cloud-native approach to network programmability, in contrast to the telco approach, Software-Defined Networking (SDN), which played a key role in the initial enablement of network programmability.
SM faces challenges in terms of performance, scalability and autonomy. In the SM data plane, there is a non-negligible latency overhead and a complex trade-off between this overhead and the level of granularity of traffic management. In the SM control plane, there are also performance and scalability challenges that hinder support for more dynamic and distributed edge cloud use cases. The need to keep a well-defined separation of concerns has resulted in very fragmented approaches to network programmability for cloud-native application and network workloads.
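The data-plane trade-off mentioned above can be illustrated with a simple back-of-the-envelope model. The sketch below assumes a sidecar-style mesh in which every call between two components passes through two proxies (one on each side); the function name, the per-proxy latency of 0.5 ms and the hop counts are hypothetical values chosen for illustration, not measurements from this project.

```python
def mesh_overhead_ms(service_hops: int, per_proxy_ms: float,
                     proxies_per_hop: int = 2) -> float:
    """Estimate latency added by mesh proxies along one request path.

    Assumption (illustrative): each hop between components traverses
    `proxies_per_hop` proxies, each adding `per_proxy_ms` of latency.
    """
    if service_hops < 0 or per_proxy_ms < 0 or proxies_per_hop < 0:
        raise ValueError("arguments must be non-negative")
    return service_hops * proxies_per_hop * per_proxy_ms


# Hypothetical comparison: coarse-grained vs finely disaggregated application.
coarse = mesh_overhead_ms(service_hops=1, per_proxy_ms=0.5)
fine = mesh_overhead_ms(service_hops=5, per_proxy_ms=0.5)
print(f"coarse-grained: {coarse} ms, fine-grained: {fine} ms")
```

Under these assumptions the added latency grows linearly with the number of hops, so finer disaggregation gives more points of control over traffic but a larger proxy overhead per request.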
Intelligence and autonomy
SM can provide a basis for converging these fragmented approaches by enabling cloud-native network programmability as an integral part of the network-compute fabric. This research proposal addresses the evolution of SM towards a unified cloud-native network programmability approach for both application and network workloads. It also covers the optimization of traffic management for application workloads in coordination with network workloads, and the optimization of the SM itself with higher levels of intelligence and autonomy. For more details, see the extended research project proposal attached to this document.