Toward end-to-end latency management of 5G network slicing and fronthaul traffic (Invited paper)


Available online 24 January 2023

Regular Articles


David Larrabeiti ^a,∗, Luis M. Contreras ^b, Gabriel Otero ^a, José Alberto Hernández ^a, Juan P. Fernandez-Palacios ^b

^a Universidad Carlos III de Madrid, Av. Universidad 30, Leganes, E-28911, Madrid, Spain

^b Telefonica I+D/CTIO, Madrid, Spain

A R T I C L E I N F O

Keywords:

5G slicing; Latency management; Fronthaul transport; Backhaul; Tactile services

A B S T R A C T

5G network slicing allows operators to deploy virtual connectivity services tailored for specific purposes on top of the same underlying physical infrastructure. For some 5G services, the telecommunication operator needs to provide the customer with real-time information of the end-to-end Quality of Service for a particular slice. This paper focuses on the challenges behind this target, how per-layer per-segment monitoring can be performed based on common open interfaces to standard OAM protocols, and provides practical rules to plan end-to-end latency for slices. Then it reviews a few latency engineering approaches for fronthaul traffic from work carried out in this research area.

1. Introduction

3GPP has identified network slicing as a key component of 5G. Technical specification TS23.501, System architecture for the 5G System (5GS), has included slicing since 5G release 16 at stage 2.¹ TS22.261 [1] for release 18, also at stage 2, specifies the support of slicing as one of the formal service and operational requirements for 5G systems, including UEs (User Equipment), NG-RAN and 5G Core network.

What is network slicing? Unlike previous 3GPP mobile architectures where a single packet transport carrier was intended to carry all types of services, in 5G the transport network is expected to provide optimized support for different types of services and users in terms of throughput, latency, positioning, mobility, reliability, and availability.

This new evolution step requires very important changes both in the wireless and wired networks, including the scalable assignment of network, computing and storage resources.

Network slicing allows the operator to deploy a number of instances of virtual connectivity service tailored for specific purposes on top of the same physical infrastructure. The network slice features the whole mobile connectivity function, including IMS (IP Multimedia Subsystem); it can have specific functionality (e.g. priority, charging, policing, security, mobility), performance (e.g. latency, jitter, availability, reliability and rate), or it can serve specific users (e.g. public safety users, corporate workers, roamers, or another public mobile network operator). Furthermore, 5G features network capability exposure to trusted third parties through an appropriate API. That API shall make it possible to create, modify or delete slices, to associate UEs with slices and, importantly, to monitor the network slice, including resource utilization and QoS. A target slice recognized by 3GPP is the implementation of public safety networks that compete with or replace private cellular technologies such as TETRA, while providing much higher data rates, with full isolation w.r.t. other slices, and with maximum priority in the event of a disaster (dynamic policy control). Another well-known target type of slice is the creation of multicast or broadcast networks.

∗ Corresponding author.

E-mail addresses: [email protected] (D. Larrabeiti), [email protected] (L.M. Contreras), [email protected] (G. Otero), [email protected] (J.A. Hernández), [email protected] (J.P. Fernandez-Palacios).

URL: https://www.it.uc3m.es (D. Larrabeiti).

¹ Stage 2 denotes the architectural description that fulfills the requirements (stage 1), as opposed to the implementation of the architecture (stage 3) (ITU-T Recommendation I.130).

Technical Specification TS22.261 [1] specifies a set of performance requirements for high data rate and traffic density scenarios, including ITS (Intelligent Transport Systems) infrastructure backhaul. A sample of the most demanding cases is summarized in Table 1. The table shows downlink (DL) and uplink (UL) rates, capacity in b/s/km², latency bounds, reliability and availability. Reliability is the percentage of correctly delivered packets, which can include several re-transmissions; availability is the percentage of time the service is up via its interface.

Other requirements include timing, positioning, ranging (distance estimation between UEs), IoT (Internet of Things) traffic collection capability, IMS multimedia telephony service, Machine-Learning-based prediction of QoS changes, tactile and multi-modal communication, security and charging.

https://doi.org/10.1016/j.yofte.2022.103220

Received 6 May 2022; Received in revised form 24 December 2022; Accepted 31 December 2022

Fig. 1. An example of management domains involved in a network slice.

Services such as URLLC (Ultra-Reliable Low-Latency Communications), connected industry 4.0 and V2X (Vehicle-to-everything) require stringent QoS guarantees from the network. TS22.261 makes it compulsory to provide a mechanism to support real-time end-to-end QoS monitoring for a slice, including event notification to UEs or groups of UEs. The goal is to give interconnected automation devices some ability to react to errors, by means of a continuous stream of reports and asynchronous events upon crossing certain thresholds of loss and/or latency.
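The combination of periodic reports and threshold-crossing events described above can be sketched in a few lines of Python. This is only an illustration: the names `QosSample`, `QosEvent` and `monitor`, the thresholds and the reporting period are assumptions made here, and none of them is taken from TS22.261 or any 3GPP interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class QosSample:
    t_ms: float        # time at which the sample was taken
    latency_ms: float  # measured end-to-end latency
    loss_ratio: float  # fraction of packets lost in the interval


@dataclass
class QosEvent:
    t_ms: float
    kind: str          # "report", "latency_exceeded" or "loss_exceeded"
    value: float


def monitor(samples: Iterable[QosSample], latency_limit_ms: float,
            loss_limit: float, report_every_ms: float) -> List[QosEvent]:
    """Return periodic reports plus one event per upward threshold crossing."""
    events: List[QosEvent] = []
    next_report: Optional[float] = None
    latency_high = False
    loss_high = False
    for s in samples:
        if next_report is None:
            next_report = s.t_ms  # the first sample always produces a report
        if s.t_ms >= next_report:
            events.append(QosEvent(s.t_ms, "report", s.latency_ms))
            next_report = s.t_ms + report_every_ms
        # Asynchronous events fire only when a threshold is crossed upward,
        # not on every sample that remains above it.
        if s.latency_ms > latency_limit_ms and not latency_high:
            events.append(QosEvent(s.t_ms, "latency_exceeded", s.latency_ms))
        latency_high = s.latency_ms > latency_limit_ms
        if s.loss_ratio > loss_limit and not loss_high:
            events.append(QosEvent(s.t_ms, "loss_exceeded", s.loss_ratio))
        loss_high = s.loss_ratio > loss_limit
    return events
```

Emitting an event only on the crossing, rather than on every sample above the threshold, keeps the asynchronous channel quiet during a sustained violation while the periodic reports continue to carry the measured values.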

This means that not only does the operator need to estimate, plan, deploy and monitor the performance of slices, but it must also provide their users with real-time information on the perceived end-to-end QoS, for which current multi-layer multi-technology networks are not prepared.

Furthermore, slicing needs to coexist with other high priority real-time traffic like fronthaul traffic.

This article attempts to identify the challenges behind end-to-end performance monitoring, sheds light on relevant recent steps toward OAM convergence and provides an overview of latency planning techniques to deal with the question. To this end, the paper is structured as follows. Section 2 describes the network technologies involved in a slice. Section 3 describes the OAM capabilities available in each of them to contribute to a detailed latency analysis. Section 4 outlines the steps toward a unified mechanism that will allow a framework for end-to-end performance monitoring. Section 5 proposes alternatives for latency planning in the transport network and for implementing the 5G slicing architecture specification, Section 6 reviews the special case of latency management for fronthaul traffic (which has much more stringent requirements than regular slices) and Section 7 draws conclusions. A glossary of acronyms used throughout this article can be found in Table 3, in that last section.

2. Challenges behind end-to-end slice latency and jitter monitoring for 5G

When provisioning end-to-end slices for use cases of low mobility and guaranteed QoS, telecom operators need to estimate primary and backup paths with a maximum target latency. These paths can be packet-switched or, for special applications, circuit-switched. The service will typically be defined as a chain of PNF (Physical Network Function) and VNF (Virtual Network Function) instantiations. One major challenge in configuring, monitoring and troubleshooting slices comes from the intrinsic complexity of the end-to-end slice itself, made up of stitched service instantiations (most of them shared by sets of slices) across multiple network management domains, each of them featuring a multiplicity of layers and technologies. This complexity is illustrated in Fig. 1. The figure shows the transport network infrastructure involved in the delivery of a point-to-point network slice for an industrial remote control use case. This use case has low latency, low jitter and reliable connectivity as main requirements, plus the novel additional 5G requirement of reporting the available QoS in real time.

Table 1

Sample service requirements from 3GPP TS22.261 [1].

| Service | Requirements |
|---|---|
| Urban | Downlink (DL) data rate: 50 Mbit/s; Uplink (UL) data rate: 25 Mbit/s; DL Capacity: 100 Gbit/s/km²; UL Capacity: 50 Gbit/s/km² |
| Dense urban | DL: 300 Mbit/s; UL: 50 Mbit/s; DL Capacity: 750 Gbit/s/km²; UL Capacity: 125 Gbit/s/km² |
| Indoor | DL: 1 Gbit/s; UL: 500 Mbit/s; DL Capacity: 15 Tbit/s/km²; UL Capacity: 2 Tbit/s/km² |
| Rural | DL: 50 Mbit/s; UL: 25 Mbit/s; DL Capacity: 1 Gbit/s/km²; UL Capacity: 500 Mbit/s/km² |
| Broadcast | DL: 200 Mbit/s/TV channel; UL: N/A or 500 kbit/user |
| Tactile/Immersive | Audio delay: 50 ms; Visual delay: 15 ms; Tactile delay: 25–50 ms |
| Wireless road-side infrastructure backhaul | Max. end-to-end latency: 30 ms; Survival time: 100 ms; Service availability: 99.9999%; Reliability: 99.999% |
| Medical monitoring | Max. end-to-end latency: < 100 ms; Service availability: > 99.9999%; MTBF (Mean Time Between Failures): >> 1 month |
| Cloud/Edge rendering | Max. end-to-end latency: 5 ms; Reliability: 99.99% (UL), 99.9% (DL) |
| Gaming/Interactive | Max. end-to-end latency: 10 ms; Reliability: 99.99% |
| Virtual reality | Max. end-to-end latency: 5–10 ms; Reliability: 99.99% |
| Split AI/ML image recognition | Max. UL end-to-end latency: 2 ms; Reliability: 99.9%; Service availability: > 99.999% |
| Split control for robotics | Max. DL end-to-end latency: 12 ms |
| AI/ML model distribution for image recognition | Max. DL end-to-end latency: 1 s; Reliability: 99.9% for transmission of model weight factors, 99.999% for transmission of model topology |

The number of network elements involved in the service is very large. Besides the classic layer 0 (photonic) to layer 3 (IP) architecture (the figure shows Ethernet switches, routers and ROADMs), the service, as envisioned by 5G, includes forwarding user traffic through data center servers (in the figure, the 5G core and Multi-Access Edge Computing (MEC)) and transit operators. Furthermore, carried traffic can contain a mix of backhaul and fronthaul flows, the latter with stringent transport requirements. In order to deal with this complexity, telecom operators have split their networks into three management domains: RAN (Radio Access Network), Transport and 5G core.² The reason for this is that each segment has its specific infrastructure, requirements and protocols; consequently, each one is usually managed by a different team with different management tools. This makes end-to-end monitoring a complex task. In this article, we set aside all QoS aspects related to the air interface and focus on the fixed, fiber-based side of the service.

2 It should be noted that 3GPP names Access Network the infrastructure that interconnects the User Equipment (UE) with the core, which usually spans what operators call Access and the MAN.


Fig. 2. D-RAN elements, terminology and comparison with a conventional base station.

The main characteristics of the domains and their challenges are the following:

• RAN domain. The RAN is made up of a combination of access technologies ranging from PON (WDM-PON, NG-PON2) to point-to-point fibers, Ethernet, FlexEthernet and MPLS-TP or IP/MPLS or Segment Routing. Fig. 1 shows a scenario where the left UE (User Equipment) is connected to a 5G system implementing a distributed RAN (D-RAN) configuration. In other words, the base station is actually an RU (Remote Unit) implementing part of the radio signal processing chain, which forwards the digital radio signal in Ethernet frames to the DU (Distributed Unit), where the MAC function is run. The network latency budget for this RU-DU transmission (fronthaul) is 100 μs including propagation delay [2]. Packets are then forwarded to the CU (Central Unit) with a sub-ms budget (midhaul traffic), where the final data packets to be transported are generated (backhaul traffic). This CU function may be co-located with the DU or be placed deeper into the MAN (Metropolitan Area Network). In small MANs this DU/CU functionality may be fully centralized (C-RAN).

These three elements and the terminology are outlined in Fig. 2.

The architecture can even support different placements of D-RAN network functions for each service. Low latency services require the CU to be as close to the user as possible. Finally, this D-RAN traffic may share the Ethernet or IP infrastructure with backhaul traffic coming from gNBs, regular 5G base stations performing RU/DU/CU functions in the base station. A microwave network segment may also be present as an alternative to optical fiber backhaul. The major challenge in this domain is fulfilling the QoS of slices transported as backhaul while contending with high-priority fronthaul. IEEE802.1CM provides configurations for Ethernet, where fronthaul has maximum priority. However, not all existing RAN infrastructure is based on Ethernet switching, and the impact of this priority scheme on lower priority traffic can be significant. Therefore, besides traffic engineering, proper monitoring mechanisms need to be in place along with scope for action (e.g. planned alternative paths triggered by the monitoring applications) when high traffic conditions cause QoS violations.

There are OAM mechanisms to carry out such monitoring on a per-technology basis, as reviewed in the next section. This, together with a progressive opening of the control and OAM interfaces of devices, is expected to enable proper end-to-end latency control, as implicitly required by the 5G slice concept.

• Transport Network domain. The transport network domain is a multi-layer IP/WDM infrastructure in large settings or IP/Ethernet in smaller ones. The IP/WDM transport technology may be OTN (Optical Transport Network ITU-T G.709), usually transporting Ethernet line signals between router interfaces in an ODU carrier.

The transport network spans the MAN, from the 5G edge nodes

the packet-switched layer above. The supervision of the service level objectives of network slices needs to be performed both at the IP layer and at the underlying technologies (MPLS-TP/Ethernet, FlexE).

On the other hand, when rate or latency bounds are not met, a smart integration of the control planes of the service and photonic layers is indispensable to correct and adapt the number of optical channels to the demand. In fact, the options are manifold, from hard isolation mechanisms (slices as wavelengths or as FlexE clients) to soft isolation ones (packet-based bandwidth management in enhanced VPNs). Section 4 provides insight into how this integration has become feasible in the light of SDN (Software Defined Networking).

• 5G Core Network domain

The design of the 5G Core (5GC) by 3GPP aims to leverage the advantages of virtualization, with the possibility of instantiating the different constituent entities as virtual functions in distributed compute facilities across the network, in the form of virtual machines, containers, etc. The entity that provides connectivity at the data plane is known as the User Plane Function (UPF), which is in charge of terminating the GPRS Tunnelling Protocol (GTP) tunnels established from the radio access node (Fig. 3). The access node can be the gNB (5G base station) in the case of the traditional solution, or the CU-UP (Central Unit-User Plane, the user plane part of the Packet Data Convergence Protocol) for Distributed RAN implementations (when architectures based on a radio functional split are deployed). The UPF can have an intermediate entity at the edge node (I-UPF) that allows flow policies to be enforced at the edge, besides serving as a switching point to other cores. Details on further 5GC entities can be found in TS23.501 [3].

The aforementioned 5GC entities can be deployed across multiple distributed compute execution environments, interconnected through segments of the transport network for inter-cloud connectivity. The interconnection of those entities to the network, including the UPF, implies the interaction of the logical connections on the compute side with the physical connection of the compute infrastructure and the network elements in the transport network. In this context, implementing the UPF as microservices in a distributed compute environment in a latency-aware way as envisioned by 3GPP seems to be the most important challenge of this domain to be met in the near future. The UPF implemented as a pure cloud native network function without dedicated hardware can provide unprecedented levels of scalability, processing flexibility and re-programmability of user plane. However, the development of such real-time systems capable of beating custom silicon is at an early stage.

3. The basic tools: relevant OAM protocols

Continuous QoS monitoring of slices is necessary because any cost-effective network design needs to rely on statistical multiplexing, which usually leads to statistical guarantees. Moreover, the operator needs accurate real-time per-segment knowledge of the latency sources. Although, as the previous section showed, this is quite challenging, the fact that most technologies have their own latency measurement mechanisms and OAM protocols makes it viable. A quick review of this capability in currently involved technologies follows:


Fig. 3. The user plane function in the 5G architecture.

• 5G Access Network. 3GPP TS29.244 [4] at stage 3 describes the implementation of the interface between the Control Plane and the User Plane Nodes. It specifies the support by 5G systems of QoS monitoring and defines the signaling to set up a reporting period or to program reports when the delay exceeds a threshold. Measurements are taken from the UE to either the UPF (PSA) or the I-UPF (Fig. 3). The former is the entity connected to the transport data network. The latter is an optional entity located at the MEC (Multi-Access Edge Computing) edge data center. The measurement can be uplink, downlink or round trip. This means that the mechanism to convey a session's QoS in real time (either for the UE or per flow) exists for the UE-5G Core segments. The breakdown of latencies in those segments and in the transport networks should be obtained and monitored by other means, as described in the next section. At provisioning time the operator may run standard testing procedures such as the Two-Way Active Measurement Protocol (TWAMP) (RFC5357) to check that the slice fulfills the target performance end to end at the IP layer.

• OTN and DWDM channels. OTN ITU-T Rec. G.709/Y.1331 supports round-trip time measurements between Path Connection Monitoring End Points by means of Delay Measurement (DM) bits in the ODU overhead. The corresponding remote End Point is programmed to copy back the DM bit it receives in the returning ODU frames, in such a way that whenever the initiator End Point changes the DM bit, the round-trip time can be estimated by the time the change is detected back at the initiator End Point.

The resolution achieved is the duration of two OTU frames, for instance, at 100 Gb/s the error margin is 2.6 μs. This measurement can be performed on a per tandem connection basis, up to six nested tandem connections. Similar approaches can be followed in non-OTN DWDM systems at the link layer (e.g. Ethernet link OAM).

• Passive Optical Networks. In general, TDMA (Time Division Multiple Access) PONs have issues in guaranteeing short uplink access delay, unless specific DBA (Dynamic Bandwidth Allocation) protocols can guarantee TDM-based reservations with fixed transmission slots within the 125 μs frame, so that it is possible to reduce the uplink access time as much as desired [5,6]. In general, the bandwidth requirements of fronthaul traffic also demand upstream capacities of 10 Gb/s and even more to comply with reduced upstream delay percentiles [7]. This is definitely a must to transport fronthaul traffic, given the 100 μs budget, or any other sub-ms service. WDM-PON standards overcome the jitter issues of TDMA PON. WDM-PONs can properly work with IEEE 802.1ag OAM as they provide point-to-point links usually exploited by Ethernet transceivers. TDM-PONs have embedded OAM mechanisms dealing with connectivity fault management; however, the main TDM-PON standards lack proper per-class QoS monitoring support, and latency measurement from an upper layer is required, which makes WDM-PON a better option in this regard.

• Carrier Ethernet. A complete framework for Connectivity Fault Management (CFM) protocols exists: IEEE 802.1ag and ITU-T Y.1731. Regular in-band OAM frames allow the measurements, which include frame delay and frame delay variation measurement for multi-hop flows between Maintenance Points, based on configured Maintenance Associations. IEEE 802.3-2005 (previously 802.3ah, Clause 57) includes Ethernet link OAM for a single link, which supports delay measurement only in loopback mode.

• FlexEthernet. FlexE was proposed by the Optical Internetworking Forum (OIF) as a mechanism to decouple the MAC and PHY layers of Ethernet clients; it features bonding, sub-rating, and channelization of 1 to 𝑚 100GBase-R PHYs (200G and 400G in the future) and can be used in Router to Transport connection scenarios where the mapping/de-mapping FlexE Shim layer makes it possible to flexibly partition and assign bandwidth groups of 5 Gb/s slots to individual flows [8]. As mentioned above, FlexE has been identified as an alternative to implement slicing in the MAN and in the transport of D-RAN traffic. As an example of this application, in [9], a number of 5 Gb/s slots was calculated for the transport of different 5G New Radio configurations, showing that latency can be bounded if the FlexE groups are properly dimensioned. ITU-T Recommendation G.8312, which adopts FlexE, describes its own performance monitoring mechanism that includes delay measurement.

• MPLS, MPLS-TP. For MPLS, ITU-T Recommendations Y.1710 and Y.1711 deal with fault management and support OAM frames, but do not include delay measurements. However, ITU-T Recommendation G.8113.1/Y.1372.1 (2012) does support proactive and on-demand delay measurements for MPLS-TP. The same management entities of Ethernet layer networks (adopted in ITU-T G.8010) are used in this recommendation, which makes convergence easier. One-way delay measurements are supported if accurate time synchronization is available at both Maintenance End Points.

All these technology-specific OAM mechanisms can be leveraged to create monitoring mechanisms able to break down the end-to-end latency into layers and domains. The goal is to be able to automatically locate the network elements responsible for a latency noncompliance.
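As a rough illustration of this breakdown, the following Python sketch adds up per-segment delay measurements and lists the segments that exceed their share of the latency budget. The segment names, layers, measured values and budgets are invented for the example, and the `SegmentDelay` and `dissect` names are hypothetical; in practice each measurement would come from the OAM mechanism of the corresponding technology.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SegmentDelay:
    name: str           # hypothetical segment label, e.g. "RU-DU fronthaul"
    layer: str          # technology whose OAM produced the measurement
    measured_us: float  # delay reported for the segment
    budget_us: float    # share of the end-to-end budget planned for it

    @property
    def excess_us(self) -> float:
        return self.measured_us - self.budget_us


def dissect(segments: Iterable[SegmentDelay]) -> Tuple[float, List[SegmentDelay]]:
    """Return the end-to-end delay and the noncompliant segments, worst first."""
    segments = list(segments)
    total_us = sum(s.measured_us for s in segments)
    offenders = sorted((s for s in segments if s.excess_us > 0),
                       key=lambda s: s.excess_us, reverse=True)
    return total_us, offenders


if __name__ == "__main__":
    path = [
        SegmentDelay("RU-DU fronthaul", "Ethernet", 80.0, 100.0),
        SegmentDelay("DU-CU midhaul", "MPLS-TP", 650.0, 500.0),
        SegmentDelay("CU-UPF backhaul", "OTN", 1200.0, 1000.0),
    ]
    total, offenders = dissect(path)
    print(f"end-to-end: {total} us")
    for s in offenders:
        print(f"{s.name} ({s.layer}) exceeds its budget by {s.excess_us} us")
```

Sorting the offenders by excess delay is a design choice made here so that the most likely culprit is examined first; it assumes the per-segment budgets have already been fixed at planning time.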

4. Multi-layer multi-domain OAM convergence efforts

The split of network management into domains described in Section 2 and the conflicting need of 5G for real end-to-end control of QoS parameters make it necessary to create a converged management layer on top of a service bus where all the management domains are present. This idea is depicted in Fig. 4. The figure describes how the overall network and service management system embeds all the management domains to provide a view of the end-to-end service with detailed information on the latency components. The picture shows the RPCs issued by the respective domains to management end points at the three layers: layer 2 (Ethernet), layer 3 (IP/MPLS, IP/MPLS-TP, IP/Segment Routing) and the circuit-switched optical layer (TDM/WDM channels, OTN, FlexE/WDM, FlexE/OTN).

Fig. 4. Convergence of management domains and layers toward real-time latency monitoring and dissection.

The system takes advantage of the OAM capabilities existing at each layer to measure delay, but they are invoked through standard protocols rather than proprietary ones.

Indeed, the progressive endorsement by operators and manufacturers of open multilayer OAM solutions [10] is enabling an integrated and fully detailed view of latency across layer segments, following the SDN philosophy. Connectivity and latency supervision tools of most technologies are becoming available to controllers through NETCONF or RESTCONF, as equipment manufacturers keep opening and standardizing their interfaces. For example, in 2017, IEEE, MEF, and ITU-T SG15 started a liaison toward an IEEE 802.1 CFM YANG data model, whose result is IEEE 802.1Qcp-2018, available at GitHub.

When considering the request and instantiation of a network slice at the transport side, in line with the architecture depicted in Fig. 4, a hierarchical structure can be expected in which the top SDN controller orchestrates the different per-technology domain SDN controllers below. This is, for instance, the architecture proposed in relevant projects such as the Telecom Infra Project MUST [10]. The hierarchical SDN controller is expected to support the functionality of a Network Slice Controller (NSC), as defined in [11]. This NSC is expected to offer at its Northbound Interface (NBI) a data model able to support network slice requests from different customers, for example an upper management system such as the one from 3GPP. The data model conceived for the NSC [12] is intended to be technology-agnostic: the network slice is requested by expressing the needs in terms of connectivity and the associated Service Level Objectives (SLOs), with no indication of how the network slice should be realized in terms of the technology to be used. The SLOs make it possible to indicate constraints for the connectivity constructs between endpoints, including latency, jitter, bandwidth, etc. This will be the basis for later planning the latency component of the transport part of the end-to-end network slice. It should be noted that this data model also includes monitoring parameters for measurable SLOs such as the ones described, which makes it possible to present timely information about compliance with the committed SLOs to the customer requesting the network slice.
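To make the idea of a technology-agnostic request more concrete, the sketch below models a slice request as connectivity constructs with attached SLOs and checks monitored values against them. It is a simplified illustration only: the class and field names (`Slo`, `ConnectivityConstruct`, `SliceRequest`, `violations`) are assumptions made here and do not reproduce the data model of [12].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Slo:
    max_latency_ms: float
    max_jitter_ms: float
    min_bandwidth_mbps: float


@dataclass(frozen=True)
class ConnectivityConstruct:
    src: str  # endpoint identifiers; no transport technology is named
    dst: str
    slo: Slo


@dataclass
class SliceRequest:
    customer: str
    constructs: List[ConnectivityConstruct] = field(default_factory=list)


def violations(slo: Slo, latency_ms: float, jitter_ms: float,
               bandwidth_mbps: float) -> List[str]:
    """Return the names of the SLOs that the monitored values do not meet."""
    failed = []
    if latency_ms > slo.max_latency_ms:
        failed.append("latency")
    if jitter_ms > slo.max_jitter_ms:
        failed.append("jitter")
    if bandwidth_mbps < slo.min_bandwidth_mbps:
        failed.append("bandwidth")
    return failed


if __name__ == "__main__":
    slo = Slo(max_latency_ms=2.0, max_jitter_ms=0.5, min_bandwidth_mbps=100.0)
    request = SliceRequest("factory-A", [ConnectivityConstruct("CU-1", "UPF-1", slo)])
    for c in request.constructs:
        print(c.src, c.dst, violations(c.slo, 2.5, 0.2, 80.0))
```

Note that the request carries only endpoints and objectives; the choice of wavelengths, FlexE clients or packet-based VPNs is left to the controller that realizes the slice.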

It is the mission of the NSC to determine the technology for the slice realization, based on the SLOs and the connectivity matrix defined by the customer. To that end, once the request has been processed, the NSC will trigger the provisioning process by interacting directly with the aforementioned per-technology domain SDN controllers. In the case of the optical domain, the interface of reference for such interaction is the Transport API (TAPI). TAPI [13] is a standard API developed by the Open Networking Foundation (ONF) that allows a TAPI client (e.g., an orchestrator as described before) to control a domain of transport network equipment controlled by multiple TAPI servers (e.g. a specific per-technology domain controller). TAPI allows the control of network resources at different levels of abstraction, resembling the layered approach of TMN (Telecommunications Management Network). Given the importance of Operation and Maintenance (OAM) mechanisms to ensure the proper behavior of the allocated resources and connectivity resources, TAPI introduced native support for OAM services. For instance, the TAPI 2.0 model has been extended to support Maintenance Entities, Maintenance Entity Groups, Maintenance End Points and Maintenance Domain Intermediate Points, in line with the ITU-T Y.1731/IEEE 802.1ag specification. This allows the TAPI client to determine where monitoring points exist along a connection and to launch measurement cycles between them.

TAPI follows a multi-layer and multi-technology approach (Fig. 4), including optical but also Ethernet technologies in scope. However, it is being mostly positioned as an interface for optical networks, as in [14].

In this case, when interacting with the NSC above, the NSC will instruct the optical controller to set up a number of optical paths compliant with the latency expectations informed by the customer through the network slice NBI, as well as the necessary OAM measurements to ensure that the service performance observed matches the customer expectations.

Another relevant contribution, which is agnostic of the control framework, is IETF LIME (Layer Independent OAM Management in the Multi-Layer Environment). This working group produced three RFCs (IETF Requests for Comments), currently Proposed Standards: two YANG data models for OAM protocols, connection-oriented and connectionless (RFC8531, RFC8532), and a retrieval-method YANG data model for connectionless OAM (RFC8533). RFC8533 provides technology-independent RPC (Remote Procedure Call) operations for OAM protocols that use connectionless communication, extensible with technology-specific details. Fig. 4 tries to illustrate this concept. Each Management Domain issues RPCs (depicted by vertical arrows) to different Network Configuration Protocol (NETCONF) servers in order to execute secure OAM commands which are multi-layer and multi-technology. The horizontal arrows describe the breakdown of latency components at each level. The ‘‘connectionless-oam-methods’’ module of RFC8533 defines the RPC operations ‘continuity-check’, similar to IP ping (RFC792, RFC4443) and MPLS LSP ping (RFC8029), and ‘path-discovery’ (equivalent to IP traceroute).

Finally, it is worth mentioning other relevant research work providing novel ideas to support real-time slice latency monitoring without relying on existing OAM protocols. The authors of [15] propose to equip network slices with an end-to-end latency sensor as an additional function in the service chain. These sensors further allow latency to be optimized through path reconfiguration of optically interconnected data centers.

5. Planning and configuring bounded latency for a slice

Estimating and planning the latency of slices can be a complex task due to the length of the transmission paths, which cross many switching and data processing devices that make up the so-called VNF service chain. The authors of [16] propose and demonstrate the use of a Latency-Aware Service Chain Computation Element (LA-SCCE) as an evolution of the classical Path Computation Element (PCE) tuned for service chain allocations, where the path is constrained to traverse a sequence of VNFs with latency requirements. Indeed, once the maximum processing times are known to the network designer, it is possible to allocate resources smartly, keeping latency under control. However, the time spent by packets traversing services running on Virtual Machines (VMs) of data centers can vary considerably (due to VM mobility, availability of parallel resources, NFV, etc.) and sufficient delay margins need to be considered in the design at the transport level.

The following simplified analysis provides a set of numerical examples on delay bounds that can be used as reference rules for designing network slices with latency guarantees. This should be considered as a set of simplified guidelines, rather than a precise design procedure.

As an example, consider a network slice which requires a 99th-percentile delay of at most 250 μs, i.e., 𝑑_.99 = 250 μs. In other words, 99% of the packets must experience a latency of 250 μs or less. Similarly, 𝑑_.95 or 𝑑_.90 may be used as the design criterion. Additionally, let us consider a path traversing 𝐾 links from source to destination, where each link carries traffic of at most 50% of its capacity (i.e. 50% link load) to provide spare capacity for backing up other connections. In particular, we consider the scenario of Fig. 4, between the CU/DU and the 5G Core.
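For measured data, such a percentile criterion can be checked with a few lines of Python. The sketch below uses the nearest-rank definition of a percentile, which is one of several common conventions and is an assumption made here; the function names and the sample values are illustrative.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def meets_target(samples: Sequence[float], p: float, target_us: float) -> bool:
    """True if the p-th percentile of the delay samples is within the target."""
    return percentile_nearest_rank(samples, p) <= target_us


if __name__ == "__main__":
    # Hypothetical delay samples in microseconds: 98 fast packets, 2 slow ones.
    delays_us = [100.0] * 98 + [300.0] * 2
    print(percentile_nearest_rank(delays_us, 99))
    print(meets_target(delays_us, 99, 250.0))
```

With 2% of the packets above the target, the 99th-percentile criterion fails even though the average delay is low, which is why percentile targets rather than averages are used for slices of this kind.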

Following Kingman's approximation for generic G/G/1 queueing models [17], the average waiting time in queue 𝐸(𝑊_𝑞) of a packet selected at random in a link with load 𝜌 can be expressed as:

$$E(W_q) \le E[S] \, \frac{\rho}{1-\rho} \, \frac{C^2[T] + C^2[S]}{2},$$

where 𝐸[S] is the average service time of a packet, and 𝐶²[T] and 𝐶²[S] are the squared coefficients of variation of the packet inter-arrival times and service times, respectively.

For instance, 𝐸[S] = 0.12 μs for 1500-byte packets transmitted over 100 Gb/s links; also 𝐶²[T] and 𝐶²[S] are equal to 1 for Poisson traffic with exponentially-distributed service times. In such a case:

$$E(W_q) \leq E[S]\,\frac{0.5}{1-0.5}\,\frac{1+1}{2} = E[S] = 0.12\ \mu\text{s}$$
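The numerical example above can be reproduced with a short sketch. The function names are illustrative assumptions; the formula is the Kingman expression given in the text, evaluated at 50% load with both squared coefficients of variation equal to 1.

```python
def transmission_time_us(packet_bytes, link_gbps):
    """Serialization time of one packet in microseconds."""
    bits_per_us = link_gbps * 1e3  # 1 Gb/s equals 1000 bits per microsecond
    return packet_bytes * 8 / bits_per_us


def kingman_mean_wait(mean_service, load, ca2, cs2):
    """Kingman's approximation of the mean G/G/1 waiting time in queue.

    mean_service: E[S], in any time unit (the result uses the same unit)
    load:         link load rho, 0 <= rho < 1
    ca2, cs2:     squared coefficients of variation C^2[T] and C^2[S]
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return mean_service * (load / (1.0 - load)) * (ca2 + cs2) / 2.0


# 1500-byte packets on a 100 Gb/s link at 50% load, Poisson/exponential case
mean_service_us = transmission_time_us(1500, 100)
mean_wait_us = kingman_mean_wait(mean_service_us, 0.5, 1.0, 1.0)
```

Both `mean_service_us` and `mean_wait_us` evaluate to 0.12 μs, matching the figures in the text.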

Fig. 5. Latency components for a distributed service of 𝐾 latency-bounded stages.

The total delay per hop must take into account not only queueing delay, but also processing (between 0.1 μs and 2 μs for high-speed switches), transmission (another 0.12 μs for a 1500-byte packet at 100 Gb/s) and propagation (5 μs per km of fiber) delay components. In total, let us consider that each IP hop contributes a delay which is a random variable X𝑖 whose probability density function is unknown but bounded between 𝑎 and 𝑏, with mean 𝐸(X𝑖), 𝑖 = 1, … , 𝐾.
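The per-hop delay budget described above is a plain sum of four components. The sketch below adds them up; the function name and the 3 km hop length used in the example are hypothetical, while the 5 μs/km propagation figure comes from the text.

```python
def per_hop_delay_us(queueing_us, processing_us, packet_bytes, link_gbps,
                     fiber_km, prop_us_per_km=5.0):
    """Sum of queueing, processing, transmission and propagation delay (us)."""
    transmission_us = packet_bytes * 8 / (link_gbps * 1e3)  # bits / (bits per us)
    propagation_us = fiber_km * prop_us_per_km
    return queueing_us + processing_us + transmission_us + propagation_us


# Hypothetical hop: 0.12 us queueing, 2 us processing, 1500 B at 100 Gb/s, 3 km
example_hop_us = per_hop_delay_us(0.12, 2.0, 1500, 100, 3.0)
```

For metro distances the propagation term dominates: in this example it accounts for 15 μs out of roughly 17 μs.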

In this scenario, we can apply Hoeffding's inequality, which states the following: "Suppose $X_1, \ldots, X_K$ are independent random variables taking values in $[a, b]$, and let $X = \sum_{i=1}^{K} X_i$ denote their sum, with mean value $\mu = E(X)$". Then, for any value of $\delta > 0$:

$$P\left(X > (1+\delta)\mu\right) < e^{-\frac{2\delta^2\mu^2}{K(b-a)^2}}.$$

For instance, if all aforementioned latency components of each IP hop are modeled as a random variable bounded within [5, 50] μs with mean 20 μs in a 100 Gb/s MAN path, this formula lets us check that at least 99.986% of the packets will traverse the path under a target latency budget of 0.5 ms if the path has 𝐾 = 10 hops. However, this guarantee is reduced to 92.8% for a 15-hop path.
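The two percentages above can be checked with a few lines of code. This is a sketch under the assumptions stated in the text (independent per-hop delays bounded in [5, 50] μs with mean 20 μs); the function name is illustrative. Note that 𝛿𝜇 is simply the gap between the latency target and the mean path delay.

```python
import math


def hoeffding_tail_bound(num_hops, mean_hop, low, high, target):
    """Upper bound on P(path delay > target) from Hoeffding's inequality.

    Per-hop delays are assumed independent and bounded in [low, high],
    each with mean mean_hop. All arguments share the same time unit.
    """
    mu = num_hops * mean_hop
    if target <= mu:
        return 1.0  # the bound is uninformative at or below the mean
    excess = target - mu  # equals delta * mu in the text's notation
    exponent = 2.0 * excess ** 2 / (num_hops * (high - low) ** 2)
    return math.exp(-exponent)


# 0.5 ms budget (500 us) on a 100 Gb/s MAN path
compliance_10_hops = 1.0 - hoeffding_tail_bound(10, 20.0, 5.0, 50.0, 500.0)
compliance_15_hops = 1.0 - hoeffding_tail_bound(15, 20.0, 5.0, 50.0, 500.0)
```

The two values come out at about 0.99986 and 0.928, in line with the 99.986% and 92.8% quoted above.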

Beyond giving a practical estimation of percentiles of conventional packet forwarding latency, Hoeffding's inequality is particularly useful in the context of network slices for chained VNFs or any type of distributed packet processing service whose latency components are bounded. For example, let us consider a network slice for an AI/ML image recognition service pipelined throughout 𝐾 processing nodes with a total end-to-end delay requirement of 2 ms (see Table 1), and, for simplicity, let us assume that the processing can be performed on a per-packet basis (e.g. fingerprint matching). This scenario is illustrated in Fig. 5, where each node is known to cause a variable delay within [0.1, 0.5] ms with an average value of 0.2 ms.

Let us assume that the number of processing hops is 𝐾 = 5. Then 2 ms = (1 + 𝛿)𝜇, leading to a value of $\delta = \frac{2\ \text{ms}}{5 \times 0.2\ \text{ms}} - 1 = 1$. Hence, Hoeffding's inequality states that:

$$P(X > 2\ \text{ms}) = P\left(X > (1+1)\cdot 5\cdot 0.2\ \text{ms}\right) < e^{-\frac{2\cdot 1^2\cdot 1^2}{5\cdot 0.4^2}} = 0.082$$

which means that, in such a situation, less than 8.2% of the packets experience delays higher than 2 ms. In other words, 2 ms would represent the 91.8th delay percentile (i.e. $d_{.918}$ = 2 ms). Finally, Table 2 shows the application of Hoeffding's inequality for different use-case scenarios. It is worth noting that when the number of IP hops is reduced to 𝐾 = 3, almost all packets are guaranteed delays below 2 ms. If this is not possible, the latency in the processors needs to be improved.

Alternatively, relaxing the delay requirements of the service to 5 ms ensures that almost 100% of the packets are within the delay bound.
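The same bound can be turned into a simple planning rule: given a delay budget and a required compliance level, find the longest processing chain that still meets it. The sketch below assumes the per-node figures of the example above ([0.1, 0.5] ms, mean 0.2 ms); the 99% compliance level and the function names are illustrative assumptions.

```python
import math


def hoeffding_tail_bound(num_hops, mean_hop, low, high, target):
    """Hoeffding upper bound on P(sum of num_hops bounded delays > target)."""
    mu = num_hops * mean_hop
    if target <= mu:
        return 1.0
    exponent = 2.0 * (target - mu) ** 2 / (num_hops * (high - low) ** 2)
    return math.exp(-exponent)


def max_hops(mean_hop, low, high, target, min_compliance, hop_limit=64):
    """Largest hop count whose guaranteed compliance is >= min_compliance."""
    best = 0
    for k in range(1, hop_limit + 1):
        # The bound grows with k, so the first failure ends the search.
        if 1.0 - hoeffding_tail_bound(k, mean_hop, low, high, target) < min_compliance:
            break
        best = k
    return best


violation_5_hops = hoeffding_tail_bound(5, 0.2, 0.1, 0.5, 2.0)  # about 0.082
longest_chain_2ms = max_hops(0.2, 0.1, 0.5, 2.0, 0.99)
```

Under these assumptions `longest_chain_2ms` is 3, which agrees with the observation that reducing the chain to 𝐾 = 3 brings almost all packets under 2 ms.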

6. The case of fronthaul traffic

A special type of traffic deserving careful latency engineering is fronthaul traffic. As discussed in Section 2, the radio access network is evolving toward disaggregated and distributed approaches. D-RAN technologies come at the expense of high bandwidth utilization and stringent delay and jitter requirements [18]. Standardization bodies have defined different methods for implementing the D-RAN concept in packet-based transport networks. It is worth highlighting the work of the IEEE Time-Sensitive Networking (TSN) working group to support the transport of time-sensitive fronthaul data over Ethernet networks.

The use of Ethernet networks leverages the high penetration of low-cost Ethernet hardware along with the statistical multiplexing gains offered by packet-switched networks. In the IEEE 802.1CM standard [2,19], the TSN group gives important recommendations for the configuration of the transport Ethernet network to enable a successful transmission of fronthaul data. Mainly, it defines two different classes of fronthaul interfaces: Class 1 for interfaces in which the signal processing chain is decomposed according to the CPRI standard [20], and Class 2 for those decompositions of the E-UTRA base station in the intra-PHY splits, i.e., Splits $\mathrm{I}_U$ and $\mathrm{II}_D$. Additionally, IEEE 802.1CM suggests several timing distributions to fulfill the synchronization requirements of different features (handovers, MIMO, CoMP, etc.). Fronthaul flows fall into three priority categories: (a) High Priority Fronthaul (HPF), including Class 1 IQ data and Class 2 User Plane data, with 100 μs maximum end-to-end one-way latency, (b) Medium Priority Fronthaul (MPF), including Class 2 User Plane slow data and Control and Management fast data, with a 1 ms one-way latency budget, and (c) Low Priority Fronthaul (LPF), for Class 1 and Class 2 control and management data.
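For planning or monitoring scripts, the three priority categories and their one-way latency budgets can be captured in a small lookup structure. This is only an illustrative sketch: the class and function names are assumptions, and no budget is recorded for LPF because none is quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FronthaulCategory:
    name: str
    max_one_way_latency_us: Optional[float]  # None when no budget is quoted


HPF = FronthaulCategory("High Priority Fronthaul", 100.0)
MPF = FronthaulCategory("Medium Priority Fronthaul", 1000.0)
LPF = FronthaulCategory("Low Priority Fronthaul", None)


def within_budget(category, measured_us):
    """True if a measured one-way latency respects the category budget."""
    if category.max_one_way_latency_us is None:
        return True  # nothing to check against
    return measured_us <= category.max_one_way_latency_us
```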

Fronthaul transport may also be seen as a particular form of network slice with the most stringent requirements of all the transported traffic. Although fronthaul traffic needs to co-exist with network slicing traffic in mixed backhaul-fronthaul (xHaul) environments, it should be remarked that fronthaul connectivity is not a 5G slice per se, as no functionality to a subset of UEs is realized. In fact a fronthaul flow normally carries the traffic of multiple slices.

If the slice is transported within the fronthaul flows, the stringent requirements of the fronthaul traffic are applied to those services on top of the slice while they are within the fronthaul network. The envisioned end-to-end latency for future 5G and beyond-5G networks depends on the application. Most applications will require latencies below a few milliseconds [21,22], for instance, tactile internet and factory automation (≤ 1 ms), intelligent transportation systems (≤ 4 ms), etc. This translates into even stricter requirements at the lower layers of the signal processing chain if these services are transported in the fronthaul traffic. Regarding the fronthaul itself, the end-to-end Ethernet network latency target is 100 μs for CPRI as defined in 802.1CM [19].

This is a useful design parameter which shows the importance of the characterization of the queueing delay throughout the network. By keeping the queueing delay low, the budget for propagation and fabric switching delays can be expanded, increasing the reach of the network.
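The trade-off between queueing delay and reach can be made concrete with a one-line budget calculation. The sketch below assumes the 100 μs fronthaul budget and the 5 μs/km fiber propagation delay quoted in this paper; the queueing and switching values in the example are hypothetical.

```python
def max_fiber_reach_km(budget_us, queueing_us, switching_us, prop_us_per_km=5.0):
    """Fiber length that fits in the budget left after queueing and switching."""
    remaining_us = budget_us - queueing_us - switching_us
    return max(0.0, remaining_us / prop_us_per_km)


# Hypothetical allocations within the 100 us budget
reach_loose_km = max_fiber_reach_km(100.0, 30.0, 10.0)  # 60 us left for fiber
reach_tight_km = max_fiber_reach_km(100.0, 10.0, 10.0)  # 80 us left for fiber
```

In this example, cutting the queueing allowance from 30 μs to 10 μs extends the reach from 12 km to 16 km.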

Some additional delay budget can be gained by embracing higher functional splitting of the signal processing chain, e.g., Intra-PHY or MAC-PHY splits, since these have more relaxed synchronization, delay, and jitter requirements. It should be noted that the analysis addresses only the packet transport latency, i.e. the time since the first bit of the packet is sent over the network interface until the last bit of the packet is delivered at the remote interface. Additional latency components exist at the endpoints, namely jitter compensation delay (matching the maximum queueing delay) and OFDM’s symbol reassembly time (since it travels in a burst of packets).

Fig. 6. Estimated vs simulated queueing delay in a fronthaul aggregator.

In some of our previous research, we studied the behavior of the aggregated fronthaul traffic and developed some tools that are useful for end-to-end latency planning. Namely, in [23] we estimate the average aggregated queueing delay assuming a G/G/1 queueing model.

In [17] we further extend the work to give a closed-form expression of the 𝑝th percentile queueing delay for the aggregation of eCPRI fronthaul traffic,

$$W_q^{(p)} = \max\left\{0,\; E[S]\,\frac{1}{1-\rho}\,\frac{C^2[T] + C^2[S]}{2}\,\ln\left(\frac{\rho}{1-p}\right)\right\},$$

where S and T are random variables modeling the service times and interarrival times of packets at the queue, and 𝜌 is the queue occupancy. 𝐶²[X] stands for the squared coefficient of variation of a random variable X and is defined as $C^2[X] = \mathrm{Var}[X]/E[X]^2$. From there, we derive a set of rules for Ethernet-based fronthaul network dimensioning, using high delay percentiles as the key design metric, instead of conventional average delays. As a numerical example, consider the output port of a concentrator aggregating fronthaul traffic with 70% load (𝜌 = 0.7). Assuming $C^2[T] \simeq 8.8$ and $C^2[S] = 0$ as typical values characterizing the fronthaul traffic (see [17]), we get that the 90th percentile of the queueing delay is $W_q^{(0.9)} = 34.25$ μs, which is roughly 30 times higher than the average value and significant compared with the aforementioned total budget of 100 μs.
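The percentile expression is straightforward to evaluate numerically. In the sketch below, the mean service time of 1.2 μs is an assumption: it is not stated in this section and was chosen because it reproduces the 34.25 μs figure (it corresponds, for instance, to 1500-byte packets on a 10 Gb/s port). The function name is also illustrative.

```python
import math


def queueing_delay_percentile(mean_service, load, ca2, cs2, p):
    """p-th percentile (0 < p < 1) of the G/G/1 queueing delay.

    mean_service: E[S]; the result is expressed in the same time unit
    load:         rho, 0 < rho < 1
    ca2, cs2:     squared coefficients of variation C^2[T] and C^2[S]
    """
    if not 0.0 < load < 1.0:
        raise ValueError("load must be in (0, 1)")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    value = (mean_service / (1.0 - load)) * ((ca2 + cs2) / 2.0) \
        * math.log(load / (1.0 - p))
    return max(0.0, value)


# ASSUMPTION: E[S] = 1.2 us (not given in the text); rho = 0.7, C^2[T] = 8.8
w90_us = queueing_delay_percentile(1.2, 0.7, 8.8, 0.0, 0.9)
```

The `max` with zero matters at low percentiles: whenever 1 − 𝑝 exceeds 𝜌 the logarithm is negative, meaning that the packet at that percentile finds the queue empty.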

In Fig. 6, we apply this methodology to the aggregation of fronthaul flows carrying 20 MHz LTE channels. We compare the estimated queueing delay and the simulated values as we increase the aggregated traffic. Finally, it is worth mentioning a useful fronthaul network dimensioning tool [24] for those cases where the distance between the antennas and the processing units needs to be maximized. Based on the use of very high packet delay percentiles, this method represents an alternative to the use of maximum theoretical delay as the main dimensioning metric. By interpreting the gap between the extreme percentiles and the maximum worst-case delay as an extra delay budget, the fronthaul links can be further stretched while complying with the delay and frame loss ratio defined in IEEE 802.1CM. Experiments show that the fronthaul links’ lengths can be increased by 60% and 10% for 50 MHz and 100 MHz 5G New Radio channels, respectively.

Table 3
Glossary.

3GPP      3rd Generation Partnership Project, https://www.3gpp.org/
5GC       5G Core
API       Application Programming Interface
CU        Central Unit
CFM       Connectivity Fault Management
DWDM      Dense Wavelength Division Multiplexing
IMS       IP Multimedia Subsystem
ITS       Intelligent Transportation Systems
IoT       Internet of Things
IP        Internet Protocol
DU        Distributed Unit
FlexE     Flexible Ethernet, https://www.oiforum.com/
LSP       Label Switched Path
MAN       Metropolitan Area Network
MEC       Multi-Access Edge Computing
MEF       https://www.mef.net/
MPLS      Multiprotocol Label Switching
MPLS-TP   MPLS Transport Profile, https://www.itu.int/rec/T-REC-G.8110.1/en
NBI       Northbound Interface
NETCONF   Network Configuration Protocol, RFC4741
NG-RAN    Next-Generation Radio Access Network
NG-PON2   Next-Generation PON version 2, https://www.itu.int/rec/T-REC-G.989
NSC       Network Slice Controller
OAM       Operations, Administration and Management
OTN       Optical Transport Network, ITU-T Rec. G.709
PCE       Path Computation Element
PNF       Physical Network Function
PON       Passive Optical Network
QoS       Quality of Service
RAN       Radio Access Network
RPC       Remote Procedure Call
RU        Remote Unit
ROADM     Reconfigurable Optical Add-Drop Multiplexer
SDN       Software Defined Networking
SLO       Service Level Objective
TAPI      Transport API
TETRA     TErrestrial Trunked RAdio, https://www.etsi.org/technologies/tetra
TMN       Telecommunications Management Network, ITU-T Recommendation series X.700
TWAMP     Two-Way Active Measurement Protocol
UE        User Equipment (5G terminals)
URLLC     Ultra-Reliable Low-Latency Communications
UPF       User Plane Function
V2X       Vehicle-to-everything
VM        Virtual Machine
VNF       Virtual Network Function
YANG      Data Modeling Language for NETCONF, RFC6020

On the other hand, if the slices are transported as backhaul traffic, they might share the same transport network and compete with fronthaul flows for network resources. In this case, special attention must be paid to network planning so as to guarantee that the requirements of both the fronthaul and backhaul traffic are met. In general, the worst-case capacity available for backhaul should be decreased by the sum of all fronthaul flows at maximum load. Then the latency for a slice packet should be additionally increased by the time needed to transmit the maximum aggregate burst of the fronthaul flows converging on the output line. In previous studies [25], we proposed a distributed mechanism to orchestrate the end-to-end network resources to comply with the various requirements of different services in a heterogeneous C-RAN network, including cellular and machine-to-machine communications, MEC, and fronthaul transport services in a converged architecture.
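The two planning rules for backhaul slices sharing a link with fronthaul flows can be sketched as follows. The burst term is read here as the time needed to serialize the aggregate fronthaul burst at line rate, which is an interpretation; the function names and the example figures are hypothetical.

```python
def backhaul_worst_case_capacity_gbps(link_gbps, fronthaul_peak_gbps):
    """Capacity left for backhaul when every fronthaul flow is at maximum load."""
    return max(0.0, link_gbps - sum(fronthaul_peak_gbps))


def extra_slice_latency_us(fronthaul_burst_bytes, link_gbps):
    """Time to serialize the aggregate fronthaul burst on the output line (us)."""
    total_bits = 8 * sum(fronthaul_burst_bytes)
    return total_bits / (link_gbps * 1e3)  # 1 Gb/s equals 1000 bits per us


# Hypothetical 100 Gb/s output line shared with three fronthaul flows
spare_gbps = backhaul_worst_case_capacity_gbps(100.0, [10.0, 10.0, 25.0])
extra_us = extra_slice_latency_us([15000, 15000, 30000], 100.0)
```

In this example 55 Gb/s remain for backhaul slices in the worst case, and a slice packet arriving behind the full fronthaul burst waits an additional 4.8 μs.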

7. Conclusions

Currently, most 3GPP specifications relevant to 5G slicing are at stage 2. One main exception is 3GPP TS29.244 [4]. At stage 3, it describes the implementation of the interface between the Control and User Plane Nodes, mandating QoS monitoring and detailing the signaling to be used to activate QoS reporting for a slice. However, many research issues remain open before the implementation of 5G slicing becomes a reality, mainly those regarding the scalability of resource reservations in the transport network and the isolation among slices in statistically multiplexed resources.

The need for sharing resources among slices to achieve cost-effectiveness enforces the use of continuous performance monitoring on a per-slice basis. This allows network orchestrators to verify service levels in real time and react quickly to non-compliance.

This paper provides some practical formulas to guide rule-of-thumb static latency planning in packet networks carrying both slices and fronthaul traffic. However, real-time monitoring enables smart dynamic reconfiguration of network resources. The trend to open OAM interfaces is a major advance toward this goal. By adopting a unified OAM approach, recent works on data-driven dynamic resource scheduling mechanisms, such as [26,27], gain practical relevance. These works aim to optimize slice configurations upon the analysis of network usage and performance data with techniques such as deep reinforcement learning.

CRediT authorship contribution statement

David Larrabeiti: Conceptualization, Writing – original draft, Writing – review & editing. Luis M. Contreras: Conceptualization, Writing – original draft, Writing – review & editing. Gabriel Otero: Writing – review & editing, Validation. José Alberto Hernández: Writing – original draft, Writing – review & editing. Juan P. Fernandez-Palacios: Conceptualization, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgments

This work has been partially funded by the EU H2020 project Int5Gent (grant no. 957403), as well as the NextGenerationEU project 6G-Xtreme (grant no. AEI/10.13039/501100011033), TAPIR-CM (grant no. P2018/TCS-4496) and ACHILLES (grant no. PID2019-104207RB-I00). Funding for APC: Universidad Carlos III de Madrid (Read & Publish Agreement CRUE-CSIC 2022).

References

[1] 3GPP, Service requirements for the 5G system; stage 1, Technical Specification (TS), (22.261) 3rd Generation Partnership Project (3GPP), 2022, URL https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3107. Version 18.6.0.

[2] IEEE802.1CMde, IEEE 802.1CMde-2020 - time-sensitive networking for fronthaul amendment: Enhancements to fronthaul profiles to support new fronthaul interface, synchronization, and syntonization standards, 2020, URL https://standards.ieee.org/standard/802_1CMde-2020.html.

[3] 3GPP, System architecture for the 5G System (5GS); Stage 2 (Release 16), Technical Specification (TS), (23.501) 3rd Generation Partnership Project (3GPP), 2022, Version 16.12.0.

[4] 3GPP, Interface between the Control Plane and the User Plane Nodes; Stage 3 (Release 17), Technical Specification (TS), (29.244) 3rd Generation Partnership Project (3GPP), 2022, Version 17.4.0.

[5] R. Bonk, T. Pfeiffer, New use cases for PONs beyond residential services, in: 2022 Optical Fiber Communications Conference and Exhibition, OFC, 2022, pp. 1–3.

[6] D. Eugui, J.A. Hernández, Analysis of a hybrid fixed-elastic DBA with guaranteed fronthaul delay in XG(s)-PONs, Comput. Netw. 164 (2019) 106907, http://dx.doi.org/10.1016/j.comnet.2019.106907, URL https://www.sciencedirect.com/science/article/pii/S1389128619302117.

[7] J.A. Hernandez, A. Ebrahimzadeh, M. Maier, D. Larrabeiti, Learning EPON delay models from data: a machine learning approach, J. Opt. Commun. Netw. 13 (12) (2021) 322–330, http://dx.doi.org/10.1364/JOCN.437414.

[8] A. Eira, A. Pereira, J. Pires, J. Pedro, On the efficiency of flexible ethernet client architectures in optical transport networks, J. Opt. Commun. Netw. 10 (1) (2018) A133–A143.

[9] J.A. Hernández, G. Otero, D. Larrabeiti, O.G. de Dios, Dimensioning flex ethernet groups for the transport of 5G NR fronthaul traffic in C-RAN scenarios, in: 2021 International Conference on Optical Network Design and Modeling, ONDM, 2021, pp. 1–3, http://dx.doi.org/10.23919/ONDM51796.2021.9492417.


io/D8DI15S7/as/557f4z3n738v4cww28qjxh6/MUST_Optical_Controller_NBI_Requirements_Document_v10_FINAL_VERSION_WEBSITE.pdf.

[15] R. Montero, F. Agraz, A. Pagès, S. Spadaro, End-to-end network slicing in support of latency-sensitive 5g services, in: A. Tzanakaki, M. Varvarigos, R. Muñoz, R. Nejabati, N. Yoshikane, M. Anastasopoulos, J. Marquez-Barja (Eds.), Optical Network Design and Modeling, Springer International Publishing, Cham, 2020, pp. 51–61.

[16] F. Moreno-Muro, C. San-Nicolás-Martínez, M. Garrich, P. Pavon-Marino, O.G. De Dios, R.L. Da Silva, Latency-aware optimization of service chain allocation with joint vnf instantiation and SDN metro network control, in: 2018 European Conference on Optical Communication, ECOC, 2018, pp. 1–3, http://dx.doi.org/10.1109/ECOC.2018.8535492.

[17] G.O. Pérez, J.A. Hernández, D. Larrabeiti, Fronthaul network modeling and dimensioning meeting ultra-low latency requirements for 5G, J. Opt. Commun. Netw. 10 (6) (2018) 573–581, http://dx.doi.org/10.1364/JOCN.10.000573.

[18] O-RAN, Xhaul Transport Requirements XTRP-REQ-v01.00, Technical Report, (29.244) O-RAN, 2021, Version 17.4.0.

Wireless Broadband, ICUWB, 2017, pp. 1–5, http://dx.doi.org/10.1109/ICUWB.2017.8250956.

[24] G. Otero Pérez, D. Larrabeiti López, J.A. Hernández, 5G new radio fronthaul network design for eCPRI-IEEE 802.1CM and extreme latency percentiles, IEEE Access 7 (2019) 82218–82230, http://dx.doi.org/10.1109/ACCESS.2019.2923020.

[25] G.O. Pérez, A. Ebrahimzadeh, M. Maier, J.A. Hernández, D.L. López, M.F. Veiga, Decentralized coordination of converged tactile internet and MEC services in H-CRAN fiber wireless networks, J. Lightwave Technol. 38 (18) (2020) 4935–4947, http://dx.doi.org/10.1109/JLT.2020.2998001.

[26] H. Wang, Y. Wu, G. Min, J. Xu, P. Tang, Data-driven dynamic resource scheduling for network slicing: A Deep reinforcement learning approach, Inform. Sci. 498 (2019) 106–116, http://dx.doi.org/10.1016/j.ins.2019.05.012, URL https://www.sciencedirect.com/science/article/pii/S0020025519303986.

[27] B. Han, H.D. Schotten, Machine learning for network slicing resource management: A comprehensive survey, 2020, CoRR abs/2001.07974. URL https://arxiv.org/abs/2001.07974, arXiv:2001.07974.