WO2023272722A1

WO2023272722A1 - Method and apparatus for packet forwarding control with reinforcement learning

Info

Publication number: WO2023272722A1
Application number: PCT/CN2021/104265
Authority: WO
Inventors: Bolin NIE
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2023-01-05

Abstract

Embodiments of the present disclosure provide method and apparatus for packet forwarding control with reinforcement learning which is performed by a packet forwarding entity. A method performed by a packet forwarding entity comprises obtaining state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface, determining a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment, determining at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment, mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter, and applying the at least one control value for the at least one packet forwarding control parameter.

Description

METHOD AND APPARATUS FOR PACKET FORWARDING CONTROL WITH REINFORCEMENT LEARNING

TECHNICAL FIELD

The non-limiting and exemplary embodiments of the present disclosure generally relate to the technical field of communications, and specifically to methods and apparatuses for packet forwarding control with reinforcement learning.

BACKGROUND

This section introduces aspects that may facilitate a better understanding of the disclosure. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.

With the development of communication networks such as fifth generation (5G) networks, diversified business scenarios face increasingly differentiated services with multi-dimensional QoS (quality of service) or traffic management (TM) requirements on bandwidth, packet delay (latency), packet loss, etc. This creates new challenges for QoS/TM solutions in communication networks. A commonly used QoS/TM solution is based on the DiffServ architecture from Internet Engineering Task Force (IETF) Request for Comments (RFC) 2475, RFC 3086, RFC 2983, RFC 2597, RFC 3246, etc., the disclosures of which are incorporated by reference herein in their entirety. The DiffServ architecture classifies traffic into different behavior aggregates identified by the DS (Differentiated Services) codepoint, VLAN (virtual LAN (local area network)) priority, or MPLS (Multiprotocol Label Switching) EXP (experimental) bits, and performs PHB (per-hop behavior) forwarding on that basis.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

The QoS/TM functions to support PHB may include policing, scheduling, shaping, queuing and congestion management algorithms, buffer management algorithms, and other TM (traffic management) algorithms provided by specific forwarding chips. These QoS/TM functions have numerous control parameters, some of which may be empirically configured from management and control plane according to for example service requirements and network planning, and some of which may be statically tuned by forwarding plane.

FIG. 1 shows a diagram of simplified QoS/TM function blocks on the forwarding plane according to an embodiment of the present disclosure. The management and control plane may empirically configure some QoS/TM parameter(s) to the forwarding plane. Some QoS/TM parameter(s) may be fixed or statically tuned. The ingress packets may be classified. Corresponding policing and/or metering may be applied to the classified packets. The classified packets may then be put into various queues, for example according to enqueue acceptance algorithms. Congestion management, buffer management and other QoS/TM mechanism(s) may then be applied per queue and/or per TC (traffic class) and/or per DP (drop precedence) and/or per queue group and/or globally. The scheduling algorithms may be applied to the packets in the queues. The output packets may be shaped and output by the egress interface.

In 5G networks as defined by 3GPP (3rd Generation Partnership Project) TS 23.501 V15.4.0, network slicing is introduced to support diverse service requirements on top of the same physical network infrastructure. Each network slice is an isolated end-to-end logical network tailored to serve a defined business purpose, consisting of the required network resources configured together. The management of network slicing is the joint end-to-end coordination of the networks (such as RAN (radio access network), transport networks, and core networks) and of the planes (such as management plane, control plane, and forwarding plane) with the utilization of global information. Each slice of the transport network is required to have isolated resources, to be enabled for services, and to support QoS/TM differentiation, which means the traditional QoS and TM techniques on the forwarding plane can be applied within a slice.

QoS and TM need dynamic control (tuning) with dynamic optimization

QoS and TM on the forwarding plane of packet networks need dynamic control (tuning) with dynamic optimization for various reasons. Dynamic traffic and instantaneous traffic congestion in packet networks may impact QoS and TM performance in a dynamic way. To achieve optimized QoS and TM performance on the forwarding plane, the control needs to dynamically drive the optimized balance among different dimensions of QoS/TM requirements such as bandwidth, packet delay (latency), and packet loss ratio. Additionally, the characteristics of traffic in packet networks may have dynamic patterns which may be dynamically predicted and in turn assist further optimization of the dynamic control of QoS and TM.

Dynamic traffic and congestion in packet network

Due to several interacting factors, the traffic in packet networks is dynamic, with characteristics such as self-similarity as described in “Self-similar traffic and network dynamics, A. Erramilli, M. Roughan, D. Veitch, Proceedings of the IEEE, 90(5): 800-819, 2002”, long scale correlation, long term correlation, short term burstiness, and so on, which lead to inevitable dynamic traffic congestion. For example, the factors may comprise at least one of:

● a combination of multiple service sessions of the same or different types

● traffic over-subscription by services which have elastic traffic

● unbalanced instantaneous traffic due to dynamic routing or forwarding protocols, e.g., broadcast scenarios triggered by switch-over in EVPN (Ethernet Virtual Private Network) multihoming

● dynamic convergence due to resilience features

● uneven traffic due to reasons like

○ larger bandwidth capability (e.g. 100Gbps) of one ingress interface than the egress interface (e.g. 10Gbps)

○ bursty traffic from massive machine type communications services

○ bursty traffic from massive concurrent protocol sessions like PTP (Precision Time Protocol) or EOAM (Ethernet Operations, Administration, and Maintenance)

● dynamic flow control and re-transmission mechanisms

● other possible factors, such as an instantaneous high CPU (central processing unit) load causing abnormal cases, or cases that trigger forwarding recycling in the forwarding pipeline, which halves the egress bandwidth, etc.

The dynamic traffic and consequent dynamic congestion impact QoS and TM performance in a dynamic way on multiple dimensions including bandwidth, packet delay, packet loss, etc., which means static configuration of QoS/TM control parameters cannot achieve optimal performance.

Multi-dimensional optimization of QoS and TM

Service-aware QoS/TM has multiple dimensions of requirements as described in 3GPP TS 23.501 V15.4.0, such as bandwidth (guaranteed min rate and/or max rate), packet delay, packet loss rate, etc. Hence QoS and TM optimization needs to drive a dynamic trade-off curve among different dimensions of QoS/TM performance.

As a simplified example, the max length of a queue (e.g. a VoQ (virtual output queue)) is a key QoS/TM control parameter that is preferably adjusted dynamically for different traffic situations. A larger max queue length means better capability of holding bursts, which leads to a smaller packet loss ratio but a larger average packet queuing latency, and vice versa. The choice of this threshold needs to weigh packet latency (delay) against packet loss ratio according to the corresponding requirements of the services carried on this queue and the dynamic performance status.
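
The latency/loss trade-off of the max queue length can be illustrated with a minimal sketch of a tail-drop FIFO queue. It assumes a fixed number of packets is served per time tick and that arrivals are given as a per-tick packet count. The function name `simulate_tail_drop` and the burst pattern are hypothetical, chosen only for illustration, and are not part of the disclosure.

```python
from collections import deque
from typing import Sequence, Tuple


def simulate_tail_drop(arrivals: Sequence[int], max_len: int,
                       service_per_tick: int = 1) -> Tuple[float, float]:
    """Return (drop_ratio, mean_wait_in_ticks) for a tail-drop FIFO queue.

    Assumption: arrivals[t] packets arrive at tick t, and up to
    service_per_tick packets leave the queue at the end of every tick.
    """
    queue = deque()          # stores the arrival tick of each queued packet
    offered = dropped = 0
    waits = []
    tick = 0
    # Keep ticking after the last arrival until the queue has drained.
    while tick < len(arrivals) or queue:
        burst = arrivals[tick] if tick < len(arrivals) else 0
        for _ in range(burst):
            offered += 1
            if len(queue) < max_len:
                queue.append(tick)
            else:
                dropped += 1  # tail drop: queue is full
        for _ in range(service_per_tick):
            if queue:
                waits.append(tick - queue.popleft())
        tick += 1
    drop_ratio = dropped / offered if offered else 0.0
    mean_wait = sum(waits) / len(waits) if waits else 0.0
    return drop_ratio, mean_wait


# One burst of 8 packets followed by idle ticks (hypothetical traffic).
burst = [8] + [0] * 7
short_queue = simulate_tail_drop(burst, max_len=2)
long_queue = simulate_tail_drop(burst, max_len=8)
```

With this burst, the short queue drops most of the packets but keeps the mean wait low, while the long queue drops nothing at the cost of a higher mean wait.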

A significant part of the QoS and TM control parameters have similar impacts on multiple dimensions of performance, but in a more complex way, because the outcome results from the interaction of the impact factors. The optimization should preferably consider the linkage of multiple QoS/TM functions with multiple control parameters as a whole when driving the optimal performance balance among multiple dimensions of requirements.

Traffic pattern prediction

The dynamic traffic in packet networks is the outcome of the interaction of a variety of complex factors which have patterns, including service types and their combinations, regularities of business scenarios, regularities of application use, mobility patterns of end users in mobile networks, network topology characteristics, convergence patterns of dynamic network routing and/or forwarding protocols, network packet retransmission mechanisms, etc.

These non-linear patterns are, in theory, predictable to some extent by machine learning. Machine learning with DNNs (deep neural networks) is a useful mechanism for extracting complex non-linear spatial and temporal characteristics among competing flows, and predicting their dynamic behaviors on that basis is a promising solution direction. The prediction of dynamic traffic behavior can in turn contribute to better optimization of QoS and TM performance.

The control of forwarding QoS/TM needs dynamic tuning/optimization in a multi-dimensional way. Specifically, it may achieve at least one of:

○ drive the optimal balance among multiple dimensions of QoS/TM performance (e.g., bandwidth, packet delay (latency), and packet loss ratio).

○ be service-aware, in the sense that different service classes may have different budgets for a dimension of QoS/TM performance.

○ drive the optimal balance of fairness among service classes.

○ make joint optimization of multiple QoS/TM functions as a whole.

○ have the flexibility to integrate smoothly with existing DiffServ solutions and with network slicing solutions.

○ take advantage of the prediction of dynamic traffic behaviors for further/better optimization.

There are some problems with existing QoS and TM solutions. For example, the forwarding plane cannot sense and/or does not consider QoS/TM requirements on packet delay (latency) or packet loss ratio when trading off among competing flows. However, a significant part of the forwarding QoS/TM parameters impact two or three of the performance dimensions of bandwidth, packet delay, and packet loss ratio simultaneously. Hence, the forwarding plane has poor service-awareness when controlling the fairness of multiple dimensions of QoS/TM performance (bandwidth, packet latency, and packet drop ratio) among competing flows, and consequently cannot seek an optimized trade-off among bandwidth, packet delay, and packet loss ratio.

The forwarding plane cannot support diverse ingress traffic dynamics well because of the many manually and statically tuned parameters in existing QoS/TM solutions such as DiffServ-based QoS/TM solutions. These parameters are based on “certain fixed traffic model(s)” and are set with cumbersome methods like manual trial and error/observation; hence they neither consider the various kinds of traffic dynamics nor can utilize any traffic pattern prediction.

In other words, the forwarding plane cannot be automatically tuned or optimized to adapt to the diverse variation of input traffic of various competing flows (or service class(es)). In practice, most QoS/TM parameters are manually and/or statically tuned under a certain traffic model, which cannot support different traffic models well and consequently leads to QoS/TM performance limitations and/or QoS/TM precision limitations.

Forwarding plane is not able to make joint optimization of multiple selected on-chip QoS/TM functions as a whole.

Forwarding plane is not able to predict or utilize the prediction of input traffic pattern.

There are diverse (heterogeneous) QoS/TM capabilities provided by different forwarding chip versions from the same chip company, or from different chip companies. Various special or advanced QoS/TM functions supported by specific forwarding chips may not be supported by other chip versions. Therefore, management and control planes usually do not have sufficient knowledge to fully and uniformly utilize the special or advanced forwarding QoS/TM capabilities provided by a specific forwarding chip, especially across different forwarding chips. These QoS/TM capabilities are not fully utilized, not to mention utilized in a jointly optimal way. Consequently, either some useful QoS/TM parameters can only be statically configured once at the node startup phase instead of being dynamically optimized, or some advanced QoS/TM features provided by the forwarding chip lie dormant, although they could have been fully utilized to achieve better QoS/TM performance.

There are some potential problems in integrating forwarding QoS/TM solutions with network slicing. Due to resource isolation, finer-grained slicing means worse utilization (or more waste) of forwarding bandwidth and other HW resources. Hence each slice may try to serve more services, which belong to a similar type of service scenario (e.g., eMBB (enhanced Mobile Broadband), uRLLC (Ultra Reliable Low Latency Communications), mMTC (massive Machine Type Communications)) but still have different QoS/TM requirements on multiple dimensions (typically on bandwidth, packet delay, packet drop ratio, etc.).

Another scenario for dividing network slices is per big/important business customer, which means that mixed services within one slice are, to some extent, also a reasonable business scenario that requires better QoS/TM support within one slice.

Therefore, within one slice, QoS/TM still has technical problems similar to those of traditional DiffServ architectures or solutions; some of them are mitigated, but they still exist.

Artificial intelligence (AI) or machine learning (ML) may be placed in the management and control planes as a central control point, which is good for utilizing global information in end-to-end orchestration but insufficient for fine/further optimization of forwarding QoS/TM due to some challenges. For example, it is difficult to achieve real-time optimization of control due to the round-trip signaling latency on the path between the control plane (such as an SDN (software-defined networking) controller or orchestration point) and the forwarding chips (such as an SDN agent). To better cope with the fast-changing behaviors of dynamic input traffic and the consequent instantaneous congestion, and additionally to better utilize any traffic prediction on a small time scale, prompt control with optimization in response to the real-time traffic situation may be one of the keys to optimal performance.

With the problems described above, if there is a mechanism to utilize the various existing and future unexploited QoS/TM capabilities supported by forwarding chips for automatic fine optimization of QoS/TM without much intervention by the management and control planes, it may save time-to-market and the cost of developing related new features across the management and control planes (meaning CAPEX (Capital Expenditure) reduction), and avoid unnecessary complexity exposed to or added into the management and control planes (meaning OPEX reduction in maintenance or customer training).

Centralizing all the complex fine optimization of QoS/TM control for all forwarding leaf nodes may lead to serious scalability issues in design, deployment, and operation. By contrast, offloading the complexity of further or fine optimization to distributed forwarding nodes can avoid the issue of complexity explosion.

The dynamic control of QoS/TM is a complex optimization problem of sequential control decisions, which can essentially be modeled as a Markov decision process. With various input traffic characteristics, driving the dynamic optimal balance in real time is very challenging, and it is too complex to derive an analytical solution from a mathematical model.
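
In a Markov decision process formulation, the quantity that a reinforcement learning agent usually tries to maximize is the discounted accumulative reward, i.e. the sum of future rewards with each step weighted by a power of a discount factor. The following minimal sketch computes it for a finite sequence of rewards. The function name `discounted_return` and the default discount factor are assumptions made for illustration only.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite episode.

    Assumption: 0 <= gamma <= 1; a smaller gamma favors immediate rewards.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three control intervals, each yielding a reward of 1.0 (hypothetical values).
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```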

To overcome or mitigate at least one of the above-mentioned problems or other problems, an improved solution for packet forwarding control, especially for QoS/TM functions, may be desirable.

In a first aspect of the disclosure, there is provided a method performed by a packet forwarding entity. The method comprises obtaining state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface. The method further comprises determining a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment. The method further comprises determining at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment. The method further comprises mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter. The method further comprises applying the at least one control value for the at least one packet forwarding control parameter.
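
The five steps of the first aspect (obtain state, score it, choose actions, map actions to control values, apply them) can be sketched as a single control iteration. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function `control_step`, its callback parameters, and the flat dictionary used for the state are all hypothetical, and each callback stands in for logic that a real packet forwarding entity would supply.

```python
from typing import Callable, Dict

State = Dict[str, float]  # hypothetical flat representation of state information


def control_step(
    obtain_state: Callable[[], State],
    reward_fn: Callable[[State], float],
    select_actions: Callable[[State, float], Dict[str, float]],
    map_action: Callable[[str, float], float],
    apply_value: Callable[[str, float], None],
) -> Dict[str, float]:
    """Run one control iteration and return the applied control values."""
    # 1. Obtain state information (local measurements plus controller input).
    state = obtain_state()
    # 2. Score the state; reward_fn picks the part of the state it needs.
    reward = reward_fn(state)
    # 3. Let the agent choose one output action per control parameter.
    actions = select_actions(state, reward)
    # 4. Map each raw action to a concrete control value.
    values = {name: map_action(name, action) for name, action in actions.items()}
    # 5. Apply the control values to the forwarding hardware (stubbed here).
    for name, value in values.items():
        apply_value(name, value)
    return values
```

Keeping the five steps behind callbacks is only one possible design; it makes each step replaceable, for example swapping a rule-based action selector for a learned agent without touching the loop.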

In an embodiment, the state information of packet forwarding environment comprises at least one of real-time input traffic characteristics, real-time state of forwarding quality of service (QoS) performance and/or traffic management performance, and/or real-time state of hardware resource related to QoS and/or traffic management, and/or at least one QoS budget indicator, and/or at least one QoS performance weight, and/or control related information for the at least one packet forwarding control parameter.

In an embodiment, the at least first part of the state information of packet forwarding environment comprises at least one of real-time input traffic characteristics, and/or real-time state of forwarding QoS performance and/or traffic management performance, and/or real-time state of hardware resource related to QoS and/or traffic management, and/or at least one QoS budget indicator, and/or at least one QoS performance weight.

In an embodiment, the at least second part of the state information of packet forwarding environment comprises at least one of real-time input traffic characteristics, and/or real-time state of forwarding QoS performance and/or traffic management performance, and/or real-time state of hardware resource related to QoS and/or traffic management, and/or control related information for the at least one packet forwarding control parameter, and/or at least one QoS budget indicator, and/or at least one QoS performance weight.

In an embodiment, at least one of the at least one QoS budget indicator, and/or the at least one QoS performance weight, and/or the control related information for the at least one packet forwarding control parameter of which a network management and/or control plane is aware is received from a network control entity via a network interface.

In an embodiment, the control related information for the at least one packet forwarding control parameter of which a network management and/or control plane is unaware is obtained from the packet forwarding entity.

In an embodiment, the real-time input traffic characteristics comprise at least one of an ingress instantaneous rate of a service class, an ingress average rate of a service class, an instantaneous packet size of a service class, or an average packet size of a service class.

In an embodiment, the real-time state of forwarding QoS performance and/or traffic management performance comprises at least one of: a real output instantaneous rate of a service class, a real output average rate of a service class, a real instantaneous packet drop ratio of a service class, a real average packet drop ratio of a service class, a real maximum packet latency of a service class, a real minimum packet latency of a service class, or a real average packet latency of a service class.

In an embodiment, the real-time state of hardware resource related to QoS and/or traffic management comprises at least one of: queuing status, buffer status, or bandwidth status.

In an embodiment, the control related information for the at least one packet forwarding control parameter comprises at least one of: a baseline value of a packet forwarding control parameter, a control mode of a packet forwarding control parameter, a tune ratio of a packet forwarding control parameter, a minimal value of a packet forwarding control parameter, or a maximal value of a packet forwarding control parameter.

In an embodiment, the at least one QoS budget indicator comprises at least one of: a budget indicator for QoS requirement on packet latency for a service class, a budget indicator for QoS requirement on packet loss ratio for a service class, or a budget indicator for QoS requirement on traffic bandwidth for a service class.

In an embodiment, the at least one QoS performance weight comprises at least one of: a weight for forwarding QoS performance on packet latency for a service class, a weight for forwarding QoS performance on packet loss ratio for a service class, a weight for forwarding QoS performance on traffic bandwidth for a service class, or a weight of a service class.

In an embodiment, the at least one output action for at least one packet forwarding control parameter is determined by an agent of reinforcement learning.

In an embodiment, the agent of reinforcement learning comprises an agent of deep reinforcement learning.

In an embodiment, the agent of reinforcement learning is implemented based on at least a function approximator supporting continuous state space and continuous action space or supporting continuous state space and discrete action space.

In an embodiment, the agent of deep reinforcement learning is implemented based on at least one deep neural network supporting continuous state space and continuous action space or supporting continuous state space and discrete action space.

In an embodiment, the at least one deep neural network comprises at least one of convolutional neural network, recurrent neural network or attention neural network.
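
As a simple stand-in for an agent with a function approximator over a continuous state space and a discrete action space, the sketch below uses a linear Q-function with epsilon-greedy action selection and a one-step Q-learning update. It is deliberately not a deep neural network; the class name `LinearQAgent`, its hyperparameters, and the feature encoding are assumptions for illustration and are not taken from the disclosure.

```python
import random
from typing import List, Sequence


class LinearQAgent:
    """Q(s, a) = w_a . s with epsilon-greedy exploration (illustrative only)."""

    def __init__(self, n_features: int, n_actions: int, lr: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0):
        # One weight vector per discrete action.
        self.weights = [[0.0] * n_features for _ in range(n_actions)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def q_values(self, state: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

    def act(self, state: Sequence[float]) -> int:
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.weights))
        q = self.q_values(state)
        return q.index(max(q))

    def update(self, state: Sequence[float], action: int, reward: float,
               next_state: Sequence[float]) -> None:
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        row = self.weights[action]
        for i, x in enumerate(state):
            row[i] += self.lr * td_error * x
```

In a deep reinforcement learning variant, the per-action weight vectors would be replaced by a neural network, while the act/update interface could stay the same.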

In an embodiment, mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter comprises mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter based on at least one of: a control mode of the at least one packet forwarding control parameter, a baseline value, a tune ratio, or a specified value range.

In an embodiment, a control mode of a packet forwarding control parameter comprises at least one of: a control mode indicating the packet forwarding control parameter is controlled by an agent of reinforcement learning in a packet forwarding entity based on at least one of a tune ratio, a specified value range, or an initial baseline value, a control mode indicating the packet forwarding control parameter is not allowed to be controlled by an agent of reinforcement learning in a packet forwarding entity, or a control mode indicating the packet forwarding control parameter is freely controlled by an agent of reinforcement learning in a packet forwarding entity.
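
One possible way to map an output action to a control value using the control mode, baseline value, tune ratio and value range is sketched below. It assumes the agent emits a normalized action in [-1, 1]; the names `ControlMode`, `ControlInfo` and `map_action`, and the specific mapping formulas, are hypothetical choices for illustration rather than the mapping defined by the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class ControlMode(Enum):
    BOUNDED = "bounded"  # agent tunes around the baseline within the tune ratio
    LOCKED = "locked"    # agent is not allowed to control the parameter
    FREE = "free"        # agent may use the whole specified value range


@dataclass
class ControlInfo:
    baseline: float
    mode: ControlMode
    tune_ratio: float   # e.g. 0.2 means at most +/-20% around the baseline
    min_value: float
    max_value: float


def map_action(action: float, info: ControlInfo) -> float:
    """Map a normalized action (assumed in [-1, 1]) to a control value."""
    a = max(-1.0, min(1.0, action))  # guard against out-of-range actions
    if info.mode is ControlMode.LOCKED:
        return info.baseline
    if info.mode is ControlMode.BOUNDED:
        value = info.baseline * (1.0 + info.tune_ratio * a)
    else:
        # FREE: spread the action linearly over [min_value, max_value].
        value = info.min_value + (a + 1.0) / 2.0 * (info.max_value - info.min_value)
    # Always respect the specified value range.
    return max(info.min_value, min(info.max_value, value))


# Hypothetical max queue length parameter, in packets.
queue_len = ControlInfo(baseline=1000.0, mode=ControlMode.BOUNDED,
                        tune_ratio=0.2, min_value=500.0, max_value=1100.0)
shrunk = map_action(-0.5, queue_len)   # 10% below the baseline
capped = map_action(1.0, queue_len)    # 20% above would exceed max_value
```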

In an embodiment, the at least one packet forwarding control parameter comprises at least one of: one or more packet forwarding control parameters for QoS function, or one or more packet forwarding control parameters for traffic management function.

In an embodiment, the reward score of the state information of packet forwarding environment is determined based on at least one of the factors below or a weighted combination of at least one of the factors below. A positive reward score is given to a situation in which all service classes have zero queued packets. For a service class, the larger the elastic bandwidth relative to a corresponding bandwidth budget, the larger the reward component given to the service class. For a service class, the greater the service-aware fairness of elastic bandwidth compared to other service classes, the larger the reward component given to the service class. For a service class, the smaller the packet latency relative to a corresponding packet latency budget, the larger the reward component given to the service class. For a service class, the greater the service-aware fairness of packet latency compared to other service classes, the larger the reward component given to the service class. For a service class, the smaller the drop ratio relative to a corresponding drop ratio budget, the larger the reward component given to the service class. For a service class, the greater the service-aware fairness of drop ratio compared to other service classes, the larger the reward component given to the service class.
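
A weighted combination of the budget-relative factors can be sketched as follows. This is a minimal illustration that covers only the bandwidth, latency, drop ratio and zero-queue factors; the fairness factors are omitted for brevity. The names `ClassObservation`, `class_reward`, `reward_score`, the normalization by budget, and the size of the zero-queue bonus are all assumptions, not formulas from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ClassObservation:
    """Per service class measurements and budgets (budgets must be > 0)."""
    elastic_bw: float
    bw_budget: float
    latency: float
    latency_budget: float
    drop_ratio: float
    drop_budget: float
    queued_packets: int
    w_bw: float = 1.0        # weight of the bandwidth dimension
    w_latency: float = 1.0   # weight of the latency dimension
    w_drop: float = 1.0      # weight of the drop ratio dimension
    class_weight: float = 1.0


def class_reward(o: ClassObservation) -> float:
    # Larger elastic bandwidth relative to its budget gives a larger component.
    bw = o.elastic_bw / o.bw_budget
    # Smaller latency and drop ratio relative to their budgets give larger
    # components; exceeding a budget makes the component negative.
    lat = 1.0 - o.latency / o.latency_budget
    drop = 1.0 - o.drop_ratio / o.drop_budget
    return o.class_weight * (o.w_bw * bw + o.w_latency * lat + o.w_drop * drop)


def reward_score(observations: Iterable[ClassObservation],
                 idle_bonus: float = 1.0) -> float:
    obs = list(observations)
    score = sum(class_reward(o) for o in obs)
    # Positive reward when every service class has zero queued packets.
    if obs and all(o.queued_packets == 0 for o in obs):
        score += idle_bonus
    return score
```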

In an embodiment, there is corresponding state information of packet forwarding environment for a specific service class.

In an embodiment, the specific service class is identified by at least one of traffic class, or drop precedence, or a combination of traffic class and drop precedence.
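
Per service class state information can be organized by keying each record on the traffic class and drop precedence and then flattening the records into one observation vector for the agent. The sketch below assumes a fixed, hypothetical feature list (`FEATURES`) and hypothetical names `ServiceClass` and `build_observation`; the actual set and order of state fields is not specified here.

```python
from typing import Dict, List, NamedTuple


class ServiceClass(NamedTuple):
    traffic_class: int
    drop_precedence: int


# Hypothetical subset of the per-class state fields, in a fixed order.
FEATURES = ("ingress_rate", "output_rate", "drop_ratio",
            "avg_latency", "queue_depth")


def build_observation(
    per_class: Dict[ServiceClass, Dict[str, float]]
) -> List[float]:
    """Flatten per-class state into one vector with a stable ordering."""
    vector: List[float] = []
    # Sort by (traffic_class, drop_precedence) so positions stay consistent.
    for key in sorted(per_class):
        stats = per_class[key]
        # Missing measurements default to 0.0 in this sketch.
        vector.extend(stats.get(name, 0.0) for name in FEATURES)
    return vector
```

A stable ordering matters because a function approximator generally expects each input position to carry the same meaning at every control interval.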

In a second aspect of the disclosure, there is provided a method performed by a network control entity. The method comprises obtaining information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter. The method further comprises sending the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface. The control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.

In an embodiment, the information used for participating in determining the reward score of the state information of the packet forwarding environment comprises at least one of at least one QoS budget indicator, or at least one QoS performance weight.

In an embodiment, the at least one QoS budget indicator comprises at least one of a budget indicator for QoS requirement on packet latency for a service class, a budget indicator for QoS requirement on packet loss ratio for a service class, or a budget indicator for QoS requirement on traffic bandwidth for a service class.

In an embodiment, the at least one QoS performance weight comprises at least one of a weight for forwarding QoS performance on packet latency for a service class, a weight for forwarding QoS performance on packet loss ratio for a service class, a weight for forwarding QoS performance on traffic bandwidth for a service class, or a weight of a service class.

In an embodiment, a QoS budget indicator is determined based on at least one of: service level agreement, flow-path mapping, or path topology.

In an embodiment, the information used for participating in determining the reward score of state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter of the packet forwarding entity are sent to the packet forwarding entity via a network interface. The network interface may be an existing or future southbound interface between the network control plane and the packet forwarding plane. Examples of southbound interfaces of the network control plane include Netconf/Yang, OpenFlow Management and Configuration Protocol, Simple Network Management Protocol (SNMP), OpenConfig, Programming Protocol-independent Packet Processors (P4), PCEP (Path Computation Element Protocol), etc.
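
The information that a network control entity sends (QoS budget indicators and QoS performance weights per service class) can be pictured as a small structured payload. The sketch below uses JSON purely as a stand-in encoding; it does not model any of the southbound protocols listed above. The record `ClassPolicy`, its field names and units, and the helper functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ClassPolicy:
    """Budgets and weights for one service class (hypothetical schema)."""
    traffic_class: int
    latency_budget_us: float
    loss_ratio_budget: float
    bandwidth_budget_mbps: float
    weight_latency: float
    weight_loss: float
    weight_bandwidth: float
    class_weight: float


def encode_policies(policies: List[ClassPolicy]) -> str:
    # Controller side: serialize the per-class policies for transmission.
    return json.dumps({"class_policies": [asdict(p) for p in policies]},
                      sort_keys=True)


def decode_policies(payload: str) -> List[ClassPolicy]:
    # Forwarding entity side: rebuild the policies from the received payload.
    return [ClassPolicy(**item) for item in json.loads(payload)["class_policies"]]
```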

In an embodiment, the at least one output action for at least one packet forwarding control parameter is determined by an agent of reinforcement learning in a packet forwarding entity.

In a third aspect of the disclosure, there is provided a packet forwarding entity. The packet forwarding entity comprises a processor and a memory coupled to the processor. Said memory contains instructions executable by said processor. Said packet forwarding entity is operative to obtain state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface. Said packet forwarding entity is further operative to determine a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment. Said packet forwarding entity is further operative to determine at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment. Said packet forwarding entity is further operative to map the at least one output action to at least one control value for the at least one packet forwarding control parameter. Said packet forwarding entity is further operative to apply the at least one control value for the at least one packet forwarding control parameter.

In a fourth aspect of the disclosure, there is provided a network control entity. The network control entity comprises a processor and a memory coupled to the processor. Said memory contains instructions executable by said processor. Said network control entity is operative to obtain information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter. Said network control entity is further operative to send the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface. The control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.

In a fifth aspect of the disclosure, there is provided a packet forwarding entity. The packet forwarding entity comprises an obtaining module, a first determining module, a second determining module, a mapping module and an applying module. The obtaining module may be configured to obtain state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface. The first determining module may be configured to determine a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment. The second determining module may be configured to determine at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment. The mapping module may be configured to map the at least one output action to at least one control value for the at least one packet forwarding control parameter. The applying module may be configured to apply the at least one control value for the at least one packet forwarding control parameter.

In an embodiment, the packet forwarding entity comprises a receiving module configured to receive at least one of the at least one QoS budget indicator, the at least one QoS performance weight, or the control related information for the at least one packet forwarding control parameter aware by a network management and/or control plane from a network control entity.

In a sixth aspect of the disclosure, there is provided a network control entity. The network control entity comprises an obtaining module and a sending module. The obtaining module may be configured to obtain information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter. The sending module may be configured to send the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface. The control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.

In a seventh aspect of the disclosure, there is provided a computer program product comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out any of the methods according to the first and second aspects of the disclosure.

In an eighth aspect of the disclosure, there is provided a computer-readable storage medium storing instructions which, when executed on at least one processor, cause the at least one processor to carry out any of the methods according to the first and second aspects of the disclosure.

Embodiments herein afford many advantages, of which a non-exhaustive list of examples follows. In some embodiments herein, the forwarding plane has the capability to sense and trade off QoS requirements of packet delay (latency) and loss ratio, besides bandwidth, among competing flows on the forwarding plane. In some embodiments herein, it can achieve much better service-awareness for controlling multiple dimensions of QoS/TM performance on the forwarding plane. In some embodiments herein, the packet forwarding entity can have the capability to auto-tune the high-dimensional optimization problem of forwarding QoS/TM. In some embodiments herein, the packet forwarding entity can have the capability of flexible joint control with optimization on multiple selected QoS/TM functions as an integrated whole (i.e., joint control with optimization and flexible scope). In some embodiments herein, the packet forwarding entity can have the capability of utilizing various/heterogeneous special but dormant existing and future QoS/TM capabilities of different forwarding chips without intervention by management and control planes. In some embodiments herein, it can save time-to-market and cost to develop related new features across management and control planes (i.e., CAPEX reduction). In some embodiments herein, it can avoid unnecessary complexity exposed to or added into management and control planes (i.e., OPEX reduction in maintenance or customer training). In some embodiments herein, the packet forwarding entity can have the capability of utilizing the prediction of ingress traffic patterns by means of DRL deep neural networks in the optimization of forwarding QoS/TM. In some embodiments herein, the packet forwarding entity can have the flexibility of weighted trade-off among different dimensions of QoS performance. In some embodiments herein, it can provide the flexibility of weighted trade-off of fairness among different traffic classes. 
In some embodiments herein, it can provide flexibility of smooth integration with existing DiffServ QoS solutions. In some embodiments herein, it can provide flexibility of smooth integration with network slicing evolution, or with any other intelligence from management and control planes on QoS/TM control. The embodiments herein are not limited to the features and advantages mentioned above. A person skilled in the art will recognize additional features and advantages upon reading the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features, and benefits of various embodiments of the present disclosure will become more fully apparent, by way of example, from the following detailed description with reference to the accompanying drawings, in which like reference numerals or letters are used to designate like or equivalent elements. The drawings are illustrated for facilitating better understanding of the embodiments of the disclosure and not necessarily drawn to scale, in which:

FIG. 1 shows a diagram of simplified QoS/TM function blocks on forwarding plane according to an embodiment of the present disclosure;

FIG. 2 shows an exemplary system architecture according to an embodiment of the present disclosure;

FIG. 3 shows a flowchart of a method according to an embodiment of the present disclosure;

FIG. 4 shows an example of a simplified design of DRL state space according to an embodiment of the present disclosure;

FIG. 5 shows an example of a general design of DRL state space according to an embodiment of the present disclosure;

FIG. 6 shows an example of a structure of DRL agent according to an embodiment of the present disclosure;

FIG. 7 shows an example of a network structure for the training stage according to an embodiment of the present disclosure;

FIG. 8 shows an example of a network structure for the training stage according to another embodiment of the present disclosure;

FIG. 9a shows an example of a structure of a policy neural network according to an embodiment of the present disclosure;

FIG. 9b shows an example of a structure of a value neural network according to an embodiment of the present disclosure;

FIG. 10 shows an example of a neural structure for policy neural network with RNN and/or attention network according to an embodiment of the present disclosure;

FIG. 11 shows a flowchart of a method according to another embodiment of the present disclosure;

FIG. 12 shows an example of auto-tune of forwarding QoS/TM functions according to an embodiment of the present disclosure;

FIG. 13 shows an example of network structure according to an embodiment of the present disclosure;

FIG. 14 shows a flow chart of decentralizing QoS requirements and interaction with centralized QoS control according to an embodiment of the present disclosure;

FIG. 15 is a block diagram showing an apparatus suitable for practicing some embodiments of the disclosure;

FIG. 16 is a block diagram showing a packet forwarding entity according to an embodiment of the disclosure; and

FIG. 17 is a block diagram showing a network control entity according to an embodiment of the disclosure.

DETAILED DESCRIPTION

The embodiments of the present disclosure are described in detail with reference to the accompanying drawings. It should be understood that these embodiments are discussed only for the purpose of enabling persons skilled in the art to better understand and thus implement the present disclosure, rather than suggesting any limitations on the scope of the present disclosure. Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present disclosure should be or are in any single embodiment of the disclosure. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present disclosure. Furthermore, the described features, advantages, and characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the disclosure may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the disclosure.

As used herein, the term “network” refers to a network following any suitable communication standards such as new radio (NR), long term evolution (LTE), LTE-Advanced, wideband code division multiple access (WCDMA), high-speed packet access (HSPA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency-Division Multiple Access (OFDMA), Single carrier frequency division multiple access (SC-FDMA) and other wireless networks. A CDMA network may implement a radio technology such as Universal Terrestrial Radio Access (UTRA), etc. UTRA includes WCDMA and other variants of CDMA. A TDMA network may implement a radio technology such as Global System for Mobile Communications (GSM). An OFDMA network may implement a radio technology such as Evolved UTRA (E-UTRA), Ultra Mobile Broadband (UMB), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, Flash-OFDMA, Ad-hoc network, wireless sensor network, etc. In the following description, the terms “network” and “system” can be used interchangeably. Furthermore, the communications between two devices in the network may be performed according to any suitable communication protocols, including, but not limited to, the communication protocols as defined by a standard organization such as 3GPP. For example, the communication protocols may comprise the first generation (1G), 2G, 3G, 4G, 4.5G, 5G communication protocols, and/or any other protocols either currently known or to be developed in the future.

The term “packet forwarding entity” or “network control entity” refers to any suitable network function (NF) which can be implemented in a network entity (physical or virtual) of a communication network. For example, the network function can be implemented either as a network element on dedicated hardware, as a software instance running on dedicated hardware, or as a virtualized function instantiated on an appropriate platform, e.g., on a cloud infrastructure. For example, the 5G system (5GS) may comprise a plurality of NFs such as AMF (Access and Mobility Management Function), SMF (Session Management Function), AUSF (Authentication Server Function), UDM (Unified Data Management), PCF (Policy Control Function), AF (Application Function), NEF (Network Exposure Function), UPF (User Plane Function), NRF (Network Repository Function), RAN (radio access network), SCP (service communication proxy), NWDAF (network data analytics function), NSSF (Network Slice Selection Function), NSSAAF (Network Slice-Specific Authentication and Authorization Function), etc. For example, the 4G system (such as LTE) may include MME (Mobility Management Entity), HSS (home subscriber server), Policy and Charging Rules Function (PCRF), Packet Data Network Gateway (PGW), PGW control plane (PGW-C), Serving gateway (SGW), SGW control plane (SGW-C), E-UTRAN Node B (eNB), etc. In other embodiments, the network function may comprise different types of NFs, for example depending on a specific network.

The network device may be an access network device with an access function in a communication network, via which a terminal device accesses the network and receives services therefrom. The access network device may include a base station (BS), an access point (AP), a multi-cell/multicast coordination entity (MCE), a controller or any other suitable device in a wireless communication network. The BS may be, for example, a node B (NodeB or NB), an evolved NodeB (eNodeB or eNB), a next generation NodeB (gNodeB or gNB), a remote radio unit (RRU), a radio header (RH), an Integrated Access and Backhaul (IAB) node, a remote radio head (RRH), a relay, a low power node such as a femto, a pico, and so forth.

Yet further examples of the access network device comprise multi-standard radio (MSR) radio equipment such as MSR BSs, network controllers such as radio network controllers (RNCs) or base station controllers (BSCs), base transceiver stations (BTSs), transmission points, transmission nodes, positioning nodes and/or the like. More generally, however, the network node may represent any suitable device (or group of devices) capable, configured, arranged, and/or operable to enable and/or provide a terminal device access to a wireless communication network or to provide some service to a terminal device that has accessed the wireless communication network.

The term “terminal device” refers to any end device that can access a communication network and receive services therefrom. By way of example and not limitation, the terminal device refers to a mobile terminal, user equipment (UE), or other suitable devices. The UE may be, for example, a Subscriber Station (SS), a Portable Subscriber Station, a Mobile Station (MS), or an Access Terminal (AT). The terminal device may include, but is not limited to, a portable computer, an image capture terminal device such as a digital camera, a gaming terminal device, a music storage and playback appliance, a mobile phone, a cellular phone, a smart phone, a voice over IP (VoIP) phone, a wireless local loop phone, a tablet, a wearable device, a personal digital assistant (PDA), a desktop computer, a wearable terminal device, a vehicle-mounted wireless terminal device, a wireless endpoint, a mobile station, a laptop-embedded equipment (LEE), a laptop-mounted equipment (LME), a USB dongle, a smart device, a wireless customer-premises equipment (CPE) and the like. In the following description, the terms “terminal device”, “terminal”, “user equipment” and “UE” may be used interchangeably. As one example, a terminal device may represent a UE configured for communication in accordance with one or more communication standards promulgated by the 3GPP (3rd Generation Partnership Project), such as 3GPP’s LTE standard or NR standard. As used herein, a “user equipment” or “UE” may not necessarily have a “user” in the sense of a human user who owns and/or operates the relevant device. Instead, a UE may represent a device that is intended for sale to, or operation by, a human user but that may not initially be associated with a specific human user. In some embodiments, a terminal device may be configured to transmit and/or receive information without direct human interaction. For instance, a terminal device may be designed to transmit information to a network on a predetermined schedule, when triggered by an internal or external event, or in response to requests from the communication network.

As yet another example, in an Internet of Things (IoT) scenario, a terminal device may represent a machine or other device that performs monitoring and/or measurements, and transmits the results of such monitoring and/or measurements to another terminal device and/or network equipment. The terminal device may in this case be a machine-to-machine (M2M) device, which may in a 3GPP context be referred to as a machine-type communication (MTC) device. As one particular example, the terminal device may be a UE implementing the 3GPP narrow band internet of things (NB-IoT) standard. Particular examples of such machines or devices are sensors, metering devices such as power meters, industrial machinery, or home or personal appliances, for example refrigerators, televisions, personal wearables such as watches etc. In other scenarios, a terminal device may represent a vehicle or other equipment that is capable of monitoring and/or reporting on its operational status or other functions associated with its operation.

References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” and the like indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

It shall be understood that although the terms “first” and “second” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed terms.

As used herein, the phrase “at least one of A and B” or “at least one of A or B” should be understood to mean “only A, only B, or both A and B.” The phrase “A and/or B” should be understood to mean “only A, only B, or both A and B”.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components etc., but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof.

It is noted that these terms as used in this document are used only for ease of description and differentiation among nodes, devices or networks etc. With the development of the technology, other terms with the similar/same meanings may also be used.

In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

Although the subject matter described herein may be implemented in any appropriate type of system using any suitable components, the embodiments disclosed herein are described in relation to a communication system complying with the exemplary system architecture illustrated in FIG. 2. For simplicity, the system architecture of FIG. 2 only depicts some exemplary elements. In practice, a communication system may further include any additional elements suitable to support communication between any two communication devices. The communication system may provide communication and various types of services to one or more customer devices to facilitate the customer devices’ access to and/or use of the services provided by, or via, the communication system.

As shown in FIG. 2, the communication system comprises four packet forwarding entities simply denoted as PFE1, PFE2, PFE3 and PFE4, a network control entity (NCE) 30, two terminal devices simply denoted as TD1 and TD2, and a network 20. For example, the network 20 may be any suitable network such as an Internet protocol (IP) network or a multi-protocol label switching (MPLS) network. The connection between a PFE and a TD may be a 4G network connection or a 5G network connection. Although four PFE devices, one NCE and two TDs are shown in FIG. 2, there may be more or fewer PFE devices, NCEs or TDs.

The terminal device connects to the network 20 via a PFE. The PFE device may be, for example, a router, a switch, a gateway, a modem, a firewall, a network interface controller (NIC) , a hub, a bridge, or any other type of data forwarding device. In an embodiment, in 3GPP network, the PFE device may be any suitable network device such as packet data network gateway user plane (PGW-U) , or UPF (User plane Function) , etc.

The network 20 can route and/or forward traffic of the terminal devices. The network 20 may be, for example, an IP based network, or an MPLS based network, or a combination thereof.

NCE 30 may provide network management and control functions. For example, in an SDN system, NCE 30 may be an SDN controller. It is noted that the SDN system may employ any suitable existing or future technologies or protocols such as OpenFlow, OpenDaylight, network virtualization platform, etc. NCE 30 may be a logically centralized entity that may be in charge of sending the QoS/TM requirements down to the packet forwarding plane. In an embodiment, in a 3GPP network, the NCE 30 may be a packet data network gateway control plane (PGW-C), or an SMF (Session Management Function).

FIG. 3 shows a flowchart of a method according to an embodiment of the present disclosure, which may be performed by an apparatus implemented in or at or as a packet forwarding entity or communicatively coupled to the packet forwarding entity. As such, the apparatus may provide means or modules for accomplishing various parts of the method 300 as well as means or modules for accomplishing other processes in conjunction with other components.

At block 302, the packet forwarding entity may obtain state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface. The packet forwarding entity may obtain state information of packet forwarding environment in various ways, for example by itself or from another network device. For example, when some state information of packet forwarding environment can be collected by the packet forwarding entity, the packet forwarding entity may collect this state information by itself. When some state information of packet forwarding environment is configured by a network control entity, the packet forwarding entity may receive this state information from the network control entity.

In an embodiment, the network interface may be an existing or future southbound interface between the network control plane and the packet forwarding plane. Examples of southbound interfaces of the network control plane include Netconf/Yang, OpenFlow Management and Configuration Protocol, Simple Network Management Protocol (SNMP), OpenConfig, Programming protocol-independent packet processors (P4), PCEP (Path Computation Element Protocol), etc.

In an embodiment, the control related information for the at least one packet forwarding control parameter unaware by a network management and/or control plane is obtained from the packet forwarding entity. For example, control related information for some packet forwarding control parameter (s) may be determined/controlled by the packet forwarding entity related to a specific forwarding chip. In this case, the packet forwarding entity may obtain such control related information by itself.

In an embodiment, at least one of the at least one QoS budget indicator, the at least one QoS performance weight, or the control related information for the at least one packet forwarding control parameter aware by a network management and/or control plane is received from a network control entity via a network interface.

In an embodiment, the real-time input traffic characteristics, the real-time state of forwarding quality of service (QoS) performance and/or traffic management performance, and the real-time state of hardware resource related to QoS and/or traffic management may be obtained from the packet forwarding entity. For example, the packet forwarding entity may monitor/collect this state information periodically.

The state information of packet forwarding environment may comprise any suitable state information of packet forwarding environment which can be used to determine an output action for a packet forwarding control parameter and/or determine a reward score of the state information of packet forwarding environment.

In an embodiment, there is corresponding state information of packet forwarding environment for a specific service class. The specific service class can be identified by any suitable information. For example, one or more fields of a packet header can be used to identify a specific service class.

In an embodiment, the specific service class may be identified by at least one of traffic class or drop precedence or a combination of traffic class and drop precedence.
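As a non-limiting illustration, assume that the service class is carried in the DiffServ code point (DSCP) of the IP header using the Assured Forwarding (AF) code points of RFC 2597. The traffic class and drop precedence could then be decoded as sketched below. The function name `decode_af_dscp` and the choice of the AF encoding are assumptions made for illustration only; the disclosure does not prescribe a particular header field.

```python
from typing import Optional, Tuple


def decode_af_dscp(dscp: int) -> Optional[Tuple[int, int]]:
    """Return (traffic_class, drop_precedence) for an AF code point, else None."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field")
    traffic_class = dscp >> 3   # upper three bits: AF class 1..4
    drop_bits = dscp & 0b111    # lower three bits: 010, 100 or 110 for AF
    if traffic_class in (1, 2, 3, 4) and drop_bits in (0b010, 0b100, 0b110):
        return traffic_class, drop_bits >> 1  # drop precedence 1..3
    return None  # not an AF code point (e.g., EF or best effort)
```

In this sketch, a packet that does not carry an AF code point yields `None`, so a caller would need another rule (for example, a default service class) for such packets.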

In an embodiment, the state information of packet forwarding environment may comprise at least one of real-time input traffic characteristics, real-time state of forwarding quality of service (QoS) performance and/or traffic management performance, real-time state of hardware resource related to QoS and/or traffic management, at least one QoS budget indicator, at least one QoS performance weight, or control related information for the at least one packet forwarding control parameter.

Real-time states of input traffic characteristics

In an embodiment, the real-time input traffic characteristics comprise at least one of: an ingress instantaneous rate of a service class, an ingress average rate of a service class, an instantaneous packet size of a service class, or an average packet size of a service class.
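As a minimal sketch, the four characteristics listed above could be derived from cumulative ingress byte and packet counters of one service class. The `CounterSample` type, the field names, and the interpretation of "instantaneous" as "over the most recent sampling interval" and "average" as "over the whole observation window" are assumptions for illustration; an actual forwarding chip may expose these statistics differently.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class CounterSample:
    timestamp_s: float   # sampling time in seconds
    byte_count: int      # cumulative ingress bytes of the service class
    packet_count: int    # cumulative ingress packets of the service class


def traffic_characteristics(samples: Sequence[CounterSample]) -> Dict[str, float]:
    """Derive rate and packet-size features from cumulative counter samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, prev, last = samples[0], samples[-2], samples[-1]

    def rate_bps(a: CounterSample, b: CounterSample) -> float:
        return 8 * (b.byte_count - a.byte_count) / (b.timestamp_s - a.timestamp_s)

    def mean_size(a: CounterSample, b: CounterSample) -> float:
        packets = b.packet_count - a.packet_count
        return (b.byte_count - a.byte_count) / packets if packets else 0.0

    return {
        "instantaneous_rate_bps": rate_bps(prev, last),       # last interval
        "average_rate_bps": rate_bps(first, last),            # whole window
        "instantaneous_packet_size_bytes": mean_size(prev, last),
        "average_packet_size_bytes": mean_size(first, last),
    }
```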

Real-time input traffic characteristics can usually be measured by the inherent/internal statistics features supported by the forwarding chip.

Additionally, the prediction of traffic characteristics can be used for better optimization of QoS and/or TM performance of the packet forwarding entity. The prediction of traffic characteristics can be obtained in various ways. For example, a CNN (convolutional neural network), an RNN (recurrent neural network) and/or an attention network may be used to capture the non-linear patterns when building the DRL deep neural network structure. That is, to extract the non-linear traffic pattern across all the traffic classes, a CNN can be added into the deep neural network. To extract the temporal/sequential traffic pattern, an RNN and/or an attention network can be added into the deep neural network. On the other hand, the deep neural network should not be made deeper than necessary; its depth should follow the principle of being just enough. To improve the convergence performance of DRL, neural network pruning techniques can be utilized based on the research of Frankle, Jonathan and Michael Carbin. "The lottery ticket hypothesis: Finding sparse, trainable neural networks." 7th International Conference on Learning Representations (ICLR), New Orleans, Louisiana, May 2019. For DRL, the structure of the deep neural networks is a kind of hyperparameter, which in practice is usually determined and tuned during the DRL training phase by means of various empirical methods. Consequently, the structure of the deep neural network of DRL could have many practical variants for the same purpose.

Real-time states of forwarding QoS/TM performance

The real-time state of forwarding QoS performance and/or traffic management performance may comprise any suitable information which can be used to determine an output action for a packet forwarding control parameter and/or determine a reward score of the state information of packet forwarding environment. In an embodiment, the real-time state of forwarding QoS performance and/or traffic management performance comprises at least one of: a real output instantaneous rate of a service class, a real output average rate of a service class, a real instantaneous packet drop ratio of a service class, a real average packet drop ratio of a service class, a real maximum packet latency of a service class, a real minimum packet latency of a service class, or a real average packet latency of a service class.

The real-time states of forwarding QoS/TM performance can usually be measured by the inherent/internal QoS/TM features of the forwarding chip, and/or by some additional features such as the IETF Two-Way Active Measurement Protocol (TWAMP).

Real-time states of forwarding QoS/TM related HW resource can usually be measured by the inherent/internal QoS/TM features of the forwarding chip. Real-time states of QoS/TM related HW resource may comprise any suitable real-time states of QoS/TM related HW resource, for example depending on the resource management capability of the specific forwarding chip.

In the packet forwarding entity, the time period between the time point at which a packet enters the packet forwarding entity and the time point at which the packet leaves the packet forwarding entity is called packet latency or, in other words, packet delay. The packet latency may comprise maximum packet latency, and/or minimum packet latency, and/or average packet latency.
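The latency definition above can be illustrated with a short sketch that computes the maximum, minimum and average packet latency of a service class from per-packet ingress and egress timestamps. The function name and the use of integer microsecond timestamps are assumptions for illustration; how timestamps are actually collected depends on the forwarding chip.

```python
from typing import Dict, Iterable, Tuple


def latency_stats(timestamps_us: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Compute max/min/average latency from (ingress_us, egress_us) pairs."""
    latencies = [egress - ingress for ingress, egress in timestamps_us]
    if not latencies:
        raise ValueError("no packets observed in the measurement window")
    if any(value < 0 for value in latencies):
        raise ValueError("egress timestamp precedes ingress timestamp")
    return {
        "max_latency_us": float(max(latencies)),
        "min_latency_us": float(min(latencies)),
        "avg_latency_us": sum(latencies) / len(latencies),
    }
```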

Real-time states of hardware resource related to QoS/TM

The real-time states of hardware resource related to QoS and/or traffic management may comprise any suitable information which can be used to determine an output action for a packet forwarding control parameter and/or determine a reward score of the state information of packet forwarding environment. In an embodiment, real-time states (e.g., instantaneous and/or average states) of hardware resource related to QoS and/or traffic management may comprise at least one of:

●Queuing status

○ real queue size of a traffic class

○ allocated queue max size of a traffic class

○ total allocated queue max size of the egress interface

●Buffer status

○ occupied SRAM (Static Random-Access Memory) size

○ occupied DRAM (Dynamic Random Access Memory) size

○ total allocated SRAM size

○ total allocated DRAM size

●Bandwidth status

○ allocated guaranteed bandwidth of a traffic class

○ allocated max bandwidth of a traffic class

○ total allocated bandwidth of the egress interface

○ max bandwidth of the egress interface
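As one possible way to feed the statuses listed above into a DRL state vector, the sketch below groups them in a single record and normalizes each one to a ratio. The class name `HwResourceState`, the field names, the units, and the particular normalization are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HwResourceState:
    # Queuing status (bytes)
    real_queue_size: int
    allocated_queue_max_size: int
    total_allocated_queue_max_size: int
    # Buffer status (bytes)
    occupied_sram: int
    occupied_dram: int
    total_allocated_sram: int
    total_allocated_dram: int
    # Bandwidth status (bit/s)
    allocated_guaranteed_bw: float
    allocated_max_bw: float
    total_allocated_bw: float
    interface_max_bw: float


def _ratio(part: float, whole: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def to_features(s: HwResourceState) -> List[float]:
    """Normalize the raw statuses into dimensionless ratios."""
    return [
        _ratio(s.real_queue_size, s.allocated_queue_max_size),
        _ratio(s.allocated_queue_max_size, s.total_allocated_queue_max_size),
        _ratio(s.occupied_sram, s.total_allocated_sram),
        _ratio(s.occupied_dram, s.total_allocated_dram),
        _ratio(s.allocated_guaranteed_bw, s.interface_max_bw),
        _ratio(s.allocated_max_bw, s.interface_max_bw),
        _ratio(s.total_allocated_bw, s.interface_max_bw),
    ]
```

Normalizing to ratios is only one design choice; it keeps features of different units on a comparable scale, which is commonly helpful for neural network inputs.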

QoS budget indicators

To provide a service-aware basis for the intelligent optimization of QoS/TM on the forwarding plane, a QoS budget indicator is introduced to the forwarding plane for a dimension of QoS/TM requirement (typically bandwidth, packet delay, and packet drop ratio), so that the dynamic multi-dimensional optimization has sufficient knowledge to approximate the optimal trade-off curve.

From the SLA (service level agreement), the management and control planes can get the bandwidth budget (e.g., guaranteed rate budget and maximum rate budget), packet delay budget, and packet drop budget for a service as described in 3GPP TS 23.501 V15.4.0. For the bandwidth dimension, it may be assumed that the QoS/TM control parameter of guaranteed rate does not need optimization and is therefore set in demand mode, and the budget of the elastic bandwidth portion (equal to the maximum rate budget minus the guaranteed rate budget) is used as the bandwidth budget indicator for joint multi-dimensional auto-tuning/control of QoS/TM. For the packet delay dimension and the packet drop ratio dimension, the QoS budget values directly derived from the SLA are end-to-end performance budgets. Based on the information of the global topology and the planned/expected routing/forwarding paths, the management and control planes have the knowledge to derive the budget values for a single forwarding node.
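The elastic bandwidth portion described above is a simple difference of two budgets, which can be sketched as follows. The function name and the validation of the inputs are assumptions for illustration.

```python
def elastic_bandwidth_budget(guaranteed_rate_budget: float,
                             max_rate_budget: float) -> float:
    """Bandwidth budget indicator: maximum rate budget minus guaranteed rate budget."""
    if guaranteed_rate_budget < 0 or max_rate_budget < guaranteed_rate_budget:
        raise ValueError("expected 0 <= guaranteed rate budget <= maximum rate budget")
    return max_rate_budget - guaranteed_rate_budget
```

For example, a service with a guaranteed rate budget of 20 Mbit/s and a maximum rate budget of 50 Mbit/s would have an elastic bandwidth budget of 30 Mbit/s.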

Taking transport packet networks as an example, a traffic class is an aggregation of the same or similar services. If a traffic class bears a group of the same services, the budget values of the traffic class are clear. If a traffic class bears a group of similar services, meaning that the budget values of these services are close, the budget values of the traffic class can simply be the average values of the QoS budgets of the services that belong to that traffic class.
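A minimal sketch of the averaging described above is given below. The dictionary keys and the function name are hypothetical; the only behavior taken from the text is that each budget of the traffic class is the arithmetic mean of the corresponding budgets of its services.

```python
from statistics import fmean
from typing import Dict, Sequence

BUDGET_KEYS = ("bandwidth", "packet_delay", "packet_drop_ratio")


def traffic_class_budgets(service_budgets: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-service QoS budgets of the services in one traffic class."""
    if not service_budgets:
        raise ValueError("a traffic class must bear at least one service")
    return {
        key: fmean(service[key] for service in service_budgets)
        for key in BUDGET_KEYS
    }
```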

Furthermore, on the one hand, the point of utilizing these budget indicators on the forwarding plane is to evaluate the service-aware performance of a traffic class relative to its budget, and on that basis to trade off the fairness of multi-dimensional performance against that of other traffic classes among competing Diffserv flows. Therefore, the budget values for a forwarding node only need to be roughly accurate, as long as their values relative to those of other traffic classes are proportionately correct according to the SLA and other centralized knowledge such as flow-path mapping and path topology.

On the other hand, the management and control planes may also use centralized intelligence to dynamically deduce better expected values of the QoS budgets for each forwarding node.

Table 1 shows an example of the QoS budget indicators. BW denotes bandwidth, DL denotes packet delay (or packet latency), and DR denotes packet drop ratio.

Table 1

Specifically, the management and control planes distribute the QoS/TM requirements to the forwarding nodes as the budget indicators below:

a) Determine the QoS budget indicators for each dimension of the QoS/TM requirements of a traffic class:

● Bandwidth budget indicator: Guaranteed bit rate, and Maximum bit rate

● Packet delay budget indicator: Packet delay budget

● Packet drop budget indicator: Packet drop ratio budget

The calculation of the packet delay budget and packet drop budget for a single forwarding node may need auxiliary centralized knowledge such as flow-path mapping and path topology. A simple solution is to subtract the link propagation delay and divide the remaining end-to-end budget equally among the forwarding nodes along the service forwarding path; a more advanced solution is to use more intelligence (e.g., AI/ML on the management and control planes) to allocate the budget to each forwarding node.

b) The management and control planes, such as an SDN controller, send the QoS budget indicators to the forwarding nodes (forwarding plane).
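The simple equal-split allocation of the packet delay budget in step a) can be sketched as follows. The function and parameter names are hypothetical, and milliseconds are assumed as the unit purely for illustration.

```python
def per_node_delay_budget(e2e_delay_budget_ms, link_propagation_delays_ms, num_nodes):
    """Split an end-to-end delay budget equally among forwarding nodes.

    The link propagation delays along the path are subtracted first, and the
    remainder is divided equally among the forwarding nodes on the path.
    """
    if num_nodes <= 0:
        raise ValueError("the path must contain at least one forwarding node")
    remaining = e2e_delay_budget_ms - sum(link_propagation_delays_ms)
    if remaining <= 0:
        raise ValueError("propagation delay alone exhausts the end-to-end budget")
    return remaining / num_nodes


# 10 ms end-to-end budget, three links, four forwarding nodes on the path.
node_budget = per_node_delay_budget(10.0, [1.0, 0.5, 0.5], 4)
```

In this example 8 ms remain after propagation delay, so each of the four nodes receives a 2 ms budget.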

In an embodiment, the at least one QoS budget indicator comprises at least one of: a budget indicator for QoS/TM requirement on packet latency for a service class, a budget indicator for QoS/TM requirement on packet loss ratio for a service class, a budget indicator for QoS/TM requirement on traffic bandwidth for a service class.

QoS performance weights

To provide the flexibility of adjusting the relative importance of different dimensions of QoS/TM performance (i.e., bandwidth, packet delay, packet drop), the weights below for a traffic class i may be introduced on the forwarding plane. Their default values may be any suitable value between 0 and 1. The closer a weight is to 0, the less important the corresponding performance dimension of that traffic class is.

To provide the flexibility of adjusting the relative importance of different traffic classes when dynamically optimizing QoS/TM, the weight below for a traffic class i may be introduced on the forwarding plane. Its default value may be any suitable value between 0 and 1.

At least a part of the above QoS/TM performance weights can be configured from management and control planes to forwarding plane according to flexible business needs.

In an embodiment, the at least one QoS performance weight comprises at least one of: a weight for forwarding QoS/TM performance on packet latency for a service class, a weight for forwarding QoS/TM performance on packet loss ratio for a service class, a weight for forwarding QoS/TM performance on traffic bandwidth for a service class, or a weight of a service class.

Control related information for a target QoS/TM control parameter

The control related information for a target QoS/TM control parameter may be selected so that the DRL obtains sufficient direct knowledge to achieve better convergence performance.

The control related information for at least one packet forwarding control parameter may comprise any suitable information which can be used in determining an output action for a packet forwarding control parameter. In an embodiment, the control related information for the at least one packet forwarding control parameter comprises at least one of: a baseline value of a packet forwarding control parameter, a control mode of a packet forwarding control parameter, a tune ratio of a packet forwarding control parameter, or a specified value range of a packet forwarding control parameter.

The control mode, baseline value and tune ratio may be introduced for control flexibility, to integrate with human expertise and/or any existing or future centralized intelligence of QoS/TM control from the management and/or control planes. The control mode and tune ratio of a control parameter indicate to what extent the control parameter is expected to be optimized by the introduced intelligent QoS/TM controller in the packet forwarding entity.

For common QoS/TM functions of which the management and control planes are aware, the control mode, baseline value, tune ratio and/or specified value range of a target control parameter may be configured from or controlled by the management and control planes.

For special forwarding QoS/TM functions of which the management and control planes are unaware, the control mode, baseline value, tune ratio and/or specified value range of a target control parameter are determined by the forwarding plane directly.

In an embodiment, the specified value range of a packet forwarding control parameter may comprise a minimal value of a packet forwarding control parameter and a maximal value of a packet forwarding control parameter.

The tune ratio of a QoS/TM controlled parameter takes a value in [0%, 100%]. If the corresponding control mode is tune mode, the tune ratio is set to a value strictly between 0% and 100%, i.e., in (0%, 100%). Otherwise, the tune ratio is set to 0% for demand mode and to 100% for free-run mode. For common QoS/TM functions of which the management and control planes are aware, the tune ratio of a parameter is configured from or controlled by the management and control planes. For special QoS/TM functions of which the management and control planes are unaware, the tune ratio of a parameter is determined and controlled by the forwarding plane directly.

In an embodiment, a control mode of a packet forwarding control parameter comprises at least one of: a control mode indicating the packet forwarding control parameter is controlled by an agent of reinforcement learning in a packet forwarding entity based on at least one of a tune ratio, a specified value range, or an initial baseline value; a control mode indicating the packet forwarding control parameter is not allowed to be controlled by an agent of reinforcement learning in a packet forwarding entity; or a control mode indicating the packet forwarding control parameter is freely controlled by an agent of reinforcement learning in a packet forwarding entity. For example, a control mode may indicate the packet forwarding control parameter is controlled by an agent of reinforcement learning in a packet forwarding entity based on initial baseline value and tune ratio, or based on initial baseline value and specified value range, or based on initial baseline value, tune ratio and specified value range.

For example, control mode for a target control parameter may have three configurable values: {tune mode | demand mode | free-run mode} .

Tune mode indicates the target control parameter is controlled by the intelligent QoS/TM controller in forwarding plane with up-and-down tune ratio around the initial baseline value.

In the tune mode for a target control parameter, the baseline value gives the initial recommended value which is either from human expertise or experience by network planning like legacy Diffserv QoS solution, or from any centralized intelligence from management and control planes which utilizes global information in end-to-end orchestration.

In the tune mode for a target control parameter, the associated “tune ratio” is introduced to indicate to what extent the target parameter is adjustable for further dynamic optimization by the intelligent QoS/TM controller on the forwarding plane (i.e., the packet forwarding entity). The tune ratio may be calculated based on the maximum valid value range of the parameter, or on a specified value range of the parameter. How the baseline value and tune ratio are used to map the DRL output to the final value of a QoS/TM control parameter is described below.

In the demand mode for a target control parameter, the baseline value determines the value of the target control parameter, which is not allowed to be further optimized or fine-tuned by the intelligent QoS/TM controller in the forwarding plane. This mode not only keeps backward compatibility with legacy QoS/TM solutions, but also provides the flexibility to give full control to any centralized intelligence from the management and control planes.

For common QoS/TM functions of which the management and control planes (i.e., the network control entity) are aware, demand mode means the target parameter is fully controlled by the management and control planes, and the DRL on the forwarding plane cannot dynamically optimize or tune it.

For special forwarding QoS/TM functions of which the management and control planes are unaware, demand mode still means the DRL on the forwarding plane cannot optimize or tune the parameter. In this case, the target parameter is statically determined by the forwarding plane without any optimization by the DRL on the forwarding plane. This is compatible with legacy solutions in which some QoS/TM parameters are determined once at the initialization phase and then fixed. Alternatively, the target parameter can be dynamically configured by non-DRL forwarding logic, for example as an implicitly calculated result linked to other functions that are explicitly controlled by the management and control planes.

In the demand mode for a control parameter, the baseline value directly determines the value of the target parameter, which is not allowed to be further or fine optimized by the DRL on the forwarding plane.

Considering the unified form for DRL input states, demand mode equals tune mode with tune ratio = 0%.

Free-run mode indicates the target control parameter is fully controlled by DRL dynamic optimization on the forwarding plane.

In free-run mode, the baseline value may also be provided as a reasonable initial value for the target control parameter, optionally using human expertise to speed up convergence and improve the stability of DRL from the initial state.

Considering the unified form for DRL input states, free-run mode equals tune mode with tune ratio = 100%.
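The unified treatment of the three control modes can be sketched as follows. The mode-to-ratio rule (demand = 0%, free-run = 100%, tune strictly in between) follows the text above. The function `map_action`, however, is only an assumed illustration of how a normalized action in [-1, 1] might be combined with the baseline value, tune ratio and specified value range; it is not the mapping defined by the disclosure, and all names are hypothetical.

```python
from enum import Enum


class ControlMode(Enum):
    TUNE = "tune"
    DEMAND = "demand"
    FREE_RUN = "free-run"


def effective_tune_ratio(mode, configured_ratio=0.0):
    """Express every control mode as a tune ratio in [0.0, 1.0]."""
    if mode is ControlMode.DEMAND:
        return 0.0  # demand mode equals tune mode with tune ratio 0%
    if mode is ControlMode.FREE_RUN:
        return 1.0  # free-run mode equals tune mode with tune ratio 100%
    if not 0.0 < configured_ratio < 1.0:
        raise ValueError("tune mode requires a ratio strictly between 0 and 1")
    return configured_ratio


def map_action(action, baseline, ratio, min_value, max_value):
    """Assumed mapping: shift the baseline by action * ratio * half the range."""
    if not -1.0 <= action <= 1.0:
        raise ValueError("action must be normalized to [-1, 1]")
    half_range = (max_value - min_value) / 2.0
    value = baseline + action * ratio * half_range
    return min(max(value, min_value), max_value)  # stay inside the value range


ratio = effective_tune_ratio(ControlMode.TUNE, 0.2)
tuned = map_action(-1.0, baseline=50.0, ratio=ratio, min_value=0.0, max_value=100.0)
fixed = map_action(0.7, 50.0, effective_tune_ratio(ControlMode.DEMAND), 0.0, 100.0)
```

Under this assumed mapping, a demand-mode parameter always stays at its baseline, while a free-run parameter whose baseline sits at the middle of the range can reach any value in the range.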

With reference to FIG. 3, at block 304, the packet forwarding entity may determine a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment.

The at least first part of the state information of packet forwarding environment may comprise any suitable state information. In an embodiment, the at least first part of the state information of packet forwarding environment comprises at least one of: real-time input traffic characteristics, real-time state of forwarding QoS performance and/or traffic management performance, real-time state of hardware resource related to QoS and/or traffic management, at least one QoS budget indicator, or at least one QoS performance weight.

The reward score of the state information of packet forwarding environment can be determined in various ways.

In an embodiment, the reward score of the state information of packet forwarding environment is determined based on at least one of below factors or a weighted combination of at least one of below factors:

● a positive reward score is given when all service classes have zero queued packets,

● for a service class, the larger the elastic bandwidth relative to the corresponding bandwidth budget, the larger the reward component given to the service class,

● for a service class, the greater the service-aware fairness of elastic bandwidth compared with other service classes, the larger the reward component given to the service class,

● for a service class, the smaller the packet latency relative to the corresponding packet latency budget, the larger the reward component given to the service class,

● for a service class, the greater the service-aware fairness of packet latency compared with other service classes, the larger the reward component given to the service class,

● for a service class, the smaller the drop ratio relative to the corresponding drop ratio budget, the larger the reward component given to the service class, or

● for a service class, the greater the service-aware fairness of drop ratio compared with other service classes, the larger the reward component given to the service class.

Service-aware comprehensive target reward function

The reward function describes how the packet forwarding entity ought to behave by means of a measurable reward or punishment expressed as a score, stipulating what we want the packet forwarding entity to accomplish. The following exemplifies a feasible and flexible pattern (as a normal form) of a reward method that trades off service-aware fairness across bandwidth, packet delay and loss ratio among competing flows.

To support dynamic control/tuning of forwarding QoS/TM with service awareness of multi-dimensional QoS/TM requirements, we introduce the service-aware comprehensive-target reward function $r_t$ for DRL application/integration.

The main design principles of this reward function may comprise at least one of:

● In order to make full use of the available bandwidth, a reward (e.g., a positive reward or a larger reward) is given when all the traffic classes have zero queued packets.

● For the bandwidth dimension of performance, for a traffic class, the larger the elastic bandwidth relative to its budget, the larger the reward given. The greater the service-aware unfairness of elastic bandwidth compared with other traffic classes, the larger the punishment (e.g., a negative reward or a smaller reward) given.

● For the packet delay dimension of performance, for a traffic class, the larger the delay relative to its budget, the larger the punishment given. The greater the service-aware unfairness of delay compared with other traffic classes, the larger the punishment given.

● For the packet drop ratio dimension of performance, for a traffic class, the larger the drop ratio relative to its budget, the larger the punishment given. The greater the service-aware unfairness of drop ratio compared with other traffic classes, the larger the punishment given.

● There is flexibility to adjust the relative importance among different dimensions of performance.

● There is flexibility to adjust the relative importance among different traffic classes.

Notations of reward components with explanation

QoS budget information is introduced to the forwarding plane to improve service awareness in the dynamic multi-dimensional optimization of QoS/TM. The QoS budget indicators for each dimension of QoS/TM performance, described above, are utilized in the reward function of DRL for the intelligent QoS/TM controller.

We use t to denote the current state, and $r_t$ to denote the reward function of the current state. The reward function $r_t$ is used in the DRL return function $R_t$ in an accumulated way as follows:

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \gamma^3 r_{t+3} + \dots$, where the discount factor $\gamma \in [0, 1]$
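The accumulated return can be computed for a finite reward sequence as sketched below. Truncating the infinite sum to the rewards actually observed is an assumption made for illustration, and the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for a finite list."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("the discount factor must lie in [0, 1]")
    ret = 0.0
    # Accumulate from the last reward backwards: R_k = r_k + gamma * R_{k+1}.
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret


ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

A discount factor closer to 0 makes the agent favor immediate reward, while a value closer to 1 weights future rewards almost as much as the current one.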

For the traffic bandwidth dimension of QoS/TM performance, at current state t, for traffic class i, the bandwidth component of the reward function is defined as follows.

Use $BW_i$ to denote the QoS budget indicator of the elastic bandwidth (the maximum rate $BW^{max}$ minus the guaranteed rate $BW^{guar}$), which is introduced to the forwarding plane. Here we assume that the control parameter “guaranteed rate” is fixed in demand mode, and that some of the other bandwidth-related control parameters (such as maximum rate and scheduling weight) are expected to be dynamically adjusted in tune mode for optimization.

Use ro _i, t to denote the real output traffic rate, and use

to denote the guaranteed traffic rate, then use

to represent the real elastic portion of output bandwidth which can be optimized.

Use

to represent the “service-aware bandwidth component” of traffic class i, which indicates the relative bandwidth performance to service bandwidth budget.

will be used in the reward function for the purpose that the closer of

to its budget BW _i, the larger of reward is given. For simplicity, may limit the value range of

within [0, 1] , that is, when

limit

allocated elastic bandwidth exceeds its budget.

Then use

to denote the mean of “service-aware bandwidth component” of all traffic classes, where N is the total number of traffic classes (usually N = 8) .

Then use

to represent the distance from the relative bandwidth of traffic class i to the mean relative bandwidth of all the traffic classes. A smaller distance means that the bandwidth allocated to traffic class i is fairer compared with the bandwidth allocation of all traffic classes. That is, this distance is used to measure the fairness of the bandwidth dimension of performance.

Use

to represent the normalized value of this distance, where the normalization is a technique often applied in machine learning. After normalization, the value range of

is [0, 1] .

Additionally, use l _i, t to denote the queue length of traffic class i.

Using all the above notations, the bandwidth component for reward function is defined as

where

are two weight factors to adjust the relative proportion between the bandwidth performance and bandwidth fairness. These two factors may be chosen as fixed values for DRL in training phase (e.g., may use

) .
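The building blocks of the bandwidth component described above (relative elastic bandwidth limited to [0, 1], mean over all traffic classes, distance to the mean, and normalization) can be sketched as follows. The exact formulas are not reproduced in this text, so the expressions below are assumptions derived from the prose; in particular, dividing by the largest distance is only one possible normalization, and all names are hypothetical. The final combination with the two weight factors is deliberately left out.

```python
def relative_elastic_bandwidth(output_rate, guaranteed_rate, elastic_budget):
    """Elastic output bandwidth relative to its budget, limited to [0, 1]."""
    if elastic_budget <= 0:
        raise ValueError("the elastic bandwidth budget must be positive")
    elastic = max(output_rate - guaranteed_rate, 0.0)
    # Limit to 1 when the elastic bandwidth exceeds its budget.
    return min(elastic / elastic_budget, 1.0)


def normalized_fairness_distances(components):
    """Distance of each class to the mean, scaled into [0, 1] (assumed scaling)."""
    mean = sum(components) / len(components)
    distances = [abs(c - mean) for c in components]
    largest = max(distances)
    if largest == 0.0:
        return [0.0 for _ in distances]  # all classes are equally served
    return [d / largest for d in distances]


# (output rate, guaranteed rate, elastic budget) for three traffic classes.
rates = [(150.0, 100.0, 100.0), (300.0, 100.0, 100.0), (100.0, 100.0, 100.0)]
components = [relative_elastic_bandwidth(*r) for r in rates]
distances = normalized_fairness_distances(components)
```

In this example the first class sits exactly at the mean and is therefore treated as perfectly fair, while the other two classes are equally far from the mean.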

For the packet delay dimension of QoS/TM performance, at current state t, for traffic class i, the packet delay component of the reward function may be defined as follows.

Use $DL_i$ to denote the QoS budget indicator of packet delay, which is introduced to the forwarding plane.

Use dl _i, t to denote the measured packet delay (latency) caused by forwarding QoS/TM process. Specifically, this can be the average (or maximum) delay (latency) experienced by the packets which have just passed through the TM system during the latest control time interval.

Use l _i, t to denote the queue length, and use ro _i, t to denote the real output rate, then

represents the expected upcoming average (or maximum) queuing latency.

Then use

to represent the “service-aware packet delay component” of traffic class i, which indicates the relative packet delay performance to service delay budget.

has 3 design options as below.

Option 1: If forwarding chip has the capability to real-time measure the packet delay (dl _i, t) caused by forwarding QoS/TM process (including queuing/buffering, scheduling, shaping) , then

Option 2: Use real-time queuing length and real output rate of traffic class i to estimate the queuing/buffering delay (latency) which is the major delay caused by forwarding QoS/TM process.

Option 3: Use the combination of option 1 and option 2 as a comprehensive method

where $\alpha_1, \alpha_2 \in (0, 1]$ are factors to adjust the relative proportion of the two additive terms (e.g., $\alpha_1 = 0.5$, $\alpha_2 = 0.25$).

For

option 1 provides finer optimization of QoS/TM delay performance, which is not limited to queuing latency. Option 2 has the merit that it does not require the forwarding chip to measure in real time the packet delay caused by the forwarding QoS/TM process; in addition, option 2 may have better DRL convergence performance because it directly establishes the relationship between the expected delay and the state of queue length and output traffic rate. Option 3, as the combination of options 1 and 2, is more comprehensive and may strike a balance between finer optimization of QoS/TM delay and DRL convergence performance.

will be used in the reward function so that the closer the packet delay is to its budget $DL_i$, the larger the punishment given. For simplicity, one may limit the value of

within [0, 1] as the packet can be dropped when its experienced packet delay exceeds its budget.
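The three design options for the service-aware packet delay component can be sketched as follows. Since the formulas themselves are not reproduced in this text, the expressions are assumptions inferred from the prose: option 1 divides the measured delay by the budget, option 2 divides the estimated queuing delay (queue length over output rate) by the budget, and option 3 is a weighted sum of both. The handling of a zero output rate and all function names are hypothetical.

```python
def clip01(value):
    """Limit a value to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def delay_option1(measured_delay, delay_budget):
    """Option 1 (assumed form): measured delay relative to the delay budget."""
    return clip01(measured_delay / delay_budget)


def delay_option2(queue_length, output_rate, delay_budget):
    """Option 2 (assumed form): estimated queuing delay relative to the budget."""
    if queue_length == 0:
        return 0.0  # nothing queued, no expected queuing delay
    if output_rate <= 0:
        return 1.0  # assumption: a stalled non-empty queue counts as worst case
    return clip01((queue_length / output_rate) / delay_budget)


def delay_option3(measured_delay, queue_length, output_rate, delay_budget,
                  alpha1=0.5, alpha2=0.25):
    """Option 3 (assumed form): weighted combination of options 1 and 2."""
    combined = (alpha1 * delay_option1(measured_delay, delay_budget)
                + alpha2 * delay_option2(queue_length, output_rate, delay_budget))
    return clip01(combined)


# Budget 4 ms, measured delay 2 ms, 1000 bits queued, output rate 500 bits/ms.
opt1 = delay_option1(2.0, 4.0)
opt2 = delay_option2(1000.0, 500.0, 4.0)
opt3 = delay_option3(2.0, 1000.0, 500.0, 4.0)
```

Option 2 needs only the queue length and output rate, which matches the remark that it does not depend on the forwarding chip being able to measure packet delay in real time.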

Then use

to denote the mean of “service-aware packet delay component” of all traffic classes, where N is the total number of traffic classes (usually N = 8) .

Then use

to represent the distance from the relative packet delay of traffic class i to the mean relative packet delay of all the traffic classes. A smaller distance means that the packet delay experienced by traffic class i is fairer compared with the packet delay of all traffic classes. That is, this distance is used to measure the fairness of the packet delay dimension of performance.

Use

to represent the normalized value of the distance, whose value range is [0, 1] .

Using all the above notations, the packet delay component for reward function is defined as,

where

are two weight factors to adjust the relative proportion between packet delay performance and packet delay fairness. These two factors may be chosen as fixed values for DRL in training phase (e.g., may use

) .

For the packet drop dimension of QoS/TM performance, at current state t, for traffic class i, the packet drop ratio component of the reward function is defined as follows.

Use $DR_i$ to denote the QoS budget indicator of packet drop ratio, which is introduced to the forwarding plane.

Use $dr_{i,t}$ to denote the measured real packet drop ratio caused by forwarding QoS/TM.

Use

to represent the “service-aware drop ratio component” of traffic class i, which indicates the relative packet drop performance to service packet drop budget.

will be used in the reward function so that the closer the packet drop ratio is to its budget $DR_i$, the larger the punishment given.

Then use

to denote the mean of “service-aware drop ratio component” of all traffic classes, where N is the total number of traffic classes (usually N = 8) .

Then use

to represent the distance from the relative packet drop ratio of traffic class i to the mean relative packet drop ratio of all traffic classes. A smaller distance means that the packet drop suffered by traffic class i is fairer compared with the packet drop of all traffic classes. That is, this distance is used to measure the fairness of the packet drop dimension of performance.

Use

Similarly, use l _i, t to denote the queue length of traffic class i.

Using all the above notations, the packet drop ratio component for reward function is defined as

where

are two factors to adjust the relative proportion between packet drop performance and packet drop fairness. These two factors may be chosen as fixed values for DRL in training phase (e.g., may use

) .

Reward function as a weighted normalized form

Using the notations of reward components described above, the final reward function at state t, may be designed as

where

is the weight to evaluate the relative importance of a traffic class i, which is configurable by management and control plane (e.g., a default case is to configure

) .

are the weights to evaluate the relative importance of different QoS/TM dimensions for traffic class i, which are configurable by the management and control planes (e.g., a default case is to configure

) .
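A weighted, normalized combination of per-class and per-dimension reward components could look like the sketch below. The actual reward formula is not reproduced in this text, so this is only an assumed form: each class score is a weighted average of its bandwidth, delay and drop components, and the final reward is a weighted average of the class scores. All names and the data layout are hypothetical.

```python
def weighted_reward(classes):
    """Combine per-class, per-dimension components into one reward score.

    Each entry of `classes` is a dict with:
      "class_weight": relative importance of the traffic class,
      "dim_weights":  {"bw": ..., "dl": ..., "dr": ...},
      "components":   {"bw": ..., "dl": ..., "dr": ...}.
    """
    total_class_weight = sum(c["class_weight"] for c in classes)
    if total_class_weight <= 0:
        raise ValueError("at least one traffic class needs a positive weight")
    reward = 0.0
    for c in classes:
        dim_total = sum(c["dim_weights"].values())
        if dim_total <= 0:
            continue  # a class with all-zero dimension weights contributes nothing
        score = sum(c["dim_weights"][d] * c["components"][d]
                    for d in c["dim_weights"]) / dim_total
        reward += c["class_weight"] * score
    return reward / total_class_weight


classes = [
    {"class_weight": 1.0,
     "dim_weights": {"bw": 1.0, "dl": 1.0, "dr": 1.0},
     "components": {"bw": 0.5, "dl": -0.25, "dr": -0.25}},
    {"class_weight": 3.0,
     "dim_weights": {"bw": 1.0, "dl": 0.0, "dr": 0.0},
     "components": {"bw": 1.0, "dl": -1.0, "dr": -1.0}},
]
r_t = weighted_reward(classes)
```

Normalizing by the sum of the weights keeps the score on a comparable scale when weights are reconfigured from the management and control planes, which is one reason such a form may be preferred for training stability.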

At block 306, the packet forwarding entity may determine at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment.

The at least second part of the state information of packet forwarding environment may comprise any suitable state information which can be used for determining at least one output action for at least one packet forwarding control parameter.

In an embodiment, the at least second part of the state information of packet forwarding environment comprises at least one of:

-real-time input traffic characteristics,

-real-time state of forwarding QoS performance and/or traffic management performance,

-real-time state of hardware resource related to QoS and/or traffic management,

-control related information for the at least one packet forwarding control parameter,

-at least one QoS budget indicator, or

-at least one QoS performance weight.

The at least one packet forwarding control parameter may comprise any suitable packet forwarding control parameter.

In an embodiment, the at least one packet forwarding control parameter may comprise at least one of:

-one or more packet forwarding control parameters for QoS function,

-one or more packet forwarding control parameters for traffic management function.

Choose target QoS/TM control parameters for adaptive joint optimization

For example, the target QoS/TM control parameters are chosen from the QoS/TM functions on the forwarding plane. The QoS/TM function pool on the forwarding plane may comprise at least one of the following QoS/TM functions. The controlled functions of forwarding QoS/TM can be a selected combination of several of the QoS/TM functions (features) below.

Common QoS/TM functions

● Policing and metering

● Queue management and congestion management, e.g., maximum queue length (for tail drop) , WRED (Weighted random early detection) algorithm

● Packet scheduling, e.g., SP (Strict Priority) /WRR (Weighted Round Robin) /WFQ (Weighted Fair Queuing) /DWRR (Deficit Weighted Round Robin) algorithms

● Shaping, e.g., per queue shaping/per egress interface shaping on guaranteed rate and/or maximum rate

Various special QoS/TM functions supported by specific forwarding chips

● Advanced buffer management, congestion management, and scheduling, e.g. Fair adaptive dynamic thresholds (FADT) , On-chip flow control mechanism, Queue admission algorithms, Queue watchdog threshold for packet deletion, Credit scheduling mechanism, Push queue mechanism, etc.

● Latency management features, e.g. On-chip packet latency thresholds for packet drop, Pre-emption for time sensitive traffic class, etc.

In an embodiment, the at least one output action for at least one packet forwarding control parameter is determined by an agent of reinforcement learning. The reinforcement learning may be any suitable reinforcement learning either currently known or to be developed in the future.

In an embodiment, the agent of reinforcement learning comprises an agent of deep reinforcement learning. The deep reinforcement learning may be any suitable deep reinforcement learning either currently known or to be developed in the future.

In an embodiment, when all the packet forwarding control parameters can be discrete, the agent of reinforcement learning may be implemented based on at least a function approximator supporting discrete action space.

For example, deep reinforcement learning (DRL) may be used on the packet forwarding entity. The inputs of DRL may be defined as a DRL state space. Assume that, as a common situation, there are 8 traffic classes (TC) on the forwarding plane, and use i ∈ [1, 8] to denote the traffic class index of an egress interface. Within a traffic class, there may be 3 drop precedences (DP) mapped to sub-flows, as is common in traditional Diffserv solutions. For the design of the DRL state space, there may be two options, as shown in FIGs. 4-5.

FIG. 4 shows an example of a simplified design of the DRL state space according to an embodiment of the present disclosure. If the target QoS/TM parameters controlled/tuned by the packet forwarding entity (also called the intelligent controller or intelligent QoS/TM controller herein) are not based on DP (e.g., the target QoS/TM parameters per DP are fully controlled by the network control entity, or any target QoS/TM parameter may have the same value for different DPs), the collected states can be based on TC only. The state space is then a second-order tensor with one dimension for the 8 traffic classes and the other dimension for the state vector with continuous state components.

FIG. 5 shows an example of a general design of the DRL state space according to an embodiment of the present disclosure. If the target QoS/TM parameters controlled/tuned by the intelligent controller are based on DP and TC, the collected states may be based on TC × DP. The state space is then a third-order tensor with one dimension for the 8 traffic classes, one dimension for the 3 drop precedences, and the other dimension for the state vector with continuous state components.
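The two state-space layouts can be illustrated with plain nested lists, as sketched below. The length of the state vector (`STATE_DIM = 5`) and the helper names are hypothetical; the counts of 8 traffic classes and 3 drop precedences follow the text.

```python
NUM_TC = 8      # traffic classes per egress interface
NUM_DP = 3      # drop precedences per traffic class
STATE_DIM = 5   # hypothetical number of continuous state components


def empty_state(per_drop_precedence):
    """Build a zero-filled state tensor as nested lists."""
    def state_vector():
        return [0.0] * STATE_DIM

    if per_drop_precedence:
        # General design: TC x DP x state vector (third-order tensor).
        return [[state_vector() for _ in range(NUM_DP)] for _ in range(NUM_TC)]
    # Simplified design: TC x state vector (second-order tensor).
    return [state_vector() for _ in range(NUM_TC)]


def shape(tensor):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


simplified = empty_state(per_drop_precedence=False)
general = empty_state(per_drop_precedence=True)
```

The simplified layout has shape (8, STATE_DIM) and the general layout has shape (8, 3, STATE_DIM); in practice an array library would typically hold these tensors.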

The general design of FIG. 5 is taken as an example. The state vector (per TC per DP) may comprise at least one of the following components: real-time input traffic characteristics, real-time states of forwarding QoS/TM performance, real-time states of QoS/TM related HW resources, QoS budget indicator(s), QoS performance weight(s), and/or control related information for the target QoS/TM control parameter(s), most of which may have a continuous value space.

In an embodiment, the deep neural network comprises at least one convolutional neural network (CNN) and/or at least one recurrent neural network (RNN) and/or at least one attention network. For example, to extract the non-linear traffic pattern across all the traffic classes, a CNN can be added to the deep neural network; to extract the temporal/sequential traffic pattern, an RNN and/or an attention network can be added to the deep neural network.

The optimization of forwarding QoS/TM is the optimization of a sequential decision problem, which is a Markov decision process. The changing input traffic in packet networks has complex characteristics, which requires a dynamic control mechanism with optimization that is capable of solving sequential decision problems in a highly complex and uncertain environment. DRL (deep reinforcement learning) may be used for dynamic optimization of QoS/TM control. Furthermore, to solve the problems mentioned above, DRL may be performed on the forwarding plane.

As the control parameters of forwarding QoS/TM functions have a continuous value space, DRL supporting continuous action space and continuous state space may be used, such as DDPG as described in arXiv:1509.02971v6, or TD3 as described in Scott Fujimoto, Herke van Hoof, David Meger, "Addressing Function Approximation Error in Actor-Critic Methods", Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1587-1596, 2018, or SAC as described in Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., ICML 2018 and arXiv:1812.05905v2. As DRL algorithms evolve, further algorithms with better convergence, stability, and generalization may become available as options.

The DRL agent may be an inference policy function with a state/action value function, which is represented by DNNs (deep neural networks) defined and trained by a chosen DRL algorithm. The DRL algorithm selection is open to general DRL algorithms, as long as the algorithm supports continuous state space and continuous action space. Also, considering the “sampling efficiency” for QoS/TM problems, it is recommended to choose an “off-policy” DRL algorithm.

As described above, the DRL algorithm like DDPG, or TD3 (as upgraded version of DDPG) , or SAC may be selected. These DRL algorithms have publicly proved performance in the general DRL field. The selected RL or DRL algorithm can be any suitable RL or DRL algorithm currently known or to be developed in the future. The training techniques of a selected DRL algorithm may be any suitable training method either currently known or to be developed in the future.

Table 2 below is an example list of state-of-the-art public RL algorithms, from which we can see that candidates for solving the QoS/TM problem include DDPG, TD3, and SAC.

Table 2

As TD3 is an upgraded version of DDPG, we take DDPG as an example to illustrate the complete integration solution.

FIG. 6 shows an example of a structure of DRL agent according to an embodiment of the present disclosure. It may take DDPG algorithm as the selected algorithm to train the policy function. The QoS/TM state space may be a three-order tensor with one direction for traffic class (TC) , one direction for drop precedence (DP) , and the other direction for the state vector with continuous state components. The QoS/TM state space may be input to the policy neural network and the value neural network. The policy neural network may output action A=μ (S; θ) for the chosen target control parameter (s) of the chosen QoS/TM function (s) . Each component of vector A has normalized continuous value from -1 to 1.
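As a minimal sketch of the data shapes described for FIG. 6, the following Python fragment builds a state tensor indexed by traffic class, drop precedence, and state component, and passes it through a stand-in policy whose tanh output keeps each action component within [-1, 1]. The tensor sizes, the single linear layer, and the names `make_state` and `policy` are illustrative assumptions and do not represent the network of FIG. 6.

```python
import math
import random

NUM_TC, NUM_DP, STATE_DIM = 8, 3, 4   # assumed sizes, for illustration only
NUM_ACTIONS = 2                       # assumed number of controlled parameters


def make_state(fill=0.0):
    """State tensor S[tc][dp][k]: one state vector per (TC, DP) pair."""
    return [[[fill] * STATE_DIM for _ in range(NUM_DP)] for _ in range(NUM_TC)]


def policy(state, weights):
    """Stand-in for A = mu(S; theta): flatten, one linear layer, tanh squashing."""
    flat = [x for tc in state for dp in tc for x in dp]
    return [math.tanh(sum(w * x for w, x in zip(row, flat))) for row in weights]


rng = random.Random(0)
theta = [[rng.uniform(-1, 1) for _ in range(NUM_TC * NUM_DP * STATE_DIM)]
         for _ in range(NUM_ACTIONS)]
action = policy(make_state(fill=0.5), theta)
```

The tanh squashing is one common way to obtain the normalized range; any bounded output activation would serve the same purpose.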

FIG. 7 shows an example of a network structure for the training stage according to an embodiment of the present disclosure. It takes the DDPG algorithm as the selected algorithm to train the policy function. In the training stage, the technique of target networks can be introduced for both the policy and value neural networks for better training performance. A target network may have exactly the same network structure as its corresponding main network, but has differently initialized network parameters and a delayed, weighted update in each training iteration.
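One common reading of the delayed weighted update is the soft (Polyak) update used in the DDPG reference, in which each target parameter moves a small fraction τ toward the corresponding main-network parameter. The sketch below illustrates that rule on plain lists of numbers; the function name `soft_update` and the value of `tau` are assumptions chosen for illustration.

```python
def soft_update(target_params, main_params, tau=0.005):
    """Move each target parameter a fraction tau toward its main-network value."""
    return [(1.0 - tau) * t + tau * m for t, m in zip(target_params, main_params)]


main = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]
for _ in range(3):
    # A large tau is used here only so the effect is visible in three steps.
    target = soft_update(target, main, tau=0.5)
```

With a small τ the target network lags the main network, which tends to stabilize the bootstrapped value targets.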

FIG. 8 shows an example of a network structure for the training stage according to another embodiment of the present disclosure. It takes the TD3 algorithm as the selected algorithm to train the policy function. It introduces the “Clipped Double Q-Learning” mechanism to address the overestimation problem caused by maximization and the bias propagation problem caused by bootstrapping.
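The core of Clipped Double Q-Learning, as described in the TD3 reference, is that the bootstrapped target uses the smaller of two target-critic estimates. The following is a minimal sketch of that target computation only; the function name, the scalar inputs, and the default discount factor are illustrative assumptions, and target policy smoothing and delayed policy updates are omitted.

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD target using the smaller of the two target-critic estimates."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


# The optimistic estimate (10.0) is ignored in favor of the smaller one (8.0).
y = clipped_double_q_target(reward=1.0, q1_next=10.0, q2_next=8.0, gamma=0.9)
```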

FIG. 9a shows an example of a structure of a policy neural network according to an embodiment of the present disclosure. FIG. 9b shows an example of a structure of a value neural network according to an embodiment of the present disclosure. For the policy network and the value network, the inner neural structure may use a CNN to extract a feature vector from the input states. The input states are designed as a three-order tensor. In the training stage, the design and finalization of the hyper-parameters of a DRL algorithm, including the depth and width of each neural network module, usually depend on manual adjustment and tuning.

FIG. 10 shows an example of a neural structure for a policy neural network with RNN and/or attention network according to an embodiment of the present disclosure. An RNN and/or attention network may be inserted to extract effective sequential patterns of the input states. For the RNN and/or attention network, LSTM (Long Short-Term Memory) or an attention layer may typically be selected, as described in “A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), 2017”, and “S. Iqbal and F. Sha. Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning (ICML), 2019”.

For further details of the candidate DRL algorithms (e.g., DDPG, TD3, SAC, etc.), including how each algorithm trains the parameters of both the policy neural network and the value neural network, reference may be made to any suitable public references, such as the following:

● Deterministic Policy Gradient Algorithms, David Silver et al., Proceedings of the 31st International Conference on Machine Learning (ICML), 2014

● Continuous control with deep reinforcement learning, International Conference on Learning Representations (ICLR) 2016.

● Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., ICML 2018

● Soft Actor-Critic Algorithms and Applications, Haarnoja et al. arXiv: 1812.05905v2 (2019)

● The Lottery Ticket Hypothesis: Finding Sparse, trainable Neural Networks, Jonathan Frankle, Michael Carbin, International Conference on Learning Representations (ICLR) 2019

● S. Fujimoto, H. Hoof, and D. Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning (ICML) , 2018.

● A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS), 2017.

● S. Iqbal and F. Sha. Actor-attention-critic for multi-agent reinforcement learning. In International Conference on Machine Learning (ICML) , 2019.

With reference to FIG. 3, at block 308, the packet forwarding entity may map the at least one output action to at least one control value for the at least one packet forwarding control parameter.

In an embodiment, the packet forwarding entity may map the at least one output action to at least one control value for the at least one packet forwarding control parameter based on at least one of a control mode of the at least one packet forwarding control parameter, a baseline value, a tune ratio, and/or a specified value range.

For example, outputs of the QoS/TM controller may be DRL actions. The action output A is a vector of all the controlled parameters of the chosen target control parameter(s) of the chosen QoS/TM function(s). Each component a_i of the vector A has a normalized continuous value from -1 to 1, which will be mapped to the control value of the corresponding target parameter.

Mapping the output action to QoS/TM parameter values may be based on baseline values, tune ratios, and a valid value range. The valid value range can be the default value range of the target control parameter, or a specified value range for the target control parameter.

Assume a_i ∈ [-1, 1] is the output component value for one controlled/tuned QoS/TM parameter i, and assume [v_min, v_max] is the valid value range of this controlled parameter.

Assume the control mode of the target parameter is configured as “tune mode”, its baseline value is v_base, and its tune ratio is r% of the total range (v_max - v_min). Then the mapped control value of the target parameter is

v_temp = v_base + a_i × r% × (v_max - v_min)    (5-1)

This means that the mapped output value lies around its baseline value, with an adjustable range of |r% × (v_max - v_min)|, and within its total valid value range [v_min, v_max].

If the control mode of the target parameter is configured as “free-run mode”, the mapped output value may be calculated by either of the following two design options:

Option 1:

Option 2:

v_temp = v_base + a_i × r% × (v_max - v_min), the same as (5-1),

where r% = 100%, and v_base is the initial value of the parameter from human expertise.

Option 2 has the advantage that “free-run mode” takes the same form as “tune mode”, which may make DRL convergence easier.
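The mapping of equation (5-1), covering both “tune mode” and option 2 of “free-run mode”, can be sketched as follows. The function name `map_action`, the representation of r% as a fraction, and the final clamping step are assumptions made for illustration; the clamp is one simple way to keep the result within [v_min, v_max], as the text requires, but the disclosure does not specify how that bound is enforced.

```python
def map_action(a_i, v_min, v_max, v_base, tune_ratio=1.0):
    """Map a normalized action a_i in [-1, 1] to a control value per (5-1).

    tune_ratio is r% expressed as a fraction (0.2 means 20%); 1.0 corresponds
    to option 2 of "free-run mode". Clamping to [v_min, v_max] is an assumption.
    """
    if not -1.0 <= a_i <= 1.0:
        raise ValueError("a_i must lie in [-1, 1]")
    v_temp = v_base + a_i * tune_ratio * (v_max - v_min)
    return max(v_min, min(v_max, v_temp))


# Tune mode: baseline 50, valid range [0, 100], tune ratio 20%.
tuned = map_action(0.5, v_min=0.0, v_max=100.0, v_base=50.0, tune_ratio=0.2)
# Free-run mode (option 2): r% = 100%, so the raw value may leave the range
# and is clamped back to v_max here.
free = map_action(1.0, v_min=0.0, v_max=100.0, v_base=50.0, tune_ratio=1.0)
```

In the tune-mode call the action moves the parameter by half of the allowed 20-unit swing, giving 60; in the free-run call the raw value of 150 is clamped to 100.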

With reference to FIG. 3, at block 310, the packet forwarding entity may apply the at least one control value for the at least one packet forwarding control parameter.

FIG. 11 shows a flowchart of a method according to another embodiment of the present disclosure, which may be performed by an apparatus implemented in or at or as a network control entity or communicatively coupled to the network control entity. As such, the apparatus may provide means or modules for accomplishing various parts of the method 1100 as well as means or modules for accomplishing other processes in conjunction with other components. For some parts which have been described in the above embodiments, the description thereof is omitted here for brevity.

At block 1102, the network control entity may obtain information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter. For example, the information used for participating in determining a reward score of state information of a packet forwarding environment may be configured by the operator, or may be based on SLA or QoS/TM requirements, and/or network topology, and/or routing and/or forwarding path information, and/or flow path mapping. As described above, the QoS budget indicator such as bandwidth budget (e.g., guaranteed rate budget and maximum rate budget) , packet delay budget, and packet drop budget for a service may be determined by the network control entity. The control related information for at least one packet forwarding control parameter may be configured by the operator and/or obtained from a management/control plane.

At block 1104, the network control entity may send the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface. The control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.

In an embodiment, the information used for participating in determining the reward score of the state information of the packet forwarding environment comprises at least one of: at least one QoS budget indicator, or at least one QoS performance weight.

In an embodiment, the at least one QoS budget indicator comprises at least one of: a budget indicator for QoS requirement on packet delay (latency) for a service class, a budget indicator for QoS requirement on packet loss ratio for a service class, or a budget indicator for QoS requirement on traffic bandwidth for a service class.

In an embodiment, the at least one QoS performance weight comprises at least one of: a weight for forwarding QoS performance on packet delay (latency) for a service class, a weight for forwarding QoS performance on packet loss ratio for a service class, a weight for forwarding QoS performance on traffic bandwidth for a service class, or a weight of a service class.
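The information sent by the network control entity can be pictured as a small set of per-service-class records. The sketch below is one hypothetical way to organize it in Python; all class names, field names, units, and the example parameter key `wred_max_threshold` are assumptions for illustration and are not defined by the disclosure or by any interface specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QosBudget:
    """Per-service-class budget indicators; None means not specified."""
    delay_budget_ms: Optional[float] = None
    loss_ratio_budget: Optional[float] = None
    guaranteed_rate_mbps: Optional[float] = None
    max_rate_mbps: Optional[float] = None


@dataclass
class QosWeights:
    """Per-service-class performance weights."""
    delay: float = 1.0
    loss: float = 1.0
    bandwidth: float = 1.0
    service_class: float = 1.0


@dataclass
class ParamControl:
    """Control related information for one forwarding control parameter."""
    control_mode: str          # "tune" or "free-run"
    baseline: float
    tune_ratio: float = 1.0    # r% expressed as a fraction


@dataclass
class SouthboundInfo:
    """Everything the control entity sends, keyed by class or parameter name."""
    budgets: Dict[str, QosBudget] = field(default_factory=dict)
    weights: Dict[str, QosWeights] = field(default_factory=dict)
    controls: Dict[str, ParamControl] = field(default_factory=dict)


info = SouthboundInfo(
    budgets={"TC0": QosBudget(delay_budget_ms=5.0, loss_ratio_budget=1e-4)},
    weights={"TC0": QosWeights(delay=2.0)},
    controls={"wred_max_threshold": ParamControl("tune", baseline=0.6, tune_ratio=0.2)},
)
```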

With the information received from the network control entity (e.g., QoS budget indicator(s), and/or QoS performance weight(s), and/or control related information), to solve the dynamic multi-dimensional optimization problem, some embodiments introduce a QoS/TM intelligent controller on the forwarding plane for auto-tuning/control, so that the decentralized multi-dimensional QoS/TM requirements of competing service flows can be effectively satisfied on the forwarding plane.

In some embodiments herein, joint optimization may be used for real-time control of multiple forwarding QoS and TM functions.

In some embodiments herein, the proposed solution has the capability of traffic prediction or capability of utilizing traffic prediction.

In some embodiments herein, a service-aware, comprehensive-target-oriented reward function is introduced for DRL to support multi-dimensional optimization with trade-offs among factors.

In some embodiments herein, the proposed solution can be smoothly integrated with existing Diffserv QoS architecture for QoS/TM optimization for example by means of introduced control mode, tune ratio, and performance weights.

In some embodiments herein, the proposed solution can be smoothly integrated with network slicing solution or with other centric intelligence solution for further fine QoS/TM optimization by means of introduced control mode, tune ratio, and performance weights.

FIG. 12 shows an example of auto-tuning of forwarding QoS/TM functions according to an embodiment of the present disclosure. The management and control planes may configure one or more QoS/TM parameters with baseline value, control mode, and tune ratio to the QoS/TM intelligent controller with DRL agent in the forwarding plane. The management and control planes may send QoS budget indicators and QoS performance weights to the QoS/TM intelligent controller with DRL agent in the forwarding plane. The QoS/TM intelligent controller may collect real-time states such as ingress traffic characteristics, QoS/TM performance status, QoS/TM HW resource status, etc. The QoS/TM intelligent controller may control/tune QoS/TM parameters based on the method according to an embodiment of the present disclosure. For example, the packets may be received by the ingress interface. Packet classification, metering, and policing may be performed on the received packets. Then the packets may be put into one or more queues based on an enqueue acceptance algorithm. Congestion management, buffer management, and other QoS/TM mechanisms may be applied to the packets in the queues. Scheduling algorithms may be applied to the packets for dequeue decisions. The output packets may be shaped. Finally, the packets are output by the egress interface. Any suitable parameter in the above operations (such as the enqueue acceptance algorithm, congestion management, buffer management, scheduling algorithms, dequeue shaping, other QoS/TM mechanisms, etc.) may be controlled/tuned by the QoS/TM intelligent controller.

FIG. 13 shows an example of a network structure according to an embodiment of the present disclosure. New information is introduced on the southbound interface from the control plane to the forwarding plane. The shown network structure can solve the following challenges/problems of existing QoS/TM solutions (such as Diffserv-based QoS solutions) on the forwarding plane. For example, the existing forwarding plane cannot sense or consider QoS/TM requirements on packet delay (latency) or packet loss ratio when trading off among competing flows. However, most of the forwarding QoS/TM parameters impact two or three of bandwidth, packet delay, and packet loss ratio simultaneously. The existing forwarding plane cannot support diverse ingress traffic dynamics well because many parameters in the existing QoS/TM solutions are manually and statically tuned. The shown network structure can also solve at least one of the other problems described above.

The shown network structure integrates the intelligent QoS/TM controller with DRL or RL (Reinforcement Learning) agent on forwarding plane. The design points of the shown network structure may comprise at least one of:

● Input state design per TC (or per TC per DP) , which may cover at least one of ingress traffic characteristics, QoS/TM performance status, QoS/TM HW resources status, QoS budget indicator (s) , QoS performance weight (s) , or control related information for a packet forwarding control parameter

● Output action design with mapping logic, which may be based on control mode, baseline value, tune ratio, and/or specified value range

● Comprehensive service-aware reward function as a weighted normal form, which makes it possible for DRL to sense and handle multi-dimensional QoS/TM requirements when trading off performance among competing flows

● DRL or RL algorithm selection, i.e., any suitable open or customized DRL or RL algorithm either currently known or to be developed in the future can be selected in the shown network structure

● Newly introduced information (such as QoS performance budget indicator (s) and/or QoS performance weight (s) and/or control related information for a packet forwarding control parameter) on the southbound network interface from control plane to forwarding plane as supportive mechanisms.

The control plane may send at least one of QoS budget indicators, QoS performance weights, or the control mode/baseline value/tune ratio for a packet forwarding control parameter known to the control plane to the forwarding plane via the southbound interface. The QoS budget indicators, QoS performance weights, and statistics (for real-time states) may be input to a reward function with comprehensive service awareness, which generates a reward. The statistics (for real-time states) may comprise ingress traffic characteristics, QoS/TM performance states, and QoS/TM HW resource states, which may be collected from the QoS/TM functions. The following may be input to the DRL agent, whose policy function is a DNN trained by any chosen DRL algorithm: the control mode/baseline value/tune ratio for a packet forwarding control parameter, the reward, the statistics (for real-time states), the specifications of control mode/baseline value/tune ratio for a packet forwarding control parameter not known to the control plane, the QoS budget indicator(s), and the QoS performance weight(s). The DRL agent may output an action. The action and the control mode/baseline value/tune ratio for a packet forwarding control parameter may be input to the action output mapping function. The action output mapping function may generate a control value for a packet forwarding control parameter, which may be applied to the QoS/TM functions.
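To make the role of the budget indicators and performance weights concrete, the following sketch computes a weighted reward from per-class statistics. It is an illustrative assumption only: the scoring rule in `dimension_score`, the dictionary layout, and the normalization are hypothetical and are not the reward function of the present disclosure.

```python
def dimension_score(measured, budget, lower_is_better=True):
    """Score in [-1, 1]: positive when the budget is met, negative otherwise."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = measured / budget if lower_is_better else budget / max(measured, 1e-12)
    return max(-1.0, min(1.0, 1.0 - ratio))


def reward(stats, budgets, weights):
    """Weighted average of per-class, per-dimension scores (hypothetical form)."""
    total, norm = 0.0, 0.0
    for tc, class_weights in weights.items():
        for dim in ("delay", "loss", "bandwidth"):
            w = class_weights["class"] * class_weights[dim]
            score = dimension_score(stats[tc][dim], budgets[tc][dim],
                                    lower_is_better=(dim != "bandwidth"))
            total += w * score
            norm += w
    return total / norm if norm else 0.0


stats = {"TC0": {"delay": 2.5, "loss": 0.0, "bandwidth": 100.0}}
budgets = {"TC0": {"delay": 5.0, "loss": 0.001, "bandwidth": 50.0}}
weights = {"TC0": {"class": 1.0, "delay": 2.0, "loss": 1.0, "bandwidth": 1.0}}
r = reward(stats, budgets, weights)
```

Raising the delay weight of a class makes delay violations in that class cost more reward, which is the kind of trade-off the performance weights are intended to express.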

FIG. 14 shows a flow chart of decentralizing QoS/TM requirements and interaction with centralized QoS/TM control according to an embodiment of the present disclosure. The centralized QoS/TM control may be from either human expertise or centralized intelligence.

With the introduced integration solution, QoS/TM functions (such as Diffserv-based QoS functions) on forwarding plane may have capability to sense and trade-off various QoS/TM requirements (such as packet delay (latency) , packet loss ratio, bandwidth, etc. ) among competing flows on forwarding plane. Consequently, a better service-aware QoS/TM functions for controlling multiple dimensions of QoS/TM performance on forwarding plane may be achieved. In addition, the QoS/TM functions on forwarding plane may have capability to do auto-tuning of the complex high-dimensional optimization problem.

The control plane may decentralize multi-dimensional service QoS requirements with QoS budget indicator (s) and/or performance weight (s) .

At step 1401, the control plane may determine QoS budget indicators and/or performance weights for a traffic class (flow) for a dimension of requirements (such as packet delay (latency), and/or packet loss ratio, and/or bandwidth, etc.) according to the SLA and other auxiliary centralized knowledge (such as network topology, and/or routing and/or forwarding path, and/or flow-path mapping, etc.).

At step 1402, the control plane may send QoS budget indicators and performance weights to the forwarding plane.

The control plane may integrate with centralized QoS/TM control.

At step 1403, the control plane may determine which QoS/TM parameters need auto-tune (or determine control mode) . The control plane may determine to what extent a parameter is auto-tuned (baseline value and tune ratio) according to centralized QoS/TM control intelligence and/or human expertise.

At step 1404, the control plane may send control mode, baseline value, and tune ratio of a QoS/TM parameter to the forwarding plane.

At step 1405, on receiving the baseline values of QoS/TM parameters, the forwarding plane may configure the corresponding QoS/TM functions with the baseline values as initial values.

At step 1406, by means of the reward function, the forwarding plane may determine the reward/punishment score based on current states, such as the received QoS budget indicators for “bandwidth/packet delay/packet loss ratio”, the received QoS performance weights, the state of ingress traffic characteristics, the state of QoS/TM performance data, the state of QoS/TM HW resources, etc.

At step 1407, by means of the DRL agent with already trained policy/value functions (as DNNs), the forwarding plane may determine an action to maximize rewards based on at least one of the reward/punishment score from the reward function, the state of ingress traffic characteristics, the state of QoS/TM performance data, the state of QoS/TM HW resources, the received QoS budget indicator(s), the received QoS performance weight(s), the received control mode/baseline value/tune ratio for a QoS/TM parameter known to the control plane, or the specifications of those QoS/TM parameters which are not known to the control plane.

At step 1408, the forwarding plane may map the DRL action to QoS/TM parameter values according to the corresponding baseline value, control mode, tune ratio, and/or specified value range. The forwarding plane may configure the QoS/TM functions according to the mapped parameter values.

At step 1409, the forwarding plane may iteratively collect the states, determine the reward, determine the action, map the action to configuration values, and configure the QoS/TM functions in the next control time interval.
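Steps 1406 to 1409 amount to a control loop executed once per control time interval. The sketch below wires that loop together from caller-supplied functions; `control_step` and all the stand-in lambdas (state collector, reward, agent, mapper, and the parameter name `shaper_rate`) are hypothetical placeholders rather than components defined by the disclosure.

```python
def control_step(collect_states, reward_fn, agent, mapper, apply_fn):
    """One control interval: collect states, score, act, map, configure."""
    states = collect_states()            # step 1409: refresh real-time states
    score = reward_fn(states)            # step 1406: reward/punishment score
    actions = agent(states, score)       # step 1407: normalized actions in [-1, 1]
    values = {name: mapper(name, a) for name, a in actions.items()}  # step 1408
    apply_fn(values)                     # step 1408: configure QoS/TM functions
    return score, values


applied = []
history = [
    control_step(
        collect_states=lambda: {"queue_depth": 0.4},
        reward_fn=lambda s: 1.0 - s["queue_depth"],
        agent=lambda s, r: {"shaper_rate": 0.5},
        mapper=lambda name, a: 50.0 + a * 0.2 * 100.0,  # tune mode, as in (5-1)
        apply_fn=applied.append,
    )
    for _ in range(3)                    # three consecutive control intervals
]
```

Passing the stages as functions keeps the loop independent of the chosen DRL algorithm and of the QoS/TM functions being tuned.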

The various blocks/steps shown in figures may be viewed as method steps, and/or as operations that result from operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated function (s) . The schematic flow chart diagrams described above are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of specific embodiments of the presented methods. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated methods. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

Embodiments herein afford many advantages, of which a non-exhaustive list of examples follows. In some embodiments herein, the forwarding plane has the capability to sense and trade off QoS/TM requirements of packet delay (latency) and loss ratio, besides bandwidth, among competing flows on the forwarding plane. In some embodiments herein, it can achieve much better service-awareness for controlling multiple dimensions of QoS/TM performance on the forwarding plane. In some embodiments herein, the packet forwarding entity can have the capability to do auto-tuning for the high-dimensional optimization problem of forwarding QoS/TM. In some embodiments herein, the packet forwarding entity can have the capability of flexible joint control with optimization on multiple selected QoS/TM functions as an integrated whole (i.e., joint control with optimization and flexible scope). In some embodiments herein, the packet forwarding entity can have the capability of utilizing various/heterogeneous special but dormant existing and future QoS/TM capabilities of different forwarding chips without intervention by the management and control planes. In some embodiments herein, it can save time-to-market and cost to develop related new features across the management and control planes (meaning CAPEX reduction). In some embodiments herein, it can avoid unnecessary complexity exposed to or added into the management and control planes (meaning OPEX reduction in maintenance or customer training). In some embodiments herein, the packet forwarding entity can have the capability of utilizing the prediction of ingress traffic patterns by means of DRL deep neural networks in the optimization of forwarding QoS/TM. In some embodiments herein, the packet forwarding entity can have the flexibility of weighted trade-off among different dimensions of QoS/TM performance. In some embodiments herein, it can provide flexibility of weighted trade-off of fairness among different traffic classes. In some embodiments herein, it can provide flexibility of smooth integration with existing DiffServ QoS solutions. In some embodiments herein, it can provide flexibility of smooth integration with network slicing evolution, or with any other intelligence from the management and control planes on QoS/TM control. The embodiments herein are not limited to the features and advantages mentioned above. A person skilled in the art will recognize additional features and advantages upon reading the following detailed description.

FIG. 15 is a block diagram showing an apparatus suitable for practicing some embodiments of the disclosure. For example, any one of the packet forwarding entity or the network control entity described above may be implemented as or through the apparatus 1500.

The apparatus 1500 comprises at least one processor 1521, such as a digital processor (DP) , and at least one memory (MEM) 1522 coupled to the processor 1521. The apparatus 1500 may further comprise a transmitter TX and receiver RX 1523 coupled to the processor 1521. The MEM 1522 stores a program (PROG) 1524. The PROG 1524 may include instructions that, when executed on the associated processor 1521, enable the apparatus 1500 to operate in accordance with the embodiments of the present disclosure. A combination of the at least one processor 1521 and the at least one MEM 1522 may form processing means 1525 adapted to implement various embodiments of the present disclosure. The apparatus 1500 may further comprise a network interface 1555, which adapts communication data with other network elements.

Various embodiments of the present disclosure may be implemented by computer program executable by one or more of the processor 1521, software, firmware, hardware or in a combination thereof.

The MEM 1522 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memories and removable memories, as non-limiting examples.

The processor 1521 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples.

In an embodiment where the apparatus is implemented as or at the packet forwarding entity, the memory 1522 contains instructions executable by the processor 1521, whereby the packet forwarding entity operates according to any step of the methods related to the packet forwarding entity as described above.

In an embodiment where the apparatus is implemented as or at the network control entity, the memory 1522 contains instructions executable by the processor 1521, whereby the network control entity operates according to any step of the methods related to the network control entity as described above.

FIG. 16 is a block diagram showing a packet forwarding entity according to an embodiment of the disclosure. As shown, the packet forwarding entity 1600 comprises an obtaining module 1601, a first determining module 1602, a second determining module 1603, a mapping module 1604 and an applying module 1605. The obtaining module 1601 may be configured to obtain state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface. The first determining module 1602 may be configured to determine a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment. The second determining module 1603 may be configured to determine at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment. The mapping module 1604 may be configured to map the at least one output action to at least one control value for the at least one packet forwarding control parameter. The applying module 1605 may be configured to apply the at least one control value for the at least one packet forwarding control parameter.

In an embodiment, the packet forwarding entity 1600 comprises a receiving module 1606 configured to receive at least one of the at least one QoS budget indicator, the at least one QoS performance weight, or the control related information for the at least one packet forwarding control parameter aware by the network control entity from the network control entity.

FIG. 17 is a block diagram showing a network control entity according to an embodiment of the disclosure. As shown, the network control entity 1700 comprises an obtaining module 1701 and a sending module 1702. The obtaining module 1701 may be configured to obtain information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter. The sending module 1702 may be configured to send the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface. The control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.

The term unit or module may have conventional meaning in the field of electronics, electrical devices and/or electronic devices and may include, for example, electrical and/or electronic circuitry, devices, modules, processors, memories, logic solid state and/or discrete devices, computer programs or instructions for carrying out respective tasks, procedures, computations, outputs, and/or displaying functions, and so on, such as those that are described herein.

With function units, the packet forwarding entity or the network control entity may not need a fixed processor or memory; any computing resource and storage resource may be arranged from the packet forwarding entity or the network control entity in the communication system. The introduction of virtualization technology and network computing technology may improve the usage efficiency of the network resources and the flexibility of the network.

According to an aspect of the disclosure, there is provided a computer program product being tangibly stored on a computer readable storage medium and including instructions which, when executed on at least one processor, cause the at least one processor to carry out any of the methods as described above.

According to an aspect of the disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to carry out any of the methods as described above.

In addition, the present disclosure may also provide a carrier containing the computer program as mentioned above, wherein the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium. The computer readable storage medium can be, for example, an optical compact disk or an electronic memory device like a RAM (random access memory), a ROM (read only memory), Flash memory, magnetic tape, CD-ROM, DVD, Blu-ray disc and the like.

The techniques described herein may be implemented by various means so that an apparatus implementing one or more functions of a corresponding apparatus described with an embodiment comprises not only prior art means, but also means for implementing the one or more functions of the corresponding apparatus described with the embodiment and it may comprise separate means for each separate function or means that may be configured to perform one or more functions. For example, these techniques may be implemented in hardware (one or more apparatuses) , firmware (one or more apparatuses) , software (one or more modules) , or combinations thereof. For a firmware or software, implementation may be made through modules (e.g., procedures, functions, and so on) that perform the functions described herein.

Exemplary embodiments herein have been described above with reference to block diagrams and flowchart illustrations of methods and apparatuses. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by various means including computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any implementation or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular implementations. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.

It will be obvious to a person skilled in the art that, as the technology advances, the inventive concept can be implemented in various ways. The above described embodiments are given for describing rather than limiting the disclosure, and it is to be understood that modifications and variations may be resorted to without departing from the spirit and scope of the disclosure as those skilled in the art readily understand. Such modifications and variations are considered to be within the scope of the disclosure and the appended claims. The protection scope of the disclosure is defined by the accompanying claims.

Claims

A method (300) performed by a packet forwarding entity, comprising:

obtaining (302) state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface;

determining (304) a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment;

determining (306) at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment;

mapping (308) the at least one output action to at least one control value for the at least one packet forwarding control parameter; and

applying (310) the at least one control value for the at least one packet forwarding control parameter.
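The "discounted accumulative reward" recited above can be illustrated with a minimal sketch. It assumes the standard reinforcement learning definition, in which each later reward score is scaled by one more power of a discount factor; the function name `discounted_return` and the parameter `gamma` are hypothetical and do not appear in the disclosure.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum(gamma**k * rewards[k]) for a finite sequence of reward scores.

    `gamma` is an assumed discount factor in [0, 1]; the disclosure does not
    fix a value. The sum is accumulated backwards so each step is one
    multiply and one add.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three consecutive reward scores observed after three control steps.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

A discount factor below 1 makes near-term forwarding performance count more than distant performance; an agent that maximizes this quantity trades the two off rather than optimizing a single step.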
The method according to claim 1, wherein the state information of packet forwarding environment comprises at least one of:

real-time input traffic characteristics,

real-time state of forwarding quality of service (QoS) performance and/or traffic management performance,

real-time state of hardware resource related to QoS and/or traffic management,

at least one QoS budget indicator,

at least one QoS performance weight, or

control related information for the at least one packet forwarding control parameter.
The method according to claim 1 or 2, wherein the at least first part of the state information of packet forwarding environment comprises at least one of:

real-time input traffic characteristics,

real-time state of forwarding QoS performance and/or traffic management performance,

real-time state of hardware resource related to QoS and/or traffic management,

at least one QoS budget indicator, or

at least one QoS performance weight.
The method according to any of claims 1-3, wherein the at least second part of the state information of packet forwarding environment comprises at least one of:

real-time input traffic characteristics,

real-time state of forwarding QoS performance and/or traffic management performance,

real-time state of hardware resource related to QoS and/or traffic management,

control related information for the at least one packet forwarding control parameter,

at least one QoS budget indicator, or

at least one QoS performance weight.
The method according to any of claims 2-4, wherein at least one of the at least one QoS budget indicator, the at least one QoS performance weight, or the control related information for the at least one packet forwarding control parameter of which a network management and/or control plane is aware is received from a network control entity.
The method according to any of claims 2-4, wherein the control related information for the at least one packet forwarding control parameter of which a network management and/or control plane is unaware is obtained from the packet forwarding entity.
The method according to any of claims 2-6, wherein the real-time input traffic characteristics comprise at least one of:

an ingress instantaneous rate of a service class,

an ingress average rate of a service class,

an instantaneous packet size of a service class, or

an average packet size of a service class.
The method according to any of claims 2-7, wherein the real-time state of forwarding QoS performance and/or traffic management performance comprises at least one of:

a real output instantaneous rate of a service class,

a real output average rate of a service class,

a real instantaneous packet drop ratio of a service class,

a real average packet drop ratio of a service class,

a real maximum packet latency of a service class,

a real minimum packet latency of a service class, or

a real average packet latency of a service class.
The method according to any of claims 2-8, wherein the real-time state of hardware resource related to QoS and/or traffic management comprises at least one of:

queuing status,

buffer status, or

bandwidth status.
The method according to any of claims 2-9, wherein the control related information for the at least one packet forwarding control parameter comprises at least one of:

a baseline value of a packet forwarding control parameter,

a control mode of a packet forwarding control parameter,

a tune ratio of a packet forwarding control parameter,

a minimal value of a packet forwarding control parameter, or

a maximal value of a packet forwarding control parameter.
The method according to any of claims 2-10, wherein the at least one QoS budget indicator comprises at least one of:

a budget indicator for QoS requirement on packet latency for a service class,

a budget indicator for QoS requirement on packet loss ratio for a service class, or

a budget indicator for QoS requirement on traffic bandwidth for a service class.
The method according to any of claims 2-11, wherein the at least one QoS performance weight comprises at least one of:

a weight for forwarding QoS performance on packet latency for a service class,

a weight for forwarding QoS performance on packet loss ratio for a service class,

a weight for forwarding QoS performance on traffic bandwidth for a service class, or

a weight of a service class.
The method according to any of claims 1-12, wherein the at least one output action for at least one packet forwarding control parameter is determined by an agent of reinforcement learning.
The method according to claim 13, wherein the agent of reinforcement learning comprises an agent of deep reinforcement learning.
The method according to claim 13 or 14, wherein

the agent of reinforcement learning is implemented based on at least a function approximator supporting continuous state space and continuous action space or supporting continuous state space and discrete action space, and/or

the agent of deep reinforcement learning is implemented based on at least one deep neural network supporting continuous state space and continuous action space or supporting continuous state space and discrete action space.
The method according to claim 15, wherein the at least one deep neural network comprises at least one of a convolutional neural network, a recurrent neural network, or an attention neural network.
The method according to any of claims 1-16, wherein mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter comprises:

mapping the at least one output action to at least one control value for the at least one packet forwarding control parameter based on at least one of: a control mode of the at least one packet forwarding control parameter, a baseline value, a tune ratio, or a specified value range.
The method according to any of claims 1-17, wherein a control mode of a packet forwarding control parameter comprises at least one of:

a control mode indicating the packet forwarding control parameter is controlled by an agent of reinforcement learning in a packet forwarding entity based on at least one of a tune ratio, a specified value range, or an initial baseline value,

a control mode indicating the packet forwarding control parameter is not allowed to be controlled by an agent of reinforcement learning in a packet forwarding entity, or

a control mode indicating the packet forwarding control parameter is freely controlled by an agent of reinforcement learning in a packet forwarding entity.
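The mapping from an output action to a control value, using a control mode, a baseline value, a tune ratio, and a value range, can be sketched as follows. This is only an illustration under stated assumptions: the claims do not give a formula, so the normalized action range of [-1, 1], the multiplicative use of the tune ratio, and all names (`ControlMode`, `map_action`) are hypothetical.

```python
from enum import Enum


class ControlMode(Enum):
    """Hypothetical labels for the three control modes named in the claims."""
    BOUNDED = "bounded"    # agent tunes around a baseline within limits
    DISABLED = "disabled"  # agent is not allowed to control the parameter
    FREE = "free"          # agent freely controls the parameter


def map_action(action, mode, baseline, tune_ratio=0.0,
               min_value=None, max_value=None):
    """Map a normalized action (assumed to lie in [-1, 1]) to a control value."""
    if mode is ControlMode.DISABLED:
        # The agent's output is ignored; the configured baseline stays in force.
        return baseline
    action = max(-1.0, min(1.0, action))
    if mode is ControlMode.BOUNDED:
        # Assumed rule: deviate from the baseline by at most tune_ratio.
        value = baseline * (1.0 + tune_ratio * action)
    else:
        # Assumed rule for FREE: span the whole permitted value range.
        if min_value is None or max_value is None:
            raise ValueError("FREE mode needs min_value and max_value")
        value = min_value + (action + 1.0) / 2.0 * (max_value - min_value)
    # Clip to the specified value range, when one is given.
    if min_value is not None:
        value = max(min_value, value)
    if max_value is not None:
        value = min(max_value, value)
    return value


if __name__ == "__main__":
    # A queue weight with baseline 100 that may be tuned by up to 20 percent.
    print(map_action(0.5, ControlMode.BOUNDED, baseline=100.0, tune_ratio=0.2))
```

Keeping the agent's output normalized and applying the per-parameter limits in a separate mapping step means that limits supplied by a network control entity can change without retraining the agent.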
The method according to any of claims 1-18, wherein the at least one packet forwarding control parameter comprises at least one of:

one or more packet forwarding control parameters for QoS function, or

one or more packet forwarding control parameters for traffic management function.
The method according to any of claims 1-19, wherein the reward score of the state information of packet forwarding environment is determined based on at least one of the following factors or a weighted combination of at least one of the following factors:

a positive reward score is given to a situation in which all service classes have zero queued packets,

for a service class, a larger elastic bandwidth relative to a corresponding bandwidth budget, a larger reward component of the service class is given,

for a service class, more service-aware fairness of elastic bandwidth compared to other service classes, a larger reward component of the service class is given,

for a service class, a smaller packet latency relative to a corresponding packet latency budget, a larger reward component of the service class is given,

for a service class, more service-aware fairness of packet latency compared to other service classes, a larger reward component of the service class is given,

for a service class, a smaller drop ratio relative to a corresponding drop ratio budget, a larger reward component of the service class is given, or

for a service class, more service-aware fairness of drop ratio compared to other service classes, a larger reward component of the service class is given.
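A weighted combination of several of the reward factors listed above can be sketched as follows. The sketch covers only the queue-empty bonus and the three budget-relative terms; the fairness terms are omitted. The linear form, the dictionary keys, and the `idle_bonus` parameter are assumptions made for illustration, and every budget is assumed to be positive.

```python
def reward_score(classes, idle_bonus=1.0):
    """Combine per-class reward components into one reward score.

    `classes` is a list of dicts, one per service class, holding measured
    values, QoS budgets and QoS performance weights (hypothetical keys).
    """
    score = 0.0
    # Positive reward when no service class has any queued packet.
    if all(c["queued_packets"] == 0 for c in classes):
        score += idle_bonus
    for c in classes:
        # Larger bandwidth relative to its budget gives a larger component.
        bandwidth = (c["bandwidth"] - c["bandwidth_budget"]) / c["bandwidth_budget"]
        # Smaller latency relative to its budget gives a larger component.
        latency = (c["latency_budget"] - c["latency"]) / c["latency_budget"]
        # Smaller drop ratio relative to its budget gives a larger component.
        drop = (c["drop_budget"] - c["drop_ratio"]) / c["drop_budget"]
        component = (c["w_bandwidth"] * bandwidth
                     + c["w_latency"] * latency
                     + c["w_drop"] * drop)
        # The per-class weight scales the whole component of that class.
        score += c["class_weight"] * component
    return score


if __name__ == "__main__":
    voice = {
        "queued_packets": 0,
        "bandwidth": 120.0, "bandwidth_budget": 100.0,
        "latency": 5.0, "latency_budget": 10.0,
        "drop_ratio": 0.01, "drop_budget": 0.02,
        "w_bandwidth": 1.0, "w_latency": 1.0, "w_drop": 1.0,
        "class_weight": 1.0,
    }
    print(reward_score([voice]))
```

Normalizing each term by its budget keeps the components dimensionless, so the QoS performance weights alone decide how latency, loss, and bandwidth trade off within and across service classes.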
The method according to any of claims 1-20, wherein there is corresponding state information of packet forwarding environment for a specific service class.
The method according to claim 21, wherein the specific service class is identified by at least one of:

traffic class, or

drop precedence, or

a combination of traffic class and drop precedence.
A method (1100) performed by a network control entity, comprising:

obtaining (1102) information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter; and

sending (1104) the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface,

wherein the control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.
The method according to claim 23, wherein the information used for participating in determining the reward score of the state information of the packet forwarding environment comprises at least one of:

at least one QoS budget indicator, or

at least one QoS performance weight.
The method according to claim 24, wherein the at least one QoS budget indicator comprises at least one of:

a budget indicator for QoS requirement on packet latency for a service class,

a budget indicator for QoS requirement on packet loss ratio for a service class, or

a budget indicator for QoS requirement on traffic bandwidth for a service class.
The method according to claim 24 or 25, wherein the at least one QoS performance weight comprises at least one of:

a weight for forwarding QoS performance on packet latency for a service class,

a weight for forwarding QoS performance on packet loss ratio for a service class,

a weight for forwarding QoS performance on traffic bandwidth for a service class, or

a weight of a service class.
The method according to any of claims 23-26, wherein a QoS budget indicator is determined based on at least one of:

service level agreement,

flow-path mapping, or

path topology.
The method according to any of claims 23-27, wherein the information used for participating in determining the reward score of state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter of the packet forwarding entity are sent to the packet forwarding entity via a network interface.
The method according to any of claims 23-28, wherein the at least one output action for at least one packet forwarding control parameter is determined by an agent of reinforcement learning in a packet forwarding entity.
The method according to any of claims 23-29, wherein the at least one packet forwarding control parameter comprises at least one of:

one or more packet forwarding control parameters for QoS function, or

one or more packet forwarding control parameters for traffic management function.
The method according to any of claims 23-30, wherein the control related information for the at least one packet forwarding control parameter comprises at least one of:

a baseline value of a packet forwarding control parameter,

a control mode of a packet forwarding control parameter,

a tune ratio of a packet forwarding control parameter,

a minimal value of a packet forwarding control parameter, or

a maximal value of a packet forwarding control parameter.
The method according to claim 31, wherein the control mode of the packet forwarding control parameter comprises at least one of:

a control mode indicating the packet forwarding control parameter is controlled by an agent of reinforcement learning in a packet forwarding entity based on at least one of a tune ratio, a specified value range, or an initial baseline value,

a control mode indicating the packet forwarding control parameter is not allowed to be controlled by an agent of reinforcement learning in a packet forwarding entity, or

a control mode indicating the packet forwarding control parameter is freely controlled by an agent of reinforcement learning in a packet forwarding entity.
A packet forwarding entity (1500), comprising:

a processor (1521); and

a memory (1522) coupled to the processor (1521), said memory (1522) containing instructions executable by said processor (1521), whereby said packet forwarding entity (1500) is operative to:

obtain state information of packet forwarding environment from the packet forwarding entity and a network control entity via a network interface;

determine a reward score of the state information of packet forwarding environment based on at least first part of the state information of packet forwarding environment;

determine at least one output action for at least one packet forwarding control parameter to maximize a discounted accumulative reward from the packet forwarding environment based on the reward score and at least second part of the state information of packet forwarding environment;

map the at least one output action to at least one control value for the at least one packet forwarding control parameter; and

apply the at least one control value for the at least one packet forwarding control parameter.
The packet forwarding entity according to claim 33, wherein the packet forwarding entity is further operative to perform the method of any one of claims 2 to 22.
A network control entity (1500), comprising:

a processor (1521); and

a memory (1522) coupled to the processor (1521), said memory (1522) containing instructions executable by said processor (1521), whereby said network control entity is operative to:

obtain information used for participating in determining a reward score of state information of a packet forwarding environment and/or control related information for at least one packet forwarding control parameter; and

send the information used for participating in determining the reward score of the state information of the packet forwarding environment and/or the control related information for at least one packet forwarding control parameter to a packet forwarding entity via a network interface,

wherein the control related information for at least one packet forwarding control parameter is used for participating in determining at least one output action for the at least one packet forwarding control parameter.
The network control entity according to claim 35, wherein the network control entity is further operative to perform the method of any one of claims 24 to 32.
A computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform the method according to any one of claims 1 to 32.
A computer program product comprising instructions which, when executed by at least one processor, cause the at least one processor to perform the method according to any one of claims 1 to 32.