CN115484205A - Deterministic network routing and queue scheduling method and device - Google Patents

Deterministic network routing and queue scheduling method and device Download PDF

Info

Publication number
CN115484205A
CN115484205A (application number CN202210822548.9A)
Authority
CN
China
Prior art keywords
network
agent
deterministic
forwarding
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210822548.9A
Other languages
Chinese (zh)
Other versions
CN115484205B (en)
Inventor
谢坤
黄小红
李丹丹
张沛
马如兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210822548.9A priority Critical patent/CN115484205B/en
Publication of CN115484205A publication Critical patent/CN115484205A/en
Application granted granted Critical
Publication of CN115484205B publication Critical patent/CN115484205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/12 Shortest path evaluation
    • H04L 45/123 Evaluation of link metrics
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 49/00 Packet switching elements
    • H04L 49/90 Buffering arrangements

Abstract

The present disclosure provides a deterministic network routing and queue scheduling method and device, belonging to the field of computer technology. The method comprises: creating an agent A_r for computing the forwarding path and an agent A_c for computing the forwarding period at each node along the path, where agent A_r and agent A_c share a reward r; and taking the global network state as the input of the evaluation network and the state value as its output, continuously updating the network with maximization of the expected reward as the optimization goal, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of a deterministic flow and the period offset information at each node along the path. The deterministic network routing and queue scheduling method and device provided by the disclosure can adapt to the dynamics of the environment, require no manually built complex static model, and can adjust the scheduling strategy in real time to accommodate new circumstances.

Description

Deterministic network routing and queue scheduling method and device
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a deterministic network routing and queue scheduling method and apparatus.
Background
To meet the low-delay requirement of deterministic networks, most existing work adopts traffic scheduling schemes that optimize the gate control list (IEEE 802.1Qbv) of output ports in layer-2 Time-Sensitive Networks (TSN). There is also work on layer-3 deterministic networks that optimizes the scheduling scheme from various perspectives to achieve deterministic network service. However, the solutions based on optimization models proposed in these works are not scalable, and heuristic algorithms may fall into local optima and therefore cannot optimize effectively. Solutions using deep reinforcement learning models improve deterministic service quality by selecting the next hop at each hop, but they still adopt the traditional queue management scheme, so the optimization effect is limited; solutions based on optimization models also suffer from slow solving speed, and heuristic algorithms may fail to achieve better network performance. In addition, under the traditional optimization-model-based schemes, a new traffic demand causes the configuration of all deployed flows in the network to be revoked, and a new configuration must be computed for the recombined traffic matrix; if many nodes are involved, the resulting interaction increases delay.
Disclosure of Invention
In view of the above, the present disclosure provides a deterministic network routing and queue scheduling method and apparatus.
Based on the above purpose, the present disclosure provides a deterministic network routing and queue scheduling method, including:
creating an agent A_r for computing the forwarding path and an agent A_c for computing the forwarding period at each node along the path; agent A_r and agent A_c share a reward r; in the multi-agent proximal policy optimization (MAPPO) model, agent A_r and agent A_c each correspond to an Actor network, and agent A_r and agent A_c share a Critic network;
the global network state is used as the input of an evaluation network, the state value is used as the output of the evaluation network, the maximum expected reward is used as the optimization target, the network is continuously updated, and the optimal route and the optimal forwarding queue are selected to specify the forwarding path of the deterministic flow and the period offset information of each node along the path.
Optionally, taking the global network state as the input of the evaluation network and the state value as the output of the evaluation network, and continuously updating the network with maximization of the expected reward as the optimization goal, comprises:
initializing a network environment;
inputting the network state separately into agent A_r and agent A_c to obtain the joint strategy Action from the corresponding Actor networks;
executing the strategy, obtaining the running state at the next moment and the global reward, and storing the before-and-after running states, the strategy, and the reward into a buffer;
and, when a training period is reached, fetching experience from the buffer and updating the Critic network and the Actor networks respectively.
Optionally, the method further comprises:
when the training period has not been reached, inputting the network state into agent A_r and agent A_c again to obtain the joint strategy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration number does not reach the maximum value, initializing the network environment again.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration times reach the maximum value, if the model is converged, outputting the optimal route and the optimal forwarding queue.
Optionally, the method further comprises:
determining the number of training rounds and the updating period, and initializing the iteration variables.
Optionally, the reward r is a composite index of resource utilization variance and forwarding delay.
The present disclosure also provides a deterministic network routing and queue scheduling apparatus, comprising:
model building module for creating an agent A for calculating a forwarding path r And an agent A for calculating the forwarding period at each node along the path c (ii) a Agent A r And agent A c Sharing rewards
Figure BDA0003742774540000022
Agent A in a Multi-Intelligent near-end policy optimization MAPPO model r And agent A c Corresponding to an Actor network, agent A r And agent A c Sharing a Critic network;
and the scheduling module is used for taking the global network state as the input of the evaluation network, taking the state value as the output of the evaluation network, taking the maximum expected reward as the optimization target, continuously updating the network, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information of each node along the path.
The present disclosure also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above deterministic network routing and queue scheduling method when executing the program.
The present disclosure also provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above deterministic network routing and queue scheduling method.
From the above, it can be seen that the deterministic network routing and queue scheduling method and apparatus provided by the present disclosure can adapt to the dynamic nature of the environment, do not need to manually establish a complex static model, and can adjust the scheduling policy in real time to adapt to a new environment.
Drawings
To clearly illustrate the technical solutions of the present disclosure or the related art, the drawings used in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings described below are merely embodiments of the present disclosure, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a deterministic network routing and queue scheduling method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a deterministic MAPPO-based network flow routing and queue scheduling algorithm architecture in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic flow diagram of a deterministic MAPPO-based network flow routing and queue scheduling algorithm in accordance with an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a deterministic network routing and queue scheduling apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an electronic device of an embodiment of the disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items.
The invention relates to the technical fields of network communication and machine learning, and in particular to a deterministic network routing and queue scheduling method based on deep reinforcement learning. For deterministic network service scenarios, the method makes routing and queue decisions based on the real-time network state and the deterministic requirements; it adopts a multi-agent deep reinforcement learning algorithm and issues deterministic network flows to the data plane through an optimal path and an optimal queue scheduling scheme to guide forwarding, thereby maximizing the utilization of network resources while guaranteeing deterministic service.
Deterministic networks impose stricter requirements on network performance, such as bounded delay and jitter; traditional methods that optimize average performance based on statistical probability suffer great losses in this scenario. The invention provides a deterministic network routing and queue joint scheduling method based on deep reinforcement learning, which uses the cyclic queue forwarding function to dynamically exploit the layer-2 queuing and scheduling functions at layer 3, combines the decision-making capability of deep reinforcement learning, and uses multiple agents to consider routing and scheduling jointly, so as to solve the deterministic transmission problem of layer-3 deterministic networks and maximize the number of deterministic network flows the network can carry. The method involves the technical fields of network communication and machine learning, and comprises: designing a joint scheduling algorithm for routes and queues based on a software-defined network (SDN); constructing a deep reinforcement learning model in the deterministic network setting, including the design of the network state, the action space, and the reward; and designing a training scheme for the decision model.
Software-Defined Networking (SDN) is a new type of network architecture. Its basic idea is to completely decouple the control plane and the data plane of the traditional distributed network and to use a logically centralized controller to control the entire distributed data plane, thereby realizing centralized network management and configuration, improving the efficiency of network management, and reducing the complexity of network configuration. When the controller receives a deterministic service request from a user, it analyzes the characteristic information of the deterministic flow and computes an explicit path and resource reservation information based on the network topology, the state information, and the deterministic network capabilities; if the allocation can succeed, it responds to the service request. With SDN technology, deterministic networks can become more flexible and agile in guaranteeing deterministic services.
With the development of the Internet and communication networks, applications with deterministic requirements on network service, such as industrial control, the Internet of Vehicles, and the smart grid, are emerging in many new business areas. Meeting such deterministic service requirements has become a key driver of the development of network technology. However, the conventional Internet Protocol (IP) network only provides best-effort service; even with quality-of-service policies such as differentiated services (DiffServ) and congestion control, and because of micro-bursts in the network, these mechanisms only optimize average performance based on probability statistics and cannot meet deterministic service requirements such as zero packet loss and bounded delay and jitter.
In order to meet the deterministic service requirements of such applications, the IEEE Time-Sensitive Networking (TSN) task group and the Deterministic Networking (DetNet) working group established by the Internet Engineering Task Force (IETF) optimize the link layer and the network layer of Ethernet respectively, improving the support of Ethernet for time-sensitive stream transmission. The latter mainly optimizes the L3 layer of Ethernet in the aspects of dynamic network configuration, resource orchestration, path planning, routing and forwarding, multi-path forwarding, and the like.
To implement deterministic forwarding services at layer 3 of the network, the IETF deterministic networking group has proposed draft standards for the Cycle Specified Queuing and Forwarding (CSQF) mechanism. CSQF is an evolution of Cyclic Queuing and Forwarding (CQF); compared with CQF, it adds the possibility of using more queues to achieve loose synchronization between nodes and higher-level scheduling. CSQF operates at the third layer, where it allows flexible routing and scheduling of packets using Segment Routing (SR). The SR label stack is used to specify, for each intermediate node, on which port (route) and in which queue (schedule) a packet should be transmitted after being received and processed, as in the sketch below.
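As an illustration of this per-hop routing-plus-scheduling information, the following is a minimal sketch that models a label stack as a list of (node, out_port, queue) entries; the names SidLabel and build_label_stack are hypothetical and are not taken from the patent or from any SR implementation.

```python
# Illustrative sketch only: the kind of per-hop information a CSQF SR label
# stack conveys. Each entry tells one intermediate node which port to forward
# on (the routing decision) and which cyclic queue to use (the scheduling
# decision). All names here are assumptions for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class SidLabel:
    node: int       # intermediate node that processes the packet
    out_port: int   # port on which to transmit (route)
    queue: int      # cyclic queue in which to enqueue (schedule)

def build_label_stack(nodes: List[int], ports: List[int], queues: List[int]) -> List[SidLabel]:
    """Pair each hop of an explicit path with its port and forwarding queue."""
    assert len(nodes) == len(ports) == len(queues)
    return [SidLabel(n, p, q) for n, p, q in zip(nodes, ports, queues)]

# Example: a 3-hop path, each hop using one of the three reserved CSQF queues.
stack = build_label_stack([1, 4, 7], [2, 0, 1], [0, 2, 1])
```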
Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines Reinforcement Learning (RL) and Deep Neural Networks (DNN). In reinforcement learning, an agent interacts continuously with the environment and automatically learns the optimal actions (i.e., the policy) to take in different states so as to maximize the reward obtained. Deep reinforcement learning brings deep neural networks into the solution: the strong representational capability of a DNN can fully fit the optimal policy and adapts well to complex environments.
Multi-Agent Deep Reinforcement Learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems. The policy of each agent in a multi-agent system depends not only on the feedback of its own policy and the environment, but is also influenced by the behaviors and cooperative relations of the other agents.
To meet the low-delay requirement of deterministic networks, most existing work adopts traffic scheduling schemes that optimize the gate control list (IEEE 802.1Qbv) of output ports in layer-2 Time-Sensitive Networks (TSN). There is also work on layer-3 deterministic networks that optimizes the scheduling scheme from various perspectives to achieve deterministic network service. However, none of these works notices that, besides queue scheduling, route selection also has a great influence on optimizing network performance, and they ignore the dynamic-traffic characteristic of large-scale deterministic networks. The solutions based on optimization models proposed in these works are not scalable, and heuristic algorithms may fall into local optima and therefore cannot optimize effectively. Solutions using deep reinforcement learning models improve deterministic service quality by selecting the next hop at each hop, but they still adopt the traditional queue management scheme, so the optimization effect is limited; solutions based on optimization models also suffer from slow solving speed, and heuristic algorithms may fail to achieve better network performance. In addition, under the traditional optimization-model-based schemes, a new traffic demand causes the configuration of all deployed flows in the network to be revoked, and a new configuration must be computed for the recombined traffic matrix; if many nodes are involved, the resulting interaction increases delay.
The technical problem the invention aims to solve is to devise a deterministic network routing and queue joint scheduling method based on deep reinforcement learning that realizes deterministic data transmission, i.e., bounded jitter and bounded end-to-end delay, in a layer-3 IP network. The single routing or single queue-scheduling optimization schemes adopted in existing solutions limit the performance boundary of network optimization. Even schemes that combine routing and queues rely on modeling with a specific optimization model, which makes it difficult to adapt to dynamic network environments, incurs high overhead for information collection and centralized computation, causes additional delay, and cannot respond to dynamic traffic in time; and the heuristic algorithms adopted easily fall into local optima, making global optimality hard to guarantee.
Based on the above, the decision scheme based on deep reinforcement learning adopted by the method is better suited to dynamic network environments, can respond to dynamic traffic in time, and pushes out the boundary explored by optimal schemes for a joint routing and queue scheduling mechanism.
Fig. 1 is a schematic diagram of a deterministic network routing and queue scheduling method according to an embodiment of the present disclosure, and as shown in fig. 1, an embodiment of the present disclosure provides a deterministic network routing and queue scheduling method, an execution body of which may be an electronic device, for example, a computer, and the method includes:
Step 101, creating an agent A_r for computing the forwarding path and an agent A_c for computing the forwarding period at each node along the path; agent A_r and agent A_c share a reward r; in the multi-agent proximal policy optimization (MAPPO) model, agent A_r and agent A_c each correspond to an Actor network, and agent A_r and agent A_c share a Critic network;
Step 102, taking the global network state as the input of the evaluation network and the state value as the output of the evaluation network, continuously updating the network with maximization of the expected reward as the optimization goal, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path.
Specifically, fig. 2 is a schematic diagram of the system architecture of an embodiment of the present disclosure. As shown in fig. 2, the invention is directed at deterministic network scenarios: the queue scheduling period is divided using the cyclic forwarding queue function, a deep reinforcement learning model is adopted with the global network state as the input of the evaluation network and the state value as its output, and the algorithm continuously updates the network with maximization of the expected reward as the optimization goal. To jointly schedule routes and queues for deterministic services using multi-agent deep reinforcement learning, the forwarding path of a deterministic flow and the cycle offset information at the nodes along the way are specified by selecting the optimal route and the optimal forwarding queue. The algorithm creates two agents: one computes the forwarding path and the other computes the forwarding period. Following the complete multi-agent deep reinforcement learning framework, the agents share a reward, whose value is set as a composite index of resource utilization variance and forwarding delay. When a deterministic network flow is scheduled, if every link in the scheme computed by the agents and the selected period have sufficient capacity and meet the delay requirement, the demand can be allocated successfully, and the policy issuing module of the controller generates an SID label stack and issues it to the data plane to guide forwarding.
The invention characterizes a deterministic flow by the five-tuple <src, dst, period, delay, bw>, whose fields respectively describe the information of the deterministic flow: source and destination ports, period, delay upper bound, and bandwidth. Inside CSQF-capable devices, each port reserves N_nd (N_nd = 3) queues for deterministic flows, and a scheduling cycle C based on 10 μs periods is divided at all nodes of the whole network; that is, the resource of each queue at each node is divided into C cycles. The maximum end-to-end jitter is then 20 μs, which meets standard ultra-low-delay requirements. Without loss of generality, it is assumed that the cycles of the entire network start at the same time and that the hyper-period length C of each port/link is the same.
The invention abstracts the network as an undirected graph G = (V, ε, ε_c), where V and ε are the sets of points and edges of the network: ε is the set of communication links between devices, and ε_c is the set of link resources, in which the information of each link comprises its residual bandwidth and the occupancy of its CSQF queues. Let F denote the set of deterministic network flows; each service is represented by the five-tuple <src, dst, period, delay, bw>, which respectively describes the information of the deterministic flow: source and destination ports, period, delay upper bound, and bandwidth. Let p_f and r_f denote the forwarding path and the forwarding period offset finally selected by the algorithm for a data flow f ∈ F. A sketch of this data model follows.
Fig. 3 is a schematic diagram of the MAPPO-based deterministic network flow routing and queue scheduling algorithm according to an embodiment of the present disclosure. As shown in fig. 3, agent A_r and agent A_c share a reward r; in the multi-agent proximal policy optimization (MAPPO) model, agent A_r and agent A_c each correspond to an Actor network, and agent A_r and agent A_c share a Critic network.
For an agent in the algorithm, in a certain environment state the agent issues an action and obtains feedback from the environment, i.e., the reward, while the environment state changes; in the new state the agent again issues an action and obtains feedback, continuously interacting with the environment. A_r denotes the agent that computes the route, and A_c denotes the agent that computes the forwarding period.
For agent A_r, the state s_r is the combination of the network topology, the deterministic flow demand, the link resources, and the network link state E_state = {(LU_e, D_e) | e ∈ ε}, where LU denotes the link utilization of an edge and D denotes the end-to-end delay of an edge.
The action set is P_f, the set of candidate forwarding paths for flow f, and the action is described as a_r = p_f, p_f ∈ P_f.
for agent A c The formula for the state is described as follows:
Figure BDA0003742774540000086
the formula for the action set is described as follows:
Figure BDA0003742774540000087
the formula for the action is described as follows:
a c =r f ,r f ∈R f
agent sharing rewards
Figure BDA0003742774540000088
For the comprehensive index of resource utilization variance and forwarding delay, the formula is described as follows:
Figure BDA0003742774540000089
where std (LU) denotes the standard deviation of the link utilization, f bw The bandwidth required for the deterministic network flow f, i.e. the bandwidth allocated to the service, D f Selected forwarding path p for deterministic service flow f for agent f And a forwarding period offset r f The latter transmission delay, which includes two parts, (i) the sum of the propagation delays of the links between the nodes
Figure BDA00037427745400000810
(ii) Summation of period offsets of intermediate nodes
Figure BDA00037427745400000811
r f,e Representing the periodic offset of the deterministic network flow f on edge e. f. of delay The upper end-to-end delay limit required for the deterministic network flow f can only be transmitted if it is guaranteed that this condition is met, otherwise the service request will be rejected, α, β, γ are the weighting parameters.
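To make the reward concrete, the following sketch computes r for one scheduled flow under the reconstruction above: the allocated bandwidth is rewarded, utilization imbalance and delay are penalized, and a flow whose total delay exceeds f_delay is rejected. The weight values and the helper name are assumptions for illustration.

```python
# Sketch of the shared reward: a weighted composite of link-utilization
# standard deviation, allocated bandwidth and total forwarding delay D_f.
# ALPHA, BETA, GAMMA values are illustrative assumptions.
import statistics
from typing import List, Optional

ALPHA, BETA, GAMMA = 1.0, 0.1, 0.05   # weighting parameters (assumed values)

def shared_reward(link_utilizations: List[float],   # LU_e for every edge (>= 2 links)
                  f_bw: float,                      # bandwidth allocated to flow f
                  prop_delays: List[float],         # D_e for each edge on path p_f
                  period_offsets: List[float],      # r_{f,e} for each edge on path p_f
                  f_delay: float) -> Optional[float]:
    """Return the shared reward, or None when the delay bound is violated
    (in which case the service request is rejected)."""
    d_f = sum(prop_delays) + sum(period_offsets)    # total transmission delay D_f
    if d_f > f_delay:
        return None                                 # delay upper bound not met
    return BETA * f_bw - ALPHA * statistics.stdev(link_utilizations) - GAMMA * d_f
```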
Optionally, taking the global network state as the input of the evaluation network and the state value as the output of the evaluation network, and continuously updating the network with maximization of the expected reward as the optimization goal, comprises:
initializing a network environment;
inputting the network state separately into agent A_r and agent A_c to obtain the joint strategy Action from the corresponding Actor networks;
executing the strategy, obtaining the running state at the next moment and the global reward, and storing the before-and-after running states, the strategy, and the reward into a buffer;
and, when a training period is reached, fetching experience from the buffer and updating the Critic network and the Actor networks respectively. A sketch of this interaction loop follows.
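A minimal sketch of this interaction loop follows; env, the two agents, buffer, and learner are assumed interfaces for illustration and do not come from the patent.

```python
# Sketch of the interaction loop: collect (state, action, reward, next_state)
# transitions into a buffer and trigger an update every TRAIN_PERIOD steps.
TRAIN_PERIOD = 128   # assumed training period

def run_episode(env, agent_r, agent_c, buffer, learner, max_steps: int = 10_000):
    state = env.reset()                        # initialize the network environment
    for step in range(1, max_steps + 1):
        a_r = agent_r.act(state)               # forwarding path chosen by A_r
        a_c = agent_c.act(state)               # period offsets chosen by A_c
        action = (a_r, a_c)                    # joint strategy Action
        next_state, global_reward = env.step(action)
        buffer.append((state, action, global_reward, next_state))
        if step % TRAIN_PERIOD == 0:           # training period reached
            learner.update(buffer)             # update shared Critic and both Actors
            buffer.clear()
        state = next_state
```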
Optionally, the method further comprises:
when the training period has not been reached, inputting the network state into agent A_r and agent A_c again to obtain the joint strategy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration number does not reach the maximum value, initializing the network environment again.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration times reach the maximum value, if the model is converged, outputting the optimal route and the optimal forwarding queue.
Optionally, the method further comprises:
determining the number of training rounds and the update period, and initializing the iteration variables.
Fig. 4 is a schematic flow chart of a deterministic network flow routing and queue scheduling algorithm based on MAPPO in an embodiment of the present disclosure, and as shown in fig. 4, the deterministic network flow routing and queue scheduling algorithm based on MAPPO specifically includes the following steps:
1. First, a topology model of the network and the information of the computing nodes are created. The topology has m vertices, e.g. m ≥ 30, and comprises N computing nodes, e.g. N ≥ 8. Each port reserves N_nd (N_nd = 3) queues for deterministic flows, and the scheduling cycle C based on 10 μs periods is divided at all nodes of the network, i.e., the resource of each queue at each node is divided into C cycles, with all nodes in the whole network starting their cycles at the same time. The link bandwidth in the topology is set to a uniform value of x MB/s, with x ≥ 40.
2. Variables are initialized: the initial iteration count i = 0, and the maximum iteration count is i_max, e.g. i_max ≥ 1,000,000; i_max is set based on actual requirements. An experience replay pool of length n is set up, e.g. n ≥ 5000.
3. Two agent object instances are created: agent A_r represents computing the routing path and agent A_c represents computing the forwarding period. Each agent corresponds to an Actor network in the MAPPO model, and the two agents share a Critic network; three-layer fully connected neural networks are adopted, and the network parameters are initialized randomly, as in the sketch below.
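A minimal PyTorch sketch of this actor-critic construction, assuming discrete candidate sets for paths and period offsets; the state dimension, hidden size, and set sizes are illustrative assumptions.

```python
# Sketch: two Actor networks (one per agent) and one shared Critic, each a
# three-layer fully connected network with randomly initialized parameters.
import torch.nn as nn

def mlp(in_dim: int, out_dim: int, hidden: int = 128) -> nn.Sequential:
    """A three-layer fully connected network."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim))

state_dim = 64      # size of the encoded network state (assumed)
num_paths = 10      # |P_f|: candidate forwarding paths for agent A_r (assumed)
num_offsets = 8     # |R_f|: candidate period offsets for agent A_c (assumed)

actor_r = mlp(state_dim, num_paths)     # Actor of agent A_r
actor_c = mlp(state_dim, num_offsets)   # Actor of agent A_c
critic = mlp(state_dim, 1)              # Critic shared by both agents
```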
4. Deterministic network flows are generated randomly. The information of each deterministic network flow f consists of the five-tuple <src, dst, period, delay, bw>, which respectively describes the characteristics of the deterministic flow: source and destination ports, period, delay upper bound, and bandwidth. The source node and destination node are created by selecting 2 values from the vertex set with equal probability, and the packet length is 100-1500 B. The flow sending period is drawn at random from the set {1, 2, 4, 8} ms; the delay upper bound follows a normal distribution with a minimum of 20 ms and a maximum of 50 ms; and the bandwidth of the flow follows a normal distribution with a minimum of 5 MB/s and a maximum of 20 MB/s, as in the sketch below.
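A sketch of this flow generator follows; since the text specifies only the minimum and maximum of the normal distributions, the mean and standard deviation used here (and the rejection-sampling truncation) are assumptions.

```python
# Sketch of random deterministic-flow generation per step 4. The flow is
# returned as the five-tuple <src, dst, period, delay, bw>; mu/sigma of the
# truncated normals are illustrative assumptions.
import random

def truncated_normal(lo: float, hi: float) -> float:
    mu, sigma = (lo + hi) / 2, (hi - lo) / 4   # assumed parameters
    while True:
        x = random.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

def random_flow(vertices: list) -> dict:
    src, dst = random.sample(vertices, 2)       # two vertices, equal probability
    return {
        "src": src,
        "dst": dst,
        "period": random.choice([1, 2, 4, 8]),  # sending period (ms)
        "delay": truncated_normal(20.0, 50.0),  # delay upper bound (ms)
        "bw": truncated_normal(5.0, 20.0),      # required bandwidth (MB/s)
    }
```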
5. The initial State is set to the combination of the network topology, the deterministic network flow demand, the link resources, and the network link state E_state.
6. An iteration is started, incrementing the value of i by 1, and each agent generates an action. The route-computing agent A_r selects, through its Actor network, the action array with the largest Q value from the action set P_f: a_r = [p_{f,k1}, p_{f,k2}, ..., p_{f,|k|}], where p_{f,ki} denotes the number (1 to m) of a path link to which the network flow f is assigned, and the array size is the path length. The forwarding-period agent A_c selects, through its Actor network, the action array with the largest Q value from the action set R_f: a_c = [r_{f,k1}, r_{f,k2}, ..., r_{f,|k|}], where r_{f,k} denotes the forwarding period offset at the corresponding path link to which the network flow f is assigned, and the array size is the path length.
7. The forwarding path computed by agent A_r and the forwarding period given by agent A_c are recombined into the joint Action = (a_r, a_c), which determines that the deterministic network flow f is forwarded along p_f with period offset r_f; this is applied to the network as the path and queue scheduling plan.
8. The corresponding deterministic network flow is generated in the network; the link utilization, transmission delay, and residence delay at each node in the network are obtained; and the variance of the link utilization and the total delay D_f are combined through the specified weights of the two indices to obtain the reward r, i.e., Reward.
9. State' is set to the combination of the network topology, the deterministic network flow demand, the link resources, and the network link state, and the tuple (State, Action = (a_r, a_c), Reward, State') is stored in the experience replay pool for the model's iterative learning.
10. Steps 4 to 9 are repeated until the experience replay pool is full.
11. Each agent updates its strategy according to the experience replay pool; the update process is as follows:
12. The latest State' is input into the Critic network to obtain the value v' of that state, and the discounted reward R[t] = r[t] + δ·r[t+1] + ... + δ^(t_-t)·r[t_] is computed to yield R = [R[0], R[1], ..., R[t], ..., R[t_]], where t_ is the last time step and δ is the discount factor (see the sketch below).
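A sketch of this discounted-return computation follows; bootstrapping the tail of the trajectory with the value v' of the latest State' is a common-practice assumption, since the text specifies only the discounted sum itself.

```python
# Sketch: compute R[t] = r[t] + delta*r[t+1] + ... backwards over the stored
# trajectory, seeding the tail with v_prime (an assumption).
from typing import List

def discounted_returns(rewards: List[float], v_prime: float, delta: float = 0.99) -> List[float]:
    returns = [0.0] * len(rewards)
    running = v_prime                    # Critic's value for the latest State'
    for t in reversed(range(len(rewards))):
        running = rewards[t] + delta * running
        returns[t] = running
    return returns

# Example: discounted rewards for a 4-step trajectory.
R = discounted_returns([1.0, 0.5, -0.2, 0.8], v_prime=0.3)
```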
13. All states in the experience replay pool are input into the Critic network to obtain the values V of all states, and the advantage function A = R - V is computed.
14. The loss function of the Critic network is computed and back-propagated to update the Critic network; the Critic loss is expressed as the mean squared advantage: loss_critic = mean((R - V)^2).
15. For the Actor network of each agent, all stored States are input into the Actor-old and Actor-new networks (which have identical structures) to obtain the normal distributions Normal1 and Normal2 respectively; all stored Actions are combined into Actions and evaluated under Normal1 and Normal2 to obtain prob1 and prob2 for each Action, and prob2 is then divided by prob1 to obtain the importance weight ratio.
16. For the Actor network of each agent, the loss function of the Actor network is computed and back-propagated to update the Actor-new network; the Actor loss is expressed as:

loss_actor = -mean(min(ratio · A, clip(ratio, 1 - ε, 1 + ε) · A)),

where ratio is the importance weight obtained in step 15, ε is the clipping coefficient, and clip(ratio, 1 - ε, 1 + ε) truncates ratio to the range (1 - ε, 1 + ε). A sketch of both losses follows.
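A PyTorch sketch of the two losses from steps 13-16 follows, assuming the tensors R, V, and ratio over the replay pool have already been computed; this is the standard PPO clipped surrogate objective.

```python
# Sketch of the Critic and Actor losses. eps is the PPO clipping coefficient
# (not a learning rate); tensor inputs are assumed precomputed.
import torch

def critic_loss(R: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    advantage = R - V                     # step 13: A = R - V
    return (advantage ** 2).mean()        # step 14: mean squared advantage

def actor_loss(ratio: torch.Tensor, advantage: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    advantage = advantage.detach()        # do not backpropagate into the Critic
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()   # step 16: clipped surrogate
```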
17. Steps 15 and 16 are repeated, e.g. 10 times, after which the Actor-old network is updated with the parameters of the Actor-new network.
18. Whether the iteration count i exceeds the maximum iteration count i_max is judged. If not, return to step 4 and continue iterating; if so, the algorithm ends, and at this point it outputs the optimal route and the optimal forwarding queue according to the input state.
The method is oriented to deterministic networks and realizes the forwarding of deterministic network flows using the MAPPO-based deterministic network flow routing and queue scheduling algorithm: one agent is responsible for computing the forwarding path, the other for computing the forwarding period at each node along the path, and the two agents share the reward of the joint action. After repeated iterative learning, deterministic service can be guaranteed while avoiding the risk that an overloaded computing node or network congestion degrades the overall user experience. The deterministic network flow scheduling algorithm designed by the invention can adapt to the dynamics of the environment, requires no manually built complex static model, and can adjust the scheduling strategy in real time to adapt to new environments.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to any embodiment method, the disclosure also provides a deterministic network routing and queue scheduling device.
Fig. 5 is a schematic diagram of a deterministic network routing and queue scheduling apparatus according to an embodiment of the present disclosure, and as shown in fig. 5, the deterministic network routing and queue scheduling apparatus includes a model building module 501 and a scheduling module 502, where:
model (model)Building block 501 is used to create an agent a that computes a forwarding path r And an agent A for calculating the forwarding period at each node along the path c (ii) a Agent A r And agent A c Sharing rewards
Figure BDA0003742774540000121
Agent A in a multiple intelligent near-end policy optimized MAPPO model r And agent A c Corresponding to an Actor network, agent A r And agent A c Sharing a Critic network;
the scheduling module 502 is configured to take the global network status as an input of an evaluation network, take the status value as an output of the evaluation network, take the maximum expected reward as an optimization target, continuously update the network, and select the optimal route and the optimal forwarding queue to specify a forwarding path of a deterministic flow and cycle offset information at each node along the path.
Optionally, continuously updating the network, with the global network state as the input of the evaluation network, the state value as the output of the evaluation network, and maximization of the expected reward as the optimization goal, includes:
initializing a network environment;
inputting the network state separately into agent A_r and agent A_c to obtain the joint strategy Action from the corresponding Actor networks;
executing the strategy, obtaining the running state at the next moment and the global reward, and storing the before-and-after running states, the strategy, and the reward into a buffer;
and, when the training period is reached, fetching experience from the buffer and updating the Critic network and the Actor networks respectively.
Optionally, the method further comprises:
when the training period has not been reached, inputting the network state into agent A_r and agent A_c again to obtain the joint strategy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration number does not reach the maximum value, initializing the network environment again.
Optionally, the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration times reach the maximum value, if the model is converged, outputting the optimal route and the optimal forwarding queue.
Optionally, the method further comprises:
and determining the number of training rounds and an updating period, and initializing an iteration variable.
Optionally, the reward r is a composite index of resource utilization variance and forwarding delay.
For convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when the present disclosure is implemented, the functions of the various modules may be realized in the same one or more pieces of software and/or hardware.
The apparatus in the foregoing embodiment is configured to implement the corresponding deterministic network routing and queue scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the deterministic network routing and queue scheduling method described in any embodiment above.
Fig. 6 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The memory 1020 may be implemented in the form of a ROM (read only memory), a RAM (random access memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (for example, USB, network cable, etc.), and can also realize communication in a wireless mode (for example, mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only the components necessary to implement the embodiments of the present disclosure, and need not include all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding deterministic network routing and queue scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-described embodiment methods, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the deterministic network routing and queue scheduling method according to any of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the deterministic network routing and queue scheduling method according to any of the foregoing embodiments, and have the beneficial effects of corresponding method embodiments, and are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and many other variations of the different aspects of the embodiments of the present disclosure exist as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Additionally, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A deterministic network routing and queue scheduling method, comprising:
creating an agent A_r for computing a forwarding path and an agent A_c for computing the forwarding period at each node along the path; agent A_r and agent A_c sharing a reward r; wherein, in a multi-agent proximal policy optimization (MAPPO) model, agent A_r and agent A_c each correspond to an Actor network, and agent A_r and agent A_c share a Critic network;
and taking the global network state as the input of the evaluation network, taking the state value as the output of the evaluation network, taking the maximum expected reward as the optimization target, continuously updating the network, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path.
2. The deterministic network routing and queue scheduling method of claim 1, wherein continuously updating the network, with the global network state as the input of the evaluation network, the state value as the output of the evaluation network, and maximization of the expected reward as the optimization goal, comprises:
initializing a network environment;
inputting the network state separately into agent A_r and agent A_c to obtain the joint strategy Action from the corresponding Actor networks;
executing the strategy, obtaining the running state at the next moment and the global reward, and storing the before-and-after running states, the strategy, and the reward into a buffer;
and acquiring experience from the buffer under the condition that the training period is reached, and respectively updating the Critic network and the Actor network.
3. The deterministic network routing and queue scheduling method of claim 2, wherein the method further comprises:
when the training period has not been reached, inputting the network state into agent A_r and agent A_c again to obtain the joint strategy Action from the corresponding Actor networks.
4. The deterministic network routing and queue scheduling method of claim 2, wherein the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration number does not reach the maximum value, initializing the network environment again.
5. The deterministic network routing and queue scheduling method of claim 2, wherein the method further comprises:
judging whether the iteration times reach the maximum value;
and under the condition that the iteration times reach the maximum value, if the model is converged, outputting the optimal route and the optimal forwarding queue.
6. The deterministic network routing and queue scheduling method of claim 2, wherein the method further comprises:
determining the number of training rounds and the updating period, and initializing the iteration variables.
7. The deterministic network routing and queue scheduling method according to any of claims 1-6, wherein the reward r is a composite index of resource utilization variance and forwarding delay.
8. A deterministic network routing and queue scheduling apparatus, comprising:
a model building module for creating an agent A_r for computing a forwarding path and an agent A_c for computing the forwarding period at each node along the path, wherein agent A_r and agent A_c share a reward r; in a multi-agent proximal policy optimization (MAPPO) model, agent A_r and agent A_c each correspond to an Actor network, and agent A_r and agent A_c share a Critic network;
and the scheduling module is used for taking the global network state as the input of the evaluation network, taking the state value as the output of the evaluation network, taking the maximum expected reward as an optimization target, continuously updating the network, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the deterministic network routing and queue scheduling method of any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the deterministic network routing and queue scheduling method of any of claims 1 to 7.
CN202210822548.9A 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device Active CN115484205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210822548.9A CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210822548.9A CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Publications (2)

Publication Number Publication Date
CN115484205A true CN115484205A (en) 2022-12-16
CN115484205B CN115484205B (en) 2023-12-01

Family

ID=84422719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210822548.9A Active CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Country Status (1)

Country Link
CN (1) CN115484205B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160269426A1 (en) * 2015-03-09 2016-09-15 International Business Machines Corporation Deploying a security appliance system in a high availability environment without extra network burden
US20180103094A1 (en) * 2016-10-12 2018-04-12 Cisco Technology, Inc. Deterministic stream synchronization
CN108765096A (en) * 2018-06-07 2018-11-06 冯瑞新 Shared Toilet system under a kind of cloud network
CN112437020A (en) * 2020-10-30 2021-03-02 天津大学 Data center network load balancing method based on deep reinforcement learning
AU2021101685A4 (en) * 2021-04-01 2021-05-20 Arun Singh Chouhan Design and development of real time automated routing algorithm for computer networks
CN113328938A (en) * 2021-05-25 2021-08-31 电子科技大学 Network autonomous intelligent management and control method based on deep reinforcement learning
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN114286413A (en) * 2021-11-02 2022-04-05 北京邮电大学 TSN network combined routing and stream distribution method and related equipment
CN114499648A (en) * 2022-03-10 2022-05-13 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation
CN114745317A (en) * 2022-02-09 2022-07-12 北京邮电大学 Computing task scheduling method facing computing power network and related equipment

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160269426A1 (en) * 2015-03-09 2016-09-15 International Business Machines Corporation Deploying a security appliance system in a high availability environment without extra network burden
US20180103094A1 (en) * 2016-10-12 2018-04-12 Cisco Technology, Inc. Deterministic stream synchronization
CN108765096A (en) * 2018-06-07 2018-11-06 冯瑞新 Shared Toilet system under a kind of cloud network
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN112437020A (en) * 2020-10-30 2021-03-02 天津大学 Data center network load balancing method based on deep reinforcement learning
AU2021101685A4 (en) * 2021-04-01 2021-05-20 Arun Singh Chouhan Design and development of real time automated routing algorithm for computer networks
CN113328938A (en) * 2021-05-25 2021-08-31 电子科技大学 Network autonomous intelligent management and control method based on deep reinforcement learning
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN114286413A (en) * 2021-11-02 2022-04-05 北京邮电大学 TSN network combined routing and stream distribution method and related equipment
CN114745317A (en) * 2022-02-09 2022-07-12 北京邮电大学 Computing task scheduling method facing computing power network and related equipment
CN114499648A (en) * 2022-03-10 2022-05-13 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱小琴; 袁晖; 王维洲; 魏峰; 张驯; 赵金雄: "Routing Strategy for Power Communication Networks Based on Deep Reinforcement Learning" (基于深度强化学习的电力通信网路由策略), Science and Technology Innovation (科学技术创新), no. 36
罗俊仁: "Research Progress in Multi-Agent Game Learning" (多智能体博弈学习研究进展), Systems Engineering and Electronics (《系统工程与电子技术》)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN115484205B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
Prathiba et al. Federated learning empowered computation offloading and resource management in 6G-V2X
Sun et al. Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning
Kim et al. Multi-agent reinforcement learning-based resource management for end-to-end network slicing
CN110753319B (en) Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN114286413B (en) TSN network joint routing and stream distribution method and related equipment
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
WO2023040022A1 (en) Computing and network collaboration-based distributed computation offloading method in random network
Yang et al. Burst-aware time-triggered flow scheduling with enhanced multi-CQF in time-sensitive networks
EP4024212B1 (en) Method for scheduling inference workloads on edge network resources
Gu et al. Deep reinforcement learning based VNF management in geo-distributed edge computing
Qi et al. Vehicular edge computing via deep reinforcement learning
CN106211344A (en) A kind of Ad Hoc network bandwidth management method based on context aware
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
CN115484205A (en) Deterministic network routing and queue scheduling method and device
De Mendoza et al. Near optimal VNF placement in edge-enabled 6G networks
CN114189937A (en) Real-time centralized wireless network scheduling method and device based on deep reinforcement learning
Liu et al. Deep reinforcement learning based adaptive transmission control in vehicular networks
Meng et al. Intelligent routing orchestration for ultra-low latency transport networks
Li et al. Profit driven service provisioning in edge computing via deep reinforcement learning
CN115514769A (en) Satellite elastic internet resource scheduling method, system, computer equipment and medium
CN115225512A (en) Multi-domain service chain active reconstruction mechanism based on node load prediction
CN116418808A (en) Combined computing unloading and resource allocation method and device for MEC
Javadi et al. A multi-path cognitive resource management mechanism for QoS provisioning in wireless mesh networks
Bensalem et al. Towards optimal serverless function scaling in edge computing network
Zhuang et al. Adaptive and robust network routing based on deep reinforcement learning with lyapunov optimization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant