CN115484205B - Deterministic network routing and queue scheduling method and device - Google Patents

Deterministic network routing and queue scheduling method and device Download PDF

Info

Publication number
CN115484205B
CN115484205B (application CN202210822548.9A)
Authority
CN
China
Prior art keywords
network
agent
deterministic
forwarding
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210822548.9A
Other languages
Chinese (zh)
Other versions
CN115484205A (en)
Inventor
谢坤
黄小红
李丹丹
张沛
马如兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210822548.9A priority Critical patent/CN115484205B/en
Publication of CN115484205A publication Critical patent/CN115484205A/en
Application granted granted Critical
Publication of CN115484205B publication Critical patent/CN115484205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/12: Shortest path evaluation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/12: Shortest path evaluation
    • H04L45/123: Evaluation of link metrics
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/50: Queue scheduling
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00: Packet switching elements
    • H04L49/90: Buffering arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure provides a deterministic network routing and queue scheduling method and device, belonging to the field of computer technology. The method comprises: creating an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path, where A_r and A_c share a reward. The global network state is used as the input of the evaluation network, the state value as its output, and maximizing the expected reward as the optimization target; the networks are continuously updated, and the optimal route and optimal forwarding queue are selected to specify the forwarding path of the deterministic flow and the period offset information at each node along the way. The deterministic network routing and queue scheduling method and device provided by the disclosure can adapt to the dynamics of the environment, need no manually built complex static model, and can adjust the scheduling strategy in real time to suit a new environment.

Description

Deterministic network routing and queue scheduling method and device
Technical Field
The disclosure relates to the technical field of communication, and in particular relates to a deterministic network routing and queue scheduling method and device.
Background
To meet the low-latency requirements of deterministic networks, most existing work adopts schemes that optimize the gate control list (IEEE 802.1Qbv) of output ports in layer-two Time-Sensitive Networking (TSN) for traffic scheduling. There is also some work on layer-three deterministic networks that optimizes the scheduling scheme from various angles to realize deterministic network services. However, the optimization-model-based solutions proposed in these works do not scale, and heuristic algorithms may fall into local optima and thus fail to optimize effectively. Solutions using deep reinforcement learning models improve deterministic service quality by selecting the next hop at each hop, but their optimization effect is limited by the traditional queue management schemes they adopt; optimization-model-based solutions also suffer from slow solving, and heuristic algorithms may not reach better network performance. In addition, with the conventional optimization-model-based schemes, a new traffic demand causes the configuration of all flows already deployed in the network to be cancelled; a new configuration must then be computed over the reconstructed traffic matrix, and if many nodes are involved, the resulting interaction increases delay.
Disclosure of Invention
Accordingly, an objective of the present disclosure is to provide a deterministic network routing and queue scheduling method and apparatus.
Based on the above objects, the present disclosure provides a deterministic network routing and queue scheduling method, comprising:
creating an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path; A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network;
the global network state is used as the input of the evaluation network, the state value is used as the output of the evaluation network, the maximization expected rewards are used as the optimization target, the network is continuously updated, and the optimal route and the optimal forwarding queue are selected to specify the forwarding path of the deterministic flow and the period offset information at each node along the way.
Optionally, taking the global network state as an input of the evaluation network, taking the state value as an output of the evaluation network, taking the maximization of the expected rewards as an optimization target, and continuously updating the network, including:
initializing a network environment;
inputting the network state into agent A_r and agent A_c respectively, and obtaining the joint policy Action from the corresponding Actor networks;
executing the policy, obtaining the running state and global reward at the next moment, and storing the states before and after execution, the policy, and the reward into a buffer;
experience is obtained from the buffer and the Critic network and the Actor network are updated, respectively, when the training period is reached.
Optionally, the method further comprises:
in case the training period is not reached, inputting the network state into agent A_r and agent A_c again, and obtaining the joint policy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and initializing the network environment again in case that the iteration number does not reach the maximum value.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and under the condition that the iteration number reaches the maximum value, if the model converges, outputting an optimal route and an optimal forwarding queue.
Optionally, the method further comprises:
the training round number and the update period are determined, and the iteration variable is initialized.
Optionally, the reward is a composite index of the resource utilization variance and the forwarding delay.
The present disclosure also provides a deterministic network routing and queue scheduling apparatus, comprising:
a model building module, configured to create an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path, where A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network;
a scheduling module, configured to take the global network state as the input of the evaluation network and the state value as the output of the evaluation network, with maximizing the expected reward as the optimization target, continuously update the networks, and select the optimal route and optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path.
The present disclosure also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the deterministic network routing and queue scheduling method described above when executing the program.
The present disclosure also provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the deterministic network routing and queue scheduling method described above.
From the above, it can be seen that the deterministic network routing and queue scheduling method and apparatus provided by the present disclosure can adapt to the dynamic nature of the environment, without needing to manually build a complex static model, and can adjust the scheduling policy in real time to adapt to the new environment.
Drawings
In order to more clearly illustrate the technical solutions of the present disclosure or related art, the drawings required for the embodiments or related art description will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a schematic diagram of a deterministic network routing and queue scheduling method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a system architecture of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a deterministic MAPPO-based network flow routing and queue scheduling algorithm architecture in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a MAPPO-based deterministic network flow routing and queue scheduling algorithm according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a deterministic network routing and queue scheduling apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items.
The invention relates to the technical fields of network communication and machine learning, and in particular to a deterministic network routing and queue scheduling method based on deep reinforcement learning. Aimed at deterministic network service scenarios, the invention makes routing and queue decisions based on the real-time network state and deterministic demands, adopts a multi-agent deep reinforcement learning algorithm, and delivers the resulting optimal path and optimal queue scheduling scheme for each deterministic network flow to the data layer to guide forwarding, thereby guaranteeing deterministic service while maximizing network resource utilization.
Deterministic networks place stricter demands on network performance, such as bounded latency and jitter, and conventional statistical-probability approaches that optimize average performance incur significant losses in this scenario. The invention provides a deterministic network routing and queue joint scheduling method based on deep reinforcement learning, which uses the cyclic queue forwarding function to dynamically bring the layer-two queuing and scheduling capability up to layer three, combines the decision-making capability of deep reinforcement learning, and uses multiple agents to consider routing and scheduling jointly, so as to solve the deterministic transmission problem of a layer-three deterministic network and maximize the number of deterministic network flows the network can carry. The method spans network communication and machine learning and comprises: designing an SDN-based joint scheduling algorithm for routes and queues; constructing a deep reinforcement learning model in the deterministic network setting, including the design of network states, action spaces and rewards; and designing a training scheme for the decision model.
A Software Defined Network (SDN) is a new network architecture. Its basic idea is to fully decouple the control plane and the data plane of the traditional distributed network and control the whole distributed data plane with a logically centralized controller, thereby realizing centralized network management and configuration, improving the efficiency of network management and reducing the complexity of network configuration. When the controller receives a user's deterministic service request, it parses the characteristic information of the deterministic flow, computes explicit path and resource reservation information according to the network topology, state information and deterministic network capability, and responds to the service request if the allocation can succeed. Combined with SDN technology, deterministic networks can become more flexible and agile in guaranteeing deterministic services.
With the development of the Internet and communication networks, applications with deterministic demands on network services, such as industrial control, the Internet of Vehicles, and smart grids, are emerging in many business fields. Meeting such deterministic service demands has become a key driving force for the development of network technology. However, conventional Internet Protocol (IP) networks offer only best-effort service; even though there are quality-of-service policies such as Differentiated Services (DiffServ) and congestion control, due to micro-bursts in the network these mechanisms can only optimize statistical average performance and cannot meet deterministic service requirements such as zero packet loss, bounded delay, and bounded jitter.
To meet the deterministic service requirements of such applications, the Internet Engineering Task Force (IETF) established the Deterministic Networking (DetNet) working group, which optimizes the link layer and the network layer of Ethernet respectively to improve their support for time-sensitive streams. The L3 layer is optimized mainly in dynamic network configuration, resource orchestration, path planning, route forwarding, multipath forwarding, and the like.
To achieve deterministic forwarding services at layer three of the network, the IETF DetNet group has set forth a draft standard for the Cycle Specified Queuing and Forwarding (CSQF) mechanism. It is an evolution of Cyclic Queuing and Forwarding (CQF) that, compared with CQF, uses more queues to enable loose synchronization between nodes and scheduling further in advance. CSQF operates at layer three, where it allows flexible routing and scheduling of data packets using Segment Routing (SR). The SR label stack specifies, for each intermediate node, on which port (route) and from which queue (schedule) a data packet should be transmitted after it is received and processed.
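As an illustration of this label-stack idea, the following sketch packs each hop's (port, queue) decision into one label. The field layout and the helper names are assumptions made for illustration only; the actual SID encoding is defined by the IETF CSQF and SR drafts.

from dataclasses import dataclass
from typing import List

@dataclass
class HopDecision:
    port: int   # output port at this node (the route choice)
    queue: int  # cyclic queue at this node (the schedule choice)

def build_sid_stack(hops: List[HopDecision]) -> List[int]:
    # Assumed layout: upper bits carry the port, lower 4 bits the queue index.
    return [(h.port << 4) | (h.queue & 0xF) for h in hops]

# Example: a two-hop path, each hop with its own port and queue choice.
stack = build_sid_stack([HopDecision(port=2, queue=1), HopDecision(port=5, queue=0)])
print(stack)  # [33, 80]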
Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines Reinforcement Learning (RL) with Deep Neural Networks (DNN). In reinforcement learning, an agent constantly interacts with the environment and thereby automatically learns the optimal actions (i.e., the policy) to take in different states so as to maximize the reward it receives. Deep reinforcement learning brings deep neural networks into the solution; the strong representational capacity of DNNs can fit the optimal policy well, so the method adapts well to complex environments.
Multi-Agent Deep Reinforcement Learning (MADRL) applies the ideas and algorithms of deep reinforcement learning to the learning and control of multi-agent systems. In a multi-agent system, each agent's learning depends not only on its own policy and the environment's feedback, but is also affected by the behavior and cooperative relationships of the other agents.
To meet the low-latency requirements of deterministic networks, most existing work adopts schemes that optimize the gate control list (IEEE 802.1Qbv) of output ports in layer-two Time-Sensitive Networking (TSN) for traffic scheduling. There is also some work on layer-three deterministic networks that optimizes the scheduling scheme from various angles to realize deterministic network services. However, none of these works consider that, besides queue scheduling, the selection of routes has a great impact on improving network performance, and they ignore the dynamic traffic characteristics of large-scale deterministic networks. The optimization-model-based solutions proposed in these works do not scale, and heuristic algorithms may fall into local optima and thus fail to optimize effectively. Solutions using deep reinforcement learning models improve deterministic service quality by selecting the next hop at each hop, but their optimization effect is limited by the traditional queue management schemes they adopt; optimization-model-based solutions also suffer from slow solving, and heuristic algorithms may not reach better network performance. In addition, with the conventional optimization-model-based schemes, a new traffic demand causes the configuration of all flows already deployed in the network to be cancelled; a new configuration must then be computed over the reconstructed traffic matrix, and if many nodes are involved, the resulting interaction increases delay.
The technical problem to be solved by the invention is therefore to devise a deterministic network routing and queue joint scheduling method based on deep reinforcement learning that realizes deterministic data transmission, i.e. bounded jitter and end-to-end delay, in a layer-three IP network. The single-route or single-queue optimization adopted in existing solutions limits the performance boundary of network optimization. Even schemes that do combine routing and queueing are built on a specific optimization model, which makes them hard to adapt to a dynamic network environment; the cost of information collection and centralized computation is large, causes extra delay, and cannot cope with dynamic traffic in time; and the heuristic algorithms adopted easily fall into local optima, so global optimality is hard to guarantee.
Based on the above, the deep-reinforcement-learning-based decision scheme adopted by this method is better suited to a dynamic network environment and can respond to dynamic traffic in time, and its joint route-and-queue scheduling mechanism widens the boundary for exploring an optimal scheme. Under the SDN-based network architecture, the global network view and the demand information of deterministic network flows are collected and input to the deep reinforcement learning agents, which compute a routing and queue scheduling scheme from the state information; the controller issues the scheme to the forwarding plane in the form of a SID label stack, so that where (route) and when (schedule) a data packet is forwarded can be controlled, achieving the deterministic service target.
Fig. 1 is a schematic diagram of a deterministic network routing and queue scheduling method according to an embodiment of the present disclosure, as shown in fig. 1, where an execution body of the deterministic network routing and queue scheduling method may be an electronic device, for example, a computer, etc., and the method includes:
step 101, creating an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path; A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network;
step 102, taking the global network state as the input of the evaluation network, taking the state value as the output of the evaluation network, taking the maximization of expected rewards as an optimization target, continuously updating the network, and selecting an optimal route and an optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path.
Specifically, fig. 2 is a schematic diagram of a system architecture according to an embodiment of the present disclosure. As shown in fig. 2, the present disclosure is oriented to a deterministic network scenario, uses the cyclic forwarding queue function to divide the queue scheduling period, and adopts a deep reinforcement learning model with the global network state as the input of the evaluation network, the state value as its output, and maximizing the expected reward as the optimization target, continuously updating the networks. To perform joint scheduling of routes and queues for deterministic services using multi-agent deep reinforcement learning, the forwarding path of a deterministic flow and the period offset information at the nodes along the way are specified by selecting the optimal route and the optimal forwarding queue. The algorithm creates two agents, one computing the forwarding path and one computing the forwarding period; following the multi-agent deep reinforcement learning framework, the agents share a reward, set as a composite index of the resource utilization variance and the forwarding delay. When a deterministic network flow is scheduled, if every link and the selected period in the scheme computed by the agents have sufficient capacity and the delay requirement is met, the demand can be allocated successfully, and the policy issuing module of the controller generates a SID label stack and issues it to the data layer to guide forwarding.
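A minimal sketch of the admission test just described, assuming per-link bookkeeping dictionaries (the data-structure and parameter names are hypothetical): a demand is accepted only if every link on the chosen path has residual bandwidth, every selected cycle has spare capacity, and the end-to-end delay bound holds.

def admit(flow, path, offsets, link_bw_free, cycle_cap_free,
          link_delay_us, base_period_us=10):
    # flow: dict with 'bw' (MB/s) and 'delay' (delay upper bound, us);
    # path: list of link ids; offsets: per-link cycle offsets from the agents.
    total_delay = 0.0
    for link, off in zip(path, offsets):
        if link_bw_free[link] < flow["bw"]:
            return False                        # not enough residual bandwidth
        if cycle_cap_free[(link, off)] < flow["bw"]:
            return False                        # the selected cycle is full
        total_delay += link_delay_us[link] + off * base_period_us
    return total_delay <= flow["delay"]         # bounded-delay requirement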
The invention characterizes a deterministic flow by the quintuple <src, dst, period, delay, bw>, whose fields respectively describe: source and destination ports, period, delay upper bound, and bandwidth. Within a CSQF-enabled device, each port reserves N_nd (N_nd = 3) queues for deterministic flows. A scheduling period C is divided at all nodes of the whole network with 10 μs as the basic period, i.e. the resources of each queue at each node are divided into C periods, so that the maximum end-to-end jitter is 20 μs, meeting standard ultra-low-latency requirements. Without loss of generality, assume that the cycles of the entire network start at the same time and that the hypercycle length C is the same for every port/link.
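The cycle arithmetic implied by this division can be sketched as follows; the hypercycle length C = 128 is an illustrative value, not one fixed by the disclosure.

BASE_PERIOD_US = 10   # basic period at every node, per the text
C = 128               # cycles per hypercycle (illustrative value)

def cycle_slot(arrival_cycle: int, offset: int) -> int:
    # Cycle index (mod the hypercycle) in which the packet is transmitted.
    return (arrival_cycle + offset) % C

def slot_start_us(slot: int) -> int:
    # Start time of a cycle slot within the hypercycle, in microseconds.
    return slot * BASE_PERIOD_US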
The invention abstracts the network as an undirected graph G = (V, ε, ε_c), where V and ε are the node set and edge set of the network, and ε_c is the set of link resources; the information of each link in ε_c comprises the link's residual bandwidth and its CSQF queue occupancy, and ε is the set of communication links between devices. F denotes the set of deterministic network flows; each flow is represented by the quintuple <src, dst, period, delay, bw>, whose fields respectively describe: source and destination ports, period, delay upper bound, and bandwidth. p_f and r_f denote the forwarding path and the forwarding period offset that the algorithm finally selects for data flow f ∈ F.
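In code, this notation might be captured by the following plain data structures; the field names are assumptions made for illustration.

from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class LinkState:
    residual_bw: float        # remaining bandwidth on the link
    cycle_load: List[float]   # CSQF queue occupancy per cycle (length C)

@dataclass
class Network:
    nodes: List[int]                          # vertex set V
    links: Dict[Tuple[int, int], LinkState]   # edge set with link resources

@dataclass
class Flow:
    src: int       # source port/node
    dst: int       # destination port/node
    period: float  # ms
    delay: float   # end-to-end delay upper bound, ms
    bw: float      # requested bandwidth, MB/s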
Fig. 3 is a schematic diagram of the MAPPO-based deterministic network flow routing and queue scheduling algorithm according to an embodiment of the present disclosure. As shown in fig. 3, agent A_r and agent A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network.
For an agent in the algorithm, in a given environment state it issues an action and obtains feedback from the environment, i.e. a reward, whereupon the environment state changes; in the new state the agent continues to issue actions and obtain feedback, interacting with the environment continuously. A denotes the set of agents, A_r denotes the agent that computes the route, and A_c denotes the agent that computes the forwarding period.
For agent A_r, the state is described as:
s_r = { (LU_e, D_e) | e ∈ ε }
where (LU_e, D_e) represents the status of network link e: LU represents the link utilization of the edge and D represents the end-to-end delay of the edge.
The action set is P_f, the set of candidate forwarding paths for flow f, and the action is described as:
a_r = p_f, p_f ∈ P_f
For agent A_c, the state is described analogously over the link resource set ε_c, i.e. the per-cycle CSQF queue occupancy of each link. The action set is R_f, the set of feasible forwarding period offsets for flow f, and the action is described as:
a_c = r_f, r_f ∈ R_f
agent sharing rewardsFor the comprehensive index of the resource utilization variance and the forwarding delay, the formula is described as follows:
wherein std (LU) represents the standard deviation of link utilization, f bw For deterministic network flows f, i.e. the bandwidth allocated to the service, D f Forwarding path p selected for deterministic service flow f for agent f And a forwarding period offset r f A subsequent propagation delay comprising two parts, (i) a sum of propagation delays of links between nodes(ii) Sum of period offsets of intermediate nodes +.> Representing the periodic offset of deterministic network flow f on edge e. f (f) delay For the upper end-to-end delay limit required by the deterministic network flow f, data can be transmitted only under the condition that the condition is guaranteed to be met, otherwise, the service request is refused, and alpha, beta and gamma are weight parameters.
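A sketch of this shared reward under one plausible reading of the description: a weighted combination of the link-utilization standard deviation, the allocated bandwidth, and the end-to-end delay, with the delay bound enforced as a hard constraint. The exact signs and combination are an assumption; alpha, beta and gamma correspond to the weight parameters α, β and γ named above.

import numpy as np

def reward(link_util, f_bw, d_f, f_delay, alpha=1.0, beta=1.0, gamma=1.0):
    if d_f > f_delay:
        return -1.0                      # request refused: delay bound violated
    balance = -alpha * float(np.std(link_util))  # favour even link utilization
    return balance + beta * f_bw - gamma * d_f   # reward bandwidth, punish delay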
Optionally, taking the global network state as an input of the evaluation network, taking the state value as an output of the evaluation network, taking the maximization of the expected rewards as an optimization target, and continuously updating the network, including:
initializing a network environment;
inputting the network state into agent A_r and agent A_c respectively, and obtaining the joint policy Action from the corresponding Actor networks;
executing the policy, obtaining the running state and global reward at the next moment, and storing the states before and after execution, the policy, and the reward into a buffer;
experience is obtained from the buffer and the Critic network and the Actor network are updated, respectively, when the training period is reached.
Optionally, the method further comprises:
in case the training period is not reached, inputting the network state into agent A_r and agent A_c again, and obtaining the joint policy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and initializing the network environment again in case that the iteration number does not reach the maximum value.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and under the condition that the iteration number reaches the maximum value, if the model converges, outputting an optimal route and an optimal forwarding queue.
Optionally, the method further comprises:
the training round number and the update period are determined, and the iteration variables are initialized.
Fig. 4 is a schematic flow chart of a deterministic network flow routing and queue scheduling algorithm based on MAPPO according to an embodiment of the disclosure, and as shown in fig. 4, the deterministic network flow routing and queue scheduling algorithm based on MAPPO specifically includes the following steps:
1. First, a topology model of the network and the computing node information are created. The topology has m vertices, e.g. m ≥ 30, and contains N computing nodes, e.g. N ≥ 8. Each port reserves N_nd (N_nd = 3) queues for deterministic flows; a scheduling period C with a 10 μs base period is divided at all nodes of the whole network, i.e. the resources of each queue at each node are divided into C periods, and all nodes of the whole network start their periods simultaneously. The link bandwidth in the topology is set to a uniform value of x MB/s, x ≥ 40.
2. Initialize variables: set the initial iteration number i = 0 and the maximum iteration number i_max, e.g. i_max ≥ 1,000,000, with i_max set according to actual requirements. Set the experience replay pool length to n, e.g. n ≥ 5000.
3. Create 2 agent object instances: one agent A_r computes the routing path and one agent A_c computes the forwarding period. Each agent corresponds to an Actor network in the MAPPO model, and the 2 agents share a Critic network; three-layer fully connected neural networks are used, with randomly initialized parameters.
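A sketch of the three-layer fully connected Actors and the shared Critic in PyTorch. The framework, the layer sizes, and the softmax policy head are implementation assumptions; the disclosure only specifies three fully connected layers with random initialization.

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        # Distribution over the agent's action set (assumed policy head).
        return torch.softmax(self.net(state), dim=-1)

class Critic(nn.Module):
    # Shared by both agents; maps the global state to a scalar value.
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state)

# Illustrative dimensions only.
actor_r = Actor(state_dim=64, action_dim=16)  # A_r: picks a candidate path
actor_c = Actor(state_dim=64, action_dim=8)   # A_c: picks a period offset
critic = Critic(state_dim=64)                 # one Critic shared by both agents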
4. Deterministic network flows are randomly generated; each deterministic network flow f is characterized by the quintuple <src, dst, period, delay, bw>, whose fields respectively describe: source and destination ports, period, delay upper bound, and bandwidth. The source node and destination node are created by selecting 2 values from the vertex set with equal probability, with packet lengths of 100-1500 B. The flow period is randomly drawn from the set {1, 2, 4, 8} ms; the delay upper bound follows a normal distribution with a minimum of 20 ms and a maximum of 50 ms; the bandwidth of the flow follows a normal distribution with a minimum of 5 MB/s and a maximum of 20 MB/s.
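Step 4 might be sketched as below; interpreting "normal distribution with minimum/maximum" as a clipped Gaussian, and the chosen mean and spread, are assumptions.

import random

def generate_flow(vertices):
    src, dst = random.sample(vertices, 2)          # equiprobable node pair
    period = random.choice([1, 2, 4, 8])           # ms
    delay = min(max(random.gauss(35, 7), 20), 50)  # ms, clipped to [20, 50]
    bw = min(max(random.gauss(12.5, 3), 5), 20)    # MB/s, clipped to [5, 20]
    return {"src": src, "dst": dst, "period": period,
            "delay": delay, "bw": bw}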
5. The initial State is set as the combination of the network topology, the deterministic network flow demands, the link resources, and the network link states.
6. Start the iteration and increase i by 1. Each agent generates an action. The agent A_r that computes the route selects, through its Actor network, the action array with the largest Q value from action set P_f: a_r = [p_f,k1, p_f,k2, …, p_f,|k|], where p_f,ki denotes a link number (1 to m) on the path assigned to network flow f, and the array size is the path length. The agent A_c that computes the forwarding period selects, from its Actor network's action set R_f, the action array with the largest Q value: a_c = [r_f,k1, r_f,k2, …, r_f,|k|], where r_f,ki denotes the forwarding period offset at the corresponding link assigned to network flow f, and the array size is the path length.
7. Recombine the forwarding path computed by agent A_r and the forwarding period given by agent A_c into the joint action Action = (a_r, a_c), which determines where (p_f) and when (r_f) the deterministic network flow is forwarded, and apply it to the network as the path and queue scheduling scheme.
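Steps 6 and 7 can be sketched as a greedy selection from each Actor followed by joint-action formation; actor_r and actor_c follow the earlier network sketch, and the argmax mirrors the "largest Q value" selection described in the text.

import torch

def select_joint_action(state, actor_r, actor_c):
    with torch.no_grad():
        a_r = actor_r(state).argmax(dim=-1)  # index of the chosen path p_f
        a_c = actor_c(state).argmax(dim=-1)  # index of the chosen offset r_f
    return a_r, a_c                          # joint Action = (a_r, a_c)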
8. Generate the corresponding deterministic network flow in the network and obtain the link utilization, the transmission delay, and the residence delay at each node in the network, thereby computing the variance of the link utilization and the total delay D_f; the reward (Reward) is derived from the specified weighted index of the two.
9. Set State' as the combination of the network topology, the deterministic network flow demands, the link resources, and the network link states after execution; store the tuple (State, Action = (a_r, a_c), Reward, State') in the experience replay pool for the model's iterative learning.
10. Repeating the steps 4 to 9 until the experience playback pool is full.
11. Each agent updates its policy from the experience replay pool; the update process is as follows:
12. Input the latest State' obtained in step 11 into the Critic network to obtain the v' value of that state, and compute the discounted reward R[t] = r[t] + δ·r[t+1] + … + δ^(T−t)·r[T], obtaining R = [R[0], R[1], …, R[t], …, R[T]], where T is the last time step and δ is the discount factor.
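The discounted-return computation of step 12 can be sketched as a backward recursion:

def discounted_returns(rewards, delta=0.99):
    # R[t] = r[t] + delta * R[t+1], computed from the last time step backwards.
    returns, acc = [0.0] * len(rewards), 0.0
    for t in range(len(rewards) - 1, -1, -1):
        acc = rewards[t] + delta * acc
        returns[t] = acc
    return returns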
13. Input all states in the experience replay pool into the Critic network to obtain the V values of all states, and compute the advantage function A[t] = R[t] − V[t].
14. Compute the loss function of the Critic network and update the Critic network by backpropagation; the Critic loss is the mean squared error between the discounted reward and the value estimate:
L_critic = mean( (R[t] − V[t])² )
15. For each agent's Actor network, input all stored State combinations into the Actor-old and Actor-new networks (which have the same structure) to obtain normal distributions Normal1 and Normal2 respectively; combine all stored actions into Actions and evaluate them under Normal1 and Normal2 to obtain prob1 and prob2 for each action; the importance weight ratio is then obtained by dividing prob2 by prob1.
16. For each agent's Actor network, compute the loss function of the Actor network, backpropagate, and update the Actor-new network; the Actor loss is the clipped surrogate objective:
L_actor = −mean( min( ratio · A, clip(ratio, 1 − ε, 1 + ε) · A ) )
where ratio is the importance weight obtained in step 15, ε is the clipping coefficient, and clip(ratio, 1 − ε, 1 + ε) truncates ratios that fall outside the range (1 − ε, 1 + ε).
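A sketch of the two losses in steps 14-16, assuming the standard PPO forms: mean-squared error for the Critic and the clipped surrogate objective for each Actor.

import torch

def critic_loss(values, returns):
    # Regress the value estimate onto the discounted return R[t].
    return torch.mean((returns - values) ** 2)

def actor_loss(ratio, advantage, eps=0.2):
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # Negative sign because optimizers minimize while PPO maximizes.
    return -torch.mean(torch.min(ratio * advantage, clipped * advantage))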
17. Steps 15 and 16 are repeated, for example, 10 times, and the Actor-old network is updated with the parameters of the Actor-new network.
18. Judge whether the iteration number i exceeds the maximum iteration number i_max; if not, return to step 4 and continue iterating; if so, the algorithm ends, and it can output the optimal route and the optimal forwarding queue according to the input state.
The invention is oriented to deterministic networks and uses the MAPPO-based deterministic network flow routing and queue scheduling algorithm to realize forwarding of deterministic network flows: one agent is responsible for computing the forwarding path, one agent is responsible for computing the forwarding period at each node along the path, and the two agents share the reward of the joint action. After repeated iterative learning, deterministic service can be guaranteed while avoiding the risk of degraded overall user experience caused by overloaded computing nodes or network congestion. The deterministic network flow scheduling algorithm designed by the invention can adapt to the dynamics of the environment, needs no manually built complex static model, and can adjust the scheduling strategy in real time to suit a new environment.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present disclosure, the devices interacting with each other to accomplish the methods.
It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same inventive concept, the present disclosure also provides a deterministic network routing and queue scheduling apparatus corresponding to the method of any embodiment described above.
Fig. 5 is a schematic diagram of a deterministic network routing and queue scheduling apparatus according to an embodiment of the present disclosure, as shown in fig. 5, including a model building module 501 and a scheduling module 502, wherein:
The model building module 501 is configured to create an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path; A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network;
the scheduling module 502 is configured to take the global network state as an input of the evaluation network, the state value as an output of the evaluation network, and the maximization of the expected rewards as an optimization target, continuously update the network, and select an optimal route and an optimal forwarding queue to specify a forwarding path of the deterministic flow and cycle offset information at each node along the path.
Optionally, taking the global network state as an input of the evaluation network, taking the state value as an output of the evaluation network, taking the maximization of the expected rewards as an optimization target, and continuously updating the network, including:
initializing a network environment;
inputting the network state into agent A_r and agent A_c respectively, and obtaining the joint policy Action from the corresponding Actor networks;
executing the policy, obtaining the running state and global reward at the next moment, and storing the states before and after execution, the policy, and the reward into a buffer;
experience is obtained from the buffer and the Critic network and the Actor network are updated, respectively, when the training period is reached.
Optionally, the method further comprises:
in case the training period is not reached, inputting the network state into agent A_r and agent A_c again, and obtaining the joint policy Action from the corresponding Actor networks.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and initializing the network environment again in case that the iteration number does not reach the maximum value.
Optionally, the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and under the condition that the iteration number reaches the maximum value, if the model converges, outputting an optimal route and an optimal forwarding queue.
Optionally, the method further comprises:
the training round number and the update period are determined, and the iteration variable is initialized.
Optionally, the reward is a composite index of the resource utilization variance and the forwarding delay.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of the various modules may be implemented in the same one or more pieces of software and/or hardware when implementing the present disclosure.
The device of the foregoing embodiment is configured to implement the corresponding deterministic network routing and queue scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein.
Based on the same inventive concept, the present disclosure also provides an electronic device corresponding to the method of any embodiment, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the deterministic network routing and queue scheduling method according to any embodiment.
Fig. 6 shows a more specific hardware architecture of an electronic device according to this embodiment, where the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (read only memory), RAM (random access memory), a static storage device, a dynamic storage device, or the like. Memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in memory 1020 and executed by processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
Communication interface 1040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding deterministic network routing and queue scheduling method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, corresponding to any of the above embodiments of the method, the present disclosure further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the deterministic network routing and queue scheduling method as described in any of the above embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the deterministic network routing and queue scheduling method according to any one of the foregoing embodiments, and has the advantages of the corresponding method embodiments, which are not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined under the idea of the present disclosure, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in details for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the embodiments of the disclosure, are intended to be included within the scope of the disclosure.

Claims (8)

1. A deterministic network routing and queue scheduling method, comprising:
creating an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path; A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network; wherein the reward is a composite index of the resource utilization variance and the forwarding delay;
taking the global network state as the input of the evaluation network, the state value as the output of the evaluation network, and maximizing the expected reward as the optimization target, continuously updating the networks, and selecting the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path;
wherein the continuously updating the network with the global network state as the input of the evaluation network and the state value as the output of the evaluation network with the maximum expected rewards as the optimization target further comprises:
initializing a network environment;
inputting the network state into agent A_r and agent A_c respectively, and obtaining the joint policy Action from the corresponding Actor networks;
executing the policy, obtaining the running state and global reward at the next moment, and storing the states before and after execution, the policy, and the reward into a buffer;
experience is obtained from the buffer and the Critic network and the Actor network are updated, respectively, when the training period is reached.
2. The deterministic network routing and queue scheduling method according to claim 1, wherein the method further comprises:
in case the training period is not reached, inputting the network state into agent A_r and agent A_c again, and obtaining the joint policy Action from the corresponding Actor networks.
3. The deterministic network routing and queue scheduling method according to claim 1, wherein the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and initializing the network environment again in case that the iteration number does not reach the maximum value.
4. The deterministic network routing and queue scheduling method according to claim 1, wherein the method further comprises:
judging whether the iteration number reaches the maximum value or not;
and under the condition that the iteration number reaches the maximum value, if the model converges, outputting an optimal route and an optimal forwarding queue.
5. The deterministic network routing and queue scheduling method according to claim 1, wherein the method further comprises:
the training round number and the update period are determined, and the iteration variable is initialized.
6. A deterministic network routing and queue scheduling apparatus, comprising:
a model building module, configured to create an agent A_r that computes the forwarding path and an agent A_c that computes the forwarding period at each node along the path; A_r and A_c share a reward; in the multi-agent proximal policy optimization (MAPPO) model, A_r and A_c each correspond to an Actor network, and A_r and A_c share a Critic network; wherein the reward is a composite index of the resource utilization variance and the forwarding delay;
a scheduling module, configured to take the global network state as the input of the evaluation network and the state value as the output of the evaluation network, with maximizing the expected reward as the optimization target, continuously update the networks, and select the optimal route and the optimal forwarding queue to specify the forwarding path of the deterministic flow and the period offset information at each node along the path;
wherein the scheduling module is further configured to initialize the network environment; input the network state into agent A_r and agent A_c respectively and obtain the joint policy Action from the corresponding Actor networks; execute the policy, obtain the running state and global reward at the next moment, and store the states before and after execution, the policy, and the reward into a buffer; and, when the training period is reached, obtain experience from the buffer and update the Critic network and the Actor networks respectively.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the deterministic network routing and queue scheduling method according to any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the deterministic network routing and queue scheduling method according to any one of claims 1 to 5.
CN202210822548.9A 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device Active CN115484205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210822548.9A CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210822548.9A CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Publications (2)

Publication Number Publication Date
CN115484205A CN115484205A (en) 2022-12-16
CN115484205B true CN115484205B (en) 2023-12-01

Family

ID=84422719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210822548.9A Active CN115484205B (en) 2022-07-12 2022-07-12 Deterministic network routing and queue scheduling method and device

Country Status (1)

Country Link
CN (1) CN115484205B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN118018382B (en) * 2024-04-09 2024-06-21 南京航空航天大学 Collaborative management method for distributed deterministic controllers in large-scale wide-area open network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765096A (en) * 2018-06-07 2018-11-06 冯瑞新 Shared Toilet system under a kind of cloud network
CN112437020A (en) * 2020-10-30 2021-03-02 天津大学 Data center network load balancing method based on deep reinforcement learning
AU2021101685A4 (en) * 2021-04-01 2021-05-20 Arun Singh Chouhan Design and development of real time automated routing algorithm for computer networks
CN113328938A (en) * 2021-05-25 2021-08-31 电子科技大学 Network autonomous intelligent management and control method based on deep reinforcement learning
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN114286413A (en) * 2021-11-02 2022-04-05 北京邮电大学 TSN network combined routing and stream distribution method and related equipment
CN114499648A (en) * 2022-03-10 2022-05-13 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9628504B2 (en) * 2015-03-09 2017-04-18 International Business Machines Corporation Deploying a security appliance system in a high availability environment without extra network burden
US10681128B2 (en) * 2016-10-12 2020-06-09 Cisco Technology, Inc. Deterministic stream synchronization
CN111694365B (en) * 2020-07-01 2021-04-20 武汉理工大学 Unmanned ship formation path tracking method based on deep reinforcement learning
CN114745317B (en) * 2022-02-09 2023-02-07 北京邮电大学 Computing task scheduling method facing computing power network and related equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765096A (en) * 2018-06-07 2018-11-06 冯瑞新 Shared Toilet system under a kind of cloud network
CN112437020A (en) * 2020-10-30 2021-03-02 天津大学 Data center network load balancing method based on deep reinforcement learning
AU2021101685A4 (en) * 2021-04-01 2021-05-20 Arun Singh Chouhan Design and development of real time automated routing algorithm for computer networks
CN113328938A (en) * 2021-05-25 2021-08-31 电子科技大学 Network autonomous intelligent management and control method based on deep reinforcement learning
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113791634A (en) * 2021-08-22 2021-12-14 西北工业大学 Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN114286413A (en) * 2021-11-02 2022-04-05 北京邮电大学 TSN network combined routing and stream distribution method and related equipment
CN114499648A (en) * 2022-03-10 2022-05-13 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Routing strategy for electric power communication networks based on deep reinforcement learning; Zhu Xiaoqin; Yuan Hui; Wang Weizhou; Wei Feng; Zhang Xun; Zhao Jinxiong; Science and Technology Innovation (36); full text *
Research progress in multi-agent game learning; Luo Junren; Systems Engineering and Electronics; full text *

Also Published As

Publication number Publication date
CN115484205A (en) 2022-12-16

Similar Documents

Publication Publication Date Title
Shu et al. Multi-user offloading for edge computing networks: A dependency-aware and latency-optimal approach
CN115484205B (en) Deterministic network routing and queue scheduling method and device
CN114286413B (en) TSN network joint routing and stream distribution method and related equipment
Shu et al. Dependency-aware and latency-optimal computation offloading for multi-user edge computing networks
Kim et al. Multi-agent reinforcement learning-based resource management for end-to-end network slicing
Wu et al. Multi-agent DRL for joint completion delay and energy consumption with queuing theory in MEC-based IIoT
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
Jamil et al. IRATS: A DRL-based intelligent priority and deadline-aware online resource allocation and task scheduling algorithm in a vehicular fog network
EP4024212B1 (en) Method for scheduling inference workloads on edge network resources
Beraldi et al. Sequential randomization load balancing for fog computing
Zheng et al. Learning based task offloading in digital twin empowered internet of vehicles
CN116541106A (en) Computing task unloading method, computing device and storage medium
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
Zhang et al. Employ AI to improve AI services: Q-learning based holistic traffic control for distributed co-inference in deep learning
Chen et al. Twin delayed deep deterministic policy gradient-based intelligent computation offloading for IoT
De Mendoza et al. Near optimal VNF placement in edge-enabled 6G networks
Aktas et al. Scheduling and flexible control of bandwidth and in-transit services for end-to-end application workflows
Liu et al. Deep reinforcement learning based adaptive transmission control in vehicular networks
Bensalem et al. Towards optimal serverless function scaling in edge computing network
CN116582407A (en) Containerized micro-service arrangement system and method based on deep reinforcement learning
CN115225512A (en) Multi-domain service chain active reconstruction mechanism based on node load prediction
Belkout et al. A load balancing and routing strategy in fog computing using deep reinforcement learning
Mukherjee et al. Timed loops for distributed storage in wireless networks
CN116418808A (en) Combined computing unloading and resource allocation method and device for MEC
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant