CN113992595B - SDN data center congestion control method based on prioritized experience replay DQN - Google Patents
SDN data center congestion control method based on prioritized experience replay DQN
- Publication number
- CN113992595B CN113992595B CN202111348335.9A CN202111348335A CN113992595B CN 113992595 B CN113992595 B CN 113992595B CN 202111348335 A CN202111348335 A CN 202111348335A CN 113992595 B CN113992595 B CN 113992595B
- Authority
- CN
- China
- Prior art keywords
- network
- dqn
- congestion control
- data center
- sdn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2425—Traffic characterised by specific attributes, e.g. priority or QoS for supporting services specification, e.g. SLA
- H04L47/2433—Allocation of priorities to traffic types
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/25—Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a congestion control method for SDN data center networks based on prioritized experience replay DQN. The method builds on the SDN network background and a flow-based congestion control idea, and is intelligent, centralized and proactive. A congestion control algorithm based on prioritized experience replay DQN is introduced and improved, providing an intelligent solution to the congestion control problem of SDN data center networks. The controller globally allocates rates to the flows of the whole network, so that the whole network avoids congestion while the utilization of the data links is kept as high as possible, thereby realizing congestion control for the entire data center. Compared with a Q-learning-based method, the method avoids the curse of dimensionality of the Q table; compared with DQN- and DDQN-based methods, it achieves better convergence speed and convergence quality. The invention is a congestion control method adapted to SDN data center networks and in line with the trend toward intelligent networks.
Description
Technical Field
The invention relates to the technical field of network communication, in particular to a congestion control method for SDN (Software Defined Network) data center networks (Data Center Network, DCN) based on prioritized experience replay (Prioritized Experience Replay) DQN (Deep Q-learning Network).
Background
The SDN architecture has been widely adopted by data center networks as a future network architecture. With the development of big data and cloud computing, the number of nodes and flows in an SDN data center network keeps growing, and the data center is at risk of network congestion. DQN gathers experience through exploration and updates the parameters of the neural network by experience replay, but the uniform random sampling strategy it employs is not optimal in certain situations. When DQN is introduced into congestion control, the action that can be taken at each allocation has many possible choices, so it is difficult to obtain, during random exploration, experience samples that both achieve high link utilization and avoid congestion, which makes training of the neural network difficult. For this reason, a new SDN data center network congestion control method needs to be designed to solve this problem.
Disclosure of Invention
The invention aims to solve, in an intelligent way, the congestion control problem of a data center network based on the SDN architecture, and provides a congestion control method for SDN data center networks based on prioritized experience replay DQN.
The idea of the invention is to introduce and improve a congestion control algorithm based on prioritized experience replay DQN, providing an intelligent solution to the congestion control problem of SDN data center networks. The controller globally allocates rates to the flows of the whole network, so that the whole network avoids congestion while the utilization of the data links is kept as high as possible, thereby realizing congestion control for the entire data center.
The aim of the invention is realized by the following technical scheme:
An SDN data center congestion control method based on prioritized experience replay DQN, comprising the following steps:
S1, deploying a congestion control agent based on a deep Q network in the SDN controller, thereby introducing the prioritized experience replay DQN algorithm into a data center based on a software defined network;
S2, training the deep Q network, the training process comprising S21-S24:
S21, setting the input of the deep Q network to be link state information and flow state information, and the output to be the Q values corresponding to different actions, where different actions represent allocating different rates to a flow, and the reward function is a composite function that balances link utilization and link congestion;
S22, randomly constructing an arbitrary initial link state and an arbitrary set of flows with their rate requirements, to construct a scenario;
S23, constructing a SumTree for storing experiences and labeling the priority of each experience;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved prioritized experience replay DQN algorithm, so that the SDN controller can, through the deep Q network, maximize data link utilization while ensuring that the data center does not become congested; the improvement consists in adding, on the basis of the prioritized experience replay DQN algorithm, a check of whether any link is congested before each step of each scenario of network training ends; if so, the scenario is ended immediately, and if not, the next step continues; a scenario means allocating rates to an entire set of flows, and each step within a scenario means allocating a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, link state information and the state information of flows whose rates are to be allocated, inputs them into the trained deep Q network, determines the optimal action for each flow according to its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
Preferably, the SDN controller is connected to network devices of an SDN data plane through a southbound interface to implement centralized control.
Preferably, the reward function is:
wherein reward_m represents the reward value, min represents the minimum-value operation, and lkCap_m represents the link utilization.
preferably, the preferential empirical playback DQN algorithm is a DQN algorithm that replaces the empirical playback mechanism with preferential empirical playback.
Preferably, the priority mark is an empirical importance mark determined according to TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the time sequence difference.
Preferably, in S3, the method by which the SDN controller performs global congestion control comprises the following steps:
S31, obtaining from the SDN data plane the rate requirements and routing information of the N flows whose rates are currently to be allocated, and at the same time obtaining the link states of the current SDN data center network, i.e., the link bandwidth occupation;
S32, selecting one flow from the N flows to be allocated, inputting its flow information and the current link state into the deep Q network trained in S2, and selecting the optimal action to execute according to the output of the deep Q network;
S33, updating the current link state, and at the same time recording the mapping between the current flow and its allocated rate;
S34, judging whether all N flows have been allocated; if not, returning to S32 and continuing to loop through S32 and S33 until rates have been allocated to all flows; if allocation has been completed, executing S35;
S35, outputting the flow/rate allocation mapping table of the N flows, which the SDN controller uses as the allocated rate of each flow.
The beneficial effects of the invention are as follows: the invention provides an intelligent solution to the congestion control problem of SDN data centers based on prioritized experience replay DQN, and congestion control can be performed in a centralized, proactive and intelligent manner according to the load changes of the data center network links. The invention overcomes the weak multidimensional perception of reinforcement learning and avoids the curse of dimensionality of the Q table; at the same time, by introducing a prioritized experience replay mechanism, it achieves better convergence speed and convergence quality than the conventional DQN algorithm. The method globally allocates rates to the flows of the whole network through the controller, so that the whole network avoids congestion while keeping data link utilization as high as possible, thereby realizing congestion control for the entire data center.
Drawings
Fig. 1 is a diagram of a congestion control system architecture in an embodiment.
Fig. 2 is a data center network topology as employed by an embodiment.
Fig. 3 is a flowchart of a training algorithm.
Fig. 4 is a flow chart of a congestion control method.
Fig. 5 is a diagram of bandwidth variation of an embodiment.
Fig. 6 shows a comparison of link utilization between different algorithms for different numbers of flows (in each group of three columns, from left to right: DQN, DDQN, PRIO).
Fig. 7 shows a comparison of the convergence speed of the different algorithms.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and the detailed description. The technical features of the embodiments of the invention can be combined with one another provided there is no conflict.
For ease of description, basic definitions of terms used in the remainder of the invention are explained below.
An SDN data center refers to a data center that adopts the software defined network technical architecture.
In the invention, the congestion control problem of the SDN data center refers to flow-based congestion control, i.e., the SDN controller globally allocates rates to all flows so that the rate requirements of the flows are satisfied as far as possible while ensuring that the whole data center network does not become congested.
In the present invention, the prioritized experience replay DQN algorithm is a DQN algorithm based on prioritized experience replay, where prioritized experience replay is an improvement of the experience replay mechanism in the DQN algorithm and belongs to the prior art. An experience refers to a stored training sample. The DQN algorithm belongs to the prior art and is a classical algorithm in deep reinforcement learning. Deep reinforcement learning is a learning method that combines the advantages of deep learning and reinforcement learning to solve problems with high-dimensional raw inputs and decision control.
In a preferred embodiment of the present invention, there is provided a DQN-based SDN data center congestion control method, the method comprising the following steps:
Step 1: a congestion control agent based on a deep Q network is deployed in the SDN controller, introducing the prioritized experience replay DQN algorithm into a data center based on a software defined network.
The SDN controller is the control decision part of the SDN network and is connected to the network devices of the SDN data plane through a southbound interface, so as to centrally control the rate allocation of flows in the whole network. The congestion control agent deployed in the SDN controller performs flow rate allocation based on the deep Q network DQN, and the deep Q network in the invention executes the prioritized experience replay DQN algorithm to solve the congestion control problem of the SDN data center.
Step 2: the deep Q network is trained with the improved prioritized experience replay DQN algorithm. The specific training process comprises the following steps:
2-1. Determine the input and output of the deep Q network. The input of the deep Q network is the link state information and the flow state information, and the output is the Q values corresponding to different actions, where different actions represent allocating different rates to a flow.
2-2. Set the reward function. Since congestion control aims both to keep the links from becoming congested and to make link utilization as high as possible, the reward function is set as a composite function that takes both link utilization and link congestion into account. The form of the reward function is not limited, as long as it balances link utilization against link congestion. In a subsequent example of the invention, one possible form of the reward function is:
wherein reward_m represents the reward value, min() represents the minimum-value operation over its argument, and lkCap_m represents the link utilization.
2-3. Improve the prioritized experience replay DQN algorithm to form the improved prioritized experience replay DQN algorithm. The improvement is made on the basis of the prioritized experience replay DQN algorithm; specifically, during network training, a check of whether any link is congested is added before each step of each scenario ends. If a link is congested, the scenario is ended and the subsequent steps of that scenario are no longer executed; otherwise the next step continues. Here, a step means allocating a rate to one flow, and a scenario means completing rate allocation for an entire set of flows, i.e., every flow in the set has been allocated a rate.
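For illustration only, the modified episode loop might look like the following minimal Python sketch; the environment interface (env.reset, env.assign_rate), the agent methods, and all names are assumptions and not taken from the patent.

```python
# Sketch of the modified training episode: before each step of a scenario finishes,
# check whether any link is congested; if so, end the scenario (episode) early.
def run_episode(env, agent, flows):
    state = env.reset(flows)                 # random initial link loads + flow queue
    for flow in flows:                       # one "step" = assign a rate to one flow
        action = agent.select_action(state, flow)            # epsilon-greedy over rates
        next_state, reward, congested = env.assign_rate(flow, action)
        agent.store(state, action, reward, next_state, congested)
        if congested:                        # improvement: terminate the scenario at once
            break
        state = next_state
```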
This improved prioritized experience replay DQN algorithm will subsequently be used to train the deep Q network.
2-4. Construct scenarios: randomly construct an arbitrary initial link state and randomly construct an arbitrary set of flows with their rate requirements.
2-5. Construct a SumTree for storing experiences and label the priority of each experience. The SumTree here refers to a binary-tree data structure, and the priority label is an experience-importance label determined from the TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal-difference computation.
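As an illustration of the SumTree idea only, a minimal Python sketch (capacity handling and method names are assumptions, not part of the patent) could be:

```python
import numpy as np

class SumTree:
    """Minimal sum-tree: leaves hold priorities, internal nodes hold sums,
    so sampling proportional to priority takes O(log n)."""
    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves (stored experiences)
        self.tree = np.zeros(2 * capacity - 1)   # priorities; leaves start at index capacity-1
        self.data = [None] * capacity            # experience tuples (phi, a, r, phi_next, done)
        self.write = 0                           # next leaf to overwrite

    def add(self, priority, experience):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = experience
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                         # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, s):
        """Descend from the root with a value s in [0, total priority)."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # until a leaf is reached
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    def total(self):
        return self.tree[0]
```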
2-6. Select experiences from the SumTree according to priority, and train the deep Q network with the improved prioritized experience replay DQN algorithm, i.e., during training a check of whether any link is congested is added before each step of each scenario ends; if so, the scenario is ended immediately, and if not, the next step continues. The other training procedures of the prioritized experience replay DQN algorithm are the same as in the prior art.
Step 3: the SDN controller collects link state information and flow state information of rates to be distributed from an SDN data plane in real time, inputs the link state information and the flow state information of the rates to be distributed into a trained deep Q network, determines optimal actions according to each flow Q value and generates a flow rate distribution scheme, and therefore overall congestion control is conducted on an SDN data center network.
In this embodiment, the method by which the SDN controller performs global congestion control in step 3 comprises the following steps:
3-1. Obtain the rate requirements and routing information of the N flows whose rates are currently to be allocated from the SDN data plane, and at the same time obtain the link states of the current SDN data center network, i.e., the link bandwidth occupation;
3-2. Select one flow from the N flows to be allocated, input its flow information and the current link state into the deep Q network trained in step 2, and select the optimal action to execute according to the output of the deep Q network;
3-3. Update the current link state and at the same time record the mapping between the current flow and its allocated rate;
3-4. Judge whether all N flows have been allocated: if not, return to step 3-2 and continue looping through steps 3-2 and 3-3 until rates have been allocated to all flows; if allocation has been completed, execute step 3-5;
3-5. Output the flow/rate allocation mapping table of the N flows, which the SDN controller uses to allocate rates to all flows. In this way the allocated rates can both satisfy the rate requirements of the flows as far as possible and effectively avoid congestion, achieving global congestion control of the data center.
The core of the method provided by the invention is to introduce an improved prioritized experience replay DQN algorithm into the SDN data center to solve the congestion control problem, so that the SDN controller can perform congestion control in a centralized, proactive and intelligent manner according to the load changes of the data center network links, overcoming the weak multidimensional perception of reinforcement learning and avoiding the curse of dimensionality of the Q table. At the same time, by introducing the prioritized experience replay mechanism, the invention achieves better convergence speed and convergence quality than the conventional DQN algorithm. The method globally allocates rates to the flows of the whole network through the controller, so that the whole network avoids congestion while keeping data link utilization as high as possible, thereby realizing congestion control of the entire data center. Specifically, the method has the following advantages:
1. Intelligence. Intelligence here means that, by introducing prioritized experience replay DQN, the algorithm complexity is reduced compared with congestion control methods based on optimization theory; compared with congestion control methods based on reinforcement-learning Q-learning, it avoids the curse of dimensionality of the Q table.
2. End-to-end congestion control. End-to-end congestion control here means performing congestion control by globally allocating the rates of end-to-end flows, rather than adding congestion control enhancements at intermediate nodes such as routers.
3. Centralized, network-arbitrated congestion control. This means that the congestion control method is deployed on the SDN controller, makes full use of the advantages of SDN centralized control, and performs global end-to-end flow rate allocation according to the state of the whole network. In contrast, traditional TCP-based congestion control is distributed: each end system detects congestion through its TCP connection and separately performs congestion avoidance based on the slow-start algorithm.
4. Flow granularity. Flow granularity means that the object of congestion control is the flow, matching the flow-based characteristics of the next-generation SDN architecture, whereas traditional TCP-based congestion control operates on IP packets.
5. Proactive congestion control. Proactive congestion control means that the congestion control method of the invention collects the state of the whole network together with the current flows and their rate demands through the controller, and combines the two to perform global rate allocation. The goal is to make the utilization of the whole network's links as high as possible while avoiding congestion as far as possible; hence the method is proactive. In contrast, the TCP congestion control algorithm takes action only after congestion is detected, and is therefore reactive.
6. A prioritized experience replay strategy is employed. The prioritized experience replay strategy uses proportional prioritization to replay experiences, in order to obtain experience samples that both avoid congestion and achieve high link utilization. The traditional DQN-based congestion control method uses a uniform sampling strategy, which makes it hard to obtain good experience samples.
Examples
In order to help those skilled in the art understand and implement the present invention and to show the technical effects of the DQN-based SDN data center congestion control method of the above embodiment, the method of the above embodiment is applied below to a specific test application, and the advantages of the invention in practical use are further described with reference to the accompanying drawings and data.
The test application introduces the prioritized experience replay DQN algorithm into a data center based on a software defined network and solves the congestion control problem in real time. Fig. 1 is a schematic diagram of the congestion control system: the SDN controller is the control decision part of the SDN network, performs centralized control over the network devices of the SDN data plane through the southbound interface, i.e., the control-forwarding communication interface, and provides flexible programmability. In the invention, a congestion control agent is deployed in the SDN controller to introduce the prioritized experience replay DQN algorithm into the data center based on a software defined network; link state information and flow state information of the data forwarding plane are collected through the southbound interface, the state information is input into the neural network to generate a flow rate allocation scheme, and the allocation scheme is then issued to the network devices of the data plane through the southbound interface. On the premise of ensuring that no link becomes congested, the link utilization is made as high as possible.
Fig. 2 is the network topology of the SDN data center used in this test application. The whole network has 8 links, each with a bandwidth of 40G. The flow queue length used in this test application example is 28.
In the test application example, the specific DQN-based SDN data center congestion control method comprises the following steps:
step 1: the DQN algorithm is introduced to SDN data center congestion control problems.
In the congestion control system architecture based on prioritized experience replay DQN shown in fig. 1, the overall process can be described as follows: first, the SDN controller obtains, in real time through the southbound interface, the state information of every link in the data center network and the state information of the flows to be allocated from the data plane; then, by deploying a congestion control agent in the SDN controller, the prioritized experience replay DQN algorithm is introduced into the data center based on a software defined network, and the link state information and flow state information collected from the southbound interface are input into the neural network to generate a flow rate allocation scheme; finally, the SDN controller issues the allocation scheme to the network devices of the data plane through the southbound interface.
Step 2: based on the improved DQN algorithm, the deep Q network is trained according to steps 2-1 to 2-6.
2-1. Determine the input and output of the deep Q network. The input of the deep Q neural network is the link states and the state of the flow whose rate is currently to be allocated, namely the sequence number of the current flow and the path the flow traverses. The output is the Q values corresponding to the different actions, i.e., to allocating different rates to the flow.
Fig. 2 is the network topology diagram of this test application. In the figure, the 28 flows traverse, respectively, L1-L2, L1-L3, ..., L6-L8 and L7-L8, and each link has a bandwidth of 40G. On the premise of satisfying the rate requirements of all flows as far as possible, the DQN-based congestion control method is used to allocate a rate to each flow while ensuring that the network does not become congested.
In this test application example, the states of the 8 links and the 28 flows are taken as input, fed into the neural network of the DQN algorithm for training, and a mapping table of flows to rates is output.
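As a purely illustrative sketch (the framework, layer sizes and the exact encoding of the flow state are assumptions), a deep Q network with this input/output layout could be defined as follows:

```python
import torch
import torch.nn as nn

NUM_LINKS, NUM_FLOWS, NUM_ACTIONS = 8, 28, 6   # 6 candidate rates: 0G..5G

class QNetwork(nn.Module):
    """Plain DNN: input = link loads + one-hot flow index + flow route mask,
    output = one Q value per candidate rate."""
    def __init__(self, hidden=64):
        super().__init__()
        in_dim = NUM_LINKS + NUM_FLOWS + NUM_LINKS   # link state + flow id + route mask
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, NUM_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)
```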
2-2. Set the reward function. Since congestion control aims both to keep links from becoming congested and to make link utilization as high as possible, the reward function is set as a function that takes both link utilization and link congestion into account. In this test application, the reward function is given as:
wherein reward_m represents the reward value, min() represents the minimum-value operation over its argument, lkCap_m represents the link utilization, done being true indicates that a link is congested, and done being false indicates that no link is congested.
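The formula itself appears only as an image in the published text, so the following Python sketch is merely one plausible reading of the description; the penalty value for the congested case is an assumption.

```python
import numpy as np

def reward(link_load, link_cap, done):
    """One plausible reading of the described reward: when no link is congested,
    return the smallest per-link utilization lkCap_m (pushing utilization up on
    every link); when done is true (a link is congested), return a penalty."""
    if done:
        return -1.0                                            # assumed penalty for congestion
    lk_cap = np.asarray(link_load) / np.asarray(link_cap)      # per-link utilization lkCap_m
    return float(lk_cap.min())                                 # min over links, as in the text
```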
2-3. Improve the DQN algorithm based on prioritized experience replay. As mentioned above, the improvement here means that, on the basis of the DQN algorithm with prioritized experience replay, a check of whether any link is congested is added before each step of each training scenario ends, and if so, the scenario is ended. A step means allocating a rate to one flow, and a scenario means completing rate allocation for the entire set of flows. The improved DQN algorithm based on prioritized experience replay is referred to as the improved prioritized experience replay DQN algorithm.
The rest of the DQN algorithm based on prioritized experience replay is the same as the prior art. For ease of understanding, a training flowchart of the improved prioritized experience replay DQN algorithm is given in fig. 3, comprising the following steps:
2-3-1. Initialize the memory for storing training samples, the action-value-function neural network, and the target action-value-function neural network.
In this test application example, the capacity for storing training samples is set to 4500, and a basic DNN is used both for the action-value-function network and for the target action-value-function network. The initial load of the 8 links is [21, 27, 20, 28, 22, 26, 23, 18].
2-3-2. Select an action by the ε-greedy policy: with one probability choose a random action; otherwise input the current link state, compute the Q value of each action, and, taking the routing information of the current flow into account, select the action with the largest Q value (the optimal action) to execute.
In this test application, the probability ε is set to 0.9, the learning rate to 0.001, and the selected action set is [0G, 1G, 2G, 3G, 4G, 5G], i.e., the rate allocated to each flow is chosen as one of these six rates.
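A minimal ε-greedy selection sketch under these settings is shown below; whether ε denotes the greedy or the exploration probability is ambiguous in the text, so the sketch assumes acting greedily with probability ε = 0.9, and the feasibility mask used to respect the flow's route is an assumption.

```python
import random
import numpy as np

ACTIONS_G = [0, 1, 2, 3, 4, 5]   # candidate rates in Gbit/s
EPSILON = 0.9                     # assumed: probability of taking the greedy action

def select_action(q_net, state, feasible):
    """Epsilon-greedy over the six candidate rates.
    feasible: boolean list marking which rates are admissible for this flow's route."""
    feasible = np.asarray(feasible, dtype=bool)
    if random.random() >= EPSILON:                     # explore: random feasible rate
        return int(random.choice(np.flatnonzero(feasible)))
    q = np.asarray(q_net(state), dtype=float)          # exploit: greedy on Q values
    q[~feasible] = -np.inf
    return int(np.argmax(q))
```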
2-3-3. Obtain the reward r_t received after executing a_t, the next input φ_{t+1}, and the link congestion decision label done (done is true if a link is congested, otherwise false). Iteratively update the target action-value-function parameters to the current action-value-function parameters.
In this test application, φ_t, a_t, r_t, φ_{t+1} and done represent, respectively, the current state feature, the current action, the reward value, the next state feature, and the link congestion decision label. These data are stored in the SumTree, and in each training step Batch-size samples are drawn from it according to priority. The target value of each state is computed and the network is updated by stochastic gradient descent (SGD). The Batch-size here is set to 32 and the number of training steps to 20000.
2-3-4. This is repeated until s is a terminal state, yielding the trained Q neural network.
2-4. Construct scenarios. Constructing a scenario means randomly constructing an arbitrary initial link state and randomly constructing an arbitrary set of flows with their rate requirements.
2-5. Build the SumTree to store experiences and label priorities. The SumTree refers to a binary-tree data structure; labeling priority means labeling the importance of each experience according to the TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal-difference computation.
2-6. Select experiences according to priority and train the Q neural network with the improved prioritized experience replay DQN algorithm.
Further, the training process of the improved prioritized experience replay DQN algorithm described above can be described in pseudo-code as follows:
Algorithm input: number of training rounds MAX_EPISODE, number of links M, number N of flows whose rates are to be allocated, state feature dimension S, action set X, learning rate learning-rate, priority adjustment strength α, sampling weight coefficient β, discount rate γ, exploration rate ε, minimum positive value ε, current Q network Q, target Q network Q′, number of samples BT for batch gradient descent, target Q network parameter update frequency C, and number ST of SumTree leaf nodes.
And (3) outputting: q network parameters.
For i from 1 to MAX_EPISODE do
Initialize the action value Q corresponding to all states and actions
Randomly initialize all parameters ω of the current Q network
Initialize the parameters of the target Q network Q′: ω′ = ω
Initialize the SumTree data structure for experience replay, with the priority p_k of all ST leaf nodes of the SumTree set to 1
Initialize S as the first state of the current state sequence, i.e., the loads of all links in the current SDN data center network and the information of the flows to be processed, and obtain the feature vector φ(S) of the current state sequence
For j from 1 to N do
a) Use φ(S) as the input of the Q network to obtain the Q value outputs corresponding to all of its actions. Using the ε-greedy method, select the corresponding action A from the current Q value output, i.e., the rate allocated to the j-th flow
b) Execute the current action A in state S to obtain the feature vector φ(S′) corresponding to the new state S′
c) Based on the state of each link m, compute the reward value R and the termination flag done; when congestion occurs, the terminal state is considered reached and training of the current episode is stopped
d) Store the five-tuple {φ(S), A, R, φ(S′), done} in the SumTree
e) S = S′
f) Sample BT samples {φ(S_k), A_k, R_k, φ(S′_k), done_k}, k = 1, 2, …, BT, where the probability of sampling sample k is P(k) = p_k^α / Σ_j p_j^α and the loss-function weight is ω_k = (N·P(k))^(-β) / max_j(ω_j); compute the current target Q value y_k: y_k = R_k if done_k is true, and y_k = R_k + γ·max_a Q′(φ(S′_k), a, ω′) otherwise
g) Using the weighted mean-square-error loss function L(ω) = (1/BT)·Σ_k ω_k·(y_k − Q(φ(S_k), A_k, ω))², update all parameters ω of the Q network by gradient back-propagation through the neural network
h) Recompute the TD error δ_k = y_k − Q(φ(S_k), A_k, ω) for all sampled experiences, and update the priorities of the corresponding nodes in the SumTree: p_k = |δ_k| + ε
i) If i % C == 1, update the target Q network parameters ω′ = ω
j) If S′ is a terminal state, i.e., all flows have been allocated rates, stop the current episode.
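For readability, the sampling probability, importance weight, target value and priority update used in steps f)-h) can also be written out as the following sketch; the NumPy formulation and array shapes are assumptions, and the formulas are the standard proportional-prioritization ones reconstructed from the listed inputs.

```python
import numpy as np

def per_quantities(p_leaves, batch_idx, r, done, q_next_max, q_pred,
                   alpha, beta, gamma, eps):
    """Steps f)-h) of the pseudo-code written out explicitly.
    p_leaves: all SumTree leaf priorities; batch_idx, r, done, q_next_max, q_pred:
    per-sample arrays for the BT sampled experiences (done is 0/1)."""
    P = p_leaves ** alpha / np.sum(p_leaves ** alpha)        # sampling probability P(k)
    w = (len(p_leaves) * P[batch_idx]) ** (-beta)            # importance weights (N*P(k))^-beta
    w = w / w.max()                                          # normalised by max_j(w_j)
    y = r + gamma * q_next_max * (1.0 - done)                # target Q value y_k
    delta = y - q_pred                                       # TD error delta_k
    new_p = np.abs(delta) + eps                              # priority update p_k = |delta_k| + eps
    loss = np.mean(w * delta ** 2)                           # weighted mean-square-error loss
    return loss, y, new_p
```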
Step 3: and (3) applying the Q neural network trained in the step (2) to perform congestion control on the SDN data center network.
The specific congestion control method flowchart is shown in fig. 4, and specifically includes the following steps:
and 3-1, acquiring 28 flow rate requirements and routing information to be distributed, and acquiring the state of each link of the current SDN data center network, namely the occupation condition of the link bandwidth. The flow request to be allocated is 28, the initial load of 8 links is [21,27,20,28,22,26,23,18], and the specific occupied links and bandwidth requirements are as follows:
TABLE 1
| | flow1 | flow2 | flow3 | flow4 | ... | flow27 | flow28 |
|---|---|---|---|---|---|---|---|
| Occupied links | l1, l2 | l1, l3 | l1, l4 | l1, l5 | ... | l6, l8 | l7, l8 |
| Demand bandwidth (G) | 5 | 5 | 5 | 5 | ... | 5 | 5 |
3-2. Select one of the N flows, input its information and the current link state into the Q neural network trained with the improved prioritized experience replay DQN algorithm, and, subject to its route, select the action with the optimal Q value for this flow to execute.
3-3. Update the current link state and at the same time record the mapping between the current flow and its allocated rate.
3-4. Judge whether all 28 flows have been allocated: if not, return to step 3-2 and continue looping until rates have been allocated to all flows; if allocation has been completed, execute step 3-5;
3-5. Output the flow/rate allocation mapping table of the 28 flows; the SDN controller allocates rates to all flows accordingly, so that the allocated rates both satisfy the flows' rate requirements as far as possible and effectively avoid congestion, achieving the goal of global congestion control of the data center. The flow rate allocation mapping of the 28 flows is as follows:
TABLE 2
| | flow1 | flow2 | flow3 | flow4 | flow5 | ... | flow27 | flow28 |
|---|---|---|---|---|---|---|---|---|
| Occupied links | l1, l2 | l1, l3 | l1, l4 | l1, l5 | l2, l3 | ... | l6, l8 | l7, l8 |
| Demand bandwidth (G) | 5 | 5 | 5 | 5 | 5 | ... | 5 | 5 |
| Allocated bandwidth (G) | 3 | 3 | 4 | 1 | 5 | ... | 4 | 3 |
Fig. 5 shows the bandwidth variation of each link after each allocation. The abscissa is the allocation number and the ordinate is the bandwidth occupation of each link after bandwidth has been allocated to each flow. As can be seen from fig. 5, after rate allocation for the 28 flows is completed, no congestion occurs on any link, so the method of the invention can effectively realize congestion control.
Fig. 6 shows a comparison of link utilization of the different methods under different numbers of flows. DQN is the original algorithm with a single target network, DDQN is the improved DQN algorithm obtained by replacing the single target neural network with a double network, and PRIO is the improved DQN algorithm of the invention using prioritized experience replay. As can be seen from fig. 6, when the initial state of the whole network is close to saturation and the rate requirement of each flow is high, the DQN algorithm based on prioritized experience replay achieves the highest link utilization in all three network states, and its advantage becomes more obvious as the dimension of the network state increases. Therefore, compared with the DQN and DDQN algorithms, the prioritized experience replay DQN algorithm provided by the invention achieves the best congestion control effect and the highest link utilization, i.e., it can best satisfy the rate requirements of the flows in the network.
Fig. 7 shows a comparison of the convergence speed of the different methods. As can be seen from fig. 7, the congestion control method based on the improved prioritized experience replay DQN algorithm provided by the invention is clearly superior to both the DQN and DDQN algorithms in convergence speed and convergence quality.
The congestion control method of the present invention has been described above in connection with specific embodiments. The embodiments show that the SDN data center congestion control method based on prioritized experience replay DQN is effective. The method globally allocates rates to the flows of the whole network through the controller, so that the whole network avoids congestion while keeping data link utilization as high as possible, thereby realizing congestion control of the entire data center. The invention overcomes the weak multidimensional perception of reinforcement learning; replacing the Q table of reinforcement learning with a Q neural network accelerates convergence, the target network and experience replay improve the performance of the algorithm, and the prioritized experience replay mechanism solves the difficulty of obtaining high-quality samples, so that the algorithm has better convergence speed and convergence quality.
The above embodiment is only a preferred embodiment of the present invention and is not intended to limit it. Various changes and modifications may be made by those of ordinary skill in the relevant art without departing from the spirit and scope of the present invention. Therefore, all technical solutions obtained by equivalent substitution or equivalent transformation fall within the protection scope of the invention.
Claims (6)
1. An SDN data center congestion control method based on prioritized experience replay DQN, characterized by comprising the following steps:
S1, deploying a congestion control agent based on a deep Q network in the SDN controller, thereby introducing the prioritized experience replay DQN algorithm into a data center based on a software defined network;
S2, training the deep Q network, the training process comprising S21-S24:
S21, setting the input of the deep Q network to be link state information and flow state information, and the output to be the Q values corresponding to different actions, where different actions represent allocating different rates to a flow, and the reward function is a composite function that balances link utilization and link congestion;
S22, randomly constructing an arbitrary initial link state and an arbitrary set of flows with their rate requirements, to construct a scenario;
S23, constructing a SumTree for storing experiences and labeling the priority of each experience;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved prioritized experience replay DQN algorithm, so that the SDN controller can, through the deep Q network, maximize data link utilization while ensuring that the data center does not become congested; the improvement consists in adding, on the basis of the prioritized experience replay DQN algorithm, a check of whether any link is congested before each step of each scenario of network training ends; if so, the scenario is ended immediately, and if not, the next step continues; a scenario means allocating rates to an entire set of flows, and each step within a scenario means allocating a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, link state information and the state information of flows whose rates are to be allocated, inputs them into the trained deep Q network, determines the optimal action for each flow according to its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
2. The SDN data center congestion control method based on prioritized experience replay DQN of claim 1, wherein the SDN controller is connected to the network devices of the SDN data plane through a southbound interface to implement centralized control.
4. The SDN data center congestion control method based on prioritized experience replay DQN of claim 1, wherein the prioritized experience replay DQN algorithm is a DQN algorithm in which the experience replay mechanism is replaced by prioritized experience replay.
5. The SDN data center congestion control method based on prioritized experience replay DQN of claim 1, wherein the priority label is an experience-importance label determined from the TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal-difference computation.
6. The SDN data center congestion control method based on prioritized experience replay DQN of claim 1, wherein in S3 the method by which the SDN controller performs global congestion control comprises the following steps:
S31, obtaining from the SDN data plane the rate requirements and routing information of the N flows whose rates are currently to be allocated, and at the same time obtaining the link states of the current SDN data center network, i.e., the link bandwidth occupation;
S32, selecting one flow from the N flows to be allocated, inputting its flow information and the current link state into the deep Q network trained in S2, and selecting the optimal action to execute according to the output of the deep Q network;
S33, updating the current link state, and at the same time recording the mapping between the current flow and its allocated rate;
S34, judging whether all N flows have been allocated; if not, returning to S32 and continuing to loop through S32 and S33 until rates have been allocated to all flows; if allocation has been completed, executing S35;
S35, outputting the flow/rate allocation mapping table of the N flows, which the SDN controller uses as the allocated rate of each flow.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111348335.9A CN113992595B (en) | 2021-11-15 | 2021-11-15 | SDN data center congestion control method based on priority experience playback DQN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111348335.9A CN113992595B (en) | 2021-11-15 | 2021-11-15 | SDN data center congestion control method based on priority experience playback DQN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113992595A CN113992595A (en) | 2022-01-28 |
CN113992595B true CN113992595B (en) | 2023-06-09 |
Family
ID=79748547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111348335.9A Active CN113992595B (en) | 2021-11-15 | 2021-11-15 | SDN data center congestion control method based on priority experience playback DQN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113992595B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114567597B (en) * | 2022-02-21 | 2023-12-19 | 深圳市亦青藤电子科技有限公司 | Congestion control method and device based on deep reinforcement learning in Internet of things |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864102A (en) * | 2017-11-22 | 2018-03-30 | 浙江工商大学 | A kind of SDN data centers jamming control method based on Sarsa |
CN108900419A (en) * | 2018-08-17 | 2018-11-27 | 北京邮电大学 | Route decision method and device based on deeply study under SDN framework |
CN109039942A (en) * | 2018-08-29 | 2018-12-18 | 南京优速网络科技有限公司 | A kind of Network Load Balance system and equalization methods based on deeply study |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9679258B2 (en) * | 2013-10-08 | 2017-06-13 | Google Inc. | Methods and apparatus for reinforcement learning |
-
2021
- 2021-11-15 CN CN202111348335.9A patent/CN113992595B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864102A (en) * | 2017-11-22 | 2018-03-30 | 浙江工商大学 | A kind of SDN data centers jamming control method based on Sarsa |
CN108900419A (en) * | 2018-08-17 | 2018-11-27 | 北京邮电大学 | Route decision method and device based on deeply study under SDN framework |
CN109039942A (en) * | 2018-08-29 | 2018-12-18 | 南京优速网络科技有限公司 | A kind of Network Load Balance system and equalization methods based on deeply study |
Non-Patent Citations (1)
Title |
---|
Routing strategy for power communication networks based on deep reinforcement learning; Zhu Xiaoqin; Yuan Hui; Wang Weizhou; Wei Feng; Zhang Xun; Zhao Jinxiong; Science and Technology Innovation (Issue 36); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113992595A (en) | 2022-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111010294B (en) | Electric power communication network routing method based on deep reinforcement learning | |
CN112437020B (en) | Data center network load balancing method based on deep reinforcement learning | |
CN112134916B (en) | Cloud edge collaborative computing migration method based on deep reinforcement learning | |
CN112486690B (en) | Edge computing resource allocation method suitable for industrial Internet of things | |
CN111211987B (en) | Method and system for dynamically adjusting flow in network, electronic equipment and storage medium | |
CN108111335B (en) | A kind of method and system of scheduling and link virtual network function | |
CN113395207B (en) | Deep reinforcement learning-based route optimization framework and method under SDN framework | |
CN111988225A (en) | Multi-path routing method based on reinforcement learning and transfer learning | |
CN107864102B (en) | SDN data center congestion control method based on Sarsa | |
CN111917642B (en) | SDN intelligent routing data transmission method for distributed deep reinforcement learning | |
CN108684046A (en) | A kind of access net service function chain dispositions method based on incidental learning | |
CN113992595B (en) | SDN data center congestion control method based on priority experience playback DQN | |
CN110198280A (en) | A kind of SDN link allocation method based on BP neural network | |
CN114650227A (en) | Network topology construction method and system under layered federated learning scene | |
CN114828018A (en) | Multi-user mobile edge computing unloading method based on depth certainty strategy gradient | |
CN115665258A (en) | Deep reinforcement learning-based priority perception deployment method for multi-target service function chain | |
CN115714741A (en) | Routing decision method and system based on collaborative multi-agent reinforcement learning | |
CN116320620A (en) | Stream media bit rate self-adaptive adjusting method based on personalized federal reinforcement learning | |
CN117294643B (en) | Network QoS guarantee routing method based on SDN architecture | |
CN113676407A (en) | Deep learning driven flow optimization mechanism of communication network | |
CN116166444B (en) | Collaborative reasoning method oriented to deep learning hierarchical model | |
CN110971451B (en) | NFV resource allocation method | |
Suzuki et al. | Safe multi-agent deep reinforcement learning for dynamic virtual network allocation | |
Rădulescu et al. | Analysing congestion problems in multi-agent reinforcement learning | |
CN116367190A (en) | Digital twin function virtualization method for 6G mobile network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |