CN113992595A - SDN data center congestion control method based on prioritized experience replay DQN - Google Patents

SDN data center congestion control method based on prioritized experience replay DQN

Info

Publication number
CN113992595A
Authority
CN
China
Prior art keywords
network
dqn
congestion control
data center
sdn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111348335.9A
Other languages
Chinese (zh)
Other versions
CN113992595B (en)
Inventor
金蓉 (Jin Rong)
高桂超 (Gao Guichao)
朱广信 (Zhu Guangxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202111348335.9A priority Critical patent/CN113992595B/en
Publication of CN113992595A publication Critical patent/CN113992595A/en
Application granted granted Critical
Publication of CN113992595B publication Critical patent/CN113992595B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 - Traffic control in data switching networks
    • H04L 47/10 - Flow control; Congestion control
    • H04L 47/24 - Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L 47/2425 - Traffic characterised by specific attributes, e.g. priority or QoS for supporting services specification, e.g. SLA
    • H04L 47/2433 - Allocation of priorities to traffic types
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 - Traffic control in data switching networks
    • H04L 47/10 - Flow control; Congestion control
    • H04L 47/12 - Avoiding congestion; Recovering from congestion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 - Traffic control in data switching networks
    • H04L 47/10 - Flow control; Congestion control
    • H04L 47/25 - Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 - Reducing energy consumption in communication networks
    • Y02D 30/50 - Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a congestion control method for an SDN data center network based on prioritized experience replay DQN. Built on the SDN architecture and on flow-level congestion control, the method is intelligent, centralized and proactive. A congestion control algorithm based on prioritized experience replay DQN is introduced and improved, providing an intelligent solution to the congestion control problem of SDN data center networks. The controller allocates rates to all flows in the network globally, which both avoids network-wide congestion and keeps the utilization of the network's data links as high as possible, thereby achieving congestion control for the entire data center. Compared with Q-learning-based methods, the method avoids the dimensionality curse of the Q table; compared with DQN- and DDQN-based methods, it converges faster and to a better result. The invention is a congestion control method suited to SDN data center networks and in line with the trend toward intelligent networks.

Description

SDN data center congestion control method based on prioritized experience replay DQN
Technical Field
The invention relates to the technical field of network communication, and in particular to a congestion control method for a Software Defined Network (SDN) Data Center Network (DCN) based on a Deep Q-learning Network (DQN) with prioritized experience replay.
Background
The SDN architecture has been widely adopted by data center networks as a future network architecture. With the development of big data and cloud computing, the number of nodes and flows in SDN data center networks keeps growing, and data centers face the risk of network congestion. DQN learns from experience gathered during exploration and updates the parameters of the neural network through experience replay, but the random uniform sampling strategy used by standard DQN is not optimal in some situations. When DQN is applied to congestion control, the action space of each rate allocation is large, so during random exploration it is difficult to collect experience samples that are both congestion-free and have high link utilization, which makes the neural network hard to train. A new SDN data center network congestion control method is therefore needed to solve this problem.
Disclosure of Invention
The invention aims to solve the congestion control problem of SDN-based data center networks intelligently, and provides an SDN data center network congestion control method based on prioritized experience replay DQN.
The invention introduces and improves a congestion control algorithm based on prioritized experience replay DQN, providing an intelligent solution to the congestion control problem of SDN data center networks. The controller allocates rates to all flows in the network globally, which both avoids network-wide congestion and keeps the utilization of the network's data links as high as possible, thereby achieving congestion control for the entire data center.
The purpose of the invention is realized by the following technical scheme:
a SDN data center congestion control method based on prior experience DQN playback comprises the following steps:
S1, deploying a deep-Q-network-based congestion control agent in the SDN controller, thereby introducing the prioritized experience replay DQN algorithm into the software-defined-network-based data center;
S2, training the deep Q network, where the training process comprises steps S21 to S24:
S21, setting the input of the deep Q network to be link state information and flow state information and its output to be the Q values of the different actions, where different actions correspond to different rates assigned to a flow, and the reward function is a composite function balancing link utilization against link congestion;
S22, constructing a scenario by randomly generating an initial link state and a group of flows with their rate requirements;
S23, constructing a SumTree to store experiences and mark their priorities;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved prioritized experience replay DQN algorithm, so that the SDN controller can use the deep Q network to maximize data link utilization while avoiding congestion in the data center; on top of the prioritized experience replay DQN algorithm, before each step of each training scenario ends, checking whether any link is congested, ending the scenario immediately if so, and continuing with the next step otherwise; a scenario corresponds to assigning rates to a whole group of flows, and each step in a scenario corresponds to assigning a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, the link state information and the state information of the flows awaiting rate allocation, feeds them into the trained deep Q network, determines the optimal action for each flow from its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
Preferably, the SDN controller is connected to a network device of an SDN data plane through a southbound interface, so as to implement centralized control.
Preferably, the reward function is:
[Equation image: the reward function reward_m, defined piecewise in terms of the minimum link utilization and the congestion flag]
where reward_m denotes the reward value, min denotes the minimum-value operation, and LkCap_m denotes the link utilization.
Preferably, the prioritized experience replay DQN algorithm is a DQN algorithm in which the experience replay mechanism is replaced by prioritized experience replay.
Preferably, the priority mark is an experience-importance mark determined from the TD-error, which is the absolute value of the temporal-difference gap between the current experience's Q value and the target Q value.
Preferably, in S3, the method for the SDN controller to perform global congestion control includes the following steps:
S31, acquiring from the SDN data plane the rate requirements and routing information of the N flows currently awaiting allocation, and acquiring the current link state of the SDN data center network, i.e., the bandwidth occupancy of each link;
S32, selecting one flow from the N flows awaiting allocation, feeding that flow's information and the current link state into the deep Q network trained in S2, and executing the optimal action according to the network's output;
S33, updating the current link state and recording the mapping between the current flow and its allocated rate;
S34, judging whether all the N flows are distributed completely, if not, returning to continue circulating S32 and S33 until all the flows are distributed with the speed; if the allocation is finished, executing S35;
S35, outputting the flow-to-rate mapping table of the N flows, according to which the SDN controller allocates a rate to each flow.
The invention has the following beneficial effects. It provides an intelligent solution for congestion control of an SDN data center based on prioritized experience replay DQN; congestion control can be performed centrally, proactively and intelligently according to the load changes of the data center network links. The invention overcomes the weak multi-dimensional perception of plain reinforcement learning and avoids the dimensionality curse of the Q table; meanwhile, by introducing a prioritized experience replay mechanism, it converges faster and to a better result than the traditional DQN algorithm. The method allocates rates to all flows in the network globally through the controller, which both avoids network-wide congestion and keeps data link utilization as high as possible, thereby achieving congestion control for the entire data center.
Drawings
Fig. 1 is a block diagram of a congestion control system according to an embodiment.
Fig. 2 is a data center network topology diagram adopted by the embodiment.
FIG. 3 is a flow chart of a training algorithm.
Fig. 4 is a flow chart of a congestion control method.
Fig. 5 is a diagram of the bandwidth variation of the embodiment.
Fig. 6 is a comparison of link utilization of different algorithms for different numbers of flows (the three bars in each group represent DQN, DDQN and PRIO, from left to right).
Fig. 7 shows a graph comparing the convergence rates of different algorithms.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
For convenience of description, the basic terms used in the invention are first defined.
The SDN data center in the present invention refers to a data center with a software-defined network technology architecture.
In the invention, the SDN data center congestion control problem is a flow-based congestion control problem: the SDN controller allocates rates to all flows globally and holistically, so that the flows' rate requirements are met as far as possible while the whole data center network is kept free of congestion.
In the invention, the prioritized experience replay DQN algorithm is a DQN algorithm based on prioritized experience replay; prioritized experience replay is prior art and is an improvement of the experience replay mechanism in the DQN algorithm. An experience is a training sample. The DQN algorithm is prior art and a classic algorithm in deep reinforcement learning, which combines the strengths of deep learning and reinforcement learning to handle perception of high-dimensional raw inputs together with decision-making and control.
In a preferred embodiment of the present invention, a DQN-based SDN data center congestion control method is provided, which includes the following steps:
Step 1: a deep-Q-network-based congestion control agent is deployed in the SDN controller, introducing the prioritized experience replay DQN algorithm into the software-defined-network-based data center.
The SDN controller is the control and decision component of the SDN and is connected to the network devices of the SDN data plane through a southbound interface, so as to centrally control the rate allocation of flows throughout the network. The congestion control agent deployed in the SDN controller allocates flow rates based on a deep Q network (DQN), and the deep Q network is trained with the prioritized experience replay DQN algorithm to solve the SDN data center congestion control problem.
Step 2: train the deep Q network with the improved prioritized experience replay DQN algorithm. The training process comprises the following steps:
2-1. Determine the input and output of the deep Q network. The input is link state information and flow state information; the output is the Q value of each action, where different actions correspond to different rates assigned to a flow.
2-2. Set the reward function. Since the first objective of congestion control is to avoid link congestion and the second is to maximize link utilization, the reward function is a composite function that takes both the link utilization and the link congestion condition into account. Its exact form is not restricted, as long as it balances link utilization against link congestion. In the subsequent examples of the invention, one possible reward function is:
[Equation image: the reward function reward_m, defined piecewise in terms of the minimum link utilization and the congestion flag]
where reward_m denotes the reward value, min() denotes the minimum-value operation, and LkCap_m denotes the link utilization.
2-3. Improve the prioritized experience replay DQN algorithm to obtain the improved prioritized experience replay DQN algorithm. The improvement, made on top of the prioritized experience replay DQN algorithm, is that during network training a check for link congestion is added before each step of each scenario ends: if a link is congested, the scenario ends and its remaining steps are not executed; otherwise the next step proceeds. Here a step means allocating a rate to one flow, and a scenario means completing rate allocation for the entire group of flows, i.e., every flow in the group has been assigned a rate.
This improved prioritized experience replay DQN algorithm is subsequently used to train the deep Q network.
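A minimal sketch of one training scenario with this early-termination check is given below, assuming a generic environment/agent interface (env, agent and their methods are illustrative names, not the patent's implementation):

```python
def run_scenario(env, agent, flows):
    """One scenario: assign a rate to each flow in the group, one flow per step.

    The scenario is cut short as soon as any link becomes congested, instead of
    always running through the whole group; this is the claimed modification to
    the standard prioritized experience replay DQN training loop.
    """
    state = env.reset(flows)                   # random initial link loads + flow group
    for flow in flows:                         # one step = one flow's rate
        action = agent.select_action(state)    # epsilon-greedy over candidate rates
        next_state, reward, congested = env.assign_rate(flow, action)
        agent.store(state, action, reward, next_state, congested)
        if congested:                          # added check: end the scenario early
            break
        state = next_state
```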
2-4. Construct a scenario: randomly generate an initial link state and a group of flows with their rate requirements.
2-5. Construct a SumTree to store experiences and mark their priorities. Here SumTree is a binary-tree data structure; the priority mark is an experience-importance mark determined from the TD-error, which is the absolute value of the temporal-difference gap between the current experience's Q value and the target Q value.
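A compact, array-backed Python sketch of such a SumTree is shown below; the class and method names are illustrative, as the patent only specifies the binary-tree structure and its use for priority storage:

```python
import numpy as np

class SumTree:
    """Array-backed binary tree whose internal nodes store the sum of their
    children's priorities; the leaves hold the priorities of stored experiences."""

    def __init__(self, capacity):
        self.capacity = capacity                    # number of leaf nodes (ST)
        self.tree = np.zeros(2 * capacity - 1)      # internal nodes + leaves
        self.data = [None] * capacity               # experience tuples
        self.write = 0                              # next leaf to overwrite

    def add(self, priority, experience):
        idx = self.write + self.capacity - 1
        self.data[self.write] = experience
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                             # propagate the change upward
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        """Walk down from the root to the leaf whose cumulative-priority
        interval contains s; returns (tree index, priority, experience)."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):         # while not a leaf
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]

    @property
    def total(self):
        return self.tree[0]                         # sum of all priorities
```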
2-6. Select experiences from the SumTree according to their priorities and train the deep Q network with the improved prioritized experience replay DQN algorithm: before each step of each scenario ends during training, check whether any link is congested; if so, end the scenario immediately, otherwise continue with the next step. The remaining training procedure of the prioritized experience replay DQN algorithm is the same as in the prior art.
Step 3: the SDN controller collects, in real time from the SDN data plane, the link state information and the state information of the flows awaiting rate allocation, feeds them into the trained deep Q network, determines the optimal action for each flow from its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
In this embodiment, the method for performing global congestion control by the SDN controller in step 3 includes the following steps:
3-1. Acquire from the SDN data plane the rate requirements and routing information of the N flows currently awaiting allocation, and acquire the state of each link of the SDN data center network, i.e., the bandwidth occupancy of each link;
3-2. Select one flow from the N flows awaiting allocation, feed its information and the current link state into the deep Q network trained in step 2, and execute the optimal action according to the network's output;
3-3. Update the current link state and record the mapping between the current flow and its allocated rate.
3-4. Check whether all N flows have been allocated: if not, return to step 3-2 and repeat steps 3-2 and 3-3 until every flow has been assigned a rate; if allocation is complete, go to step 3-5.
3-5. Output the flow-to-rate mapping table of the N flows; the SDN controller allocates a rate to each flow according to this table. The resulting allocation meets the rate requirements of all flows as far as possible while effectively avoiding congestion, thereby achieving global congestion control over the data center.
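A minimal sketch of this allocation loop in Python is given below; the state encoding, data layout and q_network interface are assumptions for illustration:

```python
import numpy as np

def allocate_rates(q_network, link_load, flows, action_set):
    """Assign a rate to every pending flow with the trained deep Q network.

    link_load: array of per-link occupied bandwidth (G);
    flows: list of (flow_id, link_indices, demand) tuples;
    action_set: candidate rates, e.g. [0, 1, 2, 3, 4, 5] (G);
    q_network(state) is assumed to return one Q value per candidate rate.
    The encoding of the flow state used here is a simplification.
    """
    allocation = {}
    link_load = np.array(link_load, dtype=float)
    for flow_id, links, demand in flows:               # steps 3-2 .. 3-4
        state = np.concatenate([link_load, [flow_id, demand]])
        q_values = q_network(state)
        rate = action_set[int(np.argmax(q_values))]    # optimal action
        link_load[list(links)] += rate                 # step 3-3: update link state
        allocation[flow_id] = rate                     # record flow -> rate mapping
    return allocation                                  # step 3-5: rate mapping table
```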
The core of the method is to introduce an improved prioritized experience replay DQN algorithm into the SDN data center to solve the congestion control problem, so that the SDN controller can perform congestion control centrally, proactively and intelligently according to the load changes of the data center network links; it overcomes the weak multi-dimensional perception of plain reinforcement learning and avoids the dimensionality curse of the Q table. Moreover, by introducing a prioritized experience replay mechanism, it converges faster and to a better result than the traditional DQN algorithm. The method allocates rates to all flows in the network globally through the controller, which both avoids network-wide congestion and keeps data link utilization as high as possible, thereby achieving congestion control for the entire data center. Specifically, the method has all of the following advantages:
1. Intelligent. By introducing prioritized experience replay DQN, the algorithmic complexity is lower than that of congestion control methods based on optimization theory; compared with congestion control methods based on reinforcement learning with Q-learning, the dimensionality curse of the Q table is avoided.
2. End-to-end congestion control. Congestion control is implemented by globally allocating the rates of end-to-end flows, rather than by adding congestion control mechanisms at intermediate nodes such as routers.
3. Centralized, network-arbitrated congestion control. The congestion control method is deployed on the SDN controller, fully exploiting the advantage of SDN's centralized control, and end-to-end flow rates are allocated globally according to the state of the whole network. This differs from traditional TCP-based congestion control, which is distributed: each end system detects congestion over its own TCP connection and performs congestion avoidance individually based on the slow-start algorithm.
4. Flow granularity. The object of congestion control is the flow, matching the flow-based nature of the next-generation SDN architecture, whereas traditional TCP-based congestion control operates on IP packets.
5. Proactive congestion control. The controller collects the state of the whole network together with the current flows and their rate requirements, and combines the two to perform global rate allocation, aiming to keep overall link utilization as high as possible while avoiding congestion as much as possible; the method is therefore proactive. TCP congestion control, by contrast, is reactive and acts only after congestion is detected.
6. A prioritized experience replay strategy. Prioritized experience replay with proportional prioritization is adopted so that congestion-free, high-link-utilization experience samples are preferentially replayed. Traditional DQN-based congestion control uses a uniform sampling strategy, which makes it harder to obtain good experience samples.
Examples
To help those skilled in the art understand and implement the invention, the method of the foregoing embodiment is applied to a specific test case, and its technical effects and practical advantages are illustrated with reference to the drawings and data.
In this test case, the prioritized experience replay DQN algorithm is introduced into a software-defined-network-based data center to solve the congestion control problem in real time. Fig. 1 is a block diagram of the congestion control system architecture, in which the SDN controller is the control and decision component of the SDN, centrally controls the network devices of the SDN data plane through a southbound interface (the control-forwarding communication interface), and provides flexible programmability. A congestion control agent is deployed in the SDN controller and the prioritized experience replay DQN algorithm is introduced into the data center: link state information and flow state information of the data forwarding plane are collected through the southbound interface, fed into the neural network to generate a flow rate allocation scheme, and the scheme is issued to the data plane network devices through the southbound interface. Link utilization is kept as high as possible while ensuring that no link becomes congested.
Fig. 2 is the network topology of the SDN data center used in the test case. The network has 8 links, each with a bandwidth of 40G. The flow queue used in this test case contains 28 flows.
In the test application case, the specific DQN-based SDN data center congestion control method includes the following steps:
Step 1: introduce the DQN algorithm into the SDN data center congestion control problem.
In the architecture of the prioritized experience replay DQN congestion control system shown in Fig. 1, the overall process is as follows: first, the SDN controller acquires, in real time from the data plane through the southbound interface, the state of each link in the data center network and the state of the flows awaiting allocation; then the congestion control agent deployed in the SDN controller, which introduces the prioritized experience replay DQN algorithm into the software-defined data center, feeds the collected link and flow state information into the neural network to generate a flow rate allocation scheme; finally, the SDN controller issues the allocation scheme to the data plane network devices through the southbound interface.
Step 2: train the deep Q network according to steps 2-1 to 2-6 using the improved DQN algorithm.
2-1. Determine the input and output of the deep Q network. The input is the link state and the state of the flow currently awaiting a rate, where the flow state consists of the flow's sequence number and the path it traverses. The output is the Q value of each action, i.e., of each candidate rate for the flow.
Fig. 2 shows the network topology of the test case. With each link's bandwidth being 40G, 28 flows traverse links such as l1-l2, l1-l3, l6-l8 and l7-l8, each with a bandwidth requirement of 5G. While meeting the rate requirements of all flows as far as possible, the DQN-based congestion control method allocates a rate to each flow and ensures that the network does not become congested.
In this test case, the input consists of the states of the 8 links and of the 28 flows; these inputs are fed to the neural network in the DQN algorithm for training, and the output is the flow-to-rate mapping table.
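For concreteness, a Q network with this interface could look as follows in PyTorch; the framework, layer widths and flow-feature encoding are assumptions, as the test case only specifies a basic DNN with this input/output interface:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps [8 link states + features of the current flow] to one Q value per
    candidate rate. Hidden widths of 64 are an illustrative choice only."""

    def __init__(self, n_links=8, n_flow_features=2, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_links + n_flow_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),   # one Q value per rate in [0G .. 5G]
        )

    def forward(self, state):
        return self.net(state)
```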
2-2. Set the reward function. Since the first objective of congestion control is to avoid link congestion as far as possible and the second is to maximize link utilization, the reward function takes both the link utilization and the link congestion condition into account. In this test case the reward function is given as:
[Equation image: the reward function reward_m, defined piecewise in terms of the minimum link utilization min(LkCap_m) and the congestion flag done]
where reward_m denotes the reward value, min() denotes the minimum-value operation, LkCap_m denotes the link utilization, done = true indicates link congestion, and done = false indicates no link congestion.
2-3. Improve the DQN algorithm based on prioritized experience replay. As described above, the improvement is that, on top of the prioritized experience replay DQN algorithm, a check for link congestion is added before each step of each training scenario ends, and if a link is congested the scenario ends. A step means allocating a rate to one flow, and a scenario means allocating rates to the whole group of flows. The resulting algorithm is called the improved prioritized experience replay DQN algorithm.
The rest of the prioritized experience replay DQN algorithm is the same as in the prior art. For ease of understanding, Fig. 3 gives the training flow chart of the improved prioritized experience replay DQN algorithm, which comprises the following steps:
and 2-3-1, initializing a neural network for storing the trained samples, the action value function and the target action value function.
In this test case, the training sample storage capacity is set to 4500, and both the action-value network and the target action-value network are basic DNNs. The initial load of the 8 links is [21, 27, 20, 28, 22, 26, 23, 18].
2-3-2. Following an ε-greedy policy, either select a random action, or input the current link state, compute the Q value of each action and, taking the routing information of the current flow into account, execute the action with the maximum Q value (the optimal action).
In this test case, the probability ε is set to 0.9, the learning rate is 0.001, and the action set is [0G, 1G, 2G, 3G, 4G, 5G], i.e., the rate allocated to each flow is chosen from these six rates.
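A hedged sketch of the ε-greedy selection with these settings is shown below; here ε = 0.9 is read as the probability of acting greedily, matching the mostly-greedy setting of the example, and the helper name and interface are assumptions:

```python
import random
import numpy as np

def epsilon_greedy(q_values, n_actions=6, epsilon=0.9):
    """Pick a rate index from the Q values of the 6 candidate rates.

    With probability epsilon (0.9 here) the greedy, maximum-Q action is taken;
    with probability 1 - epsilon a random rate is explored. This reading of
    epsilon is an interpretation of the example's setting, not a quote.
    """
    if random.random() < epsilon:
        return int(np.argmax(np.asarray(q_values)))   # greedy (optimal) action
    return random.randrange(n_actions)                # random exploratory action
```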
2-3-3. After executing action a_t, obtain the reward r_t, the next input φ_{t+1} and the link congestion label done (done is true if a link is congested, otherwise false). Iterate continuously, periodically updating the target action-value-function parameters to the current action-value-function parameters.
In this test case, φ_t, a_t, r_t, φ_{t+1} and done denote the current state's feature vector, the current action, the reward value, the next state's feature vector and the link congestion label, respectively. These data are stored in the SumTree, and for each training iteration a batch of Batch-size samples is drawn from the SumTree according to priority. The target value of each state is computed and the network is updated by stochastic gradient descent (SGD). Here Batch-size is set to 32 and the number of training steps to 20000.
2-3-4. Repeat the process until S reaches a terminal state, yielding the trained Q neural network.
2-4. Construct a scenario, i.e., randomly generate an initial link state and a group of flows with their rate requirements.
2-5. Construct the SumTree to store experiences and mark their priorities. The SumTree is a binary-tree data structure; the priority mark is an experience-importance mark determined from the TD-error, which is the absolute value of the temporal-difference gap between the current experience's Q value and the target Q value.
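As a small illustration of how the priority mark and the sampling probability relate, the following helpers use the same relationships p_k = |δ_k| + ε and P(k) = p_k^α / Σ_j p_j^α as the pseudocode below; the numeric defaults for eps and alpha are illustrative assumptions:

```python
import numpy as np

def priority(td_error, eps=1e-4):
    """Priority of one experience: p_k = |delta_k| + eps, where delta_k is the
    TD error; eps keeps zero-error samples from becoming unsampleable."""
    return abs(td_error) + eps

def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(k) = p_k^alpha / sum_j p_j^alpha.
    alpha is the priority adjustment strength listed in the algorithm inputs."""
    p = np.asarray(priorities, dtype=float) ** alpha
    return p / p.sum()
```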
2-6. Select experiences according to their priorities and train the Q neural network with the improved prioritized experience replay DQN algorithm.
The training process of the improved prioritized experience replay DQN algorithm can be described in pseudocode as follows:
Algorithm inputs: number of training episodes MAX_EPISODE, number of links M, number N of flows awaiting rate allocation, state feature dimension S, action set X, learning rate, priority adjustment strength α, sampling weight coefficient β, discount factor γ, exploration rate ε, a small minimum positive value ε added to priorities, current Q network Q, target Q network Q', mini-batch size BT for gradient descent, target Q network parameter update frequency C, and number of SumTree leaf nodes ST.
Output: the Q network parameters.
For i from 1 to MAX_EPISODE do
    Initialize the values Q corresponding to all states and actions
    Randomly initialize all parameters ω of the current Q network
    Initialize the parameters of the target Q network Q' as ω' = ω
    Initialize the SumTree data structure for experience replay; all ST leaf nodes of the SumTree have priority p_k = 1
    Initialize S as the first state of the current state sequence, i.e., the load of every link in the current SDN data center network together with the information of the flows to be processed, and obtain the feature vector φ(S)
    For j from 1 to N do
        a) Feed φ(S) into the Q network to obtain the Q value outputs of all actions. Select the corresponding action A (the rate allocated to flow j) from the current Q value output by the ε-greedy method
        b) Execute the current action A in state S to obtain the new state S' and its feature vector φ(S')
        c) Compute the reward value reward_m from the state of each link
           [Equation image: reward_m as a function of the minimum link utilization and the congestion flag done]
           together with the termination flag done; when congestion occurs, the terminal state is considered reached and training of the current episode stops
        d) Store the quintuple {φ(S), A, R, φ(S'), done} in the SumTree
        e) S = S'
        f) Sample BT samples {φ(S_k), A_k, R_k, φ(S'_k), done_k}, k = 1, 2, ..., BT, from the SumTree, each sample being drawn with probability
           P(k) = p_k^α / Σ_j p_j^α
           and loss-function weight ω_k = (N · P(k))^(−β) / max_j ω_j; compute the current target Q value
           y_k = R_k                                     if done_k is true
           y_k = R_k + γ · max_a Q'(φ(S'_k), a; ω')      otherwise
        g) Use the weighted mean-square-error loss
           L = (1/BT) · Σ_k ω_k · (y_k − Q(φ(S_k), A_k; ω))^2
           and update all parameters ω of the Q network by gradient back-propagation through the neural network
        h) Recompute the TD error δ_k = y_k − Q(φ(S_k), A_k; ω) of all samples and update the priorities p_k = |δ_k| + ε of all nodes in the SumTree
        i) If i % C == 1, update the target Q network parameters ω' = ω
        j) If S' is a terminal state, i.e., all flows have been allocated, stop the current episode.
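To make items f) through h) concrete, a PyTorch sketch of one update step is given below. It assumes a SumTree like the one sketched in step 2-5 (with add/get/update methods and a total property) that already holds at least batch_size experiences; the hyperparameter values, the stratified sampling scheme and the use of capacity as the buffer size N are illustrative assumptions, not the patent's exact procedure:

```python
import numpy as np
import torch

def train_step(q_net, target_net, optimizer, tree, batch_size=32,
               gamma=0.9, alpha=0.6, beta=0.4, eps=1e-4):
    """One prioritized-replay update corresponding to items f) through h)."""
    # f) stratified proportional sampling from the SumTree
    idxs, priorities, samples = [], [], []
    segment = tree.total / batch_size
    for i in range(batch_size):
        s = np.random.uniform(segment * i, segment * (i + 1))
        idx, p, data = tree.get(s)
        idxs.append(idx)
        priorities.append(p)
        samples.append(data)

    states, actions, rewards, next_states, dones = zip(*samples)
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # importance-sampling weights w_k = (N * P(k))^(-beta), normalized by their max
    probs = np.asarray(priorities) / tree.total
    weights = (len(tree.data) * probs) ** (-beta)
    weights = torch.as_tensor(weights / weights.max(), dtype=torch.float32)

    # target value y_k = r_k if done_k, else r_k + gamma * max_a Q'(s'_k, a)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)

    q_taken = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # g) weighted mean-square-error loss and gradient back-propagation
    loss = (weights * (targets - q_taken) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # h) refresh the priorities in the SumTree with the new TD errors
    td_errors = (targets - q_taken).detach().numpy()
    for idx, delta in zip(idxs, td_errors):
        tree.update(idx, (abs(float(delta)) + eps) ** alpha)
    return float(loss)
```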
Step 3: apply the Q neural network trained in step 2 to control congestion in the SDN data center network.
The flow chart of the specific congestion control method is shown in Fig. 4, and it comprises the following steps:
and 3-1, acquiring the flow rate requirement and routing information of 28 pieces of flow to be distributed, and acquiring the state of each link of the current SDN data center network, namely the occupation condition of the link bandwidth. The flow request to be allocated is 28, the initial load of 8 links is [21,27,20,28,22,26,23,18], and the specific occupied links and bandwidth requirements are as follows:
TABLE 1
                          flow1    flow2    flow3    flow4    ...    flow27    flow28
Occupied links            l1, l2   l1, l3   l1, l4   l1, l5   ...    l6, l8    l7, l8
Required bandwidth (G)    5        5        5        5        ...    5         5
3-2. Select one of the N flows, feed its information and the current link state into the Q neural network trained with the improved prioritized experience replay DQN algorithm, and, subject to the flow's routing, execute the action with the best Q value for that flow.
3-3. Update the current link state and record the mapping between the current flow and its allocated rate.
3-4. Check whether all 28 flows have been allocated: if not, return to step 3-2 and repeat until every flow has been assigned a rate; if allocation is complete, go to step 3-5.
and 3-5, outputting a flow rate distribution mapping table of 28 flows, and distributing the rate for each flow by the SDN controller, so that the distributed rate can meet the rate requirements of all the flows as much as possible, and can effectively avoid congestion so as to achieve the aim of carrying out global congestion control on the data center. The flow rate allocation map for 28 flows is as follows:
TABLE 2
                          flow1    flow2    flow3    flow4    flow5    ...    flow27    flow28
Occupied links            l1, l2   l1, l3   l1, l4   l1, l5   l2, l3   ...    l6, l8    l7, l8
Required bandwidth (G)    5        5        5        5        5        ...    5         5
Allocated bandwidth (G)   3        3        4        1        5        ...    4         3
Fig. 5 shows how the bandwidth of each link changes with each allocation. The abscissa is the allocation index and the ordinate is the bandwidth occupancy of each link after bandwidth is allocated to each flow. As Fig. 5 shows, after the rate allocation of the 28 flows is complete, no link is congested; the method therefore achieves effective congestion control.
Fig. 6 compares the link utilization of the different methods for different numbers of flows. DQN is the original single-target-network algorithm, DDQN is the improved DQN algorithm that replaces the single target network with a double network, and PRIO is the improved DQN algorithm using prioritized experience replay. As Fig. 6 shows, when the initial state of the whole network is close to saturation and each flow's rate requirement is large, the prioritized experience replay DQN algorithm attains the highest link utilization in all three network states, and its advantage grows as the dimensionality of the network state increases. Compared with the DQN and DDQN algorithms, the prioritized experience replay DQN algorithm proposed by the invention therefore achieves the best congestion control effect and the highest link utilization, i.e., it satisfies the flows' rate requirements as far as possible.
Fig. 7 compares the convergence of the different methods. As Fig. 7 shows, the congestion control method based on the improved prioritized experience replay DQN algorithm proposed by the invention is significantly better than the DQN and DDQN algorithms in both convergence speed and convergence quality.
The congestion control method of the invention has been described above with reference to specific embodiments. The embodiment shows that the SDN data center congestion control method based on prioritized experience replay DQN is effective. The method allocates rates to all flows in the network globally through the controller, which both avoids network-wide congestion and keeps data link utilization as high as possible, thereby achieving congestion control for the entire data center. The invention overcomes the weak multi-dimensional perception of reinforcement learning, accelerates convergence by replacing the Q table of reinforcement learning with a Q neural network, improves performance with a target network and experience replay, and uses a prioritized experience replay mechanism to solve the difficulty of obtaining high-quality samples, giving better convergence speed and convergence quality.
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, technical solutions obtained by equivalent replacement or equivalent transformation fall within the scope of protection of the invention.

Claims (6)

1. An SDN data center congestion control method based on prioritized experience replay DQN, characterized by comprising the following steps:
S1, deploying a deep-Q-network-based congestion control agent in the SDN controller, thereby introducing the prioritized experience replay DQN algorithm into the software-defined-network-based data center;
S2, training the deep Q network, where the training process comprises steps S21 to S24:
S21, setting the input of the deep Q network to be link state information and flow state information and its output to be the Q values of the different actions, where different actions correspond to different rates assigned to a flow, and the reward function is a composite function balancing link utilization against link congestion;
S22, constructing a scenario by randomly generating an initial link state and a group of flows with their rate requirements;
S23, constructing a SumTree to store experiences and mark their priorities;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved prioritized experience replay DQN algorithm, so that the SDN controller can use the deep Q network to maximize data link utilization while avoiding congestion in the data center; on top of the prioritized experience replay DQN algorithm, before each step of each training scenario ends, checking whether any link is congested, ending the scenario immediately if so, and continuing with the next step otherwise; a scenario corresponds to assigning rates to a whole group of flows, and each step in a scenario corresponds to assigning a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, the link state information and the state information of the flows awaiting rate allocation, feeds them into the trained deep Q network, determines the optimal action for each flow from its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
2. The prioritized experience replay DQN-based SDN data center congestion control method of claim 1, wherein the SDN controller is connected to the network devices of the SDN data plane through a southbound interface, implementing centralized control.
3. The prioritized experience replay DQN-based SDN data center congestion control method of claim 1, wherein the reward function is:
[Equation image: the reward function reward_m, defined piecewise in terms of the minimum link utilization and the congestion flag]
where reward_m denotes the reward value, min denotes the minimum-value operation, and LkCap_m denotes the link utilization.
4. The prioritized experience replay DQN-based SDN data center congestion control method of claim 1, wherein the prioritized experience replay DQN algorithm is a DQN algorithm in which the experience replay mechanism is replaced by prioritized experience replay.
5. The prioritized experience replay DQN-based SDN data center congestion control method of claim 1, wherein the priority mark is an experience-importance mark determined from the TD-error, which is the absolute value of the temporal-difference gap between the current experience's Q value and the target Q value.
6. The prioritized experience replay DQN-based SDN data center congestion control method of claim 1, wherein in S3 the method for the SDN controller to perform global congestion control comprises the following steps:
S31, acquiring from the SDN data plane the rate requirements and routing information of the N flows currently awaiting allocation, and acquiring the current link state of the SDN data center network, i.e., the bandwidth occupancy of each link;
S32, selecting one flow from the N flows awaiting allocation, feeding that flow's information and the current link state into the deep Q network trained in S2, and executing the optimal action according to the network's output;
S33, updating the current link state and recording the mapping between the current flow and its allocated rate;
S34, checking whether all N flows have been allocated; if not, looping back through S32 and S33 until every flow has been given a rate; if allocation is complete, executing S35;
S35, outputting the flow-to-rate mapping table of the N flows, according to which the SDN controller allocates a rate to each flow.
CN202111348335.9A 2021-11-15 2021-11-15 SDN data center congestion control method based on prioritized experience replay DQN Active CN113992595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111348335.9A CN113992595B (en) 2021-11-15 SDN data center congestion control method based on prioritized experience replay DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111348335.9A CN113992595B (en) 2021-11-15 SDN data center congestion control method based on prioritized experience replay DQN

Publications (2)

Publication Number Publication Date
CN113992595A true CN113992595A (en) 2022-01-28
CN113992595B CN113992595B (en) 2023-06-09

Family

ID=79748547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111348335.9A Active CN113992595B (en) 2021-11-15 SDN data center congestion control method based on prioritized experience replay DQN

Country Status (1)

Country Link
CN (1) CN113992595B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567597A (en) * 2022-02-21 2022-05-31 重庆邮电大学 Congestion control method and device based on deep reinforcement learning in Internet of things

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN109039942A (en) * 2018-08-29 2018-12-18 南京优速网络科技有限公司 A kind of Network Load Balance system and equalization methods based on deeply study

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150100530A1 (en) * 2013-10-08 2015-04-09 Google Inc. Methods and apparatus for reinforcement learning
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN109039942A (en) * 2018-08-29 2018-12-18 南京优速网络科技有限公司 A kind of Network Load Balance system and equalization methods based on deeply study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhu Xiaoqin; Yuan Hui; Wang Weizhou; Wei Feng; Zhang Xun; Zhao Jinxiong: "Routing strategy for power communication networks based on deep reinforcement learning" (基于深度强化学习的电力通信网路由策略), Science and Technology Innovation (科学技术创新) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567597A (en) * 2022-02-21 2022-05-31 重庆邮电大学 Congestion control method and device based on deep reinforcement learning in Internet of things
CN114567597B (en) * 2022-02-21 2023-12-19 深圳市亦青藤电子科技有限公司 (Shenzhen Yiqingteng Electronic Technology Co., Ltd.) Congestion control method and device based on deep reinforcement learning in Internet of things

Also Published As

Publication number Publication date
CN113992595B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN111010294B (en) Electric power communication network routing method based on deep reinforcement learning
CN112437020B (en) Data center network load balancing method based on deep reinforcement learning
CN108900358B (en) Virtual network function dynamic migration method based on deep belief network resource demand prediction
CN109039942B (en) Network load balancing system and balancing method based on deep reinforcement learning
CN112486690B (en) Edge computing resource allocation method suitable for industrial Internet of things
CN108566659B (en) 5G network slice online mapping method based on reliability
CN108684046B (en) Random learning-based access network service function chain deployment method
CN111211987B (en) Method and system for dynamically adjusting flow in network, electronic equipment and storage medium
CN114143264B (en) Flow scheduling method based on reinforcement learning under SRv network
CN109151077B (en) Calculation unloading method based on target guidance
CN111585811B (en) Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN113395207B (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
CN108111335A (en) A kind of method and system dispatched and link virtual network function
CN113098714A (en) Low-delay network slicing method based on deep reinforcement learning
CN114650227A (en) Network topology construction method and system under layered federated learning scene
CN113992595A (en) 2022-01-28 SDN data center congestion control method based on prioritized experience replay DQN
CN113518039B (en) Deep reinforcement learning-based resource optimization method and system under SDN architecture
CN114828018A (en) Multi-user mobile edge computing unloading method based on depth certainty strategy gradient
CN114189937A (en) Real-time centralized wireless network scheduling method and device based on deep reinforcement learning
CN107979540A (en) A kind of load-balancing method and system of SDN network multi-controller
CN114125595A (en) OTN network resource optimization method, device, computer equipment and medium
CN112862083A (en) Deep neural network inference method and device under edge environment
Suzuki et al. Safe multi-agent deep reinforcement learning for dynamic virtual network allocation
James et al. An inter-molecular adaptive collision scheme for chemical reaction optimization
CN116112488A (en) Fine-grained task unloading and resource allocation method for MEC network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant