CN113992595B - SDN data center congestion control method based on priority experience playback DQN - Google Patents


Info

Publication number
CN113992595B
CN113992595B
Authority
CN
China
Prior art keywords
network
dqn
congestion control
data center
sdn
Prior art date
Legal status
Active
Application number
CN202111348335.9A
Other languages
Chinese (zh)
Other versions
CN113992595A (en)
Inventor
金蓉
高桂超
朱广信
Current Assignee
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN202111348335.9A
Publication of CN113992595A
Application granted
Publication of CN113992595B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2425Traffic characterised by specific attributes, e.g. priority or QoS for supporting services specification, e.g. SLA
    • H04L47/2433Allocation of priorities to traffic types
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/25Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a congestion control method for an SDN data center network based on priority experience playback DQN. The method builds on an SDN network architecture and a flow-based view of congestion control, and is intelligent, centralized and proactive. A congestion control algorithm based on priority experience playback DQN is introduced and improved, providing an intelligent solution to the congestion control problem of SDN data center networks. Through the controller, the invention allocates rates globally to the flows of the whole network, so that the network avoids congestion while the utilization of its data links is kept as high as possible, thereby achieving congestion control for the entire data center. Compared with Q-learning-based methods, the invention avoids the dimensionality disaster of the Q table; compared with DQN- and DDQN-based methods, it achieves better convergence speed and convergence quality. The invention is a congestion control method suited to SDN data center networks and in line with the trend toward intelligent networks.

Description

SDN data center congestion control method based on priority experience playback DQN
Technical Field
The invention relates to the technical field of network communication, and in particular to a congestion control method for an SDN (Software Defined Network) data center network (Data Center Network, DCN) based on priority experience playback (Prioritized Experience Replay) DQN (Deep Q-learning Network).
Background
The SDN architecture has been widely adopted by data center networks as a future network architecture. With the development of big data and cloud computing, the numbers of nodes and flows in an SDN data center network keep increasing, and the data center faces a growing risk of network congestion. DQN gathers experience through exploration and updates the parameters of its neural network by experience playback, but the random uniform sampling strategy it employs is not optimal in every situation. When DQN is introduced into congestion control, because each allocation step has many candidate actions, it is difficult during random exploration to obtain experience samples that both achieve high link utilization and avoid congestion, which makes training the neural network difficult. For this reason, a new SDN data center network congestion control method needs to be designed to solve this problem.
Disclosure of Invention
The invention aims to intelligently solve the congestion control problem of a data center network based on an SDN architecture, and provides a congestion control method of the SDN data center network based on priority experience playback DQN.
The inventive concept is to introduce and improve a congestion control algorithm based on priority experience playback DQN, providing an intelligent solution to the congestion control problem of SDN data center networks. Through the controller, the invention allocates rates globally to the flows of the whole network, so that the whole network avoids congestion while the utilization of the network's data links is kept as high as possible, thereby achieving congestion control for the entire data center.
The aim of the invention is realized by the following technical scheme:
an SDN data center congestion control method based on preferential experience playback DQN, comprising the steps of:
S1, deploying a congestion control agent based on a deep Q network in the SDN controller, and introducing the priority experience playback DQN algorithm into the software-defined-network-based data center;
s2, training the deep Q network, wherein the training process comprises S21-S24:
S21, setting the input of the deep Q network to be link state information and flow state information, and the output to be the Q values corresponding to different actions, where different actions represent allocating different rates to a flow, and the reward function is a composite function balancing link utilization against link congestion;
S22, randomly constructing an arbitrary initial link state and an arbitrary group of flows with their rate requirements to construct a scene;
S23, constructing a SumTree for storing experiences and marking the priority of each experience;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved priority experience playback DQN algorithm, so that the SDN controller can, through the deep Q network, maximize data link utilization while ensuring the data center does not become congested; on the basis of the priority experience playback DQN algorithm, a judgment of whether any link is congested is added before each step of each scene of network training finishes; if so, the scene is ended directly, and if not, the next step continues; a scene means allocating rates to a whole group of flows, and each step within a scene means allocating a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, link state information and the state information of the flows whose rates are to be allocated, inputs them into the trained deep Q network, determines the optimal action for each flow according to its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
Preferably, the SDN controller is connected to network devices of an SDN data plane through a southbound interface to implement centralized control.
Preferably, the reward function is:
(Equation image in the original: the reward is defined in terms of the minimum link utilization min(lkCap_m).)
wherein: reward_m represents the reward value, min represents the minimum-value operation, and lkCap_m represents the utilization of link m.
preferably, the preferential empirical playback DQN algorithm is a DQN algorithm that replaces the empirical playback mechanism with preferential empirical playback.
Preferably, the priority mark is an experience-importance mark determined according to the TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal difference.
Preferably, in S3, the method for performing global congestion control by the SDN controller includes the following steps:
s31, acquiring the rate requirements and the route information of N flows to be distributed currently from an SDN data plane, and simultaneously acquiring the link states, namely the link bandwidth occupation condition, of the current SDN data center network;
S32, selecting one flow from the N flows currently to be allocated, inputting the flow information and the current link state into the deep Q network trained in S2, and selecting the optimal action to execute according to the output of the deep Q network;
s33, updating the current link state, and simultaneously recording the mapping information of the current flow and the allocation rate.
S34, judging whether all N flows have been allocated; if not, returning to S32 and repeating S32 and S33 until rates have been allocated to all flows; if allocation is complete, executing S35;
S35, outputting the flow rate allocation mapping table of the N flows, which the SDN controller uses as the allocated rate of each flow.
The beneficial effects of the invention are as follows: the invention provides an intelligent solution to the congestion control problem of SDN data centers based on priority experience playback DQN, performing congestion control in a centralized, proactive and intelligent way according to load changes on the data center's network links. The invention overcomes the weak multi-dimensional perception of classical reinforcement learning and avoids the dimensionality disaster of the Q table; meanwhile, by introducing the priority experience playback mechanism, it achieves better convergence speed and convergence quality than the traditional DQN algorithm. Through the controller, the method allocates rates globally to the flows of the whole network, so that the network avoids congestion while data link utilization is kept as high as possible, thereby achieving congestion control for the entire data center.
Drawings
Fig. 1 is a diagram of a congestion control system architecture in an embodiment.
Fig. 2 is a data center network topology as employed by an embodiment.
Fig. 3 is a flowchart of a training algorithm.
Fig. 4 is a flow chart of a congestion control method.
Fig. 5 is a diagram of bandwidth variation of an embodiment.
Fig. 6 shows a graph of link utilization versus different algorithms for different numbers of streams (each set of three columns is shown from left to right as DQN, DDQN, PRIO, respectively).
Fig. 7 shows a graph of convergence speed versus various algorithms.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and detailed description. The technical features of the embodiments of the invention can be combined correspondingly on the premise of no mutual conflict.
For ease of description, basic definitions used in the remainder of the invention are explained first.
The SDN data center refers to a data center which adopts a software defined network technology architecture.
In the invention, the congestion control problem of the SDN data center refers to flow-based congestion control, i.e., the SDN controller globally allocates rates to all flows so that the rate requirements of the flows are met as far as possible while the whole data center network is kept free of congestion.
In the present invention, the priority experience playback DQN algorithm is a DQN algorithm based on priority experience playback, where priority experience playback (prioritized experience replay) is an improvement of the experience playback mechanism in the DQN algorithm and belongs to the prior art. An experience refers to a stored training sample. The DQN algorithm itself belongs to the prior art and is a classical algorithm in deep reinforcement learning. Deep reinforcement learning is a learning method that combines the advantages of deep learning and reinforcement learning to handle high-dimensional raw inputs and decision control.
In a preferred embodiment of the present invention, there is provided a DQN-based SDN data center congestion control method, the method including the following steps:
step 1: congestion control agents based on deep Q networks are deployed in SDN controllers, introducing a preferential experience replay DQN algorithm into a software defined network based data center.
The SDN controller is the control and decision-making part of the SDN network and is connected to the network devices of the SDN data plane through the southbound interface, so as to centrally control the rate allocation of the flows in the whole network. The congestion control agent deployed in the SDN controller performs flow rate allocation based on a deep Q network (DQN); in the invention, this deep Q network is trained and used with the priority experience playback DQN algorithm to solve the congestion control problem of the SDN data center.
Step 2: the depth Q network is trained based on a preferential experience playback DQN improvement algorithm. The specific training process comprises the following steps:
2-1. Determine the input and output of the deep Q network. The input of the deep Q network is link state information and flow state information; the output is the Q values corresponding to different actions, which represent allocating different rates to a flow.
2-2. Set the reward function. Since the first objective of congestion control is to keep the links from becoming congested and the second objective is to maximize link utilization, the reward function is a composite function that takes both link utilization and link congestion into account. The exact form of the reward function is not limited, as long as it balances link utilization against link congestion. In a subsequent example of the invention, one possible reward function form is:
(Equation image in the original: the reward is defined in terms of the minimum link utilization min(lkCap_m).)
wherein: reward_m represents the reward value, min() represents taking the minimum of the quantities in brackets, and lkCap_m represents the link utilization.
2-3. Improve the priority experience playback DQN algorithm to form the improved priority experience playback DQN algorithm. The improvement is built on the priority experience playback DQN algorithm: during network training, before each step of each scene finishes, a check is added for whether any link is congested; if so, the scene ends and the remaining steps in the scene are not executed; if not, the next step proceeds. Here a step means allocating a rate to one flow, and a scene means completing rate allocation for a whole group of flows, i.e., every flow in the group has been assigned a rate.
This improved priority experience playback DQN algorithm is subsequently used to train the deep Q network.
2-4, constructing a scene: an arbitrary initial link state is randomly constructed, and an arbitrary set of flows and rate requirements are randomly constructed.
2-5. Construct a SumTree for storing experiences and marking their priorities. The SumTree here is a binary-tree data structure, and the priority mark is an experience-importance mark determined from the TD-error, i.e., the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal difference.
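For illustration only, a compact Python sketch of such a SumTree is given below; the class and method names are illustrative and are not taken from the patent. Each leaf stores the priority of one experience and each internal node stores the sum of its children, so an experience can be drawn with probability proportional to its priority by walking down from the root.

import numpy as np

class SumTree:
    """Binary tree whose leaves hold experience priorities; each parent stores the sum of its children."""
    def __init__(self, capacity):
        self.capacity = capacity                   # number of leaf nodes (ST in the pseudo-code below)
        self.tree = np.zeros(2 * capacity - 1)     # internal nodes followed by the leaves
        self.data = [None] * capacity              # the stored experiences
        self.write = 0                             # index of the next leaf to overwrite

    def add(self, priority, experience):
        idx = self.write + self.capacity - 1
        self.data[self.write] = experience
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                            # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def total(self):
        return self.tree[0]                        # sum of all priorities

    def get(self, s):
        """Find the leaf whose cumulative priority interval contains s (0 <= s <= total())."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):        # descend until a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - (self.capacity - 1)]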
2-6. Select experiences from the SumTree according to priority and train the deep Q network with the improved priority experience playback DQN algorithm, i.e., before each step of each scene of the training finishes, check whether any link is congested; if so, end the scene directly, and if not, continue with the next step. The other training procedures of priority experience playback DQN are the same as in the prior art.
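A minimal Python sketch of one such training step is given below, under the assumption that the link state is kept as a list of occupied bandwidths in G and that every link has a 40G capacity as in the later test example; the names are illustrative only.

LINK_CAPACITY = 40.0   # assumed per-link capacity, matching the 40G links of the test topology

def allocation_step(link_loads, route, rate):
    """One step of a scene: allocate `rate` to one flow and check its links for congestion.

    link_loads: occupied bandwidth per link (G); route: indices of the links the flow traverses.
    Returns the updated loads and done=True if any link is overloaded, which ends the scene early.
    """
    for link in route:
        link_loads[link] += rate
    done = any(load > LINK_CAPACITY for load in link_loads)
    return link_loads, done

A scene then simply repeats this step over the whole group of flows and stops at the first step for which done is true.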
Step 3: the SDN controller collects link state information and flow state information of rates to be distributed from an SDN data plane in real time, inputs the link state information and the flow state information of the rates to be distributed into a trained deep Q network, determines optimal actions according to each flow Q value and generates a flow rate distribution scheme, and therefore overall congestion control is conducted on an SDN data center network.
In this embodiment, the method for performing global congestion control by the SDN controller in step 3 includes the following steps:
3-1, acquiring the rate requirements and the route information of N current flows to be distributed from an SDN data plane, and simultaneously acquiring the link states, namely the link bandwidth occupation condition, of the current SDN data center network;
3-2, selecting one flow from the N flows currently to be allocated, inputting the flow information and the current link state into the deep Q network trained in step 2, and selecting the optimal action to execute according to the output of the deep Q network;
3-3, updating the current link state, and simultaneously recording the mapping information of the current flow and the allocated rate.
3-4, judging whether all N streams are distributed completely: if not, returning to the step 3-2 to continue circulation until the rates are distributed for all the flows; if the distribution is finished, executing the step 3-5;
3-5, outputting the flow rate allocation mapping table of the N flows, which the SDN controller uses to allocate rates to all flows. In this way the allocated rates meet the rate requirements of all flows as far as possible while effectively avoiding congestion, achieving the goal of global congestion control of the data center.
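The loop 3-1 to 3-5 can be sketched in Python as follows; the state encoding, the candidate rate set and the q_network interface are assumptions made for illustration and are not the patent's implementation.

import numpy as np

def allocate_rates(flows, link_loads, q_network, rate_set=(0, 1, 2, 3, 4, 5)):
    """Allocate a rate to every flow with the trained deep Q network (illustrative sketch).

    flows: list of (flow_id, route) pairs, route being a list of link indices (from 3-1)
    link_loads: list of currently occupied bandwidth per link, in G (from 3-1)
    q_network: callable returning one Q value per candidate rate for a given state vector
    """
    allocation = {}                                           # the flow -> rate mapping table
    for flow_id, route in flows:                              # 3-2: handle one flow at a time
        state = np.concatenate([                              # assumed encoding of the network input
            np.asarray(link_loads, dtype=np.float32) / 40.0,  # normalized link loads
            np.array([flow_id], dtype=np.float32),            # which flow is being allocated
            np.array([1.0 if i in route else 0.0
                      for i in range(len(link_loads))], dtype=np.float32),  # one-hot route mask
        ])
        q_values = q_network(state)
        rate = rate_set[int(np.argmax(q_values))]             # the action with the largest Q value
        for link in route:                                    # 3-3: update the current link state
            link_loads[link] += rate
        allocation[flow_id] = rate                            # 3-3: record the flow-rate mapping
    return allocation                                         # 3-4/3-5: all flows allocated

The returned dictionary corresponds to the flow rate allocation mapping table output in 3-5.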
The core of the method provided by the invention is that an improved priority experience playback DQN algorithm is introduced into the SDN data center to solve the congestion control problem, so that the SDN controller can perform congestion control in a centralized, proactive and intelligent way according to load changes on the data center's network links; this overcomes the weak multi-dimensional perception of classical reinforcement learning and avoids the dimensionality disaster of the Q table. Meanwhile, by introducing the priority experience playback mechanism, the invention achieves better convergence speed and convergence quality than the traditional DQN algorithm. Through the controller, the method allocates rates globally to the flows of the whole network, so that the network avoids congestion while data link utilization is kept as high as possible, thereby achieving congestion control for the entire data center. Specifically, the method has the following advantages:
1. Intelligence. Intelligence here means that DQN with priority experience playback is introduced, so that algorithm complexity is reduced compared with congestion control methods based on optimization theory; compared with congestion control methods based on reinforcement learning Q-learning, the method avoids the dimensionality disaster of the Q table.
2. End-to-end congestion control concept. The end-to-end congestion control concept refers to congestion control by globally allocating the rate of end-to-end flows instead of considering congestion control enhancements at intermediate nodes such as routers.
3. Centralized, network-arbitrated congestion control. Centralized network-arbitrated congestion control means that the congestion control method is deployed on the SDN controller, fully exploits the advantage of SDN's centralized control, and performs global end-to-end flow rate allocation according to the state of the whole network. This differs from traditional TCP-based congestion control, which is distributed: each end system detects congestion on its own TCP connection and performs congestion avoidance based on the slow-start algorithm.
4. Flow granularity. The object of congestion control is the flow, matching the flow-based nature of the next-generation SDN architecture, whereas traditional TCP-based congestion control operates on IP packets.
5. Proactive congestion control. Proactive means that in the congestion control method of the invention the controller collects the state of the whole network together with the current flows and their rate demands, and combines the two to carry out global rate allocation; the goal is to make link utilization across the whole network as high as possible while avoiding congestion as far as possible. By contrast, the TCP congestion control algorithm only takes action after congestion is detected and is therefore passive.
6. A priority experience playback strategy is employed. The strategy uses proportional prioritization for experience playback so as to obtain experience samples that are free of congestion and have high link utilization. The traditional DQN-based congestion control method uses a uniform sampling strategy, which is not conducive to obtaining good experience samples.
Examples
To help those skilled in the art understand and implement the present invention, the method of the above embodiment is applied to a specific test application, and its technical effects and advantages in practical use are further described with reference to the accompanying drawings and data.
The test application introduces the priority experience playback DQN algorithm into a software-defined-network-based data center and solves the congestion control problem in real time. Fig. 1 is a schematic diagram of the congestion control system: the SDN controller is the control and decision-making part of the SDN network, exercises centralized control over the network devices of the SDN data plane through the southbound interface (the control-forwarding communication interface), and provides flexible programmability. According to the invention, a congestion control agent is deployed in the SDN controller to introduce the priority experience playback DQN algorithm into the data center; link state information and flow state information of the data forwarding plane are collected through the southbound interface, the state information is input into the neural network to generate a flow rate allocation scheme, and the allocation scheme is then issued to the data plane network devices through the southbound interface. On the premise that no link becomes congested, link utilization is kept as high as possible.
Fig. 2 is the network topology diagram of the SDN data center used in the present test application. The whole network has 8 links, each with a bandwidth of 40G. The number of flows (the flow queue length) used in this test application example is 28.
In the test application example, the specific DQN-based SDN data center congestion control method comprises the following steps:
step 1: the DQN algorithm is introduced to SDN data center congestion control problems.
In the congestion control system architecture diagram based on priority experience playback DQN shown in fig. 1, the overall process can be described as follows: first, the SDN controller acquires in real time, through the southbound interface, the state information of each link in the data center network and the state information of the flows to be allocated from the data plane; then, by deploying a congestion control agent in the SDN controller, the priority experience playback DQN algorithm is introduced into the software-defined-network-based data center, and the link state information and flow state information collected from the southbound interface are input into the neural network to generate a flow rate allocation scheme; finally, the SDN controller issues the allocation scheme to the data plane network devices through the southbound interface.
Step 2: based on the improved DQN algorithm, the deep Q network is trained according to steps 2-1 to 2-6.
2-1. Determine the input and output of the deep Q network. The input of the deep Q neural network is the link states and the state of the flow currently awaiting rate allocation, namely the sequence number of the flow and the path it traverses. The output is the Q values corresponding to different actions, i.e., to allocating different rates to the flow.
Fig. 2 is the network topology diagram of the present test application. In the figure, the 28 flows traverse the link pairs L1-L2, L1-L3, ..., L6-L8 and L7-L8 respectively, and each link has a bandwidth of 40G. On the premise of meeting the rate requirements of all flows as far as possible, the DQN-based congestion control method is used to allocate a rate to each flow while ensuring that the network does not become congested.
In the test application example, the states of the 8 links and the 28 flows are used as input to the neural network of the DQN algorithm for training, and a mapping table of flows to rates is output.
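One possible way to build such an input vector in Python is sketched below, under the assumption that the 8 normalized link loads are concatenated with the flow's sequence number and a one-hot mask of its route; the patent does not spell out the exact encoding, so this is illustrative only.

import numpy as np

NUM_LINKS = 8          # links l1..l8 of the test topology
LINK_CAPACITY = 40.0   # 40G per link
NUM_FLOWS = 28

def encode_state(link_loads, flow_index, route):
    """Assumed encoding: normalized link loads, the flow's sequence number, and a one-hot route mask."""
    load_part = np.asarray(link_loads, dtype=np.float32) / LINK_CAPACITY   # 8 utilizations in [0, 1]
    flow_part = np.array([flow_index / NUM_FLOWS], dtype=np.float32)       # which flow is being allocated
    route_mask = np.zeros(NUM_LINKS, dtype=np.float32)
    for link in route:                                                      # route given as 0-based link indices
        route_mask[link] = 1.0
    return np.concatenate([load_part, flow_part, route_mask])              # feature vector phi(S)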
2-2. Set the reward function. Since the first objective of congestion control is to keep the links from becoming congested and the second objective is to maximize link utilization, the reward function is set to take both link utilization and link congestion into account. In this test application, the reward function is given as:
(Equation image in the original: the reward is defined in terms of the minimum link utilization min(lkCap_m) and the congestion flag done.)
wherein: reward_m represents the reward value, min() represents taking the minimum of the quantities in brackets, lkCap_m represents the link utilization, done = true indicates that a link is congested, and done = false indicates that no link is congested.
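Since the formula itself is only reproduced as an image, the following Python sketch shows one assumed form that is consistent with the description: the reward equals the minimum link utilization when no link is congested, and a fixed penalty (here -1, an assumption) when congestion occurs.

LINK_CAPACITY = 40.0

def compute_reward(link_loads):
    """Assumed reward balancing utilization against congestion (illustrative, not the patent's exact formula)."""
    utilizations = [load / LINK_CAPACITY for load in link_loads]   # lkCap_m for each link m
    done = any(u > 1.0 for u in utilizations)                      # congestion flag
    if done:
        return -1.0, done        # assumed fixed penalty when any link is congested
    return min(utilizations), done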
2-3. Improve the priority experience playback DQN algorithm. As mentioned above, the improvement is that, on the basis of the priority experience playback DQN algorithm, a check of whether any link is congested is added before each step of each training scene ends, and if so, the scene ends. A step refers to allocating a rate to one flow, and a scene refers to completing rate allocation for the entire group of flows. The DQN algorithm improved in this way is referred to as the improved priority experience playback DQN algorithm.
The rest of the priority experience playback DQN algorithm is the same as in the prior art. For ease of understanding, fig. 3 presents the training flowchart of the improved priority experience playback DQN algorithm, which comprises the following steps:
2-3-1. Initialize the memory for storing training samples, the neural network of the action-value function, and the neural network of the target action-value function.
In the test application example, the capacity for storing training samples is set to 4500, and a basic DNN is used for both the action-value network and the target action-value network. The initial loads of the 8 links are [21, 27, 20, 28, 22, 26, 23, 18].
2-3-2. Follow an ε-greedy policy: either select a random action, or input the current link state, compute the Q value of each action, and, taking the route information of the current flow into account, execute the action with the largest Q value (the optimal action).
In the present test application, the probability ε is set to 0.9, the learning rate is 0.001, and the action set is [0G, 1G, 2G, 3G, 4G, 5G], i.e., the rate allocated to each flow is chosen as one of these six rates.
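A small Python sketch of this selection step is given below; it assumes, as is common when ε is set to a large value such as 0.9, that the greedy action is taken with probability ε and a random exploratory action otherwise, and it omits route-feasibility masking for brevity.

import random
import numpy as np

ACTIONS_G = [0, 1, 2, 3, 4, 5]       # candidate rates (in G) for the flow being allocated

def select_action(q_network, state, epsilon=0.9):
    """epsilon-greedy selection over the six candidate rates (illustrative sketch)."""
    if random.random() < epsilon:
        q_values = q_network(state)               # one Q value per candidate rate
        return int(np.argmax(q_values))           # index into ACTIONS_G (the optimal action)
    return random.randrange(len(ACTIONS_G))       # random exploratory action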
2-3-3. Obtain the reward r_t after executing a_t, the next input φ_{t+1}, and the link-congestion label done (done is true if a link is congested, otherwise false). Iteratively update the target action-value function parameters to the current action-value function parameters.
In the present test application, φ_t, a_t, r_t, φ_{t+1} and done represent the current input, the current action, the reward value, the next input, and the link-congestion label, respectively. These data are stored in the SumTree, from which Batch-size samples are drawn according to priority at each training step. The target value of each state is computed, and the network is updated by stochastic gradient descent (SGD). The Batch-size here is set to 32 and the number of training steps is 20000.
2-3-4. Repeat the above until S is the terminal state, yielding the trained Q neural network.
2-4, constructing a scene. The construction scenario refers to randomly constructing an arbitrary initial link state, and randomly constructing an arbitrary set of flows and rate requirements.
2-5. Build the SumTree to store experiences and mark priorities. The SumTree is a binary-tree data structure; the priority tag marks the importance of an experience according to the TD-error, i.e., the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal difference.
2-6. Select experiences according to priority and train the Q neural network with the improved priority experience playback DQN algorithm.
Further, the training process of the improved priority experience playback DQN algorithm described above can be expressed in pseudo-code as follows (a Python sketch of one training update is given after the pseudo-code):
algorithm input: training round number MAX_EPISODE, link number M, number N of rate flows to be distributed, state characteristic dimension S, action set X, learning rate leanning-rate, priority adjustment strength alpha, sampling weight coefficient beta, discount rate gamma, exploration rate epsilon, minimum positive value epsilon, current Q network Q, target Q network Q', sample number BT of batch gradient descent, target Q network parameter updating frequency C and leaf node number ST of SumPree.
Output: Q network parameters.
Initialize the value Q corresponding to all states and actions, and randomly initialize all parameters ω of the current Q network
Initialize the parameters of the target Q network Q' as ω' = ω
Initialize the SumTree data structure for experience playback, with the priority p_k of all ST leaf nodes of the SumTree set to 1
For i from 1 to MAX_EPISODE do
Initialize S as the first state of the current state sequence, namely the load condition of every link in the current SDN data center network together with the information of the flows to be processed, and obtain its feature vector φ(S)
For j from 1 to N do
a) Use φ(S) as the input to the Q network to obtain the Q value output corresponding to every action of the Q network. Using the ε-greedy method, select the corresponding action A from the current Q value output, namely the rate allocated to the j-th flow
b) Execute the current action A in state S to obtain the new state S' and its corresponding feature vector φ(S')
c) Calculate the reward value R from the state of each link m according to the reward function above, together with the termination flag done; when congestion occurs, the termination state is considered reached and training of the current episode stops
d) Store the five-tuple {φ(S), A, R, φ(S'), done} in the SumTree
e) S = S'
f) Sample BT samples {φ(S_k), A_k, R_k, φ(S'_k), done_k}, k = 1, 2, ..., BT, from the SumTree, where the probability of drawing sample k is
P(k) = p_k^α / Σ_j p_j^α
and the loss-function weight is
ω_k = (N·P(k))^(−β) / max_j ω_j
Calculate the current target Q value y_k:
y_k = R_k if done_k is true; otherwise y_k = R_k + γ·max_a Q'(φ(S'_k), a, ω')
g) Use the mean-square-error loss function
L(ω) = (1/BT) · Σ_k ω_k · (y_k − Q(φ(S_k), A_k, ω))²
and update all parameters ω of the Q network by gradient back-propagation through the neural network
h) Recalculate the TD-error δ_k = y_k − Q(φ(S_k), A_k, ω) for all sampled transitions, and update their priorities in the SumTree to p_k = |δ_k| + ε
i) If i % C == 1, update the target Q network parameters: ω' = ω
j) If S' is a terminal state, i.e., rates have been allocated to all flows, stop the current episode.
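For ease of understanding, the following Python sketch condenses steps f) to h) of one training update: priority-proportional sampling from the SumTree, importance-sampling weights, target computation, a weighted gradient step, and the priority refresh. It assumes the SumTree sketch given in step 2-5, a PyTorch Q network and optimizer, experiences stored as (state, action, reward, next_state, done) tuples, and uses the replay capacity (4500) as the sample count in the weight formula; it is illustrative and not the patent's implementation.

import numpy as np
import torch

def train_step(sumtree, q_net, target_net, optimizer, batch_size=32,
               gamma=0.9, alpha=0.6, beta=0.4, eps_prio=1e-6, memory_size=4500):
    """One prioritized-replay DQN update (illustrative sketch, not the patent's code)."""
    # f) draw batch_size experiences with probability proportional to the stored priority;
    #    assumes the SumTree already holds at least batch_size transitions
    segment = sumtree.total() / batch_size
    idxs, prios, batch = [], [], []
    for k in range(batch_size):
        s = np.random.uniform(k * segment, (k + 1) * segment)
        idx, prio, exp = sumtree.get(s)            # exp = (state, action, reward, next_state, done)
        idxs.append(idx); prios.append(prio); batch.append(exp)
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))

    # P(k) is realized by proportional sampling (alpha is folded into the stored priorities below);
    # importance-sampling weight w_k = (N * P(k))^(-beta), normalized by its maximum
    probs = np.array(prios) / sumtree.total()
    weights = (memory_size * probs) ** (-beta)
    weights = torch.as_tensor(weights / weights.max(), dtype=torch.float32)

    states_t = torch.as_tensor(states, dtype=torch.float32)
    next_t = torch.as_tensor(next_states, dtype=torch.float32)
    actions_t = torch.as_tensor(actions, dtype=torch.int64)
    rewards_t = torch.as_tensor(rewards, dtype=torch.float32)
    dones_t = torch.as_tensor(dones, dtype=torch.float32)

    # target y_k = R_k when the scene terminated, otherwise R_k + gamma * max_a Q'(S'_k, a)
    with torch.no_grad():
        next_q = target_net(next_t).max(dim=1).values
        targets = rewards_t + gamma * next_q * (1.0 - dones_t)

    # g) weighted mean-square TD error minimized by gradient back-propagation
    q_taken = q_net(states_t).gather(1, actions_t.unsqueeze(1)).squeeze(1)
    td_errors = targets - q_taken
    loss = (weights * td_errors.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # h) refresh the priorities of the sampled leaves: p_k = (|delta_k| + eps)^alpha
    for idx, delta in zip(idxs, td_errors.detach().numpy()):
        sumtree.update(idx, (abs(float(delta)) + eps_prio) ** alpha)
    return float(loss)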
Step 3: and (3) applying the Q neural network trained in the step (2) to perform congestion control on the SDN data center network.
The specific congestion control method flowchart is shown in fig. 4, and specifically includes the following steps:
and 3-1, acquiring 28 flow rate requirements and routing information to be distributed, and acquiring the state of each link of the current SDN data center network, namely the occupation condition of the link bandwidth. The flow request to be allocated is 28, the initial load of 8 links is [21,27,20,28,22,26,23,18], and the specific occupied links and bandwidth requirements are as follows:
TABLE 1
Flow                   flow1    flow2    flow3    flow4    ...    flow27    flow28
Occupied links         l1, l2   l1, l3   l1, l4   l1, l5   ...    l6, l8    l7, l8
Demand bandwidth (G)   5        5        5        5        ...    5         5
3-2. Select one of the 28 flows, input its information together with the current link state into the Q neural network trained with the improved priority experience playback DQN algorithm, and, subject to the flow's route, execute the action with the best Q value for that flow.
3-3. Update the current link state, and simultaneously record the mapping information of the current flow and the allocated rate.
3-4. Judge whether all 28 flows have been allocated: if not, return to step 3-2 and continue the loop until rates have been allocated to all flows; if allocation is finished, execute step 3-5;
3-5. Output the flow rate allocation mapping table of the 28 flows, which the SDN controller uses to allocate rates to all flows, so that the allocated rates meet the rate requirements of all flows as far as possible while effectively avoiding congestion, achieving global congestion control of the data center. The flow rate allocation mapping for the 28 flows is as follows:
TABLE 2
Flow                      flow1    flow2    flow3    flow4    flow5    ...    flow27    flow28
Occupied links            l1, l2   l1, l3   l1, l4   l1, l5   l2, l3   ...    l6, l8    l7, l8
Demand bandwidth (G)      5        5        5        5        5        ...    5         5
Allocated bandwidth (G)   3        3        4        1        5        ...    4         3
Fig. 5 shows the bandwidth variation of each link as the flows are allocated one by one. The abscissa is the allocation count and the ordinate is the bandwidth occupation of each link after bandwidth is allocated to each flow. As can be seen from fig. 5, after the rate allocation of the 28 flows is completed, no link is congested; the method of the invention effectively realizes congestion control.
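The absence of congestion seen in fig. 5 can be checked directly from the mapping table of Table 2; a small illustrative sketch (names assumed) that recomputes each link's load from the allocated rates is:

def verify_no_congestion(initial_loads, allocation, routes, capacity=40.0):
    """Recompute each link's load from the allocated rates and confirm no link exceeds 40G.

    initial_loads: e.g. [21, 27, 20, 28, 22, 26, 23, 18]; routes: flow_id -> list of link indices;
    allocation: flow_id -> allocated rate in G (the mapping table of Table 2).
    """
    loads = list(initial_loads)
    for flow_id, rate in allocation.items():
        for link in routes[flow_id]:
            loads[link] += rate
    return all(load <= capacity for load in loads), loads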
Fig. 6 compares the link utilization of the different methods for different numbers of flows. DQN is the original single-target-network algorithm, DDQN is an improved DQN that replaces the single target network with a double network, and PRIO is the improved DQN with priority experience playback used in the invention. As can be seen from fig. 6, when the initial state of the whole network is close to saturation and the rate requirement of each flow is high, the priority experience playback DQN algorithm achieves the highest link utilization in all three network states, and its advantage grows as the dimension of the network state increases. Therefore, compared with the DQN and DDQN algorithms, the priority experience playback DQN algorithm provided by the invention has the best congestion control effect and the highest link utilization, i.e., it best satisfies the rate requirements of the flows in the network.
Fig. 7 compares the convergence speed of the different methods. As can be seen from fig. 7, the congestion control method based on the improved priority experience playback DQN algorithm provided by the invention is significantly superior to both the DQN and DDQN algorithms in terms of convergence speed and convergence quality.
The congestion control method of the present invention has been described above in connection with specific embodiments. The embodiment shows that the SDN data center congestion control method based on priority experience playback DQN is effective. Through the controller, the method allocates rates globally to the flows of the whole network, so that the network avoids congestion while data link utilization is kept as high as possible, thereby achieving congestion control for the entire data center. The invention overcomes the weak multi-dimensional perception of classical reinforcement learning; replacing the Q table of reinforcement learning with a Q neural network accelerates convergence, the target network and experience playback improve the performance of the algorithm, and the priority experience playback mechanism solves the difficulty of obtaining high-quality samples, giving the algorithm better convergence speed and convergence quality.
The above embodiment is only a preferred embodiment of the present invention, but it is not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all the technical schemes obtained by adopting the equivalent substitution or equivalent transformation are within the protection scope of the invention.

Claims (6)

1. An SDN data center congestion control method based on preferential experience playback DQN is characterized by comprising the following steps:
S1, deploying a congestion control agent based on a deep Q network in the SDN controller, and introducing the priority experience playback DQN algorithm into the software-defined-network-based data center;
s2, training the deep Q network, wherein the training process comprises S21-S24:
S21, setting the input of the deep Q network to be link state information and flow state information, and the output to be the Q values corresponding to different actions, where different actions represent allocating different rates to a flow, and the reward function is a composite function balancing link utilization against link congestion;
S22, randomly constructing an arbitrary initial link state and an arbitrary group of flows with their rate requirements to construct a scene;
S23, constructing a SumTree for storing experiences and marking the priority of each experience;
S24, selecting experiences from the SumTree according to priority and training the deep Q network with the improved priority experience playback DQN algorithm, so that the SDN controller can, through the deep Q network, maximize data link utilization while ensuring the data center does not become congested; on the basis of the priority experience playback DQN algorithm, a judgment of whether any link is congested is added before each step of each scene of network training finishes; if so, the scene is ended directly, and if not, the next step continues; a scene means allocating rates to a whole group of flows, and each step within a scene means allocating a rate to one flow;
S3, the SDN controller collects, in real time from the SDN data plane, link state information and the state information of the flows whose rates are to be allocated, inputs them into the trained deep Q network, determines the optimal action for each flow according to its Q values, and generates a flow rate allocation scheme, thereby performing global congestion control over the SDN data center network.
2. The SDN data center congestion control method based on priority experience replay DQN of claim 1, wherein the SDN controller is connected to network devices of an SDN data plane through a southbound interface to implement centralized control.
3. The SDN datacenter congestion control method based on preferential experience playback DQN of claim 1, wherein the reward function is:
(Equation image in the original: the reward is defined in terms of the minimum link utilization min(lkCap_m).)
wherein: reward_m represents the reward value, min represents the minimum-value operation, and lkCap_m represents the link utilization.
4. The SDN data center congestion control method based on priority experience playback DQN of claim 1, wherein the priority experience playback DQN algorithm is a DQN algorithm that replaces experience playback mechanisms with priority experience playback.
5. The SDN data center congestion control method based on priority experience playback DQN of claim 1, wherein the priority mark is an experience-importance mark determined from the TD-error, which refers to the absolute value of the difference between the Q value of the current experience and the target Q value in the temporal difference.
6. The SDN data center congestion control method based on priority experience playback DQN of claim 1, wherein in S3, the method of global congestion control by an SDN controller comprises the steps of:
s31, acquiring the rate requirements and the route information of N flows to be distributed currently from an SDN data plane, and simultaneously acquiring the link states, namely the link bandwidth occupation condition, of the current SDN data center network;
S32, selecting one flow from the N flows currently to be allocated, inputting the flow information and the current link state into the deep Q network trained in S2, and selecting the optimal action to execute according to the output of the deep Q network;
s33, updating the current link state, and simultaneously recording the mapping information of the current flow and the allocation rate;
S34, judging whether all N flows have been allocated; if not, returning to S32 and repeating S32 and S33 until rates have been allocated to all flows; if allocation is complete, executing S35;
S35, outputting the flow rate allocation mapping table of the N flows, which the SDN controller uses as the allocated rate of each flow.
CN202111348335.9A 2021-11-15 2021-11-15 SDN data center congestion control method based on priority experience playback DQN Active CN113992595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111348335.9A CN113992595B (en) 2021-11-15 2021-11-15 SDN data center congestion control method based on priority experience playback DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111348335.9A CN113992595B (en) 2021-11-15 2021-11-15 SDN data center congestion control method based on priority experience playback DQN

Publications (2)

Publication Number Publication Date
CN113992595A CN113992595A (en) 2022-01-28
CN113992595B true CN113992595B (en) 2023-06-09

Family

ID=79748547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111348335.9A Active CN113992595B (en) 2021-11-15 2021-11-15 SDN data center congestion control method based on priority experience playback DQN

Country Status (1)

Country Link
CN (1) CN113992595B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567597B (en) * 2022-02-21 2023-12-19 深圳市亦青藤电子科技有限公司 Congestion control method and device based on deep reinforcement learning in Internet of things

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN109039942A (en) * 2018-08-29 2018-12-18 南京优速网络科技有限公司 A kind of Network Load Balance system and equalization methods based on deeply study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107864102A (en) * 2017-11-22 2018-03-30 浙江工商大学 A kind of SDN data centers jamming control method based on Sarsa
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deeply study under SDN framework
CN109039942A (en) * 2018-08-29 2018-12-18 南京优速网络科技有限公司 A kind of Network Load Balance system and equalization methods based on deeply study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Routing strategy for power communication networks based on deep reinforcement learning; 朱小琴; 袁晖; 王维洲; 魏峰; 张驯; 赵金雄; Science and Technology Innovation (No. 36); full text *

Also Published As

Publication number Publication date
CN113992595A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
CN111010294B (en) Electric power communication network routing method based on deep reinforcement learning
CN112437020B (en) Data center network load balancing method based on deep reinforcement learning
CN112134916B (en) Cloud edge collaborative computing migration method based on deep reinforcement learning
CN112486690B (en) Edge computing resource allocation method suitable for industrial Internet of things
CN111211987B (en) Method and system for dynamically adjusting flow in network, electronic equipment and storage medium
CN108111335B (en) A kind of method and system of scheduling and link virtual network function
CN113395207B (en) Deep reinforcement learning-based route optimization framework and method under SDN framework
CN111988225A (en) Multi-path routing method based on reinforcement learning and transfer learning
CN107864102B (en) SDN data center congestion control method based on Sarsa
CN111917642B (en) SDN intelligent routing data transmission method for distributed deep reinforcement learning
CN108684046A (en) A kind of access net service function chain dispositions method based on incidental learning
CN113992595B (en) SDN data center congestion control method based on priority experience playback DQN
CN110198280A (en) A kind of SDN link allocation method based on BP neural network
CN114650227A (en) Network topology construction method and system under layered federated learning scene
CN114828018A (en) Multi-user mobile edge computing unloading method based on depth certainty strategy gradient
CN115665258A (en) Deep reinforcement learning-based priority perception deployment method for multi-target service function chain
CN115714741A (en) Routing decision method and system based on collaborative multi-agent reinforcement learning
CN116320620A (en) Stream media bit rate self-adaptive adjusting method based on personalized federal reinforcement learning
CN117294643B (en) Network QoS guarantee routing method based on SDN architecture
CN113676407A (en) Deep learning driven flow optimization mechanism of communication network
CN116166444B (en) Collaborative reasoning method oriented to deep learning hierarchical model
CN110971451B (en) NFV resource allocation method
Suzuki et al. Safe multi-agent deep reinforcement learning for dynamic virtual network allocation
Rădulescu et al. Analysing congestion problems in multi-agent reinforcement learning
CN116367190A (en) Digital twin function virtualization method for 6G mobile network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant