CN111555907B - Data center network energy consumption and service quality optimization method based on reinforcement learning - Google Patents

Data center network energy consumption and service quality optimization method based on reinforcement learning

Info

Publication number
CN111555907B
CN111555907B (application CN202010308862.6A)
Authority
CN
China
Prior art keywords
data center
network
reinforcement learning
center network
link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010308862.6A
Other languages
Chinese (zh)
Other versions
CN111555907A (en)
Inventor
郭泽华
孙鹏浩
窦松石
张云天
韩宁
夏元清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010308862.6A
Publication of CN111555907A
Application granted
Publication of CN111555907B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5019 Ensuring fulfilment of SLA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements

Abstract

The invention discloses a reinforcement-learning-based method for optimizing data center network energy consumption and quality of service. An optimization model is built on a deep reinforcement learning framework, with the link utilization as the model's state, a completion-time-related metric computed from network performance as its reward, and the link margin ratio as its action. The traffic of the data center network is then adjusted according to the link margin ratios output by the model, so that the adjustment process simultaneously accounts for the temporal volatility and the spatial distribution of data flows, improving the energy efficiency of the data center network while guaranteeing FCT.

Description

Data center network energy consumption and service quality optimization method based on reinforcement learning
Technical Field
The invention belongs to the technical field of computer networks, and particularly relates to a data center network energy consumption and service quality optimization method based on reinforcement learning.
Background
The high power consumption of data centers has become a major problem for data center operators. Recent studies projected that the power consumption of U.S. data centers would reach 139 billion kilowatt-hours in 2020. Within a data center, the Data Center Network (DCN), consisting of switches and links, accounts for about 10% to 20% of the total energy consumption. Traffic in DCNs can generally be divided into two categories. Delay sensitive traffic primarily serves delay sensitive services (e.g., web searches); it is typically small (from a few KB to a few MB) and has an explicit data Flow Completion Time (FCT) limit. Delay tolerant traffic is typically large (hundreds of MB or more) and is mainly used for data synchronization or backup between servers, with no strict FCT requirement. When consolidating traffic in a DCN, delay sensitive traffic must be allocated carefully to ensure quality of service (QoS); since delay tolerant traffic makes up most of the total, finding a method that improves energy efficiency on this front is important.
Since traffic in DCNs exhibits volatility, some recent studies propose power-efficient DCNs that reduce power consumption through traffic consolidation using Software Defined Networking (SDN). In such DCNs, traffic is consolidated onto a minimal-power subset of switches and links, while the unused switches and links can enter sleep mode or be shut down to save power.
However, existing traffic scheduling schemes have two disadvantages. First, a coarse consolidation scheme may degrade QoS. Data centers run many delay sensitive applications, and FCT is a typical QoS metric in this context. In a power-efficient DCN, shutting down a portion of the network devices may increase the FCT of certain data flows because flows may congest the remaining active links; yet much prior work consolidates data flow transmission paths without regard to FCT requirements. Second, existing schemes do not fully account for the diversity of DCN traffic types and therefore adapt poorly. The distribution of different kinds of traffic in a DCN fluctuates over time and is uneven in space: the load on a given link differs at different times, and the loads on different links differ at the same time. Exploiting these two characteristics to compute an optimal consolidation scheme, however, entails high computational complexity. Most existing schemes provide only a few fixed patterns and do not generalize well to most situations.
Disclosure of Invention
In view of this, the invention provides a data center network energy consumption and service quality optimization method based on reinforcement learning, which can dynamically schedule traffic of a data center network according to the performance of the data center network, and reduce the power consumption of the data center network on the premise of ensuring the FCT requirement.
The invention provides a data center network energy consumption and service quality optimization method based on reinforcement learning, which comprises the following steps:
step 1, establishing a data center network energy consumption and service quality optimization model by adopting a deep reinforcement learning framework; the optimization model comprises a deep reinforcement learning agent and an environment, wherein the environment is a data center network to be optimized;
step 2, counting and calculating historical flow and network performance of each link in the data center network to be optimized, wherein the network performance comprises time for completing transmission of delay sensitive streams, the number of currently used links, the number of closed or dormant links and the number of streams violating the time for completing data streams; calculating the network performance to obtain the performance related to the completion time as the initial reward of the deep reinforcement learning agent; calculating the link utilization rate of each link in the data center network to be optimized according to the flow and the network performance, wherein the link utilization rate is used as an initial input state of the deep reinforcement learning agent;
step 3, inputting the initial input state and the initial reward into the deep reinforcement learning agent, and calculating by the deep reinforcement learning agent to obtain an action, wherein the action is used as a link margin ratio of each link in the data center network to be optimized; applying the link margin ratio to the environment, namely adjusting the flow path in the data center network to be optimized according to the link margin ratio, calculating the flow and the network performance under the current adjustment, and updating the deep reinforcement learning agent; iteratively executing the step 3 until the maximum iteration times are reached, and finishing the training of the deep reinforcement learning agent;
and 4, in the actual deployment process, calculating the flow and the network performance of each link in the data center network to be optimized to obtain the link utilization rate and the completion time related performance, inputting the obtained link utilization rate and the completion time related performance into a deep reinforcement learning agent obtained by training as a state and a reward respectively to obtain the link margin ratio of the output data center network to be optimized, adjusting the flow path in the data center network to be optimized according to the link margin ratio, and completing the optimization of the energy consumption and the service quality of the data center network to be optimized.
Further, the reward r_t is calculated using the following formula: r_t = D[U_t] - λC_t, where D[U_t] is the number of closed and dormant links in the data center network to be optimized at time t, C_t is the number of flows violating the data flow completion time in the data center network to be optimized at time t, and λ is the penalty weight.
Further, the process of calculating the historical traffic and the network performance of each link in the data center network to be optimized through statistics in the step 2 and the process of adjusting the path of the flow in the data center network to be optimized according to the link margin ratio in the step 4 are both realized through a Software Defined Network (SDN) controller.
Further, the adjustment of the flow paths in the data center network to be optimized according to the link margin ratio is realized by a best-fit decreasing bin-packing algorithm.
Further, the deep reinforcement learning framework is an AC algorithm framework.
Further, the deep reinforcement learning agent in step 2 and step 3 serves as the target network of the deep reinforcement learning agent in step 4, and the target network and the step-4 agent have the same network structure and network parameters.
Beneficial effects:
the invention constructs a data center network energy consumption and service quality optimization model based on a deep reinforcement learning framework, compares the link utilization rate, the completion time related performance obtained by network performance calculation and the link margin as the state, reward and action of the optimization model, and then adjusts the flow of the data center network according to the link margin comparison output by the optimization model. Test data show that the scheme provided by the invention can save more than 12.2% of power of a data center network compared with the existing scheme under the condition of ensuring FCT constraint.
The invention adopts the Actor-Critic framework, whose ability to generate continuous actions matches the continuous-valued nature of the link margin ratio, thereby improving the effectiveness of the optimization model.
The method separates the training process from the deployment process: the reinforcement learning agent obtained in training serves as the target network of the deep reinforcement learning agent used in deployment to compute the link margin ratios of the data center network to be optimized. Adopting a target network during training improves smoothness and thus the overall stability of the training process of the deep reinforcement learning agent.
Detailed Description
The present invention will be described in detail below with reference to examples.
The data center network energy consumption and service quality optimization method based on reinforcement learning provided by the invention follows this basic idea: a deep reinforcement learning (DRL) framework is used to establish an optimization model of data center network energy consumption and quality of service; a training sample set for the model is constructed from historical data on the traffic and network performance of each link in the data center network to be optimized, and the model is trained on this sample set. In actual deployment, the current traffic and network performance characteristics of the data center network to be optimized are input to the trained model to obtain the link margin ratios, which are then used to adjust the flow paths in the network, completing the optimization of energy consumption and quality of service.
The invention provides a data center network energy consumption and service quality optimization method based on reinforcement learning, which specifically comprises the following steps:
step 1, establishing a data center network energy consumption and service quality optimization model by adopting a DRL framework, wherein the optimization model comprises a deep reinforcement learning agent (DRL agent) and an environment, and the environment is a data center network to be optimized.
In the prior art, a DRL framework is designed on top of the classic Reinforcement Learning (RL) framework; it comprises a deep reinforcement learning agent and its working environment, and training is accomplished through their interaction. In the DRL framework, the action-generation policy is implemented by a deep neural network that maps a state s_t to an action a_t: at each interaction step t, the agent observes the state s_t of the network environment and generates an action a_t based on the current policy μ. The action changes the environment, from which the agent's reward r_t is computed (e.g., the energy saving rate in the network). The agent then updates its policy function based on this reward. The objective of the DRL agent is to maximize the discounted cumulative reward.
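For concreteness, the following Python sketch mirrors the interaction loop just described. Everything in it is an illustrative assumption: ToyDCNEnv and RandomAgent are stand-ins for the real environment (the SDN-controlled data center network) and the real DRL agent, and all numeric constants are arbitrary.

    import random

    class ToyDCNEnv:
        """Stand-in for the real environment: state is per-link utilization,
        action is a per-link margin ratio, reward follows r_t = D[U_t] - lambda*C_t."""

        def __init__(self, num_links: int, lam: float = 0.5):
            self.num_links = num_links
            self.lam = lam  # penalty weight lambda (assumed value)

        def observe(self):
            # The real system measures link utilization via the SDN controller.
            return [random.random() for _ in range(self.num_links)]

        def step(self, margin_ratios):
            # A real step re-routes flows and sleeps idle links; here the two
            # reward ingredients are faked purely for illustration.
            closed_or_dormant = sum(1 for m in margin_ratios if m > 0.5)
            fct_violations = random.randint(0, 1)
            return closed_or_dormant - self.lam * fct_violations

    class RandomAgent:
        """Placeholder agent; a real DRL agent maps state to action with a
        neural network policy mu and learns from (s, a, r, s')."""

        def act(self, state):
            return [random.uniform(0.05, 0.3) for _ in state]

        def update(self, s, a, r, s_next):
            pass  # a real DRL agent performs a gradient step here

    env, agent = ToyDCNEnv(num_links=8), RandomAgent()
    state = env.observe()
    for t in range(100):                     # max iteration count is a free parameter
        action = agent.act(state)            # link margin ratio per link
        reward = env.step(action)            # environment change yields r_t
        next_state = env.observe()
        agent.update(state, action, reward, next_state)
        state = next_state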
In the invention, the data center network to be optimized is the working environment in the DRL framework, and the training of the deep reinforcement learning agent is completed in the interaction process between the deep reinforcement learning agent established by the invention and the data center network to be optimized.
Step 2, counting and calculating historical data of flow and network performance of each link in the data center network to be optimized, wherein the network performance comprises time for completing transmission of delay sensitive streams, the number of currently used links, the number of closed or dormant links and the number of streams violating the time for completing data streams; calculating the network performance to obtain the performance related to the completion time as the initial reward of the deep reinforcement learning agent; and calculating the link utilization rate of each link in the data center network to be optimized according to the flow and the network performance, wherein the link utilization rate is used as an initial input state of the deep reinforcement learning agent.
The data center network energy consumption and service quality optimization method based on reinforcement learning can be realized by adopting a Software Defined Network (SDN), a data center network energy consumption and service quality deep reinforcement learning model is realized by a deep reinforcement learning agent (DRL agent) deployed on an SDN controller, and the DRL agent is communicated with the SDN controller through a northbound interface. The SDN controller is used for collecting the flow and network performance statistical data of the data center network to be optimized, and calculating the flow data and network performance of each link in the network, wherein the flow data and network performance include the time FCT for completing the transmission of the delay sensitive streams, the number of currently used links, the number of closed or dormant links and the number of streams violating the time for completing the data streams.
Rewards in deep reinforcement learning are used to evaluate the effectiveness of the algorithm; they reflect the user's overall goal, which for the present invention is to minimize power consumption while guaranteeing FCT. In the invention, the SDN controller computes a completion-time-related metric from the network performance as the initial reward r_t of the deep reinforcement learning agent. The reward r_t is calculated using the following formula:

r_t = D[U_t] - λC_t

where D[U_t] is the number of closed or dormant links in the data center network to be optimized at time t, C_t is the number of flows violating the data flow completion time, and λ is the penalty weight.
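As a worked example of this formula, the sketch below computes r_t from its three inputs; the λ value in the example is an assumed illustration, not a value fixed by the text.

    def reward(closed_or_dormant_links: int,
               fct_violations: int,
               penalty_weight: float) -> float:
        """r_t = D[U_t] - lambda * C_t.

        D[U_t]: links closed or dormant at time t.
        C_t:    flows violating their completion-time (FCT) requirement at time t.
        lambda: penalty weight trading power savings against FCT violations
                (its concrete value is a tuning choice, not fixed by the text).
        """
        return closed_or_dormant_links - penalty_weight * fct_violations

    # e.g. 8 links asleep and 1 FCT violation with lambda = 3.0 gives r_t = 5.0
    assert reward(8, 1, 3.0) == 5.0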
And the SDN controller calculates the link utilization rate of each link in the data center network to be optimized according to the flow data and the network performance obtained by statistics, and the link utilization rate is used as the state of the input DRL agent.
Step 3, inputting the link utilization rate into a deep reinforcement learning agent, and calculating by the deep reinforcement learning agent to obtain an action which is used as a link margin ratio of each link in the data center network to be optimized; the link margin ratio is used in the environment, namely, the flow path in the data center network to be optimized is adjusted according to the link margin ratio, the flow and the network performance under the current adjustment are calculated, and the deep reinforcement learning agent is updated; and (5) iteratively executing the step (3) until the maximum iteration times are reached, and finishing the training of the deep reinforcement learning agent.
The link utilization and the completion-time-related metric calculated in step 2 are sent to the DRL agent, which generates an action list from the input state and returns it to the SDN controller; each element of the list is the link margin ratio of one link in the data center network to be optimized. The SDN controller adjusts the flow paths according to the link margin ratios and then updates the flow tables in the corresponding switches. A specific adjustment is: when no traffic passes over a link or a device, it is put into sleep mode or turned off directly to save energy. Finally, the SDN controller recalculates the network performance of the adjusted data center network. Iterating this process up to the maximum number of iterations completes the training of the DRL agent.
Considering that unpredictable bursts of traffic can congest a link, the invention introduces the concept of a link margin ratio: part of each link's capacity is held in reserve to absorb bursts, thereby guaranteeing the transmission completion time (FCT) requirement of the network's delay sensitive flows.
And 4, in the actual deployment process, calculating the flow and the network performance of each link in the data center network to be optimized to obtain the link utilization rate and the completion time related performance, inputting the obtained link utilization rate and the completion time related performance into a deep reinforcement learning agent obtained by training as a state and a reward respectively to obtain the link margin ratio of the output data center network to be optimized, adjusting the flow path in the data center network to be optimized according to the link margin ratio, and completing the optimization of the energy consumption and the service quality of the data center network to be optimized.
In the actual deployment process, the trained DRL agent computes and outputs the link margin ratios of the data center network to be optimized from its link utilization and completion-time-related metric. The SDN controller then derives the flow-path adjustment policy from these ratios, i.e., it allocates flows to the links according to the link capacities left after reserving the margin. The allocation can use a best-fit decreasing bin-packing algorithm (sketched below): flows are consolidated, and idle links or devices are put into sleep mode or turned off directly to save energy, achieving the goal of merging traffic onto fewer links and using as few devices as possible.
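The text names the best-fit decreasing heuristic but does not list its steps; the sketch below shows the generic algorithm under assumed flow/link representations, with each link's usable capacity reduced by the margin ratio output by the DRL agent.

    def best_fit_decreasing(flows, links, margins):
        """Consolidate flows onto as few links as possible, reserving a margin.

        flows:   list of (flow_id, demand) pairs, e.g. demand in Mb/s
        links:   list of (link_id, capacity) pairs
        margins: {link_id: margin_ratio} as output by the DRL agent; the
                 reserved fraction absorbs traffic bursts
        Returns {link_id: [flow_id, ...]}; links left empty are candidates
        for sleep mode or shutdown.
        """
        residual = {lid: cap * (1.0 - margins[lid]) for lid, cap in links}
        placement = {lid: [] for lid, _ in links}

        # "Decreasing": place the largest flows first.
        for fid, demand in sorted(flows, key=lambda f: f[1], reverse=True):
            # "Best fit": pick the feasible link whose residual is tightest.
            feasible = [lid for lid, r in residual.items() if r >= demand]
            if not feasible:
                raise ValueError(f"flow {fid} fits on no link")
            best = min(feasible, key=lambda lid: residual[lid] - demand)
            residual[best] -= demand
            placement[best].append(fid)
        return placement

    flows = [("f1", 40), ("f2", 70), ("f3", 10)]
    links = [("l1", 100), ("l2", 100)]
    print(best_fit_decreasing(flows, links, {"l1": 0.2, "l2": 0.2}))
    # {'l1': ['f2', 'f3'], 'l2': ['f1']} -- both links stay active here;
    # with lighter load, one link would stay empty and could be slept.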
In the present invention, the action of the deep reinforcement learning agent is the link margin ratio. Since the link margin ratio takes continuous values, a deep reinforcement learning model that produces continuous actions should be preferred. The Actor-Critic framework (e.g., DDPG) is a typical method for generating continuous actions; it comprises two kinds of neural networks, the Actor network and the Critic network. In this invention, the Critic network evaluates the quality of the tentative adjustment policy, and the Actor network generates actions from the input state.
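The patent does not disclose concrete network definitions, so the following is a minimal DDPG-style sketch of the two networks; the layer sizes are assumptions, and the Sigmoid output keeps the action continuous in (0, 1) to match the link margin ratio.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Maps the link-utilization state to continuous link margin ratios."""
        def __init__(self, num_links: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_links, hidden), nn.ReLU(),
                nn.Linear(hidden, num_links), nn.Sigmoid(),  # ratios in (0, 1)
            )

        def forward(self, state):
            return self.net(state)

    class Critic(nn.Module):
        """Scores a (state, action) pair: how good is this tentative adjustment."""
        def __init__(self, num_links: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * num_links, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    # actor = Actor(8); critic = Critic(8)
    # a = actor(torch.rand(1, 8)); q = critic(torch.rand(1, 8), a)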
Meanwhile, to further improve the stability of the training process, a target network can be adopted during training to improve smoothness: the agent obtained in training serves as the target network of the deep reinforcement learning agent used in deployment to compute the link margin ratios of the data center network to be optimized. The target network has the same neural network structure as the Actor and Critic networks of the original network.
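The text does not spell out how the target network tracks the trained network; a DDPG-style Polyak soft update, sketched below, is one conventional way to obtain the smoothing described here (both the rule and the τ value are assumptions, not taken from the patent).

    import torch

    @torch.no_grad()
    def soft_update(target, online, tau: float = 0.005):
        """Polyak-average the online Actor/Critic into its target copy.

        A small tau makes the target track the online network slowly, which
        is the smoothing effect referred to above; tau = 0.005 is a
        conventional DDPG choice, not a value from the patent.
        """
        for t_p, o_p in zip(target.parameters(), online.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)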
To further refine the neural network portion of the DRL algorithm, the invention combines a gated recurrent unit (GRU) network with a feed-forward (FF) neural network. The GRU derives from the Recurrent Neural Network (RNN) and extracts temporal information from the input data. In our application scenario the GRU achieves performance similar to the popular LSTM while consuming fewer computational resources, so the GRU is selected. The state list s = [s_1, s_2, ..., s_L] enters the input layer of the GRU; after GRU processing, the output list h = [h_1, h_2, ..., h_L] is obtained and serves as the input to the FF network. The output of the FF network is the final output of the whole neural network.
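A minimal sketch of this GRU-plus-feed-forward arrangement follows, assuming PyTorch and illustrative layer sizes: the state list enters the GRU, the per-step outputs h_1 ... h_L feed the FF head, and the FF output is the network's final output.

    import torch
    import torch.nn as nn

    class GruFF(nn.Module):
        """GRU over the state list s = [s_1, ..., s_L], then a feed-forward head.

        Input  shape: (batch, L, 1) -- the L per-link states as a sequence
        Output shape: (batch, L)    -- one value per link (e.g. margin ratios)
        Hidden sizes are illustrative assumptions, not values from the patent.
        """
        def __init__(self, hidden: int = 64):
            super().__init__()
            self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
            self.ff = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, s):
            h, _ = self.gru(s)             # h = [h_1, ..., h_L], per-step outputs
            out = self.ff(h).squeeze(-1)   # FF head consumes each h_l
            return torch.sigmoid(out)      # squash to (0, 1)

    # net = GruFF(); ratios = net(torch.rand(2, 10, 1))  # -> shape (2, 10)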
In practical deployment, the invention pays particular attention to the traffic statistics collection frequency. On one hand, a higher collection frequency gives the DRL algorithm more accurate network state information and allows traffic to be reallocated more promptly; on the other hand, it occupies more bandwidth on the control channel. The method is implemented as a SmartFCT application system on a Software Defined Network (SDN), and experiments were carried out with SmartFCT. Weighing the cost of statistics collection against its benefit, a collection frequency parameter σ is introduced, and simulation and evaluation determined its best value to be 100 ms, the point at which the pros and cons balance. Experimental simulation of the embodiment shows that SmartFCT, built on the proposed method, guarantees the FCT constraints of flows while saving more than 12.2% of power compared with existing schemes.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The data center network energy consumption and service quality optimization method based on reinforcement learning is characterized by comprising the following steps:
step 1, establishing a data center network energy consumption and service quality optimization model by adopting a deep reinforcement learning framework; the optimization model comprises a deep reinforcement learning agent and an environment, wherein the environment is a data center network to be optimized;
step 2, counting and calculating historical flow and network performance of each link in the data center network to be optimized, wherein the network performance comprises time for completing transmission of delay sensitive streams, the number of currently used links, the number of closed or dormant links and the number of streams violating the time for completing data streams; calculating the network performance to obtain the performance related to the completion time as the initial reward of the deep reinforcement learning agent; calculating the link utilization rate of each link in the data center network to be optimized according to the flow and the network performance, wherein the link utilization rate is used as an initial input state of the deep reinforcement learning agent;
step 3, inputting the initial input state and the initial reward into the deep reinforcement learning agent, and calculating by the deep reinforcement learning agent to obtain an action, wherein the action is used as a link margin ratio of each link in the data center network to be optimized; applying the link margin ratio to the environment, namely adjusting the flow path in the data center network to be optimized according to the link margin ratio, calculating the flow and the network performance under the current adjustment, and updating the deep reinforcement learning agent; iteratively executing the step 3 until the maximum iteration times are reached, and finishing the training of the deep reinforcement learning agent;
and 4, in the actual deployment process, calculating the flow and the network performance of each link in the data center network to be optimized to obtain the link utilization rate and the completion time related performance, inputting the obtained link utilization rate and the completion time related performance into a deep reinforcement learning agent obtained by training as a state and a reward respectively to obtain the link margin ratio of the output data center network to be optimized, adjusting the flow path in the data center network to be optimized according to the link margin ratio, and completing the optimization of the energy consumption and the service quality of the data center network to be optimized.
2. The method of claim 1, wherein the reward r_t is calculated using the following formula: r_t = D[U_t] - λC_t, where D[U_t] is the number of closed and dormant links in the data center network to be optimized at time t, C_t is the number of flows violating the data flow completion time in the data center network to be optimized at time t, and λ is the penalty weight.
3. The method according to claim 1, wherein the statistical calculation of the historical traffic and network performance of each link in the data center network to be optimized in step 2 and the adjustment of the path of the flow in the data center network to be optimized according to the link margin ratio in step 4 are both implemented by a Software Defined Network (SDN) controller.
4. The method of claim 3, wherein the adjustment of the flow paths in the data center network to be optimized according to the link margin ratio is implemented by a best-fit decreasing bin-packing algorithm.
5. The method of claim 1, wherein the deep reinforcement learning framework is an AC algorithm framework.
6. The method of claim 5, wherein the reinforcement learning agent in step 2 and step 3 is the target network of the reinforcement learning agent in step 4, and the target network and the step-4 reinforcement learning agent have the same network structure and network parameters.
CN202010308862.6A 2020-04-19 2020-04-19 Data center network energy consumption and service quality optimization method based on reinforcement learning Active CN111555907B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010308862.6A CN111555907B (en) 2020-04-19 2020-04-19 Data center network energy consumption and service quality optimization method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010308862.6A CN111555907B (en) 2020-04-19 2020-04-19 Data center network energy consumption and service quality optimization method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN111555907A CN111555907A (en) 2020-08-18
CN111555907B (en) 2021-04-23

Family

ID=72002535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010308862.6A Active CN111555907B (en) 2020-04-19 2020-04-19 Data center network energy consumption and service quality optimization method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN111555907B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11275429B2 (en) * 2020-06-29 2022-03-15 Dell Products L.P. Reducing power consumption of a data center utilizing reinforcement learning framework
CN112822109B (en) * 2020-12-31 2023-04-07 上海缔安科技股份有限公司 SDN core network QoS route optimization method based on reinforcement learning
CN112801303A (en) * 2021-02-07 2021-05-14 中兴通讯股份有限公司 Intelligent pipeline processing method and device, storage medium and electronic device
CN112953844B (en) * 2021-03-02 2023-04-28 中国农业银行股份有限公司 Network traffic optimization method and device
CN113783720B (en) * 2021-08-20 2023-06-27 华东师范大学 Network energy consumption two-stage control method based on parameterized action space
CN114745337B (en) * 2022-03-03 2023-11-28 武汉大学 Real-time congestion control method based on deep reinforcement learning
CN114710439A (en) * 2022-04-22 2022-07-05 南京南瑞信息通信科技有限公司 Network energy consumption and throughput joint optimization routing method based on deep reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108880888A (en) * 2018-06-20 2018-11-23 浙江工商大学 A kind of SDN network method for predicting based on deep learning
CN109324875A (en) * 2018-09-27 2019-02-12 杭州电子科技大学 A kind of data center server power managed and optimization method based on intensified learning
CN109656702A (en) * 2018-12-20 2019-04-19 西安电子科技大学 A kind of across data center network method for scheduling task based on intensified learning
CN109818786A (en) * 2019-01-20 2019-05-28 北京工业大学 A kind of cloud data center applies the more optimal choosing methods in combination of resources path of appreciable distribution
CN109922004A (en) * 2019-04-24 2019-06-21 清华大学 The traffic engineering method and device of IPv6 network based on partial deployment Segment routing
CN110278149A (en) * 2019-06-20 2019-09-24 南京大学 Multi-path transmission control protocol data packet dispatching method based on deeply study
CN110662238A (en) * 2019-10-24 2020-01-07 南京大学 Reinforced learning scheduling method and device for burst request under edge network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104661291A (en) * 2015-03-12 2015-05-27 厦门大学 Energy saving method for WiFi access device based on traffic filtering and Web cache prefetching
KR102355678B1 (en) * 2017-05-08 2022-01-26 삼성전자 주식회사 METHOD AND APPARATUS FOR CONFIGURATING QoS FLOW IN WIRELESS COMMUNICATION
CN109614215B (en) * 2019-01-25 2020-10-02 广州大学 Deep reinforcement learning-based stream scheduling method, device, equipment and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108880888A (en) * 2018-06-20 2018-11-23 浙江工商大学 A kind of SDN network method for predicting based on deep learning
CN109324875A (en) * 2018-09-27 2019-02-12 杭州电子科技大学 A kind of data center server power managed and optimization method based on intensified learning
CN109656702A (en) * 2018-12-20 2019-04-19 西安电子科技大学 A kind of across data center network method for scheduling task based on intensified learning
CN109818786A (en) * 2019-01-20 2019-05-28 北京工业大学 A kind of cloud data center applies the more optimal choosing methods in combination of resources path of appreciable distribution
CN109922004A (en) * 2019-04-24 2019-06-21 清华大学 The traffic engineering method and device of IPv6 network based on partial deployment Segment routing
CN110278149A (en) * 2019-06-20 2019-09-24 南京大学 Multi-path transmission control protocol data packet dispatching method based on deeply study
CN110662238A (en) * 2019-10-24 2020-01-07 南京大学 Reinforced learning scheduling method and device for burst request under edge network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"QOS-Aware Flow Control for Power-Efficient Data Center Networks with Deep Reinforcement Learning";Penghao Sun,etc.;《2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20200514;第3552-3556页 *

Also Published As

Publication number Publication date
CN111555907A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111555907B (en) Data center network energy consumption and service quality optimization method based on reinforcement learning
Ghadimi et al. A reinforcement learning approach to power control and rate adaptation in cellular networks
CN107819840A (en) Distributed mobile edge calculations discharging method in the super-intensive network architecture
CN106411770B (en) A kind of data center network energy-saving routing algorithm based on SDN framework
Zhang et al. Joint offloading and resource allocation in mobile edge computing systems: An actor-critic approach
CN111556572B (en) Spectrum resource and computing resource joint allocation method based on reinforcement learning
CN110351754A (en) Industry internet machinery equipment user data based on Q-learning calculates unloading decision-making technique
CN110798858A (en) Distributed task unloading method based on cost efficiency
CN111148131A (en) Wireless heterogeneous network terminal access control method based on energy consumption
CN114884895B (en) Intelligent flow scheduling method based on deep reinforcement learning
Gao et al. Reinforcement learning based cooperative coded caching under dynamic popularities in ultra-dense networks
CN109982434A (en) Wireless resource scheduling integrated intelligent control system and method, wireless communication system
Zuo et al. An intelligent routing algorithm for LEO satellites based on deep reinforcement learning
CN106604288B (en) Wireless sensor network interior joint adaptively covers distribution method and device on demand
Fan et al. Delay-aware resource allocation in fog-assisted IoT networks through reinforcement learning
CN108322274B (en) Greedy algorithm based energy-saving and interference optimization method for W L AN system AP
Wang et al. Task allocation mechanism of power internet of things based on cooperative edge computing
CN115066006A (en) Base station dormancy method, equipment and medium based on reinforcement learning
CN111191955B (en) Power CPS risk area prediction method based on dependent Markov chain
Sun et al. QoS-aware flow control for power-efficient data center networks with deep reinforcement learning
Qin et al. Traffic optimization in satellites communications: A multi-agent reinforcement learning approach
Zhao et al. Reinforcement learning for resource mapping in 5G network slicing
Li et al. Deep reinforcement learning-based resource allocation and seamless handover in multi-access edge computing based on SDN
Li et al. DQN-based computation-intensive graph task offloading for internet of vehicles
Peng et al. Real-time transmission optimization for edge computing in industrial cyber-physical systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant