CN111637614A - Intelligent control method for active ventilation floor in data center

Info

Publication number: CN111637614A
Application number: CN202010455152.6A
Authority: CN
Inventors: 万剑雄; 周杰; 熊伟
Original assignee: Inner Mongolia University of Technology
Current assignee: Inner Mongolia University of Technology
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2020-09-08
Anticipated expiration: 2040-05-26
Also published as: CN111637614B

Abstract

An intelligent control method for an active ventilation floor in a data center establishes a Markov decision process model of the data center rack hotspot problem and provides three algorithms for solving the model: a basic intelligent algorithm, a sample-value variant intelligent algorithm, and a structural variant intelligent algorithm, each of which can serve as the core of the active ventilation floor control algorithm. The model consists of four parts: system state, behavior, reward, and value function. The solution of the model is to keep selecting the optimal behavior over a series of system states so that the cumulative reward of the system is maximized. By continuously exploring and learning the complex relationship between the rack air inlet temperature distribution and the fan speed of the active ventilation floor, the control algorithm can ultimately generate the optimal PWM signal duty cycle value from the rack air inlet temperature distribution and adjust the fan speed of the active ventilation floor, so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated. Compared with other solutions, the present invention is more universal, easier to deploy, and more cost-effective.

Description

Intelligent control method for active ventilation floor in data center

Technical Field

The invention belongs to the technical field of automatic control, and in particular relates to an intelligent control method for an active ventilation floor in a data center.

Background

A rack hotspot is a high-temperature point at one or more positions of a rack in a data center machine room where the temperature is significantly higher than at other positions. Excessive temperature reduces the working efficiency of some servers in the data center, which lowers the overall power density and the reliability of the data center, and this clearly runs counter to its needs.

Alleviating or eliminating rack hotspots by global regulation, for example by raising the power of the computer room air conditioner to supply enough cold air, inevitably leaves most rack areas overcooled. This wastes cooling resources and further inflates cooling energy consumption, which already accounts for nearly half of a data center's total energy consumption. Rack-level cooling solutions are therefore better suited to alleviating rack hotspots.

Rack-level cooling solutions already exist, such as installing adaptive ventilation floors, installing baffles, and enclosing individual racks and fitting them with ventilation ducts. However, these are all "passive" cooling solutions that cannot actively deliver cold airflow to the racks, and they are powerless when the supply of cold air is insufficient.

The active ventilation floor is another rack-level cooling solution, which alleviates rack hotspots by actively delivering cold air. Compared with the solutions above it is easier to deploy and more cost-effective, but it is difficult to control, mainly because the environments in which it is placed are diverse and dynamic. For example, the computer room air conditioners, the relative positions of the racks, and the distribution of servers inside the racks differ; cold and hot aisles may or may not be enclosed, and server rack standards and sealing differ; the power of the computer room air conditioners and the heat load of the servers in different racks differ; and so on. As a result, the thermal efficiency and airflow of a data center are generally difficult to describe with an analytical model.

Most existing research on active ventilation floors concerns measurement-based or simulation-based performance modeling and evaluation, and there is as yet no research literature on the control problem of active ventilation floors.

Summary of the Invention

To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide an intelligent control method for an active ventilation floor in a data center which, without raising the power of the computer room air conditioner, automatically learns the optimal operating strategy and plans the rack airflow so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated. No complex airflow and heat exchange model has to be built and calibrated, which improves the universality of the active ventilation floor.

To achieve the above purpose, the technical solution adopted by the present invention is as follows:

An intelligent control method for an active ventilation floor in a data center establishes a Markov decision process model of the data center rack hotspot problem and provides three algorithms for solving the model: a basic intelligent algorithm, a sample-value variant intelligent algorithm, and a structural variant intelligent algorithm, each serving as the core of the active ventilation floor control algorithm. The model consists of four parts: system state, behavior, reward, and value function. The solution of the model is to keep selecting the optimal behavior over a series of system states so that the cumulative reward of the system is maximized. By continuously exploring and learning the complex relationship between the rack air inlet temperature distribution and the fan speed of the active ventilation floor, the control algorithm can ultimately generate the optimal PWM signal duty cycle value from the rack air inlet temperature distribution and adjust the fan speed of the active ventilation floor, so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The present invention does not need to build and calibrate complex airflow and heat exchange models. It uses intelligent control algorithms to cope with the diversity and dynamics of the environment in which the active ventilation floor is placed, and automatically matches the rack air inlet temperature distribution with the optimal fan speed of the active ventilation floor. It is only necessary to replace the original ordinary ventilation floor with an active ventilation floor running the present invention, after which the invention operates autonomously, improves the rack air inlet temperature distribution, and alleviates rack hotspots. Compared with other solutions, the present invention is more universal, easier to deploy, and more cost-effective.

Description of Drawings

Figure 1 shows the design and deployment of the active ventilation floor.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the drawings and examples.

Figure 1 is a schematic diagram of the detailed deployment of the present invention. A number of first temperature sensors 1 are evenly distributed at the air inlet of the rack 2 to monitor the temperature distribution of the air inlet of the rack 2. In addition, a second temperature sensor is arranged under the active ventilation floor to monitor the supply air temperature under the active ventilation floor.

In this field, the rack 2 is a cuboid metal cabinet that holds a number of servers, and many racks are arranged in rows. Within a row, the left and right panels of a rack generally sit flush against the neighboring racks. The front panel of the rack is the air inlet, which draws in cold air to cool the servers, and the rear panel is the air outlet, which discharges the heated air. Monitoring the rack air inlet temperature distribution means monitoring the temperature at certain positions on the front panel of the rack; the temperatures at these positions make up the rack air inlet temperature distribution, so the number of first temperature sensors 1 depends on the number of these positions.

The intelligent control method for the active ventilation floor runs on a PC. The PC 6 is connected to the microcontroller 3, the microcontroller 3 is connected to the driver board 4, and the driver board 4, after being connected to the switching power supply 5 (12 V, 20 A), is connected to the active ventilation floor fan 7. The PC generates the duty cycle value of the PWM signal from the temperature distribution returned by the first temperature sensors 1 and passes it to the microcontroller 3. The microcontroller 3 generates the corresponding PWM signal from this duty cycle value and transmits it to the driver board 4. The driver board 4 uses the PWM signal to control the voltage that the switching power supply 5 provides to the active ventilation floor fan 7, and the fan speed is adjusted by controlling the fan supply voltage.
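The relationship between the duty cycle value and the fan supply voltage can be illustrated with a minimal sketch. It assumes an idealized driver in which the average fan voltage is simply the duty cycle times the 12 V supply; the function names are hypothetical and the real driver board may behave differently.

```python
SUPPLY_VOLTAGE = 12.0  # volts, the 12 V switching power supply mentioned in the text


def clamp_duty_cycle(duty: float, max_duty: float = 1.0) -> float:
    """Limit a requested duty cycle to the range [0, max_duty]."""
    return max(0.0, min(duty, max_duty))


def average_fan_voltage(duty: float) -> float:
    """Idealized average fan voltage for a PWM duty cycle (assumption: linear)."""
    return SUPPLY_VOLTAGE * clamp_duty_cycle(duty)


if __name__ == "__main__":
    for duty in (0.0, 0.25, 0.5, 1.0):
        print(duty, average_fan_voltage(duty))
```

Clamping before conversion keeps an out-of-range request from the control algorithm from producing a meaningless voltage.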

The control method comprises the following parts:

1. A Markov decision process model is established for the rack hotspot problem of a data center with a raised floor structure. (The raised floor is the air supply structure of the data center: the machine room floor is raised to leave an under-floor space 60-100 cm high through which the computer room air conditioner delivers cold air. Most data centers in China currently use this construction.) The model consists of the following four parts A to D:

A. System state. The system state is the set of rack air inlet temperature distributions with history:

$$\varphi_t = \{s_{t-p}, \ldots, s_x, \ldots, s_{t-1}, s_t\}$$

where $\varphi_t$ is the system state at time $t$; $s_{t-p}$, $s_x$, $s_{t-1}$ and $s_t$ are the rack air inlet temperature distributions at times $t-p$, $x$, $t-1$ and $t$ respectively, with $x \in [t-p, t]$; and $p$ is the history length. Each distribution $s_x$ is made up of the readings $T_i$, where $T_i$ is the reading of the first temperature sensor numbered $i$ and $i$ ranges over the set of first temperature sensors, whose size is the total number of first temperature sensors.
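As a rough illustration of how such a state with history might be assembled in software, the sketch below keeps the most recent $p + 1$ distributions in a bounded queue. The class name `StateBuilder` and its methods are hypothetical and not taken from the patent.

```python
from collections import deque


class StateBuilder:
    """Keeps the last p + 1 inlet temperature distributions: s_{t-p} .. s_t."""

    def __init__(self, history_length: int):
        # history_length is p; the state also contains the current distribution.
        self.history = deque(maxlen=history_length + 1)

    def push(self, distribution):
        """Record the sensor readings (one value per first temperature sensor)."""
        self.history.append(tuple(distribution))

    def ready(self) -> bool:
        """True once enough history exists to form a full state."""
        return len(self.history) == self.history.maxlen

    def state(self):
        """Return the state as a tuple ordered from oldest to newest."""
        if not self.ready():
            raise ValueError("not enough history to build a state")
        return tuple(self.history)
```

A bounded `deque` drops the oldest distribution automatically, so the state always covers exactly the window $[t-p, t]$.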

B. Behavior space. The behavior space is defined as the set of discretized PWM signal duty cycle values. Here $a$ is a behavior in the behavior space, $DC$ is the PWM signal duty cycle, $\max(DC)$ is the maximum duty cycle, $D_{DRL}$ is the equal-division ratio used to discretize $DC$, and $k$ is the number of $D_{DRL}$ steps contained in a given behavior;
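One possible reading of this definition is sketched below. It assumes that each behavior equals $k \cdot D_{DRL} \cdot \max(DC)$ for $k = 0, 1, \ldots, 1/D_{DRL}$; the exact formula is not recoverable from the text, so this is an illustrative assumption, and the function name is hypothetical.

```python
def build_behavior_space(max_dc: float, d_drl: float):
    """List the discretized duty cycles, assuming a = k * d_drl * max_dc."""
    if not 0.0 < d_drl <= 1.0:
        raise ValueError("d_drl must lie in (0, 1]")
    steps = round(1.0 / d_drl)  # number of equal divisions of the duty cycle range
    # Dividing by the integer step count avoids accumulating floating-point error.
    return [max_dc * k / steps for k in range(steps + 1)]


if __name__ == "__main__":
    print(build_behavior_space(max_dc=1.0, d_drl=0.25))
```

A smaller $D_{DRL}$ gives finer control at the cost of a larger behavior space for the learning algorithm to explore.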

C. Reward. The reward $R_{t+1}$ consists of two parts: a quantitative index of how uniform the rack air inlet temperature distribution is, and the energy consumption of the active ventilation floor fan. $R_{t+1}$ is the reward the system obtains after taking a behavior at time $t$. The uniformity index is never positive, and the closer it is to 0, the more uniform the rack air inlet temperature distribution. It is computed from $T_{t,i}$, the reading at time $t$ of the first temperature sensor numbered $i$, and from the rack reference temperature at time $t$, which is defined in terms of $T_{t,under}$, the reading of the second temperature sensor at time $t$, and $\Delta_T$, a fixed positive temperature difference set according to the degree of mixing of hot and cold air above and below the active ventilation floor. The term $-(A_{ref} \times DC_t)^3$ represents the energy consumption of the active ventilation floor fan; it is never positive, and the closer it is to 0, the lower the fan energy consumption. $A_{ref}$ is a reference behavior value that keeps this term at the same order of magnitude as the uniformity index, and $DC_t$ is the duty cycle of the PWM square wave at time $t$.
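A reward of this shape can be sketched as follows. Only the energy term $-(A_{ref} \times DC_t)^3$ is taken from the text. The uniformity index (negative mean absolute deviation from the reference temperature) and the reference temperature $T_{t,under} + \Delta_T$ are assumptions made for illustration, since the original formula is not recoverable here. All function names are hypothetical.

```python
def reference_temperature(t_under: float, delta_t: float) -> float:
    """Assumed reference: under-floor supply temperature plus a fixed offset."""
    return t_under + delta_t


def uniformity_index(inlet_temps, t_ref: float) -> float:
    """Assumed index: negative mean absolute deviation; 0 means perfectly uniform."""
    return -sum(abs(t - t_ref) for t in inlet_temps) / len(inlet_temps)


def fan_energy_term(duty: float, a_ref: float) -> float:
    """Energy term from the text: -(A_ref * DC_t) ** 3."""
    return -(a_ref * duty) ** 3


def reward(inlet_temps, t_under, delta_t, duty, a_ref) -> float:
    """Sum of the (assumed) uniformity index and the fan energy term."""
    t_ref = reference_temperature(t_under, delta_t)
    return uniformity_index(inlet_temps, t_ref) + fan_energy_term(duty, a_ref)
```

Both terms are non-positive, so the best achievable reward is 0: a perfectly uniform inlet temperature with the fan switched off.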

D. Value function. The value function $Q(\varphi_t, a_t)$ is the behavior value function:

$$Q(\varphi_t, a_t) = \mathbb{E}\left[\sum_{y=0}^{\infty} \gamma^{y} R_{t+y+1} \,\middle|\, \varphi_t, a_t\right]$$

where the value function $Q(\varphi_t, a_t)$ is called the Q function; $a_t$ is the behavior taken by the system at time $t$; $\mathbb{E}$ is the expectation; $y$ is a future time step relative to time $t$; $R_{t+y+1}$ is the reward the system obtains after taking a behavior at time $t+y$; and $\gamma$, with $0 \le \gamma < 1$, is the decay factor, which expresses how much weight is given to the effect that taking a behavior in a given state has on the future rewards of the system, i.e. on the environment. $\gamma^{y}$, the $y$-th power of $\gamma$, is the decay factor applied to $R_{t+y+1}$ at time $t+y$.
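The quantity inside the expectation can be illustrated for a single finite sequence of observed rewards. This is only a sketch of the discounted sum, not of the expectation itself, and the function name is hypothetical.

```python
def discounted_return(rewards, gamma: float) -> float:
    """Compute sum_y gamma**y * rewards[y] for one finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([-1.0, -2.0, -4.0], gamma=0.5))
```

With $\gamma$ close to 0 only the immediate reward matters, while $\gamma$ close to 1 weights distant rewards almost as heavily as immediate ones.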

E. The model can be summarized as follows: in the system state at any time $t$, the optimal behavior is selected so that the cumulative reward is maximized. The objective is to maximize the discounted sum of rewards $\sum_{t} \gamma^{t} R_{t+1}$ subject to the constraints of the model, where $\gamma^{t}$ is the decay factor applied to $R_{t+1}$ at time $t$.

2. Solution of the model and solving algorithms

a. Solution of the model. Once the optimal Q function has been computed, the optimal behavior can be selected according to it in the system state at any time $t$, so that the cumulative reward is maximized. The optimal Q function is computed as:

$$Q^{*}(\varphi_t, a_t) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a} Q^{*}(\varphi_{t+1}, a) \,\middle|\, \varphi_t, a_t\right]$$

At any time $t$, the optimal behavior is selected as:

$$a_t = \arg\max_{a} Q^{*}(\varphi_t, a)$$

where $Q^{*}(\varphi_t, a_t)$ is the optimal Q function, $\varphi_{t+1}$ is the system state at time $t+1$, and $a$ ranges over all behaviors the system may take, that is, over the behavior space.
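For a single observed transition, the two operations above reduce to a maximum over the behavior space. The sketch below stores the Q values of one state in a dictionary keyed by behavior; this tabular representation and the function names are illustrative assumptions.

```python
def greedy_behavior(q_row):
    """Return the behavior with the largest Q value (q_row maps behavior -> Q)."""
    return max(q_row, key=q_row.get)


def bellman_backup(reward: float, gamma: float, next_q_row) -> float:
    """One-sample estimate of R_{t+1} + gamma * max_a Q(phi_{t+1}, a)."""
    return reward + gamma * max(next_q_row.values())


if __name__ == "__main__":
    q_next = {0.0: -3.0, 0.5: -1.0, 1.0: -2.0}
    print(greedy_behavior(q_next), bellman_backup(-1.0, 0.5, q_next))
```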

b. Solving algorithms. A solving algorithm computes the optimal Q function and selects the optimal behavior at each decision so that the cumulative reward is maximized. The solving algorithms comprise the basic intelligent algorithm, the sample-value variant intelligent algorithm, and the structural variant intelligent algorithm. All three accumulate sample records $(\varphi_t, a_t, R_{t+1}, \varphi_{t+1})$ through continued decision-making and use them to train a neural network so that the network approximates the Q function and can then select the optimal behavior, maximizing the cumulative reward of the model. Here $\varphi_t$ is the system state at time $t$, $a_t$ is the behavior taken by the system at time $t$, $R_{t+1}$ is the reward obtained after taking $a_t$, and $\varphi_{t+1}$ is the system state at time $t+1$. The three algorithms are designed as follows:

a) The basic intelligent algorithm uses two neural networks with the same structure to approximate the Q function. One approximates the Q sample function and computes the Q sample value, and is called the targ network; the other approximates the Q prediction function and computes the Q predicted value, and is called the eval network. The difference between the Q sample value and the Q predicted value is computed from the sample records and used to train and update the neural networks. The Q sample value is computed as:

$$Q_{t+1,target} = R_{t+1} + \gamma \max_{a} Q(\varphi_{t+1}, a; \theta_{t,target})$$

where $Q_{t+1,target}$ is the Q sample value; $R_{t+1}$ and $\varphi_{t+1}$ are taken from the sample record; $\gamma$, with $0 \le \gamma < 1$, is the decay factor; $Q(\varphi_{t+1}, a; \theta_{t,target})$ is the set of Q samples output by the targ network; $a$ ranges over all behaviors the system may take at time $t+1$, that is, over the behavior space; and $\theta_{t,target}$ is the set of targ network parameters at time $t$.

The neural networks are updated as follows:

$$\delta_{t+1} = Q_{t+1,target} - Q(\varphi_t, a_t; \theta_{t,eval})$$

$$\theta_{t+1,eval} = \theta_{t,eval} + \alpha\, \delta_{t+1}\, \nabla_{\theta_{t,eval}} Q(\varphi_t, a_t; \theta_{t,eval})$$

$$\theta_{target} = \theta_{eval} \quad \text{when } t \text{ is an integer multiple of } N$$

where $\delta_{t+1}$ is the difference between the Q sample value and the corresponding Q predicted value; $Q(\varphi_t, a_t; \theta_{t,eval})$ is the Q predicted value corresponding to $a_t$ in the set of Q predictions output by the eval network; $\varphi_t$ and $a_t$ are taken from the sample record; $\theta_{t,eval}$ and $\theta_{t+1,eval}$ are the sets of eval network parameters at times $t$ and $t+1$; $\nabla_{\theta_{t,eval}} Q(\varphi_t, a_t; \theta_{t,eval})$ is the gradient of $Q(\varphi_t, a_t; \theta_{t,eval})$ with respect to $\theta_{t,eval}$; $\alpha$ is the learning step size of the neural network; and $\theta_{target}$ and $\theta_{eval}$ are the sets of targ and eval network parameters when the time $t$ is an integer multiple of $N$ (including 0).
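The targ/eval scheme can be sketched in a few lines. To stay self-contained, the sketch swaps the neural networks for a linear approximator with one weight vector per behavior, which is an assumption made for illustration; the gradient of a linear Q value with respect to the weights of the chosen behavior is simply the feature vector. All names are hypothetical.

```python
def q_value(theta, features, action):
    """Linear Q value: dot product of the action's weights and the features."""
    return sum(w * x for w, x in zip(theta[action], features))


def q_sample_value(reward, gamma, next_features, theta_target):
    """Q sample value: reward plus discounted best targ-network value."""
    best_next = max(q_value(theta_target, next_features, a) for a in theta_target)
    return reward + gamma * best_next


def update_eval(theta_eval, theta_target, record, gamma, alpha):
    """One gradient step on the eval parameters; returns the difference delta."""
    features, action, reward, next_features = record
    delta = (q_sample_value(reward, gamma, next_features, theta_target)
             - q_value(theta_eval, features, action))
    # For a linear model, grad_theta Q(phi, a) equals the feature vector.
    theta_eval[action] = [w + alpha * delta * x
                          for w, x in zip(theta_eval[action], features)]
    return delta


def sync_target(theta_eval):
    """Copy the eval parameters into a fresh targ parameter set (every N steps)."""
    return {a: list(w) for a, w in theta_eval.items()}
```

Holding the targ parameters fixed between copies keeps the Q sample values from shifting at every update, which is the usual motivation for keeping two networks.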

b) The sample-value variant intelligent algorithm computes the Q sample value with the formula:

$$Q_{t+1,target} = R_{t+1} + \gamma\, Q\!\left(\varphi_{t+1}, \arg\max_{a} Q_{eval}(\varphi_{t+1}, a; \theta_{t,eval}); \theta_{t,target}\right)$$

where $Q_{t+1,target}$ is the Q sample value; $R_{t+1}$ and $\varphi_{t+1}$ are taken from the sample record; $Q(\varphi_{t+1}, a; \theta_{t,target})$ is the set of Q samples output by the targ network; the second term selects, from the set of Q samples output by the targ network, the Q sample value of the behavior that maximizes $Q_{eval}(\varphi_{t+1}, a; \theta_{t,eval})$; $Q_{eval}(\varphi_{t+1}, a; \theta_{t,eval})$ is the set of Q predictions output by the eval network; $a$ denotes any one of the behaviors the system may take at time $t+1$, that is, a behavior in the behavior space; $\theta_{t,eval}$ is the set of eval network parameters at time $t$; and $\theta_{t,target}$ is the set of targ network parameters at time $t$;

The neural network structure of the sample-value variant intelligent algorithm and the way it is updated are the same as in the basic intelligent algorithm.
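The difference between the two ways of computing the Q sample value can be seen in a small sketch. The network outputs for the next state are represented as dictionaries mapping behaviors to Q values; this representation and the function names are assumptions for illustration.

```python
def basic_q_sample_value(reward, gamma, q_targ_next):
    """Basic algorithm: the targ network both selects and evaluates the behavior."""
    return reward + gamma * max(q_targ_next.values())


def variant_q_sample_value(reward, gamma, q_eval_next, q_targ_next):
    """Sample-value variant: eval network selects, targ network evaluates."""
    best_behavior = max(q_eval_next, key=q_eval_next.get)
    return reward + gamma * q_targ_next[best_behavior]


if __name__ == "__main__":
    q_eval_next = {0.0: 1.0, 0.5: 3.0, 1.0: 2.0}
    q_targ_next = {0.0: 4.0, 0.5: 1.0, 1.0: 2.0}
    print(basic_q_sample_value(-1.0, 0.5, q_targ_next))
    print(variant_q_sample_value(-1.0, 0.5, q_eval_next, q_targ_next))
```

Separating selection from evaluation is commonly used to reduce the overestimation that arises when the same noisy estimates are used for both.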

c) The structural variant intelligent algorithm uses two neural networks with the same structure and places a DN layer at the penultimate layer of each network. The DN layer is divided into a V segment and an A segment. The V segment has a single neuron node and represents the system state at time $t$; the number of neurons in the A segment equals the number of elements in the behavior space, and the A segment represents all behaviors that may be taken in that system state. The DN layer is computed as:

$$Q(\varphi_t, a_t; \theta_t, \theta_{t,V}, \theta_{t,A}) = V(\varphi_t; \theta_t, \theta_{t,V}) + \left(A(\varphi_t, a_t; \theta_t, \theta_{t,A}) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(\varphi_t, a'; \theta_t, \theta_{t,A})\right)$$

where $Q(\varphi_t, a_t; \theta_t, \theta_{t,V}, \theta_{t,A})$ is the final output of the neural network; $\varphi_t$ and $a_t$ are taken from the sample record; $\theta_t$ is the set of network parameters before the DN layer at time $t$; $\theta_{t,V}$ and $\theta_{t,A}$ are the parameters of the V segment and the A segment of the DN layer at time $t$; $V(\varphi_t; \theta_t, \theta_{t,V})$ is the output of the V segment; $A(\varphi_t, a_t; \theta_t, \theta_{t,A})$ is the output of the A segment corresponding to $a_t$; $A(\varphi_t, a'; \theta_t, \theta_{t,A})$ denotes all outputs of the A segment; $a'$ ranges over all behaviors the system may take in state $\varphi_t$; and $|\mathcal{A}|$ is the number of elements in the behavior space $\mathcal{A}$.

The neural networks are then trained and updated with the same Q sample value calculation and network update method as in the sample-value variant intelligent algorithm.
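The combination of the V segment and the A segment can be sketched as a plain function of their outputs. The sketch assumes the mean-subtracted combination, which is what the reference to the number of elements in the behavior space suggests; the function name is hypothetical.

```python
def dn_layer_outputs(v: float, advantages):
    """Combine one V output with the A outputs: Q = V + (A - mean(A))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [v + (a - mean_advantage) for a in advantages]


if __name__ == "__main__":
    print(dn_layer_outputs(2.0, [1.0, 2.0, 3.0]))
```

Subtracting the mean makes the decomposition unique: adding a constant to every A output no longer changes the resulting Q values.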

3. By continuously exploring and learning the complex relationship between the rack air inlet temperature distribution and the fan speed of the active ventilation floor, the method ultimately generates the optimal PWM signal duty cycle value from the rack air inlet temperature distribution and adjusts the fan speed of the active ventilation floor, so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated. Its operating logic on the PC is as follows:

1: Depending on the control algorithm, construct and initialize the corresponding neural networks, and set the targ network parameters equal to the eval network parameters; set up the buffer array for the sample records; set the reference temperature $T_t$;

2: Set the initial time $t = 0$, and denote the time of a sample record in the buffer array by $\tau$; set the initial behavior exploration probability $\varepsilon$, the amount $\Delta_\varepsilon$ by which the exploration probability decreases with $t$, and the minimum exploration probability $\varepsilon_{min}$;

3: Select behaviors at random for $Z$ time steps, and store the record $(\varphi_z, a_z, R_{z+1}, \varphi_{z+1})$, $z \in [0, Z)$, produced at each time step in the buffer array;

4: Obtain the initial rack air inlet temperature distribution;

5: Start of the loop body;

6: Obtain the $p$ historical rack air inlet temperature distributions, which together with the current one form a system state $\varphi_t = \{s_{t-p}, \ldots, s_{t-1}, s_t\}$;

7: If $t = 0$, select the behavior $a_t = \max(DC)$ and go to step 9; otherwise go to step 8;

8: Select the behavior according to the exploration probability: with probability $\varepsilon$ select a behavior at random from the behavior space, and otherwise select the behavior with the largest Q predicted value $Q(\varphi_t, a; \theta_{eval})$;

9: Execute $a_t$: the PC sends the duty cycle command to the microcontroller, which changes the fan speed; obtain the rack air inlet temperature distribution $s_{t+1}$ at the next time step, and compute $R_{t+1}$ according to the reward formula;

10: Form the next state $\varphi_{t+1} = \{s_{t+1-p}, \ldots, s_t, s_{t+1}\}$ from the latest temperature distribution history, and store $(\varphi_t, a_t, R_{t+1}, \varphi_{t+1})$ in the buffer array;

11: Randomly draw $Y$ sample records $(\varphi_\tau, a_\tau, R_{\tau+1}, \varphi_{\tau+1})$ from the buffer array;

12: Depending on the control algorithm, compute the Q sample values from the $Y$ records using the Q sample value formula of that algorithm;

13: Update the eval network using the learning step size $\alpha$ and a loss function built from the differences between the Q sample values and the corresponding Q predicted values;

14: Set the exploration probability $\varepsilon$ to the larger of $\varepsilon - \Delta_\varepsilon$ and $\varepsilon_{min}$, so that it never falls below the minimum exploration probability;

15: If $t \bmod N = 0$, the targ network copies the eval network parameters; otherwise go to step 16;

16: Increase the time $t$ by 1;

17: End of the loop body.
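The loop above can be sketched end to end under strong simplifications. The sketch replaces the neural networks with lookup tables, replaces the machine room with a hypothetical one-sensor plant in which more airflow gives a cooler inlet, folds the random warm-up of step 3 into the main loop, and uses an assumed reward shape. None of the names or constants come from the patent.

```python
import random
from collections import defaultdict, deque

ACTIONS = [0.0, 0.5, 1.0]   # illustrative discretized duty cycles
GAMMA, ALPHA = 0.9, 0.1     # decay factor and learning step size (assumed values)
P = 1                       # history length p


def toy_inlet_temperature(duty):
    """Hypothetical plant: a single sensor that reads cooler at higher duty."""
    return (30.0 - 10.0 * duty,)


def toy_reward(distribution, duty, t_ref=25.0, a_ref=1.0):
    """Assumed reward: deviation from the reference plus the cubic energy term."""
    return -abs(distribution[0] - t_ref) - (a_ref * duty) ** 3


def run(steps=300, warmup=20, batch=8, sync_every=10, seed=0):
    rng = random.Random(seed)
    q_eval = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # stands in for eval net
    q_targ = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # stands in for targ net
    buffer = deque(maxlen=1000)                              # sample record buffer
    eps, eps_step, eps_min = 1.0, 0.01, 0.05
    history = deque([toy_inlet_temperature(0.0)] * (P + 1), maxlen=P + 1)
    for t in range(steps):
        state = tuple(history)                               # step 6
        if t == 0:
            action = max(ACTIONS)                            # step 7
        elif t < warmup or rng.random() < eps:
            action = rng.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: q_eval[state][a])  # exploit
        distribution = toy_inlet_temperature(action)         # step 9
        reward = toy_reward(distribution, action)
        history.append(distribution)
        buffer.append((state, action, reward, tuple(history)))  # step 10
        sample = rng.sample(list(buffer), min(batch, len(buffer)))  # step 11
        for s, a, r, s_next in sample:
            target = r + GAMMA * max(q_targ[s_next].values())   # step 12
            q_eval[s][a] += ALPHA * (target - q_eval[s][a])     # step 13
        eps = max(eps - eps_step, eps_min)                   # step 14
        if t % sync_every == 0:                              # step 15
            for s, row in q_eval.items():
                q_targ[s] = dict(row)
    return q_eval


if __name__ == "__main__":
    table = run()
    print(len(table), "states visited")
```

Seeding the random generator makes a run reproducible, which helps when comparing the three algorithm variants on the same sequence of exploratory behaviors.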

In summary, the present invention establishes a Markov decision process model of the data center rack hotspot problem and provides three algorithms for solving the model: a basic intelligent algorithm, a sample-value variant intelligent algorithm, and a structural variant intelligent algorithm, each serving as the core of the active ventilation floor control algorithm. The model consists of four parts: system state, behavior, reward, and value function. The solution of the model is to keep selecting the optimal behavior over a series of system states so that the cumulative reward of the system is maximized. By continuously exploring and learning the complex relationship between the rack air inlet temperature distribution and the fan speed of the active ventilation floor, the control algorithm can ultimately generate the optimal PWM signal duty cycle value from the rack air inlet temperature distribution and adjust the fan speed of the active ventilation floor, so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated. Compared with other solutions, the present invention is more universal, easier to deploy, and more cost-effective.

Claims

1. An intelligent control method for an active ventilation floor of a data center, characterized in that it comprises the following steps:

Step 1: arranging a number of first temperature sensors at the air inlet of the rack to monitor the rack air inlet temperature distribution;

Step 2: establishing a Markov decision process model for the data center rack hotspot problem, the model consisting of four parts: the system state $\varphi_t$, the behavior space, the reward $R_{t+1}$ and the value function $Q(\varphi_t, a_t)$;

wherein the system state $\varphi_t$ at time $t$ is defined as the set of rack air inlet temperature distributions with history:

$$\varphi_t = \{s_{t-p}, \ldots, s_x, \ldots, s_{t-1}, s_t\}$$

where $s_{t-p}$, $s_x$, $s_{t-1}$ and $s_t$ are the rack air inlet temperature distributions at times $t-p$, $x$, $t-1$ and $t$ respectively, with $x \in [t-p, t]$; $p$ is the history length; and $T_i$ is the reading of the first temperature sensor numbered $i$, with $i$ ranging over the set of first temperature sensors, whose size is the total number of first temperature sensors;

the behavior space is defined as the set of discretized PWM signal duty cycle values, $a$ being a behavior in the behavior space;

the reward $R_{t+1}$ consists of a quantitative index of how uniform the rack air inlet temperature distribution is and the energy consumption of the active ventilation floor fan, where $R_{t+1}$ is the reward obtained by the system after taking a behavior at time $t$; the rack reference temperature at time $t$ is defined in terms of $T_{t,under}$, the reading of the second temperature sensor at time $t$, and $\Delta_T$, a fixed positive temperature difference set according to the degree of mixing of hot and cold air above and below the active ventilation floor; $-(A_{ref} \times DC_t)^3$ represents the energy consumption of the active ventilation floor fan, is never positive, and indicates lower fan energy consumption the closer it is to 0; $A_{ref}$ is a reference behavior value that keeps this term at the same order of magnitude as the uniformity index; and $DC_t$ is the duty cycle of the PWM square wave at time $t$;

the value function $Q(\varphi_t, a_t)$ is the behavior value function:

$$Q(\varphi_t, a_t) = \mathbb{E}\left[\sum_{y=0}^{\infty} \gamma^{y} R_{t+y+1} \,\middle|\, \varphi_t, a_t\right]$$

where the value function $Q(\varphi_t, a_t)$ is called the Q function; $a_t$ is the behavior taken by the system at time $t$; $\mathbb{E}$ is the expectation; $y$ is a future time step relative to time $t$; $R_{t+y+1}$ is the reward obtained by the system after taking a behavior at time $t+y$; $\gamma$, with $0 \le \gamma < 1$, is the decay factor, which expresses how much weight is given to the effect that taking a behavior in a given state has on the future rewards of the system, i.e. on the environment; and $\gamma^{y}$, the $y$-th power of $\gamma$, is the decay factor applied to $R_{t+y+1}$ at time $t+y$;

the Markov decision process model is summarized as: in the system state at any time $t$, the optimal behavior is selected so that the cumulative reward of the system, the discounted sum $\sum_{t} \gamma^{t} R_{t+1}$, is maximized subject to the constraints of the model, where $\gamma^{t}$ is the decay factor applied to $R_{t+1}$ at time $t$;

Step 3: solving the model: by continuously exploring and learning the complex relationship between the rack air inlet temperature distribution and the fan speed of the active ventilation floor, the optimal PWM signal duty cycle value is ultimately generated from the rack air inlet temperature distribution and the fan speed of the active ventilation floor is adjusted, so that the rack air inlet temperature distribution becomes uniform and rack hotspots are alleviated.

2. The intelligent control method for the active ventilation floor of the data center according to claim 1, wherein in step 2 the optimal Q function is obtained by calculation, and according to the optimal Q function the optimal behavior can be selected for the system state at any time t so as to maximize the cumulative reward; the optimal Q function is calculated as:

$$Q^{*}(\varphi_t, a_t) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a} Q^{*}(\varphi_{t+1}, a) \,\middle|\, \varphi_t, a_t\right]$$

At any time t, the optimal behavior is selected as:

$$a_t = \arg\max_{a} Q^{*}(\varphi_t, a)$$

where Q^*(φ_t, a_t) represents the optimal Q function, φ_{t+1} represents the system state at time t+1, and a represents any one of the actions that the system may take at time t+1, that is, one of the behaviors in the behavior space.
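The behavior selection in claim 2 amounts to picking the behavior with the largest Q value. The sketch below illustrates this under stated assumptions: the list `DUTY_CYCLES` is a hypothetical behavior space of PWM duty cycles in percent, the function names are illustrative, and the expectation in the optimal Q function is replaced by a single observed transition.

```python
import numpy as np

# Hypothetical behavior space: candidate PWM duty cycles in percent.
DUTY_CYCLES = [20, 40, 60, 80, 100]


def select_optimal_behavior(q_values, behaviors=DUTY_CYCLES):
    """Return the behavior whose Q value is largest (greedy selection)."""
    q_values = np.asarray(q_values, dtype=float)
    if q_values.shape != (len(behaviors),):
        raise ValueError("need exactly one Q value per behavior")
    return behaviors[int(np.argmax(q_values))]


def one_step_backup(reward, gamma, next_q_values):
    """Single-sample estimate of R_{t+1} + gamma * max_a Q(phi_{t+1}, a)."""
    return reward + gamma * float(np.max(next_q_values))
```

For example, `select_optimal_behavior([0.1, 0.7, 0.3, 0.2, 0.0])` picks the second duty cycle in the list because its Q value is the largest.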

3. The intelligent control method for the active ventilation floor of the data center according to claim 1, wherein in step 3 the basic intelligent algorithm, the sample value variant intelligent algorithm and the structural variant intelligent algorithm are used to solve the model; the sample records (φ_t, a_t, R_{t+1}, φ_{t+1}) accumulated through continuous decision-making are used to train a neural network so that the neural network approximates the Q function, and the optimal behavior is then selected to maximize the cumulative reward of the model, where φ_{t+1} represents the system state at time t+1.
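The accumulation of sample records described in claim 3 can be pictured as a bounded cache from which training batches are drawn at random. The following is a sketch only: the class name `SampleCache`, the fixed capacity, and the drop-oldest policy are assumptions that the source does not specify.

```python
import random
from collections import deque


class SampleCache:
    """Bounded store of (state, action, reward, next_state) sample records."""

    def __init__(self, capacity):
        # ASSUMPTION: when the cache is full, the oldest record is discarded.
        self._records = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        """Append one record (phi_t, a_t, R_{t+1}, phi_{t+1})."""
        self._records.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw batch_size distinct records uniformly at random."""
        if batch_size > len(self._records):
            raise ValueError("not enough records in the cache")
        return rng.sample(list(self._records), batch_size)

    def __len__(self):
        return len(self._records)
```

Drawing records at random rather than in time order reduces the correlation between consecutive training samples, which is the usual motivation for this kind of cache.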

4. The intelligent control method of the active ventilation floor of the data center according to claim 3, wherein the basic intelligent algorithm uses two neural networks with the same structure to approximate the Q function: one approximates the Q sample function and calculates the Q sample value, and is called the targ network; the other approximates the Q prediction function and calculates the Q predicted value, and is called the eval network; the difference between the Q sample value and the Q predicted value is calculated from the sample records and used to train and update the neural network. The formula for calculating the Q sample value is:

$$Q_{t+1,\text{target}} = R_{t+1} + \gamma \max_{a} Q(\varphi_{t+1}, a; \theta_{t,\text{target}})$$

In the sample value variant intelligent algorithm, the Q sample value calculation formula is:

where Q_{t+1,target} is the Q sample value, R_{t+1} and φ_{t+1} are taken from the sample records, and Q(φ_{t+1}, a; θ_{t,target}) is the set of Q sample values output by the targ network;

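The neural network update method is as follows:

$$\delta_{t+1} = Q_{t+1,\text{target}} - Q(\varphi_t, a_t; \theta_{t,\text{eval}})$$

$$\theta_{t+1,\text{eval}} = \theta_{t,\text{eval}} + \alpha\, \delta_{t+1}\, \nabla_{\theta_{t,\text{eval}}} Q(\varphi_t, a_t; \theta_{t,\text{eval}})$$

$$\theta_{\text{target}} = \theta_{\text{eval}}, \quad t \bmod N = 0$$

where δ_{t+1} is the difference between the Q sample value and the corresponding Q predicted value, Q(φ_t, a_t; θ_{t,eval}) is the Q predicted value corresponding to a_t in the Q prediction set output by the eval network, φ_t and a_t are taken from the sample records, θ_{t+1,eval} is the set of eval network parameters at time t+1, ∇_{θ_{t,eval}} Q(φ_t, a_t; θ_{t,eval}) is the gradient of Q(φ_t, a_t; θ_{t,eval}) with respect to θ_{t,eval}, α is the learning step size of the neural network, θ_target is the set of targ network parameters when time t is an integer multiple of N (including 0), and θ_eval is the set of eval network parameters when time t is an integer multiple of N (including 0).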
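The two Q sample value calculations and the parameter update of claim 4 can be sketched numerically. Several assumptions apply and none of them comes from the source: the sample value variant is assumed to be a Double-DQN style target (the eval network picks the action, the targ network scores it), and the neural network is replaced by a linear approximator Q(φ, a) = θ[a]·φ so that the gradient is simply φ. All function names are hypothetical.

```python
import numpy as np


def basic_q_sample(reward, gamma, q_next_targ):
    """Basic algorithm: R_{t+1} + gamma * max_a Q(phi_{t+1}, a; theta_target)."""
    return reward + gamma * np.max(q_next_targ)


def variant_q_sample(reward, gamma, q_next_eval, q_next_targ):
    """ASSUMPTION: Double-DQN style target for the sample value variant.

    The eval network outputs choose the action; the targ network output
    for that action is used as the value.
    """
    best_action = int(np.argmax(q_next_eval))
    return reward + gamma * q_next_targ[best_action]


def update_eval(theta_eval, phi, action, q_sample, alpha):
    """One update step for a linear stand-in of the eval network.

    theta_eval has shape (n_actions, n_features); Q(phi, a) = theta_eval[a] @ phi,
    so the gradient with respect to theta_eval[a] is phi.
    Returns the new parameters and the difference delta.
    """
    q_pred = theta_eval[action] @ phi
    delta = q_sample - q_pred
    new_theta = theta_eval.copy()
    new_theta[action] += alpha * delta * phi
    return new_theta, delta
```

When the eval and targ outputs disagree about the best next action, the two targets differ: the assumed variant uses the targ value of the action preferred by the eval network, which tends to give a smaller, less optimistic sample value than the plain maximum.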

5. The intelligent control method for the active ventilation floor of the data center according to claim 4, wherein the structural variant intelligent algorithm uses two neural networks with the same structure, and a DN layer is set as the penultimate layer of each neural network; the DN layer is divided into a V segment and an A segment, where the V segment has 1 neuron, representing the system state at time t, and the A segment has as many neurons as there are elements in the behavior space, representing all actions that may be taken in this system state; the DN layer calculation formula is:

$$Q(\varphi_t, a_t; \theta_t, \theta_{t,V}, \theta_{t,A}) = V(\varphi_t; \theta_t, \theta_{t,V}) + \left( A(\varphi_t, a_t; \theta_t, \theta_{t,A}) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(\varphi_t, a'; \theta_t, \theta_{t,A}) \right)$$

where Q(φ_t, a_t; θ_t, θ_{t,V}, θ_{t,A}) is the final output of the neural network, φ_t and a_t are taken from the sample records, θ_t is the set of network parameters before the DN layer of the structural variant intelligent algorithm's neural network at time t, θ_{t,V} and θ_{t,A} are the parameter sets of the V segment and the A segment, V(φ_t; θ_t, θ_{t,V}) is the V segment output, A(φ_t, a_t; θ_t, θ_{t,A}) is the output value corresponding to a_t in the A segment, A(φ_t, a'; θ_t, θ_{t,A}) denotes all outputs of the A segment, a' ranges over all possible actions of the system in the state φ_t, and |𝒜| is the number of elements in the behavior space;

After that, the same Q sample value calculation and neural network update method as in the sample value variant intelligent algorithm are used to train and update the neural network.
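The DN layer of claim 5 combines a single V output with one A output per behavior. The sketch below assumes the common dueling-network aggregation, in which the mean of the A outputs is subtracted before adding V; this reading is consistent with the claim's reference to the number of elements in the behavior space, but it is an assumption, and the function name `dn_layer` is hypothetical.

```python
import numpy as np


def dn_layer(v_output, a_outputs):
    """Combine the V segment and A segment outputs into Q values.

    ASSUMPTION: Q(a) = V + (A(a) - mean over a' of A(a')).
    v_output is a scalar; a_outputs has one entry per behavior.
    """
    a_outputs = np.asarray(a_outputs, dtype=float)
    return float(v_output) + (a_outputs - a_outputs.mean())


# Illustrative values only: one V output and three A outputs.
q_values = dn_layer(2.0, [1.0, 2.0, 3.0])
print(q_values)
```

Under this aggregation the Q values always average to the V output, so the V segment carries the overall value of the state and the A segment carries only the relative preference among behaviors.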

6. The intelligent control method of the active ventilation floor of the data center according to claim 1, wherein the operation logic of the intelligent control method is as follows:

1: Depending on the control algorithm, construct and initialize the corresponding neural networks, and make the targ network parameters the same as the eval network parameters; set the sample record cache array; set the reference temperature

2: Set the initial time t=0, and denote the time of a sample record in the cache array as τ; set the initial behavior exploration probability ε, the amount Δ_ε by which the exploration probability decreases with t, and the minimum exploration probability ε_min;

3: Randomly select actions for Z moments, and store the records (φ_z, a_z, R_{z+1}, φ_{z+1}), z ∈ [0, Z), in the cache array;

4: Obtain the initial rack air inlet temperature distribution

5: The loop body starts;

6: Obtain the p historical rack air inlet temperature distributions to form the system state φ_t = {s_{t-p}, ..., s_{t-1}, s_t};

7: If _t =0, select the behavior a_t = max(DC) and go to 9; otherwise go to 8;

8: Use the following formula to select the behavior:

9: Execute a_t: the PC sends a duty cycle command to the microcontroller to change the fan speed, obtains the rack air inlet temperature distribution s_{t+1} at the next moment of the system, and calculates R_{t+1} according to the reward formula in claim 4;

10: According to the latest p temperature distribution history, form the next state φ_{t+1} = {s_{t+1-p}, ..., s_t, s_{t+1}}, and store (φ_t, a_t, R_{t+1}, φ_{t+1}) in the cache array;

11: Randomly extract Y sample records (φ_τ, a_τ, R_{τ+1}, φ_{τ+1}) from the cache array;

12: Depending on the control algorithm, use the Y records to calculate the Q sample values with the following formula:

13: Update the eval network with the learning step size α and the following loss function:

14: The exploration probability ε is set to the larger of ε-Δ_ε and ε_min, so that it never falls below the minimum exploration probability;

15: If t mod N=0, the targ network copies the eval network parameters, otherwise go to 16;

16: time t increases by 1;

17: The loop body ends.
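The exploration-related steps of the operation logic (behavior selection, reduction of ε, and periodic copying of the eval parameters) can be sketched as three small helpers. This is an illustration under assumptions, not the claimed formulas: step 8 is assumed to be an ε-greedy rule, the exploration probability is assumed to be clamped from below at ε_min, and all function names are hypothetical.

```python
import random


def choose_behavior(q_values, epsilon, rng):
    """ASSUMPTION: epsilon-greedy selection.

    With probability epsilon return a random behavior index (exploration);
    otherwise return the index with the largest Q value (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def decay_epsilon(epsilon, delta_eps, eps_min):
    """Reduce epsilon by delta_eps without going below eps_min."""
    return max(epsilon - delta_eps, eps_min)


def should_sync_target(t, n):
    """True when the targ network should copy the eval parameters (t mod N = 0)."""
    return t % n == 0


# Illustrative schedule with hypothetical settings.
rng = random.Random(0)
epsilon = 1.0
for t in range(6):
    action_index = choose_behavior([0.1, 0.9, 0.3], epsilon, rng)
    epsilon = decay_epsilon(epsilon, delta_eps=0.3, eps_min=0.1)
    if should_sync_target(t, n=3):
        pass  # here the targ parameters would be overwritten with the eval ones
```

Early in the run ε is large, so most behaviors are random and the cache fills with varied records; as ε approaches ε_min the controller relies mostly on the learned Q values.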