CN115983438A - Method and device for determining operation strategy of data center terminal air conditioning system - Google Patents

Method and device for determining operation strategy of data center terminal air conditioning system

Info

Publication number
CN115983438A
CN115983438A (application CN202211571284.0A)
Authority
CN
China
Prior art keywords
strategy; data center; temperature field; field distribution; distribution model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211571284.0A
Other languages
Chinese (zh)
Inventor
胡潇
贾庆山
周翰辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211571284.0A
Publication of CN115983438A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Air Conditioning Control Device (AREA)

Abstract

The invention discloses a method and a device for determining an operation strategy of a data center terminal air conditioning system. The method comprises the following steps: building a temperature field distribution model of a data center machine room; constructing a Markov decision process model of the operation strategy of the data center terminal air conditioning system; in the temperature field distribution model, training Markov decision process models with different strategy functions and different parameters by using a reinforcement learning algorithm, so as to generate multiple operation strategies for the data center terminal air conditioning system and construct a strategy library; evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to an ordinal optimization method, and determining a selection set from the strategy library; and applying each operation strategy in the selection set to the real operating environment of the data center machine room, and determining the optimal operation strategy in the selection set. The method and the device can accurately determine the optimal operation strategy of the data center terminal air conditioning system.

Description

Method and device for determining operation strategy of data center terminal air conditioning system
Technical Field
The invention relates to the technical field of energy-saving optimization of internet data centers, in particular to a method and a device for determining an operation strategy of a terminal air-conditioning system of a data center.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Apart from the IT load power consumption of the servers, the largest energy consumer in a data center is the cooling infrastructure: about 1/3 to 1/2 of a data center's total power consumption is used by the cooling system, and the ever-increasing energy consumption of data centers calls for improving energy utilization efficiency through better thermal management. The energy consumption of a data center refrigeration system comprises the chiller-side energy consumption and the terminal air conditioner energy consumption. Mature technical means exist for optimizing the chiller-side energy consumption (for example, methods that optimize chiller energy consumption based on load prediction), but optimizing the terminal air conditioner energy consumption involves the temperature field distribution inside the data center machine room; simulating this temperature field distribution requires complex fluid dynamics and thermodynamics analysis, and the distribution generally changes continuously over time. Therefore, reducing the operating power consumption of the data center terminal air conditioning system while guaranteeing the thermal safety of the server IT equipment is a key challenge and technical problem.
Moreover, a scheme for selecting the operation strategy of the data center terminal air conditioner is currently lacking.
Disclosure of Invention
The embodiment of the invention provides a method for determining an operation strategy of a data center terminal air conditioning system, which is used for accurately determining the optimal operation strategy of the data center terminal air conditioning system and comprises the following steps:
building a temperature field distribution model of a data center machine room;
constructing a Markov decision process model of the operation strategy of the data center terminal air conditioning system;
in the temperature field distribution model, training Markov decision process models with different strategy functions and different parameters by using a reinforcement learning algorithm, so as to generate multiple operation strategies for the data center terminal air conditioning system and construct a strategy library;
evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to an ordinal optimization method, and determining a selection set from the strategy library;
and applying each operation strategy in the selection set to the real operating environment of the data center machine room, and determining the optimal operation strategy in the selection set.
The embodiment of the invention also provides a device for determining the operation strategy of the data center terminal air conditioning system, which is used for accurately determining the optimal operation strategy of the data center terminal air conditioning system, the device comprising:
a temperature field distribution model building module, used for building a temperature field distribution model of a data center machine room;
a Markov decision process model building module, used for constructing a Markov decision process model of the operation strategy of the data center terminal air conditioning system;
a strategy library construction module, used for training Markov decision process models with different strategy functions and different parameters by using a reinforcement learning algorithm in the temperature field distribution model, generating multiple operation strategies for the data center terminal air conditioning system and constructing a strategy library;
a selection set determining module, used for evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to an ordinal optimization method and determining a selection set from the strategy library;
and an optimal operation strategy determining module, used for applying each operation strategy in the selection set to the real operating environment of the data center machine room and determining the optimal operation strategy in the selection set.
The embodiment of the invention also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the above method for determining the operation strategy of the data center terminal air conditioning system.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for determining the operation strategy of the data center terminal air conditioning system.
The embodiment of the invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method for determining the operation strategy of the data center terminal air conditioning system.
In the embodiment of the invention, a temperature field distribution model of a data center machine room is built; a Markov decision process model of the operation strategy of the data center terminal air conditioning system is constructed; in the temperature field distribution model, Markov decision process models with different strategy functions and different parameters are trained using a reinforcement learning algorithm, generating multiple operation strategies for the data center terminal air conditioning system and constructing a strategy library; according to an ordinal optimization method, the performance of each operation strategy in the strategy library is evaluated in the temperature field distribution model, and a selection set is determined from the strategy library; and each operation strategy in the selection set is applied to the real operating environment of the data center machine room, and the optimal operation strategy in the selection set is determined. In this process, the strategy library contains multiple operation strategies derived from Markov decision process models with different parameters and different strategy function forms. Whereas the traditional two-stage method and reinforcement learning methods generate only a single operation strategy, this scheme comprehensively considers strategy functions of multiple different forms and selects among the resulting operation strategies reasonably, so the finally obtained operation strategy has a better performance guarantee than the single operation strategy obtained by traditional methods; that is, the operation strategy can better guarantee the thermal safety of the server IT equipment in the actual data center environment while reducing the terminal air conditioner energy consumption to the maximum extent.
In the selection stage of the operation strategies, unlike the traditional approach of computing the real performance of all operation strategies, sorting them, and then selecting the best one, this scheme uses an ordinal optimization method to obtain a selection set, which greatly reduces the number of real-environment evaluations of the operation strategies in the strategy library, further guarantees the thermal safety of the data center server IT equipment, and saves manpower, material, and financial resources.
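The selection-set idea above can be sketched in Python. This is purely illustrative: the function and variable names are hypothetical, and the patent does not prescribe a concrete ranking procedure; the sketch only shows how a cheap simulated performance estimate narrows N candidate strategies down to a selection set of g strategies for expensive real-environment evaluation.

```python
# Illustrative sketch of ordinal-optimization-style selection (hypothetical
# names): rank all strategies by their simulated cost estimate, keep only the
# g best-ranked ones, and evaluate just those g in the real environment.
def select_top_g(simulated_costs, g):
    """Return indices of the g strategies with the lowest simulated cost J."""
    order = sorted(range(len(simulated_costs)), key=lambda i: simulated_costs[i])
    return order[:g]

# Example: N = 6 candidate strategies in the library, selection set of g = 2.
costs = [3.2, 1.1, 2.7, 0.9, 4.0, 1.5]   # simulated J values (made up)
selection_set = select_top_g(costs, 2)    # only these go to the real machine room
```

Only the `g` strategies in `selection_set` then need to be run in the real machine room, instead of all `N`.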
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
Fig. 1 is a flowchart of a method for determining an operation strategy of a data center terminal air conditioning system according to an embodiment of the present invention;
Fig. 2 is a flowchart of building a temperature field distribution model of a data center machine room in the embodiment of the invention;
Fig. 3 is a flowchart of determining a selection set in an embodiment of the present invention;
Fig. 4 is a schematic diagram of a device for determining an operation strategy of a data center terminal air conditioning system according to an embodiment of the present invention;
Fig. 5 is a diagram of a computer device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The inventors have found that conventional approaches to energy-saving strategy optimization for data center refrigeration systems, in particular terminal air conditioning systems, are mostly based on a two-stage framework. In the first stage, an approximate system model is established through mechanism analysis or data-driven methods; the model generally involves fluid dynamics, heat transfer, and mechanical principles, and must account for the temperature field distribution inside the data center machine room. In the second stage, the approximate system model is used to obtain the optimal decision sequence for the controllable variables of the air conditioner through a strategy optimization algorithm; common strategy optimization algorithms mainly include dynamic programming, model predictive control, and the like. However, these traditional two-stage optimization methods require establishing an approximate model of the data center machine room temperature field, whose distribution involves specialized knowledge such as fluid dynamics and heat transfer. Modeling by mechanism analysis requires establishing a complex system of partial differential equations; for the increasingly large data centers of recent years, the process of establishing a mechanism model of the machine room temperature field is complex, difficult, and error-prone, so these traditional model-based optimization algorithms have difficulty solving the strategy optimization problem of the data center terminal air conditioning system.
A reinforcement learning method continuously learns an optimal operation strategy through interaction with the environment and does not require the dynamic characteristics of the system to be known (in particular, model-free reinforcement learning methods). Precisely because the mechanism model of the machine room temperature field distribution is very complex, reinforcement learning is likely an effective way to solve the strategy optimization problem of the data center terminal air conditioning system, and some literature already adopts this approach for the strategy optimization of data center refrigeration systems. In general, to prevent losses caused by over-temperature of the server IT equipment, a reinforcement learning algorithm cannot be trained directly in the real data center environment, so it is still necessary to first establish a simulation model of the data center terminal air conditioning and machine room temperature field using computational fluid dynamics (CFD) simulation software. Although reinforcement learning can effectively avoid mechanism modeling and analysis of the machine room temperature field, existing mainstream reinforcement learning methods suffer from low sample efficiency, unstable strategy training, and strong sensitivity of trained-strategy performance to parameters, so the performance of the finally trained operation strategy of the data center terminal air conditioning system cannot be guaranteed; moreover, because training takes place in a simulation environment that inevitably differs from the real environment, the performance of the trained strategy in the real environment cannot be guaranteed either.
On this basis, the embodiment of the invention provides a method for selecting an operation strategy of the data center air conditioning system based on a strategy library and ordinal optimization.
Fig. 1 is a flowchart of a method for determining an operation strategy of a data center terminal air conditioning system in an embodiment of the present invention. The method includes:
Step 101, building a temperature field distribution model of a data center machine room;
Step 102, constructing a Markov decision process model of the operation strategy of the data center terminal air conditioning system;
Step 103, training Markov decision process models with different strategy functions and different parameters by using a reinforcement learning algorithm in the temperature field distribution model, so as to generate multiple operation strategies for the data center terminal air conditioning system and construct a strategy library;
Step 104, evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to the ordinal optimization method, and determining a selection set from the strategy library;
Step 105, applying each operation strategy in the selection set to the real operating environment of the data center machine room, and determining the optimal operation strategy in the selection set.
In step 101, a temperature field distribution model of a data center machine room is built. Fig. 2 is a flowchart of building the temperature field distribution model of the data center machine room in the embodiment of the present invention, including:
Step 201, using CFD simulation software and the machine room layout CAD drawing, modeling and simulating the spatial structure of the data center machine room and the models of the air conditioners and IT equipment, thereby establishing the temperature field distribution model of the data center machine room;
the temperature field of the data center machine room is influenced by boundary conditions such as server IT load, tail end air conditioner fan rotating speed and the like, and changes along with time and space distribution. A simple temperature distribution model established by using a traditional fluid dynamics and heat transfer mechanism analysis method is difficult to accurately depict the temperature change of each measuring point in a machine room along with the time and space distribution, and is difficult to capture local hot spots beside IT equipment of a certain server in time, so that the IT equipment of the machine room has potential overheating hazards. Therefore, the embodiment of the invention adopts CFD simulation software for a data center to simulate the temperature field distribution of a machine room of the data center, and utilizes original libraries (air conditioner original, IT equipment original and the like) rich in the CFD simulation software to carry out detailed modeling and simulation on the spatial structure of the machine room (including server IT equipment spatial arrangement, cold and hot channel spatial arrangement, air conditioner spatial arrangement and the like, temperature sensor spatial arrangement, air conditioning system structure and the like) and the models of the air conditioner and the IT equipment according to the CAD drawing of the machine room arrangement, thereby establishing a machine room temperature field distribution model and accurately describing the temperature change of each measuring point in the machine room along with the time and spatial distribution.
Although the machine room temperature field distribution model is built according to the machine room layout CAD drawing, a certain deviation still exists between it and the actual temperature field in the real machine room, so the temperature field distribution model needs to be calibrated further.
Step 202, collecting real operating environment data in a machine room;
the real operation environment data comprises historical data of various temperature measuring points, return air temperature set points, air conditioner fan rotating speed, environment working conditions and the like in an actual machine room.
Step 203, comparing the collected real operating environment data with the operating environment data simulated by the temperature field distribution model, and continuously calibrating the temperature field distribution model until the matching degree between the operating environment data of the calibrated temperature field distribution model and the real operating environment data reaches a preset threshold.
Continuously calibrating the temperature field distribution model means finely adjusting the spatial positions of the air conditioners and server IT equipment, the operating set parameters, and the like; the preset threshold is determined by the user according to actual needs, so that a high matching degree is achieved.
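The matching-degree check of step 203 might be sketched as follows. The patent does not define a concrete metric, so the mean-absolute-error score and the 0.9 threshold here are assumptions for illustration only.

```python
# Hedged sketch of a "matching degree" metric (hypothetical choice): mean
# absolute error between real and simulated temperatures, mapped to [0, 1].
def matching_degree(real_temps, simulated_temps, scale=10.0):
    """1.0 = perfect match; decreases as mean absolute error grows."""
    assert len(real_temps) == len(simulated_temps)
    mae = sum(abs(r - s) for r, s in zip(real_temps, simulated_temps)) / len(real_temps)
    return max(0.0, 1.0 - mae / scale)

real = [24.1, 25.3, 26.8, 23.9]   # measured temperatures (made-up values)
sim = [24.5, 25.0, 27.5, 23.5]    # model-simulated temperatures
score = matching_degree(real, sim)
calibrated = score >= 0.9         # preset threshold chosen by the user
```

Calibration would loop: adjust equipment positions and set parameters, re-simulate, recompute the score, and stop once it clears the user-chosen threshold.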
In step 102, a Markov decision process model of the operation strategy of the data center terminal air conditioning system is constructed. In one embodiment, the Markov decision process model consists of a state space S, an action space A, a state transition function P, a reward function R, and a discount factor γ; it can be represented as a quintuple M = (S, A, P, R, γ).
the states of the state space S are selected from the observed variables;
the actions of the action space A are selected from the control variables;
the reward function R is obtained from the energy consumption penalty of the air conditioner and the over-temperature penalty of the server IT equipment;
the state transition function P is obtained from the temperature field distribution model;
at each instant t, the state S observed according to the environment at instant t t Performing learning and selecting action A t Environment vs. action A t Respond correspondingly and present a new state S t+1 With the generation of a prize R t+1 The reward is targeted for long term maximization in the action selection process.
In the above embodiment, the state S t+1 And R t+1 Dependent only on P and A t Independent of earlier state and actions, which are the basic characteristics of states and rewards in a markov decision process model (markov).
In the real operating environment of a typical data center terminal air conditioning system, the observed variables are generally: the temperature measurements at the measuring points in the cold/hot aisles and at the air conditioner inlets/outlets, the server IT load rate in each cabinet, the outdoor temperature, and the illumination intensity. The control variables are generally: the air conditioner supply/return air temperature set points, the air conditioner fan speed, and the like. At each instant t, the control variables generally affect observed variables such as the temperature measurements in the cold/hot aisles and at the air conditioner inlets/outlets at the next instant (instant t+1), whereas observed variables such as the server IT load rate in each cabinet, the outdoor temperature, and the illumination intensity are not affected by the control variables and can generally only be predicted from historical data using methods such as time series analysis. The system state S_t can generally be selected with reference to the observed variables; the system action A_t is selected based on the control variables; the reward function R is designed based on the energy consumption penalty of the air conditioner and the over-temperature penalty of the server IT equipment; the temperature field distribution model provides the state transition function P; and a suitable discount factor γ ∈ (0, 1) is selected, thereby constructing the Markov decision process model M = (S, A, P, R, γ).
On the basis of the Markov decision process model, a reinforcement learning algorithm is applied to train in a simulation environment to obtain an optimal terminal air conditioning system operation strategy.
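A minimal sketch of the reward design described above: a penalty on terminal air conditioner energy use plus a weighted over-temperature penalty on server outlet temperatures. The temperature limit and penalty weight are assumed values, not figures from the patent.

```python
# Illustrative reward function (assumed constants): negative cost combining
# the air conditioner energy penalty and the server over-temperature penalty.
T_MAX = 32.0   # assumed allowed outlet-temperature upper limit (deg C)
LAM = 5.0      # assumed over-temperature penalty weight

def reward(power_kw, outlet_temps):
    """Negative cost: energy penalty plus weighted over-temperature penalty."""
    overtemp = sum(max(t - T_MAX, 0.0) for t in outlet_temps)
    return -(power_kw + LAM * overtemp)

r_safe = reward(12.0, [28.5, 30.1, 31.9])   # all racks under the limit
r_hot = reward(10.0, [28.5, 33.0, 31.9])    # one rack 1 deg C over the limit
```

Maximizing this reward trades off fan/compressor power against the risk of exceeding the outlet-temperature limit, matching the penalty structure the text describes.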
The purpose of optimizing the operation strategy of the data center terminal air conditioning system is to reduce the terminal air conditioner energy consumption to the maximum extent on the premise of guaranteeing the thermal safety of the IT equipment in the machine room. Because the temperature field in the machine room (the controlled variable) changes over time and is influenced at each instant by parameters such as the terminal air conditioner return air temperature set point and fan speed (the control variables) as well as by the server IT load rate, the values of the control variables at each instant must be adjusted reasonably in combination with the measured value at each temperature measuring point and the IT load rate at that instant; this essentially constitutes a sequential decision problem. For such sequential decision problems, the general approach is to define a state-action space, establish a Markov decision process model, and then train a strategy function using methods such as reinforcement learning.
Because the temperature measurements in each cold and hot aisle of the data center terminal air conditioning environment take continuous values, the return air temperature set point and fan speed also take continuous values, and the temperature field varies over time and space, this sequential decision problem is very difficult. An operation strategy can be trained with existing function-approximation reinforcement learning methods, but the optimality of such a strategy is hard to guarantee. Therefore, multiple operation strategies for the data center terminal air conditioning system can be generated in the simulation environment using reinforcement learning algorithms to construct a strategy library Π, from which the optimal strategy can then be selected reasonably.
In the optimization problem of the operation strategy of the data center terminal air conditioning system, multiple Markov decision process models with different parameters can be constructed: different choices of the state S, the action A, the designed reward function R, and the discount factor γ correspond to Markov decision process models with different parameters. For Markov decision process models with different parameters, the optimal strategies obtained by reinforcement learning training differ; in the problem of the embodiment of the invention, however, the performance of a trained operation strategy can only be evaluated objectively by actually applying it in the temperature field distribution model, and only then can it be judged which Markov decision process model yields the operation strategy with the best performance. Therefore, by selecting different states S, different actions A, different designed reward functions R, and different discount factors γ, a set of Markov decision process models of the terminal air conditioning system can be constructed, for example M = {M_1, M_2, …, M_K}, and a reinforcement learning algorithm is applied to each Markov decision process model in the set to train an optimal strategy.
In step 103, in the temperature field distribution model, Markov decision process models with different strategy functions and different parameters are trained using a reinforcement learning algorithm, generating multiple operation strategies for the data center terminal air conditioning system and constructing a strategy library;
in one embodiment, step 103 comprises:
determining the various strategy functions to be adopted;
determining an action value function of the operation strategy, the action value function representing the expected cumulative discounted reward (the rewards R weighted by the discount factor γ) obtained by taking action a in state s and thereafter following the operation strategy π; it may be represented as follows:
Q_π(s, a) = E_π[ Σ_{t=0}^{∞} γ^t R_{t+1} | S_0 = s, A_0 = a ]
For each strategy function, in the temperature field distribution model and under the framework of a reinforcement learning algorithm, the action value function and the strategy function converge to an optimized operation strategy through continuous alternating updates, and the optimized operation strategy is added to the strategy library as an operation strategy.
In one embodiment, the strategy functions include a neural-network strategy function and a basis-function linear weighting strategy function.
Because neural networks have good representational capability and generalization properties, a neural network is often used to fit the operation strategy:
π(s) = f_NN(s, θ)
where π(s) is the operation strategy, f_NN(s, θ) denotes a neural-network strategy function whose input is the state s and whose output is the action a, and θ denotes the neural network weights and training parameters. For the multiple Markov decision process models with different parameters, algorithms such as DDPG, TD3, and SAC can be used to train neural-network optimal operation strategy functions for the data center terminal air conditioning system, and these strategies are placed in the strategy library Π.
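A minimal numeric sketch of such a neural-network strategy function. The random weights stand in for parameters that DDPG/TD3/SAC training would produce, and the state and action dimensions are assumptions.

```python
import numpy as np

# Minimal neural-network policy pi(s) = f_NN(s, theta): one hidden tanh layer
# mapping an assumed 3-dim state (normalized temperatures, load) to an assumed
# 2-dim action (set-point offset, fan-speed offset). Weights are random
# stand-ins for trained parameters.
rng = np.random.default_rng(0)
theta = {
    "W1": rng.normal(size=(4, 3)), "b1": np.zeros(4),
    "W2": rng.normal(size=(2, 4)), "b2": np.zeros(2),
}

def nn_policy(s, theta):
    h = np.tanh(theta["W1"] @ s + theta["b1"])
    return np.tanh(theta["W2"] @ h + theta["b2"])  # actions squashed to (-1, 1)

s = np.array([0.3, -0.1, 0.7])   # a normalized state observation (made up)
a = nn_policy(s, theta)
```

The tanh output layer keeps actions bounded, which suits continuous control variables such as set-point and fan-speed offsets.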
In addition to a neural-network strategy function, a basis-function linear weighting strategy function may be used:
π(s) = f_basis(s, w) = wᵀ φ(s)
where π(s) is the operation strategy, f_basis(s, w) is a strategy function of the basis-function linear weighting type, w ∈ ℝ^d is a weight vector, and φ(s) ∈ ℝ^d is a vector of basis functions of the state s. The basis functions can be designed according to prior knowledge of safe data center operation and the operating characteristics of the terminal air conditioning system. Similarly, Markov decision process models with different parameters can be trained within the framework of an actor-critic (AC) reinforcement learning algorithm, continuously updating the weight vector until an optimal basis-function linearly weighted operation strategy function for the data center terminal air conditioning system is obtained, and this strategy is placed in the strategy library Π.
The representational capability of basis-function linear weighting may be weaker than that of a neural network, but since only the linear weight vector needs to be updated, the strategy is more interpretable, the training process is more stable, and convergence is better.
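A minimal sketch of the basis-function linearly weighted strategy. The hand-designed basis here (bias, mean cold-aisle temperature, and its excess over a reference) is a hypothetical illustration of encoding prior knowledge, not the patent's actual basis.

```python
# Illustrative basis-function linear policy pi(s) = w^T phi(s), with a
# hypothetical hand-designed basis producing a scalar fan-speed command.
def phi(s):
    """Basis vector: bias, mean aisle temperature, excess over 25 deg C."""
    mean_t = sum(s) / len(s)
    return [1.0, mean_t, max(mean_t - 25.0, 0.0)]

def linear_policy(s, w):
    feats = phi(s)
    return sum(wi * fi for wi, fi in zip(w, feats))

w = [0.2, 0.01, 0.5]                      # weight vector (made-up values
                                          # standing in for AC-trained weights)
a_cool = linear_policy([23.0, 24.0], w)   # below reference: small action
a_warm = linear_policy([27.0, 29.0], w)   # above reference: larger action
```

Because the action is linear in `w`, each weight has a direct interpretation (e.g. how strongly the fan speed reacts to excess temperature), which is the interpretability advantage noted above.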
In step 104, the performance of each operation strategy in the strategy library is evaluated in the temperature field distribution model according to an order optimization method, and a selected set is determined from the strategy library. Suppose the acquired data center terminal air conditioner operation strategy library Π includes N operation strategies. These N strategies are generated in the temperature field distribution model, but because the model inevitably deviates from the real operating environment, the performance of the strategies in Π must be evaluated in the real operating environment in order to select the optimal strategy π* in Π.
In one embodiment, the formula for evaluating the performance of each operating strategy in the strategy library in the temperature field distribution model is as follows:
π* = arg min_{π∈Π} J(π)

J(π) = Σ_{t=0}^{T−1} [ P_t·Δt + λ·Σ_i max(T_{i,t}^{out} − T_max, 0) ]    (1)

where π* is the optimal operation strategy; J(π) is the performance evaluation function of an operation strategy in the data center machine room operating environment, representing the total energy consumption and total over-temperature of the strategy within time T; Δt is the time interval from t to t+1; P_t is the operating power of the terminal air conditioner at time t; T_{i,t}^{out} is the air outlet temperature of the i-th server IT equipment of the data center machine room at time t; T_max is the upper limit of the allowable cabinet air outlet temperature; and λ is a weight parameter.
In addition to the above formula (1), J(π) can also be expressed in other forms.
all the operation strategies in the strategy library pi are evaluated in the real operation environment under the safety limit of the real operation environment, J (pi) is not practical, only a small part of the strategy library pi which is more likely to be the optimal strategy can be reasonably selected to be evaluated in the real operation environment, but the performance of the operation strategies can be evaluated without limit in the simulation environment of the temperature field distribution model; in actual engineering implementation, an accurate solution of the above evaluation formula (1) is not required to be pursued, but a "good enough" solution of the above evaluation formula (1) is obtained to meet engineering requirements, that is, an operation strategy with the minimum performance J (pi) in a strategy library pi is not required to be pursued, and the requirement can be met by finally obtaining the real performance J (pi) of the operation strategy to be located in the minimum g strategy sets.
Given these characteristics, the operation strategies in the strategy library Π can be screened with an order optimization method. Specifically, the strategy performance evaluation function J(π) of the real operating environment is taken as the detailed model (Refined Model); the same formula is also used as the strategy performance evaluation function in the temperature field distribution model, but since evaluation in the model differs from evaluation in the real operating environment, the performance obtained in the simulation environment is denoted J'(π) and treated as a rough model (Crude Model). The class of the ordered performance curve (OPC) and the noise level of the rough model are estimated, and the size g of the "good enough" set (strategy selection set) G and the alignment level k are given according to user preference; the alignment level k is related to the alignment probability. The size s of the selected set S is then determined by the selected set formula of order optimization, the performance of all operation strategies in Π is evaluated with the rough model, and the s strategies with the smallest J'(π) form the selected set S. Order optimization theory guarantees that, with probability at least 95%, the selected set S contains at least k strategies whose true performance is among the g best. Equivalently, with probability greater than or equal to 95%, at least k elements (designs) in S are truly "good enough".
The above procedure is summarized in the following steps.
Fig. 3 is a flowchart of determining the selected set in an embodiment of the present invention. In one embodiment, evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to the order optimization method, and determining the selected set from the strategy library, includes:
step 301, taking a strategy performance evaluation function J' (pi) obtained in the temperature field distribution model as a rough model;
step 302, estimating the class of the ordered performance curve (OPC) and the noise level of the rough model;
step 303, obtaining the size G and the alignment level k of a strategy selection set G determined by a user; the alignment level k is related to the alignment probability.
Step 304, determining the size s of the selected set S according to the selected set formula of order optimization, where the parameters of the formula include the class of the ordered performance curve, the noise level of the rough model, the size g of the strategy selection set G, and the alignment level k. The selected set formula of order optimization is as follows:

s(k, g) = ⌈e^{Z1} · k^{Z2} · g^{Z3} + Z4⌉

where Z1, Z2, Z3, Z4 are determined according to the class of the ordered performance curve and the noise level of the rough model, and are obtained by regression from a large amount of historical data.
Step 305, calculating the rough-model values of all operation strategies in the strategy library, and selecting the s operation strategies with the smallest rough-model values to form the selected set S.
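Steps 301 to 305 can be sketched as follows; the regression coefficients Z1 to Z4 and the crude-model values J'(π) are placeholders, since real values would come from the estimated OPC class, the noise level, and the temperature field distribution model:

```python
import math
import numpy as np

def selected_set_size(g, k, Z1, Z2, Z3, Z4):
    """Selected-set size s(k, g) = ceil(exp(Z1) * k**Z2 * g**Z3 + Z4).

    This regression form is standard in ordinal optimization; the
    coefficients Z1..Z4 must be looked up for the estimated ordered
    performance curve class and rough-model noise level.
    """
    return math.ceil(math.exp(Z1) * (k ** Z2) * (g ** Z3) + Z4)

def select(crude_values, s):
    """Indices of the s strategies with the smallest rough-model value J'(pi)."""
    return np.argsort(crude_values)[:s]

# rough-model values J'(pi) of N = 6 strategies (illustrative numbers)
J_crude = np.array([680.0, 655.0, 702.0, 641.0, 690.0, 660.0])
s = selected_set_size(g=2, k=1, Z1=1.0, Z2=0.5, Z3=0.5, Z4=0.0)  # placeholder Zs
S = select(J_crude, min(s, len(J_crude)))  # selected set S
```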
In step 105, each operation strategy in the selected set is respectively applied to the real operation environment of the data center machine room, and the optimal operation strategy in the selected set is determined.
In one embodiment, the step of determining the optimal operation strategy in the selected set by respectively applying each operation strategy in the selected set to the real operation environment of the data center machine room comprises the steps of:
taking the strategy performance evaluation function J(π) of the real operating environment of the data center machine room as the detailed model;
respectively applying each operation strategy in the selected set to a real operation environment of the data center machine room to obtain a value of a detailed model of each operation strategy;
and taking the operation strategy with the minimum value of the detailed model as the final operation strategy of the data center terminal air conditioning system.
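The final selection from the selected set can be sketched as follows, with illustrative detailed-model values:

```python
# Detailed-model values J(pi), measured by running each strategy of the
# selected set S in the real machine room (illustrative numbers)
J_detailed = {"pi_3": 648.0, "pi_1": 652.0, "pi_5": 663.0, "pi_0": 685.0}
final_strategy = min(J_detailed, key=J_detailed.get)  # smallest J wins
```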
In summary, the method provided in the embodiment of the present invention has the following beneficial effects:
First, a strategy library containing multiple operation strategies is formed based on Markov decision process models with different parameters and on different strategy functions. Whereas the traditional two-stage method and reinforcement learning methods generate only a single operation strategy, this scheme comprehensively considers strategy functions of several different forms and then selects among the resulting operation strategies, so the finally obtained strategy carries a better performance guarantee than the single strategy of traditional methods; that is, it can better ensure the thermal safety of the server IT equipment in the actual data center environment while reducing the energy consumption of the terminal air conditioner to the greatest extent.
Second, in the strategy selection stage, unlike the traditional approach of computing the true performance of all operation strategies, sorting them, and then choosing the best one, this scheme uses an order optimization method to obtain a selected set. This greatly reduces the number of evaluations of strategy library strategies in the real operating environment, further ensures the thermal safety of the data center server IT equipment, and saves manpower, material resources, and financial resources.
The embodiment of the invention also provides a device for determining the operation strategy of the data center terminal air conditioning system. Since its principle is similar to that of the method for determining the operation strategy of the data center terminal air conditioning system, the details are not repeated here.
Fig. 4 is a schematic diagram of an operation policy determination device of an air conditioning system at a data center end in an embodiment of the present invention, where the operation policy determination device includes:
the temperature field distribution model building module 401 is used for building a temperature field distribution model of a data center machine room;
a markov decision process model building module 402, configured to build a markov decision process model of an operation policy of an air conditioning system at the end of the data center;
a strategy base construction module 403, configured to train, in the temperature field distribution model and using a reinforcement learning algorithm, based on different strategy functions and Markov decision process models with different parameters respectively, to generate a plurality of operation strategies for the data center terminal air conditioning system and construct a strategy library;
a selection set determining module 404, configured to evaluate performance of each operation policy in the policy repository in the temperature field distribution model according to the order optimization method, and determine a selection set from the policy repository;
and an optimal operation policy determining module 405, configured to apply each operation policy in the selected set to a real operation environment of the data center machine room, respectively, and determine an optimal operation policy in the selected set.
In an embodiment, the temperature field distribution model building module is specifically configured to:
using CFD simulation software, modeling and simulating the spatial structure of the data center machine room and the models of the air conditioners and IT equipment according to the CAD drawing of the machine room layout, so as to establish a temperature field distribution model of the data center machine room;
collecting real operating environment data in the machine room;
and comparing the collected real operating environment data with the operating environment data simulated by the temperature field distribution model, and continuously calibrating the temperature field distribution model until the matching degree between the operating environment data of the calibrated model and the real operating environment data reaches a preset threshold.
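One possible way to quantify the matching degree between measured and simulated operating data is sketched below; the 1 − normalized-MAE metric and the 0.95 threshold are assumptions, as the text does not fix a specific metric:

```python
import numpy as np

def match_degree(real, simulated):
    """Matching degree between measured and simulated operating data,
    taken here (as an assumption) as 1 - normalized mean absolute error."""
    real = np.asarray(real, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.mean(np.abs(real - simulated)) / np.mean(np.abs(real))

real_temps = [24.0, 25.0, 26.0, 25.5]   # measured outlet temperatures
sim_temps = [24.2, 24.8, 26.1, 25.4]    # CFD-simulated counterparts
calibrated = match_degree(real_temps, sim_temps) >= 0.95  # assumed threshold
```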
In one embodiment, the Markov decision process model consists of a state space S, an action space A, a state transition function P, a reward function R, and a discount factor γ;
selecting the state of the state space S from the observation variables;
the action in the action space A is selected from the control variables;
the reward function R is obtained according to the energy consumption punishment of the air conditioner and the overtemperature punishment of the server IT equipment;
the state transfer function P is obtained according to the temperature field distribution model;
at each time t, learning is performed according to the state S_t observed from the environment at time t and an action A_t is selected; the environment responds to action A_t and presents a new state S_{t+1}, together with a reward R_{t+1}; in the action selection process, the goal is to maximize the long-term reward.
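The interaction loop described above can be sketched with a toy stand-in for the environment; the first-order temperature dynamics, the power model, and the reward weights are all assumptions for illustration:

```python
def step(state, action):
    """Toy stand-in for the temperature field model as state transition
    function P: the outlet temperature state drifts toward the supply-air
    setpoint `action`; the reward R combines an (assumed) energy penalty
    with an over-temperature penalty above 27 degrees."""
    next_state = state + 0.5 * (action - state)
    power = max(28.0 - action, 0.0)          # cooling harder costs more
    reward = -0.1 * power - 10.0 * max(next_state - 27.0, 0.0)
    return next_state, reward

state, total_reward = 26.0, 0.0
for t in range(3):
    action = 24.0                # a fixed policy pi(s), for illustration
    state, r = step(state, action)
    total_reward += r            # long-term reward to be maximized
```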
In one embodiment, in the temperature field distribution model, a reinforcement learning algorithm is used, and the markov decision process models based on different policy functions and different parameters are trained respectively to generate various operation policies of the data center terminal air conditioning system, and a policy base is constructed, including:
determining various adopted strategy functions;
determining an action value function of the operation strategy, where the action value function represents the cumulative discounted reward, i.e. the discounted weighted sum of the rewards R, obtained by taking action a while following the operation strategy;
for each strategy function, in the temperature field distribution model and under the framework of the reinforcement learning algorithm, the action value function and the strategy function are alternately and continuously updated until they converge to an optimized operation strategy, which is added to the strategy library as an operation strategy.
In one embodiment, the policy function includes a neural-network-type policy function and a basis-function linear-weighting-type policy function.
In one embodiment, the formula for evaluating the performance of each operating strategy in the strategy library in the temperature field distribution model is as follows:
π* = arg min_{π∈Π} J(π)

J(π) = Σ_{t=0}^{T−1} [ P_t·Δt + λ·Σ_i max(T_{i,t}^{out} − T_max, 0) ]

where π* is the optimal operation strategy; J(π) is the performance evaluation function of an operation strategy in the data center machine room operating environment, representing the total energy consumption and total over-temperature of the strategy within time T; Δt is the time interval from t to t+1; P_t is the operating power of the terminal air conditioner at time t; T_{i,t}^{out} is the air outlet temperature of the i-th server IT equipment of the data center machine room at time t; T_max is the upper limit of the allowable cabinet air outlet temperature; and λ is a weight parameter.
In an embodiment, the pick set determining module is specifically configured to:
taking a strategy performance evaluation function J' (pi) obtained in the temperature field distribution model as a rough model;
estimating the class of the ordered performance curve and the noise level of the rough model;
obtaining the size G and the alignment level k of a strategy selection set G determined by a user;
determining the size s of the selected set S according to the selected set formula of order optimization, where the parameters of the formula include the class of the ordered performance curve, the noise level of the rough model, the size g of the strategy selection set G, and the alignment level k;
and calculating the rough-model values of all operation strategies in the strategy library, and selecting the s operation strategies with the smallest rough-model values to form the selected set S.
In an embodiment, the optimal operation policy determining module is specifically configured to:
taking the strategy performance evaluation function J(π) of the real operating environment of the data center machine room as the detailed model;
respectively applying each operation strategy in the selected set to a real operation environment of a data center machine room to obtain a value of a detailed model of each operation strategy;
and taking the operation strategy with the minimum value of the detailed model as the final operation strategy of the data center terminal air conditioning system.
In summary, the device provided by the embodiment of the invention has the following beneficial effects:
First, a strategy library containing multiple operation strategies is formed based on Markov decision process models with different parameters and on different strategy functions. Whereas the traditional two-stage method and reinforcement learning methods generate only a single operation strategy, this scheme comprehensively considers strategy functions of several different forms and then selects among the resulting operation strategies, so the finally obtained strategy carries a better performance guarantee than the single strategy of traditional methods; that is, it can better ensure the thermal safety of the server IT equipment in the actual data center environment while reducing the energy consumption of the terminal air conditioner to the greatest extent.
Second, in the strategy selection stage, unlike the traditional approach of computing the true performance of all operation strategies, sorting them, and then choosing the best one, this scheme uses an order optimization method to obtain a selected set. This greatly reduces the number of evaluations of strategy library strategies in the real operating environment, further ensures the thermal safety of the data center server IT equipment, and saves manpower, material resources, and financial resources.
Fig. 5 is a schematic diagram of a computer device in an embodiment of the present invention, where the computer device 500 includes a memory 510, a processor 520, and a computer program 530 stored in the memory 510 and capable of being executed on the processor 520, and when the processor 520 executes the computer program 530, the method for determining an operation policy of an end air conditioning system in a data center is implemented.
The embodiment of the invention also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the method for determining the operation strategy of the air conditioning system at the end of the data center is realized.
The embodiment of the invention also provides a computer program product, which comprises a computer program, and when the computer program is executed by a processor, the method for determining the operation strategy of the data center terminal air conditioning system is realized.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A method for determining an operation strategy of a data center terminal air conditioning system is characterized by comprising the following steps:
building a temperature field distribution model of a data center machine room;
constructing a Markov decision process model of an operation strategy of an air conditioning system at the tail end of a data center;
in the temperature field distribution model, training with a reinforcement learning algorithm, based on different strategy functions and on Markov decision process models with different parameters respectively, generating a plurality of operation strategies of the data center terminal air conditioning system, and constructing a strategy library;
evaluating, according to an order optimization method, the performance of each operation strategy in the strategy library in the temperature field distribution model, and determining a selected set from the strategy library;
and respectively applying each operation strategy in the selected set to the real operation environment of the data center machine room, and determining the optimal operation strategy in the selected set.
2. The method of claim 1, wherein building a temperature field distribution model for a data center room comprises:
using CFD simulation software, modeling and simulating the spatial structure of the data center machine room and the models of the air conditioner and IT equipment according to the CAD drawing of the machine room layout, so as to establish a temperature field distribution model of the data center machine room;
collecting real operating environment data in the machine room;
and comparing the collected real operating environment data with the operating environment data simulated by the temperature field distribution model, and continuously calibrating the temperature field distribution model until the matching degree between the operating environment data of the calibrated model and the real operating environment data reaches a preset threshold.
3. The method of claim 1, wherein the markov decision process model consists of a state space S, an action space a, a state transition function P, a reward function R, and a discount factor γ;
selecting the state of the state space S from the observation variables;
selecting the action in the action space A from the control variables;
the reward function R is obtained according to the energy consumption punishment of the air conditioner and the overtemperature punishment of the server IT equipment;
the state transfer function P is obtained according to the temperature field distribution model;
at each time t, learning is performed according to the state S_t observed from the environment at time t and an action A_t is selected; the environment responds to action A_t and presents a new state S_{t+1}, together with a reward R_{t+1}; in the action selection process, the goal is to maximize the long-term reward.
4. The method of claim 3, wherein in the temperature field distribution model, training with a reinforcement learning algorithm, based on different strategy functions and on Markov decision process models with different parameters respectively, to generate a plurality of operation strategies of the data center terminal air conditioning system and construct a strategy library, comprises:
determining various adopted strategy functions;
determining an action value function of the operation strategy, where the action value function represents the cumulative discounted reward, i.e. the discounted weighted sum of the rewards R, obtained by taking action a while following the operation strategy;
for each strategy function, in a temperature field distribution model, under the framework of a reinforcement learning algorithm, the action value function and the strategy function are converged to an optimized operation strategy in the process of continuously and alternately updating, and the optimized operation strategy is added into a strategy library as an operation strategy.
5. The method of claim 3, wherein the policy function comprises a neural network type policy function and a basis function linear weighting type policy function.
6. The method of claim 1, wherein the formula for evaluating the performance of each operating strategy in the strategy library in the temperature field distribution model is as follows:
π* = arg min_{π∈Π} J(π)

J(π) = Σ_{t=0}^{T−1} [ P_t·Δt + λ·Σ_i max(T_{i,t}^{out} − T_max, 0) ]

wherein π* is the optimal operation strategy; J(π) is the performance evaluation function of an operation strategy in the data center machine room operating environment, representing the total energy consumption and total over-temperature of the strategy within time T; Δt is the time interval from t to t+1; P_t is the operating power of the terminal air conditioner at time t; T_{i,t}^{out} is the air outlet temperature of the i-th server IT equipment of the data center machine room at time t; T_max is the upper limit of the allowable cabinet air outlet temperature; and λ is a weight parameter.
7. The method of claim 6, wherein the performance of each operating policy in the policy repository is evaluated in the temperature field distribution model according to an order optimization method, and the determining a selection set from the policy repository comprises:
taking a strategy performance evaluation function J' (pi) obtained in the temperature field distribution model as a rough model;
estimating the class of the ordered performance curve and the noise level of the rough model;
obtaining the size G and the alignment level k of a strategy selection set G determined by a user;
determining the size s of the selected set S according to the selected set formula of order optimization, wherein the parameters of the formula include the class of the ordered performance curve, the noise level of the rough model, the size g of the strategy selection set G, and the alignment level k;
and calculating the rough-model values of all operation strategies in the strategy library, and selecting the s operation strategies with the smallest rough-model values to form the selected set S.
8. The method of claim 7, wherein the step of applying each operation strategy in the selected set to a real operation environment of the data center machine room respectively to determine an optimal operation strategy in the selected set comprises:
taking the strategy performance evaluation function J(π) of the real operating environment of the data center machine room as the detailed model;
respectively applying each operation strategy in the selected set to a real operation environment of the data center machine room to obtain a value of a detailed model of each operation strategy;
and taking the operation strategy with the minimum value of the detailed model as the final operation strategy of the data center terminal air conditioning system.
9. An operation strategy determination device for an air conditioning system at the tail end of a data center is characterized by comprising the following steps:
the temperature field distribution model building module is used for building a temperature field distribution model of a data center machine room;
the Markov decision process model building module is used for building a Markov decision process model of the operation strategy of the air conditioning system at the tail end of the data center;
the strategy base building module is used for training, in the temperature field distribution model and using a reinforcement learning algorithm, based on different strategy functions and Markov decision process models with different parameters respectively, generating a plurality of operation strategies of the data center terminal air conditioning system, and building a strategy library;
the selected set determining module is used for evaluating the performance of each operation strategy in the strategy library in the temperature field distribution model according to the sequence optimization method and determining a selected set from the strategy library;
and the optimal operation strategy determining module is used for respectively applying each operation strategy in the selection set to the real operation environment of the data center machine room and determining the optimal operation strategy in the selection set.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
12. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1 to 8.
Publications (1)

Publication Number Publication Date
CN115983438A true CN115983438A (en) 2023-04-18

Family

ID=85960317


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116471821A (en) * 2023-06-19 2023-07-21 广州豪特节能环保科技股份有限公司 Method, system, equipment and medium for dynamic control energy conservation of data center
CN116471821B (en) * 2023-06-19 2023-09-15 广州豪特节能环保科技股份有限公司 Method, system, equipment and medium for dynamic control energy conservation of data center
CN116880164A (en) * 2023-09-07 2023-10-13 清华大学 Method and device for determining operation strategy of data center tail end air conditioning system
CN116880164B (en) * 2023-09-07 2023-11-14 清华大学 Method and device for determining operation strategy of data center tail end air conditioning system
Similar Documents

Publication Publication Date Title
US10845771B2 (en) Automated method of generalized building automation from atomic physical models and control loops thereof
CN115983438A (en) Method and device for determining operation strategy of data center terminal air conditioning system
CN106920006B (en) Subway station air conditioning system energy consumption prediction method based on ISOA-LSSVM
WO2023116742A1 (en) Energy-saving optimization method and apparatus for terminal air conditioning system of integrated data center cabinet
CN110866592B (en) Model training method, device, energy efficiency prediction method, device and storage medium
CN114066071B (en) Power parameter optimization method based on energy consumption, terminal equipment and storage medium
CN110781969B (en) Air conditioner air volume control method, device and medium based on deep reinforcement learning
WO2022111232A1 (en) Method for optimizing control model of water cooling system, electronic device, and storage medium
CN114662201B (en) Optimizing method for intelligent regulation and control of natural ventilation
CN114623569B (en) Cluster air conditioner load differential regulation and control method based on deep reinforcement learning
CN113821903B (en) Temperature control method and equipment, modularized data center and storage medium
CN110852498A (en) Method for predicting data center energy consumption efficiency PUE based on GRU neural network
CN115167562B (en) Machine room temperature control method and device
CN112884012A (en) Building energy consumption prediction method based on support vector machine principle
He et al. Predictive control optimization of chiller plants based on deep reinforcement learning
CN115238599B (en) Energy-saving method and model reinforcement learning training method and device for refrigerating system
Zhang et al. Two-stage reinforcement learning policy search for grid-interactive building control
He et al. Efficient model-free control of chiller plants via cluster-based deep reinforcement learning
Cheng et al. A methodology for mapping the performance of variable-speed residential cooling equipment using load-based testing
CN116795198A (en) Energy consumption optimization method and device for data center and storage medium
CN116963461A (en) Energy saving method and device for machine room air conditioner
CN114781274B (en) Comprehensive energy system control optimization method and system for simulation and decision alternate learning
CN112529183A (en) Knowledge distillation-based model self-adaptive updating method
CN114909707A (en) Heat supply secondary network regulation and control method based on intelligent balancing device and reinforcement learning
Burger et al. ARX model of a residential heating system with backpropagation parameter estimation algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination