CN116963461A - Energy saving method and device for machine room air conditioner - Google Patents

Energy saving method and device for machine room air conditioner

Info

Publication number
CN116963461A
CN116963461A (application CN202310807022.8A)
Authority
CN
China
Prior art keywords
machine room
network
target machine
strategy
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310807022.8A
Other languages
Chinese (zh)
Inventor
江传来
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202310807022.8A priority Critical patent/CN116963461A/en
Publication of CN116963461A publication Critical patent/CN116963461A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H05ELECTRIC TECHNIQUES NOT OTHERWISE PROVIDED FOR
    • H05KPRINTED CIRCUITS; CASINGS OR CONSTRUCTIONAL DETAILS OF ELECTRIC APPARATUS; MANUFACTURE OF ASSEMBLAGES OF ELECTRICAL COMPONENTS
    • H05K7/00Constructional details common to different types of electric apparatus
    • H05K7/20Modifications to facilitate cooling, ventilating, or heating
    • H05K7/20709Modifications to facilitate cooling, ventilating, or heating for server racks or cabinets; for data centers, e.g. 19-inch computer racks
    • H05K7/20836Thermal management, e.g. server temperature control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Thermal Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The embodiment of the invention provides an energy-saving method and device for a machine room air conditioner. The method comprises the following steps: dynamically monitoring ('cruising') the heat load data of a target machine room in real time; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm to determine an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room. According to the embodiment of the invention, the air conditioning system of the machine room can be intelligently adjusted according to dynamic changes in the heat load of the machine room, realizing intelligent control of the machine room air conditioner and the maximum energy-saving effect.

Description

Energy saving method and device for machine room air conditioner
Technical Field
The invention relates to the technical field of communications, and in particular to an energy-saving method for a machine room air conditioner and an energy-saving device for a machine room air conditioner.
Background
With the continuous development of information technology, the operating load of data centers keeps increasing, and the machine room air conditioner has become one of the most power-hungry devices in the machine room, accounting for about one third of the energy consumption of the whole data center. Therefore, how to reasonably use the heat load data of the machine room for control, achieving maximum energy saving while guaranteeing machine room operation, has become a hot topic in the field of machine room air conditioning.
Current energy-saving methods for machine room air conditioners mainly include the traditional scheduled power-on/power-off mode, the fixed-air-volume mode, and the adaptive-temperature mode. The scheduled power-on/power-off mode has a limited energy-saving effect and cannot adapt to dynamic changes in machine room load; the fixed-air-volume mode requires a relatively complex aerodynamic simulation design, making it inconvenient to use and maintain; the adaptive-temperature mode is difficult to realize in practice, or fails because it depends too heavily on environmental stability. Meanwhile, traditional air conditioner control usually takes the feedback signal of a temperature sensor as the control basis; this is simple, but its drawbacks are obvious: it can cause abrupt short-term temperature changes, or leave parts of the machine room too hot or too cold.
In recent years, artificial intelligence (Artificial Intelligence, AI) technology has developed rapidly, and its influence on the energy-saving field is evident; accordingly, AI energy-saving technology for machine room air conditioners has emerged. Applying AI technology to the machine room air conditioner control system saves energy while keeping the machine room temperature constant. Because the machine room air conditioner is a highly dynamic, complex system with many parameters, and the AI energy-saving mode is adaptive and intelligent, it can rapidly process the variable load data of the machine room and has become an important method in the energy-saving field.
Current AI energy-saving methods for machine room air conditioners mainly adopt a static-setting mode: optimization control is performed according to historical data and predictions over a past period, and control is mostly manual, so automatic adjustment cannot be realized.
Therefore, realizing a self-learning, self-adaptive, self-adjusting AI energy-saving mode based on dynamic cruising of the machine room heat load has become an important problem to be solved urgently.
Disclosure of Invention
Aiming at the defects in the prior art, the embodiment of the invention provides an energy-saving method of a machine room air conditioner and an energy-saving device of the machine room air conditioner.
In a first aspect, an embodiment of the present invention provides an energy saving method for an air conditioner in a machine room, including:
dynamically monitoring ('cruising') the heat load data of a target machine room in real time;
collecting real-time data of the environmental temperature and humidity of the target machine room;
based on a reinforcement learning algorithm, analyzing the heat load data and the environment temperature and humidity real-time data, and determining an energy-saving strategy of the target machine room;
and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to an air conditioning system of the target machine room.
In the above method, optionally, the analyzing the thermal load data and the real-time environmental temperature and humidity data based on the reinforcement learning algorithm to determine an energy saving strategy of the target machine room includes:
determining a safety strategy of the target machine room according to the maintenance guarantee level of the target machine room and the power supply use efficiency of the target machine room;
and inputting the safety strategy, the heat load data and the environment temperature and humidity real-time data into a strategy network determined based on a reinforcement learning algorithm, and determining the energy-saving strategy of the target machine room from an output result of the strategy network.
As in the above method, optionally, the policy network is determined by:
abstracting an air conditioning system of the target machine room into an intelligent agent;
based on a Model-free Off-policy reinforcement learning algorithm, enabling the agent to select different air conditioning system regulation actions in different states, wherein each state comprises safety information, heat load information and environment information.
As described above, optionally, enabling the agent to select different air conditioning system control actions in different states based on the Model-free Off-policy reinforcement learning algorithm includes:
Taking heat load data, environment temperature and humidity data and security policy data of each machine room as sample data;
converting the sample data into a plurality of nodes in a state space, wherein each node comprises: a machine room safety temperature set value, a machine room power supply use efficiency predicted value and a machine room thermal load predicted value;
defining a set of air conditioning system regulation actions for each of the nodes;
training an Actor network by using an Actor-Critic method in an Off-policy algorithm, wherein the input of the Actor network is a node in the state space, and the output of the Actor network is the air conditioning system regulation action corresponding to that node;
comparing the air conditioning system regulation action output by the Actor network with the air conditioning system regulation action actually taken, and estimating the value of the air conditioning system regulation action output by the Actor network by using a Critic network;
optimizing parameters of the Actor network using the value;
and taking the optimized Actor network as the strategy network.
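The Actor-Critic training loop above can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the linear softmax actor, the linear critic, the three-feature state node (safety temperature setpoint, predicted PUE, predicted heat load) and all constants are assumptions for demonstration.

```python
import numpy as np

class ActorCritic:
    """Toy Actor-Critic over a 3-feature state node
    (safety temperature setpoint, predicted PUE, predicted heat load)
    and a small discrete set of air conditioning regulation actions."""

    def __init__(self, n_features=3, n_actions=4, lr=0.05, gamma=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(0, 0.1, (n_features, n_actions))  # actor weights
        self.w = np.zeros(n_features)                             # critic weights
        self.lr, self.gamma = lr, gamma

    def policy(self, s):
        """Softmax action probabilities for state s."""
        logits = s @ self.theta
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def value(self, s):
        """Critic's state-value estimate V(s)."""
        return float(s @ self.w)

    def update(self, s, a, r, s_next):
        # Critic: TD(0) error scores the action the actor actually took.
        td = r + self.gamma * self.value(s_next) - self.value(s)
        self.w += self.lr * td * s
        # Actor: policy-gradient step weighted by the critic's TD error.
        grad = -self.policy(s)
        grad[a] += 1.0                       # d log pi(a|s) / d logits
        self.theta += self.lr * td * np.outer(s, grad)
        return td
```

Here the critic's TD error plays the "value estimation" role described in the text; the optimized actor would then serve as the policy network.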
As in the above method, optionally, the determining the energy-saving strategy of the target machine room from the output result of the policy network includes:
constructing a running-state reward target value for the air conditioning system in the target machine room and hot-spot penalty data for the target machine room through the learning network and the target network of the reinforcement learning algorithm;
determining the score of the output result according to the running-state reward target value and the hot-spot penalty data;
if the score of the output result is lower than a preset value, feeding back the score of the output result to the strategy network so that the strategy network can redetermine a new output result;
and taking the air conditioning system regulation and control action corresponding to the maximum score in the output result as an energy-saving strategy of the target machine room.
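The score-and-feedback loop above might be wired together as below. The callables, the score being the reward target value minus the hot-spot penalty, and the bounded retry count are illustrative assumptions, not details taken from the text.

```python
def select_strategy(propose, evaluate, min_score, max_rounds=10):
    """propose() -> candidate regulation actions from the policy network;
    evaluate(action) -> (reward_target, hotspot_penalty) produced by the
    learning/target networks. Scores each candidate; if the best score is
    below min_score, feeds back and asks for a new batch of outputs."""
    best_action, best_score = None, float("-inf")
    for _ in range(max_rounds):
        for action in propose():
            reward_target, hotspot_penalty = evaluate(action)
            score = reward_target - hotspot_penalty
            if score > best_score:
                best_action, best_score = action, score
        if best_score >= min_score:
            break  # good enough: adopt the max-score action as the strategy
    return best_action, best_score
```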
As described above, optionally, the determining, based on the deep reinforcement learning algorithm and the energy saving policy, an optimal control policy of the target machine room in the current state includes:
inputting, as the state, the indoor environment indexes of the target machine room at two consecutive time steps, the feedback of the previous time step, and the air conditioning system regulation action of the previous time step into the Q network of a deep reinforcement learning algorithm, wherein the air conditioning system regulation action is determined according to the energy-saving strategy;
constructing the error gradient for error back-propagation through the Q network;
updating the parameters of the Q network through the back-propagation algorithm;
predicting and selecting the output action in the current state by using the Q network;
and optimizing with an epsilon-greedy algorithm and outputting the optimal control strategy.
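A minimal Q network with manual back-propagation and epsilon-greedy selection, in the spirit of the steps above. The one-hidden-layer architecture, ReLU activation and constants are assumptions for illustration; a full DQN would add experience replay and a separate target network.

```python
import numpy as np

class QNetwork:
    """One-hidden-layer Q network with manual gradient back-propagation."""

    def __init__(self, n_state, n_action, hidden=16, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_state, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, n_action))
        self.b2 = np.zeros(n_action)
        self.lr = lr

    def forward(self, s):
        self.h = np.maximum(0.0, s @ self.W1 + self.b1)   # ReLU hidden layer
        return self.h @ self.W2 + self.b2                 # Q(s, .) for all actions

    def update(self, s, a, td_target):
        """One gradient step on 0.5 * (Q(s,a) - td_target)^2."""
        q = self.forward(s)
        err = q[a] - td_target
        grad_q = np.zeros_like(q)
        grad_q[a] = err
        gW2 = np.outer(self.h, grad_q)
        gh = grad_q @ self.W2.T
        gh[self.h <= 0] = 0.0                             # ReLU gradient mask
        gW1 = np.outer(s, gh)
        self.W2 -= self.lr * gW2
        self.b2 -= self.lr * grad_q
        self.W1 -= self.lr * gW1
        self.b1 -= self.lr * gh
        return err

def epsilon_greedy(qnet, s, eps, rng):
    """Explore with probability eps, otherwise take the greedy action."""
    if rng.random() < eps:
        return int(rng.integers(len(qnet.b2)))
    return int(np.argmax(qnet.forward(s)))
```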
As in the above method, optionally, the Q network is determined by:
determining each state of the target machine room, wherein the states comprise environmental temperature data, humidity data and heat load data of the target machine room;
determining the regulation and control actions of the air conditioning system in each state;
constructing a deep neural network model Q network based on the state and the air conditioning system regulation action;
and training the deep neural network model (Q network), and learning the optimal control strategy based on the reward value.
As in the method above, optionally, the reward value is determined according to the following equation:
reward=-(1-comfort)+energy
where comfort denotes the comfort level, energy denotes the energy consumption of the air conditioning system, and reward is the reward value.
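Computed literally, the formula reads as below. The normalization of comfort to [0, 1] and the interpretation of energy as a normalized term are assumptions, since the text does not give the scales.

```python
def reward(comfort, energy):
    """Reward exactly as the formula states: reward = -(1 - comfort) + energy.

    Assumptions (not specified in the text): comfort is normalized to
    [0, 1], with 1 meaning fully comfortable, and energy is a normalized
    energy term. Perfect comfort with a zero energy term yields 0.
    """
    return -(1.0 - comfort) + energy
```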
In a second aspect, an embodiment of the present invention provides an energy saving device for an air conditioner in a machine room, including:
the cruise monitoring module is used for dynamically monitoring ('cruising') the heat load data of the target machine room in real time;
the collecting module is used for collecting the real-time data of the environmental temperature and humidity of the target machine room;
the analysis module is used for analyzing the heat load data and the environment temperature and humidity real-time data based on a reinforcement learning algorithm and determining an energy-saving strategy of the target machine room;
and the execution module is used for determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to an air conditioning system of the target machine room.
The apparatus as above, optionally, the analysis module includes:
the safety boundary construction unit is used for determining the safety strategy of the target machine room according to the maintenance guarantee level of the target machine room and the power supply use efficiency of the target machine room;
and the AI energy-saving learning unit is used for inputting the safety strategy, the heat load data and the environment temperature and humidity real-time data into a strategy network determined based on a reinforcement learning algorithm, and determining the energy-saving strategy of the target machine room from an output result of the strategy network.
As described above, optionally, the AI energy-saving learning unit includes:
an abstract subunit, configured to abstract an air conditioning system of the target machine room into an agent;
and the training subunit is used for enabling the agent to select different air conditioning system regulation actions in different states based on a Model-free Off-policy reinforcement learning algorithm, wherein each state comprises safety information, heat load information and environment information.
The above apparatus, optionally, the training subunit is specifically configured to:
taking heat load data, environment temperature and humidity data and security policy data of each machine room as sample data;
converting the sample data into a plurality of nodes in a state space, wherein each node comprises: a machine room safety temperature set value, a machine room power supply use efficiency predicted value and a machine room thermal load predicted value;
Defining a set of air conditioning system regulation actions for each of the nodes;
training an Actor network by using an Actor-Critic method in an Off-policy algorithm, wherein the input of the Actor network is a node in the state space, and the output of the Actor network is the air conditioning system regulation action corresponding to that node;
comparing the air conditioning system regulation action output by the Actor network with the air conditioning system regulation action actually taken, and estimating the value of the air conditioning system regulation action output by the Actor network by using a Critic network;
optimizing parameters of the Actor network using the value;
and taking the optimized Actor network as the strategy network.
As in the above apparatus, optionally, the analysis module further includes an energy-saving strategy issuing unit, which is specifically configured to:
constructing a running-state reward target value for the air conditioning system in the target machine room and hot-spot penalty data for the target machine room through the learning network and the target network of the reinforcement learning algorithm;
determining the score of the output result according to the running-state reward target value and the hot-spot penalty data;
if the score of the output result is lower than a preset value, feeding back the score of the output result to the strategy network so that the strategy network can redetermine a new output result;
And taking the air conditioning system regulation and control action corresponding to the maximum score in the output result as an energy-saving strategy of the target machine room.
The above apparatus, optionally, the execution module is specifically configured to:
inputting, as the state, the indoor environment indexes of the target machine room at two consecutive time steps, the feedback of the previous time step, and the air conditioning system regulation action of the previous time step into the Q network of a deep reinforcement learning algorithm, wherein the air conditioning system regulation action is determined according to the energy-saving strategy;
constructing the error gradient for error back-propagation through the Q network;
updating the parameters of the Q network through the back-propagation algorithm;
predicting and selecting the output action in the current state by using the Q network;
and optimizing with an epsilon-greedy algorithm and outputting the optimal control strategy.
The apparatus as above, optionally, the execution module is configured to determine the Q network according to the following manner:
determining each state of the target machine room, wherein the states comprise environmental temperature data, humidity data and heat load data of the target machine room;
determining the regulation and control actions of the air conditioning system in each state;
constructing a deep neural network model Q network based on the state and the air conditioning system regulation action;
and training the deep neural network model (Q network), and learning the optimal control strategy based on the reward value.
The apparatus above, optionally, the execution module is configured to determine the reward value according to the following formula:
reward=-(1-comfort)+energy
where comfort denotes the comfort level, energy denotes the energy consumption of the air conditioning system, and reward is the reward value.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
the device comprises a memory and a processor, wherein the processor and the memory communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the following method: dynamically monitoring ('cruising') the heat load data of a target machine room in real time; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm to determine an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
In a fourth aspect, embodiments of the present invention provide a storage medium having stored thereon a computer program which, when executed by a processor, performs the following method: dynamically monitoring ('cruising') the heat load data of a target machine room in real time; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm to determine an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
According to the energy-saving method for the machine room air conditioner provided by the embodiment of the invention, the heat load data of the target machine room is dynamically monitored ('cruised') in real time, real-time environmental temperature and humidity data of the target machine room is collected, the heat load data and the real-time temperature and humidity data are analyzed based on a reinforcement learning algorithm to determine the energy-saving strategy of the target machine room, and the optimal control strategy of the target machine room in the current state is determined based on a deep reinforcement learning algorithm and the energy-saving strategy and issued to the air conditioning system of the target machine room. In this way, the air conditioning system can be intelligently regulated according to dynamic changes in the machine room heat load, realizing intelligent control of the machine room air conditioner and the maximum energy-saving effect.
Drawings
FIG. 1 is a flow chart of steps of an embodiment of an energy saving method of a room air conditioner of the present invention;
FIG. 2 is a schematic diagram of a learning network and a target network in an embodiment of an energy saving method of a room air conditioner according to the present invention;
FIG. 3 is a flow chart of intelligent control of an air conditioner based on the DQN algorithm in an embodiment of an energy saving method of a machine room air conditioner of the present invention;
FIG. 4 is a flow chart of steps of an embodiment of an energy saving method of another room air conditioner of the present invention;
FIG. 5 is a block diagram of an embodiment of an energy saving device for a room air conditioner of the present invention;
fig. 6 is a block diagram of an embodiment of an electronic device of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may become more readily apparent, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, a step flow chart of an embodiment of an energy saving method of a machine room air conditioner of the present invention may specifically include the following steps:
step S110, the heat load data of a target machine room are dynamically cruising and monitoring in real time;
Specifically, current AI energy-saving methods for machine room air conditioners mainly adopt a static-setting mode: optimization control is performed according to historical data and predictions over a past period, and control is mostly manual, so automatic adjustment cannot be realized. To solve this problem, the embodiment of the invention provides a new machine room air conditioner AI energy-saving mode based on dynamic cruising of the machine room load. In the embodiment, the heat load data of a target machine room is monitored in real time, where the target machine room is the machine room whose air conditioner is to be made energy-saving, and the heat load data covers all heat loads in the target machine room, such as the equipment heat load generated by all devices in the machine room, the building heat load of the target machine room building, and the personnel heat load generated by people entering and leaving the target machine room.
To collect this heat load information, the supply voltage and current of each device in the target machine room are monitored in real time by dynamic cruising, and the equipment heat load of the machine room is calculated from them; for example, the voltage and current of each rack device in the target machine room are monitored in real time and the machine room heat load is computed. The entry and exit of personnel in the target machine room is detected in real time, and the personnel heat load data is determined accordingly. The building heat load data of the target machine room is also detected in real time, such as the thermal insulation of the building and the outdoor temperature. The heat sources of the machine room are thus dynamically cruised in real time, and the collected data serves as the heat load data of the target machine room.
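The equipment heat load can be estimated from the monitored voltage and current, assuming essentially all electrical power drawn by the equipment is dissipated as heat (P = V x I). The per-person figure below is a common HVAC rule of thumb, not a value from the text.

```python
def equipment_heat_load(readings):
    """readings: iterable of (voltage_V, current_A) pairs, one per rack
    device. Assumes the electrical power is dissipated as heat: P = V * I
    (watts)."""
    return sum(v * i for v, i in readings)

def total_heat_load(equipment_w, personnel_count, building_w,
                    heat_per_person_w=100.0):
    """Total machine room heat load in watts.

    heat_per_person_w: ~100 W sensible heat per occupant is a common HVAC
    rule of thumb (an assumption here, not from the text)."""
    return equipment_w + personnel_count * heat_per_person_w + building_w
```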
Step S120, collecting real-time data of the environmental temperature and humidity of the target machine room;
specifically, the environmental temperature and humidity sensor of the target machine room is used for collecting the real-time data of the environmental temperature and the real-time data of the environmental humidity of the target machine room.
Step S130, analyzing the heat load data and the environment temperature and humidity real-time data based on a reinforcement learning algorithm, and determining an energy-saving strategy of the target machine room;
Specifically, based on a reinforcement learning algorithm, the heat load data and the real-time temperature and humidity data are analyzed, and control strategies such as supply/return air, proportional band settings, fan speed, and compressor frequency of the air conditioning equipment of the target machine room are output.
Step S140, determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
Specifically, to prevent safety problems such as the machine room air conditioner running out of control if the AI control platform fails, an edge system or server may be provided to receive the energy-saving strategy of the target machine room; the edge system may be a field supervision unit (Field Supervision Unit, FSU) in the power-and-environment monitoring system. After receiving the energy-saving strategy, the edge system determines the optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, including switching air conditioning equipment on and off, adjusting the output frequency of the air conditioner compressor and fan, and adjusting parameters such as temperature and humidity. The optimal control strategy is issued to the air conditioning system of the target machine room, which adjusts the air conditioner temperature and air volume accordingly, realizing the AI energy-saving process for the target machine room.
In practical applications, after the heat load data and real-time temperature and humidity data of the target machine room are obtained, they can be compared with the corresponding data from the previous moment to judge whether the strategy needs to be adjusted and optimized; the subsequent steps are performed only if adjustment is needed.
According to the energy-saving method for the machine room air conditioner provided by the embodiment of the invention, the heat load data of the target machine room is dynamically monitored ('cruised') in real time, real-time environmental temperature and humidity data of the target machine room is collected, the heat load data and the real-time temperature and humidity data are analyzed based on a reinforcement learning algorithm to determine the energy-saving strategy of the target machine room, and the optimal control strategy of the target machine room in the current state is determined based on a deep reinforcement learning algorithm and the energy-saving strategy and issued to the air conditioning system of the target machine room. In this way, the air conditioning system can be intelligently regulated according to dynamic changes in the machine room heat load, realizing intelligent control of the machine room air conditioner and the maximum energy-saving effect.
On the basis of the foregoing embodiment, further, the analyzing the thermal load data and the real-time environmental temperature and humidity data based on the reinforcement learning algorithm to determine the energy-saving strategy of the target machine room includes:
determining a safety strategy of the target machine room according to the maintenance guarantee level of the target machine room and the power supply use efficiency of the target machine room;
and inputting the safety strategy, the heat load data and the environment temperature and humidity real-time data into a strategy network determined based on a reinforcement learning algorithm, and determining the energy-saving strategy of the target machine room from an output result of the strategy network.
Specifically, in practical applications, in order to ensure the safety of the air conditioner of the target machine room during the AI energy-saving control, it is also necessary to determine the safety policy of the target machine room.
First, a safety boundary is formulated based on operation and maintenance service level agreement (Service Level Agreement, SLA) requirements and site safety specifications, and the maintenance guarantee level of each target machine room is determined; for example, upper and lower temperature limits are preset according to the maintenance rules and safety management practices of the target machine room, ensuring that the machine room temperature fluctuates within a safe range. Different operational safety policies are then constructed in combination with the power usage effectiveness (Power Usage Effectiveness, PUE) operating requirements of the target machine room. For example, a safety policy may require the target machine room temperature to stay within 25-30 °C and its operating PUE not to exceed 1.5.
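Such a safety boundary might be encoded as a simple guard, as sketched below. The 25-30 °C band and the PUE ceiling of 1.5 follow the example above; the class shape itself is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SafetyPolicy:
    """Safety boundary for one target machine room: a safe temperature
    band and an operating PUE ceiling (example values from the text)."""
    temp_low_c: float = 25.0
    temp_high_c: float = 30.0
    pue_max: float = 1.5

    def allows(self, predicted_temp_c, predicted_pue):
        """True only if a candidate strategy keeps temperature and PUE
        inside the safety boundary."""
        return (self.temp_low_c <= predicted_temp_c <= self.temp_high_c
                and predicted_pue <= self.pue_max)
```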
Then, the safety policy, the heat load data and the real-time environmental temperature and humidity data are input into a policy network determined based on a reinforcement learning algorithm. The policy network is an online self-learning network: the neural network is learned through the reinforcement learning algorithm and a large amount of training data, and is continuously optimized according to feedback in actual use. The policy network outputs control strategies for the target machine room air conditioner such as supply/return air, proportional band settings, fan speed and compressor frequency; with further training, it can output control strategies covering the air conditioner on/off state, inlet/outlet air temperature and humidity, cold/hot aisle temperature and humidity setpoints, compressor on/off state and (variable-frequency) operating frequency, fan on/off state and (variable-frequency) operating frequency, voltage, current, active power, electric energy, valve opening, outdoor temperature and the like.
And finally, determining the energy-saving strategy of the target machine room through the output result of the strategy network.
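The safety boundary described above (the example of 25-30 °C and operating PUE no greater than 1.5) can be sketched as a simple guard applied before any control action is issued. This is a minimal illustrative sketch; the function names, field layout and default limits are assumptions, not part of the patented method.

```python
# Illustrative sketch of the safety boundary: a room state is acceptable
# only if its temperature and PUE satisfy the safety policy, and any
# proposed temperature setpoint is clamped into the safe range.
# The limits (25-30 degrees C, PUE <= 1.5) come from the example in the
# text; the function and parameter names are assumptions.

def within_safety_boundary(temp_c: float, pue: float,
                           temp_lo: float = 25.0, temp_hi: float = 30.0,
                           pue_max: float = 1.5) -> bool:
    """Return True if the room state satisfies the safety policy."""
    return temp_lo <= temp_c <= temp_hi and pue <= pue_max

def clamp_setpoint(setpoint_c: float,
                   temp_lo: float = 25.0, temp_hi: float = 30.0) -> float:
    """Clamp a proposed temperature setpoint into the safe range."""
    return min(max(setpoint_c, temp_lo), temp_hi)
```

A policy network's raw output would pass through such a guard so that no action can push the room outside its safety boundary.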
The embodiment of the invention adopts an AI energy-saving mode for the machine room air conditioner based on dynamic cruising of the machine room load, optimizing machine room air conditioner control and applying an energy-saving, safety-aware online self-learning reinforcement strategy, thereby reducing energy consumption on the premise of guaranteeing machine room safety and improving the energy utilization rate.
On the basis of the above embodiments, further, the policy network is determined by the following method:
abstracting an air conditioning system of the target machine room into an intelligent agent;
based on a model-free off-policy reinforcement learning algorithm, enabling the agent to select different air conditioning system regulation actions in different states, wherein each state comprises safety information, heat load information and environment information.
Specifically, model-free is a Model-free learning method in reinforcement learning, and does not require explicit states and transition probabilities between states. Off-Policy refers to the inconsistent way in which the state matrix is updated and the Policy is selected. An Off-policy algorithm of Model-free is adopted to formulate an energy-saving strategy, safety information, heat load information and environment information of a target machine room are used as states, the energy-saving strategy made for the states is used as actions to learn, and the energy-saving strategy is formulated. Specifically, the air conditioning system of the target machine room is abstracted into an agent, so that the agent can select different actions under different states, and the energy saving target is realized. The state is the state of the machine room formed by the current safety information, the heat load information, the environment information and the like of the target machine room, and the actions comprise the regulation and control actions of the air conditioning system, for example, under a certain state, the intelligent body can select different actions of improving temperature, reducing humidity, increasing air supply quantity and the like so as to maximize energy-saving benefits.
The embodiment of the invention adopts a reinforcement learning algorithm to construct a policy network model that adaptively learns the relationship between states and actions, thereby realizing adaptive intelligent control of the machine room air conditioner.
Based on the above embodiments, further, enabling the agent to select different air conditioning system regulation actions in different states based on a model-free off-policy reinforcement learning algorithm includes:
taking heat load data, environment temperature and humidity data and security policy data of each machine room as sample data;
converting the sample data into a plurality of nodes in a state space, wherein each node comprises: a machine room safety temperature set value, a machine room power supply use efficiency predicted value and a machine room thermal load predicted value;
defining a set of air conditioning system regulation actions for each of the nodes;
training an Actor network by using the Actor-Critic method in an off-policy algorithm, wherein the input of the Actor network is a node in the state space, and the output of the Actor network is the air conditioning system regulation action corresponding to the node;
comparing the air conditioning system regulation action output by the Actor network with the air conditioning system regulation action actually taken, and estimating the value of the air conditioning system regulation action output by the Actor network by using a Critic network;
Optimizing parameters of the Actor network using the value;
and taking the optimized Actor network as the strategy network.
Specifically, to train this agent, the Actor-Critic method in an off-policy algorithm may be used. As the name suggests, Actor-Critic comprises two parts: an actor (Actor) and an evaluator (Critic). The Actor uses a policy function and is responsible for generating actions and interacting with the environment. The Critic uses a value function to evaluate the Actor's performance and to guide the Actor's actions in the next stage.
Specifically, the following steps may be used to formulate an energy saving strategy for the target room air conditioner:
step A1, collecting data: acquiring all information of each machine room from the basic data, wherein the information comprises indexes such as temperature, humidity, heat load condition, safety strategy and the like; in order to simplify the heat load index of the machine room, the machine room rack can be divided into columns or areas so as to obtain the average value of the heat load of a certain column or area as the heat load value of the area. And taking the acquired heat load data, environment temperature and humidity data and safety strategy data as sample data.
Step A2, then constructing a state space: the collected sample data is converted into nodes in a state space, each node representing a set of similar operating metrics. The nodes comprise machine room safety temperature setting, PUE predicted values and machine room heat load predicted values.
Step A3, defining an action space: a set of possible air conditioning system actions is defined for each node, such as increasing (decreasing) the temperature, reducing the humidity or increasing the air supply volume.
Step A4, training an Actor network: the Actor-Critic method in an off-policy algorithm is used to train a neural network to select the optimal action according to the current state so as to maximize the energy-saving benefit. The input of the Actor network is a node in the state space, and its output is the air conditioning system regulation action corresponding to the node, i.e. the air conditioning system energy-saving strategy.
Step A5, verifying a strategy: and generating an energy-saving strategy by using the trained neural network, and applying the energy-saving strategy to an air conditioning system of a machine room so as to verify the energy-saving effect of the energy-saving strategy.
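Steps A1-A5 can be illustrated with a minimal tabular Actor-Critic on a toy discretized state space. This is a sketch under stated assumptions: the state/action dimensions and the reward model are invented for illustration, and a production system would use neural networks over the real machine room states as the text describes.

```python
# Minimal tabular Actor-Critic sketch: a softmax "Actor" over discrete
# regulation actions per state node, and a per-node value "Critic" whose
# TD error guides the Actor update. The toy reward (action 1 is always
# energy-optimal) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 4, 3   # toy: 4 state-space nodes, 3 regulation actions

theta = np.zeros((N_STATES, N_ACTIONS))   # Actor: action preferences
values = np.zeros(N_STATES)               # Critic: state values

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def reward(s, a):
    # Stand-in for the measured energy-saving benefit of an action.
    return 1.0 if a == 1 else 0.0

alpha, beta = 0.1, 0.1       # critic / actor learning rates
for _ in range(3000):
    s = rng.integers(N_STATES)
    p = policy(s)
    a = rng.choice(N_ACTIONS, p=p)
    td_error = reward(s, a) - values[s]   # one-step TD error (no bootstrap)
    values[s] += alpha * td_error         # Critic update
    grad = -p
    grad[a] += 1.0                        # grad of log pi(a|s) for softmax
    theta[s] += beta * td_error * grad    # Actor update guided by the Critic
```

After training, the Actor's policy concentrates on the energy-optimal action in every state node, mirroring how the trained Actor network in step A4 becomes the policy network.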
In particular, training an Actor network requires a large amount of data to ensure a stable training effect, so adequate preparation in data acquisition and data processing is required. In addition, during training, attention must be paid to issues such as hyperparameter tuning of the network and selection of the loss function. After a suitable loss function and hyperparameters are selected, training of the Actor network may begin. During training, the action output by the Actor network may be compared with the action actually taken, and the Critic network is used to estimate the value of the action. This value estimate can be used to optimize the parameters of the Actor network so that it selects actions better.
Once the Actor network is trained, it can be used as the policy network to generate energy-saving strategies. Specifically, the current state of the target machine room is input into the Actor network, and the output action is the energy-saving strategy of the target machine room. In practical applications, certain adjustments are needed according to the situation, for example taking into account factors such as the dynamic characteristics of the air conditioning system and changes in the machine room heat load, so that the energy-saving strategy is more reasonable.
In the embodiment of the invention, a model-free off-policy algorithm evaluates value using the safety temperature, PUE predicted value and heat load predicted value of the target machine room, so as to output control strategies for the target machine room air conditioner such as supply/return air settings, proportional band setting, fan rotating speed and compressor frequency. Machine room air conditioner control is thereby optimized, and the energy-saving, safety-aware online self-learning reinforcement strategy reduces energy consumption on the premise of ensuring machine room safety and improves the energy utilization rate.
On the basis of the above embodiments, further, determining the energy saving policy of the target machine room from the output result of the policy network includes:
constructing a running state rewarding target value of an air conditioning system in a target machine room and hot spot punishment data of the target machine room through a learning network and a target network of the reinforcement learning algorithm;
Determining the score of the output result according to the running state rewarding target value and the hot spot punishment data;
if the score of the output result is lower than a preset value, feeding back the score of the output result to the strategy network so that the strategy network can redetermine a new output result;
and taking the air conditioning system regulation and control action corresponding to the maximum score in the output result as an energy-saving strategy of the target machine room.
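The scoring loop above can be sketched as follows. The score model (a positive reward target value minus hot-spot penalty data) and the threshold mechanics are illustrative assumptions; the policy network is represented by any callable returning candidate regulation actions.

```python
# Sketch of the scoring loop: score each candidate action as a reward
# target value minus a hot-spot penalty, feed a low score back to the
# policy network for a new output, and adopt the highest-scoring action.
# All field names and numbers are illustrative assumptions.

def score(action):
    reward_target = action["energy_saving"]    # operating-state reward target
    hotspot_penalty = action["hotspot_risk"]   # penalty enters as a negative
    return reward_target - hotspot_penalty

def select_action(policy_network, state, preset=0.0, max_rounds=5):
    best = None
    for _ in range(max_rounds):
        candidates = policy_network(state)     # candidate regulation actions
        best = max(candidates, key=score)
        if score(best) >= preset:              # score is acceptable: adopt it
            return best
        # otherwise feed the low score back so the network re-determines output
        state = {**state, "feedback": score(best)}
    return best
```

The action returned corresponds to the maximum score among the output results, matching the final step of the method above.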
Specifically, referring to fig. 2, a schematic diagram of the learning network and target network in an embodiment of the energy saving method of an air conditioner in a machine room, the target network is used to tune the learning network, and an operating-state reward target value of the air conditioning system in the target machine room and hot-spot penalty data of the target machine room are constructed. For example, if an output strategy is beneficial to air conditioner energy saving, a reward value is obtained; if it is not, a penalty value is obtained. A score is then determined for each output result of the policy network; the score may be the reward value or the penalty value, with reward values taken as positive numbers and penalty values as negative numbers. If the score of the output result is lower than the preset value, the score is fed back to the policy network so that the policy network adjusts its parameters and re-determines a new output result, whose score is computed again. Finally, the air conditioning system regulation action with the maximum score among the output results is taken as the energy-saving strategy of the target machine room.
For example, the on-off rewards of the air conditioner 1 and the air conditioner 2 are determined based on the learning network, the on-off rewards target values of the air conditioner 1 and the air conditioner 2 are determined based on the target network, parameters of the learning network are revised through the loss function, and finally the maximum rewarding action is selected and output.
In the embodiment of the invention, the energy-saving strategy output by the strategy network is adjusted through the learning network and the target network, so that the output energy-saving strategy is more reasonable.
Based on the above embodiments, further, the determining, based on the deep reinforcement learning algorithm and the energy saving policy, an optimal control policy of the target machine room in the current state includes:
inputting two continuous time sequence indoor environment indexes corresponding to a target machine room, the feedback of the previous time sequence and the regulation and control action of an air conditioning system of the previous time sequence as states into a Q network of a deep reinforcement learning algorithm, wherein the regulation and control action of the air conditioning system is determined according to the energy-saving strategy;
constructing an error gradient of error counter propagation through a Q network;
updating parameters of the Q network through a backward gradient propagation algorithm;
predicting and selecting an output action in a current state by using the Q network;
and optimizing by using an epsilon-greedy algorithm, and outputting an optimal control strategy.
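The epsilon-greedy step above can be sketched as follows: most of the time the Q network's highest-valued action is exploited, and with probability epsilon a random action is explored. The function signature is an illustrative assumption.

```python
# Sketch of epsilon-greedy action selection over a list of Q values.
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random.Random(0)):
    """Pick a random action with probability epsilon, else argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With epsilon = 0 the selection is purely greedy; raising epsilon trades exploitation for exploration during training.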
Specifically, referring to fig. 3, an intelligent control flow chart of an air conditioner based on DQN algorithm in an embodiment of an energy saving method of a machine room air conditioner according to the present invention is shown in fig. 3:
the embodiment of the invention converts the intelligent air conditioner control problem into a reinforcement learning problem based on the Deep Q-Network (DQN) algorithm among deep reinforcement learning algorithms. The edge system obtains the energy-saving strategy from the policy network in order to better decide the adjusted temperature setting, and optimizes rewards by updating the policy model. The indoor environment indexes of two continuous time steps, such as machine room heating and ventilation system data and building environment information (which may include the building's heat insulation condition, the outdoor environment temperature and the like), together with the feedback of the previous time step and the control action of the previous time step, are input as the state into a Q network based on the DQN algorithm. The error gradient for error back-propagation is constructed through two Q networks, the parameters of the Q network are updated through the gradient back-propagation algorithm, the Q network is used for prediction and the optimal output action in the current state is selected, and the final control action is output through the epsilon-greedy algorithm. The epsilon-greedy algorithm is commonly used in deep reinforcement learning and is not described further here.
Specifically, the current state S_t formed by the machine room heating and ventilation system and the building environment information is determined, and S_t is input into the neural network used to train the Q model, obtaining the current Q value Q(S_t, a_t). Based on the current state S_t, the previous state S_{t-1} and the previous control output a_{t-1}, the return function r_t is determined; specifically, r_t is determined using the following formula (1):

r_t = cost(a_{t-1}, S_{t-1}) + penalty(S_t)    formula (1)

where cost is the cost function and penalty is the penalty function.

The current state S_t, the return function r_t, the previous state S_{t-1} and the previous control output a_{t-1} are then stored in a memory. The stored state, control output, next state S_{t+1} and return value r are input into the neural network used for Q-value inference, which predicts the Q value Q(S_{t+1}, a_{t+1}) at the maximum reward value. This predicted value is summed with the return input and combined with the output Q value of the training Q model; the error back-propagation algorithm is used to obtain the error gradient, which is fed back to the training model to modify its parameters. At the same time, the control output a_t is selected from the output Q values of the training Q model using the epsilon-greedy algorithm, and a_t is used to continue inference and optimization at the next time step. The training model learns cyclically in this way, continuously and autonomously reinforcing itself in the field environment, dynamically following changes in the machine room heat load, the environment and its own refrigerating capacity, and outputting the optimal control strategy in the current state.
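A single update of the kind described above can be sketched as one DQN-style step: compute the return r_t from the cost and penalty functions of formula (1), form a TD target from the inference network's best next-state Q value, and move the training network's estimate toward it. Here lookup tables stand in for the two Q networks, and the cost/penalty shapes, discount factor and learning rate are illustrative assumptions.

```python
# Minimal sketch of one DQN-style update: r_t from formula (1), a TD
# target using the max Q of the next state from a (frozen) target table,
# and a training-table update standing in for back-propagation.
GAMMA, LR = 0.9, 0.5

def cost(a_prev, s_prev):
    # Assumed energy cost of the previous control output (illustrative).
    return -0.1 * a_prev

def penalty(s):
    # Assumed hot-spot / safety penalty of the current state (illustrative).
    return -1.0 if s == "too_hot" else 0.0

def dqn_step(q_train, q_target, s_prev, a_prev, s_t):
    r_t = cost(a_prev, s_prev) + penalty(s_t)           # formula (1)
    td_target = r_t + GAMMA * max(q_target[s_t].values())
    td_error = td_target - q_train[s_prev][a_prev]
    q_train[s_prev][a_prev] += LR * td_error            # gradient-step stand-in
    return r_t, td_error
```

In a full system the training network's parameters are periodically copied into the inference (target) network, and transitions are drawn from the memory described above rather than applied one at a time.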
The embodiment of the invention adopts a further optimized deep reinforcement learning algorithm and combined strategy to compensate for the shortcomings of model-free deep reinforcement learning, greatly accelerating the convergence of an accurate on-site energy-saving control strategy. The machine room load condition is grasped through dynamic load cruising, the running state of the air conditioner is adjusted in time by the edge system, and AI energy saving for the machine room air conditioner is realized through the online reinforcement strategy of deep reinforcement learning and safe reinforcement learning, saving energy, reducing machine room management cost and ensuring temperature stability in the machine room. This not only achieves energy conservation and consumption reduction for the machine room air conditioner, but also greatly improves the energy utilization efficiency of the machine room.
On the basis of the above embodiments, further, the Q network is determined by:
determining each state of the target machine room, wherein the states comprise environmental temperature data, humidity data and heat load data of the target machine room;
determining the regulation and control actions of the air conditioning system in each state;
constructing a deep neural network model Q network based on the state and the air conditioning system regulation action;
and training the deep neural network model Q network, and learning an optimal control strategy based on the reward value.
Specifically, the Q network is determined using the following steps:
Step B1, determining each state of a target machine room: the variables of the management state comprise the target machine room environment temperature, humidity, heat load data and the like.
Step B2, determining the air conditioning system regulation actions for each state: for example, the temperature setting may be regulated within a safe range, with each action represented as an upward or downward deviation from the current temperature value.
Step B3, setting a reward signal: a suitable reward signal is found to reward the edge system for making the correct decision. In embodiments of the present invention, the goal is to maximize comfort and minimize energy consumption. Thus, the prize value may be set according to equation (2):
reward = -(1 - comfort) - energy    formula (2)
Where comfort is comfort, energy is energy consumption of the air conditioning system, and reward is a prize value.
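Formula (2) can be sketched directly. Here the energy term is taken as subtracted (an assumption about the intended sign), so that higher comfort and lower energy consumption both raise the reward, consistent with the stated goal of maximizing comfort and minimizing energy consumption.

```python
def reward(comfort: float, energy: float) -> float:
    """comfort in [0, 1]; energy = normalized air-conditioning energy use.
    Assumes the energy term is subtracted so that lower consumption scores
    higher, matching the stated goal (an illustrative assumption)."""
    return -(1.0 - comfort) - energy
```

Perfect comfort with zero consumption yields the maximum reward of 0; any discomfort or energy use pushes the reward negative.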
Step B4, constructing a neural network model (Q network): this step builds a neural network model for learning the relationship between states and actions. A deep neural network model is used because the air conditioning control problem has a large state space that the conventional tabular Q-learning algorithm cannot handle. A deep neural network can adaptively learn the relationship between states and actions, and as a powerful function approximator it can approximate the required value function.
Step B5, training a model: the model is trained so that the model can learn an optimal strategy based on the reward signal. During training, the current state is taken as input, and then actions are taken according to the selected strategy. The reward signals are received and these signals are used to update the neural network. By iteratively revising the neural network weights, an optimal strategy for achieving maximum comfort and minimum energy consumption can be learned.
Step B6, testing the model: after the trained model is deployed, its effect in practical application needs to be tested.
In the embodiment of the invention, the air conditioning system mainly refers to a precision air conditioning unit with a built-in compressor, consisting mainly of a fan and a compressor and having no chilled-water valve. The control parameters include: air conditioner start/stop, fan start/stop (fixed frequency, variable frequency), inlet/outlet air temperature setting, fan minimum rotating speed setting (variable frequency), fan rated rotating speed setting (variable frequency), fan running rotating speed setting (variable frequency), compressor start/stop (fixed frequency, variable frequency), compressor minimum load setting (variable frequency), compressor maximum load setting (variable frequency), compressor rotating speed setting (variable frequency) and the like.
The present embodiment collects the above data to evaluate the performance of the model and uses it to predict future temperature settings.
According to the energy-saving method for the machine room air conditioner provided by the embodiment of the invention, multi-level energy-saving control is performed using a combined algorithm. First, a safety boundary is formulated based on operation and maintenance SLA requirements and site safety standards; then energy-saving strategies under different environments are explored using a reinforcement learning algorithm and a deep neural network, while a strategy boundary is further formulated; finally, the online reinforcement strategy of deep reinforcement learning and safe reinforcement learning is used, ultimately achieving rapid energy saving, safety, controllability and automatic adaptation to change. As time goes on, the optimization boundary and the energy-saving strategy are further reinforced, realizing greater machine room energy saving.
Referring to fig. 4, a step flow chart of another embodiment of an energy saving method of an air conditioner in a machine room of the present invention may specifically include:
an SLA safety boundary is determined using optimization/search algorithms, SLA rules and knowledge rules; hot spots and the PUE are predicted using a Long Short-Term Memory (LSTM) network or a gated recurrent unit (gated recurrent unit, GRU); and a strategy boundary is determined through domain-randomized reinforcement learning (Reinforcement Learning, RL). The SLA safety boundary and the strategy boundary determine the safety-layer data, safe reinforcement learning is performed on the penalty function, and the result is input into a V network, which is used to evaluate the value of each state. The state values output by the V network are combined with the Soft Actor-Critic (SAC) output strategy and sampling operations in the policy network, and the state-action values are output through Q-network reinforcement learning, so as to obtain the optimal control strategy of the target air conditioner.
According to the energy-saving method for the machine room air conditioner provided by the embodiment of the invention, the energy-saving strategy of the target machine room is determined based on a model-free off-policy algorithm, using the target machine room's safety temperature, PUE predicted value and heat load predicted value for value evaluation; intelligent air conditioner control is realized based on the DQN algorithm; and adaptive intelligent control of the machine room air conditioner is realized by constructing a neural network model that adaptively learns the relationship between states and actions.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 5, a block diagram of an embodiment of an energy saving device of a machine room air conditioner according to the present invention is shown, which may specifically include the following modules:
the cruise monitoring module 510 is used for dynamically cruising and monitoring the heat load data of the target machine room in real time;
The collecting module 520 is configured to collect real-time data of environmental temperature and humidity of the target machine room;
the analysis module 530 is configured to analyze the thermal load data and the real-time environmental temperature and humidity data based on a reinforcement learning algorithm, and determine an energy saving strategy of the target machine room;
and the execution module 540 is configured to determine an optimal control policy of the target machine room in the current state based on the deep reinforcement learning algorithm and the energy saving policy, and send the optimal control policy to an air conditioning system of the target machine room.
Specifically, the cruise monitoring module 510 may include a machine room equipment thermal load acquisition unit, a machine room personnel thermal load acquisition unit, an environmental thermal load acquisition unit, and a policy validation unit.
The machine room equipment heat load acquisition unit is used for acquiring in real time the supply voltage and current of equipment in the target machine room, so as to calculate the machine room equipment heat load. The machine room personnel heat load acquisition unit is used for detecting in real time the entry and exit of personnel in the target machine room, so as to calculate the machine room personnel heat load. The environmental heat load acquisition unit is used for detecting in real time the heat load data of the target machine room building. The policy confirmation unit is used for summarizing and recording in real time the heat load fluctuation of the target machine room, comparing the heat load fluctuation with the energy-saving strategy, and feeding back through the association unit whether the strategy needs adjustment and optimization.
The collecting module 520 includes an environmental temperature and humidity sensor, and is configured to collect real-time data of the environmental temperature and humidity of the target machine room, and report the change condition of the temperature and humidity of the target machine room to the analyzing module 530 in real time according to the change of the energy saving policy.
The analysis module 530 includes a basic data reading unit, a security boundary construction unit, an AI energy saving learning unit, and an energy saving policy issuing unit.
The basic data reading unit is used for acquiring real-time data of the heat load of the target machine room by the cruise monitoring module 510, acquiring real-time temperature and humidity data in the target machine room from the collecting module 520, sending the data into the AI energy-saving learning unit, and constructing a learning network in the learning unit.
The safety boundary construction unit is used for constructing different operation safety strategies according to the difference of different maintenance security levels of the target machine room and the PUE operation requirements. For example, the temperature safety of the machine room is controlled to be 25-30 ℃, and the operation PUE of the machine room is not more than 1.5.
The AI energy-saving learning unit is used for inputting the security policy, the heat load data and the real-time data of the environmental temperature and humidity into a policy network determined based on the reinforcement learning algorithm, and determining the energy-saving policy of the target machine room from the output result of the policy network.
Specifically, the AI energy-saving learning unit includes:
an abstract subunit, configured to abstract an air conditioning system of the target machine room into an agent;
and the training subunit is used for enabling the agent to select different air conditioning system regulation actions in different states based on a model-free off-policy reinforcement learning algorithm, wherein each state comprises safety information, heat load information and environment information.
The training subunit is specifically configured to:
reading all machine room information from the basic data, including indexes such as temperature, humidity, air volume and heat load condition; to simplify the machine room heat load index, dividing the machine room racks into columns or areas and taking the average heat load of a column or area as the heat load value of that area; and taking the heat load data, environmental temperature and humidity data and security policy data of each machine room as sample data;
converting the sample data into a plurality of nodes in a state space, wherein each node comprises: a machine room safety temperature set value, a machine room power supply use efficiency predicted value and a machine room thermal load predicted value;
a set of air conditioning system conditioning actions are defined for each of the nodes, such as increasing (decreasing) temperature, decreasing humidity, increasing air delivery, etc.
training an Actor network using the Actor-Critic method in an off-policy algorithm, wherein the input of the Actor network is a node in the state space, and the output is the air conditioning system regulation action corresponding to the node;
and verifying the strategy, namely generating the strategy by using the trained Actor network, and applying the strategy to an air conditioning system of the machine room so as to verify the energy-saving effect of the machine room. Comparing the air conditioning system regulation action output by the Actor network with the air conditioning system regulation action actually taken, and estimating the value of the air conditioning system regulation action output by the Actor network by using a Critic network;
optimizing parameters of the Actor network using the value;
and taking the optimized Actor network as the strategy network.
In practical applications, the analysis module 530 further includes: the energy-saving strategy issuing unit is specifically used for:
constructing a running state rewarding target value of an air conditioning system in a target machine room and hot spot punishment data of the target machine room through a learning network and a target network of the reinforcement learning algorithm;
determining the score of the output result according to the running state rewarding target value and the hot spot punishment data;
If the score of the output result is lower than a preset value, feeding back the score of the output result to the strategy network so that the strategy network can redetermine a new output result;
and taking the air conditioning system regulation and control action corresponding to the maximum score in the output result as an energy-saving strategy of the target machine room.
In practical applications, the execution module 540 is specifically configured to:
inputting two continuous time sequence indoor environment indexes corresponding to a target machine room, the feedback of the previous time sequence and the regulation and control action of an air conditioning system of the previous time sequence as states into a Q network of a deep reinforcement learning algorithm, wherein the regulation and control action of the air conditioning system is determined according to the energy-saving strategy;
constructing an error gradient of error counter propagation through a Q network;
updating parameters of the Q network through a backward gradient propagation algorithm;
predicting and selecting an output action in a current state by using the Q network;
and optimizing by using an epsilon-greedy algorithm, and outputting an optimal control strategy.
In practical applications, the execution module 540 is configured to determine the Q network according to the following manner:
determining each state of the target machine room, wherein the states comprise environmental temperature data, humidity data and heat load data of the target machine room;
determining the air conditioning system regulation and control actions in each state;
constructing a deep neural network model (the Q network) based on the states and the air conditioning system regulation and control actions;
and training the deep neural network model Q network and learning the optimal strategy based on the reward value, wherein the execution module 540 determines the reward value according to the formula reward = -(1 - comfort) + energy, where comfort is the comfort level, energy is the energy consumption of the air conditioning system, and reward is the reward value.
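Transcribed literally from claim 8, the reward formula can be evaluated as below. The scales and normalization of `comfort` and `energy` are not specified in the source, so the snippet treats them as plain numbers.

```python
def reward(comfort, energy):
    """Reward as given in the text: reward = -(1 - comfort) + energy,
    where comfort is the comfort level and energy is the energy
    consumption of the air conditioning system (claim 8)."""
    return -(1.0 - comfort) + energy
```

At full comfort (comfort = 1) the comfort term vanishes and the reward equals the energy term alone.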
According to the energy-saving device for a machine room air conditioner provided by the embodiment of the invention, the heat load data of the target machine room is dynamically monitored in real time, and real-time environmental temperature and humidity data of the target machine room are collected. The heat load data and the real-time temperature and humidity data are analyzed based on a reinforcement learning algorithm to determine the energy-saving strategy of the target machine room. The optimal control strategy of the target machine room in the current state is then determined based on a deep reinforcement learning algorithm and the energy-saving strategy, and issued to the air conditioning system of the target machine room. The air conditioning system of the machine room can thus be regulated intelligently according to the dynamic changes in the heat load of the machine room, achieving intelligent control and a maximal energy-saving effect.
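The four modules cooperating in one control cycle can be sketched structurally as follows; all five callables are hypothetical stand-ins for the cruise monitoring, collecting, analysis, and execution modules and the air conditioning interface, which the patent does not define at this level.

```python
def energy_saving_cycle(monitor_heat_load, collect_temp_humidity,
                        rl_energy_saving_strategy, drl_optimal_control,
                        issue_to_air_conditioner):
    """One control cycle of the device: monitor -> collect -> RL analysis
    -> DRL control -> issue to the air conditioning system."""
    heat_load = monitor_heat_load()                                  # cruise monitoring module
    temperature, humidity = collect_temp_humidity()                  # collecting module
    strategy = rl_energy_saving_strategy(heat_load, temperature, humidity)  # analysis module
    control = drl_optimal_control(strategy)                          # execution module
    issue_to_air_conditioner(control)
    return control
```

The decomposition mirrors the module boundaries of the device embodiment: each stage consumes only the previous stage's output, so any module can be swapped out independently.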
For the device embodiment, since it is substantially similar to the method embodiment, its description is relatively brief; for relevant details, reference may be made to the corresponding parts of the description of the method embodiment, which are not repeated here.
Referring to fig. 6, there is shown a block diagram of an embodiment of an electronic device of the present invention, the device comprising: a processor (processor) 610, a memory (memory) 620, and a bus 630;
wherein the processor 610 and the memory 620 communicate with each other via the bus 630;
the processor 610 is configured to invoke program instructions in the memory 620 to perform the methods provided by the method embodiments described above, for example including: dynamically monitoring, in real time and on a cruising basis, the heat load data of a target machine room; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm, and determining an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
Embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the method embodiments described above, for example including: dynamically monitoring, in real time and on a cruising basis, the heat load data of a target machine room; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm, and determining an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example including: dynamically monitoring, in real time and on a cruising basis, the heat load data of a target machine room; collecting real-time environmental temperature and humidity data of the target machine room; analyzing the heat load data and the real-time temperature and humidity data based on a reinforcement learning algorithm, and determining an energy-saving strategy of the target machine room; and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to the air conditioning system of the target machine room.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The energy-saving method and the energy-saving device for a machine room air conditioner provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present invention. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims (11)

1. An energy-saving method of an air conditioner of a machine room is characterized by comprising the following steps:
monitoring, in real time and on a dynamic cruising basis, the heat load data of a target machine room;
collecting real-time data of the environmental temperature and humidity of the target machine room;
based on a reinforcement learning algorithm, analyzing the heat load data and the environment temperature and humidity real-time data, and determining an energy-saving strategy of the target machine room;
and determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to an air conditioning system of the target machine room.
2. The method of claim 1, wherein the analyzing the thermal load data and the environmental temperature and humidity real-time data based on the reinforcement learning algorithm to determine the energy saving strategy of the target machine room comprises:
determining a safety strategy of the target machine room according to the maintenance guarantee level of the target machine room and the power supply use efficiency of the target machine room;
and inputting the safety strategy, the heat load data and the environment temperature and humidity real-time data into a strategy network determined based on a reinforcement learning algorithm, and determining the energy-saving strategy of the target machine room from an output result of the strategy network.
3. The method of claim 2, wherein the policy network is determined by:
abstracting an air conditioning system of the target machine room into an intelligent agent;
enabling the intelligent agent, based on a Model-free Off-policy reinforcement learning algorithm, to select different air conditioning system regulation and control actions in different states, wherein each state comprises safety information, heat load information and environment information.
4. The method of claim 3, wherein enabling the intelligent agent, based on the Model-free Off-policy reinforcement learning algorithm, to select different air conditioning system regulation and control actions in different states comprises:
taking heat load data, environment temperature and humidity data and security policy data of each machine room as sample data;
converting the sample data into a plurality of nodes in a state space, wherein each node comprises: a machine room safety temperature set value, a machine room power supply use efficiency predicted value and a machine room thermal load predicted value;
defining a set of air conditioning system regulation actions for each of the nodes;
training an Actor network by using an Actor-Critic method in the Off-policy algorithm, wherein the input of the Actor network is a node in the state space, and the output of the Actor network is the air conditioning system regulation and control action corresponding to the node;
comparing the air conditioning system regulation action output by the Actor network with the air conditioning system regulation action actually taken, and estimating the value of the air conditioning system regulation action output by the Actor network by using a Critic network;
optimizing parameters of the Actor network using the value;
and taking the optimized Actor network as the strategy network.
5. The method of claim 4, wherein determining the energy conservation policy of the target machine room from the output result of the policy network comprises:
constructing, through the learning network and the target network of the reinforcement learning algorithm, an operating-state reward target value of the air conditioning system in the target machine room and hot-spot penalty data of the target machine room;
determining the score of the output result according to the operating-state reward target value and the hot-spot penalty data;
if the score of the output result is lower than a preset value, feeding the score of the output result back to the strategy network so that the strategy network re-determines a new output result;
and taking the air conditioning system regulation and control action corresponding to the highest score among the output results as the energy-saving strategy of the target machine room.
6. The method of claim 5, wherein determining an optimal control strategy for the target machine room in a current state based on the deep reinforcement learning algorithm and the energy saving strategy comprises:
inputting, as the state, the indoor environment indices of two consecutive time steps corresponding to the target machine room, the feedback of the previous time step, and the air conditioning system regulation and control action of the previous time step into the Q network of a deep reinforcement learning algorithm, wherein the air conditioning system regulation and control action is determined according to the energy-saving strategy;
constructing the error gradient for error back-propagation through the Q network;
updating the parameters of the Q network through the back-propagation algorithm;
predicting and selecting the output action in the current state by using the Q network;
and optimizing with an epsilon-greedy algorithm, and outputting an optimal control strategy.
7. The method of claim 6, wherein the Q network is determined by:
determining each state of the target machine room, wherein the states comprise environmental temperature data, humidity data and heat load data of the target machine room;
determining the regulation and control actions of the air conditioning system in each state;
constructing a deep neural network model Q network based on the state and the air conditioning system regulation action;
and training the deep neural network model Q network, and learning an optimal control strategy based on the reward value.
8. The method of claim 7, wherein the prize value is determined according to the following equation:
reward=-(1-comfort)+energy
where comfort is comfort, energy is energy consumption of the air conditioning system, and reward is a prize value.
9. An energy-saving device of a machine room air conditioner is characterized by comprising:
the cruise monitoring module is used for monitoring, in real time and on a dynamic cruising basis, the heat load data of the target machine room;
the collecting module is used for collecting the real-time data of the environmental temperature and humidity of the target machine room;
the analysis module is used for analyzing the heat load data and the environment temperature and humidity real-time data based on a reinforcement learning algorithm and determining an energy-saving strategy of the target machine room;
And the execution module is used for determining an optimal control strategy of the target machine room in the current state based on a deep reinforcement learning algorithm and the energy-saving strategy, and issuing the optimal control strategy to an air conditioning system of the target machine room.
10. An electronic device, comprising:
the device comprises a memory and a processor, wherein the processor and the memory are communicated with each other through a bus; the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-8.
11. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any one of claims 1 to 8.
CN202310807022.8A 2023-07-03 2023-07-03 Energy saving method and device for machine room air conditioner Pending CN116963461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310807022.8A CN116963461A (en) 2023-07-03 2023-07-03 Energy saving method and device for machine room air conditioner

Publications (1)

Publication Number Publication Date
CN116963461A true CN116963461A (en) 2023-10-27

Family

ID=88442019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310807022.8A Pending CN116963461A (en) 2023-07-03 2023-07-03 Energy saving method and device for machine room air conditioner

Country Status (1)

Country Link
CN (1) CN116963461A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117270612A (en) * 2023-11-23 2023-12-22 上海时链节能科技有限公司 Temperature and humidity regulation method, device, equipment and medium for air inlet end of air compressor
CN117270612B (en) * 2023-11-23 2024-02-23 上海时链节能科技有限公司 Temperature and humidity regulation method, device, equipment and medium for air inlet end of air compressor

Similar Documents

Publication Publication Date Title
CN109890176B (en) Device for optimizing energy consumption efficiency of machine room based on artificial intelligence
Li et al. A multi-grid reinforcement learning method for energy conservation and comfort of HVAC in buildings
TWI719302B (en) Integrating machine learning into control systems
CN106842914B (en) Temperature control energy-saving processing method, device and system
EP3343496A1 (en) Method and system for energy management in a facility
CN116963461A (en) Energy saving method and device for machine room air conditioner
Latif et al. Decentralized stochastic control for building energy and comfort management
CN113110056B (en) Heat supply intelligent decision-making method and intelligent decision-making machine based on artificial intelligence
Naug et al. Online energy management in commercial buildings using deep reinforcement learning
CN114970358A (en) Data center energy efficiency optimization method and system based on reinforcement learning
CN114322208B (en) Intelligent park air conditioner load regulation and control method and system based on deep reinforcement learning
Gao et al. Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
CN113110057A (en) Heating power station energy-saving control method based on artificial intelligence and intelligent decision system
Conte et al. Frequency control services by a building cooling system aggregate
CN115983438A (en) Method and device for determining operation strategy of data center terminal air conditioning system
Zhang et al. DRL-S: Toward safe real-world learning of dynamic thermal management in data center
CN117833316A (en) Method for dynamically optimizing operation of energy storage at user side
CN117277346A (en) Energy storage frequency modulation method, device and equipment based on multi-agent system
Zhang et al. Data-driven model predictive and reinforcement learning based control for building energy management: A survey
CN114298429A (en) Power distribution network scheme aided decision-making method, system, device and storage medium
CN115103569A (en) Method and device for controlling equipment in machine room
KR20180138372A (en) Method for controlling of chillers optimally using an online/offline hybrid machine learning models
CN114909707A (en) Heat supply secondary network regulation and control method based on intelligent balancing device and reinforcement learning
Groumpos et al. New advanced technology methods for energy efficiency of buildings
CN115455797A (en) Temperature prediction model training and temperature decision method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination