CN114279042A

CN114279042A - Central air conditioner control method based on multi-agent deep reinforcement learning

Info

Publication number: CN114279042A
Application number: CN202111609118.0A
Authority: CN
Inventors: 陈建平; 傅启明; 陈曦尧
Original assignee: Chongqing Industrial Big Data Innovation Center Co ltd; Suzhou University of Science and Technology
Current assignee: Chongqing Industrial Big Data Innovation Center Co ltd; Suzhou University of Science and Technology
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-04-05
Anticipated expiration: 2041-12-27
Also published as: CN114279042B

Abstract

The invention discloses a central air-conditioning control method based on multi-agent deep reinforcement learning. According to the current indoor required cooling load and the outdoor wet bulb temperature, the method performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pump and cooling tower fan in a central air-conditioning system, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan. The control method does not need an accurate model of the central air-conditioning system during actual deployment, and controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each. It can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working loads of the chillers, cooling water pump and cooling tower fan, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.

Description

Central air conditioner control method based on multi-agent deep reinforcement learning

Technical Field

The invention relates to the technical field of central air-conditioning control, in particular to a central air-conditioning control method based on multi-agent deep reinforcement learning.

Background

According to statistics, the energy consumption of the central air-conditioning system accounts for, and can even exceed, 50% of the total energy consumption of a building. The energy consumption of the chillers and the cooling water system is an important component of it, so optimal control of the chillers and the cooling water system is particularly important for reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.

Currently, the optimal control methods for central air-conditioning systems mainly include rule-based control, model-based control and model-free control. Rule-based control is usually static: the control rules are determined from the experience of engineers and facility managers, so their applicability and optimization potential are very limited. Model-based control requires a large amount of historical data and sensor information to build an accurate model of the central air-conditioning system; it generally lacks robustness and is unsuitable for older buildings that lack historical data and sensors. Model-free control avoids building an accurate mathematical model, but traditional model-free methods must discretize states and actions, which leads to a large action space and long training time, reduces the generalization ability of the algorithm, and leaves complex problems unsolved.

Therefore, a solution to these problems is urgently needed.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a central air-conditioning control method based on multi-agent deep reinforcement learning.

To achieve this purpose, the technical scheme adopted by the invention is as follows: a central air-conditioning control method based on multi-agent deep reinforcement learning performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pump and cooling tower fan in a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.

Preferably, in the central air-conditioning system, the chillers, cooling water pumps and cooling towers are connected in sequence and arranged in groups; the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.

Preferably, the method comprises the following steps:

A1. recording the outdoor wet bulb temperature with an electronic thermometer;

A2. obtaining the current indoor required cooling load by simulation with the energy simulation software EnergyPlus;

A3. the sequence controller determines the number of chillers to be started according to the current indoor required cooling load;

A4. after receiving the current state information, the reinforcement learning controllers model the environment from the received data and provide an optimal strategy according to that model.

Preferably, in step A2, the current room is modeled as a whole in EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, wherein CL_s represents the current indoor required cooling load, T represents the set of the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, and model_room represents the current room model; the output is CL_s = {T, model_room}.

Preferably, in step A3, the sequence controller performs threshold calculation and action execution, wherein threshold_n represents the threshold, n (0, 1, 2, 3, …) represents the number of chillers turned on, and refrigerating capacity represents the rated refrigerating capacity of a single chiller, from which threshold_n is calculated. The sequence controller calculates in real time whether CL_s falls within the range threshold_n to threshold_n+1 and, while it does, keeps n chillers in the on state. When n is 0, the sequence controller turns off all the chillers, and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.
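The range lookup performed by the sequence controller can be sketched as follows. This is only an illustration: the function name `chillers_to_run` and the example threshold values are hypothetical, and the thresholds are passed in by the caller because the formula for threshold_n is not reproduced in this text.

```python
from bisect import bisect_right


def chillers_to_run(cooling_load, thresholds):
    """Return n such that thresholds[n] <= cooling_load < thresholds[n + 1].

    `thresholds` is an ascending list [threshold_0, threshold_1, ...]
    supplied by the caller; the last entry has no upper bound.
    """
    n = bisect_right(thresholds, cooling_load) - 1
    # Loads below threshold_0 map to n = 0, i.e. all chillers off.
    return max(n, 0)


# Illustrative thresholds in kW (assumed values, not taken from the patent).
EXAMPLE_THRESHOLDS = [0.0, 50.0, 550.0, 1050.0]
```

With these example thresholds, a load of 600 kW lies between threshold_2 and threshold_3, so two chillers stay on.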

Preferably, in step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan respectively, perform multi-agent deep reinforcement learning (MADRL), and construct a neural network. The neural network comprises two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples (s_t, a_t, r_t, s_t+1), wherein s_t represents the current indoor required cooling load and outdoor wet bulb temperature, a_t represents the working frequencies of the cooling water pump and the cooling tower fan in the state given by the current indoor required cooling load and outdoor wet bulb temperature, s_t+1 represents the next state reached after executing action a_t in state s_t, and r_t represents the immediate reward obtained by executing action a_t in the current state s_t.
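A playback memory unit of this kind is commonly implemented as a fixed-capacity buffer of transitions. The sketch below is a minimal illustration under that assumption; the names `Transition` and `ReplayMemory` are hypothetical and do not come from the patent.

```python
import random
from collections import deque, namedtuple

# One sample (s_t, a_t, r_t, s_t+1) as described in the text.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayMemory:
    """Fixed-capacity store; the oldest samples are dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Each agent (pump and tower fan) could keep its own `ReplayMemory`, storing its own action alongside the shared state and reward.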

Preferably, in step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov Decision Process (MDP) models, with the states, actions and reward functions defined as follows:

B1. State, denoted by S, wherein CL_s represents the current indoor required cooling load and T_wet represents the current outdoor wet bulb temperature; the current states of the two agents are identical and given by S = {CL_s, T_wet};

B2. Action, denoted by a, wherein f_pump represents the frequency of the cooling water pump and f_tower represents the frequency of the cooling tower fan: a_pump = f_pump; a_tower = f_tower;

B3. Reward function, denoted by r, wherein P_chiller represents the power consumption of the chiller, P_tower represents the power consumption of the cooling tower fan, and P_pump represents the power consumption of the cooling water pump.
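The state, action and reward definitions can be sketched in code as below. The reward expression is an assumption: the reward formula is not reproduced in this text, and the sketch takes the refrigeration coefficient of performance, cooling load divided by total power, because step D4 names that coefficient as the reward. The frequency grids and all identifiers are hypothetical.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Shared state S = {CL_s, T_wet} seen by both agents."""
    cooling_load: float   # CL_s, in kW
    wet_bulb_temp: float  # T_wet, in degrees Celsius


# Assumed discrete action sets in Hz, one per agent (illustrative values).
PUMP_FREQS = [30.0, 35.0, 40.0, 45.0, 50.0]
TOWER_FREQS = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]


def cop_reward(cooling_load, p_chiller, p_tower, p_pump):
    """Assumed reward: cooling delivered per unit of electrical power."""
    total_power = p_chiller + p_tower + p_pump
    if total_power <= 0:
        raise ValueError("total power must be positive")
    return cooling_load / total_power
```

Under this assumption, delivering 1000 kW of cooling with 200 kW of total power gives a reward of 5.0, and lower pump or fan power raises the reward only if chiller power does not rise by more.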

Preferably, in step A4, the reinforcement learning controller builds a value function return model, wherein R(s, a) represents the return value of taking action a in state s, and the value function Q(s, a) is the expectation of R(s, a), that is, Q(s, a) = E[R(s, a)].
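Since Q(s, a) is an expectation, it can be approximated for a fixed state-action pair by averaging observed returns. The following sketch shows the incremental form of that average; it is a generic illustration with a hypothetical function name, not the estimator used by the patent, which relies on a neural network instead.

```python
def running_mean_q(returns):
    """Estimate Q(s, a) = E[R(s, a)] as the running mean of sampled returns."""
    q, n = 0.0, 0
    for r in returns:
        n += 1
        # Incremental mean: move the estimate toward the new sample by 1/n.
        q += (r - q) / n
    return q
```

The same "move the estimate toward a target" pattern reappears in the gradient update of deep Q-learning, with the sampled return replaced by a bootstrapped target.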

Preferably, in step a4, the reinforcement learning controller solves the optimal strategy through a Deep Q learning (Deep Q Network or DQN) algorithm, where the algorithm training process is as follows:

C1. initializing the playback memory unit with capacity N, for storing training samples;

C2. initializing the current value network with randomly initialized weight parameters ω, and initializing the target value network with the same structure and initial weights as the current value network;

C3. passing the indoor required cooling load and outdoor wet bulb temperature through the current value network to obtain Q(s, a) in any state s; after the value function is calculated by the current value network, selecting an action a using an ε-greedy strategy; counting each state transition as one time step t; and storing the data (s, a, r, s′) obtained at each time step in the playback memory unit;

C4. defining a loss function:

L(ω)＝E[(r+γmax_a′Q(s′，a′；ω-)-Q(s，a；ω))²]；

C5. randomly sampling a transition (s, a, r, s′) from the playback memory unit, passing it to the current value network, the target value network and L(ω), and updating ω by stochastic gradient descent on L(ω), wherein the updating formula is as follows:
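The update formula itself is not reproduced in this text. As a hedged sketch of steps C4 and C5, the code below assumes the usual gradient step ω ← ω − α∇L(ω) with the target r + γ max Q(s′, a′; ω⁻) held constant, and, for brevity, swaps the two-layer network for a linear approximator Q(s, a; ω) = ω[a]·s. The names `dqn_sgd_step`, `gamma` and `lr` are hypothetical.

```python
import numpy as np


def dqn_sgd_step(w, w_target, transition, gamma=0.9, lr=0.01):
    """One stochastic-gradient step on (y - Q(s, a; w))^2 for a linear Q.

    w, w_target: arrays of shape (n_actions, n_features).
    transition: (s, a, r, s_next) with s and s_next as feature vectors.
    Returns the updated weights and the squared TD error before the update.
    """
    s, a, r, s_next = transition
    # The target uses the frozen target-network weights (omega-minus in C4).
    y = r + gamma * np.max(w_target @ s_next)
    td_error = y - w[a] @ s
    # Gradient of the squared error w.r.t. w[a] is -2 * td_error * s.
    w_new = w.copy()
    w_new[a] += lr * 2.0 * td_error * s
    return w_new, td_error ** 2
```

Repeating the step on the same transition shrinks the squared error, which is the behaviour the loss in C4 is designed to produce; in the full algorithm the target weights would be refreshed from the current network periodically.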

Preferably, the overall algorithm training process is as follows:

D1. at the current time step t, performing chiller start-stop control according to the real-time cooling load;

D2. observing the environment state s_t and recording data such as the real-time cooling load and the outdoor wet bulb temperature;

D3. the model-free method gives the control action a_t, using a greedy strategy to select the action a_t with the largest current Q value;

D4. the system executes the control action to obtain the next environment state s_t+1, and the refrigeration coefficient of performance under the current action is calculated and used as the reward value r_t in the reinforcement learning algorithm;

D5. training the multi-agent deep reinforcement learning algorithm and updating its parameters: storing the sample (s_t, a_t, r_t, s_t+1) in the experience pool, randomly sampling data from the experience pool, and running algorithm training to update the network parameters;

D6. ending the current time step t and starting the next time step t + 1.
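Steps D1 to D6 can be read as one iteration of a control loop shared by the two agents. The sketch below shows only that structure; every callable (`select_chillers`, the per-agent policies, `plant`, `train`) is a hypothetical placeholder supplied by the caller, and nothing here models a real chiller plant.

```python
def control_step(state, select_chillers, policies, plant, memories, train):
    """Run one time step t of the D1-D6 loop and return the next state."""
    cooling_load = state[0]
    # D1: chiller start-stop control from the real-time cooling load.
    n_chillers = select_chillers(cooling_load)
    # D2-D3: each agent observes the shared state and picks its own action.
    actions = {name: policy(state) for name, policy in policies.items()}
    # D4: apply the actions; the plant returns the next state and the reward.
    next_state, reward = plant(state, n_chillers, actions)
    # D5: store each agent's sample in its experience pool, then train it.
    for name, action in actions.items():
        memories[name].append((state, action, reward, next_state))
        train(name, memories[name])
    # D6: the caller starts step t + 1 from the returned state.
    return next_state
```

Both agents receive the same reward in this sketch, which matches the text's single coefficient-of-performance reward and encourages the pump and fan controllers to cooperate.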

Due to the application of the technical scheme, compared with the prior art, the invention has the following beneficial effects:

1. the control method adopts multi-agent deep reinforcement learning, that is, it introduces multiple agents and a neural network on top of traditional reinforcement learning. This addresses the slow convergence of a single agent and the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space. The method is particularly suited to systems with several cooperating controllers: it models the control of the working frequencies of the cooling water pump and the cooling tower fan as Markov decision process models based on the current indoor required cooling load and outdoor wet bulb temperature, defines a loss function, minimizes it by gradient descent, and solves for the optimal strategy for controlling the working frequencies of the cooling water pump and the cooling tower fan;

2. the control method does not need an accurate model of the central air-conditioning system during actual deployment, and controls the working frequencies of the cooling water pump and the cooling tower fan with one agent each;

3. the control method can train an efficient and accurate control strategy in a short time from a small amount of historical data, reducing unnecessary refrigerating capacity and the working loads of the chillers, cooling water pump and cooling tower fan, prolonging service life, lowering the failure rate, and greatly reducing the energy consumption of the whole central air-conditioning system and even the total energy consumption of the building.

Drawings

Fig. 1 is a schematic layout diagram of a chiller, a cooling water pump and a cooling water tower in a central air-conditioning system according to an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning.

Fig. 2 is a flowchart of an embodiment of a central air-conditioning control method based on multi-agent deep reinforcement learning according to the present invention.

Fig. 3 is a logic flow diagram of the sequence controller performing threshold calculation and action execution in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.

Fig. 4 is a flowchart of deep Q learning algorithm training performed by the reinforcement learning controller in an embodiment of the multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.

FIG. 5 is a flowchart of the overall algorithm training in an embodiment of a multi-agent deep reinforcement learning-based central air conditioning control method according to the present invention.

Detailed Description

The present invention will be further described in detail with reference to the following specific examples:

With reference to fig. 1 to 5, this embodiment is a central air-conditioning control method based on multi-agent deep reinforcement learning, which performs model-free optimal control of the start-stop states and working parameters of the chillers, cooling water pump and cooling tower fan in a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, including chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.

As shown in fig. 1, the chillers, cooling water pumps and cooling towers in the central air-conditioning system are connected in sequence and arranged in groups. As shown in fig. 2, the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.

The present embodiment includes the following steps:

A1. recording the outdoor wet bulb temperature by an electronic thermometer;

In step A2, the current room is modeled as a whole in EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, wherein CL_s represents the current indoor required cooling load, T represents the set of the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, and model_room represents the current room model; the output is CL_s = {T, model_room}.

As shown in fig. 3, in step A3, the sequence controller performs threshold calculation and action execution, wherein threshold_n represents the threshold, n (0, 1, 2, 3, …) represents the number of chillers turned on, and refrigerating capacity represents the rated refrigerating capacity of a single chiller, from which threshold_n is calculated. The sequence controller calculates in real time whether CL_s falls within the range threshold_n to threshold_n+1 and, while it does, keeps n chillers in the on state. When n is 0, the sequence controller turns off all the chillers, and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.

In step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan respectively, perform multi-agent deep reinforcement learning (MADRL), and construct a neural network. The neural network comprises two fully connected layers and a playback memory unit. The input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature. The output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan. The playback memory unit records all samples (s_t, a_t, r_t, s_t+1), wherein s_t represents the current indoor required cooling load and outdoor wet bulb temperature, a_t represents the working frequencies of the cooling water pump and the cooling tower fan in the state given by the current indoor required cooling load and outdoor wet bulb temperature, s_t+1 represents the next state reached after executing action a_t in state s_t, and r_t represents the immediate reward obtained by executing action a_t in the current state s_t.
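The value network described here maps the two state inputs to one value estimate per reachable frequency. A minimal forward-pass sketch in NumPy follows; the hidden width, the ReLU activation, the weight initialization and the function names are all assumptions, since the text specifies only two fully connected layers.

```python
import numpy as np


def init_q_network(n_actions, hidden=32, seed=0):
    """Create weights for a 2 -> hidden -> n_actions fully connected network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(2, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }


def q_values(params, cooling_load, wet_bulb_temp):
    """Return one value estimate per discrete frequency for the given state."""
    # Inputs are used raw here; in practice they would usually be normalized.
    x = np.array([cooling_load, wet_bulb_temp], dtype=float)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # assumed ReLU layer
    return h @ params["W2"] + params["b2"]
```

Each agent would hold its own parameters, for example `init_q_network(n_actions=5)` for a pump with five reachable frequencies, and pick the frequency whose index has the largest output.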

In step A4, the two reinforcement learning controllers model the control of the working frequencies of the cooling water pump and the cooling tower fan as two Markov Decision Process (MDP) models, with the states, actions and reward functions defined as follows:

B2. Action, denoted by a, wherein f_pump represents the frequency of the cooling water pump and f_tower represents the frequency of the cooling tower fan: a_pump = f_pump; a_tower = f_tower;

In step A4, the reinforcement learning controller builds a value function return model, wherein R(s, a) represents the return value of taking action a in state s, and the value function Q(s, a) is the expectation of R(s, a), that is, Q(s, a) = E[R(s, a)].

As shown in fig. 4, in step a4, the reinforcement learning controller solves the optimal strategy through a Deep Q learning (Deep Q Network or DQN) algorithm, and the algorithm training flow is as follows:

C3. passing the indoor required cooling load and outdoor wet bulb temperature through the current value network to obtain Q(s, a) in any state s; after the value function is calculated by the current value network, selecting an action a using an ε-greedy strategy; counting each state transition as one time step t; and storing the data (s, a, r, s′) obtained at each time step in the playback memory unit;

C4. defining a loss function:

L(ω)＝E[(r+γmax_a′Q(s′，a′；ω-)-Q(s，a；ω))²]；
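The ε-greedy selection named in step C3 can be sketched as follows. The function name and the way ε is supplied are hypothetical; the text does not state a value or schedule for ε.

```python
import random


def epsilon_greedy(q_values, epsilon):
    """Pick a random action index with probability epsilon, else the best one."""
    if random.random() < epsilon:
        # Explore: any action index, uniformly at random.
        return random.randrange(len(q_values))
    # Exploit: index of the largest value estimate.
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

The returned index would be mapped to a frequency in the agent's action set; a common design choice, not stated in the text, is to start with a large ε and reduce it as training progresses.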

As shown in fig. 5, the present embodiment includes the following overall algorithm training process:

D6. ending the current time step t and starting the next time step t + 1.

The innovation of the invention is as follows:

The above-mentioned embodiments are merely illustrative of the technical idea and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention, and all equivalent changes or modifications made according to the spirit of the present invention should be covered in the scope of the present invention.

Claims

1. A central air-conditioning control method based on multi-agent deep reinforcement learning, characterized in that: model-free optimal control is performed on the start-stop states and working parameters of the chillers, cooling water pump and cooling tower fan of a central air-conditioning system according to the current indoor required cooling load and the outdoor wet bulb temperature, the model-free optimal control comprising chiller operation sequence control and agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan.

2. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 1, characterized in that: in the central air-conditioning system, the chillers, cooling water pumps and cooling towers are connected in sequence and arranged in groups, the chiller sequence control is implemented by a sequence controller, and the agent-based optimal control of the working frequencies of the cooling water pump and the cooling tower fan is implemented by one reinforcement learning controller each.

3. The central air-conditioning control method based on multi-agent deep reinforcement learning as claimed in claim 2, characterized by comprising the following steps:

A1. recording the outdoor wet bulb temperature by an electronic thermometer;

4. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 3, characterized in that: in step A2, the current room is modeled as a whole in EnergyPlus, and the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature are input, wherein CL_s represents the current indoor required cooling load, T represents the set of the current indoor dry bulb temperature, outdoor dry bulb temperature, indoor wet bulb temperature and outdoor wet bulb temperature, and model_room represents the current room model; the output is CL_s = {T, model_room}.

5. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 4, characterized in that: in step A3, the sequence controller performs threshold calculation and action execution, wherein threshold_n represents the threshold, n (0, 1, 2, 3, …) represents the number of chillers turned on, and refrigerating capacity represents the rated refrigerating capacity of a single chiller, from which threshold_n is calculated; the sequence controller calculates in real time whether CL_s falls within the range threshold_n to threshold_n+1 and, while it does, keeps n chillers in the on state; when n is 0, the sequence controller turns off all the chillers, and indoor heat is removed only by the operation of the cold water pump and the cooling tower fan.

6. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 5, characterized in that: in step A4, the two reinforcement learning controllers act as the agents that control the working frequencies of the cooling water pump and the cooling tower fan respectively, perform multi-agent deep reinforcement learning (MADRL), and construct a neural network; the neural network comprises two fully connected layers and a playback memory unit; the input layer takes the current indoor required cooling load and the outdoor wet bulb temperature, the intermediate layer is fully connected to all possible actions, and the output layer gives the value estimates of all actions under the current indoor required cooling load and outdoor wet bulb temperature; the output actions of the agent controlling the cooling water pump are all reachable frequencies of the cooling water pump, and the output actions of the agent controlling the cooling tower fan are all reachable frequencies of the cooling tower fan; the playback memory unit records all samples (s_t, a_t, r_t, s_t+1), wherein s_t represents the current indoor required cooling load and outdoor wet bulb temperature, a_t represents the working frequencies of the cooling water pump and the cooling tower fan in the state given by the current indoor required cooling load and outdoor wet bulb temperature, s_t+1 represents the next state reached after executing action a_t in state s_t, and r_t represents the immediate reward obtained by executing action a_t in the current state s_t.

7. The multi-agent deep reinforcement learning-based central air conditioning control method according to claim 6, wherein in the step A4, two reinforcement learning controllers model the control problem of the working frequencies of the cooling water pump and the cooling water tower fan as two Markov Decision Process (MDP) models, and define the states, actions and reward functions therein as follows:

B1. State, denoted by S, wherein CL_s represents the current indoor required cooling load and T_wet represents the current outdoor wet bulb temperature; the current states of the two agents are identical and given by S = {CL_s, T_wet};

8. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 7, characterized in that: in step A4, the reinforcement learning controller builds a value function return model, wherein R(s, a) represents the return value of taking action a in state s, and the value function Q(s, a) is the expectation of R(s, a), that is, Q(s, a) = E[R(s, a)].

9. The multi-agent deep reinforcement learning-based central air-conditioning control method according to claim 8, wherein in step A4, the reinforcement learning controller solves the optimal strategy through a deep Q-learning (Deep Q Network, DQN) algorithm, and the algorithm training process is as follows:

C3. passing the indoor required cooling load and outdoor wet bulb temperature through the current value network to obtain Q(s, a) in any state s; after the value function is calculated by the current value network, selecting an action a using an ε-greedy strategy; counting each state transition as one time step t; and storing the data (s, a, r, s′) obtained at each time step in the playback memory unit;

C4. defining a loss function:

L(ω)＝E[(r+γmax_a′Q(s′，a′；ω-)-Q(s，a；ω))²]；

10. The central air-conditioning control method based on multi-agent deep reinforcement learning of claim 9, characterized by comprising the following overall algorithm training process:

D6. ending the current time step t and starting the next time step t + 1.