CN110779132A

CN110779132A - Water pump equipment operation control system of air conditioning system based on reinforcement learning

Info

Publication number: CN110779132A
Application number: CN201911107409.2A
Authority: CN
Inventors: 陈安琪; 杨光; 花静霞
Original assignee: Yaokong Technology Shanghai Co Ltd
Current assignee: Yaokong Technology Shanghai Co Ltd
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2020-02-11

Abstract

A water pump equipment operation control system of an air conditioning system based on reinforcement learning is disclosed. A water pump energy consumption measuring module measures the actual energy consumption of the water pump equipment of the air conditioning system. An outdoor environment sensing measurement module measures the outdoor environment. An indoor environment sensing measurement module obtains a comprehensive feedback equivalent value. A water pump control system controls the water pump equipment. A water pump operation condition feedback module feeds back the water pump operating condition parameters. An intelligent control module acquires these parameter data, obtains the optimal operation set point of the water pump equipment through optimization calculation, and controls the water pump equipment according to that set point. The system adapts more readily to environmental changes, uses a simpler calculation process and has broader development prospects.

Description

Water pump equipment operation control system of air conditioning system based on reinforcement learning

Technical Field

The invention relates to the field of air conditioning water pump control, and in particular to a water pump equipment operation control system of an air conditioning system based on reinforcement learning.

Background

Most buildings require an air conditioning system, and controlling its water pump equipment is a critical part of operating that system. Conventional water pump control cannot achieve optimal control of the air conditioning water pumps that balances occupant thermal comfort against energy consumption under combined indoor and outdoor environmental conditions.

Disclosure of Invention

The invention aims to overcome the shortcomings described in the background art by providing a water pump equipment operation control system of an air conditioning system based on reinforcement learning. The optimal operation set point of the water pump equipment is obtained through optimization calculation, and the water pump equipment is controlled according to that set point. The system adapts more readily to environmental changes, uses a simpler calculation process, leaves more room for further optimization on this basis, and has broader development prospects.

In order to achieve the purpose, the invention adopts the following technical scheme:

A water pump equipment operation control system of an air conditioning system based on reinforcement learning comprises a water pump energy consumption measuring module, an outdoor environment sensing measurement module, an indoor environment sensing measurement module, a water pump control system, a water pump operation condition feedback module and an intelligent control module;

the water pump energy consumption measuring module is used for measuring the actual energy consumption of water pump equipment of the air conditioning system;

the outdoor environment sensing and measuring module is used for measuring the outdoor environment to obtain the parameters of the outdoor environment;

the indoor environment sensing measurement module measures passenger flow and the indoor environment to obtain passenger flow parameters, and calculates a comprehensive feedback equivalent value from the outdoor environment parameters and the passenger flow parameters;

the water pump control system is used for controlling water pump equipment;

the water pump operation condition feedback module is used for feeding back water pump operation condition parameters;

the intelligent control module is connected to the water pump energy consumption measuring module, the outdoor environment sensing measurement module, the indoor environment sensing measurement module, the water pump control system and the water pump operation condition feedback module, and acquires the parameter data they upload; it obtains the optimal operation set point of the water pump equipment through optimization calculation and controls the water pump equipment according to that set point.
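A minimal sketch of how the intelligent control module might gather one snapshot from the five connected modules in each control cycle. The module names used as keys, the `Snapshot` fields and the stand-in readings are hypothetical; the text does not specify an interface or data format.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Snapshot:
    """Hypothetical bundle of everything uploaded to the intelligent control module."""
    pump_energy_kwh: float            # water pump energy consumption measuring module
    outdoor: Dict[str, float]         # outdoor environment sensing measurement module
    indoor: Dict[str, float]          # indoor environment sensing measurement module
    control_output: float             # last signal issued by the water pump control system
    pump_feedback: Dict[str, float]   # water pump operation condition feedback module


def collect_snapshot(modules: Dict[str, Callable[[], object]]) -> Snapshot:
    """Poll each connected module once and bundle the uploads for optimization."""
    return Snapshot(
        pump_energy_kwh=float(modules["energy"]()),
        outdoor=dict(modules["outdoor"]()),
        indoor=dict(modules["indoor"]()),
        control_output=float(modules["control"]()),
        pump_feedback=dict(modules["feedback"]()),
    )


if __name__ == "__main__":
    # Stand-in modules returning fixed, hypothetical readings.
    modules = {
        "energy": lambda: 12.5,
        "outdoor": lambda: {"temperature_c": 31.0, "humidity_pct": 70.0, "wind_speed_ms": 2.0},
        "indoor": lambda: {"temperature_c": 26.0, "passenger_flow": 120},
        "control": lambda: 80.0,
        "feedback": lambda: {"speed_pct": 80.0},
    }
    print(collect_snapshot(modules))
```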

Preferably, the control of the water pump equipment by the intelligent control module comprises reading environment information, calculating action reward values, an intelligent learning process, an action selection process and an environment interaction process;

reading environment information comprises obtaining outdoor environment information, indoor environment information, energy consumption information and running state information of the bottom-layer controller, extracting the corresponding parameters from this information, and obtaining an optimal indoor set point;

the optimal indoor set point is obtained using the following formula:

$$K = f(K_1, K_2, K_3, K_4) \qquad \text{(formula two)}$$

wherein: $K_1$, $K_2$, $K_3$, $K_4$ and $K$ denote intermediate coefficients used in the conversion calculation;

$T_{\mathrm{out}}$ represents the real-time outdoor temperature;

$n$ represents the real-time outdoor wind speed;

$V_{\mathrm{out}}$ represents the real-time passenger flow;

$V_{\mathrm{in}}$ represents the real-time indoor wind speed;

$f(\cdot)$ represents the PMV (predicted mean vote) calculation function of thermal comfort theory;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

$T_{\mathrm{inset},0}$ represents the standard indoor design value, a constant specified by standard specifications.
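The text does not give the mapping from $T_{\mathrm{out}}$, $n$, $V_{\mathrm{out}}$ and $V_{\mathrm{in}}$ to $K_1$ to $K_4$, nor how $K$ yields $T_{\mathrm{inset}}$, so the sketch below only illustrates the $f(\cdot)$ step. It implements the Fanger PMV calculation following the ISO 7730 reference procedure. It then bisects for the air temperature at which PMV is zero, which is thermal neutrality. Several choices are assumptions made for illustration and are not the patent's method: using that neutral temperature as a set point, setting mean radiant temperature equal to air temperature, and fixing humidity, metabolic rate and clothing.

```python
import math


def pmv(ta, tr, vel, rh, met, clo, wme=0.0):
    """Fanger PMV following the ISO 7730 reference procedure.

    ta, tr: air and mean radiant temperature (deg C); vel: air speed (m/s);
    rh: relative humidity (%); met: metabolic rate (met); clo: clothing (clo).
    """
    pa = rh * 10 * math.exp(16.6536 - 4030.183 / (ta + 235))  # vapour pressure, Pa
    icl = 0.155 * clo                      # clothing insulation, m2K/W
    m = met * 58.15                        # metabolic rate, W/m2
    mw = m - wme * 58.15                   # internal heat production
    fcl = 1 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl
    hcf = 12.1 * math.sqrt(vel)            # forced convection coefficient
    taa, tra = ta + 273, tr + 273
    tcla = taa + (35.5 - ta) / (3.5 * icl + 0.1)  # first guess, clothing surface temp
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100, p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100) ** 4
    xn, xf = tcla / 100, tcla / 50
    for _ in range(150):                   # iterate clothing surface temperature
        xf = (xf + xn) / 2
        hcn = 2.38 * abs(100 * xf - taa) ** 0.25  # natural convection coefficient
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100 + p3 * hc)
        if abs(xn - xf) <= 0.00015:
            break
    else:
        raise RuntimeError("clothing surface temperature did not converge")
    tcl = 100 * xn - 273
    hl1 = 3.05e-3 * (5733 - 6.99 * mw - pa)           # skin diffusion
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0  # sweating
    hl3 = 1.7e-5 * m * (5867 - pa)                    # latent respiration
    hl4 = 0.0014 * m * (34 - ta)                      # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100) ** 4)   # radiation
    hl6 = fcl * hc * (tcl - ta)                       # convection
    ts = 0.303 * math.exp(-0.036 * m) + 0.028         # sensation coefficient
    return ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)


def neutral_setpoint(vel, rh, met, clo, lo=16.0, hi=32.0, tol=0.01):
    """Bisection for the air temperature (tr = ta assumed) at which PMV = 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pmv(mid, mid, vel, rh, met, clo) < 0:
            lo = mid   # still feels cool, so the neutral point is warmer
        else:
            hi = mid
    return (lo + hi) / 2
```

PMV rises monotonically with temperature, so bisection between a clearly cool and a clearly warm bound is enough here.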

Preferably, calculating the action reward value comprises deriving potential reward values for different actions from the comprehensive feedback equivalent value, as follows:

the potential reward values of different actions are calculated using formula four;

wherein:

$R$ represents the potential reward values of different actions;

$\alpha_1$ and $\alpha_2$ represent the two predicted apportioning values of the energy-consumption-to-thermal-comfort ratio, which is a known constant ($\alpha_1 + \alpha_2 = 1$);

$T_{\mathrm{in}}$ represents the real-time indoor temperature;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

$E$ represents the real-time energy consumption value;

$E_{\mathrm{set}}$ represents the standard energy consumption value, a constant specified by standard specifications;

a known temperature-difference transformation function converts the temperature difference into a score: the larger the temperature difference, the lower the score;

a known energy-difference transformation function converts the energy difference into a score: the larger the energy difference, the lower the score.
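Formula four itself is not reproduced in the text. Assuming it is a weighted sum of the two transformed differences, $R = \alpha_1 g_T(|T_{\mathrm{in}} - T_{\mathrm{inset}}|) + \alpha_2 g_E(E - E_{\mathrm{set}})$, a sketch might look like the following. All of the following are hypothetical: the exponential shapes of $g_T$ and $g_E$, their scales, the assignment of $\alpha_1$ to comfort and $\alpha_2$ to energy, and the choice to score any consumption at or below $E_{\mathrm{set}}$ as 1.0.

```python
import math


def temperature_score(t_in, t_inset, scale=1.0):
    """Hypothetical temperature-difference transform: 1.0 at the set point, decaying with |dT|."""
    return math.exp(-abs(t_in - t_inset) / scale)


def energy_score(e, e_set, scale=10.0):
    """Hypothetical energy-difference transform: 1.0 at or below E_set, decaying with the overrun."""
    return math.exp(-max(e - e_set, 0.0) / scale)


def reward(t_in, t_inset, e, e_set, alpha1=0.6, alpha2=0.4):
    """Assumed form of formula four: weighted sum of comfort and energy scores."""
    if not math.isclose(alpha1 + alpha2, 1.0):
        raise ValueError("alpha1 + alpha2 must equal 1")
    return alpha1 * temperature_score(t_in, t_inset) + alpha2 * energy_score(e, e_set)
```

Both scores lie in (0, 1], so the reward also stays in (0, 1] whatever split between $\alpha_1$ and $\alpha_2$ is chosen.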

Preferably, in the intelligent learning process, the controlled object (the water pump) is treated as an agent. The cumulative influence of the future sequence of states on the current choice is taken into account according to a set coefficient, and the immediate reward of the current action is combined with the cumulative reward of future states to obtain evaluation values for the agent's different states and actions, as expressed by formula five:

$$Q = \alpha R_{\mathrm{step}} + \beta Q_{\mathrm{step\_next}} \qquad \text{(formula five)}$$

wherein:

$Q$ represents the action evaluation value;

$R_{\mathrm{step}}$ represents the immediate reward obtained by the current action, calculated by formula four;

$Q_{\mathrm{step\_next}}$ represents the cumulative reward of future states, accumulated according to formula four;

$\alpha$ denotes a conversion factor, $\alpha = 0.1$;

$\beta$ denotes a conversion factor, $\beta = 0.5$.
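Formula five can be sketched as a tabular update over (state, action) pairs. Taking $Q_{\mathrm{step\_next}}$ as the best evaluation value available from the next state is an assumption, as are the state and action labels used in any example.

```python
from collections import defaultdict

ALPHA = 0.1  # conversion factor on the immediate reward (formula five)
BETA = 0.5   # conversion factor on the future cumulative reward (formula five)


def update_evaluation(q_table, state, action, reward, next_state, actions):
    """Formula five: Q = ALPHA * R_step + BETA * Q_step_next.

    Q_step_next is assumed to be the largest evaluation value reachable from
    next_state; the text only says it accumulates future rewards.
    """
    q_next = max(q_table[(next_state, a)] for a in actions)
    q_table[(state, action)] = ALPHA * reward + BETA * q_next
    return q_table[(state, action)]


# Evaluation value table over <state, action> pairs; unseen pairs start at 0.0.
q_table = defaultdict(float)
```

As written, formula five replaces the stored value instead of blending it with the old one, as the textbook Q-learning update $Q \leftarrow Q + \eta\,(R + \gamma \max Q' - Q)$ does. With $\alpha = 0.1$ and $\beta = 0.5$, repeated updates on a single self-looping pair settle at $0.1R / (1 - 0.5) = 0.2R$.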

Preferably, in the action selection process, the agent draws on the evaluation values of its different states and actions and follows a greedy strategy. Under that strategy, the action with the largest evaluation value is selected as the output action signal, and otherwise an action with a non-maximal evaluation value is selected as a seed for exploration; this is embodied in formula six;

wherein:

$a$ represents the next action of the agent in the current state;

$\varepsilon$ represents the set greedy value ($0 < \varepsilon \le 1$, typically greater than 0.5);

$\alpha'$ and $\alpha''$ represent computer-generated pseudo-random numbers ($0 < \alpha' \le 1$, $0 < \alpha'' \le 1$).
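Formula six is not reproduced in the text. One reading consistent with its symbols is sketched below. A first pseudo-random number $\alpha'$ is compared with $\varepsilon$ to decide between exploiting and exploring. A second, $\alpha''$, picks one of the non-maximal actions. This mapping, and the uniform choice among non-maximal actions, are assumptions. Because $\varepsilon$ is described as typically greater than 0.5, it is treated here as the probability of taking the greedy action, the reverse of the usual epsilon-greedy convention.

```python
import random


def select_action(q_table, state, actions, epsilon=0.8, rng=random):
    """Assumed reading of formula six.

    epsilon is treated as the probability of taking the greedy action
    (the text says it is typically above 0.5).
    """
    # Rank actions by evaluation value; unseen pairs default to 0.0.
    ranked = sorted(actions, key=lambda a: q_table.get((state, a), 0.0), reverse=True)
    alpha_p = rng.random()            # plays the role of alpha'
    if alpha_p <= epsilon or len(ranked) == 1:
        return ranked[0]              # action with the largest evaluation value
    alpha_pp = rng.random()           # plays the role of alpha''
    others = ranked[1:]               # actions with non-maximal evaluation values
    return others[min(int(alpha_pp * len(others)), len(others) - 1)]
```

Passing a seeded `random.Random(...)` as `rng` makes runs reproducible.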

Preferably, in the environment interaction process, after the agent outputs the corresponding action signal, the air conditioning system interacts with the environment and new environment information is obtained. The intelligent control module then repeats reading environment information, calculating action reward values, intelligent learning, action selection and environment interaction, so that control of the water pump equipment forms a closed loop.
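The five steps can be tied into one closed loop, as in the sketch below. Several parts are hypothetical stand-ins chosen only so that the loop runs: the `ToyPlant` dynamics, the energy model, the state discretization, the reward weights and the exploration rule. None of them are taken from the text.

```python
import math
import random
from collections import defaultdict

ACTIONS = (-10.0, 0.0, 10.0)        # hypothetical speed adjustments, % of rated speed
ALPHA, BETA, EPSILON = 0.1, 0.5, 0.8


class ToyPlant:
    """Hypothetical first-order stand-in for the building and chilled-water loop."""

    def __init__(self, setpoint=25.0):
        self.setpoint = setpoint
        self.speed = 50.0   # pump speed, % of rated
        self.t_in = 28.0    # indoor temperature, deg C

    def read_state(self):
        # Reading environment information: deviation from the set point, 1 K bins.
        return round(self.t_in - self.setpoint)

    def apply(self, delta):
        # Environment interaction: apply the action and let the room respond.
        self.speed = min(100.0, max(0.0, self.speed + delta))
        target = 32.0 - 0.1 * self.speed        # more flow -> cooler room
        self.t_in += 0.5 * (target - self.t_in)
        return 20.0 * (self.speed / 100.0) ** 3  # pump power, kW (affinity-law shape)


def reward(plant, energy, e_set=8.0):
    # Assumed form of formula four; the 0.6/0.4 weights are illustrative.
    comfort = math.exp(-abs(plant.t_in - plant.setpoint))
    saving = math.exp(-max(energy - e_set, 0.0) / 10.0)
    return 0.6 * comfort + 0.4 * saving


def run(steps=500, seed=0):
    rng = random.Random(seed)
    plant, q = ToyPlant(), defaultdict(float)
    state = plant.read_state()
    for _ in range(steps):
        # Action selection: greedy with probability EPSILON, else a non-greedy action.
        ranked = sorted(ACTIONS, key=lambda a: q[(state, a)], reverse=True)
        action = ranked[0] if rng.random() <= EPSILON else rng.choice(ranked[1:])
        energy = plant.apply(action)
        next_state = plant.read_state()
        r = reward(plant, energy)                         # calculating the action reward value
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] = ALPHA * r + BETA * best_next  # intelligent learning (formula five)
        state = next_state                                 # close the loop
    return plant, q
```

Because the reward stays in (0, 1], every evaluation value in this loop remains between 0 and $0.1/(1 - 0.5) = 0.2$.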

Preferably, the outdoor environment sensing measurement module and the indoor environment sensing measurement module each include sensors. The outdoor module measures outdoor temperature, outdoor humidity and outdoor wind speed, and the indoor module measures indoor temperature and passenger flow. The sensors package the measured data and send it to the intelligent control module.

Preferably, the control of the water pump equipment by the water pump control system includes:

for water pump equipment without a basic control loop, i.e., water pump equipment not governed by a controller, the control object type is changed so that the water pump equipment is controlled directly, and the output is the control signal transmitted to the water pump equipment;

for water pump equipment with an existing basic control loop, i.e., governed by a controller, the bottom-layer controller outputs the control signal that drives the water pump equipment, and the output is the control signal sent to that controller.

Preferably, the water pump operation condition feedback module feeds back water pump operating condition parameters, including:

for water pump equipment without a basic control loop, i.e., not governed by a controller, the fed-back parameters are the water pump operating parameters;

for water pump equipment with an existing basic control loop, i.e., governed by a controller, both the water pump operating parameters and the operating parameters of the bottom-layer controller are fed back.
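The two control paths and the matching feedback rules above can be sketched as a small dispatcher. `Pump` and `BottomController` are hypothetical, and so is the idealized assumption that the controller tracks its set point exactly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pump:
    speed_pct: float = 0.0


@dataclass
class BottomController:
    """Hypothetical existing controller that drives the pump toward its set point."""
    pump: Pump
    setpoint_pct: float = 0.0
    enabled: bool = True

    def write_setpoint(self, value: float) -> None:
        self.setpoint_pct = value
        if self.enabled:
            self.pump.speed_pct = value  # idealized: tracks the set point exactly


def dispatch(command: float, pump: Pump, controller: Optional[BottomController]) -> str:
    """Route the optimizer's output depending on whether a basic control loop exists."""
    if controller is None:
        pump.speed_pct = command          # no controller: drive the pump directly
        return "pump"
    controller.write_setpoint(command)    # existing controller: send it a new set point
    return "controller"


def feedback(pump: Pump, controller: Optional[BottomController]) -> dict:
    """Operating-condition feedback: pump parameters, plus controller ones if present."""
    data = {"pump_speed_pct": pump.speed_pct}
    if controller is not None:
        data.update({"controller_enabled": controller.enabled,
                     "controller_setpoint_pct": controller.setpoint_pct})
    return data
```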

Preferably, the system further comprises a water pump operation model module, in which the water pump operation model is built from historical data collected over a sufficiently long period.

Drawings

FIG. 1 is a system block diagram of the present invention;

FIG. 2 is a flow chart of the intelligent control module controlling the water pump according to the present invention.

Detailed Description

The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.

Orientations referred to in this embodiment are based on the accompanying drawings of the specification.

As shown in FIG. 1, the invention discloses a water pump equipment operation control system of an air conditioning system based on reinforcement learning, comprising a water pump energy consumption measuring module, an outdoor environment sensing measurement module, an indoor environment sensing measurement module, a water pump control system, a water pump operation condition feedback module and an intelligent control module;

the indoor environment sensing measurement module measures passenger flow and the indoor environment to obtain passenger flow parameters, and a comprehensive feedback equivalent value is obtained from the outdoor environment parameters and the passenger flow parameters. Parameters reflecting the actual indoor occupancy are combined in the calculation with the outdoor environment parameters obtained by the outdoor environment sensing measurement module. The indoor occupant thermal comfort evaluation is then re-formulated according to set rules, which yields a comprehensive feedback equivalent value reflecting the indoor environment. For example, when the outdoor temperature rises and passenger flow is low, the indoor temperature set value can be raised appropriately;
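The example rule above can be expressed as a set point adjustment. The thresholds, step size and cap below are hypothetical values chosen for illustration; the text does not give them.

```python
def adjusted_setpoint(base_setpoint, t_out, passenger_flow,
                      hot_threshold=30.0, low_flow=50,
                      step_per_deg=0.2, max_raise=2.0):
    """Hypothetical rule: when it is hot outside and the space is lightly occupied,
    raise the indoor set point a little (capped at max_raise) to save pump energy."""
    if t_out > hot_threshold and passenger_flow < low_flow:
        return base_setpoint + min(max_raise, step_per_deg * (t_out - hot_threshold))
    return base_setpoint
```

For instance, with a 26 deg C base set point, 35 deg C outdoors and 10 people present, this rule returns 27 deg C.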

The water pump control system is used for controlling water pump equipment;

Preferably, as shown in FIG. 2, the control of the water pump equipment by the intelligent control module comprises reading environment information, calculating action reward values, an intelligent learning process, an action selection process and an environment interaction process;

reading environment information: the environment information comprises four kinds of data. These are outdoor environment information (such as outdoor temperature and wind speed), indoor environment information (such as indoor temperature and wind speed), and system operating energy consumption information (electricity or gas consumption, in kWh or the corresponding unit). The fourth is running state information of the bottom-layer controller (such as the controller working state (on/off) and the set values of its control points). The corresponding parameters are extracted from the environment information to obtain an optimal indoor set point;

the optimal indoor set point is obtained using the following formula:

$$K = f(K_1, K_2, K_3, K_4) \qquad \text{(formula two)}$$

$T_{\mathrm{out}}$ represents the real-time outdoor temperature;

$n$ represents the real-time outdoor wind speed;

$V_{\mathrm{out}}$ represents the real-time passenger flow;

$V_{\mathrm{in}}$ represents the real-time indoor wind speed;

$f(\cdot)$ represents the PMV calculation function of thermal comfort theory;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

Preferably, calculating the action reward value comprises deriving potential reward values of different actions from the comprehensive feedback equivalent value. First, a comprehensive feedback equivalent value of the indoor environment is calculated from the indoor and outdoor environment parameters according to a rule. This value and the operating energy consumption value are then combined in a certain proportion (determined by how the system is used) to obtain the potential reward values of different actions, specifically as follows:

the potential reward values of different actions are calculated using formula four;

wherein:

$R$ represents the potential reward values of different actions;

$T_{\mathrm{in}}$ represents the real-time indoor temperature;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

$E$ represents the real-time energy consumption value;

Preferably, in the intelligent learning process, an evaluation value table is kept for the agent's different <state, action> pairs. Following a greedy strategy, the action with the largest evaluation value is selected as the output action signal with a certain probability. An exploration probability is retained, and in the remaining cases an action with a non-maximal evaluation value is selected as a seed, which prevents an optimum located elsewhere from being missed. The evaluation value is expressed by formula five:

$$Q = \alpha R_{\mathrm{step}} + \beta Q_{\mathrm{step\_next}} \qquad \text{(formula five)}$$

wherein:

$Q$ represents the action evaluation value;

$Q_{\mathrm{step\_next}}$ represents the cumulative reward of future states, accumulated according to formula four;

$\alpha$ denotes a conversion factor, $\alpha = 0.1$;

$\beta$ denotes a conversion factor, $\beta = 0.5$.

Preferably, the system further comprises a water pump operation model module, in which the water pump operation model is built from historical data collected over a sufficiently long period. In practice, the algorithm may need a long running-in period after initialization. To address this, historical operating data of the system can be combined with an existing accurate water pump model to construct a virtual environment in which the algorithm is trained. An algorithm trained on the existing building parameters can then shorten the running-in time when the system is applied to the target building, so that optimization takes effect sooner. Because different water pump models yield different training results, the strengths and weaknesses of the models can be inferred and compared from those results. This approach shortens the running-in and commissioning between the algorithm and the building that is otherwise needed at the start of actual application. It can also serve as a verification platform for different water pump operation models, whose accuracy is evaluated by comparing their initial control effects.
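The text builds a water pump model from historical data to drive a virtual training environment, and compares models by their training results. As a simpler proxy, the sketch below fits two candidate pump power models to hypothetical historical (speed, power) records and ranks them by held-out error. The two candidates are an assumed affinity-law cubic and a linear model. The winning model could then serve as the plant inside a virtual environment for pre-training. The model forms, the data and the use of prediction error in place of training results are all assumptions.

```python
def fit_cubic(speeds, powers):
    """Least-squares fit of P = c * s**3 (pump affinity-law shape); closed form for c."""
    c = sum(p * s ** 3 for s, p in zip(speeds, powers)) / sum(s ** 6 for s in speeds)
    return lambda s: c * s ** 3


def fit_linear(speeds, powers):
    """Ordinary least-squares fit of P = a + b * s, a simpler competing model."""
    n = len(speeds)
    ms, mp = sum(speeds) / n, sum(powers) / n
    b = (sum((s - ms) * (p - mp) for s, p in zip(speeds, powers))
         / sum((s - ms) ** 2 for s in speeds))
    a = mp - b * ms
    return lambda s: a + b * s


def rmse(model, speeds, powers):
    """Root-mean-square error of a model on a data set."""
    return (sum((model(s) - p) ** 2 for s, p in zip(speeds, powers)) / len(speeds)) ** 0.5


def compare_models(history, holdout):
    """Fit each candidate on historical records and score it on held-out records."""
    speeds, powers = zip(*history)
    test_s, test_p = zip(*holdout)
    scores = {name: rmse(fit(speeds, powers), test_s, test_p)
              for name, fit in (("cubic", fit_cubic), ("linear", fit_linear))}
    return min(scores, key=scores.get), scores
```

Here `speeds` are fractions of rated speed and `powers` are in kW. Both units are assumptions; any consistent pair works.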

The technical principle of the present invention is described above in connection with specific embodiments. The description is made for the purpose of illustrating the principles of the invention and should not be construed in any way as limiting the scope of the invention. Based on the explanations herein, those skilled in the art will be able to conceive of other embodiments of the present invention without inventive effort, which would fall within the scope of the present invention.

Claims

1. A water pump equipment operation control system of an air conditioning system based on reinforcement learning, characterized in that: the system comprises a water pump energy consumption measuring module, an outdoor environment sensing measurement module, an indoor environment sensing measurement module, a water pump control system, a water pump operation condition feedback module and an intelligent control module;

the water pump control system is used for controlling water pump equipment;

2. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 1, characterized in that:

the control of the water pump equipment by the intelligent control module comprises reading environment information, calculating action reward values, an intelligent learning process, an action selection process and an environment interaction process;

the optimal indoor set point is obtained using the following formula:

$$K = f(K_1, K_2, K_3, K_4) \qquad \text{(formula two)}$$

$T_{\mathrm{out}}$ represents the real-time outdoor temperature;

$n$ represents the real-time outdoor wind speed;

$V_{\mathrm{out}}$ represents the real-time passenger flow;

$V_{\mathrm{in}}$ represents the real-time indoor wind speed;

$f(\cdot)$ represents the PMV calculation function of thermal comfort theory;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

3. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 2, characterized in that:

calculating the action reward value comprises deriving potential reward values for different actions from the comprehensive feedback equivalent value, as follows:

the potential reward values of different actions are calculated using formula four;

wherein:

$R$ represents the potential reward values of different actions;

$T_{\mathrm{in}}$ represents the real-time indoor temperature;

$T_{\mathrm{inset}}$ represents the optimal indoor set point;

$E$ represents the real-time energy consumption value;

4. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 3, characterized in that:

the intelligent learning process: the controlled object (the water pump) is treated as an agent. The cumulative influence of the future sequence of states on the current choice is taken into account according to a set coefficient, and the immediate reward of the current action is combined with the cumulative reward of future states to obtain evaluation values for the agent's different states and actions, as expressed by formula five:

$$Q = \alpha R_{\mathrm{step}} + \beta Q_{\mathrm{step\_next}} \qquad \text{(formula five)}$$

wherein:

$Q$ represents the action evaluation value;

$\alpha$ denotes a conversion factor, $\alpha = 0.1$;

$\beta$ denotes a conversion factor, $\beta = 0.5$.

5. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 4, characterized in that:

the action selection process: the agent draws on the evaluation values of its different states and actions and follows a greedy strategy. Under that strategy, the action with the largest evaluation value is selected as the output action signal, and otherwise an action with a non-maximal evaluation value is selected as a seed for exploration; this is embodied in formula six:

wherein:

6. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 4, characterized in that:

the environment interaction process: after the agent outputs the corresponding action signal, the air conditioning system interacts with the environment and new environment information is obtained. The intelligent control module then repeats reading environment information, calculating action reward values, intelligent learning, action selection and environment interaction, so that control of the water pump equipment forms a closed loop.

7. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 1, characterized in that:

the outdoor environment sensing measurement module and the indoor environment sensing measurement module each include sensors. The outdoor module measures outdoor temperature, outdoor humidity and outdoor wind speed, and the indoor module measures indoor temperature and passenger flow. The sensors package the measured data and send it to the intelligent control module.

8. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 1, characterized in that:

the control of the water pump equipment by the water pump control system includes:

9. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 1, characterized in that:

the water pump operation condition feedback module feeds back water pump operating condition parameters, including:

10. The water pump equipment operation control system of an air conditioning system based on reinforcement learning according to claim 1, characterized in that:

the system further comprises a water pump operation model module, in which the water pump operation model is built from historical data collected over a sufficiently long period.