CN112032982A - Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm - Google Patents

Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm

Info

Publication number
CN112032982A
Authority
CN
China
Prior art keywords
value
indoor
action
state
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010851497.3A
Other languages
Chinese (zh)
Inventor
涂春光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Construct Forever Technology Co ltd
Original Assignee
Construct Forever Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Construct Forever Technology Co ltd filed Critical Construct Forever Technology Co ltd
Priority to CN202010851497.3A priority Critical patent/CN112032982A/en
Publication of CN112032982A publication Critical patent/CN112032982A/en
Pending legal-status Critical Current

Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/88: Electrical aspects, e.g. circuits
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D27/00: Simultaneous control of variables covered by two or more of main groups G05D1/00 - G05D25/00
    • G05D27/02: Simultaneous control of variables covered by two or more of main groups G05D1/00 - G05D25/00 characterised by the use of electric means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/11: Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00: Control inputs relating to air properties
    • F24F2110/10: Temperature
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00: Control inputs relating to air properties
    • F24F2110/10: Temperature
    • F24F2110/12: Temperature of the outside air
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00: Control inputs relating to air properties
    • F24F2110/50: Air quality properties
    • F24F2110/65: Concentration of specific substances or contaminants
    • F24F2110/70: Carbon dioxide

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computational Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Medical Informatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Combustion & Propulsion (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

The invention provides an indoor environment comfort improvement method based on the co-policy (on-policy) Monte Carlo algorithm. To improve the comfort of people indoors, office equipment such as the air-conditioning system, humidifier, lighting system and ventilation system is controlled with the co-policy Monte Carlo algorithm: simple models of the equipment are constructed, input parameters such as temperature, humidity, illuminance and carbon dioxide concentration are adjusted intelligently, and each parameter value is driven toward its set optimal value so that indoor comfort is optimized. Simulation experiments based on the constructed model show that: (1) the method achieves good convergence and stability under different parameter settings and clearly improves indoor environment comfort; (2) compared with a PID algorithm and a fuzzy control method, the algorithm converges faster and is more robust and more precise when controlling building equipment.

Description

Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm
Technical Field
The invention belongs to the field of indoor environment improvement, and particularly relates to an indoor environment comfort improvement method based on the co-policy Monte Carlo algorithm.
Background
The indoor environment has a decisive effect on human comfort, and as economic development and living standards continue to rise, indoor environment problems become increasingly prominent. According to research by scholars at home and abroad, the efficiency of indoor workers improves by 15-20% when indoor environment quality is improved. Within the indoor environment, the influence of the thermal-humidity environment, the light environment and the air quality on occupants is particularly prominent. Therefore, improving the indoor thermal-humidity environment, light environment and air quality by regulating indoor equipment also improves the comfort of the people in the room.
When relevant systems in a building are studied and controlled, common methods such as fuzzy control and PID control are typically adopted. These traditional methods suffer from slow convergence or poor convergence performance when a complex system or multiple controlled objects must be controlled.
Disclosure of Invention
The invention provides a controller based on the co-policy (on-policy) Monte Carlo (OMC) algorithm, which is used to control relevant equipment in a building. The Monte Carlo algorithm is a reinforcement learning algorithm in which a reward value, obtained from states and actions, is used to evaluate the quality of a policy.
In order to achieve the purpose, the invention mainly provides the following technical scheme:
an indoor environment comfort level improving method based on a co-policy Monte Carlo algorithm comprises the following steps:
S1, establishing a reward function and a state-transition function;
S2, initializing the action-value function Q(s_t, a_t), the learning rate α and the discount rate γ, where s is the state parameter, formed from the room temperature T_r, the indoor carbon dioxide concentration ρ_t, the indoor illuminance I_t, the indoor humidity H_t and the real-time energy consumption E_t; a is the action parameter, composed of the air-conditioning system action, the lighting system action, the humidifier and dehumidifier actions, and the ventilation system action;
S3, setting the parameters of each episode, including N = 4000 unit time steps and t = 0, i.e. each state and action parameter is kept at its initial value;
S4, running each time step within each episode: from the current state s_t, compute the action factor a_t; when this action is taken, compute the state transition according to the established state-transition function to obtain the corresponding next-moment state s_(t+1); then, according to the established reward function, compute the reward value r_t under state s_t and action factor a_t;
S5, judging the termination condition: check whether the action-value function has reached its preset values under all state factors; if not, return to step S3 and run a new episode; if so, end the loop.
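For illustration only, steps S1 to S5 can be organized as the on-policy Monte Carlo control loop sketched below in Python. None of the identifiers appear in the patent: reward_fn and transition_fn stand for the functions established in step S1, actions is the finite action set, and an ε-greedy selector is assumed here so that the behaviour policy and the improved policy coincide (the "co-policy" property); the patent does not state which exploration mechanism is used.

```python
import random
from collections import defaultdict

def omc_control(reward_fn, transition_fn, actions, s0,
                alpha=0.1, gamma=0.9, epsilon=0.1,
                episodes=200, steps=4000):
    """On-policy Monte Carlo control loop organized along steps S1-S5."""
    Q = defaultdict(float)                       # S2: action-value function Q(s_t, a_t)

    for _ in range(episodes):                    # S3: start a new episode at t = 0
        s, trajectory = s0, []
        for _ in range(steps):                   # S4: one unit time step
            if random.random() < epsilon:        # epsilon-greedy selection keeps the
                a = random.choice(actions)       # behaviour and target policy identical
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next = transition_fn(s, a)         # state-transition function from S1
            r = reward_fn(s_next)                # reward function from S1
            trajectory.append((s, a, r))
            s = s_next

        G = 0.0                                  # Monte Carlo return, accumulated backwards
        for s_t, a_t, r_t in reversed(trajectory):
            G = r_t + gamma * G
            Q[(s_t, a_t)] += alpha * (G - Q[(s_t, a_t)])   # every-visit update

        # S5: a check of Q against the preset values would decide whether to stop here
    return Q
```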
Further, in step S1, the reward function is established as in formulas (1) to (5) and the state-transition function as in formulas (6) to (10):
[Formulas (1) to (4), rendered as images in the original document: the comfort-deviation terms w1(T), w2(h), w3(I) and w4(CO2) for temperature, humidity, illuminance and CO2 concentration; their parameters are explained below.]
r = -w1(T) - w2(h) - w3(I) - w4(CO2)    (5)
T(t+1) = T(t) - [(-1)^(AC/2) × Tc × (1 - 0.2×VS)]    (6)
[Formula (7), rendered as an image in the original document: definition of the temperature change rate Tc as a function of the air-conditioner wind strength.]
h(t+1) = h(t) + 0.1×H - 0.1×DH    (8)
ρ(t+1) = ρ(t) - 0.2×VS    (9)
I(t+1) = I(t) + (-1)^(L%2) × 0.1×L    (10);
wherein the environmental state is s = [T1, h1, ρ1, I1], and the parameters are as shown in formulas (1) to (5);
in formula (1), T_s is the set most comfortable temperature and T_max is the maximum value of the range;
in formula (2), h_s is the most suitable indoor relative humidity, and the denominator is the difference obtained by subtracting the optimum humidity value h_s from the maximum value h_max of the value range;
in formula (3), the illuminance reference plane is a horizontal plane at a height of 0.75 m, I_s denotes the optimum indoor illuminance, and I_max is the set maximum illuminance value beyond which the human eye feels uncomfortable; the denominator is the difference between the two;
in formula (4), ρ_s is the set target value, i.e. the lowest level that the outdoor CO2 concentration can reach, and ρ_max is a set maximum value beyond which comfort disappears;
in formula (5), the value of r is the final evaluation criterion of the system and is kept within [-1, 0]; formula (5) is the superposition of the reward terms of the various parameters under different weights. In formulas (1) to (4), the larger the deviation of a parameter from its set value, the closer r is to -1 (the smaller r becomes), and conversely the larger r is; this is why formula (5) carries negative signs. The weight vector w = [0.6, 0.1, 0.1, 0.2] was obtained through many experiments; it ensures that r stays within [-1, 0] and that the system maintains good performance;
in the algorithm, the state-transition function is given by formulas (6) to (10). Formula (6) represents the change of temperature over time; because opening the ventilation system while the air conditioner is running affects the indoor temperature, this influence is reflected in the equation through the weakening factor 0.2. In formula (7), T_c is the temperature change rate, which is related to the strength of the wind produced by the air conditioner. Formulas (8), (9) and (10) are the state-transition functions for humidity, CO2 concentration and illuminance respectively.
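Because formulas (1) to (4) are reproduced only as images, the sketch below reconstructs them from the textual description as normalized absolute deviations from the comfort setpoints; this reconstruction, and the default numeric bounds (taken from the experiment section later in the description), are assumptions, while formula (5) and the weight vector are as stated above.

```python
def reward(T, h, rho, I,
           T_s=25.0, T_max=40.0,          # comfort / maximum temperature (deg C)
           h_s=50.0, h_max=100.0,         # comfort / maximum relative humidity (HR)
           rho_s=300.0, rho_max=1000.0,   # target / maximum CO2 concentration (ppm)
           I_s=300.0, I_max=800.0,        # comfort / maximum illuminance (lx)
           w=(0.6, 0.1, 0.1, 0.2)):
    """Reward r kept near [-1, 0]: larger deviations drive r toward -1 (formula (5))."""
    d_T   = abs(T - T_s)     / (T_max - T_s)        # assumed form of formula (1)
    d_h   = abs(h - h_s)     / (h_max - h_s)        # assumed form of formula (2)
    d_I   = abs(I - I_s)     / (I_max - I_s)        # assumed form of formula (3)
    d_co2 = abs(rho - rho_s) / (rho_max - rho_s)    # assumed form of formula (4)
    return -(w[0] * d_T + w[1] * d_h + w[2] * d_I + w[3] * d_co2)   # formula (5)
```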
Further, the learning rate and the discount rate are set to α = 0.1 and γ = 0.9.
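With these settings, the constant-α Monte Carlo update of the action-value function takes the standard form (written in standard reinforcement-learning notation; the patent states the parameter values but not this equation explicitly):

G_t = r_(t+1) + γ·r_(t+2) + … + γ^(N-t-1)·r_N,    Q(s_t, a_t) ← Q(s_t, a_t) + α·[G_t - Q(s_t, a_t)],    with α = 0.1 and γ = 0.9.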
The invention has the beneficial effects that:
in order to improve the indoor comfort of people, the equipment such as an air conditioning system, a humidifier, a lighting system and a ventilation system in an office is controlled based on the Simmonte Carlo algorithm, simple model construction is carried out on the equipment, parameters such as input temperature and humidity, illumination intensity and carbon dioxide concentration are intelligently adjusted, and then all parameter values are controlled to be set optimal values to optimize indoor comfort.
The invention carries out simulation experiments based on the constructed model, and the results show that: (1) the method achieves good convergence and stability under different parameter settings and clearly improves indoor environment comfort; (2) compared with a PID algorithm and a fuzzy control method, the algorithm converges faster and is more robust and more precise when controlling building equipment.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a logic diagram of a device state change based on the co-policy Monte Carlo algorithm according to the present invention;
FIG. 2 is a flow chart of an algorithm framework based on the co-policy Monte Carlo algorithm of the present invention;
FIG. 3 is an indoor temperature convergence diagram based on the co-policy Monte Carlo algorithm of the present invention;
FIG. 4 is a graph of convergence of indoor humidity based on the co-policy Monte Carlo algorithm of the present invention;
FIG. 5 is an indoor CO2 concentration convergence diagram based on the co-policy Monte Carlo algorithm of the present invention;
FIG. 6 is a graph of convergence of indoor illuminance based on the co-policy Monte Carlo algorithm according to the present invention;
FIG. 7 is a graph of the variation of the reward values over 200 episodes based on the co-policy Monte Carlo algorithm of the present invention;
FIG. 8 is a graph of the convergence steps over 200 episodes based on the co-policy Monte Carlo algorithm of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Optimization control based on the co-policy Monte Carlo algorithm:
reinforcement learning is learning how to map a scene to an action to obtain a maximum value reward signal. The reinforcement learning process for solving the problem is simply a process in which an agent (agent) takes action (action) to change its state (state) to obtain a return value (rewarded) and continuously interacts with an environment (environment-ment), the reinforcement learning includes a plurality of different algorithms, and whether a model is needed or not is an important feature for distinguishing the algorithms, wherein the samesian monte carlo method is an algorithm which does not need the model and only needs experience-to obtain the state, the action and the return from online or simulated interaction with the environment.
Factors affecting indoor comfort mainly include the indoor thermal-humidity environment, the light environment and the indoor air quality, so the state factors to be considered are: indoor temperature, carbon dioxide concentration, relative humidity and illuminance. Indoor temperature is regulated by the air-conditioning equipment, carbon dioxide concentration is controlled through the ventilation system, humidity is changed by the humidifier and dehumidifier, and illuminance is regulated through the lighting equipment. The action factors are therefore the operating conditions of the air-conditioning system, ventilation system, humidifier, dehumidifier and lighting system.
Improving the comfort of the indoor environment must be analyzed in terms of the indoor thermal-humidity environment, the light environment and the air quality. Within the thermal-humidity environment, the dry-bulb temperature and the relative humidity have the most prominent influence on comfort; the indoor light environment depends on the illumination conditions; and among air-quality factors, the carbon dioxide concentration carries the largest weight. For the agent, the external environment is assumed to be an independent ordinary office characterized by only 4 parameters: temperature, humidity, illuminance and carbon dioxide concentration. The parameters involved are: indoor temperature T (°C), set within the range [Tmin, Tmax], where Tmin is the set minimum temperature value and Tmax the set maximum temperature value; indoor relative humidity H (HR) (relative humidity is a percentage and is expressed directly as an integer in the invention to simplify the parameters), set within the range [hmin, hmax], where hmin is the minimum value and hmax the set maximum value; indoor illuminance I (lx), within the range [Imin, Imax]; and indoor carbon dioxide concentration CO2 (ppm), set within the range [Pmin, Pmax].
If the value of any of these parameters exceeds the corresponding set maximum, occupants feel uncomfortable. To meet the requirement for environmental comfort, each parameter must be given a comfort setpoint, and that setpoint must lie within the given range.
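The ranges and comfort setpoints just described can be collected in one configuration structure; the numeric values below are those quoted later in the experiment section, and the class and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComfortConfig:
    """Parameter ranges [min, max] and comfort setpoints (values from the experiments)."""
    T_range: tuple = (0.0, 40.0)        # indoor temperature T, deg C, [Tmin, Tmax]
    h_range: tuple = (0.0, 100.0)       # indoor relative humidity H, HR, [hmin, hmax]
    I_range: tuple = (0.0, 800.0)       # indoor illuminance I, lx, [Imin, Imax]
    rho_range: tuple = (200.0, 1000.0)  # indoor CO2 concentration, ppm, [Pmin, Pmax]
    T_set: float = 25.0                 # most comfortable temperature
    h_set: float = 50.0                 # most comfortable humidity
    I_set: float = 300.0                # most comfortable illuminance
    rho_set: float = 300.0              # CO2 target concentration
```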
Modeling an algorithm framework:
As shown in FIG. 1, the controlled objects in the present invention are the air conditioner, dimmable lighting devices, ventilation system, humidifier and dehumidifier; a change of the environmental state is realized through a change of state of these controlled objects. The controlled equipment, such as the air-conditioning system and ventilation system, selects device actions according to the current environmental state so as to change the device state.
FIG. 2 shows the basic flow of the method. The state at a given moment comprises the indoor temperature, humidity, CO2 concentration, illuminance and so on. According to the environmental state at the current moment, the policy selector and action selector generate the actions for the next moment, including the actions of the air-conditioning system, ventilation system, lighting system, dehumidifier and humidifier; the action executor then evaluates and improves the policy until the optimal policy is reached.
At each time step t, the agent observes an environment state s_t ∈ S, where S is the set of all possible states. On this basis an action a_t ∈ A(s_t) is selected according to the policy (the state-to-action mapping S → A), where A(s_t) is the set of available actions. After one time step the agent receives a reward value r: S × A → R and the next state s_(t+1), and the policy is evaluated and improved according to the reward value.
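The interaction just described can be wrapped in a small environment interface; the class below is a schematic sketch (not code disclosed in the patent) that pairs a state-transition function with a reward function to produce (s_(t+1), r) for each action:

```python
class IndoorEnv:
    """Minimal agent-environment interface: step(a) returns (next_state, reward)."""

    def __init__(self, transition_fn, reward_fn, initial_state):
        self.transition_fn = transition_fn   # maps (s, a) -> s'
        self.reward_fn = reward_fn           # maps s' -> r
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        """Return to the initial state s0 at the start of an episode."""
        self.state = self.initial_state
        return self.state

    def step(self, action):
        """Apply one action for one unit time step."""
        next_state = self.transition_fn(self.state, action)
        r = self.reward_fn(next_state)
        self.state = next_state
        return next_state, r
```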
Designing an algorithm framework:
The actions a that change device states in the algorithm are modeled as a matrix whose horizontal dimension is a five-dimensional vector representing the actions of the various devices.
The first dimension AC (air conditioning) represents the action of the air conditioner and can be written as the vector a1 = [a10, a11, a12, a13, a14], with 5 actions in total: 0 means off, 1 hot air (low wind), 2 cold air (low wind), 3 hot air (high wind) and 4 cold air (high wind).
The second dimension VS (ventilation system) represents the action of the ventilation system, written as a2 = [a20, a21, a22], with 3 actions: 0 means off, 1 low gear and 2 high gear.
The third dimension H (humidifier) represents the humidifier action a3 = [a30, a31, a32], with 3 actions: 0 means off, 1 low and 2 high.
The fourth dimension DH (dehumidifier) represents the action of the dehumidifier, written as a4 = [a40, a41, a42]: 0 means off, 1 low gear and 2 high gear.
The fifth dimension L (light) represents the action of the luminaire, written as a5 = [a50, a51, a52]: 0 means off, 1 increase illuminance and 2 decrease illuminance.
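Enumerating the five dimensions listed above yields a finite joint action set of 5 x 3 x 3 x 3 x 3 = 405 actions; a sketch of that enumeration (variable names are illustrative):

```python
from itertools import product

AC_ACTIONS = range(5)   # 0 off, 1 hot air (low), 2 cold air (low), 3 hot air (high), 4 cold air (high)
VS_ACTIONS = range(3)   # 0 off, 1 low gear, 2 high gear
H_ACTIONS  = range(3)   # 0 off, 1 low, 2 high
DH_ACTIONS = range(3)   # 0 off, 1 low gear, 2 high gear
L_ACTIONS  = range(3)   # 0 off, 1 increase illuminance, 2 decrease illuminance

# each joint action is an (AC, VS, H, DH, L) tuple
ACTIONS = list(product(AC_ACTIONS, VS_ACTIONS, H_ACTIONS, DH_ACTIONS, L_ACTIONS))
assert len(ACTIONS) == 405
```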
[Formulas (1) to (4), rendered as images in the original document: the comfort-deviation terms w1(T), w2(h), w3(I) and w4(CO2) for temperature, humidity, illuminance and CO2 concentration; their parameters are explained below.]
r = -w1(T) - w2(h) - w3(I) - w4(CO2)    (5)
T(t+1) = T(t) - [(-1)^(AC/2) × Tc × (1 - 0.2×VS)]    (6)
[Formula (7), rendered as an image in the original document: definition of the temperature change rate Tc as a function of the air-conditioner wind strength.]
h(t+1) = h(t) + 0.1×H - 0.1×DH    (8)
ρ(t+1) = ρ(t) - 0.2×VS    (9)
I(t+1) = I(t) + (-1)^(L%2) × 0.1×L    (10)
In the OMC algorithm, the environmental state is s = [T1, h1, ρ1, I1], and the parameters are as shown in formulas (1) to (5);
in formula (1), T_s is the set most comfortable temperature and T_max is the maximum value of the range;
in formula (2), h_s is the most suitable indoor relative humidity, and the denominator is the difference obtained by subtracting the optimum humidity value h_s from the maximum value h_max of the value range;
in formula (3), the illuminance reference plane is a horizontal plane at a height of 0.75 m, I_s denotes the optimum indoor illuminance, and I_max is the set maximum illuminance value beyond which the human eye feels uncomfortable; the denominator is the difference between the two;
in formula (4), ρ_s is the set target value, i.e. the lowest level that the outdoor CO2 concentration can reach, and ρ_max is a set maximum value beyond which comfort disappears;
in formula (5), the value of r is the final evaluation criterion of the system and is kept within [-1, 0]; formula (5) is the superposition of the reward terms of the various parameters under different weights. In formulas (1) to (4), the larger the deviation of a parameter from its set value, the closer r is to -1 (the smaller r becomes), and conversely the larger r is; this is why formula (5) carries negative signs. The weight vector w = [0.6, 0.1, 0.1, 0.2] was obtained through many experiments; it ensures that r stays within [-1, 0] and that the system maintains good performance;
in the algorithm, the state-transition function is given by formulas (6) to (10). Formula (6) represents the change of temperature over time; because opening the ventilation system while the air conditioner is running affects the indoor temperature, this influence is reflected in the equation through the weakening factor 0.2. In formula (7), T_c is the temperature change rate, which is related to the strength of the wind produced by the air conditioner. Formulas (8), (9) and (10) are the state-transition functions for humidity, CO2 concentration and illuminance respectively.
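Formulas (6) and (8) to (10) translate almost directly into code. Formula (7), which defines the temperature change rate Tc from the air-conditioner wind strength, is reproduced only as an image, so the two rate values below and the sign convention (hot-air actions raise the temperature, cold-air actions lower it) are assumptions made for this sketch:

```python
def transition(state, action, Tc_low=0.1, Tc_high=0.2):
    """One-step state transition following formulas (6)-(10).

    state = (T, h, rho, I); action = (AC, VS, H, DH, L).
    Tc_low / Tc_high stand in for formula (7): the temperature change rate
    depends on the air-conditioner wind strength (low or high).
    """
    T, h, rho, I = state
    AC, VS, H, DH, L = action

    if AC != 0:                                   # formula (6): air conditioner running
        Tc = Tc_high if AC >= 3 else Tc_low       # stronger wind -> faster change (assumed)
        sign = 1.0 if AC in (1, 3) else -1.0      # hot air heats, cold air cools (assumed)
        T = T + sign * Tc * (1 - 0.2 * VS)        # open ventilation weakens the effect by 0.2*VS

    h = h + 0.1 * H - 0.1 * DH                    # formula (8)
    rho = rho - 0.2 * VS                          # formula (9)
    I = I + ((-1) ** (L % 2)) * 0.1 * L           # formula (10), as written
    return (T, h, rho, I)
```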
Control algorithm:
The algorithm flow is as follows:
1) Initialize r = 0 and the action a
2) For each episode, initialize the state s0 = (T0, h0, ρ0, I0)
3) Determine the next-moment state s' according to the state-transition function
4) Update the value of r according to formulas (1) to (5)
5) For each state s in the episode: a' ← argmax_a r(s, a)
6) Repeat for each episode until s satisfies the termination condition
In reinforcement learning this problem has no natural termination condition, so for convenience of experiment a fixed number of episodes is set; each episode has N unit time steps, and when t + 1 = N the episode ends.
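Putting the earlier sketches together, the simulation described in the next section could be launched roughly as follows; the call is illustrative and reuses the hypothetical omc_control, reward, transition and ACTIONS defined above.

```python
# initial state of experiment (a) from the result analysis: [T, h, rho, I] = [35, 70, 700, 100]
s0 = (35.0, 70.0, 700.0, 100.0)

Q = omc_control(reward_fn=lambda s: reward(*s),
                transition_fn=transition,
                actions=ACTIONS,
                s0=s0,
                alpha=0.1, gamma=0.9,
                episodes=200, steps=4000)   # 200 episodes, N = 4000 unit time steps each
```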
Simulation and result analysis:
The invention uses the OMC algorithm to optimize indoor environment comfort and control the relevant indoor equipment. To verify the validity of the algorithm, simulation experiments were performed in a Python 2.7 environment. The specific steps are as follows:
S1, establishing a reward function and a state-transition function;
S2, initializing the action-value function Q(s_t, a_t), the learning rate α and the discount rate γ, where s is the state parameter, formed from the room temperature T_r, the indoor carbon dioxide concentration ρ_t, the indoor illuminance I_t, the indoor humidity H_t and the real-time energy consumption E_t; a is the action parameter, composed of the air-conditioning system action, the lighting system action, the humidifier and dehumidifier actions, and the ventilation system action;
S3, setting the parameters of each episode, including N = 4000 unit time steps and t = 0, i.e. each state and action parameter is kept at its initial value;
S4, running each time step within each episode: from the current state s_t, compute the action factor a_t; when this action is taken, compute the state transition according to the established state-transition function to obtain the corresponding next-moment state s_(t+1); then, according to the established reward function, compute the reward value r_t under state s_t and action factor a_t;
S5, judging the termination condition: check whether the action-value function has reached its preset values under all state factors; if not, return to step S3 and run a new episode; if so, end the loop.
Analysis of experimental results:
the effectiveness of the Zygen Monte Carlo control algorithm is mainly verified and the convergence performance of the algorithm and PID control and fuzzy control is compared. 200 episodes are set in the present invention, and the number of steps of each episode is set to 4000 steps. Referring to the actual situation, the range set by each parameter is as follows: indoor temperature T (DEG C) is set within a range of 0, 40; indoor humidity H (HR), set range is [0, 100 ]; indoor illuminance I (Lx), set range is [0, 800 ]; the indoor carbon dioxide concentration ρ (ppm) is set in the range of [200, 1000 ].
The parameter values that satisfy indoor comfort are set as: temperature 25 °C, humidity 50 HR, CO2 concentration 300 ppm and illuminance 300 lx. Several groups of experiments were performed, of which two are selected to illustrate the convergence performance of the algorithm. The initial state of experiment (a) is sa = [35, 70, 700, 100], and the initial state of experiment (b) is sb = [10, 20, 850, 600]. The experimental data are shown in FIGS. 3, 4, 5, 6, 7 and 8.
The indoor thermal-humidity environment is an important factor affecting indoor comfort. FIGS. 3 and 4 show how the indoor temperature and humidity, respectively, change under the different control algorithms as the number of steps increases.
FIG. 3 shows the temperature convergence curves of the two experiments. FIG. 3(a) shows that in experiment (a) the OMC method converges to the set value of 25 °C in about 1800 steps and holds that value, with good accuracy and stability; in experiment (b), with a different initial state, the preset value is reached at about step 2200, with the same convergence quality as in experiment (a). By comparison, although in both experiments the PID and fuzzy algorithms show smoother falling or rising trends before reaching the preset value, the PID algorithm is unstable after convergence and fluctuates around the set temperature, while the fuzzy algorithm is stable but less accurate and cannot fully converge to the preset value. The experiments show that the OMC algorithm has better stability and convergence accuracy than the PID and fuzzy algorithms.
FIG. 4 shows the two indoor-humidity convergence curves: in FIG. 4(a) convergence is reached at about step 1000, and in FIG. 4(b), with a different initial humidity value, the OMC method converges to the set optimum humidity of 50 HR at about step 1600. PID reaches the preset comfort value at approximately steps 1200 and 1800 in experiments (a) and (b) respectively, but does not stay at it; the fuzzy algorithm starts to converge at approximately steps 1200 and 1600 respectively, but its convergence accuracy is not high. The results show that the OMC method outperforms PID and fuzzy control and can provide a comfortable indoor thermal-humidity environment.
FIG. 5 shows how the indoor CO2 concentration varies under the different control algorithms as the number of steps increases; the CO2 concentration reflects the indoor air quality. The difference between the two experiments in the CO2 curves is the initial state value: the initial value in experiment (a) is slightly lower than in experiment (b). OMC converges at approximately step 1000 in FIG. 5(a) and at approximately step 1400 in FIG. 5(b); PID converges at approximately step 1200 in FIG. 5(a) and at step 1600 in FIG. 5(b); the fuzzy algorithm begins to converge around step 1000 in experiment (a) and around step 1600 in experiment (b). Compared with OMC, both algorithms have poorer convergence speed and accuracy. The results show that the OMC method can regulate the ventilation system in a shorter time while keeping the indoor air environment good, thereby improving indoor air quality. The indoor light environment also greatly affects the comfort of people in the room.
FIG. 6 shows how the indoor illuminance changes under each control algorithm as the number of steps increases. The OMC method converges in about 1000 steps in the first experiment and about 1400 steps in the second. With the PID algorithm, FIG. 6(a) shows convergence around step 1200, while FIG. 6(b) converges around step 1500 and fluctuates around 300 lx. When the lighting system is controlled with the fuzzy algorithm, experiment (a) starts to converge at about step 1200 and experiment (b) at about step 1600, with some deviation of the converged value from the preset value. The results show that OMC control provides a good indoor light environment. Comparing the convergence curves of all parameters across the two OMC experiments, room temperature has the longest convergence time, which may be related to the ventilation system and the indoor humidity environment: adding more parameters and actions means a more complex control process and more convergence steps. From these groups of figures, the OMC method shows better convergence speed and accuracy than the PID and fuzzy algorithms.
FIG. 7 shows the convergence of the reward values over the 200 episodes. In experiment (a), the return fluctuates strongly in the first 50 episodes with an oscillation amplitude greater than 2000, during which the agent is in the trial-and-error stage; after the first 60 episodes of learning, the return gradually stabilizes at about -7000. FIG. 7(b) is the return-convergence curve of the second experiment, in which the return gradually stabilizes at about -1300 after roughly 100 episodes of learning.
FIG. 8 shows how the number of convergence steps varies over the 200 episodes. In the first graph, the convergence step count stays at its initial value for the first few episodes and then begins to change. It changes greatly between roughly episodes 3 and 60, indicating that the OMC is in its learning stage; between roughly episodes 60 and 90 the oscillation amplitude is small and the OMC is in its adjustment stage; after episode 90 the OMC converges at around 1400 steps, indicating that the system has found the optimal policy. In FIG. 8(b), the oscillation amplitude is large before episode 70, small between episodes 70 and 110, and after episode 110 convergence is reached at about 1600 steps.
In order to improve indoor comfort, office equipment such as the air-conditioning system, humidifier, lighting system and ventilation system is controlled based on the co-policy Monte Carlo algorithm: simple models of the equipment are constructed, input parameters such as temperature, humidity, illuminance and carbon dioxide concentration are adjusted intelligently, and each parameter value is driven toward its set optimal value so that indoor comfort is optimized.
The invention carries out simulation experiments based on the constructed model, and the results show that: (1) the method achieves good convergence and stability under different parameter settings and clearly improves indoor environment comfort; (2) compared with a PID algorithm and a fuzzy control method, the algorithm converges faster and is more robust and more precise when controlling building equipment.
The above examples are merely illustrative of several embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.

Claims (3)

1. An indoor environment comfort level improving method based on a co-policy Monte Carlo algorithm is characterized by comprising the following steps:
S1, establishing a reward function and a state-transition function;
S2, initializing the action-value function Q(s_t, a_t), the learning rate α and the discount rate γ, where s is the state parameter, formed from the room temperature T_r, the indoor carbon dioxide concentration ρ_t, the indoor illuminance I_t, the indoor humidity H_t and the real-time energy consumption E_t; a is the action parameter, composed of the air-conditioning system action, the lighting system action, the humidifier and dehumidifier actions, and the ventilation system action;
S3, setting the parameters of each episode, including N = 4000 unit time steps and t = 0, i.e. each state and action parameter is kept at its initial value;
S4, running each time step within each episode: from the current state s_t, compute the action factor a_t; when this action is taken, compute the state transition according to the established state-transition function to obtain the corresponding next-moment state s_(t+1); then, according to the established reward function, compute the reward value r_t under state s_t and action factor a_t;
S5, judging the termination condition: check whether the action-value function has reached its preset values under all state factors; if not, return to step S3 and run a new episode; if so, end the loop.
2. The method of claim 1, wherein in step S1 the reward function is established as in formulas (1) to (5) and the state-transition function as in formulas (6) to (10):
[Formulas (1) to (4), rendered as images in the original document: the comfort-deviation terms w1(T), w2(h), w3(I) and w4(CO2) for temperature, humidity, illuminance and CO2 concentration; their parameters are explained below.]
r = -w1(T) - w2(h) - w3(I) - w4(CO2)    (5)
T(t+1) = T(t) - [(-1)^(AC/2) × Tc × (1 - 0.2×VS)]    (6)
[Formula (7), rendered as an image in the original document: definition of the temperature change rate Tc as a function of the air-conditioner wind strength.]
h(t+1) = h(t) + 0.1×H - 0.1×DH    (8)
ρ(t+1) = ρ(t) - 0.2×VS    (9)
I(t+1) = I(t) + (-1)^(L%2) × 0.1×L    (10);
wherein the environmental state is s = [T1, h1, ρ1, I1], and the parameters are as shown in formulas (1) to (5);
in formula (1), T_s is the set most comfortable temperature and T_max is the maximum value of the range;
in formula (2), h_s is the most suitable indoor relative humidity, and the denominator is the difference obtained by subtracting the optimum humidity value h_s from the maximum value h_max of the value range;
in formula (3), the illuminance reference plane is a horizontal plane at a height of 0.75 m, I_s denotes the optimum indoor illuminance, and I_max is the set maximum illuminance value beyond which the human eye feels uncomfortable; the denominator is the difference between the two;
in formula (4), ρ_s is the set target value, i.e. the lowest level that the outdoor CO2 concentration can reach, and ρ_max is a set maximum value beyond which comfort disappears;
in formula (5), the value of r is the final evaluation criterion of the system and is kept within [-1, 0]; formula (5) is the superposition of the reward terms of the various parameters under different weights. In formulas (1) to (4), the larger the deviation of a parameter from its set value, the closer r is to -1 (the smaller r becomes), and conversely the larger r is; this is why formula (5) carries negative signs. The weight vector w = [0.6, 0.1, 0.1, 0.2] was obtained through many experiments; it ensures that r stays within [-1, 0] and that the system maintains good performance;
in the algorithm, the state-transition function is given by formulas (6) to (10). Formula (6) represents the change of temperature over time; because opening the ventilation system while the air conditioner is running affects the indoor temperature, this influence is reflected in the equation through the weakening factor 0.2. In formula (7), T_c is the temperature change rate, which is related to the strength of the wind produced by the air conditioner. Formulas (8), (9) and (10) are the state-transition functions for humidity, CO2 concentration and illuminance respectively;
the action a related to changing the equipment state in the algorithm is modeled as a matrix whose horizontal dimension is a five-dimensional vector representing the actions of the different devices; the first dimension AC (air conditioning) represents the action of the air conditioner; the second dimension VS (ventilation system) the action of the ventilation system; the third dimension H (humidifier) the action of the humidifier; the fourth dimension DH (dehumidifier) the action of the dehumidifier; and the fifth dimension L (light) the action of the luminaire.
3. The method of claim 1, wherein the learning rate and the discount rate are set to α = 0.1 and γ = 0.9.
CN202010851497.3A 2020-08-21 2020-08-21 Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm Pending CN112032982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010851497.3A CN112032982A (en) 2020-08-21 2020-08-21 Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010851497.3A CN112032982A (en) 2020-08-21 2020-08-21 Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm

Publications (1)

Publication Number Publication Date
CN112032982A true CN112032982A (en) 2020-12-04

Family

ID=73580480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010851497.3A Pending CN112032982A (en) 2020-08-21 2020-08-21 Indoor environment comfort level improving method based on co-policy Monte Carlo algorithm

Country Status (1)

Country Link
CN (1) CN112032982A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104697107A (en) * 2013-12-10 2015-06-10 财团法人工业技术研究院 Intelligent learning energy-saving regulation and control system and method
CN104764173A (en) * 2014-03-11 2015-07-08 北京博锐尚格节能技术股份有限公司 Method, device and system for monitoring heating and ventilation air conditioning system
US20170102162A1 (en) * 2015-10-08 2017-04-13 Johnson Controls Technology Company Building management system with electrical energy storage optimization based on statistical estimates of ibdr event probabilities
CN106707999A (en) * 2017-02-09 2017-05-24 苏州科技大学 Building energy-saving system based on self-adaptive controller, control method and simulation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113357787A (en) * 2021-06-28 2021-09-07 天津大学 Preference and habit based modeling method for air conditioning behavior of multi-person office personnel
CN113357787B (en) * 2021-06-28 2022-06-03 天津大学 Preference and habit based modeling method for air conditioning behavior of multi-person office personnel


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201204