CN111505944B - Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control - Google Patents

Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control

Info

Publication number
CN111505944B
CN111505944B (application CN201910091191.XA)
Authority
CN
China
Prior art keywords: value, action, state, energy, reward
Prior art date
Legal status
Active
Application number
CN201910091191.XA
Other languages
Chinese (zh)
Other versions
CN111505944A (en)
Inventor
谭建明
李绍斌
宋德超
陈翀
罗晓宇
邓家璧
王鹏飞
肖文轩
岳冬
Current Assignee
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN201910091191.XA
Publication of CN111505944A
Application granted
Publication of CN111505944B

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00Systems controlled by a computer
    • G05B15/02Systems controlled by a computer electric
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00Programme-control systems
    • G05B19/02Programme-control systems electric
    • G05B19/418Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS], computer integrated manufacturing [CIM]
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/20Pc systems
    • G05B2219/26Pc applications
    • G05B2219/2642Domotique, domestic, home control, automation, smart house

Abstract

The invention provides an energy-saving control strategy learning method, and a method and a device for realizing air conditioning energy control. The learning method combines the Monte Carlo method with reinforcement learning: it obtains an approximate solution by Monte Carlo sampling, executes a selected action on the current air conditioning environment, observes the resulting state transition and reward, estimates the return of each state as the sample mean of its observed returns, and finally obtains an optimal control strategy. The invention also provides a method for realizing air conditioning energy control based on this learning method. Through continuous interactive learning of the air conditioner's operating environment, the invention searches for the optimal control strategy, thereby achieving energy-saving control.

Description

Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control
Technical Field
The invention relates to the technical field of smart homes, in particular to an energy-saving control strategy learning method, and a method and a device for realizing air conditioning energy control.
Background
With the rapid development of science and technology, people are no longer satisfied with existing living conditions and increasingly pursue more comfortable living environments. As living standards improve, the air conditioner has become a necessary household appliance for more and more families. However, air conditioners consume a large amount of electricity, which is a serious concern for both consumers and manufacturers. In addition, existing air conditioner control methods mainly regulate temperature, and the complexity of the operating environment makes energy-saving control difficult to realize.
Disclosure of Invention
The invention provides an energy-saving control strategy learning method, and a method and a device for realizing air conditioning energy control, which search for an optimal control strategy through continuous interactive learning of the air conditioner's operating environment, thereby achieving energy-saving control of the air conditioner.
In a first aspect of the present invention, an energy saving control strategy learning method is provided, including:
s11, acquiring initial state parameters of the air conditioner, and determining an initial action value according to the initial state parameters;
s12, executing a control action corresponding to the initial action value, acquiring a target state parameter of the next state of the air conditioner and a generated energy-saving reward value after the control action is executed, and updating a sampling count value;
s13, searching a preset reward table according to the target state parameter to obtain a historical return value of a state action pair formed by the target state parameter and different preset action values, wherein the reward table comprises an energy-saving reward value and a historical return value of the state action pair formed by the state parameter and different preset action values;
s14, selecting a target action value in a state action pair formed by the target state parameters, wherein the probability that the state action pair corresponding to the target action value is the state action pair with the maximum historical return value in the formed state action pair is larger than a preset value;
s15, executing the control action corresponding to the target action value, and acquiring the generated target energy-saving reward value after the control action is executed;
s16, judging whether the sampling count value reaches a preset sampling threshold value;
if the sampling count value does not reach the preset sampling threshold value, repeatedly executing S12-S16, otherwise executing S17;
and S17, respectively counting the sampling mean value of the target energy-saving reward value of each state action pair formed by the target state parameters, taking the obtained sampling mean value as the estimated reward value of the corresponding state action pair, and updating the reward table according to the estimated reward value.
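One sampling round of the steps S11-S17 above can be sketched in Python. Everything below is a toy illustration, not the patent's actual air-conditioner interface: the environment, state discretization, action set, reward shape, and the ε-greedy form of the soft selection are all assumptions for the sketch.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1, 2]     # hypothetical preset action values (e.g. compressor speed levels)
N_SAMPLES = 50          # preset sampling threshold N

class ToyAcEnv:
    """Toy stand-in for the air conditioning environment: the state is a
    discretized room temperature."""
    def __init__(self):
        self.temp = 30
    def state(self):
        return self.temp
    def execute(self, action):
        # Larger actions cool faster but use more energy.
        self.temp = max(20, self.temp - action)
        comfort = -abs(self.temp - 25)   # closer to 25 C is more comfortable
        energy = -action                  # larger actions are penalized
        return self.temp, comfort + energy  # next state, energy-saving reward

def sampling_round(env, q_table, returns, epsilon=0.1, rng=random.Random(0)):
    """S12-S17: act, observe state and reward, count samples, then
    replace each pair's return estimate with its sample mean."""
    n = 0                                               # sampling count value
    while n < N_SAMPLES:                                # S16: check the counter
        state = env.state()
        if rng.random() < epsilon:                      # S14: soft selection -
            action = rng.choice(ACTIONS)                # occasionally random,
        else:                                           # mostly greedy on Q
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        _, reward = env.execute(action)                 # S12/S15: act, observe
        returns[(state, action)].append(reward)
        n += 1
    for sa, samples in returns.items():                 # S17: sample means
        q_table[sa] = sum(samples) / len(samples)

q = defaultdict(float)   # unknown state-action pairs default to a return of 0
sampling_round(ToyAcEnv(), q, defaultdict(list))
```

Because every reward here is a sum of non-positive comfort and energy penalties, all learned return estimates are at most zero; the outer iteration of S18 would simply repeat this round with the counter reset.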
Optionally, after updating the reward table according to the estimated reward value, the method further comprises:
s18, updating an iteration count value, and judging whether the iteration count value reaches a preset iteration threshold value;
if the iteration count value does not reach the preset iteration threshold value, resetting the sampling count value, and repeatedly executing S12-S17, otherwise, ending the learning process.
Optionally, the selecting a target action value in a state action pair formed by the target state parameters includes:
and selecting the target action value from the state action pairs formed by the target state parameters by adopting a soft decision algorithm.
Optionally, the determining an initial action value according to the initial state parameter includes:
searching the reward table according to the initial state parameter;
and if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value.
Optionally, the method further comprises:
if a state action pair formed by the initial state parameters exists in the reward table, acquiring historical return values of the state action pair formed by the initial state parameters and different preset action values;
and selecting the action value of the state action pair with the maximum historical return value among the state action pairs formed by the initial state parameters, and taking the selected action value as the initial action value.
In a second aspect of the present invention, there is provided a method for implementing air conditioning energy control based on the energy saving control strategy learning method described above, including:
acquiring current state parameters of the air conditioner;
searching a reward table learned by the energy-saving control strategy learning method according to the current state parameter to obtain a historical return value of a state action pair formed by the current state parameter and different preset action values;
selecting an action value of a state action pair with the maximum historical return value among the state action pairs formed by the current state parameters, and taking the selected action value as an optimal action value;
and executing the control action corresponding to the optimal action value to realize the energy-saving control of the air conditioner.
In a third aspect of the present invention, there is provided an energy-saving control strategy learning apparatus, including:
the first decision module is used for acquiring initial state parameters of the air conditioner and determining an initial action value according to the initial state parameters;
the first execution module is used for executing the control action corresponding to the initial action value, acquiring the target state parameter of the next state of the air conditioner and the generated energy-saving reward value after the control action is executed, and updating a sampling count value;
the processing module is used for searching a preset reward table according to the target state parameter so as to obtain a historical return value of a state action pair formed by the target state parameter and different preset action values, and the reward table comprises an energy-saving reward value and a historical return value of the state action pair formed by the state parameter and different preset action values;
a second decision module, configured to select a target action value from a state action pair formed by the target state parameters, where a probability that a state action pair corresponding to the target action value is a state action pair with a largest historical return value in the formed state action pair is greater than a preset value;
the second execution module is used for executing the control action corresponding to the target action value and acquiring the generated target energy-saving reward value after the control action is executed;
the first judging module is used for judging whether the sampling count value reaches a preset sampling threshold value or not, and if the sampling count value does not reach the preset sampling threshold value, returning to the first executing module;
and the learning module is used for respectively counting, when the judgment result of the first judging module is that the sampling count value reaches the preset sampling threshold value, the sampling mean value of the target energy-saving reward value of each state action pair formed by the target state parameters, taking the obtained sampling mean value as the estimated reward value of the corresponding state action pair, and updating the reward table according to the estimated reward value.
Optionally, the learning module is further configured to update an iteration count value after updating the reward table according to the estimated reward value;
the device further comprises:
the second judgment module is used for judging whether the iteration count value reaches a preset iteration threshold value;
the learning module is further configured to reset the sampling count value when the iteration count value does not reach a preset iteration threshold value, return to the first execution module, and end the learning process when the iteration count value reaches the preset iteration threshold value.
Optionally, the first decision module is specifically configured to search the bonus table according to the initial state parameter; if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value; if the state action pair formed by the initial state parameters exists in the reward table, acquiring a historical return value of the state action pair formed by the initial state parameters and different preset action values, selecting an action value of the state action pair with the maximum historical return value among the state action pairs formed by the initial state parameters, and taking the selected action value as the initial action value.
A fourth aspect of the present invention provides an apparatus for implementing the air conditioning energy control based on the energy saving control strategy learning apparatus as described above, including:
the parameter acquisition module is used for acquiring current state parameters of the air conditioner;
the second processing module is used for searching a reward table learned by the energy-saving control strategy learning device according to the current state parameter so as to obtain a historical return value of a state action pair formed by the current state parameter and different preset action values;
a third decision module, configured to select an action value of a state action pair with a largest historical return value among state action pairs formed by the current state parameter, and use the selected action value as an optimal action value;
and the third execution module is used for executing the control action corresponding to the optimal action value to realize the energy-saving control of the air conditioner.
Furthermore, the invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any of the above.
The invention also provides an air conditioning device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of any one of the above methods.
The energy-saving control strategy learning method, and the method and device for realizing air conditioning energy control, combine the Monte Carlo method with reinforcement learning. The Monte Carlo sampling method is used to obtain an approximate solution: a selected action is executed on the current air conditioning environment, the resulting state transition and reward are observed, and through continuous interactive learning of the air conditioner's operating environment the return of each state is estimated as the sample mean of its observed returns, finally yielding an optimal control strategy that achieves energy-saving control.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the present invention are described below so that the technical means of the invention, and its above and other objects, features, and advantages, can be more clearly understood.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a schematic flowchart of a method for learning an energy-saving control strategy according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for learning an energy saving control strategy according to another embodiment of the present invention;
fig. 3 is a schematic flowchart of a method for implementing air conditioning energy control based on an energy-saving control strategy learning method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an energy-saving control strategy learning apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a device for implementing air conditioning energy control based on an energy-saving control strategy learning device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Reinforcement learning is receiving increasing attention in the field of artificial intelligence, with applications including industrial scheduling and path planning, and can be used to solve decision-making problems in the optimization of stochastic or uncertain dynamic systems. It has outstanding significance and broad prospects in both theory and practice. The invention uses a reinforcement learning framework to control the air conditioner. Reinforcement learning differs from other approaches (such as supervised neural networks) in that its learning target changes and is not predefined; absolutely correct labels may not even exist. The air conditioner control environment is complex, and its control targets are numerous and change dynamically with the environment. Therefore, reinforcement learning is used to learn the energy-saving control strategy: the algorithm continuously and interactively learns from the air conditioner's operating environment, searches for the optimal control strategy, and thereby achieves energy-saving control.
Fig. 1 schematically shows a flowchart of an energy saving control strategy learning method according to an embodiment of the present invention. Referring to fig. 1, the energy-saving control strategy learning method provided in the embodiment of the present invention specifically includes steps S11 to S17, as follows:
and S11, acquiring initial state parameters of the air conditioner, and determining an initial action value according to the initial state parameters.
Specifically, the state parameters of the air conditioner represent its current operating environment, including the inner-pipe temperature, the indoor temperature, the outdoor ambient temperature, and the like.
And S12, executing a control action corresponding to the initial action value, acquiring a target state parameter of the next state of the air conditioner and a generated energy-saving reward value after the control action is executed, and updating a sampling count value n.
In this embodiment, after the control action is executed, the target state parameter and the energy-saving reward value of the next state are acquired and stored in a preset reward table.
The energy-saving reward value is a composite index of comfort and energy saving fed back after the air conditioner is operated with a specific action, and is used to guide the algorithm in adjusting the return value, i.e., the Q value of the state-action value function. That is, after an action is executed, the air conditioner returns relevant data from which the quality of that action is calculated, guiding the algorithm to output more suitable actions and thereby achieve energy-saving control.
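The patent states only that the reward combines comfort and energy saving; one minimal way to express such a composite index is a weighted penalty, where the weights, the power measurement, and the function itself are illustrative assumptions:

```python
def eco_reward(indoor_temp, setpoint, power_watts,
               comfort_weight=1.0, energy_weight=0.01):
    """Hypothetical composite energy-saving reward: penalize both the
    deviation from the comfort setpoint and the electrical power drawn.
    The weighting scheme is an assumption, not the patent's formula."""
    comfort_penalty = abs(indoor_temp - setpoint)
    return -(comfort_weight * comfort_penalty + energy_weight * power_watts)
```

Under this sketch a perfectly comfortable, zero-power state scores 0, and every degree of discomfort or watt of consumption pushes the reward further negative, which is the shape the Q-value guidance described above needs.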
S13, searching a preset reward table according to the target state parameter to obtain a historical return value of a state action pair formed by the target state parameter and different preset action values, wherein the reward table comprises an energy-saving reward value and a historical return value of the state action pair formed by the state parameter and different preset action values.
Specifically, the action value represents a control parameter of the air conditioner, such as the compressor rotation speed, the electronic expansion valve opening degree, or a combination thereof. In this embodiment, different action values are preset.
S14, selecting a target action value in a state action pair formed by the target state parameters, wherein the probability that the state action pair corresponding to the target action value is the state action pair with the maximum historical return value in the formed state action pair is larger than a preset value.
In this embodiment, a soft decision algorithm may be adopted to select the target action value from the state action pairs formed by the target state parameters. Specifically, the optimal action, selected through the state-action value function Q, is the action value of the state action pair with the largest historical return value among those pairs, while a random action is an action value selected at random from the selectable state action pairs.
And S15, executing the control action corresponding to the target action value, and acquiring the generated target energy-saving reward value after the control action is executed.
And S16, judging whether the sampling count value n reaches a preset sampling threshold value N.
If the sampling count value n does not reach the preset sampling threshold value N, S12-S16 are repeated; otherwise S17 is executed;
where N denotes the length of one Monte Carlo sampling round, n denotes the sampling count value of the counter, and n starts counting from 1 in each round.
And S17, respectively counting the sampling mean value of the target energy-saving reward value of each state action pair formed by the target state parameters, taking the obtained sampling mean value as the estimated reward value of the corresponding state action pair, and updating the reward table according to the estimated reward value.
In this embodiment, the mean of the accumulated reward samples of each state-action pair is computed from the reward table and used as the estimated reward value of the corresponding state-action pair; the reward table is then updated so that the learned estimate replaces the historical return value of that pair.
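The S17 update reduces to replacing each pair's stored return with the sample mean of its accumulated rewards. A minimal sketch, assuming a `(state, action) -> return` table and a parallel map of accumulated reward samples:

```python
def update_reward_table(q_table, returns):
    """S17: for every state-action pair with accumulated samples, store
    the sample mean as the pair's new estimated return. The table and
    sample-map layouts are illustrative assumptions."""
    for sa, samples in returns.items():
        if samples:                       # skip pairs never sampled
            q_table[sa] = sum(samples) / len(samples)
    return q_table
```

After this call the historical return values read by S13 in the next round are the freshly learned estimates.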
Further, after the control action corresponding to the initial action value is executed in step S12, if the air conditioner is abnormal (for example, the compressor shuts down), the method records that the air conditioning environment parameters are abnormal and returns to step S12; otherwise, it proceeds to step S13 and continues the subsequent process.
The energy-saving control strategy learning method, and the method and device for realizing air conditioning energy control, combine the Monte Carlo method with reinforcement learning. The Monte Carlo sampling method is used to obtain an approximate solution: a selected action is executed on the current air conditioning environment, the resulting state transition and reward are observed, and through continuous interactive learning of the air conditioner's operating environment the return of each state is estimated as the sample mean of its observed returns, finally yielding an optimal control strategy that achieves energy-saving control.
In another embodiment of the present invention, referring to fig. 2, after updating the bonus table according to the estimated reward value, the method further comprises step S18:
and S18, updating an iteration count value, judging whether the iteration count value reaches a preset iteration threshold value, if the iteration count value does not reach the preset iteration threshold value, resetting the sampling count value, namely setting N to 0, and repeatedly executing S12-S17 until the repeated execution is carried out for N times, otherwise, ending the learning process.
In this embodiment, by determining whether the updated iteration count value satisfies the preset iteration threshold, if so, the process is ended; if the execution is not satisfied, the process is repeatedly executed from S12 to S17, and the learning is continued.
The energy-saving control strategy learning method provided by the invention does not need to rely on a known Markov process model: when the model is unknown, it can select states of interest for value-function evaluation without traversing the value function over all states.
In the embodiment of the invention, the algorithm selects a corresponding action at each step; after the air conditioner executes the action, it feeds back relevant data used to update the learned values and select the next action. The Q value is thus continuously updated in an iterative manner, and learning of the model stops when the maximum number of iterations is reached. The air conditioner then performs energy-saving optimization control by executing, according to its current state, the action estimated from the reward table. In this way the algorithm learns the optimal control strategy for the specific spatial environment of the air conditioner, i.e., adaptive energy-saving control.
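The continuous iterative Q update described above can also be written in incremental form, which gives the same result as recomputing the sample mean from all accumulated returns without storing them. The function below is an illustrative sketch, not the patent's stated formula:

```python
def incremental_mean(q, count, new_return):
    """One iterative Q update: fold a newly observed return G into the
    running sample mean, Q_{k+1} = Q_k + (G - Q_k) / (k + 1). This is
    algebraically identical to the batch sample mean over all returns."""
    count += 1
    q += (new_return - q) / count
    return q, count
```

Repeated calls converge the estimate toward the true mean return of the state-action pair, which is why stopping at a maximum iteration count yields a usable reward table.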
In an embodiment of the present invention, the determining an initial action value according to the initial state parameter specifically includes:
searching the reward table according to the initial state parameter;
and if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value.
Further, if a state action pair formed by the initial state parameters exists in the reward table, acquiring a historical return value of the state action pair formed by the initial state parameters and different preset action values; and selecting the action value of the state action pair with the maximum historical return value among the state action pairs formed by the initial state parameters, and taking the selected action value as the initial action value.
In this embodiment, if the state action pair formed by the initial state parameters does not exist in the reward table, learning for that state starts from scratch, and a preset default action value is used as the initial action value. If the state action pair does exist in the reward table, a learning record already exists and learning continues on that basis: the action value of the state action pair with the largest historical return value among those formed by the initial state parameters can be selected directly as the initial action value, and subsequent learning continues from there.
In this embodiment, the historical return value of an unknown state action pair defaults to 0.
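The default-to-zero rule for unknown pairs maps directly onto a `defaultdict`; a minimal sketch, where the `(state, action) -> return` table layout is an illustrative assumption:

```python
from collections import defaultdict

# Unknown state-action pairs read as a historical return of 0, as the
# embodiment specifies; defaultdict(float) provides exactly that.
reward_table = defaultdict(float)

def historical_return(table, state, action):
    """Look up a pair's historical return; unseen pairs yield 0.0."""
    return table[(state, action)]
```

This also means the greedy lookup never fails on a state the learner has not yet visited: every candidate action simply ties at zero.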
The energy-saving control strategy learning method adopts a Monte Carlo reinforcement learning algorithm to realize energy-saving control strategy learning. Monte Carlo reinforcement learning evaluates strategies based on sampling, so the algorithm can better handle control of the air conditioner's operating environment when the model is unknown. Through continuous learning, the algorithm finds a more energy-saving control strategy while keeping the air conditioner's operation within the set conditions. This addresses the high energy consumption of existing air conditioners, controls the air conditioner's operating environment comprehensively, and solves the simulation-based optimization of the air conditioner's execution parameters.
Fig. 3 schematically shows a flowchart of a method for implementing the air conditioning energy control based on the energy-saving control strategy learning method according to an embodiment of the present invention. Referring to fig. 3, the method for implementing air conditioning energy control based on the energy-saving control strategy learning method provided in the embodiment of the present invention specifically includes steps S21 to S24, as follows:
and S21, acquiring the current state parameters of the air conditioner.
S22, searching the reward table learned by the energy-saving control strategy learning method according to the current state parameter to obtain the historical return value of the state action pair formed by the current state parameter and different preset action values.
And S23, selecting the action value of the state action pair with the maximum historical return value among the state action pairs formed by the current state parameters, and taking the selected action value as the optimal action value.
And S24, executing the control action corresponding to the optimal action value to realize the energy-saving control of the air conditioner.
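The control-time flow of S21-S24 is a pure greedy lookup over the learned reward table. The sketch below assumes the same hypothetical `(state, action) -> return` table layout as earlier; the helper name is illustrative:

```python
def optimal_action(reward_table, state, preset_actions):
    """S22-S23: search the learned reward table for the current state
    and select the preset action with the largest historical return
    (unseen pairs default to 0). S24 then executes the control action
    corresponding to the returned value."""
    return max(preset_actions,
               key=lambda a: reward_table.get((state, a), 0.0))
```

Unlike the soft selection used during learning, no random exploration is needed here: the table already encodes the learned policy, so the controller always exploits it.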
The invention uses the idea of reinforcement learning to let the air conditioner learn continuously through dynamic adjustment, thereby finding the optimal control strategy suited to the environment in which the air conditioner is located.
A general reinforcement learning method needs the state transition probabilities of the air-conditioning environment, and their distribution must conform to a finite Markov process; air conditioner control cannot satisfy these requirements well. Therefore, a Monte Carlo-based reinforcement learning method is used. The Monte Carlo method imposes no such requirement: by continuously collecting samples it estimates the probability distribution of the states, finds a better control strategy, and achieves energy-saving control.
For simplicity of explanation, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that not every act described is required to implement the invention.
Fig. 4 schematically shows a structural diagram of an energy saving control strategy learning apparatus according to an embodiment of the present invention. Referring to fig. 4, the energy saving control strategy learning apparatus according to the embodiment of the present invention specifically includes a first decision module 401, a first execution module 402, a processing module 403, a second decision module 404, a second execution module 405, a first judgment module 406, and a learning module 407, where:
a first decision module 401, configured to obtain an initial state parameter of an air conditioner, and determine an initial action value according to the initial state parameter;
a first executing module 402, configured to execute a control action corresponding to the initial action value, obtain a target state parameter of a next state of the air conditioner and a generated energy saving reward value after the control action is executed, and update a sampling count value;
a processing module 403, configured to search a preset reward table according to the target state parameter to obtain a historical return value of a state action pair formed by the target state parameter and different preset action values, where the reward table includes an energy-saving reward value and a historical return value of the state action pair formed by the state parameter and different preset action values;
a second decision module 404, configured to select a target action value in a state action pair formed by the target state parameters, where a probability that a state action pair corresponding to the target action value is a state action pair with a largest historical return value in the formed state action pair is greater than a preset value;
a second executing module 405, configured to execute a control action corresponding to the target action value, and obtain a generated target energy saving reward value after the control action is executed;
a first determining module 406, configured to determine whether the sampling count value reaches a preset sampling threshold, and if the sampling count value does not reach the preset sampling threshold, return to the first executing module;
a learning module 407, configured to, when the determination result of the first determining module 406 is that the sampling count value reaches the preset sampling threshold, respectively count the sampling mean of the target energy saving reward values of each state action pair formed by the target state parameters, use the obtained sampling means as the estimated reward values of the corresponding state action pairs, and update the reward table according to the estimated reward values.
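A minimal sketch of the learning module's update, under assumed data structures: rewards observed for each (state, action) pair during the sampling window are averaged, and the sampling mean is written back into the reward table as that pair's estimated value.

```python
from collections import defaultdict

def update_reward_table(reward_table, samples):
    """samples: list of ((state, action), reward) observations collected
    until the sampling count reached its threshold (hypothetical format)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pair, reward in samples:
        sums[pair] += reward
        counts[pair] += 1
    for pair in sums:
        # The sampling mean serves as the estimated value of the pair.
        reward_table[pair] = sums[pair] / counts[pair]
    return reward_table

table = update_reward_table({}, [
    (("s1", "a1"), 2.0),
    (("s1", "a1"), 4.0),
    (("s1", "a2"), 1.0),
])
```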
In an embodiment of the present invention, the learning module 407 is further configured to update an iteration count value after updating the bonus table according to the estimated reward value;
correspondingly, the apparatus further includes a second determining module, not shown in the drawings, configured to determine whether the iteration count value reaches a preset iteration threshold value;
the learning module 407 is further configured to reset the sampling count value when the iteration count value does not reach a preset iteration threshold, and return to the first execution module, and when the iteration count value reaches the preset iteration threshold, end the learning process.
The second decision module 404 is specifically configured to select the target action value in the state action pair formed by the target state parameters by using a soft decision algorithm.
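One common reading of such a soft decision rule is epsilon-greedy selection, sketched below as an assumed interpretation (the patent does not specify the algorithm): with probability 1 - epsilon the action with the largest historical return value is chosen, so the greedy state action pair is selected with probability greater than a preset value, while random exploration keeps every preset action reachable.

```python
import random

def soft_select(reward_table, state, preset_actions, epsilon=0.1):
    """Epsilon-greedy selection: the greedy pair is chosen with probability
    at least 1 - epsilon, while random exploration keeps every action
    reachable. An assumed reading of the 'soft decision' algorithm."""
    if random.random() < epsilon:
        return random.choice(preset_actions)  # explore
    return max(preset_actions,
               key=lambda a: reward_table.get((state, a), float("-inf")))  # exploit

random.seed(1)  # deterministic for illustration
table = {("s", "a1"): 1.0, ("s", "a2"): 9.0}
picks = [soft_select(table, "s", ["a1", "a2"]) for _ in range(200)]
# "a2" dominates, but "a1" still occasionally appears through exploration
```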
In this embodiment of the present invention, the first decision module 401 is specifically configured to search the bonus table according to the initial state parameter; and if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value.
Further, the first decision module 401 is specifically configured to, if a state action pair formed by the initial state parameter exists in the reward table, obtain a historical return value of the state action pair formed by the initial state parameter and different preset action values, select an action value of the state action pair with the largest historical return value among the state action pairs formed by the initial state parameter, and use the selected action value as the initial action value.
Fig. 5 is a schematic structural diagram of an apparatus for implementing the air conditioning energy control based on the energy-saving control strategy learning apparatus according to an embodiment of the present invention. Referring to fig. 5, the apparatus for implementing air conditioning energy control based on an energy-saving control strategy learning apparatus in the embodiment of the present invention specifically includes a parameter acquisition module 501, a second processing module 502, a third decision module 503, and a third execution module 504, where:
a parameter acquisition module 501, configured to acquire current state parameters of the air conditioner;
a second processing module 502, configured to search, according to the current state parameter, a reward table learned by using the energy-saving control policy learning apparatus, so as to obtain a historical return value of a state action pair formed by the current state parameter and different preset action values;
a third decision module 503, configured to select an action value of a state action pair with a largest historical return value among the state action pairs formed by the current state parameter, and use the selected action value as an optimal action value;
and a third executing module 504, configured to execute the control action corresponding to the optimal action value, so as to implement energy saving control of the air conditioner.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The energy-saving control strategy learning method, the method for realizing air conditioning energy control, and the devices thereof combine the Monte Carlo method with reinforcement learning. The Monte Carlo sampling method is used to obtain an approximate solution: the selected action is executed on the current air-conditioning environment, and the resulting state transition and reward are observed. Through continuous interactive learning in the air-conditioning operating environment, the return value of each state is estimated from the sample average of its returns, and finally the optimal control strategy is obtained to achieve energy-saving control.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements the steps of the method according to any of the above embodiments.
In this embodiment, the module/unit integrated with the air-conditioning energy-saving control device or the device for realizing air-conditioning energy control based on the energy-saving control strategy learning device may be stored in a computer-readable storage medium if it is realized in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The air conditioning equipment provided by the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the computer program to realize the steps in the energy-saving control strategy learning method embodiment or realize the steps in the method embodiment of the air conditioning energy control based on the energy-saving control strategy learning method. Alternatively, when the processor executes the computer program, the processor implements the functions of each module/unit in the energy saving control policy learning apparatus embodiment, for example, the first decision module 401, the first execution module 402, the processing module 403, the second decision module 404, the second execution module 405, the first judgment module 406, and the learning module 407 shown in fig. 4, or implements the functions of each module/unit in the apparatus embodiment that implements the air conditioning energy control based on the energy saving control policy learning apparatus, for example, the parameter collection module 501, the second processing module 502, the third decision module 503, and the third execution module 504 shown in fig. 5.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the air-conditioning energy-saving control apparatus or the execution process in the apparatus for implementing air-conditioning energy control based on the energy-saving control strategy learning apparatus.
The air conditioning equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer and a cloud server. The air conditioning device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the air conditioning apparatus in this embodiment may include more or fewer components, or combine some components, or have different components; for example, the air conditioning apparatus may further include an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being the control center of the air conditioning apparatus and connecting the various parts of the entire air conditioning apparatus with various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the air conditioning equipment by operating or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Those skilled in the art will appreciate that, while some embodiments herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. An energy-saving control strategy learning method is characterized by comprising the following steps:
s11, acquiring initial state parameters of the air conditioner, and determining an initial action value according to the initial state parameters;
s12, executing a control action corresponding to the initial action value, acquiring a target state parameter of the next state of the air conditioner and a generated energy-saving reward value after the control action is executed, and updating a sampling count value;
s13, searching a preset reward table according to the target state parameter to obtain a historical return value of a state action pair formed by the target state parameter and different preset action values, wherein the reward table comprises an energy-saving reward value and a historical return value of the state action pair formed by the target state parameter and different preset action values;
s14, selecting a target action value in a state action pair formed by the target state parameters, wherein the probability that the state action pair corresponding to the target action value is the state action pair with the maximum historical return value in the formed state action pair is larger than a preset value;
s15, executing the control action corresponding to the target action value, and acquiring the generated target energy-saving reward value after the control action is executed;
s16, judging whether the sampling count value reaches a preset sampling threshold value;
if the sampling count value does not reach the preset sampling threshold value, repeatedly executing S12-S16, otherwise executing S17;
and S17, respectively counting the sampling mean value of the target energy-saving reward value of each state action pair formed by the target state parameters, taking the obtained sampling mean value as the estimated reward value of the corresponding state action pair, and updating the reward table according to the estimated reward value.
2. The method of claim 1, wherein after updating the rewards table according to the estimated reward value, the method further comprises:
s18, updating an iteration count value, and judging whether the iteration count value reaches a preset iteration threshold value;
if the iteration count value does not reach the preset iteration threshold value, resetting the sampling count value, and repeatedly executing S12-S17, otherwise, ending the learning process.
3. The method of claim 1, wherein selecting a target action value in a state action pair formed by the target state parameters comprises:
and selecting the target action value in the state action pair formed by the target state parameters by adopting a soft decision algorithm.
4. The method according to any of claims 1-3, wherein said determining an initial action value from said initial state parameter comprises:
searching the reward table according to the initial state parameter;
and if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value.
5. The method of claim 4, further comprising:
if a state action pair formed by the initial state parameters exists in the reward table, acquiring historical return values of the state action pair formed by the initial state parameters and different preset action values;
and selecting the action value of the state action pair with the maximum historical return value among the state action pairs formed by the initial state parameters, and taking the selected action value as the initial action value.
6. A method for implementing the air conditioning energy control based on the energy saving control strategy learning method according to any one of claims 1 to 5, comprising:
acquiring current state parameters of the air conditioner;
searching a reward table learned by the energy-saving control strategy learning method according to the current state parameter to obtain a historical return value of a state action pair formed by the current state parameter and different preset action values;
selecting an action value of a state action pair with the maximum historical return value among the state action pairs formed by the current state parameters, and taking the selected action value as an optimal action value;
and executing the control action corresponding to the optimal action value to realize the energy-saving control of the air conditioner.
7. An energy-saving control strategy learning device, comprising:
the first decision module is used for acquiring initial state parameters of the air conditioner and determining an initial action value according to the initial state parameters;
the first execution module is used for executing the control action corresponding to the initial action value, acquiring the target state parameter of the next state of the air conditioner and the generated energy-saving reward value after the control action is executed, and updating a sampling count value;
the processing module is used for searching a preset reward table according to the target state parameter so as to obtain a historical reward value of a state action pair formed by the target state parameter and different preset action values, and the reward table comprises an energy-saving reward value and a historical reward value of the state action pair formed by the target state parameter and different preset action values;
a second decision module, configured to select a target action value from a state action pair formed by the target state parameters, where a probability that a state action pair corresponding to the target action value is a state action pair with a largest historical return value in the formed state action pair is greater than a preset value;
the second execution module is used for executing the control action corresponding to the target action value and acquiring the generated target energy-saving reward value after the control action is executed;
the first judging module is used for judging whether the sampling count value reaches a preset sampling threshold value or not, and if the sampling count value does not reach the preset sampling threshold value, returning to the first executing module;
and the learning module is used for respectively counting the sampling mean value of the target energy-saving reward value of each state action pair formed by the target state parameters when the judgment result of the judgment module is that the sampling count value reaches a preset sampling threshold value, taking the obtained sampling mean value as the estimated reward value of the corresponding state action pair, and updating the reward table according to the estimated reward value.
8. The apparatus of claim 7, wherein the learning module is further configured to update an iteration count value after updating the rewards table based on the estimated reward value;
the device further comprises:
the second judgment module is used for judging whether the iteration count value reaches a preset iteration threshold value;
the learning module is further configured to reset the sampling count value when the iteration count value does not reach a preset iteration threshold value, return to the first execution module, and end the learning process when the iteration count value reaches the preset iteration threshold value.
9. The apparatus according to claim 7 or 8, wherein the first decision module is specifically configured to look up the bonus table according to the initial state parameter; if the state action pair formed by the initial state parameters does not exist in the reward table, taking a preset default action value as the initial action value; if the state action pair formed by the initial state parameters exists in the reward table, acquiring a historical return value of the state action pair formed by the initial state parameters and different preset action values, selecting an action value of the state action pair with the maximum historical return value among the state action pairs formed by the initial state parameters, and taking the selected action value as the initial action value.
10. An apparatus for implementing the air conditioning energy control based on the energy-saving control strategy learning apparatus according to any one of claims 7 to 9, comprising:
the parameter acquisition module is used for acquiring current state parameters of the air conditioner;
the second processing module is used for searching a reward table learned by the energy-saving control strategy learning device according to the current state parameter so as to obtain a historical return value of a state action pair formed by the current state parameter and different preset action values;
a third decision module, configured to select an action value of a state action pair with a largest historical return value among state action pairs formed by the current state parameter, and use the selected action value as an optimal action value;
and the third execution module is used for executing the control action corresponding to the optimal action value to realize the energy-saving control of the air conditioner.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5 or the steps of the method according to claim 6.
12. An air conditioning apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method according to any one of claims 1 to 5 or the steps of the method according to claim 6.
CN201910091191.XA 2019-01-30 2019-01-30 Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control Active CN111505944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910091191.XA CN111505944B (en) 2019-01-30 2019-01-30 Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910091191.XA CN111505944B (en) 2019-01-30 2019-01-30 Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control

Publications (2)

Publication Number Publication Date
CN111505944A CN111505944A (en) 2020-08-07
CN111505944B true CN111505944B (en) 2021-06-11

Family

ID=71874024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910091191.XA Active CN111505944B (en) 2019-01-30 2019-01-30 Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control

Country Status (1)

Country Link
CN (1) CN111505944B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112394697A (en) * 2020-11-24 2021-02-23 中国铁路设计集团有限公司 Railway station building equipment monitoring and energy management system, program and storage medium
CN114580688A (en) * 2020-11-30 2022-06-03 中兴通讯股份有限公司 Control model optimization method of water cooling system, electronic equipment and storage medium
CN114251788B (en) * 2021-12-18 2023-03-17 珠海格力电器股份有限公司 Air conditioner energy consumption prompting method and system for rental platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101033882A (en) * 2007-04-28 2007-09-12 珠海格力电器股份有限公司 Air-conditioning unit operating according to user-defined curve and control method therefor
CN103017290A (en) * 2011-09-21 2013-04-03 珠海格力电器股份有限公司 Air conditioner electric energy control device and air conditioner electric energy management method
CN104132420A (en) * 2013-05-02 2014-11-05 珠海格力电器股份有限公司 Low-power consumption standby circuit device, air conditioner and control method of air conditioner
CN108844183A (en) * 2018-06-04 2018-11-20 珠海格力电器股份有限公司 A kind of energy-saving control method, device and household appliance

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103168278B (en) * 2010-08-06 2017-01-18 加利福尼亚大学董事会 Systems and methods for analyzing building operations sensor data
CN103248693A (en) * 2013-05-03 2013-08-14 东南大学 Large-scale self-adaptive composite service optimization method based on multi-agent reinforced learning
CN103324085B (en) * 2013-06-09 2016-03-02 中国科学院自动化研究所 Based on the method for optimally controlling of supervised intensified learning
US20150178421A1 (en) * 2013-12-20 2015-06-25 BrightBox Technologies, Inc. Systems for and methods of modeling, step-testing, and adaptively controlling in-situ building components
US10012965B2 (en) * 2013-12-27 2018-07-03 Quirky Ip Licensing Llc Window air conditioning apparatus and controller
CN104123598A (en) * 2014-08-07 2014-10-29 山东大学 Charging mode selection method based on multi-objective optimization for electric private car
US9869484B2 (en) * 2015-01-14 2018-01-16 Google Inc. Predictively controlling an environmental control system
US10465931B2 (en) * 2015-01-30 2019-11-05 Schneider Electric It Corporation Automated control and parallel learning HVAC apparatuses, methods and systems
CN104680339B (en) * 2015-03-26 2017-11-07 中国地质大学(武汉) A kind of household electrical appliance dispatching method based on Spot Price
US9482442B1 (en) * 2015-04-24 2016-11-01 Dataxu, Inc. Decision dashboard balancing competing objectives
US10839302B2 (en) * 2015-11-24 2020-11-17 The Research Foundation For The State University Of New York Approximate value iteration with complex returns by bounding
US10101050B2 (en) * 2015-12-09 2018-10-16 Google Llc Dispatch engine for optimizing demand-response thermostat events
CN107065582B (en) * 2017-03-31 2023-09-29 苏州科技大学 Indoor air intelligent adjusting system and method based on environment parameters
CN108419439B (en) * 2017-05-22 2020-06-30 深圳微自然创新科技有限公司 Household equipment learning method and server
JP6530783B2 (en) * 2017-06-12 2019-06-12 ファナック株式会社 Machine learning device, control device and machine learning program
CN107314477B (en) * 2017-07-04 2020-01-03 河南工程学院 Intelligent distribution system for refrigerating capacity of central air conditioner
CN107315572B (en) * 2017-07-19 2020-08-11 北京上格云技术有限公司 Control method of building electromechanical system, storage medium and terminal equipment
CN108088006A (en) * 2017-12-11 2018-05-29 珠海格力电器股份有限公司 Energy-saving type air conditioner and control method
CN108717873A (en) * 2018-07-20 2018-10-30 同济大学 A kind of space luminous environment AI regulating systems based on unsupervised learning technology
CN109166066A (en) * 2018-10-09 2019-01-08 河南水利与环境职业学院 A kind of Modeling Teaching of Mathematics learning system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101033882A (en) * 2007-04-28 2007-09-12 珠海格力电器股份有限公司 Air-conditioning unit operating according to user-defined curve and control method therefor
CN103017290A (en) * 2011-09-21 2013-04-03 珠海格力电器股份有限公司 Air conditioner electric energy control device and air conditioner electric energy management method
CN104132420A (en) * 2013-05-02 2014-11-05 珠海格力电器股份有限公司 Low-power consumption standby circuit device, air conditioner and control method of air conditioner
CN108844183A (en) * 2018-06-04 2018-11-20 珠海格力电器股份有限公司 A kind of energy-saving control method, device and household appliance

Also Published As

Publication number Publication date
CN111505944A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN111505944B (en) Energy-saving control strategy learning method, and method and device for realizing air conditioning energy control
CN111010700B (en) Method and device for determining load threshold
CN111476422A (en) L ightGBM building cold load prediction method based on machine learning framework
WO2021129086A1 (en) Traffic prediction method, device, and storage medium
CN107870810B (en) Application cleaning method and device, storage medium and electronic equipment
CN113379564A (en) Power grid load prediction method and device and terminal equipment
EP3872656A1 (en) Information processing apparatus, information processing method, and program
CN112801154B (en) Behavior analysis method and system for orphan elderly people
CN110956277A (en) Interactive iterative modeling system and method
WO2019085754A1 (en) Application cleaning method and apparatus, and storage medium and electronic device
CN114492279A (en) Parameter optimization method and system for analog integrated circuit
CN111949498A (en) Application server abnormity prediction method and system
CN107943537B (en) Application cleaning method and device, storage medium and electronic equipment
CN114139604A (en) Online learning-based electric power industrial control attack monitoring method and device
CN111338227B (en) Electronic appliance control method and control device based on reinforcement learning and storage medium
CN107861769B (en) Application cleaning method and device, storage medium and electronic equipment
CN111609525A (en) Air conditioner control method and device, electronic equipment and storage medium
CN109062396B (en) Method and device for controlling equipment
CN112486683B (en) Processor control method, control apparatus, and computer-readable storage medium
CN115081515A (en) Energy efficiency evaluation model construction method and device, terminal and storage medium
CN113741402A (en) Equipment control method and device, computer equipment and storage medium
CN113988670A (en) Comprehensive enterprise credit risk early warning method and system
CN113993343A (en) Energy-saving control method and device for refrigeration equipment, terminal and storage medium
CN114679899B (en) Self-adaptive energy-saving control method and device, medium and equipment for machine room air conditioner
CN113692177B (en) Control method, device and terminal for power consumption of refrigeration system of data center

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Tan Jianming

Inventor after: Yue Dong

Inventor after: Li Shaobin

Inventor after: Song Dechao

Inventor after: Chen Li

Inventor after: Tang Jie

Inventor after: Luo Xiaoyu

Inventor after: Deng Jiabi

Inventor after: Wang Pengfei

Inventor after: Xiao Wenxuan

Inventor before: Tan Jianming

Inventor before: Li Shaobin

Inventor before: Song Dechao

Inventor before: Chen Li

Inventor before: Luo Xiaoyu

Inventor before: Deng Jiabi

Inventor before: Wang Pengfei

Inventor before: Xiao Wenxuan

Inventor before: Yue Dong
TR01 Transfer of patent right

Effective date of registration: 20221012

Address after: 519015 Room 601, Lianshan Lane, Jida Jingshan Road, Zhuhai City, Guangdong Province

Patentee after: Zhuhai Lianyun Technology Co.,Ltd.

Address before: 519070, Jinji Hill Road, front hill, Zhuhai, Guangdong

Patentee before: GREE ELECTRIC APPLIANCES Inc. OF ZHUHAI