CN115773579A

CN115773579A - Energy-saving control method and device for water heater and water heater

Info

Publication number: CN115773579A
Application number: CN202211521132.XA
Authority: CN
Inventors: 罗晓宇; 唐杰; 陈向文; 岳冬
Original assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Current assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Priority date: 2022-11-30
Filing date: 2022-11-30
Publication date: 2023-03-10

Abstract

The invention discloses an energy-saving control method and device for a water heater, and a water heater. The method comprises the following steps: when a start instruction of the water heater is received, controlling the water heater to operate according to a first action instruction; acquiring a reward value after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of the water in a water tank of the water heater; and when the reward value is greater than a preset value, continuing to acquire subsequent state information and generating a second action instruction based on the subsequent state information until the reward value is not greater than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period following the acquisition time of the current state information arrives, and the second action instruction is used for controlling the operation of the water heater. The invention solves the technical problem that energy-saving control of water heaters in the related art has low reliability and cannot truly achieve energy saving.

Description

Energy-saving control method and device for water heater and water heater

Technical Field

The invention relates to the technical field of intelligent control of household appliances, in particular to an energy-saving control method and device for a water heater and the water heater.

Background

At present, the water heater has become an indispensable device for improving the indoor environment quality of households and public places, but it is also one of the devices with relatively high energy consumption. Existing energy-saving control of water heaters mainly applies range-limited control to individual control parameters based on control theory; the control process is coarse, so the water heater consumes more power and wastes energy.

For the problem that energy-saving control of water heaters in the related art has low reliability and cannot truly achieve energy saving, no effective solution has yet been proposed.

Disclosure of Invention

The embodiment of the invention provides an energy-saving control method and device for a water heater and the water heater, and aims to at least solve the technical problems that the reliability of an energy-saving control mode for the water heater in the related technology is low, and the energy-saving purpose cannot be really achieved.

According to an aspect of an embodiment of the present invention, there is provided an energy-saving control method for a water heater, including: when a start instruction of a water heater is received, controlling the water heater to operate according to a first action instruction, wherein the first action instruction is generated based on first action data, the first action data is data generated by an energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data; acquiring a reward value after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of the water in a water tank of the water heater; and when the reward value is greater than a preset value, continuing to acquire subsequent state information and generating a second action instruction based on the subsequent state information until the reward value is not greater than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period following the acquisition time of the current state information arrives, and the second action instruction is used for controlling the operation of the water heater.

Optionally, the energy saving control method of the water heater further comprises one of the following steps: responding to a trigger operation acted on a control panel of the water heater to generate the starting instruction; and when the scheduled starting time comes, generating the starting instruction.

Optionally, the energy-saving control method of the water heater further includes: acquiring multiple groups of historical state information in historical time periods; and training a deep reinforcement network (DQN) model by using the multiple groups of historical state information to obtain the energy-saving control model.

Optionally, training the DQN model using the multiple groups of historical state information includes: extracting multiple groups of historical state data from the multiple groups of historical state information; inputting the multiple groups of historical state data to the DQN model, starting from the first group of the multiple groups of historical state data; acquiring the target action information with the maximum evaluation value in the action output information of the DQN model; controlling the water heater to operate according to the target action information, and acquiring a historical reward value after the water heater operates according to the target action information; and repeating the above steps until the number of executions is greater than a predetermined number of executions.

Optionally, the energy saving control method of the water heater further includes: storing sets of training data including current historical state data, the target action, the historical reward value, and historical subsequent state data to a predetermined storage medium.

Optionally, the energy-saving control method of the water heater further includes: when the storage space of the predetermined storage medium is determined to be full, releasing the information that was stored earliest in the predetermined storage medium.

Optionally, after the above steps are repeated until the number of executions is greater than the predetermined number of executions, the energy-saving control method of the water heater further includes: selecting part of the multiple groups of training data as sample data; inputting the sample data into a current value network and a target value network respectively, wherein the current value network and the target value network are both network structures in the DQN model; acquiring an output difference value between the current value network and the target value network; generating update data based on the output difference value and the reward value in the sample data; updating the current value network with the update data; and when the number of updates of the current value network exceeds a predetermined number, assigning the network parameters of the current value network to the target value network to obtain the energy-saving control model.

Optionally, the energy saving control method of the water heater further includes: when the corresponding action numerical value in the first action instruction and the second action instruction is larger than a preset action numerical value, generating a third action instruction based on action constraint data, wherein the action constraint data is generated according to attribute information of the water heater; and controlling the water heater to operate according to the third action instruction.

Optionally, the current state information includes: the outdoor environment temperature, the current time, the compressor running frequency, the rotating speed of an outer fan of the water heater, the temperature of a water tank of the water heater, and the set temperature.

According to another aspect of the embodiments of the present invention, there is provided an energy-saving control device for a water heater, including: a first control unit, configured to control the water heater to operate according to a first action instruction when a start instruction of the water heater is received, wherein the first action instruction is generated based on first action data, the first action data is data generated by an energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data; a first obtaining unit, configured to obtain a reward value after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of the water in a water tank of the water heater; and a processing unit, configured to, when the reward value is greater than a preset value, continue to acquire subsequent state information and generate a second action instruction based on the subsequent state information until the reward value is not greater than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period following the acquisition time of the current state information arrives, and the second action instruction is used for controlling the operation of the water heater.

Optionally, the energy saving control device of the water heater further comprises one of the following: the first generation unit is used for responding to the trigger operation acted on a control panel of the water heater and generating the starting instruction; and the second generation unit is used for generating the starting instruction when the reserved starting time is determined to arrive.

Optionally, the energy-saving control device of the water heater further includes: a second acquisition unit, configured to acquire multiple groups of historical state information in historical time periods; and a training unit, configured to train a deep reinforcement network (DQN) model by using the multiple groups of historical state information to obtain the energy-saving control model.

Optionally, the training unit comprises: the extraction module is used for extracting a plurality of groups of historical state data from each group of the plurality of groups of historical state information; a first input module, configured to input the multiple sets of historical state data to the DQN model, starting from a first set of the multiple sets of historical state data; the first acquisition module is used for acquiring target action information with the largest evaluation value in the action output information of the DQN model; the control module is used for controlling the water heater to operate according to the target action information and acquiring a historical reward value after the water heater operates according to the target action information; and the execution module is used for circularly executing the steps until the execution times are more than the preset execution times.

Optionally, the energy-saving control device of the water heater further includes: a storage module, configured to store multiple sets of training data, including current historical state data, the target action, the historical reward value, and historical subsequent state data, to a predetermined storage medium.

Optionally, the energy-saving control device of the water heater further includes: a releasing module, configured to release the information that was stored earliest in the predetermined storage medium when the storage space of the predetermined storage medium is determined to be full.

Optionally, the energy saving control device of the water heater further includes: a selecting module, configured to select a part of the multiple sets of training data as sample data after the steps are executed in a loop until the execution times are greater than a predetermined execution time; a second input module, configured to input the sample data into a current value network and a target value network, respectively, where the current value network and the target value network are both network structures in the DQN model; the second acquisition module is used for acquiring the output difference value of the current value network and the target value network; a generating module, configured to generate update data based on the output difference and the reward value in the sample data; an update module for updating the current value network with the update data; and a third obtaining module, configured to assign the network parameters of the current value network to the target value network when the update times of the current value network exceed a predetermined number of times, so as to obtain the energy-saving control model.

Optionally, the energy-saving control device of the water heater further comprises: a third generating unit, configured to generate a third action instruction based on action constraint data when a corresponding action numerical value in the first action instruction and the second action instruction is greater than a predetermined action numerical value, where the action constraint data is generated according to attribute information of the water heater; and the second control unit is used for controlling the water heater to operate according to the third action instruction.

According to another aspect of the embodiment of the invention, a water heater using the energy-saving control method of the water heater is further provided.

According to another aspect of the embodiment of the present invention, there is also provided a computer-readable storage medium including a stored program, wherein the program executes the energy saving control method of the water heater described in any one of the above.

According to another aspect of the embodiment of the present invention, there is further provided a processor, where the processor is configured to execute a program, where the program executes the method for controlling energy conservation of a water heater described in any one of the above when running.

In the embodiment of the invention, when a start instruction of a water heater is received, the water heater is controlled to operate according to a first action instruction, wherein the first action instruction is generated based on first action data, the first action data is data generated by an energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data; a reward value is acquired after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of the water in a water tank of the water heater; and when the reward value is greater than a preset value, subsequent state information continues to be acquired and a second action instruction is generated based on the subsequent state information until the reward value is not greater than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period following the acquisition time of the current state information arrives, and the second action instruction is used for controlling the operation of the water heater. The energy-saving control method of the water heater provided by the embodiment of the invention obtains, through self-learning of the water heater, the optimal running time and the optimal control parameters of the water heater in the current environment state, improves the energy-saving performance of the water heater, and thereby solves the technical problem that energy-saving control of water heaters in the related art has low reliability and cannot truly achieve energy saving.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a method of energy saving control of a water heater according to an embodiment of the present invention;

FIG. 2 is a flow chart of a DQN algorithm according to an embodiment of the invention;

FIG. 3 is a control flow diagram of executing an action constraint according to an embodiment of the invention;

FIG. 4 is a flow chart of an alternative method of energy savings control of a water heater according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an energy saving control device of a water heater according to the present application.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

In accordance with an embodiment of the present invention, there is provided a method embodiment of an energy-saving control method for a water heater. It is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that presented herein.

Fig. 1 is a flowchart of an energy-saving control method of a water heater according to an embodiment of the present invention, and as shown in fig. 1, the energy-saving control method of the water heater includes the following steps:

Step S102: when a start instruction of the water heater is received, the water heater is controlled to operate according to a first action instruction, wherein the first action instruction is generated based on first action data, the first action data is data generated by the energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data.

Optionally, the water heater may be an air energy water heater, and may also be other forms of water heaters, which is not limited herein.

In this embodiment, the first action instruction is generated based on the first action data. The action data may be represented as an action set, for example: operating frequency of the compressor = [up, hold, down]; opening of the regulating valve = [up, hold, down]; rotation speed of the external fan = [up, hold, down].

The current state information may be information related to the operation of the water heater. The current state information may include: the outdoor environment temperature, the current time, the running frequency of the compressor, the rotating speed of the outer fan of the water heater, the temperature of the water tank of the water heater, and the set temperature. In the reserved mode, the reserved time may also be included.
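By way of illustration only, the action set described above could be encoded as follows. This is a minimal sketch, not the embodiment's implementation: the names `Adjust`, `Action`, and `ALL_ACTIONS` are hypothetical, and combining the three adjustments into one joint action is an assumption that the embodiment does not specify.

```python
from enum import Enum
from itertools import product
from typing import NamedTuple


class Adjust(Enum):
    """Adjustment direction for one actuator (hypothetical encoding)."""
    UP = 1
    HOLD = 0
    DOWN = -1


class Action(NamedTuple):
    """One joint action over the three actuators named in the embodiment."""
    compressor_frequency: Adjust
    valve_opening: Adjust
    fan_speed: Adjust


# Assumption: every combination of [up, hold, down] across the three
# actuators is a candidate action, giving 3 * 3 * 3 = 27 actions.
ALL_ACTIONS = [Action(*combo) for combo in product(Adjust, repeat=3)]
```

Under this assumed encoding, each output node of the control model would correspond to one entry of `ALL_ACTIONS`.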
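As a hedged illustration, the state information listed above could be packed into a fixed-length input vector for the energy-saving control model. The class name `HeaterState`, the field names, the units, and the `NO_RESERVATION` placeholder are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder used when the heater is not in reserved mode.
NO_RESERVATION = -1.0


@dataclass(frozen=True)
class HeaterState:
    """State information of the water heater (units are assumed)."""
    outdoor_temp_c: float
    current_hour: float
    compressor_freq_hz: float
    fan_speed_rpm: float
    tank_temp_c: float
    set_temp_c: float
    reserved_hour: Optional[float] = None  # only present in reserved mode

    def to_vector(self) -> List[float]:
        """Return a fixed-length list suitable as model input."""
        reserved = NO_RESERVATION if self.reserved_hour is None else self.reserved_hour
        return [
            self.outdoor_temp_c,
            self.current_hour,
            self.compressor_freq_hz,
            self.fan_speed_rpm,
            self.tank_temp_c,
            self.set_temp_c,
            reserved,
        ]


state = HeaterState(5.0, 18.5, 50.0, 800.0, 35.0, 55.0)
vector = state.to_vector()
```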

In an optional scheme, after the control center receives a starting instruction of the water heater, a first action instruction is generated according to first action data, and then the water heater is triggered to operate according to the first action instruction to heat water in a water tank of the water heater.

The first action data may be data generated by the energy-saving control model from the current state information. That is, the current state information is input into the energy-saving control model, and the energy-saving control model processes the current state information to obtain the first action data.

Step S104: a reward value is acquired after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of the water in a water tank of the water heater.

Optionally, after the water heater operates according to the first action instruction, the temperature of the water in the water tank (i.e., the actual temperature) is obtained, the temperature difference between the actual temperature and the set temperature is determined, and this temperature difference is used as the reward value. The reward value is a comprehensive energy-saving index fed back after the water heater is controlled to run with a specific action. It guides the adjustment of the algorithm in the energy-saving control model so that, in a given environment state, a suitable action can be selected to achieve energy-saving control.

In this embodiment, the reward value is set from the relevant data fed back after the action data is executed on the water heater and is used to calculate the quality of the action, thereby guiding the energy-saving control model in assigning an evaluation value to the action.
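The reward computation described above can be sketched as follows. The function names are hypothetical, and taking the absolute value of the temperature difference is an assumption; the embodiment states only that the temperature difference between the actual temperature and the set temperature is used as the reward value.

```python
def reward_value(actual_temp_c: float, set_temp_c: float) -> float:
    """Temperature difference between set and actual tank temperature.

    Assumption: the magnitude of the difference is used, so the value is
    never negative.
    """
    return abs(set_temp_c - actual_temp_c)


def should_continue(reward: float, preset: float) -> bool:
    """Control continues while the reward value is greater than the preset value."""
    return reward > preset
```

For example, with a set temperature of 55 and an actual temperature of 40, the reward value is 15; with a preset value of 2, control would continue.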

Step S106: when the reward value is greater than the preset value, subsequent state information continues to be acquired and a second action instruction is generated based on the subsequent state information until the reward value is not greater than the preset value, wherein the subsequent state information is the state information acquired when an information acquisition period following the acquisition time of the current state information arrives, and the second action instruction is used for controlling the operation of the water heater.

Optionally, in the operation process of the water heater, the data acquisition module automatically acquires the state information when the information acquisition period comes.

In this embodiment, a reward value greater than the preset value indicates that the temperature difference between the actual temperature and the set temperature of the water in the tank of the water heater exceeds the range acceptable to the user. The system therefore continues to acquire subsequent state information, generates a second action instruction based on it, and controls the water heater to operate according to the second action instruction to heat the water in the water tank. After the water heater operates based on the second action instruction, the actual temperature of the water in the water tank is acquired again, and the reward value at that moment, namely the temperature difference between the actual temperature and the set temperature, is determined. These steps are repeated until the temperature difference between the actual temperature and the set temperature in the water tank is not greater than the preset value, that is, until the reward value is less than or equal to the preset value.
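The loop formed by steps S102 to S106 might be sketched as below. This is an illustrative outline under stated assumptions: `choose_action` stands in for the energy-saving control model, `ToyHeater` is a made-up simulation in which each "up" action raises the tank temperature by 5 degrees, and the `max_cycles` safety bound is not part of the embodiment.

```python
from typing import Callable, Dict

State = Dict[str, float]


def run_until_within_tolerance(
    read_state: Callable[[], State],
    choose_action: Callable[[State], str],
    apply_action: Callable[[str], None],
    preset: float,
    max_cycles: int = 1000,
) -> int:
    """Repeat acquire-state, act, evaluate until reward <= preset.

    Returns the number of acquisition periods that were needed.
    """
    for cycle in range(1, max_cycles + 1):
        state = read_state()            # state information for this period
        action = choose_action(state)   # stand-in for the control model
        apply_action(action)            # heater runs under this instruction
        after = read_state()
        reward = abs(after["set_temp"] - after["tank_temp"])
        if reward <= preset:            # reward no longer greater than preset
            return cycle
    raise RuntimeError("set temperature not reached within max_cycles")


class ToyHeater:
    """Made-up heater model used only to exercise the loop."""

    def __init__(self, tank_temp: float, set_temp: float) -> None:
        self.tank_temp = tank_temp
        self.set_temp = set_temp

    def read_state(self) -> State:
        return {"tank_temp": self.tank_temp, "set_temp": self.set_temp}

    def apply_action(self, action: str) -> None:
        if action == "up":
            self.tank_temp += 5.0


heater = ToyHeater(tank_temp=30.0, set_temp=55.0)
cycles = run_until_within_tolerance(
    heater.read_state, lambda s: "up", heater.apply_action, preset=2.0
)
```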

As can be seen from the above, in the scheme described in embodiment 1 of the present invention, when the start instruction of the water heater is received, the water heater is controlled to operate according to the first action instruction, the reward value after the water heater operates according to the first action instruction is obtained, when the reward value is greater than the preset value, the subsequent state information is continuously obtained, and the second action instruction is generated based on the subsequent state information until the reward value is not greater than the preset value, so that the purpose of obtaining the optimal operation time and the optimal control parameter of the water heater in the current environment state through self-learning of the water heater is achieved, and the energy saving performance of the water heater is improved.

It is easy to notice that, during operation of the water heater, the operating quality of the water heater is evaluated to determine whether to continue acquiring subsequent state information, and action instructions generated from the subsequent state information guide the operation of the water heater until the temperature difference between the actual temperature and the set temperature of the water in the water tank is less than or equal to the preset value, that is, within a range acceptable to the user. Suitable hot water is thereby provided to the user, which is convenient for the user and effectively saves energy. Therefore, the scheme provided by the embodiment of the invention can offer more functions and services on top of the original heating function of the water heater, provide users with more valuable and emotionally engaging information, and improve user experience and user satisfaction.

Therefore, the technical problem that the energy-saving control mode of the water heater in the related art is low in reliability and cannot really achieve the purpose of energy saving is solved through the scheme provided by the embodiment 1 of the invention.

According to the above embodiment of the present invention, in order to facilitate the use of the user, a plurality of water heater starting modes can be provided for the user, so as to provide better service for the user, and the energy saving control method of the water heater may further include one of the following steps: responding to a trigger operation acted on a control panel of the water heater to generate a starting instruction; and when the scheduled starting time arrives, generating a starting instruction.

In this embodiment, the start mode of the water heater can be set. For example, a user may trigger a control on the control panel of the water heater to start it, in which case a start instruction is generated when the user triggers the control. As another example, the user may set a start time for the water heater according to his or her living habits, so that the start instruction is generated each time the scheduled start time arrives. Providing the user with more flexible start modes not only improves the flexibility of starting the water heater but also makes its use more reliable; for example, when the control panel fails, the start time of the water heater can still be set through the reservation function.

According to the above embodiment of the present invention, since the DQN algorithm uses a deep learning model for learning and has relatively strong model fitting capability, the base model of the energy-saving control model in the embodiment of the present invention may be a DQN model. That is, the energy-saving control method of the water heater may further include: acquiring multiple groups of historical state information in historical time periods; and training a deep reinforcement network (DQN) model by using the multiple groups of historical state information to obtain the energy-saving control model.

In the embodiment of the present invention, training the initial model by using multiple sets of historical state information may include: extracting a plurality of sets of historical state data from each of the plurality of sets of historical state information; starting from a first group of the multiple groups of historical state data, inputting the multiple groups of historical state data to the DQN model; obtaining target action information with the maximum evaluation value in the action output information of the DQN model; controlling the water heater to operate according to the target action information, and acquiring a historical reward value of the water heater after the water heater operates according to the target action information; and circularly executing the steps until the execution times are greater than the preset execution times.

Optionally, the historical state information, the current state information, and the subsequent state information may each include: the outdoor environment temperature, the current time, the compressor running frequency, the external fan running speed, the water tank temperature, the set temperature, and the reserved time (in the reserved mode). Multiple groups of historical state information have been obtained in the above embodiment. In this embodiment, the historical state data in the multiple groups of historical state information may be extracted; then, starting from the first group of the multiple groups of historical state data, the data is input to the DQN model, and the action output information obtained by the DQN model processing the first group of historical state data is acquired. There may be multiple pieces of action output information, and the one with the largest evaluation value may be selected as the target action information, so that the water heater is controlled to operate according to the target action information and the historical reward value after that operation is acquired. Next, the second group of the multiple groups of historical state data, and so on, is processed in the same way, until the number of executions is greater than the predetermined number of executions.

The predetermined number of executions is set according to actual requirements.
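Selecting the target action information with the largest evaluation value amounts to an argmax over the model outputs. A minimal sketch, assuming the evaluation values arrive as a sequence with one entry per candidate action (the function name is hypothetical):

```python
from typing import Sequence


def select_target_action(evaluation_values: Sequence[float]) -> int:
    """Return the index of the action with the largest evaluation value.

    Ties are resolved in favor of the lowest index.
    """
    if not evaluation_values:
        raise ValueError("the model produced no action outputs")
    return max(range(len(evaluation_values)), key=evaluation_values.__getitem__)
```

The returned index would then be mapped back to a concrete action before the water heater is controlled to operate.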

According to the above embodiment of the present invention, for the storage of various data, the energy saving control method of the water heater may further include: storing sets of training data including current historical state data, target actions, historical reward values, and historical follow-up state data to a predetermined storage medium.

Alternatively, the predetermined storage medium may be a memory storage table, or may be a storage medium in another form.

According to the above embodiment of the present invention, since the storage space of the predetermined storage medium is limited, in order to adapt to the usage requirement, the energy saving control method of the water heater may further include: when the storage space of the predetermined storage medium is determined to be full, releasing the information that was stored earliest in the predetermined storage medium.

For example, the current state, the selected action, the reward earned, and the next state information may be stored in a memory storage table; each row in the memory storage table stores the current state, the action, the reward and the next state, the size of the memory storage table is set, and when the number of exploration steps exceeds the size of the memory storage table, the oldest stored row is replaced in sequence.
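A memory storage table of this kind may be sketched as a fixed-size queue. The class name `MemoryTable` and its methods are hypothetical; the sketch relies on `collections.deque` with `maxlen`, which discards the oldest row automatically when a new row is appended to a full table.

```python
from collections import deque


class MemoryTable:
    """Fixed-size table of (state, action, reward, next_state) rows."""

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest entry when a new one is
        # appended to a full deque, which matches oldest-first replacement.
        self.rows = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.rows.append((state, action, reward, next_state))

    def __len__(self):
        return len(self.rows)
```

For example, a table created with `MemoryTable(2000)` keeps at most the 2000 most recent rows.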

According to the above embodiment of the present invention, after the above steps are executed in a loop until the execution times are greater than the predetermined execution times, the energy saving control method of the water heater further includes: selecting part of the multiple sets of training data as a plurality of sample data; respectively inputting the plurality of sample data into a current value network and a target value network, wherein the current value network and the target value network are both network structures in the DQN model; acquiring an output difference value of the current value network and the target value network; generating update data based on the output difference value and the reward value in the plurality of sample data; updating the current value network with the update data; and when the updating times of the current value network exceed the preset times, assigning the network parameters of the current value network to the target value network to obtain the energy-saving control model.

Fig. 2 is a flowchart of the DQN algorithm according to an embodiment of the invention. As shown in fig. 2, an air energy water heater is taken as an example of the water heater, and the flow is as follows:
1) Initializing the parameters of the two DNN networks;
2) Extracting state data as the input of the current value network according to the current operation environment of the air energy water heater and the external environment;
3) Selecting, according to the output of the current value network, the action corresponding to the output node with the maximum Q value in the output layer;
4) The air energy water heater executes the selected action, the system transfers to the next state after a certain time, that is, the next state is obtained, and the current reward (the parameter suitable for energy conservation) is obtained from the environment of the water heater;
5) Storing the current state, the selected action, the obtained reward and the next state in the memory storage table;
6) If the number of exploration steps is within 2000, returning to step 2); if it exceeds 2000, proceeding to step 7);
7) Randomly selecting 50 sample data from the memory storage table, and respectively inputting the sample data into the current value network and the target value network;
8) Calculating the difference between the outputs of the two networks, namely the sum of the differences at the corresponding nodes of the same action, adding the reward to obtain the error required for the network update, and feeding this error back to the current value network to update its parameters;
9) When the number of learning updates of the current value network exceeds 50, assigning the parameters of the current value network to the target value network;
10) Judging whether the required number of explorations has been reached, and ending if so; if not, returning to step 2).

The next state is a state set, that is, an environment state, for example, a parameter of a state is acquired every three minutes, and a state parameter acquired three minutes after the current timing is the next state.
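Steps 7) to 9) of the flow above may be sketched as follows. This is a minimal sketch under several assumptions that are not stated in the embodiment: the error is computed in the commonly used temporal-difference form (reward plus discounted target-network estimate, minus the current-network estimate), the discount factor `GAMMA` and the step size `LEARNING_RATE` are illustrative values, the target value network is synchronized every 50 updates, and a small linear model `LinearQ` stands in for the three-layer DNN so that the example stays self-contained.

```python
import random

GAMMA = 0.9           # assumed discount factor (not given in the text)
LEARNING_RATE = 0.01  # assumed step size (not given in the text)
BATCH_SIZE = 50       # number of samples drawn in step 7)
SYNC_EVERY = 50       # update count used for step 9)


class LinearQ:
    """Stand-in for a value network: one weight vector per discrete action."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def train_step(current, target, memory, rng):
    """Update the current value network once from a random minibatch."""
    batch = rng.sample(memory, min(BATCH_SIZE, len(memory)))
    for state, action, reward, next_state in batch:
        # Error: reward plus discounted target-network estimate of the next
        # state, minus the current network's estimate for the taken action.
        td_target = reward + GAMMA * max(target.q_values(next_state))
        td_error = td_target - current.q_values(state)[action]
        row = current.w[action]
        for i, s in enumerate(state):
            row[i] += LEARNING_RATE * td_error * s


def train(current, target, memory, n_updates, seed=0):
    """Run n_updates updates, copying parameters to the target periodically."""
    rng = random.Random(seed)
    for update in range(1, n_updates + 1):
        train_step(current, target, memory, rng)
        if update % SYNC_EVERY == 0:
            target.copy_from(current)
```

Keeping the target value network fixed between synchronizations means that the error for a batch is computed against estimates that do not move with every update of the current value network.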

In the embodiment, the energy-saving control strategy of the air energy water heater in the corresponding environment is learned according to the actual environment, so that the water demand of a user is met, and meanwhile, an efficient control strategy is learned to reduce the electric energy consumption.

By the above method, a reliable energy-saving control model can be obtained through training, so that accurate action data can be obtained when the water heater runs.

According to the above embodiment of the present invention, since the action data generated by the energy saving control model may exceed the safe operation range of the water heater, in order to improve the operation safety of the water heater, the energy saving control method of the water heater may further include: when the corresponding action numerical value in the first action instruction and the second action instruction is larger than a preset action numerical value, generating a third action instruction based on action constraint data, wherein the action constraint data are generated according to the attribute information of the water heater; and controlling the water heater to operate according to the third action instruction.

In this embodiment, a constraint on the action execution of the air energy water heater is added on top of the DQN-based policy learning, to prevent an excessively large value produced by the algorithm from affecting the control of the air energy water heater. Fig. 3 is a control flow chart for executing the action constraint according to an embodiment of the present invention. As shown in fig. 3, after the action value is planned, action value control can be performed based on the planned action value; during action value control, a range constraint is added for each action, that is, the action value of each action is adjusted, and the resulting actual action of the water heater guides the operation of the air energy water heater. In this embodiment, constraining the actions to be executed improves safety during operation of the water heater.
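The range constraint on each action may be sketched as a simple clamp. The limit values and the names `ACTION_LIMITS` and `constrain_action` below are hypothetical; in practice the limits would be derived from the attribute information of the water heater. The sketch also applies a lower bound, which is an assumption beyond the upper-bound case described above.

```python
# Hypothetical safe ranges; real limits would come from the attribute
# information of the specific water heater.
ACTION_LIMITS = {
    "compressor_hz": (20.0, 90.0),
    "valve_opening_steps": (50.0, 480.0),
    "fan_rpm": (300.0, 900.0),
}


def constrain_action(planned, limits=ACTION_LIMITS):
    """Clamp each planned action value into its allowed range."""
    constrained = {}
    for name, value in planned.items():
        low, high = limits[name]
        constrained[name] = min(max(value, low), high)
    return constrained
```

A planned value that already lies inside its range passes through unchanged, so the constraint only affects out-of-range outputs of the model.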

FIG. 4 is a flowchart of an alternative energy-saving control method for a water heater according to an embodiment of the present invention. As shown in FIG. 4, the method includes the following steps.
S11: constructing a reinforcement learning control model. 1) Setting a state set: outdoor environment temperature, current time, compressor running frequency, external fan running speed, water tank temperature, set temperature and reserved time (in a reserved mode); 2) Setting an action set: starting the equipment (in a reserved mode), the running frequency of the compressor (up, keep, down), the running opening of the regulating valve (up, keep, down) and the running speed of the outer fan (up, keep, down); 3) Reward model: reward (reward value) = water tank temperature - set temperature; 4) Reinforcement learning model: a deep Q-learning algorithm (DQN); 5) Operation modes: reserved time and non-reserved time.
S12: designing the DQN algorithm structure. The DQN algorithm structure mainly comprises two deep neural networks and a memory storage table. Each of the two deep neural networks is a three-layer DNN whose input is the state set (outdoor ambient temperature, current time, compressor operating frequency, outer fan operating speed, water tank temperature, set temperature) and whose output layer gives the Q value corresponding to each action in the action set (the operation frequency of the compressor, the operation opening of the regulating valve and the operation speed of the external fan), the actions being treated as discrete values. Each row in the memory storage table stores the current state, the action, the reward and the next state; the size of the memory storage table is set, and when the number of exploration steps exceeds the size of the memory storage table, the oldest stored row is replaced in sequence.
S13: learning the energy-saving control strategy of the air energy water heater based on the DQN.
S14: the air energy water heater executes the action under constrained control.
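The state set, action set and reward model of step S11 may be written down as follows. This is an illustrative sketch only: the field names are hypothetical, and enumerating every combination of the three up/keep/down adjustments as one discrete action per output node is an assumption, since the embodiment states only that the actions are treated as discrete values.

```python
from itertools import product
from typing import NamedTuple


class State(NamedTuple):
    """State set used as network input (field names are illustrative)."""
    outdoor_temp: float
    current_time: float
    compressor_freq: float
    fan_speed: float
    tank_temp: float
    set_temp: float


ADJUSTMENTS = ("up", "keep", "down")

# Assumed encoding: one discrete action per combination of adjustments to
# (compressor frequency, regulating valve opening, outer fan speed).
ACTIONS = list(product(ADJUSTMENTS, repeat=3))


def reward(state):
    """Reward model from S11: water tank temperature minus set temperature."""
    return state.tank_temp - state.set_temp
```

Under this encoding the output layer of each network would have 27 nodes, one Q value per combined adjustment.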

By the energy-saving control method of the water heater, provided by the embodiment of the invention, a reinforcement learning control method is provided, the air energy water heater is controlled in a self-learning manner, the optimal operation time and the optimal control parameter strategy of the water heater in the current environment state can be finally learned, the working efficiency of the air energy water heater is improved, and the purpose of saving energy is achieved.

It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

Example 2

According to an embodiment of the present invention, there is also provided an energy saving control device for a water heater, which is used for implementing the energy saving control method for a water heater, and fig. 5 is a schematic diagram of the energy saving control device for a water heater according to the present application, and as shown in fig. 5, the device includes: a first control unit 51, a first acquisition unit 53 and a processing unit 55.

The first control unit 51 is configured to control the water heater to operate according to a first action instruction when a start instruction of the water heater is received, where the first action instruction is generated based on first action data, the first action data is data generated by the energy-saving control model based on current state information, the current state information is information related to operation of the water heater, an input of the energy-saving control model is state information, and an output of the energy-saving control model is action data.

The first obtaining unit 53 is configured to obtain a reward value after the water heater operates according to the first action instruction, where the reward value is determined based on an actual temperature and a set temperature of water in a water tank of the water heater.

And the processing unit 55 is configured to, when the reward value is greater than the preset value, continue to acquire subsequent state information and generate a second action instruction based on the subsequent state information until the reward value is not greater than the preset value, where the subsequent state information is state information acquired when an information acquisition period after the current state information acquisition time comes, and the second action instruction is used to control operation of the water heater.

It should be noted here that the first control unit 51, the first obtaining unit 53 and the processing unit 55 correspond to steps S102 to S106 in embodiment 1, and the examples and application scenarios implemented by the three units are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1.

As can be seen from the above, in the solution described in embodiment 2 of the present application, when a start instruction of the water heater is received, the first control unit may be used to control the water heater to operate according to a first action instruction, where the first action instruction is generated based on first action data, the first action data is data generated by the energy-saving control model based on current state information, the current state information is information related to operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data. The first obtaining unit obtains a reward value after the water heater operates according to the first action instruction, where the reward value is determined based on the actual temperature and the set temperature of water in a water tank of the water heater. When the reward value is larger than the preset value, the processing unit continues to acquire subsequent state information and generates a second action instruction based on the subsequent state information until the reward value is not larger than the preset value, where the subsequent state information is state information acquired when an information acquisition period after the current state information acquisition moment arrives, and the second action instruction is used for controlling the operation of the water heater. In this way, the optimal operation time and the optimal control parameters of the water heater in the current environment state are obtained through self-learning of the water heater, and the energy-saving performance of the water heater is improved.

Therefore, the technical problem that the energy-saving control mode of the water heater in the related art is low in reliability and cannot really achieve the purpose of energy saving is solved through the scheme provided by the embodiment 2 of the invention.
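The cooperation of the three units described above may be sketched as a single control loop. The function and parameter names are hypothetical, the model and the water heater interfaces are represented by plain callables, and the `max_periods` guard is an added assumption so that the sketch always terminates.

```python
def control_loop(model, read_state, apply_action, read_reward, preset,
                 max_periods=1000):
    """Run the heater until the reward is no longer above the preset value.

    model(state) -> action data; read_state(), apply_action(action) and
    read_reward() are hypothetical interfaces to the water heater.
    Returns the number of action instructions issued.
    """
    state = read_state()
    apply_action(model(state))      # first action instruction
    periods = 1
    while read_reward() > preset and periods < max_periods:
        state = read_state()        # subsequent state, next acquisition period
        apply_action(model(state))  # second action instruction
        periods += 1
    return periods
```

The reward is read after each action has been applied, which mirrors the order in which the first obtaining unit and the processing unit operate.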

Optionally, the energy saving control device of the water heater further comprises one of the following: the first generation unit is used for responding to the trigger operation acted on a control panel of the water heater and generating a starting instruction; and the second generation unit is used for generating a starting instruction when the reserved starting time is determined to come.

Optionally, the energy saving control device of the water heater further includes: the second acquisition unit is used for acquiring multiple groups of historical state information in historical time periods; and the training unit is used for training a deep Q-network (DQN) model by utilizing the multiple groups of historical state information to obtain the energy-saving control model.

Optionally, the training unit includes: the extraction module is used for extracting a plurality of groups of historical state data from each group of the plurality of groups of historical state information; the first input module is used for inputting the multiple groups of historical state data to the DQN model, starting from a first group of the multiple groups of historical state data; the first acquisition module is used for acquiring target action information with the largest evaluation value in the action output information of the DQN model; the control module is used for controlling the water heater to operate according to the target action information and acquiring a historical reward value of the water heater after the water heater operates according to the target action information; and the execution module is used for circularly executing the steps until the execution times are more than the preset execution times.

Optionally, the energy-saving control device of the water heater further comprises: and the storage module is used for storing a plurality of groups of training data including the current historical state data, the target action, the historical reward value and the historical subsequent state data into a preset storage medium.

Optionally, the energy saving control device of the water heater further includes: the selection module is used for selecting parts of the multiple groups of training data as sample data after the steps are executed circularly until the execution times are larger than the preset execution times; the second input module is used for respectively inputting a plurality of sample data into a current value network and a target value network, wherein the current value network and the target value network are both network structures in the DQN model; the second acquisition module is used for acquiring the output difference value of the current value network and the target value network; a generation module for generating update data based on the output difference value and the reward value in the plurality of sample data; the updating module is used for updating the current value network by using the updating data; and the third acquisition module is used for assigning the network parameters of the current value network to the target value network when the update times of the current value network exceed the preset times, so as to obtain the energy-saving control model.

Optionally, the energy saving control device of the water heater further includes: a third generating unit, configured to generate a third action command based on action constraint data when a corresponding action numerical value in the first action command and the second action command is greater than a predetermined action numerical value, where the action constraint data is generated according to attribute information of the water heater; and the second control unit is used for controlling the water heater to operate according to the third action instruction.

Optionally, the current state information includes: the outdoor environment temperature, the current time, the running frequency of the compressor, the rotating speed of the outer fan of the water heater, the temperature of the water tank of the water heater and the set temperature.

Example 3

According to another aspect of the embodiment of the invention, the invention further provides the water heater, and the water heater uses the energy-saving control method of the water heater.

Example 4

According to another aspect of the embodiment of the present invention, there is also provided a computer-readable storage medium including a stored program, wherein the program executes the energy saving control method of the water heater of any one of the above.

Optionally, in this embodiment, the computer-readable storage medium may be located in any one computer terminal of a computer terminal group in a computer network, or in any one communication device of a communication device group.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: when a starting instruction of the water heater is received, controlling the water heater to operate according to a first action instruction, wherein the first action instruction is generated based on first action data, the first action data is data generated by the energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data; acquiring a reward value of the water heater after the water heater operates according to a first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of water in a water tank of the water heater; and when the reward value is larger than the preset value, continuing to acquire subsequent state information and generating a second action instruction based on the subsequent state information until the reward value is not larger than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period after the current state information acquisition moment arrives, and the second action instruction is used for controlling the operation of the water heater.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: responding to a trigger operation acted on a control panel of the water heater to generate a starting instruction; and when the scheduled starting time arrives, generating a starting instruction.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: acquiring multiple groups of historical state information in historical time periods; and training a deep Q-network (DQN) model by utilizing the multiple groups of historical state information to obtain the energy-saving control model.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: extracting a plurality of sets of historical state data from each of the plurality of sets of historical state information; starting from a first group of the multiple groups of historical state data, inputting the multiple groups of historical state data to the DQN model; acquiring target action information with the maximum evaluation value in the action output information of the DQN model; controlling the water heater to operate according to the target action information, and acquiring a historical reward value of the water heater after the water heater operates according to the target action information; and circularly executing the steps until the execution times are greater than the preset execution times.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: storing sets of training data including current historical state data, target actions, historical reward values, and historical subsequent state data to a predetermined storage medium.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: when the storage space of the predetermined storage medium is determined to be full, releasing the information that was stored earliest in the predetermined storage medium.

Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: selecting part of the multiple groups of training data as a plurality of sample data; respectively inputting the plurality of sample data into a current value network and a target value network, wherein the current value network and the target value network are both network structures in the DQN model; acquiring an output difference value of the current value network and the target value network; generating update data based on the output difference value and the reward value in the plurality of sample data; updating the current value network with the update data; and when the updating times of the current value network exceed the preset times, assigning the network parameters of the current value network to the target value network to obtain the energy-saving control model.

Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: when the corresponding action numerical value in the first action instruction and the second action instruction is larger than a preset action numerical value, generating a third action instruction based on action constraint data, wherein the action constraint data are generated according to the attribute information of the water heater; and controlling the water heater to operate according to the third action instruction.

Example 5

According to another aspect of the embodiment of the invention, a processor is further provided, and the processor is used for running a program, wherein when the program runs, the energy-saving control method of the water heater is executed.

The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. An energy-saving control method of a water heater is characterized by comprising the following steps:

when a starting instruction of a water heater is received, controlling the water heater to operate according to a first action instruction, wherein the first action instruction is generated based on first action data, the first action data is data generated by an energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data;

acquiring a reward value of the water heater after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of water in a water tank of the water heater;

and when the reward value is larger than a preset value, continuing to acquire subsequent state information and generating a second action instruction based on the subsequent state information until the reward value is not larger than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period after the current state information acquisition time arrives, and the second action instruction is used for controlling the operation of the water heater.

2. The energy-saving control method of the water heater according to claim 1, further comprising one of:

responding to a trigger operation acted on a control panel of the water heater to generate the starting instruction;

and when the scheduled starting time comes, generating the starting instruction.

3. The energy-saving control method of a water heater according to claim 1, further comprising:

acquiring multiple groups of historical state information in historical time periods;

and training a deep Q-network (DQN) model by using the multiple groups of historical state information to obtain the energy-saving control model.

4. The energy-saving control method of the water heater according to claim 3, wherein training an initial model by using the plurality of sets of historical state information comprises:

extracting a plurality of sets of historical state data from each of the plurality of sets of historical state information;

starting from a first set of the multiple sets of historical state data, inputting the multiple sets of historical state data to the DQN model;

obtaining target action information with the maximum evaluation value in the action output information of the DQN model;

controlling the water heater to operate according to the target action information, and acquiring a historical reward value of the water heater after the water heater operates according to the target action information;

and circularly executing the steps until the execution times are more than the preset execution times.

5. The energy-saving control method of the water heater according to claim 4, further comprising:

storing sets of training data including current historical state data, the target action, the historical reward value, and historical subsequent state data to a predetermined storage medium.

6. The energy saving control method of a water heater according to claim 5, further comprising:

and when the storage space of the predetermined storage medium is determined to be full, releasing the information that was stored earliest in the predetermined storage medium.

7. The energy-saving control method of the water heater according to claim 5, further comprising, after the steps are executed in a cycle until the number of execution times is greater than a predetermined number of execution times:

selecting part of the multiple groups of training data as sample data;

inputting the sample data into a current value network and a target value network respectively, wherein the current value network and the target value network are both network structures in the DQN model;

acquiring an output difference value of the current value network and the target value network;

generating update data based on the output difference value and a reward value in the sample data;

updating the current value network with the update data;

and when the number of updates of the current value network exceeds a predetermined number, assigning the network parameters of the current value network to the target value network to obtain the energy-saving control model.
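The update procedure of claim 7 can be illustrated with the standard DQN temporal-difference rule. This is a sketch under stated assumptions: the claim does not give the update formula, so the discount factor `gamma`, the learning rate `lr`, the linear value function, and the parameters `batch_size`, `steps`, and `sync_every` are all hypothetical choices made only to keep the example self-contained.

```python
import copy
import random
from typing import List, Sequence, Tuple

# (state, action, reward, next_state), as stored in claim 5.
Transition = Tuple[Sequence[float], int, float, Sequence[float]]
Weights = List[List[float]]  # one weight row per action (linear stand-in network)


def q_values(weights: Weights, state: Sequence[float]) -> List[float]:
    """Evaluation value of every action for the given state."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def train(
    buffer: List[Transition],
    n_actions: int,
    batch_size: int = 4,
    steps: int = 20,
    sync_every: int = 5,
    gamma: float = 0.9,
    lr: float = 0.1,
    seed: int = 0,
) -> Tuple[Weights, Weights]:
    rng = random.Random(seed)
    dim = len(buffer[0][0])
    current: Weights = [[0.0] * dim for _ in range(n_actions)]
    target: Weights = copy.deepcopy(current)
    updates = 0
    for _ in range(steps):
        # Select part of the stored training data as sample data.
        batch = rng.sample(buffer, min(batch_size, len(buffer)))
        for state, action, reward, next_state in batch:
            # Reward plus discounted target-network output (assumed TD target).
            td_target = reward + gamma * max(q_values(target, next_state))
            # Difference between that target and the current-network output.
            td_error = td_target - q_values(current, state)[action]
            # Gradient step on the current value network only.
            for i, x in enumerate(state):
                current[action][i] += lr * td_error * x
        updates += 1
        if updates > sync_every:
            # Assign current-network parameters to the target network.
            target = copy.deepcopy(current)
            updates = 0
    return current, target
```

Keeping the target network fixed between synchronisations is the usual reason for maintaining two networks: the regression target does not shift on every gradient step.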

8. The energy saving control method of a water heater according to any one of claims 1 to 7, further comprising:

when the corresponding action numerical value in the first action instruction and the second action instruction is larger than a preset action numerical value, generating a third action instruction based on action constraint data, wherein the action constraint data is generated according to attribute information of the water heater;

and controlling the water heater to operate according to the third action instruction.
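One way to picture the constraint step of claim 8 is a clamp against limits derived from the appliance's attributes. The fields `compressor_hz` and `fan_rpm`, and the choice to clamp rather than regenerate the action some other way, are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    compressor_hz: float  # hypothetical action field
    fan_rpm: float        # hypothetical action field


@dataclass(frozen=True)
class ActionLimits:
    """Action constraint data, assumed to come from the heater's attribute information."""
    max_compressor_hz: float
    max_fan_rpm: float


def constrain(action: Action, limits: ActionLimits) -> Action:
    """Return the action unchanged if within limits, else a clamped third action."""
    within = (
        action.compressor_hz <= limits.max_compressor_hz
        and action.fan_rpm <= limits.max_fan_rpm
    )
    if within:
        return action
    return Action(
        compressor_hz=min(action.compressor_hz, limits.max_compressor_hz),
        fan_rpm=min(action.fan_rpm, limits.max_fan_rpm),
    )
```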

9. The energy-saving control method of a water heater according to any one of claims 1 to 7, wherein the current state information includes: an outdoor ambient temperature, a current time, an operating frequency of a compressor, a rotation speed of an outer fan of the water heater, a water tank temperature of the water heater, and a set temperature.
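The six state quantities of claim 9 could be packed into a model input as follows. The field names, units, and the encoding of the current time as an hour of the day are assumptions; the claim does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HeaterState:
    outdoor_temp_c: float  # outdoor ambient temperature (unit assumed)
    hour_of_day: float     # current time, encoded as hours (assumed encoding)
    compressor_hz: float   # operating frequency of the compressor
    outer_fan_rpm: float   # rotation speed of the outer fan
    tank_temp_c: float     # water tank temperature
    set_temp_c: float      # set temperature

    def to_vector(self) -> List[float]:
        """Flatten the state into the input order assumed for the model."""
        return [
            self.outdoor_temp_c,
            self.hour_of_day,
            self.compressor_hz,
            self.outer_fan_rpm,
            self.tank_temp_c,
            self.set_temp_c,
        ]
```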

10. An energy-saving control device of a water heater, characterized by comprising:

a first control unit, configured to control the water heater to operate according to a first action instruction when a start instruction of the water heater is received, wherein the first action instruction is generated based on first action data, the first action data is data generated by an energy-saving control model based on current state information, the current state information is information related to the operation of the water heater, the input of the energy-saving control model is state information, and the output of the energy-saving control model is action data;

the first obtaining unit is used for obtaining a reward value after the water heater operates according to the first action instruction, wherein the reward value is determined based on the actual temperature and the set temperature of water in a water tank of the water heater;

and the processing unit is used for continuously acquiring subsequent state information and generating a second action instruction based on the subsequent state information when the reward value is larger than a preset value until the reward value is not larger than the preset value, wherein the subsequent state information is state information acquired when an information acquisition period after the current state information acquisition time comes, and the second action instruction is used for controlling the operation of the water heater.
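The cooperation of the three units in claim 10 can be sketched as a control loop. The reward formula is not given in the claims, so `reward_from_temps` below (the remaining gap between set and actual temperature) is an assumption, as are the callable hooks `read_state`, `policy`, `apply_action`, `read_tank_temp` and the safety bound `max_periods`.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def reward_from_temps(actual_temp: float, set_temp: float) -> float:
    """Assumed reward: remaining gap between set and actual tank temperature."""
    return set_temp - actual_temp


def control_loop(
    read_state: Callable[[], State],
    policy: Callable[[State], int],
    apply_action: Callable[[int], None],
    read_tank_temp: Callable[[], float],
    set_temp: float,
    preset: float,
    max_periods: int = 1000,
) -> List[int]:
    """Run until the reward value is no longer larger than the preset value."""
    actions: List[int] = []
    # First control unit: act on the current state at start-up.
    action = policy(read_state())
    apply_action(action)
    actions.append(action)
    for _ in range(max_periods):  # hypothetical bound to avoid an endless loop
        # First obtaining unit: reward after the last action.
        reward = reward_from_temps(read_tank_temp(), set_temp)
        if reward <= preset:
            break
        # Processing unit: next acquisition period, next action instruction.
        action = policy(read_state())
        apply_action(action)
        actions.append(action)
    return actions
```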

11. A water heater using the energy saving control method of the water heater according to any one of claims 1 to 9.

12. A computer-readable storage medium characterized by comprising a stored program, wherein the program executes the energy saving control method of a water heater according to any one of claims 1 to 9.

13. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the energy-saving control method of the water heater according to any one of claims 1 to 9.