CN111338227B

CN111338227B - Electronic appliance control method and control device based on reinforcement learning and storage medium

Info

Publication number: CN111338227B
Application number: CN202010416754.0A
Authority: CN
Inventors: 刘强; 许弘
Original assignee: Nanjing Sanman Internet Technology Co ltd
Current assignee: Nanjing Sanman Internet Technology Co ltd
Priority date: 2020-05-18
Filing date: 2020-05-18
Publication date: 2020-12-01
Anticipated expiration: 2040-05-18
Also published as: CN111338227A

Abstract

The invention relates to an electronic appliance control method based on reinforcement learning. A reinforcement learning control strategy is applied to electronic appliances that have a scene-based automatic control function. The user's continuous intervention in the control of an appliance is taken as the decision input of the reinforcement learning, and a scene algorithm model adapted to automatic control of the appliance in the user's different scenes is generated dynamically. This yields the scene algorithm model of the automatic working mode that most closely matches the user's usage habits and improves the usage efficiency of the appliance. Applied to various electronic appliances, the method optimizes and updates the self-learning mode of scene-based automatic control for all appliances and provides a better scene-based automatic control method for smart homes and smart offices.

Description

Electronic appliance control method and control device based on reinforcement learning and storage medium

Technical Field

The invention relates to a reinforcement learning-based electronic appliance control method, a control device, and a storage medium, and belongs to the technical field of smart-home Internet of Things.

Background

Most smart home systems on the market today rely on two functions, scenes and automation, to accomplish most of what they do, and they are controlled mainly by voice or by mobile phone. Many users assume that artificial intelligence is already well developed: that a system can learn user habits on its own, apply scenes automatically with the home at the center, and let home devices interact with and respond to the outside world (devices monitor and automatically trigger linked scene devices, and can collect personal or environmental information), greatly improving the comfort of home life. In practice, however, AI is not yet widely applied in the smart home field, and smart homes are still realized essentially through scenes and automation.

Reinforcement learning is a branch of machine learning that is well suited to controlling an individual able to act autonomously in an environment. It emphasizes how to act on the basis of the environment so as to maximize the expected benefit, with the individual continuously improving its behavior through interaction with the environment. Reinforcement learning problems involve learning what to do, that is, how to map situations in the environment to actions so as to obtain the maximum reward. The learner is a decision-making agent that is not told which action to take; instead, it discovers through repeated trials which behavior earns the most reward. Typically, an action affects not only the immediate reward but also the environment at the next point in time, and therefore all subsequent rewards. Reinforcement learning is essentially a closed-loop control problem, since the actions of the learning system affect the environment, which in turn affects subsequent actions. The goal of reinforcement learning is to solve a Markov decision process (MDP). Specifically, the learner and decision maker (the agent) is placed in an environment and learns how to maximize the total reward obtained.
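The closed loop described above (observe the state, act, receive a reward, repeat) can be illustrated with a minimal sketch. The environment, its reward values, and the names `ToyEnvironment` and `run_episode` are hypothetical and serve only to show the interaction pattern; they are not taken from the invention.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ToyEnvironment:
    """Hypothetical environment: the state is an integer position on a line."""
    state: int = 0
    goal: int = 3

    def step(self, action: int) -> Tuple[int, float, bool]:
        # The action changes the state, which in turn shapes all later rewards.
        self.state += action
        done = self.state == self.goal
        reward = 10.0 if done else -1.0  # illustrative reward values
        return self.state, reward, done


def run_episode(env: ToyEnvironment, policy: Callable[[int], int],
                max_steps: int = 20) -> float:
    """Run one agent-environment loop and return the total reward collected."""
    total = 0.0
    state = env.state
    for _ in range(max_steps):
        action = policy(state)                  # agent decides from the state
        state, reward, done = env.step(action)  # environment responds
        total += reward
        if done:
            break
    return total


# A fixed policy that always moves right reaches the goal in three steps.
total_reward = run_episode(ToyEnvironment(), lambda state: 1)
```

A learning agent would adjust `policy` between episodes to increase the total reward; the sketch fixes the policy to keep the loop itself in focus.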

Therefore, if the idea of reinforcement learning can be applied to the control of smart devices, the efficiency of scene-based automation of those devices can be greatly improved.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an electronic appliance control method based on reinforcement learning that applies a reinforcement learning control strategy to appliances having a scene-based automatic control function, so that each appliance obtains an automatic working mode closer to the user's usage habits and the working efficiency of the appliance is improved.

To solve the above technical problem, the invention adopts the following technical scheme. The invention provides an electronic appliance control method based on reinforcement learning, used to control each appliance that has a scene-based automatic control function. For each appliance, while the appliance operates according to its initial automated control scenes, a different control method is applied depending on which of the following states the appliance is in:

State 1. The appliance is not started. If the appliance is switched on manually and starts working, a new automated control scene for the appliance is formed from the current time and the switch-on action, combined with the detection information, obtained before the appliance started working, for the factor in the appliance's environment that corresponds to the appliance's working purpose. If the appliance receives no manual switch-on action, no further operation is performed.

State 2. The appliance is working. If the appliance is switched off manually and stops working, a new automated control scene for the appliance is formed from the current time and the switch-off action, combined with the detection information, obtained before the appliance stopped working, for the factor in the appliance's environment that corresponds to the appliance's working purpose. If the appliance receives no manual switch-off action and has the function of adjusting its working-purpose factor, that factor is defined as factor A, the setting information A_set of the appliance for factor A in the current automated control scene is obtained, and steps A to B below are executed. The manual adjustment action is taken as the intervention origin; starting from this origin and following changes in factor A in the environment, the working state in the appliance's automated control scene is changed, so that the automated control scene is optimized from the origin out to the tolerated fluctuation range of the environmental factor.

Step A. Detect factor A in the environment where the appliance is located to obtain detection information A_measured, and proceed to step B.

Step B. If the appliance receives no manual adjustment of the setting for factor A, define and update the range from A_set to A_measured as the tolerated fluctuation range of factor A in the environment under the appliance's current automated control scene, maintain the current working state of the appliance, and return to step A.

If the appliance receives a manual adjustment of the setting for factor A, update the setting information A_set of the appliance for factor A in the current automated control scene with the new setting information, and control the appliance to work in the working state corresponding to the new setting, thereby responding to the manual adjustment in real time and updating the current automated control scene in real time; then return to step A.
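The state handling and the loop over steps A and B can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `ApplianceController`, `Scene`, `a_set`, and `tolerance` are hypothetical; the tolerated fluctuation range is interpreted as the smallest interval covering A_set and every A_measured observed without intervention; and a manual adjustment is assumed to reset that interval to the new setting (the new intervention origin).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Scene:
    """Hypothetical record of one automated control scene."""
    time: str
    action: str      # "on", "off", or "adjust"
    reading: float   # detected value of factor A before the action


@dataclass
class ApplianceController:
    running: bool = False
    a_set: Optional[float] = None                    # A_set
    tolerance: Optional[Tuple[float, float]] = None  # (low, high) for factor A
    scenes: List[Scene] = field(default_factory=list)

    def on_manual_switch(self, time: str, turn_on: bool, reading: float,
                         setting: Optional[float] = None) -> None:
        """State 1 / State 2: a manual on or off action forms a new scene."""
        self.running = turn_on
        self.scenes.append(Scene(time, "on" if turn_on else "off", reading))
        if turn_on:
            # setting stays None for an appliance without an adjustable factor.
            self.a_set = setting
            self.tolerance = None if setting is None else (setting, setting)

    def step(self, time: str, a_measured: float,
             manual_setting: Optional[float] = None) -> None:
        """One pass of steps A and B; a_measured is the step A detection."""
        if not self.running or self.a_set is None:
            return  # not working, or no adjustable factor: nothing to do
        if manual_setting is None:
            # Step B, no intervention: widen the tolerated range (assumption)
            # and keep the current working state.
            low, high = self.tolerance
            self.tolerance = (min(low, a_measured), max(high, a_measured))
        else:
            # Step B, manual adjustment: new origin, scene updated (assumption:
            # the update is stored as an extra "adjust" record).
            self.a_set = manual_setting
            self.tolerance = (manual_setting, manual_setting)
            self.scenes.append(Scene(time, "adjust", a_measured))


controller = ApplianceController()
controller.on_manual_switch("19:00", True, reading=30.0, setting=24.0)
controller.step("19:10", 25.5)
controller.step("19:20", 23.0)
```

After these calls, `controller.tolerance` covers the readings seen without intervention, while `controller.a_set` is unchanged.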

As a preferred technical scheme of the invention: based on the different control methods applied to each appliance under State 1 and State 2, the automated control scenes corresponding to each appliance are updated, forming the automated control scenes of each appliance in the environment. Reinforcement learning algorithm modeling is then performed on the automated control scenes of each appliance in every environment within a geographic area of preset size, yielding, for each appliance, the algorithm models of the automated control scenes with the highest proportion in that area. These are used as the algorithm models of the initial automated control scenes for each appliance in the geographic area and are combined into an initial set of appliance automated control scene algorithm models for the area. The set is distributed to, and applied by, appliances with a scene-based automatic control function in newly set-up environments within the geographic area.
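The selection of the highest-proportion scene per appliance across a geographic area can be illustrated with a simple frequency count. This sketch deliberately substitutes plain counting for the reinforcement learning modeling mentioned above, which the text does not specify in detail; the function name `initial_scene_set` and the tuple representation of a scene are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def initial_scene_set(
    observed: Iterable[Tuple[str, Hashable]]
) -> Dict[str, Hashable]:
    """Pick, for each appliance type, the scene seen most often in the area.

    observed: (appliance_type, scene) pairs collected from all environments
    in one geographic area. A scene is any hashable value, for example a
    (time, action) tuple.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for appliance_type, scene in observed:
        counts[appliance_type][scene] += 1
    # most_common(1) returns a one-element list of (scene, count).
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}


area_scenes = [
    ("lighting", ("18:00", "on")),
    ("lighting", ("18:00", "on")),
    ("lighting", ("19:00", "on")),
    ("cooling_heating", ("07:00", "on")),
]
defaults = initial_scene_set(area_scenes)
```

The resulting dictionary plays the role of the initial set distributed to appliances in a newly set-up environment, which then refine it through their own control loop.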

As a preferred technical scheme of the invention, the appliances and their working-purpose factors include: the appliance is a lighting device, and its working-purpose factor is brightness; the appliance is a cooling and heating device, and its working-purpose factor is temperature; the appliance is a humidifying device, and its working-purpose factor is humidity; the appliance is an air purification device, and its working-purpose factors are the air index factors for each function of the air purification device.

Correspondingly, the invention also provides an electronic appliance control device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. The control device is connected to each appliance that has a scene-based automatic control function, and the processor implements the steps of the reinforcement learning-based electronic appliance control method when executing the computer program.
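The pairing of appliance types with working-purpose factors listed above could be held in a small lookup table. The keys, factor names, and the helper `factors_for` are illustrative labels rather than identifiers from the invention, and the two air purifier entries are hypothetical examples, since the text only speaks of "air index factors".

```python
# Appliance type -> working-purpose factor(s) to detect and adjust.
WORKING_PURPOSE_FACTORS = {
    "lighting": ("brightness",),
    "cooling_heating": ("temperature",),
    "humidifier": ("humidity",),
    # Hypothetical examples of air index factors; not named in the source.
    "air_purifier": ("pm2_5", "formaldehyde"),
}


def factors_for(appliance_type: str) -> tuple:
    """Return the factors to monitor, or an empty tuple for unknown types."""
    return WORKING_PURPOSE_FACTORS.get(appliance_type, ())
```

An appliance with several factors, such as an air purifier, would run the control loop once per factor.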

In addition, the present invention also contemplates a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the reinforcement learning-based electronic appliance control method of the present invention.

Compared with the prior art, the reinforcement learning-based electronic appliance control method, control device, and storage medium of the above technical scheme have the following technical effects:

The method applies a reinforcement learning control strategy to electronic appliances that have a scene-based automatic control function. By taking the user's continuous intervention in the control of an appliance as the decision input of the reinforcement learning, it dynamically generates a scene algorithm model adapted to automatic control of the appliance in the user's different scenes, obtains the scene algorithm model of the automatic working mode that most closely matches the user's usage habits, and improves the usage efficiency of the appliance. Applied to various electronic appliances, the method optimizes and updates the self-learning mode of scene-based automatic control for all appliances and provides a better scene-based automatic control method for smart homes and smart offices. Furthermore, a computer device and a storage medium are designed on the basis of this technical scheme so that the method is implemented in hardware, achieving efficient use of electronic appliances in practical applications.

Drawings

Fig. 1 is a schematic flow chart of an electronic appliance control method based on reinforcement learning according to the present invention.

Detailed Description

The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.

The invention provides an electronic appliance control method based on reinforcement learning, used to control each appliance that has a scene-based automatic control function, where the appliances include both strong-current and weak-current equipment. As shown in Fig. 1, while each appliance operates according to its initial automated control scenes, a different control method is applied depending on which of the following states the appliance is in.

State 1. The appliance is not started. If the appliance is switched on manually and starts working, a new automated control scene for the appliance is formed from the current time and the switch-on action, combined with the detection information, obtained before the appliance started working, for the factor in the appliance's environment that corresponds to the appliance's working purpose. If the appliance receives no manual switch-on action, no further operation is performed.

State 2. The appliance is working. If the appliance is switched off manually and stops working, a new automated control scene for the appliance is formed from the current time and the switch-off action, combined with the detection information, obtained before the appliance stopped working, for the factor in the appliance's environment that corresponds to the appliance's working purpose. If the appliance receives no manual switch-off action and has the function of adjusting its working-purpose factor, that factor is defined as factor A, the setting information A_set of the appliance for factor A in the current automated control scene is obtained, and steps A to B below are executed. The manual adjustment action is taken as the intervention origin; starting from this origin and following changes in factor A in the environment, the working state in the appliance's automated control scene is changed, so that the automated control scene is optimized.

Step A. Detect factor A in the environment where the appliance is located to obtain detection information A_measured, and proceed to step B.

When the method is applied in practice, based on the different control methods applied to each appliance under State 1 and State 2, the automated control scenes corresponding to each appliance are updated, forming the automated control scenes of each appliance in the environment. Reinforcement learning algorithm modeling is then performed on the automated control scenes of each appliance in every environment within a geographic area of preset size, yielding, for each appliance, the algorithm models of the automated control scenes with the highest proportion in that area. These are used as the algorithm models of the initial automated control scenes for each appliance in the geographic area and are combined into an initial set of appliance automated control scene algorithm models for the area. The set is distributed to, and applied by, appliances with a scene-based automatic control function in newly set-up environments within the geographic area.

In practice, the above scheme can be implemented in either of two ways. The control method can run on the built-in controller of each appliance, so that each appliance automatically optimizes and updates its own automated control scenes. Alternatively, a master controller in the environment can apply the control method to the automated control scenes of every appliance, so that the automated control scenes of all appliances in the environment are optimized and updated automatically.

By extending the scene-based automatic control function of each appliance to a geographic area of preset size, a method of big-data analysis, processing, and recommendation through reinforcement learning is introduced. Appliances with a scene-based automatic control function in a newly set-up environment are provided with a selection of multi-scene automatic control modes for their initial state, which makes scene automation more convenient and gives a more user-friendly experience when a smart home or smart office is first used. Meanwhile, combined with the reinforcement learning-based control method, each appliance continues to detect and learn and to update its scene automation modes, so that appliances adapt more intelligently to users' habits and their effectiveness in use improves.

In practice, the appliances and their working-purpose factors include: the appliance is a lighting device, and its working-purpose factor is brightness; the appliance is a cooling and heating device, and its working-purpose factor is temperature; the appliance is a humidifying device, and its working-purpose factor is humidity; the appliance is an air purification device, and its working-purpose factors are the air index factors for each function of the air purification device.
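The second deployment option, a master controller that applies the method to every appliance in the environment, can be sketched as a simple event router. The class `MasterController`, its methods, and the dictionary-shaped events are hypothetical; the text does not describe the interface between the master controller and the appliances.

```python
from typing import Callable, Dict

# A handler stands in for the per-appliance control logic; in the first
# deployment option the same logic would run on the appliance itself.
Handler = Callable[[dict], None]


class MasterController:
    """Hypothetical hub that routes events to one handler per appliance."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, appliance_id: str, handler: Handler) -> None:
        self._handlers[appliance_id] = handler

    def dispatch(self, appliance_id: str, event: dict) -> bool:
        """Forward an event; return False if the appliance is not registered."""
        handler = self._handlers.get(appliance_id)
        if handler is None:
            return False
        handler(event)
        return True


hub = MasterController()
lamp_events = []
hub.register("lamp", lamp_events.append)
handled = hub.dispatch("lamp", {"time": "18:00", "action": "on"})
```

Keeping the per-appliance logic behind a plain callable means the same control code can be placed either on the appliance or behind the hub.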

The method applies a reinforcement learning control strategy to electronic appliances that have a scene-based automatic control function. By taking the user's continuous intervention in the control of an appliance as the decision input of the reinforcement learning, it dynamically generates a scene algorithm model adapted to automatic control of the appliance in the user's different scenes, obtains the scene algorithm model of the automatic working mode that most closely matches the user's usage habits, and improves the usage efficiency of the appliance. Applied to various electronic appliances, the method optimizes and updates the self-learning mode of scene-based automatic control for all appliances and provides a better scene-based automatic control method for smart homes and smart offices.

In practice, whether each appliance executes the control method independently or a master controller in the environment executes it for each appliance, the method involves an appliance control device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. The control device is connected to each appliance that has a scene-based automatic control function, and the processor implements the steps of the reinforcement learning-based electronic appliance control method when executing the computer program. To work with the control device, the invention also relates to a computer-readable storage medium storing a computer program that, when executed, implements the steps of the reinforcement learning-based electronic appliance control method.

As an example of applying the method in practice, consider a lighting device, for which the working-purpose factor is brightness. While the lighting device operates according to its initial automated control scenes, a different control method is applied depending on which of the following states it is in.

State 1. The lighting device is not started. If the lighting device is switched on manually and starts working, a new automated control scene for the lighting device is formed from the current time and the switch-on action, combined with the detection information for the brightness factor in the device's environment obtained before the device started working. If the lighting device receives no manual switch-on action, no further operation is performed.

State 2. The lighting device is working. If the lighting device is switched off manually and stops working, a new automated control scene for the lighting device is formed from the current time and the switch-off action, combined with the detection information for the brightness factor in the device's environment obtained before the device stopped working. If the lighting device receives no manual switch-off action, then: if the device has no brightness adjustment function, no further operation is performed; if the device has a brightness adjustment function, the setting information A_set of the lighting device for the brightness factor in the current automated control scene is obtained, and steps A to B below are executed.

Step A. Detect the brightness factor in the environment where the lighting device is located to obtain detection information A_measured, and proceed to step B.

Step B. If the lighting device receives no manual adjustment of the setting for the brightness factor, define and update the range from A_set to A_measured as the tolerated fluctuation range of the brightness factor in the environment under the lighting device's current automated control scene, maintain the current working state of the lighting device, and return to step A.

If the lighting device receives a manual adjustment of the setting for the brightness factor, update the setting information A_set of the lighting device for the brightness factor in the current automated control scene with the new setting information, control the lighting device to work in the working state corresponding to the new brightness setting, and return to step A.
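A worked trace may make the lighting case concrete. The sketch below replays a sequence of detections for a lighting device in State 2 and returns the final setting and tolerated range. The function name `run_lighting_scene`, the use of lux as a common unit for A_set and A_measured, and the reset of the range on manual adjustment are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def run_lighting_scene(
    dimmable: bool,
    a_set: float,
    events: Iterable[Tuple[float, Optional[float]]],
) -> Tuple[float, Tuple[float, float]]:
    """Replay steps A and B for a lighting device that stays switched on.

    events: (a_measured, manual_setting) pairs, one per pass through step A;
    manual_setting is None when the user did not touch the brightness.
    Returns (final A_set, (low, high) tolerated brightness range).
    """
    low = high = a_set
    if not dimmable:
        # No brightness adjustment function: no further operation.
        return a_set, (low, high)
    for a_measured, manual_setting in events:
        if manual_setting is None:
            # Step B without intervention: the range grows to cover A_measured.
            low, high = min(low, a_measured), max(high, a_measured)
        else:
            # Step B with intervention: the new setting becomes the origin.
            a_set = manual_setting
            low = high = manual_setting
    return a_set, (low, high)


# Two undisturbed readings, then the user raises the brightness to 400,
# then one more undisturbed reading.
result = run_lighting_scene(
    True, 300.0,
    [(320.0, None), (280.0, None), (350.0, 400.0), (390.0, None)],
)
```

Here the range learned around 300 is discarded once the user intervenes, and a new range starts to form around 400.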

In practice, if the appliance is a cooling and heating device, the working-purpose factor is temperature. While the cooling and heating device operates according to its initial automated control scenes, a different control method is applied depending on which of the following states it is in.

State 1. The cooling and heating device is not started. If the device is switched on manually and starts working, a new automated control scene for the device is formed from the current time and the switch-on action, combined with the detection information for the temperature factor in the device's environment obtained before the device started working. If the device receives no manual switch-on action, no further operation is performed.

State 2. The cooling and heating device is working. If the device is switched off manually and stops working, a new automated control scene for the device is formed from the current time and the switch-off action, combined with the detection information for the temperature factor in the device's environment obtained before the device stopped working. If the device receives no manual switch-off action, the setting information A_set of the device for the temperature factor in the current automated control scene is obtained, and steps A to B below are executed.

Step A. Detect the temperature factor in the environment where the cooling and heating device is located to obtain detection information A_measured, and proceed to step B.

Step B. If the cooling and heating device receives no manual adjustment of the setting for the temperature factor, define and update the range from A_set to A_measured as the tolerated fluctuation range of the temperature factor in the environment under the device's current automated control scene, maintain the current working state of the device, and return to step A.

If the cooling and heating device receives a manual adjustment of the setting for the temperature factor, update the setting information A_set of the device for the temperature factor in the current automated control scene with the new setting information, control the device to work in the working state corresponding to the new temperature setting, and return to step A.

In addition, if the appliance is a humidifying device, the working-purpose factor is humidity. While the humidifying device operates according to its initial automated control scenes, a different control method is applied depending on which of the following states it is in.

State 1. The humidifying device is not started. If the device is switched on manually and starts working, a new automated control scene for the device is formed from the current time and the switch-on action, combined with the detection information for the humidity factor in the device's environment obtained before the device started working. If the device receives no manual switch-on action, no further operation is performed.

State 2. The humidifying device is working. If the device is switched off manually and stops working, a new automated control scene for the device is formed from the current time and the switch-off action, combined with the detection information for the humidity factor in the device's environment obtained before the device stopped working. If the device receives no manual switch-off action, the setting information A_set of the device for the humidity factor in the current automated control scene is obtained, and steps A to B below are executed.

Step A. Detect the humidity factor in the environment where the humidifying device is located to obtain detection information A_measured, and proceed to step B.

Step B. If the humidifying device receives no manual adjustment of the setting for the humidity factor, define and update the range from A_set to A_measured as the tolerated fluctuation range of the humidity factor in the environment under the device's current automated control scene, maintain the current working state of the device, and return to step A.

If the humidifying device receives a manual adjustment of the setting for the humidity factor, update the setting information A_set of the device for the humidity factor in the current automated control scene with the new setting information, control the device to work in the working state corresponding to the new humidity setting, and return to step A.

On the same basis, when the appliance is an air purification device, the working-purpose factors are the air index factors for each function of the air purification device, and the reinforcement learning-based control method is likewise executed to optimize and update the automated control scenes of the appliance. The appliance thereby adapts better to the user's habits, and all appliances with a scene-based automatic control function become more intelligent.

In practice, the reinforcement learning-based electronic appliance control method provided by the invention can also be applied to security and alarm equipment with a scene-based automatic control function, such as door sensors, door locks, cameras, and one-touch alarms. While such a device operates according to its initial automated control scenes, a different control method is applied for each state of the device. Specifically, as the device receives manual switch-on and switch-off actions in different time periods, automated control scenes for the device are added, deleted, and updated. In addition, while the device receives manual switch-on and switch-off actions, automated control scenes can also be added, deleted, and updated in combination with environmental information and smart wearable devices (for example, a wristband or watch that detects falling asleep, loss of consciousness, or a fall), so that the automated control scenes of the security equipment are optimized and updated.
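The adding, deleting, and updating of scenes per time period for a security device might be organized as below. The text does not state the rules that trigger each operation, so this sketch only provides the three operations on a hypothetical `SceneStore` keyed by time period; the period labels and action strings are likewise assumptions.

```python
from typing import Dict, Optional


class SceneStore:
    """Hypothetical per-device store: time period -> expected action."""

    def __init__(self) -> None:
        self._scenes: Dict[str, str] = {}

    def record(self, period: str, action: str) -> str:
        """Store a manual action for a period; report whether a scene was
        added for a new period or updated for an existing one."""
        kind = "updated" if period in self._scenes else "added"
        self._scenes[period] = action
        return kind

    def delete(self, period: str) -> bool:
        """Remove the scene for a period; return False if none existed."""
        return self._scenes.pop(period, None) is not None

    def action_for(self, period: str) -> Optional[str]:
        return self._scenes.get(period)


door_lock = SceneStore()
first = door_lock.record("22:00-06:00", "locked")    # new period
second = door_lock.record("22:00-06:00", "alarmed")  # same period, new action
```

Inputs such as a wearable's sleep detection could feed the same store by choosing which period and action to record.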

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. A reinforcement-learning-based electronic appliance control method for respectively controlling each electrical appliance having a scene automatic control function, characterized in that, based on the working process of each electrical appliance according to its corresponding initial automatic control scenes, different control methods are realized for each electrical appliance for the following states of the electrical appliance;

state 2: the electrical appliance is in the working state; if the electrical appliance receives a manual closing action and stops working, a new automatic control scene corresponding to the electrical appliance is formed from the current time and the closing action, combined with the detection information, obtained before the electrical appliance stopped working, of the environment in which the electrical appliance is located with respect to its work purpose factor; if the electrical appliance does not receive a manual closing action, then for an electrical appliance with an adjustable work purpose factor, the work purpose factor is defined as factor A, the setting information A_{set} of the electrical appliance for factor A in the current automatic control scene is obtained, and the following steps A to B are executed, in which the manual adjustment action is taken as the intervention origin, the working state in the automatic control scene of the electrical appliance is changed based on that origin in combination with changes of factor A in the environment, and the automatic control scene corresponding to the electrical appliance is optimized from the origin to the fluctuation tolerance range of the environmental factor;

2. The reinforcement-learning-based electronic appliance control method according to claim 1, characterized in that: based on the different control methods realized for each electrical appliance according to state 1 and state 2, the automatic control scenes corresponding to each electrical appliance are updated and obtained, forming the automatic control scenes corresponding to each electrical appliance in the environment; reinforcement learning algorithm modeling is then carried out on the automatic control scenes corresponding to each electrical appliance in each environment within a geographic area of preset size, to obtain, for each electrical appliance, the algorithm models of the automatic control scenes with the highest proportion in the geographic area; these algorithm models are used as the algorithm models of the initial automatic control scenes corresponding to each electrical appliance in the geographic area and are combined to form an initial set of automatic control scene algorithm models for the electrical appliances in the geographic area, which is distributed to, and applied by, electrical appliances with a scene automatic control function in newly established environments within the geographic area.

3. The reinforcement-learning-based electronic appliance control method according to claim 1, characterized in that the electrical appliance and its work purpose factor comprise: the electrical appliance is a lighting device, and its work purpose factor is a brightness factor; the electrical appliance is a refrigerating and heating device, and its work purpose factor is a temperature factor; the electrical appliance is a humidifying device, and its work purpose factor is a humidity factor; the electrical appliance is an air purification device, and its work purpose factors are the air index factors under each function of the air purification device.

4. An electric appliance control device, comprising a memory, a processor and a computer program stored in the memory and running on the processor, wherein the electric appliance control device is respectively connected with the electric appliances with scene automatic control function, and the processor implements the steps of the method of any one of claims 1 to 3 when executing the computer program.

5. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 3.