WO2018213999A1 - Home device learning method and server - Google Patents


Info

Publication number
WO2018213999A1
WO2018213999A1 PCT/CN2017/085385 CN2017085385W WO2018213999A1 WO 2018213999 A1 WO2018213999 A1 WO 2018213999A1 CN 2017085385 W CN2017085385 W CN 2017085385W WO 2018213999 A1 WO2018213999 A1 WO 2018213999A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
state
indoor environment
matrix
parameter
Prior art date
Application number
PCT/CN2017/085385
Other languages
English (en)
Chinese (zh)
Inventor
谢毅
张鹏程
张晴晴
Original Assignee
深圳微自然创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳微自然创新科技有限公司
Priority to PCT/CN2017/085385 (WO2018213999A1)
Priority to CN201780003362.8A (CN108419439B)
Publication of WO2018213999A1

Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/50: Control or safety arrangements characterised by user interfaces or communication
    • F24F11/56: Remote control
    • F24F11/58: Remote control using Internet communication
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • F24F11/70: Control systems characterised by their outputs; Constructional details thereof
    • F24F11/72: Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure
    • F24F11/74: Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure for controlling air flow rate or air velocity
    • F24F11/89: Arrangement or mounting of control or safety devices
    • F24F2110/00: Control inputs relating to air properties
    • F24F2110/10: Temperature
    • F24F2110/20: Humidity

Definitions

  • The present invention relates to the field of computer technologies, and in particular to a home device learning method and a server.
  • Embodiments of the present invention provide a method for learning a home device, which can quickly adjust an indoor environment to an expected state.
  • An embodiment of the present invention provides a method for learning a home device, including:
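  • constructing a target matrix, where the first-row elements of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state; the larger the parameter value, the higher the likelihood that the indoor environment is adjusted from the first state to the target state; and each operation set includes at least one type of adjustment operation;
  • determining, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix, generating a corresponding control instruction, and transmitting the control instruction to the environment adjustment device, the control instruction instructing the environment adjustment device to perform the operation specified by the target operation set;
  • if it is determined that the indoor environment does not reach the target state, calculating a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value.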
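  • Before the constructing of the target matrix, the method further includes:
  • acquiring a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state, and the first state is an initial indoor environment state;
  • acquiring a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.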
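  • Constructing the target matrix includes:
  • acquiring the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and constructing the target matrix from them; or
  • determining the parameter values corresponding to the at least two selectable operation sets according to the relationship between those operation sets and the target state, and constructing the target matrix, where the closer the state specified by an operation set is to the target state, the larger its parameter value.
  • Determining, by using the preset policy selection mechanism, the target operation set to be selected according to the target matrix includes:
  • selecting, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set; or
  • with probability ε, filtering out from the first row of the target matrix the N operation sets corresponding to the N elements with the largest values, excluding the element with the largest value, and randomly selecting one of the N operation sets as the target operation set, where N is an integer greater than 1; and with probability 1-ε, selecting from the first row the operation set corresponding to the element with the largest value as the target operation set.
  • Determining that the indoor environment does not reach the target state includes:
  • after a preset time has elapsed since the control instruction was sent, determining that the second state in which the indoor environment is currently located does not reach the target state.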
  • Updating the target matrix by using the target value includes updating the target matrix by a formula in which:
  • Q(s_t, a_t) on the left side of the equation is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the equation is the parameter value corresponding to the target operation set before the target matrix is updated;
  • α and γ are preset constants;
  • R is the target value;
  • max Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state.
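The update formula itself is not present in this text. Assuming it takes the standard Q-learning form, which is consistent with every term listed above (treating α as a learning rate and γ as a discount factor is the usual reading and is an assumption here), it would read:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here s_t is the first state, a_t is the target operation set, and s_{t+1} is the second state reached after the operation set has been executed.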
  • A second aspect of the embodiments of the present invention provides a server, including:
  • a matrix construction unit configured to construct a target matrix, where the first-row elements of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, the larger the parameter value, the higher the likelihood that the indoor environment is adjusted from the first state to the target state, and each operation set includes at least one type of adjustment operation;
  • a determining unit configured to determine, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix;
  • a generating unit configured to generate, according to the target operation set, a corresponding control instruction, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set;
  • a sending unit configured to send the control instruction to the environment adjustment device;
  • the determining unit is further configured to determine that the indoor environment does not reach the target state, and is further configured to determine that the indoor environment reaches the target state;
  • a calculating unit configured to, if it is determined that the indoor environment does not reach the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state;
  • an updating unit configured to update the target matrix using the target value.
  • The server further includes:
  • an acquiring unit configured to acquire a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state and the first state is an initial indoor environment state, and to acquire a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.
  • The matrix construction unit is specifically configured to acquire the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix from them;
  • or the matrix construction unit is specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between those operation sets and the target state, and to construct the target matrix, where the closer the state specified by an operation set is to the target state, the larger its parameter value.
  • The determining unit is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • or the determining unit is specifically configured to, with probability ε, filter out from the first row of the target matrix the N operation sets corresponding to the N elements with the largest values, excluding the element with the largest value, and randomly select one of the N operation sets as the target operation set, where N is an integer greater than 1; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
  • The determining unit is specifically configured to determine, after a preset time has elapsed since the control instruction was sent, that the second state in which the indoor environment is currently located does not reach the target state.
  • The updating unit is specifically configured to update the target matrix by a formula in which:
  • Q(s_t, a_t) on the left side of the equation is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the equation is the parameter value corresponding to the target operation set before the target matrix is updated;
  • α and γ are preset constants;
  • R is the target value;
  • max Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  • A third aspect of the embodiments of the present invention further provides a server, including a processor, a receiver, a transmitter, and a memory, where an executable program is stored in the memory, and the processor implements the foregoing method by executing the executable program.
  • In the embodiments of the present invention, a target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the first-row elements of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and the operation set is selected according to the optimized target matrix, so that the indoor environment can quickly reach the target state.
  • FIG. 1 is a schematic structural diagram of a system according to an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a method for learning a home device according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of constructing a target matrix according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a method for learning a home device according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a system according to an embodiment of the present invention.
  • the sensor in FIG. 1 can represent a plurality of sensors, such as a temperature sensor, a humidity sensor, a light intensity sensor, etc., for collecting temperature, humidity, light intensity, wind speed, and the like.
  • the sensor in Figure 1 can be located in the environmental conditioning device, or it can be installed in other devices, and the collected data can be uploaded to the server through the network.
  • the server in Figure 1 can communicate with the terminal device over the network.
  • The terminal device in FIG. 1, such as a smartphone or a tablet computer, can receive a control command sent by the server and forward the control command to the environment adjustment device.
  • the environment adjusting device in FIG. 1 can perform a corresponding operation according to a control command sent by the terminal device.
  • An embodiment of the present invention provides a method for learning a home device, as shown in FIG. 2, including:
  • Constructing a target matrix, where the first-row elements of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state; the larger the parameter value, the higher the likelihood that the indoor environment is adjusted from the first state to the target state.
  • the above target matrix has at least one row.
  • the indoor environment may be in a car, inside an aircraft, in a ship, or the like.
  • The first state is the state in which the indoor environment is currently located, for example (26 °C, 67%, strong), where the first parameter represents the current temperature of the indoor environment, the second parameter represents the current humidity, and the third parameter represents the current indoor wind speed.
  • The indoor wind speed can be divided into three levels (weak, medium, and strong) according to the intensity of the air-conditioning wind speed.
  • The target state may be an ideal indoor environment state determined according to outdoor environment parameters. Specifically, the target state may be determined according to a correspondence between outdoor environment parameters and indoor environment parameters. For example, outdoor environment parameters (16 °C, 37%) can correspond to indoor environment parameters (26 °C, 47%), and outdoor environment parameters (36 °C, 37%) can correspond to indoor environment parameters (28 °C, 60%).
  • The operation set corresponds to the working state of the environment adjustment device. Taking an air conditioner as an example, the corresponding operation set may be (air conditioning temperature, air conditioning mode, air conditioning wind speed).
  • the air conditioning mode may include cooling, dehumidification, automatic, air supply, heating, and the like.
  • the at least two sets of operations selectable above are the set of operations currently selectable by the environment adjustment device, and may not be limited to the set of operations capable of achieving the target state.
  • For example, if the first state is (28 °C, 60%, strong) and the target state is (22 °C, 50%, strong), the at least two selectable operation sets may be (22 °C, dehumidification, strong), (21 °C, dehumidification, strong), (26 °C, cooling, weak), and so on, where the operation set (26 °C, cooling, weak) cannot bring the indoor environment to the target state.
  • The at least two selectable operation sets may also be limited to operation sets capable of achieving the target state, which reduces the number of selectable operation sets and improves adjustment efficiency.
  • For example, if the first state is (28 °C, 60%, strong) and the target state is (22 °C, 50%, strong), the selectable operation sets do not include (26 °C, dehumidification, weak), because that operation set cannot bring the indoor environment to the target state.
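The restriction to operation sets that can reach the target state can be sketched as a simple filter. The text does not say how reachability is decided; the rule below (a setpoint must lie at or beyond the target temperature, on the side opposite the current temperature) and the name `can_reach` are assumptions for illustration only, and humidity and wind speed are ignored.

```python
def can_reach(current_temp, target_temp, setpoint):
    """Hypothetical reachability rule based on temperature only."""
    if current_temp > target_temp:      # room must be cooled
        return setpoint <= target_temp
    if current_temp < target_temp:      # room must be warmed
        return setpoint >= target_temp
    return setpoint == target_temp      # already at target: hold it

# Operation sets are (temperature in degrees C, mode, wind speed), as in the example above.
candidates = [
    (22, "dehumidification", "strong"),
    (21, "dehumidification", "strong"),
    (26, "cooling", "weak"),
]
feasible = [op for op in candidates if can_reach(28, 22, op[0])]
```

With the first state at 28 °C and the target at 22 °C, the 26 °C operation set is dropped, matching the example in the text.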
  • The target matrix may be a Q matrix. The first state and the target state can be understood as belonging to a state set, the at least two selectable operation sets can be understood as an action set, and the parameter value can be understood as a reward value.
  • In the Q matrix, the rows represent different states, and the columns represent different operation sets.
  • Each element of the matrix is the reward value, that is, the Q value, for reaching the target state after executing the operation set represented by its column from the state represented by its row. For example, the element in the first row and first column represents the reward value for executing the first operation set in the first state to reach the target state.
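A minimal sketch of such a Q matrix follows, using nested dictionaries so that rows are keyed by state and columns by operation set. The type names `State` and `OperationSet`, the helper `build_q_matrix`, and the zero initial value are assumptions for illustration, not part of the described method.

```python
from typing import NamedTuple

class State(NamedTuple):
    temperature_c: float
    humidity_pct: float
    wind: str            # "weak", "medium" or "strong"

class OperationSet(NamedTuple):
    temperature_c: float
    mode: str            # e.g. "cooling", "dehumidification"
    wind: str

def build_q_matrix(states, operation_sets, initial=0.0):
    """Return {state: {operation_set: Q value}}: one row per state, one column per operation set."""
    return {s: {op: initial for op in operation_sets} for s in states}

first_state = State(28.0, 60.0, "strong")
ops = [
    OperationSet(22.0, "dehumidification", "strong"),
    OperationSet(21.0, "dehumidification", "strong"),
    OperationSet(26.0, "cooling", "weak"),
]
q = build_q_matrix([first_state], ops)
first_row = q[first_state]   # parameter values for the operation sets selectable in the first state
```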
  • Determining, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix, generating a corresponding control instruction, and transmitting the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set.
  • the above-described environmental conditioning device may be an air conditioner, an air cleaner, a humidifier, a dehumidifier, or the like.
  • Optionally, two methods for selecting a target operation set are provided. The first method selects, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set.
  • The second method, with probability ε, filters out from the first row of the target matrix the N operation sets corresponding to the N elements with the largest values, excluding the element with the largest value, and randomly selects one of the N operation sets as the target operation set, where N is an integer greater than 1; with probability 1-ε, it selects from the first row the operation set corresponding to the element with the largest value as the target operation set.
  • The first method is simple to compute, and when the target matrix is close to convergence it finds the best operation set with high probability.
  • The second method selects, with some probability, an operation set whose parameter value is not the largest. When the target matrix is far from convergence, this improves the speed of finding a better operation set.
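The two selection methods can be sketched as follows. The function names and the dictionary representation of the first row are assumptions for illustration; the second function is an ε-greedy variant that explores only among the N runners-up, as the text describes.

```python
import random

def select_greedy(row):
    """First method: the operation set whose parameter value is largest."""
    return max(row, key=row.get)

def select_epsilon_top_n(row, epsilon, n, rng=random):
    """Second method: with probability epsilon pick at random among the n
    largest values excluding the maximum; otherwise pick the maximum."""
    ranked = sorted(row, key=row.get, reverse=True)
    best, runners_up = ranked[0], ranked[1:1 + n]
    if runners_up and rng.random() < epsilon:
        return rng.choice(runners_up)
    return best

# Hypothetical first row: operation-set label -> parameter value.
row = {"op_a": 0.9, "op_b": 0.5, "op_c": 0.4, "op_d": 0.1}
greedy_choice = select_greedy(row)
explored_choice = select_epsilon_top_n(row, epsilon=1.0, n=2)
exploited_choice = select_epsilon_top_n(row, epsilon=0.0, n=2)
```

Setting `epsilon` to 0 reduces the second method to the first, which is one way to switch between them as the target matrix converges.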
  • the server may send the foregoing control instruction to the environment adjustment device by using a terminal device such as a mobile phone.
  • the terminal device may be bound to the environment adjustment device and send the control command to the environment adjustment device by transmitting an infrared signal or the like.
  • two methods for selecting a target operation set are proposed, and a corresponding method may be selected according to the convergence of the target matrix to improve the speed of finding a preferred operation set.
  • Optionally, whether the indoor environment has reached the target state may be checked at a preset time interval. Determining that the indoor environment does not reach the target state includes: after the preset time has elapsed since the control instruction was sent, determining that the second state in which the indoor environment is currently located does not reach the target state.
  • The preset time may be 15 minutes, 20 minutes, 30 minutes, or the like.
  • For example, after sending the control instruction, the server starts timing. Once 20 minutes have elapsed, it acquires the current indoor environment parameters and determines whether the second state in which the indoor environment is currently located has reached the target state.
  • the situation that the indoor environment does not reach the target state can be determined in time, so as to timely adjust the working state of the environmental adjustment device.
  • In the embodiments of the present invention, a target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the first-row elements of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state.
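  • Optionally, the server obtains the target indoor environment parameter from the acquired outdoor environment parameter. Before constructing the target matrix, the method further includes: acquiring a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state and the first state is an initial indoor environment state; and acquiring a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.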
  • the server may obtain the first indoor environment parameter by using a sensor located indoors, and the outdoor environment parameter may be obtained by a sensor located outdoors or from another server.
  • The target state may be an ideal indoor environment state determined according to the outdoor environment parameter. Specifically, the target state may be determined according to a correspondence between outdoor environment parameters and indoor environment parameters. The correspondence may be pre-stored in the server and may differ between users; it may also be determined by statistical analysis of multiple indoor environment parameters. For example, if the indoor environment spends the longest time at a temperature of 26 °C and a humidity of 40% when the outdoor temperature is 36 °C and the outdoor humidity is 47%, the outdoor parameters (36 °C, 47%) are determined to correspond to the indoor parameters (26 °C, 40%).
  • the focus of the embodiments of the present invention is not how to determine the target indoor environmental parameters according to the outdoor environmental parameters, which will not be described in detail herein.
  • the target indoor environmental parameters can be accurately determined to meet the needs of different users.
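The statistical route to the correspondence can be sketched as below: for each outdoor condition, take the indoor condition at which the room spent the most time. The function name, the `(outdoor, indoor, minutes)` sample format, and the log values are hypothetical; the text states that this determination is not its focus and gives no concrete procedure.

```python
from collections import defaultdict

def learn_correspondence(samples):
    """samples: iterable of (outdoor, indoor, minutes) records.
    Returns {outdoor: indoor condition with the longest total duration}."""
    totals = defaultdict(lambda: defaultdict(float))
    for outdoor, indoor, minutes in samples:
        totals[outdoor][indoor] += minutes
    return {outdoor: max(durations, key=durations.get)
            for outdoor, durations in totals.items()}

# Hypothetical usage log; each condition is (temperature in degrees C, humidity in %).
log = [
    ((36, 47), (26, 40), 180),
    ((36, 47), (27, 45), 60),
    ((36, 47), (26, 40), 30),
    ((16, 37), (26, 47), 120),
]
targets = learn_correspondence(log)
```

Keeping one such table per user would allow the correspondence to differ between users, as the text permits.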
  • Optionally, constructing the target matrix includes one of two methods:
  • The first method is to acquire, from a target matrix already saved by the server, the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix from them.
  • The second method is to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between those operation sets and the target state, and to construct the target matrix, where the closer the state specified by an operation set is to the target state, the larger its parameter value.
  • An operation set includes at least one parameter representing a final state. For example, in the operation set (26 °C, dehumidification, strong), 26 °C is the final temperature corresponding to the operation set. As a further example, suppose the current temperature is 18 °C and the target temperature is 21 °C, and the temperatures in the operation sets of different columns differ: the temperature in the operation set corresponding to the first column is 17 °C, the temperature in the operation set corresponding to the second column is 18 °C, and so on. The closer the temperature in an operation set is to the target temperature, the larger the corresponding parameter value.
  • the embodiment of the present invention can determine the proximity of the state specified by the operation set to the target state in a plurality of manners, which is not limited herein. For example, parameter values of at least two operation sets may be initialized according to preset rules.
  • two methods for constructing a target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
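The second construction method can be sketched with a proximity-based initialisation. The text leaves the exact rule open, so the negative absolute temperature difference used here, and the name `init_first_row`, are assumptions chosen only so that setpoints closer to the target receive larger values.

```python
def init_first_row(setpoints, target_temp):
    """Assign each setpoint a parameter value that grows as it nears the target."""
    return {sp: -abs(sp - target_temp) for sp in setpoints}

# Current temperature 18 degrees C, target 21 degrees C, columns at 17, 18, 19, ... as in the example.
row = init_first_row([17, 18, 19, 20, 21, 22], target_temp=21)
best_setpoint = max(row, key=row.get)
```

Starting from such values rather than from zeros is what lets the first greedy selection already point at a sensible operation set.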
  • Optionally, a method for updating the target matrix is provided. Updating the target matrix by using the target value includes updating it by a formula in which:
  • Q(s_t, a_t) on the left side of the expression is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the expression is the parameter value corresponding to the target operation set before the target matrix is updated;
  • R is the target value;
  • max Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state;
  • α and γ are preset constants, and different values can be set for different problems.
  • the convergence of the target matrix can be accelerated, and the time required to reach the target state can be reduced.
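A minimal sketch of one update step follows, assuming the standard Q-learning rule, which matches the terms defined above. The text does not give the calculation of the target value R; `progress_reward` (how much closer the second state is to the target than the first state was) is a hypothetical stand-in, as are the labels `cool_22` and `cool_26` and the default constants.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    next_row = q.get(next_state)
    best_next = max(next_row.values()) if next_row else 0.0
    old = q[state][action]
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]

def progress_reward(first, second, target):
    """Hypothetical target value computed from the first, second and target states."""
    return abs(first - target) - abs(second - target)

# States are temperatures in degrees C; rows are states, columns are operation-set labels.
q = {
    28: {"cool_22": 0.0, "cool_26": 0.0},
    26: {"cool_22": 1.0, "cool_26": 0.5},
}
r = progress_reward(first=28, second=26, target=22)
new_value = q_update(q, 28, "cool_22", r, 26)
```

In this example the room moved from 28 °C to 26 °C, so R is 2 and the updated value is 0.1 × (2 + 0.9 × 1.0 − 0) = 0.29.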
  • An embodiment of the present invention provides an application scenario.
  • The specific process is as follows. A user sends an indoor environment adjustment command to a server through an application on a terminal device such as a mobile phone. After receiving the command, the server parses it. According to the identification information, the server acquires the user's current outdoor environment parameter and indoor environment parameter and determines the corresponding target indoor environment parameter, that is, the indoor environment parameter corresponding to the user's thermal comfort zone. The server selects an adjustment operation by using the reinforcement learning algorithm, generates a corresponding control instruction, and sends it to the terminal device. The terminal device sends the control instruction to the environment adjustment device, which performs the adjustment operation specified by the control instruction. After the preset time has elapsed since the control instruction was sent, the server detects the current state of the indoor environment, updates the target matrix (the Q matrix), and sends a new control instruction. The server keeps updating the target matrix until the parameters of the indoor environment are the same as the target indoor environment parameters.
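The select, act, observe, and update cycle in this scenario can be sketched end to end against a simulated room. Everything environmental here is hypothetical: `simulate_step` stands in for the real device and sensors, the state is temperature only, the reward is the reduction in distance to the target, and plain ε-greedy exploration over all setpoints is used for brevity.

```python
import random
from collections import defaultdict

def simulate_step(temp, setpoint):
    """Hypothetical room model: per interval the temperature moves at most 2 degrees C toward the setpoint."""
    return temp + max(-2, min(2, setpoint - temp))

def run_episode(q, start, target, setpoints, alpha=0.5, gamma=0.9,
                epsilon=0.2, rng=None, max_steps=100):
    """Repeat select -> act -> observe -> update until the target temperature is reached."""
    rng = rng or random.Random(0)
    state, steps = start, 0
    while state != target and steps < max_steps:
        row = q[state]
        if rng.random() < epsilon:
            action = rng.choice(setpoints)                      # explore
        else:
            action = max(setpoints, key=lambda a: row[a])       # exploit
        nxt = simulate_step(state, action)                      # wait the preset time, then observe
        reward = abs(state - target) - abs(nxt - target)        # hypothetical target value
        best_next = max(q[nxt][a] for a in setpoints)
        row[action] += alpha * (reward + gamma * best_next - row[action])
        state, steps = nxt, steps + 1
    return state, steps

q = defaultdict(lambda: defaultdict(float))   # Q matrix: state -> setpoint -> value
for _ in range(50):
    final_state, steps = run_episode(q, start=28, target=22, setpoints=[20, 22, 24, 26])
```

The `max_steps` guard is a safety limit added for the sketch; the text itself only says the server keeps updating until the indoor parameters match the target.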
  • An embodiment of the present invention provides another method for learning a home device, as shown in FIG. 4, including:
  • the first indoor environment parameter characterizes the first state, and the first state is an initial indoor environment state.
  • the target indoor environmental parameters described above characterize the target state.
  • In the embodiments of the present invention, a target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the first-row elements of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state and power is saved.
  • An embodiment of the present invention provides a server, as shown in FIG. 5, including:
  • the matrix construction unit 501 is configured to construct a target matrix, where the first-row elements of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, the larger the parameter value, the higher the likelihood that the indoor environment is adjusted from the first state to the target state, and each operation set includes at least one type of adjustment operation;
  • a determining unit 502 configured to determine, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix
  • the generating unit 503 is configured to generate a corresponding control instruction according to the target operation set, where the control instruction instructs the environment adjusting device to perform an operation specified by the target operation set;
  • the sending unit 504 is configured to send the foregoing control instruction to the environment adjusting device;
  • the determining unit 502 is further configured to determine that the indoor environment does not reach the target state, and is further configured to determine that the indoor environment reaches the target state;
  • the calculating unit 505 is configured to, when it is determined that the indoor environment does not reach the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state;
  • the updating unit 506 is configured to update the target matrix by using the target value.
  • Optionally, the server obtains the target indoor environment parameter from the acquired outdoor environment parameter. As shown in FIG. 6, the server further includes:
  • the acquiring unit 601, configured to acquire a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state and the first state is an initial indoor environment state, and to acquire the target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.
  • the target indoor environmental parameters can be accurately determined to meet the needs of different users.
  • the matrix construction unit 501 is specifically configured to acquire the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix from them;
  • or the matrix construction unit 501 is configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between those operation sets and the target state, and to construct the target matrix, where the closer the state specified by an operation set is to the target state, the larger its parameter value.
  • two methods for constructing a target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
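The second construction method can be sketched as follows. This is a minimal illustration, not the patent's implementation: the closeness measure (negative sum of absolute differences), the parameter names `temperature_c` and `humidity_pct`, and the operation set names are all assumptions introduced for the example.

```python
# Hypothetical sketch: seed one row of the target matrix so that operation
# sets whose specified state is closer to the target state get larger values.


def closeness_value(specified, target):
    """Return a value that grows as `specified` approaches `target`.

    Assumption: closeness is the negative sum of absolute differences
    over the parameters named in the target state.
    """
    return -sum(abs(specified[k] - target[k]) for k in target)


def build_first_row(operation_sets, target):
    """Map each selectable operation set name to its initial parameter value."""
    return {
        name: closeness_value(state, target)
        for name, state in operation_sets.items()
    }


# Illustrative data only; none of these values come from the source.
target_state = {"temperature_c": 25.0, "humidity_pct": 50.0}
operation_sets = {
    "cool_to_24": {"temperature_c": 24.0, "humidity_pct": 50.0},
    "cool_to_20": {"temperature_c": 20.0, "humidity_pct": 50.0},
    "cool_24_dehumidify_45": {"temperature_c": 24.0, "humidity_pct": 45.0},
}

first_row = build_first_row(operation_sets, target_state)
best = max(first_row, key=first_row.get)
```

Seeding the row this way means the first selection already favors operation sets that aim near the target, which is one plausible reading of why the construction is said to speed up convergence.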
  • the determining unit 502 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • or, the determining unit 502 is specifically configured to: with probability ε, filter out from the first row of the target matrix the N operation sets corresponding to the N elements having the largest values, excluding the element having the largest value, and randomly select one operation set from the N operation sets as the target operation set, where N is an integer greater than 1; and, with probability 1-ε, select from the first row the operation set corresponding to the element having the largest value as the target operation set.
  • two methods for selecting a target operation set are proposed, and a corresponding method may be selected according to the convergence of the target matrix to improve the speed of finding a preferred operation set.
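The two selection methods can be sketched in one function, with the exploration probability written as `epsilon`. The function name, the list representation of a matrix row, and the `rng` parameter are assumptions made for illustration; setting `epsilon` to 0 reduces the second method to the first.

```python
import random


def select_operation_set(row, epsilon, n, rng=random):
    """Return the index of the chosen operation set in `row`.

    `row` is one row of the target matrix. With probability `epsilon` the
    choice is uniform among the n largest elements other than the maximum;
    otherwise the maximum element is chosen.
    """
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    # Indices ordered from the largest value to the smallest.
    ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    best, runners_up = ranked[0], ranked[1:n + 1]
    if runners_up and rng.random() < epsilon:
        return rng.choice(runners_up)
    return best


# Illustrative row; the values are made up.
row = [0.2, 0.9, 0.5, 0.7, 0.1]
greedy_choice = select_operation_set(row, epsilon=0.0, n=2)
```

A larger `epsilon` suits a target matrix that has not yet converged, since it samples near-best alternatives more often; a small `epsilon` suits a matrix whose values are already reliable.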
  • the indoor environment may be detected according to a preset time interval, as follows:
  • the determining unit 502 is specifically configured to determine, after a preset time has elapsed since the control instruction was sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • in this way, a failure of the indoor environment to reach the target state is detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
  • a method for updating the target matrix is provided, as follows:
  • the update unit 506 is specifically configured to update the target matrix by using the following formula:
  • the Q(s_t, a_t) on the left side of the expression is the parameter value corresponding to the target operation set after the target matrix is updated;
  • the Q(s_t, a_t) on the right side of the expression is the parameter value corresponding to the target operation set before the target matrix is updated;
  • α and γ are preset constants;
  • R is the target value;
  • max Q(s_{t+1}, a) is the largest of the parameter values corresponding to all the operation sets selectable in the second state.
  • the convergence of the target matrix can be accelerated, and the time required to reach the target state can be reduced.
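The formula itself is not reproduced in the text. The terms described (an updated and a previous Q(s_t, a_t), two preset constants, the target value R, and the maximum of Q(s_{t+1}, a) over the operation sets selectable in the second state) match the standard Q-learning update. A plausible reconstruction, assuming the two constants are the learning rate α and the discount factor γ, is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here s_t is the state before the control instruction is executed, a_t is the target operation set, and s_{t+1} is the second state.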
  • FIG. 7 shows a server provided by an embodiment of the present invention.
  • the server includes a processor 701 (there may be one or more processors 701; FIG. 7 takes one processor as an example), a memory 702, a receiver 703, and a transmitter 704. In some embodiments of the present invention, the processor 701, the memory 702, the receiver 703, and the transmitter 704 may be connected by a bus or by other means.
  • the memory 702 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or compact disc read-only memory (CD-ROM), and is used to store related instructions and data.
  • the memory 702 is also used to store a target matrix.
  • the processor 701 is configured to: construct the target matrix from operation sets, each of which includes at least one type of adjustment operation; using a preset policy selection mechanism, determine the target operation set to be selected according to the target matrix, generate a corresponding control instruction, and send the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and, when it is determined that the indoor environment has not reached the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and update the target matrix by using the target value.
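The text states that the target value depends on the second state, the first state, and the target state, but this passage does not give the calculation. The sketch below shows one possible form, purely as an assumption: the fraction of the initial gap to the target that has been closed so far. The function names, the distance measure, and the `temperature_c` parameter are hypothetical.

```python
def distance(state, target):
    """Assumed distance: sum of absolute differences over the target's parameters."""
    return sum(abs(state[k] - target[k]) for k in target)


def target_value(first_state, second_state, target_state):
    """Hypothetical target value R.

    Positive when the second (current) state is closer to the target than
    the first (initial) state was, negative when it is farther away.
    """
    initial_gap = distance(first_state, target_state)
    if initial_gap == 0:
        # The initial state already matched the target; nothing to close.
        return 0.0
    current_gap = distance(second_state, target_state)
    return (initial_gap - current_gap) / initial_gap


# Illustrative values only.
first = {"temperature_c": 30.0}
target = {"temperature_c": 25.0}
improved = target_value(first, {"temperature_c": 27.0}, target)
worsened = target_value(first, {"temperature_c": 31.0}, target)
```

With this choice, an operation set that moves the room toward the target earns a positive value and one that moves it away earns a negative value, so later selections are steered toward the former.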
  • the server obtains the target indoor environment parameter by using the acquired outdoor environment parameter, as follows: the processor 701 is further configured to, before constructing the target matrix, acquire a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to acquire the target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
  • the target indoor environmental parameters can be accurately determined to meet the needs of different users.
  • two methods for constructing a target matrix are provided, as follows: the processor 701 is specifically configured to obtain the parameter values corresponding to the at least two operation sets that are selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix according to those parameter values; or, the processor 701 is specifically configured to determine the parameter values corresponding to the at least two operation sets selectable in the first state according to the relationship between the states specified by those operation sets and the target state, and to construct the target matrix, where the closer the state specified by an operation set is to the target state, the larger the corresponding parameter value.
  • two methods for constructing a target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
  • two methods for selecting a target operation set are provided, as follows:
  • the processor 701 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • or, the processor 701 is specifically configured to: with probability ε, filter out from the first row of the target matrix the N operation sets corresponding to the N elements having the largest values, excluding the element having the largest value, and randomly select one operation set from the N operation sets as the target operation set, where N is an integer greater than 1; and, with probability 1-ε, select from the first row the operation set corresponding to the element having the largest value as the target operation set.
  • two methods for selecting a target operation set are proposed, and a corresponding method may be selected according to the convergence of the target matrix to improve the speed of finding a preferred operation set.
  • the indoor environment may be detected according to a preset time interval, as follows:
  • the processor 701 is specifically configured to determine, after a preset time has elapsed since the control instruction was sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • in this way, a failure of the indoor environment to reach the target state is detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
  • a method for updating a target matrix is provided, as follows:
  • the processor 701 is specifically configured to update the target matrix by using the following formula:
  • the Q(s_t, a_t) on the left side of the expression is the parameter value corresponding to the target operation set after the target matrix is updated;
  • the Q(s_t, a_t) on the right side of the expression is the parameter value corresponding to the target operation set before the target matrix is updated;
  • α and γ are preset constants;
  • R is the target value;
  • max Q(s_{t+1}, a) is the largest of the parameter values corresponding to all the operation sets selectable in the second state.
  • the convergence of the target matrix can be accelerated, and the time required to reach the target state can be reduced.
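The update step can be sketched in code. The sketch assumes the standard Q-learning form implied by the term descriptions, because the formula itself is not reproduced in the text. The nested-dictionary layout of the target matrix, the function name, and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
def update_target_matrix(q, state, action, reward, next_state,
                         alpha=0.5, gamma=0.9):
    """Update q[state][action] in place and return the new value.

    q          -- target matrix as {state: {operation_set: parameter_value}}
    state      -- state before the control instruction was executed
    action     -- the target operation set that was executed
    reward     -- the target value R
    next_state -- the second state (current indoor environment state)
    alpha, gamma -- the two preset constants (assumed learning rate and
                    discount factor)
    """
    next_row = q.get(next_state)
    # Largest parameter value among operation sets selectable in next_state;
    # an unseen state is assumed to contribute 0.
    best_next = max(next_row.values()) if next_row else 0.0
    old = q[state][action]
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]


# Illustrative matrix with two states and two operation sets each.
q = {
    "s1": {"a": 1.0, "b": 0.0},
    "s2": {"a": 2.0, "b": 4.0},
}
new_value = update_target_matrix(q, "s1", "a", reward=1.0, next_state="s2")
```

In this example the old value 1.0 moves halfway toward 1.0 + 0.9 × 4.0 = 4.6, giving approximately 2.8. Only the entry for the executed operation set changes; the other entries are left untouched.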

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Fluid Mechanics (AREA)
  • Human Computer Interaction (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

Embodiments of the present invention belong to the field of computer technology and relate to a home appliance learning method and a server. The method comprises: constructing a target matrix; using a preset policy selection mechanism to determine, according to the target matrix, a selected target operation set, generating a corresponding control instruction, and sending the control instruction to an environment adjustment device, the control instruction instructing the environment adjustment device to perform the operations specified in the target operation set; and, if the indoor environment is determined not to be in a target state, calculating, according to a second state in which the indoor environment currently is, a first state, and the target state, a target value corresponding to the target operation set, and using the target value to update the target matrix. The solution in the embodiments of the present invention enables an indoor environment to be adjusted quickly to an expected state.
PCT/CN2017/085385 2017-05-22 2017-05-22 Home appliance learning method, and server WO2018213999A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/085385 WO2018213999A1 (fr) 2017-05-22 2017-05-22 Home appliance learning method, and server
CN201780003362.8A CN108419439B (zh) 2017-05-22 2017-05-22 Home appliance learning method, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085385 WO2018213999A1 (fr) 2017-05-22 2017-05-22 Home appliance learning method, and server

Publications (1)

Publication Number Publication Date
WO2018213999A1 (fr) 2018-11-29

Family

ID=63126496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085385 WO2018213999A1 (fr) 2017-05-22 2017-05-22 Home appliance learning method, and server

Country Status (2)

Country Link
CN (1) CN108419439B (fr)
WO (1) WO2018213999A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
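CN111505944A (zh) * 2019-01-30 2020-08-07 珠海格力电器股份有限公司 Energy-saving control strategy learning method, and method and device for implementing energy-saving control of an air conditioner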

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
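CN110925969B (zh) * 2019-10-17 2020-11-27 珠海格力电器股份有限公司 Air conditioner control method and apparatus, electronic device, and storage medium
CN113834200A (zh) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjustment method based on a reinforcement learning model, and air purifier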

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048142A1 (en) * 2014-08-15 2016-02-18 Delta Electronics, Inc. Intelligent air-conditioning controlling system and intelligent controlling method for the same
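CN105737340A (zh) * 2016-03-09 2016-07-06 深圳微自然创新科技有限公司 Intelligent air conditioner temperature control method and device
CN106247554A (zh) * 2016-08-16 2016-12-21 华南理工大学 Indoor environment control system and method based on human thermal adaptation and climate characteristics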

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
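CN103375869B (zh) * 2012-04-12 2015-12-02 珠海格力电器股份有限公司 Air conditioner control method and device, and air conditioner
CN105899885A (zh) * 2014-01-23 2016-08-24 三菱电机株式会社 Controller for air conditioner, and air conditioning system
CN105091202B (zh) * 2014-05-16 2018-04-17 株式会社理光 Method and system for controlling multiple air conditioning devices
CN105588251B (zh) * 2014-10-20 2018-10-02 株式会社理光 Method and apparatus for controlling an air conditioning system
US11156572B2 (en) * 2015-01-30 2021-10-26 Schneider Electric USA, Inc. Apparatuses, methods and systems for comfort and energy efficiency conformance in an HVAC system
CN105387565B (zh) * 2015-11-24 2018-03-30 深圳市酷开网络科技有限公司 Method and device for adjusting temperature
CN105548959B (zh) * 2015-12-07 2017-10-17 电子科技大学 Multi-sensor multi-target positioning method based on sparse reconstruction
CN106196423B (zh) * 2016-06-30 2018-08-24 西安建筑科技大学 Indoor environment quality control optimization method based on model prediction
CN106302041A (zh) * 2016-08-05 2017-01-04 深圳博科智能科技有限公司 Smart home device control method and device
CN106294881A (zh) * 2016-08-30 2017-01-04 五八同城信息技术有限公司 Information identification method and device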

Also Published As

Publication number Publication date
CN108419439B (zh) 2020-06-30
CN108419439A (zh) 2018-08-17

Similar Documents

Publication Publication Date Title
US10584892B2 (en) Air-conditioning control method, air-conditioning control apparatus, and storage medium
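CN106842968B (zh) Control method, device, and system
CN111121237B (zh) Air conditioning device and control method therefor, server, and computer-readable storage medium
CN109600285B (zh) Method and device for dynamically constructing an environment adjustment rule list in a smart home
JP6280569B2 (ja) Operation parameter value learning device, operation parameter value learning method, learning-type device control apparatus, and program
WO2018213999A1 (fr) Home appliance learning method, and server
CN110895011B (zh) Air conditioner control method and device, storage medium, and air conditioner
CN110736248B (zh) Method and device for controlling air conditioner outlet air temperature
CN111256325A (zh) Temperature control method, air conditioning device, control device, and storage medium
JP7039148B2 (ja) Control system, equipment device, remote controller, control method, and program
CN113339965A (zh) Method and device for air conditioner control, and air conditioner
CN105511279B (zh) Household appliance remote control method and system, household appliance, and server
TWI679384B (zh) Air purifier and network system
CN109323403A (zh) Air conditioner, control method and control device therefor, and electronic device
CN105241001A (zh) Parameter adjustment method and air conditioner
EP3779618B1 (fr) Smart device control method, apparatus, computer storage medium, and smart device control apparatus
JP2016029917A (ja) Cultivation monitoring device, cultivation monitoring method, and cultivation monitoring program
CN110986327A (zh) Sleep mode control method for air conditioner, and air conditioner
JP2021063611A (ja) Air conditioning system
KR20160071094A (ko) Vehicle air conditioning control method and apparatus therefor
CN115654647A (zh) Air conditioning system, control method and device therefor, storage medium, and electronic device
KR20200034017A (ko) Server, user terminal, and method for controlling a plurality of IoT devices
JP6941819B2 (ja) Method and control device for starting operation of an air conditioner
CN111108489A (zh) Server, information processing method, network system, and air purifier
JP2019207060A (ja) Air conditioning control device, air conditioning control system, air conditioning control method, and program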

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17911204

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17911204

Country of ref document: EP

Kind code of ref document: A1