WO2023093388A1 - Air purifier adjusting method based on reinforcement learning model, and air purifier - Google Patents

Air purifier adjusting method based on reinforcement learning model, and air purifier Download PDF

Info

Publication number
WO2023093388A1
Authority
WO
WIPO (PCT)
Prior art keywords
air purifier
air
state
current
actions
Prior art date
Application number
PCT/CN2022/126407
Other languages
French (fr)
Chinese (zh)
Inventor
鲁峰
Original Assignee
深圳市愚公科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市愚公科技有限公司
Publication of WO2023093388A1 publication Critical patent/WO2023093388A1/en

Links

Images

Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/50: Control or safety arrangements characterised by user interfaces or communication
    • F24F11/56: Remote control
    • F24F11/58: Remote control using Internet communication

Definitions

  • The invention relates to an air purifier adjustment method based on a machine learning algorithm (a reinforcement learning model), and to an air purifier adopting the method.
  • Manual control is required: the user observes a series of factors, such as the PM2.5 value displayed on the air purifier or on the app connected to it, their own work and rest schedule, and whether a window is open, and then manually switches the purifier's control to a higher or lower gear or puts it directly into sleep mode.
  • An intelligent indoor air adjustment system and adjustment method based on environmental parameters is disclosed, comprising a window control terminal, an air conditioning system, an air purification system, a distributed Wi-Fi network, a data acquisition module, and a processing module; the window control terminal, air conditioning system, air purification system, and data acquisition module are connected to the processing module through the distributed Wi-Fi network.
  • The regulation system in that document collects environmental parameters through an indoor temperature and humidity sensor, a CO2 concentration sensor, and an indoor PM2.5 sensor, and uploads the data to a local server processing module through the distributed Wi-Fi network.
  • The local server processing module analyzes and calculates, using a reinforcement-learning-based method, the obtained data against the set temperature and humidity values, and issues actions for the window control terminal, the air conditioning system, and the air purification equipment to change indoor air conditions, aiming to improve indoor air quality at the lowest energy consumption.
  • Because the control of the air purification equipment is tied to the window control terminal and the air conditioning system, that approach can improve indoor air quality at low energy consumption only to a certain extent; the control is complicated, and the window control terminal, air conditioning system, and other factors to be considered do not belong to the same device as the air purifier. This makes installation and use extremely inconvenient for users, turning a simple household appliance purchase into a small interior renovation project.
  • To remedy these deficiencies, the present invention proposes an air purifier adjustment method based on a reinforcement learning model, and an air purifier, to improve the air purification effect.
  • An air purifier adjustment method based on a reinforcement learning model comprises the following steps. S1: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. S2: after a predetermined time has elapsed, obtain the weight of performing the action or combined action A1 in state S1 and write it into the weight table, thereby updating the weight table; the "predetermined time" is determined according to the area and floor height of the room. S3: subsequently execute step S1 with the updated weight table. The state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  • Steps S1-S2 are the training phase
  • step S3 is the control phase.
  • Steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.
  • In step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, where n is a natural number. The state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  • For each state, the weight table gives the weights corresponding to multiple actions.
  • During control, the action corresponding to the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.
  • The states S1 and Sn are determined from the air purifier's current gear and the current pollutant content in the air.
  • The weights are calculated as follows: Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s])), where:
  • s, a, next_s represent the current state, the current action, and the next state, respectively;
  • reward is the reward for performing action a;
  • Q[s][a] is the value generated by action a in state s;
  • max(Q[next_s]) is the maximum value over all actions in the next state;
  • lr is the learning rate (learning_rate): the larger lr is, the less of the previous training effect is retained; when lr is 0, the value of Q[s,a] remains unchanged; when lr is 1, the original value is completely discarded;
  • factor is the discount factor (discount_factor): the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
  • The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed.
  • The "period of time" in the reward calculation is determined according to the area and floor height of the room.
  • The state of the air purifier is jointly determined by the air purifier's gear and the air pollutant content.
  • The air pollutant content is determined from the content of the following pollutants: PM1, PM2.5, and PM10.
  • The Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.
  • The present invention also includes an air purifier, comprising a main control unit, an air sensor unit, and a memory; computer software stored in the memory can be executed to implement the above method.
  • The beneficial effects of the present invention include: by continuously monitoring the air pollutant content, the invention keeps the air purifier's gear adjusted to the most suitable position rather than always running at the maximum gear.
  • The improvement in purification effect is achieved with a single device, without the cooperation of additional devices.
  • The effect is improved: previously about 65% of air purifiers could filter the air to a good level, whereas after adopting the smart model more than 85% achieve a good purification effect.
  • Because the present invention adjusts automatically according to the purification effect, its adjustment can be determined solely from air parameters such as the air pollutant content; the influence of factors such as air conditioners and window controls is already reflected in the pollutant measurements, so those factors need not be considered. The air purification effect can thus be improved by the air purifier alone, without coordinated actions or parameters from additional devices, while still accounting for other factors such as time and space, achieving a genuinely simple and efficient result.
  • Fig. 1 is an implementation flow chart of a specific embodiment of the present invention.
  • FIG. 2 is a configuration diagram of a cloud server according to an embodiment of the present invention.
  • Fig. 3 is a hardware diagram of an embodiment of the present invention.
  • Fig. 4 is a schematic flowchart of the present invention.
  • Fig. 5A is a diagram of the purification effect of the AI mode of the embodiment of the present invention.
  • Fig. 5B is a purification effect diagram of the automatic mode in the prior art.
  • Fig. 6A is a purification efficiency diagram of the AI mode of the embodiment of the present invention.
  • Fig. 6B is a diagram of the purification efficiency of the automatic mode in the prior art.
  • FIG. 7A is an energy consumption comparison chart for the AI mode of the embodiment of the present invention (APP report).
  • FIG. 7B is an energy consumption comparison chart for the automatic mode in the prior art (APP report).
  • FIG. 7C is a weekly energy consumption report using the AI control of the present invention (third-party metering socket report).
  • FIG. 7D is a weekly energy consumption report under prior-art automatic control (third-party metering socket report).
  • FIG. 7E is an energy consumption comparison chart between the AI control of the present invention and prior-art automatic-mode time-sharing control (third-party metering socket report).
  • Orientation terms such as left, right, up, down, top, and bottom in this embodiment are only relative concepts, or refer to the normal use state of the product, and should not be regarded as restrictive.
  • Time factor: the time when the user uses the air purifier (spring, summer, autumn, or winter; day or night).
  • The following embodiments of the present invention use a single air purifier for AI reinforcement learning and control, which is not only easy to operate but also improves the purification effect.
  • The basic idea is: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight.
  • After a predetermined time, obtain the weight of performing action A1 in state S1 and write it into the weight table, updating the table. Subsequently execute steps S1-S2 with the updated weight table, continuously performing different actions or combined actions An according to the current state Sn of the air purifier to obtain and update different weights in the weight table, where n is a natural number.
  • A combined action can be a combination of different gears, or an intermittent combination of the same gear. An outline flowchart is shown in Fig. 4.
  • The following table is an example of the relationship, used during actual control, between the pollutant-state labels in the Q-Table and the weights of the air purifier's gears.
  • The 0-99 values in the first column are obtained by dividing the PM2.5 value by 10 and rounding down, representing multiple reading bins for PM values from 0 to 1000.
  • The method flow of this embodiment is shown in FIG. 1, and the algorithm can be executed locally or in the cloud.
  • The composition of the cloud server is shown in FIG. 2.
  • Training: according to the current state S1 of the air purifier, the purifier is made to start different combined actions A1; after a period of time (the time is related to the size of the space, as described below), it is observed whether the current control has effectively filtered the air, thereby obtaining the weight of performing A1 in state S1. Different weights are continuously obtained and updated in the Q-table according to the different actions executed.
  • s, a, next_s: the current state, the current action, and the next state;
  • reward: the reward for performing action a;
  • lr: the learning rate (learning_rate); the larger lr is, the less of the previous training effect is retained; when lr is 0, the Q[s,a] value remains unchanged; when lr is 1, the original value is completely discarded;
  • factor: the discount factor; the larger the factor, the more weight is given to historical experience.
  • The reward value after executing the control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed (a larger gear value means the air purifier runs at higher power).
  • The reward is calculated as follows: reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed.
  • The "period of time" is determined as follows: the user enters not a time but the area and floor height of the room, so the time must be derived from these, roughly the time needed to filter about 2/3 of the room's air volume.
  • An example: assume the air purifier has three gears, 1, 2, and 3, each outputting clean air at a different rate: gear 1 at 100 cubic meters per hour, gear 2 at 200, and gear 3 at 330.
  • One hour is 3600 seconds, so dividing 100 by 3600 gives the volume of clean air output per second. If the room is 20 square meters with a floor height of 3 meters, the space holds about 60 cubic meters of air. Ignoring the exchange of indoor and outdoor air, a significant reduction in air pollutant content is observed once about 2/3 of the volume has been filtered.
  • The waiting time is therefore 2/3 of 60 cubic meters divided by the air filtration speed of the current gear.
  • The AI control weight of the air purifier is looked up by subdividing the state: the current pollutant concentration is divided by 10 and rounded down, so a concentration of 35 gives state 35/10 = 3.
  • The PM channels in this embodiment are PM1, PM2.5, and PM10.
  • The sensor's technical specifications are as follows:
  • Particle measurement range 1: 0.1-1.0 micron (μm); particle measurement range 2: 1.0-2.5 micron (μm); particle measurement range 3: 2.5-10 micron (μm).
  • the air injected by the air purifier will affect the air flow in the space
  • The impact caused by the current gear can be understood as a parameter for evaluating the current gear's effect (inversely proportional to the gear and directly proportional to the air purification effect);
  • The operation of an air purifier has the following characteristics: feedback on air quality is delayed, and it takes a period of time before the effect can be seen; filtering proceeds in stages, and an operation that is best at one stage is not necessarily best once factors in time or space change.
  • Q-Learning is a value-based reinforcement learning algorithm.
  • Q is Q(s, a), the expected benefit of taking action a in the state at a given moment; the environment feeds back a corresponding reward according to the agent's action.
  • The main idea of the algorithm is therefore to build a Q_table of states and actions to store Q values, and then select the action that yields the maximum benefit according to the Q value.
  • Q-learning uses the temporal-difference method (combining Monte Carlo and dynamic programming) to learn off-policy, and uses the Bellman equation to solve the Markov process for the optimal policy.
  • The AI control of the present invention can keep the PM2.5 value at a lower level, improving the purification effect.
  • Figure 5A is the AI mode
  • Figure 5B is the automatic mode.
  • The efficiency comparison is as follows: as shown in Figures 6A-6B, the AI control of the present invention reduces the PM2.5 content faster, improving purification efficiency.
  • Figure 6A is the AI mode
  • Figure 6B is the automatic mode.
  • Figures 7A-7B show the in-APP energy consumption reports; it is obvious that the AI mode of the present invention saves more power than the traditional mode.
  • The third-party metering socket reports in Figures 7C-7E likewise show that the AI mode of the present invention is more energy-efficient than the traditional mode.
  • Fig. 7C is the weekly energy consumption report using the AI control of the present invention.
  • Fig. 7D is the weekly energy consumption report using prior-art automatic control; in Fig. 7E, 10:00-11:00 records the energy consumption with AI control on, and 14:00-15:00 records the energy consumption with automatic mode on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

An air purifier adjusting method based on a machine learning algorithm (a reinforcement learning model), and an air purifier using the method. The adjusting method comprises a training stage and a control stage. The training stage comprises the following steps: S1, looking up a weight table according to the current state S1 of an air purifier, and controlling, using a Q-learning algorithm, the air purifier to start different actions or combined actions A1, wherein the weight table maps each action in a specific state to its corresponding weight; and S2, after a predetermined time has elapsed, acquiring the weight of executing the action or combined action A1 in the state S1, and writing the weight into the weight table to update it, wherein the "predetermined time" is determined according to the area and storey height of a room. The control stage comprises the following step: S3, executing step S1 according to the updated weight table, wherein the state S1 is determined according to the current gear of the air purifier and parameters of the current air which can be measured by the air purifier.

Description

Air purifier adjustment method based on a reinforcement learning model, and air purifier

Technical Field

The invention relates to an air purifier adjustment method based on a machine learning algorithm (a reinforcement learning model), and to an air purifier adopting the method.

Background Art

Traditional air purifiers have two adjustment modes:

A. Manual control: the user observes a series of factors, such as the PM2.5 value displayed on the air purifier or on the app connected to it, their own work and rest schedule, and whether a window is open, and then manually switches the purifier's control to a higher or lower gear or puts it directly into sleep mode.

B. Automatic control: using the pollutant sensor carried by the air purifier itself, a threshold is set according to the national standard definition of pollutant concentration; when the pollutant concentration reaches or exceeds the threshold, the air purifier automatically raises its gear, and when the concentration falls below the threshold, it automatically lowers its gear.

In theory this control logic seems correct: as air quality worsens, the automatic control adjusts the purification power; the worse the air quality, the higher the gear, and the better the air quality, the lower the gear. This provides strong filtering capability while balancing the two important factors of energy saving and convenience. In practice, however, this is not the case. In automatic mode the air quality can be quickly filtered from bad to excellent in the laboratory, but through big-data statistics from IoT products already sold, the inventor found that:

A. Fewer than 60% of the air purifiers in users' homes could bring the air to an excellent level;

B. Even for the same user, with the location and environment unchanged, there is no guarantee that the air quality will always be kept at an excellent level.

The Chinese patent application published as CN107065582A, titled "An Intelligent Indoor Air Adjustment System and Adjustment Method Based on Environmental Parameters", discloses an intelligent indoor air adjustment system and method based on environmental parameters, comprising a window control terminal, an air conditioning system, an air purification system, a distributed Wi-Fi network, a data acquisition module, and a processing module; the window control terminal, air conditioning system, air purification system, and data acquisition module are connected to the processing module through the distributed Wi-Fi network. The regulation system in that document collects environmental parameters through an indoor temperature and humidity sensor, a CO2 concentration sensor, and an indoor PM2.5 sensor, uploads the sensor data to a local server processing module through the distributed Wi-Fi network, and uploads energy data to the same module through an electricity meter. The local server processing module then analyzes and calculates, using a reinforcement-learning-based method, the obtained data against the set temperature and humidity values, and issues actions for the window control terminal, air conditioning system, and air purification equipment to change indoor air conditions, aiming to improve indoor air quality at the lowest energy consumption.

In that document, control of the air purification equipment is tied to the window control terminal and the air conditioning system. Although this can, to a certain extent, improve indoor air quality at the lowest energy consumption, the control is complicated, and the window control terminal, air conditioning system, and other factors to be considered do not belong to the same device as the air purifier. This makes installation and use extremely inconvenient for users, turning a simple household appliance purchase into a small interior renovation project.

Moreover, the control effect of that document's technical solution is still poor.
Summary of the Invention

To remedy the above deficiencies of the prior art, the present invention proposes an air purifier adjustment method based on a reinforcement learning model, and an air purifier, to improve the air purification effect.

The technical problem of the present invention is solved by the following technical solution:

1. An air purifier adjustment method based on a reinforcement learning model, characterized in that it comprises the following steps. S1: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. S2: after a predetermined time has elapsed, obtain the weight of performing the action or combined action A1 in state S1 and write it into the weight table, thereby updating the table; the "predetermined time" is determined according to the area and floor height of the room. S3: subsequently execute step S1 with the updated weight table. The state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure. Steps S1-S2 are the training phase, and step S3 is the control phase.

In some embodiments, the following improvements are also included:

In the training phase, steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.

In step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, where n is a natural number. The state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.

For each state Sn, the weight table gives the weights corresponding to multiple actions; during control, the action with the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.

The states S1 and Sn are determined from the air purifier's current gear and the current pollutant content in the air.
The weights are calculated as follows:

Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))

The meaning of the expression is as follows:

s, a, next_s represent the current state, the current action, and the next state, respectively;

reward is the reward for performing action a;

Q[s][a] is the value generated by action a in state s;

max(Q[next_s]) is the maximum value over all actions in the next state;

lr is the learning rate (learning_rate): the larger lr is, the less of the previous training effect is retained; when lr is 0, the value of Q[s,a] remains unchanged; when lr is 1, the original value is completely discarded;

factor is the discount factor (discount_factor): the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed.

The reward value is calculated as follows:

reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed;

where the "period of time" is determined according to the area and floor height of the room.
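For illustration, the update and reward rules above can be collected into a short Python sketch. The table dimensions mirror the embodiment (100 pollutant-state labels, 3 gears), while the default learning rate and discount factor are illustrative assumptions, not values fixed by the method:

```python
NUM_STATES = 100  # pollutant-state labels 0-99 (PM2.5 reading // 10)
NUM_GEARS = 3     # actions: gears 1, 2, 3 (stored at indices 0-2)

# Q[s][a]: weight of running gear a+1 while in pollutant state s.
Q = [[0.0] * NUM_GEARS for _ in range(NUM_STATES)]

def reward(pm_before, pm_after, gear):
    # (concentration before the gear ran - concentration after it ran
    #  for the waiting period) / the gear executed
    return (pm_before - pm_after) / gear

def update_weight(s, a, r, next_s, lr=0.1, factor=0.9):
    # Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))
    # lr=0.1 and factor=0.9 are illustrative defaults (assumption).
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (r + factor * max(Q[next_s]))
```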
The state of the air purifier is jointly determined by the air purifier's gear and the air pollutant content.

The air pollutant content is determined from the content of the following pollutants: PM1, PM2.5, and PM10.

The Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.

The present invention also includes an air purifier, comprising a main control unit, an air sensor unit, and a memory; computer software stored in the memory can be executed to implement the above method.

Compared with the prior art, the beneficial effects of the present invention include: by continuously monitoring the air pollutant content, the invention keeps the air purifier's gear adjusted to the most suitable position rather than always running at the maximum gear. The improvement in purification effect is achieved with a single device, without the cooperation of additional devices.

Experiments show that, in some embodiments, the present invention achieves the following technical effects:

a. Improved effect: previously about 65% of air purifiers could filter the air to a good level; after adopting the smart model, more than 85% achieve a good purification effect;

b. Improved efficiency: for the same space and pollutants, purification efficiency increased by 20%;

c. Reduced energy consumption: energy consumption is more than 30% lower than before;

d. Reduced manual intervention and easy operation.

Because the present invention adjusts automatically according to the purification effect, its adjustment can be determined solely from air parameters such as the air pollutant content; the influence of factors such as air conditioners and window controls is already reflected in the pollutant measurements, so those factors need not be considered. The air purification effect can thus be improved by the air purifier alone, without coordinated actions or parameters from additional devices, while still accounting for other factors such as time and space, achieving a genuinely simple and efficient result.
Brief Description of the Drawings

Fig. 1 is an implementation flowchart of a specific embodiment of the present invention.

Fig. 2 is a block diagram of a cloud server according to an embodiment of the present invention.

Fig. 3 is a hardware diagram of an embodiment of the present invention.

Fig. 4 is an outline flowchart of the present invention.

Fig. 5A shows the purification effect of the AI mode of an embodiment of the present invention.

Fig. 5B shows the purification effect of the automatic mode in the prior art.

Fig. 6A shows the purification efficiency of the AI mode of an embodiment of the present invention.

Fig. 6B shows the purification efficiency of the automatic mode in the prior art.

Fig. 7A is an energy consumption comparison chart for the AI mode of an embodiment of the present invention (APP report).

Fig. 7B is an energy consumption comparison chart for the automatic mode in the prior art (APP report).

Fig. 7C is a weekly energy consumption report using the AI control of the present invention (third-party metering socket report).

Fig. 7D is a weekly energy consumption report under prior-art automatic control (third-party metering socket report).

Fig. 7E is an energy consumption comparison chart between the AI control of the present invention and prior-art automatic-mode time-sharing control (third-party metering socket report).
Detailed Description of Embodiments

The present invention is further described below with reference to the accompanying drawings and in combination with preferred embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with each other.

It should be noted that orientation terms such as left, right, up, down, top, and bottom in the embodiments are only relative concepts, or refer to the normal use state of the product, and should not be regarded as restrictive.

Through extensive research and testing, we found that the poor performance of the prior-art methods stems from failing to consider the following factors:

Space factor: the location of the user's home;

Time factor: the time when the user uses the air purifier (spring, summer, autumn, or winter; day or night).

These factors, and possibly others, lead to different pollutant levels and compositions in the air, which in turn cause the poor purification performance of the prior art, resulting in:

1. Unstable purification efficiency;

2. Unstable purification effect;

3. Unstable power consumption;

4. No way to eliminate manual intervention.

The following embodiments of the present invention use a single air purifier for AI reinforcement learning and control, which is not only simple to operate but also improves the purification effect. The basic idea is: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. After a predetermined time, obtain the weight of performing action A1 in state S1 and write it into the weight table, updating the table. Subsequently execute steps S1-S2 with the updated weight table, continuously performing different actions or combined actions An according to the current state Sn of the air purifier to obtain and update different weights in the weight table, where n is a natural number. A combined action can be a combination of different gears, or an intermittent combination of the same gear. An outline flowchart is shown in Fig. 4.
The definitions of related concepts are as follows:

[Concept-definition table rendered as an image in the original (PCTCN2022126407-appb-000001); not reproduced here.]

The following embodiments of the present invention use Q-value initialization; during learning, the initial Q-value table is as shown below:

Q-Table (values such as 10, 20, 30 in the table represent weights)

[Initial Q-Table rendered as an image in the original (PCTCN2022126407-appb-000002); not reproduced here.]

The following table is an example of the relationship, used during actual control, between the pollutant-state labels in the Q-Table and the weights of the air purifier's gears:
State label of pollutants in the air    Gear 1 weight    Gear 2 weight    Gear 3 weight
0-3                                     3                2                1
4-7                                     1                3                2
8-99                                    1                2                3
The 0-99 values in the first column are obtained by dividing the PM2.5 value by 10 and rounding down, representing multiple reading bins for PM values from 0 to 1000.
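As a minimal sketch, the first-column state label can be derived from a PM2.5 reading as follows; clamping out-of-range readings to the last bin is an assumption, since the text only defines bins for readings up to 1000:

```python
def state_label(pm25):
    # Divide the PM2.5 reading by 10 and round down: a reading of 35
    # falls in state 3. Readings of 1000 or more are clamped to bin 99
    # (assumption; the text lists bins for values 0-1000 only).
    return min(int(pm25) // 10, 99)
```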
The method flow of this embodiment is shown in Fig. 1; the algorithm can be executed locally or in the cloud. The composition of the cloud server is shown in Fig. 2.

The method of this embodiment is described as follows:

1. Training

According to the current state S1 of the air purifier, the purifier is made to start different combined actions A1; after a period of time (the time is related to the size of the space, as described below), it is observed whether the current control has effectively filtered the air, thereby obtaining the weight of performing A1 in state S1. Different weights are continuously obtained and updated in the Q-table according to the different actions executed.
Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))

The meanings of the terms in the expression are:

s, a, next_s: the current state, the current action, and the next state;

reward: the reward for performing action a;

Q[s][a]: the value generated by action a in state s;

max(Q[next_s]): the maximum value over all actions in the next state;

lr: the learning rate (learning_rate); the larger lr is, the less of the previous training effect is retained; when lr is 0, the Q[s,a] value remains unchanged; when lr is 1, the original value is completely discarded;

factor: the discount factor (discount_factor); the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed (a larger gear value means the air purifier runs at higher power). The calculation is as follows:

reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed;

The "period of time" here is determined as follows: the user enters not a time but the area and floor height of the room, so the time must be derived from these, roughly the time needed to filter about 2/3 of the room's air volume. For example, assume the air purifier has three gears, 1, 2, and 3, each outputting clean air at a different rate: gear 1 at 100 cubic meters per hour, gear 2 at 200, and gear 3 at 330. One hour is 3600 seconds, so dividing 100 by 3600 gives the volume of clean air output per second. If the room is 20 square meters with a floor height of 3 meters, the space holds about 60 cubic meters of air. Ignoring the exchange of indoor and outdoor air, a significant reduction in air pollutant content is observed once about 2/3 of that volume has been filtered, so the waiting time is 2/3 of 60 cubic meters divided by the air filtration speed of the current gear.
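The waiting-time rule, with the worked numbers from this example, can be sketched as below; the gear output table and function name are illustrative:

```python
GEAR_OUTPUT_M3H = {1: 100.0, 2: 200.0, 3: 330.0}  # clean-air output per gear, m^3/h

def waiting_time_seconds(area_m2, height_m, gear):
    volume = area_m2 * height_m                   # room air volume, m^3
    target = volume * 2.0 / 3.0                   # observe after ~2/3 is filtered
    per_second = GEAR_OUTPUT_M3H[gear] / 3600.0   # m^3 of clean air per second
    return target / per_second

# Worked example from the text: 20 m^2 x 3 m room on gear 1
# -> 60 m^3 volume, 40 m^3 target, 40 / (100/3600) = 1440 s (24 minutes).
print(waiting_time_seconds(20, 3, 1))
```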
2. AI Control

According to the air purifier's state each time, the action corresponding to the highest weight is found in the Q-table. The AI control weight is looked up by subdividing the state: the current pollutant concentration in the state is divided by 10 and rounded down, so a concentration of 35 gives state 35/10 = 3.
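Combining the two lookups, a minimal control step (reusing the Q table and state_label sketches above) would be:

```python
def control_step(pm25):
    # Find the gear with the highest weight for the current state.
    s = state_label(pm25)
    best_index = Q[s].index(max(Q[s]))
    return best_index + 1  # gears are numbered from 1
```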
Comparative experimental data

The PM channels in this embodiment are PM1, PM2.5, and PM10. The sensor's technical specifications are given in the following table:

Parameter                       Specification    Unit
Particle measurement range 1    0.1-1.0          micron (μm)
Particle measurement range 2    1.0-2.5          micron (μm)
Particle measurement range 3    2.5-10           micron (μm)
This application exploits the natural correlation among PM1, PM2.5, and PM10 by measuring all three together; experiments prove that this enables efficient and accurate determination of the air pollutant content. The prior art usually cares only about the PM2.5 value, and because the way the three values correlate differs across regions, seasons, and even weather conditions, the prior art cannot design a correlation strategy in which the measurements of PM1 and PM10 in turn promote the purification of PM2.5.
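The text states that the three channels are measured together but does not fix how they are fused into the control state; the sketch below is purely an assumption for illustration, with a hypothetical sensor interface:

```python
def read_pollutants(sensor):
    # 'sensor' is a hypothetical driver object exposing one read per channel.
    return {"pm1": sensor.read_pm1(),
            "pm2.5": sensor.read_pm25(),
            "pm10": sensor.read_pm10()}

def pollutant_level(readings):
    # Assumption: drive the state from PM2.5 while keeping PM1/PM10
    # available as auxiliary signals; the actual fusion rule is not
    # specified in the text.
    return readings["pm2.5"]
```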
The specific implementation flowchart is shown in Fig. 1:

1. Initialize the Q-table

There are several ways to initialize the Q-table: random, ordered, or basic-experience initialization; both random and ordered initialization may give users a poor initial experience. We initialize according to an automatic-style operation (this does not refer to the second adjustment mode in the prior art, but to an initial value, suited to the air purifier scenario, that is given automatically before Q-learning starts).
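One way to realize this automatic-style initialization, reproducing the pattern of the gear-weight table given earlier, is sketched below:

```python
def init_q_table():
    # Seed with the pattern of the example table: low-pollution states
    # favor gear 1, moderate states favor gear 2, heavier states gear 3.
    q = []
    for s in range(100):
        if s <= 3:
            q.append([3.0, 2.0, 1.0])
        elif s <= 7:
            q.append([1.0, 3.0, 2.0])
        else:
            q.append([1.0, 2.0, 3.0])
    return q

Q = init_q_table()  # replaces the all-zero table from the earlier sketch
```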
2. Single-point data acquisition

Collect the impact on air quality of the air purifier working at its current gear, mainly considering the following factors:

the air injected by the air purifier affects the air flow in the space;

the impact of changing the gear on air quality takes some time before the air purifier can perceive it stably; this time is related to the size of the space the purifier is in (as described above), which the user can enter manually through the app.

3. Analyze the collected data and issue the next control instruction.

The impact caused by the current gear can be understood as a parameter for evaluating the current gear's effect (inversely proportional to the gear and directly proportional to the air purification effect);

Calculate the Q-table resulting from this AI operation and update the data;

Re-acquire the current pollutant data. If the pollutants have not been controlled to the ideal level (i.e., the pollutant concentration has not dropped into the high-quality range required by the relevant country; for China's PM2.5, 0-30 is "excellent"), query the Q-table for the gear control instruction that currently needs to be issued, jump to step 2, and wait for the result. If the pollutants have been controlled to the ideal range, end this round of training, save the Q-table, and enable the AI prediction mode.
4. AI prediction mode evaluation module

Statistics are kept on air quality changes in AI prediction mode. If within 12 hours the current AI model cannot bring about a good improvement in air quality (the following two conditions occur at the same time), training must be restarted: in the current prediction mode, all obtained rewards are negative, and the air pollutant content has exceeded the safe range.
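The retraining criterion (both conditions holding over a 12-hour window) can be sketched as follows; SAFE_PM25 reflects the 0-30 "excellent" band cited for China, and treating any excursion above it as "exceeded" is an assumption:

```python
SAFE_PM25 = 30  # upper bound of the "excellent" band cited in the text

def needs_retraining(rewards_12h, pm25_readings_12h):
    # Restart training only when BOTH hold across the 12-hour window:
    # every obtained reward was negative, and pollutant levels left
    # the safe range.
    all_negative = bool(rewards_12h) and all(r < 0 for r in rewards_12h)
    out_of_range = any(pm > SAFE_PM25 for pm in pm25_readings_12h)
    return all_negative and out_of_range
```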
Implementation approaches for the algorithm

The operation of an air purifier has the following characteristics: feedback on air quality is delayed, and it takes a period of time before the effect can be seen; filtering proceeds in stages, and an operation that is best at one stage is not necessarily best once factors in time or space change.

Therefore we did not use supervised learning, because we cannot clearly obtain a definition of "good" or "bad" after the air purifier executes a given instruction. Reinforcement learning clearly fits this scenario better. What reinforcement learning needs to solve here:

Q-Learning is a value-based reinforcement learning algorithm. Q is Q(s, a), the expected benefit of taking action a in the state at a given moment; the environment feeds back a corresponding reward according to the agent's action. The main idea of the algorithm is therefore to build a Q_table of states and actions to store Q values, and then select the action that yields the maximum benefit according to the Q value.

The main advantage of Q-learning is that it uses the temporal-difference method (combining Monte Carlo and dynamic programming) to learn off-policy, and uses the Bellman equation to solve the Markov process for the optimal policy.
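Tying the sketches together, one training iteration of the described loop might look like this; the purifier device interface and the epsilon-greedy exploration (one way to "try different actions") are assumptions:

```python
import random
import time

def train_step(purifier, area_m2, height_m, lr=0.1, factor=0.9, epsilon=0.1):
    # One S1/S2 iteration: observe, act, wait, score, update.
    pm_before = purifier.read_pm25()          # hypothetical device API
    s = state_label(pm_before)
    if random.random() < epsilon:             # occasionally explore
        gear = random.randint(1, NUM_GEARS)   # a different action
    else:                                     # otherwise act greedily
        gear = Q[s].index(max(Q[s])) + 1
    purifier.set_gear(gear)                   # hypothetical device API
    time.sleep(waiting_time_seconds(area_m2, height_m, gear))
    pm_after = purifier.read_pm25()
    r = reward(pm_before, pm_after, gear)
    update_weight(s, gear - 1, r, state_label(pm_after), lr, factor)
```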
The hardware and software composition is shown in Fig. 3.

Implementation of the method of the present invention achieves the following advantages:

1. The purification effect is improved.

In the same non-enclosed environment, a one-week test was carried out with the same purifier in AI mode and in automatic mode. As shown in Figures 5A-5B, the AI control of the present invention keeps the PM2.5 value at a lower level, improving the purification effect. Fig. 5A shows the AI mode and Fig. 5B the automatic mode.

2. Purification efficiency is improved:

In an equally enclosed environment with pollutants introduced artificially, the efficiency comparison is as follows: as shown in Figures 6A-6B, the AI control of the present invention reduces the PM2.5 content faster, improving purification efficiency. Fig. 6A shows the AI mode and Fig. 6B the automatic mode.

3. Power consumption is reduced:

In both of the above situations, the power consumption comparison is as follows: Figures 7A-7B show the in-APP energy consumption reports, from which it is obvious that the AI mode of the present invention saves more power than the traditional mode; likewise, the third-party metering socket reports in Figures 7C-7E show that the AI mode of the present invention is more energy-efficient than the traditional mode. Fig. 7C is the weekly energy consumption report under the AI control of the present invention, and Fig. 7D is the weekly report under prior-art automatic control. In Fig. 7E, 10:00-11:00 records the energy consumption with AI control on, and 14:00-15:00 records the energy consumption with automatic mode on.

4. No manual control is needed.

In both situations, the records of the number of manual interventions show no manual control for three consecutive weeks.

The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention should not be considered limited to these descriptions. For those skilled in the art to which the present invention belongs, several equivalent substitutions or obvious modifications with the same performance or use can be made without departing from the concept of the present invention, and all should be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. An air purifier adjustment method based on a reinforcement learning model, characterized in that it comprises a training phase and a control phase, the training phase comprising the following steps:
    S1: according to the current state S1 of the air purifier, looking up the weight table, and using the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, wherein the weight table is a lookup table mapping each action in a specific state to its corresponding weight;
    S2: after a predetermined time has elapsed, obtaining the weight of performing the action or combined action A1 in state S1 and writing it into the weight table, thereby updating the weight table, wherein the "predetermined time" is determined according to the area and floor height of the room;
    the control phase comprising the following step:
    S3: executing step S1 with the updated weight table;
    wherein the state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  2. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that: in the training phase, steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.
  3. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that: in step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, wherein n is a natural number; the state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  4. The air purifier adjustment method based on a reinforcement learning model according to claim 3, characterized in that: for each state Sn, the weight table gives the weights corresponding to multiple actions, and during control, the action with the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.
  5. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that the weights are calculated as follows:
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (reward + factor * max(Q[next_s]))
    The terms in the expression are as follows:
    s, a, and next_s denote the current state, the current action, and the next state, respectively;
    reward denotes the reward, i.e. the reward for performing action a;
    Q[s][a] denotes the value, i.e. the value generated by performing action a in state s;
    max(Q[next_s]) denotes the maximum value, i.e. the maximum over the values of all actions in the next state;
    lr denotes the learning rate (learning_rate): the larger lr is, the less of the previous training result is retained; when lr is 0, the value of Q[s][a] remains unchanged; when lr is 1, the original value is completely discarded;
    factor denotes the discount factor (discount_factor): the larger factor is, the more weight is given to the estimated future value max(Q[next_s]); when factor is 0, only the immediate reward is considered.
  6. The air purifier adjustment method based on a reinforcement learning model according to claim 5, characterized in that the reward value after a control instruction is executed in the current state is proportional to the reduction in air pollutants and inversely proportional to the gear level executed.
  7. The air purifier adjustment method based on a reinforcement learning model according to claim 6, characterized in that the reward value is calculated as follows:
    reward = (pollutant concentration before executing the gear - air pollutant concentration after executing the gear for a period of time) / gear level executed;
    where the "period of time" is determined according to the floor area and ceiling height of the room.
  8. The air purifier adjustment method based on a reinforcement learning model according to claim 7, characterized in that the air pollutant concentration is determined according to the content of the following pollutants: PM1, PM2.5 and PM10.
  9. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that the Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.
  10. An air purifier, comprising a main control unit, an air sensor unit, and a memory, the memory storing computer software executable to implement the method according to any one of claims 1-9.
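For readers less familiar with tabular Q-Learning, the following is a minimal Python sketch of the weight update of claim 5 and the reward calculation of claim 7. It is an illustration under stated assumptions, not the claimed implementation: the identifiers (q_table, GEARS, compute_reward, update_weight) and the choice of fan gear levels as the action set are assumptions introduced here.

```python
# Minimal sketch of the claim 5 update rule and the claim 7 reward.
# All identifiers are illustrative assumptions; the claims fix no API.
from collections import defaultdict

GEARS = [1, 2, 3, 4]  # assumed action set: the purifier's fan gear levels

# Weight table (Q-table): state -> {action: weight}, zero-initialised
q_table = defaultdict(lambda: {gear: 0.0 for gear in GEARS})

def compute_reward(conc_before: float, conc_after: float, gear: int) -> float:
    # Claim 7: reward = (concentration before the gear ran
    # - concentration after it ran for a period of time) / gear level,
    # so larger pollutant reductions raise the reward and higher
    # (more power-hungry) gears lower it, as claim 6 requires.
    return (conc_before - conc_after) / gear

def update_weight(s, a, next_s, reward, lr=0.1, factor=0.9):
    # Claim 5: Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))
    best_next = max(q_table[next_s].values())
    q_table[s][a] = (1 - lr) * q_table[s][a] + lr * (reward + factor * best_next)
```

With lr = 0 the stored weight never changes and with lr = 1 the old value is discarded entirely, matching the claim's description of the learning rate.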
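Claims 1 and 4 describe the control phase as a greedy lookup of the updated weight table, and the training phase as repeatedly acting, waiting the predetermined time, and writing the observed weight back. Continuing the sketch above (read_state, read_concentration, run_gear, and dwell_seconds are hypothetical device hooks, not claimed features):

```python
import time

def choose_action(state):
    # Claim 4: select the action with the highest weight for this state
    weights = q_table[state]
    return max(weights, key=weights.get)

def training_step(read_state, read_concentration, run_gear, dwell_seconds):
    # One pass of steps S1-S2 from claim 1: act in the current state,
    # wait the "predetermined time" (set from the room's floor area and
    # ceiling height), then write the observed weight back into the table.
    s = read_state()
    a = choose_action(s)   # during training, an exploratory policy
                           # (e.g. epsilon-greedy) could replace this
    before = read_concentration()
    run_gear(a)
    time.sleep(dwell_seconds)
    after = read_concentration()
    update_weight(s, a, read_state(), compute_reward(before, after, a))
```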
PCT/CN2022/126407 2021-11-26 2022-10-20 Air purifier adjusting method based on reinforcement learning model, and air purifier WO2023093388A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111418292.7 2021-11-26
CN202111418292.7A CN113834200A (en) 2021-11-26 2021-11-26 Air purifier adjusting method based on reinforcement learning model and air purifier

Publications (1)

Publication Number Publication Date
WO2023093388A1 true WO2023093388A1 (en) 2023-06-01

Family ID=78971617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126407 WO2023093388A1 (en) 2021-11-26 2022-10-20 Air purifier adjusting method based on reinforcement learning model, and air purifier

Country Status (2)

Country Link
CN (1) CN113834200A (en)
WO (1) WO2023093388A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113834200A (en) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjusting method based on reinforcement learning model and air purifier


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
CN105843037B (en) * 2016-04-11 2019-05-10 中国科学院自动化研究所 Intelligent building temprature control method based on Q study
CN107065582B (en) * 2017-03-31 2023-09-29 苏州科技大学 Indoor air intelligent adjusting system and method based on environment parameters
EP3835895A1 (en) * 2019-12-13 2021-06-16 Tata Consultancy Services Limited Multi-agent deep reinforcement learning for dynamically controlling electrical equipment in buildings
CN111126605B (en) * 2020-02-13 2023-06-20 创新奇智(重庆)科技有限公司 Data center machine room control method and device based on reinforcement learning algorithm

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108419439A (en) * 2017-05-22 2018-08-17 深圳微自然创新科技有限公司 Housed device learning method and server
CN108981115A (en) * 2018-07-23 2018-12-11 珠海格力电器股份有限公司 A kind of air purifier and control method based on machine learning
CN109827292A (en) * 2019-01-16 2019-05-31 珠海格力电器股份有限公司 Construction method, control method, the household electrical appliances of household electrical appliances adaptive power conservation Controlling model
US20210063036A1 (en) * 2019-08-29 2021-03-04 Lg Electronics Inc. Air purifier and operating method of the same
WO2021208771A1 (en) * 2020-04-18 2021-10-21 华为技术有限公司 Reinforced learning method and device
CN113551373A (en) * 2021-07-19 2021-10-26 江苏中堃数据技术有限公司 Data center air conditioner energy-saving control method based on federal reinforcement learning
CN113834200A (en) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjusting method based on reinforcement learning model and air purifier

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117398817A (en) * 2023-11-30 2024-01-16 中山市凌宇机械有限公司 Intensive compressed air purification system and method
CN117398817B (en) * 2023-11-30 2024-04-30 中山市凌宇机械有限公司 Intensive compressed air purification system and method
CN117666378A (en) * 2024-02-01 2024-03-08 天津市品茗科技有限公司 Intelligent household system
CN117666378B (en) * 2024-02-01 2024-04-09 天津市品茗科技有限公司 Intelligent household system

Also Published As

Publication number Publication date
CN113834200A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
WO2023093388A1 (en) Air purifier adjusting method based on reinforcement learning model, and air purifier
WO2020107851A1 (en) Low-cost commissioning method and system for air conditioning system based on existing large-scale public building
CN104214872B (en) Multifunctional indoor air purification system and control method thereof
CN107065582B (en) Indoor air intelligent adjusting system and method based on environment parameters
KR20170096730A (en) System and method of controlling air quality, and analyzing server
CN105318499A (en) User behavior self-learning air conditioning system and control method thereof
CN104482634A (en) Indoor air quality multi-parameter comprehensive control system
WO2010066076A1 (en) Energy saving air-conditioning control system based on predicted mean vote and method thereof
CN102980276A (en) Intelligent fresh air system for energy saving of base station
CN112283890A (en) Cold and heat quantity control method and device suitable for building heating and ventilation equipment monitoring system
CN105180370A (en) Intelligent control method and device for air conditioner and air conditioner
CN113091240A (en) Air conditioner control method, air conditioner control device, air conditioner, storage medium and program product
CN111998505B (en) Energy consumption optimization method and system for air conditioning system in general park based on RSM-Kriging-GA algorithm
CN114659237B (en) Air conditioner energy efficiency supervision method based on Internet of things
WO2020108666A1 (en) Method, device, and computer storage medium for purification control in air purification system
CN106230002B (en) A kind of air conditioner load demand response method based on index rolling average
CN108122067B (en) Modeling method and system for building demand response dynamic process
CN114838488A (en) Method and device for linkage control of intelligent household appliances, air conditioner and storage medium
CN113357768B (en) Air conditioner control method and device, electronic equipment and storage medium
Yuan et al. Two-level collaborative demand-side management for regional distributed energy system considering carbon emission quotas
CN203298441U (en) Centralized air conditioner control system
CN111380161A (en) Air conditioner operation mode adjusting method and device and air conditioner
CN111351174B (en) Control method and device of air conditioner, air conditioner and storage medium
CN110928188B (en) Air storage control method of air compressor
CN205540121U (en) Comprehensive power consumption measure and control management terminal

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22897471

Country of ref document: EP

Kind code of ref document: A1