WO2023093388A1 - Air purifier adjusting method based on reinforcement learning model, and air purifier - Google Patents

Air purifier adjusting method based on reinforcement learning model, and air purifier Download PDF

Info

Publication number
WO2023093388A1
Authority
WO
WIPO (PCT)
Prior art keywords
air purifier
air
state
current
actions
Prior art date
Application number
PCT/CN2022/126407
Other languages
French (fr)
Chinese (zh)
Inventor
鲁峰
Original Assignee
深圳市愚公科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市愚公科技有限公司
Publication of WO2023093388A1 publication Critical patent/WO2023093388A1/en

Links

Images

Classifications

    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/62: Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63: Electronic processing
    • F24F11/64: Electronic processing using pre-stored data
    • F: MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24: HEATING; RANGES; VENTILATING
    • F24F: AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00: Control or safety arrangements
    • F24F11/50: Control or safety arrangements characterised by user interfaces or communication
    • F24F11/56: Remote control
    • F24F11/58: Remote control using Internet communication

Definitions

  • The invention relates to an air purifier adjustment method based on a machine learning algorithm (a reinforcement learning model), and to an air purifier adopting the method.
  • Manual control is required: the user observes a series of factors, such as the PM2.5 value displayed on the air purifier or on the app connected to it, their own work and rest schedule, and whether a window is open, and then manually switches the purifier's control to a higher or lower gear or puts it directly into sleep mode.
  • An intelligent indoor air adjustment system and adjustment method based on environmental parameters is disclosed, comprising a window control terminal, an air conditioning system, an air purification system, a distributed Wi-Fi network, a data acquisition module, and a processing module; the window control terminal, air conditioning system, air purification system, and data acquisition module are connected to the processing module through the distributed Wi-Fi network.
  • The regulation system in that document collects environmental parameters through an indoor temperature and humidity sensor, a CO2 concentration sensor, and an indoor PM2.5 sensor, and uploads the data to a local server processing module through the distributed Wi-Fi network.
  • The local server processing module analyzes and calculates, using a reinforcement-learning-based method, the obtained data against the set temperature and humidity values, and issues actions for the window control terminal, the air conditioning system, and the air purification equipment to change indoor air conditions, aiming to improve indoor air quality at the lowest energy consumption.
  • Because the control of the air purification equipment is tied to the window control terminal and the air conditioning system, that approach can improve indoor air quality at low energy consumption only to a certain extent; the control is complicated, and the window control terminal, air conditioning system, and other factors to be considered do not belong to the same device as the air purifier. This makes installation and use extremely inconvenient for users, turning a simple household appliance purchase into a small interior renovation project.
  • To remedy these deficiencies, the present invention proposes an air purifier adjustment method based on a reinforcement learning model, and an air purifier, to improve the air purification effect.
  • An air purifier adjustment method based on a reinforcement learning model comprises the following steps. S1: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. S2: after a predetermined time has elapsed, obtain the weight of performing the action or combined action A1 in state S1 and write it into the weight table, thereby updating the weight table; the "predetermined time" is determined according to the area and floor height of the room. S3: subsequently execute step S1 with the updated weight table. The state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  • Steps S1-S2 are the training phase
  • step S3 is the control phase.
  • Steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.
  • In step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, where n is a natural number. The state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  • For each state, the weight table gives the weights corresponding to multiple actions.
  • During control, the action corresponding to the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.
  • The states S1 and Sn are determined from the air purifier's current gear and the current pollutant content in the air.
  • The weights are calculated as follows: Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s])), where:
  • s, a, next_s represent the current state, the current action, and the next state, respectively;
  • reward is the reward for performing action a;
  • Q[s][a] is the value generated by action a in state s;
  • max(Q[next_s]) is the maximum value over all actions in the next state;
  • lr is the learning rate (learning_rate): the larger lr is, the less of the previous training effect is retained; when lr is 0, the value of Q[s,a] remains unchanged; when lr is 1, the original value is completely discarded;
  • factor is the discount factor (discount_factor): the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
  • The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed.
  • The "period of time" in the reward calculation is determined according to the area and floor height of the room.
  • The state of the air purifier is jointly determined by the air purifier's gear and the air pollutant content.
  • The air pollutant content is determined from the content of the following pollutants: PM1, PM2.5, and PM10.
  • The Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.
  • The present invention also includes an air purifier, comprising a main control unit, an air sensor unit, and a memory; computer software stored in the memory can be executed to implement the above method.
  • The beneficial effects of the present invention include: by continuously monitoring the air pollutant content, the invention keeps the air purifier's gear adjusted to the most suitable position rather than always running at the maximum gear.
  • The improvement in purification effect is achieved with a single device, without the cooperation of additional devices.
  • The effect is improved: previously about 65% of air purifiers could filter the air to a good level, whereas after adopting the smart model more than 85% achieve a good purification effect.
  • Because the present invention adjusts automatically according to the purification effect, its adjustment can be determined solely from air parameters such as the air pollutant content; the influence of factors such as air conditioners and window controls is already reflected in the pollutant measurements, so those factors need not be considered. The air purification effect can thus be improved by the air purifier alone, without coordinated actions or parameters from additional devices, while still accounting for other factors such as time and space, achieving a genuinely simple and efficient result.
  • Fig. 1 is an implementation flow chart of a specific embodiment of the present invention.
  • FIG. 2 is a configuration diagram of a cloud server according to an embodiment of the present invention.
  • Fig. 3 is a hardware diagram of an embodiment of the present invention.
  • Fig. 4 is a schematic flowchart of the present invention.
  • Fig. 5A is a diagram of the purification effect of the AI mode of the embodiment of the present invention.
  • Fig. 5B is a purification effect diagram of the automatic mode in the prior art.
  • Fig. 6A is a purification efficiency diagram of the AI mode of the embodiment of the present invention.
  • Fig. 6B is a diagram of the purification efficiency of the automatic mode in the prior art.
  • FIG. 7A is an energy consumption comparison chart for the AI mode of the embodiment of the present invention (APP report).
  • FIG. 7B is an energy consumption comparison chart for the automatic mode in the prior art (APP report).
  • FIG. 7C is a weekly energy consumption report using the AI control of the present invention (third-party metering socket report).
  • FIG. 7D is a weekly energy consumption report under prior-art automatic control (third-party metering socket report).
  • FIG. 7E is an energy consumption comparison chart between the AI control of the present invention and prior-art automatic-mode time-sharing control (third-party metering socket report).
  • Orientation terms such as left, right, up, down, top, and bottom in this embodiment are only relative concepts, or refer to the normal use state of the product, and should not be regarded as restrictive.
  • Time factor: the time when the user uses the air purifier (spring, summer, autumn, or winter; day or night).
  • The following embodiments of the present invention use a single air purifier for AI reinforcement learning and control, which is not only easy to operate but also improves the purification effect.
  • The basic idea is: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight.
  • After a predetermined time, obtain the weight of performing action A1 in state S1 and write it into the weight table, updating the table. Subsequently execute steps S1-S2 with the updated weight table, continuously performing different actions or combined actions An according to the current state Sn of the air purifier to obtain and update different weights in the weight table, where n is a natural number.
  • A combined action can be a combination of different gears, or an intermittent combination of the same gear. An outline flowchart is shown in Fig. 4.
  • The following table is an example of the relationship, used during actual control, between the pollutant-state labels in the Q-Table and the weights of the air purifier's gears.
  • The 0-99 values in the first column are obtained by dividing the PM2.5 value by 10 and rounding down, representing multiple reading bins for PM values from 0 to 1000.
  • The method flow of this embodiment is shown in FIG. 1, and the algorithm can be executed locally or in the cloud.
  • The composition of the cloud server is shown in FIG. 2.
  • Training: according to the current state S1 of the air purifier, the purifier is made to start different combined actions A1; after a period of time (the time is related to the size of the space, as described below), it is observed whether the current control has effectively filtered the air, thereby obtaining the weight of performing A1 in state S1. Different weights are continuously obtained and updated in the Q-table according to the different actions executed.
  • s, a, next_s: the current state, the current action, and the next state;
  • reward: the reward for performing action a;
  • lr: the learning rate (learning_rate); the larger lr is, the less of the previous training effect is retained; when lr is 0, the Q[s,a] value remains unchanged; when lr is 1, the original value is completely discarded;
  • factor: the discount factor; the larger the factor, the more weight is given to historical experience.
  • The reward value after executing the control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed (a larger gear value means the air purifier runs at higher power).
  • The reward is calculated as follows: reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed.
  • The "period of time" is determined as follows: the user enters not a time but the area and floor height of the room, so the time must be derived from these, roughly the time needed to filter about 2/3 of the room's air volume.
  • An example: assume the air purifier has three gears, 1, 2, and 3, each outputting clean air at a different rate: gear 1 at 100 cubic meters per hour, gear 2 at 200, and gear 3 at 330.
  • One hour is 3600 seconds, so dividing 100 by 3600 gives the volume of clean air output per second. If the room is 20 square meters with a floor height of 3 meters, the space holds about 60 cubic meters of air. Ignoring the exchange of indoor and outdoor air, a significant reduction in air pollutant content is observed once about 2/3 of the volume has been filtered.
  • The waiting time is therefore 2/3 of 60 cubic meters divided by the air filtration speed of the current gear.
  • The AI control weight of the air purifier is looked up by subdividing the state: the current pollutant concentration is divided by 10 and rounded down, so a concentration of 35 gives state 35/10 = 3.
  • The PM channels in this embodiment are PM1, PM2.5, and PM10.
  • The sensor's technical specifications are as follows:
  • Particle measurement range 1: 0.1-1.0 micron (μm); particle measurement range 2: 1.0-2.5 micron (μm); particle measurement range 3: 2.5-10 micron (μm).
  • the air injected by the air purifier will affect the air flow in the space
  • The impact caused by the current gear can be understood as a parameter for evaluating the current gear's effect (inversely proportional to the gear and directly proportional to the air purification effect);
  • The operation of an air purifier has the following characteristics: feedback on air quality is delayed, and it takes a period of time before the effect can be seen; filtering proceeds in stages, and an operation that is best at one stage is not necessarily best once factors in time or space change.
  • Q-Learning is a value-based reinforcement learning algorithm.
  • Q is Q(s, a), the expected benefit of taking action a in the state at a given moment; the environment feeds back a corresponding reward according to the agent's action.
  • The main idea of the algorithm is therefore to build a Q_table of states and actions to store Q values, and then select the action that yields the maximum benefit according to the Q value.
  • Q-learning uses the temporal-difference method (combining Monte Carlo and dynamic programming) to learn off-policy, and uses the Bellman equation to solve the Markov process for the optimal policy.
  • The AI control of the present invention can keep the PM2.5 value at a lower level, improving the purification effect.
  • Figure 5A is the AI mode
  • Figure 5B is the automatic mode.
  • The efficiency comparison is as follows: as shown in Figures 6A-6B, the AI control of the present invention reduces the PM2.5 content faster, improving purification efficiency.
  • Figure 6A is the AI mode
  • Figure 6B is the automatic mode.
  • Figures 7A-7B show the in-APP energy consumption reports; it is obvious that the AI mode of the present invention saves more power than the traditional mode.
  • The third-party metering socket reports in Figures 7C-7E likewise show that the AI mode of the present invention is more energy-efficient than the traditional mode.
  • Fig. 7C is the weekly energy consumption report using the AI control of the present invention.
  • Fig. 7D is the weekly energy consumption report using prior-art automatic control; in Fig. 7E, 10:00-11:00 records the energy consumption with AI control on, and 14:00-15:00 records the energy consumption with automatic mode on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

An air purifier adjusting method based on a machine learning algorithm (a reinforcement learning model), and an air purifier using the method. The adjusting method comprises a training stage and a control stage. The training stage comprises the following steps: S1, looking up a weight table according to the current state S1 of an air purifier, and controlling, using a Q-learning algorithm, the air purifier to start different actions or combined actions A1, wherein the weight table maps each action in a specific state to its corresponding weight; and S2, after a predetermined time has elapsed, acquiring the weight of executing the action or combined action A1 in the state S1, and writing the weight into the weight table to update it, wherein the "predetermined time" is determined according to the area and storey height of a room. The control stage comprises the following step: S3, executing step S1 according to the updated weight table, wherein the state S1 is determined according to the current gear of the air purifier and parameters of the current air which can be measured by the air purifier.

Description

Air purifier adjustment method based on a reinforcement learning model, and air purifier

Technical Field

The invention relates to an air purifier adjustment method based on a machine learning algorithm (a reinforcement learning model), and to an air purifier adopting the method.

Background Art

Traditional air purifiers have two adjustment modes:

A. Manual control: the user observes a series of factors, such as the PM2.5 value displayed on the air purifier or on the app connected to it, their own work and rest schedule, and whether a window is open, and then manually switches the purifier's control to a higher or lower gear or puts it directly into sleep mode.

B. Automatic control: using the pollutant sensor carried by the air purifier itself, a threshold is set according to the national standard definition of pollutant concentration; when the pollutant concentration reaches or exceeds the threshold, the air purifier automatically raises its gear, and when the concentration falls below the threshold, it automatically lowers its gear.

In theory this control logic seems correct: as air quality worsens, the automatic control adjusts the purification power; the worse the air quality, the higher the gear, and the better the air quality, the lower the gear. This provides strong filtering capability while balancing the two important factors of energy saving and convenience. In practice, however, this is not the case. In automatic mode the air quality can be quickly filtered from bad to excellent in the laboratory, but through big-data statistics from IoT products already sold, the inventor found that:

A. Fewer than 60% of the air purifiers in users' homes could bring the air to an excellent level;

B. Even for the same user, with the location and environment unchanged, there is no guarantee that the air quality will always be kept at an excellent level.

The Chinese patent application published as CN107065582A, titled "An Intelligent Indoor Air Adjustment System and Adjustment Method Based on Environmental Parameters", discloses an intelligent indoor air adjustment system and method based on environmental parameters, comprising a window control terminal, an air conditioning system, an air purification system, a distributed Wi-Fi network, a data acquisition module, and a processing module; the window control terminal, air conditioning system, air purification system, and data acquisition module are connected to the processing module through the distributed Wi-Fi network. The regulation system in that document collects environmental parameters through an indoor temperature and humidity sensor, a CO2 concentration sensor, and an indoor PM2.5 sensor, uploads the sensor data to a local server processing module through the distributed Wi-Fi network, and uploads energy data to the same module through an electricity meter. The local server processing module then analyzes and calculates, using a reinforcement-learning-based method, the obtained data against the set temperature and humidity values, and issues actions for the window control terminal, air conditioning system, and air purification equipment to change indoor air conditions, aiming to improve indoor air quality at the lowest energy consumption.

In that document, control of the air purification equipment is tied to the window control terminal and the air conditioning system. Although this can, to a certain extent, improve indoor air quality at the lowest energy consumption, the control is complicated, and the window control terminal, air conditioning system, and other factors to be considered do not belong to the same device as the air purifier. This makes installation and use extremely inconvenient for users, turning a simple household appliance purchase into a small interior renovation project.

Moreover, the control effect of that document's technical solution is still poor.
Summary of the Invention

To remedy the above deficiencies of the prior art, the present invention proposes an air purifier adjustment method based on a reinforcement learning model, and an air purifier, to improve the air purification effect.

The technical problem of the present invention is solved by the following technical solution:

1. An air purifier adjustment method based on a reinforcement learning model, characterized in that it comprises the following steps. S1: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. S2: after a predetermined time has elapsed, obtain the weight of performing the action or combined action A1 in state S1 and write it into the weight table, thereby updating the table; the "predetermined time" is determined according to the area and floor height of the room. S3: subsequently execute step S1 with the updated weight table. The state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure. Steps S1-S2 are the training phase, and step S3 is the control phase.

In some embodiments, the following improvements are also included:

In the training phase, steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.

In step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, where n is a natural number. The state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.

For each state Sn, the weight table gives the weights corresponding to multiple actions; during control, the action with the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.

The states S1 and Sn are determined from the air purifier's current gear and the current pollutant content in the air.
The weights are calculated as follows:

Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))

The meaning of the expression is as follows:

s, a, next_s represent the current state, the current action, and the next state, respectively;

reward is the reward for performing action a;

Q[s][a] is the value generated by action a in state s;

max(Q[next_s]) is the maximum value over all actions in the next state;

lr is the learning rate (learning_rate): the larger lr is, the less of the previous training effect is retained; when lr is 0, the value of Q[s,a] remains unchanged; when lr is 1, the original value is completely discarded;

factor is the discount factor (discount_factor): the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed.

The reward value is calculated as follows:

reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed;

where the "period of time" is determined according to the area and floor height of the room.
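For illustration, the update and reward rules above can be collected into a short Python sketch. The table dimensions mirror the embodiment (100 pollutant-state labels, 3 gears), while the default learning rate and discount factor are illustrative assumptions, not values fixed by the method:

```python
NUM_STATES = 100  # pollutant-state labels 0-99 (PM2.5 reading // 10)
NUM_GEARS = 3     # actions: gears 1, 2, 3 (stored at indices 0-2)

# Q[s][a]: weight of running gear a+1 while in pollutant state s.
Q = [[0.0] * NUM_GEARS for _ in range(NUM_STATES)]

def reward(pm_before, pm_after, gear):
    # (concentration before the gear ran - concentration after it ran
    #  for the waiting period) / the gear executed
    return (pm_before - pm_after) / gear

def update_weight(s, a, r, next_s, lr=0.1, factor=0.9):
    # Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))
    # lr=0.1 and factor=0.9 are illustrative defaults (assumption).
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (r + factor * max(Q[next_s]))
```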
The state of the air purifier is jointly determined by the air purifier's gear and the air pollutant content.

The air pollutant content is determined from the content of the following pollutants: PM1, PM2.5, and PM10.

The Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.

The present invention also includes an air purifier, comprising a main control unit, an air sensor unit, and a memory; computer software stored in the memory can be executed to implement the above method.

Compared with the prior art, the beneficial effects of the present invention include: by continuously monitoring the air pollutant content, the invention keeps the air purifier's gear adjusted to the most suitable position rather than always running at the maximum gear. The improvement in purification effect is achieved with a single device, without the cooperation of additional devices.

Experiments show that, in some embodiments, the present invention achieves the following technical effects:

a. Improved effect: previously about 65% of air purifiers could filter the air to a good level; after adopting the smart model, more than 85% achieve a good purification effect;

b. Improved efficiency: for the same space and pollutants, purification efficiency increased by 20%;

c. Reduced energy consumption: energy consumption is more than 30% lower than before;

d. Reduced manual intervention and easy operation.

Because the present invention adjusts automatically according to the purification effect, its adjustment can be determined solely from air parameters such as the air pollutant content; the influence of factors such as air conditioners and window controls is already reflected in the pollutant measurements, so those factors need not be considered. The air purification effect can thus be improved by the air purifier alone, without coordinated actions or parameters from additional devices, while still accounting for other factors such as time and space, achieving a genuinely simple and efficient result.
Brief Description of the Drawings

Fig. 1 is an implementation flowchart of a specific embodiment of the present invention.

Fig. 2 is a block diagram of a cloud server according to an embodiment of the present invention.

Fig. 3 is a hardware diagram of an embodiment of the present invention.

Fig. 4 is an outline flowchart of the present invention.

Fig. 5A shows the purification effect of the AI mode of an embodiment of the present invention.

Fig. 5B shows the purification effect of the automatic mode in the prior art.

Fig. 6A shows the purification efficiency of the AI mode of an embodiment of the present invention.

Fig. 6B shows the purification efficiency of the automatic mode in the prior art.

Fig. 7A is an energy consumption comparison chart for the AI mode of an embodiment of the present invention (APP report).

Fig. 7B is an energy consumption comparison chart for the automatic mode in the prior art (APP report).

Fig. 7C is a weekly energy consumption report using the AI control of the present invention (third-party metering socket report).

Fig. 7D is a weekly energy consumption report under prior-art automatic control (third-party metering socket report).

Fig. 7E is an energy consumption comparison chart between the AI control of the present invention and prior-art automatic-mode time-sharing control (third-party metering socket report).
Detailed Description of Embodiments

The present invention is further described below with reference to the accompanying drawings and in combination with preferred embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with each other.

It should be noted that orientation terms such as left, right, up, down, top, and bottom in the embodiments are only relative concepts, or refer to the normal use state of the product, and should not be regarded as restrictive.

Through extensive research and testing, we found that the poor performance of the prior-art methods stems from failing to consider the following factors:

Space factor: the location of the user's home;

Time factor: the time when the user uses the air purifier (spring, summer, autumn, or winter; day or night).

These factors, and possibly others, lead to different pollutant levels and compositions in the air, which in turn cause the poor purification performance of the prior art, resulting in:

1. Unstable purification efficiency;

2. Unstable purification effect;

3. Unstable power consumption;

4. No way to eliminate manual intervention.

The following embodiments of the present invention use a single air purifier for AI reinforcement learning and control, which is not only simple to operate but also improves the purification effect. The basic idea is: according to the current state S1 of the air purifier, look up the weight table, and use the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, where the weight table maps each action in a specific state to its corresponding weight. After a predetermined time, obtain the weight of performing action A1 in state S1 and write it into the weight table, updating the table. Subsequently execute steps S1-S2 with the updated weight table, continuously performing different actions or combined actions An according to the current state Sn of the air purifier to obtain and update different weights in the weight table, where n is a natural number. A combined action can be a combination of different gears, or an intermittent combination of the same gear. An outline flowchart is shown in Fig. 4.
The definitions of related concepts are as follows:

[Concept-definition table rendered as an image in the original (PCTCN2022126407-appb-000001); not reproduced here.]

The following embodiments of the present invention use Q-value initialization; during learning, the initial Q-value table is as shown below:

Q-Table (values such as 10, 20, 30 in the table represent weights)

[Initial Q-Table rendered as an image in the original (PCTCN2022126407-appb-000002); not reproduced here.]

The following table is an example of the relationship, used during actual control, between the pollutant-state labels in the Q-Table and the weights of the air purifier's gears:
State label of pollutants in the air    Gear 1 weight    Gear 2 weight    Gear 3 weight
0-3                                     3                2                1
4-7                                     1                3                2
8-99                                    1                2                3
The 0-99 values in the first column are obtained by dividing the PM2.5 value by 10 and rounding down, representing multiple reading bins for PM values from 0 to 1000.
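As a minimal sketch, the first-column state label can be derived from a PM2.5 reading as follows; clamping out-of-range readings to the last bin is an assumption, since the text only defines bins for readings up to 1000:

```python
def state_label(pm25):
    # Divide the PM2.5 reading by 10 and round down: a reading of 35
    # falls in state 3. Readings of 1000 or more are clamped to bin 99
    # (assumption; the text lists bins for values 0-1000 only).
    return min(int(pm25) // 10, 99)
```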
The method flow of this embodiment is shown in Fig. 1; the algorithm can be executed locally or in the cloud. The composition of the cloud server is shown in Fig. 2.

The method of this embodiment is described as follows:

1. Training

According to the current state S1 of the air purifier, the purifier is made to start different combined actions A1; after a period of time (the time is related to the size of the space, as described below), it is observed whether the current control has effectively filtered the air, thereby obtaining the weight of performing A1 in state S1. Different weights are continuously obtained and updated in the Q-table according to the different actions executed.
Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))

The meanings of the terms in the expression are:

s, a, next_s: the current state, the current action, and the next state;

reward: the reward for performing action a;

Q[s][a]: the value generated by action a in state s;

max(Q[next_s]): the maximum value over all actions in the next state;

lr: the learning rate (learning_rate); the larger lr is, the less of the previous training effect is retained; when lr is 0, the Q[s,a] value remains unchanged; when lr is 1, the original value is completely discarded;

factor: the discount factor (discount_factor); the larger the factor, the more weight is given to historical experience; when factor is 0, only the immediate reward is considered.
The reward value after executing a control command in the current state is directly proportional to the reduction of air pollutants and inversely proportional to the gear executed (a larger gear value means the air purifier runs at higher power). The calculation is as follows:

reward = (pollutant concentration before executing the gear - pollutant concentration after executing the gear for a period of time) / the gear executed;

The "period of time" here is determined as follows: the user enters not a time but the area and floor height of the room, so the time must be derived from these, roughly the time needed to filter about 2/3 of the room's air volume. For example, assume the air purifier has three gears, 1, 2, and 3, each outputting clean air at a different rate: gear 1 at 100 cubic meters per hour, gear 2 at 200, and gear 3 at 330. One hour is 3600 seconds, so dividing 100 by 3600 gives the volume of clean air output per second. If the room is 20 square meters with a floor height of 3 meters, the space holds about 60 cubic meters of air. Ignoring the exchange of indoor and outdoor air, a significant reduction in air pollutant content is observed once about 2/3 of that volume has been filtered, so the waiting time is 2/3 of 60 cubic meters divided by the air filtration speed of the current gear.
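The waiting-time rule, with the worked numbers from this example, can be sketched as below; the gear output table and function name are illustrative:

```python
GEAR_OUTPUT_M3H = {1: 100.0, 2: 200.0, 3: 330.0}  # clean-air output per gear, m^3/h

def waiting_time_seconds(area_m2, height_m, gear):
    volume = area_m2 * height_m                   # room air volume, m^3
    target = volume * 2.0 / 3.0                   # observe after ~2/3 is filtered
    per_second = GEAR_OUTPUT_M3H[gear] / 3600.0   # m^3 of clean air per second
    return target / per_second

# Worked example from the text: 20 m^2 x 3 m room on gear 1
# -> 60 m^3 volume, 40 m^3 target, 40 / (100/3600) = 1440 s (24 minutes).
print(waiting_time_seconds(20, 3, 1))
```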
2. AI Control

According to the air purifier's state each time, the action corresponding to the highest weight is found in the Q-table. The AI control weight is looked up by subdividing the state: the current pollutant concentration in the state is divided by 10 and rounded down, so a concentration of 35 gives state 35/10 = 3.
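Combining the two lookups, a minimal control step (reusing the Q table and state_label sketches above) would be:

```python
def control_step(pm25):
    # Find the gear with the highest weight for the current state.
    s = state_label(pm25)
    best_index = Q[s].index(max(Q[s]))
    return best_index + 1  # gears are numbered from 1
```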
Comparative experimental data

The PM channels in this embodiment are PM1, PM2.5, and PM10. The sensor's technical specifications are given in the following table:

Parameter                       Specification    Unit
Particle measurement range 1    0.1-1.0          micron (μm)
Particle measurement range 2    1.0-2.5          micron (μm)
Particle measurement range 3    2.5-10           micron (μm)
This application exploits the natural correlation among PM1, PM2.5, and PM10 by measuring all three together; experiments prove that this enables efficient and accurate determination of the air pollutant content. The prior art usually cares only about the PM2.5 value, and because the way the three values correlate differs across regions, seasons, and even weather conditions, the prior art cannot design a correlation strategy in which the measurements of PM1 and PM10 in turn promote the purification of PM2.5.
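The text states that the three channels are measured together but does not fix how they are fused into the control state; the sketch below is purely an assumption for illustration, with a hypothetical sensor interface:

```python
def read_pollutants(sensor):
    # 'sensor' is a hypothetical driver object exposing one read per channel.
    return {"pm1": sensor.read_pm1(),
            "pm2.5": sensor.read_pm25(),
            "pm10": sensor.read_pm10()}

def pollutant_level(readings):
    # Assumption: drive the state from PM2.5 while keeping PM1/PM10
    # available as auxiliary signals; the actual fusion rule is not
    # specified in the text.
    return readings["pm2.5"]
```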
The specific implementation flowchart is shown in Fig. 1:

1. Initialize the Q-table

There are several ways to initialize the Q-table: random, ordered, or basic-experience initialization; both random and ordered initialization may give users a poor initial experience. We initialize according to an automatic-style operation (this does not refer to the second adjustment mode in the prior art, but to an initial value, suited to the air purifier scenario, that is given automatically before Q-learning starts).
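One way to realize this automatic-style initialization, reproducing the pattern of the gear-weight table given earlier, is sketched below:

```python
def init_q_table():
    # Seed with the pattern of the example table: low-pollution states
    # favor gear 1, moderate states favor gear 2, heavier states gear 3.
    q = []
    for s in range(100):
        if s <= 3:
            q.append([3.0, 2.0, 1.0])
        elif s <= 7:
            q.append([1.0, 3.0, 2.0])
        else:
            q.append([1.0, 2.0, 3.0])
    return q

Q = init_q_table()  # replaces the all-zero table from the earlier sketch
```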
2. Single-point data acquisition

Collect the impact on air quality of the air purifier working at its current gear, mainly considering the following factors:

the air injected by the air purifier affects the air flow in the space;

the impact of changing the gear on air quality takes some time before the air purifier can perceive it stably; this time is related to the size of the space the purifier is in (as described above), which the user can enter manually through the app.

3. Analyze the collected data and issue the next control instruction.

The impact caused by the current gear can be understood as a parameter for evaluating the current gear's effect (inversely proportional to the gear and directly proportional to the air purification effect);

Calculate the Q-table resulting from this AI operation and update the data;

Re-acquire the current pollutant data. If the pollutants have not been controlled to the ideal level (i.e., the pollutant concentration has not dropped into the high-quality range required by the relevant country; for China's PM2.5, 0-30 is "excellent"), query the Q-table for the gear control instruction that currently needs to be issued, jump to step 2, and wait for the result. If the pollutants have been controlled to the ideal range, end this round of training, save the Q-table, and enable the AI prediction mode.
4. AI prediction mode evaluation module

Statistics are kept on air quality changes in AI prediction mode. If within 12 hours the current AI model cannot bring about a good improvement in air quality (the following two conditions occur at the same time), training must be restarted: in the current prediction mode, all obtained rewards are negative, and the air pollutant content has exceeded the safe range.
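The retraining criterion (both conditions holding over a 12-hour window) can be sketched as follows; SAFE_PM25 reflects the 0-30 "excellent" band cited for China, and treating any excursion above it as "exceeded" is an assumption:

```python
SAFE_PM25 = 30  # upper bound of the "excellent" band cited in the text

def needs_retraining(rewards_12h, pm25_readings_12h):
    # Restart training only when BOTH hold across the 12-hour window:
    # every obtained reward was negative, and pollutant levels left
    # the safe range.
    all_negative = bool(rewards_12h) and all(r < 0 for r in rewards_12h)
    out_of_range = any(pm > SAFE_PM25 for pm in pm25_readings_12h)
    return all_negative and out_of_range
```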
Implementation approaches for the algorithm

The operation of an air purifier has the following characteristics: feedback on air quality is delayed, and it takes a period of time before the effect can be seen; filtering proceeds in stages, and an operation that is best at one stage is not necessarily best once factors in time or space change.

Therefore we did not use supervised learning, because we cannot clearly obtain a definition of "good" or "bad" after the air purifier executes a given instruction. Reinforcement learning clearly fits this scenario better. What reinforcement learning needs to solve here:

Q-Learning is a value-based reinforcement learning algorithm. Q is Q(s, a), the expected benefit of taking action a in the state at a given moment; the environment feeds back a corresponding reward according to the agent's action. The main idea of the algorithm is therefore to build a Q_table of states and actions to store Q values, and then select the action that yields the maximum benefit according to the Q value.

The main advantage of Q-learning is that it uses the temporal-difference method (combining Monte Carlo and dynamic programming) to learn off-policy, and uses the Bellman equation to solve the Markov process for the optimal policy.
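Tying the sketches together, one training iteration of the described loop might look like this; the purifier device interface and the epsilon-greedy exploration (one way to "try different actions") are assumptions:

```python
import random
import time

def train_step(purifier, area_m2, height_m, lr=0.1, factor=0.9, epsilon=0.1):
    # One S1/S2 iteration: observe, act, wait, score, update.
    pm_before = purifier.read_pm25()          # hypothetical device API
    s = state_label(pm_before)
    if random.random() < epsilon:             # occasionally explore
        gear = random.randint(1, NUM_GEARS)   # a different action
    else:                                     # otherwise act greedily
        gear = Q[s].index(max(Q[s])) + 1
    purifier.set_gear(gear)                   # hypothetical device API
    time.sleep(waiting_time_seconds(area_m2, height_m, gear))
    pm_after = purifier.read_pm25()
    r = reward(pm_before, pm_after, gear)
    update_weight(s, gear - 1, r, state_label(pm_after), lr, factor)
```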
The hardware and software composition is shown in Fig. 3.

Implementation of the method of the present invention achieves the following advantages:

1. The purification effect is improved.

In the same non-enclosed environment, a one-week test was carried out with the same purifier in AI mode and in automatic mode. As shown in Figures 5A-5B, the AI control of the present invention keeps the PM2.5 value at a lower level, improving the purification effect. Fig. 5A shows the AI mode and Fig. 5B the automatic mode.

2. Purification efficiency is improved:

In an equally enclosed environment with pollutants introduced artificially, the efficiency comparison is as follows: as shown in Figures 6A-6B, the AI control of the present invention reduces the PM2.5 content faster, improving purification efficiency. Fig. 6A shows the AI mode and Fig. 6B the automatic mode.

3. Power consumption is reduced:

In both of the above situations, the power consumption comparison is as follows: Figures 7A-7B show the in-APP energy consumption reports, from which it is obvious that the AI mode of the present invention saves more power than the traditional mode; likewise, the third-party metering socket reports in Figures 7C-7E show that the AI mode of the present invention is more energy-efficient than the traditional mode. Fig. 7C is the weekly energy consumption report under the AI control of the present invention, and Fig. 7D is the weekly report under prior-art automatic control. In Fig. 7E, 10:00-11:00 records the energy consumption with AI control on, and 14:00-15:00 records the energy consumption with automatic mode on.

4. No manual control is needed.

In both situations, the records of the number of manual interventions show no manual control for three consecutive weeks.

The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention should not be considered limited to these descriptions. For those skilled in the art to which the present invention belongs, several equivalent substitutions or obvious modifications with the same performance or use can be made without departing from the concept of the present invention, and all should be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. An air purifier adjustment method based on a reinforcement learning model, characterized in that it comprises a training phase and a control phase, the training phase comprising the following steps:
    S1: according to the current state S1 of the air purifier, looking up the weight table, and using the Q-Learning algorithm to control the air purifier to start different actions or combined actions A1, wherein the weight table is a lookup table mapping each action in a specific state to its corresponding weight;
    S2: after a predetermined time has elapsed, obtaining the weight of performing the action or combined action A1 in state S1 and writing it into the weight table, thereby updating the weight table, wherein the "predetermined time" is determined according to the area and floor height of the room;
    the control phase comprising the following step:
    S3: executing step S1 with the updated weight table;
    wherein the state S1 is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  2. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that: in the training phase, steps S1-S2 are performed continuously, and it is judged whether the current round of training has met the criterion for its intended purpose; if so, the round of training is ended.
  3. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that: in step S3, it is also judged whether the previously trained model can no longer meet the needs of the current aerodynamic model; if so, retraining is started: according to the current state Sn of the air purifier, different actions or combined actions An are performed to obtain updated weights, and the weight table is updated, wherein n is a natural number; the state Sn is determined from the air purifier's current gear and the parameters of the current air that the air purifier itself can measure.
  4. The air purifier adjustment method based on a reinforcement learning model according to claim 3, characterized in that: for each state Sn, the weight table gives the weights corresponding to multiple actions, and during control, the action with the highest weight for the air purifier's current state is found in the Q-table and used to control the air purifier.
  5. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that the weights are calculated as follows:
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (reward + factor * max(Q[next_s]))
    The terms in the expression are as follows:
    s, a, and next_s denote the current state, the current action, and the next state, respectively;
    reward denotes the reward, i.e. the reward for performing action a;
    Q[s][a] denotes the value, i.e. the value generated by performing action a in state s;
    max(Q[next_s]) denotes the maximum value, i.e. the maximum over the values of all actions in the next state;
    lr denotes the learning rate (learning_rate): the larger lr is, the less of the previous training result is retained; when lr is 0, the value of Q[s][a] remains unchanged; when lr is 1, the original value is completely discarded;
    factor denotes the discount factor (discount_factor): the larger factor is, the more weight is given to the estimated future value max(Q[next_s]); when factor is 0, only the immediate reward is considered.
  6. The air purifier adjustment method based on a reinforcement learning model according to claim 5, characterized in that the reward value after a control instruction is executed in the current state is proportional to the reduction in air pollutants and inversely proportional to the gear level executed.
  7. The air purifier adjustment method based on a reinforcement learning model according to claim 6, characterized in that the reward value is calculated as follows:
    reward = (pollutant concentration before executing the gear - air pollutant concentration after executing the gear for a period of time) / gear level executed;
    where the "period of time" is determined according to the floor area and ceiling height of the room.
  8. The air purifier adjustment method based on a reinforcement learning model according to claim 7, characterized in that the air pollutant concentration is determined according to the content of the following pollutants: PM1, PM2.5 and PM10.
  9. The air purifier adjustment method based on a reinforcement learning model according to claim 1, characterized in that the Q-Learning algorithm is executed in the cloud, and the air purifier communicates with the cloud through the Internet.
  10. An air purifier, comprising a main control unit, an air sensor unit, and a memory, the memory storing computer software executable to implement the method according to any one of claims 1-9.
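For readers less familiar with tabular Q-Learning, the following is a minimal Python sketch of the weight update of claim 5 and the reward calculation of claim 7. It is an illustration under stated assumptions, not the claimed implementation: the identifiers (q_table, GEARS, compute_reward, update_weight) and the choice of fan gear levels as the action set are assumptions introduced here.

```python
# Minimal sketch of the claim 5 update rule and the claim 7 reward.
# All identifiers are illustrative assumptions; the claims fix no API.
from collections import defaultdict

GEARS = [1, 2, 3, 4]  # assumed action set: the purifier's fan gear levels

# Weight table (Q-table): state -> {action: weight}, zero-initialised
q_table = defaultdict(lambda: {gear: 0.0 for gear in GEARS})

def compute_reward(conc_before: float, conc_after: float, gear: int) -> float:
    # Claim 7: reward = (concentration before the gear ran
    # - concentration after it ran for a period of time) / gear level,
    # so larger pollutant reductions raise the reward and higher
    # (more power-hungry) gears lower it, as claim 6 requires.
    return (conc_before - conc_after) / gear

def update_weight(s, a, next_s, reward, lr=0.1, factor=0.9):
    # Claim 5: Q[s][a] = (1-lr)*Q[s][a] + lr*(reward + factor*max(Q[next_s]))
    best_next = max(q_table[next_s].values())
    q_table[s][a] = (1 - lr) * q_table[s][a] + lr * (reward + factor * best_next)
```

With lr = 0 the stored weight never changes and with lr = 1 the old value is discarded entirely, matching the claim's description of the learning rate.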
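Claims 1 and 4 describe the control phase as a greedy lookup of the updated weight table, and the training phase as repeatedly acting, waiting the predetermined time, and writing the observed weight back. Continuing the sketch above (read_state, read_concentration, run_gear, and dwell_seconds are hypothetical device hooks, not claimed features):

```python
import time

def choose_action(state):
    # Claim 4: select the action with the highest weight for this state
    weights = q_table[state]
    return max(weights, key=weights.get)

def training_step(read_state, read_concentration, run_gear, dwell_seconds):
    # One pass of steps S1-S2 from claim 1: act in the current state,
    # wait the "predetermined time" (set from the room's floor area and
    # ceiling height), then write the observed weight back into the table.
    s = read_state()
    a = choose_action(s)   # during training, an exploratory policy
                           # (e.g. epsilon-greedy) could replace this
    before = read_concentration()
    run_gear(a)
    time.sleep(dwell_seconds)
    after = read_concentration()
    update_weight(s, a, read_state(), compute_reward(before, after, a))
```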
PCT/CN2022/126407 2021-11-26 2022-10-20 Air purifier adjusting method based on reinforcement learning model, and air purifier WO2023093388A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111418292.7 2021-11-26
CN202111418292.7A CN113834200A (en) 2021-11-26 2021-11-26 Air purifier adjusting method based on reinforcement learning model and air purifier

Publications (1)

Publication Number Publication Date
WO2023093388A1 true WO2023093388A1 (en) 2023-06-01

Family ID=78971617

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126407 WO2023093388A1 (en) 2021-11-26 2022-10-20 Air purifier adjusting method based on reinforcement learning model, and air purifier

Country Status (2)

Country Link
CN (1) CN113834200A (en)
WO (1) WO2023093388A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113834200A (en) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjusting method based on reinforcement learning model and air purifier


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
CN105843037B (en) * 2016-04-11 2019-05-10 中国科学院自动化研究所 Intelligent building temprature control method based on Q study
CN107065582B (en) * 2017-03-31 2023-09-29 苏州科技大学 Indoor air intelligent adjusting system and method based on environment parameters
EP3835895A1 (en) * 2019-12-13 2021-06-16 Tata Consultancy Services Limited Multi-agent deep reinforcement learning for dynamically controlling electrical equipment in buildings
CN111126605B (en) * 2020-02-13 2023-06-20 创新奇智(重庆)科技有限公司 Data center machine room control method and device based on reinforcement learning algorithm

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108419439A (en) * 2017-05-22 2018-08-17 深圳微自然创新科技有限公司 Housed device learning method and server
CN108981115A (en) * 2018-07-23 2018-12-11 珠海格力电器股份有限公司 A kind of air purifier and control method based on machine learning
CN109827292A (en) * 2019-01-16 2019-05-31 珠海格力电器股份有限公司 Construction method, control method, the household electrical appliances of household electrical appliances adaptive power conservation Controlling model
US20210063036A1 (en) * 2019-08-29 2021-03-04 Lg Electronics Inc. Air purifier and operating method of the same
WO2021208771A1 (en) * 2020-04-18 2021-10-21 华为技术有限公司 Reinforced learning method and device
CN113551373A (en) * 2021-07-19 2021-10-26 江苏中堃数据技术有限公司 Data center air conditioner energy-saving control method based on federal reinforcement learning
CN113834200A (en) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjusting method based on reinforcement learning model and air purifier

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117398817A (en) * 2023-11-30 2024-01-16 中山市凌宇机械有限公司 Intensive compressed air purification system and method
CN117398817B (en) * 2023-11-30 2024-04-30 中山市凌宇机械有限公司 Intensive compressed air purification system and method
CN117666378A (en) * 2024-02-01 2024-03-08 天津市品茗科技有限公司 Intelligent household system
CN117666378B (en) * 2024-02-01 2024-04-09 天津市品茗科技有限公司 Intelligent household system

Also Published As

Publication number Publication date
CN113834200A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
WO2023093388A1 (en) Air purifier adjusting method based on reinforcement learning model, and air purifier
WO2020107851A1 (en) Low-cost commissioning method and system for air conditioning system based on existing large-scale public building
CN104214872B (en) Multifunctional indoor air purification system and control method thereof
CN107065582B (en) Indoor air intelligent adjusting system and method based on environment parameters
KR20170096730A (en) System and method of controlling air quality, and analyzing server
CN105318499A (en) User behavior self-learning air conditioning system and control method thereof
CN104482634A (en) Indoor air quality multi-parameter comprehensive control system
WO2010066076A1 (en) Energy saving air-conditioning control system based on predicted mean vote and method thereof
CN102980276A (en) Intelligent fresh air system for energy saving of base station
CN112283890A (en) Cold and heat quantity control method and device suitable for building heating and ventilation equipment monitoring system
CN105180370A (en) Intelligent control method and device for air conditioner and air conditioner
CN113091240A (en) Air conditioner control method, air conditioner control device, air conditioner, storage medium and program product
CN111998505B (en) Energy consumption optimization method and system for air conditioning system in general park based on RSM-Kriging-GA algorithm
CN114659237B (en) Air conditioner energy efficiency supervision method based on Internet of things
WO2020108666A1 (en) Method, device, and computer storage medium for purification control in air purification system
CN106230002B (en) A kind of air conditioner load demand response method based on index rolling average
CN108122067B (en) Modeling method and system for building demand response dynamic process
CN114838488A (en) Method and device for linkage control of intelligent household appliances, air conditioner and storage medium
CN113357768B (en) Air conditioner control method and device, electronic equipment and storage medium
Yuan et al. Two-level collaborative demand-side management for regional distributed energy system considering carbon emission quotas
CN203298441U (en) Centralized air conditioner control system
CN111380161A (en) Air conditioner operation mode adjusting method and device and air conditioner
CN111351174B (en) Control method and device of air conditioner, air conditioner and storage medium
CN110928188B (en) Air storage control method of air compressor
CN205540121U (en) Comprehensive power consumption measure and control management terminal

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22897471

Country of ref document: EP

Kind code of ref document: A1