CN116759014B

CN116759014B - Random forest-based gas type and concentration prediction method, system and device

Info

Publication number: CN116759014B
Application number: CN202311051588.9A
Authority: CN
Inventors: 黄奇景; 金红兴; 付倩倩; 刘国建
Original assignee: Qisi Semiconductor Hangzhou Co ltd
Current assignee: Qisi Semiconductor Hangzhou Co ltd
Priority date: 2023-08-21
Filing date: 2023-08-21
Publication date: 2023-11-03
Anticipated expiration: 2043-08-21
Also published as: CN116759014A

Abstract

The invention discloses a method, a system and a device for predicting gas types and concentration based on random forests, wherein the method comprises the following steps: acquiring a data set formed by sampling data; judging whether a preset air baseline needs to be corrected or not, if so, calibrating the preset air baseline based on a preset baseline updating strategy to obtain a calibrated air baseline, and constructing a baseline updating model based on standard environment sampling data, historical sampling data and sampling data at the current moment; updating response data in the historical data set based on the air baseline to obtain an updated historical data set; preprocessing the updated historical data set to obtain a training sample set, and training and verifying based on the training sample set to obtain a random forest prediction model; and inputting the data to be tested into a random forest prediction model to obtain the gas classification and gas concentration prediction results. The method can improve the accuracy of gas type and concentration prediction based on random forests.

Description

Random forest-based gas type and concentration prediction method, system and device

Technical Field

The invention relates to the technical field of gas detection, in particular to a method, a system and a device for predicting gas types and concentration based on random forests.

Background

In an era of growing concern for environmental safety, the detection of gas components is receiving increasing attention. The main application of gas detection is to analyze the composition of a target gas, so as to identify the harmful gases in it and measure their concentration. In gas identification, gas classification and gas concentration prediction depend on the resistance value of a sensor, but that resistance value is strongly influenced by the surrounding environment, such as temperature and humidity. Because a constant-temperature heating resistance wire is arranged around the sensor, the temperature around the sensor is relatively stable, whereas the humidity varies with the surrounding environment. Therefore, to improve the accuracy of gas classification and gas concentration prediction, the resistance of the gas sensor needs to be calibrated (compensated) using the ambient humidity data.

However, existing gas detection technology has several technical bottlenecks and defects that are difficult to resolve. On the one hand, the selectivity and sensitivity of existing gas sensors are difficult to improve, so the sensors show obvious cross-sensitivity to gases, which reduces measurement accuracy; baseline drift is also a non-negligible factor. On the other hand, the intelligent algorithms used by existing gas detection technology for data processing often fail to reach a high level of accuracy, and suffer from problems such as poor accuracy and low efficiency.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a method, a system and a device for predicting the gas type and concentration based on random forests.

To solve the above technical problems, the invention adopts the following technical solution:

A gas type and concentration prediction method based on random forests comprises the following steps:

acquiring a data set formed by standard environment sampling data, historical sampling data, sampling data at the current moment, a gas type label and a gas concentration label, and forming a training sample sequence based on response data, the gas type label and the gas concentration label, wherein the response data is obtained based on the sampling data and an air baseline;

judging whether a preset air baseline needs to be calibrated, and if so, calibrating the preset air baseline based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is built based on the standard environment sampling data, the historical sampling data and the sampling data at the current moment;

updating the response data in the data set based on the calibrated air baseline to obtain an updated data set, and preprocessing the updated data set to obtain a training sample set;

determining an optimal importance coefficient and an optimal first importance coefficient, respectively constructing and fusing a classification prediction pre-training model and a concentration prediction pre-training model, and training and verifying based on the training sample set to obtain a classification prediction model and a concentration prediction model;

and sequentially inputting the sequence to be measured into the classification prediction model and the concentration prediction model to obtain a prediction result of the gas type and the gas concentration.

As an implementation manner, the calibrating the preset air baseline based on the preset baseline updating strategy to obtain the calibrated air baseline includes the following steps:

calibrating the resistance value of the sensor at the current moment based on a preset resistance value calibration model to obtain a calibration result;

updating the preset air baseline based on the calibration result and the preset baseline updating model to obtain a calibration air baseline;

the preset resistance value calibration model is expressed as follows:

$$Z = \frac{R_g}{R_a} = \frac{R}{R_0} = \frac{(R_{t1}-R)\,H}{R_0\,(H_{t1}-H)} + \frac{R_{t1}\,d}{R_0} - (1+i)\,\frac{(R_{t1}-R)\,H_{t1}}{R_0\,(H_{t1}-H)}$$

wherein Z represents the response function, R represents the sensor resistance value at the current moment, R_0 represents the sensor resistance value after calibration to the standard environment, t1 represents a time t1, R_{t1} represents the resistance measured at time t1, H_{t1} represents the humidity at time t1, and H represents the humidity at the current moment; i and d represent adjustment coefficients, which are adjusted with temperature;

The preset baseline updating model is expressed as follows:

wherein the quantities of the model denote, in order of appearance: the air baseline; the mean values of the sensor resistance in the current update period t and in the previous period t-1; the humidity change in the current update period t; the adjustment coefficient; and two thresholds, namely the threshold on the humidity change within the current period and the threshold on the ratio between the previous and current periods.

As one embodiment, the response data is expressed as z = [z1, z2, ..., zn]; the gas type label and the gas concentration label are expressed as [whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration]; and the training sample sequence is expressed as [z1, z2, ..., zn, whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration], wherein n represents the number of response data and m represents the total number of gas classes.

As an embodiment, the determining whether the preset air baseline needs calibration at least includes:

setting a calibration condition, wherein the calibration condition is at least that the air baseline is updated according to an air baseline update period;

and if the elapsed time interval meets the calibration condition, updating the preset air baseline.

As an embodiment, the preprocessing includes at least standardization;

each dimension of the updated response data is standardized, wherein the standardization formula is as follows:

$$x_i' = \frac{x_i - \mu}{\sigma}$$

wherein x_i represents the i-th response data of a dimension, \mu represents the mean value of that dimension, and \sigma represents the standard deviation of that dimension.

As an implementation manner, the classification prediction pre-training model is constructed as follows:

constructing an initial random forest model based on a set of decision trees;

based on the classification result of each data in the training sample sequence in the initial random forest model, obtaining the weight of each decision tree, wherein the weight is an importance coefficient that evaluates the corresponding decision tree by its classification accuracy, and the importance coefficient is expressed as follows:

wherein T represents the number of decision trees, t represents a given decision tree, acc_t represents the classification accuracy of decision tree t in predicting the training sample sequence, and Qcls_t represents the importance coefficient of the corresponding decision tree;

updating the initial random forest model according to the weight of each decision tree;

and iteratively calculating and updating the weights until a preset number of iterations is reached or the performance of the initial random forest model converges, so as to obtain the classification prediction pre-training model.
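The formula for the importance coefficient is not reproduced in this text, so the following Python sketch only illustrates one plausible reading: each tree's coefficient is its accuracy divided by the sum of all tree accuracies. That normalisation and the function name `importance_from_accuracy` are assumptions made for illustration, not the formula of the invention.

```python
from typing import List, Sequence


def importance_from_accuracy(accuracies: Sequence[float]) -> List[float]:
    """Turn per-tree accuracies acc_t into weights that sum to 1.

    ASSUMPTION: Qcls_t = acc_t / sum of all acc_t. The source formula
    is not available, so this is only one possible weighting.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one tree must have non-zero accuracy")
    return [acc / total for acc in accuracies]


# Three hypothetical trees; more accurate trees receive larger weights.
weights = importance_from_accuracy([0.9, 0.6, 0.5])
print(weights)
```

Under this assumption a tree that classifies the training sample sequence more accurately contributes more to the later weighted vote.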

As an implementation manner, the concentration prediction pre-training model is constructed as follows:

obtaining an initial concentration random forest model based on a group of regression trees;

based on the sum of errors of each data in the training sample sequence in the initial concentration random forest model, obtaining a first weight of each regression tree, wherein the first weight is a first importance coefficient that evaluates the accuracy of the corresponding regression tree, and the first importance coefficient is expressed as follows:

wherein T1 represents the number of regression trees, t1 represents a given regression tree, err_t1 represents the sum of the prediction errors of regression tree t1 over all samples, and Qreg_t1 represents the first importance coefficient of the corresponding regression tree;

updating the initial concentration random forest model according to the first weight of each regression tree;

and iteratively calculating and updating the first weights until a preset number of iterations is reached or the performance of the initial concentration random forest model converges, so as to obtain the concentration prediction pre-training model.

As an embodiment, the classification prediction model is expressed as follows:

wherein H(x) represents the classification prediction model, T represents the number of decision trees, h_t(x) represents the predicted class of each decision tree, Y represents the class label, the indicator term indicates that x belongs to class Y, and Qcls_t represents the importance coefficient;

the concentration prediction model is expressed as follows:

wherein H(x1) represents the concentration prediction model, T1 represents the number of regression trees, h_t1(x1) represents the predicted concentration of each regression tree, Y1 is the concentration label, the error term represents the prediction error between h_t1(x1) and the true concentration Y1, and Qreg_t1 represents the first importance coefficient.

As an embodiment, the method further comprises the steps of:

and constructing a first loss function of the classification prediction pre-training model and a second loss function of the concentration prediction pre-training model, and re-optimizing the classification prediction pre-training model and the concentration prediction pre-training model respectively by the first loss function and the second loss function.

As an embodiment, the method for obtaining the prediction result of the gas type by inputting the sequence to be measured into the classification prediction model includes the following steps:

inputting the response data and the gas category label into a classification prediction pre-training model, and training to obtain an initial classification prediction model;

combining the initial classification prediction model with an adaptive weight distribution algorithm to obtain the weight of each decision tree, obtaining the output result of the initial classification prediction model through a voting rule that incorporates the weights, and repeating this step until the number of iterations is reached or the initial classification prediction model converges, thereby obtaining the classification prediction model;

the voting rule incorporating the weights is as follows: for any decision tree, the class it predicts for a gas type label is multiplied by the weight coefficient of that decision tree to obtain the classification score of that decision tree; the classification scores of the other decision trees are obtained in the same way; all classification scores are summed per class, and the class with the highest score is the predicted gas class.
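The weighted voting rule can be sketched in Python as follows. This is a minimal illustration, assuming each tree emits a single class label and that the per-tree weights are already known; the function name `weighted_vote` and the example labels and weights are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_vote(tree_predictions: Sequence[Hashable],
                  tree_weights: Sequence[float]) -> Hashable:
    """Return the class whose summed tree weights are highest."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("one weight is required per tree")
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        # Each tree adds its weight to the score of the class it predicts.
        scores[label] += weight
    # On a tie, the class that was scored first is returned.
    return max(scores, key=scores.get)


# Two low-weight trees vote CH4 (0.25 + 0.15), one high-weight tree votes CO (0.6).
print(weighted_vote(["CO", "CH4", "CH4"], [0.6, 0.25, 0.15]))
```

Unlike a plain majority vote, a single heavily weighted tree can outvote several lightly weighted ones, as in the usage line above.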

As an implementation manner, the sequence to be measured is sequentially input into a concentration prediction model to obtain a prediction result of gas concentration, and the method comprises the following steps:

inputting the response data and the gas concentration label into the concentration prediction pre-training model, and training to obtain an initial concentration prediction model;

combining the initial concentration prediction model with an adaptive weight distribution algorithm to obtain the weight of each regression tree, obtaining the output result of the initial concentration prediction model through a voting rule that incorporates the weights, and repeating this step until the number of iterations is reached or the initial concentration prediction model converges, thereby obtaining the concentration prediction model;

the voting rule incorporating the weights is as follows: for any regression tree, the concentration it predicts for a gas concentration label is multiplied by the weight coefficient of that regression tree to obtain the concentration prediction result of that regression tree; the concentration prediction results of the other regression trees are obtained in the same way; and all concentration prediction results are summed to obtain the final concentration prediction result, namely the predicted gas concentration.
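For the concentration branch, the same rule reduces to a weighted sum of the per-tree predictions. The Python sketch below assumes the weights have been normalised to sum to 1, which the text does not state; `weighted_concentration` is a hypothetical name.

```python
from typing import Sequence


def weighted_concentration(tree_predictions: Sequence[float],
                           tree_weights: Sequence[float]) -> float:
    """Sum each regression tree's predicted concentration times its weight."""
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("one weight is required per tree")
    # ASSUMPTION: weights sum to 1, so the result stays on the concentration scale.
    return sum(p * w for p, w in zip(tree_predictions, tree_weights))


# Two hypothetical trees with equal weight.
print(weighted_concentration([100.0, 120.0], [0.5, 0.5]))
```

If the weights were not normalised, the sum would be scaled by their total, so a normalisation step would be needed before using the result as a concentration.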

A gas type and concentration prediction system based on random forests comprises a data processing module, a judging and updating module, a data set updating module, a determining and training module and a data prediction module;

the data processing module is used for acquiring a data set formed by standard environment sampling data, historical sampling data, sampling data at the current moment, a gas type label and a gas concentration label, and forming a training sample sequence based on response data, the gas type label and the gas concentration label, wherein the response data is obtained based on the sampling data and an air baseline;

the judging and updating module is used for judging whether the preset air baseline needs to be calibrated or not, if yes, the preset air baseline is calibrated based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is built based on standard environment sampling data, historical sampling data and sampling data at the current moment;

the data set updating module is used for updating response data in the data set based on the calibration air baseline to obtain an updated data set, and preprocessing the updated data set to obtain a training sample set;

the determining and training module is used for determining an optimal importance coefficient and an optimal first importance coefficient, respectively constructing and fusing a classification prediction pre-training model and a concentration prediction pre-training model, and training and verifying based on the training sample set to obtain a classification prediction model and a concentration prediction model;

The data prediction module is used for inputting the sequence to be detected into the classification prediction model and the concentration prediction model in sequence to obtain the prediction results of the gas types and the gas concentrations.

A random forest based gas species and concentration prediction apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method as described above when executing the computer program.

A computer readable storage medium storing computer instructions, characterized in that the computer readable storage medium stores instructions for performing a method as described above.

By adopting the above technical solution, the invention achieves the following notable technical effects:

the classification prediction model and the concentration prediction model are constructed based on a random forest model and are used to identify and detect gases; the random forest algorithm offers good recognition performance, simplicity, high precision and good robustness, and it achieves not only qualitative identification of a mixed gas but also concentration prediction for each component gas;

the air baseline is updated and the collected data is processed based on the air baseline value, so that the cross-sensitivity and baseline drift problems of existing gas sensors can be overcome and the accuracy of data processing is improved.

Drawings

In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart of the method of the present invention;

FIG. 2 is a schematic diagram showing the results of the experiment of the present invention;

FIG. 3 is a schematic diagram of the system of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the present invention, the sampling data is data acquired from a gas sensor, which may be a sensor unit in a sensor array. A sensor array refers to a sensing unit having different sensing elements or pixels on a single substrate or in a single device. A sensor unit refers to an element of a sensor or sensor array. The response data is a measurable response of the sensing element or MOS pixel in the sensor, including a change in an electrical characteristic (resistance or impedance). It is an analog signal that can be converted and recorded in digital form, and can therefore be referred to as response data. The response data is typically expressed as a ratio of sensor resistances (R). The sampling data actually collected is the resistance value of the sensor active material at a given temperature, and the response data is typically expressed as Rg/Ra, where "R" is the resistance of the sensor active material at a given temperature, "Rg" is the resistance of the sensor when exposed to a target gas, and "Ra" is the resistance of the sensor pixel when exposed to air (the baseline information). Ra/Rg can also be used depending on the MOS material type (N-type or P-type); that is, depending on the material, response data = sampling data / air baseline, or response data = air baseline / sampling data.
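The two forms of the response ratio can be illustrated with a short Python sketch. The function name `response_ratio` and the `invert` flag are hypothetical; the text does not say which MOS material type uses which form, so the choice is left to the caller.

```python
def response_ratio(sample_resistance: float, air_baseline: float,
                   invert: bool = False) -> float:
    """Compute response data from a sampled resistance and the air baseline.

    invert=False gives Rg / Ra (sampling data / air baseline);
    invert=True gives Ra / Rg (air baseline / sampling data).
    """
    if sample_resistance <= 0 or air_baseline <= 0:
        raise ValueError("resistances must be positive")
    if invert:
        return air_baseline / sample_resistance
    return sample_resistance / air_baseline


# Hypothetical readings: 50 in the target gas, 100 in clean air.
print(response_ratio(50.0, 100.0))               # Rg / Ra
print(response_ratio(50.0, 100.0, invert=True))  # Ra / Rg
```

Because the baseline is the denominator (or numerator) of every response value, any drift in it shifts all response data, which is why the baseline is recalibrated before training.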

In the prior art, odor data refers to data that is capable of reflecting the odor characteristics of an analyte. One or more gas sensors of different active materials may be exposed to the same analyte and one or more sets of response data obtained using a temperature scanning method may be used as the odor data. The data after further processing of the one or more sets of response data may also be used as scent data. The odor data may be in the form of one or more curves having at least one peak shape. Wherein, the response data of different groups are obtained by temperature scanning aiming at the gas sensors of different active materials.

Example 1:

A gas type and concentration prediction method based on random forests, as shown in FIG. 1, comprises the following steps:

S100, acquiring a data set formed by standard environment sampling data, historical sampling data, sampling data at the current moment, a gas type label and a gas concentration label, and forming a training sample sequence based on response data, the gas type label and the gas concentration label, wherein the response data is obtained based on the sampling data and an air baseline;

S200, judging whether a preset air baseline needs to be calibrated, and if so, calibrating the preset air baseline based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is built based on the standard environment sampling data, the historical sampling data and the sampling data at the current moment;

S300, updating the response data in the data set based on the calibrated air baseline to obtain an updated data set, and preprocessing the updated data set to obtain a training sample set;

S400, determining an optimal importance coefficient and an optimal first importance coefficient, respectively constructing and fusing a classification prediction pre-training model and a concentration prediction pre-training model, and training and verifying based on the training sample set to obtain a classification prediction model and a concentration prediction model;

S500, sequentially inputting the sequence to be measured into the classification prediction model and the concentration prediction model to obtain the prediction results of the gas type and the gas concentration.

The resistance value of a gas sensor depends on the type and concentration of the gas, but it is also strongly influenced by the surrounding environment (temperature and humidity). During operation, a constant-temperature heating resistance wire arranged around the gas sensor keeps the surrounding temperature relatively stable, whereas the humidity varies with the surrounding environment. Therefore, to improve the accuracy of gas classification and gas concentration prediction, the resistance of the gas sensor needs to be calibrated using ambient humidity data; accordingly, the present invention determines whether the air baseline needs to be updated and whether the response data fluctuates.

See data in table 1:

TABLE 1

Table 1 shows how the gas sensor resistance attenuates with humidity (a humidity of 45 is the standard environment). The values of the attenuation coefficients, and their relationship with humidity, differ from sensor to sensor:

the attenuation coefficient of the air resistance value of sensor A varies with humidity as: y = -0.6527x + 133.079;

the attenuation coefficient of the air resistance value of sensor B varies with humidity as: y = -0.2969x + 111.5598;

the attenuation coefficient of the air resistance value of sensor C varies with humidity as: y = -0.5878x + 124.7797;

the attenuation coefficient of the air resistance value of sensor D varies with humidity as: y = -0.2701x + 112.0919;

wherein y represents the coefficient of change of the resistance relative to the standard environment, and x represents the humidity value of the current environment.

Using the above relationships, the resistance of the sensor can be calibrated to the resistance value in the standard environment.

That is, R0 = R/y, where R represents the current gas sensor resistance value and R0 represents the resistance value of the gas sensor after calibration to the standard environment.
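The humidity compensation above can be sketched in Python. The sketch treats each relation as a straight line y = slope * x + intercept, which is an interpretation of the printed relations. It also exposes a hypothetical `y_scale` parameter: the fitted y values come out near 100 at humidity 45, which suggests a percentage, but the text does not say so, and the default follows R0 = R/y literally.

```python
# (slope, intercept) of the attenuation relation for each sensor, as quoted in the text.
ATTENUATION_FITS = {
    "A": (-0.6527, 133.079),
    "B": (-0.2969, 111.5598),
    "C": (-0.5878, 124.7797),
    "D": (-0.2701, 112.0919),
}


def attenuation_coefficient(sensor: str, humidity: float) -> float:
    """Evaluate y = slope * humidity + intercept for the given sensor."""
    slope, intercept = ATTENUATION_FITS[sensor]
    return slope * humidity + intercept


def calibrate_to_standard(resistance: float, sensor: str, humidity: float,
                          y_scale: float = 1.0) -> float:
    """Return R0 = R / y.

    ASSUMPTION: y_scale=100.0 treats y as a percentage of the
    standard-environment resistance; y_scale=1.0 uses y as printed.
    """
    y = attenuation_coefficient(sensor, humidity) / y_scale
    return resistance / y


# Sensor D at the standard humidity of 45: y is close to 100.
print(attenuation_coefficient("D", 45.0))
print(calibrate_to_standard(200.0, "D", 45.0, y_scale=100.0))
```

With the percentage reading, a resistance measured at the standard humidity is left almost unchanged, which is the behaviour one would expect from a calibration to the standard environment.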

Taking the influence of other adjustment coefficients into account, the preset air baseline is calibrated based on the preset baseline updating strategy to obtain the calibrated air baseline, as follows:

the preset resistance value calibration model is expressed as follows:

the preset baseline updating model is expressed as follows:

wherein the quantities of the model denote, in order of appearance: the air baseline; the mean values of the sensor resistance in the current update period t and in the previous period t-1; the humidity change in the current update period t; the adjustment coefficient; and two thresholds, namely the threshold on the humidity change within the current period and the threshold on the ratio between the previous and current periods. The adjustment coefficients i and d are determined from the current humidity conditions, the gas sensor parameters and empirical values, so that the result of the response function is more accurate.

In one embodiment, the response data is expressed as z = [z1, z2, ..., zn]; the gas type label and the gas concentration label are expressed as [whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration]; and the training sample sequence is expressed as [z1, z2, ..., zn, whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration], where n represents the number of response data and m represents the total number of gas classes.
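The layout of one training sample can be illustrated with a short Python sketch. The function name `build_training_row`, the example gases and the rule that a class counts as present when its concentration is above zero are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def build_training_row(responses: Sequence[float],
                       gas_classes: Sequence[str],
                       concentrations: Dict[str, float]) -> List[float]:
    """Build [z1..zn, has_1, conc_1, ..., has_m, conc_m] for one sample."""
    row = list(responses)
    for gas in gas_classes:
        conc = concentrations.get(gas, 0.0)
        # ASSUMPTION: a gas is flagged as present when its concentration is > 0.
        row.append(1.0 if conc > 0 else 0.0)
        row.append(conc)
    return row


# n = 2 response values, m = 2 gas classes, only CO present in this sample.
print(build_training_row([0.8, 0.6], ["CH4", "CO"], {"CO": 50.0}))
```

The first n entries are the model inputs; the presence flags serve as targets for the classification model and the concentrations as targets for the concentration model.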

The sampled data can be understood as the resistance value of the sampled gas sensor in air, which data can change due to changes in humidity. The air baseline is updated, so that the problem of air baseline drifting can be solved, and the gas identification accuracy can be improved.

For example, in this embodiment, a gas sensor array or a single gas sensor unit may be used to obtain sampling data of a mixed gas; for instance, the sampling data may come from 1 temperature sensor covering 10 temperature points and 5 constant-temperature sensors. The sampling data is then converted into the corresponding response data, and labels may be added according to the type and concentration of the mixed gas. The response data is expressed as z = [z1, z2, ..., zn], where n represents the number of response data; the gas type label and the gas concentration label are expressed as [whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration]; and the training sample sequence is expressed as [z1, z2, ..., zn, whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration], where m represents the total number of gas classes.

The gas type affects the resistance value of the gas sensor. The gas sensor is cross-sensitive to various gases, and different types of gas affect its resistance value differently, which makes it possible to distinguish gas type and concentration. The data set comprises the standard environment sampling data, the historical sampling data, the sampling data at the current moment, the gas type label and the gas concentration label. Owing to the characteristics of the gas sensor, the air baseline drifts, where response data = sampling data / air baseline.

In an embodiment, data may be acquired at 10-second sampling intervals, or other sampling rules may be used.

In one embodiment, the gas sensor array can sample mixed gases of CH4 and CO at 120 different concentration ratios (other ratios are also possible). The gas sensors undergo tens of thousands of calibration tests, the accuracy of the gas sensor array is judged from these tests, and the gas sensors are adjusted or replaced according to the result, which ensures the accuracy of the sampling data acquired by the gas sensor array.

In one embodiment, determining whether the preset air baseline requires calibration includes at least:

Owing to the characteristics of the gas sensor, the air baseline needs to be updated when it changes, for example when the number of gas types increases or decreases or when the gas concentration increases; even if the concentration and the types are unchanged, the air baseline may drift over time, so it still needs to be updated.

The initial value of the air baseline can be set according to experience and actual requirements, and the update mode for subsequent updates or corrections can be selected according to the specific situation. In this way, the baseline drift and the data deviation caused by gas sensor cross-sensitivity are kept within a suitable range, the amount of computation is reduced as far as possible while enough data is still acquired, and the accuracy requirement for data sampling by the gas sensor array is met.

Preprocessing the data set includes at least standardization;

each dimension of the updated response data is standardized, wherein the standardization formula is as follows:

$$x_i' = \frac{x_i - \mu}{\sigma}$$

wherein x_i represents the i-th response data of a dimension, \mu represents the mean value of that dimension, and \sigma represents the standard deviation of that dimension. Other processing may of course also be included, such as removing values that are too large or too small, or filling in missing data.
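A minimal Python sketch of the per-dimension standardization follows, assuming the usual z-score form (value minus mean, divided by standard deviation). The use of the population standard deviation and the handling of constant columns are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column (dimension) of a table of response data."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # ASSUMPTION: population standard deviation; a sample estimate would also be possible.
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two samples, two dimensions; the second dimension is constant.
print(standardize_columns([[1.0, 10.0], [3.0, 10.0]]))
```

Tree-based models are insensitive to monotonic rescaling of a feature, so this step mainly keeps the dimensions comparable for any additional processing such as outlier removal.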

In this embodiment, a classification prediction model and a concentration prediction model are constructed based on random forests, and the gas class and the gas concentration are predicted by inputting the sequence to be measured into these two models.

In one embodiment, the classification prediction pre-training model is constructed as follows:

constructing an initial random forest model based on a set of decision trees;

based on the classification result of each data in the training sample sequence in the initial random forest model, obtaining the weight of each decision tree, wherein the weight is an importance coefficient that evaluates the corresponding decision tree by its classification accuracy, and the importance coefficient is expressed as follows:

The concentration prediction pre-training model is constructed along the same lines, as follows:

based on the sum of absolute errors of each data in the training sample sequence in the initial concentration random forest model, obtaining a first weight of each regression tree, wherein the first weight is a first importance coefficient that evaluates the accuracy of the corresponding regression tree, and the first importance coefficient is expressed as follows:

wherein T1 represents the number of regression trees, t1 represents a given regression tree, err_t1 represents the sum of the absolute prediction errors of regression tree t1 over all samples, and Qreg_t1 represents the first importance coefficient of the corresponding regression tree;

Finally, the classification prediction model is expressed as follows:

where $H(x)$ denotes the classification prediction model, $T$ denotes the number of decision trees, $h_t(x)$ denotes the class predicted by each decision tree, $Y$ denotes the class label, $I(h_t(x) = Y)$ indicates that $x$ is assigned to class $Y$, and $Qcls_t$ denotes the importance coefficient;

the concentration prediction model is expressed as follows:

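[equation not recoverable from the source text; the only surviving term is the first importance coefficient $Qreg_{t_1}$]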
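To make the role of the importance coefficients concrete, the following is a minimal sketch under stated assumptions. It assumes that the classification model is an importance-weighted vote, i.e. the predicted class is the one maximizing the sum of the coefficients of the trees that voted for it, and that the concentration model is an importance-weighted average of the regression trees' outputs. The inverse-error weighting in `inverse_error_weights` is a hypothetical choice for illustration; the specification's own expression for the first importance coefficient is not reproduced here. All function names are illustrative.

```python
from collections import defaultdict


def weighted_vote(tree_predictions, qcls):
    """Return the class with the largest sum of tree importance coefficients."""
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, qcls):
        scores[label] += weight
    return max(scores, key=scores.get)


def inverse_error_weights(abs_error_sums, eps=1e-12):
    """Hypothetical first importance coefficients: smaller error, larger weight."""
    inverse = [1.0 / (e + eps) for e in abs_error_sums]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_concentration(tree_predictions, qreg):
    """Importance-weighted average of the regression trees' concentrations."""
    return sum(p * w for p, w in zip(tree_predictions, qreg)) / sum(qreg)


# Three decision trees vote on the gas class; the first tree is the most trusted.
gas = weighted_vote(["CH4", "CO", "CO"], [0.9, 0.3, 0.4])

# Three regression trees with absolute error sums of 10, 20 and 40.
qreg = inverse_error_weights([10.0, 20.0, 40.0])
ppm = weighted_concentration([100.0, 130.0, 160.0], qreg)
```

In this example a single highly weighted tree outvotes two weaker trees, which is the behavior that distinguishes weighted voting from a plain majority vote.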

Once the classification prediction pre-training model and the concentration prediction pre-training model have been constructed, the models trained from them may still produce predictions that deviate considerably from the actual values. A first loss function is therefore constructed for the classification prediction pre-training model and a second loss function for the concentration prediction pre-training model, and each is used to optimize its respective model. The first loss function may be a conventional cross-entropy loss function, or another loss function may be used; the second loss function may be a mean square error loss function. Constructing these loss functions makes the predictions of the trained classification prediction model and concentration prediction model more accurate.
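For reference, the two loss functions named above can be sketched as follows. This is a minimal illustration of standard cross-entropy and mean square error, assuming the classifier outputs a probability per class; the function names, the dictionary layout of the probabilities, and the clipping constant `eps` are assumptions for this example.

```python
import math


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean cross-entropy; predicted_probs[i] maps class name -> probability."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability of the true class to avoid log(0).
        p = min(max(probs.get(label, 0.0), eps), 1.0)
        total -= math.log(p)
    return total / len(true_labels)


def mean_squared_error(true_values, predicted_values):
    """Mean of squared differences between true and predicted concentrations."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return sum(squared) / len(squared)


# Hypothetical class probabilities for two samples, and two concentrations in ppm.
ce = cross_entropy(
    ["CH4", "CO"],
    [{"CH4": 0.8, "CO": 0.2}, {"CH4": 0.5, "CO": 0.5}],
)
mse = mean_squared_error([100.0, 200.0], [110.0, 190.0])
```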

In the prediction stage, the sequence to be measured is input into the classification prediction model to obtain a prediction result of the gas type, which comprises the following steps:

inputting response data and gas category labels into a classification prediction pre-training model, and training to obtain an initial classification prediction model;

combining the initial classification prediction model with an adaptive weight allocation algorithm to obtain the weight of each decision tree, applying the weights and obtaining the output of the initial classification prediction model through the combined voting rule, and repeating this step until the iteration limit is reached or the initial classification prediction model converges, thereby obtaining the classification prediction model;
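The specification does not spell out the adaptive weight allocation algorithm. As a stand-in, the following sketch uses a multiplicative update in the style of the classic weighted-majority algorithm: each tree's weight is multiplied by a penalty factor for every training sample it misclassifies, the weights are renormalized, and the loop stops when the iteration limit is reached or the weights stop changing. The function name and the parameters `beta`, `max_iter` and `tol` are assumptions for this example.

```python
def adapt_tree_weights(tree_outputs, labels, beta=0.5, max_iter=50, tol=1e-6):
    """tree_outputs[t][i] is tree t's predicted class for training sample i."""
    n_trees = len(tree_outputs)
    weights = [1.0 / n_trees] * n_trees
    for _ in range(max_iter):
        updated = list(weights)
        for i, label in enumerate(labels):
            for t in range(n_trees):
                if tree_outputs[t][i] != label:
                    updated[t] *= beta  # penalize the tree for each mistake
        total = sum(updated)
        updated = [w / total for w in updated]
        change = max(abs(a - b) for a, b in zip(updated, weights))
        weights = updated
        if change < tol:  # convergence: the weights have stopped moving
            break
    return weights


# Hypothetical outputs of three trees on four labeled samples:
# tree 0 makes no mistakes, tree 1 makes one, tree 2 makes two.
labels = ["CH4", "CO", "CO", "CH4"]
tree_outputs = [
    ["CH4", "CO", "CO", "CH4"],
    ["CH4", "CO", "CH4", "CH4"],
    ["CO", "CH4", "CO", "CH4"],
]
weights = adapt_tree_weights(tree_outputs, labels)
```

The resulting weights can then be used as the per-tree coefficients in a weighted vote, so that trees with more training errors contribute less to the final class decision.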

Inputting the sequence to be measured into a concentration prediction model in sequence to obtain a gas concentration prediction result, wherein the method comprises the following steps:

The results are shown in Figure 2. The concentration range of methane is 0-1500 ppm and the concentration range of carbon monoxide is 0-500 ppm. In the figure, "label" marks the actual data and "prediction" marks the predicted result; the vertical axis represents the concentration of the gas component and the horizontal axis represents the sample number, with 20 groups of data sampled in total. Line graphs are plotted from the actual data of methane and carbon monoxide and from the corresponding predicted results, and the experimental result shows that the predicted results agree closely with the actual data.

The method constructs a classification prediction model and a concentration prediction model based on the random forest model for identifying and detecting gases. The random forest algorithm is simple, robust and accurate and gives good identification results; it achieves not only qualitative identification of mixed gases but also concentration prediction for each component gas.

The method is primarily intended for early warning of hazardous kitchen gases, mainly detecting methane and carbon monoxide; it is of course also applicable in other fields for detecting the components of other gases.

Example 2:

A gas type and concentration prediction system based on random forest, as shown in Figure 3, comprises a data processing module 100, a judging and updating module 200, a data set updating module 300, a determining training module 400 and a data prediction module 500;

the data processing module 100 is configured to obtain a data set formed by standard environment sampling data, historical sampling data, sampling data at the current moment, gas type labels and gas concentration labels, and to form a training sample sequence based on the response data, the gas type labels and the gas concentration labels, wherein the response data is obtained based on the sampling data and an air baseline;

the judging and updating module 200 is configured to judge whether a preset air baseline needs to be calibrated, and if so, to calibrate the preset air baseline based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is constructed based on the standard environment sampling data, the historical sampling data and the sampling data at the current moment;

the data set updating module 300 is configured to update the response data in the data set based on the calibrated air baseline to obtain an updated data set, and to preprocess the updated data set to obtain a training sample set;

the determining training module 400 is configured to determine an optimal importance coefficient and an optimal first importance coefficient, respectively construct and fuse a classification prediction pre-training model and a concentration prediction pre-training model, and train and verify based on the training sample set to obtain a classification prediction model and a concentration prediction model;

the data prediction module 500 is configured to sequentially input the sequence to be measured into the classification prediction model and the concentration prediction model to obtain a prediction result of the gas type and the gas concentration.
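The training sample sequence assembled by the data processing module can be illustrated with a short sketch. It follows the layout stated in claim 3, in which the response data is followed, for each gas class, by whether the class is present and by its concentration. The function name, the 0/1 encoding of presence, and recording a concentration of 0 for an absent gas are assumptions for this example.

```python
def build_training_sample(response, class_labels):
    """Concatenate response data with (present, concentration) per gas class."""
    row = list(response)
    for present, concentration in class_labels:
        row.append(1 if present else 0)
        # The concentration of an absent gas is recorded as 0 (an assumption).
        row.append(float(concentration) if present else 0.0)
    return row


# Hypothetical sample: three response values, methane present at 300 ppm,
# carbon monoxide absent.
sample = build_training_sample(
    response=[0.82, 0.64, 0.91],
    class_labels=[(True, 300.0), (False, 0.0)],
)
```

With n response values and m gas classes, each row has n + 2m entries; the presence flags serve as targets for the classification model and the concentrations as targets for the concentration model.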

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another.

It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In addition, in the specific embodiments described in the present specification, the shapes, the names, and the like of the components may be different. All equivalent or simple changes of the structure, characteristics and principle according to the inventive concept are included in the protection scope of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions in a similar manner without departing from the scope of the invention as defined in the accompanying claims.

Claims

1. A method for predicting gas type and concentration based on random forest, characterized by comprising the following steps:

judging whether a preset air baseline needs to be calibrated, if so, calibrating the preset air baseline based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is constructed based on standard environment sampling data, historical sampling data and sampling data at the current moment, and the baseline updating model is expressed as follows:

wherein the symbols of the baseline updating model denote, in order: the air baseline; the mean sensor resistance values of the current update period t and of the previous period t-1, respectively; the humidity change in the current update period t; the adjustment coefficients; and the two thresholds, namely the threshold of the humidity change in the current period and the threshold of the ratio between the preceding and following periods, respectively;

Determining an optimal importance coefficient and an optimal first importance coefficient, respectively constructing and fusing a classification prediction pre-training model and a concentration prediction pre-training model, and training and verifying based on a training sample set to obtain a classification prediction model and a concentration prediction model, wherein the classification prediction model is expressed as follows:

the concentration prediction model is expressed as follows:

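[equation not recoverable from the source text; the only surviving term is the first importance coefficient $Qreg_{t_1}$]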

wherein $H(x_1)$ denotes the concentration prediction model, $T_1$ denotes the number of regression trees, $h_{t_1}(x_1)$ denotes the predicted concentration of each regression tree, $Y_1$ is the concentration label, the error term denotes the prediction error between $h_{t_1}(x_1)$ and the true concentration $Y_1$, and $Qreg_{t_1}$ denotes the first importance coefficient;

2. The method for predicting gas types and concentrations based on random forests as claimed in claim 1, wherein the step of calibrating the preset air baseline based on the preset baseline updating strategy to obtain a calibrated air baseline comprises the following steps:

the preset resistance value calibration model is expressed as follows:

wherein $Z$ denotes the response function, $R$ denotes the sensor resistance value at the current moment, $R_0$ denotes the sensor resistance value after calibration to the standard environment, $t_1$ denotes a given time, $R_{t_1}$ denotes the resistance measured at time $t_1$, $H_{t_1}$ denotes the humidity at time $t_1$, and $H$ denotes the humidity at the current moment; $i$ and $d$ denote adjustment coefficients, which are adjusted with temperature.

3. The random forest based gas species and concentration prediction method of claim 1, wherein the response data is expressed as z = [z1, z2, ..., zn]; the gas class label and the gas concentration label are expressed as [whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration]; and the training sample sequence is expressed as [z1, z2, ..., zn, whether class 1 exists, class 1 concentration, whether class 2 exists, class 2 concentration, ..., whether class m exists, class m concentration], wherein n represents the number of response data and m represents the total number of gas classes.

4. The method for predicting gas type and concentration based on random forest of claim 1, wherein the determining whether the preset air baseline needs calibration at least comprises:

5. A random forest based gas species and concentration prediction method as claimed in claim 1 wherein said pre-processing comprises at least normalization processing;

6. The random forest based gas type and concentration prediction method according to claim 1, wherein the classification prediction pre-training model is constructed as follows:

constructing an initial random forest model based on a set of decision trees;

7. The method for predicting gas types and concentrations based on random forests as claimed in claim 1, wherein the concentration prediction pre-training model is constructed by the following steps:

8. The random forest based gas species and concentration prediction method of claim 1, further comprising the steps of:

9. The method for predicting gas types and concentrations based on random forests as claimed in claim 1, wherein the step of inputting the sequence to be measured into the classification prediction model to obtain the prediction result of the gas types comprises the steps of:

10. The method for predicting gas types and concentrations based on random forests as claimed in claim 1, wherein the sequence to be measured is sequentially input into a concentration prediction model to obtain a predicted result of gas concentration, comprising the following steps:

11. A gas type and concentration prediction system based on random forest, characterized by comprising a data processing module, a judging and updating module, a data set updating module, a determining training module and a data prediction module;

the judging and updating module is used for judging whether the preset air baseline needs to be calibrated, if yes, the preset air baseline is calibrated based on a preset baseline updating strategy to obtain a calibrated air baseline, wherein a baseline updating model is constructed based on standard environment sampling data, historical sampling data and current time sampling data, and the baseline updating model is expressed as follows:

wherein the symbols of the baseline updating model denote, in order: the air baseline; the mean sensor resistance values of the current update period t and of the previous period t-1, respectively; the humidity change in the current update period t; the adjustment coefficients; and the two thresholds, namely the threshold of the humidity change in the current period and the threshold of the ratio between the preceding and following periods, respectively;

the determining training module is used for determining an optimal importance coefficient and an optimal first importance coefficient, respectively constructing and fusing a classification prediction pre-training model and a concentration prediction pre-training model, and training and verifying based on a training sample set to obtain the classification prediction model and the concentration prediction model, wherein the classification prediction model is expressed as follows:

the concentration prediction model is expressed as follows:

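[equation not recoverable from the source text; the only surviving term is the first importance coefficient $Qreg_{t_1}$]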

12. A random forest based gas species and concentration prediction device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 10 when executing the computer program.

13. A computer readable storage medium storing computer instructions, characterized in that the computer readable storage medium stores instructions for performing the method of any one of claims 1 to 10.