CN115618190A - Environmental particulate matter concentration estimation method and system based on sensor data and terminal - Google Patents

Publication number
CN115618190A
Authority
CN
China
Prior art keywords
fitting
hum
tem
data
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211189210.0A
Other languages
Chinese (zh)
Inventor
刘心宇
熊静平
黄�俊
薛辉溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huansi Technology Co ltd
Original Assignee
Shenzhen Huansi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huansi Technology Co ltd
Priority to CN202211189210.0A
Publication of CN115618190A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00: Investigating characteristics of particles; Investigating permeability, pore-volume, or surface-area of porous materials
    • G01N15/06: Investigating concentration of particle suspensions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/11: Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12: Simultaneous equations, e.g. systems of linear equations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00: Investigating characteristics of particles; Investigating permeability, pore-volume, or surface-area of porous materials
    • G01N2015/0038: Investigating nanoparticles

Abstract

The invention relates to a method, a system and a terminal for estimating ambient particulate matter concentration based on sensor data. The method comprises three stages, namely modeling, evaluation and application, which can be updated and invoked periodically after being deployed on hardware. Modeling stage: the input is a training time-series set and modeling configuration parameters, and the output is a set of candidate estimation models. Evaluation stage: the input is a test time-series set; after it is passed through the candidate estimation models obtained in the modeling stage, the test error of each model is output, and the candidate estimation model with the smallest test error is output as the preferred estimation model. Application stage: the input is the raw time series, and after computation by the preferred estimation model the output is the estimated time series. In this way, software-level processing of the sensor data can effectively improve the sensor's ability to measure ambient particulate matter concentration.

Description

Environmental particulate matter concentration estimation method and system based on sensor data and terminal
Technical Field
The invention relates to the technical field of environmental data detection, in particular to a method, a system and a terminal for estimating environmental particulate matter concentration based on sensor data.
Background
The concentration of atmospheric pollutants is an important indicator for environmental management. Among the particulate matter indicators, PM2.5 (fine particulate matter) and PM10 (inhalable particulate matter) are the main ones, and both are closely related to environmental quality and human health.
At present, besides measurement methods based on precision instruments, such as the oscillating balance method and the beta-ray method commonly used at official monitoring sites, sensors based on the light scattering principle are also widely used to monitor PM2.5 and PM10 concentrations. Although sensors fall significantly short of precision instruments in accuracy and stability, their low cost and small size make them suitable hardware for deploying sensing systems with high spatial and temporal resolution.
When measuring ambient particulate matter concentration, mainstream light-scattering sensors achieve a certain degree of correlation in response trend with standardized precision instruments. However, under some conditions, particularly high humidity or large short-term changes in humidity, the correlation between sensor measurements and standardized precision instrument measurements is poor.
One important reason is that the typical processing in mainstream light-scattering sensors computes the received intensity of the light scattered by suspended particles in the detection chamber and ignores particle size. Some sensor models are calibrated at the factory against a standard instrument only from the scattered received signal to the PM2.5 concentration, and the PM10 concentration is then derived from the PM2.5 concentration by a simple linear mapping (multiplication by a coefficient plus a constant). In this case, the PM10 concentration output by such a sensor is merely a sequence highly correlated with its PM2.5 concentration (as shown in FIG. 1) and in many conditions does not reflect the true ambient PM10 concentration.
In real conditions, changes in factors such as humidity reduce the correlation between the trends of the true ambient PM2.5 and PM10 concentrations (as shown in FIG. 2), which causes measurement deviation in the particulate matter sensor.
A hardware improvement would require separating particles by size inside the sensor and counting the scattered received signals for PM2.5 and PM10 separately. On the one hand, this greatly increases structural design complexity and equipment cost; on the other hand, it cannot improve the performance of particulate matter sensors that are already deployed. Estimating and approximating the true ambient PM10 concentration from existing information through software improvement is therefore a practical and feasible approach.
Conventionally, calibration is treated as a classical regression problem: regression modeling establishes a functional mapping between one or more input parameters and the output, for example by multiple linear or nonlinear regression. In practice, however, the problem poses many additional challenges, and the performance of regression modeling must be improved through design optimization of the mechanism. These challenges include:
(1) Individual differences between hardware units lead to differences in the regression model and its parameters.
(2) Standard metrology equipment to provide calibration data is lacking.
(3) Within a limited time, the samples used for regression modeling are not sufficiently representative.
In response to these challenges, there is a need for a sensor-data-based method for estimating ambient PM2.5/PM10 concentration that effectively improves, at the software level, the sensor's ability to measure ambient particulate matter concentration.
Disclosure of Invention
In view of the above defects of the prior art, the present invention provides a method, a system and a terminal for estimating ambient particulate matter concentration based on sensor data, as well as a computer-readable storage medium.
The technical solution adopted by the invention to solve the technical problem is as follows:
A method for estimating ambient particulate matter concentration based on sensor data is provided, comprising the following steps:
Step 1: select a training period and a test period;
Step 2: extract the raw environmental data output by the sensor and generate a training time-series set and a test time-series set at hourly intervals; the training time-series set is sampled from the training period and the test time-series set is sampled from the test period;
Step 3: set the modeling configuration parameters and determine the regression modeling structure:
select several different preset fitting term sets, feed the training time-series data into the composition formula of each selected fitting term set to obtain several different fitting term sets, and obtain several candidate estimation model structures from the fitting term sets and their corresponding fitting coefficient vectors;
Step 4: construct a matrix for each fitting term set and its corresponding fitting coefficient vector;
Step 5: solve the fitting coefficient vector for each matrix using the vector calculation formula, and substitute each solved fitting coefficient vector into the corresponding candidate estimation model structure to obtain several candidate estimation models;
Step 6: feed the test time-series set into each candidate estimation model, compute the error of each candidate estimation model, and select the candidate estimation model with the smallest error as the preferred estimation model;
Step 7: feed the second-level data of the sensor into the preferred estimation model to obtain a real-time estimate of the ambient particulate matter concentration.
In the method of the invention for estimating ambient particulate matter concentration based on sensor data:
in the first step, the training period is denoted $[T_1, T_2]$ and the test period is denoted $[T_3, T_4]$;
in the second step, the PM2.5 or PM10 time series originally output by the sensor is denoted $X(t)$; the humidity time series originally output by the humidity sensing module is denoted $Hum(t)$; the temperature time series originally output by the temperature sensing module is denoted $Tem(t)$; $t$ denotes time at second-level resolution;
the training time-series set is $C_{train} = \{X_1(h), Hum_1(h), Tem_1(h), R_1(h)\}$, wherein:
$X_1(h)$: all raw samples of $X(t)$ are merged by hourly mean into the sequence $X(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $X_1(h)$;
$Hum_1(h)$: all raw samples of $Hum(t)$ are merged by hourly mean into the sequence $Hum(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $Hum_1(h)$;
$Tem_1(h)$: all raw samples of $Tem(t)$ are merged by hourly mean into the sequence $Tem(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $Tem_1(h)$;
$R_1(h)$, $h \in [T_1, T_2]$: the reference calibration sequence of the training time-series set;
the test time-series set is $C_{test} = \{X_2(h), Hum_2(h), Tem_2(h), R_2(h)\}$, wherein:
$X_2(h)$: all raw samples of $X(t)$ are merged by hourly mean into the sequence $X(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $X_2(h)$;
$Hum_2(h)$: all raw samples of $Hum(t)$ are merged by hourly mean into the sequence $Hum(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $Hum_2(h)$;
$Tem_2(h)$: all raw samples of $Tem(t)$ are merged by hourly mean into the sequence $Tem(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $Tem_2(h)$;
$R_2(h)$, $h \in [T_3, T_4]$: the reference calibration sequence of the test time-series set.
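The hourly merging and window selection described above might be sketched in Python as follows. The sample layout (pairs of timestamp in seconds and value), the function names `hourly_mean` and `select_window`, and the example numbers are assumptions made for illustration; the source does not prescribe a data format.

```python
from collections import defaultdict
from statistics import fmean

SECONDS_PER_HOUR = 3600

def hourly_mean(samples):
    """Merge (timestamp_seconds, value) samples into {hour_index: mean value}."""
    buckets = defaultdict(list)
    for t, value in samples:
        buckets[int(t // SECONDS_PER_HOUR)].append(value)
    return {h: fmean(vals) for h, vals in sorted(buckets.items())}

def select_window(hourly, start_hour, end_hour):
    """Keep the hourly terms with start_hour <= h <= end_hour."""
    return {h: v for h, v in hourly.items() if start_hour <= h <= end_hour}

# Hypothetical raw series X(t): one sample every 20 minutes over four hours.
x_raw = [(t, 10.0 + t / 1200) for t in range(0, 4 * SECONDS_PER_HOUR, 1200)]

x_hourly = hourly_mean(x_raw)            # X(h)
x_train = select_window(x_hourly, 0, 1)  # X_1(h), h in [T1, T2] = [0, 1]
x_test = select_window(x_hourly, 2, 3)   # X_2(h), h in [T3, T4] = [2, 3]
```

The same two functions would apply unchanged to the humidity and temperature series, so that all sequences in a set share the same hour indices.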
In the method of the invention for estimating ambient particulate matter concentration based on sensor data, in the third step, the fitting term set FI comprises several of the first-order fitting term set $FI_0$, the time-delay fitting term set $FI_d$ and the supplementary fitting term set $FI_c$;
first-order fitting term set $FI_0$: the fitting terms that the model must contain at minimum, defined as:
$FI_0 = \{X(h), Hum(h), Tem(h), 1\}$;
the first-order fitting term set corresponds to the most basic multiple linear regression model:
$E(h) = a \cdot X(h) + b \cdot Hum(h) + c \cdot Tem(h) + d$;
time-delay fitting term set $FI_d$: on the basis of the first-order fitting term set $FI_0$, earlier historical data are added to the fitting term set, defined as:
$FI_d = \{X(h), X(h-1), \ldots, Hum(h), Hum(h-1), \ldots, Tem(h), Tem(h-1), \ldots, 1\}$;
the regression model corresponding to the time-delay fitting term set is:
$E(h) = a_0 \cdot X(h) + b_0 \cdot Hum(h) + b_1 \cdot Hum(h-1) + c_0 \cdot Tem(h) + c_1 \cdot Tem(h-1) + d$;
supplementary fitting term set $FI_c$: on the basis of the time-delay fitting term set $FI_d$, quadratic terms are added to improve the fit of details, where a quadratic term is the square of any first-order term and/or the product of two different first-order terms;
a candidate model M is written $M_{FI,B}$ in terms of its fitting term set FI and fitting coefficient vector $B = [b_1, b_2, b_3, \ldots, b_L]^T$, where $L = |FI|$ is the number of terms in FI; writing $FI = \{A_i(h) \mid 1 \le i \le L\}$, the structure of the candidate model $M_{FI,B}$ is:
$$M_{FI,B}:\quad E(h) = \sum_{i=1}^{L} b_i \, A_i(h)$$
In the method of the invention for estimating ambient particulate matter concentration based on sensor data, in the fourth step:
parameter estimation is carried out by least-squares fitting; a matrix is constructed from the fitting term set FI and the fitting coefficient vector B, giving the overdetermined system of equations:
$$A B = R$$
wherein:
$$A = \begin{bmatrix} A_1(h) & A_2(h) & \cdots & A_L(h) \\ A_1(h-1) & A_2(h-1) & \cdots & A_L(h-1) \\ \vdots & \vdots & \ddots & \vdots \\ A_1(h-N) & A_2(h-N) & \cdots & A_L(h-N) \end{bmatrix}$$
$$B = [b_1, b_2, \ldots, b_L]^T$$
$$R = [R_1(h), R_1(h-1), \ldots, R_1(h-N)]^T$$
the values in matrix A and vector R can be obtained directly, or by calculation, from the same-named sequences in the training time-series set $C_{train}$.
In the method of the invention for estimating ambient particulate matter concentration based on sensor data, in the fifth step:
the formula for calculating vector B is:
$$B = (A^T A)^{-1} A^T R$$
substituting vector B into the structure of the candidate model $M_{FI,B}$ gives the candidate estimation model $M_{FI,B}$ for the fitting term set FI.
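As a rough illustration of the formula for B, the sketch below solves the normal equations $(A^T A) B = A^T R$ in plain Python by Gaussian elimination instead of forming the inverse explicitly. The helper names, the example rows of A and the coefficients `true_B` are all hypothetical; in practice a library least-squares routine such as `numpy.linalg.lstsq` would usually be preferable numerically.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(m, rhs):
    """Solve m @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular; fitting terms are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_coefficients(A, R):
    """Least-squares B satisfying (A^T A) B = A^T R."""
    At = transpose(A)
    AtA = matmul(At, A)
    AtR = [sum(a * r for a, r in zip(row, R)) for row in At]
    return solve(AtA, AtR)

# Hypothetical training rows for FI_0 = {X(h), Hum(h), Tem(h), 1}.
A = [
    [30.0, 40.0, 20.0, 1.0],
    [45.0, 55.0, 22.0, 1.0],
    [60.0, 70.0, 18.0, 1.0],
    [35.0, 80.0, 25.0, 1.0],
    [50.0, 65.0, 21.0, 1.0],
    [42.0, 50.0, 30.0, 1.0],
]
true_B = [0.8, -0.1, 0.2, 3.0]  # made-up coefficients used to synthesize R
R = [sum(a * b for a, b in zip(row, true_B)) for row in A]

B = fit_coefficients(A, R)  # recovers true_B up to rounding
```

Because R is generated without noise here, the fit reproduces the coefficients; with real reference data the result is only the least-squares compromise.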
In the method of the invention for estimating ambient particulate matter concentration based on sensor data, in the sixth step:
for a given fitting term set FI, the model $M_{FI,B}$ is obtained on the training time-series set $C_{train}$, and its estimated output $E(h)$ on $C_{train}$ is denoted:
$E_1(h) = M_{FI,B}(X_1(h), Hum_1(h), Tem_1(h))$;
the estimated output of model $M_{FI,B}$ on the test time-series set $C_{test}$ is:
$E_2(h) = M_{FI,B}(X_2(h), Hum_2(h), Tem_2(h))$;
the error of model $M_{FI,B}$ on the test time-series set $C_{test}$ can then be defined as $Err(M_{FI,B}, C_{test})$, calculated as:
[Equation image BDA0003868674590000071: definition of $Err(M_{FI,B}, C_{test})$]
where P is the maximum number of terms that the current model $M_{FI,B}$ and the test time-series set $C_{test}$ can support, and $W(\cdot)$ is a weighting function defined as:
[Equation image BDA0003868674590000072: definition of $W(\cdot)$ in terms of the probability density function of the $R_2(h)$ sequence]
if there are k candidate estimation models, the preferred estimation model is the one that minimizes the error on the test time-series set $C_{test}$:
$$\arg\min_{k} \; Err(M_{FI(k),B(k)}, C_{test})$$
In the method of the invention for estimating ambient particulate matter concentration based on sensor data, in the seventh step:
the hour-level data used in the preferred estimation model during training and testing are replaced with the second-level data of the sensor to obtain the real-time estimate of the ambient particulate matter concentration:
$E_{PM}(t) = M_{FI,B}(X(t), Hum(t), Tem(t))$.
A system for estimating ambient particulate matter concentration based on sensor data, applying the above method, comprises a data sampling unit, a data processing unit and a data output unit;
the data sampling unit is used for selecting a training period and a test period and sampling the raw environmental data output by the sensor;
the data processing unit is used for performing the calculations of the first to seventh steps;
the data output unit is used for outputting the data calculated by the data processing unit.
An ambient particulate matter concentration estimation terminal based on sensor data, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method as described above are implemented when the computer program is executed by the processor.
A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method as set forth above.
The beneficial effects of the invention are as follows. The method comprises three stages, namely modeling, evaluation and application, which can be updated and invoked periodically after being deployed on hardware. Modeling stage: the input is a training time-series set and modeling configuration parameters, and the output is a set of candidate estimation models. Evaluation stage: the input is a test time-series set; after it is passed through the candidate estimation models obtained in the modeling stage, the test error of each model is output, and the candidate estimation model with the smallest test error is output as the preferred estimation model. Application stage: the input is the raw time series, and after computation by the preferred estimation model the output is the estimated time series. In this way, software-level processing of the sensor data can effectively improve the sensor's ability to measure ambient particulate matter concentration.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the present invention will be further described with reference to the accompanying drawings and embodiments, wherein the drawings in the following description are only part of the embodiments of the present invention, and for those skilled in the art, other drawings can be obtained without inventive efforts according to the accompanying drawings:
FIG. 1 is a trend diagram of PM2.5 and PM10 from a certain type of light-scattering particulate matter sensor (the upper line is PM10; the lower line is PM2.5);
FIG. 2 is a diagram of the difference between the official site reading and the sensor reading for PM10 in a real environment (the thicker line is sensor data; the thinner line is official monitoring station reference data);
FIG. 3 is a flow chart of a method for estimating the concentration of particulate matter in an environment based on sensor data according to a preferred embodiment of the present invention;
FIG. 4 is a logical schematic diagram of a method for estimating the concentration of particulate matter in the environment based on sensor data according to a preferred embodiment of the present invention;
FIG. 5 is a diagram illustrating the effects of a method for estimating the concentration of particulate matter in an environment based on sensor data according to a preferred embodiment of the present invention;
FIG. 6 is a schematic block diagram of an ambient particulate matter concentration estimation system based on sensor data according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the following will clearly and completely describe the technical solutions in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without inventive step, are within the scope of the present invention.
The method for estimating the concentration of particulate matter in the environment based on sensor data according to the preferred embodiment of the present invention, as shown in fig. 3, with reference to fig. 4 and 5, includes the following steps:
S01: Select a training period and a test period;
the training period is denoted $[T_1, T_2]$ and the test period is denoted $[T_3, T_4]$; the specified training period and test period may follow one another or may partially overlap;
S02: Extract the raw environmental data output by the sensor and generate a training time-series set and a test time-series set at hourly intervals; the training time-series set is sampled from the training period and the test time-series set is sampled from the test period;
the PM2.5 or PM10 time series originally output by the sensor is denoted $X(t)$; the humidity time series originally output by the humidity sensing module is denoted $Hum(t)$; the temperature time series originally output by the temperature sensing module is denoted $Tem(t)$; $t$ denotes time at second-level resolution;
the training time-series set is $C_{train} = \{X_1(h), Hum_1(h), Tem_1(h), R_1(h)\}$, wherein:
$X_1(h)$: all raw samples of $X(t)$ are merged by hourly mean into the sequence $X(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $X_1(h)$;
$Hum_1(h)$: all raw samples of $Hum(t)$ are merged by hourly mean into the sequence $Hum(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $Hum_1(h)$;
$Tem_1(h)$: all raw samples of $Tem(t)$ are merged by hourly mean into the sequence $Tem(h)$, and the terms satisfying $h \in [T_1, T_2]$ form $Tem_1(h)$;
$R_1(h)$, $h \in [T_1, T_2]$: the reference calibration sequence of the training time-series set, i.e. a time series that can be regarded as the true ambient PM10 concentration; typically the hourly data published by the official environmental monitoring station at the site where the sensing device is deployed can be used;
the test time-series set is $C_{test} = \{X_2(h), Hum_2(h), Tem_2(h), R_2(h)\}$, wherein:
$X_2(h)$: all raw samples of $X(t)$ are merged by hourly mean into the sequence $X(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $X_2(h)$;
$Hum_2(h)$: all raw samples of $Hum(t)$ are merged by hourly mean into the sequence $Hum(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $Hum_2(h)$;
$Tem_2(h)$: all raw samples of $Tem(t)$ are merged by hourly mean into the sequence $Tem(h)$, and the terms satisfying $h \in [T_3, T_4]$ form $Tem_2(h)$;
$R_2(h)$, $h \in [T_3, T_4]$: the reference calibration sequence of the test time-series set, i.e. a time series that can be regarded as the true ambient PM10 concentration; typically the hourly data published by the official environmental monitoring station at the site where the sensing device is deployed can be used;
S03: Set the modeling configuration parameters and determine the regression modeling structure:
select several different preset fitting term sets, feed the training time-series data into the composition formula of each selected fitting term set to obtain several different fitting term sets, and obtain several candidate estimation model structures from the fitting term sets and their corresponding fitting coefficient vectors;
the fitting term set FI comprises several of the first-order fitting term set $FI_0$, the time-delay fitting term set $FI_d$ and the supplementary fitting term set $FI_c$; it should be noted that forms other than these three fitting term sets are also possible;
first-order fitting term set $FI_0$: the fitting terms that the model must contain at minimum, defined as:
$FI_0 = \{X(h), Hum(h), Tem(h), 1\}$;
the first-order fitting term set corresponds to the most basic multiple linear regression model:
$E(h) = a \cdot X(h) + b \cdot Hum(h) + c \cdot Tem(h) + d$;
time-delay fitting term set $FI_d$: on the basis of the first-order fitting term set $FI_0$, earlier historical data are added to the fitting term set, mainly to capture the gain that the "change process" brings to estimation performance, defined as:
$FI_d = \{X(h), X(h-1), \ldots, Hum(h), Hum(h-1), \ldots, Tem(h), Tem(h-1), \ldots, 1\}$;
the regression model corresponding to the time-delay fitting term set is:
$E(h) = a_0 \cdot X(h) + b_0 \cdot Hum(h) + b_1 \cdot Hum(h-1) + c_0 \cdot Tem(h) + c_1 \cdot Tem(h-1) + d$;
supplementary fitting term set $FI_c$: on the basis of the time-delay fitting term set $FI_d$, quadratic terms are added to improve the fit of details, where a quadratic term is the square of any first-order term and/or the product of two different first-order terms;
for example:
when $FI_c = \{X(h), Hum(h), Hum(h-1) \cdot Tem(h), Hum^2(h-1), 1\}$, the modeling uses the concentration and the humidity at the current time, the quadratic term of the humidity at the previous time, and the product of the temperature at the current time and the humidity at the previous time. In this case, the regression model corresponding to the supplementary fitting term set is:
$E(h) = a_0 \cdot X(h) + b_0 \cdot Hum(h) + c_0 \cdot Hum(h-1) \cdot Tem(h) + d_0 \cdot Hum^2(h-1) + e_0$;
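One possible way to represent a fitting term set in code is as a list of small functions, each producing one term from the hourly data. The sketch below encodes the example set with the terms X(h), Hum(h), Hum(h-1)*Tem(h), Hum squared at h-1, and the constant 1. The data layout (a dict of lists indexed by hour), the names `fitting_row` and `estimate`, and all numbers are assumptions for illustration only.

```python
# Each fitting term maps (hourly data, hour index h) to one number.
FI_C = [
    lambda d, h: d["X"][h],                      # X(h)
    lambda d, h: d["Hum"][h],                    # Hum(h)
    lambda d, h: d["Hum"][h - 1] * d["Tem"][h],  # Hum(h-1) * Tem(h)
    lambda d, h: d["Hum"][h - 1] ** 2,           # Hum^2(h-1)
    lambda d, h: 1.0,                            # constant term
]

def fitting_row(terms, data, h, max_lag=1):
    """Evaluate every fitting term at hour h; this is one row of matrix A."""
    if h < max_lag:
        # Without this check, h - 1 would wrap around to the end of the list.
        raise ValueError("hour index too small for the lagged terms")
    return [term(data, h) for term in terms]

def estimate(terms, coeffs, data, h):
    """E(h) as the coefficient-weighted sum of the fitting terms."""
    row = fitting_row(terms, data, h)
    return sum(b * a for b, a in zip(coeffs, row))

# Hypothetical hourly data for h = 0 and h = 1.
data = {"X": [40.0, 50.0], "Hum": [60.0, 70.0], "Tem": [20.0, 25.0]}
coeffs = [1.0, 0.1, 0.001, 0.0, -2.0]  # placeholder a0, b0, c0, d0, e0

row = fitting_row(FI_C, data, 1)
e_h = estimate(FI_C, coeffs, data, 1)
```

Swapping in a different list of term functions gives a different candidate model structure without changing the rest of the code.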
From the above definitions, $FI_0 \subseteq FI_d \subseteq FI_c$; the three are collectively referred to here as the fitting term set FI. Within the framework of multiple linear regression, a candidate model M is written $M_{FI,B}$ in terms of its fitting term set FI and fitting coefficient vector $B = [b_1, b_2, b_3, \ldots, b_L]^T$, where $L = |FI|$ is the number of terms in FI; writing $FI = \{A_i(h) \mid 1 \le i \le L\}$, the structure of the candidate model $M_{FI,B}$ is:
$$M_{FI,B}:\quad E(h) = \sum_{i=1}^{L} b_i \, A_i(h)$$
S04: Construct a matrix for each fitting term set and its corresponding fitting coefficient vector;
parameter estimation is carried out by least-squares fitting; a matrix is constructed from the fitting term set FI and the fitting coefficient vector B, giving the overdetermined system of equations:
$$A B = R$$
wherein:
$$A = \begin{bmatrix} A_1(h) & A_2(h) & \cdots & A_L(h) \\ A_1(h-1) & A_2(h-1) & \cdots & A_L(h-1) \\ \vdots & \vdots & \ddots & \vdots \\ A_1(h-N) & A_2(h-N) & \cdots & A_L(h-N) \end{bmatrix}$$
$$B = [b_1, b_2, \ldots, b_L]^T$$
$$R = [R_1(h), R_1(h-1), \ldots, R_1(h-N)]^T$$
the values in matrix A and vector R can be obtained directly, or by calculation, from the same-named sequences in the training time-series set $C_{train}$; the size of matrix A is $(N+1) \times L$, typically with $N \gg L$, and N is chosen as large as possible provided that all data required in A and R can be found in the training time-series set $C_{train}$;
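As a worked instance, for the first-order fitting term set $FI_0$ (so that $L = 4$) the system $AB = R$ could be written out as below. This assumes that the rows of A follow the same hour ordering as R and that the columns follow the order of the terms in $FI_0$; the coefficient names a, b, c, d are those of the first-order regression formula.

```latex
\begin{bmatrix}
X_1(h)   & Hum_1(h)   & Tem_1(h)   & 1 \\
X_1(h-1) & Hum_1(h-1) & Tem_1(h-1) & 1 \\
\vdots   & \vdots     & \vdots     & \vdots \\
X_1(h-N) & Hum_1(h-N) & Tem_1(h-N) & 1
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix} R_1(h) \\ R_1(h-1) \\ \vdots \\ R_1(h-N) \end{bmatrix}
```

With $N + 1$ equations and only four unknowns, the system generally has no exact solution once $N + 1 > 4$, which is why it is solved in the least-squares sense.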
S05: Solve the fitting coefficient vector for each matrix using the vector calculation formula, and substitute each solved fitting coefficient vector into the corresponding candidate estimation model structure to obtain several candidate estimation models;
the formula for calculating vector B is:
$$B = (A^T A)^{-1} A^T R$$
substituting vector B into the structure of the candidate model $M_{FI,B}$ gives the candidate estimation model $M_{FI,B}$ for the fitting term set FI;
S06: inputting the test time sequence set into each alternative estimation model, testing the error value of each alternative estimation model, and selecting the alternative estimation model with the minimum error value as the optimal estimation model;
under the condition of applying a fitting item set FI, the elements in the matrix A can be all according to X for the training time sequence set 1 (h),Hum 1 (h),Tem 1 (h) Calculated, so that a fitting item set FI can be applied to a training time sequence set C train Get model M FI,B Computing a training time sequence set C train Estimate output E (h), noted as:
E 1 (h)=M FI,B (X 1 (h),Hum 1 (h),Tem 1 (h));
model M FI,B In test time order set C test The estimated output of (c) is:
E 2 (h)=M FI,B (X 2 (h),Hum 2 (h),Tem 2 (h));
then model M can be defined FI,B In test time order set C test The error above is Err (M) FI,B ,C test ) The calculation formula is as follows:
Figure BDA0003868674590000131
wherein P is the maximum number of samples that the current model M_{FI,B} and the test time sequence set C_test can support; w(·) is a weighting function related to the probability distribution of the sequence R_2(h), such that reference calibration values that occur more often are expected to have smaller errors; it is defined as:
Figure BDA0003868674590000132
represents the probability density function of the sequence R_2(h);
if there are k alternative estimation models, the preferred estimation model is the one with the smallest error on the test time sequence set C_test:
arg min Err(M_{FI(k),B(k)}, C_test);
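The selection rule can be sketched as follows. Because the error formula and the definition of w(·) survive only as image placeholders in this text, the sketch assumes a weighted mean absolute error and takes the weighting function as a parameter; this form, the function names, and the data are assumptions for illustration, not the patent's definitions.

```python
def weighted_error(estimates, reference, weight):
    """Assumed error form: weighted mean absolute error over the P test samples."""
    P = len(reference)
    return sum(weight(r) * abs(e - r) for e, r in zip(estimates, reference)) / P


def select_preferred(candidates, reference, weight):
    """Return (name, error) of the candidate with the smallest test error.

    candidates: dict mapping a model name to its estimates E_2(h) on the test set.
    """
    errors = {name: weighted_error(est, reference, weight)
              for name, est in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


reference = [10.0, 20.0, 30.0]            # stands in for R_2(h)
candidates = {
    "M1": [11.0, 19.0, 33.0],             # absolute errors 1, 1, 3
    "M2": [10.0, 22.0, 31.0],             # absolute errors 0, 2, 1
}
uniform = lambda r: 1.0                   # placeholder weight, not the patent's w(.)
best, err = select_preferred(candidates, reference, uniform)
print(best, err)  # M2 1.0
```

A weight derived from an estimated density of R_2(h) could be passed in place of `uniform` without changing the selection logic.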
S07: inputting the second-level data of the sensor into the preferred estimation model to obtain the sensor's real-time estimated output value of the environmental particulate matter concentration;
replacing the hour-level data used for training and testing in the preferred estimation model with the second-level data of the sensor gives the sensor's real-time estimated output of the environmental particulate matter concentration:
E_PM(t) = M_{FI,B}(X(t), Hum(t), Tem(t));
the method comprises three links, namely modeling, evaluation, and application, which can be updated and called periodically after being deployed on hardware equipment. Modeling link: the input is the training time sequence set and the modeling configuration parameters, and the output is the alternative estimation models. Evaluation link: the input is the test time sequence set; after it passes through the alternative estimation models obtained in the modeling link, the test errors are output, and the alternative estimation model with the smallest test error is output as the preferred estimation model. Application link: the input is the original time sequence, and the output, after calculation by the preferred estimation model, is the estimated time sequence. In this way, processing the sensor data at the software level can effectively improve the sensor's ability to measure the environmental particulate matter concentration.
[ design idea ]
(1) To address individual differences between hardware units, the method provides an independent regression model and parameters for each sensing terminal; the regression calibration function can be implemented on the terminal equipment or on the software platform server side.
(2) To address the lack of standard metering equipment to provide calibration data, the method acquires local coarse-grained reference data, under the actual working conditions of the hardware equipment, as the data required for calibration modeling.
(3) To address the insufficient representativeness of samples when regression modeling is done within a limited time, the method can be updated and called, and the regression model and its parameters can be updated at regular intervals, so that the limited sample data is used effectively and adaptability under different working conditions is improved.
[ equipment conditions ]
Hardware conditions: a terminal device equipped with a particulate matter sensor capable of measuring PM2.5 and PM10 concentrations, with an integrated temperature sensor and humidity sensor; the terminal device supports a wireless data transmission module; the terminal device supports a local data entry module.
Software conditions: the method can be implemented on either the device side or the server side; when implemented on the device side, the terminal device carrying the particulate matter sensor needs to support embedded programming and remote program updates; when implemented on the server side, only the software deployment conditions need to be met.
[ examples ]
Step one: specify a training period [2021-09-01 00,2021-09-30 ] and a test period [2021-10-01 00;
Step two: extract and generate the training time sequence set and the test time sequence set; after merging into hourly mean values, each time sequence in the training time sequence set has 720 samples and each time sequence in the test time sequence set has 240 samples.
C_train = {X_1(1)…X_1(720); Hum_1(1)…Hum_1(720); Tem_1(1)…Tem_1(720); R_1(1)…R_1(720)}
C_test = {X_2(1)…X_2(240); Hum_2(1)…Hum_2(240); Tem_2(1)…Tem_2(240); R_2(1)…R_2(240)}
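The merging of second-level samples into hourly mean values can be sketched as below. The function names `hourly_mean` and `select_period` and the sample values are hypothetical; the sketch assumes each raw sample carries a timestamp.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean(samples):
    """Merge (timestamp, value) samples into hourly means, ordered by hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, sum(v) / len(v)) for hour, v in sorted(buckets.items())]


def select_period(hourly, start, end):
    """Keep the hourly terms whose hour h satisfies start <= h <= end."""
    return [(h, v) for h, v in hourly if start <= h <= end]


samples = [
    (datetime(2021, 9, 1, 0, 0, 1), 10.0),
    (datetime(2021, 9, 1, 0, 30, 0), 20.0),
    (datetime(2021, 9, 1, 1, 0, 0), 30.0),
]
hourly = hourly_mean(samples)
print([v for _, v in hourly])  # [15.0, 30.0]
```

Applying `select_period` with the training and test period bounds to each hourly sequence would yield the sequences that make up C_train and C_test.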
Step three: set the modeling configuration parameters and determine the model structure.
Here, two fitting term sets are set, corresponding to two models:
FI_1 = {X(h), Hum(h), Hum(h-1), Tem(h), 1}
FI_2 = {X(h), Hum(h), Hum(h-1)*Tem(h), Tem(h), 1}
The corresponding model structures are:
FI_1: E(h) = b_11*X(h) + b_12*Hum(h) + b_13*Hum(h-1) + b_14*Tem(h) + b_15
FI_2: E(h) = b_21*X(h) + b_22*Hum(h) + b_23*Hum(h-1)*Tem(h) + b_24*Tem(h) + b_25
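One way to represent such fitting term sets in software is sketched below, following the two model structures above. Each term is encoded as a tuple of (series name, delay) factors; this encoding and the names `term_value`, `design_row`, and `model_output` are assumptions made for illustration, not part of the original text.

```python
from math import prod

# Each fitting term is a tuple of (series_name, delay_in_hours) factors;
# the empty tuple stands for the constant term 1.
FI_1 = [(("x", 0),), (("hum", 0),), (("hum", 1),), (("tem", 0),), ()]
FI_2 = [(("x", 0),), (("hum", 0),), (("hum", 1), ("tem", 0)), (("tem", 0),), ()]


def term_value(term, series, h):
    """Value of one fitting term at hour index h (product of its factors).

    The caller must ensure h is at least the largest delay in the term.
    """
    return prod((series[name][h - delay] for name, delay in term), start=1.0)


def design_row(fitting_terms, series, h):
    """Row of matrix A for hour h: one entry per fitting term."""
    return [term_value(t, series, h) for t in fitting_terms]


def model_output(fitting_terms, coeffs, series, h):
    """E(h) as the coefficient-weighted sum of the fitting terms."""
    return sum(b * a for b, a in zip(coeffs, design_row(fitting_terms, series, h)))


series = {"x": [10.0, 12.0], "hum": [40.0, 50.0], "tem": [20.0, 21.0]}
print(design_row(FI_1, series, 1))  # [12.0, 50.0, 40.0, 21.0, 1.0]
print(design_row(FI_2, series, 1))  # [12.0, 50.0, 840.0, 21.0, 1.0]
```

Stacking `design_row` over all usable hours of the training set gives a matrix with one column per fitting term, so adding or removing a term only changes the term list.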
Step four: construct the matrices
For the
Figure BDA0003868674590000161
Figure BDA0003868674590000162
For
Figure BDA0003868674590000163
Figure BDA0003868674590000164
Step five: obtain the alternative estimation models
The fitting coefficient vectors are calculated according to the formulas:
B_1 = (A_1^T A_1)^{-1} A_1^T R_1
B_2 = (A_2^T A_2)^{-1} A_2^T R_2
Step six: model evaluation
Using the test set:
For the
Figure BDA0003868674590000165
Figure BDA0003868674590000166
For the
Figure BDA0003868674590000167
Figure BDA0003868674590000168
Calculate the errors:
Figure BDA0003868674590000169
Figure BDA0003868674590000171
If [Figure BDA0003868674590000172], select the model [Figure BDA0003868674590000173] as the preferred estimation model; otherwise, select the model [Figure BDA0003868674590000174] as the preferred estimation model.
Step seven: apply the model
If the model [Figure BDA0003868674590000175] is the preferred estimation model, the sensor's estimated output of the ambient PM10 concentration is:
E_PM(t) = b_11*X(t) + b_12*Hum(t) + b_13*Hum(t-1) + b_14*Tem(t) + b_15
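Applying this output formula to a stream of sensor samples requires keeping the previous humidity sample for the Hum(t-1) term. A minimal sketch is given below; the class name `StreamingEstimator`, the choice to return no estimate for the first sample, and the coefficient values are all hypothetical.

```python
class StreamingEstimator:
    """Applies E_PM(t) = b1*X(t) + b2*Hum(t) + b3*Hum(t-1) + b4*Tem(t) + b5 to a stream."""

    def __init__(self, coeffs):
        self.b1, self.b2, self.b3, self.b4, self.b5 = coeffs
        self.prev_hum = None  # Hum(t-1); unknown until the first sample arrives

    def update(self, x, hum, tem):
        """Return the estimate for the current sample, or None for the very first sample."""
        if self.prev_hum is None:
            self.prev_hum = hum
            return None
        estimate = (self.b1 * x + self.b2 * hum + self.b3 * self.prev_hum
                    + self.b4 * tem + self.b5)
        self.prev_hum = hum
        return estimate


est = StreamingEstimator((1.0, 0.5, -0.5, 0.0, 2.0))  # hypothetical coefficients
print(est.update(40.0, 60.0, 25.0))  # None
print(est.update(42.0, 64.0, 25.0))  # 46.0
```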
The result of this example is shown in figure 5. With the proposed method, the ambient PM10 concentration estimated by the sensor follows the measurement trend of the local standard monitoring station more closely, especially when the relative humidity changes suddenly or is high.
An environmental particulate matter concentration estimation system based on sensor data, which applies the above environmental particulate matter concentration estimation method based on sensor data, comprises, as shown in fig. 6, a data sampling unit 1, a data processing unit 2, and a data output unit 3;
the data sampling unit 1 is used for selecting the training time period and the test time period and for sampling the original environmental data output by the sensor;
the data processing unit 2 is used for performing the calculations of the first to seventh steps;
the data output unit 3 is used for outputting the calculation data of the data processing unit;
by means of the method, processing the sensor data at the software level can effectively improve the sensor's ability to measure the environmental particulate matter concentration.
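The division into three units can be sketched as three small classes wired in sequence. This is only a structural illustration under simplifying assumptions: the processing unit is reduced to applying an already fitted model, and all class names, method names, and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[float, float, float]  # (X(t), Hum(t), Tem(t))


@dataclass
class DataSamplingUnit:
    """Holds the chosen periods and collects raw sensor samples."""
    train_period: Tuple[str, str]
    test_period: Tuple[str, str]

    def sample(self, source: Iterable[Sample]) -> List[Sample]:
        return list(source)


@dataclass
class DataProcessingUnit:
    """Runs the estimation model; here the model is any callable on a sample."""
    model: Callable[[Sample], float]

    def process(self, samples: List[Sample]) -> List[float]:
        return [self.model(s) for s in samples]


@dataclass
class DataOutputUnit:
    """Collects the processed values; a real unit might transmit or display them."""
    outputs: List[float] = field(default_factory=list)

    def emit(self, values: List[float]) -> None:
        self.outputs.extend(values)


sampling = DataSamplingUnit(("T1", "T2"), ("T3", "T4"))  # placeholder period labels
processing = DataProcessingUnit(model=lambda s: 1.0 * s[0] + 0.0 * s[1] + 0.0 * s[2] + 2.0)
output = DataOutputUnit()

output.emit(processing.process(sampling.sample([(40.0, 60.0, 25.0), (42.0, 64.0, 25.0)])))
print(output.outputs)  # [42.0, 44.0]
```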
An ambient particulate matter concentration estimation terminal based on sensor data comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method when executing the computer program.
A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method as set forth above.
The computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium such as a ROM/RAM, a magnetic disk, an optical disk, or a flash memory.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A method of estimating ambient particulate matter concentration based on sensor data, comprising the steps of:
the first step: selecting a training time period and a test time period;
the second step: extracting the original environmental data output by a sensor to generate a training time sequence set and a test time sequence set with hours as the time interval, wherein the samples of the training time sequence set are taken from the training time period and the samples of the test time sequence set are taken from the test time period;
the third step: setting modeling configuration parameters and determining the regression modeling structure:
selecting a plurality of different fitting term sets, inputting the training time sequence set data into the composition formula corresponding to each selected fitting term set to obtain a plurality of different fitting term sets, and obtaining a plurality of alternative estimation model structures according to the different fitting term sets and the fitting coefficient vectors corresponding to the fitting term sets;
the fourth step: constructing a plurality of matrices according to the different fitting term sets and the fitting coefficient vectors corresponding to the fitting term sets;
the fifth step: solving the fitting coefficient vector corresponding to each matrix according to the obtained matrices and the vector calculation formula, and substituting each solved fitting coefficient vector into the corresponding alternative estimation model structure to obtain a plurality of alternative estimation models;
the sixth step: inputting the test time sequence set into each alternative estimation model, testing the error value of each alternative estimation model, and selecting the alternative estimation model with the minimum error value as the preferred estimation model;
the seventh step: inputting the second-level data of the sensor into the preferred estimation model to obtain the sensor's real-time estimated output value of the environmental particulate matter concentration.
2. The sensor-data-based ambient particulate matter concentration estimation method of claim 1, wherein:
in the first step, the training time period is denoted [T_1, T_2] and the test time period is denoted [T_3, T_4];
in the second step, the PM2.5 or PM10 time sequence originally output by the sensor is denoted X(t); the humidity time sequence originally output by the humidity sensing module is denoted Hum(t); the temperature time sequence originally output by the temperature sensing module is denoted Tem(t); t denotes second-level time;
the training time sequence set is C_train = {X_1(h), Hum_1(h), Tem_1(h), R_1(h)}, wherein:
X_1(h): after all original samples of X(t) are merged by hourly mean into the sequence X(h), the terms satisfying h ∈ [T_1, T_2] are taken to constitute X_1(h);
Hum_1(h): after all original samples of Hum(t) are merged by hourly mean into the sequence Hum(h), the terms satisfying h ∈ [T_1, T_2] are taken to constitute Hum_1(h);
Tem_1(h): after all original samples of Tem(t) are merged by hourly mean into the sequence Tem(h), the terms satisfying h ∈ [T_1, T_2] are taken to constitute Tem_1(h);
R_1(h): the reference calibration sequence of the training time sequence set, with h ∈ [T_1, T_2];
the test time sequence set is C_test = {X_2(h), Hum_2(h), Tem_2(h), R_2(h)}, wherein:
X_2(h): after all original samples of X(t) are merged by hourly mean into the sequence X(h), the terms satisfying h ∈ [T_3, T_4] are taken to constitute X_2(h);
Hum_2(h): after all original samples of Hum(t) are merged by hourly mean into the sequence Hum(h), the terms satisfying h ∈ [T_3, T_4] are taken to constitute Hum_2(h);
Tem_2(h): after all original samples of Tem(t) are merged by hourly mean into the sequence Tem(h), the terms satisfying h ∈ [T_3, T_4] are taken to constitute Tem_2(h);
R_2(h): the reference calibration sequence of the test time sequence set, with h ∈ [T_3, T_4].
3. The sensor-data-based ambient particulate matter concentration estimation method according to claim 2, wherein in the third step, the fitting term sets FI comprise a plurality of the following: a first fitting term set FI_0, a time-delay fitting term set FI_d, and a supplementary fitting term set FI_c;
first fitting term set FI_0: the fitting terms that the model must contain at least, defined as:
FI_0 = {X(h), Hum(h), Tem(h), 1};
the first fitting term set corresponds to the most basic multiple linear regression modeling:
E(h) = a*X(h) + b*Hum(h) + c*Tem(h) + d;
time-delay fitting term set FI_d: on the basis of the first fitting term set FI_0, earlier historical data is added to the fitting term set, defined as:
FI_d = {X(h), X(h-1) … Hum(h), Hum(h-1) … Tem(h), Tem(h-1) … 1};
the regression modeling corresponding to the time-delay fitting term set is:
E(h) = a_0*X(h) + b_0*Hum(h) + b_1*Hum(h-1) + c_0*Tem(h) + c_1*Tem(h-1) + d;
supplementary fitting term set FI_c: on the basis of the time-delay fitting term set FI_d, quadratic terms are added to improve the fit to details, wherein a quadratic term is the square of any first-order term and/or the product of two different first-order terms;
an alternative model M that uses the fitting term set FI and the fitting coefficient vector B = [b_1, b_2, b_3 … b_L]^T, where L = |FI| denotes the number of terms in FI, is denoted M_{FI,B}; writing FI as FI = {a_i(h) | 1 ≤ i ≤ L, L = |FI|}, the structure of the alternative model M_{FI,B} is:
Figure FDA0003868674580000031
4. The sensor-data-based ambient particulate matter concentration estimation method according to claim 3, characterized in that in the fourth step:
parameter estimation is carried out by least-squares fitting; a matrix is constructed according to the fitting term set FI and the fitting coefficient vector B, giving the overdetermined system of equations:
AB = R;
wherein:
Figure FDA0003868674580000041
B = [b_1, b_2 … b_L]^T
R = [R_1(h), R_1(h-1) … R_1(h-N)]^T
the values in matrix A and vector R can be obtained directly, or by calculation, from the sequences of the same name in the training time sequence set C_train.
5. The sensor-data-based ambient particulate matter concentration estimation method according to claim 4, characterized in that in the fifth step:
the formula for calculating vector B is:
B = (A^T A)^{-1} A^T R;
substituting vector B into the structural formula of the alternative model M_{FI,B} gives the alternative estimation model M_{FI,B} under the fitting term set FI.
6. The sensor-data-based ambient particulate matter concentration estimation method according to claim 5, characterized in that in the sixth step:
under a fitting term set FI, the model M_{FI,B} is obtained on the training time sequence set C_train, and the estimated output E(h) on the training time sequence set C_train is calculated and denoted:
E_1(h) = M_{FI,B}(X_1(h), Hum_1(h), Tem_1(h));
the estimated output of the model M_{FI,B} on the test time sequence set C_test is:
E_2(h) = M_{FI,B}(X_2(h), Hum_2(h), Tem_2(h));
the error of the model M_{FI,B} on the test time sequence set C_test can then be defined as Err(M_{FI,B}, C_test), calculated as:
Figure FDA0003868674580000051
wherein P is the maximum number of samples that the current model M_{FI,B} and the test time sequence set C_test can support; w(·) is a weighting function defined as:
Figure FDA0003868674580000052
Figure FDA0003868674580000053
represents the probability density function of the sequence R_2(h);
if there are k alternative estimation models, the preferred estimation model is the one with the smallest error on the test time sequence set C_test:
arg min Err(M_{FI(k),B(k)}, C_test).
7. The sensor-data-based ambient particulate matter concentration estimation method according to claim 6, characterized in that in the seventh step:
replacing the hour-level data used for training and testing in the preferred estimation model with the second-level data of the sensor gives the sensor's real-time estimated output of the environmental particulate matter concentration:
E_PM(t) = M_{FI,B}(X(t), Hum(t), Tem(t)).
8. An environmental particulate matter concentration estimation system based on sensor data, which is applied to the environmental particulate matter concentration estimation method based on sensor data according to any one of claims 1 to 7, and is characterized by comprising a data sampling unit, a data processing unit and a data output unit;
the data sampling unit is used for selecting a training time period and a testing time period and sampling the original environment data output by the sensor;
the data processing unit for performing the calculations as in the first to seventh steps;
and the data output unit is used for outputting the calculated data of the data processing unit.
9. An ambient particulate matter concentration estimation terminal based on sensor data, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor realizes the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 7.
CN202211189210.0A 2022-09-28 2022-09-28 Environmental particulate matter concentration estimation method and system based on sensor data and terminal Pending CN115618190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211189210.0A CN115618190A (en) 2022-09-28 2022-09-28 Environmental particulate matter concentration estimation method and system based on sensor data and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211189210.0A CN115618190A (en) 2022-09-28 2022-09-28 Environmental particulate matter concentration estimation method and system based on sensor data and terminal

Publications (1)

Publication Number Publication Date
CN115618190A true CN115618190A (en) 2023-01-17

Family

ID=84861671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211189210.0A Pending CN115618190A (en) 2022-09-28 2022-09-28 Environmental particulate matter concentration estimation method and system based on sensor data and terminal

Country Status (1)

Country Link
CN (1) CN115618190A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115792067A (en) * 2023-02-07 2023-03-14 河北对外经贸职业学院 Computer data analysis method based on toxicity detection of industrial bactericide
CN115792067B (en) * 2023-02-07 2023-04-14 河北对外经贸职业学院 Computer data analysis method based on toxicity detection of industrial bactericide

Similar Documents

Publication Publication Date Title
CN116879297B (en) Soil moisture collaborative inversion method, device, equipment and medium
CN113901384A (en) Ground PM2.5 concentration modeling method considering global spatial autocorrelation and local heterogeneity
CN115618190A (en) Environmental particulate matter concentration estimation method and system based on sensor data and terminal
KR20200003664A (en) Apparatus and method for generating correction logic of air quality data
CN117540174B (en) Building structure multi-source heterogeneous data intelligent analysis system and method based on neural network
CN109191408B (en) Rapid circulation ground weather fusion method and device and server
CN111582387A (en) Rock spectral feature fusion classification method and system
WO2020255305A1 (en) Prediction model re-learning device, prediction model re-learning method, and program recording medium
CN110553631A (en) water level measurement series error analysis method about water level flow relation
CN110516890B (en) Crop yield monitoring system based on gray combined model
Dong et al. Prognostics 102: efficient Bayesian-based prognostics algorithm in Matlab
Errico et al. Some general and fundamental requirements for designing observing system simulation experiments (OSSEs)
CN111879709A (en) Method and device for detecting spectral reflectivity of lake water body
CN115855133A (en) Calibration method and device of sensor, computer equipment and readable storage medium
US20220188307A1 (en) Data analysis apparatus, method and system
Rischard et al. Bias correction in daily maximum and minimum temperature measurements through Gaussian process modeling
CN114624791A (en) Rainfall measurement method and device, computer equipment and storage medium
Fleming et al. Sensitivity of a white‐tailed deer habitat‐suitability index model to error in satellite land‐cover data: implications for wildlife habitat‐suitability studies
CN113191536A (en) Near-ground environment element prediction model training and prediction method based on machine learning
RU2714039C1 (en) Smart sensor development system
CN111123406A (en) Handheld meteorological instrument temperature data fitting method
Wadoux et al. Optimization of rain gauge sampling density for river discharge prediction using Bayesian calibration
CN116879121B (en) Air particulate matter concentration real-time monitoring system based on optical fiber sensing technology
CN115183805B (en) Instrument automatic metrological verification method and system based on artificial intelligence
CN117648537B (en) Atmospheric pollution real-time monitoring method and system based on hyperspectral technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination