AU2018100221A4 - A correction method based on linear regression algorithm for PM2.5 sensors - Google Patents

A correction method based on linear regression algorithm for PM2.5 sensors

Info

Publication number
AU2018100221A4
Authority
AU
Australia
Prior art keywords
data
value
sensor
linear regression
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2018100221A
Inventor
Yinan Feng
Taifu Li
Yifu Qiao
Weiyi Shi
Hao Wu
Ziying Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhou Ziying Miss
Original Assignee
Zhou Ziying Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhou Ziying Miss
Priority to AU2018100221A
Application granted
Publication of AU2018100221A4
Assigned to WU, HAO; ZHOU, ZIYING; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; FENG, YINAN. Request to Amend Deed and Register. Assignors: FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; WU, HAO; ZHOU, ZIYING
Legal status: Ceased

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

With a booming population, growing use of transportation vehicles and the establishment of factories, society is moving forward at an unprecedented rate. These advancements and innovations have greatly improved the lives of people in the 21st century, but many serious environmental and health issues have emerged. One of them is PM2.5, atmospheric particulate matter with a diameter of 2.5 micrometers or less. This document presents a method for correcting PM2.5 sensors based on a linear regression algorithm. Using accurate data from a PM2.5 monitoring station close to the sensor, the method applies linear regression to correct the sensor's PM2.5 output so that it is consistent with the accurate value from the monitoring station, making the PM2.5 sensor more accurate and reliable. The method is simple, precise and practical.

Description

DESCRIPTION
Title
A correction method based on linear regression algorithm for PM2.5 sensors
FIELD OF THE INVENTION
This invention belongs to the field of detection equipment technology. In particular, it is a method of correcting and adjusting PM2.5 sensors using a linear regression algorithm.
BACKGROUND OF THE INVENTION
With the spread of industrialization, one problem that constantly troubles us is the increase in haze. In the north of China in particular, many citizens have suffered from the hazards of haze. According to an analysis by the China Meteorological Administration, an area of over 101 million square kilometers in the north of China, the Yellow River-Huai River valley and the Changjiang-Huaihe basin was covered by haze in December 2016, and 23 cities including Beijing and Tianjin issued a red alert. Nearly 50 expressways were also closed. In response, the Chinese government has required that PM2.5, the main cause of haze, be included in daily air quality monitoring.
This brings another problem for the public. Because accurate and precise devices are scarce, ordinary people can hardly pinpoint the amount of PM2.5 with general detection equipment. Humidity, temperature, sunlight and many other factors can affect the detection result. It is therefore necessary to apply a method that corrects these devices so that they give a relatively precise index. One way to do this is to use the data collected at a nearby monitoring station as the true index. This patent uses a linear regression equation to relate the data from the detection equipment to the data from the monitoring station. With the fitted regression, subsequent sensor data can then be corrected.
Compared to other methods, linear regression is both convenient and valid. Although it is an estimate, once enough data is available it can predict the index accurately. Data from monitoring stations is open to the public, so anyone can conveniently acquire a large amount of data to estimate the true index.
SUMMARY OF THE INVENTION
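We start by preprocessing the PM2.5 data collected by the sensor and the real PM2.5 data collected by the monitoring station. Suppose we have a sample of data from the PM2.5 sensor, in which the i-th sample is represented as $x^{(i)}$. For any $x^{(i)} = (x_0^{(i)}, x_1^{(i)}, \ldots, x_m^{(i)})$, $i = 1, \ldots, n$, n is the data amount of the sample, m is the number of features and we assume that $x_0^{(i)} = 1$.
Besides, we construct a linear-regression model $h_\theta(x) = \sum_{j=0}^{m} \theta_j x_j$, where $\theta_j$ is the parameter of the j-th feature, $x_j$ is the value of the j-th feature and $h_\theta(x)$ is the value predicted by the linear-regression algorithm. We also construct a cost function, $J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$. The aim is to minimize the value of $J(\theta)$ and thereby find the value of $\theta$. $y^{(i)}$ is the value of the real data of the i-th sample.
By gradient descent, we get the update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, where $\alpha$ is the step size. From this function, we can calculate $\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$. We can get the value of $\theta_j$ from $\theta_j := \theta_j + \alpha \sum_{i=1}^{n} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$.
After that, we let $X = (x^{(1)}, \ldots, x^{(n)})^{T}$ and $Y = (y^{(1)}, \ldots, y^{(n)})^{T}$. Through the function $\theta := \theta + \alpha X^{T} (Y - X\theta)$, we can get the value of $\theta$.
Using a new sample $x^{(new)}$, we can calculate the prediction $h_\theta(x^{(new)})$ of this sample. The prediction is the shift value of the PM2.5 sensor, and from it we get the correct measured value of the PM2.5 sensor.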
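As a worked illustration of a single update of the form $\theta_j := \theta_j + \alpha\,(y - h_\theta(x))\,x_j$, which is the update applied in Step 2.2, consider one hypothetical sample. The numbers are assumptions chosen only for the arithmetic: a sensor reading $x_1 = 50$ with $x_0 = 1$, a station value $y = 40$, initial parameters $\theta_0 = \theta_1 = 1$ and step size $\alpha = 10^{-4}$.

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 x_0 + \theta_1 x_1 = 1\cdot 1 + 1\cdot 50 = 51,\\
y - h_\theta(x) &= 40 - 51 = -11,\\
\theta_0 &:= 1 + 10^{-4}\cdot(-11)\cdot 1 = 0.9989,\\
\theta_1 &:= 1 + 10^{-4}\cdot(-11)\cdot 50 = 0.945,\\
h_\theta(x) &= 0.9989 + 0.945\cdot 50 = 48.2489 .
\end{aligned}
```

After one step the prediction moves from 51 to about 48.25, closer to the station value of 40. Repeating the step over all samples is what drives the parameters toward the least-squares fit.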
DESCRIPTION OF DRAWINGS
Fig. 1 Original data
Fig. 2 The monitoring station data at Wanliu, Beijing
Fig. 3 The flow chart of data pre-processing
Fig. 4 Processed data
Fig. 5 The flow chart of gradient descent method
Fig. 6 Correction for the sensor data
DESCRIPTION OF PREFERRED EMBODIMENT
The haze sensor of this invention was placed in Wanliu, Beijing. Its sampling period is 15 seconds, and the original data is shown in Fig. 1. Real haze data is collected from the Wanliu monitoring station in Beijing and is shown in Fig. 2. The concrete implementation steps are as follows:
Step 1: The data pre-processing stage, as shown in Fig. 3, can be divided into three steps. In Steps 1.1 and 1.2, we process the PM2.5 sensor data and the Wanliu station data, and in Step 1.3 we integrate the processed data. The specific steps are as follows:
Step 1.1 This step is the PM2.5 sensor data processing. The sensor data is shown in Fig. 1. The step processes the data collected by the sensor from October 20 to October 31 and computes the hourly average haze value for each of those days. The specific steps are as follows:
Define the function ReadData(month, day, min, max), in which month and day are the month and day on which the data was collected, and min and max are the lower and upper limits of the hour. Initialize the PM2.5 sum PM25_sum = 0 and the counter num = 0.
Open the original sensor data file m3_y201710.txt, which is stored in Data/Original_SampleData. The data format is shown in Fig. 1.
Store the original data in different files according to month and date. For every line of the original data, open or create the corresponding file under the directory /Data/PM25, named in the form month_day.
To compute the hourly haze value for each day and store it in the corresponding file month_day, we split every line of the data on the space character and take the second term, which is the year/month/date string. We then split that string on '/', take the third term and apply the int function to get the value of the date. Next we check the data: if the date value equals day and the hour is greater than or equal to min and less than max, we go to the next step. If the value of the data is not 0, add it to PM25_sum and add 1 to num. If the data is not within the set range, there are two cases. In the first case, num is not equal to 0, which shows that the data for the period has been processed: compute the average PM2.5 value PM25_sum/num, write it into the file month_day and break the loop.
In the second case, num is equal to 0, which shows that the data for this period is missing or that no eligible data has been read yet: continue the loop. The code is shown below.
    def ReadData(month, day, min, max):
        PM25_sum = 0
        num = 0
        Original_data = open("../Data/Original_SampleData/m3_y201710.txt")
        for line in Original_data:
            ...
            else:
                ...
                else:
                    continue
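The hourly averaging described in Step 1.1 can be sketched as follows. This is a minimal illustration, not the original ReadData body. The line layout (device id, date, time, value, separated by spaces), the function name hourly_means and the sample lines are all assumptions, since Fig. 1 is not reproduced here. Only the rules stated in the text are taken from the source: the second field is the year/month/date string, and zero readings are skipped.

```python
from collections import defaultdict

def hourly_means(lines):
    """Average non-zero PM2.5 readings per (month, day, hour).

    Assumed (hypothetical) line layout, space separated:
        <device-id> <YYYY/MM/DD> <HH:MM:SS> <pm25>
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        _, month, day = (int(p) for p in fields[1].split("/"))
        hour = int(fields[2].split(":")[0])
        value = float(fields[3])
        if value == 0:
            continue  # zero readings are not counted, as in Step 1.1
        key = (month, day, hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical readings: three in the 08:00 hour (one of them zero), one at 09:00.
sample = [
    "m3 2017/10/20 08:00:15 40",
    "m3 2017/10/20 08:00:30 60",
    "m3 2017/10/20 08:00:45 0",
    "m3 2017/10/20 09:00:00 30",
]
means = hourly_means(sample)
```

Grouping by key in a single pass avoids re-reading the file once per hour, which is one possible simplification of the loop-and-break structure described above.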
Step 1.2 This step is the Wanliu station data processing. The Wanliu station data serves as the accurate reference data. It is stored in CSV files. From Fig. 2 we can see that several lines concern PM10 or AQI, and that there is also information for areas other than Wanliu.
Therefore, the main task of Step 1.2 is to collect only the PM2.5 information from Oct 21st to Oct 31st, which is called screening. The specific steps are as follows.
We define a function called Wanliupm25, whose parameters are month, dmin and dmax. dmin and dmax refer to the earliest and the latest date and are not included in the dates for which we want to collect data. In our case, we call Wanliupm25(10, 20, 32), meaning that we want to process the files from Oct 21st to Oct 31st.
According to the dates we have set, the appropriate CSV files are opened one by one in a loop. The first line of each CSV file is the header, which includes the names of the locations. Using the islice function from the itertools library, we skip the header so that the loop starts from the second line of the file.
We create another loop that scans all the lines in the file. Using the split function, we split each line on the comma character and collect the resulting elements in a list called mm.
The third column of the file is the type of pollutant, and the tenth column stores the values for the Wanliu area. If mm[2], the third element of a line, equals "PM2.5" and mm[9], the tenth element of the line, is not empty, we have found the target information. We write mm[1], the time of day, and mm[9] into a newly created file named according to the date. There will therefore be 11 files, one for each of the 11 dates.
The code is shown below.
    from itertools import islice

    def Wanliupm25(month, dmin, dmax):
        for j in range(dmin, dmax):
            original_file = open(r'Data\Wanliu\beijing_all_201710' + str(j) + '.csv', 'r', encoding='UTF-8')
            # The output file is named in the form month_day.
            f = open('Data/WanliuPM25/' + str(month) + '_' + str(j), 'a')
            # Skip the header row and scan the remaining lines.
            for line in islice(original_file, 1, None):
                mm = line.split(',')
                for i in range(25):
                    # Keep PM2.5 rows whose Wanliu column (index 9) is not empty.
                    if (int(mm[1]) == i and mm[2] == 'PM2.5' and mm[9] != ''):
                        f.write(mm[1] + '\t' + mm[9] + '\n')
            f.close()
            original_file.close()

    if __name__ == '__main__':
        Wanliupm25(10, 20, 32)
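The same screening can be tried on in-memory data with the standard csv module, which may be easier to test than reading files from disk. This is a sketch under assumptions: the function name station_pm25, the header names and the sample rows are hypothetical, and only the column positions (hour in column 2, pollutant type in column 3, Wanliu in column 10) follow the description above.

```python
import csv
import io

def station_pm25(csv_text, station_column=9):
    """Return (hour, value) pairs for PM2.5 rows with a non-empty station value."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if row[2] == "PM2.5" and row[station_column] != "":
            rows.append((int(row[1]), float(row[station_column])))
    return rows

# Hypothetical extract: one AQI row and one PM2.5 row with a missing station value.
csv_text = (
    "date,hour,type,s1,s2,s3,s4,s5,s6,wanliu\n"
    "20171021,0,PM2.5,1,2,3,4,5,6,35\n"
    "20171021,0,AQI,1,2,3,4,5,6,60\n"
    "20171021,1,PM2.5,1,2,3,4,5,6,\n"
    "20171021,2,PM2.5,1,2,3,4,5,6,41\n"
)
rows = station_pm25(csv_text)
```

Using csv.reader rather than a plain split also strips the line ending from the last column, so an empty final field compares equal to the empty string.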
Step 1.3 This step is the combination of the data. Through Step 1.1 and Step 1.2, we have obtained 11 files of original data and 11 files of Wanliu station data. The files are divided according to date. However, the regression step takes two input files: the training set and the testing set. The main task of this step is to combine the two groups of data into these sets.
We use the original and station data from Oct 21st to Oct 30th as the training set, and the data from Oct 31st as the testing set. Two newly created files called "com" and "test" store the training data and the testing data respectively, selected with a conditional statement.
In the loop that scans all the days, for each day we open the corresponding files in the original and Wanliu folders. We set up two empty lists, t and m, to store the contents of the two files. We use the split function to split each line on the tab character into a list of elements, and we use the append function to add each new element, here the time and the numerical PM2.5 value, to the list.
After filling the lists t and m, we create a nested loop that scans the elements of both lists. The indices i and j range over the number of lines in list t and list m respectively. When m[j].split('\t')[0], the time in the Wanliu data, equals t[i].split('\t')[0], the time in the original data, we write the pair into the appropriate file. According to the date, we write it to either the training set or the testing set.
The form of a single line written to the file is: the number "1", followed by the original sensor value, then the Wanliu value. The three numbers are separated by tabs. It is shown in Fig. 4.
The code is shown below.
    f_com = open(r'Data\COMData\com', 'a')
    f_test = open(r'Data\COMData\test', 'a')
    for filename in range(20, 32):
        # Days before the 31st go to the training set; the 31st is the testing set.
        out = f_com if filename != 31 else f_test
        sampledata = open(r'Data\PM25\PM2510_' + str(filename))
        wanliudata = open(r'Data\WanliuPM25\10_' + str(filename))
        t = []
        m = []
        for sampleline in sampledata:
            sample_m = sampleline.split('\t')
            t.append(sample_m[0] + '\t' + sample_m[1])
        for wanliuline in wanliudata:
            wanliu_m = wanliuline.split('\t')
            m.append(wanliu_m[0] + '\t' + wanliu_m[1])
        for i in range(len(t)):
            for j in range(len(m)):
                # Match on the hour; write the constant 1, the sensor value and the station value.
                if m[j].split('\t')[0] == t[i].split('\t')[0]:
                    out.write('1' + '\t' + t[i].split('\t')[1].strip('\n') + '\t' + m[j].split('\t')[1])
    f_com.close()
    f_test.close()
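The matching on the hour can also be expressed as a dictionary lookup instead of a nested loop. The sketch below is an alternative formulation, not the code of this step. The function name merge_by_hour and the sample values are hypothetical. The output order (constant 1, sensor value, station value) follows what the code in this step writes.

```python
def merge_by_hour(sensor_rows, station_rows):
    """Join two lists of (hour, value) pairs on the hour.

    Returns rows of (1.0, sensor_value, station_value); hours missing
    from either side are dropped.
    """
    station = dict(station_rows)
    return [(1.0, value, station[hour])
            for hour, value in sensor_rows if hour in station]

# Hypothetical hourly values: hour 1 has no station value, hour 2 has no sensor value.
sensor = [(0, 52.0), (1, 48.0), (3, 61.0)]
station = [(0, 35.0), (2, 41.0), (3, 44.0)]
merged = merge_by_hour(sensor, station)
```

With 24 hourly values per day either approach is fast. The dictionary version mainly makes it harder to pair a sensor value with the wrong station index.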
Step 2 Use the gradient descent method described above to find the parameters of the linear regression model. The specific steps are as follows:
Step 2.1 Read the data. Define the function loadDataSet(fileName):
Split the first line of the file on '\t' and count the elements. Set numFeat to the number of elements minus 1 to get the number of feature fields.
Create three empty lists: dataMat, labelMat and lineArr. Split every line on '\t'; curLine holds the elements of that line. Append the values of the first numFeat elements of the line (indices 0 to numFeat-1) to lineArr, then append lineArr to dataMat. dataMat contains the haze data collected from the haze sensor.
Append the value of the last element of each line, float(curLine[-1]), to labelMat. This gives a list that contains the haze data from the Wanliu haze monitoring station. The code is shown below:
    def loadDataSet(fileName):
        # General function to parse tab-delimited floats.
        numFeat = len(open(fileName).readline().split('\t')) - 1  # get number of fields
        print(numFeat)
        dataMat = []
        labelMat = []
        fr = open(fileName)
        for line in fr.readlines():
            lineArr = []
            curLine = line.strip().split('\t')
            for i in range(numFeat):
                lineArr.append(float(curLine[i]))
            dataMat.append(lineArr)
            labelMat.append(float(curLine[-1]))
        return dataMat, labelMat
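For comparison, the same split into features and labels can be done with numpy.loadtxt. This is a sketch assuming the combined file has the tab-separated layout written in Step 1.3; the three sample rows are hypothetical values, and reading from io.StringIO stands in for opening the real file.

```python
import io
import numpy as np

# Hypothetical rows in the combined layout: constant 1, sensor value, station value.
text = "1\t52.0\t35.0\n1\t61.0\t44.0\n1\t70.0\t50.0\n"

data = np.loadtxt(io.StringIO(text), delimiter="\t", ndmin=2)
X = data[:, :-1]   # feature columns: the constant term and the sensor reading
y = data[:, -1]    # label column: the station reading
```

Here X plays the role of dataMat and y the role of labelMat, already as arrays rather than nested lists.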
Step 2.2 Use the gradient descent method to solve for the model parameters, as shown in Fig. 5. Define the gradient descent function gradAscent(dataMatIn, classLabels):
dataMatIn and classLabels are the dataMat and labelMat returned by loadDataSet(fileName) respectively.
Set dataMatrix and labelMat to the matrix form of dataMatIn and the transpose of classLabels respectively. Create weights, an n × 1 matrix of ones, where n is the number of columns in dataMatrix. Execute the loop maxCycles times:
Calculate the predicted haze value h = linearFunc(dataMatrix, weights) = dataMatrix × weights. Calculate the error, error = labelMat - h. Update the parameters, weights = weights + alpha × dataMatrix^T × error, in which alpha is the learning rate. The code is shown below:
    from numpy import asmatrix, shape, ones

    # alpha (the learning rate) and maxCycles (the number of iterations) are
    # module-level settings; their values are not given in this description.

    def linearFunc(dataMatrix, weights):
        # Linear model: predicted value = dataMatrix x weights.
        return dataMatrix * weights

    def gradAscent(dataMatIn, classLabels):
        dataMatrix = asmatrix(dataMatIn)  # asmatrix replaces the older mat alias
        labelMat = asmatrix(classLabels).transpose()
        m, n = shape(dataMatrix)
        weights = ones((n, 1))
        print(weights)
        print(shape(dataMatrix))
        for k in range(maxCycles):
            h = linearFunc(dataMatrix, weights)
            error = (labelMat - h)
            weights = weights + alpha * dataMatrix.transpose() * error
        return weights
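A self-contained version of the loop, using plain NumPy arrays, can be checked on synthetic data where the answer is known. This is a sketch rather than the code of this step. The function name fit_linear_gd, the step size alpha=0.1, the iteration count max_cycles=2000 and the synthetic line y = 2 + 0.5x are all assumptions for illustration. It also departs from the update in Step 2.2 in one respect: the gradient is averaged over the samples, so that the same step size works regardless of how many samples there are.

```python
import numpy as np

def fit_linear_gd(X, y, alpha=0.1, max_cycles=2000):
    """Batch gradient descent for least-squares linear regression.

    alpha and max_cycles are illustrative values, not taken from the source.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    weights = np.ones((X.shape[1], 1))  # start from all ones, as in Step 2.2
    for _ in range(max_cycles):
        error = y - X @ weights
        # Averaging over the samples keeps the step size independent of n.
        weights = weights + alpha * (X.T @ error) / len(y)
    return weights

# Synthetic data generated from y = 2 + 0.5 * x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x
weights = fit_linear_gd(X, y)
```

On real PM2.5 values, which can reach the hundreds, the step size usually has to be much smaller or the features rescaled, otherwise the updates can diverge.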
Step 3 For a new set of sensor data, we first process the data as in Step 1.1, then apply the linear regression model to obtain the corrected value of each measurement. In this way the haze sensor is corrected. Step 3 is shown in Fig. 6.
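Applying the fitted model in Step 3 amounts to evaluating the line at each new hourly sensor value. The sketch below assumes a single feature plus the constant term. The function name correct_reading, the parameter values and the readings are hypothetical.

```python
def correct_reading(weights, sensor_value):
    """Map a raw sensor reading to the corrected value theta_0 + theta_1 * x."""
    theta0, theta1 = weights
    return theta0 + theta1 * sensor_value

weights = (2.0, 0.5)          # hypothetical fitted parameters
hourly_sensor = [40.0, 60.0]  # hypothetical hourly sensor averages
corrected = [correct_reading(weights, v) for v in hourly_sensor]
```

The corrected values can then be compared with the station values for the held-out day to judge how well the correction generalizes.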

Claims (2)

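1. A correction method based on linear regression algorithm for PM2.5 sensors, including the following steps: suppose a sample of data from the PM2.5 sensor, in which the i-th sample is represented as $x^{(i)}$. For any $x^{(i)} = (x_0^{(i)}, x_1^{(i)}, \ldots, x_m^{(i)})$, $i = 1, \ldots, n$, n is the data amount of the sample, m is the number of features and we assume that $x_0^{(i)} = 1$.
2. The method according to claim 1, wherein we construct a linear-regression model $h_\theta(x) = \sum_{j=0}^{m} \theta_j x_j$, where $\theta_j$ is the parameter of the j-th feature, $x_j$ is the value of the j-th feature and $h_\theta(x)$ is the value predicted by the linear-regression algorithm. We also construct a cost function, $J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$. The aim is to minimize the value of $J(\theta)$ and thereby find the value of $\theta$. $y^{(i)}$ is the value of the real data of the i-th sample. By gradient descent, we get the update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, where $\alpha$ is the step size. From this function, we can calculate $\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$. We can get the value of $\theta_j$ from $\theta_j := \theta_j + \alpha \sum_{i=1}^{n} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$. After that, we let $X = (x^{(1)}, \ldots, x^{(n)})^{T}$ and $Y = (y^{(1)}, \ldots, y^{(n)})^{T}$. Through the function $\theta := \theta + \alpha X^{T} (Y - X\theta)$, we can get the value of $\theta$. Using a new sample $x^{(new)}$, we can calculate the prediction $h_\theta(x^{(new)})$ of this sample. The prediction is the shift value of the PM2.5 sensor, and from it we get the correct measured value of the PM2.5 sensor.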

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018100221A AU2018100221A4 (en) 2018-02-21 2018-02-21 A correction method based on linear regression algorithm for PM2.5 sensors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018100221A AU2018100221A4 (en) 2018-02-21 2018-02-21 A correction method based on linear regression algorithm for PM2.5 sensors

Publications (1)

Publication Number Publication Date
AU2018100221A4 (en) 2018-03-29

Family

ID=61693396

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018100221A Ceased AU2018100221A4 (en) 2018-02-21 2018-02-21 A correction method based on linear regression algorithm for PM2.5 sensors

Country Status (1)

Country Link
AU (1) AU2018100221A4 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447373A (en) * 2018-11-16 2019-03-08 上海海事大学 Haze method is predicted based on the LSTM neural network of python platform
CN110210681A (en) * 2019-06-11 2019-09-06 西安电子科技大学 A kind of prediction technique of the monitoring station PM2.5 value based on distance
CN110210681B (en) * 2019-06-11 2023-06-27 西安电子科技大学 Prediction method of PM2.5 value of monitoring station based on distance
CN111210081A (en) * 2020-01-09 2020-05-29 中国人民解放军国防科技大学 Bi-GRU-based PM2.5 data processing and prediction method
CN111256745A (en) * 2020-02-28 2020-06-09 芜湖职业技术学院 Data calibration method for portable air quality monitor
CN113297527A (en) * 2021-06-09 2021-08-24 四川大学 PM2.5 overall-domain space-time calculation inference method based on multisource city big data
CN113297527B (en) * 2021-06-09 2022-07-26 四川大学 PM2.5 overall-domain space-time calculation inference method based on multisource city big data

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS: AMEND THE NAME OF THE INVENTOR TO READ WU, HAO; FENG, YINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI AND ZHOU, ZIYING

HB Alteration of name in register

Owner name: ZHOU, Z.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

Owner name: QIAO, Y.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

Owner name: SHI, W.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

Owner name: LI, T.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

Owner name: WU, H.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

Owner name: FENG, Y.

Free format text: FORMER NAME(S): WU, HAO; FENG, WEINAN; LI, TAIFU; QIAO, YIFU; SHI, WEIYI; ZHOU, ZIYING

MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry