CN110426755B

CN110426755B - Ground temperature data quality control method based on improved inverse distance weighting

Info

Publication number: CN110426755B
Application number: CN201910686271.XA
Authority: CN
Inventors: 姚锦松; 叶小岭; 巩灿灿; 金瞳宇; 陈畅
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing Kebo Environmental Technology Co ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2021-09-14
Anticipated expiration: 2039-07-26
Also published as: CN110426755A

Abstract

The invention provides a method for quality control of surface air temperature data based on improved inverse distance weighting. The steps are as follows: S1, collect actual daily average temperature observations from the target station and from n reference stations within a certain range; S2, establish the original inverse distance weighting model and use it to estimate the daily average temperature of the target station; S3, establish an improved inverse distance weighting model for experimental prediction: calculate the first root mean square error (RMSE) between the estimated and actual daily average temperature of the target station; repeat S1 and S2 to calculate the second RMSE between the estimated and actual daily average temperature of the target station; repeat m times to obtain m groups of estimated values and the corresponding RMSEs, where m and n are obtained through PSO optimization; S4, assign a weight to each estimate according to its RMSE, and obtain the final estimate by multiplying the estimate of each experiment by the corresponding weight. The algorithm of the invention has better accuracy, stability and adaptability, and a smaller deviation.

Description

Ground temperature data quality control method based on improved inverse distance weighting

Technical Field

The invention belongs to the technical field of quality control of ground temperature data, and particularly relates to a ground temperature data quality control method based on improved inverse distance weighting.

Background

Meteorological data are the basis of meteorological research. To ensure that research truly reflects the weather and the rules governing its change, the data must undergo quality control before use, so that their authenticity and usability are ensured. In recent years, as countries have intensified research on climate change, many quality control methods for meteorological data have appeared. Traditional quality control methods mainly analyze the rationality of meteorological data using, as clues, the principles of meteorology and climatology, the temporal and spatial variation of each meteorological element, and the relationships among the elements. At present, these methods are widely applied to the quality control of ground meteorological data, and practical application shows that their combined effect is good.

As the spatial density of automatic station data increases, and especially as high-density precipitation, air temperature and wind data come into use, the data quality of single elements needs to be controlled effectively. For single-element data, or for newly built stations or measuring points, historical data and cross-element comparison are lacking, so the observations can only be quality-controlled by interpolation estimation. Many interpolation estimation methods exist, such as Inverse Distance Weighting (IDW) and Spatial Regression (SRT). The original IDW assigns weights mainly by the distance between the target station and each reference station: the closer a station, the larger its weight. As a result, it handles complex terrain and abnormal weather poorly. The present method improves on the original IDW so that the accuracy, stability and adaptability of the algorithm are markedly improved.

Disclosure of Invention

The invention aims to address the shortcomings of the original IDW by providing a ground temperature data quality control method based on improved inverse distance weighting that improves the accuracy, stability and adaptability of the algorithm.

In order to achieve the purpose, the invention adopts the following technical scheme:

A ground temperature data quality control method based on improved inverse distance weighting comprises the following steps:

S1, acquiring actual daily average temperature observation data of the target station and of n reference stations within a certain range;

S2, establishing the original inverse distance weighting model for experiment, and estimating the daily average temperature of the target station from the actual observation data of the reference stations to obtain a predicted value of the daily average temperature of the target station;

S3, establishing the improved inverse distance weighting model for experimental prediction:

calculating a first root mean square error (RMSE) between the estimated and actual daily average temperature of the target station;

repeating S1 and S2, randomly selecting n reference stations within the range of the target station, and calculating a second RMSE between the estimated and actual daily average temperature of the target station; repeating these steps m times to obtain m groups of estimated values and corresponding RMSEs;

S4, according to the RMSE values, assigning a weight to the estimate obtained in each experiment of S3, and multiplying the estimate of each experiment by its corresponding weight to obtain the final estimate of the daily average temperature of the target station.

In order to optimize the technical scheme, the specific measures adopted further comprise:

The reference stations are n stations randomly selected within 150 kilometers of the target station.

The formula of the original inverse distance weighting method model is as follows:

$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$

In the formula, $Z(s_0)$ is the predicted value at the target station $s_0$; $Z(i)$ is the actual observation at reference station $i$; $\lambda_i$ is the weight of reference station $i$ with respect to the target station; $n$ is the number of reference stations; $d_{i0}$ is the distance between reference station $i$ and the target station; $p$ defaults to 2.
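The inverse distance weighting estimate described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `idw_estimate`, the use of NumPy, and the two example stations are assumptions, and distances are assumed to be strictly positive.

```python
import numpy as np

def idw_estimate(distances_km, observations, p=2.0):
    """Estimate the target value as a distance-weighted mean of reference observations."""
    d = np.asarray(distances_km, dtype=float)
    z = np.asarray(observations, dtype=float)
    inv = d ** (-p)            # closer stations get larger raw weights
    lam = inv / inv.sum()      # normalise so the weights sum to 1
    return float(np.dot(lam, z))

# Two hypothetical reference stations, 1 km and 2 km from the target.
estimate = idw_estimate([1.0, 2.0], [10.0, 20.0])
```

With p = 2 the nearer station receives weight 0.8 and the farther one 0.2, so the estimate sits much closer to the nearer observation.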

The formula for calculating RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{r=1}^{t}\left(y_r - y_r'\right)^2}$$

In the formula, $y_r$ is the actual observed value on day $r$, $y_r'$ is the estimated value on day $r$, $r$ is the index of the observation day, and $t$ is the total number of days observed.
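As a small illustration of the root mean square error between actual and estimated daily values, the sketch below uses only the standard library. The function name `rmse` and the three-day example series are assumptions made for illustration.

```python
import math

def rmse(actual, estimated):
    """Root mean square error over t paired daily values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    t = len(actual)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, estimated))
    return math.sqrt(squared / t)

# Hypothetical three-day series: only the last day is mis-estimated, by 2 degrees.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error = rmse(observed, predicted)
```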

In step S4, the smaller the RMSE value, the larger the weight value of the corresponding estimated result.

The RMSE weighting method formula is as follows:

$$X' = \sum_{i=1}^{m} w_i Z_i, \qquad w_i = \frac{a^{r_i}}{\sum_{j=1}^{m} a^{r_j}}$$

where $X'$ is the final estimated result; $w_i$ is the weight coefficient of the $i$-th experiment; $Z_i$ is the estimated result of the $i$-th experiment; $a$ is the parameter value of the exponential function; $r_i$ is the root mean square error of the $i$-th experiment.

The formula of the exponential function is $y = a^x$, where $0 < a < 1$.
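The exponential weighting can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the weights are normalised to sum to 1 so that the combined result stays on the temperature scale, and the default a = 0.01 is the value the embodiment later selects.

```python
import numpy as np

def exponential_weights(rmses, a=0.01):
    """Weight each experiment by a**RMSE (0 < a < 1), normalised to sum to 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    raw = a ** np.asarray(rmses, dtype=float)   # smaller RMSE gives a larger raw weight
    return raw / raw.sum()

def combine(estimates, rmses, a=0.01):
    """Weighted sum of per-experiment estimates."""
    return exponential_weights(rmses, a) @ np.asarray(estimates, dtype=float)

# Two hypothetical experiments with RMSE 0.2 and 0.4.
weights = exponential_weights([0.2, 0.4])
final = combine([15.0, 17.0], [0.2, 0.4])
```

Because the experiment with the lower RMSE receives the larger weight, the combined value lies nearer to its estimate of 15.0 than to 17.0.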

The optimal number of experiment iterations m and the optimal number of random stations n are found using a particle swarm algorithm.

The data collected in S1 are actual daily average temperature observations for a full year.

The beneficial effects of the invention are as follows. Building on the original inverse distance weighting method, the idea of random forests is introduced, with each single inverse distance weighting run serving as a weak regressor in a combined prediction. An exponential function weighting method is introduced in which the estimation error (RMSE) of each inverse distance weighting run is the evaluation criterion: the smaller the RMSE, the larger the weight of the corresponding estimate, and the larger the RMSE, the smaller the weight. Experiments show that the accuracy, stability and adaptability of the method are better than those of the original inverse distance weighting method and the spatial regression interpolation method, and that the deviation of its experimental results is smaller.

Drawings

FIG. 1 is a flow diagram of the ID-IDW algorithm of the present invention.

FIG. 2 is a distribution map of the four target stations and the stations within 150 km around them according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating Particle Swarm Optimization (PSO) results according to an embodiment of the present invention.

FIG. 4 is a graph of the a-RMSE relationship for an embodiment of the present invention.

FIG. 5 is a graph of the results of RMSE and MAE indicators for an embodiment of the present invention.

FIG. 6 is a diagram showing ID-IDW result fluctuation according to an embodiment of the present invention.

Detailed Description

The present invention will now be described in further detail with reference to the accompanying drawings.

The invention discloses a ground temperature data quality control method based on improved inverse distance weighting. As shown in FIG. 1, the basic flow of the invention is as follows:

S1, acquiring actual daily average temperature observation data of the target station and of n reference stations within a certain range.

To verify the generalization capability of the algorithm in space and time, four different stations, Nanjing (Jiangsu), Wenjiang (Sichuan), Xintian (Hunan) and Taiyuan (Shanxi), are selected as target stations, and all stations within 150 kilometers of each are selected as reference stations. The distribution of the four stations and their surrounding stations is shown in FIG. 2. The data with Nanjing as target station are daily average temperatures for 2014; with Wenjiang, for 2005; with Xintian, for 2007; and with Taiyuan, for 2010.

S2, an original Inverse Distance Weighting (IDW) model is established for experiment, the daily average temperature of the target station is estimated according to actual observation data of the reference station, and a predicted value of the daily average temperature of the target station is obtained.

The formula of the original IDW model is as follows:

$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$

IDW performs quality control of a target station mainly on the basis of measured data from surrounding stations, with weights assigned by distance. If abnormal weather or erroneous data occurs at a reference station close to the target station, the accuracy of the IDW algorithm drops; this is the main disadvantage of IDW.
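A worked example with hypothetical numbers shows how strongly a single faulty nearby station can dominate. Assume two reference stations at 10 km and 50 km from the target, with p = 2. The weights are

$$\lambda_1 = \frac{10^{-2}}{10^{-2} + 50^{-2}} = \frac{25}{26} \approx 0.962, \qquad \lambda_2 = \frac{50^{-2}}{10^{-2} + 50^{-2}} = \frac{1}{26} \approx 0.038$$

If the true temperature is 15 °C at both stations but the nearer station erroneously reports 25 °C, the estimate becomes

$$Z(s_0) = \frac{25}{26}\cdot 25 + \frac{1}{26}\cdot 15 = \frac{640}{26} \approx 24.6\ ^\circ\mathrm{C}$$

so almost the entire 10 °C error of the near station is passed on to the target estimate.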

S3, establishing an improved inverse distance weighting method (ID-IDW) model for experimental prediction.

1) Randomly select n stations within 150 kilometers of the chosen target station as the reference stations for the 1st experiment, estimate the daily average temperature of the target station with the IDW algorithm, and calculate the root mean square error (RMSE) between the estimated and actual values at the target station. The RMSE is calculated according to the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{r=1}^{t}\left(y_r - y_r'\right)^2}$$

where $y_r$ is the actual observed value on day $r$, $y_r'$ is the estimated value on day $r$, and $r$ is the index of the observation day. This embodiment estimates the daily average temperature over a whole year, so the total number of days $t$ is 365.

Randomly select n stations again from the original data as reference stations for the 2nd experiment, and calculate the estimated values and RMSE for the target station. Repeat the experiment m times to obtain m groups of estimated values and RMSE values, and assign a weight to the result of each experiment according to its RMSE: the smaller the RMSE, the larger the weight, and the larger the RMSE, the smaller the weight. The final estimate of the daily average temperature at the target station is obtained by multiplying each group of experimental results by its corresponding weight.

The exponential function weighting method is as follows. From the graph of the exponential function $y = a^x$ ($0 < a < 1$), it can be seen that on the positive half of the x-axis, the smaller the value of x, the larger the value of y. Here x denotes the root mean square error, so the smaller the root mean square error, the greater the weight. The formula is as follows:

$$X' = \sum_{i=1}^{m} w_i Z_i, \qquad w_i = \frac{a^{r_i}}{\sum_{j=1}^{m} a^{r_j}}$$

where $X'$ is the final prediction result; $w_i$ is the weight coefficient of the $i$-th experiment; $Z_i$ is the estimated result of the $i$-th experiment; $a$ is the parameter value of the exponential function; $r_i$ is the root mean square error of the $i$-th experiment.
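Step 1) as a whole (random reference subsets, one IDW estimate and RMSE per subset, then an exponentially weighted combination) might look like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the function `id_idw`, the normalisation of the weights, the fixed values n = 10 and m = 20, and the synthetic sine-wave temperature data are all hypothetical stand-ins.

```python
import numpy as np

def id_idw(distances, ref_series, target_series, n, m, a=0.01, p=2.0, seed=0):
    """distances: (k,), ref_series: (k, t), target_series: (t,)."""
    rng = np.random.default_rng(seed)
    k = len(distances)
    estimates, errors = [], []
    for _ in range(m):
        idx = rng.choice(k, size=n, replace=False)   # random reference subset
        lam = distances[idx] ** (-p)
        lam /= lam.sum()
        est = lam @ ref_series[idx]                  # plain IDW estimate, shape (t,)
        estimates.append(est)
        errors.append(np.sqrt(np.mean((target_series - est) ** 2)))
    w = a ** np.array(errors)                        # smaller RMSE, larger weight
    w /= w.sum()
    return w @ np.array(estimates), np.array(errors)

# Synthetic stand-in data: 30 stations, one year of daily means.
rng = np.random.default_rng(42)
t, k = 365, 30
truth = 15 + 10 * np.sin(2 * np.pi * np.arange(t) / 365)
distances = rng.uniform(5, 150, size=k)
ref_series = truth + rng.normal(0, 0.5, size=(k, t)) + rng.normal(0, 1, size=(k, 1))

final, errors = id_idw(distances, ref_series, truth, n=10, m=20)
final_rmse = float(np.sqrt(np.mean((truth - final) ** 2)))
```

Because the final series is a convex combination of the m individual estimates, its RMSE cannot exceed the worst individual RMSE.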

2) Determination and selection of parameters m and n

As shown in FIG. 3, the optimal number of iterations m and the number of random stations n are found by particle swarm optimization (PSO). The particle swarm optimization method uses the following quantities:

A q-dimensional particle is represented by $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq})$, and each particle has a corresponding velocity $v_i = (v_{i1}, v_{i2}, v_{i3}, \ldots, v_{iq})$. During the search, each particle takes into account its own best position $p_i$ and the best position of the swarm $p_g$. $w$ is the inertia weight, $c_1$ is the weight coefficient with which the particle tracks its own historical best, $c_2$ is the weight coefficient with which the particle tracks the swarm best, $\delta$ and $\gamma$ are random numbers in the interval $[0, 1]$, and $r$ is a constraint factor.
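The symbols defined above match the standard particle swarm velocity and position update. One common way to write it is given here as an assumption rather than as the patent's own equation: the iteration index $k$ and dimension index $d$ do not appear in the text, and the placement of the constraint factor $r$ on the velocity term of the position update is assumed.

$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1\,\delta\,\left(p_{id} - x_{id}^{k}\right) + c_2\,\gamma\,\left(p_{gd} - x_{id}^{k}\right), \qquad x_{id}^{k+1} = x_{id}^{k} + r\,v_{id}^{k+1}$$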

Taking the Guangdong Zijin station as an example, the particle swarm optimization algorithm searches for the optimal solution over 1000 iterations, with m ranging from 5 to 200 and n from 5 to 20, and the root mean square error obtained by ID-IDW in each iteration is used as the fitness value. FIG. 3 shows the results of the 1000 iterations: the final RMSE stays at roughly 0.35, indicating convergence, and yields the final values of m and n. Following this procedure, the invention performs experiments on the four stations of Nanjing, Wenjiang, Xintian and Taiyuan, obtaining the corresponding values of m and n shown in Table 1.
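A particle swarm search over the ranges stated above (m from 5 to 200, n from 5 to 20) could be sketched as follows. Everything here is illustrative: the hyperparameters `w`, `c1`, `c2`, the swarm size and the iteration count are assumed values, the constraint factor is taken as 1, and `toy_fitness` is a hypothetical stand-in for the ID-IDW RMSE, with an arbitrary minimum at (120, 12) that does not come from the patent.

```python
import numpy as np

def pso_search(fitness, lower, upper, n_particles=20, n_iter=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness(m, n) over integer-rounded positions within the bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = np.zeros_like(x)
    p_best = x.copy()
    p_val = np.array([fitness(*np.rint(xi).astype(int)) for xi in x])
    g_best = p_best[p_val.argmin()].copy()
    for _ in range(n_iter):
        delta = rng.random(x.shape)                  # random factors in [0, 1)
        gamma = rng.random(x.shape)
        v = w * v + c1 * delta * (p_best - x) + c2 * gamma * (g_best - x)
        x = np.clip(x + v, lower, upper)             # keep particles inside the bounds
        val = np.array([fitness(*np.rint(xi).astype(int)) for xi in x])
        improved = val < p_val
        p_best[improved] = x[improved]
        p_val[improved] = val[improved]
        g_best = p_best[p_val.argmin()].copy()
    m, n = np.rint(g_best).astype(int)
    return int(m), int(n), float(p_val.min())

def toy_fitness(m, n):
    # Hypothetical stand-in for the ID-IDW RMSE; the minimum location is arbitrary.
    return 0.3 + ((m - 120) / 200) ** 2 + ((n - 12) / 20) ** 2

best_m, best_n, best_val = pso_search(toy_fitness, lower=[5, 5], upper=[200, 20])
```

In a real run the fitness call would execute the full ID-IDW procedure for the candidate (m, n) and return its RMSE, which makes each evaluation far more expensive than this toy function.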

TABLE 1

3) Value of a in the exponential weight

The invention uses an exponential function weighting method in which the smaller the root mean square error, the larger the weight; the value of a in the exponential function is determined through several groups of experiments. With the four stations of Nanjing, Wenjiang, Xintian and Taiyuan as target stations and 0 < a < 1, experiments are run with different values of a to obtain the curve of RMSE against a, from which the optimal value of a is judged. FIG. 4 shows the experimental results for the four stations, where the abscissa is the value of a and the ordinate is the corresponding RMSE. The curves show that the larger the value of a, the larger the RMSE, that is, the larger the error. According to the experimental results, with a = 0.01 the root mean square errors obtained for the target stations Nanjing, Wenjiang, Xintian and Taiyuan are smallest, at 0.16, 0.105, 0.23 and 0.323 respectively. The invention therefore selects a = 0.01 for the subsequent experiments, in which the air temperature data of the target station are estimated with the ID-IDW algorithm of the invention.
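One way to see why a small a helps is to look at how the weights react to it. The sketch below uses four hypothetical RMSE values and assumes normalised weights; it only illustrates the trend and does not reproduce the experiment behind FIG. 4.

```python
import numpy as np

rmses = np.array([0.20, 0.25, 0.30, 0.40])   # hypothetical RMSEs of four experiments

largest_weight = {}
for a in (0.01, 0.1, 0.5, 0.9):
    w = a ** rmses
    w /= w.sum()                              # normalise so the weights sum to 1
    largest_weight[a] = float(w.max())        # share given to the best experiment
```

As a approaches 1 the weights flatten toward a plain average (0.25 each here), while a small a such as 0.01 shifts more weight onto the experiments with the lowest RMSE.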

The experimental results are shown in FIGS. 5 and 6. FIG. 5 shows the RMSE and MAE results of the ID-IDW, IDW and SRT experiments. ID-IDW performs best in this experiment, followed by SRT, with IDW performing relatively worst. With Nanjing as target station, the RMSE of ID-IDW is approximately 0.2 and the MAE about 0.3; the RMSE of SRT is about 0.33 and the MAE about 0.42; and the RMSE of IDW is about 0.4 and the MAE about 0.5. ID-IDW is therefore better suited to this station than IDW and SRT. Analysis and comparison of the remaining three stations lead to a similar conclusion: ID-IDW gives clearly better experimental results.

To verify the stability of the method, this embodiment runs 50 groups of experiments on each station to check whether the quality of the randomly selected reference stations makes the RMSE and MAE fluctuate so widely that the algorithm is unstable. FIG. 6 shows the RMSE and MAE fluctuation over the 50 groups of experiments for each of the four stations. For the target station Nanjing, the RMSE over the 50 groups is stable at about 0.2 and the MAE at about 0.3; for Wenjiang, the RMSE is about 0.1 and the MAE about 0.25; for Xintian, the RMSE is about 0.25 and the MAE about 0.38; and for Taiyuan, the RMSE is about 0.35 and the MAE about 0.48. The four groups of experiments show that the algorithm is highly stable and that the deviation of the experimental results is small.
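A repeated-run stability check of this kind can be summarised with a mean, a standard deviation and a range. In the sketch below, `repeated_scores` is a hypothetical helper and `toy_run` is a synthetic stand-in for one complete ID-IDW experiment; the noise level of 0.2 is arbitrary and is not taken from the patent's data.

```python
import numpy as np

def repeated_scores(run_once, n_runs=50):
    """Run an experiment n_runs times with different seeds and summarise its score."""
    scores = np.array([run_once(seed) for seed in range(n_runs)])
    return float(scores.mean()), float(scores.std()), float(scores.max() - scores.min())

def toy_run(seed):
    # Hypothetical stand-in for one full experiment that returns an RMSE.
    rng = np.random.default_rng(seed)
    truth = np.linspace(10.0, 20.0, 365)
    estimate = truth + rng.normal(0.0, 0.2, size=truth.size)
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))

mean_rmse, std_rmse, spread = repeated_scores(toy_run)
```

A small standard deviation and range relative to the mean would indicate that the random choice of reference stations does not noticeably change the result.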

The experimental results are analyzed as follows:

The results obtained by the ID-IDW algorithm are compared with those of the original IDW and SRT algorithms. The models are evaluated using the root mean square error (RMSE) and the mean absolute error (MAE). RMSE measures the error between the actual and predicted values: the smaller the RMSE, the smaller the difference between them and the more accurate the prediction. MAE reflects the actual magnitude of the prediction error: the smaller the MAE, the smaller the difference between the predicted and true values. The experimental results are better when both RMSE and MAE are relatively small.
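The two evaluation metrics can be computed side by side as in the sketch below. The text gives no formula for MAE, so the standard definition (the mean of the absolute errors) is assumed here; the function name and the four-day example are hypothetical.

```python
import numpy as np

def rmse_and_mae(actual, estimated):
    """Return (RMSE, MAE) for paired series of equal length."""
    err = np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

# Hypothetical four-day example with one large error on the last day.
rmse_value, mae_value = rmse_and_mae([10.0, 12.0, 14.0, 16.0],
                                     [10.5, 11.5, 14.0, 18.0])
```

In this example the single 2-degree error raises the RMSE more than the MAE, because squaring emphasises large deviations.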

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims

1. A method for quality control of surface air temperature data based on improved inverse distance weighting, characterized in that it comprises the following steps:

S1, collecting the actual observation data of the daily average temperature of the target station s₀ and of n reference stations within a certain range;

S2, establishing the original inverse distance weighting method model for experiment, estimating the daily average temperature of the target station s₀ from the actual observation data of the reference stations, and obtaining the predicted value of the daily average temperature of the target station s₀;

S3, establishing an improved inverse distance weighting method model for experimental prediction:

calculating the first root mean square error (RMSE) between the predicted value of the daily average temperature of the target station s₀ and the actual value of the daily average temperature of the target station s₀;

repeating S1 and S2 to calculate the second RMSE between the estimated daily average temperature of the target station s₀ and the actual value of the daily average temperature of the target station s₀; repeating m times to obtain m groups of estimated values and corresponding RMSEs;

S4, assigning a weight to the estimated value obtained in each experiment in S3 according to its RMSE, and multiplying the estimated value of each experiment by the corresponding weight to obtain the final estimated result of the daily average temperature of the target station s₀;

The RMSE weighting method formula is as follows:

$$X' = \sum_{i=1}^{m} w_i Z_i, \qquad w_i = \frac{a^{r_i}}{\sum_{j=1}^{m} a^{r_j}}$$

where $X'$ is the final estimated result; $w_i$ is the weight coefficient of the $i$-th experiment; $Z_i$ is the estimated result of the $i$-th experiment; $a$ is the parameter value of the base of the exponential function; $r_i$ is the root mean square error of the $i$-th experiment.

2. The method for quality control of surface air temperature data according to claim 1, wherein the reference stations are n stations randomly selected within a range of 150 kilometers around the target station s₀.

3. The method for quality control of surface air temperature data according to claim 1, characterized in that the formula of the original inverse distance weighting method model is as follows:

$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$

In the formula, $Z(s_0)$ is the estimated value at the target station $s_0$; $Z(i)$ is the actual observation value at the reference station $i$; $\lambda_i$ is the weight of the reference station $i$ with respect to the target station $s_0$; $n$ is the number of reference stations; $d_{i0}$ is the distance between the reference station $i$ and the target station $s_0$; the default value of $p$ is 2.

4. The method for quality control of surface air temperature data according to claim 1, characterized in that the formula for calculating the RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{r=1}^{t}\left(y_r - y_r'\right)^2}$$

In the formula, $y_r$ is the actual observed value on day $r$, $y_r'$ is the estimated value on day $r$, $r$ is the index of the observation day, and $t$ is the total number of days of observation.

5. The method for quality control of surface air temperature data according to claim 1, wherein in step S4, the smaller the value of RMSE, the larger the weight value of the corresponding estimated value result.

6. The method for quality control of surface air temperature data according to claim 1, wherein the exponential function formula is $y = a^x$, wherein $0 < a < 1$, and $x$ refers to the root mean square error.

7. The method for quality control of surface air temperature data according to claim 1, wherein the optimal number of experiment iterations m and the optimal number of random stations n are found by a particle swarm algorithm.

8. The method for quality control of surface air temperature data according to claim 1, wherein what is collected in S1 is the actual observation data of daily average temperature for a full year.