Disclosure of Invention
The invention aims to address the shortcomings of the original inverse distance weighting (IDW) method by providing a ground temperature data quality control method based on improved inverse distance weighting, which improves the accuracy, stability and adaptability of the algorithm.
To achieve this purpose, the invention adopts the following technical scheme:
A ground temperature data quality control method based on improved inverse distance weighting comprises the following steps:
S1, acquiring actual daily average temperature observation data of a target station and of n reference stations within a certain range;
S2, establishing an original inverse distance weighting model for the experiment, and estimating the daily average temperature of the target station from the actual observation data of the reference stations to obtain a predicted value of the daily average temperature of the target station;
S3, establishing an improved inverse distance weighting model for experimental prediction:
calculating a first root mean square error (RMSE) between the estimated daily average temperature of the target station and the actual daily average temperature of the target station;
repeating S1 and S2, randomly selecting n reference stations within the range of the target station, and calculating a second RMSE between the estimated daily average temperature of the target station and the actual daily average temperature of the target station; repeating this step m times to obtain m groups of estimated values and the corresponding RMSE values;
S4, assigning, according to the RMSE values, a weight to the estimate obtained in each experiment in S3, multiplying the estimate obtained in each experiment by its weight, and summing the products to obtain the final estimate of the daily average temperature of the target station.
In order to optimize the technical scheme, the specific measures adopted further comprise:
The reference stations are n stations randomly selected within 150 kilometers of the target station.
The formula of the original inverse distance weighting model is as follows:
$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$
where $Z(s_0)$ is the predicted value at the target station $s_0$; $Z(i)$ is the actual observation at reference station $i$; $\lambda_i$ is the weight of reference station $i$ with respect to the target station; $n$ is the number of reference stations; $d_{i0}$ is the distance between reference station $i$ and the target station; and the power $p$ defaults to 2.
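The inverse distance weighted estimate can be illustrated with a minimal Python sketch. It assumes the standard IDW form, in which each reference station's weight is proportional to its distance raised to the power -p and the weights are normalized to sum to one. The function name `idw_estimate` and the handling of a zero distance are illustrative assumptions, not part of the described method.

```python
def idw_estimate(values, distances, p=2.0):
    """Estimate the target value from reference observations.

    values    -- observations Z(i) at the reference stations
    distances -- distances d_i0 from each reference station to the target
    p         -- distance power (the text gives a default of 2)
    """
    if not values or len(values) != len(distances):
        raise ValueError("values and distances must be non-empty and equal in length")
    # Assumption: a reference station at zero distance supplies the estimate directly.
    for value, dist in zip(values, distances):
        if dist == 0:
            return value
    raw = [dist ** -p for dist in distances]   # unnormalized weights d^-p
    total = sum(raw)
    return sum(w / total * v for w, v in zip(raw, values))


# Two reference stations at 1 km and 2 km: weights 0.8 and 0.2.
estimate = idw_estimate([10.0, 20.0], [1.0, 2.0])
```

The nearer station dominates the estimate, which is also why a single faulty nearby station can distort the result.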
The formula for calculating RMSE is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{r=1}^{t}\left(y_r - y_r'\right)^2}$$
where $y_r$ is the actual observed value on day $r$, $y_r'$ is the estimated value on day $r$, $r$ is the day index, and $t$ is the total number of observed days.
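As a small illustration, the RMSE between a series of observed and estimated daily values could be computed as in the sketch below. The function name `rmse` and the sample numbers are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error over t paired daily values."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("series must be non-empty and equal in length")
    t = len(observed)
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, estimated))
    return math.sqrt(squared / t)


# Hypothetical two-day series with errors of 3 and 4 degrees.
error = rmse([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
```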
In step S4, the smaller the RMSE value, the larger the weight value of the corresponding estimated result.
The RMSE weighting formula is as follows:
$$X' = \sum_{i=1}^{m} w_i z_i, \qquad w_i = \frac{a^{r_i}}{\sum_{j=1}^{m} a^{r_j}}$$
where $X'$ is the final estimated result; $w_i$ is the weight coefficient of the $i$-th experiment; $z_i$ is the estimated result of the $i$-th experiment; $a$ is the parameter of the exponential function; and $r_i$ is the root mean square error of the $i$-th experiment.
The exponential function has the form $y = a^x$, where $0 < a < 1$.
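The exponential weighting can be sketched in Python as follows. Normalizing the weights so that they sum to one is an assumption made here so that the combined value stays on the temperature scale; the function names are hypothetical, and the default a = 0.01 is the value chosen in the embodiment.

```python
def exponential_weights(rmses, a=0.01):
    """Weights proportional to a**r_i, normalized to sum to one (assumed)."""
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    raw = [a ** r for r in rmses]      # smaller RMSE gives a larger a**r
    total = sum(raw)
    return [x / total for x in raw]


def combine(estimates, rmses, a=0.01):
    """Weighted sum of the per-experiment estimates z_i."""
    weights = exponential_weights(rmses, a)
    return sum(w * z for w, z in zip(weights, estimates))


weights = exponential_weights([0.2, 0.4])
final = combine([15.0, 16.0], [0.2, 0.4])
```

Because 0 < a < 1, the experiment with the lower RMSE always receives the larger weight, and smaller values of a concentrate more weight on the best experiments.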
The optimal number of experiment iterations m and the optimal number of randomly selected stations n are found using a particle swarm optimization algorithm.
In S1, actual daily average temperature observation data covering a full year are collected.
The invention has the following beneficial effects. On the basis of the original inverse distance weighting method, the idea of random forests is introduced: each single inverse distance weighting run serves as a weak regressor, and the runs are combined for prediction. An exponential function weighting method is introduced in which the estimation error (RMSE) of each inverse distance weighting run is used as the evaluation standard, so that the smaller the RMSE, the larger the weight of the corresponding estimate, and the larger the RMSE, the smaller the weight. Experiments show that the accuracy, stability and adaptability of the method are better than those of the original inverse distance weighting method and the spatial regression interpolation method, and that the deviation of its experimental results is smaller.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
The invention discloses a ground temperature data quality control method based on improved inverse distance weighting. As shown in FIG. 1, the basic flow of the invention is as follows:
S1, acquiring actual daily average temperature observation data of the target station and of n reference stations within a certain range.
To verify the generalization capability of the algorithm in space and time, four stations, namely Nanjing (Jiangsu), Wenjiang (Sichuan), Xintian (Hunan) and Taiyuan (Shanxi), are selected as target stations, and all stations within 150 kilometers of each of them are selected as reference stations. The distribution of the four stations and their surrounding stations is shown in FIG. 2. The data for the Nanjing target station are daily average temperature data for 2014, the data for the Wenjiang target station are daily average temperature data for 2005, the data for the Xintian target station are daily average temperature data for 2007, and the data for the Taiyuan target station are daily average temperature data for 2010.
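The 150 kilometer reference range could be applied as in the following Python sketch. It assumes that each station is described by latitude and longitude in degrees and that great-circle (haversine) distance is used; neither the coordinate format nor the distance formula is specified in the text, and the station identifiers and coordinates below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation (assumed)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def stations_within(target, stations, radius_km=150.0):
    """Return {station_id: distance_km} for stations inside the radius."""
    result = {}
    for station_id, (lat, lon) in stations.items():
        dist = haversine_km(target[0], target[1], lat, lon)
        if 0 < dist <= radius_km:   # exclude the target itself
            result[station_id] = dist
    return result


# Hypothetical coordinates, for illustration only.
target = (32.0, 118.8)
stations = {"A": (32.0, 119.0), "B": (35.0, 119.0)}
nearby = stations_within(target, stations)
```

The returned distances can then be reused directly as the d_i0 values in the IDW weights.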
S2, an original Inverse Distance Weighting (IDW) model is established for experiment, the daily average temperature of the target station is estimated according to actual observation data of the reference station, and a predicted value of the daily average temperature of the target station is obtained.
The formula of the original IDW model is as follows:
$$Z(s_0) = \sum_{i=1}^{n} \lambda_i Z(i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$
where $Z(s_0)$ is the predicted value at the target station $s_0$; $Z(i)$ is the actual observation at reference station $i$; $\lambda_i$ is the weight of reference station $i$ with respect to the target station; $n$ is the number of reference stations; $d_{i0}$ is the distance between reference station $i$ and the target station; and the power $p$ defaults to 2.
IDW performs quality control on the target station mainly using measured data from the surrounding stations, with weights assigned according to distance. If abnormal weather or erroneous data occurs at a reference station close to the target station, the accuracy of the IDW algorithm drops, which is the main disadvantage of IDW.
S3, establishing an improved inverse distance weighting method (ID-IDW) model for experimental prediction.
S4, according to the RMSE values, a weight is assigned to the estimate obtained in each experiment in S3, and each estimate is multiplied by its weight and the products are summed to obtain the final estimate of the daily average temperature of the target station.
1) Randomly select n stations within 150 kilometers of the selected target station as the reference stations of the 1st experiment, estimate the daily average temperature of the target station using the IDW algorithm, and calculate the root mean square error (RMSE) between the estimated and actual values at the target station. RMSE is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{t}\sum_{r=1}^{t}\left(y_r - y_r'\right)^2}$$
where $y_r$ is the actual observed value on day $r$, $y_r'$ is the estimated value on day $r$, $r$ is the day index, and $t$ is the total number of days. This embodiment estimates the daily average temperature over a whole year, so $t = 365$.
Next, n stations are again randomly selected from the original data as reference stations for the 2nd experiment, and the estimated values and RMSE of the target station are calculated. The experiment is repeated m times to obtain m groups of estimated values and RMSE values, and the result of each experiment is assigned a weight according to its RMSE: the smaller the RMSE, the larger the weight, and the larger the RMSE, the smaller the weight. The final daily average temperature estimate for the target station is obtained by multiplying each group of experimental results by its weight and summing the products.
The exponential function weighting method is as follows. The graph of the exponential function $y = a^x$ ($0 < a < 1$) shows that, on the positive half of the x-axis, the smaller the value of $x$, the larger the value of $y$. Here $x$ denotes the root mean square error, so the smaller the root mean square error, the greater the weight. The formula is as follows:
$$X' = \sum_{i=1}^{m} w_i z_i, \qquad w_i = \frac{a^{r_i}}{\sum_{j=1}^{m} a^{r_j}}$$
where $X'$ is the final prediction result; $w_i$ is the weight coefficient of the $i$-th experiment; $z_i$ is the estimated result of the $i$-th experiment; $a$ is the parameter of the exponential function; and $r_i$ is the root mean square error of the $i$-th experiment.
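Step 1) as a whole can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than the implementation used in the experiments: the function name `id_idw`, the input layout (a list of `(distance_km, daily_temperatures)` pairs with positive distances), the `seed` parameter, and the normalization of the exponential weights are all assumptions.

```python
import math
import random


def id_idw(target_obs, refs, n, m, a=0.01, p=2.0, seed=None):
    """Combine m randomized IDW runs using RMSE-based exponential weights.

    target_obs -- observed daily temperatures at the target station
    refs       -- list of (distance_km, daily_temperatures), distances > 0
    n, m       -- stations per experiment and number of experiments
    """
    rng = random.Random(seed)
    days = len(target_obs)
    runs = []
    for _ in range(m):
        chosen = rng.sample(refs, n)                 # random subset of stations
        raw = [dist ** -p for dist, _ in chosen]     # IDW weights d^-p
        total = sum(raw)
        est = [
            sum(w * temps[r] for w, (_, temps) in zip(raw, chosen)) / total
            for r in range(days)
        ]
        err = math.sqrt(sum((y - e) ** 2 for y, e in zip(target_obs, est)) / days)
        runs.append((est, err))
    # Exponential weights a**RMSE, normalized to sum to one (assumed).
    raw_w = [a ** err for _, err in runs]
    total_w = sum(raw_w)
    return [
        sum(w * est[r] for w, (est, _) in zip(raw_w, runs)) / total_w
        for r in range(days)
    ]


# Hypothetical two-day data set with four reference stations.
refs = [(20.0, [9.0, 11.0]), (50.0, [11.0, 13.0]),
        (80.0, [10.0, 12.0]), (120.0, [12.0, 14.0])]
final = id_idw([10.0, 12.0], refs, n=2, m=5, seed=0)
```

Because both the IDW step and the combination step are weighted averages, each final daily value stays within the range of the reference observations for that day.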
2) Determination and selection of parameters m and n
As shown in FIG. 3, the optimal number of iterations m and the optimal number of random stations n are found by particle swarm optimization (PSO). The particle update formulas are as follows:
$$v_i \leftarrow w\,v_i + c_1\,\delta\,(p_i - x_i) + c_2\,\gamma\,(p_g - x_i)$$
$$x_i \leftarrow x_i + r\,v_i$$
where a $q$-dimensional particle is represented by $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq})$ and has a corresponding velocity $v_i = (v_{i1}, v_{i2}, v_{i3}, \ldots, v_{iq})$. During the search, each particle takes into account its own historical best position $p_i$ and the best position $p_g$ found by the swarm. $w$ is the inertia weight, $c_1$ is the weight coefficient with which the particle tracks its own historical best, $c_2$ is the weight coefficient with which the particle tracks the swarm best, $\delta$ and $\gamma$ are random numbers in the interval $[0, 1]$, and $r$ is the constraint factor.
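A compact Python sketch of a particle swarm search is given below. It assumes the common update form in which the velocity combines the inertia term with attraction toward the personal best and the swarm best, and the position moves by r times the velocity. The parameter values, the clamping to bounds and the toy fitness function are illustrative assumptions; in the method described here the fitness would be the RMSE obtained by ID-IDW for a given (m, n), with positions rounded to integers.

```python
import random


def pso_minimize(fitness, bounds, particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, r=1.0, seed=0):
    """Minimize fitness(position) inside box bounds; returns best, value, history."""
    rng = random.Random(seed)
    q = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    v = [[0.0] * q for _ in range(particles)]
    p_best = [xi[:] for xi in x]                     # personal best positions p_i
    p_val = [fitness(xi) for xi in x]
    g = min(range(particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]           # swarm best position p_g
    history = [g_val]
    for _ in range(iters):
        for i in range(particles):
            for d in range(q):
                delta, gamma = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * delta * (p_best[i][d] - x[i][d])
                           + c2 * gamma * (g_best[d] - x[i][d]))
                lo, hi = bounds[d]
                # Assumed: positions are clamped to the search range.
                x[i][d] = min(max(x[i][d] + r * v[i][d], lo), hi)
            val = fitness(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
        history.append(g_val)
    return g_best, g_val, history


# Toy fitness standing in for the ID-IDW RMSE (hypothetical optimum at m=50, n=10).
def toy_fitness(pos):
    return (pos[0] - 50.0) ** 2 + (pos[1] - 10.0) ** 2


best, value, history = pso_minimize(toy_fitness, [(5, 200), (5, 20)],
                                    particles=10, iters=30)
```

The `history` list records the swarm-best fitness after each iteration, which is the kind of curve that would be inspected to judge convergence.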
Taking the Guangdong Zijin station as an example, the particle swarm optimization algorithm is run for 1000 search iterations, with m ranging from 5 to 200 and n ranging from 5 to 20, and the root mean square error obtained by ID-IDW in each iteration is used as the fitness value. FIG. 3 shows the results of the 1000 searches: the final RMSE stays at roughly 0.35, which indicates that the search has converged, and the final values of m and n are obtained. Following the same procedure, the invention performs experiments on the four stations Nanjing, Wenjiang, Xintian and Taiyuan to obtain the corresponding values of m and n shown in Table 1.
TABLE 1
3) Value of a in the exponential weight
The invention uses an exponential function weighting method in which the smaller the root mean square error, the larger the weight; the value of a in the exponential function is determined through multiple groups of experiments. With 0 < a < 1, experiments are carried out with each of the four stations Nanjing, Wenjiang, Xintian and Taiyuan as the target station, using different values of a to obtain a curve of RMSE against a, from which the optimal value of a is determined. FIG. 4 shows the experimental results for the four stations, where the abscissa is the value of a and the ordinate is the corresponding RMSE. The curves show that the larger the value of a, the larger the RMSE, that is, the larger the error. According to the experimental results, with a = 0.01 the root mean square errors obtained for the target stations Nanjing, Wenjiang, Xintian and Taiyuan are the smallest, at 0.16, 0.105, 0.23 and 0.323 respectively. The invention therefore sets a = 0.01 in the subsequent experiments, in which the air temperature data of the target station are estimated with the ID-IDW algorithm of the invention.
The results of the experiment are shown in FIGS. 5 and 6. FIG. 5 shows the RMSE and MAE results of the ID-IDW, IDW and SRT experiments. ID-IDW performs best in this experiment, followed by SRT, with IDW performing worst. With Nanjing as the target station, the RMSE of ID-IDW is about 0.2 and its MAE is about 0.3; the RMSE of SRT is about 0.33 and its MAE is about 0.42; and the RMSE of IDW is about 0.4 and its MAE is about 0.5. ID-IDW is therefore better suited to this station than IDW and SRT. Analysis and comparison of the remaining three stations lead to a similar conclusion: ID-IDW gives the best experimental results.
To verify the stability of the method, this embodiment performs 50 groups of experiments on each station to check whether the quality of the randomly selected stations causes the RMSE and MAE to fluctuate too widely and thereby reduces the stability of the algorithm. FIG. 6 shows the fluctuation of RMSE and MAE over the 50 groups of experiments for each of the four stations. For the Nanjing target station, the RMSE over the 50 groups of experiments is stable at about 0.2 and the MAE is about 0.3. Similarly, for the Wenjiang target station the RMSE is about 0.1 and the MAE is about 0.25; for the Xintian target station the RMSE is about 0.25 and the MAE is about 0.38; and for the Taiyuan target station the RMSE is about 0.35 and the MAE is about 0.48. The four groups of experiments show that the stability of the algorithm is high and that the deviation of the experimental results is small.
Analysis of the experimental results:
The experimental results obtained by the ID-IDW algorithm are compared with those of the original IDW and SRT algorithms, and the models are evaluated using the root mean square error (RMSE) and the mean absolute error (MAE). The RMSE measures the error between the actual and predicted values: the smaller the RMSE, the smaller the difference between the actual and predicted values and the more accurate the prediction. The MAE directly reflects the actual prediction error: the smaller the MAE, the smaller the difference between the predicted and true values. The experimental results are better when both RMSE and MAE are relatively small.
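The two evaluation measures can be compared on a small example, sketched below in Python with hypothetical values. The function names and the sample series are assumptions for illustration only.

```python
import math


def mae(observed, estimated):
    """Mean absolute error over paired values."""
    return sum(abs(y - e) for y, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean square error over paired values."""
    return math.sqrt(
        sum((y - e) ** 2 for y, e in zip(observed, estimated)) / len(observed)
    )


# Hypothetical three-day series with absolute errors of 1, 0 and 2 degrees.
observed = [10.0, 12.0, 14.0]
estimated = [11.0, 12.0, 16.0]
mae_value = mae(observed, estimated)    # (1 + 0 + 2) / 3
rmse_value = rmse(observed, estimated)  # sqrt((1 + 0 + 4) / 3)
```

RMSE is never smaller than MAE on the same data and grows faster when a few days have large errors, so reporting both gives a fuller picture of estimation quality.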
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.