CN116701371B - Method and device for interpolating missing values of atmospheric temperature data under covariance analysis

Info

Publication number
CN116701371B
Authority
CN
China
Prior art keywords
temperature data
atmospheric temperature
data
missing
station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310682985.XA
Other languages
Chinese (zh)
Other versions
CN116701371A (en)
Inventor
殷倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Geographic Sciences and Natural Resources of CAS filed Critical Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN202310682985.XA priority Critical patent/CN116701371B/en
Publication of CN116701371A publication Critical patent/CN116701371A/en
Application granted granted Critical
Publication of CN116701371B publication Critical patent/CN116701371B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a method for interpolating missing values of atmospheric temperature data under covariance analysis, comprising the following steps: constructing influence factors of the atmospheric temperature data; preparing an atmospheric temperature data set and establishing a multiple regression model of a plurality of station positions; traversing to find an atmospheric temperature data missing data set; establishing an atmospheric temperature data sequence set; calculating a covariance matrix of temperature data reference sequence data; solving a variance weight vector and establishing a variance model of a plurality of station positions; constructing an atmospheric temperature data missing value interpolation model based on covariance analysis; and cyclically interpolating the atmospheric temperature data missing values. An interpolation device based on the method is also provided. The method combines two major statistical models, multiple regression and covariance, introduces a plurality of influence factors, and fully considers correlation in time and space. The atmospheric temperature data of different stations are divided into a missing sequence, a reference sequence and a test sequence for training, testing and missing value interpolation, which improves the reliability and accuracy of data interpolation.

Description

Method and device for interpolating missing values of atmospheric temperature data under covariance analysis
Technical Field
The invention belongs to the technical field of interpolation of missing values of meteorological elements, in particular to an interpolation method and an interpolation device of missing values of atmospheric temperature data under covariance analysis.
Background
To monitor atmospheric temperature in real time, more and more atmospheric temperature monitoring stations are being deployed to obtain more accurate and complete atmospheric temperature data. Complete, high-precision temperature observation data are an important input parameter for agricultural meteorological disaster monitoring and ecosystem simulation. Because of the limitations of field meteorological conditions, missing meteorological data are common, and data interpolation is a necessary processing step before meteorological data can be applied.
The methods commonly used at present to interpolate missing values of atmospheric temperature data include regression-based methods, geostatistical methods and machine learning methods. However, each of these methods has a significant shortcoming in accuracy when interpolating missing values of atmospheric temperature data: regression-based temperature data interpolation methods may suffer from over-fitting; geostatistical methods cannot constrain the minimum estimation error; and machine learning methods, although they can build deep learning models that interpolate missing half-hourly temperature observations with high precision, require training on a large amount of atmospheric temperature data and lack a clear mechanistic meaning.
Patent CN113495913A discloses a method and device for interpolating missing values of air quality data. It calculates covariances and weights based on a first air quality data sequence, a second air quality data sequence and a third air quality data sequence, and calculates the air quality data corresponding to a first time point in the first air quality data sequence from a plurality of third air quality data sequences and their corresponding weights, thereby supplementing the missing air quality data values. However, that method calculates the missing values from a historical data set using covariance-derived weights alone and does not consider the influence of other parameters on the interpolation result, so the accuracy of the interpolation cannot be ensured.
Therefore, to solve this problem, a method for interpolating missing values of atmospheric temperature data under covariance analysis is urgently needed to improve the accuracy of the interpolated atmospheric temperature values and thereby ensure the accuracy and completeness of meteorological data applications.
Disclosure of Invention
The invention provides an atmospheric temperature data missing value interpolation method and an interpolation device thereof under covariance analysis aiming at the defects in the prior art.
The method combines a multiple regression model and a covariance model, introduces influence factors such as time, atmospheric pressure, maximum wind speed, relative humidity, water vapor pressure, precipitation and reference temperature, and fully considers correlation in time and space. The atmospheric temperature data of different stations are divided into a missing sequence, a reference sequence and a test sequence, which improves the reliability and the statistical test sensitivity of the interpolated data.
The invention provides an atmospheric temperature data missing value interpolation method under covariance analysis, which comprises the following steps:
S1, constructing influence factors of the atmospheric temperature data;
The influence factors of the atmospheric temperature data are determined according to climate change theory and the traditional temperature prediction scale, and are selected from the atmospheric temperature data set. They comprise seven factors in total: time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$ and reference temperature $x_7$.
S2, making an atmospheric temperature data set and establishing a multiple regression model of a plurality of station positions;
The multiple regression model of the plurality of station positions is represented in matrix form as follows:
$$Y_b = Xw \tag{1}$$
S3, traversing to find an atmospheric temperature data missing data set;
According to the Monte Carlo method, all the atmospheric temperature data and station position information of the atmospheric temperature data set are traversed, every station that has missing atmospheric temperature data in the data set is identified as a station to be interpolated, and all the missing moments of the atmospheric temperature data of each station to be interpolated, together with the corresponding station position information, are extracted;
S4, establishing an atmospheric temperature data sequence set for each station to be interpolated;
S41, combining the moments corresponding to a plurality of consecutively missing atmospheric temperature data to form a temperature data missing sequence, wherein the initial moment of the temperature data missing sequence is designated as a first interpolation point;
S42, preparing temperature data missing sequence data, temperature data reference sequence data and temperature data test sequence data;
S5, calculating a covariance matrix of the temperature data reference sequence data;
The covariance of every pair of temperature data reference sequence data is calculated to obtain the covariance matrix Z as follows:
$$Z = \begin{bmatrix} C(y_1, y_1) & \cdots & C(y_1, y_n) \\ \vdots & \ddots & \vdots \\ C(y_n, y_1) & \cdots & C(y_n, y_n) \end{bmatrix} \tag{2}$$
where $y_1$ and $y_n$ are the single-column matrices composed of all temperatures at the 1st and nth station positions, respectively, and $C(\cdot)$ denotes the covariance calculation;
S6, solving a variance weight matrix and establishing a variance model of a plurality of station positions;
S61, solving the variance weight matrix r;
S62, establishing the variance model of the plurality of station positions from the covariance matrix of the temperature data reference sequence data and the variance weight matrix as follows:
$$Y_a = Zr \tag{3}$$
S7, constructing an atmospheric temperature data missing value interpolation model based on covariance analysis;
S71, defining and solving an error remainder d;
S72, establishing the atmospheric temperature data missing value interpolation model of the plurality of station positions from the multiple regression model and the variance model of the temperature data reference sequence data as follows:
$$Y = \frac{Xw + Y_0(Zr) + d}{2} \tag{4}$$
where $Y_0(Zr)$ represents the predicted value obtained from the variance model;
S8, cyclically interpolating the atmospheric temperature data missing values;
The temperature data missing sequence data of all the stations to be interpolated are substituted into the atmospheric temperature data missing value interpolation model of step S72, and steps S2 to S7 are repeated until no atmospheric temperature data in the atmospheric temperature data set are missing.
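As a rough illustration of equation (4) and the fill step of S8, the following Python sketch averages the two model predictions after adding the error remainder d and writes the result into the missing positions. The function names `combined_estimate` and `fill_missing` and all numbers are hypothetical; the source performs its computations in MATLAB and does not give an implementation.

```python
import numpy as np

def combined_estimate(regression_pred, variance_pred, d):
    """Eq. (4): (Xw + Y_0(Zr) + d) / 2, applied element-wise."""
    regression_pred = np.asarray(regression_pred, dtype=float)
    variance_pred = np.asarray(variance_pred, dtype=float)
    return (regression_pred + variance_pred + d) / 2.0

def fill_missing(temps, estimates):
    """Replace NaN entries of temps with the matching entries of estimates."""
    temps = np.asarray(temps, dtype=float)
    return np.where(np.isnan(temps), estimates, temps)

# Illustrative values only (not taken from the patent).
xw = np.array([15.0, 16.0, 17.0])   # regression predictions, Xw
y0 = np.array([14.0, 15.0, 18.0])   # variance-model predictions, Y_0(Zr)
estimates = combined_estimate(xw, y0, d=0.5)
series = fill_missing([np.nan, 15.4, np.nan], estimates)
```

Observed values are left untouched; only the missing positions receive the combined estimate.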
Preferably, the step S2 specifically includes the following steps:
S21, acquiring an atmospheric temperature data set based on a national geographic database and a real-time database of temperature monitoring stations, and preparing a data set containing time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$, reference temperature $x_7$ and station position information;
S22, establishing a multiple regression model for a single station position in the following basic form:
$$Y_b = w_1 x_1 + w_2 x_2 + \cdots + w_j x_j \tag{5}$$
where $Y_b$ represents the predicted atmospheric temperature and $w_j$ represents the regression weight coefficient of the jth influence factor;
S23, for the different station positions, performing multiple regression on the atmospheric temperature data set of step S21 using the multiple regression module of the MATLAB software to obtain the preliminary weight coefficients $w_j$ of the multiple regression models of the different station positions;
S24, establishing a multiple regression model for a plurality of station positions in the following basic form:
$$Y_{ib} = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_j x_{ij} \tag{6}$$
where $Y_{ib}$ represents the predicted atmospheric temperature of the ith station position sample and $x_{ij}$ represents the jth influence factor of the ith station position sample;
Preferably, the step S42 specifically includes the following steps:
S421, extracting the atmospheric temperature data of the station to be interpolated in a preset period as temperature data missing sequence data, wherein each missing value is replaced by the atmospheric temperature average value in the preset period;
S422, taking all stations except the station to be interpolated that have no missing atmospheric temperature data in the preset period as reference stations;
S423, respectively extracting 80% of the data from the atmospheric temperature data of each reference station in the preset period to obtain a plurality of temperature data reference sequence data, and correspondingly extracting 80% of the temperature data missing sequence data to obtain temperature data reference prediction sequence data;
S424, respectively extracting the remaining 20% of the data from the atmospheric temperature data of each reference station in the preset period to obtain a plurality of temperature data test sequence data, and correspondingly extracting 20% of the temperature data missing sequence data to obtain temperature data test prediction sequence data;
Preferably, the step S61 specifically includes the following steps:
S611, calculating the covariance between each temperature data reference sequence data and the temperature data missing sequence data to obtain a reference covariance matrix $Z_r$ as follows:
$$Z_r = [C(y_1, y_0) \; \cdots \; C(y_n, y_0)]^T \tag{7}$$
where $y_0$ is the single-column matrix composed of all temperatures at the station position to be interpolated, and T denotes the transpose;
S612, defining the variance weight matrix r:
$$r = [r_1 \; \cdots \; r_n]^T \tag{8}$$
where $r_1$ and $r_n$ are the variance weights of the 1st and nth temperature data reference sequence data, respectively;
S613, based on the covariance matrix and the reference covariance matrix, the solving equation of the variance weight matrix r is as follows:
Preferably, the step S71 specifically includes the following steps:
S711, defining an error vector $D = [d_1 \; \cdots \; d_n]^T$, where $d_1$ and $d_n$ are the 1st and nth atmospheric temperature data missing value interpolation errors, respectively;
S712, substituting the temperature data test sequence data into the variance model to obtain predicted values of the temperature data test prediction sequence data, and comparing these predicted values with the corresponding true values of the temperature data test prediction sequence data to obtain the error vector D;
S713, taking the error remainder d as the mean value of the error vector D;
Preferably, the step S21 specifically includes the following steps:
S211, collecting the atmospheric temperature data of all stations hour by hour, and constructing an atmospheric temperature data set containing station position information;
S212, converting the position information of all stations in the atmospheric temperature data set by coordinate conversion, and constructing an atmospheric temperature data layer on a map.
Preferably, there are 1 or more stations to be interpolated in the step S4, 1 or more first interpolation points in the stations to be interpolated in the step S41, and not less than 2 reference stations in the step S42.
Preferably, the data missing value interpolation model based on covariance analysis in step S7 is also applicable to other meteorological element missing value interpolation technical fields.
In another aspect of the present invention, an interpolation device using the foregoing method for interpolating missing values of atmospheric temperature data under covariance analysis is provided, which includes an acquisition and display module, a search module, a processing module and a calculation module. The acquisition and display module acquires the atmospheric temperature data of all stations hour by hour, imports the position information of all stations into ArcGIS software as a layer through coordinate conversion, and displays the atmospheric temperature data layer established on a map; the search module searches for all reference stations through ArcGIS software to establish an atmospheric temperature data sequence set, and traverses to find the atmospheric temperature data missing data set; the calculation module constructs the atmospheric temperature data missing value interpolation model based on covariance analysis by establishing the multiple regression model of a plurality of station positions and the covariance matrix of the temperature data reference sequence data; the processing module cyclically interpolates the atmospheric temperature data missing values until no atmospheric temperature data in the atmospheric temperature data set are missing.
Compared with the prior art, the invention has the technical effects that:
1. According to the atmospheric temperature data missing value interpolation method under covariance analysis, a multiple regression model and a covariance model from mathematical statistics are combined to construct an atmospheric temperature data missing value interpolation model based on covariance analysis, and the atmospheric temperature data of different stations are divided into missing sequences, reference sequences and test sequences, so that training, testing and missing value interpolation of the atmospheric temperature data are completed cyclically. This improves the reliability and statistical test sensitivity of the interpolated data and reduces the error of atmospheric temperature data missing value interpolation.
2. According to the atmospheric temperature data missing value interpolation method under covariance analysis, influence factors such as seasons, weather, time and reference temperature are introduced, an atmospheric temperature data set is manufactured, multiple regression models of a plurality of station positions are built, and MATLAB software multiple regression modules are utilized to conduct multiple regression processing on the atmospheric temperature data set, so that preliminary weight coefficients of the multiple regression models of different station positions are obtained, the mutation influence of discontinuous variables on the atmospheric temperature data is weakened, and the stability, accuracy and efficiency of data interpolation are greatly improved.
3. According to the atmospheric temperature data missing value interpolation method under covariance analysis, an atmospheric temperature data sequence set is established for each station to be interpolated, a covariance matrix of temperature data reference sequence data is calculated, variance weight vectors are solved, variance models of a plurality of station positions are established, correlation of atmospheric temperature data in time and space is fully considered, and reliability and accuracy of data interpolation are improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings.
FIG. 1 is a flow chart of a method for interpolating missing values of atmospheric temperature data under covariance analysis according to the invention;
FIG. 2a is a graph comparing predicted atmospheric temperature with true atmospheric temperature curves obtained by a multiple regression model of station number 1 in an embodiment of the invention;
FIG. 2b is a graph comparing predicted atmospheric temperature with real atmospheric temperature curves obtained by a multiple regression model of station number 2 in an embodiment of the invention;
FIG. 2c is a graph comparing predicted atmospheric temperature with true atmospheric temperature using a multiple regression model for station number 3 in accordance with an embodiment of the present invention;
FIG. 2d is a graph comparing predicted atmospheric temperature with real atmospheric temperature curves obtained by a multiple regression model of station number 4 in an embodiment of the present invention;
FIG. 2e is a graph comparing predicted atmospheric temperature with true atmospheric temperature using a multiple regression model for station number 5 in an embodiment of the present invention;
FIG. 2f is a graph comparing predicted atmospheric temperature with true atmospheric temperature curves obtained by a multiple regression model of station number 6 in an embodiment of the invention;
FIG. 3 is a schematic diagram of an atmospheric temperature data sequence set construction strategy of the present invention;
FIG. 4 is a schematic diagram of an interpolation device according to the present invention;
FIG. 5 is a map of the atmospheric temperature data set of the present invention.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an atmospheric temperature data missing value interpolation method under covariance analysis according to the present invention, which comprises the steps of:
S1, constructing influence factors of the atmospheric temperature data.
The influence factors of the atmospheric temperature data are determined according to climate change theory and the traditional temperature prediction scale, and are selected from the atmospheric temperature data set. They comprise seven factors in total: time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$ and reference temperature $x_7$.
S2, making an atmospheric temperature data set and establishing a multiple regression model of a plurality of station positions.
S21, acquiring an atmospheric temperature data set based on a national geographic database and a real-time database of temperature monitoring stations, and preparing a data set containing time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$, reference temperature $x_7$ and station position information.
S211, collecting the atmospheric temperature data of all stations hour by hour, and constructing an atmospheric temperature data set containing station position information.
S212, converting the position information of all stations in the atmospheric temperature data set by using coordinate conversion, and constructing an atmospheric temperature data layer on a map.
S22, establishing a multiple regression model for a single station position in the following basic form:
$$Y_b = w_1 x_1 + w_2 x_2 + \cdots + w_j x_j \tag{5}$$
where $Y_b$ represents the predicted atmospheric temperature and $w_j$ represents the regression weight coefficient of the jth influence factor.
S23, for the different station positions, performing multiple regression on the atmospheric temperature data set of step S21 using the multiple regression module of the MATLAB software to obtain the preliminary weight coefficients $w_j$ of the multiple regression models of the different station positions.
S24, establishing a multiple regression model for a plurality of station positions in the following basic form:
$$Y_{ib} = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_j x_{ij} \tag{6}$$
where $Y_{ib}$ represents the predicted atmospheric temperature of the ith station position sample and $x_{ij}$ represents the jth influence factor of the ith station position sample.
The multiple regression model of the plurality of station positions is represented in matrix form as follows:
$$Y_b = Xw \tag{1}$$
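The regression step can be sketched in Python as an ordinary least-squares fit of the weights w in $Y_b = Xw$. This is only an assumed stand-in for the MATLAB multiple regression module named in S23: the function name `fit_regression_weights`, the synthetic factor matrix and the "true" weights are all hypothetical, and the model has no intercept term because equation (5) shows none.

```python
import numpy as np

def fit_regression_weights(X, y):
    """Least-squares estimate of w in Y_b = X w (no intercept, as in Eq. (5))."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic stand-in for one station: 25 hourly samples of the 7 influence factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 7))
true_w = np.array([0.5, -1.0, 0.2, 0.0, 1.5, -0.3, 0.8])  # made-up weights
y = X @ true_w                      # noise-free synthetic temperatures

w = fit_regression_weights(X, y)    # one weight per influence factor
Y_b = X @ w                         # predicted atmospheric temperature
```

With real station data each station would get its own X and y, giving one weight vector per station as in S23.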
In a specific embodiment, six stations are selected: station 1 at longitude 114.95 and latitude 44.02; station 2 at longitude 119.93 and latitude 47.17; station 3 at longitude 120.05 and latitude 43.88, in A Lu Keer, Inner Mongolia; station 4 at longitude 119.92 and latitude 42.28, in Inner Mongolia A; station 5 at longitude 118.75 and latitude 41.52, in Inner Mongolia A; and station 6 at longitude 118.65 and latitude 43.53, in Inner Mongolia B.
For the data of stations 1 to 6 from 00:00 to 24:00 on June 10, multiple regression models are established respectively and solved in MATLAB to obtain the regression weight coefficient of each influence factor; the resulting multiple regression models are as follows:
The curves of the predicted atmospheric temperature and the true atmospheric temperature obtained by the multiple regression models of stations 1 to 6 are compared in FIGS. 2a to 2f, respectively, which show that the prediction results of the multiple regression models are effective.
S3, traversing and searching for an atmospheric temperature data missing data set.
According to the Monte Carlo method, all the atmospheric temperature data and station position information of the atmospheric temperature data set are traversed, every station that has missing atmospheric temperature data in the data set is identified as a station to be interpolated, and all the missing moments of the atmospheric temperature data of each station to be interpolated, together with the corresponding station position information, are extracted.
S4, establishing an atmospheric temperature data sequence set for each station to be interpolated, wherein FIG. 3 shows the construction strategy of the atmospheric temperature data sequence set. There are 1 or more stations to be interpolated.
S41, combining the moments corresponding to a plurality of consecutively missing atmospheric temperature data to form a temperature data missing sequence, wherein the initial moment of the temperature data missing sequence is designated as a first interpolation point. There are 1 or more first interpolation points in a station to be interpolated.
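Steps S3 and S41 amount to locating runs of consecutive missing values and recording where each run starts. A minimal Python sketch follows, assuming missing readings are encoded as NaN (the source does not say how gaps are stored); the function name `missing_runs` and the sample series are hypothetical.

```python
import numpy as np

def missing_runs(temps):
    """Return (start_index, length) for each run of consecutive NaN values."""
    is_missing = np.isnan(np.asarray(temps, dtype=float))
    runs = []
    start = None
    for i, miss in enumerate(is_missing):
        if miss and start is None:
            start = i                       # a new missing sequence begins here
        elif not miss and start is not None:
            runs.append((start, i - start)) # the run ended at the previous index
            start = None
    if start is not None:                   # series ends while still missing
        runs.append((start, len(is_missing) - start))
    return runs

# Illustrative hourly series for one station; NaN marks a missing reading.
hourly = [np.nan, np.nan, np.nan, 15.2, 15.9, np.nan, 17.0]
runs = missing_runs(hourly)
first_interpolation_points = [start for start, _ in runs]
```

Each start index plays the role of a first interpolation point, so a station may have one or several of them.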
S42, preparing temperature data missing sequence data, temperature data reference sequence data and temperature data test sequence data.
S421, extracting the atmospheric temperature data of the station to be interpolated in a preset period as temperature data missing sequence data, wherein the missing value is replaced by an atmospheric temperature average value in the preset period.
S422, taking all stations except the station to be interpolated, which have no loss of atmospheric temperature data in a preset period, as reference stations. The reference stations are not less than 2.
S423, extracting 80% of data from the atmospheric temperature data of each reference station in a preset period of time respectively to obtain a plurality of temperature data reference sequence data, and correspondingly extracting 80% of temperature data missing sequence data to obtain temperature data reference prediction sequence data.
S424, extracting the remaining 20% of data from the atmospheric temperature data of each reference station in a preset period of time respectively to obtain a plurality of temperature data test sequence data, and correspondingly extracting 20% of temperature data missing sequence data to obtain temperature data test prediction sequence data.
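Steps S421 to S424 can be sketched as a mean fill followed by a random 80/20 split over time indices. The Python below is an illustration under assumptions: missing values are NaN, the same sampled time indices are used for every reference station and for the station to be interpolated, and the function name `build_sequences`, the seed and the synthetic data are hypothetical.

```python
import numpy as np

def build_sequences(target, references, train_frac=0.8, seed=0):
    """Split one preset period into reference (80%) and test (20%) sequences."""
    target = np.asarray(target, dtype=float)          # station to be interpolated
    references = np.asarray(references, dtype=float)  # shape (n_stations, T)
    # S421: replace missing values with the mean over the preset period.
    filled = np.where(np.isnan(target), np.nanmean(target), target)
    T = filled.size
    idx = np.random.default_rng(seed).permutation(T)
    n_train = int(round(train_frac * T))
    train, test = np.sort(idx[:n_train]), np.sort(idx[n_train:])
    return {
        "reference": references[:, train],        # S423, reference stations
        "reference_prediction": filled[train],    # S423, station to be interpolated
        "test": references[:, test],              # S424, reference stations
        "test_prediction": filled[test],          # S424, station to be interpolated
    }

# Synthetic example: 25 hourly values, 5 reference stations, 3 missing hours.
rng = np.random.default_rng(1)
refs = 15.0 + rng.normal(size=(5, 25))
target = 15.0 + rng.normal(size=25)
target[:3] = np.nan
seqs = build_sequences(target, refs)
```

Sorting the sampled indices keeps each extracted sequence in time order.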
In one embodiment, assume that station 2 has 3 hours of missing atmospheric temperature data starting at 00:00 on June 10, and that the missing data form a temperature data missing sequence; the first interpolation point is then 00:00 on June 10.
Atmospheric temperature time sequence data in a preset period (from 00:00 to 24:00 on June 10) are extracted from all stations other than station 2 that have no missing atmospheric temperature data. 80% of the data are randomly extracted to obtain 5 temperature data reference sequence data, and 80% of the temperature data missing sequence data are correspondingly extracted to obtain 1 temperature data reference prediction sequence data; the remaining 20% of the data are extracted to obtain 5 temperature data test sequence data, and 20% of the temperature data missing sequence data are correspondingly extracted to obtain 1 temperature data test prediction sequence data.
S5, calculating a covariance matrix of the temperature data reference sequence data.
The covariance of every pair of temperature data reference sequence data is calculated, and the covariance matrix Z is obtained as follows:
$$Z = \begin{bmatrix} C(y_1, y_1) & \cdots & C(y_1, y_n) \\ \vdots & \ddots & \vdots \\ C(y_n, y_1) & \cdots & C(y_n, y_n) \end{bmatrix} \tag{2}$$
where $y_1$ and $y_n$ are the single-column matrices composed of all temperatures at the 1st and nth station positions, respectively, and $C(\cdot)$ denotes the covariance calculation.
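The pairwise covariance calculation of S5 can be sketched with NumPy's `np.cov`, which treats each row as one variable by default. The choice of the sample covariance (normalization by N - 1, NumPy's default) is an assumption, since the source does not state the normalization; the function name `covariance_matrix` and the data are hypothetical.

```python
import numpy as np

def covariance_matrix(reference):
    """Return Z with Z[i, j] = C(y_i, y_j); reference has shape (n_stations, n_samples)."""
    return np.cov(np.asarray(reference, dtype=float))

# Three illustrative reference sequences (rows are stations, columns are times).
reference = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # exactly twice the first row
    [4.0, 3.0, 2.0, 1.0],   # the first row reversed
])
Z = covariance_matrix(reference)
```

The result is a symmetric n x n matrix whose diagonal holds the variance of each reference sequence.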
In one embodiment, the covariance matrix calculated based on the temperature data reference sequence data for stations 1, 3, 4, 5, 6 is:
S6, solving a variance weight matrix and establishing a variance model of a plurality of station positions.
S61, solving the variance weight matrix r.
S611, calculating the covariance between each temperature data reference sequence data and the temperature data missing sequence data to obtain a reference covariance matrix $Z_r$ as follows:
$$Z_r = [C(y_1, y_0) \; \cdots \; C(y_n, y_0)]^T \tag{7}$$
where $y_0$ is the single-column matrix composed of all temperatures at the station position to be interpolated, and T denotes the transpose.
S612, defining the variance weight matrix r:
$$r = [r_1 \; \cdots \; r_n]^T \tag{8}$$
where $r_1$ and $r_n$ are the variance weights of the 1st and nth temperature data reference sequence data, respectively.
S613, based on the covariance matrix and the reference covariance matrix, the solving equation of the variance weight matrix r is as follows:
In one embodiment, the reference covariance matrix $Z_r$ calculated from the temperature data reference sequence data of stations 1, 3, 4, 5 and 6 and the temperature data missing sequence data of station 2 is:
$Z_r = [5.9648 \; 6.9089 \; 6.8233 \; 7.7380 \; 7.7973]^T$ (12).
Solving for the variance weight matrix $r$ gives:
$r = [0.8600 \; -0.0313 \; -0.0556 \; -0.0487 \; 0.4553]^T$ (13).
S62, establishing a variance model of a plurality of station positions from the covariance matrix of the temperature data reference sequence data and the variance weight matrix as follows:
$Y_a = Zr$ (3).
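Steps S61 and S62 can be sketched as follows. The solving equation for $r$ is not reproduced in this text, so the sketch assumes that the weights solve the linear system $Z r = Z_r$ (as in simple kriging); that assumption, the function names, and the synthetic data are illustrative only and may differ from the patented equation.

```python
import numpy as np

def reference_covariance(reference_sequences, target_sequence):
    """Z_r of formula (7): covariance of each reference series with the target series."""
    ref = np.asarray(reference_sequences, dtype=float)
    tgt = np.asarray(target_sequence, dtype=float)
    ref_centered = ref - ref.mean(axis=0)
    tgt_centered = tgt - tgt.mean()
    # Sample covariance (divide by n - 1), matching NumPy's np.cov default.
    return ref_centered.T @ tgt_centered / (len(tgt) - 1)

def solve_variance_weights(Z, Z_r):
    """ASSUMPTION: the weights r solve Z r = Z_r; the source equation is not shown."""
    return np.linalg.solve(Z, Z_r)

# Hypothetical data: 20 readings, 5 reference stations, 1 target station.
rng = np.random.default_rng(0)
reference_sequences = rng.normal(12.0, 2.0, size=(20, 5))
target_sequence = (reference_sequences @ np.array([0.5, 0.1, 0.1, 0.1, 0.2])
                   + rng.normal(0.0, 0.1, size=20))

Z = np.cov(reference_sequences, rowvar=False)
Z_r = reference_covariance(reference_sequences, target_sequence)
r = solve_variance_weights(Z, Z_r)
Y_a = Z @ r  # variance model of formula (3)
```

Under this assumption, `Z @ r` reproduces `Z_r` on the reference data; the embodiment instead evaluates formula (3) with a covariance matrix built from the test sequence data.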
s7, constructing an atmospheric temperature data missing value interpolation model based on covariance analysis.
S71, defining and solving an error remainder d.
S711, defining the error vector $D = [d_1 \; \dots \; d_n]^T$, where $d_1$ and $d_n$ are the 1st and nth atmospheric temperature data missing value interpolation error values, respectively.
S712, substituting the temperature data test sequence data into the variance model to obtain predicted values of the temperature data test prediction sequence data, and comparing these predicted values with the true values of the temperature data test prediction sequence data to obtain the error vector $D$.
S713, taking the error remainder $d$ as the mean value of the error vector $D$.
In one embodiment, the covariance matrix calculated based on the temperature data test sequence data for stations 1, 3, 4, 5, 6 is:
In this case, the variance model is:
$Y_a = [1.5945 \; 1.2049 \; -0.6922 \; 1.9456 \; 0.7322]^T$ (15).
Further, based on the temperature data test sequence data of stations 1, 3, 4, 5 and 6, the predicted value $y_{20}$ of the temperature data test prediction sequence data is obtained by MATLAB numerical calculation:
$y_{20} = [2.8798 \; 2.4898 \; 2.7146 \; 4.2351 \; 6.2836]^T$ (16).
Comparing $y_{20}$ with the true values of the temperature data test prediction sequence data gives the error vector $D$:
$D = [0.4202 \; 1.0102 \; 1.4854 \; 0.8649 \; 0.4164]^T$ (17).
The error remainder $d$, the mean value of $D$, is therefore 0.8394.
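The error remainder of this embodiment can be checked with a few lines of NumPy. The values of the error vector are taken from formula (17); the function name `error_remainder` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def error_remainder(error_vector):
    """Error remainder d of step S713: the mean of the error vector D."""
    return float(np.mean(np.asarray(error_vector, dtype=float)))

# Error vector D from formula (17).
D = np.array([0.4202, 1.0102, 1.4854, 0.8649, 0.4164])
d = error_remainder(D)
print(round(d, 4))  # 0.8394
```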
S72, establishing an atmospheric temperature data missing value interpolation model of a plurality of station positions from the multiple regression model and the variance model of the temperature data reference sequence data, wherein the interpolation model is as follows:
$Y = (Xw + Y_0(Zr) + d)/2$ (4)
where $Y_0(Zr)$ represents the predicted value obtained from the variance model.
Furthermore, the data missing value interpolation model based on covariance analysis is also applicable to the interpolation of missing values of other meteorological elements.
S8, cyclically interpolating the atmospheric temperature data missing values.
The temperature data missing sequence data of all stations to be interpolated are substituted into the atmospheric temperature data missing value interpolation model of step S72, and steps S2 to S7 are executed repeatedly until no atmospheric temperature data in the atmospheric temperature data set are missing.
In one embodiment, the predicted value $y_{21}$ obtained from the multiple regression model is:
$y_{21} = Xw = [12.2284 \; 12.1677 \; 12.2077]^T$ (18).
The predicted value $y_{22}$ obtained from the variance model is:
$y_{22} = Y_0(Zr) = [9.5470 \; 12.8636 \; 14.7894]^T$ (19).
Taking the error remainder into account, the air temperature data of station No. 2 for the 3 hours starting at 00:00 on June 10 are predicted as $[11.3074 \; 12.9354 \; 13.9183]^T$. Compared with the real air temperature data of station No. 2 for the same 3 hours, $[12 \; 11.3 \; 12.1]^T$, the error is $[-0.6926 \; 1.6354 \; 1.8183]^T$. This error lies within $(-1, 2)$, demonstrating the effectiveness of the proposed method.
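The arithmetic of this embodiment can be reproduced from formula (4) as sketched below. The numbers are those of formulas (18) and (19) and the error remainder 0.8394; the function name `interpolate_missing` is a hypothetical helper, and the error is taken as predicted minus observed, which is the convention that matches the reported values.

```python
import numpy as np

def interpolate_missing(regression_prediction, variance_prediction, error_remainder):
    """Formula (4): Y = (Xw + Y_0(Zr) + d) / 2."""
    xw = np.asarray(regression_prediction, dtype=float)
    y0 = np.asarray(variance_prediction, dtype=float)
    return (xw + y0 + error_remainder) / 2.0

y_21 = np.array([12.2284, 12.1677, 12.2077])  # multiple regression, formula (18)
y_22 = np.array([9.5470, 12.8636, 14.7894])   # variance model, formula (19)
d = 0.8394                                     # error remainder
observed = np.array([12.0, 11.3, 12.1])        # real temperatures of station No. 2

Y = interpolate_missing(y_21, y_22, d)
error = Y - observed  # predicted minus observed
```

Rounded to four decimals, `Y` and `error` agree with the interpolated temperatures and errors reported in the embodiment.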
In another aspect of the present invention, an interpolation device using the foregoing method for interpolating missing values of atmospheric temperature data under covariance analysis is provided. As shown in FIG. 4, the device includes a collection and display module, a search module, a processing module and a calculation module. The collection and display module collects the atmospheric temperature data of all stations hour by hour, imports the position information of all stations into ArcGIS software as one layer through coordinate conversion, and displays the resulting atmospheric temperature data layer on a map. The search module searches for all reference stations through ArcGIS software to establish an atmospheric temperature data sequence set, and traverses the data to find the atmospheric temperature data missing data set; the distribution of each atmospheric temperature data sequence set is shown in FIG. 5. The calculation module constructs the atmospheric temperature data missing value interpolation model based on covariance analysis by establishing a multiple regression model of a plurality of station positions and a covariance matrix of the temperature data reference sequence data. The processing module cyclically interpolates the missing values of the atmospheric temperature data until no atmospheric temperature data in the atmospheric temperature data set are missing.
The invention provides a method for interpolating missing values of atmospheric temperature data under covariance analysis. The method integrates two major statistical models, multiple regression and covariance, introduces influence factors such as season, weather, time and reference temperature, and fully considers correlation in both time and space. The atmospheric temperature data of different stations are divided into a missing sequence, a reference sequence and a test sequence for training, testing and missing value interpolation, which improves the reliability and accuracy of data interpolation.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made thereto without departing from the spirit and scope of the invention, all of which are intended to be encompassed by the claims.

Claims (7)

1. The method for interpolating the atmospheric temperature data missing value under the covariance analysis is characterized by comprising the following steps of:
S1, constructing influence factors of atmospheric temperature data;
the influence factors of the atmospheric temperature data are determined according to climate change theory and the traditional temperature prediction scale, and are selected from the atmospheric temperature data set; the influence factors comprise: time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$ and reference temperature $x_7$, seven influence factors in total;
S2, making an atmospheric temperature data set and establishing a multiple regression model of a plurality of station positions;
the multiple regression model of the plurality of station positions is represented in matrix form as follows:
$Y_b = Xw$ (1)
where $Y_b$ represents the predicted atmospheric temperature;
S3, traversing to find an atmospheric temperature data missing data set;
according to the Monte Carlo method, traversing all the atmospheric temperature data and station position information of the atmospheric temperature data set, finding all stations in the atmospheric temperature data set that have missing atmospheric temperature data as stations to be interpolated, and extracting all moments at which the atmospheric temperature data of each station to be interpolated are missing, together with the corresponding station position information;
S4, establishing an atmospheric temperature data sequence set for each station to be interpolated;
S41, combining the moments corresponding to a plurality of consecutively missing atmospheric temperature data to form a temperature data missing sequence, and designating the starting moment of the temperature data missing sequence as a first interpolation point;
S42, preparing temperature data missing sequence data, temperature data reference sequence data and temperature data test sequence data;
the step S42 specifically includes the following steps:
S421, extracting the atmospheric temperature data of the station to be interpolated in a preset period as the temperature data missing sequence data, wherein each missing value is replaced by the mean atmospheric temperature in the preset period;
S422, taking all stations other than the station to be interpolated that have no missing atmospheric temperature data in the preset period as reference stations;
S423, extracting 80% of the atmospheric temperature data of each reference station in the preset period to obtain a plurality of temperature data reference sequence data, and correspondingly extracting 80% of the temperature data missing sequence data to obtain temperature data reference prediction sequence data;
S424, extracting the remaining 20% of the atmospheric temperature data of each reference station in the preset period to obtain a plurality of temperature data test sequence data, and correspondingly extracting 20% of the temperature data missing sequence data to obtain temperature data test prediction sequence data;
S5, calculating a covariance matrix of the temperature data reference sequence data;
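the covariance of every pair of temperature data reference sequence data is calculated to obtain the covariance matrix $Z$, whose entry in row $i$ and column $j$ is $C(y_i, y_j)$:
$$Z = \begin{bmatrix} C(y_1, y_1) & \cdots & C(y_1, y_n) \\ \vdots & \ddots & \vdots \\ C(y_n, y_1) & \cdots & C(y_n, y_n) \end{bmatrix}$$
where $y_1$ and $y_n$ are the single-column matrices of all temperatures at the 1st and nth station positions, respectively, and $C(\cdot)$ denotes the covariance calculation;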
S6, solving a variance weight matrix, and establishing a variance model of a plurality of station positions;
S61, solving a variance weight matrix $r$;
the step S61 specifically includes the following steps:
S611, calculating the covariance of each temperature data reference sequence data with the temperature data missing sequence data to obtain a reference covariance matrix $Z_r$ as follows:
$Z_r = [C(y_1, y_0) \; \dots \; C(y_n, y_0)]^T$ (7)
where $y_0$ is the single-column matrix of all temperatures at the station position to be interpolated, and $T$ denotes the transpose;
S612, defining a variance weight matrix $r$:
$r = [r_1 \; \dots \; r_n]^T$ (8)
where $r_1$ and $r_n$ are the variance weights of the 1st and nth temperature data reference sequence data, respectively;
S613, based on the covariance matrix and the reference covariance matrix, the solving equation of the variance weight matrix $r$ is as follows:
S62, establishing a variance model of a plurality of station positions from the covariance matrix of the temperature data reference sequence data and the variance weight matrix as follows:
$Y_a = Zr$ (3);
S7, constructing an atmospheric temperature data missing value interpolation model based on covariance analysis;
S71, defining and solving an error remainder $d$;
S72, establishing an atmospheric temperature data missing value interpolation model of a plurality of station positions from the multiple regression model and the variance model of the temperature data reference sequence data, wherein the interpolation model is as follows:
$Y = (Xw + Y_0(Zr) + d)/2$ (4)
where $Y_0(Zr)$ represents the predicted value obtained from the variance model;
S8, cyclically interpolating the atmospheric temperature data missing values;
substituting the temperature data missing sequence data of all stations to be interpolated into the atmospheric temperature data missing value interpolation model of step S72, and repeatedly executing steps S2 to S7 until no atmospheric temperature data in the atmospheric temperature data set are missing.
2. The method for interpolating missing values of atmospheric temperature data under covariance analysis according to claim 1, wherein said step S2 comprises the steps of:
S21, acquiring atmospheric temperature data based on a national geographic database and a real-time database of temperature monitoring stations, and making an atmospheric temperature data set that contains time $x_1$, atmospheric pressure $x_2$, maximum wind speed $x_3$, relative humidity $x_4$, water vapor pressure $x_5$, precipitation $x_6$ and reference temperature $x_7$ together with station position information;
S22, establishing a multiple regression model of the position of a single station in the following basic form:
$Y_b = w_1 x_1 + w_2 x_2 + \dots + w_j x_j$ (5)
where $Y_b$ represents the predicted atmospheric temperature, and $w_j$ represents the regression weight coefficient of the jth influence factor;
S23, for the different station positions, performing multiple regression processing on the atmospheric temperature data set of step S21 by using the multiple regression module of MATLAB software to obtain preliminary weight coefficients $w_j$ of the multiple regression models of the different station positions;
S24, establishing a multiple regression model of a plurality of station positions in the following basic form:
$Y_{ib} = w_1 x_{i1} + w_2 x_{i2} + \dots + w_j x_{ij}$ (6)
where $Y_{ib}$ represents the predicted atmospheric temperature of the ith station position sample, and $x_{ij}$ represents the jth influence factor of the ith station position sample.
3. The method for interpolating missing values of atmospheric temperature data under covariance analysis according to claim 1, wherein said step S71 comprises the steps of:
S711, defining the error vector $D = [d_1 \; \dots \; d_n]^T$, where $d_1$ and $d_n$ are the 1st and nth atmospheric temperature data missing value interpolation error values, respectively;
S712, substituting the temperature data test sequence data into the variance model to obtain predicted values of the temperature data test prediction sequence data, and comparing these predicted values with the true values of the temperature data test prediction sequence data to obtain the error vector $D$;
S713, taking the error remainder $d$ as the mean value of the error vector $D$.
4. The method for interpolating missing values of atmospheric temperature data under covariance analysis according to claim 2, wherein said step S21 comprises the steps of:
S211, collecting the atmospheric temperature data of all stations hour by hour, and constructing an atmospheric temperature data set containing station position information;
S212, converting the position information of all stations in the atmospheric temperature data set by coordinate conversion, and constructing an atmospheric temperature data layer on a map.
5. The method according to claim 1, wherein the number of stations to be interpolated in step S4 is 1 or more, the number of first interpolation points in the stations to be interpolated in step S41 is 1 or more, and the number of reference stations in step S42 is not less than 2.
6. The method for interpolating missing values of atmospheric temperature data under covariance analysis according to claim 1, wherein the data missing value interpolation model based on covariance analysis in step S7 is also applicable to the technical field of interpolation of missing values of other meteorological elements.
7. An interpolation device using the interpolation method of atmospheric temperature data missing values under covariance analysis according to any one of claims 1 to 6, characterized in that it comprises an acquisition and display module, a search module, a processing module and a calculation module, wherein the acquisition and display module acquires the atmospheric temperature data of all stations hour by hour and imports the position information of all stations as one layer into ArcGIS software through coordinate conversion, and displays the atmospheric temperature data layer established on a map; the searching module searches all reference stations through ArcGIS software to establish an atmospheric temperature data sequence set, and traverses to search an atmospheric temperature data missing data set; the calculation module builds an atmospheric temperature data missing value interpolation model based on covariance analysis by building a multiple regression model of a plurality of station positions and a covariance matrix of temperature data reference sequence data; the processing module circularly interpolates the missing value of the atmospheric temperature data until the atmospheric temperature data in the atmospheric temperature data set is free of missing.
CN202310682985.XA 2023-06-09 2023-06-09 Method and device for interpolating missing values of atmospheric temperature data under covariance analysis Active CN116701371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310682985.XA CN116701371B (en) 2023-06-09 2023-06-09 Method and device for interpolating missing values of atmospheric temperature data under covariance analysis

Publications (2)

Publication Number Publication Date
CN116701371A CN116701371A (en) 2023-09-05
CN116701371B true CN116701371B (en) 2024-03-22

Family

ID=87827182

Country Status (1)

Country Link
CN (1) CN116701371B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant