CN116432032A - Meteorological data abnormal event identification method based on multi-source data and machine learning - Google Patents

Publication number
CN116432032A
Authority
CN
China
Prior art keywords
data
meteorological
day
deviation
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310400212.8A
Other languages
Chinese (zh)
Inventor
刘莹
闫荞荞
刘园园
王星宇
王海军
李波
刘梦雨
匡晓为
严婧
孙越
杨宏谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Meteorological Information And Technology Support Center
Original Assignee
Hubei Meteorological Information And Technology Support Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Meteorological Information And Technology Support Center
Priority to CN202310400212.8A
Publication of CN116432032A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a method for identifying abnormal events in meteorological data based on multi-source data and machine learning. To identify quality abnormal events in ground meteorological data, multi-source data such as geographic information data, related ground meteorological data and multi-year climate standard values are introduced. Feature vectors are built from the multi-source data of the current station and its surrounding neighbor stations, a meteorological element deviation estimation model is constructed with two machine learning algorithms, and the deviation sequence of each meteorological element value at the station is calculated. Event statistical features such as the average deviation, the standard deviation of the deviation, the daily evaluation factors and the duration are then obtained for each meteorological element by statistical methods. By analyzing these event statistical features, quality abnormal event division indexes are constructed and the abnormal event type is determined, so that quality abnormal events in meteorological data are identified. The method allows long-term quality problems in ground meteorological data to be monitored in real time, and supports both improving the quality of ground meteorological data and maintaining station instruments.

Description

Meteorological data abnormal event identification method based on multi-source data and machine learning
Technical Field
The invention belongs to the technical field of applied meteorology, and particularly relates to a method for identifying abnormal events in meteorological data based on multi-source data and machine learning.
Background
The quality management of meteorological data can be logically divided into three links: quality control, quality assessment and quality monitoring. Quality control focuses on detecting gross errors in observed data and on meeting users' requirements for high timeliness; it can detect single isolated erroneous values but has limited ability to detect systematic deviations. Quality assessment evaluates the quality of a batch of data, reflects the overall quality of an observed element, and can detect deeper hidden data quality problems caused by degraded sensor performance or a poor observation environment. Quality monitoring feeds the results of quality control and quality assessment back to the observation end, which supports closed-loop management of data quality.
In 2015, the meteorological data service system (MDOS) was put into use in provinces nationwide, realizing real-time quality control of ground meteorological observation data, effectively identifying gross-error data in the observations, and improving the application quality of ground observation data. However, the advance since 2018 of operational work such as ground observation automation and live service construction has brought new requirements for the quality and management of ground observation data. On the one hand, the requirements on data quality have become stricter and finer. On the other hand, with stations becoming unattended and the layout of the data processing service changing, how to promptly detect and eliminate the influence of faulty observation instruments and poor environments on the quality of observed data is a problem that data quality management must face and solve. In view of these new characteristics and problems of ground observation data quality and its management, a meteorological data quality evaluation method urgently needs to be developed and put into operational use.
Disclosure of Invention
In order to overcome the problems in the prior art, the invention provides a meteorological data abnormal event identification method based on multi-source data and machine learning, which comprises the following steps:
S1, acquiring multi-source data including ground meteorological observation data of meteorological elements, multi-year climate standard values and geographic information data, wherein the meteorological elements comprise air temperature, air pressure, relative humidity, wind speed and precipitation;
S2, establishing an initial feature vector {lat, lon, alt, slope, aspect, sea, relatedeles} from the multi-source data, wherein lat represents latitude, lon represents longitude, alt represents altitude, slope represents gradient, aspect represents slope direction, sea represents the marine effect factor, and relatedeles represents the related meteorological elements;
S3, analyzing and screening the initial feature vector with a feature importance analysis tool, and retaining, separately for each of air temperature, air pressure, relative humidity and wind speed, the feature factors that have a larger influence on that element;
S4, based on the multi-source data corresponding to the retained feature factors, taking the feature factors of the surrounding neighbor weather stations as model input and the meteorological element observation value of the current station as the target value, and training models with a random forest algorithm and an extreme gradient boosting (XGBoost) algorithm respectively, to obtain a trained random forest model and a trained extreme gradient boosting model;
S5, taking the feature factors of the surrounding neighbor weather stations as model inputs, inputting them into the random forest model and the extreme gradient boosting model respectively to obtain the estimated values Est_RF and Est_XGB of the meteorological element, and then obtaining the hour-by-hour deviation value BIAS of the meteorological element so as to construct the meteorological element deviation estimation model, with the specific calculation formulas:
$$\mathrm{Est} = \omega_1 \cdot \mathrm{Est\_RF} + \omega_2 \cdot \mathrm{Est\_XGB},$$
$$\mathrm{BIAS} = \mathrm{Obs} - \mathrm{Est};$$
wherein $\omega_1$ and $\omega_2$ are the weights of the corresponding estimates, both greater than 0, with $\omega_1 + \omega_2 = 1$; Obs represents the observed value of the meteorological element at the current station, and Est represents the final estimated value of the meteorological element at the current station;
S6, constructing daily evaluation factors based on the hour-by-hour deviation value BIAS of the meteorological element, confirming daily-time-scale quality abnormal events based on the daily evaluation factors, and constructing event statistical features such as the average deviation, the standard deviation of the deviation and the abnormal event duration from the daily-time-scale events, so as to identify quality abnormal events of meteorological data over any period.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning of the invention, the ground meteorological observation data comprise hour-by-hour data of the meteorological elements, together with the mean values and variability of the meteorological elements over a longer time range derived from the hour-by-hour data.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the multi-year climate standard values comprise the daily, monthly and annual mean values, extreme values and occurrence frequencies of various events of the meteorological elements over the last 30 years.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the geographic information data comprise DEM elevation, gradient, slope direction and the marine effect factor, and are produced from the SRTM3 terrain dataset and the GlobeLand30 global geographic information public product.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the neighbor stations are matched in time using a neighborhood principle and in space using circular boundary expansion; the time range of the neighborhood principle is defined as the automatic-station observation period within the current hour, the spatial range of the circular boundary expansion is defined as a circular area centered on the station position and extended outward by a specific radius, and the specific radius is selected from the range of 50-70 km.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the feature importance analysis tool is provided by a random forest algorithm.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the evaluation factors are constructed with one day as the time window. Specifically, the daily average deviation DAY_bias_avg, the daily deviation standard deviation DAY_bias_stdev, the daily mean DAY_obs_avg, the daily standard deviation DAY_obs_stdev, the daily standard deviation bias DAY_bias_obs_stdev and the gust coefficient DAY_gust_factor are selected as the daily evaluation factors, and their calculation formulas are as follows:
[Six formulas, present only as images in the source: Figures BDA0004179206960000031 to BDA0004179206960000036]
wherein obs_i is the observation at the i-th time, bias_i is the deviation value at the i-th time, n is the number of valid observation times in the day, m is the number of neighbor stations of the station under evaluation, DAY_fmost is the daily extreme wind speed, and DAY_fmax is the daily maximum wind speed.
Further, in the meteorological data abnormal event identification method based on multi-source data and machine learning, the daily evaluation factors are used as the judgment criteria to confirm daily-time-scale quality abnormal events, which specifically comprise: data too high, data too low, amplitude too large and amplitude too small for air temperature, air pressure and relative humidity; relative humidity undersaturated; and data too high, data too low and increased starting wind speed for wind speed.
The invention provides a method for identifying abnormal events in meteorological data based on multi-source data and machine learning. To identify abnormal events in ground meteorological data, multi-source data such as geographic information data, related ground meteorological data and multi-year climate standard values are introduced. Using the multi-source dataset and the feature factors of the current station and its surrounding neighbor stations, models are built with two machine learning methods to calculate the deviation of each meteorological element value at the station, and statistical features of the abnormal data, such as the average deviation and the standard deviation of the deviation, are obtained by statistical methods. According to the distribution and persistence of the element value deviations, abnormal event indexes are constructed and the types of abnormal events are further divided, so that abnormal data events are identified. The method supports both improving the quality of ground meteorological data and maintaining station instruments.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings, which are incorporated in and constitute a part of this specification, are briefly described below. The drawings illustrate embodiments consistent with the application and, together with the description, serve to explain its principles. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for identifying abnormal events in meteorological data based on multi-source data and machine learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of the production of geographic information data according to an embodiment of the present invention;
FIG. 3 is a block diagram of a model for estimating meteorological elements based on a random forest algorithm according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements throughout the different drawings, unless indicated otherwise. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for identifying abnormal events in meteorological data based on multi-source data and machine learning according to an embodiment of the present invention. The method of this embodiment comprises the following steps:
S1, acquiring multi-source data including ground meteorological observation data of meteorological elements, multi-year climate standard values and geographic information data, wherein the meteorological elements comprise air temperature, air pressure, relative humidity, wind speed and precipitation.
The ground meteorological observation data are observation data from meteorological stations and mainly comprise hour-by-hour air pressure, air temperature, relative humidity, wind and precipitation elements. In addition, parameters such as the mean value and variability of the corresponding meteorological elements over a given longer time range are computed per station from the hour-by-hour observation data, in accordance with ground meteorological observation standards, as a supplement.
The multi-year climate standard values comprise the climate standard values of the daily, monthly and annual mean values, extreme values and occurrence frequencies of various events of the meteorological elements over the last 30 years. The multi-year climate standard values used in this embodiment are the daily, monthly and annual climate standard values of more than 2000 national-level ground meteorological stations across the country over the last 30 years; the specific elements comprise air pressure, air temperature, relative humidity, wind and precipitation.
The geographic information data include DEM elevation, gradient, slope direction and the marine effect factor, and are produced from the SRTM3 terrain dataset and the GlobeLand30 global geographic information public product; see FIG. 2 for details. The gradient represents the steepness of a ground surface unit: the ratio of the vertical height of a slope to its horizontal distance is generally called the gradient, and in a raster image the gradient is calculated as the maximum rate of change of value from each pixel to its adjacent pixels. The slope direction is defined as the direction of the projection of the slope normal onto the horizontal plane; in a raster image it is the downhill direction with the largest rate of change of value from each pixel to its adjacent pixels. The influence of sea-land thermal differences on the spatial distribution of meteorological elements is called the marine effect; the marine effect factor is related to the distance between a station and the sea and is taken as the average distance to the coastline of the grid points within a specific range around the station.
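The gradient and slope direction described above can be sketched for a single raster cell as follows. This is only an illustration: the patent does not name a finite-difference scheme, so Horn's 3x3 method is assumed here, and the function name `slope_aspect`, the window layout (first row is the northern row) and the compass convention (0 = north, clockwise) are all hypothetical choices.

```python
import math

def slope_aspect(window, cell_size):
    """Slope and aspect (degrees) of the centre cell of a 3x3 elevation window.

    window[0] is assumed to be the northern row; cell_size uses the same
    length unit as the elevations. Horn's scheme is an assumption.
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Rate of elevation change towards the east and towards the south.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    # Slope: steepest rate of change, expressed as an angle.
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # flat cell: the downhill direction is undefined
    # Aspect: downhill direction as a compass bearing (0 = north, clockwise).
    angle = math.degrees(math.atan2(dz_dy, -dz_dx))
    aspect = (90.0 - angle) % 360.0
    return slope, aspect

# A surface rising towards the east faces west (bearing 270).
print(slope_aspect([[0, 1, 2], [0, 1, 2], [0, 1, 2]], 1.0))
```

In practice a GIS package would apply the same calculation to every cell of the DEM; the single-cell form is used here only to make the two definitions concrete.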
S2, establishing an initial feature vector {lat, lon, alt, slope, aspect, sea, relatedeles} from the multi-source data, wherein lat represents latitude, lon represents longitude, alt represents altitude, slope represents gradient, aspect represents slope direction, sea represents the marine effect factor, and relatedeles represents the related meteorological elements. Since each training sample in the subsequent step S4 includes one input vector and one output vector, the meteorological element corresponding to the output vector differs according to the meteorological element for which abnormal events are to be identified. For example, when abnormal events are to be identified for a certain meteorological element, the observed value of that element is used as the output vector, and the elements of the initial feature vector other than relatedeles, together with the remaining elements of relatedeles, are used as the input vector for step S3. Alternatively, elements irrelevant to the meteorological element currently being identified may be removed from the input vector manually. The invention does not limit the specific approach.
S3, analyzing and screening the initial feature vector with the feature importance analysis tool provided by the random forest algorithm, and retaining, separately for each of air temperature, air pressure, relative humidity and wind speed, the feature factors that have a larger influence on that element. In other embodiments of the invention, the primary analysis may also be implemented.
The invention analyzes and selects only the feature factors that have a large influence on air temperature, air pressure, relative humidity and wind speed, and does not analyze or select feature factors for precipitation. Therefore, when the initial feature vector is established in step S2 in some embodiments of the invention, no initial feature vector with precipitation as the output vector needs to be established, and abnormal events of meteorological data corresponding to precipitation are not identified in the subsequent steps.
S4, based on the multi-source data corresponding to the retained feature factors, taking the feature factors of the surrounding neighbor weather stations as model input and the meteorological element observation value of the current station as the target value, and training models with a random forest algorithm and an extreme gradient boosting (XGBoost) algorithm respectively, to obtain a trained random forest model and a trained extreme gradient boosting model.
Changes in meteorological elements are spatially correlated. Taking air temperature as an example, the air temperature observed at one station generally has a high spatial correlation with the air temperature observed at nearby neighbor stations, and the closer the stations, the greater the correlation. In view of this, the meteorological element value at the current station can be estimated from the observation data of the neighbor stations using spatial data analysis techniques and the like. The deviation of the observed value at the current station from the estimated value then stays within a fixed interval, generally fluctuating around zero, and according to this rule the data quality can be evaluated by analyzing changes in the deviation sequence.
In the invention, the identification of meteorological elements is based on a spatial consistency test, in which the "neighborhood principle" is used for time matching of neighbor stations and the circular boundary expansion method is used for space matching. The time range of the "neighborhood principle" is defined as the automatic-station observation period within the current hour, and the spatial range of the circular boundary expansion is defined as the circular area $S_{range} = \pi R^2$ centered on the station position and extended outward by a specific radius $R$; the neighbor stations are taken from a circular neighborhood with a radius of 50-70 km.
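The spatial half of this matching (circular boundary expansion) can be sketched as a great-circle distance filter. The patent does not say how station distances are computed, so the haversine formula, the function names and the 60 km default (one value inside the stated 50-70 km range) are assumptions; time matching within the current hour is not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def neighbor_stations(target_id, stations, radius_km=60.0):
    """Return (station_id, distance_km) pairs inside the circle, nearest first.

    stations maps a station id to a (lat, lon) tuple; the target station
    itself is excluded from the result.
    """
    lat0, lon0 = stations[target_id]
    found = []
    for station_id, (lat, lon) in stations.items():
        if station_id == target_id:
            continue
        distance = haversine_km(lat0, lon0, lat, lon)
        if distance <= radius_km:
            found.append((station_id, distance))
    return sorted(found, key=lambda item: item[1])

# Hypothetical station coordinates, for illustration only.
stations = {"T": (30.0, 114.0), "A": (30.3, 114.0), "B": (31.0, 114.0), "C": (30.0, 114.5)}
print(neighbor_stations("T", stations))
```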
Random forest is an ensemble learning algorithm based on the Bagging framework, proposed by Breiman in 2001, that can handle both classification and regression problems. A large body of theoretical and empirical research has shown that random forests have high prediction accuracy, tolerate outliers and noise well, and perform well in many fields. Ground meteorological element estimation is a regression problem, so a random forest regression algorithm is adopted to build the ground meteorological element estimation model; the algorithm model is shown in FIG. 3. The main idea of the random forest algorithm is to draw a number of Bootstrap sample sets at random from a given dataset and build a CART decision tree for each sample set. When a tree is built, M features are drawn at random from the N features of the feature set at each split node, the best of the M features is selected as the split variable, and the CART tree is grown by fully splitting. The final predicted value is obtained from the predictions of the individual decision trees.
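The bootstrap-and-average idea can be illustrated with a toy sketch in plain Python. It is deliberately simplified and is not the model of the embodiment: each "tree" here is a depth-1 regression stump rather than a fully grown CART tree, so drawing M of N features once per tree coincides with drawing them per split. All function names and the example data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_ids):
    """Best single-split regression stump over the candidate features."""
    best = None
    for f in feature_ids:
        # Skip the largest value so that both sides of the split are non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # every candidate feature is constant in this sample
        m = mean(targets)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def fit_forest(rows, targets, n_trees=50, m_features=2, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]          # Bootstrap sample
        feats = rng.sample(range(n_features), m_features)   # M of N features
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    # Regression forests average the predictions of the individual trees.
    return mean(predict_stump(stump, row) for stump in forest)

# Made-up data: only the first feature carries information about the target.
rows = [[x, x % 3, x % 2] for x in range(20)]
targets = [10.0 if x < 10 else 20.0 for x in range(20)]
forest = fit_forest(rows, targets)
print(predict_forest(forest, [2, 2, 0]), predict_forest(forest, [17, 2, 1]))
```

A production model would normally rely on an established library implementation rather than hand-written trees; the sketch only shows where the two sources of randomness enter.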
XGBoost is an ensemble machine learning algorithm based on the Boosting framework, proposed by Tianqi Chen et al. in 2016. It improves and extends the GBDT (gradient boosting decision tree) algorithm, with large gains in parallelism and computational efficiency. XGBoost tree models are used as base learners to predict and estimate the ground meteorological elements. The central idea of the algorithm is to grow trees one after another by repeatedly splitting on features; each new tree is a new function fitted to the residual of the previous prediction, and the values of the leaf nodes reached in each tree are added to obtain the final estimate. Used for estimating ground meteorological elements, the XGBoost algorithm can better characterize the nonlinear relations among meteorological elements, complex terrain, the marine effect, cloud parameters and the like.
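The "each new tree fits the previous residual" idea can be shown with a minimal gradient boosting sketch for squared error on one feature. This is plain gradient boosting with depth-1 stumps, not XGBoost itself: it has no regularization term, no second-order gradient information and no parallel split search. The function names, the learning rate and the example data are assumptions.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """One-feature regression stump minimising squared error on the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    return best[1:]

def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    base = mean(ys)                      # initial prediction: the target mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Each round fits a new stump to what the current model still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lm, rm = fit_stump(xs, residuals)
        stumps.append((threshold, lm, rm))
        preds = [p + learning_rate * (lm if x <= threshold else rm)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    base, stumps, learning_rate = model
    # The final estimate is the base value plus the sum of all leaf values.
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)

xs = list(range(10))
ys = [float(x * x) for x in xs]          # a simple nonlinear target
model = fit_boosted(xs, ys)
print(predict_boosted(model, 9))
```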
S5, taking the feature factors of the surrounding neighbor weather stations as model inputs, inputting them into the random forest model and the extreme gradient boosting model respectively to obtain the estimated values Est_RF and Est_XGB of the meteorological element, and then obtaining the hour-by-hour deviation value BIAS of the meteorological element, thereby constructing the meteorological element deviation estimation model, with the specific calculation formulas:
$$\mathrm{Est} = \omega_1 \cdot \mathrm{Est\_RF} + \omega_2 \cdot \mathrm{Est\_XGB},$$
$$\mathrm{BIAS} = \mathrm{Obs} - \mathrm{Est};$$
wherein $\omega_1$ and $\omega_2$ are the weights of the corresponding estimates, both greater than 0, with $\omega_1 + \omega_2 = 1$; Obs represents the observed value of the meteorological element at the current station, and Est represents the final estimated value of the meteorological element at the current station. In the present embodiment, $\omega_1 = \omega_2 = 0.5$.
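The two formulas of step S5 translate directly into code. The sketch below assumes the two model estimates are already available as lists of hourly values; the function names and the sample numbers are hypothetical, and the default weights follow the 0.5/0.5 choice of this embodiment.

```python
def combine_estimates(est_rf, est_xgb, w_rf=0.5, w_xgb=0.5):
    """Weighted estimate Est = w_rf * Est_RF + w_xgb * Est_XGB."""
    if w_rf <= 0 or w_xgb <= 0 or abs(w_rf + w_xgb - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return w_rf * est_rf + w_xgb * est_xgb

def hourly_bias(obs, est_rf, est_xgb, w_rf=0.5, w_xgb=0.5):
    """Hour-by-hour deviation BIAS = Obs - Est for aligned hourly series."""
    return [o - combine_estimates(rf, xgb, w_rf, w_xgb)
            for o, rf, xgb in zip(obs, est_rf, est_xgb)]

# Made-up hourly air temperatures in degrees Celsius.
obs = [21.5, 22.0, 23.5]
est_rf = [20.0, 22.0, 23.0]
est_xgb = [22.0, 22.0, 24.0]
print(hourly_bias(obs, est_rf, est_xgb))  # [0.5, 0.0, 0.0]
```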
S6, constructing daily evaluation factors based on the hour-by-hour deviation value BIAS of the meteorological element, confirming daily-time-scale quality abnormal events based on the daily evaluation factors, and constructing event statistical features such as the average deviation, the standard deviation of the deviation and the abnormal event duration from the daily-time-scale events, so as to identify quality abnormal events of meteorological data over any period. Based on statistical analysis of the hour-by-hour deviation BIAS of the meteorological elements, and comprehensively considering the statistical characteristics of the observed values, quality evaluation indexes are established element by element, guided by the division of data quality abnormal events and oriented to different quality evaluation application scenarios, and quality abnormal events are confirmed based on these evaluation indexes.
Considering the influence of short-duration, local weather systems on the spatial variation of meteorological elements, and to shield the analysis of the hour-by-hour deviation BIAS from the interference of random uncertain factors, evaluation factors are constructed with one day as the time window. Daily-time-scale quality abnormal events are then confirmed using the daily evaluation factors as the judgment criteria. Finally, the duration, deviation and element time-series statistical characteristics of the daily-time-scale quality abnormal events are combined to establish a mechanism and indexes for confirming quality abnormal events over any period, enabling the production of different types of quality evaluation products.
In order to realize traceable management of data quality assessment and improve the designability of assessment results, the concept of "meteorological data abnormal event identification" is introduced into ground meteorological data quality assessment: data for which the ground meteorological observation deviates considerably from the true value are defined as quality abnormal data, an event in which quality abnormal data occur is defined as a quality abnormal event, and a division model of ground meteorological data quality abnormal events is established.
In order for the list of quality abnormal event divisions to reflect the quality problems present in ground observation data as comprehensively as possible, a case dataset of ground meteorological data quality problems is collected and established. The dataset gathers observation data of stations and their surrounding reference stations over a certain time range, together with investigation feedback from some faulty stations. Through analysis of the spatio-temporal distribution characteristics and causes of the quality problem cases, 4 categories of real-time abnormal elements comprising 16 abnormal events are defined; the specific division is shown in the following table.
[Table of quality abnormal event divisions, present only as images in the source: Figures BDA0004179206960000081 and BDA0004179206960000091]
In this embodiment, evaluation factors are constructed with one day as the time window, and the daily evaluation factors are used as the judgment criteria to confirm daily-time-scale quality abnormal events. Specifically, the daily average deviation DAY_bias_avg, the daily deviation standard deviation DAY_bias_stdev, the daily mean DAY_obs_avg, the daily standard deviation DAY_obs_stdev, the daily standard deviation bias DAY_bias_obs_stdev and the gust coefficient DAY_gust_factor are selected as the daily evaluation factors, and their calculation formulas are as follows:
[Six formulas, present only as images in the source: Figures BDA0004179206960000092 to BDA0004179206960000097]
wherein obs_i is the observation at the i-th time, bias_i is the deviation value at the i-th time, n is the number of valid observation times in the day, m is the number of neighbor stations of the station under evaluation, DAY_fmost is the daily extreme wind speed, and DAY_fmax is the daily maximum wind speed.
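Because the six formulas survive only as image references in this text, the sketch below should be read as one plausible interpretation rather than the patent's definitions. It assumes population (divide-by-n) standard deviations, takes DAY_bias_obs_stdev as the station's daily observation standard deviation minus the mean of the same statistic over its m neighbor stations, and takes the gust coefficient as DAY_fmost divided by DAY_fmax. The function name and dictionary keys are hypothetical.

```python
from statistics import fmean, pstdev

def daily_factors(obs, bias, neighbor_obs_stdevs, fmost=None, fmax=None):
    """Daily evaluation factors from one day's valid hourly values.

    obs / bias: hourly observations and hourly BIAS values of the station.
    neighbor_obs_stdevs: daily observation standard deviations of the
    m neighbor stations (assumed input for DAY_bias_obs_stdev).
    fmost / fmax: daily extreme and daily maximum wind speed, wind only.
    """
    factors = {
        "bias_avg": fmean(bias),      # assumed: mean of bias_i
        "bias_stdev": pstdev(bias),   # assumed: population std of bias_i
        "obs_avg": fmean(obs),        # assumed: mean of obs_i
        "obs_stdev": pstdev(obs),     # assumed: population std of obs_i
    }
    # Assumed: how far the station's daily variability departs from its neighbors'.
    factors["bias_obs_stdev"] = factors["obs_stdev"] - fmean(neighbor_obs_stdevs)
    if fmost is not None and fmax:
        factors["gust_factor"] = fmost / fmax  # assumed ratio definition
    return factors

# Made-up values for illustration.
print(daily_factors([10, 12, 14, 12], [1, 1, 1, 1], [1.0, 2.0], fmost=12.0, fmax=8.0))
```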
Using the daily evaluation factors as the judgment criteria, daily quality abnormal events are identified item by item with a threshold method. Taking air temperature as an example, the labeling rules for the events of high air temperature, low air temperature, large air temperature amplitude and small air temperature amplitude are shown in the following table, wherein DAY_t_bias_avg is the daily average air temperature deviation, DAY_t_stdev_bias is the standard deviation of the daily air temperature deviation, DAY_t_obs_stdev is the daily air temperature standard deviation, DAY_t_bias_obs_stdev is the daily air temperature standard deviation bias, and $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ and $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ are quality evaluation index thresholds, determined by statistical analysis of national station data as 1.6, 2.0, 0.6 and 1.8, 0.4, 3.0, 0.4. Daily quality abnormal events for other elements such as air pressure, relative humidity and wind are confirmed in a way similar to air temperature, with only the evaluation indexes and their thresholds adjusted.
[Table of daily air temperature event labeling rules, present only as an image in the source: Figure BDA0004179206960000101]
Based on the labeling results of the daily quality abnormal events within the evaluation period, combined with the statistical characteristics of the hourly deviations, quality abnormal events in any given period are confirmed by analyzing how long the events last, which reduces the influence of random factors on the evaluation result.
Referring to the table below and taking the determination of monthly air temperature quality abnormal events as an example, avg_month_tbias is the average of the daily average deviation DAY_t_bias_avg within the month, and N_t_devent_datahigh, N_t_devent_datalow, N_t_devent_chghigh and N_t_devent_chglow are the numbers of days in the month on which the daily quality abnormal events "air temperature too high", "air temperature too low", "air temperature amplitude too large" and "air temperature amplitude too small" occur, respectively. The confirmation rules for the monthly quality abnormal events are shown in Table 5; the values of α and β, determined by statistical analysis, are 10 days and 1.2 ℃, respectively.
Event name | Labeling rule
Air temperature too high | N_t_devent_datahigh ≥ α and avg_month_tbias ≥ β
Air temperature too low | N_t_devent_datalow ≥ α and avg_month_tbias < -β
Air temperature amplitude too large | N_t_devent_chghigh ≥ α
Air temperature amplitude too small | N_t_devent_chglow ≥ α
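The monthly rules in the table can be sketched as a small function. This is an illustrative reading of the four rules rather than the patented implementation; the function name `label_month_events` and the event strings are hypothetical, while the default thresholds α = 10 days and β = 1.2 ℃ are the values stated in the text.

```python
def label_month_events(n_datahigh, n_datalow, n_chghigh, n_chglow,
                       avg_month_tbias, alpha=10, beta=1.2):
    """Return the monthly air temperature quality events that apply.

    n_*             : number of days in the month with the matching daily event
    avg_month_tbias : monthly mean of the daily average deviation (deg C)
    alpha, beta     : day-count and deviation thresholds from the text
    """
    events = []
    if n_datahigh >= alpha and avg_month_tbias >= beta:
        events.append("air temperature too high")
    if n_datalow >= alpha and avg_month_tbias < -beta:
        events.append("air temperature too low")
    if n_chghigh >= alpha:
        events.append("air temperature amplitude too large")
    if n_chglow >= alpha:
        events.append("air temperature amplitude too small")
    return events


# 12 "too high" days with a +1.5 deg C mean deviation, 11 "amplitude too large" days
print(label_month_events(12, 0, 11, 0, 1.5))
```

Requiring both a minimum number of event days and a minimum monthly mean deviation means a handful of isolated daily flags does not produce a monthly event, which matches the stated aim of reducing the influence of random factors.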
For the estimated values of the meteorological elements, the invention introduces an implanted-error test to verify the accuracy of the machine-learning-based deviation estimation method and to understand how the deviation value BIAS performs in meteorological data quality evaluation. The implanted-error test artificially adds a known amount of error to the test data and then checks and analyzes the object under test.
Taking the implanted-error test for the air temperature element as an example, a known air temperature error value is added to the hourly air temperature observations of the tested station, the air temperature estimate for the station is calculated with the machine learning deviation estimation method, and a deviation analysis of the implanted error value against the estimate is carried out to quantify the detection rate, thereby measuring the ability of the quality evaluation method to detect data errors. The method uses the detection rate DRate_T as the main test index, calculated as follows:
Figure BDA0004179206960000111
where T is the implanted temperature error, taking the values 0 ℃, 1 ℃, 2 ℃, 3 ℃, 4 ℃ and 5 ℃, and AvgBias_T is the average deviation, i.e. the average difference between the observed data Obs and the estimated value Est.
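The implanting step and the average deviation AvgBias_T can be sketched as follows. This is only a sketch under stated assumptions: the estimates are treated as fixed values derived from neighboring stations, the sample numbers are invented for illustration, and the function name `avg_bias_after_implant` is hypothetical. The detection rate DRate_T itself is not computed here, because its formula is given only as a figure.

```python
from statistics import mean


def avg_bias_after_implant(obs, est, t):
    """Implant a known error t (deg C) and return AvgBias_T = mean(Obs - Est).

    obs : hourly air temperature observations of the tested station
    est : matching hourly estimates (assumed independent of obs)
    t   : known error added to every observation
    """
    if not obs or len(obs) != len(est):
        raise ValueError("obs and est must be non-empty and equally long")
    seeded = [o + t for o in obs]  # observations with the implanted error
    return mean(s - e for s, e in zip(seeded, est))


# Invented sample values, for illustration only
obs = [10.2, 11.0, 12.5, 13.1]
est = [10.0, 11.3, 12.4, 12.9]
avg_bias_by_error = {t: avg_bias_after_implant(obs, est, t) for t in (0, 1, 2, 3, 4, 5)}
```

The T = 0 ℃ entry gives the average deviation of the unmodified data, against which the averages for the implanted errors can be compared.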
The applicability of the quality evaluation method across the country was examined using hourly air temperature observations from 158 national reference weather stations nationwide in January and July 2021. After errors of 1 to 5 ℃ were implanted into the air temperature observations, the machine-learning-based deviation detection method detected the implanted errors effectively: the detection rate was above 84.0% in every case and reached about 90% for an implanted error of 5 ℃. The detection rates in January were 84.6%, 86.5%, 87.8%, 89.1% and 90.3%, and those in July were 84.5%, 85.6%, 87.5%, 88.8% and 89.9%. This indicates that the deviation detected by the method is close to the implanted error value, that the detection capability strengthens as the implanted error increases, and that the detection capability in January is slightly higher. The final conclusion is that the machine-learning-based deviation estimation method can effectively detect implanted air temperature errors, and that the air temperature estimates obtained by the method reflect, to a certain extent, the actual air temperature level at the site.
It should be understood that in the description of all embodiments of the present invention, the terms "upper," "lower," "left," "right," and the like indicate an orientation or a positional relationship based on that shown in the drawings, and are merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or element in question must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "coupled," "connected," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally formed; may be mechanically connected, may be electrically connected or may communicate with each other; the two elements can be directly connected or indirectly connected through an intermediate medium to form a linkage relationship, and the linkage relationship can be the communication between the two elements or the interaction relationship between the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present specification, reference to the terms "some embodiments," "one particular implementation," "a particular implementation," "one example," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, a particular feature, structure, material, or characteristic described in connection with the above may be combined in any suitable manner in one or more embodiments or examples.
In addition, it should be noted that the foregoing embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments; that is, the technical solution disclosed in a later embodiment should be understood to include the technical solution described in that embodiment together with those described in all preceding embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A meteorological data abnormal event identification method based on multi-source data and machine learning is characterized by comprising the following steps:
S1, first, multi-source data are acquired, including ground meteorological observation data of meteorological elements, multi-year climate standard values and geographic information data, wherein the meteorological elements comprise air temperature, air pressure, relative humidity, wind speed and precipitation;
S2, an initial feature vector {lat, lon, alt, slope, aspect, sea, relatedeles} is established using the multi-source data, wherein lat represents latitude, lon represents longitude, alt represents altitude, slope represents slope, aspect represents aspect (slope direction), sea represents the marine effect factor, and relatedeles represents the related meteorological elements;
S3, the initial feature vector is analyzed and screened using a feature importance analysis tool, and the feature factors that have a larger influence on each of the meteorological elements, such as air temperature, air pressure, relative humidity and wind speed, are retained;
S4, based on the multi-source data corresponding to the retained feature factors, the feature factors of the surrounding adjacent weather stations are taken as model input and the meteorological element observation of the current station is taken as the target value, and models are trained with a random forest algorithm and an extreme gradient boosting algorithm respectively, to obtain a trained random forest model and a trained extreme gradient boosting model;
S5, the feature factors of the surrounding adjacent weather stations are input into the random forest model and the extreme gradient boosting model respectively to obtain the estimated values Est_RF and Est_XGB of the meteorological element, and the hourly deviation value BIAS of the meteorological element is then obtained, thereby constructing a meteorological element deviation estimation model, wherein the specific calculation formulas are as follows:
Est = ω_1 * Est_RF + ω_2 * Est_XGB,
BIAS = Obs - Est;
wherein ω_1 and ω_2 are the weights of the corresponding estimated values, ω_1 and ω_2 are both greater than 0 and ω_1 + ω_2 = 1, Obs represents the observed value of the meteorological element at the current station, and Est represents the final estimated value of the meteorological element at the current station;
S6, daily evaluation factors are constructed based on the hourly deviation value BIAS of the meteorological element, quality abnormal events on the daily time scale are confirmed based on the daily evaluation factors, and event statistical features such as the average deviation, the deviation standard deviation and the abnormal event duration are constructed in combination with the daily-scale events, to identify quality abnormal events of the meteorological data in any period.
2. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning of claim 1, wherein the ground meteorological observation data comprise hourly data of the meteorological elements, and averages and variability of the meteorological elements over longer time ranges derived from the hourly data.
3. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning of claim 1, wherein the multi-year climate standard values comprise the daily, monthly and annual averages and extremes of the meteorological elements and the occurrence frequencies of various types of events over the last 30 years.
4. The method of claim 1, wherein the geographic information data include DEM elevation, slope, aspect and marine effect factors, and are based on the SRTM3 terrain dataset and the GlobeLand30 global geographic information public product.
5. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning according to claim 1, wherein the surrounding adjacent meteorological stations are matched in time using a proximity principle and in space using circular boundary expansion, the time range of the proximity principle is defined as the automatic station observation times within the current hour, the spatial range of the circular boundary expansion is defined as a circular area centered on the position of the station and extending outward by a specific radius, and the specific radius is selected so that the circular neighborhood has a radius of 50-70 km.
6. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning of claim 1, wherein the feature importance analysis tool is the feature importance analysis tool provided by the random forest algorithm.
7. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning according to claim 1, wherein a day is used as the time window for constructing the evaluation factors, and specifically the daily average deviation DAY_bias_avg, the daily deviation standard deviation DAY_bias_stdev, the daily average DAY_obs_avg, the daily standard deviation DAY_obs_stdev, the daily standard deviation DAY_bias_obs_stdev and the gust coefficient DAY_gust_factor are selected as the daily evaluation factors, calculated as follows:
Figure FDA0004179206940000021
Figure FDA0004179206940000031
Figure FDA0004179206940000032
Figure FDA0004179206940000033
Figure FDA0004179206940000034
Figure FDA0004179206940000035
wherein obs_i is the observation at the i-th time, bias_i is the deviation value at the i-th time, n is the number of valid observation times in the day, m is the number of adjacent stations of the station being evaluated, DAY_fmost is the daily extreme wind speed, and DAY_fmax is the daily maximum wind speed.
8. The method for identifying abnormal events of meteorological data based on multi-source data and machine learning according to claim 7, wherein the daily-scale quality abnormal events confirmed using the daily evaluation factors as the judgment standard specifically comprise: data too high, data too low, amplitude too large and amplitude too small for air temperature, air pressure and relative humidity; relative humidity failing to reach saturation; and wind speed data too high, wind speed data too low and increased starting wind speed.
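As an illustration of the weighted combination and hourly deviation recited in step S5 of claim 1, a minimal sketch follows. The function names `combine_estimates` and `hourly_bias` are hypothetical, and the equal default weights are an assumption, since the text only requires both weights to be positive and to sum to 1.

```python
def combine_estimates(est_rf, est_xgb, w_rf=0.5, w_xgb=0.5):
    """Weighted final estimate Est = w_rf * Est_RF + w_xgb * Est_XGB.

    The default weights of 0.5 are an assumption for illustration only.
    """
    if w_rf <= 0 or w_xgb <= 0 or abs(w_rf + w_xgb - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return w_rf * est_rf + w_xgb * est_xgb


def hourly_bias(obs, est):
    """Hourly deviation BIAS = Obs - Est."""
    return obs - est


# Example: the two model estimates for one hour at one station
est = combine_estimates(20.0, 22.0, w_rf=0.25, w_xgb=0.75)
bias = hourly_bias(23.0, est)
```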
CN202310400212.8A 2023-04-12 2023-04-12 Meteorological data abnormal event identification method based on multi-source data and machine learning Pending CN116432032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310400212.8A CN116432032A (en) 2023-04-12 2023-04-12 Meteorological data abnormal event identification method based on multi-source data and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310400212.8A CN116432032A (en) 2023-04-12 2023-04-12 Meteorological data abnormal event identification method based on multi-source data and machine learning

Publications (1)

Publication Number Publication Date
CN116432032A true CN116432032A (en) 2023-07-14

Family

ID=87082846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310400212.8A Pending CN116432032A (en) 2023-04-12 2023-04-12 Meteorological data abnormal event identification method based on multi-source data and machine learning

Country Status (1)

Country Link
CN (1) CN116432032A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688975A (en) * 2024-02-02 2024-03-12 南京信息工程大学 Meteorological event prediction method and system based on evolution rule mining
CN117688975B (en) * 2024-02-02 2024-05-14 南京信息工程大学 Meteorological event prediction method and system based on evolution rule mining



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination