Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a short-time strong precipitation identification method based on Doppler weather radar data. Existing approaches mainly exploit the radar characteristics of hail and treat short-time strong precipitation only as a counter-example of hail; they consider neither the characteristics of short-time strong precipitation convection monomers (convective cells) nor the difference between short-time strong precipitation and non-strong convective weather. As a result, when a hail versus short-time strong precipitation identification model classifies an input convection monomer as non-hail, that class may still contain non-strong precipitation monomers. Using Doppler weather radar data and a machine learning algorithm, the method detects, among the non-hail monomers, the convection monomers that produce short-time strong precipitation, so that disaster warnings can be issued in time. The technical scheme of the invention is described in detail as follows:
the invention provides a short-time heavy precipitation identification method based on Doppler radar data, which comprises the following steps of:
step one, collecting short-time strong precipitation live information and Doppler weather radar data matched with the short-time strong precipitation live information, and converting the radar data into three-dimensional grid point data;
the short-time strong precipitation live information comprises precipitation starting time, precipitation ending time, an automatic station number, and 5-minute accumulated precipitation of the automatic station from the starting time to the precipitation ending time.
The step of converting the radar data into the three-dimensional lattice point data refers to the step of performing bilinear interpolation operation on the reflectivity data of 9 elevation angles of the radar to obtain 512 x 31 three-dimensional lattice point data, wherein the data resolution is 1km x 0.5 km.
Step two, identifying convection monomers at all moments from the radar data of each short-time strong precipitation event, matching the convection monomers with corresponding live information and marking the convection monomers;
the steps of matching and marking the convection current monomer with the live information are as follows:
calculating, from the original live information, the total rainfall accumulated at each automatic station over the one hour starting from the current moment, and using it to determine the type label of the monomers associated with that automatic station at the current moment;
the types of monomers include heavy precipitation convection monomers and non-heavy precipitation convection monomers.
For the convection monomer at each moment, recording the positions and the hourly rainfall of all automatic stations corresponding to the same or the nearest moment in the live information, and marking the strong precipitation convection monomer and the non-strong precipitation convection monomer according to the following rules:
rule 1-1: recording the position of the automatic station in the range of the monomer area, and marking the monomer as a strong precipitation monomer if the hourly rainfall of the automatic station is greater than or equal to 20 mm;
rule 1-2: considering that an automatic station may record erroneous or missing data because of temporary instrument failure, and that the public is not sensitive to the difference between a 20mm/h threshold and a 19mm/h threshold for short-time strong precipitation, the upper limit for non-strong precipitation monomers is set to 18mm/h. Therefore, a monomer is marked as a non-strong precipitation monomer if the automatic stations within the monomer area record an hourly rainfall of less than 18mm;
rule 1-3: the rainfall of a sample is the maximum value among the hourly rainfall records of all automatic stations that meet the conditions.
rule 1-4: samples whose radar base data are corrupted are deleted.
Step three, recording the strong precipitation convection monomers as positive samples and the non-strong precipitation monomers as negative samples, and extracting the characteristics of all monomers;
the characteristics of the monomer include a reflectivity density characteristic, a reflectivity intensity characteristic, a reflectivity gradient characteristic, a distance characteristic and a liquid water content characteristic.
The reflectivity density type characteristics are obtained from radar three-dimensional lattice point data and comprise the monomer 30dBZ reflectivity density, the monomer 40dBZ reflectivity density and the spatial monomer reflectivity 40dBZ ratio;
the reflectivity intensity type characteristics are obtained from radar combined reflectivity data and comprise the monomer combined reflectivity mean value, the 90% quantile of the monomer maximum reflectivity intensity, the 85% quantile of the monomer maximum reflectivity intensity, the 80% quantile of the monomer maximum reflectivity intensity, the proportion of the monomer reflectivity greater than 40dBZ and the proportion of the monomer reflectivity greater than 45dBZ;
the reflectivity gradient type characteristics are obtained from radar combined reflectivity data and comprise reflectivity gradient_th1, reflectivity gradient_th2 and reflectivity gradient_th3;
the distance type characteristics are obtained from radar combined reflectivity data and comprise the average distance between the monomer core point and the monomer 30dBZ contour line and the average distance between the monomer core point and the monomer 40dBZ contour line;
the liquid water content type characteristics are obtained from radar three-dimensional lattice point data and comprise the vertically accumulated liquid water content, liquid water content density_1 and liquid water content density_2.
Step four, performing a statistical test on each feature separately, taking the values of that feature over the positive and negative sample feature sets as one group of input. The null hypothesis is that the feature shows no significant difference between the two populations, and the alternative hypothesis is that the difference is significant. The statistic obeying the t distribution is defined as:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
in the formula, $\bar{x}_1$ and $\bar{x}_2$ are respectively the means of the sample feature from the two populations, $s_1^2$ and $s_2^2$ are the corresponding variances, and $n_1$ and $n_2$ are the numbers of samples of the two types. A test at confidence level $(1-\alpha)$ is carried out. The significance level $\alpha$ is taken to be 0.01, and $t_{\alpha/2}(n_1+n_2-2)$ is obtained by table look-up. If the t value of a feature is greater than $t_{\alpha/2}(n_1+n_2-2)$, the null hypothesis is rejected at a confidence level of 0.99 and the alternative hypothesis is considered to be true. To make the statistical difference of the features between the two sample sets more significant, features whose t value is greater than $t_{\alpha/2}(n_1+n_2-2)$ are selected as the valid features of the positive and negative sample sets.
And step five, dividing the data set into a training set and a testing set, training a classifier model according to the effective characteristics of positive and negative samples of the training set, and identifying the short-time heavy precipitation by using the classifier model.
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects: instead of the conventional practice of constructing characteristics for hail convection monomers and identifying short-time strong precipitation convection monomers merely as counter-examples, the method constructs characteristics specifically for short-time strong precipitation convection monomers, uses these characteristics to distinguish short-time strong precipitation convection monomers from non-strong precipitation convection monomers, and trains a classifier model with a machine learning method to identify short-time strong precipitation convection monomers. The effectiveness of the method is verified through experiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The invention provides a short-time heavy precipitation identification method based on Doppler radar data, which comprises the following steps of:
step one, collecting short-time strong precipitation live information and Doppler weather radar data matched with the short-time strong precipitation live information, and converting the radar data into three-dimensional grid point data;
the short-time strong precipitation live information comprises precipitation starting time, precipitation ending time, an automatic station number, and 5-minute accumulated precipitation of the automatic station from the starting time to the precipitation ending time.
The step of converting the radar data into the three-dimensional lattice point data refers to the step of performing bilinear interpolation operation on the reflectivity data of 9 elevation angles of the radar to obtain 512 x 31 three-dimensional lattice point data, wherein the data resolution is 1km x 0.5 km.
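As a rough illustration of the bilinear interpolation named in step one, the sketch below interpolates a reflectivity value at a fractional position inside a regular two-dimensional array (for example, elevation index by range gate). The function name `bilinear`, the array layout and the sample values are assumptions made for illustration; the text does not specify how the polar scans are mapped onto the grid.

```python
import math

def bilinear(field, row, col):
    """Bilinearly interpolate field (a list of rows) at fractional (row, col)."""
    # Index of the upper-left neighbour, clamped so that r0 + 1 and c0 + 1 exist.
    r0 = max(min(int(math.floor(row)), len(field) - 2), 0)
    c0 = max(min(int(math.floor(col)), len(field[0]) - 2), 0)
    dr, dc = row - r0, col - c0
    # Interpolate along the columns first, then between the two rows.
    top = field[r0][c0] * (1 - dc) + field[r0][c0 + 1] * dc
    bottom = field[r0 + 1][c0] * (1 - dc) + field[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr

# Hypothetical reflectivity values (dBZ) at four neighbouring samples.
scan = [[30.0, 40.0],
        [50.0, 60.0]]
center = bilinear(scan, 0.5, 0.5)   # value midway between the four samples
```

Repeating such an interpolation for every target grid point is one way to fill a regular lattice from scan data.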
Step two, identifying convection monomers at all moments from the radar data of each short-time strong precipitation event, matching the convection monomers with corresponding live information and marking the convection monomers;
the steps of matching and marking the convection current monomer with the live information are as follows:
calculating, from the original live information, the total rainfall accumulated at each automatic station over the one hour starting from the current moment, and using it to determine the type label of the monomers associated with that automatic station at the current moment;
the types of monomers include heavy precipitation convection monomers and non-heavy precipitation convection monomers.
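The hourly accumulation described above can be sketched as a forward sum over twelve 5-minute records. The function name `hourly_rainfall`, the list layout (one value per 5-minute interval, in mm) and the sample numbers are illustrative assumptions, not part of the original live-information format.

```python
def hourly_rainfall(five_min_mm, start_index, window=12):
    """Sum the 5-minute rainfall (mm) over the hour starting at start_index.

    Returns None when fewer than `window` records remain, so that an
    incomplete hour is not mistaken for a dry one.
    """
    records = five_min_mm[start_index:start_index + window]
    if len(records) < window:
        return None
    return sum(records)

# Hypothetical station record: 2 mm every 5 minutes for 70 minutes.
station = [2.0] * 14
total = hourly_rainfall(station, 0)   # twelve records of 2 mm each
```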
For the convection monomer at each moment, recording the positions and the hourly rainfall of all automatic stations corresponding to the same or the nearest moment in the live information, and marking the strong precipitation convection monomer and the non-strong precipitation convection monomer according to the following rules:
rule 1-1: recording the position of the automatic station in the range of the monomer area, and marking the monomer as a strong precipitation monomer if the hourly rainfall of the automatic station is greater than or equal to 20 mm;
rule 1-2: considering that an automatic station may record erroneous or missing data because of temporary instrument failure, and that the public is not sensitive to the difference between a 20mm/h threshold and a 19mm/h threshold for short-time strong precipitation, the upper limit for non-strong precipitation monomers is set to 18mm/h. Therefore, a monomer is marked as a non-strong precipitation monomer if the automatic stations within the monomer area record an hourly rainfall of less than 18mm;
rule 1-3: the rainfall of a sample is the maximum value among the hourly rainfall records of all automatic stations that meet the conditions.
rule 1-4: samples whose radar base data are corrupted are deleted, as shown in FIG. 2;
Step three, recording the strong precipitation convection monomers as positive samples and the non-strong precipitation monomers as negative samples, and extracting the characteristics of all monomers;
the characteristics of the monomer include a reflectivity density characteristic, a reflectivity intensity characteristic, a reflectivity gradient characteristic, a distance characteristic and a liquid water content characteristic.
The reflectivity density type characteristics are obtained from radar three-dimensional lattice point data and comprise the monomer 30dBZ reflectivity density, the monomer 40dBZ reflectivity density and the spatial monomer reflectivity 40dBZ ratio;
the reflectivity intensity type characteristics are obtained from radar combined reflectivity data and comprise the monomer combined reflectivity mean value, the 90% quantile of the monomer maximum reflectivity intensity, the 85% quantile of the monomer maximum reflectivity intensity, the 80% quantile of the monomer maximum reflectivity intensity, the monomer reflectivity 40dBZ ratio and the monomer reflectivity 45dBZ ratio;
the reflectivity gradient type characteristics are obtained from radar combined reflectivity data and comprise reflectivity gradient_th1, reflectivity gradient_th2 and reflectivity gradient_th3;
the distance type characteristics are obtained from radar combined reflectivity data and comprise the average distance between the monomer core point and the monomer 30dBZ contour line and the average distance between the monomer core point and the monomer 40dBZ contour line;
the liquid water content type characteristics are obtained from radar three-dimensional lattice point data and comprise the vertically accumulated liquid water content, liquid water content density_1 and liquid water content density_2.
All the characteristics are shown in table 1.
TABLE 1
Step four, performing a statistical test on each feature separately, taking the values of that feature over the positive and negative sample feature sets as one group of input. The null hypothesis is that the feature shows no significant difference between the two populations, and the alternative hypothesis is that the difference is significant. The statistic obeying the t distribution is defined as:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
in the formula, $\bar{x}_1$ and $\bar{x}_2$ are respectively the means of the sample feature from the two populations, $s_1^2$ and $s_2^2$ are the corresponding variances, and $n_1$ and $n_2$ are the numbers of samples of the two types. A test at confidence level $(1-\alpha)$ is carried out. The significance level $\alpha$ is taken to be 0.01, and $t_{\alpha/2}(n_1+n_2-2)$ is obtained by table look-up. If the t value of a feature is greater than $t_{\alpha/2}(n_1+n_2-2)$, the null hypothesis is rejected at a confidence level of 0.99 and the alternative hypothesis is considered to be true. To make the statistical difference of the features between the two sample sets more significant, features whose t value is greater than $t_{\alpha/2}(n_1+n_2-2)$ are selected as the valid features of the positive and negative sample sets.
And step five, dividing the data set into a training set and a testing set, training a classifier model according to the effective characteristics of positive and negative samples of the training set, and identifying the short-time heavy precipitation by using the classifier model.
Experimental example: the implementation of the method is described in detail below with reference to specific experimental data, and the steps are as follows:
1) collecting short-time strong precipitation live information and Doppler weather radar data matched with the short-time strong precipitation live information, and converting the radar data into three-dimensional lattice point data;
the historical sample data collected for short-time strong precipitation and non-strong precipitation are: the precipitation live information from February to October of 2018, 2019 and 2020, together with the radar data for the corresponding periods, whose reflectivity is converted into three-dimensional lattice point data.
2) Calculating, from the original live information, the total rainfall accumulated at each automatic station over the one hour starting from the current moment;
3) identifying the convection current monomers at all the moments, matching the convection current monomers with corresponding live information and marking a sample;
for all data collected in step 1), radar convective cells were identified, and a total of 4792 convective cells were detected. The result of identifying a single convection cell on the radar reflectivity image is shown in fig. 1, where a single convection cell is identified within a rectangular frame.
For each detected convection monomer, recording the positions and the hourly rainfall of all automatic stations corresponding to the same or the latest moment in the live information, and marking the convection monomer with strong rainfall and the monomer with non-strong rainfall according to the following rules:
rule 1-1: recording the position of the automatic station in the range of the monomer area, and marking the monomer as a strong precipitation monomer if the hourly rainfall of the automatic station is greater than or equal to 20 mm;
rule 1-2: considering that an automatic station may record erroneous or missing data because of temporary instrument failure, and that the public is not sensitive to the difference between a 20mm/h threshold and a 19mm/h threshold for short-time strong precipitation, the upper limit for non-strong precipitation monomers is set to 18mm/h. Therefore, a monomer is marked as a non-strong precipitation monomer if the automatic stations within the monomer area record an hourly rainfall of less than 18mm;
rule 1-3: the rainfall of a sample is the maximum value among the hourly rainfall records of all automatic stations that meet the conditions.
rule 1-4: samples whose radar base data are corrupted are deleted, as shown in FIG. 2;
881 short-time strong precipitation convection monomers and 1228 non-strong precipitation monomers are marked by the rules.
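Rules 1-1 to 1-3 can be sketched as follows. The function `label_cell`, its argument (the hourly rainfall values of the automatic stations lying inside the monomer area, with `None` for a missing record) and the label strings are assumptions for illustration; leaving monomers between 18 mm and 20 mm unlabeled is an interpretation of the gap between the two thresholds.

```python
def label_cell(station_hourly_mm, strong_mm=20.0, non_strong_mm=18.0):
    """Label one convection monomer from the stations inside its area."""
    valid = [r for r in station_hourly_mm if r is not None]
    if not valid:
        return None                   # no usable station record
    rainfall = max(valid)             # rule 1-3: take the maximum record
    if rainfall >= strong_mm:         # rule 1-1: strong precipitation
        return "strong"
    if rainfall < non_strong_mm:      # rule 1-2: non-strong precipitation
        return "non_strong"
    return None                       # between the thresholds: unlabeled

# Hypothetical station readings (mm per hour) for four monomers.
labels = [label_cell(r) for r in ([3.5, 21.0], [0.0, 17.9], [19.0], [])]
```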
4) Recording short-time strong precipitation convection monomers as positive samples, recording non-strong precipitation monomers as negative samples, and extracting the characteristics of all monomers;
the characteristics extracted from the convection monomer of fig. 1 are: the monomer 30dBZ reflectivity density is 80.36, the monomer 40dBZ reflectivity density is 91.78, the spatial monomer reflectivity 40dBZ ratio is 0.13, the monomer combined reflectivity mean is 43.22, the 90% quantile of the maximum reflectivity intensity is 51.76, the 85% quantile of the maximum reflectivity intensity is 50.15, the 80% quantile of the maximum reflectivity intensity is 49.05, the monomer reflectivity 40dBZ ratio is 0.67, the monomer reflectivity 45dBZ ratio is 0.42, the reflectivity gradient_th1 is 0.62, the reflectivity gradient_th2 is 0.48, the reflectivity gradient_th3 is 0.37, the average distance between the monomer core point and the monomer 30dBZ contour line is 0.15, the average distance between the monomer core point and the monomer 40dBZ contour line is 0.12, the vertically accumulated liquid water content is 509.85, the liquid water content density_1 is 6.34, and the liquid water content density_2 is 5.56, as shown in table 2.
TABLE 2
Feature name | Feature number | Feature value
Monomer 30dBZ reflectivity density | y1 | 80.36
Monomer 40dBZ reflectivity density | y2 | 91.78
Spatial monomer reflectivity 40dBZ ratio | y3 | 0.13
Monomer combined reflectivity mean | y4 | 43.22
90% quantile of maximum reflectivity intensity | y5 | 51.76
85% quantile of maximum reflectivity intensity | y6 | 50.15
80% quantile of maximum reflectivity intensity | y7 | 49.05
Monomer reflectivity 40dBZ ratio | y8 | 0.67
Monomer reflectivity 45dBZ ratio | y9 | 0.42
Reflectivity gradient_th1 | y10 | 0.62
Reflectivity gradient_th2 | y11 | 0.48
Reflectivity gradient_th3 | y12 | 0.37
Average distance between monomer core point and monomer 30dBZ contour line | y13 | 0.15
Average distance between monomer core point and monomer 40dBZ contour line | y14 | 0.12
Vertically accumulated liquid water content | y15 | 509.85
Liquid water content density_1 | y16 | 6.34
Liquid water content density_2 | y17 | 5.56
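A few of the reflectivity intensity characteristics in table 2 can be illustrated with a short sketch. Because the feature definitions of table 1 are not reproduced here, the definitions below (mean of the combined reflectivity over the monomer's grid points, a linearly interpolated quantile, and the fraction of grid points above a threshold) are assumptions, as are the function names and the sample values.

```python
def quantile(values, q):
    """Linearly interpolated quantile of values for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def intensity_features(cell_dbz):
    """Intensity features from the combined reflectivity (dBZ) of one monomer."""
    n = len(cell_dbz)
    return {
        "mean": sum(cell_dbz) / n,
        "q90": quantile(cell_dbz, 0.90),
        "ratio_40dbz": sum(v > 40.0 for v in cell_dbz) / n,
        "ratio_45dbz": sum(v > 45.0 for v in cell_dbz) / n,
    }

# Hypothetical combined-reflectivity values inside one monomer.
features = intensity_features([32.0, 38.0, 41.0, 44.0, 47.0])
```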
5) Statistical tests were performed using a statistic obeying the t distribution, defined as:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
in the formula, $\bar{x}_1$ and $\bar{x}_2$ are respectively the means of the sample feature from the two populations, $s_1^2$ and $s_2^2$ are the corresponding variances, and $n_1$ and $n_2$ are the numbers of samples of the two types. A test at confidence level $(1-\alpha)$ is carried out. The significance level $\alpha$ is taken to be 0.01, and table look-up gives $t_{\alpha/2}(n_1+n_2-2) = t_{0.005}(2107) \approx t_{0.005}(\infty) = 2.576$. When the t value of a feature is greater than 2.576, the null hypothesis is rejected at a confidence level of 0.99 and the alternative hypothesis is considered to be true. To make the statistical difference of the features between the two sample sets more significant, features whose t value is greater than 5 times $t_{\alpha/2}(n_1+n_2-2)$, that is, greater than 12.88, are selected as the effective features of the positive and negative sample sets.
The significance of each feature over the positive and negative sample feature sets is analyzed as an example. Fig. 3a and 3b show the distributions of all features over the positive and negative samples. In the figures, the curve with dots is the positive sample feature distribution and the curve without dots is the negative sample feature distribution. As can be seen from Fig. 3b, feature y13 (average distance between the monomer core point and the monomer 30dBZ contour line) and feature y14 (average distance between the monomer core point and the monomer 40dBZ contour line) discriminate poorly between the two types of samples. The statistical t test gives a t value of 6.05 for y13 and 8.06 for y14. All other features are clearly separated on the two types of samples, with t values greater than 12.88, which shows that their statistical difference between the short-time strong precipitation sample set and the non-strong precipitation sample set is significant enough. Therefore, features y13 and y14 are deleted and the other features are retained.
After screening, 15 features were finally retained.
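The screening in step 5) can be sketched as below. The pooled-variance form of the statistic is assumed from the n1 + n2 - 2 degrees of freedom quoted in the text, and comparing the absolute value of t is also an assumption; the function names and the toy feature values are hypothetical, and the default critical value 2.576 is the large-sample value quoted above.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(pos, neg):
    """Two-sample t statistic with pooled variance (n1 + n2 - 2 d.o.f.)."""
    n1, n2 = len(pos), len(neg)
    pooled = ((n1 - 1) * variance(pos) + (n2 - 1) * variance(neg)) / (n1 + n2 - 2)
    return (mean(pos) - mean(neg)) / sqrt(pooled * (1 / n1 + 1 / n2))

def select_features(pos_set, neg_set, t_crit=2.576, factor=5.0):
    """Keep features whose |t| exceeds factor * t_crit (12.88 in the text)."""
    return [name for name in pos_set
            if abs(pooled_t(pos_set[name], neg_set[name])) > factor * t_crit]

# Hypothetical feature values for a handful of samples.
pos = {"a": [50.0, 51.0, 52.0, 53.0], "b": [1.0, 2.0, 3.0, 4.0]}
neg = {"a": [30.0, 31.0, 32.0, 33.0], "b": [1.5, 2.5, 3.5, 4.5]}
kept = select_features(pos, neg)   # only the well-separated feature survives
```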
8) The data set is divided into a training set and a testing set, a classifier model is trained according to the effective characteristics of positive and negative samples of the training set, and the classifier model is applied to identify short-time heavy precipitation.
The classifier model adopted in the invention is a support vector machine model with a Gaussian kernel, and the purpose of training is to find the two optimal model parameters, C and gamma. The specific method is as follows: the positive and negative sample sets are divided into a training set and a test set at a ratio of 4:1, and the effective characteristics of each sample in the training set are extracted as the input vector of the classifier; ten-fold cross validation is then performed on the training set to search for the C value and gamma value that give the classifier the highest classification accuracy on the training set, yielding the two optimal model parameters: C is 1.48, and gamma is 0.47.
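The parameter search can be sketched independently of any particular SVM library. In the sketch below, `k_fold_indices`, `grid_search` and the callable `fit_and_score` are hypothetical names; `fit_and_score` stands for whatever routine trains a Gaussian-kernel SVM on the training indices and returns its accuracy on the validation indices, and the candidate grids are illustrative, not the values searched in the experiment.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def grid_search(n_samples, c_values, gamma_values, fit_and_score, k=10):
    """Return the (C, gamma) pair with the best mean validation accuracy."""
    folds = k_fold_indices(n_samples, k)
    best, best_score = None, -1.0
    for c in c_values:
        for gamma in gamma_values:
            scores = []
            for i, val_idx in enumerate(folds):
                # Train on every fold except the i-th, validate on the i-th.
                train_idx = [j for f, fold in enumerate(folds) if f != i
                             for j in fold]
                scores.append(fit_and_score(train_idx, val_idx, c, gamma))
            score = sum(scores) / k
            if score > best_score:
                best, best_score = (c, gamma), score
    return best, best_score

def dummy_score(train_idx, val_idx, c, gamma):
    """Stand-in scorer so the sketch runs; a real one would train an SVM."""
    return 1.0 / (1.0 + abs(c - 2.0) + abs(gamma - 0.5))

best, score = grid_search(25, [1.0, 2.0, 4.0], [0.25, 0.5, 1.0], dummy_score)
```

In practice `fit_and_score` could wrap a library SVM such as scikit-learn's `SVC(kernel="rbf", C=c, gamma=gamma)`; the text does not name a library.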
In order to verify the feasibility of the short-time heavy precipitation identification method based on Doppler radar data in meteorology for identifying short-time heavy precipitation, the following test experiments are carried out:
the test set contained 560 samples from 5, 7, 9, 4, 8, 2019 and 3,6 of 2020 with 240 positive samples and 320 negative samples that did not participate in training.
The evaluation indexes of the classifier include the hit rate (POD), the false alarm ratio (FAR) and the critical success index (CSI), calculated as POD = TP/(TP + FN), FAR = FP/(TP + FP) and CSI = TP/(TP + FN + FP), where TP is the number of positive samples classified as positive, FP is the number of negative samples classified as positive, and FN is the number of positive samples classified as negative. POD, FAR and CSI all range from 0 to 1; the higher the hit rate and the critical success index and the lower the false alarm ratio, the better the performance of the classifier.
The classifier prediction results are shown in table 3, where TP is 213, FP is 49, and FN is 27.
TABLE 3
From table 3, the performance indexes of the classifier on the test set are: POD is 88.75%, FAR is 18.70% and CSI is 73.70%, which shows that the classifier can effectively distinguish short-time strong precipitation convection monomers from non-strong precipitation monomers. Although the false alarm ratio is slightly high, short-time strong precipitation is disaster-causing in real life and the cost of a missed report is far higher than that of a false alarm, so the classifier still has practical value in operational forecasting.
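The three indexes follow directly from the counts reported for the test set (TP = 213, FP = 49, FN = 27). The short check below reproduces them; the function name `verification_scores` is an illustrative assumption.

```python
def verification_scores(tp, fp, fn):
    """Return (POD, FAR, CSI) from the contingency counts."""
    pod = tp / (tp + fn)          # hit rate
    far = fp / (tp + fp)          # false alarm ratio
    csi = tp / (tp + fn + fp)     # critical success index
    return pod, far, csi

pod, far, csi = verification_scores(tp=213, fp=49, fn=27)
print(f"POD={pod:.2%} FAR={far:.2%} CSI={csi:.2%}")
# POD=88.75% FAR=18.70% CSI=73.70%
```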
To further illustrate the effect of the classifier, part of the short-time strong precipitation monomers used for testing are analyzed process by process. The results for some of the strong precipitation processes are shown in fig. 4. The classifier correctly identifies most of the processes, and a small part of the strong precipitation monomers are misidentified at the end of a process. The 10# strong precipitation process is identified as non-strong precipitation. Analysis of the reflectivity image and the corresponding three-dimensional monomer structure shown in fig. 5, combined with the values of all characteristics, indicates that the main reason for the error is the loose spatial structure of the monomer in this process: the monomer 40dBZ reflectivity density is small, the liquid water content is small, and the proportion of the monomer area with reflectivity exceeding 40dBZ is small, so the classifier identifies the monomer as non-strong precipitation.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.