CN112686290A - Squall line identification method based on convolutional neural network - Google Patents

Squall line identification method based on convolutional neural network

Info

Publication number
CN112686290A
Authority
CN
China
Prior art keywords
squall line
radar
sample data
squall
layers
Prior art date
Legal status
Granted
Application number
CN202011554225.3A
Other languages
Chinese (zh)
Other versions
CN112686290B (en)
Inventor
王新敏 (Wang Xinmin)
栗晗 (Li Han)
金子琪 (Jin Ziqi)
张霞 (Zhang Xia)
鲍艳松 (Bao Yansong)
Current Assignee
Nanjing Raindrop Meteorological Technology Co ltd
Henan Observatory
Original Assignee
Nanjing Raindrop Meteorological Technology Co ltd
Henan Observatory
Priority date
Filing date
Publication date
Application filed by Nanjing Raindrop Meteorological Technology Co ltd and Henan Observatory
Priority to CN202011554225.3A
Publication of CN112686290A
Application granted
Publication of CN112686290B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Radar Systems Or Details Thereof (AREA)

Abstract

A squall line identification method based on a convolutional neural network, the method comprising: S1, sample preprocessing: acquiring the combined reflectivity of radar sample data in Cartesian coordinates; S2, squall line identification model establishment: constructing a visual geometry group network VGG, training it on identified samples, and obtaining a squall line identification model; S3, identification: acquiring the radar sample data set to be identified, obtaining its combined reflectivity with the preprocessing of step S1, and inputting it into the squall line identification model obtained in step S2, which outputs a vector of the probabilities that the radar sample data belongs to squall line echo and non-squall-line echo; the maximum probability is selected as the classification result. The disclosed method achieves a higher identification rate at the moments when squall lines split or merge, and large-sample tests show good temporal generalization.

Description

Squall line identification method based on convolutional neural network
Technical Field
The invention relates to the field of radar data analysis, in particular to a squall line identification method based on a convolutional neural network.
Background
A squall line is a band of strong convective cloud formed by many thunderstorm cells arranged side by side. It is a typical mesoscale convective system, with horizontal dimensions of roughly tens to hundreds of kilometers and a duration of several hours to more than ten hours. A squall line is usually accompanied by thunderstorms, damaging winds, hail and other disastrous weather, posing great harm to people's lives and to social production.
Radar data has high temporal and spatial resolution, giving it unique advantages for nowcasting mesoscale convective weather, and the problem of automatically identifying squall lines can be addressed with radar data. Various algorithms for identifying and forecasting mesoscale convective weather from radar data have been developed, such as the cross-correlation-based Tracking Radar Echoes by Correlation (TREC), the single-centroid-based Thunderstorm Identification, Tracking, Analysis and Nowcasting (TITAN), and Storm Cell Identification and Tracking (SCIT); these have become important components of short-term forecasting systems. Researchers have further proposed automatic identification algorithms aimed specifically at squall lines: Yang Ji et al., for example, designed a dynamic template and a score function based on the major axis of a fitted ellipse, realizing automatic identification and tracking of linear mesoscale convective systems (MCSs) on radar mosaic data, with the linearity score reflecting the linear state of the MCSs at each stage.
Currently, the core idea of automatic identification algorithms is to extract a number of image features of the squall line as factors, feed them into a mathematical model, and compare the result against preset thresholds to identify the squall line. This approach, however, uses individual squall line cases for modeling and testing and lacks experimental results over large squall line samples. In addition, splitting or merging displaces the system's centroid abruptly, so the squall line may fail to be identified at the moment of splitting or merging, and the tracking effect is poor.
Disclosure of Invention
The invention aims to provide a squall line identification method based on a convolutional neural network that addresses the problems of existing squall line identification methods.
The technical scheme of the invention is as follows:
the invention provides a squall line identification method based on a convolutional neural network, which comprises the following steps:
s1, sample preprocessing: acquiring a radar sample data set, and manually identifying whether the radar sample data is squall line echo or non-squall line echo; preprocessing radar sample data to acquire the combined reflectivity of the radar sample data under Cartesian coordinates;
s2, establishing squall line identification model:
constructing a visual geometry group network VGG, wherein the whole network comprises 1 input layer, 8 convolutional layers, 5 pooling layers, a plurality of full-connection layers and 1 output layer;
taking the combined reflectivity of radar sample data under a Cartesian coordinate as network input, wherein the input is a radar combined reflectivity matrix with 360 multiplied by 360 resolution, obtaining a characteristic diagram with 5 multiplied by 5 size and 256 channels after 8 times of convolution and 5 times of pooling, then stretching the characteristic diagram into a one-dimensional vector with the length of 6400, mapping the characteristic diagram into a one-dimensional vector with the length of 2 after passing through all full connection layers, respectively representing the probabilities that the input radar combined reflectivity matrix belongs to squall line echoes and non squall line echoes, and selecting the maximum probability value as a classification result;
comparing the classification result with the manual identification, updating the model, iterating for a preset number of times, and acquiring a squall line identification model;
s3, identification: acquiring a radar sample data set to be identified, and acquiring a combined reflectivity according to the preprocessing step of the step S1; the squall line identification model obtained in step S2 is input for identification, a vector indicating the probability that radar sample data to be identified belongs to the squall line echo and the non-squall line echo is obtained, and the maximum value of the probability is selected as a classification result.
Further, in step S1, the ratio of squall line echo samples to non-squall-line echo samples in the radar sample data set is 2:5.
Further, the preprocessing in step S1 specifically comprises:
S1-1, extracting, from the radar sample data, the base reflectivity factor matrix of each elevation-angle layer;
S1-2, rearranging the entries of each base reflectivity factor matrix in ascending azimuth order from -179° to 179°;
S1-3, unifying the base reflectivity factor matrices to 360 azimuth angles by interpolation;
S1-4, for each range bin in each azimuth, selecting the maximum reflectivity factor across all layers, i.e., across the elevation-angle base reflectivity factor matrices, to obtain the combined reflectivity;
S1-5, unifying the combined reflectivity matrix to 360 range bins by interpolation, obtaining a 360 × 360 combined reflectivity matrix;
S1-6, converting the combined reflectivity of the radar sample data from the polar coordinate system to a Cartesian coordinate system.
Further, in the visual geometry group network VGG of step S2, all convolutional layers use 3 × 3 convolution kernels with stride 1; the first two pooling layers use 2 × 2 windows with stride 2, and the last three use 3 × 3 windows with stride 3; all pooling layers use max pooling.
Further, after each convolutional layer the image edges are padded so that the feature map keeps the same size as before the convolution.
Further, each convolutional layer and fully-connected layer is followed by a ReLU activation function.
Further, the training is iterated 20 times.
Further, the training process parameters of the visual geometry group network VGG are set as follows: the learning rate is 0.001; the parameter optimization algorithm used during VGG iteration is adaptive moment estimation (Adam), and the loss function is the cross-entropy function.
Further, in step S2, there are 3 fully-connected layers.
The invention has the beneficial effects that:
based on a convolutional neural network, the visual geometric group network VGG is constructed, so that robustness is provided for simple geometric distortion of the image, and the problems of image identification and classification are solved; the invention realizes the feature extraction of the image by three methods: (1) local receptive field: the neurons of each layer of the network are only connected with the neurons in a small neighborhood of the previous layer, and each neuron can extract the local features of the image; (2) weight sharing: the method can greatly reduce the parameters of the neurons in the network and reduce the calculation burden; (3) pooling: by reducing the resolution of the features, invariance to translation, scaling, and other deformations is achieved.
In the invention, the radar base data is stored as binary files and must be processed before the network is trained; to facilitate network training, verification and comparison, the reflectivity factor matrices in all schemes are interpolated uniformly, improving the consistency and usability of the data.
In the invention, the radar data must be preprocessed: because the initial azimuth of each volume scan differs and the number of azimuths also varies, the reflectivity factor matrix is rearranged in ascending azimuth order from -179° to 179°, and the number of radials is interpolated uniformly to 360, corresponding to 360 azimuth angles. The combined reflectivity is then calculated by selecting, for each range bin in each azimuth, the maximum reflectivity factor over all elevation angles; this ignores the differences between the ground projections of the same range bin at different elevation angles. Finally, the combined reflectivity is converted from polar coordinates to Cartesian coordinates, which removes the deformation that a squall line rain band exhibits in a polar coordinate system, helps the convolutional neural network extract the image features of squall line echoes, and improves the effectiveness and reliability of the algorithm.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows a schematic diagram of the VGG network structure of the present invention.
FIG. 2 illustrates the squall line process of 26 June 2018 in an embodiment of the invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein.
The invention provides a squall line identification method based on a convolutional neural network, which comprises the following steps:
s1, sample preprocessing:
obtaining a radar sample data set, and manually identifying whether the radar sample data is a squall line echo or a non-squall line echo, wherein the squall line echo sample data set and the non-squall line echo sample data set are in a ratio of 2: 5; preprocessing the radar sample data to acquire the combined reflectivity of the radar sample data under Cartesian coordinates, wherein the preprocessing steps specifically comprise:
s1-1, extracting a basic reflectivity factor matrix of each layer of data corresponding to the elevation angle for the radar sample data;
s1-2, rearranging the factors in each basic reflectivity factor matrix according to the sequence of azimuth angles from-179 degrees to 179 degrees;
s1-3, unifying the basic reflectivity factor matrixes into 360 azimuth angles by an interpolation method;
s1-4, selecting the maximum value of the reflectivity factor in all elevation angles aiming at each distance library in each azimuth angle for all layers, namely the elevation angle basic reflectivity factor matrix, and obtaining the combined reflectivity;
s1-5, unifying the combined reflectivity matrix into 360 distance bins by adopting an interpolation method to obtain a combined reflectivity matrix with the size of 360 multiplied by 360;
s1-6, converting the combined reflectivity of the radar sample data from the polar coordinate system to a Cartesian coordinate system. S2, establishing squall line identification model:
constructing a visual geometry group network (VGG), the whole network comprising 1 input layer, 8 convolutional layers, 5 pooling layers, several fully-connected layers and 1 output layer;
taking the combined reflectivity of the radar sample data in Cartesian coordinates as the network input: the input is a 360 × 360 radar combined reflectivity matrix; after 8 convolutions and 5 pooling operations, a 5 × 5 feature map with 256 channels is obtained, which is flattened into a one-dimensional vector of length 6400 and, after passing through all fully-connected layers, mapped to a one-dimensional vector of length 2 representing the probabilities that the input matrix belongs to squall line echo and non-squall-line echo respectively; the maximum probability is selected as the classification result;
comparing the classification result with the manual label, updating the model, and iterating a preset number of times to obtain the squall line identification model;
s3, identification:
acquiring the radar sample data set to be identified, obtaining its combined reflectivity with the preprocessing of step S1, and inputting it into the squall line identification model obtained in step S2 for identification; a vector of the probabilities that the radar sample data to be identified belongs to squall line echo and non-squall-line echo is obtained, and the maximum probability is selected as the classification result.
Example:
the radar base data used by the method is from Zhengzhou SA waveband radar, the radar detection radius is 460km, the radial resolution is 1km, the time resolution is 6min, and the time span is 2008-2018. During which the radar detects a total of 20 squall line processes, Table 1 provides the specific dates on which these squall line processes occurred.
TABLE 1 squall line Process occurrence time
[Table 1 is provided only as an image in the original publication; it lists the dates of the 20 squall line processes.]
To build the data sets for training, testing and case study of the convolutional neural network, the radar echoes in the samples must first be classified and labeled. The radar volume scan files from those 20 days are selected, 3896 files in total; radar echo maps are then drawn from the base reflectivity matrices in the radar base data and manually divided into two classes, squall line echo and non-squall-line echo. Several criteria for identifying squall lines on radar data have been published, and Table 2 summarizes them. With reference to these criteria, a radar echo map satisfying the following conditions is labeled here as a "squall line echo": the echo band above 40 dBZ is continuous or quasi-continuous over at least 100 km, and the aspect ratio exceeds 5:1. Of the final 3896 samples, 1076 were labeled "squall line echo" and 2820 "non-squall-line echo".
Table 2 squall line criteria
[Table 2 is provided only as an image in the original publication; it summarizes published squall line identification criteria.]
The data from the 20 squall line days is then divided into the data sets needed for modeling. For the 19 squall line processes occurring between 2008 and 13 June 2018, four fifths of the samples are randomly selected as the training data set, used to train and tune the model, and the remaining fifth forms the validation data set, used to verify the training effect; the squall line process of 26 June 2018 forms the test data set on its own and does not participate in training, so that the temporal generalization capability of the model can be verified objectively. Table 3 details the data sets, and a code sketch of this split follows the table.
TABLE 3 data set information
[Table 3 is provided only as an image in the original publication; it details the training, validation and test data sets.]
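As a minimal sketch of this split, assume each sample is an (image, label, process_date) triple stored in a Python list named samples; the variable names and the fixed random seed are illustrative, not part of the patent.

import random

TEST_DAY = "2018-06-26"          # held-out process, per the text above

test_set = [s for s in samples if s[2] == TEST_DAY]
pool = [s for s in samples if s[2] != TEST_DAY]

random.seed(0)                   # assumed; the patent fixes no seed
random.shuffle(pool)
n_train = len(pool) * 4 // 5     # four fifths of the pool for training
train_set, val_set = pool[:n_train], pool[n_train:]

Splitting by whole days first (rather than by individual scans) is what keeps the test set temporally independent of training.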
The invention adopts a convolutional neural network, which realizes image feature extraction through three mechanisms: (1) local receptive fields: the neurons of each layer are connected only to a small neighborhood of the previous layer, so each neuron extracts local image features; (2) weight sharing: this greatly reduces the number of network parameters and the computational burden; (3) pooling: by reducing the feature resolution, invariance to translation, scaling and other deformations is achieved.
Preprocessing the radar base data: the input of a convolutional neural network is an image, while in the invention the radar base data is stored as binary files, so the data must be processed before the network is trained. To facilitate network training, verification and comparison, the reflectivity factor matrices in all schemes are finally interpolated uniformly to 360 × 360; the interpolated size can be adjusted as required.
First, the base reflectivity is obtained from the radar base data. Because the initial azimuth of each volume scan differs and the number of azimuths also varies, the reflectivity factor matrices are rearranged in ascending azimuth order from -179° to 179°, and the number of radials is interpolated uniformly to 360, corresponding to 360 azimuth angles.
The combined reflectivity is then calculated by selecting, for each range bin in each azimuth, the maximum reflectivity factor over all elevation angles; this ignores the differences between the ground projections of the same range bin at different elevation angles. The combined reflectivity matrix is then unified to 360 range bins by interpolation, yielding a 360 × 360 combined reflectivity matrix. A sketch of this computation follows.
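The following NumPy sketch implements the combined-reflectivity computation. The function name, the list-of-sweeps input format, the simplified handling of the ±179° azimuth seam, and the per-sweep range-bin regridding (done before the maximum so that sweeps with different bin counts can be stacked) are assumptions, not part of the patent.

import numpy as np

def combined_reflectivity(sweeps, azimuths):
    """Regrid each elevation sweep to 360 azimuths x 360 range bins, then
    take the per-cell maximum over elevations (steps S1-2 to S1-5).

    sweeps: list of 2-D arrays (one per elevation), shape (n_rays, n_bins).
    azimuths: list of 1-D arrays of ray azimuths in degrees, -179..179.
    """
    target_az = np.linspace(-179.0, 179.0, 360)
    dst_bins = np.linspace(0.0, 1.0, 360)
    regridded = []
    for refl, az in zip(sweeps, azimuths):
        order = np.argsort(az)                      # ascending azimuth order
        az_s, refl_s = az[order], refl[order]
        # interpolate every range bin onto the 360 common azimuths
        on_az = np.empty((360, refl.shape[1]))
        for b in range(refl.shape[1]):
            on_az[:, b] = np.interp(target_az, az_s, refl_s[:, b])
        # interpolate every ray onto 360 common range bins
        src_bins = np.linspace(0.0, 1.0, refl.shape[1])
        grid = np.empty((360, 360))
        for a in range(360):
            grid[a] = np.interp(dst_bins, src_bins, on_az[a])
        regridded.append(grid)
    # per-cell maximum over all elevations, ignoring the offsets between the
    # ground projections of the same range bin at different elevation angles
    return np.max(np.stack(regridded), axis=0)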
Finally, the combined reflectivity is converted from the polar coordinate system to a Cartesian coordinate system. The base reflectivity in radar base data is stored in polar coordinates, from the first scanning azimuth to the last, so in practice the distribution of a squall line rain band is "deformed" to some degree in a polar coordinate system, which hinders the convolutional neural network in extracting the image features of the squall line echo. For this reason the combined reflectivity in polar coordinates is interpolated onto Cartesian coordinates; the interpolation algorithm is bilinear interpolation.
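The polar-to-Cartesian step can be sketched as follows with bilinear interpolation. The grid conventions (radar at the image centre, azimuth measured clockwise from north, zero-fill beyond maximum range) are assumptions; the patent specifies only the bilinear interpolation itself.

import numpy as np

def polar_to_cartesian(cr_polar, out_size=360):
    """Bilinearly interpolate a (360 azimuth x 360 range-bin) polar field
    onto an out_size x out_size Cartesian image centred on the radar."""
    n_az, n_rng = cr_polar.shape
    half = out_size / 2.0
    y, x = np.mgrid[0:out_size, 0:out_size]
    dx, dy = x - half, half - y                    # north-up image
    r = np.hypot(dx, dy) * (n_rng / half)          # fractional range index
    az = np.degrees(np.arctan2(dx, dy))            # 0 deg = north, clockwise
    ai = ((az + 179.0) % 360.0) * (n_az / 360.0)   # fractional azimuth index

    a0 = np.floor(ai).astype(int) % n_az
    a1 = (a0 + 1) % n_az                           # azimuth wraps around
    r0 = np.clip(np.floor(r).astype(int), 0, n_rng - 1)
    r1 = np.clip(r0 + 1, 0, n_rng - 1)
    wa, wr = ai - np.floor(ai), r - np.floor(r)

    out = ((1 - wa) * (1 - wr) * cr_polar[a0, r0]
           + (1 - wa) * wr * cr_polar[a0, r1]
           + wa * (1 - wr) * cr_polar[a1, r0]
           + wa * wr * cr_polar[a1, r1])
    out[r > n_rng - 1] = 0.0                       # beyond maximum range
    return out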
Establishing the squall line identification model: the invention builds the convolutional neural network with the deep learning framework PyTorch. PyTorch is a Python-based scientific computing package that provides a wide range of packaged functions and modules; users can freely and flexibly design the network structure and loss function, and once input data is supplied the network automatically computes the forward pass and error backpropagation, completing the training.
The VGG uses stacks of 3 × 3 convolution kernels in place of larger kernels (11 × 11, 7 × 7 and 5 × 5). The main reason is that, for the same receptive field, stacking several small kernels is equivalent to adding nonlinear layers and deepening the network, so that more complex patterns can be learned.
The VGG network structure designed by the invention is shown in Fig. 1, taking 5 fully-connected layers as the example: the whole network consists of 20 layers, comprising 1 input layer, 8 convolutional layers, 5 pooling layers, 5 fully-connected layers and 1 output layer. All convolutional layers use 3 × 3 kernels with stride 1, and the edges are padded so that the feature map keeps its size after convolution. The first two pooling layers use 2 × 2 windows with stride 2; the last three use 3 × 3 windows with stride 3; all use max pooling. Each convolutional layer and fully-connected layer is followed by a ReLU activation function. The training parameters are set as follows: learning rate 0.001, adaptive moment estimation (Adam) as the optimization algorithm, 20 training iterations, and the cross-entropy function as the loss.
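A minimal PyTorch sketch of this architecture is given below. The channel widths, the conv/pool grouping and the fully-connected hidden sizes are assumptions (the text fixes only the layer counts, the 3 × 3 stride-1 convolutions with same padding, the final 5 × 5 × 256 feature map and the 6400-to-2 classifier). Note that turning a 360 × 360 input into a 5 × 5 map needs a total pooling factor of 72, so the sketch uses three 2 × 2 pools and two 3 × 3 pools.

import torch
import torch.nn as nn

class SquallVGG(nn.Module):
    """VGG-style network per Fig. 1: 8 conv, 5 max-pool, 5 FC layers.
    Channel widths, grouping and hidden sizes below are assumptions."""

    def __init__(self):
        super().__init__()

        def block(cin, cout, n_convs, pool):
            layers = []
            for i in range(n_convs):
                layers += [nn.Conv2d(cin if i == 0 else cout, cout,
                                     kernel_size=3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
            layers.append(nn.MaxPool2d(kernel_size=pool, stride=pool))
            return layers

        self.features = nn.Sequential(     # single-channel input assumed
            *block(1, 32, 2, 2),           # 360 -> 180
            *block(32, 64, 2, 2),          # 180 -> 90
            *block(64, 128, 2, 2),         # 90  -> 45
            *block(128, 256, 1, 3),        # 45  -> 15
            *block(256, 256, 1, 3),        # 15  -> 5
        )
        self.classifier = nn.Sequential(   # 5 FC layers: 6400 -> ... -> 2
            nn.Flatten(),
            nn.Linear(5 * 5 * 256, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 32), nn.ReLU(inplace=True),
            nn.Linear(32, 2),              # raw logits; no ReLU on the output
        )

    def forward(self, x):                  # x: (N, 1, 360, 360)
        return self.classifier(self.features(x))

With these assumptions, torch.randn(4, 1, 360, 360) passed through the model yields a (4, 2) logit tensor.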
The network input is a 360 × 360 radar reflectivity factor matrix. After 8 convolutions and 5 pooling operations, a 5 × 5 feature map with 256 channels is obtained; it is flattened into a one-dimensional vector of length 6400 and, after the 5 fully-connected layers, mapped to a one-dimensional vector of length 2 representing the probabilities that the input radar reflectivity factor matrix belongs to squall line echo and non-squall-line echo respectively; the maximum probability is selected as the classification result.
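A training-loop sketch under the stated settings (learning rate 0.001, Adam, cross-entropy loss, 20 iterations) follows, reusing the SquallVGG sketch above; the tensors x and y, the batch size of 32 and the use of TensorDataset are assumptions.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumed tensors: x is (N, 1, 360, 360) Cartesian combined reflectivity,
# y is (N,) torch.long with 1 = squall line echo, 0 = non-squall-line echo.
model = SquallVGG()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr per the text
criterion = torch.nn.CrossEntropyLoss()                     # loss per the text
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model.train()
for epoch in range(20):                  # 20 training iterations per the text
    for bx, by in loader:
        optimizer.zero_grad()
        loss = criterion(model(bx), by)  # CrossEntropyLoss takes raw logits
        loss.backward()
        optimizer.step()

# Inference: softmax yields the two class probabilities; the larger one wins.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(bx), dim=1)   # (batch, 2)
    labels = probs.argmax(dim=1)              # 1 = squall line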
A squall line identification effect:
the final output of the model is the probabilities of the radar echo images belonging to both the squall and non-squall categories, with the maximum of the probabilities being taken as the corresponding classification result, i.e., 0/1 classification, which translates the squall identification problem into a two-classification problem. On the basis, the identification effect of the model is quantitatively evaluated by using a Critical Success Index (CSI), a hit rate (POD) and a misjudgment rate (FAR), and the calculation formula of the scoring standard is as follows:
CSI = TP / (TP + FN + FP)
POD = TP / (TP + FN)
FAR = FP / (TP + FP)
where TP (true positive) is the number of squall line samples identified as squall lines, FN (false negative) is the number of squall line samples identified as non-squall-lines, and FP (false positive) is the number of non-squall-line samples identified as squall lines. CSI measures the model's comprehensive ability to identify squall line and non-squall-line samples; POD is the model's identification rate for squall line samples; FAR is the model's misidentification rate for non-squall-line samples.
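As a minimal illustration of these scores, the following helper computes CSI, POD and FAR from 0/1 predictions and labels; the function name and the use of plain Python sequences are illustrative only.

def squall_scores(pred, truth):
    """CSI, POD and FAR per the formulas above (1 = squall line, 0 = not)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    csi = tp / (tp + fn + fp)   # comprehensive identification ability
    pod = tp / (tp + fn)        # hit rate on squall line samples
    far = fp / (tp + fp)        # false-alarm rate among squall line calls
    return csi, pod, far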
Identification performance on the validation data set: the radar base data is processed with the scheme of the invention to obtain the corresponding sample set, and the learning effect of the convolutional neural network is verified on the validation data set. As Table 4 shows, the FAR is 0.30, the CSI 0.54 and the POD 0.70; overall, the network can distinguish squall line from non-squall-line samples reasonably well.
Table 4 Validation data set identification results
Data set              CSI    POD    FAR
Validation data set   0.54   0.70   0.30
Furthermore, examining the classification accuracy computed from the above metrics shows that the network identifies non-squall-line samples better than squall line samples. A likely cause of this phenomenon is that a conventional VGG network generally has 16 layers; the designed network has 20 layers, and this deeper structure gives it an excessively strong learning capacity, leading to overfitting. The invention proposes a further improvement for this reason.
To address the overfitting problem of the convolutional neural network, the invention further optimizes the network structure: dropout is added in the fully-connected layers, randomly ignoring 50% of the neuron nodes, and the number of fully-connected layers is reduced from 5 to 3, reducing the depth of the network. A sketch of the optimized classifier head follows.
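Below is a sketch of the optimized head, as a drop-in replacement for the classifier in the SquallVGG sketch above; the hidden sizes (512 and 128) are assumptions, while the 3-layer count, the 50% dropout and the 6400-to-2 mapping follow the text.

import torch.nn as nn

optimized_classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5), nn.Linear(6400, 512), nn.ReLU(inplace=True),
    nn.Dropout(p=0.5), nn.Linear(512, 128), nn.ReLU(inplace=True),
    nn.Linear(128, 2),
)
# model.classifier = optimized_classifier   # swap into the earlier sketch

nn.Dropout is active only in model.train() mode and is disabled automatically by model.eval(), so the same model can be used unchanged for validation and testing.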
To compare the identification performance of the different network structures, the structurally optimized network is trained and then verified on the same validation set; Table 5 gives the identification results for both network configurations. Compared with the unimproved results, the optimized network raises the CSI to 0.91 and the POD to 0.95 and lowers the FAR to 0.04, a marked improvement. This shows that optimizing the network structure significantly improves the identification of squall line samples, and that the network's overfitting problem can be alleviated by structural optimization.
Table 5 Identification results after network structure optimization
Network improvement scheme       CSI    POD    FAR
Without improvement              0.54   0.70   0.30
Network structure optimization   0.91   0.95   0.04
Identification performance on the test data set: the learning effect of the convolutional neural network model is tested separately on the test data set, i.e., the squall line process that occurred in Henan on 26 June 2018. Fig. 2 illustrates the evolution of the squall line echo over time, with the identification result at each time shown in the top right corner of each panel. At 05:12 on 26 June, convection was newly initiated to the southwest (Fig. 2a). By 06:00 the convection had developed and the cells were arranged linearly, with echo maxima reaching 50 dBZ, moving from northwest to southeast (Fig. 2b). At 07:48 the system weakened and new convective cells began to form due south (Fig. 2c). At 08:30 the convective cells to the southwest continued to develop and arrange linearly, with maximum echoes reaching 60 dBZ, and the cells remaining after the dissipation of the southwest squall line began to approach from the southwest (Fig. 2d). By 10:00 the southwest convection had completely merged with the new squall line system due south, and the system moved slowly northeastward (Fig. 2e). At 11:36 the squall line system began to weaken toward the southeast, but many new convective cells formed to the southeast (Fig. 2f). By 12:36 the squall line system due south had completely dissipated, leaving only a weak echo band; the convective cells to the southeast continued to develop and arranged into a line, with echo maxima reaching 55 dBZ, and the system moved eastward (Fig. 2g). At 13:24 the system weakened and was no longer linear, and a large area of convective cells formed to its rear (Fig. 2h).
Analyzing the identification results at each time in Fig. 2 shows that the network classifies the weak-convection stage correctly (Fig. 2a) and correctly identifies the squall line during its strong stages (Figs. 2b and 2d-2g), indicating that the trained network has learned the strong-echo and linear characteristics of the squall line and can distinguish it from weak convection. In the initiation or dissipation stages of the squall line (Figs. 2a and 2c), the echo intensity decreases and the shape becomes irregular, and the network's recognition rate can drop; this is because the sample set was classified and labeled manually against the squall line criteria, which inevitably introduces subjectivity, and for some samples, particularly those in the initiation or dissipation stages, the criteria become difficult to apply.
Table 6 compares the identification results on the validation data set and the test data set. The CSI and POD on the test set fall to 0.76 and 0.86 and the FAR rises to 0.13. This is because building a deep learning model relies on a large amount of historical data as learning samples, while historical squall line cases are few and the training set is not sufficiently representative. The echo morphology of each squall line process differs: the validation samples come from the same squall line processes as the training set, so identification is very good, whereas the test samples come from a separate process that did not participate in training and contain image features the training set lacked, so identification degrades. Nevertheless, the trained network has learned the strong-echo and linear characteristics typical of squall line weather and can still resolve them, so the model's identification ability does not degrade significantly and it generalizes well in time.
Table 6 Test set identification results
Data set              CSI    POD    FAR
Validation data set   0.91   0.95   0.04
Test data set         0.76   0.86   0.13
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (9)

1. A squall line identification method based on a convolutional neural network, comprising:
s1, sample preprocessing:
acquiring a radar sample data set, and manually identifying each radar sample as squall line echo or non-squall-line echo; preprocessing the radar sample data to obtain its combined reflectivity in Cartesian coordinates;
s2, establishing squall line identification model:
constructing a visual geometry group network VGG, wherein the whole network comprises 1 input layer, 8 convolutional layers, 5 pooling layers, a plurality of full-connection layers and 1 output layer;
taking the combined reflectivity of radar sample data under a Cartesian coordinate as network input, wherein the input is a radar combined reflectivity matrix with 360 multiplied by 360 resolution, obtaining a characteristic diagram with 5 multiplied by 5 size and 256 channels after 8 times of convolution and 5 times of pooling, then stretching the characteristic diagram into a one-dimensional vector with the length of 6400, mapping the characteristic diagram into a one-dimensional vector with the length of 2 after passing through all full connection layers, respectively representing the probabilities that the input radar combined reflectivity matrix belongs to squall line echoes and non squall line echoes, and selecting the maximum probability value as a classification result;
comparing the classification result with the manual identification, updating the model, iterating for a preset number of times, and acquiring a squall line identification model;
s3, identification:
acquiring the radar sample data set to be identified, obtaining its combined reflectivity with the preprocessing of step S1, and inputting it into the squall line identification model obtained in step S2 for identification; a vector of the probabilities that the radar sample data to be identified belongs to squall line echo and non-squall-line echo is obtained, and the maximum probability is selected as the classification result.
2. The squall line identification method based on a convolutional neural network of claim 1, wherein in step S1, the ratio of squall line echo samples to non-squall-line echo samples in the radar sample data set is 2:5.
3. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein the preprocessing in step S1 specifically comprises:
S1-1, extracting, from the radar sample data, the base reflectivity factor matrix of each elevation-angle layer;
S1-2, rearranging the entries of each base reflectivity factor matrix in ascending azimuth order from -179° to 179°;
S1-3, unifying the base reflectivity factor matrices to 360 azimuth angles by interpolation;
S1-4, for each range bin in each azimuth, selecting the maximum reflectivity factor across all layers, i.e., across the elevation-angle base reflectivity factor matrices, to obtain the combined reflectivity;
S1-5, unifying the combined reflectivity matrix to 360 range bins by interpolation, obtaining a 360 × 360 combined reflectivity matrix;
S1-6, converting the combined reflectivity of the radar sample data from the polar coordinate system to a Cartesian coordinate system.
4. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein in the visual geometry group network VGG of step S2, all convolutional layers use 3 × 3 convolution kernels with stride 1; the first two pooling layers use 2 × 2 windows with stride 2 and the last three use 3 × 3 windows with stride 3; all pooling layers use max pooling.
5. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein after each convolutional layer the image edges are padded so that the feature map keeps the same size as before the convolution.
6. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein each convolutional layer and fully-connected layer is followed by a ReLU activation function.
7. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein training is iterated 20 times.
8. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein the training process parameters of the visual geometry group network VGG are set as follows: the learning rate is 0.001; the parameter optimization algorithm used during VGG iteration is adaptive moment estimation (Adam), and the loss function is the cross-entropy function.
9. The squall line identification method based on a convolutional neural network as claimed in claim 1, wherein in step S2 there are 3 fully-connected layers.
CN202011554225.3A 2020-12-24 2020-12-24 Squall line identification method based on convolutional neural network Active CN112686290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011554225.3A CN112686290B (en) 2020-12-24 2020-12-24 Squall line identification method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011554225.3A CN112686290B (en) 2020-12-24 2020-12-24 Squall line identification method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN112686290A 2021-04-20
CN112686290B 2022-10-21

Family

ID=75453034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011554225.3A Active CN112686290B (en) 2020-12-24 2020-12-24 Squall line identification method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112686290B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6035057A (en) * 1997-03-10 2000-03-07 Hoffman; Efrem H. Hierarchical data matrix pattern recognition and identification system
CN110197218A (en) * 2019-05-24 2019-09-03 绍兴达道生涯教育信息咨询有限公司 Thunderstorm gale grade forecast classification method based on multi-source convolutional neural networks
CN110988883A (en) * 2019-12-18 2020-04-10 南京信息工程大学 Intelligent squall line characteristic identification early warning method in radar echo image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEN-XIA XU et al.: "Base on CNN and polyfit for analysis radar image", 2009 International Conference on Machine Learning and Cybernetics *
DING Zhiying (丁治英) et al.: "Organizational structure changes and cause analysis of radar echoes during a squall line process", Journal of Tropical Meteorology (热带气象学报) *

Also Published As

Publication number Publication date
CN112686290B (en) 2022-10-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant