CN113095442A - Hail identification method based on semi-supervised learning under multi-dimensional radar data - Google Patents

Hail identification method based on semi-supervised learning under multi-dimensional radar data

Info

Publication number
CN113095442A
Authority
CN
China
Prior art keywords
sample
cluster
hail
supervised
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110624140.6A
Other languages
Chinese (zh)
Other versions
CN113095442B (en)
Inventor
文立玉
罗飞
钟宇
舒红平
曹亮
刘魁
郭本俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN202110624140.6A
Publication of CN113095442A
Application granted
Publication of CN113095442B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88 Radar or analogous systems specially adapted for specific applications
    • G01S13/95 Radar or analogous systems specially adapted for specific applications for meteorological use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention provides a hail identification method based on semi-supervised learning under multi-dimensional radar data, comprising the following steps: S1: acquiring a labeled sample set, randomly extracting from it a supervision sample set, a rainstorm sample training set and a hail sample training set, acquiring an unlabeled data set, and randomly dividing the unlabeled data set into q equal parts of first samples; S2: calculating the cluster center of each training-set cluster; S3: assigning one part of the first samples to the corresponding clusters and updating the cluster centers; S4: iterating to obtain the cluster center of each cluster and the confidence of each sample with respect to the corresponding cluster; S5: repeating steps S2-S4 on the supervision sample set to obtain its supervised confidence with respect to each cluster center, and classifying the supervision sample set into the corresponding clusters; S6: judging whether the first sample is to be merged into the clusters, and repeating steps S2-S6 until all first samples have been processed; S7: using the optimal cluster centers as the input of the recognition model to obtain the confidence of each sample with respect to each cluster for classification. The method effectively improves the accuracy of hail recognition and reduces the false alarm rate.

Description

Hail identification method based on semi-supervised learning under multi-dimensional radar data
Technical Field
The invention belongs to the interdisciplinary field of computer artificial intelligence and meteorology, and particularly relates to a hail identification method based on semi-supervised learning under multi-dimensional radar data.
Background
Hail is a strong, local, disastrous weather phenomenon produced under particular geographic and terrain conditions and a certain large-scale circulation background. It occurs suddenly, moves rapidly, is severe and highly destructive, often causes huge losses to agriculture, traffic, electric power, communications and other sectors in the affected area, and can even threaten people's lives.
Accurate recognition of hail is particularly important for hail prediction and for rescue after a hail disaster. Effectively improving the accuracy of hail prediction, so that the relevant departments are informed in time and can take strong preventive measures, avoids as far as possible the heavy loss of life and property that hail can cause. At present, hail identification based on Doppler weather radar products mainly uses methods such as the support vector machine (SVM) and the K-means clustering algorithm; specifically:
1. The support vector machine is an early binary classification method that classifies data by supervised learning. In essence it searches the feature space for the separating hyperplane with the largest geometric margin and uses that hyperplane to solve the binary classification of multi-dimensional data. However, when a support vector machine is applied to a linearly inseparable problem, test data that fall between the support vectors may be misclassified. Some scholars have proposed combining the support vector machine with the k-nearest-neighbor method to reduce misclassification and improve identification accuracy;
however, the method is sensitive to the choice of parameters and kernel function. The performance of a support vector machine depends mainly on the kernel function, which at present is still chosen manually from experience and therefore introduces some error. The recognition accuracy of the model is not high, and the classification effect needs to be improved.
2. The normal Bayes classifier assumes that the feature vectors involved in the computation are mutually independent, uses the input sample classes to estimate the prior probability that any component of the n-dimensional feature vector corresponds to hail, and classifies unknown samples according to that probability. Its principle is to perform machine learning on a large number of input samples of different classes, find the internal feature patterns of each class, and then classify unknown samples by finding those that best match these patterns;
the method relies on a large amount of training data. However, hail sample data are difficult to obtain and small in volume, so large-scale training is hard to carry out; the classification model therefore falls short of expectations and its judgments are limited.
3. The K-means clustering algorithm computes the cluster centers of k clusters of sample data and partitions the data according to the distance from each data object to each cluster center. If the cluster centers are unchanged between two consecutive iterations, the clustering iteration ends; if they have changed, the distances of the sample data are recalculated using the updated centers of the k clusters. Iteration continues until the cluster centers no longer change after partitioning, which gives the clustering result;
the method ignores the differences among the n-dimensional features involved in the computation, so the recognition effect after classification needs to be improved.
Disclosure of Invention
In view of this, the present invention provides a hail identification method based on semi-supervised learning under multi-dimensional radar data, which can effectively improve the accuracy of hail identification and reduce the false alarm rate.
To achieve this purpose, the technical scheme of the invention is as follows: a hail identification method based on semi-supervised learning under multi-dimensional radar data comprises the following steps:
S1: acquiring a labeled sample set, extracting a supervision sample set from it by random sampling, dividing the labeled sample set into a rainstorm sample training set and a hail sample training set according to the rainstorm and hail labels, acquiring an unlabeled data set, and randomly dividing the unlabeled data set into q equal parts of first samples;
S2: calculating the mean vectors of the rainstorm sample training set and the hail sample training set as the cluster centers;
S3: calculating the weighted distance from one part of the first samples to each cluster center, assigning the first sample by clustering to the corresponding cluster (the rainstorm sample training set or the hail sample training set), and updating each cluster center using the method in S2;
S4: repeating step S3 until the cluster centers are the same as in the previous iteration, obtaining the cluster centers at that moment, and taking the distance from each first-sample point to each cluster center as its confidence with respect to the corresponding cluster;
S5: repeating steps S2-S4 with the supervision sample set taking the place of the first sample in step S3, obtaining the supervised confidence of the supervision sample set with respect to each cluster center, and classifying the data in the supervision sample set into the corresponding rainstorm or hail cluster according to the supervised confidence;
S6: calculating, from the data labels, the classification evaluation indices of the model on the supervision sample set; when the indices meet the preset conditions, merging the first sample into the rainstorm or hail clusters, calculating and retaining the mean vector of each cluster at that moment, and assigning each data item in the first sample the pseudo-label of the cluster in which it finally lies; repeating S2-S6 until all q parts of the first samples have been processed; then calculating the evaluation indices of each retained mean vector on the final hail and rainstorm training sets, and selecting the mean vector with the best index as the final model;
S7: using the mean vector of the final model as the input of the recognition model, calculating the confidence of each sample to be identified with respect to each cluster according to steps S3-S4, and classifying accordingly to complete the identification.
Further, the labeled sample set and the unlabeled data set comprise radar base reflectivity images and Doppler weather radar series data.
Further, the mean vector in step S2 is calculated as:
$$\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \quad j = 1, 2, \dots, p, \qquad \mu = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_p)^T$$
wherein $\bar{x}_j$ is the mean of the j-th parameter over the samples involved in computing the mean vector, N is the total number of samples involved in computing the mean vector, $x_{ij}$ is the value of the j-th parameter of the i-th sample, p is the number of parameters of a single sample, i is the index of the sample currently being processed, j is the index of the parameter currently being processed, $\mu$ is the mean vector, and T denotes matrix transposition.
Further, the weighted distance in step S3 is obtained by a weighted distance formula (the formula image is not reproduced in this text), wherein p represents the number of elements involved in the recognition, $w_i$ represents the weight value of the i-th element, $\mu$ represents the mean vector, and the remaining symbols denote, element by element, the components of the mean vector and the per-element means.
Further, the classification evaluation indices in step S6 comprise the hit rate, the false alarm rate and the critical success index, wherein:
$$\mathrm{POD} = \frac{N_A}{N_A + N_B}, \qquad \mathrm{FAR} = \frac{N_C}{N_A + N_C}, \qquad \mathrm{CSI} = \frac{N_A}{N_A + N_B + N_C}$$
wherein POD is the hit rate, FAR is the false alarm rate, CSI is the critical success index, $N_A$ is the number of times the model identifies hail and this agrees with the actual label (hits), $N_B$ is the number of times the model identifies no hail and this disagrees with the actual label (misses), and $N_C$ is the number of times the model identifies hail and this disagrees with the actual label (false alarms);
Further, the preset conditions are as follows:
(1) when POD is below 90%, the POD index increases and its increase is greater than the increase in FAR;
(2) when POD is 90% or more, the POD index increases and its increase is greater than 2/3 of the increase in FAR.
Compared with the prior art, the invention has the following advantages:
The invention discloses a hail identification method based on semi-supervised learning under multi-dimensional radar data. It extracts features from radar base reflectivity images and Doppler weather radar series data, clusters hail cloud data with the Constrained Seed K-means algorithm, and builds the hail identification model with a combination of semi-supervised clustering and self-learning. Only a small number of hail samples and a large amount of unknown data are needed to construct the training set, which addresses the low hail identification accuracy caused by scarce original hail training data. The method also performs weight analysis on the multi-dimensional features of hail and rainstorm, comparing the hail feature variables with one another to obtain feature weights that perform well. A supervision sample set supervises the training process so that the model changes in the expected direction, and an automatic parameter optimization method is combined with it so that the trained model is tuned automatically. The hail recognition accuracy is effectively improved, and the false alarm rate is reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a diagram showing the variation of the classification evaluation index in the present invention;
FIG. 2 is a diagram illustrating the classification result of test data according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The examples are given to better illustrate the invention, but the invention is not limited to them. Insubstantial modifications and adaptations made by those skilled in the art in light of the above teachings therefore remain within the scope of the invention.
Example 1
The embodiment discloses a hail identification method based on semi-supervised learning under multi-dimensional radar data, which comprises the following steps:
S1: acquiring a labeled sample set, extracting a supervision sample set from it by random sampling, dividing the labeled sample set into a rainstorm sample training set and a hail sample training set according to the rainstorm and hail labels, acquiring an unlabeled data set, and randomly dividing the unlabeled data set into q equal parts of first samples;
In this embodiment, the labeled sample set comprises radar base reflectivity images and Doppler weather radar series data;
Specifically, the m labeled samples of the labeled sample set X are read in. A number of samples are drawn at random from X to form the supervision sample set, which is used to supervise the training process. According to their labels, the labeled samples are divided into the rainstorm sample training set and the hail sample training set.
In the invention, because the initial training set is limited in size, the model obtained from it alone cannot be guaranteed to be accurate in application. A large unlabeled data set D is therefore used to train the initial model so as to improve recognition accuracy and stability; accordingly, the unlabeled sample set D is read in.
Any p-dimensional sample x in the labeled sample set X or the unlabeled sample set D is expressed as:
$$x = (x_1, x_2, \dots, x_p)^T$$
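The random division of the unlabeled data set into q parts in step S1 could be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_groups` is hypothetical, the sizes 20850 and 1000 are borrowed from Example 2, and the way leftover samples are spread when the set does not divide evenly (some parts get one extra sample) is an assumption, since the source does not say how this case is handled.

```python
import random

def split_into_groups(items, q, seed=None):
    """Shuffle items and cut them into q parts whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), q)
    groups, start = [], 0
    for g in range(q):
        # The first `extra` parts take one additional sample (assumed policy).
        size = base + (1 if g < extra else 0)
        groups.append(shuffled[start:start + size])
        start += size
    return groups

# Sample indices stand in for the unlabeled samples of D.
groups = split_into_groups(range(20850), 1000, seed=0)
print(len(groups), min(len(g) for g in groups), max(len(g) for g in groups))
```

Fixing the seed makes the split repeatable, which helps when comparing training runs.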
S2: calculating the mean vectors of the rainstorm sample training set and the hail sample training set as the cluster centers;
In this step, the mean vectors are used as the centers of the k clusters (here k = 2: the rainstorm cluster and the hail cluster). The mean vector $\mu$ of p-dimensional samples is calculated as follows:
$$\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \quad j = 1, 2, \dots, p, \qquad \mu = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_p)^T$$
wherein $\bar{x}_j$ is the mean of the j-th parameter over the samples involved in computing the mean vector, N is the total number of samples involved in computing the mean vector, $x_{ij}$ is the value of the j-th parameter of the i-th sample, p is the number of parameters of a single sample, i is the index of the sample currently being processed, j is the index of the parameter currently being processed, $\mu$ is the mean vector, T denotes matrix transposition, and x denotes a single sample. The mean vectors of the hail cluster and the rainstorm cluster are calculated separately and taken as the initial cluster centers of the k clusters of samples;
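As a minimal sketch of step S2, the per-parameter mean can be computed in a few lines of plain Python. The function name `mean_vector` and the sample values are illustrative assumptions and do not come from the patent.

```python
def mean_vector(samples):
    """Return the per-parameter mean of a list of equal-length samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    p = len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(p)]

# Hypothetical 3-parameter samples (values are illustrative only).
hail_train = [[40.0, 8.0, 60.0], [50.0, 10.0, 64.0]]
rain_train = [[10.0, 4.0, 45.0], [14.0, 6.0, 47.0], [12.0, 5.0, 46.0]]

# One mean vector per labeled training set serves as that cluster's center.
centers = {
    "hail": mean_vector(hail_train),
    "rainstorm": mean_vector(rain_train),
}
print(centers)
```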
S3: calculating the weighted distance from one part of the first samples to each cluster center, assigning the first sample by clustering to the corresponding cluster (the rainstorm sample training set or the hail sample training set), and updating each cluster center using the method in S2;
Next, each sample x is assigned to a cluster by calculating the distance between x and each cluster center. The invention adopts a multi-element sample clustering method in which the elements involved in the calculation are weighted. In the weighted distance formula (the formula image is not reproduced in this text), p denotes the number of elements involved in the recognition, $w_i$ denotes the weight value of the i-th element, and $\mu$ denotes the mean vector;
Further, the unlabeled data set D is randomly divided into q equal parts. One unprocessed part is selected and added to the training process; the weighted distance formula is used to calculate the weighted distance from each sample x in that part to each cluster center, and x is assigned to the corresponding cluster according to these weighted distances. The center of each cluster is then updated with the mean vector formula of step S2.
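A rough sketch of the assignment in step S3 follows. The source text does not reproduce the weighted distance formula, so the weighted Euclidean form used here is an assumption, as is assigning each sample to the nearest center. The function names, the feature order (VIL, H_R_Max, R_Max) and the sample values are likewise hypothetical; the weights are the ones quoted in Example 2.

```python
import math

def weighted_distance(x, center, weights):
    """Weighted Euclidean distance (assumed form) between a sample and a center."""
    return math.sqrt(sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center)))

def assign(x, centers, weights):
    """Return the name of the cluster whose center is closest to x."""
    return min(centers, key=lambda name: weighted_distance(x, centers[name], weights))

# Feature weights from Example 2; the feature order is an assumption.
weights = [0.1369, 0.7220, 0.1411]
# Hypothetical cluster centers and sample.
centers = {"hail": [45.0, 9.0, 62.0], "rainstorm": [12.0, 5.0, 46.0]}
sample = [42.0, 8.5, 60.0]

label = assign(sample, centers, weights)
print(label)
```

Weighting inside the distance lets a feature with a large weight dominate the assignment even when its numeric range is small.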
S4: repeating step S3 until the cluster centers are the same as in the previous iteration, obtaining the cluster centers at that moment, and taking the distance from each first-sample point to each cluster center as the confidence of that point with respect to the corresponding cluster;
In this step, the updated cluster centers are used to repeat the above process on the samples x, iterating until the cluster centers take the same values as in the previous iteration. The weighted distance from x to each cluster center is then calculated with the weighted distance formula and taken as the confidence of x with respect to each of the k clusters. The clustering process thus assigns each x to the cluster whose center lies at the smallest weighted distance.
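The iteration of steps S3-S4 might be sketched as below. Several things are assumptions rather than details confirmed by the source: the weighted Euclidean distance, the choice to keep the labeled training samples fixed in their own clusters as seeds while only the unlabeled part is reassigned, the `max_iter` safety cap, and all names and sample values.

```python
import math

def mean_vector(samples):
    """Per-parameter mean of a non-empty list of equal-length samples."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]

def weighted_distance(x, center, weights):
    """Weighted Euclidean distance (an assumed form)."""
    return math.sqrt(sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center)))

def seeded_cluster(seeds, batch, weights, max_iter=100):
    """Iterate assignment and center update until the centers stop changing.

    seeds: dict mapping cluster name to its labeled samples (never reassigned).
    batch: list of unlabeled samples (one part of the first samples).
    """
    centers = {name: mean_vector(members) for name, members in seeds.items()}
    assignments = []
    for _ in range(max_iter):
        # Assign every unlabeled sample to its nearest center.
        assignments = [
            min(centers, key=lambda name: weighted_distance(x, centers[name], weights))
            for x in batch
        ]
        # Recompute each center from its seeds plus the samples assigned to it.
        updated = {
            name: mean_vector(members + [x for x, a in zip(batch, assignments) if a == name])
            for name, members in seeds.items()
        }
        if updated == centers:  # same values as in the previous iteration
            break
        centers = updated
    return centers, assignments

def confidences(x, centers, weights):
    """Distance from x to every center, used as the per-cluster confidence."""
    return {name: weighted_distance(x, c, weights) for name, c in centers.items()}

# Hypothetical two-parameter data.
seeds = {"hail": [[45.0, 9.0]], "rainstorm": [[12.0, 5.0]]}
batch = [[44.0, 9.0], [13.0, 5.0], [46.0, 9.0]]
centers, assignments = seeded_cluster(seeds, batch, weights=[0.5, 0.5])
print(assignments)
print(confidences(batch[0], centers, [0.5, 0.5]))
```

Because the confidence is a distance, a smaller value means a closer match to that cluster.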
S5: repeating steps S2-S4 on the supervision sample set to obtain the supervised confidence of the supervision sample set with respect to each cluster center, and classifying the data in the supervision sample set into the corresponding rainstorm or hail cluster according to the supervised confidence;
After one part of the samples has been classified, it is necessary to make sure that the updated model is changing in the expected direction, so the model is supervised using the supervision sample set from step S1:
Specifically, following the confidence calculation above, the confidence of each sample in the supervision sample set with respect to each cluster center of the training set to be updated is calculated, and each sample in the supervision sample set is classified into the corresponding rainstorm or hail cluster according to its confidence with respect to the k clusters.
S6: calculating, from the data labels, the classification evaluation indices of the model on the supervision sample set; when the indices meet the preset conditions, merging the first sample into the rainstorm or hail clusters and assigning each data item in the first sample the pseudo-label of the cluster in which it finally lies; repeating S2-S6 until all q parts of the first samples have been processed;
Because the samples in the supervision sample set are labeled, the classification evaluation indices of the model on the supervision sample set, namely the hit rate (POD), the false alarm rate (FAR) and the critical success index (CSI), can be calculated from the data labels as follows:
$$\mathrm{POD} = \frac{N_A}{N_A + N_B}, \qquad \mathrm{FAR} = \frac{N_C}{N_A + N_C}, \qquad \mathrm{CSI} = \frac{N_A}{N_A + N_B + N_C}$$
wherein $N_A$ is the number of times the model identifies hail and this agrees with the actual label (hits), $N_B$ is the number of times the model identifies no hail and this disagrees with the actual label (misses), and $N_C$ is the number of times the model identifies hail and this disagrees with the actual label (false alarms);
The model is considered to perform well when the supervision indices meet either of the following conditions:
(1) when POD < 90%, the POD index increases and its increase is greater than the increase in FAR;
(2) when POD > 90%, the POD index increases and its increase is greater than 2/3 of the increase in FAR;
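The evaluation indices and the acceptance rule can be sketched as follows, using the standard definitions of POD, FAR and CSI in terms of hits, misses and false alarms. The function names are hypothetical. Two interpretive assumptions are made: the 90% threshold is applied to the POD after the update, and the second branch covers a POD of 90% or more, following the wording in the Disclosure section. The counts 99, 4 and 12 are illustrative values chosen to land close to the scores reported in Example 2; the source does not state them.

```python
def scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI); assumes the denominators are non-zero."""
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi

def accept_update(pod_before, far_before, pod_after, far_after):
    """Decide whether a model update counts as good under the preset conditions."""
    d_pod = pod_after - pod_before
    d_far = far_after - far_before
    if d_pod <= 0:
        return False  # both conditions require POD to increase
    if pod_after < 0.90:  # threshold applied to the updated POD (assumption)
        return d_pod > d_far
    return d_pod > (2.0 / 3.0) * d_far

# Illustrative counts, not taken from the source.
pod, far, csi = scores(hits=99, misses=4, false_alarms=12)
print(pod, far, csi)
print(accept_update(0.80, 0.10, 0.85, 0.12))
```

The looser 2/3 factor in the second branch tolerates a somewhat larger rise in false alarms once the hit rate is already high.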
If the model performs well, the current part of the first samples is merged into the clusters, and the mean vectors of the hail and rainstorm clusters are calculated and retained; each sample x is given the pseudo-label of the cluster in which it finally lies.
If the model performance does not meet expectations, the current part of the first samples is discarded; the method then either waits for the next unlabeled sample set to be input or outputs the recognition model as it stands;
If the recognition model is to be output, each mean vector retained during training is used as the model input to classify the most recently obtained hail and rainstorm training sets, giving a corresponding CSI value for each; the mean vector with the highest CSI value is output as the final model.
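Selecting the final model from the retained mean vectors could look like the sketch below. It assumes a weighted Euclidean distance and nearest-center classification, and it treats each retained set of cluster centers as one "snapshot"; the names `best_snapshot`, `csi` and `classify`, the equal weights and all sample values are hypothetical.

```python
import math

def weighted_distance(x, center, weights):
    """Weighted Euclidean distance (an assumed form)."""
    return math.sqrt(sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center)))

def classify(x, centers, weights):
    """Name of the cluster whose center is nearest to x."""
    return min(centers, key=lambda name: weighted_distance(x, centers[name], weights))

def csi(centers, labeled, weights):
    """Critical success index of one snapshot on (sample, label) pairs."""
    hits = misses = false_alarms = 0
    for x, truth in labeled:
        pred = classify(x, centers, weights)
        if pred == "hail" and truth == "hail":
            hits += 1
        elif truth == "hail":
            misses += 1
        elif pred == "hail":
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else 0.0

def best_snapshot(snapshots, labeled, weights):
    """Return the retained set of centers with the highest CSI."""
    return max(snapshots, key=lambda c: csi(c, labeled, weights))

weights = [0.5, 0.5]
snapshots = [
    {"hail": [45.0, 9.0], "rainstorm": [12.0, 5.0]},
    {"hail": [20.0, 6.0], "rainstorm": [12.0, 5.0]},
]
labeled = [
    ([44.0, 9.0], "hail"),
    ([40.0, 8.0], "hail"),
    ([13.0, 5.0], "rainstorm"),
    ([18.0, 6.0], "rainstorm"),
]
best = best_snapshot(snapshots, labeled, weights)
print(best)
```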
S7: carrying out automatic parameter tuning on the updated recognition model;
This step optimizes the weights and parameters of the recognition model updated in step S6. Specifically, the trained model is used to identify the test data to obtain the CSI value of the model on the test set; this CSI value is taken as the reference value of the Bayesian optimization method, which automatically optimizes the weight parameters of the weighted distance formula (the VIL feature weight, the H_R_Max feature weight and the R_Max feature weight). The resulting optimized parameters are then written back into the model;
S8: using the mean vector of the final model as the input of the recognition model, calculating the confidence of each sample to be identified with respect to each cluster according to steps S3-S4, and classifying accordingly to complete the identification.
In this step, the mean vector of each training-set cluster obtained through the above steps is used as the input of the recognition model; the confidence calculation described above is then used to calculate the confidence of each sample to be identified with respect to each cluster, and the samples are classified according to this confidence to complete the identification.
Example 2
Based on the method in example 1, this example proposes a specific implementation to train and test the method in example 1:
In this embodiment, 104 hail samples are used as initial training data and 103 as test data: 104 of the 207 observed hail records are randomly selected as training data and the other 103 are used as test data. Likewise, 1098 rainstorm samples are used as training data and 1098 as test data: 1098 of the 2196 observed rainstorm records are randomly selected as training data and the other 1098 are used as test data. The unknown sample data were obtained as follows: the times of hail occurrence in the festival in 2019 were collected from the network, and the combined reflectivity files near those time points were parsed to obtain 20850 samples, which serve as the unlabeled data set for training the recognition model;
The 20850 unknown samples are randomly divided into 1000 groups and added to training according to the method in Example 1, with the following weight values: VIL feature weight: 0.1369; H_R_Max feature weight: 0.7220; R_Max feature weight: 0.1411. During training, after each model update the model is used to classify the test data; the resulting changes in the classification evaluation indices are shown in FIG. 1, and the classification result is shown in FIG. 2. In FIG. 2, "-" shaped points are correctly identified rainstorm points and "x" shaped points are incorrectly identified rainstorm points; correctly identified hail points and incorrectly identified hail points each have their own marker, whose glyphs are images not reproduced in this text;
Further, taking hail suppression as an example and keeping all other conditions unchanged, hail identification was also carried out with a semi-supervised support vector machine identification method, and the prediction results were compared:
Table: Hail suppression detection comparison result data (the table is an image that is not reproduced in this text)
From the above experimental results it can be seen that the method achieves an accuracy of 96.11% on the hail test data, a false alarm rate of 10.81% and a critical success index of 86.08%, and that its overall performance is better than that of a method of the same type (the semi-supervised support vector machine) given the same training and test data.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A hail identification method based on semi-supervised learning under multi-dimensional radar data, characterized by comprising the following steps:
S1: acquiring a labeled sample set, extracting a supervised sample set from the labeled sample set by random sampling, dividing the labeled sample set into a rainstorm sample training set and a hail sample training set according to the rainstorm and hail labels, acquiring an unlabeled data set, and randomly dividing the unlabeled data set into q equal parts, each part being a first sample;
S2: calculating the mean vectors of the rainstorm sample training set and the hail sample training set as the cluster centers;
S3: calculating the weighted distance from each point of a first sample to each cluster center, assigning the point to the corresponding cluster (the rainstorm sample training set or the hail sample training set), and updating each cluster center by the method in S2;
S4: repeating step S3 until the cluster centers of the latest iteration are the same as those of the previous iteration, obtaining the cluster centers at that moment, and taking the distance from each first sample point to each cluster center as the confidence with respect to the corresponding cluster;
S5: repeating steps S2-S4 with the supervised sample set taking the place of the first sample in step S3, obtaining the supervised confidence of the supervised sample set with respect to each cluster center, and classifying the data in the supervised sample set into the corresponding rainstorm or hail cluster according to the supervised confidence;
S6: calculating, from the data labels, the classification evaluation indexes of the model on the supervised sample set; when the indexes meet the preset condition, merging the first sample into the rainstorm or hail clusters, calculating and retaining the mean vector of each cluster, and assigning to each data point in the first sample the pseudo label of the cluster in which it lies; repeating S2-S6 until all q first samples have been processed; then calculating the evaluation indexes of each retained mean vector on the final hail and rainstorm training sets, and selecting the mean vectors with the best indexes as the final model;
S7: taking the mean vectors of the final model as the identification model, calculating the confidence of each sample to be identified with respect to each cluster according to steps S3-S4, and classifying the samples to complete the identification.
2. The method of claim 1, wherein the labeled sample set and the unlabeled data set comprise radar base reflectivity images and Doppler weather radar series data.
3. The method according to claim 1, wherein the classification evaluation index in step S6 includes: hit rate, false alarm rate, and critical success index.
4. The method according to claim 3, wherein the preset condition is:
when the hit rate is less than 90%, the hit rate increases and its increase is greater than the increase in the false alarm rate;
when the hit rate is greater than or equal to 90%, the hit rate increases and its increase is greater than 2/3 of the increase in the false alarm rate.
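For orientation only, the clustering loop of steps S2-S4 and the preset condition of claim 4 might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the weighted distance is taken to be a weighted Euclidean distance with hypothetical feature weights, the 90% threshold is applied to the hit rate before the update, and every name (`mean_vector`, `weighted_distance`, `seeded_cluster`, `classify`, `accept_update`) and all sample values are invented for the example.

```python
from math import sqrt


def mean_vector(points):
    """S2: component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def weighted_distance(x, center, weights):
    """Assumed form of the weighted distance: weighted Euclidean."""
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center)))


def classify(x, centers, weights):
    """Label of the nearest cluster center (smallest weighted distance)."""
    return min(centers, key=lambda k: weighted_distance(x, centers[k], weights))


def seeded_cluster(rain, hail, batch, weights, max_iter=100):
    """S2-S4: start from the labeled means, assign the unlabeled batch,
    recompute the centers, and stop once they no longer change."""
    centers = {"rain": mean_vector(rain), "hail": mean_vector(hail)}
    for _ in range(max_iter):
        members = {"rain": list(rain), "hail": list(hail)}
        for x in batch:
            members[classify(x, centers, weights)].append(x)
        new_centers = {k: mean_vector(v) for k, v in members.items()}
        if new_centers == centers:  # same centers as the previous iteration
            break
        centers = new_centers
    return centers


def accept_update(hit_old, hit_new, far_old, far_new):
    """Claim 4 as read here: the hit rate must rise, by more than the rise in
    false alarm rate below 90%, or by more than 2/3 of it at or above 90%."""
    d_hit, d_far = hit_new - hit_old, far_new - far_old
    if d_hit <= 0:
        return False
    factor = 1.0 if hit_old < 0.90 else 2.0 / 3.0
    return d_hit > factor * d_far


# Hypothetical two-feature samples; the values are made up for illustration.
rain = [[20.0, 1.0], [25.0, 2.0]]
hail = [[55.0, 8.0], [60.0, 9.0]]
batch = [[22.0, 1.5], [58.0, 8.5]]
weights = [1.0, 1.0]
centers = seeded_cluster(rain, hail, batch, weights)
print(classify([57.0, 8.0], centers, weights))
```

Keeping the labeled samples in every cluster while the unlabeled batch is reassigned is what makes the clustering seeded: the labeled means anchor each cluster so that it continues to correspond to rainstorm or hail.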
CN202110624140.6A 2021-06-04 2021-06-04 Hail identification method based on semi-supervised learning under multi-dimensional radar data Active CN113095442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110624140.6A CN113095442B (en) 2021-06-04 2021-06-04 Hail identification method based on semi-supervised learning under multi-dimensional radar data

Publications (2)

Publication Number Publication Date
CN113095442A (en) 2021-07-09
CN113095442B (en) 2021-09-10

Family

ID=76664557

Country Status (1)

Country Link
CN (1) CN113095442B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant