Thunderstorm core identification and tracking method based on a hybrid clustering algorithm
Technical field
The invention belongs to the field of lightning monitoring and relates to a thunderstorm core identification and tracking method based on a hybrid clustering algorithm.
Background technology
With the development of electronic and computer technology, the monitoring of thunderstorm lightning activity has progressed from traditional lightning location observation to the complete recording of detailed lightning activity over the entire life cycle of a thunderstorm, and a variety of lightning data products based on the evolution of the thunderstorm life cycle can be developed on this basis. As an indicator of the strength of thunderstorm convective activity, lightning activity is attracting increasing attention, compared with weather radar detection of precipitation particles in thunderclouds, for its potential to diagnose severe convection in a timely and accurate manner. It is also expected to allow thunderstorm convective activity to be monitored in regions that Doppler weather radars have difficulty covering.
Data mining and geographic information system (GIS) technology, as two important information technologies, play an extremely important role in the processing of meteorological data. Data mining refers to the combined use of statistical methods, pattern recognition, artificial intelligence, neural networks and related theories to extract from a database knowledge that is novel, credible, interesting and ultimately understandable, so as to reveal the rules, internal relationships and development trends hidden in the data. GIS technology can clearly express the spatial, attribute and temporal characteristics of meteorological data and is an effective means of data management. There are many methods for identifying and predicting thunderstorm cores, but because lightning is random, local, dispersed, sudden, instantaneous and three-dimensional, each lightning prediction method has an environment to which it is best suited. By applying clustering algorithms from data mining in combination with a GIS platform, and optimizing the algorithms for the characteristics of lightning, fast, convenient and accurate calculation can be achieved that meets the requirements of short-term trend prediction, which is of practical significance for lightning nowcasting.
The traditional DBSCAN algorithm can find clusters of arbitrary shape in noisy spatial data, can connect adjacent regions of sufficiently high density, handles abnormal data effectively, and is stable. However, when it is applied to the clustering of lightning data, the result is a set of clusters rather than a "center", and existing noise points cannot be distinguished. The key to the KMEANS algorithm is the choice of the K value: if the lightning location data are too widely dispersed and are aggregated with a fixed K value, the resulting centroid positions may differ greatly from the actual positions.
Summary of the invention
The technical problem to be solved by the invention is to provide a thunderstorm core identification and tracking method based on a hybrid clustering algorithm. The lightning location data recorded by the monitoring points in the same period are first aggregated by density using the DBSCAN algorithm to form several clusters. The data set of each cluster is then used as new input for iterative aggregation by the KMEANS algorithm with the K value fixed at 1, which yields the coordinates of the thunderstorm core centroid. On the basis of the clustering results, the movement path of the centroids and the lightning intensity are fitted, so as to obtain the association between thunderstorm cores and to predict the change in thunderstorm core intensity at the next moment. The method is effective when applied to thunderstorm core identification and core tracking.
To solve the above technical problem, the invention adopts the following technical solution:
A thunderstorm core identification and tracking method based on a hybrid clustering algorithm, comprising the following steps:
Step A: detect and record lightning data using the deployed lightning monitoring points, preprocess the recorded lightning data, and divide them into lightning data sets of equal time periods;
Step B: using GPS clock synchronization and the differences in the time of arrival (TOA) of the lightning electric-field-change radiation pulse at each station, obtain the spatial location coordinates of the lightning by the time-difference-of-arrival algorithm;
Step C: apply the DBSCAN and KMEANS algorithms in combination to the lightning location data obtained in step B to obtain the centroid coordinates of the thunderstorm cores, the lightning frequency, and the association between thunderstorm cores.
As a further refinement of the invention, preprocessing the lightning data recorded in step A and dividing them into lightning data sets of equal time periods is specifically:
Step A-1: a very-low-frequency lightning radiation receiver, a computer and a GPS clock module are installed in each lightning location monitoring station; the station captures lightning pulse waveforms and their absolute arrival times continuously, without gaps, and generates a data set;
Step A-2: the data set generated in step A-1 is preprocessed and transmitted over the Internet to obtain the data set of the corresponding period.
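The division into equal-period data sets in step A can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_windows`, the event tuples and the three sample events are hypothetical, and the 6-minute window advancing every 3 minutes is inferred from the start and end times listed in Table 2 rather than specified by step A itself.

```python
from datetime import datetime, timedelta

def split_into_windows(events, start, end, length_min=6, step_min=3):
    """Group (timestamp, lon, lat) events into overlapping equal-length windows.

    Assumption: each window covers [t0, t0 + length) and t0 advances by
    step_min minutes, matching the start/end times listed in Table 2.
    """
    length = timedelta(minutes=length_min)
    step = timedelta(minutes=step_min)
    windows = []
    t0 = start
    while t0 + length <= end:
        t1 = t0 + length
        members = [e for e in events if t0 <= e[0] < t1]
        windows.append((t0, t1, members))
        t0 += step
    return windows

# Hypothetical sample events: (GPS timestamp, longitude, latitude)
day = datetime(2017, 7, 14)
events = [
    (day.replace(hour=11, minute=1), 118.44, 32.56),
    (day.replace(hour=11, minute=4), 118.45, 32.57),
    (day.replace(hour=11, minute=28), 118.60, 32.58),
]
windows = split_into_windows(
    events, day.replace(hour=11), day.replace(hour=11, minute=30)
)
for t0, t1, members in windows:
    print(t0.strftime("%H:%M"), t1.strftime("%H:%M"), len(members))
```

With these settings an event can fall into two consecutive windows, which is why neighbouring rows of the later tables share part of their data.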
As a further refinement of the invention, obtaining the spatial location coordinates of the lightning in step B by the time-difference-of-arrival algorithm, using GPS clock synchronization and the differences in the time of arrival (TOA) of the lightning electric-field-change radiation pulse at each station, is specifically:
Step B-1: at least four lightning location monitoring stations are established, and the GPS times of the data of the same period obtained in step A are acquired;
Step B-2: making full use of the GPU resources of the graphics card, the lightning location coordinates are quickly obtained according to the time-difference-of-arrival (TDOA) algorithm.
As a further refinement of the invention, obtaining in step C the centroid coordinates of the thunderstorm cores, the lightning frequency and the association between thunderstorm cores from the lightning location data, using the DBSCAN and KMEANS algorithms in combination, is specifically:
Step C-1: set the Eps and MinPts values and, using the DBSCAN algorithm, traverse the lightning location coordinate data set of each equal time period obtained in step B-2, searching the Eps neighborhood of each lightning data point in turn and clustering the location data of each period so that data within the same class are as similar as possible and data in different classes are as dissimilar as possible; after noise data are removed, several clusters of arbitrary shape are formed;
Step C-2: based on the clustering result of C-1, take the data set of each cluster as new input to the KMEANS algorithm and, from the longitude and latitude coordinates of all members of the cluster, iteratively obtain the spatial coordinates of the cluster centroid, i.e. the thunderstorm core centroid;
Step C-3: the thunderstorm core and centroid results of C-2 give the information of multiple thunderstorm cores in the same period, but these thunderstorm cores are related to one another, i.e. each thunderstorm core has evolved from some earlier thunderstorm core. The relationship between thunderstorm cores (from which previous thunderstorm core the current one has evolved) is determined by checking whether the distance between the centroids of thunderstorm cores in the same period and in consecutive periods lies within a set threshold, together with the intensity of the lightning occurring within the range of each thunderstorm core; the evolution of each individual thunderstorm core is then derived.
As a further refinement of the invention, in step B-2 CUDA programming is used to make full use of the GPU resources of the graphics card and speed up data processing.
As a further refinement of the invention, in step C-2 the KMEANS algorithm represents each cluster by its centroid; the noise data in the DBSCAN clusters are filtered out and the clustering result is fed into the KMEANS algorithm to obtain the optimal aggregation result.
Compared with the prior art, the invention, by adopting the above technical solution, has the following technical effects:
To address the shortcomings of the traditional DBSCAN algorithm in thunderstorm identification and thunderstorm intensity prediction, the invention combines the KMEANS clustering algorithm with the DBSCAN algorithm and uses the resulting hybrid clustering algorithm to cluster the lightning location data in equal time periods. The algorithm takes into account the disordered distribution of lightning data and overcomes the inability of the DBSCAN algorithm to find a "center point", improving the method of thunderstorm core identification and core association calculation. In addition, by combining the characteristics of the DBSCAN and KMEANS algorithms and performing thunderstorm core identification and core association calculation on the lightning location data of each period, the movement and evolution of each individual thunderstorm core and its intensity trend at the next moment are obtained.
In tests on actual lightning weather events, comparison with weather radar data shows that the proposed method accurately reflects the lightning trend in thunderstorm weather and achieves good thunderstorm core identification and thunderstorm core movement tracking.
Description of the drawings
To help those skilled in the art understand the invention, it is further described below with reference to the drawings.
Fig. 1 is a flow chart of the thunderstorm core identification and tracking method based on a hybrid clustering algorithm according to the invention;
Fig. 2 is a flow chart of the DBSCAN algorithm in the embodiment of the invention;
Fig. 3 is a schematic diagram of the principle of the KMEANS algorithm in the embodiment of the invention;
Fig. 4 is a schematic diagram of the TDOA algorithm in the embodiment of the invention;
Fig. 5 is a map of the lightning flashes in the embodiment of the invention;
Fig. 6 is a map of the DBSCAN cluster distribution in the embodiment of the invention;
Fig. 7 is a map of the distribution with centroids after KMEANS clustering in the embodiment of the invention;
Fig. 8 is a diagram of the thunderstorm core tracks in the embodiment of the invention;
Fig. 9 is a diagram of the thunderstorm core intensity trend in the embodiment of the invention.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawings and specific embodiments:
The invention provides a thunderstorm core identification and tracking method based on a hybrid clustering algorithm, as shown in Fig. 1. To deal with the large and disordered volume of location data from the lightning monitoring points, the method starts from the real-time location data obtained by the time-difference-of-arrival algorithm from the lightning data transmitted by the monitoring points. It first aggregates the lightning data set into several clusters using the density-reachability property of the DBSCAN algorithm and then, taking the data set of each cluster as new input, uses iterative aggregation by the KMEANS algorithm to obtain the centroid coordinates. On the basis of the clustering results, the movement path of the centroid coordinates and the lightning intensity are fitted, so as to obtain the association between thunderstorm cores and to predict the intensity trend of the thunderstorm cores at the next moment. Experiments show that the method accurately reflects the lightning trend in thunderstorm weather and achieves good thunderstorm core identification and thunderstorm movement tracking.
The main idea of lightning data analysis based on the DBSCAN clustering method is as follows: at the scale of a severe thunderstorm, the lightning in a local area changes over time, and the number of lightning flashes within a given lightning radius must not fall below a given threshold MinPts, i.e. the density of the neighborhood must not fall below a certain threshold. The lightning cluster for a given period is therefore the set of MinPts or more lightning flashes occurring within the radius. The KMEANS algorithm represents a cluster by its center, so the center selected during iteration is not necessarily a point in the cluster. Its purpose is to minimize the sum of squared errors (SSE) between the data points in each cluster and the centroid of that cluster. Some definitions used in the algorithms are given below:
(1) Eps neighborhood: the region within radius Eps of a given object is called the Eps neighborhood of that object;
(2) Core object: if the number of sample points in the Eps neighborhood of a given object is greater than or equal to the minimum number MinPts, the object is called a core object;
(3) Directly density-reachable: given an object set D, if p is in the Eps neighborhood of q and q is a core object, then object p is said to be directly density-reachable from object q;
(4) Density-reachable: for a sample set D, if there is a chain of objects p1, p2, ..., pn with p1 = q and pn = p such that, for pi ∈ D (1 ≤ i ≤ n), pi+1 is directly density-reachable from pi with respect to Eps and MinPts, then object p is density-reachable from object q with respect to Eps and MinPts;
(5) Density-connected: if there is an object o in the object set D such that objects p and q are both density-reachable from o with respect to Eps and MinPts, then objects p and q are density-connected with respect to Eps and MinPts;
(6) Noise point: an object that does not belong to any cluster is regarded as a noise point.
It can be seen that density-reachability is the transitive closure of direct density-reachability, and that this relation is asymmetric: only core objects are mutually density-reachable. Density-connectedness, however, is a symmetric relation. The purpose of DBSCAN is to find the maximal sets of density-connected objects.
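The asymmetry of direct density-reachability can be checked on a few points. The sketch below is illustrative only: the function names, the sample points and the values Eps = 1.0 and MinPts = 3 are assumptions, planar Euclidean distance is used, and the neighborhood count is assumed to include the object itself.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """Definition (2): at least min_pts points lie in the Eps neighborhood."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """Definition (3): p is in the Eps neighborhood of q and q is a core object."""
    return is_core(points, q, eps, min_pts) and p in eps_neighborhood(points, q, eps)

# Hypothetical planar points; index 3 is a border point next to index 1.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0)]
EPS, MIN_PTS = 1.0, 3

print(directly_density_reachable(points, 3, 1, EPS, MIN_PTS))  # border from core
print(directly_density_reachable(points, 1, 3, EPS, MIN_PTS))  # core from border
```

Point 1 is a core object with point 3 in its neighborhood, so point 3 is directly density-reachable from point 1; point 3 is not a core object, so the reverse does not hold.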
The DBSCAN algorithm can find clusters of arbitrary shape in noisy spatial data, can connect adjacent regions of sufficiently high density, handles abnormal data effectively, and is stable. However, when it is applied to the clustering of lightning data, the result is a set of clusters rather than a "center". The key to the KMEANS algorithm is the choice of the K value: if the lightning location data are too widely dispersed and are aggregated with a fixed K value, the resulting centroid positions may differ greatly from the actual positions. To address these problems, the invention combines the advantageous parts of the DBSCAN and KMEANS algorithms into the proposed hybrid clustering algorithm, as shown in Fig. 2 and Fig. 3. The hybrid is based on the essential design idea of the DBSCAN algorithm and is supplemented by the characteristics of the KMEANS algorithm, specifically:
Step 1: The lightning data sent by the lightning monitoring points are first preprocessed to filter out abnormal data, and the valid real-time lightning data are passed to the lightning location processing module, which locates the lightning using the TDOA algorithm with GPU acceleration. TDOA is a positioning method based on the reverse link: the position of the lightning is determined from the difference in the time at which the signal reaches two monitoring stations. The TDOA algorithm needs at least three monitoring points. The secondary monitoring points send the data obtained by measuring the same signal at the same time to the main monitoring point, which computes the difference in the arrival time of the radio signal at the antennas of two monitoring points (using a correlation algorithm). The time difference between two points is converted into a range difference, which defines a hyperbola. From the time differences measured by three or more monitoring points, two or more hyperbolas are obtained, and their intersection locates the emission source. The algorithm does not need to know the absolute propagation time of the signal and can cancel a large part of the timing error and of the error introduced by multipath effects. The baseline length is unrestricted, long baselines avoid mutual coupling between antennas, there is no phase ambiguity problem, and the positioning accuracy is high, as shown in Fig. 4;
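The hyperbola intersection described in step 1 can be sketched numerically. The block below is a minimal sketch under stated assumptions, not the GPU implementation of the invention: it works in a flat 2-D plane with coordinates in kilometres, uses a plain Gauss-Newton iteration on the range-difference residuals, and the station layout, source position and function name `locate_tdoa` are hypothetical.

```python
import math

C_KM_PER_S = 299792.458  # propagation speed of the radiation pulse, km/s

def locate_tdoa(stations, arrival_times, guess, iterations=20):
    """Estimate a 2-D source position from arrival times at >= 3 stations.

    Residual for station i (relative to station 0):
        (|x - s_i| - |x - s_0|) - c * (t_i - t_0)
    The 2x2 Gauss-Newton normal equations are solved in closed form.
    """
    ref = stations[0]
    deltas = [C_KM_PER_S * (t - arrival_times[0]) for t in arrival_times[1:]]
    x, y = guess
    for _ in range(iterations):
        d0 = math.hypot(x - ref[0], y - ref[1])
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (sx, sy), delta in zip(stations[1:], deltas):
            di = math.hypot(x - sx, y - sy)
            r = (di - d0) - delta
            jx = (x - sx) / di - (x - ref[0]) / d0
            jy = (y - sy) / di - (y - ref[1]) / d0
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * r
            b2 -= jy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # degenerate geometry, stop with the current estimate
        dx = (b1 * a22 - b2 * a12) / det
        dy = (a11 * b2 - a12 * b1) / det
        x += dx
        y += dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y

# Hypothetical layout: four stations on a 100 km square, source at (30, 70).
stations = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
true_source = (30.0, 70.0)
emission_time = 0.5  # unknown to the solver; it cancels in the differences
arrival_times = [
    emission_time + math.hypot(true_source[0] - sx, true_source[1] - sy) / C_KM_PER_S
    for sx, sy in stations
]
estimate = locate_tdoa(stations, arrival_times, guess=(50.0, 50.0))
print(estimate)
```

Only differences of arrival times enter the residuals, which is why the emission time does not need to be known, as the text states.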
Step 2: After the lightning location data have been processed, the DBSCAN algorithm is used to cluster the data set and identify the thunderstorm cores. The flow chart of the DBSCAN algorithm is shown in Fig. 2. The DBSCAN algorithm is essentially a process of finding clusters and continually expanding them; to form a cluster, the data density must first meet the requirement. The entire data set is scanned to find any core point, and that core point is expanded. Expansion consists of finding all data points that are density-connected to the core point (note: density-connected). All core points in the Eps neighborhood of the core point are traversed (because border points cannot be expanded), and the points density-connected to them are found, until no data point can be expanded further. The border points of the resulting cluster are all non-core data points. The data set is then rescanned (excluding any data points in previously found clusters) for a core point that has not yet been clustered, and the above steps are repeated to expand that core point, until there are no new core points in the data set. The data points not included in any cluster constitute noise. After the DBSCAN algorithm, the lightning data form several clusters, which are the identified thunderstorm cores;
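The scan-and-expand procedure of step 2 can be written compactly. This is a minimal pure-Python sketch for illustration, assuming planar coordinates and a brute-force neighbor search; the function name, the label convention (-1 for noise) and the sample points are assumptions, and a production system would typically use a spatial index.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already clustered or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is an unclustered core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is a core point: keep expanding
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point keeps the noise label because it is neither a core point nor within Eps of one.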
Step 3: Using the KMEANS algorithm with K = 1 and taking each thunderstorm core data set as new input: (1) arbitrarily choose 1 object from the data set as the initial cluster center; (2) repeat (3) and (4) until no cluster changes any further; (3) based on the mean (center object) of the objects in each cluster, compute the distance between each object and these center objects, and reassign each object according to the minimum distance; (4) recompute the mean (center object) of each cluster that has changed. The computed center object is the center coordinate position of the thunderstorm core. The principle of the algorithm is shown in Fig. 3;
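Steps (1) to (4) can be sketched as a small K-means routine. This is a minimal sketch under assumptions: the first k points serve as the arbitrary initial centers, longitude and latitude are averaged directly (reasonable only over a small area), and the function names and sample cluster are hypothetical. With K = 1 every point is assigned to the single cluster, so the iteration settles on the arithmetic mean of the cluster.

```python
import math

def kmeans(points, k=1, max_iter=100):
    """Lloyd-style K-means; returns the list of k cluster centers."""
    centers = [tuple(points[i]) for i in range(k)]  # (1) initial centers
    for _ in range(max_iter):                       # (2) repeat until stable
        clusters = [[] for _ in range(k)]
        for p in points:                            # (3) assign to nearest center
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [                             # (4) recompute the means
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def sse(points, center):
    """Sum of squared errors between the points and a center."""
    return sum(math.dist(p, center) ** 2 for p in points)

# Hypothetical cluster of (longitude, latitude) pairs from one DBSCAN cluster.
cluster = [(118.40, 32.50), (118.50, 32.60), (118.45, 32.55)]
centroid = kmeans(cluster, k=1)[0]
print(centroid, sse(cluster, centroid))
```

The centroid minimizes the SSE for its cluster, so any other candidate center, including a member of the cluster, gives an SSE at least as large.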
Step 4: By calculating whether the distances between the thunderstorm cores in the same period and in different periods lie within a set radius, the association between cores is determined, so that the evolution of each individual thunderstorm core can be tracked; the change in thunderstorm core intensity can also be fitted to predict the intensity trend of the thunderstorm core.
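The distance test of step 4 can be sketched as a nearest-predecessor search. This sketch makes several assumptions: great-circle (haversine) distance with a mean Earth radius of 6371 km, a hypothetical 10 km threshold, and distance as the only criterion (the lightning-intensity criterion mentioned for step C-3 is not modelled). The first centroid in each list is taken from rows 1 and 2 of Table 3; the second entries are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def link_cores(previous, current, max_km):
    """For each current core, index of the nearest previous core within
    max_km, or None if no previous core is close enough."""
    links = []
    for lon, lat in current:
        best, best_d = None, max_km
        for idx, (plon, plat) in enumerate(previous):
            d = haversine_km(plon, plat, lon, lat)
            if d <= best_d:
                best, best_d = idx, d
        links.append(best)
    return links

previous = [(118.441767, 32.565867), (117.50, 33.20)]  # cores at 11:00-11:06
current = [(118.443899, 32.565867), (119.00, 31.60)]   # cores at 11:03-11:09
links = link_cores(previous, current, max_km=10.0)
print(links)
```

A current core with no link would be treated as newly formed, and following the links backwards through successive periods reconstructs the track of a single core.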
Embodiment
The embodiment of the invention uses case data from a thunderstorm weather event that occurred during the 30 minutes from 11:00 to 11:30 on 14 July 2017. Spatially, the longitude ranges over 117°09'-119°13' and the latitude over 31°51'-33°99'; a total of 521 lightning flashes occurred in the period. On the time scale, the data are divided into data sets at intervals of 3 minutes, as shown in Table 1.
Table 1. Lightning data statistics at equal time intervals
The distribution on the map is shown in Fig. 5. The data shown in the figure are a snapshot of the 6-minute period 11:15-11:21. The lightning location data appear disordered on the map, and neither the positions nor the directions of movement of the thunderstorm cores can be seen. These location data are taken as the data set and input to the DBSCAN algorithm for clustering.
The two DBSCAN parameters are set to Eps = 20 km and MinPts = 12. The above data sets are fed into the DBSCAN algorithm and, after noise data are removed, the clustering results are obtained. The cluster data obtained after clustering the data of each period are shown in Table 2.
Table 2. DBSCAN clustering results
ID | Start time | End time | Number of clusters
1 | 11:00 | 11:06 | 3
2 | 11:03 | 11:09 | 3
3 | 11:06 | 11:12 | 2
4 | 11:09 | 11:15 | 3
5 | 11:12 | 11:18 | 3
6 | 11:15 | 11:21 | 3
7 | 11:18 | 11:24 | 3
8 | 11:21 | 11:27 | 2
9 | 11:24 | 11:30 | 3
The distribution on the map is shown in Fig. 6, which presents the data for the 6 minutes 11:15-11:21. The figure clearly shows that 3 core lightning clusters formed in this period, with the largest lightning cluster located near Jiashan. Tables 1 and 2 show that, over time, the lightning frequency not only goes through a process of weakening, then strengthening, then weakening again, but the thunderstorm cores also change from 3 clusters to 2 and then back to 3, which reflects the randomness and instantaneity of lightning itself. Next, the lightning clusters obtained from DBSCAN clustering are input to the KMEANS algorithm as new data sets, and the centroid coordinates of each cluster are calculated to obtain thunderstorm core data with centroids. Key data for one of the lightning clusters, such as the start time, end time, centroid longitude and latitude, and number of lightning flashes, are collected to form the core data set of the thunderstorm core, as shown in Table 3.
Table 3. KMEANS clustering results (for one of the clusters)
ID | Start time | End time | Centroid longitude | Centroid latitude | Number of lightning flashes
1 | 11:00 | 11:06 | 118.441767 | 32.565867 | 93
2 | 11:03 | 11:09 | 118.443899 | 32.565867 | 90
3 | 11:06 | 11:12 | 118.456950 | 32.576041 | 111
4 | 11:09 | 11:15 | 118.471954 | 32.569283 | 121
5 | 11:12 | 11:18 | 118.483603 | 32.570045 | 99
6 | 11:15 | 11:21 | 118.479054 | 32.577664 | 82
7 | 11:18 | 11:24 | 118.508455 | 32.582688 | 64
8 | 11:21 | 11:27 | 118.558573 | 32.579050 | 51
9 | 11:24 | 11:30 | 118.597356 | 32.582731 | 27
The distribution on the map is shown in Fig. 7, which again presents the data for the 6 minutes 11:15-11:21. There are three thunderstorm cores on the map; the center of each circle marks the centroid coordinates, and the circle represents the extent of the thunderstorm core. The centroid coordinates of the thunderstorm cores, i.e. the clustering results of these 30 minutes of lightning data, are connected by lines as shown in Fig. 8. The figure shows directly that the area over which the thunderstorm cores are distributed shifts and that the position of each thunderstorm core changes. The lightning frequency also goes from strengthening to weakening, and by 11:24 the number of lightning flashes has dropped sharply. The data used to test the method therefore correspond to a strong thunderstorm that is passing through or gradually dissipating.
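The displacement of the core listed in Table 3 can be estimated directly from its centroids. The sketch below copies the nine centroid coordinates from Table 3; the equirectangular approximation, the mean Earth radius of 6371 km and the function name are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

# (longitude, latitude) of the centroid in each period, copied from Table 3
track = [
    (118.441767, 32.565867),
    (118.443899, 32.565867),
    (118.456950, 32.576041),
    (118.471954, 32.569283),
    (118.483603, 32.570045),
    (118.479054, 32.577664),
    (118.508455, 32.582688),
    (118.558573, 32.579050),
    (118.597356, 32.582731),
]

def displacement_km(start, end):
    """East and north displacement in km (equirectangular approximation,
    adequate for the few tens of kilometres covered here)."""
    mean_lat = math.radians((start[1] + end[1]) / 2)
    east = math.radians(end[0] - start[0]) * EARTH_RADIUS_KM * math.cos(mean_lat)
    north = math.radians(end[1] - start[1]) * EARTH_RADIUS_KM
    return east, north

steps = [displacement_km(a, b) for a, b in zip(track, track[1:])]
net_east, net_north = displacement_km(track[0], track[-1])
print(f"net displacement: {net_east:.1f} km east, {net_north:.1f} km north")
```

Under these assumptions the net movement over the 30 minutes is predominantly eastward, although individual steps are not monotonic (the centroid shifts slightly west between the fifth and sixth periods).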
To address the shortcomings of the traditional DBSCAN and KMEANS algorithms, the invention combines the advantages of the two algorithms and clusters the lightning data with the resulting hybrid clustering algorithm. For the core lightning clusters obtained by clustering, the lightning frequency of each period is determined, a curve is fitted to all the lightning frequencies of the cluster, and the fitted curve is used to predict whether the thunderstorm will strengthen or weaken at the next moment. The curve is shown in Fig. 9: the dark curve indicates the actual trend of the lightning frequency, and the lighter curve indicates the predicted trend of the lightning at the next moment. The clustering results and the analysis curve show that the proposed thunderstorm core identification and tracking method based on a hybrid clustering algorithm performs well in thunderstorm core identification, core tracking and short-term prediction of lightning intensity trends.
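The trend prediction can be illustrated with the lightning counts of Table 3. The form of the fitted curve is not specified in the text, so the sketch below assumes a straight line fitted by least squares to the most recent five periods; the function names and the window length are hypothetical choices, not the method of the invention.

```python
def fit_line(ys):
    """Least-squares slope and intercept for y over x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_next(counts, window=5):
    """Extrapolate the fitted line one period ahead; counts cannot go below 0."""
    recent = counts[-window:]
    slope, intercept = fit_line(recent)
    return max(0.0, slope * len(recent) + intercept)

# Number of lightning flashes per period for one core, copied from Table 3
counts = [93, 90, 111, 121, 99, 82, 64, 51, 27]
slope, intercept = fit_line(counts[-5:])
next_count = predict_next(counts)
print(f"slope {slope:.1f} flashes per period, next period about {next_count:.0f}")
```

A negative slope indicates a weakening core, consistent with the decline in lightning counts in the last rows of Table 3; a higher-order fit could be substituted where the trend is clearly non-linear.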
The preferred embodiments of the invention disclosed above are intended only to help explain the invention. The preferred embodiments do not describe every detail, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and described in detail in order to better explain the principles and practical application of the invention, so that those skilled in the art can understand and use the invention well. The invention is limited only by the claims and their full scope and equivalents.