CN110490264A - Time-series-based multidimensional distance clustering anomaly detection method and system
- Publication number
- CN110490264A (application number CN201910783824.3A)
- Authority
- CN
- China
- Prior art keywords
- track
- distance
- multidimensional
- data
- longitude
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Radar Systems Or Details Thereof (AREA)
Abstract
The invention discloses a time-series-based multidimensional distance clustering anomaly detection method and system, belonging to the technical field of aviation safety. The method comprises the following steps. Step 1: preprocess the trajectory data set; the preprocessing includes cleaning and re-integration. Step 2: compute the multidimensional similarity between trajectories. Step 3: construct the inter-trajectory similarity matrix from the multidimensional Hausdorff distance. Step 4: apply hierarchical clustering with the multidimensional Hausdorff distance, that is, select a hierarchical clustering algorithm from machine learning and cluster on the basis of the similarity matrix. Step 5: test the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering the abnormal and normal trajectories with the hierarchical clustering algorithm, and evaluating the clustering with accuracy, precision, recall, and F1 score.
Description
Technical field
The invention belongs to the technical field of aviation safety, and in particular relates to a time-series-based multidimensional distance clustering anomaly detection method and system.
Background technique
With the rapid development of transportation, GPS positioning, and target detection technology, more and more trajectory data are being used in experimental research. Trajectory clustering analysis of moving objects has increasingly broad and important applications in fields such as traffic control, weather monitoring, intelligent navigation, and counter-terrorism surveillance. By analyzing these data, the movement characteristics of moving objects can be captured, and decision support can be provided for the construction of public infrastructure. In recent years, trajectory data mining has become a research hotspot in the data mining field, covering trajectory clustering, companion pattern mining, frequent pattern mining, and abnormal trajectory detection. Abnormal trajectory detection means finding, in a trajectory data set, the objects that deviate substantially from the normal pattern. It is an important branch of trajectory data mining and is widely used to identify abnormal behavior in areas such as taxi fraud, flight monitoring, and changes in hurricane tracks.
Flight safety is the most basic requirement of the civil aviation industry and the fundamental duty of civil aviation workers. The motion stability and maneuverability of an aircraft are very important to flight safety. Accidents have occurred both at home and abroad in which an aircraft became unstable or entered a stall, went out of control, and crashed, and the causes of unstable flight are diverse. In recent years, terrorist organizations have also become increasingly rampant, and successive terrorist attacks have seriously affected airport flight safety.
To ensure flight safety, a large amount of flight-related spatio-temporal trajectory data must be stored and analyzed. Civil aviation spatio-temporal trajectory data contain attributes such as latitude and longitude coordinates, recording time, flight altitude, flight speed, and heading. The ability of a civil aircraft to fly an accurate, repeatable track has an important influence on flight safety and flight efficiency, and existing work has started from civil aircraft flight tracks to study how different aircraft follow instrument flight procedures. In actual flight, civil aircraft generally fly according to standard flight procedures under the direction of ground air traffic controllers. In special circumstances, however, the actual flight path may deviate from the standard procedure. Anomaly detection on flight trajectory data can discover, in the set of actually flown trajectories, those that deviate from the normal flight pattern, which helps ensure that aircraft follow normal trajectories and that flight safety is maintained.
Summary of the invention
The technical problem to be solved by the invention is that current abnormal behavior detection techniques for trajectory analysis are based mainly on position information and ignore the ordering of trajectory points and the kinematic characteristics of the motion. To improve the accuracy of trajectory anomaly detection, a time-series-based multidimensional feature clustering anomaly detection method and system is proposed. It extracts the multidimensional features longitude, latitude, speed, and direction from the trajectory data, uses a one-to-three comparison scheme with the Hausdorff distance to compute the multidimensional distance (similarity) between trajectories, constructs the similarity matrix between trajectories, and detects abnormal behavior in the trajectories with a hierarchical clustering method.
To solve the above technical problem, the technical solution of the invention is as follows.
The first object of this patent is to provide a time-series-based multidimensional distance clustering anomaly detection method comprising the following steps:
Step 1: data preprocessing. The trajectory data set is preprocessed, mainly in two parts: data cleaning and re-integration.
Obviously abnormal data are handled with regular expressions. For records with missing values, a record that is missing several attributes is deleted outright, while an isolated missing value is filled in with the mean. The required features (time, speed, direction, longitude, latitude) are then extracted from the trajectory data set into a new table to obtain a standard data format.
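The preprocessing described in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the column names in `FIELDS`, the regular expression `NUMERIC` used to decide what counts as obviously abnormal, and the function name `preprocess` are all hypothetical.

```python
import re
from statistics import mean

# Assumed column names for the five features named in Step 1.
FIELDS = ("time", "speed", "direction", "longitude", "latitude")
# Assumed rule for "obviously abnormal": anything that is not a plain number.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def preprocess(rows):
    """Clean raw records (dicts of strings) and keep only the five features."""
    cleaned = []
    for row in rows:
        rec = {}
        for field in FIELDS:
            text = str(row.get(field, "")).strip()
            # Values that fail the regular expression are treated as missing.
            rec[field] = float(text) if NUMERIC.fullmatch(text) else None
        missing = sum(value is None for value in rec.values())
        if missing > 1:
            continue  # several attributes missing: delete the tuple
        cleaned.append(rec)
    # Fill each isolated gap with the mean of that attribute.
    for field in FIELDS:
        known = [rec[field] for rec in cleaned if rec[field] is not None]
        if not known:
            continue
        fill = mean(known)
        for rec in cleaned:
            if rec[field] is None:
                rec[field] = fill
    return cleaned
```

Columns other than the five features are dropped because only the keys in `FIELDS` are copied into the new record.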
Step 2: compute the multidimensional similarity between trajectories. A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. The Hausdorff distance is defined as $H(A, B) = \max(h(A, B), h(B, A))$ with $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$, where $h(A, B)$ is called the one-way Hausdorff distance from set A to set B. In the invention, speed, direction, longitude, and latitude are fused into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories.
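As a point of reference, the classical Hausdorff definition quoted above can be sketched in a few lines. The function names are illustrative, and plain Euclidean distance between coordinate tuples is assumed here; the multidimensional variant of the invention replaces it with the fused feature distance.

```python
import math


def directed_hausdorff(A, B, dist=math.dist):
    """One-way distance h(A, B): the worst nearest-neighbour distance from A to B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B, dist=math.dist):
    """Symmetric distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))
```

For A = [(0, 0), (1, 0)] and B = [(0, 0), (4, 0)], h(A, B) is 1 and h(B, A) is 3, so H(A, B) is 3. The two one-way values differ, which is why the maximum of both is taken.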
The details are as follows:
(1) Position feature: $posdis(a_i, b_i) = dist(a_i, (b_i, b_{i-1}, b_{i+1}))$ denotes the longitude-latitude distance between points on the two trajectories. The invention computes the distance between two given points with the Haversine formula. For two points with longitudes $\omega_1, \omega_2$ and latitudes $\varphi_1, \varphi_2$, the distance $d$ is:
$d = 2R \arcsin\left(\sqrt{\mathrm{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \, \mathrm{haversin}(\Delta\lambda)}\right)$
where:
$\mathrm{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$
$R$ is the Earth's radius, for which the mean value 6371 km can be used; $\omega_1, \omega_2$ are the longitudes of the two points; $\varphi_1, \varphi_2$ are the latitudes of the two points; $\Delta\lambda$ is the difference between the two longitudes.
(2) Velocity feature: $spedis(a_i, b_i)$ denotes the Euclidean distance between the velocities of two points on the two trajectories, where the velocity of each point is decomposed into a vertical component $v \sin\theta$ and a horizontal component $v \cos\theta$.
(3) Direction feature: $angdis(a_i, b_i)$ denotes the degree of direction change within the two trajectories and reflects how the trajectories fluctuate. It is expressed as an absolute-value distance. Given the angle values $\theta_1, \theta_2$ of two points:
when $|\theta_1 - \theta_2| \le 180$, the direction distance is $|\theta_1 - \theta_2|$;
when $|\theta_1 - \theta_2| > 180$, the direction distance is $360 - \max(\theta_1, \theta_2) + \min(\theta_1, \theta_2)$.
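The velocity and direction features in items (2) and (3) can be illustrated with the sketch below. The exact velocity formula is not reproduced in this text, so the code assumes the plain Euclidean norm of the difference between the two decomposed velocity vectors; it also assumes angles are given in degrees in [0, 360). Function names are illustrative.

```python
import math


def speed_distance(v1, theta1, v2, theta2):
    """Euclidean distance between two velocity vectors (angles in degrees)."""
    t1, t2 = math.radians(theta1), math.radians(theta2)
    dx = v1 * math.cos(t1) - v2 * math.cos(t2)  # horizontal components v*cos(theta)
    dy = v1 * math.sin(t1) - v2 * math.sin(t2)  # vertical components v*sin(theta)
    return math.hypot(dx, dy)


def direction_distance(theta1, theta2):
    """Absolute angular distance in degrees, wrapped so it never exceeds 180."""
    diff = abs(theta1 - theta2)
    if diff <= 180:
        return diff
    return 360 - max(theta1, theta2) + min(theta1, theta2)
```

The wrap-around branch matters for headings on either side of north: headings of 10 and 350 degrees are 20 degrees apart, not 340.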
Combining the above gives:
$TMFD(a_i, b_i) = \omega_p \times posdis + \omega_v \times spedis + \omega_\theta \times angdis$    formula (2)
where $\omega_p + \omega_v + \omega_\theta = 1$, and $\omega_p$, $\omega_v$, $\omega_\theta$ are the weight factors of the position, velocity, and direction features respectively. The weights can be adjusted to suit the application scenario.
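Formula (2) is a weighted sum, which the following sketch makes concrete. The default weights of one third each and the non-negativity check are assumptions added for illustration; the text only requires that the three weights sum to 1.

```python
def tmfd(posdis, spedis, angdis, w_p=1 / 3, w_v=1 / 3, w_theta=1 / 3):
    """Weighted multidimensional feature distance between two trajectory points."""
    # Assumed validation: weights are non-negative and sum to 1.
    if min(w_p, w_v, w_theta) < 0 or abs(w_p + w_v + w_theta - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_p * posdis + w_v * spedis + w_theta * angdis
```

With component distances 10, 20, and 30 and weights 0.5, 0.3, and 0.2, the result is 5 + 6 + 6 = 17. The three components are in different units (kilometres, speed units, degrees), so in practice the weights also absorb the difference in scale.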
Trajectory point pair matching: when computing the minimum distance between two trajectories, any point $a_i$ in trajectory A is compared only with the point $b_i$ at the corresponding time in trajectory B and the two points immediately before and after it.
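The one-to-three matching rule can be sketched as a windowed variant of the one-way Hausdorff distance. The sketch assumes the two trajectories are aligned by index (point i of A and point i of B share a time step) and that B is non-empty; when A is longer than B, the last point of B is used as a fallback, which is a choice made here rather than something the text specifies.

```python
def one_to_three_directed(A, B, dist):
    """One-way distance from A to B where a_i sees only b_{i-1}, b_i, b_{i+1}."""
    worst = 0.0
    for i, a in enumerate(A):
        # Window of at most three neighbours; fall back to B's last point.
        window = B[max(0, i - 1):i + 2] or B[-1:]
        nearest = min(dist(a, b) for b in window)
        worst = max(worst, nearest)
    return worst


def one_to_three_distance(A, B, dist):
    """Symmetric distance: the larger of the two one-way distances."""
    return max(one_to_three_directed(A, B, dist), one_to_three_directed(B, A, dist))
```

Any point-level distance can be passed as `dist`, for example the weighted combination of position, velocity, and direction distances described above.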
Step 3: for the trajectory data set, use the time-series-based multidimensional feature distance to compute the multidimensional similarity distance h(TrA, TrB) between any two trajectories, and then construct the similarity matrix R between trajectories:
$R = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$
where $r_{ij}$ is the similarity distance between the i-th and the j-th trajectory. The main diagonal elements are 0, the similarity distance of a trajectory compared with itself.
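Building the matrix R amounts to evaluating the trajectory distance for every pair. The sketch below assumes the distance is symmetric (as the two-way Hausdorff distance is), so only the upper triangle is computed and mirrored; the function name and the `distance` parameter are illustrative.

```python
def similarity_matrix(trajectories, distance):
    """Return the n x n matrix of pairwise trajectory distances with a zero diagonal."""
    n = len(trajectories)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes distance(a, b) == distance(b, a).
            R[i][j] = R[j][i] = distance(trajectories[i], trajectories[j])
    return R
```

For n trajectories this needs n(n-1)/2 distance evaluations, which is why reducing the cost of each evaluation matters for large data sets.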
Step 4: hierarchical clustering with the multidimensional Hausdorff distance. A hierarchical clustering algorithm from machine learning is selected and applied to the trajectory data set on the basis of the similarity matrix from Step 3. Table 1 gives the hierarchical clustering algorithm with the multidimensional Hausdorff distance applied to trajectory data.
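A generic agglomerative procedure on a precomputed distance matrix is sketched below for orientation. It is not a reproduction of Table 1. The linkage rule is an assumption: single linkage (`min`) is used by default because the text does not name one, and `max` can be passed for complete linkage.

```python
def agglomerate(R, linkage=min):
    """Agglomerative clustering on a distance matrix R.

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(R))]  # start with n singleton classes
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(R[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))  # record the merge height
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges
```

Trajectories that join the tree only at a large merge height are far from every normal cluster, so they are the natural candidates for abnormal trajectories.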
Step 5: to test the anomaly detection performance of the algorithm, trajectories that are abnormal in speed, direction, longitude, and latitude are constructed as follows:
Speed offset: extract 5 trajectories from the normal data set and change their speed to 1.5 times the normal speed.
Direction offset: extract 5 trajectories from the normal data set and reverse their direction.
Position offset: with reference to the two-dimensional plot of the flight paths, extract 5 trajectories from the normal data set and modify their trajectory points so that they deviate from the normal flight path by more than the threshold.
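The three kinds of synthetic anomaly can be sketched as simple transformations of a trajectory. Each point is assumed to be a tuple (lon, lat, v, theta, t) with theta in degrees. The constant shift of 0.5 degrees used for the position offset is a hypothetical value, since the text does not state the threshold.

```python
def speed_offset(traj, factor=1.5):
    """Scale the speed of every point (1.5 times the normal speed by default)."""
    return [(lon, lat, v * factor, theta, t) for lon, lat, v, theta, t in traj]


def direction_offset(traj):
    """Reverse the heading of every point."""
    return [(lon, lat, v, (theta + 180) % 360, t) for lon, lat, v, theta, t in traj]


def position_offset(traj, dlon=0.5, dlat=0.5):
    """Shift every point by a fixed amount (hypothetical shift in degrees)."""
    return [(lon + dlon, lat + dlat, v, theta, t) for lon, lat, v, theta, t in traj]
```

Each function returns a new list and leaves the normal trajectory unchanged, so the same normal data set can be reused for all three anomaly types.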
The constructed abnormal trajectories and the normal trajectories are clustered with the above hierarchical clustering algorithm, and accuracy (Accuracy), precision (Precision), recall (Recall), and F1 score (F1-score) are used to evaluate the hierarchical clustering algorithm with the multidimensional Hausdorff distance.
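The four evaluation measures follow from the confusion counts. In the sketch below, label 1 is assumed to mean abnormal and 0 normal, and a ratio with a zero denominator is reported as 0.0; both conventions are choices made here for illustration.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = abnormal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With three abnormal and five normal trajectories, of which two abnormal ones are found and one normal one is wrongly flagged, accuracy is 6/8 = 0.75 and precision, recall, and F1 are all 2/3.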
The second object of this patent is to provide a time-series-based multidimensional distance clustering anomaly detection system, comprising:
Preprocessing module: preprocesses the trajectory data set; the preprocessing includes cleaning and re-integration. Specifically:
obviously abnormal data are handled with regular expressions; for records with missing values, a record that is missing several attributes is deleted outright, while an isolated missing value is filled in with the mean; the required features are then extracted from the trajectory data set into a new table to obtain a standard data format;
Similarity calculation module: computes the multidimensional similarity between trajectories. Specifically:
a trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp; the trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory; according to $H(A, B) = \max(h(A, B), h(B, A))$, speed, direction, longitude, and latitude are fused into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories;
Construction module: constructs the inter-trajectory similarity matrix from the above multidimensional Hausdorff distance;
Hierarchical clustering module: performs hierarchical clustering with the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is selected and applied on the basis of the above similarity matrix. Specifically:
first, n classes are constructed from the n trajectories, and the platform height of each class is 0;
second, the two closest classes are merged into a new class and the platform height is updated;
then the distances between the new class and the current classes are computed; if the number of classes equals 1, a dendrogram with a hierarchical structure is generated; otherwise classes continue to be merged and the distances between the new class and the other classes are recomputed until the procedure ends;
Detection module: tests the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering the abnormal and normal trajectories with the above hierarchical clustering algorithm, and evaluating the clustering algorithm with accuracy, precision, recall, and F1 score.
The third object of this patent is to provide a computer program that implements the above time-series-based multidimensional distance clustering anomaly detection method.
The fourth object of this patent is to provide an information data processing terminal that implements the above time-series-based multidimensional distance clustering anomaly detection method.
The fifth object of this patent is to provide a computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the above time-series-based multidimensional distance clustering anomaly detection method.
The advantages and positive effects of the invention are as follows:
Current abnormal behavior detection techniques for trajectory analysis are based on position information and ignore the ordering of trajectory points and the kinematic characteristics of the motion. The invention proposes a time-series-based multidimensional feature anomaly detection method to improve the accuracy of trajectory anomaly detection. It extracts longitude, latitude, speed, and direction information from the trajectory data, uses a one-to-three comparison scheme with the Hausdorff distance to compute the multi-feature similarity of trajectories, constructs the similarity matrix between trajectories, and detects abnormal behavior in the trajectories with a hierarchical clustering method. By incorporating the multidimensional features of the trajectory, the invention improves sensitivity to abnormal data.
To address the shortcomings of existing trajectory similarity measures, the invention uses a time-series-based multidimensional Hausdorff distance. Taking into account the order of movement along a trajectory and the inherent continuity of trajectory points, it computes trajectory similarity from three aspects: position, speed, and heading. Because trajectory points are ordered, the one-to-three comparison reduces the number of comparisons between trajectory points and lowers the computational complexity. Combined with a hierarchical clustering algorithm from machine learning, the dendrogram separates normal and abnormal trajectories more intuitively. Adding multidimensional trajectory point features improves the accuracy of abnormal behavior detection on flight trajectory data. In practical applications, discovering abnormal trajectories provides an important reference for locating faults and vulnerabilities in civil aircraft.
The invention preprocesses the trajectory data set and extracts the time, speed, direction, longitude, and latitude attributes to form a standard data format; computes the multidimensional feature similarity of trajectories with the Hausdorff distance; constructs the similarity matrix between trajectories; and then selects a hierarchical clustering algorithm from machine learning to cluster on the basis of the similarity matrix, generating a dendrogram with a hierarchical structure. The invention improves sensitivity to abnormal data and helps detect anomalies among trajectories.
Description of the drawings
Fig. 1 shows one-to-many matching in the Hausdorff distance;
Fig. 2 shows one-to-three matching in the Hausdorff distance.
Specific embodiment
To further explain the content, features, and effects of the invention, the following embodiments are given and described in detail with reference to the accompanying drawings.
The structure of the invention is explained in detail below with reference to the drawings.
A time-series-based multidimensional distance clustering anomaly detection method comprises the following steps:
Step 1: data preprocessing. The trajectory data set is preprocessed, mainly in two parts: data cleaning and re-integration.
Obviously abnormal data are handled with regular expressions. For records with missing values, a record that is missing several attributes is deleted outright, while an isolated missing value is filled in with the mean. The required features (time, speed, direction, longitude, latitude) are then extracted from the trajectory data set into a new table to obtain a standard data format.
Step 2: compute the multidimensional similarity between trajectories. A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. The Hausdorff distance is defined as $H(A, B) = \max(h(A, B), h(B, A))$ with $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$, where $h(A, B)$ is called the one-way Hausdorff distance from set A to set B. In the invention, speed, direction, longitude, and latitude are fused into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories.
The details are as follows:
(1) Position feature: $posdis(a_i, b_i) = dist(a_i, (b_i, b_{i-1}, b_{i+1}))$ denotes the longitude-latitude distance between points on the two trajectories. The invention computes the distance between two given points with the Haversine formula. For two points with longitudes $\omega_1, \omega_2$ and latitudes $\varphi_1, \varphi_2$, the distance $d$ is:
$d = 2R \arcsin\left(\sqrt{\mathrm{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \, \mathrm{haversin}(\Delta\lambda)}\right)$
where:
$\mathrm{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$
$R$ is the Earth's radius, for which the mean value 6371 km can be used; $\omega_1, \omega_2$ are the longitudes of the two points; $\varphi_1, \varphi_2$ are the latitudes of the two points; $\Delta\lambda$ is the difference between the two longitudes.
(2) Velocity feature: $spedis(a_i, b_i)$ denotes the Euclidean distance between the velocities of two points on the two trajectories, where the velocity of each point is decomposed into a vertical component $v \sin\theta$ and a horizontal component $v \cos\theta$.
(3) Direction feature: $angdis(a_i, b_i)$ denotes the degree of direction change within the two trajectories and reflects how the trajectories fluctuate. It is expressed as an absolute-value distance. Given the angle values $\theta_1, \theta_2$ of two points:
when $|\theta_1 - \theta_2| \le 180$, the direction distance is $|\theta_1 - \theta_2|$;
when $|\theta_1 - \theta_2| > 180$, the direction distance is $360 - \max(\theta_1, \theta_2) + \min(\theta_1, \theta_2)$.
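The Haversine distance used for the position feature can be sketched as follows, with inputs assumed to be in decimal degrees and the mean Earth radius of 6371 km taken from the text. The function names and argument order are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius stated in the text


def haversin(angle):
    """haversin(x) = sin^2(x / 2), with x in radians."""
    return math.sin(angle / 2) ** 2


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    h = haversin(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * haversin(dlam)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```

On the equator, one degree of longitude corresponds to R times pi/180, roughly 111.19 km, which gives a quick sanity check for the function.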
Combining the above gives:
$TMFD(a_i, b_i) = \omega_p \times posdis + \omega_v \times spedis + \omega_\theta \times angdis$    formula (2)
where $\omega_p + \omega_v + \omega_\theta = 1$, and $\omega_p$, $\omega_v$, $\omega_\theta$ are the weight factors of the position, velocity, and direction features respectively. The weights can be adjusted to suit the application scenario.
Trajectory point pair matching: when computing the minimum distance between two trajectories, any point $a_i$ in trajectory A is compared only with the point $b_i$ at the corresponding time in trajectory B and the two points immediately before and after it.
The point pair matching in Fig. 1 is the one-to-many matching conventionally used when computing the Hausdorff distance. The invention improves on it by reducing the number of matches, which reduces the amount of computation. The point pair matching in Fig. 2 is the one-to-three matching used when computing the Hausdorff distance in the invention.
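The saving from one-to-three matching can be made concrete by counting point comparisons for one one-way pass. The sketch assumes trajectories of n and m points aligned by index; the function names are illustrative.

```python
def comparisons_one_to_many(n, m):
    """Every point of A is compared with every point of B."""
    return n * m


def comparisons_one_to_three(n, m):
    """Each a_i is compared with b_{i-1}, b_i, b_{i+1} where those points exist."""
    total = 0
    for i in range(n):
        lo, hi = max(0, i - 1), min(m, i + 2)
        total += max(0, hi - lo)
    return total
```

For two trajectories of 100 points each, one-to-many matching needs 10000 comparisons, while one-to-three matching needs 298 (three per interior point and two at each end), so the cost grows linearly rather than quadratically with trajectory length.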
Step 3: for the trajectory data set, use the time-series-based multidimensional feature distance to compute the multidimensional similarity distance h(TrA, TrB) between any two trajectories, and then construct the similarity matrix R between trajectories:
$R = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$
where $r_{ij}$ is the similarity distance between the i-th and the j-th trajectory. The main diagonal elements are 0, the similarity distance of a trajectory compared with itself.
Step 4: hierarchical clustering with the multidimensional Hausdorff distance. A hierarchical clustering algorithm from machine learning is selected and applied to the trajectory data set on the basis of the similarity matrix from Step 3. Table 1 gives the hierarchical clustering algorithm with the multidimensional Hausdorff distance applied to trajectory data.
Table 1. Hierarchical clustering algorithm with the multidimensional Hausdorff distance
Step 5: to test the anomaly detection performance of the algorithm, trajectories that are abnormal in speed, direction, longitude, and latitude are constructed as follows:
Speed offset: extract 5 trajectories from the normal data set and change their speed to 1.5 times the normal speed.
Direction offset: extract 5 trajectories from the normal data set and reverse their direction.
Position offset: with reference to the two-dimensional plot of the flight paths, extract 5 trajectories from the normal data set and modify their trajectory points so that they deviate from the normal flight path by more than the threshold.
The constructed abnormal trajectories and the normal trajectories are clustered with the above hierarchical clustering algorithm, and accuracy (Accuracy), precision (Precision), recall (Recall), and F1 score (F1-score) are used to evaluate the hierarchical clustering algorithm with the multidimensional Hausdorff distance.
A time-series-based multidimensional distance clustering anomaly detection system comprises:
Preprocessing module: preprocesses the trajectory data set; the preprocessing includes cleaning and re-integration. Specifically:
obviously abnormal data are handled with regular expressions; for records with missing values, a record that is missing several attributes is deleted outright, while an isolated missing value is filled in with the mean; the required features are then extracted from the trajectory data set into a new table to obtain a standard data format;
Similarity calculation module: computes the multidimensional similarity between trajectories. Specifically:
a trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp; the trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory; according to $H(A, B) = \max(h(A, B), h(B, A))$, speed, direction, longitude, and latitude are fused into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories;
Construction module: constructs the inter-trajectory similarity matrix from the above multidimensional Hausdorff distance;
Hierarchical clustering module: performs hierarchical clustering with the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is selected and applied on the basis of the above similarity matrix. Specifically:
first, n classes are constructed from the n trajectories, and the platform height of each class is 0;
second, the two closest classes are merged into a new class and the platform height is updated;
then the distances between the new class and the current classes are computed; if the number of classes equals 1, a dendrogram with a hierarchical structure is generated; otherwise classes continue to be merged and the distances between the new class and the other classes are recomputed until the procedure ends;
Detection module: tests the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering the abnormal and normal trajectories with the above hierarchical clustering algorithm, and evaluating the clustering algorithm with accuracy, precision, recall, and F1 score.
A computer program that implements the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment above.
An information data processing terminal that implements the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment.
A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment.
The time-series-based multidimensional distance clustering anomaly detection method has two parts. One part extracts trajectory features and computes the similarity between trajectories with the Hausdorff distance formula; the other part clusters the trajectory set with a hierarchical clustering algorithm and detects abnormal behavior among trajectories. Specifically: first, the time, speed, direction, longitude, and latitude (position and motion information) are extracted from the trajectory data set, and for any two trajectories the trajectory points are matched one-to-three; next, the multidimensional distance (similarity) between trajectories is computed with the Hausdorff distance and the similarity matrix between trajectories is constructed; finally, a hierarchical clustering algorithm from machine learning is selected to cluster on the basis of the similarity matrix, generating a dendrogram with a hierarchical structure.
The above embodiments can be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly as a computer program product, the computer program product comprises one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk (SSD)).
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any simple modification, equivalent variation or refinement made to the above embodiments in accordance with the technical essence of the present invention falls within the scope of the technical solution of the present invention.
Claims (7)
1. A multidimensional distance clustering anomaly detection method based on time series, characterized in that it comprises the following steps:
Step 1: preprocess the track data set, the preprocessing including cleaning and re-integration. Specifically: obviously abnormal data are handled using regular expressions; for data with missing values, if a record has multiple attributes missing, the record (tuple) is deleted directly, and where only an individual value is missing, the gap is filled with the average value; the required features are then gathered from the track data into a new table so as to reach a standard data format;
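The cleaning rules of Step 1 can be illustrated with a minimal Python sketch. The field names, the regular expression used to recognise a well-formed value, and the threshold `max_missing` are all assumptions made for illustration; the claim does not fix a schema or a threshold.

```python
import re
from statistics import mean

# Hypothetical field names; the claim does not fix a schema.
FIELDS = ("lon", "lat", "speed", "heading")
# Assumed notion of a "well-formed" numeric value.
NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def parse(value):
    """Return a float for a well-formed numeric string, else None (treated as missing)."""
    if isinstance(value, str) and NUMERIC.match(value.strip()):
        return float(value)
    return None


def preprocess(rows, max_missing=1):
    """Drop records with several missing attributes, mean-fill isolated gaps."""
    parsed = [{f: parse(r.get(f)) for f in FIELDS} for r in rows]
    # Delete records in which more than `max_missing` attributes are missing.
    kept = [r for r in parsed if sum(v is None for v in r.values()) <= max_missing]
    # Fill the remaining gaps with the column mean over the kept records.
    for f in FIELDS:
        present = [r[f] for r in kept if r[f] is not None]
        fill = mean(present) if present else None
        for r in kept:
            if r[f] is None:
                r[f] = fill
    return kept


raw = [
    {"lon": "120.1", "lat": "30.2", "speed": "10.0", "heading": "90"},
    {"lon": "120.2", "lat": "30.3", "speed": "N/A", "heading": "92"},  # one gap: filled
    {"lon": "??", "lat": "", "speed": "12.0", "heading": "95"},        # two gaps: dropped
    {"lon": "120.3", "lat": "30.4", "speed": "14.0", "heading": "94"},
]
clean = preprocess(raw)
```

In this toy input the third record is deleted and the missing speed of the second record is filled with the mean of the remaining speeds.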
Step 2: calculate the multidimensional similarity between tracks. Specifically: track data are expressed as TR = {P1, P2, …, Pi, …, Pn}, where Pi = (loni, lati, vi, θi, ti); loni and lati are the longitude and latitude of the track point, vi is its speed, θi is its direction, and ti is its timestamp. The track set is T = {TR1, TR2, …, TRi, …, TRn}, where TRi denotes the i-th track. According to H(A, B) = max(h(A, B), h(B, A)), speed, direction, longitude and latitude are merged into the Hausdorff distance formula, and the multidimensional Hausdorff distance between any two tracks is calculated;
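As a rough illustration of H(A, B) = max(h(A, B), h(B, A)), the following Python sketch computes the classical Hausdorff distance with a pluggable point distance. The plain 2-D Euclidean distance `euclid` is only a stand-in assumed for this example; it is not the multidimensional point distance of the method.

```python
from math import hypot


def directed_hausdorff(A, B, dist):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B, dist):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B, dist), directed_hausdorff(B, A, dist))


def euclid(p, q):
    """Stand-in point distance (assumption): Euclidean distance in the plane."""
    return hypot(p[0] - q[0], p[1] - q[1])


A = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
H = hausdorff(A, B, euclid)
```

Taking the maximum of the two directed distances makes the result symmetric, so the same value is obtained whichever track is passed first.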
Step 3: construct the similarity matrix between tracks from the above multidimensional Hausdorff distances;
Step 4: hierarchical clustering on the multidimensional Hausdorff distance: a hierarchical clustering algorithm from machine learning is selected to perform hierarchical clustering based on the above similarity matrix. Specifically:
first, n classes are constructed from the n tracks, the platform height of each class being 0;
second, the two nearest classes are merged into a new class, and the platform height is updated;
then the distances between the new class and all current classes are calculated; if the number of classes equals 1, the dendrogram with a hierarchical structure is generated; otherwise classes continue to be merged and the distances between the new class and the other classes are recalculated, until the end;
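The merge loop of Step 4 can be sketched in Python as follows. The claim does not state how the distance between two classes is measured, so single linkage (closest pair of members) is assumed here, and the recorded merge distance is assumed to play the role of the platform height.

```python
def agglomerative(D):
    """Agglomerative clustering on a symmetric distance matrix D.

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    # Start with n singleton classes (each at height 0).
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage (an assumption): distance of the closest members.
                d = min(D[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((sorted(a), sorted(b), d))
        # Replace the two merged classes with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [a | b]
    return merges


D = [
    [0.0, 1.0, 1.5, 9.0],
    [1.0, 0.0, 1.2, 8.0],
    [1.5, 1.2, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]
merges = agglomerative(D)
```

In this toy matrix, track 3 joins the others only at the last and much higher merge, which is the kind of pattern that would suggest an abnormal track in the dendrogram.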
Step 5: evaluate the anomaly detection performance of the algorithm: tracks that are abnormal in speed, direction, longitude and latitude are constructed, the abnormal tracks and normal tracks are clustered by the above hierarchical clustering algorithm, and accuracy, precision, recall and F1 score are used to evaluate the clustering algorithm.
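The four evaluation measures named in Step 5 can be computed from binary labels as in the sketch below. The labels are invented for illustration (1 marks an abnormal track, 0 a normal one), and how a cluster is mapped to the label "abnormal" is left open by the claim.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 with 1 = abnormal track, 0 = normal track."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only, not results from the patent.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```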
2. The multidimensional distance clustering anomaly detection method based on time series according to claim 1, characterized in that, in Step 2:
the position feature is defined as posdis(ai, bi) = dist(ai, (bi, bi-1, bi+1)), which represents the longitude-latitude distance between two points on the two tracks; the distance between two given points is calculated using the Haversine formula, as follows:
given two points (ω1, φ1) and (ω2, φ2), their longitude-latitude distance d satisfies:
$$\operatorname{haversin}\left(\frac{d}{R}\right) = \operatorname{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \, \operatorname{haversin}(\Delta\lambda)$$
where:
$$\operatorname{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$$
R is the radius of the Earth, for which the mean value 6371 km may be taken; ω1, ω2 denote the longitudes of the two points; φ1, φ2 denote the latitudes of the two points; Δλ denotes the difference between the longitudes of the two points;
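A minimal Python sketch of the Haversine distance, assuming coordinates are given in decimal degrees and using the mean Earth radius of 6371 km stated above. The function names are chosen for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as stated in the claim


def haversin(theta):
    """haversin(theta) = sin^2(theta / 2), theta in radians."""
    return sin(theta / 2) ** 2


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlam = radians(lon2 - lon1)
    h = haversin(phi2 - phi1) + cos(phi1) * cos(phi2) * haversin(dlam)
    # Invert haversin(d / R) = h; min() guards against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))


one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)   # one degree of latitude
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)     # a quarter of the equator
```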
the velocity feature is defined as spedis(ai, bi) = dist(Vai, (Vbi, Vbi-1, Vbi+1)), which represents the Euclidean distance between the velocities of two points on the two tracks; the velocity of a point is decomposed into a vertical component v·sin θ and a horizontal component v·cos θ;
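The velocity decomposition and the Euclidean distance between two velocities can be sketched as follows. The direction is assumed to be given in degrees; the reference axis for θ is not stated in the claim and does not affect the distance as long as both points use the same convention.

```python
from math import cos, hypot, radians, sin


def velocity_components(v, theta_deg):
    """Split speed v with direction theta (degrees, assumed) into (horizontal, vertical)."""
    t = radians(theta_deg)
    return v * cos(t), v * sin(t)


def spedis_point(v_a, theta_a, v_b, theta_b):
    """Euclidean distance between the velocity vectors of two track points."""
    ax, ay = velocity_components(v_a, theta_a)
    bx, by = velocity_components(v_b, theta_b)
    return hypot(ax - bx, ay - by)


d_perp = spedis_point(3.0, 0.0, 4.0, 90.0)       # perpendicular velocities
d_opposite = spedis_point(10.0, 0.0, 10.0, 180.0)  # equal speed, opposite direction
```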
the direction feature is defined as angdis(ai, bi) = dist(θai, (θbi, θbi-1, θbi+1)), which represents the degree of change in direction within the two tracks and reflects the fluctuation of the track; it is expressed as an absolute-value distance, as follows:
given the angle values θ1, θ2 of two points (in degrees):
when |θ1 − θ2| ≤ 180, the absolute-value distance in direction is |θ1 − θ2|;
when |θ1 − θ2| > 180, the absolute-value distance in direction is 360 − max(θ1, θ2) + min(θ1, θ2);
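The two cases of the direction distance translate directly into a short Python function. Angles are assumed to lie in the range 0 to 360 degrees; the function name is chosen for this example.

```python
def angdis_point(theta1, theta2):
    """Absolute-value direction distance in degrees, wrapping around at 360."""
    diff = abs(theta1 - theta2)
    if diff <= 180:
        return diff
    # Angles on opposite sides of the 0/360 boundary: go the short way round.
    return 360 - max(theta1, theta2) + min(theta1, theta2)
```

For headings of 10 and 350 degrees the function returns 20 rather than 340, which matches the intuition that the two directions are nearly the same.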
combining the above gives the formula:
TMFD(ai, bi) = ωp × posdis + ωv × spedis + ωθ × angdis    formula (2)
where ωp + ωv + ωθ = 1, with ωp ≥ 0, ωv ≥ 0, ωθ ≥ 0; ωp, ωv, ωθ are the weight factors of the position feature, velocity feature and direction feature respectively, and may be adjusted as appropriate for different application scenarios;
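Formula (2) is a convex combination of the three feature distances, which can be sketched as below. The default weights are hypothetical values chosen for the example; the claim only requires them to be non-negative and to sum to 1. Any rescaling of the three features to comparable units is outside what the claim states and is not shown.

```python
def tmfd(posdis, spedis, angdis, w_p=0.4, w_v=0.3, w_theta=0.3):
    """Weighted multidimensional feature distance of formula (2).

    The default weights are illustrative assumptions, not values from the patent.
    """
    weights = (w_p, w_v, w_theta)
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_p * posdis + w_v * spedis + w_theta * angdis


value = tmfd(1.0, 2.0, 3.0, w_p=0.5, w_v=0.25, w_theta=0.25)
```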
track point pair matching: when calculating the minimum distance between two tracks, any point ai in track A is compared only with the point bi at the corresponding moment in track B and with the two points immediately before and after bi.
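The one-to-three matching can be sketched in Python as a restricted directed distance. The sketch assumes that the two tracks are sampled at the same moments, so that index i in A corresponds to index i in B; the one-dimensional toy values and the helper names are chosen for illustration only.

```python
def matched_min_distance(A, B, i, dist):
    """Smallest distance from A[i] to B[i-1], B[i], B[i+1] (indices outside B skipped)."""
    candidates = [B[j] for j in (i - 1, i, i + 1) if 0 <= j < len(B)]
    return min(dist(A[i], b) for b in candidates)


def directed_distance(A, B, dist):
    """h(A, B) under one-to-three matching: the worst of the per-point minima.

    Only indices present in both tracks are compared (an assumption for
    tracks of unequal length).
    """
    return max(matched_min_distance(A, B, i, dist) for i in range(min(len(A), len(B))))


def abs_diff(a, b):
    return abs(a - b)


A = [0, 10, 20, 30]
B = [1, 12, 18, 100]
h_ab = directed_distance(A, B, abs_diff)
h_ba = directed_distance(B, A, abs_diff)
```

Compared with the unrestricted Hausdorff distance, each point is checked against at most three candidates instead of the whole opposite track, so the cost per track pair grows linearly with track length.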
3. The multidimensional distance clustering anomaly detection method based on time series according to claim 2, characterized in that Step 3 is specifically:
for the track data set, the multidimensional similarity distance h(TrA, TrB) between any two tracks is calculated using the multidimensional feature distance method based on time series, and the similarity matrix R between tracks is then constructed, namely:
$$R = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$$
where rij denotes the similarity distance between the i-th track and the j-th track, and the main-diagonal elements 0 denote the similarity distance of a track compared with itself.
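Building the matrix R from a pairwise distance function can be sketched as follows. The sketch assumes the distance is symmetric, so only the upper triangle is computed and mirrored; `toy_distance` is a hypothetical stand-in for the multidimensional Hausdorff distance.

```python
def similarity_matrix(tracks, distance):
    """Pairwise distance matrix R with zeros on the main diagonal."""
    n = len(tracks)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumes distance(a, b) == distance(b, a), so the value is mirrored.
            R[i][j] = R[j][i] = distance(tracks[i], tracks[j])
    return R


def toy_distance(a, b):
    """Stand-in (assumption): symmetric Hausdorff distance on 1-D samples."""
    h_ab = max(min(abs(x - y) for y in b) for x in a)
    h_ba = max(min(abs(x - y) for y in a) for x in b)
    return max(h_ab, h_ba)


tracks = [[0, 1, 2], [0, 1, 3], [10, 11, 12]]
R = similarity_matrix(tracks, toy_distance)
```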
4. A multidimensional distance clustering anomaly detection system based on time series, characterized in that it comprises:
a preprocessing module, which preprocesses the track data set, the preprocessing including cleaning and re-integration. Specifically: obviously abnormal data are handled using regular expressions; for data with missing values, if a record has multiple attributes missing, the record (tuple) is deleted directly, and where only an individual value is missing, the gap is filled with the average value; the required features are then gathered from the track data into a new table so as to reach a standard data format;
a similarity calculation module, which calculates the multidimensional similarity between tracks. Specifically: track data are expressed as TR = {P1, P2, …, Pi, …, Pn}, where Pi = (loni, lati, vi, θi, ti); loni and lati are the longitude and latitude of the track point, vi is its speed, θi is its direction, and ti is its timestamp. The track set is T = {TR1, TR2, …, TRi, …, TRn}, where TRi denotes the i-th track. According to H(A, B) = max(h(A, B), h(B, A)), speed, direction, longitude and latitude are merged into the Hausdorff distance formula, and the multidimensional Hausdorff distance between any two tracks is calculated;
a construction module, which constructs the similarity matrix between tracks from the above multidimensional Hausdorff distances;
a hierarchical clustering module, which performs hierarchical clustering on the multidimensional Hausdorff distance: a hierarchical clustering algorithm from machine learning is selected to perform hierarchical clustering based on the above similarity matrix. Specifically:
first, n classes are constructed from the n tracks, the platform height of each class being 0;
second, the two nearest classes are merged into a new class, and the platform height is updated;
then the distances between the new class and all current classes are calculated; if the number of classes equals 1, the dendrogram with a hierarchical structure is generated; otherwise classes continue to be merged and the distances between the new class and the other classes are recalculated, until the end;
a detection module, which evaluates the anomaly detection performance of the algorithm: tracks that are abnormal in speed, direction, longitude and latitude are constructed, the abnormal tracks and normal tracks are clustered by the above hierarchical clustering algorithm, and accuracy, precision, recall and F1 score are used to evaluate the clustering algorithm.
5. A computer program implementing the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
6. An information data processing terminal implementing the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
7. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910783824.3A CN110490264A (en) | 2019-08-23 | 2019-08-23 | Multidimensional distance cluster method for detecting abnormality and system based on time series |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910783824.3A CN110490264A (en) | 2019-08-23 | 2019-08-23 | Multidimensional distance cluster method for detecting abnormality and system based on time series |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110490264A true CN110490264A (en) | 2019-11-22 |
Family
ID=68553212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910783824.3A Pending CN110490264A (en) | 2019-08-23 | 2019-08-23 | Multidimensional distance cluster method for detecting abnormality and system based on time series |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110490264A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102855638A (en) * | 2012-08-13 | 2013-01-02 | 苏州大学 | Detection method for abnormal behavior of vehicle based on spectrum clustering |
CN103605362A (en) * | 2013-09-11 | 2014-02-26 | 天津工业大学 | Learning and anomaly detection method based on multi-feature motion modes of vehicle traces |
CN105825242A (en) * | 2016-05-06 | 2016-08-03 | 南京大学 | Cluster communication terminal track real time anomaly detection method and system based on hybrid grid hierarchical clustering |
Non-Patent Citations (2)
Title |
---|
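Zhang Xiaobin, Yang Dongshan: "Spatio-temporal trajectory similarity measure based on time-constrained Hausdorff distance", Application Research of Computers (《计算机应用研究》) * |
Pan Xinlong et al.: "Abnormal behavior detection method based on multidimensional track features", Acta Aeronautica et Astronautica Sinica (《航空学报》) * |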
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259966A (en) * | 2020-01-17 | 2020-06-09 | 青梧桐有限责任公司 | Method and system for identifying homonymous cell with multi-feature fusion |
CN111275096A (en) * | 2020-01-17 | 2020-06-12 | 青梧桐有限责任公司 | Homonymous cell identification method and system based on image identification |
CN111506627A (en) * | 2020-04-21 | 2020-08-07 | 成都路行通信息技术有限公司 | Target behavior clustering method and system |
CN111552754A (en) * | 2020-04-24 | 2020-08-18 | 中国科学院空天信息创新研究院 | Ship track similarity measurement method and system |
CN111882873A (en) * | 2020-07-22 | 2020-11-03 | 平安国际智慧城市科技股份有限公司 | Track anomaly detection method, device, equipment and medium |
CN111882873B (en) * | 2020-07-22 | 2022-01-28 | 平安国际智慧城市科技股份有限公司 | Track anomaly detection method, device, equipment and medium |
CN111783738A (en) * | 2020-07-29 | 2020-10-16 | 中国人民解放军国防科技大学 | Abnormal motion trajectory detection method for communication radiation source |
CN112230253A (en) * | 2020-10-13 | 2021-01-15 | 电子科技大学 | Track characteristic anomaly detection method based on public slice subsequence |
CN113361786A (en) * | 2021-06-10 | 2021-09-07 | 国网江苏省电力有限公司南通供电分公司 | Intelligent planning method for power line fusing multi-source multi-dimensional heterogeneous big data |
CN113361786B (en) * | 2021-06-10 | 2022-08-19 | 国网江苏省电力有限公司南通供电分公司 | Intelligent planning method for power line fusing multi-source multi-dimensional heterogeneous big data |
CN114529311A (en) * | 2022-02-16 | 2022-05-24 | 安徽肇立科技有限公司 | Route track matching method based on positioning curve similarity |
CN115356013A (en) * | 2022-08-15 | 2022-11-18 | 桂林师范高等专科学校 | Reflow soldering temperature curve abnormity detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110490264A (en) | Multidimensional distance cluster method for detecting abnormality and system based on time series | |
Zheng | Trajectory data mining: an overview | |
CN110188093A (en) | A kind of data digging system being directed to AIS information source based on big data platform | |
Karagiorgou et al. | On vehicle tracking data-based road network generation | |
Yang et al. | Generating hierarchical strokes from urban street networks based on spatial pattern recognition | |
CN103196430B (en) | Based on the flight path of unmanned plane and the mapping navigation method and system of visual information | |
Fu et al. | Finding abnormal vessel trajectories using feature learning | |
CN105206057B (en) | Detection method and system based on Floating Car resident trip hot spot region | |
CN105893621A (en) | Method for mining target behavior law based on multi-dimensional track clustering | |
JP2019212291A (en) | Indoor positioning system and method based on geomagnetic signals in combination with computer vision | |
CN103575279B (en) | Based on Data Association and the system of fuzzy message | |
CN106055885A (en) | Anomaly detection method of flight data of unmanned aerial vehicle based on over-sampling projection approximation basis pursuit | |
WO2015049340A1 (en) | Marker based activity transition models | |
Wang et al. | Indoor tracking by rfid fusion with IMU data | |
Jiang et al. | Vision-guided unmanned aerial system for rapid multiple-type damage detection and localization | |
Minnikhanov et al. | Detection of traffic anomalies for a safety system of smart city | |
Huang et al. | Research on Real‐Time Anomaly Detection of Fishing Vessels in a Marine Edge Computing Environment | |
Tan et al. | Implicit multimodal crowdsourcing for joint RF and geomagnetic fingerprinting | |
Cheng et al. | Moving Target Detection Technology Based on UAV Vision | |
CN110135451A (en) | A kind of track clustering method arriving line-segment sets distance based on point | |
Jiang et al. | Behavior pattern mining based on spatiotemporal trajectory multidimensional information fusion | |
CN110909037B (en) | Frequent track mode mining method and device | |
Li et al. | Driving performances assessment based on speed variation using dedicated route truck GPS data | |
Ding et al. | Anomaly detection in large-scale trajectories using hybrid grid-based hierarchical clustering | |
Zhao et al. | Towards long‐term UAV object tracking via effective feature matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20191122 |