CN110490264A - Time-series-based multidimensional distance clustering anomaly detection method and system - Google Patents

Time-series-based multidimensional distance clustering anomaly detection method and system


Publication number
CN110490264A
Authority
CN
China
Prior art keywords
track
distance
multidimensional
data
longitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910783824.3A
Other languages
Chinese (zh)
Inventor
丁建立
黄天镜
王静
王怀超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN201910783824.3A
Publication of CN110490264A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses a time-series-based multidimensional distance clustering anomaly detection method and system, belonging to the technical field of aviation safety. The method comprises the following steps. Step 1: preprocess the trajectory data set, where preprocessing includes cleaning and re-integration. Step 2: compute the multidimensional similarity between trajectories. Step 3: construct the inter-trajectory similarity matrix from the multidimensional Hausdorff distances. Step 4: apply a hierarchical clustering algorithm from machine learning to the similarity matrix. Step 5: evaluate the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering them together with normal trajectories using the hierarchical clustering algorithm, and evaluating the clustering with accuracy, precision, recall, and F1 score.

Description

Time-series-based multidimensional distance clustering anomaly detection method and system
Technical field
The invention belongs to the technical field of aviation safety, and in particular relates to a time-series-based multidimensional distance clustering anomaly detection method and system.
Background
With the rapid development of transportation, GPS positioning, and target detection technologies, more and more trajectory data is being used in research. Trajectory clustering analysis of moving objects has increasingly broad and important applications in traffic control, weather monitoring, intelligent navigation, counter-terrorism surveillance, and other fields. By analyzing such data, one can capture the movement characteristics of moving objects and support decisions on the construction of public infrastructure. In recent years, trajectory data mining has become a research hotspot in data mining, covering trajectory clustering, companion pattern mining, frequent pattern mining, and abnormal trajectory detection. Abnormal trajectory detection means finding, in a trajectory data set, the objects that deviate substantially from the normal pattern. It is an important branch of trajectory data mining and is widely used to recognize abnormal behavior in areas such as taxi fraud, flight monitoring, and changes in hurricane tracks.
Flight safety is the minimum requirement of the civil aviation industry and the most basic mission of civil aviation workers. The motion stability and maneuverability of an aircraft are very important to flight safety. Both in China and abroad, aircraft have been lost after going out of control because of instability, or because of excessive maneuvering that led to a stall, and the causes of unstable flight are varied. In recent years, meanwhile, terrorist organizations have become increasingly rampant, and successive terrorist attacks have seriously affected airport and flight safety.
To ensure flight safety, a large amount of flight-related spatio-temporal trajectory data must be stored and analyzed. Civil aviation flight trajectory data contains many attributes, such as latitude and longitude coordinates, recording time, flight altitude, flight speed, and heading. The ability of a civil aircraft to fly an accurate, repeatable trajectory has an important influence on flight safety and flight efficiency, and existing work has started from the flight trajectories of civil aircraft to study how different aircraft fly instrument flight procedures. In actual flight, civil aircraft generally follow standard flight procedures under the direction of ground air traffic controllers. In special circumstances, however, the actual flight path may deviate from the standard procedure. By applying anomaly detection to flight trajectory data, trajectories that deviate from the normal flight pattern can be mined from the set of actual flight trajectories, which helps ensure that aircraft fly along normal trajectories and that flights are safe.
Summary of the invention
The technical problem to be solved by the invention is that current abnormal-behavior detection techniques for trajectory analysis rely mainly on position information and ignore the point order and motion characteristics of the trajectory. To improve the accuracy of trajectory anomaly detection, a time-series-based multidimensional feature clustering anomaly detection method and system is proposed. It extracts the longitude, latitude, speed, and direction features from the trajectory data, uses a one-to-three point comparison scheme, computes the multidimensional distance (similarity) between trajectories with the Hausdorff distance, constructs the inter-trajectory similarity matrix, and detects abnormal behavior in the trajectories with hierarchical clustering.
To solve the above technical problem, the technical solution of the invention is as follows:
The first object of this patent is to provide a time-series-based multidimensional distance clustering anomaly detection method, comprising the following steps:
Step 1: Data preprocessing. The trajectory data set is preprocessed in two parts: cleaning and re-integration.
Obviously abnormal data is handled with regular expressions. For records with missing values, a record (tuple) with several missing attributes is deleted outright, while an isolated missing value is filled with the average. Then the required features (time, speed, direction, longitude, and latitude) are extracted from the trajectory data set into a new table, giving a standard data format.
Step 2: Compute the multidimensional similarity between trajectories. A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. The Hausdorff distance is defined as $H(A, B) = \max(h(A, B), h(B, A))$ with $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$, where $h(A, B)$ is called the one-way Hausdorff distance from set A to set B. In the invention, speed, direction, longitude, and latitude are incorporated into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories.
The details are as follows:
(1) Position feature: $posdis(a_i, b_i) = dist(a_i, (b_i, b_{i-1}, b_{i+1}))$ denotes the longitude-latitude distance between points on the two trajectories. The distance between two given points is computed with the Haversine formula. For two points $(\omega_1, \varphi_1)$ and $(\omega_2, \varphi_2)$, the distance $d$ satisfies:
$$\operatorname{haversin}\left(\frac{d}{R}\right) = \operatorname{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \operatorname{haversin}(\Delta\lambda)$$
where:
$$\operatorname{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$$
$R$ is the Earth's radius, for which the mean value 6371 km can be used; $\omega_1$, $\omega_2$ are the longitudes of the two points; $\varphi_1$, $\varphi_2$ are their latitudes; and $\Delta\lambda$ is the difference between the two longitudes.
(2) velocity characteristic:Indicate the speed on two tracks between two o'clock Euclidean distance, the resolution of velocity of point are vertical speed v*sin θ, horizontal velocity v*cos θ.
(3) direction character:Indicate that two tracks change journey in internal direction Degree, has been reacted the fluctuation situation of track, has been indicated using absolute value distance, specific as follows:
The angle value θ of given two o'clock1、θ2:
When | θ12| when≤180, the absolute value distance in direction is | θ12|;
When | θ12| when > 180, the absolute value distance in direction is 360- (θ1、θ2)max+(θ1、θ2)min
Combining the above gives the formula:
$$TMFD(a_i, b_i) = \omega_p \times posdis + \omega_v \times spedis + \omega_\theta \times angdis \quad (2)$$
where $\omega_p + \omega_v + \omega_\theta = 1$, and $\omega_p$, $\omega_v$, $\omega_\theta$ are the weight factors of the position, velocity, and direction features respectively; the weights can be adjusted to suit the application scenario.
Trajectory point matching: when computing the minimum distance between two trajectories, any point $a_i$ in trajectory A is compared only with the point $b_i$ at the corresponding time in trajectory B and with its two neighbors before and after.
Step 3: For the trajectory data set, the time-series-based multidimensional feature distance is used to compute the multidimensional similarity distance $h(Tr_A, Tr_B)$ between any two trajectories, and the inter-trajectory similarity matrix $R$ is then constructed:
$$R = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$$
where $r_{ij}$ is the similarity distance between the i-th and j-th trajectories. The zeros on the main diagonal are the similarity distance of each trajectory to itself.
Step 4: Hierarchical clustering with the multidimensional Hausdorff distance. A hierarchical clustering algorithm from machine learning is applied to the trajectory data set, based on the similarity matrix from step 3. Table 1 gives the hierarchical clustering algorithm with the multidimensional Hausdorff distance for trajectory data.
Step 5: To test the anomaly detection performance of the algorithm, trajectories that are abnormal in speed, direction, longitude, and latitude are constructed, as follows:
Speed offset: 5 trajectories are extracted from the normal data set and their speed is changed to 1.5 times the normal speed.
Direction offset: 5 trajectories are extracted from the normal data set and their direction is changed to the opposite of the normal direction.
Position offset: with reference to the two-dimensional plot of the flight trajectories, 5 trajectories are extracted from the normal data set and their trajectory points are modified so that they deviate beyond the normal flight trajectory threshold.
The constructed abnormal trajectories are clustered together with normal trajectories using the above hierarchical clustering algorithm, and accuracy, precision, recall, and F1 score are used to evaluate the hierarchical clustering algorithm with the multidimensional Hausdorff distance.
The second object of this patent is to provide a time-series-based multidimensional distance clustering anomaly detection system, comprising:
Preprocessing module: preprocesses the trajectory data set, where preprocessing includes cleaning and re-integration. Specifically:
Obviously abnormal data is handled with regular expressions. For records with missing values, a record (tuple) with several missing attributes is deleted outright, while an isolated missing value is filled with the average; then the required features are extracted from the trajectory data set into a new table, giving a standard data format;
Similarity calculation module: computes the multidimensional similarity between trajectories. Specifically:
A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. Based on $H(A, B) = \max(h(A, B), h(B, A))$, speed, direction, longitude, and latitude are incorporated into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories;
Construction module: constructs the inter-trajectory similarity matrix from the above multidimensional Hausdorff distances;
Hierarchical clustering module: hierarchical clustering with the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is applied to the above similarity matrix. Specifically:
First, n classes are constructed from the n trajectories, each with a platform height of 0;
Second, the two closest classes are merged into a new class and the platform height is updated;
Third, the distances between the new class and all current classes are computed; if the number of classes equals 1, a dendrogram with the hierarchical structure is generated; otherwise merging continues, recomputing the distances between the new class and the other classes, until the end;
Detection module: evaluates the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering them together with normal trajectories using the above hierarchical clustering algorithm, and evaluating the clustering with accuracy, precision, recall, and F1 score.
The third object of this patent is to provide a computer program implementing the above time-series-based multidimensional distance clustering anomaly detection method.
The fourth object of this patent is to provide an information data processing terminal implementing the above time-series-based multidimensional distance clustering anomaly detection method.
The fifth object of this patent is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the above time-series-based multidimensional distance clustering anomaly detection method.
The advantages and positive effects of the invention are as follows:
Current abnormal-behavior detection techniques for trajectory analysis rely mainly on position information and ignore the point order and motion characteristics of the trajectory. To improve the accuracy of trajectory anomaly detection, the invention proposes a time-series-based multidimensional feature anomaly detection method: it extracts longitude, latitude, speed, and direction information from the trajectory data, uses a one-to-three comparison scheme, computes the multi-feature similarity of trajectories with the Hausdorff distance, constructs the inter-trajectory similarity matrix, and detects abnormal behavior in the trajectories with hierarchical clustering. By incorporating the multidimensional features of the trajectory, the invention improves sensitivity to abnormal data.
To address the shortcomings of existing trajectory similarity measures, the invention uses a time-series-based multidimensional Hausdorff distance. Taking into account the ordering of trajectory motion and the inherent continuity of trajectory points, it computes trajectory similarity from three aspects: position, speed, and heading. Because trajectory points are ordered, it uses a one-to-three comparison, which reduces the number of comparisons between trajectory points and the computational complexity. Combined with hierarchical clustering from machine learning, the dendrogram separates normal and abnormal trajectory data more intuitively. Adding the multidimensional features of trajectory points improves the accuracy of abnormal-behavior detection on flight trajectory data. In practical applications, finding abnormal trajectories provides an important reference for locating faults and vulnerabilities affecting civil aircraft.
The invention preprocesses the trajectory data set and extracts the time, speed, direction, longitude, and latitude attributes to form a standard data format; computes the multidimensional feature similarity of trajectories with the Hausdorff distance; constructs the inter-trajectory similarity matrix; and then applies a hierarchical clustering algorithm from machine learning to the similarity matrix, generating a dendrogram with the hierarchical structure. The invention improves sensitivity to abnormal data and helps detect anomalies among trajectories.
Brief description of the drawings
Fig. 1 is a diagram of one-to-many matching in the Hausdorff distance;
Fig. 2 is a diagram of one-to-three matching in the Hausdorff distance.
Detailed description of the embodiments
To further explain the content, features, and effects of the invention, the following embodiments are given and described in detail with reference to the drawings.
The structure of the invention is explained in detail below with reference to the drawings.
A time-series-based multidimensional distance clustering anomaly detection method comprises the following steps:
Step 1: Data preprocessing. The trajectory data set is preprocessed in two parts: cleaning and re-integration.
Obviously abnormal data is handled with regular expressions. For records with missing values, a record (tuple) with several missing attributes is deleted outright, while an isolated missing value is filled with the average. Then the required features (time, speed, direction, longitude, and latitude) are extracted from the trajectory data set into a new table, giving a standard data format.
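The missing-value rules of step 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the field names, the use of `None` for a missing value, and the reading of "several missing attributes" as "more than one" are all illustrative, and the regular-expression handling of obviously abnormal data is not shown.

```python
from statistics import mean

# Hypothetical field names; the text only lists which attributes are kept.
FIELDS = ("time", "speed", "direction", "longitude", "latitude")

def clean_track(records, max_missing=1):
    """Drop records missing more than `max_missing` attributes, then fill
    the remaining gaps with the per-attribute mean of the kept records."""
    # Keep only the required fields, which mimics extraction into a new table.
    kept = [
        {f: r.get(f) for f in FIELDS}
        for r in records
        if sum(r.get(f) is None for f in FIELDS) <= max_missing
    ]
    for f in FIELDS:
        present = [r[f] for r in kept if r[f] is not None]
        if not present:
            continue  # nothing to average, so the gap is left as-is
        fill = mean(present)
        for r in kept:
            if r[f] is None:
                r[f] = fill
    return kept

raw = [
    {"time": 0, "speed": 200.0, "direction": 90.0, "longitude": 117.0, "latitude": 39.0},
    {"time": 1, "speed": None, "direction": 92.0, "longitude": 117.1, "latitude": 39.1},
    {"time": 2, "speed": None, "direction": None, "longitude": 117.2, "latitude": 39.2},
    {"time": 3, "speed": 220.0, "direction": 94.0, "longitude": 117.3, "latitude": 39.3},
]
cleaned = clean_track(raw)
```

In this example the third record has two missing attributes and is dropped, while the single missing speed in the second record is filled with the mean of the remaining speeds.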
Step 2: Compute the multidimensional similarity between trajectories. A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. The Hausdorff distance is defined as $H(A, B) = \max(h(A, B), h(B, A))$ with $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$, where $h(A, B)$ is called the one-way Hausdorff distance from set A to set B. In the invention, speed, direction, longitude, and latitude are incorporated into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories.
The details are as follows:
(1) Position feature: $posdis(a_i, b_i) = dist(a_i, (b_i, b_{i-1}, b_{i+1}))$ denotes the longitude-latitude distance between points on the two trajectories. The distance between two given points is computed with the Haversine formula. For two points $(\omega_1, \varphi_1)$ and $(\omega_2, \varphi_2)$, the distance $d$ satisfies:
$$\operatorname{haversin}\left(\frac{d}{R}\right) = \operatorname{haversin}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \operatorname{haversin}(\Delta\lambda)$$
where:
$$\operatorname{haversin}(\theta) = \sin^2(\theta/2) = (1 - \cos\theta)/2$$
$R$ is the Earth's radius, for which the mean value 6371 km can be used; $\omega_1$, $\omega_2$ are the longitudes of the two points; $\varphi_1$, $\varphi_2$ are their latitudes; and $\Delta\lambda$ is the difference between the two longitudes.
(2) Velocity feature: $spedis(a_i, b_i)$ denotes the Euclidean distance between the velocities of points on the two trajectories, where each point's velocity is decomposed into a vertical component $v \sin\theta$ and a horizontal component $v \cos\theta$.
(3) Direction feature: $angdis(a_i, b_i)$ denotes the degree of direction change within the two trajectories and reflects how the trajectory fluctuates. It is expressed as an absolute-value distance, as follows:
Given the angle values $\theta_1$, $\theta_2$ of two points:
When $|\theta_1 - \theta_2| \le 180$, the absolute-value distance of the direction is $|\theta_1 - \theta_2|$;
When $|\theta_1 - \theta_2| > 180$, the absolute-value distance of the direction is $360 - \max(\theta_1, \theta_2) + \min(\theta_1, \theta_2)$.
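The three point-level feature distances above can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function and parameter names are illustrative, all angles are assumed to be in degrees with headings in the range 0 to 360, and the velocity distance assumes the Euclidean distance is taken over the two decomposed components, which the text implies but does not spell out.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius given in the text

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # haversin(d/R) = haversin(d_phi) + cos(phi1) * cos(phi2) * haversin(d_lambda)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def spedis(v1, theta1, v2, theta2):
    """Euclidean distance between two velocities after decomposing each
    into v*sin(theta) and v*cos(theta). Assumed form of the velocity feature."""
    t1, t2 = math.radians(theta1), math.radians(theta2)
    d_vertical = v1 * math.sin(t1) - v2 * math.sin(t2)
    d_horizontal = v1 * math.cos(t1) - v2 * math.cos(t2)
    return math.hypot(d_vertical, d_horizontal)

def angdis(theta1, theta2):
    """Absolute-value direction distance with wrap-around at 360 degrees."""
    diff = abs(theta1 - theta2)
    if diff <= 180:
        return diff
    return 360 - max(theta1, theta2) + min(theta1, theta2)
```

The wrap-around branch is what keeps headings of 350 and 10 degrees 20 degrees apart rather than 340.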
Combining the above gives the formula:
$$TMFD(a_i, b_i) = \omega_p \times posdis + \omega_v \times spedis + \omega_\theta \times angdis \quad (2)$$
where $\omega_p + \omega_v + \omega_\theta = 1$, and $\omega_p$, $\omega_v$, $\omega_\theta$ are the weight factors of the position, velocity, and direction features respectively; the weights can be adjusted to suit the application scenario.
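A worked instance of formula (2) may help. The sketch below is illustrative only: the default weights and the input values are hypothetical, chosen so the arithmetic is easy to follow, and the three feature distances are passed in as plain numbers.

```python
import math

def tmfd(posdis, spedis, angdis, w_p=0.5, w_v=0.3, w_theta=0.2):
    """Weighted multi-feature distance of formula (2).
    The default weights are illustrative, not values from the patent."""
    if not math.isclose(w_p + w_v + w_theta, 1.0):
        raise ValueError("weights must sum to 1")
    return w_p * posdis + w_v * spedis + w_theta * angdis

# 0.5 * 10 + 0.3 * 20 + 0.2 * 5
d = tmfd(posdis=10.0, spedis=20.0, angdis=5.0)
```

The source does not say whether the three features are rescaled before weighting. Since they carry different units (a distance, a speed, and an angle), this sketch leaves any scaling to the caller.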
Trajectory point matching: when computing the minimum distance between two trajectories, any point $a_i$ in trajectory A is compared only with the point $b_i$ at the corresponding time in trajectory B and with its two neighbors before and after.
The point-pair matching in Fig. 1 is the one-to-many matching used when computing the conventional Hausdorff distance; the invention improves on it by limiting the number of matches, which reduces the amount of computation. The point-pair matching in Fig. 2 is the one-to-three matching used when computing the Hausdorff distance in the invention.
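The one-to-three matching can be sketched as a restricted Hausdorff distance. This is a minimal sketch under assumptions: the two trajectories are taken to be aligned by index (point i of A corresponds in time to point i of B), the function names are illustrative, and `point_dist` stands in for whatever point-level distance is used, such as the weighted distance of formula (2).

```python
def one_way_distance(track_a, track_b, point_dist):
    """h(A, B) with one-to-three matching: each a_i is compared only with
    b_{i-1}, b_i, b_{i+1}; the largest of the per-point minima is returned."""
    worst = 0.0
    for i, a in enumerate(track_a):
        lo, hi = max(0, i - 1), min(len(track_b), i + 2)
        candidates = track_b[lo:hi]
        if not candidates:
            break  # track_b has no point near this index
        worst = max(worst, min(point_dist(a, b) for b in candidates))
    return worst

def hausdorff(track_a, track_b, point_dist):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(one_way_distance(track_a, track_b, point_dist),
               one_way_distance(track_b, track_a, point_dist))

# Toy 1-D trajectories with absolute difference as the point distance.
A = [0, 1, 2, 3]
B = [0, 1, 5, 3]
abs_diff = lambda a, b: abs(a - b)
h_ab = one_way_distance(A, B, abs_diff)
h_ba = one_way_distance(B, A, abs_diff)
H = hausdorff(A, B, abs_diff)
```

Each point is compared with at most three candidates, so one pass costs a number of comparisons linear in the trajectory length, instead of the product of the two lengths needed by one-to-many matching.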
Step 3: For the trajectory data set, the time-series-based multidimensional feature distance is used to compute the multidimensional similarity distance $h(Tr_A, Tr_B)$ between any two trajectories, and the inter-trajectory similarity matrix $R$ is then constructed:
$$R = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1n} \\ r_{21} & 0 & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & 0 \end{pmatrix}$$
where $r_{ij}$ is the similarity distance between the i-th and j-th trajectories. The zeros on the main diagonal are the similarity distance of each trajectory to itself.
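Building the matrix R can be sketched as follows. The sketch assumes the trajectory distance is symmetric, as the two-sided Hausdorff distance is, so only the upper triangle is computed; `track_dist` and the toy distance used in the example are illustrative stand-ins.

```python
def similarity_matrix(tracks, track_dist):
    """Pairwise distance matrix R with zeros on the main diagonal.
    Assumes track_dist(a, b) == track_dist(b, a)."""
    n = len(tracks)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = track_dist(tracks[i], tracks[j])
    return r

# Toy distance: largest gap between index-aligned points.
toy_dist = lambda a, b: max(abs(x - y) for x, y in zip(a, b))
R = similarity_matrix([[0, 1], [0, 2], [5, 5]], toy_dist)
```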
Step 4: Hierarchical clustering with the multidimensional Hausdorff distance. A hierarchical clustering algorithm from machine learning is applied to the trajectory data set, based on the similarity matrix from step 3. Table 1 gives the hierarchical clustering algorithm with the multidimensional Hausdorff distance for trajectory data.
Table 1. Hierarchical clustering algorithm with the multidimensional Hausdorff distance
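The agglomerative procedure named in step 4 can be sketched as follows. This is a minimal sketch, not a reproduction of Table 1: it starts with one class per trajectory at height 0, repeatedly merges the two closest classes, and records the merge height until one class remains. Single linkage is an assumption made here, since the text does not name the linkage criterion, and the matrix values are made up for illustration.

```python
def agglomerative(dist):
    """Merge the two closest clusters until one remains.
    Returns the merge history as (cluster_a, cluster_b, height) tuples."""
    clusters = [(i,) for i in range(len(dist))]  # n singleton classes, height 0
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage (assumed): closest pair of members.
                d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Illustrative distance matrix: trajectory 3 is far from the other three.
R = [
    [0, 1, 2, 9],
    [1, 0, 2, 9],
    [2, 2, 0, 8],
    [9, 9, 8, 0],
]
history = agglomerative(R)
```

In the resulting merge history, trajectory 3 joins last and at a much greater height than the other merges, which is the pattern a dendrogram makes visible for a candidate abnormal trajectory.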
Step 5: To test the anomaly detection performance of the algorithm, trajectories that are abnormal in speed, direction, longitude, and latitude are constructed, as follows:
Speed offset: 5 trajectories are extracted from the normal data set and their speed is changed to 1.5 times the normal speed.
Direction offset: 5 trajectories are extracted from the normal data set and their direction is changed to the opposite of the normal direction.
Position offset: with reference to the two-dimensional plot of the flight trajectories, 5 trajectories are extracted from the normal data set and their trajectory points are modified so that they deviate beyond the normal flight trajectory threshold.
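The three kinds of constructed anomaly can be sketched as follows. The sketch makes several assumptions: points are dictionaries with hypothetical field names, "opposite direction" is read as a 180-degree rotation of the heading, and the position offset is a fixed shift whose size is a placeholder, since the text does not state the deviation threshold.

```python
def speed_anomaly(track, factor=1.5):
    """Copy of the track with every speed scaled by `factor`."""
    return [dict(p, speed=p["speed"] * factor) for p in track]

def direction_anomaly(track):
    """Copy of the track with every heading rotated by 180 degrees (assumed
    reading of 'opposite direction')."""
    return [dict(p, direction=(p["direction"] + 180) % 360) for p in track]

def position_anomaly(track, d_lon, d_lat):
    """Copy of the track shifted by a caller-chosen offset in degrees.
    The offset needed to exceed the threshold is not given in the text."""
    return [dict(p, longitude=p["longitude"] + d_lon, latitude=p["latitude"] + d_lat)
            for p in track]

normal = [
    {"speed": 200.0, "direction": 90.0, "longitude": 117.0, "latitude": 39.0},
    {"speed": 210.0, "direction": 270.0, "longitude": 117.5, "latitude": 39.5},
]
fast = speed_anomaly(normal)
reversed_track = direction_anomaly(normal)
shifted = position_anomaly(normal, d_lon=1.0, d_lat=-1.0)
```

Each helper returns new point dictionaries, so the normal trajectory stays unchanged and can be clustered alongside its abnormal variants.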
The constructed abnormal trajectories are clustered together with normal trajectories using the above hierarchical clustering algorithm, and accuracy, precision, recall, and F1 score are used to evaluate the hierarchical clustering algorithm with the multidimensional Hausdorff distance.
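The four evaluation measures can be computed from the true and predicted labels as sketched below. The label convention (1 for abnormal, 0 for normal) and the example labels are illustrative and are not results from the patent.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with 1 = abnormal, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 3 abnormal and 5 normal trajectories.
scores = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```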
A time-series-based multidimensional distance clustering anomaly detection system comprises:
Preprocessing module: preprocesses the trajectory data set, where preprocessing includes cleaning and re-integration. Specifically:
Obviously abnormal data is handled with regular expressions. For records with missing values, a record (tuple) with several missing attributes is deleted outright, while an isolated missing value is filled with the average; then the required features are extracted from the trajectory data set into a new table, giving a standard data format;
Similarity calculation module: computes the multidimensional similarity between trajectories. Specifically:
A trajectory is represented as $TR = \{P_1, P_2, \dots, P_i, \dots, P_n\}$, where $P_i = (lon_i, lat_i, v_i, \theta_i, t_i)$; $lon_i$ and $lat_i$ are the longitude and latitude of the trajectory point, $v_i$ is its speed, $\theta_i$ is its direction, and $t_i$ is its timestamp. The trajectory set is $T = \{TR_1, TR_2, \dots, TR_i, \dots, TR_n\}$, where $TR_i$ denotes the i-th trajectory. Based on $H(A, B) = \max(h(A, B), h(B, A))$, speed, direction, longitude, and latitude are incorporated into the Hausdorff distance formula to compute the multidimensional Hausdorff distance between any two trajectories;
Construction module: constructs the inter-trajectory similarity matrix from the above multidimensional Hausdorff distances;
Hierarchical clustering module: hierarchical clustering with the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is applied to the above similarity matrix. Specifically:
First, n classes are constructed from the n trajectories, each with a platform height of 0;
Second, the two closest classes are merged into a new class and the platform height is updated;
Third, the distances between the new class and all current classes are computed; if the number of classes equals 1, a dendrogram with the hierarchical structure is generated; otherwise merging continues, recomputing the distances between the new class and the other classes, until the end;
Detection module: evaluates the anomaly detection performance of the algorithm by constructing trajectories that are abnormal in speed, direction, longitude, and latitude, clustering them together with normal trajectories using the above hierarchical clustering algorithm, and evaluating the clustering with accuracy, precision, recall, and F1 score.
A computer program implementing the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment above.
An information data processing terminal implementing the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment.
A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the time-series-based multidimensional distance clustering anomaly detection method of the first preferred embodiment.
The time-series-based multidimensional distance clustering anomaly detection method has two parts: one extracts trajectory features and computes the similarity between trajectories with the Hausdorff distance formula; the other clusters the trajectory set with a hierarchical clustering algorithm to detect abnormal behavior among trajectories. Specifically: first, the time, speed, direction, longitude, and latitude (position and motion information) are extracted from the trajectory data set, and for any two trajectories the trajectory points are matched one-to-three; next, the multidimensional distance (similarity) between trajectories is computed with the Hausdorff distance and the inter-trajectory similarity matrix is constructed; finally, a hierarchical clustering algorithm from machine learning is applied to the similarity matrix, generating a dendrogram with the hierarchical structure.
The above embodiments can be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly as a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The above are only preferred embodiments of the invention and do not limit it in any form. Any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention falls within the scope of the technical solution of the invention.

Claims (7)

1. a kind of multidimensional distance cluster method for detecting abnormality based on time series, which is characterized in that comprise the steps of
Step 1: pre-processing track data collection, and the pretreatment includes cleaning and integrates again;Specifically:
Obvious abnormal data are handled using regular expression, for the data of missing values, if certain data there are multiple attributes to lack Mistake value, selection directly delete the tuple, for the missing of data out of the ordinary, then carry out polishing data using average value;And then according to Required feature focuses on new table from track data, to reach standard data format;
Step 2: calculate the multidimensional similarity between trajectories; specifically:
The trajectory data is expressed as TR = {P1, P2, …, Pi, …, Pn}, where Pi = (lon_i, lat_i, v_i, θ_i, t_i); lon_i and lat_i are the longitude and latitude of the trajectory point, v_i is its speed, θ_i is its direction, and t_i is its timestamp. The trajectory set is T = {TR1, TR2, …, TRi, …, TRn}, where TRi denotes the i-th trajectory. According to H(A, B) = max(h(A, B), h(B, A)), speed, direction, longitude, and latitude are fused into the Hausdorff distance formula, and the multidimensional Hausdorff distance between any two trajectories is calculated;
Step 3: construct the similarity matrix between trajectories from the above multidimensional Hausdorff distances;
Step 4: hierarchical clustering on the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is selected to perform hierarchical clustering based on the above similarity matrix; specifically:
First, n classes are constructed from the n trajectories, and the platform height of each class is 0;
Second, the two nearest classes are merged into a new class, and the platform height is updated;
Then the distance between the new class and each current class is calculated; if the number of classes equals 1, the clustering diagram with hierarchical structure is generated; otherwise merging continues, with the distance between the new class and each class recalculated, until the end;
Step 5: test the anomaly detection performance of the algorithm: trajectories that are abnormal in speed, direction, longitude, and latitude are constructed, the abnormal and normal trajectories are clustered by the above hierarchical clustering algorithm, and accuracy, precision, recall, and the F1 value are selected to evaluate the clustering algorithm.
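The merging loop of step 4 can be illustrated with a short Python sketch. It is not the claimed implementation: the claim does not name a linkage rule, so single linkage (`min`) is assumed as the default here, and the function name, the `n_clusters` stopping parameter, and the example matrix are hypothetical.

```python
def agglomerative(dist, n_clusters=1, linkage=min):
    """Merge the two closest classes until `n_clusters` remain.

    `dist` is a symmetric matrix of pairwise trajectory distances.
    Returns (clusters, merges); each merge is (class_a, class_b, height),
    where height is the distance at which the two classes were joined.
    """
    # Start with n singleton classes, each at height 0.
    clusters = {i: (i,) for i in range(len(dist))}
    merges = []
    while len(clusters) > max(n_clusters, 1):
        best = None
        keys = sorted(clusters)
        for pos, p in enumerate(keys):
            for q in keys[pos + 1:]:
                # Class-to-class distance under the assumed linkage rule.
                d = linkage(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        height, p, q = best
        merges.append((clusters[p], clusters[q], height))
        clusters[p] = clusters[p] + clusters.pop(q)  # merge q into p
    return sorted(clusters.values()), merges


# Hypothetical distances: trajectories 0-2 are close, trajectory 3 is far away.
dist = [
    [0, 1, 3, 10],
    [1, 0, 2, 11],
    [3, 2, 0, 9],
    [10, 11, 9, 0],
]
clusters, merges = agglomerative(dist, n_clusters=2)
```

Stopping at two classes leaves trajectory 3 in a class of its own, which is how an outlying trajectory would surface under these assumptions. Passing `linkage=max` would give complete linkage instead.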
2. The multidimensional distance clustering anomaly detection method based on time series according to claim 1, characterized in that, in step 2:
Define the position feature: posdis(a_i, b_i) = dist(a_i, (b_i, b_{i-1}, b_{i+1})) denotes the longitude-latitude distance between two points on the two trajectories; the distance between two given points is calculated using the Haversine formula, as follows:
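Given two points with longitudes ω1, ω2 and latitudes φ1, φ2, the longitude-latitude distance d satisfies:
haversin(d/R) = haversin(φ2 − φ1) + cos(φ1) × cos(φ2) × haversin(Δλ)
where:
haversin(θ) = sin²(θ/2) = (1 − cos(θ))/2
R is the radius of the Earth, for which the mean value 6371 km may be taken; ω1, ω2 denote the longitudes of the two points; φ1, φ2 denote the latitudes of the two points; Δλ = ω2 − ω1 denotes the difference between the two longitudes;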
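For illustration, the Haversine distance can be computed as in the following Python sketch. The function names and the convention that inputs are given in degrees are assumptions of the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as stated in the claim


def haversin(theta):
    """haversin(theta) = sin^2(theta / 2), with theta in radians."""
    return math.sin(theta / 2.0) ** 2


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    h = haversin(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * haversin(delta_lambda)
    # Clamp against rounding error before taking the inverse.
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```

As a sanity check, one degree of longitude along the equator comes out at roughly 111.19 km with this radius.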
Define the velocity feature: spedis(a_i, b_i) = dist(V_ai, (V_bi, V_bi-1, V_bi+1)) denotes the Euclidean distance between the velocities of two points on the two trajectories; the velocity of a point is decomposed into a vertical component v × sin θ and a horizontal component v × cos θ;
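A minimal Python sketch of this velocity distance follows. It assumes the direction θ is given in degrees, and the function names are hypothetical.

```python
import math


def velocity_components(speed, heading_deg):
    """Split a speed into (v*sin(theta), v*cos(theta)); theta assumed in degrees."""
    theta = math.radians(heading_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


def speed_distance(speed_a, heading_a, speed_b, heading_b):
    """Euclidean distance between the two decomposed velocity vectors."""
    va = velocity_components(speed_a, heading_a)
    vb = velocity_components(speed_b, heading_b)
    return math.hypot(va[0] - vb[0], va[1] - vb[1])
```

Two points moving at speeds 3 and 4 on perpendicular headings are therefore about 5 apart in velocity, even though their scalar speeds differ by only 1.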
Define the direction feature: angdis(a_i, b_i) = dist(θ_ai, (θ_bi, θ_bi-1, θ_bi+1)) denotes the degree of change in the internal direction of the two trajectories and reflects the fluctuation of the trajectory; it is expressed using the absolute-value distance, as follows:
Given the angle values θ1, θ2 of two points:
when |θ1 − θ2| ≤ 180, the absolute-value distance of the direction is |θ1 − θ2|;
when |θ1 − θ2| > 180, the absolute-value distance of the direction is 360 − max(θ1, θ2) + min(θ1, θ2);
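The two cases translate directly into code. This Python sketch assumes both angles are in degrees within [0, 360); the function name is hypothetical.

```python
def angle_distance(theta1, theta2):
    """Absolute-value distance between two directions, in degrees."""
    diff = abs(theta1 - theta2)
    if diff <= 180.0:
        return diff
    # Wrap around through 0/360 when the direct difference exceeds 180.
    return 360.0 - max(theta1, theta2) + min(theta1, theta2)
```

For example, headings of 10 and 350 degrees are 20 degrees apart under this rule rather than 340.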
In summary, the formula is:
TMFD(a_i, b_i) = ωp × posdis + ωv × spedis + ωθ × angdis    formula (2)
where ωp + ωv + ωθ = 1, with ωp ≥ 0, ωv ≥ 0, ωθ ≥ 0; ωp, ωv, ωθ denote the weight factors of the position feature, velocity feature, and direction feature respectively, and the weights may be adjusted as appropriate for different application scenarios;
Trajectory point pair matching: when calculating the minimum distance between two trajectories, any point a_i in trajectory A is compared only with the point b_i at the corresponding time in trajectory B and the two points adjacent to it before and after.
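One possible reading of how the weighted feature distance and the time-constrained matching combine into H(A, B) is sketched below in Python. Several points are assumptions of the sketch rather than statements of the claim: the two trajectories are taken to be time-aligned and of equal length, each feature distance is taken as the minimum over the three-point window, and the per-feature distance functions are supplied by the caller. All names are hypothetical.

```python
def window(track, i):
    """Points of `track` at indices i-1, i, i+1 that actually exist."""
    return track[max(i - 1, 0): i + 2]


def tmfd(a_i, b_window, feature_dists, weights):
    """Weighted sum of per-feature minimum distances from a_i to the window."""
    return sum(
        w * min(dist(a_i, b) for b in b_window)
        for dist, w in zip(feature_dists, weights)
    )


def directed_distance(track_a, track_b, feature_dists, weights):
    """h(A, B): the worst-matched point of A against its window in B."""
    return max(
        tmfd(a_i, window(track_b, i), feature_dists, weights)
        for i, a_i in enumerate(track_a)
    )


def multidim_hausdorff(track_a, track_b, feature_dists, weights):
    """H(A, B) = max(h(A, B), h(B, A)) with time-constrained matching."""
    if not track_a or len(track_a) != len(track_b):
        raise ValueError("sketch assumes non-empty, time-aligned tracks of equal length")
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return max(
        directed_distance(track_a, track_b, feature_dists, weights),
        directed_distance(track_b, track_a, feature_dists, weights),
    )


# Toy example with two features per point: (position, speed).
toy_dists = (
    lambda a, b: abs(a[0] - b[0]),  # stand-in for posdis
    lambda a, b: abs(a[1] - b[1]),  # stand-in for spedis
)
toy_weights = (0.5, 0.5)
track_a = [(0, 1), (1, 1), (2, 1)]
track_b = [(0, 1), (1, 1), (5, 3)]
distance_ab = multidim_hausdorff(track_a, track_b, toy_dists, toy_weights)
```

In the toy example the last point of `track_b` deviates in both position and speed, so the directed distance from B to A dominates and the result is 2.5. Restricting the search to the three-point window keeps the cost linear in trajectory length instead of quadratic.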
3. The multidimensional distance clustering anomaly detection method based on time series according to claim 2, characterized in that step 3 is specifically:
For the trajectory data set, the multidimensional similarity distance h(TrA, TrB) between any two trajectories is calculated using the time-series-based multidimensional feature distance method, and the similarity matrix R between trajectories is then constructed, that is:
R = (r_ij), i, j = 1, 2, …, n
where r_ij denotes the similarity distance between the i-th trajectory and the j-th trajectory, and the main-diagonal elements are 0, representing the similarity distance of a trajectory compared with itself.
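Building the matrix is a double loop over trajectory pairs, as in this Python sketch. The sketch assumes the pairwise distance is symmetric, so only the upper triangle is computed; `toy_distance` is a hypothetical stand-in for the multidimensional distance.

```python
def similarity_matrix(tracks, distance):
    """Return the n x n matrix of pairwise distances with a zero diagonal."""
    n = len(tracks)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetric distance assumed: fill both halves at once.
            matrix[i][j] = matrix[j][i] = distance(tracks[i], tracks[j])
    return matrix


def toy_distance(a, b):
    """Hypothetical stand-in: largest pointwise gap between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


toy_tracks = [[0, 0], [1, 1], [4, 4]]
R = similarity_matrix(toy_tracks, toy_distance)
```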
4. A multidimensional distance clustering anomaly detection system based on time series, characterized by comprising:
Preprocessing module: preprocesses the trajectory data set, the preprocessing comprising cleaning and re-integration; specifically:
Obviously abnormal data are handled using regular expressions. For data with missing values, if a record has missing values in multiple attributes, the tuple is deleted directly; where only an individual value is missing, the gap is filled with the average value. The required features are then gathered from the trajectory data into a new table so as to reach a standard data format;
Similarity calculation module: calculates the multidimensional similarity between trajectories; specifically:
The trajectory data is expressed as TR = {P1, P2, …, Pi, …, Pn}, where Pi = (lon_i, lat_i, v_i, θ_i, t_i); lon_i and lat_i are the longitude and latitude of the trajectory point, v_i is its speed, θ_i is its direction, and t_i is its timestamp. The trajectory set is T = {TR1, TR2, …, TRi, …, TRn}, where TRi denotes the i-th trajectory. According to H(A, B) = max(h(A, B), h(B, A)), speed, direction, longitude, and latitude are fused into the Hausdorff distance formula, and the multidimensional Hausdorff distance between any two trajectories is calculated;
Construction module: constructs the similarity matrix between trajectories from the above multidimensional Hausdorff distances;
Hierarchical clustering module: hierarchical clustering on the multidimensional Hausdorff distance; a hierarchical clustering algorithm from machine learning is selected to perform hierarchical clustering based on the above similarity matrix; specifically:
First, n classes are constructed from the n trajectories, and the platform height of each class is 0;
Second, the two nearest classes are merged into a new class, and the platform height is updated;
Then the distance between the new class and each current class is calculated; if the number of classes equals 1, the clustering diagram with hierarchical structure is generated; otherwise merging continues, with the distance between the new class and each class recalculated, until the end;
Detection module: tests the anomaly detection performance of the algorithm: trajectories that are abnormal in speed, direction, longitude, and latitude are constructed, the abnormal and normal trajectories are clustered by the above hierarchical clustering algorithm, and accuracy, precision, recall, and the F1 value are selected to evaluate the clustering algorithm.
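The four evaluation measures named for the detection module can be computed from true and predicted labels as in the following Python sketch. The label convention (1 for an abnormal trajectory, 0 for a normal one), the function name, and the example labels are assumptions of the sketch.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 with label 1 meaning 'abnormal'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for six trajectories.
scores = evaluate([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1])
```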
5. A computer program for implementing the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
6. An information data processing terminal for implementing the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
7. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the multidimensional distance clustering anomaly detection method based on time series according to claim 1.
CN201910783824.3A 2019-08-23 2019-08-23 Multidimensional distance cluster method for detecting abnormality and system based on time series Pending CN110490264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910783824.3A CN110490264A (en) 2019-08-23 2019-08-23 Multidimensional distance cluster method for detecting abnormality and system based on time series

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910783824.3A CN110490264A (en) 2019-08-23 2019-08-23 Multidimensional distance cluster method for detecting abnormality and system based on time series

Publications (1)

Publication Number Publication Date
CN110490264A true CN110490264A (en) 2019-11-22

Family

ID=68553212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910783824.3A Pending CN110490264A (en) 2019-08-23 2019-08-23 Multidimensional distance cluster method for detecting abnormality and system based on time series

Country Status (1)

Country Link
CN (1) CN110490264A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259966A (en) * 2020-01-17 2020-06-09 青梧桐有限责任公司 Method and system for identifying homonymous cell with multi-feature fusion
CN111275096A (en) * 2020-01-17 2020-06-12 青梧桐有限责任公司 Homonymous cell identification method and system based on image identification
CN111506627A (en) * 2020-04-21 2020-08-07 成都路行通信息技术有限公司 Target behavior clustering method and system
CN111552754A (en) * 2020-04-24 2020-08-18 中国科学院空天信息创新研究院 Ship track similarity measurement method and system
CN111783738A (en) * 2020-07-29 2020-10-16 中国人民解放军国防科技大学 Abnormal motion trajectory detection method for communication radiation source
CN111882873A (en) * 2020-07-22 2020-11-03 平安国际智慧城市科技股份有限公司 Track anomaly detection method, device, equipment and medium
CN112230253A (en) * 2020-10-13 2021-01-15 电子科技大学 Track characteristic anomaly detection method based on public slice subsequence
CN113361786A (en) * 2021-06-10 2021-09-07 国网江苏省电力有限公司南通供电分公司 Intelligent planning method for power line fusing multi-source multi-dimensional heterogeneous big data
CN114529311A (en) * 2022-02-16 2022-05-24 安徽肇立科技有限公司 Route track matching method based on positioning curve similarity
CN115356013A (en) * 2022-08-15 2022-11-18 桂林师范高等专科学校 Reflow soldering temperature curve abnormity detection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855638A (en) * 2012-08-13 2013-01-02 苏州大学 Detection method for abnormal behavior of vehicle based on spectrum clustering
CN103605362A (en) * 2013-09-11 2014-02-26 天津工业大学 Learning and anomaly detection method based on multi-feature motion modes of vehicle traces
CN105825242A (en) * 2016-05-06 2016-08-03 南京大学 Cluster communication terminal track real time anomaly detection method and system based on hybrid grid hierarchical clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
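张晓滨, 杨东山: "Spatio-temporal trajectory similarity measure based on time-constrained Hausdorff distance", Application Research of Computers *
潘新龙 et al.: "Abnormal behavior detection method based on multidimensional track features", Acta Aeronautica et Astronautica Sinica *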

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259966A (en) * 2020-01-17 2020-06-09 青梧桐有限责任公司 Method and system for identifying homonymous cell with multi-feature fusion
CN111275096A (en) * 2020-01-17 2020-06-12 青梧桐有限责任公司 Homonymous cell identification method and system based on image identification
CN111506627A (en) * 2020-04-21 2020-08-07 成都路行通信息技术有限公司 Target behavior clustering method and system
CN111552754A (en) * 2020-04-24 2020-08-18 中国科学院空天信息创新研究院 Ship track similarity measurement method and system
CN111882873A (en) * 2020-07-22 2020-11-03 平安国际智慧城市科技股份有限公司 Track anomaly detection method, device, equipment and medium
CN111882873B (en) * 2020-07-22 2022-01-28 平安国际智慧城市科技股份有限公司 Track anomaly detection method, device, equipment and medium
CN111783738A (en) * 2020-07-29 2020-10-16 中国人民解放军国防科技大学 Abnormal motion trajectory detection method for communication radiation source
CN112230253A (en) * 2020-10-13 2021-01-15 电子科技大学 Track characteristic anomaly detection method based on public slice subsequence
CN113361786A (en) * 2021-06-10 2021-09-07 国网江苏省电力有限公司南通供电分公司 Intelligent planning method for power line fusing multi-source multi-dimensional heterogeneous big data
CN113361786B (en) * 2021-06-10 2022-08-19 国网江苏省电力有限公司南通供电分公司 Intelligent planning method for power line fusing multi-source multi-dimensional heterogeneous big data
CN114529311A (en) * 2022-02-16 2022-05-24 安徽肇立科技有限公司 Route track matching method based on positioning curve similarity
CN115356013A (en) * 2022-08-15 2022-11-18 桂林师范高等专科学校 Reflow soldering temperature curve abnormity detection method

Similar Documents

Publication Publication Date Title
CN110490264A (en) Multidimensional distance cluster method for detecting abnormality and system based on time series
Zheng Trajectory data mining: an overview
CN110188093A (en) A kind of data digging system being directed to AIS information source based on big data platform
Karagiorgou et al. On vehicle tracking data-based road network generation
Yang et al. Generating hierarchical strokes from urban street networks based on spatial pattern recognition
CN103196430B (en) Based on the flight path of unmanned plane and the mapping navigation method and system of visual information
Fu et al. Finding abnormal vessel trajectories using feature learning
CN105206057B (en) Detection method and system based on Floating Car resident trip hot spot region
CN105893621A (en) Method for mining target behavior law based on multi-dimensional track clustering
JP2019212291A (en) Indoor positioning system and method based on geomagnetic signals in combination with computer vision
CN103575279B (en) Based on Data Association and the system of fuzzy message
CN106055885A (en) Anomaly detection method of flight data of unmanned aerial vehicle based on over-sampling projection approximation basis pursuit
WO2015049340A1 (en) Marker based activity transition models
Wang et al. Indoor tracking by rfid fusion with IMU data
Jiang et al. Vision-guided unmanned aerial system for rapid multiple-type damage detection and localization
Minnikhanov et al. Detection of traffic anomalies for a safety system of smart city
Huang et al. Research on Real‐Time Anomaly Detection of Fishing Vessels in a Marine Edge Computing Environment
Tan et al. Implicit multimodal crowdsourcing for joint RF and geomagnetic fingerprinting
Cheng et al. Moving Target Detection Technology Based on UAV Vision
CN110135451A (en) A kind of track clustering method arriving line-segment sets distance based on point
Jiang et al. Behavior pattern mining based on spatiotemporal trajectory multidimensional information fusion
CN110909037B (en) Frequent track mode mining method and device
Li et al. Driving performances assessment based on speed variation using dedicated route truck GPS data
Ding et al. Anomaly detection in large-scale trajectories using hybrid grid-based hierarchical clustering
Zhao et al. Towards long‐term UAV object tracking via effective feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191122