CN111125189A - Track similarity measurement method based on weighted real number cost edit distance - Google Patents

Info

Publication number
CN111125189A
Authority
CN
China
Prior art keywords
track
cost
weighted
sequence
real number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911272820.5A
Other languages
Chinese (zh)
Other versions
CN111125189B (en)
Inventor
陈兴蜀
蒋术语
王海舟
王文贤
殷明勇
唐瑞
蒋梦婷
李春辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201911272820.5A
Publication of CN111125189A
Application granted
Publication of CN111125189B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a track similarity measurement method based on a weighted real number cost edit distance, comprising the following steps: step 1: represent the trajectory data as an ordered multi-dimensional real number sequence; step 2: obtain the weighted Euclidean distance between every two track points; step 3: obtain the weighted real number cost edit distance between every two track sequences; step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3 as the exponent, obtain the track similarity between the two track sequences, and likewise for every other pair of track sequences. The method does not require the track sequences to be of equal length, is applicable to multi-dimensional track data, and allows the influence of each dimension on the track similarity to be changed dynamically according to actual requirements.

Description

Track similarity measurement method based on weighted real number cost edit distance
Technical Field
The invention relates to the technical field of track data analysis and mining, in particular to a track similarity measurement method based on Weighted Real number cost Edit distance (WRERP).
Background
With the rapid development of wireless communication, positioning and sensor technologies, a large amount of spatiotemporal trajectory data is being generated and collected, such as trajectory data of animals, hurricanes, airplanes, ships, mobile users and vehicles. Analysis of these data can help researchers obtain much valuable information, such as sub-hotspots, behavioral patterns, location prediction, and social event detection and identification. The motion trend of a typhoon can be predicted by analyzing typhoon movement data; the migration patterns of animals can be summarized, and the reasons for migration analyzed, from animal migration data; and traffic flow patterns and the causes of traffic jams can be obtained from taxi navigation data, providing a theoretical basis for scheduling traffic flow reasonably. In this context, the mining and analysis of spatiotemporal trajectory data has become a new research hotspot in the field of data mining.
The basic analysis task for spatiotemporal trajectories is similarity measurement. Similarity measurement is one of the key problems in research hotspots such as track pattern mining, track classification, track anomaly detection and route calculation. For example, track clustering groups similar tracks together into one class; track classification trains a model on track data according to the similarity measure between tracks, so that the type of a given track can be judged by the model; and track anomaly detection finds tracks that are not similar to the rest of the population. Furthermore, similarity measurement is also an analysis task in its own right, for example in hurricane analysis. It is well known that the paths of hurricanes are similar, especially when they are very close to each other in space and time. Thus, when a new hurricane occurs, meteorologists use past hurricanes with similar initial trajectories to predict its development trajectory, particularly the locations of future re-intersections and landfall points. The Euclidean distance, the most classical similarity measure, requires the two tracks to be of equal length, so it is not suitable for the tracks of airplanes, ships, hurricanes and the like. The longest common subsequence (LCS) distance does not require tracks of equal length, but it considers only the similar parts of the tracks and ignores the dissimilar parts between similar subsequences, which is detrimental to the detection of abnormal tracks.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a track similarity measurement method based on a weighted real number cost edit distance, which replaces the costs of the deletion, insertion and replacement operations of the traditional edit distance with a weighted Euclidean distance, does not require the track sequences to be of equal length, is applicable to multi-dimensional track data, and can dynamically change the influence of each dimension on the track similarity according to actual requirements.
In order to solve the technical problems, the invention adopts the technical scheme that:
a track similarity measurement method based on a weighted real number cost edit distance comprises the following steps:
step 1: the trajectory data is represented as an ordered sequence of multi-dimensional real numbers, namely:
$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is the 1st track point, ..., and $p_n(lat_n, lon_n, alt_n, t_n)$ is the nth track point; n is the number of sampling points in the track sequence, lat is the latitude, lon is the longitude, alt is the altitude, and t is the time point;
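step 2: obtain the weighted Euclidean distance between every two track points, i.e. the weighted Euclidean distance $|p_i - p_j|_{weighted}$ between track point $p_i(lat_i, lon_i, alt_i, t_i)$ and track point $p_j(lat_j, lon_j, alt_j, t_j)$:
$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$;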
step 3: obtain the weighted real number cost edit distance between every two track sequences, i.e. the weighted real number cost edit distance between track sequence R and track sequence S.
The acquisition method comprises the following steps:
1) obtain the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S, specifically:
[Equation images not reproduced: the recursive definition of WRERP(R, S) and the definitions of insert_cost and delete_cost]
$substitute\_cost(r_m, s_n) = |r_m - s_n|_{weighted}$
where m and n are the lengths of track sequence R and track sequence S respectively; $Rest(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of track sequence R after excluding the element currently being compared, and Rest(S) is defined likewise; $insert\_cost(s_j)$ is the operation cost of inserting $s_j$ into track sequence R, $delete\_cost(r_m, s_n)$ is the operation cost of deleting $r_i$ from track sequence R, and $substitute\_cost(r_m, s_n)$ is the operation cost of replacing $r_m$ in track sequence R with $s_n$;
2) obtaining a weighted real number cost WRERP (S, R) for converting the track sequence S into the track sequence R in the same way as in the step 1);
3) take the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between track sequence R and track sequence S, written here as $d_{WRERP}(R, S)$ (the symbol in the source is an unreproduced image), namely:
$d_{WRERP}(R, S) = \min\{WRERP(R, S),\ WRERP(S, R)\}$
step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3, written here as $d_{WRERP}(R, S)$, as the exponent, obtain the track similarity between track sequence R and track sequence S, written here as $Sim(R, S)$ (both symbols are unreproduced images in the source), namely:
$Sim(R, S) = 0.99^{d_{WRERP}(R, S)}$
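As a worked illustration of step 4 (the symbols $Sim$ and $d$ are shorthand assumed here, and the numbers are illustrative rather than taken from the experiments): because the similarity is 0.99 raised to the distance, a distance of 0 gives a similarity of exactly 1, the similarity falls monotonically toward 0 as the distance grows, and a similarity of 0.5 corresponds to a distance of about 68.97.

```latex
\mathrm{Sim}(R,S) = 0.99^{\,d(R,S)}, \qquad d(R,S) \ge 0
\\
d(R,S) = 0 \;\Rightarrow\; \mathrm{Sim}(R,S) = 1
\\
\mathrm{Sim}(R,S) = 0.5 \;\Longleftrightarrow\; d(R,S) = \frac{\ln 0.5}{\ln 0.99} \approx 68.97
```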
compared with the prior art, the invention has the beneficial effects that:
1) the invention extends the edit distance, which originally applied only to characters, into a weighted real number cost edit distance that applies to real number track data, does not require track sequences of equal length, and can be applied to multi-dimensional real number track sequences;
2) the method can dynamically change the influence of each dimension on the track similarity according to actual requirements, which makes it more flexible;
3) the distance between two tracks is converted into a similarity by an exponential function, which makes the result more intuitive and easier to understand;
4) the computational complexity of the trajectory similarity in the invention does not increase with the increase of the dimensions of the trajectory data.
Drawings
FIG. 1 is a graph of the number of similar track pairs under different similarity thresholds and different weights for the present invention.
Fig. 2 is a two-dimensional plan view of the 2047a699 flight path of flight 3U8882 and its similar trajectory.
FIG. 3 is a two-dimensional plan view of the 1f94cc1c flight path of flight CA404 and its similar trajectory.
Fig. 4 is a two-dimensional plan view of the 1fc37b2d trajectory of flight HO1201 and its similar trajectory.
Detailed Description
The invention is explained in more detail below with reference to the figures and the description of the embodiments.
The track similarity measurement method based on the weighted real number cost edit distance is an improvement on the traditional edit distance. The cost of each edit operation is replaced by a weighted Euclidean distance, so the method can be applied to multi-dimensional real number track sequences, and the influence of each dimension on the track similarity can be changed dynamically according to actual requirements. This solves the problem of measuring similarity between multi-dimensional tracks of different lengths. The final output is the similarity between tracks, a value between 0 and 1: the closer it is to 1, the more similar the two tracks are, and the closer it is to 0, the more dissimilar they are.
Taking 900 aircraft tracks of 23 flights as an example, the specific implementation mode is as follows:
step 1: represent each aircraft track as an ordered multi-dimensional real number sequence in the following way:
$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is a track point, n is the number of sampling points in the track sequence, lat is the latitude, lon is the longitude, alt is the altitude, and t is the time point.
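A minimal sketch of the step 1 representation, assuming each sample is a 4-tuple (lat, lon, alt, t) and that "ordered" means ordered by the time point t. The names `TrackPoint` and `make_track` are hypothetical, as are the sample coordinates.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TrackPoint(NamedTuple):
    """One sampling point p_i of a track (field names follow the text)."""
    lat: float  # latitude
    lon: float  # longitude
    alt: float  # altitude
    t: float    # time point (unit is an assumption, e.g. seconds)


def make_track(samples: Iterable[Tuple[float, float, float, float]]) -> List[TrackPoint]:
    """Build the ordered sequence Tr; ordering by t is an assumption."""
    return sorted((TrackPoint(*s) for s in samples), key=lambda p: p.t)


# Illustrative values only; two samples given out of time order.
track = make_track([(30.6, 104.1, 500.0, 20.0), (30.5, 104.0, 0.0, 10.0)])
```

Because a track is just a list, two tracks may have different numbers of sampling points n, which is the case the method is designed for.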
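Step 2: obtain the weighted Euclidean distance between every two track points. Taking points $p_i(lat_i, lon_i, alt_i, t_i)$ and $p_j(lat_j, lon_j, alt_j, t_j)$ as an example, their weighted Euclidean distance $|p_i - p_j|_{weighted}$ is obtained as follows:
$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$.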
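A minimal sketch of step 2, assuming the standard weighted Euclidean form (the square root of the weighted sum of squared per-dimension differences), since the equation itself is an image in the source. The function name `weighted_euclidean` and the sample points are hypothetical.

```python
import math
from typing import Sequence


def weighted_euclidean(p: Sequence[float], q: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between two track points.

    p and q are (lat, lon, alt, t); weights are (w1, w2, w3, w4).
    Assumed form: sqrt(sum_k w_k * (p_k - q_k)^2).
    """
    if not (len(p) == len(q) == len(weights)):
        raise ValueError("points and weights must have the same dimension")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


p_i = (0.0, 0.0, 0.0, 0.0)
p_j = (3.0, 4.0, 1000.0, 60.0)

# A weight of 0 removes a dimension entirely: only lat and lon count here.
d_2d = weighted_euclidean(p_i, p_j, (0.5, 0.5, 0.0, 0.0))
```

Setting a weight to zero drops that dimension, which is how the same function can serve both two-dimensional and three-dimensional comparisons.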
Step 3: obtain the weighted real number cost edit distance between every two track sequences, i.e. the weighted real number cost edit distance between track sequence R and track sequence S.
The acquisition method comprises the following steps:
1) Obtain the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S, as follows:
[Equation images not reproduced: the recursive definition of WRERP(R, S) and the definitions of insert_cost and delete_cost]
$substitute\_cost(r_m, s_n) = |r_m - s_n|_{weighted}$
where m and n are the lengths of track sequence R and track sequence S respectively; $Rest(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of track sequence R after excluding the element currently being compared, and Rest(S) is defined likewise; $insert\_cost(s_j)$ is the operation cost of inserting $s_j$ into track sequence R, $delete\_cost(r_m, s_n)$ is the operation cost of deleting $r_i$ from track sequence R, and $substitute\_cost(r_m, s_n)$ is the operation cost of replacing $r_m$ in track sequence R with $s_n$.
2) And acquiring a weighted real number cost WRERP (S, R) for converting the track sequence S into the track sequence R according to the same method in the step 1).
3) Take the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between track sequence R and track sequence S, written here as $d_{WRERP}(R, S)$ (the symbol in the source is an unreproduced image), namely:
$d_{WRERP}(R, S) = \min\{WRERP(R, S),\ WRERP(S, R)\}$
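The three sub-steps above can be sketched as a dynamic program. This is a hedged illustration, not the patent's exact recurrence: the recursive definition and the insert and delete costs are images in the source, so the sketch assumes the usual edit-distance recursion over Rest(R) and Rest(S) and takes the three cost functions as parameters. The names `edit_cost` and `symmetric_distance`, the single-argument gap costs, and the 1-D toy data are all assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def edit_cost(r: Sequence[P], s: Sequence[P],
              sub_cost: Callable[[P, P], float],
              ins_cost: Callable[[P], float],
              del_cost: Callable[[P], float]) -> float:
    """Cheapest real-valued cost of converting r into s (assumed recursion)."""
    m, n = len(r), len(s)
    # dp[i][j] = cheapest cost of converting r[:i] into s[:j]
    dp: List[List[float]] = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # s exhausted: delete what is left of r
        dp[i][0] = dp[i - 1][0] + del_cost(r[i - 1])
    for j in range(1, n + 1):          # r exhausted: insert what is left of s
        dp[0][j] = dp[0][j - 1] + ins_cost(s[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(r[i - 1], s[j - 1]),  # substitute
                dp[i - 1][j] + del_cost(r[i - 1]),                # delete
                dp[i][j - 1] + ins_cost(s[j - 1]),                # insert
            )
    return dp[m][n]


def symmetric_distance(r, s, sub_cost, ins_cost, del_cost) -> float:
    """Smaller of the two conversion costs, as in sub-step 3)."""
    return min(edit_cost(r, s, sub_cost, ins_cost, del_cost),
               edit_cost(s, r, sub_cost, ins_cost, del_cost))


# Toy 1-D usage; gap cost measured against 0.0 is a hypothetical choice.
gap = lambda x: abs(x - 0.0)
sub = lambda a, b: abs(a - b)
d = symmetric_distance([1.0, 2.0, 3.0], [1.0, 3.0], sub, gap, gap)
```

For track data, `sub_cost` would be the weighted Euclidean distance of step 2. The table has (m+1)(n+1) cells, so the sequences need not be of equal length.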
Step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3, written here as $d_{WRERP}(R, S)$, as the exponent, obtain the track similarity between track sequence R and track sequence S, written here as $Sim(R, S)$ (both symbols are unreproduced images in the source), as follows:
$Sim(R, S) = 0.99^{d_{WRERP}(R, S)}$
the similarity between other pairs of trajectory sequences is obtained in the same manner as described above.
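A minimal sketch of the step 4 conversion from distance to similarity. The function name `similarity` and the sample distances are hypothetical; the base 0.99 is from the text.

```python
def similarity(distance: float, base: float = 0.99) -> float:
    """Map a non-negative edit distance to a similarity in (0, 1]."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # base ** 0 is 1 (identical tracks); larger distances decay toward 0.
    return base ** distance


scores = [similarity(d) for d in (0.0, 10.0, 100.0)]
```

Since the base is below 1, the mapping is strictly decreasing, so ranking track pairs by similarity is the reverse of ranking them by distance.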
To compare the impact of weighted and unweighted euclidean distances on the trajectory similarity measure, the following five experiments were performed:
1) experiment 1: take w1 = 1, w2 = 1, w3 = 0, w4 = 0, i.e. the Euclidean distance is unweighted, and obtain the similarity of the two-dimensional flight paths;
2) experiment 2: take w1 = 1, w2 = 1, w3 = 1, w4 = 0, i.e. the Euclidean distance is unweighted, and obtain the similarity of the three-dimensional flight paths;
3) experiment 3: take w1 = 0.5, w2 = 0.5, w3 = 0, w4 = 0, i.e. the Euclidean distance is weighted, and obtain the similarity of the two-dimensional flight paths;
4) experiment 4: take w1 = 0.333, w2 = 0.333, w3 = 0.333, w4 = 0, i.e. the Euclidean distance is weighted, and obtain the similarity of the three-dimensional flight paths;
5) experiment 5: considering that the range of variation of altitude in a track is far larger than that of latitude and longitude, the weight of altitude is reduced to 0.00001 so that the range of the squared altitude difference matches the range of the squared latitude and longitude differences. That is, take w1 = 0.49995, w2 = 0.49995, w3 = 0.00001, w4 = 0, i.e. the Euclidean distance is weighted, and obtain the similarity of the three-dimensional flight paths.
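The five weight settings can be written down as a small configuration, sketched here with w4 taken as 0 throughout, as stated for experiment 5. The names `EXPERIMENT_WEIGHTS` and `sums_to_one` and the tolerance are hypothetical; the check simply shows which settings respect the constraint that the weights sum to 1, up to the rounding used in the text.

```python
import math

# (w1, w2, w3, w4) = weights for (lat, lon, alt, t).
# w4 = 0 in every experiment is an assumption outside experiment 5.
EXPERIMENT_WEIGHTS = {
    1: (1.0, 1.0, 0.0, 0.0),              # unweighted, two-dimensional
    2: (1.0, 1.0, 1.0, 0.0),              # unweighted, three-dimensional
    3: (0.5, 0.5, 0.0, 0.0),              # weighted, two-dimensional
    4: (0.333, 0.333, 0.333, 0.0),        # weighted, three-dimensional
    5: (0.49995, 0.49995, 0.00001, 0.0),  # weighted, altitude scaled down
}


def sums_to_one(weights, tol: float = 0.01) -> bool:
    """True if the weights sum to 1 within a loose rounding tolerance."""
    return math.isclose(sum(weights), 1.0, abs_tol=tol)


weighted_runs = [k for k, w in EXPERIMENT_WEIGHTS.items() if sums_to_one(w)]
```

Only the settings described as weighted pass the check; the unweighted settings sum to 2 and 3 respectively.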
In the experiments, a threshold is set for the similarity: when the similarity of a track pair is greater than the threshold, the two tracks are judged to be similar; otherwise they are judged to be dissimilar. Table 1 gives the number of similar track pairs at different thresholds.
TABLE 1 Number of similar track pairs at different thresholds
[Table 1 image not reproduced]
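A minimal sketch of the thresholding rule described above: count the unordered track pairs whose similarity is strictly greater than the threshold. The name `count_similar_pairs`, the stand-in similarity function, and the toy 1-D "tracks" are hypothetical and do not reproduce Table 1.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def count_similar_pairs(tracks: Sequence[T],
                        similarity_fn: Callable[[T, T], float],
                        threshold: float) -> int:
    """Number of unordered pairs with similarity strictly above threshold."""
    return sum(1 for a, b in combinations(tracks, 2)
               if similarity_fn(a, b) > threshold)


# Toy stand-in: each "track" is one number and the distance is |a - b|.
toy_tracks = [0.0, 1.0, 50.0]
toy_similarity = lambda a, b: 0.99 ** abs(a - b)

counts = {th: count_similar_pairs(toy_tracks, toy_similarity, th)
          for th in (0.5, 0.7, 0.995)}
```

Raising the threshold can only keep the count the same or lower it, which is the trend a table of counts per threshold would be expected to show.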
Fig. 1 shows the number of similar track pairs at different similarity thresholds and different weights. As can be seen from Fig. 1, more similar tracks are found with the weighted Euclidean distance, which avoids the problem that the distance between tracks grows, and the similarity shrinks, as the number of dimensions increases. When the dimensions of the track data differ greatly in scale, the difference can be balanced by adjusting the weights of the Euclidean distance, as shown in experiment 5.
As can also be seen from Fig. 1, once the similarity threshold exceeds 0.5 the number of similar track pairs begins to drop sharply, so 0.5 is provisionally used as the track similarity threshold here. In practical applications the weights of experiment 3 are the more reasonable and common choice, so one flight path from each of three flights was randomly selected from the similar tracks of experiment 3, and two-dimensional plan views of each path and its similar tracks were drawn, as shown in Figs. 2-4. Figs. 2-4 show that these similar tracks lie very close together and have similar shapes, which illustrates the effectiveness of the invention for track similarity measurement.

Claims (1)

1. A track similarity measurement method based on a weighted real number cost edit distance is characterized by comprising the following steps:
step 1: the trajectory data is represented as an ordered sequence of multi-dimensional real numbers, namely:
$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is the 1st track point, ..., and $p_n(lat_n, lon_n, alt_n, t_n)$ is the nth track point; n is the number of sampling points in the track sequence, lat is the latitude, lon is the longitude, alt is the altitude, and t is the time point;
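step 2: obtain the weighted Euclidean distance between every two track points, i.e. the weighted Euclidean distance $|p_i - p_j|_{weighted}$ between track point $p_i(lat_i, lon_i, alt_i, t_i)$ and track point $p_j(lat_j, lon_j, alt_j, t_j)$:
$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$;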
step 3: obtain the weighted real number cost edit distance between every two track sequences, i.e. the weighted real number cost edit distance between track sequence R and track sequence S.
The acquisition method comprises the following steps:
1) obtain the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S, specifically:
[Equation images not reproduced: the recursive definition of WRERP(R, S) and the definitions of insert_cost and delete_cost]
$substitute\_cost(r_m, s_n) = |r_m - s_n|_{weighted}$
where m and n are the lengths of track sequence R and track sequence S respectively; $Rest(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of track sequence R after excluding the element currently being compared, and Rest(S) is defined likewise; $insert\_cost(s_j)$ is the operation cost of inserting $s_j$ into track sequence R, $delete\_cost(r_m, s_n)$ is the operation cost of deleting $r_i$ from track sequence R, and $substitute\_cost(r_m, s_n)$ is the operation cost of replacing $r_m$ in track sequence R with $s_n$;
2) obtaining a weighted real number cost WRERP (S, R) for converting the track sequence S into the track sequence R in the same way as in the step 1);
3) take the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between track sequence R and track sequence S, written here as $d_{WRERP}(R, S)$ (the symbol in the source is an unreproduced image), namely:
$d_{WRERP}(R, S) = \min\{WRERP(R, S),\ WRERP(S, R)\}$
step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3, written here as $d_{WRERP}(R, S)$, as the exponent, obtain the track similarity between track sequence R and track sequence S, written here as $Sim(R, S)$ (both symbols are unreproduced images in the source), namely:
$Sim(R, S) = 0.99^{d_{WRERP}(R, S)}$
CN201911272820.5A 2019-12-12 2019-12-12 Track similarity measurement method based on weighted real number cost edit distance Active CN111125189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911272820.5A CN111125189B (en) 2019-12-12 2019-12-12 Track similarity measurement method based on weighted real number cost edit distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911272820.5A CN111125189B (en) 2019-12-12 2019-12-12 Track similarity measurement method based on weighted real number cost edit distance

Publications (2)

Publication Number Publication Date
CN111125189A true CN111125189A (en) 2020-05-08
CN111125189B CN111125189B (en) 2021-01-29

Family

ID=70499828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911272820.5A Active CN111125189B (en) 2019-12-12 2019-12-12 Track similarity measurement method based on weighted real number cost edit distance

Country Status (1)

Country Link
CN (1) CN111125189B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111930791A (en) * 2020-05-28 2020-11-13 中南大学 Similarity calculation method and system for vehicle track and storage medium
CN112733890A (en) * 2020-12-28 2021-04-30 北京航空航天大学 Online vehicle track clustering method considering space-time characteristics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722541A (en) * 2012-05-23 2012-10-10 中国科学院计算技术研究所 Method and system for calculating space-time locus similarity
US20150039217A1 (en) * 2013-07-31 2015-02-05 International Business Machines Corporation Computing a similarity measure over moving object trajectories
CN106339716A (en) * 2016-08-16 2017-01-18 浙江工业大学 Mobile trajectory similarity matching method based on weighted Euclidean distance
US20180255431A1 (en) * 2016-12-31 2018-09-06 Google Llc Determining position of a device in three-dimensional space and corresponding calibration techniques
CN108536851A (en) * 2018-04-16 2018-09-14 武汉大学 A kind of method for identifying ID based on motion track similarity-rough set
CN109635059A (en) * 2018-11-23 2019-04-16 武汉烽火众智数字技术有限责任公司 People's vehicle association analysis method and system based on track similarity mode

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAYI GUO ET AL.: "Convolutional Trajectory Similarity Model: a faster method for trajectory similarity measurement", 《2019 IEEE INTELLIGENT TRASPORTATION SYSTEMS CONFERENCE》 *
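YANG Jie: "Research on Trajectory Matching Algorithms Based on Mobile Big Data", 《China Master's Theses Full-text Database, Information Science and Technology》 *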

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111930791A (en) * 2020-05-28 2020-11-13 中南大学 Similarity calculation method and system for vehicle track and storage medium
CN111930791B (en) * 2020-05-28 2022-07-15 中南大学 Similarity calculation method and system for vehicle track and storage medium
CN112733890A (en) * 2020-12-28 2021-04-30 北京航空航天大学 Online vehicle track clustering method considering space-time characteristics

Also Published As

Publication number Publication date
CN111125189B (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN111125189B (en) Track similarity measurement method based on weighted real number cost edit distance
CN109191922B (en) Large-scale four-dimensional track dynamic prediction method and device
CN113158445A (en) Prediction algorithm for residual service life of aero-engine with convolution memory residual self-attention mechanism
CN110889444B (en) Driving track feature classification method based on convolutional neural network
CN107392311B (en) Method and device for segmenting sequence
CN112862171B (en) Flight arrival time prediction method based on space-time neural network
Das et al. Anomaly detection in flight recorder data: A dynamic data-driven approach
Schimpf et al. Flight trajectory prediction based on hybrid-recurrent networks
CN110070131A (en) A kind of Active Learning Method of data-oriented driving modeling
Jiang et al. Research on method of trajectory prediction in aircraft flight based on aircraft performance and historical track data
EP3944154A1 (en) Method for optimizing on-device neural network model by using sub-kernel searching module and device using the same
CN113360655A (en) Track point classification and text generation method based on sequence annotation
CN111182445B (en) Method and system for analyzing aggregated groups based on mobile phone signaling data
CN110944295B (en) Position prediction method, position prediction device, storage medium and terminal
CN110955804B (en) Adaboost method for user space-time data behavior detection
Etemad Transportation modes classification using feature engineering
CN117077843A (en) Space-time attention fine granularity PM2.5 concentration prediction method based on CBAM-CNN-converter
Shao et al. Onlineairtrajclus: An online aircraft trajectory clustering for tarmac situation awareness
CN115759470A (en) Flight overall process fuel consumption prediction method based on machine learning
CN116244356A (en) Abnormal track detection method and device, electronic equipment and storage medium
Karataş et al. Trajectory prediction for maritime vessels using ais data
CN116304966A (en) Track association method based on multi-source data fusion
US20220277223A1 (en) Computer-readable recording medium storing machine learning program, machine learning method, and estimation device
CN111079089B (en) Base station data anomaly detection method based on interval division
Walkowiak et al. Utilizing local outlier factor for open-set classification in high-dimensional data-case study applied for text documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant