CN111125189A - Track similarity measurement method based on weighted real number cost edit distance - Google Patents
- Publication number
- CN111125189A (application CN201911272820.5A)
- Authority
- CN
- China
- Prior art keywords
- track
- cost
- weighted
- sequence
- real number
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Remote Sensing (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a track similarity measurement method based on a weighted real number cost edit distance, comprising the following steps. Step 1: represent the trajectory data as an ordered multi-dimensional real-number sequence. Step 2: obtain the weighted Euclidean distance between every two track points. Step 3: obtain the weighted real number cost edit distance between every two track sequences. Step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3 as the exponent, obtain the track similarity between two track sequences, and obtain the similarity between every other pair of track sequences in the same way. The method does not require track sequences of equal length, applies to multi-dimensional track data, and can dynamically adjust the influence of each dimension on the track similarity according to actual requirements.
Description
Technical Field
The invention relates to the technical field of track data analysis and mining, and in particular to a track similarity measurement method based on the Weighted Real number cost Edit distance (WRERP).
Background
With the rapid development of wireless communication, positioning and sensor technologies, large amounts of spatiotemporal trajectory data are generated and collected, such as the trajectories of animals, hurricanes, aircraft, ships, mobile users and vehicles. Analyzing these data gives researchers much valuable information, such as hotspots, behavioral patterns, location prediction, and social event detection and identification. The movement trend of a typhoon can be predicted by analyzing its movement data; animal migration patterns can be summarized, and the causes of migration analyzed, from migration data; and traffic flow patterns and the causes of congestion can be derived from taxi navigation data, providing a theoretical basis for scheduling traffic flow. In this context, the mining and analysis of spatiotemporal trajectory data has become a new research hotspot in the field of data mining.
The basic analysis task for spatiotemporal trajectories, however, is similarity measurement. Similarity measurement is a key problem in research topics such as trajectory pattern mining, trajectory classification, trajectory anomaly detection and route calculation. For example, trajectory clustering groups similar trajectories into one class; trajectory classification trains a model on trajectory data according to the similarity between trajectories, and the model can then determine the type of a trajectory; and trajectory anomaly detection finds trajectories that are dissimilar to the rest of the population. Similarity measurement is also an analysis task in its own right, for example in hurricane analysis. Hurricane paths tend to be similar, especially when the hurricanes are close to each other in space and time, so when a new hurricane forms, meteorologists use past hurricanes with similar initial trajectories to predict its development, in particular the locations of future re-intersection points and landfall. Euclidean distance, the most classical similarity measure, requires the two trajectories to have equal length, which makes it unsuitable for trajectories of aircraft, ships, hurricanes and the like, whose lengths generally differ. The LCSS (longest common subsequence) distance does not require equal lengths, but it considers only the similar parts of two trajectories and ignores the dissimilar parts between similar subsequences, which hinders the detection of abnormal trajectories.
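The two limitations named above can be illustrated with a minimal Python sketch, using 2-D points for brevity. The lock-step Euclidean distance here is an assumed definition (sum of distances between points with the same index), and the LCSS matching tolerance `eps` is an illustrative parameter; neither is taken from the source.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon); 2-D only for brevity


def lockstep_euclidean(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of distances between points with the same index (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("lock-step Euclidean distance needs equal-length tracks")
    return sum(math.dist(p, q) for p, q in zip(a, b))


def lcss_length(a: Sequence[Point], b: Sequence[Point], eps: float) -> int:
    """Length of the longest common subsequence; two points match if within eps."""
    m, n = len(a), len(b)
    dp: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if math.dist(a[i - 1], b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


r = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
with_detour = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 0.0)]
# LCSS matches all of r in both cases; the detour point (5, 5) is simply skipped.
print(lcss_length(r, r, eps=0.1), lcss_length(r, with_detour, eps=0.1))
```

Calling `lockstep_euclidean(r, with_detour)` raises `ValueError` because the lengths differ, while LCSS gives the detoured track the same score as an identical one, which is the blind spot for anomaly detection described above.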
Disclosure of Invention
The technical problem to be solved by the invention is to provide a track similarity measurement method based on a weighted real number cost edit distance, which replaces the costs of the deletion, insertion and substitution operations in the traditional edit distance with Euclidean distances, does not require track sequences of equal length, applies to multi-dimensional track data, and can dynamically adjust the influence of each dimension on the track similarity according to actual requirements.
To solve the above technical problem, the invention adopts the following technical solution:
a track similarity measurement method based on a weighted real number cost edit distance comprises the following steps:
Step 1: represent the trajectory data as an ordered multi-dimensional real-number sequence, namely:
$$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is the 1st track point, ..., $p_n(lat_n, lon_n, alt_n, t_n)$ is the nth track point, n is the number of sampling points in the track sequence, lat denotes latitude, lon longitude, alt altitude, and t the time point;
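Step 1 can be sketched in Python as one small record per track point. The field names mirror lat, lon, alt and t above; the names `TrackPoint` and `to_track` are illustrative, and sorting rows by t to obtain the ordering is an assumption about what "ordered" means here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TrackPoint:
    lat: float  # latitude
    lon: float  # longitude
    alt: float  # altitude
    t: float    # time point (unit is an assumption, e.g. seconds)


Track = List[TrackPoint]  # ordered sequence p_1, ..., p_n


def to_track(rows: Sequence[Tuple[float, float, float, float]]) -> Track:
    """Build an ordered track from (lat, lon, alt, t) rows, sorted by time (assumed)."""
    return [TrackPoint(*row) for row in sorted(rows, key=lambda row: row[3])]
```

For example, `to_track([(30.1, 104.0, 900.0, 20.0), (30.0, 104.0, 500.0, 10.0)])` returns the two points with the t = 10.0 point first. Tracks may have different lengths n, which the later steps allow.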
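Step 2: obtain the weighted Euclidean distance between every two track points, i.e. the weighted Euclidean distance $|p_i - p_j|_{weighted}$ between track point $p_i(lat_i, lon_i, alt_i, t_i)$ and track point $p_j(lat_j, lon_j, alt_j, t_j)$:
$$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$;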
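A minimal Python sketch of the step-2 distance, assuming the square-root form of the weighted Euclidean distance and points given as (lat, lon, alt, t) tuples. The function name is illustrative. The sketch does not force the weights to sum to 1, because the unweighted baselines in the experiments below use weights of 1.

```python
import math
from typing import Sequence


def weighted_euclidean(p: Sequence[float], q: Sequence[float],
                       w: Sequence[float]) -> float:
    """|p - q|_weighted = sqrt(sum_k w_k * (p_k - q_k)**2) over (lat, lon, alt, t)."""
    if not (len(p) == len(q) == len(w)):
        raise ValueError("points and weights must have the same number of dimensions")
    if any(wk < 0 for wk in w):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, p, q)))
```

Setting a weight to 0 drops that dimension entirely, so `w = (0.5, 0.5, 0, 0)` compares tracks in the latitude/longitude plane only.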
Step 3: obtain the weighted real number cost edit distance between every two track sequences; for track sequences R and S it is obtained as follows:
1) obtain the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S, specifically:
$$\mathrm{substitute\_cost}(r_m, s_n) = |r_m - s_n|_{weighted}$$
where m and n are the lengths of track sequences R and S, respectively; $\mathrm{rest}(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of R excluding the element currently being compared, and rest(S) is defined likewise; $\mathrm{insert\_cost}(s_j)$ is the cost of inserting $s_j$ into R; $\mathrm{delete\_cost}(r_m, s_n)$ is the cost of deleting $r_i$ from R; and $\mathrm{substitute\_cost}(r_m, s_n)$ is the cost of replacing $r_m$ in R with $s_n$;
2) obtain the weighted real number cost WRERP(S, R) of converting track sequence S into track sequence R in the same way as in 1);
3) take the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between R and S, namely:
$$\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}$$
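The recursion formula for WRERP(R, S) and the exact insert and delete costs did not survive extraction, so the following Python sketch only illustrates the structure of step 3 under stated assumptions. It uses a bottom-up dynamic program equivalent to recursing on rest(R) and rest(S), substitute_cost as defined above, and the hypothetical choices delete_cost(r_i, s_j) = |r_i - s_j|_weighted and insert_cost(s_j) = |s_j - g|_weighted with a reference point g at the origin. The point g is borrowed from the ERP distance and is not from the source. With these costs the two directions generally differ, which is why step 3 takes the minimum.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # e.g. (lat, lon, alt, t)


def weighted_euclidean(p: Point, q: Point, w: Sequence[float]) -> float:
    """Step-2 distance: sqrt(sum_k w_k * (p_k - q_k)**2)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, p, q)))


def wrerp_directional(R: Sequence[Point], S: Sequence[Point],
                      w: Sequence[float]) -> float:
    """Cost of converting R into S (bottom-up form of the rest(R)/rest(S) recursion).

    HYPOTHETICAL costs, since the source formulas for insert/delete were lost:
      substitute r_i by s_j : |r_i - s_j|_weighted   (as defined in the source)
      delete r_i            : |r_i - s_j|_weighted, or |r_i - g| if S is exhausted
      insert s_j            : |s_j - g|_weighted, with reference point g = origin
    """
    g: Point = (0.0,) * len(w)
    m, n = len(R), len(S)
    # D[i][j] = cheapest cost of turning the prefix R[:i] into the prefix S[:j]
    D: List[List[float]] = [[0.0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + weighted_euclidean(S[j - 1], g, w)  # insert only
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + weighted_euclidean(R[i - 1], g, w)  # delete only
        for j in range(1, n + 1):
            cost_rs = weighted_euclidean(R[i - 1], S[j - 1], w)
            D[i][j] = min(
                D[i - 1][j - 1] + cost_rs,                         # substitute
                D[i - 1][j] + cost_rs,                             # delete r_i
                D[i][j - 1] + weighted_euclidean(S[j - 1], g, w),  # insert s_j
            )
    return D[m][n]


def wrerp_distance(R: Sequence[Point], S: Sequence[Point],
                   w: Sequence[float]) -> float:
    """Step 3, item 3): the smaller of the two directional costs."""
    return min(wrerp_directional(R, S, w), wrerp_directional(S, R, w))
```

For R = [(10.0,)] and S = [(11.0,), (15.0,)] with w = (1.0,), this sketch gives WRERP(R, S) = 16 and WRERP(S, R) = 6, so the distance is 6. The table has (m + 1)(n + 1) cells, so one comparison costs O(m·n) weighted-distance evaluations, and R and S may have different lengths.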
Step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3 as the exponent, obtain the track similarity Sim(R, S) between track sequence R and track sequence S, namely:
$$\mathrm{Sim}(R, S) = 0.99^{\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}}$$
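Step 4 reduces to one exponentiation. A minimal Python sketch (the function name is illustrative) that also guards against a negative distance or a base outside (0, 1):

```python
def wrerp_similarity(distance: float, base: float = 0.99) -> float:
    """Step 4: similarity = base ** distance, which lies in (0, 1] for distance >= 0."""
    if distance < 0:
        raise ValueError("edit distance must be non-negative")
    if not 0.0 < base < 1.0:
        raise ValueError("base must lie strictly between 0 and 1")
    return base ** distance
```

A distance of 0 gives a similarity of 1.0, and a distance of 100 gives about 0.366. Because 0.99 < 1, the similarity decreases monotonically as the distance grows.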
Compared with the prior art, the invention has the following beneficial effects:
1) it extends the edit distance, originally applicable only to characters, into a weighted real number cost edit distance applicable to real-valued track data; it does not require track sequences of equal length and applies to multi-dimensional real-valued track sequences;
2) it can dynamically adjust the influence of each dimension on the track similarity according to actual requirements, giving greater flexibility;
3) it converts the distance between two tracks into a similarity through an exponential function, making the result more intuitive and easier to understand;
4) the computational complexity of the track similarity in the invention does not increase as the dimensionality of the track data increases.
Drawings
FIG. 1 plots the number of similar track pairs at different similarity thresholds and under different weights according to the invention.
FIG. 2 is a two-dimensional plan view of track 2047a699 of flight 3U8882 and its similar tracks.
FIG. 3 is a two-dimensional plan view of track 1f94cc1c of flight CA404 and its similar tracks.
FIG. 4 is a two-dimensional plan view of track 1fc37b2d of flight HO1201 and its similar tracks.
Detailed Description
The invention is explained in more detail below with reference to the figures and the description of the embodiments.
The track similarity measurement method based on the weighted real number cost edit distance improves on the traditional edit distance by using the weighted Euclidean distance as the cost of each edit operation. This allows it to handle multi-dimensional real-valued track sequences and to adjust the influence of each dimension on the track similarity dynamically according to actual requirements, and it solves the problem of measuring similarity between multi-dimensional tracks of different lengths. The method outputs a similarity between 0 and 1: the closer to 1, the more similar the two tracks; the closer to 0, the less similar.
Taking 900 aircraft tracks of 23 flights as an example, the specific implementation is as follows:
Step 1: represent each aircraft track as an ordered multi-dimensional real-number sequence as follows:
$$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is a track point, n is the number of sampling points in the track sequence, lat denotes latitude, lon longitude, alt altitude, and t the time point.
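Step 2: obtain the weighted Euclidean distance between every two track points. Taking points $p_i(lat_i, lon_i, alt_i, t_i)$ and $p_j(lat_j, lon_j, alt_j, t_j)$ as an example, their weighted Euclidean distance $|p_i - p_j|_{weighted}$ is obtained as follows:
$$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$.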
Step 3: obtain the weighted real number cost edit distance between every two track sequences; for track sequences R and S it is obtained as follows:
1) obtain the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S as follows:
$$\mathrm{substitute\_cost}(r_m, s_n) = |r_m - s_n|_{weighted}$$
where m and n are the lengths of track sequences R and S, respectively; $\mathrm{rest}(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of R excluding the element currently being compared, and rest(S) is defined likewise; $\mathrm{insert\_cost}(s_j)$ is the cost of inserting $s_j$ into R; $\mathrm{delete\_cost}(r_m, s_n)$ is the cost of deleting $r_i$ from R; and $\mathrm{substitute\_cost}(r_m, s_n)$ is the cost of replacing $r_m$ in R with $s_n$.
2) Obtain the weighted real number cost WRERP(S, R) of converting track sequence S into track sequence R in the same way as in 1).
3) Take the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between R and S, namely:
$$\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}$$
Step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3 as the exponent, obtain the track similarity Sim(R, S) between track sequence R and track sequence S as follows:
$$\mathrm{Sim}(R, S) = 0.99^{\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}}$$
The similarity between every other pair of track sequences is obtained in the same way.
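Repeating steps 1 to 4 for every pair can be sketched in Python as a loop over unordered pairs. The names `pairwise_similarities` and `distance` are hypothetical; any callable returning the step-3 distance for two tracks can be passed in. For the 900 tracks in this example that is 900 × 899 / 2 = 404,550 pairs.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pairwise_similarities(tracks: Sequence[T],
                          distance: Callable[[T, T], float],
                          base: float = 0.99) -> Dict[Tuple[int, int], float]:
    """Similarity base ** distance for every unordered pair (i, j) with i < j."""
    return {(i, j): base ** distance(a, b)
            for (i, a), (j, b) in combinations(enumerate(tracks), 2)}
```

Since the step-3 distance takes the minimum over both directions, it is symmetric, so computing only i < j is enough; each pair still costs one dynamic program per direction.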
To compare the impact of weighted and unweighted euclidean distances on the trajectory similarity measure, the following five experiments were performed:
1) Experiment 1: $w_1 = 1, w_2 = 1, w_3 = 0, w_4 = 0$, i.e. unweighted Euclidean distance; obtain the two-dimensional flight track similarity;
2) Experiment 2: $w_1 = 1, w_2 = 1, w_3 = 1, w_4 = 0$, i.e. unweighted Euclidean distance; obtain the three-dimensional flight track similarity;
3) Experiment 3: $w_1 = 0.5, w_2 = 0.5, w_3 = 0, w_4 = 0$, i.e. weighted Euclidean distance; obtain the two-dimensional flight track similarity;
4) Experiment 4: $w_1 = 0.333, w_2 = 0.333, w_3 = 0.333, w_4 = 0$, i.e. weighted Euclidean distance; obtain the three-dimensional flight track similarity;
5) Experiment 5: because altitude varies over a far larger range in a track than latitude and longitude do, the altitude weight is reduced to 0.00001 so that the range of the squared altitude difference matches that of the squared latitude and longitude differences. That is, $w_1 = 0.49995, w_2 = 0.49995, w_3 = 0.00001, w_4 = 0$, i.e. weighted Euclidean distance; obtain the three-dimensional flight track similarity.
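To see why experiment 5 shrinks the altitude weight, the Python sketch below compares the per-dimension terms w_k·Δ_k² inside the weighted Euclidean distance under each experiment's weights. The differences Δ (20 degrees of latitude and longitude, 10,000 units of altitude) are hypothetical magnitudes chosen only to illustrate scale, not data from the experiments, and w4 is taken as 0 in all five settings (an assumption).

```python
from typing import Dict, Sequence, Tuple

# (w1, w2, w3, w4) for (lat, lon, alt, t) as listed for experiments 1-5; w4 = 0 assumed.
EXPERIMENT_WEIGHTS: Dict[int, Tuple[float, float, float, float]] = {
    1: (1.0, 1.0, 0.0, 0.0),
    2: (1.0, 1.0, 1.0, 0.0),
    3: (0.5, 0.5, 0.0, 0.0),
    4: (0.333, 0.333, 0.333, 0.0),
    5: (0.49995, 0.49995, 0.00001, 0.0),
}


def contributions(delta: Sequence[float], w: Sequence[float]) -> Tuple[float, ...]:
    """Per-dimension terms w_k * delta_k**2 under the square root of the step-2 distance."""
    return tuple(wk * dk ** 2 for wk, dk in zip(w, delta))


# HYPOTHETICAL magnitudes for illustration only: 20 deg lat/lon vs 10,000 altitude units.
delta = (20.0, 20.0, 10_000.0, 0.0)
for exp, w in EXPERIMENT_WEIGHTS.items():
    lat, lon, alt, _ = contributions(delta, w)
    print(f"experiment {exp}: lat={lat:.4g} lon={lon:.4g} alt={alt:.4g}")
```

With these magnitudes the altitude term exceeds the latitude term by more than five orders of magnitude under experiment 4, but only by a small factor under experiment 5, which is the balancing effect the experiment aims for.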
In the experiments, a threshold is set for the similarity: when the similarity of a track pair is greater than the threshold, the two tracks are judged similar; otherwise they are not. Table 1 gives the number of similar track pairs at different thresholds.
TABLE 1 Number of similar track pairs at different thresholds
FIG. 1 shows the number of similar track pairs at different similarity thresholds and under different weights. FIG. 1 shows that the weighted Euclidean distance finds more similar tracks and avoids the problem that, as the number of dimensions increases, the distance between tracks grows and the similarity shrinks. Moreover, when the dimensions of the track data differ greatly in scale, the difference can be balanced by adjusting the weights of the Euclidean distance, as experiment 5 shows.
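The dimensionality effect described above can be illustrated with a short Python sketch. With all weights equal to 1, a gap of 1 in every one of d dimensions gives a distance of √d, while weights of 1/d (summing to 1) keep it at 1, because the squared distance becomes a weighted mean of the squared per-dimension gaps. The helper names are illustrative.

```python
import math
from typing import Sequence, Tuple


def weighted_euclidean(p: Sequence[float], q: Sequence[float],
                       w: Sequence[float]) -> float:
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, p, q)))


def distance_vs_dimension(d: int, diff: float = 1.0) -> Tuple[float, float]:
    """Distance between two points that differ by `diff` in each of d dimensions."""
    p, q = [0.0] * d, [diff] * d
    unweighted = weighted_euclidean(p, q, [1.0] * d)      # grows like sqrt(d)
    normalized = weighted_euclidean(p, q, [1.0 / d] * d)  # weights sum to 1
    return unweighted, normalized
```

Going from 2 to 4 dimensions raises the unweighted distance from about 1.414 to 2.0, which lowers 0.99 raised to that distance, while the normalized distance stays at 1.0.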
FIG. 1 also shows that when the similarity threshold exceeds 0.5, the number of similar track pairs starts to drop sharply, so 0.5 is adopted provisionally as the track similarity threshold. In practice, the weights of experiment 3 are the most reasonable and common, so one track from each of three flights was randomly selected from the similar tracks found in experiment 3, and a two-dimensional plan view of each selected track and its similar tracks was drawn, as shown in FIGS. 2-4. FIGS. 2-4 show that these similar tracks are very close to one another, which illustrates the effectiveness of the invention for track similarity measurement.
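Because the similarity is 0.99 raised to the distance, a threshold on similarity is equivalent to a threshold on distance: 0.99^d > 0.5 exactly when d < ln 0.5 / ln 0.99 ≈ 68.97. The Python sketch below computes this cutoff and counts pairs judged similar under the "greater than the threshold" rule; the function names are illustrative.

```python
import math
from typing import Dict, Iterable, List


def distance_cutoff(threshold: float, base: float = 0.99) -> float:
    """base**d > threshold  <=>  d < ln(threshold) / ln(base), for 0 < base < 1."""
    return math.log(threshold) / math.log(base)


def count_similar_pairs(similarities: Iterable[float],
                        thresholds: Iterable[float]) -> Dict[float, int]:
    """Number of pairs judged similar (similarity strictly greater than each threshold)."""
    sims: List[float] = list(similarities)
    return {th: sum(1 for s in sims if s > th) for th in thresholds}
```

So with the 0.5 threshold, two tracks are judged similar when their weighted real number cost edit distance is below roughly 69.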
Claims (1)
1. A track similarity measurement method based on a weighted real number cost edit distance is characterized by comprising the following steps:
step 1: representing the trajectory data as an ordered multi-dimensional real-number sequence, namely:
$$Tr = \{p_1(lat_1, lon_1, alt_1, t_1), \ldots, p_i(lat_i, lon_i, alt_i, t_i), \ldots, p_n(lat_n, lon_n, alt_n, t_n)\}$$
where $p_1(lat_1, lon_1, alt_1, t_1)$ is the 1st track point, ..., $p_n(lat_n, lon_n, alt_n, t_n)$ is the nth track point, n is the number of sampling points in the track sequence, lat denotes latitude, lon longitude, alt altitude, and t the time point;
step 2: obtaining the weighted Euclidean distance between every two track points, i.e. the weighted Euclidean distance $|p_i - p_j|_{weighted}$ between track point $p_i(lat_i, lon_i, alt_i, t_i)$ and track point $p_j(lat_j, lon_j, alt_j, t_j)$:
$$|p_i - p_j|_{weighted} = \sqrt{\omega_1 (lat_i - lat_j)^2 + \omega_2 (lon_i - lon_j)^2 + \omega_3 (alt_i - alt_j)^2 + \omega_4 (t_i - t_j)^2}$$
where $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$;
step 3: obtaining the weighted real number cost edit distance between every two track sequences; for track sequences R and S it is obtained as follows:
1) obtaining the weighted real number cost WRERP(R, S) of converting track sequence R into track sequence S, specifically:
$$\mathrm{substitute\_cost}(r_m, s_n) = |r_m - s_n|_{weighted}$$
where m and n are the lengths of track sequences R and S, respectively; $\mathrm{rest}(R) = \{r_1, r_2, \ldots, r_{m-1}\}$ is the remainder of R excluding the element currently being compared, and rest(S) is defined likewise; $\mathrm{insert\_cost}(s_j)$ is the cost of inserting $s_j$ into R; $\mathrm{delete\_cost}(r_m, s_n)$ is the cost of deleting $r_i$ from R; and $\mathrm{substitute\_cost}(r_m, s_n)$ is the cost of replacing $r_m$ in R with $s_n$;
2) obtaining the weighted real number cost WRERP(S, R) of converting track sequence S into track sequence R in the same way as in 1);
3) taking the smaller of WRERP(R, S) and WRERP(S, R) as the weighted real number cost edit distance between R and S, namely:
$$\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}$$
step 4: using an exponential function with base 0.99 and the weighted real number cost edit distance from step 3 as the exponent, obtaining the track similarity Sim(R, S) between track sequence R and track sequence S, namely:
$$\mathrm{Sim}(R, S) = 0.99^{\min\{\mathrm{WRERP}(R, S),\ \mathrm{WRERP}(S, R)\}}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911272820.5A CN111125189B (en) | 2019-12-12 | 2019-12-12 | Track similarity measurement method based on weighted real number cost edit distance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911272820.5A CN111125189B (en) | 2019-12-12 | 2019-12-12 | Track similarity measurement method based on weighted real number cost edit distance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111125189A | 2020-05-08 |
CN111125189B | 2021-01-29 |
Family
ID=70499828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911272820.5A Active CN111125189B (en) | 2019-12-12 | 2019-12-12 | Track similarity measurement method based on weighted real number cost edit distance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111125189B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722541A (en) * | 2012-05-23 | 2012-10-10 | Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) | Method and system for calculating space-time locus similarity |
US20150039217A1 (en) * | 2013-07-31 | 2015-02-05 | International Business Machines Corporation | Computing a similarity measure over moving object trajectories |
CN106339716A (en) * | 2016-08-16 | 2017-01-18 | Zhejiang University of Technology (浙江工业大学) | Mobile trajectory similarity matching method based on weighted Euclidean distance |
US20180255431A1 (en) * | 2016-12-31 | 2018-09-06 | Google Llc | Determining position of a device in three-dimensional space and corresponding calibration techniques |
CN108536851A (en) * | 2018-04-16 | 2018-09-14 | Wuhan University (武汉大学) | A method for identifying ID based on motion track similarity-rough set |
CN109635059A (en) * | 2018-11-23 | 2019-04-16 | Wuhan Fiberhome Zhongzhi Digital Technology Co., Ltd. (武汉烽火众智数字技术有限责任公司) | Person-vehicle association analysis method and system based on track similarity mode |
Non-Patent Citations (2)
Title |
---|
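JIAYI GUO ET AL.: "Convolutional Trajectory Similarity Model: a faster method for trajectory similarity measurement", 2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE * |
YANG JIE (杨洁): "Research on Trajectory Matching Algorithm Based on Mobile Big Data" (基于移动大数据的轨迹匹配算法研究), China Master's Theses Full-text Database, Information Science and Technology * |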
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111930791A (en) * | 2020-05-28 | 2020-11-13 | Central South University (中南大学) | Similarity calculation method and system for vehicle track and storage medium |
CN111930791B (en) * | 2020-05-28 | 2022-07-15 | Central South University (中南大学) | Similarity calculation method and system for vehicle track and storage medium |
CN112733890A (en) * | 2020-12-28 | 2021-04-30 | Beihang University (北京航空航天大学) | Online vehicle track clustering method considering space-time characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN111125189B (en) | 2021-01-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111125189B (en) | Track similarity measurement method based on weighted real number cost edit distance | |
CN109191922B (en) | Large-scale four-dimensional track dynamic prediction method and device | |
CN113158445A (en) | Prediction algorithm for residual service life of aero-engine with convolution memory residual self-attention mechanism | |
CN110889444B (en) | Driving track feature classification method based on convolutional neural network | |
CN107392311B (en) | Method and device for segmenting sequence | |
CN112862171B (en) | Flight arrival time prediction method based on space-time neural network | |
Das et al. | Anomaly detection in flight recorder data: A dynamic data-driven approach | |
Schimpf et al. | Flight trajectory prediction based on hybrid-recurrent networks | |
CN110070131A (en) | A kind of Active Learning Method of data-oriented driving modeling | |
Jiang et al. | Research on method of trajectory prediction in aircraft flight based on aircraft performance and historical track data | |
EP3944154A1 (en) | Method for optimizing on-device neural network model by using sub-kernel searching module and device using the same | |
CN113360655A (en) | Track point classification and text generation method based on sequence annotation | |
CN111182445B (en) | Method and system for analyzing aggregated groups based on mobile phone signaling data | |
CN110944295B (en) | Position prediction method, position prediction device, storage medium and terminal | |
CN110955804B (en) | Adaboost method for user space-time data behavior detection | |
Etemad | Transportation modes classification using feature engineering | |
CN117077843A (en) | Space-time attention fine granularity PM2.5 concentration prediction method based on CBAM-CNN-converter | |
Shao et al. | Onlineairtrajclus: An online aircraft trajectory clustering for tarmac situation awareness | |
CN115759470A (en) | Flight overall process fuel consumption prediction method based on machine learning | |
CN116244356A (en) | Abnormal track detection method and device, electronic equipment and storage medium | |
Karataş et al. | Trajectory prediction for maritime vessels using ais data | |
CN116304966A (en) | Track association method based on multi-source data fusion | |
US20220277223A1 (en) | Computer-readable recording medium storing machine learning program, machine learning method, and estimation device | |
CN111079089B (en) | Base station data anomaly detection method based on interval division | |
Walkowiak et al. | Utilizing local outlier factor for open-set classification in high-dimensional data-case study applied for text documents |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |