CN106407378B - Method for re-representing road network track data - Google Patents

Info

Publication number
CN106407378B
Authority
CN
China
Prior art keywords
data
track
road
time
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610817878.3A
Other languages
Chinese (zh)
Other versions
CN106407378A (en
Inventor
孙未未
韩韵衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the technical field of trajectory data computation, and particularly relates to a method for re-representing road network trajectory data. Road network trajectory data obtained by raw GPS sampling is difficult to encode and compress, so the original sequence of three-dimensional real-valued points needs to be re-represented before compression. The method decomposes a map-matched road network trajectory into spatial data and temporal data, where the spatial data is a sequence of road network roads and the temporal data is a sequence of distance-time pairs; the data before and after decomposition can be converted into each other losslessly in linear time. In trajectory computation, the invention can reduce the cost of storing and querying trajectories in a database.

Description

Method for re-representing road network track data
Technical Field
The invention belongs to the technical field of trajectory data calculation, and particularly relates to a method for re-representing road network trajectory data.
Background
Trajectory data is a basic kind of spatiotemporal data, generally defined as a function of position with respect to time. The track points sampled by a vehicle-mounted positioning device are represented by $(x, y, t)$ triples, where $x$ and $y$ are the longitude and latitude, respectively, and $t$ is the timestamp of the sample point. The original road network trajectory can then be represented by a sequence of triples, i.e. $\langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where $n$ is the length of the trajectory and $(x_i, y_i)$ is the position of the vehicle at time $t_i$. With the popularization of vehicle-mounted positioning equipment, vehicles in cities generate massive numbers of road network trajectories. Road network trajectory data carries a large amount of information, and is often used as an important decision basis and information source in problems such as analyzing urban traffic conditions, mining people's behavior patterns, and predicting vehicle flow directions. The urban road network is represented by a directed graph $G = (V, E)$, where $V$ is the set of road intersections and $E$ is the set of road segments between intersections.
Data compression algorithms are classified into lossless and lossy compression. Lossless compression loses no information, i.e. the compressed data can be completely restored to the original data; in contrast, lossy compression achieves a higher compression rate by discarding the parts of the data that do not affect the accuracy requirement, at the cost of information loss in the compressed data. Lossless compression algorithms are divided into entropy coding and dictionary coding. Commonly used entropy coding methods include Huffman coding and arithmetic coding; commonly used dictionary coding methods include Lempel-Ziv coding and the algorithms derived from it. Lossy compression algorithms are specialized algorithms designed for specific kinds of data, such as JPEG compression for images and MPEG compression for audio; unlike lossless compression, these methods apply only to the data they were designed for. Dedicated lossy compression algorithms also exist for general trajectory data and road network trajectory data; these methods usually directly delete sample points from the original trajectory that do not affect data accuracy.
Because it is sampled from a mobile positioning device, raw road network trajectory data is typically represented as a triple sequence $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$. However, such a representation contains unnecessary redundancy, which is detrimental to data compression. To address this limitation of the original representation, a new trajectory format and a corresponding data decomposition method are provided that directly reduce data redundancy, so that the data is easier for a compression algorithm to process.
Disclosure of Invention
The compression rate is one of the key indicators for measuring the performance of a data compression algorithm, and is generally defined as the ratio of the size of the original data to the size of the compressed data. Given an original trajectory $T$ of size $|T|$ and its compressed trajectory $T_c$ of size $|T_c|$, the compression rate is
$$r = \frac{|T|}{|T_c|}.$$
For example, if the original data size is 2 KB and the compressed data size is 1 KB, the compression rate is 2.
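The definition above can be written as a one-line helper. This is only an illustrative sketch; the function name `compression_ratio` is a hypothetical choice and does not come from the patent.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return |T| / |T_c|, the ratio of original size to compressed size."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


# The example from the text: 2 KB of original data compressed to 1 KB.
ratio = compression_ratio(2 * 1024, 1 * 1024)
print(ratio)  # 2.0
```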
First consider applying a general lossless compression algorithm (source coding) to the original trajectory. Since the theoretical foundation of data compression is information theory, we analyze trajectory compression from the perspective of information entropy. It can be proved that, whether entropy coding or dictionary coding is used, the compression rate achieved on trajectory data becomes very low as the precision of the real-valued data increases.
Theorem: when the precision of real number data is improved, both entropy encoding and dictionary encoding tend to have a compression rate of 1 for trajectory data.
Proof: First we show that entropy coding is inefficient for high-precision real-valued data. Let $X$ be a continuous random variable with probability density function $p(x)$. To calculate the entropy of $X$, we first divide the sample space $\Omega = [a, b)$ of $X$ into $n$ equal parts, each cell having length $\Delta = (b - a)/n$. Let $[a, b)$ be divided into $\{[x_0, x_1), [x_1, x_2), \ldots, [x_{n-1}, x_n)\}$ with $x_0 = a$ and $x_n = b$. The probabilities of $X$ falling within each cell constitute a discrete distribution, whose probabilities can be calculated by integration:
$$p_i = \int_{x_{i-1}}^{x_i} p(x)\,dx, \quad i = 1, 2, \ldots, n.$$
According to the mean value theorem for integrals, there must exist
$$\xi_i \in [x_{i-1}, x_i]$$
such that:
$$p_i = p(\xi_i)\,\Delta.$$
Since $\sum_{i=1}^{n} p(\xi_i)\,\Delta = 1$, the discrete distribution has an entropy of
$$H_\Delta(X) = -\sum_{i=1}^{n} p_i \log p_i = -\sum_{i=1}^{n} p(\xi_i)\,\Delta \log p(\xi_i) - \log \Delta.$$
If the function $p(x) \log p(x)$ is Riemann integrable, then
$$\lim_{n \to \infty} \bigl( H_\Delta(X) + \log \Delta \bigr) = -\int_a^b p(x) \log p(x)\,dx = h(X),$$
where $h(X)$ is the differential entropy of the continuous distribution of $X$.
The number $n$ in the above equations is the precision of the data, because the finer the interval division, the more symbols are used to represent different data, i.e. the higher the precision of the data. If the data is not compressed,
$$\log n$$
bits can be used directly to store each symbol. According to Shannon's source coding theorem and the optimality of entropy coding algorithms, when $n$ goes to infinity:
$$r = \lim_{n \to \infty} \frac{\log n}{H_\Delta(X)} = \lim_{n \to \infty} \frac{\log n}{h(X) - \log \Delta} = \lim_{n \to \infty} \frac{\log n}{h(X) + \log n - \log (b - a)} = 1.$$
In summary, as the data precision increases, the compression rate of entropy coding approaches 1, i.e. the data cannot be compressed.
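The limit for entropy coding can be checked numerically. The following is a minimal sketch under stated assumptions: the density $p(x) = 2x$ on $[0, 1)$ is an arbitrary illustrative choice (not from the patent), and the function names are hypothetical. It quantizes the distribution into $n$ equal cells, computes the exact discrete entropy in bits, and compares it with the $\log_2 n$ bits needed to store a symbol uncompressed.

```python
import math


def discretized_entropy_bits(n: int) -> float:
    """Entropy (bits) of X with density p(x) = 2x on [0, 1), quantized into n equal cells."""
    entropy = 0.0
    for i in range(1, n + 1):
        # Exact cell probability: integral of 2x over [(i-1)/n, i/n).
        p_i = (i * i - (i - 1) * (i - 1)) / (n * n)
        entropy -= p_i * math.log2(p_i)
    return entropy


def entropy_coding_ratio(n: int) -> float:
    """Best-case ratio: log2(n) raw bits per symbol over the entropy lower bound."""
    return math.log2(n) / discretized_entropy_bits(n)


# Print the ratio for increasing precision (number of bits per symbol).
for bits in (4, 8, 12, 16):
    print(bits, round(entropy_coding_ratio(2 ** bits), 4))
```

For this density the ratio stays above 1 and shrinks toward 1 as the number of cells grows, which matches the statement that higher precision leaves an ideal entropy coder with less to remove.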
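Unlike entropy coding, analyzing the compression effect of dictionary coding requires the joint entropy of the source distribution, but the proof is similar. Given any $k$ characters, let $p(x_1, x_2, \ldots, x_k)$ be the joint probability density function of $X_1, X_2, \ldots, X_k$. The optimal average code length $L_k$ must satisfy:
$$H(X_1, X_2, \ldots, X_k) \le L_k < H(X_1, X_2, \ldots, X_k) + 1,$$
i.e. at least $H(X_1, X_2, \ldots, X_k)$ bits are needed to represent $k$ characters.
Similarly, we divide the sample space $\Omega = [a_1, b_1) \times [a_2, b_2) \times \cdots \times [a_k, b_k)$ into $n^k$ parts, each block of size
$$\Delta = \prod_{i=1}^{k} \frac{b_i - a_i}{n}.$$
The entropy of the discrete distribution is then calculated as:
$$H_\Delta(X_1, X_2, \ldots, X_k) = -\sum_{j=1}^{n^k} p(\xi_j)\,\Delta \log p(\xi_j) - \log \Delta,$$
where $\xi_j$ is a point in the $j$-th block given by the mean value theorem. Therefore, if there are $k$ items of data and their precision is $n$, then at least $H_\Delta(X_1, X_2, \ldots, X_k)$ bits are needed to encode the data. If $p(x_1, x_2, \ldots, x_k) \log p(x_1, x_2, \ldots, x_k)$ is Riemann integrable, then:
$$\lim_{n \to \infty} \bigl( H_\Delta(X_1, X_2, \ldots, X_k) + \log \Delta \bigr) = h(X_1, X_2, \ldots, X_k),$$
where $h(X_1, X_2, \ldots, X_k)$ is the joint differential entropy. Finally we can calculate the compression rate $r$:
$$r = \lim_{n \to \infty} \frac{k \log n}{H_\Delta(X_1, X_2, \ldots, X_k)} = \lim_{n \to \infty} \frac{k \log n}{h(X_1, X_2, \ldots, X_k) - \log \Delta}$$
$$= \lim_{n \to \infty} \frac{k \log n}{h(X_1, X_2, \ldots, X_k) + k \log n - \sum_{i=1}^{k} \log (b_i - a_i)} = 1.$$
This completes the proof.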
Although lossy compression algorithms can achieve very high compression rates, they do so at the expense of data accuracy. Most existing algorithms delete sample points directly from the original trajectory, which can result in a large deviation between the original trajectory and the compressed trajectory. If the accuracy requirement is high, i.e. the information loss caused by compression is tightly bounded, the compression rate of these lossy algorithms is also low. In the extreme case where zero information loss is required, lossy compression is equivalent to lossless compression, and these algorithms are as inefficient as general lossless compression algorithms.
Based on the above discussion, it can be seen that the key factor limiting the data compression rate is the representation of the trajectory data, not the compression algorithm used. In fact, both entropy coding and dictionary coding algorithms have been proven to be optimal compression algorithms, i.e. they both reach the entropy (or entropy rate) of the source. In information theory, information entropy measures the uncertainty of the source; the higher the uncertainty of the source, the more information is obtained from its output, and the more data is needed to encode it. Consider the existing triple representation of a trajectory, which is suitable for representing an arbitrary trajectory in the two-dimensional plane. However, the shape of a road network trajectory is strictly constrained by the roads, so its uncertainty is significantly smaller than that of an arbitrary two-dimensional trajectory. In other words, the original trajectory representation introduces unnecessary uncertainty (unnecessary information), which makes trajectory data represented by the original triples difficult to compress.
The invention removes unnecessary uncertainty in the data by reducing the dimensionality of the data. Suppose the three-dimensional trajectory data $(x_i, y_i, t_i)$ is instead represented in the two-dimensional form $(d_i, t_i)$; its baseline compression rate then already reaches 1.5, since three real numbers per sample point are replaced by two. Note that this conversion must be lossless, i.e. there must be a one-to-one correspondence between the data before and after conversion; otherwise it is equivalent to applying lossy compression directly to the data. Changing the representation of the data before compression directly improves the compression rate and makes the data easier for a subsequent compression algorithm to process.
In the original trajectory, a sample point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the target is at position $(x_i, y_i)$. Let $(x_1, y_1)$ be the starting sample point of the trajectory; the distance $d_i$ travelled from the starting position $(x_1, y_1)$ to the current position $(x_i, y_i)$ is then determined. Conversely, if only the distance travelled from the starting point is known, the corresponding position $(x_i, y_i)$ is difficult to determine. Therefore, to establish a one-to-one correspondence between the original trajectory and the decomposed trajectory, the road sequence $\langle e_1, e_2, \ldots, e_m \rangle$ must additionally be stored, where each $e_i$ is an edge in $E$ and $m$ is the number of roads traversed by the trajectory.
At this point, the trajectory has been decomposed into two parts, spatial data (the road sequence) and temporal data (the distance-time sequence), which together form the new trajectory data format. The road sequence of a trajectory $T$ is the series of consecutive roads traversed by $T$ in the road network $G = (V, E)$, i.e. $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$. Here $V$ is the set of graph vertices (i.e. road intersections) and $E$ is the set of edges connecting the graph vertices (i.e. road segments connecting intersections); with $V = \langle v_0, v_1, v_2, \ldots, v_m \rangle$ and $E = \langle e_1, e_2, \ldots, e_m \rangle$, $v_i$ is the end point of the directed edge $e_{i-1}$, or the start point of the edge $e_i$. The distance-time sequence of the trajectory $T$ is a series of pairs $(d_i, t_i)$, where $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$, i.e. $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$.
Given any road network trajectory $T$, both decomposing the original trajectory and restoring it from the decomposed form can be completed in $O(|T|)$ time. After decomposition, the trajectory data is converted into a road sequence and a distance-time sequence. Next, lossless compression is applied to the road sequence and lossy compression to the distance-time sequence. Lossless compression is used for the road sequence because it is an integer sequence with low information entropy, whereas the distance-time sequence is still real-valued data with high information entropy, so lossy compression is required.
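The linear-time restoration mentioned above can be sketched as a single forward walk over the road sequence. This is an illustrative sketch, not the patent's implementation: the function name `locate_on_roads`, the `edge_length` dictionary, and the output form (edge id, offset along the edge, timestamp) are assumptions. It also assumes the distances are non-decreasing and measured from the start of the first road in the sequence. Turning an offset into $(x, y)$ coordinates would additionally need the road geometry, which is omitted here.

```python
from typing import Dict, List, Tuple


def locate_on_roads(
    road_sequence: List[int],
    edge_length: Dict[int, float],
    distance_time: List[Tuple[float, float]],
) -> List[Tuple[int, float, float]]:
    """Map each (d_i, t_i) back to (edge id, offset along that edge, t_i)."""
    located = []
    k = 0        # index of the current edge in road_sequence
    start = 0.0  # distance from the trajectory start to the start of edge k
    for d, t in distance_time:
        # Advance while d lies at or beyond the end of the current edge,
        # so a point exactly on an intersection is assigned to the next edge.
        while (k + 1 < len(road_sequence)
               and d >= start + edge_length[road_sequence[k]]):
            start += edge_length[road_sequence[k]]
            k += 1
        located.append((road_sequence[k], d - start, t))
    return located
```

Because the edge index `k` only moves forward, the walk touches each road and each sample point once, giving a running time linear in the size of the decomposed trajectory.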
According to the analysis, the method for re-representing road network track data provided by the invention is to decompose the road network track data into two parts of spatial data and time data; wherein:
(1) the original GPS sampling trajectory format is $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where a sample point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the moving object is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) the trajectory is decomposed into two parts: spatial data and temporal data;
the spatial data is a road number sequence, used to represent the spatial shape of the trajectory;
the temporal data is a sequence of distance-time pairs, used to represent the change in speed along the trajectory;
the road number sequence is specifically represented as follows:
(1) the map-matched trajectory data no longer contains GPS sampling errors, i.e. the track points have been corrected so that the sample positions no longer deviate from the corresponding map roads;
(2) after map matching, each sample point lies on a map road, so the road number corresponding to each sample point can be obtained. The road number sequence $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$ corresponding to the original sample point sequence is the decomposed spatial data, where $e_i$ is an edge in $E$ and $m$ is the number of roads through which the trajectory passes;
(3) the spatial data can also be represented by a sequence of map vertices, i.e. $SP_T = \langle v_0, v_1, v_2, \ldots, v_m \rangle$, where $v_i$ is the end point of the directed edge $e_{i-1}$, or the start point of the edge $e_i$; the vertex representation and the edge representation of the road sequence are equivalent.
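The equivalence of the edge and vertex representations can be illustrated with a pair of conversion helpers. This is a sketch under assumptions that are not stated in the patent: the road network is given as a hypothetical dictionary mapping each edge id to its (start vertex, end vertex) pair, there is at most one directed edge between any ordered pair of vertices, and the sketch indexes the path so that the first edge starts at $v_0$.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (start vertex, end vertex) of a directed edge


def edges_to_vertices(road_sequence: List[int], edges: Dict[int, Edge]) -> List[int]:
    """Convert an edge sequence into the vertex sequence it passes through."""
    if not road_sequence:
        return []
    vertices = [edges[road_sequence[0]][0]]
    for e in road_sequence:
        start, end = edges[e]
        if start != vertices[-1]:
            raise ValueError(f"edge {e} does not continue the path")
        vertices.append(end)
    return vertices


def vertices_to_edges(vertices: List[int], edges: Dict[int, Edge]) -> List[int]:
    """Convert a vertex sequence back into the edge sequence."""
    # Assumes no parallel edges, so each (u, v) pair identifies one edge.
    by_endpoints = {endpoints: e for e, endpoints in edges.items()}
    return [by_endpoints[(u, v)] for u, v in zip(vertices, vertices[1:])]
```

If the network did contain parallel edges between the same two intersections, the vertex form alone would be ambiguous and the edge form would have to be stored.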
The sequence of distance-time pairs has the following form: each pair is $(d_i, t_i)$, where $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$; the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$ is the decomposed temporal data.
The trajectory decomposition method for converting an original trajectory into this format comprises the following steps:
(1) apply map matching to the input trajectory so that each sample point corresponds to a road;
(2) output the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$; for consecutive repeated entries, retain only one;
(3) calculate the distance travelled in the road network between every two adjacent sample points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the trajectory, denoted $l_i$, where $l_1 = 0$;
(4) for each sample point $(x_i, y_i, t_i)$, output
$$d_i = \sum_{j=1}^{i} l_j$$
as $d_i$ in the distance-time pair $(d_i, t_i)$; the timestamp is unchanged.
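Steps (2) to (4) can be sketched as follows, taking the output of map matching as given. The input format is an assumption made for illustration: each matched sample point is a hypothetical tuple (edge id, offset along that edge, timestamp), `edge_length` maps edge ids to road lengths, and every traversed road is assumed to contain at least one sample point (otherwise the map matcher would have to supply the intermediate roads).

```python
from typing import Dict, List, Tuple

MatchedPoint = Tuple[int, float, float]  # (edge id, offset along the edge, timestamp)


def decompose(
    matched: List[MatchedPoint],
    edge_length: Dict[int, float],
) -> Tuple[List[int], List[Tuple[float, float]]]:
    """Split a map-matched trajectory into a road sequence and a distance-time sequence."""
    road_sequence: List[int] = []
    distance_time: List[Tuple[float, float]] = []
    total = 0.0  # running sum d_i = l_1 + ... + l_i
    prev_edge, prev_offset = None, 0.0
    for edge, offset, t in matched:
        if prev_edge is None:
            step = 0.0  # l_1 = 0 for the first sample point
        elif edge == prev_edge:
            step = offset - prev_offset
        else:
            # Finish the previous road, then advance along the new one.
            step = (edge_length[prev_edge] - prev_offset) + offset
        if edge != prev_edge:
            road_sequence.append(edge)  # keep one entry per run of repeats
        total += step
        distance_time.append((total, t))
        prev_edge, prev_offset = edge, offset
    return road_sequence, distance_time
```

The loop visits each sample point once, so the decomposition runs in time linear in the trajectory length, and the timestamps pass through unchanged.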
In the track calculation, the method can reduce the track storage and query cost in the database.
Drawings
FIG. 1 is a sample road network, including 12 intersections and 17 roads.
FIG. 2 shows two sample traces on the road network.
Detailed Description
The data format and trajectory decomposition method are described below in conjunction with example road networks and trajectories.
As shown in FIG. 1, the given road network contains 12 vertices (intersections) and 17 edges (roads). Consider trajectory $T_1$ (the blue track); since all trajectories have been map matched, every sample point has been mapped onto a road. In FIG. 2, sample point $t_{11}$ corresponds to edge $e_{15}$; sample point $t_{12}$ corresponds to edge $e_{16}$; sample point $t_{13}$ corresponds to edge $e_{13}$; sample point $t_{14}$ corresponds to edge $e_6$; sample point $t_{15}$ corresponds to edge $e_3$. Note that if a sample point happens to fall on an intersection, the next edge rather than the previous edge is uniformly taken as the corresponding road sequence entry; for example, sample point $t_{13}$ corresponds to edge $e_{13}$ rather than $e_{16}$. Therefore, the road sequence of $T_1$ is $SP_1 = \langle e_{15}, e_{16}, e_{13}, e_6, e_3 \rangle$.
To calculate the corresponding distance-time sequence, the trajectory decomposition method is applied. From the computed road sequence and the road shapes, the road network distance between each pair of adjacent sample points can be calculated in turn. As in FIG. 1, the distance between $t_{11}$ and $t_{12}$ is $w(e_{15}) + \Delta_{11}$, where $w(e_{15})$ is the total length of road $e_{15}$ and $\Delta_{11}$ is the distance of $t_{12}$ from the start point of $e_{16}$. To calculate the distance between two adjacent points, the geographic shape of the road must be known. Roads in a typical road network are stored as polylines consisting of several two-dimensional coordinate points; connecting these points in order approximates the shape of the actual road. The distance between sample points on a road can then be computed along the road shape, using only the Euclidean distance (or the spherical distance when longitude and latitude coordinates are used) between two-dimensional coordinates. As shown in FIG. 1, the distance between $t_{11}$ and $t_{12}$ is $l_1 = w(e_{15}) + \Delta_{11}$; the distance between $t_{12}$ and $t_{13}$ is $l_2 = w(e_{16}) - \Delta_{11}$; the distance between $t_{13}$ and $t_{14}$ is $l_3 = w(e_{13}) + \Delta_{12}$; and the distance between $t_{14}$ and $t_{15}$ is $l_4 = w(e_6) - \Delta_{12} + \Delta_{13}$.
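The road length $w(e)$ of a polyline can be computed by summing the distances between consecutive coordinate points. The sketch below is illustrative only: the function names are hypothetical, the spherical distance uses the haversine formula, and the mean Earth radius of 6,371,000 m is an assumed constant rather than a value taken from the patent.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Planar distance between two projected (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def haversine_m(p: Point, q: Point, radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def polyline_length(points: List[Point], dist=euclidean) -> float:
    """Length w(e) of a road stored as a polyline of coordinate points."""
    return sum(dist(p, q) for p, q in zip(points, points[1:]))
```

Passing `dist=haversine_m` switches the same polyline walk to longitude and latitude input, which mirrors the choice between Euclidean and spherical distance described in the text.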
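Then, according to the trajectory decomposition method, each $d_i$ is the sum of the adjacent-point distances up to the $i$-th sample point:
$$d_i = \sum_{j=1}^{i} l_j.$$
In this example $l_1, \ldots, l_4$ denote the four distances between the five sample points, so the first sample point $t_{11}$ has distance 0 and the summation yields:
$$d_{12} = l_1 = w(e_{15}) + \Delta_{11},$$
$$d_{13} = l_1 + l_2 = w(e_{15}) + w(e_{16}),$$
$$d_{14} = l_1 + l_2 + l_3 = w(e_{15}) + w(e_{16}) + w(e_{13}) + \Delta_{12},$$
$$d_{15} = l_1 + l_2 + l_3 + l_4 = w(e_{15}) + w(e_{16}) + w(e_{13}) + w(e_6) + \Delta_{13}.$$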
In practice, to facilitate processing of the temporal data, the start point of the road on which the first sample point lies may be used as the starting point of the whole trajectory. For example, for $T_2$ in FIG. 2 (the red track), the distance of each sample point is measured from $v_5$ rather than from the first sample point $t_{21}$. This gives the spatial data
(equation image not recovered: spatial data of $T_2$)
and the temporal data
(equation image not recovered: distance-time sequence of $T_2$)

Claims (2)

1. A method for re-representing road network track data is characterized in that road network track data is decomposed into two parts of spatial data and time data; wherein:
(1) the original GPS sampling trajectory format is $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where $n$ is the length of the trajectory and a sample point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the moving object is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) the trajectory is decomposed into two parts: spatial data and temporal data;
the spatial data is a road number sequence, used to represent the spatial shape of the trajectory;
the temporal data is a sequence of distance-time pairs, used to represent the change in speed along the trajectory;
the road number sequence is specifically represented as follows:
(1) the map-matched trajectory data no longer contains GPS sampling errors, i.e. the track points have been corrected so that the sample positions no longer deviate from the corresponding map roads;
(2) after map matching, each sample point lies on a map road, and the road number corresponding to each sample point can be obtained; the road number sequence $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$ corresponding to the original sample point sequence is the decomposed spatial data, where $e_i$ is an edge in $E$, $m$ is the number of roads through which the trajectory passes, and $E$ is the set of edges connecting graph vertices, i.e. the road segments connecting intersections;
(3) the spatial data is represented by a sequence of map vertices, i.e. $SP_T = \langle v_0, v_1, v_2, \ldots, v_m \rangle$, where $v_i$ is the end point of the directed edge $e_{i-1}$, or the start point of the edge $e_i$;
the sequence of distance-time pairs has the following form: each pair is $(d_i, t_i)$, where $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$; the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$ is the decomposed temporal data.
2. The method for re-representing road network trajectory data according to claim 1, wherein said decomposing road network trajectory data into two parts of spatial data and temporal data comprises the following steps:
(1) applying map matching to the input trajectory so that each sample point corresponds to a road;
(2) outputting the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$, and for consecutive repeated entries retaining only one;
(3) calculating the distance travelled in the road network between every two adjacent sample points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the trajectory, denoted $l_i$, where $l_1 = 0$;
(4) for each sample point $(x_i, y_i, t_i)$, outputting
$$d_i = \sum_{j=1}^{i} l_j$$
as $d_i$ in the distance-time pair $(d_i, t_i)$, the timestamp being unchanged.
CN201610817878.3A 2016-09-11 2016-09-11 Method for re-representing road network track data Active CN106407378B (en)

Publications (2)

Publication Number Publication Date
CN106407378A CN106407378A (en) 2017-02-15
CN106407378B true CN106407378B (en) 2020-05-26


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant