CN106407378B - Method for re-representing road network track data - Google Patents
- Publication number
- CN106407378B (application number CN201610817878.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- track
- road
- time
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Remote Sensing (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Navigation (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention belongs to the technical field of track data calculation, and particularly relates to a method for re-representing road network track data. Road network track data obtained by raw GPS sampling is not easy to encode and compress, so the original three-dimensional real number sequence needs to be transformed before compression. The method decomposes a map-matched road network track into spatial data and time data, wherein the spatial data is a sequence of road network roads and the time data is a sequence of distance-time pairs; the data before and after decomposition can be converted into each other losslessly in linear time. In track calculation, the invention can reduce the cost of storing and querying tracks in a database.
Description
Technical Field
The invention belongs to the technical field of trajectory data calculation, and particularly relates to a method for re-representing road network trajectory data.
Background
Trajectory data is a basic kind of spatiotemporal data, generally defined as a function of position with respect to time. The track points sampled by a vehicle-mounted positioning device are represented by $(x, y, t)$ triples, where $x$ and $y$ are the longitude and latitude and $t$ is the timestamp of the sampling point. The original road network trajectory can then be represented by a sequence of triples, i.e. $\langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where $n$ is the length of the track and $(x_i, y_i)$ is the location of the vehicle at time $t_i$. With the popularization of vehicle-mounted positioning equipment, vehicles in cities generate massive numbers of road network tracks. Road network track data carries a large amount of information and is often used as an important decision basis and information source in problems such as analyzing urban traffic conditions, mining human behavior patterns, and predicting vehicle flow directions. The urban road network is represented by a directed graph $G = (V, E)$, where $V$ is the set of road intersections and $E$ is the set of road segments between intersections.
Data compression algorithms are classified into lossless compression and lossy compression. Lossless compression causes no information loss, i.e. the compressed data can be completely restored to the original data. In contrast, lossy compression achieves a higher compression rate by directly discarding the portions of the data that do not affect the accuracy requirement, so the compressed data has lost information. Lossless compression algorithms are divided into entropy coding and dictionary coding. Commonly used entropy coding methods include Huffman coding and arithmetic coding; commonly used dictionary coding methods include Lempel-Ziv coding and the algorithms derived from it. Unlike lossless compression, lossy compression algorithms are specialized algorithms designed for specific kinds of data, such as JPEG compression for images and MPEG compression for audio. Specialized lossy compression algorithms also exist for general track data and for road network track data; these methods usually delete, directly from the original track, the sampling points that do not affect data precision.
Because it is sampled from mobile positioning devices, raw road network trajectory data is typically represented as a triple sequence $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$. However, such a representation contains unnecessary redundancy, which is detrimental to data compression. To address this limitation of the original representation, a new track format and a corresponding data decomposition method are provided that directly reduce data redundancy, so that the data is easily processed by a compression algorithm.
Disclosure of Invention
The compression rate is one of the key indicators for measuring the performance of a data compression algorithm, and is generally defined as the ratio of the size of the original data to the size of the compressed data. Given an original track $T$ of size $|T|$ and a compressed track $T_c$ of size $|T_c|$, the compression rate is $r = |T| / |T_c|$. For example, if the original data size is 2 KB and the compressed data size is 1 KB, the compression rate is 2.
First consider applying a general lossless compression algorithm (source coding) to the original track. The theoretical foundation of data compression is information theory, so the problem of compressing track data can be analyzed from the perspective of information entropy. It can be proved that, when a classical lossless compression algorithm is applied directly to the original track data, the compression rate becomes very low as the precision of the real-valued data increases, regardless of whether entropy coding or dictionary coding is used.
Theorem: As the precision of the real-valued data increases, the compression rate of both entropy coding and dictionary coding on trajectory data tends to 1.
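The definition of the compression rate can be sketched in a few lines of Python. This is only an illustration: the function name `compression_ratio` and the sample payload are hypothetical, and `zlib` is used merely as a convenient general-purpose lossless compressor, not as the method of the invention.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return |T| / |T_c|: original size divided by compressed size."""
    if not compressed:
        raise ValueError("compressed data must be non-empty")
    return len(original) / len(compressed)


# The 2 KB -> 1 KB example from the text gives a rate of 2.
example = compression_ratio(bytes(2048), bytes(1024))

# Hypothetical, highly repetitive payload compressed with zlib,
# purely to show the measurement on real compressor output.
payload = b"e15,e16,e13,e6,e3;" * 200
measured = compression_ratio(payload, zlib.compress(payload))
```

A rate above 1 means the compressor saved space; a rate close to 1 means the data was essentially incompressible.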
Proof: First we show that entropy coding is inefficient for high-precision real-valued data. Let $X$ be a continuous random variable with probability density function $p(x)$. To calculate the entropy of $X$, we first divide the sample space $\Omega = [a, b)$ of $X$ into $n$ equal parts, each cell having length $\Delta = (b - a)/n$. Let $[a, b)$ be divided into $\{[x_0, x_1), [x_1, x_2), \ldots, [x_{n-1}, x_n)\}$ with $x_0 = a$ and $x_n = b$. The probabilities of $X$ falling within each interval constitute a discrete distribution, whose probability mass function can be calculated by integration:
$$p_i = \int_{x_{i-1}}^{x_i} p(x)\,dx = p(\xi_i)\,\Delta, \quad \xi_i \in [x_{i-1}, x_i), \quad i = 1, \ldots, n.$$
In other words, the discrete distribution has entropy
$$H_\Delta(X) = -\sum_{i=1}^{n} p_i \log p_i = -\sum_{i=1}^{n} p(\xi_i)\,\Delta \log p(\xi_i) - \log \Delta.$$
If the function $p(x) \log p(x)$ is Riemann integrable, then
$$\lim_{\Delta \to 0} \left( H_\Delta(X) + \log \Delta \right) = -\int_a^b p(x) \log p(x)\,dx = h(X),$$
where $h(X)$ is the differential entropy of the continuous distribution of $X$.
The quantity $n$ in the above equations is the precision of the data: the finer the interval division, the more symbols are used to represent different data values, i.e. the higher the precision of the data. If the data is not compressed, $\log n$ bits can be used directly to store each symbol. According to Shannon's source coding theorem and the optimality of entropy coding algorithms, when $n$ goes to infinity we have
$$r = \frac{\log n}{H_\Delta(X)} \approx \frac{\log n}{h(X) - \log \Delta} = \frac{\log n}{h(X) + \log n - \log (b - a)} \to 1.$$
In summary, as the data precision increases, the compression rate of entropy coding approaches 1, i.e. the data cannot be compressed.
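The limiting behaviour for entropy coding can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the patent: it picks a hypothetical density $p(x) = 2x$ on $[0, 1)$, quantizes it into $n$ equal cells, and compares the raw cost of $\log_2 n$ bits per symbol with the entropy lower bound that an ideal entropy coder could reach. The function names are invented for this example.

```python
import math


def quantized_entropy(pdf, a: float, b: float, n: int) -> float:
    """Entropy in bits of X quantized into n equal cells on [a, b).

    Cell probabilities are approximated with the midpoint rule and
    normalized so that they sum to 1.
    """
    delta = (b - a) / n
    probs = [pdf(a + (i + 0.5) * delta) * delta for i in range(n)]
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)


def triangular_pdf(x: float) -> float:
    """Hypothetical example density on [0, 1): p(x) = 2x."""
    return 2.0 * x


def ideal_ratio(n: int) -> float:
    """Raw bits per symbol (log2 n) divided by the entropy lower bound."""
    return math.log2(n) / quantized_entropy(triangular_pdf, 0.0, 1.0, n)


# Best achievable compression rate at increasing precision n.
ratios = {n: ideal_ratio(n) for n in (2**4, 2**8, 2**12, 2**16)}
```

For this density the ratio stays above 1 but shrinks toward 1 as `n` grows, which is the behaviour the theorem describes.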
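Unlike entropy coding, analyzing the compression effect of dictionary coding requires the joint entropy of the source distribution, but the proof is similar. Given any $k$ characters, let $p(x_1, x_2, \ldots, x_k)$ be the joint probability density function of $X_1, X_2, \ldots, X_k$. The optimal average code length $L_k$ necessarily satisfies
$$H(X_1, X_2, \ldots, X_k) \le L_k < H(X_1, X_2, \ldots, X_k) + 1,$$
i.e. we need at least $H(X_1, X_2, \ldots, X_k)$ bits to represent $k$ characters.
Similarly, we divide the sample space $\Omega = [a_1, b_1) \times [a_2, b_2) \times \cdots \times [a_k, b_k)$ into $n^k$ parts, each block having size $\Delta = \prod_{j=1}^{k} (b_j - a_j)/n$. The entropy of the discrete distribution is then
$$H_\Delta(X_1, X_2, \ldots, X_k) = -\sum_{i=1}^{n^k} p(\xi_i)\,\Delta \log p(\xi_i) - \log \Delta,$$
where $\xi_i$ is a point in the $i$-th block.
Therefore, if there are $k$ items of data and their precision is $n$, at least $H_\Delta(X_1, X_2, \ldots, X_k)$ bits are needed to encode the data. If $p(x_1, x_2, \ldots, x_k) \log p(x_1, x_2, \ldots, x_k)$ is Riemann integrable, then
$$\lim_{\Delta \to 0} \left( H_\Delta(X_1, X_2, \ldots, X_k) + \log \Delta \right) = h(X_1, X_2, \ldots, X_k),$$
where $h(X_1, X_2, \ldots, X_k)$ is the joint differential entropy. Finally we can calculate the compression rate $r$ as $n$ goes to infinity:
$$r = \frac{k \log n}{H_\Delta(X_1, X_2, \ldots, X_k)} \approx \frac{k \log n}{h(X_1, X_2, \ldots, X_k) + k \log n - \sum_{j=1}^{k} \log (b_j - a_j)} \to 1.$$
This completes the proof.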
Although lossy compression algorithms can achieve very high compression rates, they do so at the expense of data accuracy. Most existing algorithms delete sample points directly from the original trajectory, which results in a large deviation between the original trajectory and the compressed trajectory. If the accuracy requirements are high, i.e. the information loss caused by compression is tightly bounded, the compression rate of these lossy compression algorithms is also low. In the extreme case where zero information loss is required, lossy compression is equivalent to lossless compression, and these algorithms are as inefficient as the common lossless compression algorithms.
Based on the above discussion, the key factor limiting the data compression rate is the representation of the trajectory data, not the compression algorithm used. In fact, both entropy coding algorithms and dictionary coding algorithms have been proved to be optimal compression algorithms, i.e. they both reach the entropy (or entropy rate) of the source. In information theory, information entropy measures the uncertainty of a source: the higher the uncertainty of the source, the more information we obtain from its output, and the more data is needed to encode it. Consider the existing triple representation of a trajectory, which is suitable for representing an arbitrary trajectory in two dimensions. The shape of a road network trajectory, however, is severely constrained by the roads, and its uncertainty is significantly smaller than that of an arbitrary two-dimensional trajectory. In other words, the original trajectory representation introduces unnecessary uncertainty (unnecessary information), which makes trajectory data represented by the original triples difficult to compress.
The invention removes unnecessary uncertainty in the data by reducing the dimensionality of the data. Suppose that three-dimensional trajectory data $(x_i, y_i, t_i)$ is instead represented in the two-dimensional form $(d_i, t_i)$; its base compression rate then already reaches 1.5. Note that this conversion must be lossless, i.e. there must be a one-to-one correspondence between the data before and after conversion; otherwise it is equivalent to applying lossy compression directly to the data. Transforming the representation of the data in advance, before data compression, directly improves the compression rate and makes the data easy to process by a subsequent compression algorithm.
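The figure of 1.5 can be reproduced with a short calculation. As an assumption that the source does not state explicitly, suppose every real number is stored with the same fixed width of $s$ bits and the storage cost of the accompanying road sequence is ignored. A track of $n$ points then needs $3ns$ bits as triples and $2ns$ bits as pairs:

```latex
r_{\text{base}} = \frac{3\,n\,s}{2\,n\,s} = \frac{3}{2} = 1.5
```

Any further gain has to come from the compression algorithm applied afterwards.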
In the original track, the sample point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the target is at position $(x_i, y_i)$. Let $(x_1, y_1)$ be the starting sample point of the track; the distance $d_i$ travelled from the starting position $(x_1, y_1)$ to the current position $(x_i, y_i)$ is then determined. Conversely, if only the distance travelled from the starting point is known, the corresponding position $(x_i, y_i)$ is difficult to determine. Therefore, in order to establish a one-to-one correspondence between the original track and the decomposed track, the road sequence $\langle e_1, e_2, \ldots, e_m \rangle$ needs to be stored additionally, where $e_i$ is an edge in $E$ and $m$ is the number of roads traversed by the track.
Up to this point, the trajectory has been decomposed into two parts, spatial data (the road sequence) and temporal data (the distance-time sequence), which together form the new format of trajectory data. The road sequence of the trajectory $T$ is the series of successive roads traversed by $T$ in the road network $G = (V, E)$, i.e. $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$, where $V$ is the set of graph vertices (road intersections) and $E$ is the set of edges connecting the graph vertices (road segments connecting intersections). For the trajectory, the vertices are $\langle v_0, v_1, v_2, \ldots, v_m \rangle$ and the edges are $\langle e_1, e_2, \ldots, e_m \rangle$, where $v_i$ is the end point of directed edge $e_{i-1}$, or the start point of edge $e_i$. The distance-time sequence of the trajectory $T$ is a series of pairs $(d_i, t_i)$, where $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$, i.e. $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$.
Given any road network track $T$, decomposing the original track or restoring it from the decomposed track can be completed in $O(|T|)$ time. After data decomposition, the trajectory data has been converted into a road sequence and a distance-time sequence. Next, lossless compression is applied to the road sequence and lossy compression to the distance-time sequence. Lossless compression is used for the road sequence because it is an integer sequence with low information entropy; the distance-time sequence is still real-valued data with high information entropy, so lossy compression is required.
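The equivalence between the edge form and the vertex form of a road sequence can be illustrated with a small Python sketch. Everything here is hypothetical: the toy network, the identifiers, and the function names are made up for the example, and the sketch assumes that at most one directed edge joins any ordered pair of vertices, so that a vertex pair identifies its edge uniquely.

```python
from typing import Dict, List, Tuple

Edge = str
Vertex = str


def edges_to_vertices(
    edges: List[Edge], endpoints: Dict[Edge, Tuple[Vertex, Vertex]]
) -> List[Vertex]:
    """Turn a connected edge sequence into the vertex sequence it visits."""
    if not edges:
        return []
    vertices = [endpoints[edges[0]][0]]
    for edge in edges:
        start, end = endpoints[edge]
        if start != vertices[-1]:
            raise ValueError(f"edge {edge} does not continue the path")
        vertices.append(end)
    return vertices


def vertices_to_edges(
    vertices: List[Vertex], endpoints: Dict[Edge, Tuple[Vertex, Vertex]]
) -> List[Edge]:
    """Recover the edge sequence from consecutive vertex pairs."""
    edge_of = {pair: edge for edge, pair in endpoints.items()}
    return [edge_of[(u, v)] for u, v in zip(vertices, vertices[1:])]


# Hypothetical toy network: edge id -> (start vertex, end vertex).
toy_endpoints = {"e1": ("v0", "v1"), "e2": ("v1", "v2"), "e3": ("v2", "v3")}
as_vertices = edges_to_vertices(["e1", "e2", "e3"], toy_endpoints)
as_edges = vertices_to_edges(as_vertices, toy_endpoints)
```

A path of m edges maps to m + 1 vertices and back, so either form can be stored.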
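One way to see why restoring a track from the decomposed form takes linear time is the single-pass sketch below. It is a sketch under assumptions rather than the patented procedure: distances are assumed to be measured from the start of the first road and to be non-decreasing, road lengths are assumed to be known, and the function name and sample numbers are hypothetical. Converting an offset into an $(x, y)$ coordinate would additionally need the road geometry, which is omitted here.

```python
from typing import Dict, List, Sequence, Tuple


def locate_on_roads(
    road_sequence: Sequence[str],
    lengths: Dict[str, float],
    distance_time: Sequence[Tuple[float, float]],
) -> List[Tuple[str, float, float]]:
    """Map each (d_i, t_i) to (road id, offset along that road, t_i)."""
    located = []
    k = 0          # index of the current road in road_sequence
    covered = 0.0  # total length of the roads before road_sequence[k]
    for d, t in distance_time:
        # A point exactly on an intersection is assigned to the next road.
        while (k + 1 < len(road_sequence)
               and d >= covered + lengths[road_sequence[k]]):
            covered += lengths[road_sequence[k]]
            k += 1
        located.append((road_sequence[k], d - covered, t))
    return located


# Hypothetical road lengths and distance-time pairs.
roads = ["e15", "e16", "e13"]
road_lengths = {"e15": 100.0, "e16": 80.0, "e13": 120.0}
pairs = [(0.0, 0.0), (130.0, 10.0), (180.0, 20.0), (250.0, 30.0)]
positions = locate_on_roads(roads, road_lengths, pairs)
```

Both indices only move forward, so the pass costs O(m + n) for m roads and n sample points.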
According to the analysis, the method for re-representing road network track data provided by the invention is to decompose the road network track data into two parts of spatial data and time data; wherein:
(1) the original GPS sampling track format is $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where the sampling point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the moving object is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) the trajectory is decomposed into two parts: spatial data and time data;
the spatial data is a road number sequence and is used for representing the spatial shape of the track;
the time data is a sequence of distance-time pairs and is used for representing the change of the track speed;
the road number sequence takes the following form:
(1) the track data after map matching no longer contains GPS sampling errors, i.e. the track points have been corrected and the positions of the sampling points no longer deviate from the corresponding map roads;
(2) after map matching, each sampling point lies on a map road, so the road number corresponding to each sampling point can be obtained. The road number sequence $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$ corresponding to the original sampling point sequence is the spatial data after decomposition, where $e_i$ is an edge in $E$ and $m$ is the number of roads through which the track passes;
(3) the spatial data can also be represented by a sequence of map vertices, i.e. $SP_T = \langle v_0, v_1, v_2, \ldots, v_m \rangle$, where $v_i$ is the end point of directed edge $e_{i-1}$, or the start point of edge $e_i$; the vertex representation and the edge representation of the road sequence are equivalent.
The sequence of distance-time pairs takes the following form: in each pair $(d_i, t_i)$, $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$, and the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$ is the time data after decomposition.
The track decomposition method for decomposing an original track into this format comprises the following specific steps:
(1) map match the input track so that each sampling point corresponds to a road;
(2) output the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$; for consecutive repeated entries, retain only one;
(3) calculate the distance travelled in the road network between every two adjacent sampling points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the track, denoted $l_i$, where $l_1 = 0$;
(4) for each sample point $(x_i, y_i, t_i)$, output $d_i = \sum_{j=1}^{i} l_j$ as the $d_i$ of the distance-time pair $(d_i, t_i)$; the timestamp is unchanged.
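Steps (2) to (4) above can be sketched in Python as follows. This is a minimal illustration, assuming that map matching (step 1) and the network distances between adjacent points (step 3) are already available as inputs; the function name `decompose` and the sample values are hypothetical. It also assumes the matched road list already contains every road the track passes through, which may not hold for sparsely sampled tracks.

```python
from itertools import accumulate, groupby
from typing import List, Sequence, Tuple


def decompose(
    matched_roads: Sequence[str],
    timestamps: Sequence[float],
    step_distances: Sequence[float],
) -> Tuple[List[str], List[Tuple[float, float]]]:
    """Split a map-matched track into a road sequence and (d_i, t_i) pairs.

    matched_roads[i]  : road id of sample i (output of map matching)
    step_distances[i] : network distance from sample i-1 to sample i,
                        with step_distances[0] == 0
    """
    if not (len(matched_roads) == len(timestamps) == len(step_distances)):
        raise ValueError("inputs must have one entry per sample point")
    # Step (2): keep one entry for each run of consecutive repeated roads.
    road_sequence = [road for road, _ in groupby(matched_roads)]
    # Step (4): d_i is the running sum of the step distances.
    cumulative = list(accumulate(step_distances))
    return road_sequence, list(zip(cumulative, timestamps))


# Hypothetical map-matched input with four sample points.
sp, ts = decompose(
    ["e15", "e16", "e16", "e13"],
    [0.0, 10.0, 20.0, 30.0],
    [0.0, 130.0, 20.0, 60.0],
)
```

Both `groupby` and `accumulate` make a single pass over their input, so the decomposition is linear in the track length.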
In the track calculation, the method can reduce the track storage and query cost in the database.
Drawings
FIG. 1 is a sample road network, including 12 intersections and 17 roads.
FIG. 2 shows two sample traces on the road network.
Detailed Description
The data format and trajectory decomposition method are described below in conjunction with example road networks and trajectories.
As shown in FIG. 1, the given road network contains 12 vertices (intersections) and 17 edges (roads). Consider track $T_1$ (the blue track). Since the tracks have all been map matched, all sample points have been mapped onto roads. In FIG. 2, sampling point $t_{11}$ corresponds to edge $e_{15}$; sampling point $t_{12}$ corresponds to edge $e_{16}$; sampling point $t_{13}$ corresponds to edge $e_{13}$; sampling point $t_{14}$ corresponds to edge $e_6$; and sampling point $t_{15}$ corresponds to edge $e_3$. Note that if a sampling point happens to fall on an intersection, the next edge rather than the previous edge should uniformly be taken as the corresponding road sequence item; for example, sampling point $t_{13}$ corresponds to edge $e_{13}$ rather than $e_{16}$. Therefore, the road sequence of $T_1$ is $SP_1 = \langle e_{15}, e_{16}, e_{13}, e_6, e_3 \rangle$.
In order to calculate the corresponding distance-time sequence, the trajectory decomposition method needs to be applied. From the calculated road sequence and the road shapes, the road network distance between each pair of adjacent sampling points can be calculated in turn. As in FIG. 1, the distance between $t_{11}$ and $t_{12}$ is $w(e_{15}) + \Delta_{11}$, where $w(e_{15})$ is the total length of road $e_{15}$ and $\Delta_{11}$ is the distance from $t_{12}$ to the start point of $e_{16}$. To calculate the distance between two adjacent points, the geographical shape of the road needs to be known. Roads in a road network are generally stored as polylines made up of several two-dimensional coordinate points, and the shape of the actual road can be approximated by linking these points in order. The distance between sampling points on a road can be calculated according to the shape of the road; only the Euclidean distance, or the spherical distance when longitude and latitude coordinates are used, needs to be calculated from the two-dimensional coordinates. As shown in FIG. 1, the distance between $t_{11}$ and $t_{12}$ is $l_1 = w(e_{15}) + \Delta_{11}$; the distance between $t_{12}$ and $t_{13}$ is $l_2 = w(e_{16}) - \Delta_{11}$; the distance between $t_{13}$ and $t_{14}$ is $l_3 = w(e_{13}) + \Delta_{12}$; and the distance between $t_{14}$ and $t_{15}$ is $l_4 = w(e_6) - \Delta_{12} + \Delta_{13}$.
In practice, to facilitate processing of the time data, the start point of the road on which the first sampling point lies may be used as the starting point of the whole track. For example, for $T_2$ in FIG. 2 (the red track), we calculate the distance of each sample point from $v_5$ rather than from the sample point $t_{21}$. The spatial data and the time data of $T_2$ are obtained in this way.
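The two distance measures mentioned above, Euclidean distance along a polyline and spherical distance for longitude and latitude coordinates, can be sketched as follows. The function names and sample coordinates are hypothetical, and the spherical formula shown is the haversine formula with a mean Earth radius of 6,371 km, which is one common choice and not something the source prescribes.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polyline_length(points: Sequence[Point]) -> float:
    """Euclidean length of a road stored as a polyline of planar points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def haversine_m(p: Point, q: Point, radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Hypothetical planar road made of two straight segments (lengths 5 and 6).
road_length = polyline_length([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])

# One degree of latitude along a meridian, roughly 111.2 km.
one_degree = haversine_m((0.0, 0.0), (0.0, 1.0))
```

A step distance such as $w(e_{15}) + \Delta_{11}$ is then the full polyline length of one road plus the partial polyline length up to the sample point on the next road.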
Claims (2)
1. A method for re-representing road network track data, characterized in that road network track data is decomposed into two parts, spatial data and time data; wherein:
(1) the original GPS sampling track format is set as $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \ldots, (x_n, y_n, t_n) \rangle$, where $n$ is the length of the track and the sample point $(x_i, y_i, t_i)$ indicates that at time $t_i$ the moving object is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) the trajectory is decomposed into two parts: spatial data and time data;
the spatial data is a road number sequence and is used for representing the spatial shape of the track;
the time data is a sequence of distance-time pairs and is used for representing the change of the track speed;
the road number sequence takes the following form:
(1) the track data after map matching no longer contains GPS sampling errors, i.e. the track points have been corrected and the positions of the sampling points no longer deviate from the corresponding map roads;
(2) after map matching, each sampling point lies on a map road, and the road number corresponding to each sampling point can be obtained; the road number sequence $SP_T = \langle e_1, e_2, \ldots, e_m \rangle$ corresponding to the original sampling point sequence is the spatial data after decomposition, where $e_i$ is an edge in $E$; $m$ is the number of roads through which the track passes; $E$ is the set of edges connecting the graph vertices, i.e. the road segments connecting intersections;
(3) the spatial data is represented by a sequence of map vertices, i.e. $SP_T = \langle v_0, v_1, v_2, \ldots, v_m \rangle$, where $v_i$ is the end point of directed edge $e_{i-1}$, or the start point of edge $e_i$;
the sequence of distance-time pairs takes the following form: in each pair $(d_i, t_i)$, $d_i$ is the total distance the target has travelled from the starting point up to time $t_i$, and the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \ldots, (d_n, t_n) \rangle$ is the time data after decomposition.
2. The method for re-representing road network trajectory data according to claim 1, wherein decomposing the road network trajectory data into the two parts, spatial data and time data, comprises the following steps:
(1) map matching the input track so that each sampling point corresponds to a road;
(2) outputting the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$, and for consecutive repeated entries retaining only one;
(3) calculating the distance travelled in the road network between every two adjacent sampling points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the track, denoted $l_i$, where $l_1 = 0$;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610817878.3A CN106407378B (en) | 2016-09-11 | 2016-09-11 | Method for re-representing road network track data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106407378A CN106407378A (en) | 2017-02-15 |
CN106407378B true CN106407378B (en) | 2020-05-26 |
Family
ID=57999852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610817878.3A Active CN106407378B (en) | 2016-09-11 | 2016-09-11 | Method for re-representing road network track data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106407378B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463335A (en) * | 2017-08-02 | 2017-12-12 | 上海数烨数据科技有限公司 | A kind of location track big data high-efficiency storage method |
CN108022006B (en) * | 2017-11-24 | 2020-07-24 | 浙江大学 | Data-driven accessibility probability and region generation method |
CN108259463B (en) * | 2017-12-05 | 2020-08-14 | 北京掌行通信息技术有限公司 | Fusion compression method and system for positioning track and driving path |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103162702A (en) * | 2013-03-05 | 2013-06-19 | 中山大学 | Vehicle running track reconstruction method based on multiple probability matching under sparse sampling |
US8744840B1 (en) * | 2013-10-11 | 2014-06-03 | Realfusion LLC | Method and system for n-dimentional, language agnostic, entity, meaning, place, time, and words mapping |
CN104318766A (en) * | 2014-10-22 | 2015-01-28 | 北京建筑大学 | Bus GPS track data road network matching method |
CN104330089A (en) * | 2014-11-17 | 2015-02-04 | 东北大学 | Map matching method by use of historical GPS data |
-
2016
- 2016-09-11 CN CN201610817878.3A patent/CN106407378B/en active Active
Non-Patent Citations (1)
Title |
---|
基于浮动车GPS轨迹点数据的地图匹配算法研究;孙静怡等;《科技创新与应用》;20141231;第8-9页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||