CN106407378A - Method for expressing road network trajectory data again - Google Patents

Info

Publication number
CN106407378A
Authority
CN
China
Prior art keywords
data
road
track
time
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610817878.3A
Other languages
Chinese (zh)
Other versions
CN106407378B (en)
Inventor
孙未未
韩韵衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201610817878.3A
Publication of CN106407378A
Application granted
Publication of CN106407378B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Navigation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the technical field of trajectory data computing and in particular relates to a method for re-representing road network trajectory data. Road network trajectory data obtained by raw GPS (Global Positioning System) sampling is hard to compress by coding, so the original sequence of three-dimensional real values needs to be transformed before compression. The method decomposes a map-matched road network trajectory into spatial data and time data, where the spatial data is a sequence of road network roads and the time data is a sequence of distance-time pairs; the data before and after decomposition can be converted into each other losslessly in linear time. In trajectory computation, the method reduces the cost of storing and querying trajectories in a database.

Description

A method for re-representing road network trajectory data
Technical field
The invention belongs to the technical field of trajectory data computing and in particular relates to a method for re-representing road network trajectory data.
Background
Trajectory data is a basic kind of spatio-temporal data, usually defined as a function of position with respect to time. A trajectory point sampled by an in-vehicle positioning device is represented by a triple $(x, y, t)$, where $x$ and $y$ are the longitude and latitude and $t$ is the timestamp of the sample. An original road network trajectory can then be represented by a sequence of triples, $\langle (x_1, y_1, t_1), (x_2, y_2, t_2), \dots, (x_n, y_n, t_n) \rangle$, where $n$ is the length of the trajectory and $(x_i, y_i)$ is the position of the vehicle at time $t_i$. With the spread of in-vehicle positioning devices, vehicles in cities generate massive numbers of road network trajectories. These trajectories carry a great deal of information and often serve as an important basis for decisions and a source of information in problems such as analyzing urban traffic, mining human behavior patterns, and predicting the direction of vehicle movement. An urban road network is represented by a directed graph $G = (V, E)$, where $V$ is the set of road intersections and $E$ is the set of road segments between intersections.
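The two data models introduced above can be sketched in a few lines of Python. This is only an illustration: the class names `Sample` and `Edge`, the field names, and the toy coordinates are assumptions made for the sketch and do not appear in the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    """One raw trajectory point (x, y, t)."""
    x: float  # longitude
    y: float  # latitude
    t: float  # timestamp of the sample

@dataclass(frozen=True)
class Edge:
    """One directed road segment between two intersections."""
    eid: int    # road number
    start: int  # id of the intersection where the road begins
    end: int    # id of the intersection where the road ends

# Road network G = (V, E): V is a set of intersection ids, E maps road number to edge.
V = {0, 1, 2}
E = {1: Edge(1, 0, 1), 2: Edge(2, 1, 2)}

# Original trajectory: a sequence of n triples, made-up values for illustration.
trajectory = [
    Sample(121.50, 31.30, 0.0),
    Sample(121.51, 31.30, 10.0),
    Sample(121.52, 31.31, 20.0),
]
n = len(trajectory)  # the length of the trajectory
```

Each raw sample stores three real values, which is the redundancy the rest of the description sets out to reduce.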
Data compression algorithms fall into two classes: lossless and lossy. Lossless compression loses no information: the compressed data can be restored exactly to the original. Lossy compression, by contrast, reaches a higher compression ratio by discarding the parts of the data that do not affect the required precision, at the cost of information loss after compression. Lossless algorithms divide further into entropy coding and dictionary coding. Common entropy codes are Huffman coding and arithmetic coding; common dictionary codes are Lempel-Ziv coding and the algorithms derived from it. Lossy algorithms are special-purpose algorithms designed for particular kinds of data, such as JPEG compression for images and MPEG compression for audio; unlike lossless compression, they apply only to specific data (such as images and audio). Specific lossy algorithms also exist for general trajectory data and for road network trajectory data; they typically delete from the original trajectory the sample points that do not affect data precision.
Because it is sampled from a mobile positioning device, original road network trajectory data is usually represented as a sequence of triples $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \dots, (x_n, y_n, t_n) \rangle$. This representation contains unnecessary redundancy, which hinders compression. In view of this limitation of the original representation, the present invention proposes a new trajectory format and a corresponding data decomposition method that reduce the redundancy directly and make the data easier for compression algorithms to process.
Summary of the invention
Compression ratio is one of the key measures of a compression algorithm's performance, usually defined as the ratio of the original data size to the compressed data size. Given an original trajectory $T$ of size $|T|$ and its compressed form $T_c$ of size $|T_c|$, the compression ratio is $r = |T| / |T_c|$. For example, if the original data is 2 KB and the compressed data is 1 KB, the compression ratio is 2.
First consider applying a general lossless compression algorithm (source coding) to the original trajectory. The theoretical foundation of data compression is information theory, so the trajectory compression problem can be analyzed in terms of information entropy. It can be shown that, for both entropy coding and dictionary coding, the compression ratio on trajectory data becomes very low as the precision of the real-valued data increases.
Theorem: As the precision of real-valued data increases, the compression ratios of both entropy coding and dictionary coding on trajectory data tend to 1.
Proof: We first show that entropy coding is inefficient for high-precision real-valued data. Let $X$ follow a continuous distribution with probability density function $p(x)$. To compute the entropy of $X$, first divide its sample space $\Omega = [a, b)$ into $n$ parts, each subinterval having length $\Delta = (b - a)/n$. If $[a, b)$ is divided into $\{[a = x_0, x_1), [x_1, x_2), \dots, [x_{n-1}, x_n = b)\}$, the probabilities that $x$ falls in each subinterval form a discrete distribution, whose probability mass function can be computed by integration:
$$p_i = \int_{x_{i-1}}^{x_i} p(x)\,dx, \quad i = 1, \dots, n.$$
By the mean value theorem for integrals, there must exist $\xi_i \in [x_{i-1}, x_i]$ such that:
$$p_i = p(\xi_i)\,\Delta.$$
In other words, since $\sum_{i=1}^{n} p(\xi_i)\Delta = 1$, the entropy of this discrete distribution is
$$H_\Delta(X) = -\sum_{i=1}^{n} p(\xi_i)\Delta \log\big(p(\xi_i)\Delta\big) = -\sum_{i=1}^{n} \Delta\, p(\xi_i) \log p(\xi_i) - \log \Delta.$$
If the function $p(x) \log p(x)$ is Riemann integrable, then
$$\lim_{n \to \infty} \big(H_\Delta(X) + \log \Delta\big) = -\int_a^b p(x) \log p(x)\,dx = h(X),$$
where $h(X)$ is the differential entropy of the continuous distribution $X$.
The $n$ in the equations above is the precision of the data: the finer the partition, the more symbols are needed to represent distinct data values, that is, the higher the precision. Without compression, each symbol can be stored directly with $\log n$ bits. By Shannon's source coding theorem and the optimality of entropy coding, as $n$ tends to infinity the compression ratio $r$ satisfies:
$$r = \lim_{n \to \infty} \frac{\log n}{H_\Delta(X)} = \lim_{n \to \infty} \frac{\log n}{h(X) + \log n - \log(b - a) + o(1)} = 1.$$
In summary, as data precision increases, the compression ratio of entropy coding approaches 1, that is, the data cannot be compressed.
Unlike entropy coding, analyzing the compression achieved by dictionary coding requires the joint entropy of the source distribution, but the proof is similar. Given any $k$ characters, let $p(x_1, x_2, \dots, x_k)$ be the joint probability density function of $X_1, X_2, \dots, X_k$. The optimal average code length $L_k$ must then satisfy:
$$L_k \ge H(X_1, X_2, \dots, X_k),$$
that is, at least $H(X_1, X_2, \dots, X_k)$ bits are needed to represent the $k$ characters.
Similarly, divide the sample space $\Omega = [a_1, b_1) \times [a_2, b_2) \times \dots \times [a_k, b_k)$ into $n^k$ parts, each block having size $\Delta = \prod_{i=1}^{k} (b_i - a_i) / n^k$. With $\xi_j$ a point in the $j$-th block, the entropy of the resulting discrete distribution is:
$$H_\Delta(X_1, X_2, \dots, X_k) = -\sum_{j=1}^{n^k} \Delta\, p(\xi_j) \log p(\xi_j) - \log \Delta.$$
So if there are $k$ data items, each of precision $n$, at least $H_\Delta(X_1, X_2, \dots, X_k)$ bits are needed to encode them. If $p(x_1, x_2, \dots, x_k) \log p(x_1, x_2, \dots, x_k)$ is Riemann integrable, then also:
$$\lim_{n \to \infty} \big(H_\Delta(X_1, X_2, \dots, X_k) + \log \Delta\big) = h(X_1, X_2, \dots, X_k),$$
where $h(X_1, X_2, \dots, X_k)$ is the joint differential entropy. Finally the compression ratio $r$ can be computed:
$$r = \lim_{n \to \infty} \frac{k \log n}{H_\Delta(X_1, X_2, \dots, X_k)} = \lim_{n \to \infty} \frac{k \log n}{h(X_1, X_2, \dots, X_k) + k \log n - \sum_{i=1}^{k} \log(b_i - a_i) + o(1)} = 1.$$
This completes the proof.
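The one-dimensional part of the theorem can be checked numerically. The sketch below is an illustration under assumptions, not part of the patent: it picks an arbitrary triangular density on [0, 1), splits the interval into n equal bins, and compares the log2(n) bits of the uncompressed symbol with the entropy of the binned distribution. The function names are hypothetical.

```python
import math

def discretized_entropy(pdf, a, b, n):
    """Entropy in bits of the distribution obtained by splitting [a, b) into n equal bins.

    Bin probabilities are approximated by the midpoint rule pdf(mid) * delta
    and then normalized so that they sum to 1.
    """
    delta = (b - a) / n
    probs = [pdf(a + (i + 0.5) * delta) * delta for i in range(n)]
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

def triangular_pdf(x):
    # Assumed example density on [0, 1), peaking at 0.5; any non-uniform density would do.
    return 4 * x if x < 0.5 else 4 * (1 - x)

def ratio(n):
    """Best achievable ratio: bits per uncompressed symbol over entropy per symbol."""
    return math.log2(n) / discretized_entropy(triangular_pdf, 0.0, 1.0, n)

# Precision grows from 2^4 to 2^16 bins; the ratio shrinks toward 1.
ratios = [ratio(2 ** k) for k in (4, 8, 12, 16)]
```

The ratio stays above 1 but keeps falling as the number of bins grows, because the entropy of the binned data grows like log2(n) plus a constant.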
Lossy compression algorithms can reach very high compression ratios, but only at the cost of data precision. Most existing algorithms delete sample points directly from the original trajectory, which can produce a large deviation between the original and the compressed trajectory. If the precision requirement is high and the information loss introduced by compression is strictly bounded, the compression ratio of these lossy algorithms also becomes very low. In the extreme case where zero information loss is required, lossy compression is equivalent to lossless compression, and these lossy algorithms are as inefficient as general lossless algorithms.
From the analysis above, the key factor limiting the compression ratio is the representation of the trajectory data, not the compression algorithm used. Entropy coding and dictionary coding have in fact been proven to be optimal, in that both reach the entropy (or entropy rate) of the source. In information theory, entropy measures the uncertainty of a source: the more uncertain the source, the more information its output carries, and the more data is needed to encode it. The existing triple representation is suited to arbitrary trajectories in two-dimensional space. The shape of a road network trajectory, however, is strictly constrained by the roads, so its uncertainty is far lower than that of an arbitrary two-dimensional trajectory. In other words, the original representation introduces unnecessary uncertainty (superfluous information), which makes trajectory data represented as triples hard to compress.
The present invention removes this unnecessary uncertainty by reducing the dimensionality of the data (dimensionality reduction). If three-dimensional trajectory data $(x_i, y_i, t_i)$ is instead represented in the two-dimensional form $(d_i, t_i)$, the baseline compression ratio already reaches 3/2 = 1.5. The conversion must be lossless, that is, there must be a one-to-one correspondence between the data before and after conversion; otherwise it amounts to lossy compression of the data. Changing the representation of the data before compression not only raises the compression ratio directly but also makes the data easier for subsequent compression algorithms to process.
In the original trajectory, a sample point $(x_i, y_i, t_i)$ states that at time $t_i$ the target is at position $(x_i, y_i)$. If $(x_1, y_1)$ is the first sample point of the trajectory, the distance $d_i$ traveled from the start position $(x_1, y_1)$ to the current position $(x_i, y_i)$ is determined. Conversely, given only the distance traveled from the start point, the corresponding position $(x_i, y_i)$ cannot be determined. To establish a one-to-one correspondence between the original and the decomposed trajectory, the road sequence $\langle e_1, e_2, \dots, e_m \rangle$ must therefore also be stored, where each $e_i$ is an edge in $E$ and $m$ is the number of roads the trajectory passes through.
The trajectory is thus decomposed into two parts: spatial data, the road sequence; and time data, the distance-time sequence. This gives the format of the trajectory data. The road sequence of a trajectory $T$ is the series of consecutive roads that $T$ passes through in the road network $G = (V, E)$, that is, $SP_T = \langle e_1, e_2, \dots, e_m \rangle$. (In the road network $G = (V, E)$, $V$ is the set of graph vertices, that is, road intersections, and $E$ is the set of edges connecting the vertices, that is, road segments between intersections; along the trajectory the vertices are $\langle v_0, v_1, v_2, \dots, v_m \rangle$ and the edges are $\langle e_1, e_2, \dots, e_m \rangle$, where $v_{i-1}$ is the start point and $v_i$ the end point of directed edge $e_i$.) The distance-time sequence of trajectory $T$ is a series of pairs $(d_i, t_i)$, where $d_i$ is the total distance the target has traveled from the start point up to time $t_i$, that is, $TS_T = \langle (d_1, t_1), (d_2, t_2), \dots, (d_n, t_n) \rangle$.
Given any road network trajectory $T$, decomposing the original trajectory, or restoring the original trajectory from the decomposed one, can be completed in $O(|T|)$ time. After decomposition the trajectory data has been converted into a road sequence and a distance-time sequence. COMPRESS then compresses the road sequence losslessly and the distance-time sequence lossily. The road sequence is compressed losslessly because it is an integer sequence with low information entropy; the distance-time sequence still consists of real-valued data with high information entropy, so lossy compression is needed.
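The linear-time restoration claim can be illustrated with a single forward pass over the decomposed data. The sketch below is an assumption-laden illustration rather than the patent's procedure: it assumes distances are measured from the start of the first road, that a table of road lengths is available, and it recovers each point only as a road number plus an offset along that road (turning the offset into coordinates would additionally need the road geometry). The name `locate` and its parameters are hypothetical.

```python
def locate(road_seq, lengths, dist_time_seq):
    """Map each (d, t) pair back to (road number, offset along that road, t).

    road_seq: road numbers in travel order.
    lengths: dict from road number to road length.
    dist_time_seq: (d, t) pairs with non-decreasing d.
    """
    located = []
    k = 0          # index of the current road in road_seq
    prefix = 0.0   # total length of the roads before road_seq[k]
    for d, t in dist_time_seq:
        # Advance while d lies at or beyond the end of the current road, so a
        # point exactly on an intersection is assigned to the following road.
        while k + 1 < len(road_seq) and d >= prefix + lengths[road_seq[k]]:
            prefix += lengths[road_seq[k]]
            k += 1
        located.append((road_seq[k], d - prefix, t))
    return located

# Toy data: three roads of made-up lengths and four distance-time pairs.
roads = [15, 16, 13]
road_lengths = {15: 100.0, 16: 50.0, 13: 80.0}
pairs = [(0.0, 0), (120.0, 10), (150.0, 20), (200.0, 30)]
result = locate(roads, road_lengths, pairs)
```

Because the distances never decrease, the road pointer only moves forward, so the pass costs time proportional to the number of samples plus the number of roads.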
Based on the analysis above, the method proposed by the present invention for re-representing road network trajectory data decomposes the road network trajectory data into two parts, spatial data and time data, wherein:
(1) The original GPS sampled trajectory has the form $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \dots, (x_n, y_n, t_n) \rangle$, where the sample point $(x_i, y_i, t_i)$ states that at time $t_i$ the moving target is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) The trajectory is decomposed into two parts: spatial data and time data;
The spatial data is a road number sequence, which characterizes the spatial shape of the trajectory;
The time data is a sequence of distance-time pairs, which characterizes how speed changes along the path;
The road number sequence is represented as follows:
(1) Trajectory data after map matching no longer contains GPS sampling error: every trajectory point has been corrected, so there is no deviation between a sample point's position and its corresponding road on the map;
(2) After map matching every sample point lies on a road of the map, so the road number corresponding to each sample point can be obtained. The road number sequence $SP_T = \langle e_1, e_2, \dots, e_m \rangle$ corresponding to the original sample point sequence is the spatial data after decomposition, where each $e_i$ is an edge in $E$ and $m$ is the number of roads the trajectory passes through;
(3) The spatial data can also be represented by the sequence of map vertices, $SP_T = \langle v_0, v_1, v_2, \dots, v_m \rangle$, where $v_{i-1}$ is the start point and $v_i$ the end point of directed edge $e_i$; representing the road sequence by edges and by vertices is equivalent.
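The equivalence of the edge and vertex representations can be sketched as a pair of conversions. This is a minimal illustration that assumes at most one directed edge between any ordered pair of intersections; the function names and the `endpoints` table are hypothetical.

```python
def edges_to_vertices(edge_seq, endpoints):
    """Convert a road sequence into the intersection sequence it visits.

    endpoints: dict from road number to (start vertex, end vertex).
    """
    if not edge_seq:
        return []
    vertices = [endpoints[edge_seq[0]][0]]
    for e in edge_seq:
        start, end = endpoints[e]
        if start != vertices[-1]:
            raise ValueError("road sequence is not connected")
        vertices.append(end)
    return vertices

def vertices_to_edges(vertex_seq, endpoints):
    """Convert an intersection sequence back into road numbers.

    Assumes each ordered vertex pair identifies at most one road.
    """
    by_pair = {pair: e for e, pair in endpoints.items()}
    return [by_pair[(u, v)] for u, v in zip(vertex_seq, vertex_seq[1:])]

# Toy network: three roads chained through four intersections.
toy_endpoints = {1: (0, 1), 2: (1, 2), 3: (2, 3)}
as_vertices = edges_to_vertices([1, 2, 3], toy_endpoints)
as_edges = vertices_to_edges(as_vertices, toy_endpoints)
```

A sequence of m roads maps to m + 1 intersections and back, so either form can be stored.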
The distance-time pair sequence has the form $(d_i, t_i)$, where $d_i$ is the total distance the target has traveled from the start point up to time $t_i$; the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \dots, (d_n, t_n) \rangle$ is the time data after decomposition.
The trajectory decomposition method that converts an original trajectory into the above format comprises the following steps:
(1) Map-match the input trajectory so that every sample point corresponds to a road;
(2) Output the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$; among consecutive duplicates keep only one;
(3) Compute the distance traveled in the road network between every two adjacent sample points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the trajectory, denoted $l_i$, with $l_1 = 0$;
(4) For each sample point $(x_i, y_i, t_i)$, output $d_i = \sum_{j=1}^{i} l_j$ as the $d_i$ of the distance-time pair $(d_i, t_i)$, leaving the timestamp unchanged.
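Steps (2) to (4) can be sketched as follows. The sketch takes the output of step (1) as given and makes two simplifying assumptions that are not stated in the patent: each map-matched sample arrives as (road number, offset from the start of that road, timestamp), and every road the vehicle travels has at least one sample on it, so two consecutive samples lie either on the same road or on adjacent roads. The name `decompose` and the length table are hypothetical.

```python
from itertools import accumulate

def decompose(matched, lengths):
    """Split a map-matched trajectory into (road sequence, distance-time sequence).

    matched: list of (road number, offset along road, timestamp).
    lengths: dict from road number to road length.
    """
    # Step (2): road numbers with consecutive duplicates collapsed.
    road_seq = []
    for road, _, _ in matched:
        if not road_seq or road_seq[-1] != road:
            road_seq.append(road)

    # Step (3): l_1 = 0, then the network distance between adjacent samples.
    steps = [0.0]
    for (r0, o0, _), (r1, o1, _) in zip(matched, matched[1:]):
        if r0 == r1:
            steps.append(o1 - o0)                  # same road
        else:
            steps.append((lengths[r0] - o0) + o1)  # rest of r0, then into r1

    # Step (4): d_i is the running sum of l_1..l_i; timestamps are unchanged.
    dist_time = [(d, t) for d, (_, _, t) in zip(accumulate(steps), matched)]
    return road_seq, dist_time

# Toy input with made-up lengths and offsets.
toy_lengths = {15: 100.0, 16: 50.0, 13: 80.0, 6: 60.0, 3: 40.0}
toy_matched = [(15, 0.0, 0), (16, 20.0, 10), (13, 0.0, 20), (6, 10.0, 30), (3, 5.0, 40)]
sp, ts = decompose(toy_matched, toy_lengths)
```

With sparser sampling, step (3) would instead need a path computation between the two matched positions; the sketch leaves that case out.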
In trajectory computation, the method of the invention reduces the cost of storing and querying trajectories in a database.
Brief description of the drawings
Fig. 1 shows a sample road network comprising 12 intersections and 17 roads.
Fig. 2 shows two sample trajectories on the road network.
Detailed description of the embodiments
The data format and the trajectory decomposition method are described below with reference to a sample road network and sample trajectories.
As shown in Fig. 1, the given road network contains 12 vertices (intersections) and 17 edges (roads). Consider trajectory $T_1$ (the blue trajectory). Because the trajectory has been map matched, every sample point corresponds to a road. In Fig. 2, sample point $t_{11}$ corresponds to edge $e_{15}$; $t_{12}$ to $e_{16}$; $t_{13}$ to $e_{13}$; $t_{14}$ to $e_6$; and $t_{15}$ to $e_3$. Note that if a sample point falls exactly on an intersection, the following edge rather than the preceding edge should consistently be taken as the corresponding road number; for example, $t_{13}$ corresponds to $e_{13}$ rather than $e_{16}$. The road sequence of $T_1$ is therefore $SP_1 = \langle e_{15}, e_{16}, e_{13}, e_6, e_3 \rangle$.
To compute the corresponding distance-time sequence, the trajectory decomposition method is applied. From the computed road sequence and the road shapes, the road network distance between consecutive sample points can be computed in turn. For example, in $T_1$ the distance between $t_{11}$ and $t_{12}$ is $w(e_{15}) + \Delta_{11}$, where $w(e_{15})$ is the total length of road $e_{15}$ and $\Delta_{11}$ is the distance of $t_{12}$ from the start point of $e_{16}$. Computing the distance between two adjacent points requires the geographic shape of the roads. In a typical road network each road is stored as a polyline made up of two-dimensional coordinate points; joining these points in order approximates the shape of the real road. From the road shape, the distance between sample points lying on roads can be computed using only Euclidean distance on the two-dimensional coordinates (or spherical distance when longitude-latitude coordinates are used). In $T_1$, the distance between $t_{11}$ and $t_{12}$ is $l_1 = w(e_{15}) + \Delta_{11}$; between $t_{12}$ and $t_{13}$ it is $l_2 = w(e_{16}) - \Delta_{11}$; between $t_{13}$ and $t_{14}$ it is $l_3 = w(e_{13}) + \Delta_{12}$; and between $t_{14}$ and $t_{15}$ it is $l_4 = w(e_6) - \Delta_{12} + \Delta_{13}$.
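The road length used in these distances can be computed from the stored polyline. The sketch below is an illustration under assumptions: points are (x, y) pairs for the Euclidean case and (longitude, latitude) in degrees for the spherical case, the spherical distance uses the haversine formula on a sphere with an assumed mean Earth radius of 6,371,000 m, and the function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two planar points (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine(p, q, radius=6371000.0):
    """Great-circle distance in metres between two (lon, lat) points in degrees.

    The radius is an assumed mean Earth radius; the Earth is treated as a sphere.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

def polyline_length(points, dist=euclidean):
    """Length of a road stored as a polyline: the sum of its segment lengths."""
    return sum(dist(p, q) for p, q in zip(points, points[1:]))

# A made-up road with two straight segments of length 5 and 6.
road_shape = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
road_len = polyline_length(road_shape)
```

Passing `dist=haversine` to `polyline_length` gives the same computation for longitude-latitude polylines.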
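Next, accumulating these distances as in step (4) of the trajectory decomposition method gives the cumulative distance $d_i$ of every sample point, and hence the distance-time sequence $TS_1$ of $T_1$.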
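As a worked check, the cumulative distances can be written out from the four inter-sample distances of the example. This is derived here from $l_1, \dots, l_4$ rather than copied from the source. Note that the example numbers the distances from the first pair of points, so the first sample point has cumulative distance 0 and the $k$-th sample point has $l_1 + \dots + l_{k-1}$. The symbol $d(\cdot)$ for the cumulative distance of a sample point is notation assumed for this sketch.

```latex
\begin{aligned}
d(t_{11}) &= 0 \\
d(t_{12}) &= l_1 = w(e_{15}) + \Delta_{11} \\
d(t_{13}) &= l_1 + l_2 = w(e_{15}) + w(e_{16}) \\
d(t_{14}) &= l_1 + l_2 + l_3 = w(e_{15}) + w(e_{16}) + w(e_{13}) + \Delta_{12} \\
d(t_{15}) &= l_1 + l_2 + l_3 + l_4 = w(e_{15}) + w(e_{16}) + w(e_{13}) + w(e_6) + \Delta_{13}
\end{aligned}
```

The $\Delta_{11}$ and $\Delta_{12}$ terms cancel once the vehicle has left the corresponding road, so each cumulative distance is the total length of the fully traversed roads plus the offset on the current road.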
In practice, to simplify processing of the time data, the start point of the road on which the first sample point lies can be taken as the start point of the whole trajectory. For $T_2$ in Fig. 2 (the red trajectory), for example, the distance of each sample point from $v_5$ is computed instead of its distance from $t_{21}$. This yields the spatial data $SP_2$ and the time data $TS_2$ of $T_2$.

Claims (2)

1. A method for re-representing road network trajectory data, characterized in that the road network trajectory data is decomposed into two parts, spatial data and time data, wherein:
(1) The original GPS sampled trajectory has the form $T = \langle (x_1, y_1, t_1), (x_2, y_2, t_2), \dots, (x_n, y_n, t_n) \rangle$, where the sample point $(x_i, y_i, t_i)$ states that at time $t_i$ the moving target is at the two-dimensional coordinate position $(x_i, y_i)$; the coordinate values $x_i$, $y_i$ and the timestamp $t_i$ are all real-valued data;
(2) The trajectory is decomposed into two parts: spatial data and time data;
The spatial data is a road number sequence, which characterizes the spatial shape of the trajectory;
The time data is a sequence of distance-time pairs, which characterizes how speed changes along the path;
The road number sequence is represented as follows:
(1) Trajectory data after map matching no longer contains GPS sampling error: every trajectory point has been corrected, so there is no deviation between a sample point's position and its corresponding road on the map;
(2) After map matching every sample point lies on a road of the map, so the road number corresponding to each sample point can be obtained; the road number sequence $SP_T = \langle e_1, e_2, \dots, e_m \rangle$ corresponding to the original sample point sequence is the spatial data after decomposition, where each $e_i$ is an edge in $E$ and $m$ is the number of roads the trajectory passes through;
(3) The spatial data is represented by the sequence of map vertices, that is, $SP_T = \langle v_0, v_1, v_2, \dots, v_m \rangle$, where $v_{i-1}$ is the start point and $v_i$ the end point of directed edge $e_i$;
The distance-time pair sequence has the form $(d_i, t_i)$, where $d_i$ is the total distance the target has traveled from the start point up to time $t_i$; the pair sequence $TS_T = \langle (d_1, t_1), (d_2, t_2), \dots, (d_n, t_n) \rangle$ is the time data after decomposition.
2. The method for re-representing road network trajectory data according to claim 1, characterized in that decomposing the road network trajectory data into spatial data and time data comprises the following specific steps:
(1) Map-match the input trajectory so that every sample point corresponds to a road;
(2) Output the road number $e_i$ corresponding to each sample point $(x_i, y_i, t_i)$; among consecutive duplicates keep only one;
(3) Compute the distance traveled in the road network between every two adjacent sample points $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$ of the trajectory, denoted $l_i$, with $l_1 = 0$;
(4) For each sample point $(x_i, y_i, t_i)$, output $d_i = \sum_{j=1}^{i} l_j$ as the $d_i$ of the distance-time pair $(d_i, t_i)$, leaving the timestamp unchanged.
CN201610817878.3A 2016-09-11 2016-09-11 Method for re-representing road network track data Active CN106407378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610817878.3A CN106407378B (en) 2016-09-11 2016-09-11 Method for re-representing road network track data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610817878.3A CN106407378B (en) 2016-09-11 2016-09-11 Method for re-representing road network track data

Publications (2)

Publication Number Publication Date
CN106407378A true CN106407378A (en) 2017-02-15
CN106407378B CN106407378B (en) 2020-05-26

Family

ID=57999852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610817878.3A Active CN106407378B (en) 2016-09-11 2016-09-11 Method for re-representing road network track data

Country Status (1)

Country Link
CN (1) CN106407378B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103162702A (en) * 2013-03-05 2013-06-19 中山大学 Vehicle running track reconstruction method based on multiple probability matching under sparse sampling
US8744840B1 (en) * 2013-10-11 2014-06-03 Realfusion LLC Method and system for n-dimentional, language agnostic, entity, meaning, place, time, and words mapping
CN104318766A (en) * 2014-10-22 2015-01-28 北京建筑大学 Bus GPS track data road network matching method
CN104330089A (en) * 2014-11-17 2015-02-04 东北大学 Map matching method by use of historical GPS data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙静怡等: "基于浮动车GPS轨迹点数据的地图匹配算法研究", 《科技创新与应用》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463335A (en) * 2017-08-02 2017-12-12 上海数烨数据科技有限公司 A kind of location track big data high-efficiency storage method
CN108022006A (en) * 2017-11-24 2018-05-11 浙江大学 The accessibility probability and Area generation method of a kind of data-driven
CN108022006B (en) * 2017-11-24 2020-07-24 浙江大学 Data-driven accessibility probability and region generation method
CN108259463A (en) * 2017-12-05 2018-07-06 北京掌行通信息技术有限公司 A kind of positioning track merges compression method and system with driving path
CN108259463B (en) * 2017-12-05 2020-08-14 北京掌行通信息技术有限公司 Fusion compression method and system for positioning track and driving path

Also Published As

Publication number Publication date
CN106407378B (en) 2020-05-26

Similar Documents

Publication Publication Date Title
CN112015835B (en) Geohash compressed map matching method
Han et al. COMPRESS: A comprehensive framework of trajectory compression in road networks
Nibali et al. Trajic: An effective compression system for trajectory data
KR100943676B1 (en) Digital map shape vector encoding method and position information transfer method
CN101277117B (en) Increment and continuous data compression method and equipment
CN100517979C (en) Data compression and decompression method
Chen et al. Compression of GPS trajectories
CN106407378A (en) Method for expressing road network trajectory data again
CN113094346A (en) Big data coding and decoding method and device based on time sequence
CN110473251B (en) Self-defined range spatial data area statistical method based on grid spatial index
CN109033141B (en) Space-time trajectory compression method based on trajectory dictionary
CN107247761A (en) Track coding method based on bitmap
CN109286399A (en) The compression method of GPS track data based on lzw algorithm
Rakhmanov et al. Compression of GNSS data with the aim of speeding up communication to autonomous vehicles
CN101469989B (en) Compression method for navigation data in mobile phone network navigation
CN104125475A (en) Multi-dimensional quantum data compressing and uncompressing method and apparatus
Chen et al. DAVT: An error-bounded vehicle trajectory data representation and compression framework
Ji et al. A comparison of road-network-constrained trajectory compression methods
JP2007104543A (en) Apparatus and method for compressing latitude/longitude data stream
CN106253909B (en) Lossless compression method for road network track
Cánovas et al. Practical compression for multi-alignment genomic files
CN101877005B (en) Document mode-based GML compression method
CN116673947A (en) Mobile robot travel path point prediction method
Abdelwahab et al. LiDAR data compression challenges and difficulties
CN110411450A (en) It is a kind of for compressing the map-matching method of track

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant