CN115662124A

CN115662124A - GPS track data road section flow matching method based on network coding

Info

Publication number: CN115662124A
Application number: CN202211293839.XA
Authority: CN
Inventors: 何巍楠; 程颖; 韩媛; 姚金龙; 郑龙; 郑晓彬; 周瑜芳; 王聘玺; 赵旭; 李宇翔; 孙剑; 张硕晨; 刘笑影
Original assignee: Beijing Transport Institute
Current assignee: Beijing Transport Institute
Priority date: 2022-10-21
Filing date: 2022-10-21
Publication date: 2023-01-31

Abstract

The invention provides a GPS track data road section flow matching method based on network coding, belongs to the technical field of intelligent traffic data processing, and addresses the technical problem that existing map matching algorithms compute slowly. The method comprises the steps of simplifying a road network, preprocessing track data, forming track sections based on grid coding, forming complete tracks, filling missing track points, and counting the flow of each road section, thereby obtaining the flow of each road section matched to the road network. The road network is simplified by meshing: it is divided into three levels of equal-size grids, which are then coded. The track data is preprocessed and, based on the grid codes, deleted, deduplicated, combined, filled, and classified and stored. This improves the accuracy of road section matching, reduces the time cost of map matching for urban traffic, greatly improves the calculation efficiency of map matching, and allows real-time response when the method is applied to large-scale urban traffic flow.

Description

GPS track data road section flow matching method based on network coding

Technical Field

The invention belongs to the technical field of intelligent traffic data processing, and relates to a GPS track data road section flow matching method based on network coding.

Background

The characteristics of urban traffic vary with the population size, major industries, geographical location and the like of different cities, but they share major characteristics: urban traffic is mainly passenger transportation, commuting is the major traffic demand during rush hour, and the scale of urban traffic is directly related to the traffic service level, traffic policy and traffic management of each city. With the continuous growth of motor vehicle ownership in China, urban traffic congestion is becoming more serious. Traffic management and control at the microscopic level is an effective means of relieving congestion, and identifying the road sections and times at which congestion frequently occurs is the basis of such management and control.

Existing traffic survey equipment is usually installed on the main corridors of a city; its coverage is narrow, so the traffic condition of the whole road network is difficult to monitor effectively. In contrast, GPS equipment has wide coverage and better reflects actual traffic conditions. However, the spatial information collected by a GPS device includes only longitude and latitude and contains errors, so it is difficult to locate the points directly in the road network and thereby obtain the actual traffic situation of each road section. To address this, practitioners in the field have designed various map matching algorithms, but the existing algorithms run into computation speed problems when matching vehicles at the scale of a whole city.

Because the urban road network is structurally complex, the motor vehicle population is large, and travel characteristics differ widely among people, travel data of very large magnitude is generated. In map matching, the wider the range of spatial information and topological relations contained in the road network, the greater the amount of calculation involved. Current general-purpose map matching algorithms therefore face computation speed problems in big data applications, and conventional map matching algorithms struggle to achieve real-time response when facing the large-scale track points of urban traffic.

Therefore, a GPS track data road section flow matching method based on network coding is designed, which performs map matching quickly while ensuring matching precision and improves the calculation efficiency of map matching.

Disclosure of Invention

The invention aims to provide a GPS track data road section flow matching method based on network coding aiming at the problems in the prior art, and the technical problems to be solved by the invention are that: how to improve the map matching calculation efficiency.

The purpose of the invention can be realized by the following technical scheme:

a GPS track data road section flow matching method based on network coding comprises the following steps:

s1, simplifying a road network, wherein the step S1 comprises the following steps:

s101, road network information is obtained: preparing a road network file, wherein the prepared road network file needs to contain information such as road network nodes, coordinates, road section directions and the like;

s102, dividing a road network map into three-level grids with equal size, wherein the size of the grids is determined according to the size of the road network;

s103, encoding the hierarchical grids, wherein the encoding rule is as follows: the code has 14 digits in total; the first 5 digits represent the X direction of the primary coding, and digits 6 to 10 represent the Y direction of the primary coding; digit 11 represents the X direction of the secondary coding, and digit 12 represents the Y direction of the secondary coding; digit 13 represents the X direction of the three-level coding, and digit 14 represents the Y direction of the three-level coding;

the coding formula is based on the input longitude and latitude, and the coding formula is as follows (see formula 1):

wherein min is the starting point coordinate, which is (0, 0); gridsize is the accuracy, and the accuracy of the first-level to third-level grids is set to 0.01, 0.002 and 0.0004 respectively; lat is the latitude coordinate and lon is the longitude coordinate; min_lat and min_lon are respectively the initial minimum latitude and longitude for map meshing;
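Formula 1 itself is not reproduced in this text, so the following is only a minimal sketch of how a 14-digit grid code could be computed from the quantities defined above. It assumes that each index is floor((coordinate - min) / gridsize), that X is longitude and Y is latitude, and that the secondary and three-level digits are 1-based (1 to 5); none of these details is confirmed by the source, and `grid_code` is a hypothetical helper name.

```python
import math

# Finest grid precision from step S103 (degrees); levels nest 5 x 5,
# because 0.01 / 0.002 == 0.002 / 0.0004 == 5.
LEVEL3 = 0.0004
CELLS_PER_PARENT = 5


def grid_code(lon, lat, min_lon=0.0, min_lat=0.0):
    """Return a 14-digit grid code for a point (hypothetical helper).

    Assumptions (not confirmed by the source): indices are
    floor((coord - min) / gridsize), X is longitude, Y is latitude,
    and the level-2 / level-3 digits are 1-based (1..5).
    """
    # Index of the level-3 cell along each axis; round() absorbs float noise.
    ix = math.floor(round((lon - min_lon) / LEVEL3, 6))
    iy = math.floor(round((lat - min_lat) / LEVEL3, 6))
    per_level1 = CELLS_PER_PARENT ** 2  # level-3 cells per level-1 cell
    x1, rx = divmod(ix, per_level1)
    y1, ry = divmod(iy, per_level1)
    x2, x3 = divmod(rx, CELLS_PER_PARENT)
    y2, y3 = divmod(ry, CELLS_PER_PARENT)
    # Digits 1-5: X1, 6-10: Y1, 11: X2, 12: Y2, 13: X3, 14: Y3.
    return f"{x1:05d}{y1:05d}{x2 + 1}{y2 + 1}{x3 + 1}{y3 + 1}"


print(grid_code(116.3313, 39.9175))
```

Working on the index of the finest cell and splitting it with `divmod` avoids repeating floating-point divisions for each level.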

s104, assigning codes to all road sections: each road section is assigned a sequence, namely the sequence of codes of the grids the road section passes through, ordered according to the direction of the road; two dictionaries are established to store the relation between road sections and codes, wherein the key of dictionary one is the road section number and its value is the code sequence, and the key of dictionary two is the grid code and its value is the road section numbers in that grid;
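The two lookup structures of step S104 can be sketched as plain Python dictionaries. This is an illustration only: the function name `build_dictionaries`, the input layout, and the segment IDs and codes in the usage line are hypothetical.

```python
from collections import defaultdict


def build_dictionaries(segment_codes):
    """Build the two lookups of step S104 (hypothetical helper).

    segment_codes: {segment_id: [grid codes in travel order]}.
    Returns (dict_one, dict_two):
      dict_one maps segment_id -> ordered list of grid codes,
      dict_two maps grid code -> list of segment ids crossing that grid.
    """
    dict_one = {seg: list(codes) for seg, codes in segment_codes.items()}
    dict_two = defaultdict(list)
    for seg, codes in segment_codes.items():
        for code in codes:
            if seg not in dict_two[code]:  # avoid listing a segment twice
                dict_two[code].append(seg)
    return dict_one, dict(dict_two)


# Made-up segment ids and shortened codes, for illustration only.
one, two = build_dictionaries({101: ["A", "B"], 102: ["B", "C"]})
print(two["B"])
```

Dictionary two is what makes candidate lookup cheap later: a track point's grid code indexes directly into the segments that cross that grid.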

s2, preprocessing the GPS track data, wherein the step S2 comprises the following steps:

s201, deleting null values: deleting data with empty part data (license plate, time, longitude and latitude) in the GPS track data list, deleting data with spatial position outside the road network boundary, and sequencing the data according to time sequence;

s202, classifying and storing the vehicles according to license plate numbers;

s203, reserving stop points: for each vehicle, screening out runs of consecutive track data (two or more records) whose longitude and latitude are unchanged or whose spacing is less than 15 meters; when the time span of such a run is less than 30 minutes, keeping only the first track point; otherwise keeping the first and last track records and marking both with a stop point label (a stop point is a track point at which the vehicle's position is unchanged for a period of time, during which the vehicle may be parked), so that they cannot be deleted by the subsequent deduplication operation;

s204, deleting abnormal data: deleting points with abnormal time or speed; the distance and time difference between adjacent coordinate points are calculated and the distance is divided by the time difference to obtain the speed; if the speed is greater than 34 m/s, the latter point is regarded as abnormal data and deleted; if the speed is less than 8 m/s and the time difference is greater than 600 s, the latter point is regarded as the vehicle having stopped and is deleted;

s205, repeating step S203 and step S204 for each vehicle to obtain the preprocessed track data;
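The speed-based filter of step S204 can be sketched as follows. The thresholds (34 m/s, 8 m/s, 600 s) come from the text; everything else is an assumption made for illustration: the haversine distance, comparing each point against the last retained point, treating a non-positive time difference as abnormal, and ignoring the stop point labels of step S203.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters (spherical Earth, R = 6371 km)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def drop_abnormal(points, v_max=34.0, v_stop=8.0, gap_s=600.0):
    """Filter a time-sorted list of (t_seconds, lon, lat) for one vehicle.

    Assumption: each point is compared with the last point that was kept.
    """
    kept = []
    for p in points:
        if not kept:
            kept.append(p)
            continue
        t0, lon0, lat0 = kept[-1]
        t1, lon1, lat1 = p
        dt = t1 - t0
        if dt <= 0:
            continue  # assumption: non-increasing time counts as abnormal
        v = haversine_m(lon0, lat0, lon1, lat1) / dt
        if v > v_max:
            continue  # implausibly fast: abnormal data
        if v < v_stop and dt > gap_s:
            continue  # slow over a long gap: vehicle regarded as stopped
        kept.append(p)
    return kept
```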

s3, matching road sections according to grids where the vehicle track points are located, wherein the matching road sections are communicated to form a plurality of track sections, and the step S3 comprises the following steps:

s301, matching grids according to the longitude and latitude of the track points, and adding a column in original data for storing codes of the matched grids;

s302, deduplicating adjacent track points: judging whether the grid codes of adjacent track points are consistent and, if so, keeping only the first track point; repeated values of adjacent track points are deleted by a deduplication function, whose expression is as follows (see formula 2):

$$\mathrm{Dedupl}(Code_{list}) = \mathrm{Dedupl}(Code(code_1, code_2, code_3, \ldots, code_n)) = Code_{list}\{code_1, code_2, \ldots, code_n\}$$ (formula 2)

wherein $\mathrm{Dedupl}()$ is the deduplication function, $Code(code_1, code_2, \ldots, code_n)$ is the set of adjacent track points, and $Code_{list}$ is the set of track points finally retained after deduplication;
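One way to realize the adjacent deduplication of formula 2 is to collapse each run of consecutive identical grid codes to its first track point. The sketch below assumes track points are given as (point_id, grid_code) pairs; the function name `dedupe_adjacent` is hypothetical.

```python
from itertools import groupby


def dedupe_adjacent(points):
    """Keep the first point of every run of equal grid codes.

    points: list of (point_id, grid_code) in time order.
    Only consecutive repeats are removed; a code may reappear later.
    """
    return [next(run) for _, run in groupby(points, key=lambda p: p[1])]


track = [("p1", "A"), ("p2", "A"), ("p3", "B"), ("p4", "A")]
print(dedupe_adjacent(track))
```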

s303, acquiring all road sections in the grid corresponding to the track points through the dictionary II in the step S104 as a candidate road section set of the track points;

s304, performing deletion judgment (judging which track points need to be deleted) on all the track points obtained in the step S303, wherein the judgment step length is 3 (namely judging 3 adjacent track points each time), and the advancing step length of each judgment is 1; the decision rule is as follows:

if the candidate road section sets corresponding to adjacent track points are completely the same, deleting the road section set of the latter track point, wherein the formula is as follows (see formula 3),

$$\mathrm{Dedupl}\{Point_1[way_1, way_2, way_3],\ Point_2[way_1, way_2, way_3]\} = \{[Point_1, Point_2][way_1, way_2, way_3]\}$$ (formula 3)

wherein $\mathrm{Dedupl}\{\}$ is the deduplication function; $Point_1[way_1, way_2, way_3]$ and $Point_2[way_1, way_2, way_3]$ are two adjacent track points with the same candidate road section set, which are regarded as repeatedly recorded track point data; and $[Point_1, Point_2][way_1, way_2, way_3]$ is the track point obtained after deduplication;

if the road section sets of two track points separated by one track point are completely the same, the intervening track point is considered to have drifted and is deleted; the formula is as follows (see formula 4),

$$\mathrm{Dedupl}\{Point_1[way_1, way_2, way_3],\ Point_2[way_4, way_5],\ Point_3[way_1, way_2, way_3]\} = \{[Point_1, Point_3][way_1, way_2, way_3],\ Point_2[way_4, way_5]\}$$ (formula 4)

wherein $\mathrm{Dedupl}\{\}$ is the deduplication function; $Point_1[way_1, way_2, way_3]$ and $Point_3[way_1, way_2, way_3]$ are two track points, separated by one track point, whose candidate road section sets are completely the same, which is regarded as repeated recording caused by drift of the intervening track point; and $[Point_1, Point_3][way_1, way_2, way_3]$ is the track point obtained after deduplication;
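The two deletion rules of step S304 can be sketched as a single pass with a window of three points advancing one point at a time. This is an illustrative reading, not the patented implementation: merged points such as [Point1, Point2] are represented here simply by the first point, and the function name `prune_points` is hypothetical.

```python
def prune_points(points):
    """Apply the two S304 deletion rules to one track.

    points: list of (point_id, frozenset_of_segment_ids) in time order.
    Rule 1: adjacent points with identical candidate sets -> drop the later.
    Rule 2: points i and i+2 identical, i+1 different -> drop i+1 (drift).
    Assumption: a merged group is represented by its first point.
    """
    pts = list(points)
    i = 0
    while i < len(pts) - 1:
        current = pts[i][1]
        if pts[i + 1][1] == current:
            del pts[i + 1]  # rule 1: repeated record
            continue
        if i + 2 < len(pts) and pts[i + 2][1] == current:
            del pts[i + 1]  # rule 2: middle point drifted
            continue        # the old i+2 now falls under rule 1
        i += 1
    return pts


track = [("p1", frozenset({1, 2, 3})), ("p2", frozenset({4, 5})),
         ("p3", frozenset({1, 2, 3})), ("p4", frozenset({6}))]
print([pid for pid, _ in prune_points(track)])
```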

s305, constructing a track road section set, wherein the track road section set is initially an empty set and is used for storing the judgment result of the step S306;

s306, performing retention judgment on the candidate road section sets of adjacent track points in the result of step S304 (judging which road sections in the sets need to be retained), wherein the judgment step length is 2; if the judgment passes, the candidate road sections are stored into the track road section set constructed in step S305 and the correspondence between track points and candidate road sections is established; when the judgment does not pass, the process stops and goes to step S307; the first judgment stores the qualifying candidate road sections of the first and second track points into the track road section set; the second judgment concerns the second and third track points, and since the candidate road sections of the second track point are already in the track road section set, only the qualifying candidate road sections of the third track point are added; and so on, so that each subsequent judgment actually concerns the last track point in the track road section set and the adjacent track point not yet in the set;

the rule for judging road section retention is as follows: when the same road section exists in the candidate road section sets of adjacent track points, that road section is retained; it is already in the track road section set at this moment and does not need to be added again; when different road sections exist, the connectivity between the road sections is judged, the road sections are retained when they can be connected, and the retained road section set is added to the track road section set; the rules for judging connectivity among road sections are as follows:

s3061, given a starting road section a and an ending road section b, judging whether the two road sections intersect; if so, judging whether the intersection point is the starting point of road section a and the end point of road section b, and if it is, road section a cannot reach road section b and the judgment ends; if there is no intersection point, road section a is stored into the candidate road section set and the following steps are continued;

s3062, only the next connected road section of the last road section in the candidate road section set needs to be searched each time (the directions need to be consistent), and if the connected road sections exist, the road sections are added into the corresponding candidate road section set; if two connected road sections exist, two candidate road section sets are generated, the connected road sections are respectively stored, and by analogy, a storage rule formula is as follows (see formula 5, two candidate road section sets can be obtained according to the formula 5 when the track construction is carried out on the ending road section with two connected road sections);

$$Candidate_{waylist1} = [way_1, way_2]$$

$$Candidate_{waylist2} = [way_3, way_4]$$

$$\mathrm{Intersection}\{Candidate_{waylist1}, Candidate_{waylist2}\} = \{[PointCode_1, PointCode_2],\ [[way_1, way_3], [way_2, way_3]]\}$$ (formula 5)

wherein $\mathrm{Intersection}\{\}$ is the connectivity judgment function, $Candidate_{waylist}$ is the stored judgment result, $PointCode$ is the track point code, and $[way_1, way_2]$, $[way_3, way_4]$, $[way_1, way_3]$, $[way_2, way_3]$ are the stored start and end road sections;

after all the candidate road sections are operated, the step is switched to the step S3063;

s3063, removing candidate road section sequences that return to an earlier road section, such as {1,2,3,1}; these are judged to be alternating travel between the main road and the auxiliary road;

s3064, in order to prevent the connectivity judgment from taking too long, the maximum number of road sections in a candidate road section set is set to 8; when a candidate road section set reaches 8 road sections and no feasible route has been found, the two road sections are considered unreachable; if a feasible route is found, the candidate road section set is added to the track road section set;
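The connectivity rules S3062 to S3064 amount to a bounded path search over directed road section adjacency. The sketch below is one possible reading under stated assumptions: `successors` is a hypothetical mapping from a road section to the road sections that directly follow it in the direction of travel, each branch becomes its own candidate list, sequences that revisit a road section are skipped, and the limit of 8 road sections comes from S3064.

```python
def find_paths(successors, start, end, max_len=8):
    """Enumerate directed segment paths from start to end.

    successors: {segment_id: iterable of directly connected next segments}
                (hypothetical adjacency structure, direction-consistent).
    Paths never revisit a segment (cf. S3063) and contain at most
    max_len segments (cf. S3064). An empty result means "unreachable".
    """
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == end:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue  # give up on this branch: too long
        for nxt in successors.get(last, ()):
            if nxt not in path:  # skip sequences such as {1, 2, 3, 1}
                stack.append(path + [nxt])
    return paths


# Two connected branches yield two candidate road section lists.
graph = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(sorted(find_paths(graph, 1, 4)))
```

Capping the path length bounds the work per pair of track points, which is what keeps the judgment time predictable on a dense urban network.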

s307, when the judgment of the step S306 is failed, segmenting the track data, namely, considering the current track point as the end point of the path (the last road section of the current road section set); meanwhile, the next track point is considered as the starting point of the new path (the first road segment of the new road segment set), and the process goes to step S305;

s4, obtaining a plurality of track road section sets in the step S3, searching road sections which can be connected with adjacent track sections within the range of 5 kilometers, and if a plurality of track road sections exist in the same track point, connecting each track road section with the previous track section and the next track section to finally form a plurality of complete tracks;

there may be two cases where the track segments are not contiguous: (1) the distance between two track segments is greater than 5 kilometers; (2) The difference exists between the road network adopted in the program and the road network for the vehicle to run;

s5, selecting the shortest complete track in the step S4 as the final matching track, querying the node longitude and latitude and the node grid codes of the road sections added in step S3 and step S4 according to the road network relation, filling missing track points with these nodes so that every road section has corresponding track points, and finally assigning times to the filled points;

the time calculation rule for missing track points is as follows: the distance between the track point to be filled and the previous track point is divided by the speed of the track point to obtain the time difference between the two track points; the time of the previous track point plus the time difference is the time of the filled track point;
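The time rule for a filled track point is a single division and addition. A minimal sketch, assuming time in seconds, distance in meters and speed in meters per second (the units and the name `fill_time` are illustrative choices, not from the source):

```python
def fill_time(prev_time_s, distance_m, speed_mps):
    """Timestamp for a filled point: previous time + distance / speed."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive to estimate travel time")
    return prev_time_s + distance_m / speed_mps


# A node 50 m past a point recorded at t = 100 s, travelling at 10 m/s.
print(fill_time(100.0, 50.0, 10.0))
```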

s6, repeating steps S3 to S5 for each vehicle to obtain the spatial correspondence between the track points of each vehicle and the road sections, and the time at which the vehicle appears on each road section; classifying and storing all the obtained vehicle track data by link number to obtain the track data set corresponding to each link; then aggregating the data (in traditional traffic planning or traffic demand prediction, the study area or population is generally divided into specific aggregates such as cells or groups, which are then used as the basic units; when a model is established or a sample is expanded, the data is aggregated by these units; the data obtained is called aggregate data, and a model established from it is called an aggregate model), further dividing it according to the time requirement (such as hourly or 15-minute granularity) to obtain track data sets by link and by time period, and deduplicating the track data, wherein the amount of data after deduplication is the flow of the link in that time period;

the rule for deduplicating track data in this step is as follows: the license plate number is the unique identification of the vehicle, so the license plate number column in the track data set is selected as the reference column for deduplication; the data is sorted by the license plate number column (ascending or descending), track data with a repeated license plate number is deleted, and only the first row is retained.
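Counting flow per link and time period then reduces to counting distinct license plates in each (link, period) group. The sketch below uses a set per group instead of sorting and dropping repeated rows, which gives the same count; the record layout and the name `link_flows` are assumptions made for illustration.

```python
from collections import defaultdict


def link_flows(records, bin_s=3600):
    """Count distinct vehicles per (link, time bin).

    records: iterable of (plate, link_id, t_seconds).
    bin_s:   bin width in seconds (3600 for hourly, 900 for 15 minutes).
    Returns {(link_id, bin_index): number_of_distinct_plates}.
    """
    plates = defaultdict(set)
    for plate, link, t in records:
        plates[(link, int(t // bin_s))].add(plate)
    return {key: len(seen) for key, seen in plates.items()}


# Made-up records: plate "A" appears twice on link 1 in the first hour.
demo = [("A", 1, 10), ("A", 1, 20), ("B", 1, 30), ("A", 1, 3700), ("A", 2, 50)]
print(link_flows(demo))
```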

The daily flow, the morning and evening peak flow, or the flow of road sections in different areas can then be aggregated as needed.

Compared with the prior art, the network coding-based GPS track data road section flow matching method has the following advantages:

by means of the network-coding-based GPS track data road section flow matching method, topology construction and complete meshing of the road network are carried out before map matching, situations that may occur in practice are fully considered, and the tracks are simplified by preprocessing, so that the matching accuracy is improved, the time cost of map matching is reduced, the calculation efficiency of map matching is greatly improved, and real-time response can be achieved when facing the large-scale traffic flow of urban traffic.

Drawings

FIG. 1 is a schematic diagram of a technical route of the present invention;

FIG. 2 is a time-consuming distribution plot for trace matching using the method of the present invention;

FIG. 3 is a time distribution scale diagram for trajectory matching using the method of the present invention;

Detailed Description

The technical solution of the present patent will be further described in detail with reference to the following embodiments.

Reference will now be made in detail to embodiments of the present patent, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the drawings are illustrative only for the purpose of explaining the present patent and are not to be construed as limiting the present patent.

Referring to fig. 1, the present embodiment provides a GPS track data road section flow matching method based on network coding; taking the track data of the vehicle with license plate number Jing AP5 as an example, the method is used to match GPS track data to road section flow. The specific steps are as follows:

s1, simplifying road networks, selecting a Beijing road network shp file, extracting spatial information to simplify topological relations, and the method comprises the following steps:

s101, obtaining road network information: the method comprises the steps of preparing a Beijing road network shp file, wherein the prepared road network file needs to contain information such as road network nodes, coordinates, road section directions and the like.

S102, dividing a road network map into three-level grids with equal size; setting the size of the three-level grid according to the actual size of the Beijing road network: the size of the primary grid is set to 1.2km x 609.4m, the size of the secondary grid is set to 152.9m x 152.4m, and the size of the tertiary grid is set to 39.2m x 19m.

S103, coding the hierarchical grids according to the following coding rules: the code has 14 digits in total; the first 5 digits represent the X direction of the primary coding, and digits 6 to 10 represent the Y direction of the primary coding; digit 11 represents the X direction of the secondary coding, and digit 12 represents the Y direction of the secondary coding; digit 13 represents the X direction of the three-level coding, and digit 14 represents the Y direction of the three-level coding;

wherein min is the starting point coordinate, which is (0, 0); gridsize is the accuracy, and the accuracy of the first-level, second-level and third-level grids is set to 0.01, 0.002 and 0.0004 respectively; lat is the latitude coordinate and lon is the longitude coordinate; min_lat and min_lon are respectively the initial minimum latitude and longitude of the map meshing.

S104, coding all road sections: each road section is assigned a sequence, namely the sequence of codes of the grids the road section passes through, ordered according to the direction of the road; for example, the code sequence of the road with ID 4263917 is ["11633039913431", "11633039913421", "11633039913411", "11633039912451", "11633039912441"]; two dictionaries are established to store the relation between road sections and codes, wherein the key of dictionary one is the road section number and its value is the code sequence, and the key of dictionary two is the grid code and its value is the road section numbers in that grid.

S2, preprocessing the GPS track data, wherein the step S2 comprises the following steps:

S201, deleting null values: deleting records in the GPS track data list in which any field (license plate, time, longitude, latitude) is empty, deleting records whose spatial position lies outside the road network boundary, and sorting the data in time order.

S202, classifying and storing the vehicles according to license plate numbers.

S203, reserving stop points: for each vehicle, screening out runs of consecutive track data (two or more records) whose longitude and latitude are unchanged or whose spacing is less than 15 meters; when the time span of such a run is less than 30 minutes, keeping only the first track point; otherwise keeping the first and last track records and marking both with a stop point label (a stop point is a track point at which the vehicle's position is unchanged for a period of time, during which the vehicle may be parked), so that they cannot be deleted by the subsequent deduplication operation.

S204, deleting abnormal data: deleting points with abnormal time or speed; the distance and time difference between adjacent coordinate points are calculated and the distance is divided by the time difference to obtain the speed; if the speed is greater than 34 m/s, the latter point is regarded as abnormal data and deleted; if the speed is less than 8 m/s and the time difference is greater than 600 s, the latter point is regarded as the vehicle having stopped and is deleted.

S205, repeating steps S203 and S204 for each vehicle to obtain the preprocessed track data.

After preprocessing, the GPS track data of the vehicle comprises 437 records; an excerpt of the preprocessed data of the sampled vehicle on January 13, 2020 is shown in Table 1.

TABLE 1 example of preprocessed vehicle trajectory data

S3, matching road sections according to the grids in which the vehicle track points are located, the matched road sections being connected to form a plurality of track sections; taking the GPS track data of the vehicle Jing AP5 as an example, the step S3 comprises the following steps:

S301, matching grids according to the longitude and latitude of each track point of the vehicle Jing AP5, and adding a column to the original data for storing the codes of the matched grids.

S302, duplicate removal of adjacent track points: judging whether the grid codes of the adjacent track points are consistent, if so, only keeping the first track point, deleting the repeated value of the adjacent track through a deduplication function, wherein the expression of the deduplication function is as follows (see formula 2):

$$\mathrm{Dedupl}(Code_{list}) = \mathrm{Dedupl}(Code(code_1, code_2, code_3, \ldots, code_n)) = Code_{list}\{code_1, code_2, \ldots, code_n\}$$ (formula 2)

wherein $\mathrm{Dedupl}()$ is the deduplication function, $Code(code_1, code_2, \ldots, code_n)$ is the set of adjacent track points, and $Code_{list}$ is the set of track points finally retained after deduplication;

steps S301 and S302 yield 62 records with grid codes; an excerpt of the track data is shown in Table 2.

TABLE 2 example of vehicle track data with grid codes

S303, acquiring, through dictionary two of step S104, all road sections in the grids corresponding to the track points of the vehicle Jing AP5 as the candidate road section sets of the track points; 140 links are obtained in total, and an excerpt of the track point candidate road section sets is shown in Table 3.

TABLE 3 Trace Point candidate road segment set example

S304, performing deletion judgment on all track points of the vehicle Jing AP5 obtained in step S303, wherein the judgment step length is 3 and the advancing step length of each judgment is 1; the decision rule is as follows:

if the candidate road section sets corresponding to adjacent track points are completely the same, deleting the road section set of the latter track point, wherein the formula is as follows (see formula 3),

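$$\mathrm{Dedupl}\{Point_1[way_1, way_2, way_3],\ Point_2[way_1, way_2, way_3]\} = \{[Point_1, Point_2][way_1, way_2, way_3]\}$$ (formula 3)

wherein $\mathrm{Dedupl}\{\}$ is the deduplication function; $Point_1[way_1, way_2, way_3]$ and $Point_2[way_1, way_2, way_3]$ are two adjacent track points with the same candidate road section set, which are regarded as repeatedly recorded track point data; and $[Point_1, Point_2][way_1, way_2, way_3]$ is the track point obtained after deduplication;

if the road section sets of two track points separated by one track point are completely the same, the intervening track point is considered to have drifted and is deleted; the formula is as follows (see formula 4),

$$\mathrm{Dedupl}\{Point_1[way_1, way_2, way_3],\ Point_2[way_4, way_5],\ Point_3[way_1, way_2, way_3]\} = \{[Point_1, Point_3][way_1, way_2, way_3],\ Point_2[way_4, way_5]\}$$ (formula 4)

wherein $\mathrm{Dedupl}\{\}$ is the deduplication function; $Point_1[way_1, way_2, way_3]$ and $Point_3[way_1, way_2, way_3]$ are two track points, separated by one track point, whose candidate road section sets are completely the same, which is regarded as repeated recording caused by drift of the intervening track point; and $[Point_1, Point_3][way_1, way_2, way_3]$ is the track point obtained after deduplication.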

S305, constructing a track road section set, which is initially an empty set and is used for storing the judgment result of step S306.

S306, performing retention judgment on the candidate road section sets of adjacent track points in the result of step S304, wherein the judgment step length is 2; if the judgment passes, the candidate road sections are stored into the track road section set constructed in step S305 and the correspondence between track points and candidate road sections is established; when the judgment does not pass, the process stops and goes to step S307. The first judgment stores the qualifying candidate road sections of the first and second track points into the track road section set; the second judgment concerns the second and third track points, and since the candidate road sections of the second track point are already in the track road section set, only the qualifying candidate road sections of the third track point are added; and so on, so that each subsequent judgment actually concerns the last track point in the track road section set and the adjacent track point not yet in the set.

The rule for judging road section retention is as follows: when the same road section exists in the candidate road section sets of adjacent track points, that road section is retained; it is already in the track road section set at this moment and does not need to be added again; when different road sections exist, the connectivity between the road sections is judged, the road sections are retained when they can be connected, and the retained road section set is added to the track road section set; the rules for judging connectivity among road sections are as follows:

s3061, a starting road section a and an ending road section b exist, whether intersection points exist between the two road sections is judged, if yes, whether the intersection points between the road sections a and b are the starting point of a and the end point of b is judged, if yes, the road section a cannot reach the road section b, and the judgment is finished; if no intersection point exists, the road section a is stored into the candidate road section set, and the following steps are continued;

s3062, only the next connected road section of the last road section in the candidate road section set needs to be searched each time (the directions need to be consistent), and if the connected road sections exist, the road sections are added into the corresponding candidate road section set; if two connected road sections exist, two candidate road section sets are generated, the connected road sections are respectively stored, and the like, the storage rule formula is as follows (see formula 5, two candidate road section sets can be obtained according to formula 5 when the track is constructed for the end road sections with the two connected road sections);

$$Candidate_{waylist1} = [way_1, way_2]$$

$$Candidate_{waylist2} = [way_3, way_4]$$

$$Intersection\{Candidate_{waylist1}, Candidate_{waylist2}\} = \{[PointCode_1, PointCode_2], [[way_1, way_3], [way_2, way_3]]\} \quad (5)$$

Wherein Intersection{ } is the connectivity judgment function, $Candidate_{waylist}$ is the stored judgment result, PointCode is the track point code, and $[way_1, way_2]$, $[way_3, way_4]$, $[way_1, way_3]$, $[way_2, way_3]$ are the stored start and end road sections;

After all the candidate road sections have been processed, step S3063 is carried out;

S3064, to prevent the connectivity judgment from taking too long, the maximum number of road sections in a candidate road section set is set to 8; when a candidate road section set reaches 8 road sections and no feasible route has been found, the two road sections are considered unreachable; if a feasible route is found, the candidate road section set is added to the track road section set.
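The connectivity rule above can be read as a bounded, direction-aware search. The following is a minimal sketch in Python, not the implementation of the invention: the `segments` mapping (road section id to a pair of start and end node ids), the function name `find_routes`, and the breadth-first strategy are assumptions made for illustration. Only the direction check of S3061, the branching of S3062, and the limit of 8 road sections from S3064 come from the text.

```python
from collections import deque

MAX_SEGMENTS = 8  # limit on the size of a candidate road section set (S3064)


def find_routes(segments, a, b, max_segments=MAX_SEGMENTS):
    """Return every directed route [a, ..., b] with at most max_segments sections.

    segments maps a road section id to (start_node, end_node); this data
    layout is an assumption. An empty list means b is treated as unreachable.
    """
    # S3061: if the shared point is the start of a and the end of b,
    # the direction is wrong and a cannot reach b.
    if segments[a][0] == segments[b][1]:
        return []

    # Index road sections by start node, so the successors of a section are
    # the sections that begin where it ends (direction-consistent, S3062).
    by_start = {}
    for seg_id, (start, _end) in segments.items():
        by_start.setdefault(start, []).append(seg_id)

    routes = []
    queue = deque([[a]])  # each entry is one candidate road section set
    while queue:
        path = queue.popleft()
        last_end = segments[path[-1]][1]
        for nxt in by_start.get(last_end, []):
            if nxt in path:
                continue  # never reuse a road section within one route
            candidate = path + [nxt]
            if nxt == b:
                routes.append(candidate)
            elif len(candidate) < max_segments:
                queue.append(candidate)  # one new set per connected section
    return routes
```

When a road section has two successors, the search keeps two candidate sets, which mirrors the two-set case of formula 5. A candidate set that reaches 8 road sections without arriving at b is dropped rather than extended.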

S307, when the judgment of step S306 is not passed, segmenting the track data: the current track point is taken as the end point of the path (the last road section of the current road section set), the next track point is taken as the starting point of a new path (the first road section of the new road section set), and the process returns to step S305.
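The interplay of steps S305 to S307 can be sketched as a single pass over adjacent track points. This is a simplified illustration under stated assumptions: each track point is a hypothetical `(point_code, candidate_road_sections)` pair, `connected` is a caller-supplied predicate standing in for the connectivity rule of step S306, and the sketch only records where the track is cut, not which road sections are retained.

```python
def split_track(points, connected):
    """Cut a track into segments wherever the retention judgment fails.

    points: list of (point_code, candidate_road_sections) pairs (assumed layout).
    connected: function (a, b) -> bool, a stand-in for the S306 connectivity rule.
    Returns a list of track segments, each a list of point codes.
    """
    if not points:
        return []
    pieces = [[points[0][0]]]  # S305: start with the first track point
    for (_, prev_ways), (code, cur_ways) in zip(points, points[1:]):
        prev_set, cur_set = set(prev_ways), set(cur_ways)
        # S306: pass if the two points share a road section, or if some
        # pair of their candidate road sections can be connected.
        passed = bool(prev_set & cur_set) or any(
            connected(a, b) for a in prev_set for b in cur_set
        )
        if passed:
            pieces[-1].append(code)
        else:
            pieces.append([code])  # S307: the next point starts a new path
    return pieces
```

Each judgment compares the last track point already placed in a segment with the next one, which corresponds to the step length of 2 described for step S306.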

The result of step S3 is shown in Table 4, which gives an example of the candidate road section sets of each track segment. Because the track data is segmented whenever the connectivity judgment is not passed, the final total set of candidate road sections is the union of the candidate sets of all track segments. The grid codes store only the start point and end point of each track segment; the grid codes serve as the key and the candidate paths of the track segment as the value, and a track segment may have multiple candidate paths for the final road sections.

TABLE 4 example of sets of candidate road segments for each track segment

S4, step S3 yields a plurality of track road section sets. Road sections that can connect adjacent track segments are searched within a range of 5 kilometers; if a plurality of track road sections exist for the same track point, each of them is connected with the previous track segment and the next track segment, finally forming a plurality of complete tracks. For example, from start grid 11646039863413 to end grid 11644039864424 (see Table 4), the complete tracks are [187844581, 150585627, 240635778, 164948575] and [187844581, 26403549, 240635778, 164948575], and every candidate path appears in a complete track. Track segments cannot be connected in two cases: (1) the distance between the two track segments is greater than 5 kilometers; (2) the road network used in the program differs from the road network on which the vehicle actually traveled.
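Forming several complete tracks from alternative candidate paths amounts to taking one alternative per position and concatenating them. The sketch below illustrates this with the road section ids quoted above; the way those ids are grouped into `slots` is a hypothetical decomposition chosen for the example, since the contents of Table 4 are not reproduced here, and `complete_tracks` is an illustrative name.

```python
from itertools import product


def complete_tracks(slots):
    """Enumerate complete tracks from ordered alternatives.

    slots: ordered list; each slot holds alternative paths, and each path is
    a list of road section ids (assumed layout).
    """
    tracks = []
    for choice in product(*slots):  # one alternative from every slot
        track = []
        for path in choice:
            for seg in path:
                # Skip a road section that merely repeats the joint between
                # two consecutive paths.
                if not track or track[-1] != seg:
                    track.append(seg)
        tracks.append(track)
    return tracks


# Hypothetical grouping of the road sections from the example above.
slots = [
    [[187844581]],
    [[150585627], [26403549]],
    [[240635778, 164948575]],
]
for track in complete_tracks(slots):
    print(track)
```

The number of complete tracks is the product of the number of alternatives in each slot, so two alternatives in one slot give the two complete tracks of the example.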

S5, selecting the shortest complete track from step S4 as the final matched track; querying, according to the road network relationship, the longitude and latitude and the grid codes of the nodes of the road sections added in steps S3 and S4; filling in the missing track points with these nodes so that every road section has corresponding track points; and finally assigning a time to each filled track point;

The time of a missing track point is calculated as follows: the distance between the track point to be filled and the previous track point is divided by the speed of the previous track point to obtain the time difference between the two track points; the time of the previous track point plus this time difference is the time of the filled track point. The calculation formula is as follows:

$$T_{\text{missing track point}} = T_{\text{last track point}} + \frac{S}{V_{\text{last track point}}}$$

In the formula, $T_{\text{missing track point}}$ is the time of the missing track point, $T_{\text{last track point}}$ is the time of the previous track point, $S$ is the distance between the missing track point and the previous track point, and $V_{\text{last track point}}$ is the speed of the previous track point.
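As a small worked illustration of this formula, the sketch below computes the time of a filled track point. The units (meters and meters per second), the function name `fill_time`, and the handling of a non-positive speed are assumptions; the text does not state units or what happens when the previous track point has zero speed.

```python
from datetime import datetime, timedelta


def fill_time(prev_time, distance_m, prev_speed_mps):
    """Time of a filled track point: T_prev + S / V_prev.

    distance_m is S in meters and prev_speed_mps is V in meters per second
    (assumed units).
    """
    if prev_speed_mps <= 0:
        # Not covered by the text; refusing here is an assumption.
        raise ValueError("speed of the previous track point must be positive")
    return prev_time + timedelta(seconds=distance_m / prev_speed_mps)


# A point 150 m after a point recorded at 08:00:00 with a speed of 10 m/s
# is assigned a time 15 s later.
print(fill_time(datetime(2022, 10, 21, 8, 0, 0), 150.0, 10.0))
```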

S6, repeating steps S3 to S5 for each vehicle to obtain the spatial correspondence between each vehicle's track points and the road sections, together with the time at which the vehicle appears on each road section; classifying and storing all the obtained vehicle track data by road section number (link) to obtain the track data set of each road section; further dividing each track data set according to the configured time granularity (for example, hourly or 15-minute granularity) to obtain track data sets per road section and per time period; and deduplicating the track data to finally obtain the flow of each road section in each time period.

The track data deduplication rule in this step is as follows: the license plate number uniquely identifies a vehicle, so the license plate number column of the track data set is used as the reference column for deduplication; the data is sorted by this column, rows with a repeated license plate number are deleted, and only the first row for each license plate number is kept.
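The grouping and deduplication of step S6 can be sketched as counting distinct license plates per road section and time bucket. This is an illustrative sketch only: the record layout `(plate, link, timestamp)`, the function name `segment_flow`, and the restriction to bucket sizes that divide 60 minutes are assumptions. Using a set per bucket has the same effect as sorting by plate and keeping the first row.

```python
from collections import defaultdict
from datetime import datetime


def segment_flow(records, bucket_minutes=15):
    """Count distinct vehicles per (road section, time bucket).

    records: iterable of (plate, link, timestamp) tuples (assumed layout).
    bucket_minutes must divide 60, for example 15, 30 or 60.
    """
    plates_seen = defaultdict(set)
    for plate, link, ts in records:
        # Round the timestamp down to the start of its time bucket.
        minute = (ts.minute // bucket_minutes) * bucket_minutes
        bucket = ts.replace(minute=minute, second=0, microsecond=0)
        # A set keeps each plate once, which is the deduplication rule.
        plates_seen[(link, bucket)].add(plate)
    return {key: len(plates) for key, plates in plates_seen.items()}


records = [
    ("PLATE-1", 101, datetime(2022, 10, 21, 8, 1)),
    ("PLATE-1", 101, datetime(2022, 10, 21, 8, 5)),   # same vehicle, same bucket
    ("PLATE-2", 101, datetime(2022, 10, 21, 8, 14)),
    ("PLATE-1", 101, datetime(2022, 10, 21, 8, 16)),  # next 15-minute bucket
]
print(segment_flow(records))
```

With 15-minute buckets the first three records fall into the 08:00 bucket of road section 101 and count as two vehicles; the fourth record opens the 08:15 bucket with one vehicle.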

Compared with the prior art, the GPS track data road section flow matching method based on network coding achieves higher road section accuracy and higher calculation efficiency, as detailed below:

(1) Road segment accuracy

Because the invention matches on the basis of three-level grids rather than the weighted distance between a point and a line, it is theoretically possible for consecutive track points not to coincide with the matched road sections. A road section accuracy index is therefore defined to describe the accuracy of the invention more precisely. The road section accuracy is the percentage of correctly matched road sections in the total number of matched road sections; a match is considered correct if the matched road coincides with the track points and the road direction is consistent with the direction of travel of the track. The calculation is shown in formula 6.

$$Accuracy = \frac{Num_a}{Num_{all}} \times 100\% \quad (6)$$

Wherein Accuracy is the road section accuracy, $Num_a$ is the number of correctly matched road sections, and $Num_{all}$ is the total number of matched road sections.

A common geometric algorithm among existing map matching algorithms, namely the direction-fused weight map matching algorithm, is used for comparison. This maximum-weight-based map matching algorithm obtains the matched road sections of a vehicle by computing indexes such as direction similarity and by weight screening, over the stages of noise reduction, segmentation, compression, nearby road retrieval, matching, and merging of segments. Vehicle track data was selected, and the road section accuracy of the direction-fused weight map matching algorithm and of the map matching algorithm of the invention was calculated; the results are shown in Table 5.

TABLE 5 Road section accuracy of the direction-fused weight map matching algorithm compared with the network coding-based GPS track data road section flow matching method

Algorithm	Jing AAP	Jing ADM	Jing ALX	Jing AWP
Direction-fused weight algorithm	93.40%	92.90%	89.60%	95.20%
Grid coding based algorithm	100%	99.40%	97.90%	99.20%

As can be seen from Table 5, both the direction-fused weight algorithm and the grid coding based algorithm of the invention achieve high accuracy, but the grid coding based algorithm is more accurate. In actual matching, the direction-fused weight algorithm still retains and matches drifted data, which results in its relatively lower accuracy.

(2) Efficiency of calculation

For comparison, a common geometric algorithm among existing map matching algorithms, namely the direction-fused weight map matching algorithm, is used. Although this method can complete track matching well, it involves complex calculations of the foot-of-perpendicular positions, the perpendicular distances, and the angles between the track and the road network. It takes about 5 minutes on average to complete track matching for one vehicle.

The GPS track data road section flow matching method based on network coding performs track matching by fully gridding the road network, that is, by the three-level grid method. In experiments, the time consumption distribution is shown in Figure 2, where the abscissa is the time consumption interval (unit: seconds) and the ordinate is the number of vehicles; the corresponding time distribution ratio is shown in Figure 3. It can be seen that 94% of vehicles complete track matching within 10 s.

Although the preferred embodiments of the present patent have been described in detail, the present patent is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present patent within the knowledge of those skilled in the art.

Claims

1. A GPS track data road section flow matching method based on network coding, characterized by comprising the following steps:

s101, obtaining road network information: preparing a road network file, wherein the information to be contained in the prepared road network file comprises road network nodes, coordinates and road section directions;

the coding formula is based on the input longitude and latitude, and is as follows:

wherein min is the starting point coordinate, which is (0, 0); gridsize is the precision; lat is the longitude coordinate and lon is the latitude coordinate; and $min_{lat}$ and $min_{lon}$ are the initial minimum longitude and latitude, respectively, for gridding the map;

s201, deleting null values: deleting data with empty license plate, time, longitude or latitude in a GPS track data list, deleting data with spatial position outside road network boundary, and sequencing the data according to time sequence;

s202, classifying and storing the vehicles according to license plate numbers;

s203, reserving a stop point: respectively screening out continuous track data (two or more) with the longitude and latitude of each vehicle unchanged or the distance of less than 15 meters, when the time span of the continuous track data is less than 30 minutes, only keeping a first track point, otherwise, keeping the first track data and the last track data, and marking a label of a stop point (the track point of which the position of the vehicle is unchanged for a period of time, and the vehicle may be in a stop state) on the two track data, so that the two track data can not be deleted by subsequent deduplication operation;

s302, duplicate removal of adjacent track points: judging whether the grid codes of the adjacent track points are consistent, if so, only keeping the first track point, deleting the repeated value of the adjacent track through a deduplication function, wherein the deduplication function expression is as follows:

$$Dedupl(Code_{list}) = Dedupl(Code(code1, code2, code3, \ldots, coden)) = Code_{list}\{code1, code2, \ldots, coden\}$$

s304, performing deletion judgment on all track points obtained in the step S303, wherein the judgment step length is 3, and the forward step length of each judgment is 1;

s305, constructing a track section set, wherein the track section set is initially an empty set and is used for storing the judgment result of the step S306;

S306, performing a retention judgment on the candidate road section sets of adjacent track points in the result of step S304, with a judgment step length of 2; if the judgment is passed, storing the candidate road sections in the track road section set constructed in step S305 and establishing the correspondence between the track points and the candidate road sections; if the judgment is not passed, stopping and going to step S307; wherein in the first judgment the candidate road sections of the first and second track points that meet the conditions are stored in the track road section set; the second judgment concerns the second and third track points, and because the candidate road sections of the second track point are already in the track road section set, only the candidate road sections of the third track point that meet the conditions are added, and so on; each subsequent judgment concerns the last track point already in the track road section set and the adjacent track point that is not yet in the set;

s307, when the judgment of the step S306 is failed, the track data is segmented, namely the current track point is considered as the end point of the path; meanwhile, considering the next track point as the starting point of the new path, the step S305 is performed;

s4, obtaining a plurality of track road section sets in the step S3, searching road sections capable of connecting adjacent track sections within the range of 5 kilometers, and if a plurality of track road sections exist in the same track point, connecting each track road section with the previous track section and the next track section to finally form a plurality of complete tracks;

s6, circulating the steps S3 to S5 for each vehicle to obtain the space corresponding relation between the track point of each vehicle and the road section and the time of the vehicle appearing on each road section; classifying and storing all the obtained vehicle track data according to the road section numbers to obtain a track data set corresponding to each road section, further dividing the track data set according to the time requirement of the set, obtaining track data sets of the road sections and the time sections, and performing track data deduplication, wherein the data size after deduplication is the flow of the road section in the time section.

2. The method as claimed in claim 1, wherein in step S103, the values of the primary to tertiary grid accuracies gridsize are set to 0.01,0.002,0.0004, respectively.

3. The method for matching the traffic of the GPS track data section based on the network coding according to claim 1, wherein in step S304, a rule for determining deletion of the track point is as follows:

when the candidate road section alternative sets corresponding to the adjacent track points are completely the same, deleting the road section set of the next track point, wherein the formula is as follows,

$$Dedupl\{Point1[way_1, way_2, way_3], Point2[way_1, way_2, way_3]\} = \{[Point1, Point2][way_1, way_2, way_3]\}$$

wherein Dedupl{ } is the deduplication function; $Point1[way_1, way_2, way_3]$ and $Point2[way_1, way_2, way_3]$ are two adjacent track points with the same candidate road section set, which are regarded as repeatedly recorded track point data; and $[Point1, Point2][way_1, way_2, way_3]$ is the track point obtained after deduplication;

when the road section sets of two track points separated by one track point are completely the same, the track point between them is considered to have drifted and is deleted, the formula being as follows,

$$Dedupl\{Point1[way_1, way_2, way_3], Point2[way_4, way_5], Point3[way_1, way_2, way_3]\} = \{[Point1, Point3][way_1, way_2, way_3], Point2[way_4, way_5]\}$$

wherein Dedupl{ } is the deduplication function; $Point1[way_1, way_2, way_3]$ and $Point3[way_1, way_2, way_3]$ are two track points that are separated by one track point and whose candidate road section sets are completely the same, in which case the intermediate track point is regarded as having drifted and caused repeated data entry; and $[Point1, Point3][way_1, way_2, way_3]$ is the track point obtained after deduplication.

4. The method for matching the traffic of the GPS track data section based on the network coding according to claim 1, wherein in step S306, the rule for judging road section retention is: when the candidate road section sets of adjacent track points contain the same road section, that road section is retained, and because it already exists in the track road section set at this point it does not need to be added again; and when the sets contain different road sections, the connectivity between the road sections is judged, road sections that can be connected are retained, and the retained road sections are added to the track road section set.

5. The network coding-based GPS track data road section flow matching method according to claim 4, wherein the rule for judging connectivity among the road sections is as follows:

s3061, a starting road section a and an ending road section b exist, whether intersection points exist between the two road sections is judged, if yes, whether the intersection points between the road sections a and b are the starting point of a and the end point of b is judged, if yes, the road section a cannot reach the road section b, and the judgment is finished; if no intersection point exists, storing the road section a into a candidate road section set, and continuing the following steps;

s3062, only the next connected road section of the last road section in the candidate road section set needs to be searched each time, and if the connected road section exists, the road section is added into the corresponding candidate road section set; if two connected road sections exist, two candidate road section sets are generated, the connected road sections are respectively stored, and the analogy is repeated, and the storage rule formula is as follows;

$$Candidate_{waylist1} = [way_1, way_2]$$

$$Candidate_{waylist2} = [way_3, way_4]$$

$$Intersection\{Candidate_{waylist1}, Candidate_{waylist2}\} = \{[PointCode_1, PointCode_2], [[way_1, way_3], [way_2, way_3]]\}$$

wherein Intersection{ } is the connectivity judgment function, $Candidate_{waylist}$ is the stored judgment result, PointCode is the track point code, and $[way_1, way_2]$, $[way_3, way_4]$, $[way_1, way_3]$, $[way_2, way_3]$ are the stored start and end road sections;

6. The method for matching the traffic of the GPS track data section based on the network coding according to claim 1, wherein in step S5, the rule for calculating the time assigned to a missing track point is: the distance between the track point to be filled and the previous track point is divided by the speed of the previous track point to obtain the time difference between the two track points, and the time of the previous track point plus this time difference is the time of the filled track point; the calculation formula is as follows:

$$T_{\text{missing track point}} = T_{\text{last track point}} + \frac{S}{V_{\text{last track point}}}$$

In the formula, $T_{\text{missing track point}}$ is the time of the missing track point, $T_{\text{last track point}}$ is the time of the previous track point, $S$ is the distance between the missing track point and the previous track point, and $V_{\text{last track point}}$ is the speed of the previous track point.

7. The method for matching the traffic of the GPS track data section based on the network coding according to claim 1, wherein in the step S6, the track data deduplication rule is as follows: taking the license plate number as the unique identification of the vehicle, selecting the license plate number column in the track data set as a reference column for data deduplication, sequencing according to the license plate number column, deleting the track data with repeated license plate numbers, and only keeping the first row of data.