CN108961747A

CN108961747A - Method for extracting urban road traffic state information under incomplete checkpoint data conditions

Info

Publication number: CN108961747A
Application number: CN201810714830.9A
Authority: CN
Inventors: 任毅龙; 刘帅; 于海洋; 刘晨阳; 杨刚; 张路
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2018-07-03
Filing date: 2018-07-03
Publication date: 2018-12-07
Anticipated expiration: 2038-07-03
Also published as: CN108961747B

Abstract

This patent discloses a method for extracting urban road traffic state information from incomplete checkpoint data. The method comprises three steps: step one, preprocessing the incomplete data; step two, completing the missing data; step three, extracting the road traffic state information. Working from incomplete checkpoint data, the method uses an optimized depth-first traversal to find the lost track points, then uses Lagrange polynomial interpolation to obtain their time information, after which the road traffic state information can be extracted.

Description

Method for extracting urban road traffic state information under incomplete checkpoint data conditions

Technical field

The invention relates to the field of traffic information, and in particular to a method for extracting urban traffic state information under incomplete checkpoint data conditions.

Background

As China's overall national strength and living standards have risen, the number of motor vehicles has grown by 10% to 20% per year. To strengthen the supervision of motor vehicles and support urban road construction, the number of checkpoint devices deployed has also increased substantially. A road checkpoint monitoring system can monitor a checkpoint around the clock and record passing vehicles of every type. The records are rich in information, including the checkpoint number, license plate number, and passing time. Checkpoint data can therefore be used to extract and analyze urban road traffic state information. However, because of limits on the technical level of the monitoring equipment and on the number of checkpoints deployed, incomplete records, such as lost record information and missing record points, are unavoidable, and they make it difficult to extract urban traffic state information. A method for extracting urban road traffic state information under incomplete checkpoint data conditions is therefore needed.

There is already much research on extracting road traffic state information. Patent application No. 201510938922.1, "A method for extracting characteristic parameters of traffic operation state based on big data", extracts vehicle speed and road flow parameters from GPS data and mobile phone signaling data, so its method does not apply to checkpoint data. The prior art also contains many technical solutions that extract traffic information from checkpoint data, but when the checkpoint data are incomplete the usual practice is to discard them. For example, patent application No. 201510225291.9, "Method for calculating real-time road travel speed based on large-scale checkpoint vehicle-passing data", extracts road section speed information from checkpoint data but is not applicable to checkpoint data with missing records.

Summary of the invention

The invention proposes a method for extracting traffic state information when part of the checkpoint data is lost. By supplementing the valid checkpoint information so that the recorded position, time, and vehicle information is relatively complete, the method extracts urban road traffic state information.

To solve the above technical problem, the technical solution provided by this patent includes:

A method for extracting urban road traffic state information under incomplete checkpoint data conditions, the method comprising the following steps:

Step one: preprocessing of incomplete data

1.1 Vehicle trip division

Data preprocessing comprises: selecting from the database the checkpoint data of all record points in the road network and removing abnormal and duplicate data; grouping the remaining data by license plate number and assigning a number to each vehicle; sorting each vehicle's data in time order; and finally dividing the data into vehicle trips, as follows:

(1) From the average time needed to drive through each road section in the road network under normal conditions, set the section travel time set T = {T₁, T₂, …, T_V}. The maximum value T_u (u∈V) in the set is taken as the threshold.

(2) In the time-sorted data, if the time difference between two adjacent records of a vehicle is greater than the threshold, the vehicle is considered to have stopped between the two record points, and the data must be split between them. The earlier record point and all data before it are treated as trip A, and the later record point and all data after it as trip B. As shown in Figure 2, a vehicle takes longer than the threshold to travel between checkpoints 187 and 024, so it is considered to have stopped between them, and its trajectory is split there into trip A and trip B. If trip B still contains adjacent records whose time difference exceeds the threshold, trip B is divided further in the same way.

(3) Remove trips that contain only one record, and assign numbers to the remaining trips to record the division.
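The three division steps can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `split_trips`, the sample timestamps, and the 600-second threshold are all assumptions chosen for the example.

```python
from datetime import datetime

def split_trips(times, threshold_s):
    """Split one vehicle's time-sorted records into numbered trips.

    A new trip starts whenever the gap to the previous record exceeds
    threshold_s; trips with a single record are dropped.
    """
    trips, current = [], []
    for t in times:
        if current and (t - current[-1]).total_seconds() > threshold_s:
            trips.append(current)
            current = []
        current.append(t)
    if current:
        trips.append(current)
    # Step (3): discard single-record trips, then number the rest from 1.
    kept = [trip for trip in trips if len(trip) > 1]
    return {number: trip for number, trip in enumerate(kept, start=1)}

# Hypothetical records for one vehicle, already sorted by time.
fmt = "%Y-%m-%d %H:%M:%S"
records = [datetime.strptime(s, fmt) for s in (
    "2017-06-01 08:00:00", "2017-06-01 08:03:00",   # first trip
    "2017-06-01 12:00:00",                          # isolated record, dropped
    "2017-06-01 18:00:00", "2017-06-01 18:04:00", "2017-06-01 18:07:00",
)]
trips = split_trips(records, threshold_s=600)  # 600 s is an assumed threshold
```

Because the split is applied at every oversized gap in a single pass, the "divide trip B further" rule in step (2) is covered without recursion.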

1.2 Calculation of some evaluation indicators

The flow and speed of each road section are first calculated as two of the evaluation indicators required by the later entropy weight calculation. The specific calculation is as follows:

(1) Using a period t (for example 5 min or 10 min), divide the day into n periods. Traverse the data of each period, extract the checkpoint number k, license plate number m, passing time s, and other information of each record, and count the vehicles passing each checkpoint in each direction within each period. This gives the flow q_dj of each road section in each period, where d is the section number and j denotes the j-th period.

(2) Still using period t, calculate the section speed v_dj. Read the checkpoint number k, license plate number m, passing time s, and other information trip by trip, and calculate the time difference between each pair of adjacent records, Δs = s_{i+1} - s_i. Dividing the length of the corresponding section by this time difference gives the speed:

v_di = d_d / Δs

where v_di is one speed value for section d, the section connecting the checkpoint of the i-th record and the checkpoint of the (i+1)-th record; Δs is the difference between the times of the (i+1)-th and i-th records; and d_d is the length of section d.

The average of the speed values of each section within a period is the section speed for that period:

v_dj = (∑v_di) / h

where v_dj is the average speed of section d in the j-th period, ∑v_di is the sum of all speed values v_di of section d in the j-th period, and h is the number of values v_di in that period. Storing the section number d, the period j, and the section average speed v_dj in the database gives the speed of each section in each period.

(3) Because the original checkpoint data are incomplete, the calculated section flows and speeds are incomplete as well. If the flow or speed of a section is missing for some period, the flow or speed of an adjacent period of the same section can be used as an approximate substitute.
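As a rough sketch of steps (2) and (3), the Python below averages per-pass speeds into section speeds per period and substitutes the nearest available period when a value is missing. The function names, the tuple layout, the sample numbers, and the tie-break (earlier period first) are assumptions made for the illustration; times are seconds since midnight.

```python
from collections import defaultdict

def section_speeds(passes, lengths, period_s=600):
    """Average speed v_dj per (section d, period j).

    passes: iterable of (section, enter_time_s, exit_time_s), one per pair of
            consecutive records of the same trip.
    lengths: dict mapping section -> length d_d in metres.
    """
    samples = defaultdict(list)
    for section, t_in, t_out in passes:
        dt = t_out - t_in
        if dt <= 0:
            continue  # skip unusable pairs instead of dividing by zero
        j = int(t_in // period_s)  # assumed: a pass belongs to the period of its first record
        samples[(section, j)].append(lengths[section] / dt)  # v_di = d_d / Δs
    return {key: sum(v) / len(v) for key, v in samples.items()}

def fill_from_neighbour(values, section, j, n_periods):
    """Value for (section, j), or the nearest period's value if it is missing."""
    for offset in range(n_periods):
        for cand in (j - offset, j + offset):  # earlier period wins a tie
            if (section, cand) in values:
                return values[(section, cand)]
    return None

# Hypothetical data: one 1000 m section, two passes in period 0, one in period 2.
lengths = {1: 1000.0}
passes = [(1, 0, 100), (1, 50, 175), (1, 1300, 1400)]
speeds = section_speeds(passes, lengths)
filled = fill_from_neighbour(speeds, 1, 1, n_periods=144)  # period 1 has no data
```

The same `fill_from_neighbour` helper applies unchanged to a flow dictionary keyed by (section, period).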

Step two: completing the missing data

If the positions shown by a trip's records appear in sequence at adjacent checkpoints of the road network, that is, the trajectory has no breakpoints, the trip has no missing data and can be used directly to extract road traffic state information. Trip A in Figure 2 is such a trip. If the positions shown by a trip's records cannot be connected in sequence in the road network, that is, the trajectory has breakpoints, the trip has missing data and can be used only after the data at every breakpoint have been filled in. Trip B in Figure 2 is an example: the record at checkpoint 026 is missing, so the trajectory shown by the trip data runs from 028 directly to 186. Traverse the data of every trip, extract the position information, determine whether the trajectory has breakpoints, and extract the trips with missing data for completion.

2.1 Calculating the positions of the missing points

For trips with missing data, the trajectory of the missing part is calculated with an optimized depth-first traversal. The specific steps are as follows:

(1) Use depth-first traversal to find all possible paths between the start and end points of the missing part, and screen them a first time: keep the shortest path and every path whose number of checkpoints exceeds that of the shortest path by at most 3. Let the set of possible paths for the missing part of a trip be R = {r₁, r₂, …, r_n}, where r_i is the i-th path in the set and n is the total number of possible paths that satisfy the condition.

(2) Select path length, path average speed, number of road sections, and path average flow as the evaluation indicators.

Path length is the sum of the lengths of the sections on each possible path, written as L = {l₁, l₂, …, l_n}^T, where l_i (i∈n) is the sum of the lengths of the sections on the i-th possible path, that is, l_i = ∑d_d.

Path average speed is the average, over the corresponding period, of the speeds of the sections on each possible path, written as V = {v₁, v₂, …, v_n}^T, where v_i (i∈n) is the mean of the section average speeds v_dj over the sections on the i-th possible path, that is, v_i = (∑v_dj) / c_i.

Number of road sections is the number of sections on each possible path, written as C = {c₁, c₂, …, c_n}^T, where c_i (i∈n) is the number of sections on the i-th possible path.

Path average flow is the average, over the corresponding period, of the flows of the sections on each possible path, written as Q = {q₁, q₂, …, q_n}^T, where q_i (i∈n) is the mean of the section flows q_dj over the sections on the i-th possible path, that is, q_i = (∑q_dj) / c_i.

The weights of the evaluation indicators are determined by the entropy weight method. First, the initial matrix is formed with the four indicator vectors as its columns:

X = [L, V, C, Q], whose i-th row is (l_i, v_i, c_i, q_i)

(3) Of the four evaluation indicators, path average speed and path average flow are higher-is-better indicators, while path length and number of sections are lower-is-better indicators. The indicators should all point in the same direction, so the reciprocal method is used to convert the lower-is-better indicators into higher-is-better ones. The transformed matrix is:

Y = [1/L, V, 1/C, Q], whose i-th row is (1/l_i, v_i, 1/c_i, q_i)

(4) Normalize matrix Y: the ratio of each element y_ij of a column vector of Y to the sum of all elements of that column is the corresponding element of the normalized matrix Z:

z_ij = y_ij / ∑_{i=1}^{n} y_ij

(5) Determine the entropy value H(x_j) (j = 1, 2, 3, 4) of each evaluation indicator:

H(x_j) = -k ∑_{i=1}^{n} z_ij ln z_ij

where k is the adjustment coefficient and z_ij is the standardized value of the j-th evaluation indicator of the i-th evaluation unit, that is, the element in row i, column j of matrix Z.

(6) Convert the entropy values of the evaluation indicators into weights:

d_j = (1 - H(x_j)) / (m - ∑_{j=1}^{m} H(x_j))

where d_j is the weight of the evaluation indicator in column j, with 0 ≤ d_j ≤ 1, and m is the number of evaluation indicators, that is, m = 4.

(7) Determine the entropy-weighted comprehensive evaluation value of each possible path by multiplying each indicator's weight by the corresponding standardized indicator value and summing:

U_i = ∑_{j=1}^{m} d_j z_ij

where U_i is the entropy-weighted comprehensive evaluation value of the i-th possible path. The path with the largest value is selected as the trajectory of the missing part of the data.
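Steps (2) to (7) amount to a small scoring routine, sketched below in plain Python. Several details are assumptions rather than statements from the source: the adjustment coefficient is taken as k = 1/ln(n), the weights use the common form d_j = (1 - H_j) / (m - ΣH_j), and the function name and candidate values are invented for the example.

```python
import math

def entropy_weight_scores(X, low_is_better):
    """Score candidate paths with the entropy weight method.

    X: n rows (paths) by m columns (indicators), all values positive, n >= 2.
    low_is_better: per-column flags; flagged columns are inverted (1/x).
    Returns (weights d_j, scores U_i).
    """
    n, m = len(X), len(X[0])
    # Reciprocal transform so every indicator is "higher is better".
    Y = [[1.0 / x if low_is_better[j] else float(x) for j, x in enumerate(row)]
         for row in X]
    # Column-wise normalisation: z_ij = y_ij / sum_i y_ij.
    col_sum = [sum(Y[i][j] for i in range(n)) for j in range(m)]
    Z = [[Y[i][j] / col_sum[j] for j in range(m)] for i in range(n)]
    # Entropy per indicator; k = 1/ln(n) is an assumed choice of the coefficient.
    k = 1.0 / math.log(n)
    H = [-k * sum(Z[i][j] * math.log(Z[i][j]) for i in range(n) if Z[i][j] > 0)
         for j in range(m)]
    # Assumed standard weight form; fall back to equal weights if all columns are uniform.
    denom = m - sum(H)
    d = [(1.0 - h) / denom for h in H] if denom > 0 else [1.0 / m] * m
    U = [sum(d[j] * Z[i][j] for j in range(m)) for i in range(n)]
    return d, U

# Hypothetical candidates: [length in m, mean speed in m/s, section count, mean flow]
X = [[1200, 9.5, 3, 40],
     [1500, 8.0, 3, 35],
     [2600, 7.0, 5, 20]]
weights, scores = entropy_weight_scores(X, [True, False, True, False])
best = max(range(len(scores)), key=scores.__getitem__)  # index of the chosen path
```

An indicator whose values barely differ between candidates has entropy close to 1 and so receives a weight close to 0, which is the intended behaviour: it does not help to discriminate between paths.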

2.2 Calculating the times of the missing points

Lagrange polynomial interpolation is used to calculate the time information of the track points in the missing part of the data, after which the section flow, section speed, and section average traffic density can be obtained. The specific steps are as follows:

(1) Let point O be the last checkpoint record point before the missing part of the trajectory and point D the first checkpoint record point after it. If there are data before point O, let point A be the checkpoint record point preceding O; if there are none, point A coincides with point O. Let the distance from O to A be x₀ and the distance from D to A be x₁, and let the set of distances from the track points of the missing part to A be X = {x₁, x₂, …, x_d}, where x_u is the distance from the u-th point to A. Convert the times at O and D into timestamps, denoted y₀ and y₁ respectively.

(2) Use the Lagrange interpolation polynomial to calculate the time value of each track point in the missing part, giving the set of time values P = {p₁, p₂, …, p_d}, where p_u is the time value of the u-th point. With the two nodes (x₀, y₀) and (x₁, y₁), the Lagrange interpolation polynomial is:

p_u = y₀ (x_u - x₁) / (x₀ - x₁) + y₁ (x_u - x₀) / (x₁ - x₀)

where 1 ≤ u ≤ d and x_u ∈ X; in this formula x₀ and x₁ are the distances of O and D from A.

(3) Convert the time values in set P back into the time format of the original data, and add the checkpoint number, license plate number, passing time, and other information of each corresponding track point to the vehicle's trip data to complete the trip. Merge the completed data with the data that had no gaps for use in the next step.
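The interpolation step can be illustrated with a generic Lagrange evaluator. With only the two nodes O and D it reduces to linear interpolation of time against distance. The function name, distances, and times below are hypothetical values chosen so the arithmetic is easy to follow.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # basis polynomial factor
        total += term
    return total

# Hypothetical gap: O is 400 m and D is 1600 m from the reference point A.
x_nodes = [400.0, 1600.0]      # distances of O and D from A
y_nodes = [1000.0, 1120.0]     # passing times at O and D, in seconds
missing_x = [700.0, 1000.0]    # distances of the two missing checkpoints from A
missing_t = [lagrange_interpolate(x_nodes, y_nodes, x) for x in missing_x]
```

In this example the vehicle covers 1200 m in 120 s, so the checkpoint 300 m past O is assigned a time 30 s after O, and the one 600 m past O a time 60 s after O.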

Step three: extracting road traffic state information

(1) Using a period t (for example 5 min or 10 min), divide the day into n periods. Traverse the data of each period, extract the checkpoint number k, license plate number m, passing time s, and other information of each record, and count the vehicles passing each checkpoint in each direction within each period. This gives the flow q_dj of each road section in each period, where d is the section number and j denotes the j-th period. Storing the section number d, the period j, and the section flow q_dj in the database gives the flow of each section in each period.

(3) Using period t, calculate the section average density: dividing the section flow in each period by the corresponding section average speed gives the section average density for that period:

k_dj = q_dj / v_dj

Storing the section number d, the period j, and the section density k_dj in the database gives the density of each section in each period.
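A short sketch of the density calculation follows. The source states only that density is flow divided by speed; the unit handling here (flow converted to vehicles per hour, speed to km/h, density in vehicles per km) is an assumption, as are the function name and the flow value of 30 vehicles.

```python
def section_density(flow, speed, period_s=600):
    """Density per (section, period) as flow / speed, in vehicles per km.

    flow: dict (d, j) -> vehicles counted in the period.
    speed: dict (d, j) -> mean speed in m/s.
    The unit conversion is an assumption; the source gives only flow / speed.
    """
    density = {}
    for key, q in flow.items():
        v = speed.get(key)
        if not v:  # missing or zero speed: leave the density undefined
            continue
        q_per_hour = q * 3600.0 / period_s
        v_kmh = v * 3.6
        density[key] = q_per_hour / v_kmh
    return density

# Hypothetical: 30 vehicles on section 1 in period 22 at a mean speed of 9.75 m/s.
flow = {(1, 22): 30, (2, 22): 12}
speed = {(1, 22): 9.75}          # no speed known for section 2
density = section_density(flow, speed)
```

Entries without a usable speed are skipped rather than guessed, so a neighbouring-period substitute can be applied first if one is wanted.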

The invention is suited to extracting urban road traffic state information. Working from incomplete checkpoint data, it uses an optimized depth-first traversal to find the lost track points and then Lagrange polynomial interpolation to obtain their time information, after which the road traffic state information can be extracted.

Brief description of the drawings

Figure 1 is a flowchart of the urban road traffic state information extraction proposed by the invention;

Figure 2 is a schematic diagram of the vehicle trip division proposed by the invention;

Figure 3 is a schematic diagram of the checkpoint positions in the example of the invention;

Figure 4 is a chart of the speed variation of 13 road sections in the example of the invention.

Detailed description

The invention is described in further detail below with reference to the drawings and an example.

The method proposed by the invention for extracting urban road traffic state information under incomplete checkpoint data conditions mainly comprises five sub-steps: data preprocessing, estimating section flow and speed, extracting the data with gaps, calculating the positions of the missing track points, and extracting the road traffic state information. The flow is shown in Figure 1. Specifically:

Step one: preprocessing of missing data

1.1 Vehicle trip division

Extract from the database, by checkpoint number, the checkpoint data of all record points in the road network. Remove abnormal data such as records whose license plate number could not be recognized, then remove duplicate records, that is, records whose checkpoint number, license plate number, and passing time are all identical. Arrange the remaining data by license plate number and assign a number to each vehicle, then sort each vehicle's data in time order. This yields all valid data ordered by vehicle number and time. Finally, divide the data into vehicle trips as follows:

(1) From the average time needed to drive through each road section in the road network under normal conditions, set the section travel time set T = {T₁, T₂, …, T_V}, where V is determined by the number of sections in the road network and T_u (u∈V) is the average travel time of the u-th section. The maximum value in the set is taken as the threshold for deciding whether a vehicle's records need to be divided into trips.

(2) In the data sorted by vehicle number and time, take each pair of adjacent records of a vehicle in turn and calculate their time difference. If the time difference is greater than the threshold, the vehicle is considered to have stopped between the two record points, and the data are split between them into two trips: the earlier record point and all data before it are treated as trip A, and the later record point and all data after it as trip B. If trip B still contains adjacent records whose time difference exceeds the threshold, trip B is divided further in the same way.

(3) After division, some trips may contain only one record. Remove such trips and assign increasing numbers to the remaining trips to record the division.
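The cleaning that precedes trip division (dropping unrecognized plates, removing exact duplicates, numbering vehicles, sorting by time) could look like the following sketch. The record layout, the markers used for an unreadable plate, and the plate strings "P0" and "P1" are assumptions for illustration.

```python
def preprocess(records, unreadable=("", "unknown")):
    """Clean raw checkpoint records and group them by vehicle.

    records: iterable of (checkpoint_id, plate, timestamp) tuples, where the
             timestamp is an ISO-style string that sorts chronologically.
    Returns {vehicle_id: [records sorted by time]}.
    """
    seen, by_plate = set(), {}
    for rec in records:
        checkpoint, plate, ts = rec
        if plate in unreadable:   # abnormal: plate not recognised
            continue
        if rec in seen:           # duplicate: all three fields identical
            continue
        seen.add(rec)
        by_plate.setdefault(plate, []).append(rec)
    vehicles = {}
    for vehicle_id, plate in enumerate(sorted(by_plate), start=1):
        vehicles[vehicle_id] = sorted(by_plate[plate], key=lambda r: r[2])
    return vehicles

# Hypothetical raw records: one duplicate and one unreadable plate.
raw = [
    ("533", "P1", "2017-06-01 11:30:47"),
    ("035", "P1", "2017-06-01 11:28:15"),
    ("035", "P1", "2017-06-01 11:28:15"),   # exact duplicate
    ("054", "",   "2017-06-01 11:29:00"),   # plate not recognised
    ("026", "P0", "2017-06-01 09:00:00"),
]
vehicles = preprocess(raw)
```

Each vehicle's sorted list can then be passed to the trip division of steps (1) to (3).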

This example takes the records of a vehicle with license plate number "皖CXXXXX" on 2017-06-01 to illustrate the data format after trip division. The data contain the following information:

Table 1: Trip division for vehicle "皖CXXXXX"

| Checkpoint No. | License plate number | Passing time | Trip number |
|---|---|---|---|
| 035 | 皖CXXXXX | 2017-06-01 11:28:15 | 1 |
| 533 | 皖CXXXXX | 2017-06-01 11:30:47 | 1 |
| 054 | 皖CXXXXX | 2017-06-01 20:34:54 | 2 |
| 026 | 皖CXXXXX | 2017-06-01 20:39:54 | 2 |
| 343 | 皖CXXXXX | 2017-06-01 21:35:20 | 3 |
| 024 | 皖CXXXXX | 2017-06-01 21:39:27 | 3 |
| 187 | 皖CXXXXX | 2017-06-01 21:42:10 | 3 |

The trip numbers 1, 2, and 3 denote the vehicle's first, second, and third trips. Each trip contains the numbers of the checkpoint monitoring devices and the times at which the records were made.

1.2 Calculation of some evaluation indicators

Because later calculations need the flow and speed of the road sections, the flow and speed of each section are calculated first. Since the checkpoint data are incomplete, the calculated flow and speed information will inevitably have gaps, which must be filled approximately. The specific calculation is as follows:

(1) Divide the day into 144 periods of 10 minutes. Using the time information in each record, traverse the data of each 10-minute period and extract the checkpoint number k, license plate number m, passing time s, and other information of each record. A vehicle generates one passing record each time it passes a checkpoint, so to calculate flow it suffices to count, within each 10-minute period, the pairs of records that pass two adjacent checkpoints consecutively; this count is the flow q_dj of the section between the two checkpoints. If a vehicle's record at the first checkpoint of a section falls in one 10-minute period and its record at the second checkpoint falls in the next, the pass is counted in the earlier period. This gives the flow of each section in each 10-minute period. Finally, store the section number d, the period j, and the flow q_dj in the database for later calculations.

(2) Still using 10-minute periods, calculate the section speed. Read the checkpoint number k, license plate number m, passing time s, and other information trip by trip, calculate the time difference Δs between each pair of adjacent records, and store it in seconds; the time difference of the last record of each trip is recorded as 0. Dividing the length of the section between two adjacent records by their time difference gives one speed value v_di for that section in the 10-minute period containing the passing time. The average v_dj of the speed values of each section in each 10-minute period is the section speed for that period. Store the speed v_dj in the database by section and period j for later calculations.

(3) Because the original checkpoint data are incomplete, the calculated section flows and speeds are incomplete as well. If the flow or speed of a section is missing for some 10-minute period, the flow or speed of an adjacent period of the same section can be used as an approximate substitute.
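The flow count in step (1), including the rule that a pass straddling two periods belongs to the earlier one, can be sketched as follows. The mapping `section_of`, the function name, and the sample passes are assumptions; times are seconds since midnight, and checkpoints 175 and 442 are borrowed from the example section purely as labels.

```python
from collections import Counter

def section_flows(trips, section_of, period_s=600):
    """Count vehicles per (section, period).

    trips: list of trips, each a time-sorted list of (checkpoint, time_s).
    section_of: dict mapping (from_checkpoint, to_checkpoint) -> section number.
    A pass is attributed to the period of its first record.
    """
    flows = Counter()
    for trip in trips:
        for (cp_a, t_a), (cp_b, _t_b) in zip(trip, trip[1:]):
            section = section_of.get((cp_a, cp_b))
            if section is None:
                continue  # checkpoints not adjacent: a gap to be filled in step two
            flows[(section, int(t_a // period_s))] += 1
    return flows

# Hypothetical passes over one section between checkpoints 175 and 442.
section_of = {("175", "442"): 1}
trips = [
    [("175", 30), ("442", 120)],    # entirely inside period 0
    [("175", 590), ("442", 650)],   # straddles the boundary, counted in period 0
    [("175", 700), ("442", 800)],   # period 1
]
flows = section_flows(trips, section_of)
```

Pairs of records that do not correspond to a single section are ignored here, which is why the flows from incomplete data undercount until the gaps have been filled.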

This example takes the speed of road section 1 from "2017-06-01 03:40:00" to "2017-06-01 04:10:00" to illustrate the data format of the section speed and how it is calculated. The data are shown in the following table:

Table 2: Road section speeds

| Section number | Section speed (m/s) | Period start | Period end |
|---|---|---|---|
| 1 | 9.750 | 2017-06-01 03:40:00 | 2017-06-01 03:50:00 |
| 1 | 9.243 | 2017-06-01 03:50:00 | 2017-06-01 04:00:00 |
| 1 | 9.180 | 2017-06-01 04:00:00 | 2017-06-01 04:10:00 |

The section speed is in m/s. The up-direction checkpoint of section 1 is numbered 175 and the down-direction checkpoint 442. To calculate the speed for the first period in this example, first find all records that pass checkpoints 175 and 442 consecutively, in that order, with a passing time at checkpoint 175 between "2017-06-01 03:40:00" and "2017-06-01 03:50:00". Divide the length of section 1 by the time difference of each such pair of records and average the results; this is the average speed of section 1 in the 10-minute period from "2017-06-01 03:40:00" to "2017-06-01 03:50:00".

Step two: completing the missing data

Traverse the data of every trip. If the checkpoint numbers of every two adjacent records of a trip equal, in order, the up-direction and down-direction checkpoint numbers of some road section, the trip information is complete: plotting its checkpoint positions on a map in recorded time order joins them along the roads into one complete trajectory. Such data have no missing information and can be used directly to extract road state information. If a trip has two adjacent records whose checkpoint numbers are not the numbers of the two checkpoints joined by a single section, the data have a gap. Extract the start and end checkpoint numbers of the missing part in preparation for calculating the checkpoint numbers of the track points within it.
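The completeness check described above is a scan over consecutive checkpoint pairs. In this sketch the adjacency set is a hypothetical fragment of a road network (the checkpoint labels are borrowed from the text, but which pairs are actually joined by a section is an assumption), and `find_gaps` is an illustrative name.

```python
def find_gaps(trip, adjacent):
    """Return (start, end) checkpoint pairs where the trip's trajectory breaks.

    trip: time-ordered list of checkpoint numbers for one trip.
    adjacent: set of (from_checkpoint, to_checkpoint) pairs joined by one section.
    """
    return [(a, b) for a, b in zip(trip, trip[1:]) if (a, b) not in adjacent]

# Assumed directed adjacency for a small part of a network.
adjacent = {("028", "026"), ("026", "186"), ("186", "023")}

complete_trip = ["028", "026", "186"]
broken_trip = ["028", "186", "023"]      # the record at 026 is missing

gaps_complete = find_gaps(complete_trip, adjacent)
gaps_broken = find_gaps(broken_trip, adjacent)
```

An empty result means the trip can be used as it is; each returned pair gives the start and end checkpoints between which candidate paths have to be searched.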

2.1 Calculating the positions of the missing points

(1) Given the checkpoint numbers of the start and end points, their positions are known. Using the connectivity of the road network, depth-first traversal finds all possible paths from the start point to the end point. These paths are then screened a first time, keeping the shortest path and every path whose number of checkpoints differs from that of the shortest path by at most 3. Because the urban road network chosen in this example is a grid, the screening can be simplified: keep the path that passes the fewest checkpoints and every path whose checkpoint count exceeds that minimum by at most 3. The missing part of trip 3 in Table 1 is used to illustrate the calculation of the missing point positions. The data between checkpoints 343 and 024 is missing. After depth-first traversal and the first screening, the set of possible paths is R = {r₁, r₂, …, r_n}, where r_i is the i-th path in the set and n is the total number of qualifying possible paths. Four possible paths remain after screening in this example, so n = 4. Path r₁ passes checkpoints 343, 344, 345, 024; r₂ passes 343, 344, 028, 024; r₃ passes 343, 026, 028, 024; r₄ passes 343, 023, 186, 026, 028, 024. The positions of these checkpoints are shown in Figure 3.
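The traversal and first screening in step (1) can be sketched as below. The directed adjacency `ROAD_NET` is a hypothetical network constructed only so that the four paths of this example exist; the real road network of the embodiment is not given in the text. The names `all_paths` and `screen` are likewise illustrative.

```python
from typing import Dict, List

# Hypothetical directed adjacency (checkpoint -> downstream checkpoints).
ROAD_NET: Dict[str, List[str]] = {
    "343": ["344", "026", "023"],
    "344": ["345", "028"],
    "345": ["024"],
    "026": ["028"],
    "028": ["024"],
    "023": ["186"],
    "186": ["026"],
    "024": [],
}

def all_paths(net: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Enumerate every simple path from start to goal by depth-first traversal."""
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()          # LIFO stack gives depth-first order
        if node == goal:
            paths.append(path)
            continue
        for nxt in net.get(node, []):
            if nxt not in path:           # never revisit a checkpoint on the same path
                stack.append((nxt, path + [nxt]))
    return paths

def screen(paths: List[List[str]], slack: int = 3) -> List[List[str]]:
    """Keep paths whose checkpoint count exceeds the minimum by at most `slack`."""
    if not paths:
        return []
    fewest = min(len(p) for p in paths)
    return [p for p in paths if len(p) - fewest <= slack]

candidates = screen(all_paths(ROAD_NET, "343", "024"))
for p in sorted(candidates, key=len):
    print(" -> ".join(p))
```

On this assumed network the screening keeps all four paths, since the longest (six checkpoints) exceeds the shortest (four checkpoints) by only two.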

(2) Path length, path average speed, number of sections, and path average flow are chosen as evaluation indicators for selecting, from the set of possible paths, the most suitable path as the trajectory of the missing part. Path length is the sum of the lengths of the sections on each possible path, written as the set L = {l₁, l₂, …, l_n}^T; in this example L = {2147, 2169, 1950, 2725}^T. Path average speed is the average, over the sections on each possible path, of the section speed in the corresponding period, written V = {v₁, v₂, …, v_n}^T; for the period from "2017-06-01 03:30:00" to "2017-06-01 03:40:00" the computed values are V = {7.095, 6.044, 5.786, 6.104}^T. Number of sections is the number of sections on each possible path, written C = {c₁, c₂, …, c_n}^T; in this example C = {3, 3, 3, 5}^T. Path average flow is the average, over the sections on each possible path, of the section flow in the corresponding period, written Q = {q₁, q₂, …, q_n}^T; for the period from "2017-06-01 03:30:00" to "2017-06-01 03:40:00" the computed values are Q = {104, 60, 51, 50}^T. The weights of the evaluation indicators are determined by the entropy weight method. First the initial matrix is formed; substituting the data gives:
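With one row per possible path and the columns taken in the order L, V, C, Q, the initial matrix can be written as below. The symbol X and the column order are assumptions; the entries are the values of L, V, C and Q listed in step (2).

```latex
X =
\begin{pmatrix}
2147 & 7.095 & 3 & 104 \\
2169 & 6.044 & 3 & 60 \\
1950 & 5.786 & 3 & 51 \\
2725 & 6.104 & 5 & 50
\end{pmatrix}
```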

(3) Of the four evaluation indicators (path length, path average speed, number of sections, path average flow), path average speed and path average flow are higher-is-better indicators: higher values favor vehicle passage. Path length and number of sections are lower-is-better indicators: lower values favor vehicle passage. Different indicators should share the same trend, so the reciprocal method converts the lower-is-better indicators into higher-is-better ones: each lower-is-better value is replaced by its reciprocal, higher-is-better values are left unchanged, and the results are written into matrix Y at the corresponding positions. The transformed matrix is:
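Applying the reciprocal rule to the path length and number-of-sections columns of the example data gives a matrix of the following form. The column order L, V, C, Q is an assumption; the entries follow directly from the values in step (2).

```latex
Y =
\begin{pmatrix}
1/2147 & 7.095 & 1/3 & 104 \\
1/2169 & 6.044 & 1/3 & 60 \\
1/1950 & 5.786 & 1/3 & 51 \\
1/2725 & 6.104 & 1/5 & 50
\end{pmatrix}
```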

(4) Normalize matrix Y: the ratio of each element y_ij of a column vector of Y to the sum of all elements of that column is the corresponding element of the normalized matrix Z. The normalized matrix obtained in this example is:
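Written out, the normalization divides each entry by its column sum. The numerical entries below are recomputed here from the example data (column order L, V, C, Q assumed) and rounded to four decimals; they are offered as a check rather than as the original figures.

```latex
z_{ij} = \frac{y_{ij}}{\sum_{i=1}^{n} y_{ij}},
\qquad
Z \approx
\begin{pmatrix}
0.2578 & 0.2835 & 0.2778 & 0.3925 \\
0.2552 & 0.2415 & 0.2778 & 0.2264 \\
0.2839 & 0.2312 & 0.2778 & 0.1925 \\
0.2031 & 0.2439 & 0.1667 & 0.1887
\end{pmatrix}
```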

(5) Determine the entropy value H(x_j) (j = 1, 2, 3, 4) of each evaluation indicator; the formula is:
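The entropy weight method is commonly written in the form below. It is given here as an assumption, using the natural logarithm, with n the number of possible paths and k left unspecified as the adjustment coefficient.

```latex
H(x_j) = -k \sum_{i=1}^{n} z_{ij} \ln z_{ij}, \qquad j = 1, \dots, m
```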

where k is an adjustment coefficient and z_ij is the standardized value of the j-th evaluation indicator of the i-th evaluation unit, i.e. the element in row i, column j of matrix Z. In this example, substituting the elements of the matrix Z obtained in (4) into the formula gives:

H(x₁) = 0.3543, H(x₂) = 0.3554, H(x₃) = 0.3510, H(x₄) = 0.3530

(6) Convert the entropy values of the evaluation indicators into weights d_j (j = 1, 2, 3, 4), i.e. compute the weight of each evaluation indicator; the formula is:
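A common form of this conversion in the entropy weight method, stated here as an assumption, assigns larger weights to indicators with lower entropy and makes the weights sum to one:

```latex
d_j = \frac{1 - H(x_j)}{m - \sum_{l=1}^{m} H(x_l)},
\qquad
\sum_{j=1}^{m} d_j = 1
```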

where d_j is the weight of the evaluation indicator in column j, 0 ≤ d_j ≤ 1, and m is the number of evaluation indicators, i.e. m = 4. In this example, substituting the H(x_j) (j = 1, 2, 3, 4) obtained in (5) into the formula gives:

d₁ = 0.2487, d₂ = 0.2482, d₃ = 0.2510, d₄ = 0.2530

(7) Determine the entropy-weighted comprehensive evaluation value of each possible path: multiply the weight of each indicator by the corresponding standardized indicator value and sum the products. The formula is:
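Written as a formula, with z_ij the elements of the normalized matrix Z and m = 4 indicators, the weighted sum described in this step corresponds to:

```latex
U_i = \sum_{j=1}^{m} d_j \, z_{ij}, \qquad i = 1, \dots, n
```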

where U_i is the entropy-weighted comprehensive evaluation value of the i-th possible path. The path with the largest value is chosen as the trajectory of the missing part. In this example, substituting d_j (j = 1, 2, 3, 4) from (6) and the corresponding elements of matrix Z into the formula gives:

U₁ = 0.3029, U₂ = 0.2504, U₃ = 0.2467, U₄ = 0.2009

Since U₁ is the largest evaluation value, the first possible path in the set is chosen as the trajectory of the missing part; the position of each checkpoint follows from the checkpoint numbers on that path.
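Steps (3), (4) and (7) can be retraced numerically with the short sketch below. It takes the indicator values of step (2) and the weights printed in step (6) as given; the helper `normalize` and the variable names are illustrative assumptions. The resulting scores agree with the printed U values to within about 0.001.

```python
L = [2147, 2169, 1950, 2725]           # path length of r1..r4
V = [7.095, 6.044, 5.786, 6.104]       # path average speed
C = [3, 3, 3, 5]                       # number of sections
Q = [104, 60, 51, 50]                  # path average flow
D = [0.2487, 0.2482, 0.2510, 0.2530]   # weights d1..d4 as printed in step (6)

def normalize(column):
    """Divide every entry by the column sum so the column adds up to 1."""
    total = sum(column)
    return [value / total for value in column]

# Reciprocal transform for the lower-is-better indicators (L and C),
# then column-wise normalization: these are the columns of matrix Z.
z_columns = [
    normalize([1 / x for x in L]),
    normalize(V),
    normalize([1 / x for x in C]),
    normalize(Q),
]

# U_i = sum over indicators j of d_j * z_ij
U = [sum(d * col[i] for d, col in zip(D, z_columns)) for i in range(len(L))]
best = max(range(len(U)), key=U.__getitem__)
print([round(u, 4) for u in U], "-> choose path r%d" % (best + 1))
```

Computing the weights from the entropy values is left out of the sketch because the adjustment coefficient k is not specified numerically in the text.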

2.2 Calculating the times of the missing points

After the positions of the missing points are obtained, the data is still incomplete; the time information must also be determined before the data can be used. This example uses Lagrange polynomial interpolation to compute the time information of the trajectory points in the missing part, in preparation for the later calculation of section flow, section speed, and section average traffic density. The steps are as follows:

(1) Let the checkpoint record immediately before the missing part of the trajectory be point O; in this example O is the start of the possible path, i.e. checkpoint 343. Let the checkpoint record immediately after the missing part be point D; in this example D is the end of the possible path, i.e. checkpoint 024. Point A is the checkpoint record preceding O when one exists; this trip has no data before point O, so point A coincides with point O. Let x₀ be the distance from O to A, so x₀ = 0, and x₁ the distance from D to A, so x₁ = 2147 m. The distances from the trajectory points of the missing part to A form the set J = {j₁, j₂, …, j_d}, where j_u (u ∈ [1, d]) is the distance from the u-th point to A. In this example J = {653, 1182}, the distances from each missing checkpoint on the chosen path to A. The times at O and D are converted to seconds and denoted y₀ and y₁; here they are converted to timestamps, giving y₀ = 1496324120 s and y₁ = 1496324367 s.

(2) The Lagrange interpolation polynomial gives the time value of each trajectory point in the missing part, yielding the set of timestamps P = {p₁, p₂, …, p_d}, where p_u (u ∈ [1, d]) is the timestamp of the u-th point. The Lagrange interpolation polynomial is:
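With only the two known nodes (x₀, y₀) and (x₁, y₁), the Lagrange polynomial is of first degree. The standard two-node form is given here as an assumption; the function name t(x) is introduced only for this illustration, and x_u denotes the distance of the u-th missing point from A.

```latex
t(x) = y_0 \, \frac{x - x_1}{x_0 - x_1} + y_1 \, \frac{x - x_0}{x_1 - x_0},
\qquad
p_u = t(x_u)
```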

where 1 ≤ u ≤ d and x_u ∈ X, with X the set of distances from the missing trajectory points to point A (written J in step (1)). Substituting the values of this example into the formula gives:

P = {1496324195, 1496324255}
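The interpolation of step (2) can be reproduced with a general Lagrange evaluation, sketched below. The function name `lagrange` is illustrative, and truncating the result to whole seconds is an assumption chosen because it reproduces the printed values of P.

```python
import math

def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = float(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)   # basis polynomial factor
        total += term
    return total

xs = [0, 2147]                   # x0, x1: distances of O and D from A, in metres
ys = [1496324120, 1496324367]    # y0, y1: timestamps at O and D, in seconds
J = [653, 1182]                  # distances of the missing checkpoints from A

P = [math.floor(lagrange(xs, ys, x)) for x in J]   # assumption: truncate to whole seconds
print(P)
```

With two nodes this is linear interpolation along the path, which amounts to assuming a constant travel speed between O and D.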

(3) Convert the timestamps in P back to the time format of the original data: p₁ is 2017-06-01 21:36:35 and p₂ is 2017-06-01 21:37:35. Adding the checkpoint number, license plate number, passing time and other information of the corresponding trajectory points to the vehicle's trip data completes the trip. The completed trip data of this example is shown in Table 3. Finally, the completed trip data is merged with the trip data that had nothing missing, for use in the next step.

Table 3 Information for trip 3 after completion

| Checkpoint No. | License plate number | Passing time | Trip number |
|---|---|---|---|
| 343 | 皖CXXXXX | 2017-06-01 21:35:20 | 3 |
| 344 | 皖CXXXXX | 2017-06-01 21:36:35 | 3 |
| 345 | 皖CXXXXX | 2017-06-01 21:37:35 | 3 |
| 024 | 皖CXXXXX | 2017-06-01 21:39:27 | 3 |
| 187 | 皖CXXXXX | 2017-06-01 21:42:10 | 3 |
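The conversion of the interpolated timestamps back to the record format can be sketched as follows. The helper `to_record_time` is illustrative, and the use of UTC+8 is an assumption; it is the offset under which the timestamps of this example match the passing times in Table 3.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the passing times are recorded in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))

def to_record_time(timestamp: int) -> str:
    """Format a Unix timestamp in the 'YYYY-MM-DD HH:MM:SS' style of the records."""
    return datetime.fromtimestamp(timestamp, tz=CST).strftime("%Y-%m-%d %H:%M:%S")

P = [1496324195, 1496324255]     # interpolated timestamps for checkpoints 344 and 345
times = [to_record_time(p) for p in P]
print(times)
```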

Once the incomplete data has been completed and merged with the data that had nothing missing, the flow, speed, and density of each road section can be computed as follows:

(1) Divide the day into 144 periods of 10 minutes. Using the time information in each record, traverse the data of each 10-minute period and extract the checkpoint number, license plate number, passing time and other information of each record. A vehicle generates one passing record each time it passes a checkpoint, so the flow is obtained by counting, for each 10-minute period, the number of records that pass two given adjacent checkpoints consecutively; this count is the flow of the section joining those two checkpoints. If a vehicle's record at the first checkpoint of a section falls in one 10-minute period and its record at the second checkpoint falls in the next, the passage is counted in the earlier period. Computing in this way gives the flow of every section in every 10-minute period; storing the section number, period, and flow in the database gives the flow of each section in each period.
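The counting rule of step (1) can be sketched as below, using the completed trip 3 of Table 3 as input. The function names, the `sections` set, and the in-memory `Counter` (in place of the database) are assumptions; the sketch also handles a single day only, since the period index ignores the date.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
PERIOD_S = 600  # one 10-minute period, in seconds

def period_index(stamp: str) -> int:
    """Index (0..143) of the 10-minute period of the day that contains stamp."""
    t = datetime.strptime(stamp, FMT)
    return (t.hour * 3600 + t.minute * 60 + t.second) // PERIOD_S

def section_flow(trips, sections):
    """Count passages per (section, period); the period is that of the first checkpoint."""
    flow = Counter()
    for trip in trips:
        for (cp_a, t_a), (cp_b, _t_b) in zip(trip, trip[1:]):
            if (cp_a, cp_b) in sections:
                flow[(cp_a, cp_b), period_index(t_a)] += 1
    return flow

trip_3 = [
    ("343", "2017-06-01 21:35:20"),
    ("344", "2017-06-01 21:36:35"),
    ("345", "2017-06-01 21:37:35"),
    ("024", "2017-06-01 21:39:27"),
    ("187", "2017-06-01 21:42:10"),
]
sections = {("343", "344"), ("344", "345"), ("345", "024"), ("024", "187")}
flow = section_flow([trip_3], sections)
print(dict(flow))
```

The passage from 024 to 187 starts in one period and ends in the next, so it is counted in the earlier period, as the rule above requires.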

(2) Section speed is likewise computed per 10-minute period. Read the checkpoint number, license plate number, passing time and other information trip by trip, compute the time difference between each pair of adjacent records and store it in seconds; the time difference for the last record of each trip is recorded as 0. Dividing the length of the section joining two adjacent records by their time difference gives one speed value for that section in the 10-minute period containing the passing time. Averaging the speed values of each section in each 10-minute period gives the section speed for that period. Storing the speed by section and period in the database gives the speed of each section in each period. In this example, 13 sections of differing road grades were selected and their speed variation over one day was plotted, as shown in Figure 4.
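The speed calculation of step (2) can be sketched in the same style. The section lengths are derived from the distances of this example (653 m, 1182 m and 2147 m from checkpoint 343); assigning each speed value to the period of the first of the two records is an assumption, as are the function and variable names.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def section_speeds(trips, lengths, period_s=600):
    """Average speed (m/s) per (section, period) from adjacent records of each trip."""
    samples = defaultdict(list)
    for trip in trips:
        for (cp_a, t_a), (cp_b, t_b) in zip(trip, trip[1:]):
            length = lengths.get((cp_a, cp_b))
            start = datetime.strptime(t_a, FMT)
            dt = (datetime.strptime(t_b, FMT) - start).total_seconds()
            if length is None or dt <= 0:
                continue                      # unknown section or unusable time difference
            slot = (start.hour * 3600 + start.minute * 60 + start.second) // period_s
            samples[(cp_a, cp_b), slot].append(length / dt)
    return {key: sum(vals) / len(vals) for key, vals in samples.items()}

# Section lengths in metres, from the cumulative distances 653, 1182 and 2147.
LENGTHS = {("343", "344"): 653, ("344", "345"): 529, ("345", "024"): 965}

trip_3 = [
    ("343", "2017-06-01 21:35:20"),
    ("344", "2017-06-01 21:36:35"),
    ("345", "2017-06-01 21:37:35"),
    ("024", "2017-06-01 21:39:27"),
    ("187", "2017-06-01 21:42:10"),
]
speeds = section_speeds([trip_3], LENGTHS)
for key, v in sorted(speeds.items()):
    print(key, round(v, 3))
```

The three values come out close to one another, as expected when the missing times were obtained by linear interpolation along the path.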

(3) Section average density is computed per 10-minute period: for each period, the section flow divided by the corresponding section average speed is the section average density of that period. Storing the section number, period, and section density in the database gives the density of each section in each period.
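The density step can be sketched as a division of the two earlier results. The text only states flow divided by speed; converting the 10-minute count to vehicles per second first, so that the result is in vehicles per metre, is an assumption of this sketch, as is the function name.

```python
def section_density(flow, speed, period_s=600):
    """Density per (section, period) as flow rate (veh/s) divided by speed (m/s)."""
    density = {}
    for key, count in flow.items():
        v = speed.get(key)
        if v:                                  # skip missing or zero speeds
            density[key] = (count / period_s) / v
    return density

# Hypothetical values: 60 vehicles in one 10-minute period at 10 m/s.
example = section_density({("s1", 0): 60}, {("s1", 0): 10.0})
print(example)
```

Periods with no speed value are skipped here; they could instead be filled from adjacent periods before the division.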

The above describes a specific embodiment of the invention and the principles it applies. Changes made to the invention within its conception, whose function does not go beyond the spirit covered by the description and drawings, still fall within the scope of protection of the invention.

Claims

1. A method for extracting urban road traffic state information under the condition of incomplete checkpoint data, characterized by comprising the following steps:

Step 1, preprocessing of incomplete data

1.1 Division of vehicle trips

Data preprocessing includes selecting from the database the checkpoint data of all recording points in the road network and removing abnormal and duplicate data; classifying this data by license plate number and assigning a number to each vehicle; and sorting the data of each vehicle in chronological order. Finally, the vehicle trips are divided as follows:

(1) From the average time required to traverse each section of the road network under normal driving, form the set of section travel times T = {T₁, T₂, …, T_V}; the maximum value T_u (u ∈ V) in the set is taken as the threshold.

(2) In the chronologically sorted data, if the time difference between two adjacent records of a vehicle exceeds the threshold, the vehicle is considered to have stopped between the two recording points, and the data must be split between them: the earlier of the two records and all records before it are regarded as trip A, and the later record and all records after it are regarded as trip B. If trip B still contains adjacent records whose time difference exceeds the threshold, trip B is divided further;

(3) Trips containing only one record are removed, and the remaining trips are numbered to record the trip division;

1.2 Calculation of section evaluation indicators

The flow and speed of each section are computed preliminarily, as the two evaluation indicators needed for the subsequent entropy weight method. The calculation is as follows:

(1) Divide one day into n periods of length t. Traverse the data in each period, extract the checkpoint number k, license plate number m, passing time s and other information of each record, and count the vehicles passing each checkpoint in each direction in each period. This gives the flow q_dj of each section in each period, where d is the section number and j denotes the j-th period;

(2) Section speed v_dj is likewise computed per period t. Read the checkpoint number k, license plate number m, passing time s and other information trip by trip, and compute the time difference between each pair of adjacent records, Δs = s_(i+1) − s_i. Dividing the corresponding section length by the time difference gives the speed, i.e. v_di = d_d / Δs,

where v_di is one speed value of the section d joining the checkpoint (number k) of the i-th record and the checkpoint (number k+1) of the (i+1)-th record, Δs is the difference between the time of the (i+1)-th record and the time of the i-th record, and d_d is the length of section d;

Averaging the speed values of each section over the data in each period gives the section speed for that period, i.e. v_dj = ∑ v_di / h,

where v_dj is the average speed of section d in the j-th period, ∑ v_di is the sum of all speed values v_di of section d in the j-th period, and h is the number of values v_di in the j-th period. Storing the section number d, period j, and section average speed v_dj in the database gives the speed of each section in each period;

(3) When the flow or speed of a section is missing for a period, it is approximated by the flow or speed of that section in an adjacent period.

Step 2, completing the missing data

If the positions shown by the data of a trip follow one another through adjacent checkpoints of the road network, i.e. the trajectory has no breakpoint, the trip data has nothing missing and is used directly to extract road traffic state information. If the positions shown by the data of a trip cannot be linked in order in the road network, i.e. the trajectory has breakpoints, the trip data is incomplete and can be used only after the data at each breakpoint has been completed. The data of each trip is traversed, its position information is extracted and its trajectory is checked for breakpoints, and the trip data with missing records is extracted for completion;

2.1 Calculating the positions of the missing points

For data with missing records, the trajectory of the missing part is computed by optimized depth-first traversal, as follows:

(1) Depth-first traversal finds all possible paths between the start and end points of the missing part, and all possible paths are screened a first time, keeping the shortest path and every path whose number of checkpoints differs from that of the shortest path by at most 3. Let the set of possible paths for the missing part of a trip be R = {r₁, r₂, …, r_n}, where r_i is the i-th path in the set and n is the total number of qualifying possible paths;

(2) Path length, path average speed, number of sections, and path average flow are selected as evaluation indicators. Path length is the sum of the lengths of the sections on each possible path, written as the set L = {l₁, l₂, …, l_n}^T, where l_i (i ∈ n) is the sum of the lengths of the sections on the i-th possible path, i.e. l_i = ∑ d_d. Path average speed is the average, in the corresponding period, of the speeds of the sections on each possible path, written V = {v₁, v₂, …, v_n}^T, where v_i (i ∈ n) is the average of the section average speeds v_dj over the sections on the i-th possible path, i.e. v_i = ∑ v_dj / c_i. Number of sections is the number of sections on each possible path, written C = {c₁, c₂, …, c_n}^T, where c_i (i ∈ n) is the number of sections on the i-th possible path. Path average flow is the average, in the corresponding period, of the flows of the sections on each possible path, written Q = {q₁, q₂, …, q_n}^T, where q_i (i ∈ n) is the average of the flows q_dj of the sections on the i-th possible path, i.e. q_i = ∑ q_dj / c_i. The weight of each evaluation indicator is determined by the entropy weight method; first the initial matrix is obtained:

(3) Of the four evaluation indicators above, section average speed and section flow are higher-is-better indicators, and section length and number of sections are lower-is-better indicators. The reciprocal method converts the lower-is-better indicators into higher-is-better ones; the converted matrix is:

(4) Matrix Y is normalized: the ratio of each element y_ij of a column vector of Y to the sum of all elements of that vector is the corresponding element of the normalized matrix Z; the normalized matrix is:

(5) Determine the entropy value H(x_j) (j = 1, 2, 3, 4) of each evaluation indicator; the formula is:

where k is an adjustment coefficient and z_ij is the standardized value of the j-th evaluation indicator of the i-th evaluation unit, i.e. the element in row i, column j of matrix Z;

(6) Convert the entropy values of the evaluation indicators into weights to obtain the weight of each evaluation indicator; the formula is:

where d_j is the weight of the evaluation indicator in column j, 0 ≤ d_j ≤ 1, and m is the number of evaluation indicators, i.e. m = 4.

(7) Determine the entropy-weighted comprehensive evaluation value: multiply the weight of each indicator by the corresponding standardized indicator value and sum the products; the formula is:

where U_i is the entropy-weighted comprehensive evaluation value of the i-th possible path; the path with the largest entropy-weighted comprehensive evaluation value is chosen as the trajectory of the missing part of the data.

2.2 Calculating the times of the missing points

Lagrange polynomial interpolation computes the time information of the trajectory points in the missing part of the data, after which section flow, section speed, and section average traffic density can be derived. The steps are as follows:

(1) Let the checkpoint record immediately before the missing part of the trajectory data be point O and the checkpoint record immediately after it be point D. If there is data before point O, let the checkpoint record preceding O be point A; if there is no data before O, point A coincides with point O. Let x₀ be the distance from O to A and x₁ the distance from D to A; the distances from the trajectory points of the missing part to A form the set X = {x₁, x₂, …, x_d}, where x_u is the distance from the u-th point to A. The times at O and D are converted to timestamps, denoted y₀ and y₁ respectively;

(2) The Lagrange interpolation polynomial gives the time value of each trajectory point in the missing part, yielding the set of time values P = {p₁, p₂, …, p_d}, where p_u is the time value of the u-th point; the Lagrange interpolation polynomial is:

where 1 ≤ u ≤ d and x_u ∈ X;

(3) Convert the time values in P to the time format of the original data, and add the checkpoint number, license plate number, passing time and other information of the corresponding trajectory points to the vehicle's trip data to complete the trip information. The completed data is merged with the data that had nothing missing, for use in the next step.

Step 3, extracting road traffic state information

(1) Divide one day into n periods of length t. Traverse the data in each period, extract the checkpoint number k, license plate number m, passing time s and other information of each record, and count the vehicles passing each checkpoint in each direction in each period. This gives the flow q_dj of each section in each period, where d is the section number and j denotes the j-th period. Storing the section number d, period j, and section flow q_dj in the database gives the flow of each section in each period.

where v_di is one speed value of the section d joining the checkpoint (number k) of the i-th record and the checkpoint (number k+1) of the (i+1)-th record, Δs is the difference between the time of the (i+1)-th record and the time of the i-th record, and d_d is the length of section d.

(3) Section average density is computed per period t: for each period, the section flow divided by the corresponding section average speed gives the section average density of that period, i.e. k_dj = q_dj / v_dj. Storing the section number d, period j, and section density k_dj in the database gives the density of each section in each period.

2. The method for extracting urban road traffic state information under the condition of incomplete checkpoint data according to claim 1, characterized in that, in 1.2 (3) of Step 1, if the computed flow or speed of a section is missing for a period, it may be approximated by the flow or speed of adjacent periods; specifically, if the periods immediately before and after the period with the missing value have data, the missing value is approximated by the average of those two adjacent periods; if values are missing for several consecutive periods, all of the intermediate periods with missing values are replaced by the average of the adjacent periods before and after them.

3. The method for extracting urban road traffic state information under the condition of incomplete checkpoint data according to claim 1, characterized in that, in 2.1 of Step 2, the optimized depth-first traversal combines depth-first traversal with the entropy weight method: depth-first traversal finds the possible missing paths, the entropy weight method then judges the quality of the paths, and the most probable trajectory of the missing part is obtained as the result of the optimized depth-first traversal.