Urban Road Traffic Anomaly Detection Method
Technical Field
The present invention belongs to the technical field of traffic detection. In particular, it relates to a method for real-time detection of urban road traffic anomalies. The on-board GNSS positioning devices of floating cars report the spatial position of each vehicle at different times. After data preprocessing, map matching and data fusion, the probability distribution of travel speed within a specific spatiotemporal range is obtained, and urban road traffic anomaly events can be effectively identified from changes in that speed distribution.
Background Art
Traffic anomaly event detection is an important part of urban traffic management and one of the core functions of intelligent transportation systems. Traffic anomaly events mainly include traffic accidents, vehicle breakdowns, cargo falling from trucks, damage to or failure of road traffic facilities, and other special events that disturb traffic flow. Such events readily cause congestion and reduce the capacity of road sections, and in severe cases they affect the normal operation of the entire road traffic system. With traffic anomaly event detection, traffic managers can learn of anomalies promptly and take appropriate guidance and control measures to reduce their adverse effects.
Traffic anomaly event detection can be manual or automatic. Manual approaches include patrol cars, emergency telephone reports and video surveillance; because they consume manpower and resources and have poor real-time performance, they cannot meet the needs of traffic management. Automatic approaches rely on Automatic Incident Detection (AID) algorithms, whose basic principle is to identify anomaly events by detecting changes in road traffic flow at different locations. Commonly used AID algorithms include pattern recognition algorithms (such as the California algorithm and the MONICA algorithm), statistical prediction algorithms (such as exponential smoothing and Kalman filtering), traffic flow model algorithms (such as the McMaster algorithm), and intelligent recognition algorithms (such as artificial neural networks and fuzzy logic).
However, current detection methods place high demands on facilities, have high computational complexity, and cannot further assess how an abnormal situation is developing. The present invention uses trajectory data returned by the on-board GNSS positioning devices of taxis and buses to build a historical traffic state database and a real-time traffic state database, and identifies traffic anomaly events by analyzing the differences between the traffic flow characteristics reflected by the two. The method offers good real-time performance, supports parallel processing, achieves a high recognition rate and places low demands on detection facilities. It is suitable for detecting urban road traffic anomaly events wherever real-time floating-car positioning data are available.
At present, representative technologies for monitoring traffic anomaly events include the following:
A US patent application, US 20160148512, discloses the composition and implementation of a system for detecting and reporting traffic anomaly events. The system consists of sensors, a communication module, a mobile processing module and a user interaction module. The sensors collect data about the vehicle's surroundings; the communication module sends the vehicle's own data and receives data from surrounding vehicles; the mobile processing module processes and analyzes the data of the relevant vehicles within a given area and generates traffic event reports; the user interaction module presents the traffic event reports to the user. The scheme is a traffic anomaly detection technology based on vehicle-to-vehicle and vehicle-to-infrastructure communication networks, and it can use the various kinds of information collected by the sensors to identify anomaly events. However, the sensors and communication units must be installed and commissioned separately, which makes deployment difficult; the processing capacity of the mobile processing unit is limited; and both mobile and fixed message receivers are required, so the system itself is prone to faults and its reliability is poor.
A Chinese patent application, CN 104809878 A, discloses a method for detecting abnormal states of urban road traffic using bus GPS data. The scheme derives a road-section delay time index from historical GPS data, derives the instantaneous speed, period-average speed, weighted moving-average speed and multi-vehicle average speed from current GPS data, and detects anomalies with a canonical variate analysis algorithm. The scheme requires no additional detection facilities and is easy to deploy. However, its characterization of the traffic situation is oversimplified, so the characteristics and causes of abnormal traffic conditions cannot be analyzed; its division of traffic scenarios lacks a principled basis; and it does not consider the influence of weather and other factors on changes in the traffic situation.
Summary of the Invention
To describe the invention more clearly, the technical terms involved are first explained as follows:
Floating car: also called a probe vehicle. Refers to buses and taxis that are equipped with on-board positioning devices and travel on urban roads.
GNSS: Global Navigation Satellite System. Includes GPS, GLONASS, GALILEO and the BeiDou Navigation Satellite System, among others.
Spatiotemporal sub-zone: a zone delimited along both the time and space dimensions, reflecting conditions within a given spatial extent during a given period. The day is divided into time slices, for example 0:00-0:10, 0:10-0:20, and so on; each time slice is called a temporal sub-zone. The area in which urban road traffic anomaly detection is carried out is divided into spatial slices, for example the area between longitudes 121.58° E and 121.59° E and latitudes 31.16° N and 31.17° N; each spatial slice is called a spatial sub-zone. The intersection of any temporal sub-zone with any spatial sub-zone is called a spatiotemporal sub-zone, for example the area between longitudes 121.58° E and 121.59° E and latitudes 31.16° N and 31.17° N during 0:00-0:10.
Historical trajectory data: trajectory data accumulated over a long period and stored in a database. Historical trajectory data change over time, so they must be updated promptly and periodically reprocessed and reanalyzed to keep the extracted historical traffic characteristics accurate. The data of each spatiotemporal sub-zone can be processed in parallel to improve efficiency. In this document they may be referred to simply as historical data.
Real-time trajectory data: the set of trajectory data within the time interval closest to the current moment. In this document they may be referred to simply as real-time data.
Traffic situation: a general term for the overall state of traffic operation within a given time and space.
Traffic anomaly: a disturbance of traffic flow caused by events such as traffic accidents, vehicle breakdowns, cargo falling from trucks, or damage to or failure of road traffic facilities.
Traffic anomaly severity: the severity of the traffic flow disturbance, that is, the difference between the traffic flow characteristics in the normal state and those after a traffic anomaly has occurred.
Traffic anomaly index: a measure of traffic anomaly severity. It ranges from 0 to 10; the larger the value, the more severe the anomaly.
Traffic environment: the sum of all external influences and forces acting on road traffic participants, including road conditions, traffic facilities, terrain and surface features, weather conditions, and the traffic activities of other participants.
Map matching: the process of associating geographic coordinates with the urban road network.
Peak-hour volume: the maximum hourly traffic volume observed at a cross-section of an urban road within one day.
Finite mixture model: a mathematical method that models a complex density as a combination of simple densities. A finite mixture model over the variable set $y$ with $K$ components can be written as $p(y) = \sum_{k=1}^{K} \pi_k \, p_k(y)$, where $p_k(y)$ is the density of the $k$-th component and the mixing weights $\pi_k$ are non-negative and sum to 1.
Response variable: a variable that changes as the independent variables change; also called the dependent variable.
Bayesian information criterion (BIC): under incomplete information, partially unknown states are estimated with subjective probabilities and the probabilities of occurrence are then corrected with Bayes' formula; the BIC is an index for evaluating the reliability of the result. It is computed as:
$\mathrm{BIC} = -2 \ln L + k \ln n$
where $L$ is the maximum value of the likelihood function, $k$ is the number of unknown parameters, and $n$ is the sample size.
Likelihood function: a function of the parameters of a statistical model. Given the observed output $x$, the likelihood function $L(\theta \mid x)$ of the parameter $\theta$ is numerically equal to the probability of the variable $X$ taking the value $x$ given $\theta$: $L(\theta \mid x) = P(X = x \mid \theta)$.
Parameter estimation: a method of estimating the unknown parameters of a population distribution from samples drawn from that population.
EM algorithm: the Expectation-Maximization algorithm, an iterative algorithm for maximum likelihood estimation or maximum a posteriori estimation in probabilistic parametric models with latent variables.
Kullback-Leibler divergence: a measure of the difference between two probability distributions P and Q.
Jensen-Shannon divergence: a symmetrized form of the Kullback-Leibler divergence.
K-Medoids algorithm: a clustering algorithm in which, at each iteration, the point of the current cluster whose summed distance to all other points in that cluster is smallest is chosen as the new center point.
The object of the present invention is to provide a scheme that, based on a floating-car trajectory recording system, uses historical and real-time GNSS positioning data together with traffic environment information to identify road traffic anomaly events. To achieve this object, the invention provides the following technical solution:
The prerequisites for implementing the invention are: floating cars (taxis, buses, etc.) equipped with GNSS trajectory recorders, and a data center capable of large-scale storage, computation and real-time task processing.
The invention is applicable to urban roads (both surface roads and elevated roads) traveled by such floating cars.
The implementation steps of the invention include:
1) Determine the spatiotemporal range of detection and establish the spatiotemporal sub-zones.
Based on the actual application requirements, determine the time range and spatial range over which traffic anomaly events are to be detected. The time range can be set to the whole day, i.e. 0:00-24:00, or to a specific period; for example, to detect traffic anomaly events between 17:00 and 20:00, the detection time range is set to 17:00-20:00. This is only one example; many other settings are possible and are not enumerated here. The spatial range can follow administrative divisions, for example Beijing, Shanghai or Huangpu District, or it can follow the urban spatial structure and be set to a functional area, for example a city's central business district or an industrial zone.
Establishing the spatiotemporal sub-zones means dividing the detection time range into several smaller time slices and dividing the detection spatial range, i.e. the area in which urban road traffic anomaly detection is carried out, into several smaller spatial slices. Various empirical partitioning methods can be used, including equidistant and non-equidistant spatiotemporal partitioning.
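As a rough illustration of how a record might be assigned to a spatiotemporal sub-zone, the sketch below uses the slice sizes that the document gives for step 1): 30 min time slices, 200 m x 200 m cells, and 400 m x 400 m cells where the road network density is below 2 km/km². It assumes positions have already been projected into a local planar frame in metres; the function names and the tuple-shaped sub-zone identifier are hypothetical.

```python
def cell_size_m(road_density_km_per_km2=None):
    """Spatial cell edge length in metres.

    None selects the equidistant method (200 m everywhere). Otherwise the
    density-based rule is applied: 200 m at or above 2 km/km^2, else 400 m.
    """
    if road_density_km_per_km2 is None:
        return 200.0
    return 200.0 if road_density_km_per_km2 >= 2.0 else 400.0


def time_slice_index(seconds_since_midnight, slice_minutes=30):
    """Index of the time slice containing the given time of day."""
    return int(seconds_since_midnight // (slice_minutes * 60))


def subzone_id(x_m, y_m, seconds_since_midnight, cell_m=200.0):
    """Hypothetical sub-zone key: (column, row, time-slice index).

    x_m, y_m are planar coordinates in metres (an assumption of this sketch).
    """
    return (
        int(x_m // cell_m),
        int(y_m // cell_m),
        time_slice_index(seconds_since_midnight),
    )


# A fix at (450 m, 199 m) recorded at 17:05 falls in column 2, row 0, slice 34.
key = subzone_id(450.0, 199.0, 17 * 3600 + 5 * 60)
```

Because each key is independent of the others, records can be grouped by key and the groups processed in parallel.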
2) Data preprocessing.
The GNSS positioning data undergo data cleaning, data integration, data transformation and data reduction to make them more structured. GNSS, the Global Navigation Satellite System, is a space-based radio navigation and positioning system that provides all-weather three-dimensional coordinates, velocity and time information at any point on the Earth's surface or in near-Earth space. It mainly comprises four global systems: GPS (Global Positioning System) of the United States, GLONASS (Global Navigation Satellite System) of Russia, GALILEO of the European Union and the BeiDou Navigation Satellite System of China. It also includes regional systems such as Japan's QZSS and India's IRNSS, and satellite-based augmentation systems such as WAAS of the United States and MSAS of Japan. To establish a common data distribution standard across the equipment of different navigation and positioning systems, the National Marine Electronics Association (NMEA) of the United States defined the unified NMEA communication protocol, which standardizes GNSS data output. As a result, although the member systems of GNSS, such as GPS and GLONASS, are built and maintained by different countries and organizations, they share a consistent data distribution format and no format conversion is needed.
Within the selected spatial range there are many vehicles equipped with GNSS positioning devices, commonly taxis, buses, freight vehicles and private cars. Given the current state of urban traffic data applications, in practice city taxis are usually chosen as the floating cars that supply data to the traffic anomaly detection system.
The collected GNSS positioning information contains some implausible records. To keep the traffic anomaly detection results accurate, these records must first be identified and removed so that the data are reliable. Such abnormal records include data falling outside the spatiotemporal range of detection and jumps in spatial position that clearly exceed a reasonable range. An example of such a jump: suppose the fix uploaded by a floating car's positioning device at 10:30:00 on some day is denoted A, and the fix uploaded by the same device at 10:30:30 that day is denoted B, and that A and B are 1500 m apart. The implied speed of the vehicle is then at least 180 km/h, which is beyond what is plausible on urban roads, so this is an abnormal position jump and the record should be discarded during data processing.
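The position-jump check described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the 120 km/h plausibility threshold, the haversine distance and the `(t, lat, lon)` record layout are all assumptions introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed constant)
MAX_SPEED_KMH = 120.0         # hypothetical plausibility threshold


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def implied_speed_kmh(distance_m, dt_s):
    """Minimum average speed needed to cover distance_m in dt_s seconds."""
    return distance_m / dt_s * 3.6


def drop_position_jumps(fixes, max_speed_kmh=MAX_SPEED_KMH):
    """Filter one vehicle's fixes, given as (t_seconds, lat, lon) sorted by time.

    A fix is kept only if the speed implied relative to the last kept fix
    does not exceed max_speed_kmh.
    """
    kept = []
    for t, lat, lon in fixes:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            dist = haversine_m(lat0, lon0, lat, lon)
            if implied_speed_kmh(dist, dt) > max_speed_kmh:
                continue  # implausible jump, discard
        kept.append((t, lat, lon))
    return kept


# The example from the text: 1500 m in 30 s is 50 m/s, i.e. 180 km/h.
example_speed = implied_speed_kmh(1500.0, 30.0)
```

Comparing each fix with the last kept fix, rather than with the immediately preceding one, prevents a single bad fix from also invalidating the good fix that follows it.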
3) Fast map matching.
The preprocessed GNSS positioning data are combined with urban road network data. A map matching algorithm projects the GNSS fixes onto the city map, establishes the correspondence between fixes and road sections, and corrects the errors caused by positioning drift.
Electronic maps of most geographic areas are now fairly detailed. Such maps can come from a city's geographic information system or from other sources. They describe urban roads in detail, and the roads can be divided into a number of road sections. Using information such as distance and angle, each fix is matched to a road section, so that the positioning information is tied to the actual geographic environment.
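One simple way to use distance and angle for matching is sketched below: candidate sections whose direction differs too much from the vehicle heading are excluded, and the fix is snapped to the nearest remaining section. This is an illustrative assumption, not the specific matching algorithm of the invention; coordinates are taken to be planar metres, headings are measured counter-clockwise from the +x axis, and the 45° tolerance is hypothetical.

```python
import math


def project_to_segment(p, a, b):
    """Closest point to p on segment ab, and its distance from p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))  # clamp to the segment's end points
    q = (ax + t * dx, ay + t * dy)
    return q, math.hypot(px - q[0], py - q[1])


def heading_deg(a, b):
    """Direction of travel from a to b in degrees, in [0, 360)."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360.0


def angle_diff_deg(h1, h2):
    """Smallest absolute difference between two headings, in [0, 180]."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def match_point(p, vehicle_heading, segments, max_angle=45.0):
    """Match fix p to a directed road section.

    segments maps a section id to its (start, end) points. Returns
    (section_id, snapped_point), or None if no section passes the angle test.
    """
    best = None
    for seg_id, (a, b) in segments.items():
        if angle_diff_deg(vehicle_heading, heading_deg(a, b)) > max_angle:
            continue
        q, dist = project_to_segment(p, a, b)
        if best is None or dist < best[0]:
            best = (dist, seg_id, q)
    return None if best is None else (best[1], best[2])
```

The heading test is what separates the two directions of a two-way road, which are geometrically the same line but distinct sections.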
4) Representation of floating-car paths and matching of paths across vehicles.
For a given origin and destination, the path of a vehicle is not necessarily unique. A complex urban road network contains many road sections, which are numbered, for example L1, L2, and so on. A road may carry traffic in two directions; in that case the two directions should be treated as two different road sections with different section numbers.
The origin and destination are usually taken to be intersection points of road sections in the urban road network. Given the path traveled by one floating car, paths identical to it are selected from the path information already reported by other floating cars, yielding the same-path group between that origin and destination.
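A minimal sketch of forming the same-path group, assuming each vehicle's path is represented as an ordered sequence of directed section numbers such as `("L1", "L2")`. The vehicle identifiers and function names are hypothetical.

```python
from collections import defaultdict


def same_path_group(target_path, reported_paths):
    """Vehicles whose reported path equals target_path.

    reported_paths maps a vehicle id to its ordered sequence of directed
    section ids. Order matters: the same sections traversed in a different
    order form a different path.
    """
    target = tuple(target_path)
    return sorted(v for v, path in reported_paths.items() if tuple(path) == target)


def group_by_path(reported_paths):
    """Partition all vehicles into same-path groups keyed by their path."""
    groups = defaultdict(list)
    for vehicle, path in reported_paths.items():
        groups[tuple(path)].append(vehicle)
    return dict(groups)


reported = {
    "taxi_01": ["L1", "L2", "L5"],
    "taxi_02": ["L1", "L3", "L5"],
    "taxi_03": ["L1", "L2", "L5"],
}
group = same_path_group(["L1", "L2", "L5"], reported)  # taxi_01 and taxi_03
```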
5) Data sampling.
Floating-car positioning data contain position coordinates, instantaneous speed, recording time and other information. In the floating-car-based urban road traffic anomaly detection method proposed here, data sampling means selecting part of the full floating-car data set for subsequent analysis, according to the computing capacity of the data center and the accuracy required. Different sampling methods can be used for different combinations of computing capacity and accuracy requirement. For example, when the data center has ample computing capacity and high detection accuracy is required, all floating-car positioning data can be processed and analyzed. When computing capacity is limited, suppose the data center can process 500 records per spatial sub-zone per minute while each spatial sub-zone actually produces 2000 floating-car positioning records per minute; then 500 records can be drawn at random from the 2000 for analysis, giving a result of limited accuracy within the computing capacity of the data center.
Depending on how the floating-car data are to be used, different attributes can be sampled, such as travel speed or travel time. The method proposed here uses travel speed as the basis for urban road traffic anomaly detection, so data sampling here means sampling travel speeds.
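The capacity-limited sampling in the example above (500 records drawn from 2000) can be sketched with simple random sampling without replacement. The function name and the optional seed are assumptions for illustration; other sampling schemes would fit the description equally well.

```python
import random


def sample_speeds(speeds, capacity, seed=None):
    """Return at most `capacity` travel-speed records for one sub-zone.

    If the sub-zone produced no more records than the data center can
    process, all of them are kept; otherwise a uniform random sample
    without replacement is drawn.
    """
    speeds = list(speeds)
    if len(speeds) <= capacity:
        return speeds
    rng = random.Random(seed)  # seed only makes the sketch reproducible
    return rng.sample(speeds, capacity)


# 2000 records arrive in one minute, but only 500 can be processed.
incoming = [20.0 + (i % 40) for i in range(2000)]
subset = sample_speeds(incoming, capacity=500, seed=0)
```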
6) Historical trajectory data analysis and feature extraction.
Historical trajectory data are the floating-car trajectory data accumulated during long-term operation of urban road traffic. From them, an urban road traffic characteristic model can be built to reflect the general characteristics of urban traffic operation. Such a model may consist of specific indicators, such as the average speed or weighted average speed, or of a statistical model, such as the probability distribution of travel speed. Many earlier models represent the traffic characteristics of a road section or area with a single indicator (such as the historical average speed). This is simple to apply, but its accuracy is low and its sensitivity poor, so it often performs badly in traffic anomaly detection. The present method therefore describes the traffic characteristics of each spatiotemporal sub-zone with the probability distribution of a traffic characteristic variable, builds a traffic characteristic model and estimates its parameters.
The traffic characteristic variables that can be collected include travel speed and travel time. In this document, the probability distribution of the traffic characteristic variable means the probability distribution of travel speed.
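To make the idea of a per-sub-zone speed distribution concrete, the sketch below builds a normalized histogram of travel speeds for each spatiotemporal sub-zone. The histogram is a non-parametric stand-in chosen here for brevity; it is not the parametric model or the parameter estimation procedure of the invention. The 10 km/h bin width, the 120 km/h upper bound and the `(subzone_id, speed)` record layout are assumptions.

```python
from collections import defaultdict

BIN_WIDTH_KMH = 10.0   # hypothetical bin width
MAX_SPEED_KMH = 120.0  # hypothetical upper bound; faster records go in the last bin


def speed_histogram(speeds, bin_width=BIN_WIDTH_KMH, max_speed=MAX_SPEED_KMH):
    """Discrete probability distribution of travel speed over fixed bins."""
    n_bins = int(max_speed / bin_width)
    counts = [0] * n_bins
    for v in speeds:
        idx = int(v // bin_width)
        idx = max(0, min(idx, n_bins - 1))  # clamp into the valid bin range
        counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no speed records for this sub-zone")
    return [c / total for c in counts]


def build_profiles(records):
    """Map each sub-zone id to its speed distribution.

    records is an iterable of (subzone_id, speed_kmh) pairs.
    """
    by_zone = defaultdict(list)
    for zone, speed in records:
        by_zone[zone].append(speed)
    return {zone: speed_histogram(s) for zone, s in by_zone.items()}
```

Since every sub-zone is handled independently, `build_profiles` could be split across workers by sub-zone id, in line with the parallel processing mentioned for historical data.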
7) Real-time trajectory data analysis and feature extraction.
Real-time trajectory data are the floating-car trajectory data from traffic operation during a short period close to the current moment. They capture how traffic characteristics are changing and reflect the immediate state of traffic operation. The present method describes the current traffic characteristics with the travel speeds in the current spatiotemporal sub-zone.
8) Anomaly detection.
The idea of detecting abnormal system states was first proposed by Denning: by monitoring abnormal patterns of system use in the system's audit records, events that violate security or may cause system faults can be detected. Denning's model is independent of any particular system, application environment, system vulnerability or fault type, and is therefore a general-purpose anomaly detection model. The model comprises six parts: subjects, objects, audit records, profiles, anomaly records and activity rules. A profile represents the normal behavior of a subject with respect to an object by means of metrics and statistical models. Denning's model defines three kinds of metric (event counter, interval timer and resource measure) and proposes five statistical models (the operational model, the mean and standard deviation model, the multivariate model, the Markov process model and the time series model). By analyzing system audit data, the model builds a statistics-based profile of each subject's normal behavior. During detection, the audit data are compared with the established profile, and if the deviation exceeds a threshold, an anomaly event is reported. This model laid the foundation for anomaly detection, and many later anomaly detection methods and systems were developed from it.
In recent years, more artificial intelligence methods have been introduced into anomaly detection to improve its performance, mainly data mining, artificial neural networks and fuzzy evidence theory. Data mining is used to determine which features matter most in a large data set. In anomaly detection it is mainly used to find a more concise definition of normal patterns, rather than simply enumerating all normal patterns as traditional methods do. By recognizing only the main features of normal patterns, a detection system that uses data mining can generalize to normal patterns not present in the training data.
Anomaly detection with artificial neural networks can be viewed as a general data classification problem. In the statistical anomaly detection described above, user behavior data are divided into two classes, abnormal and normal, according to some statistical criterion. Statistical methods have difficulty extracting and abstracting audit instances, which may cause large errors; they depend on assumptions about probability distributions, and the metrics that characterize user behavior usually have to be chosen by experience and intuition. Clustering methods based on artificial neural networks were introduced for this reason. An artificial neural network can learn and adapt by itself: it is trained with samples representing normal user behavior, and through repeated learning it extracts the patterns of normal user or system activity from the data and encodes them in the network structure. During detection, the audit data are passed through the trained network to decide whether the system is behaving normally.
Because the criteria for judging anomalies are somewhat fuzzy, fuzzy evidence theory has also been introduced into anomaly detection. For example, an intrusion detection framework based on a fuzzy expert system can reduce both the missed-alarm rate and the false-alarm rate.
The present invention proposes an anomaly detection scheme based on statistical characteristics. The basic idea is to measure the difference between the historical and real-time traffic characteristics with the Jensen-Shannon divergence and thereby detect abnormal traffic conditions. The scheme is easy to interpret and computationally light. It overcomes the inaccuracy and delay of detection based on a single statistic, and it avoids the heavy computational load and demanding hardware requirements of methods such as artificial neural networks.
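The comparison can be sketched for two discrete speed distributions defined over the same bins. The Jensen-Shannon divergence below follows its standard definition with base-2 logarithms, under which it lies between 0 and 1. The decision threshold of 0.3 is purely hypothetical, since the text does not specify one here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Bins where p is zero contribute nothing. The caller must ensure q is
    positive wherever p is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    m is the midpoint distribution; it is positive wherever p or q is,
    so both KL terms are always well defined.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def is_anomalous(historical, realtime, threshold=0.3):
    """Flag a sub-zone when the two speed distributions differ strongly.

    threshold is a hypothetical value chosen for illustration only.
    """
    return js_divergence(historical, realtime) > threshold


# Historical speeds are mostly high; the real-time sample is mostly low.
historical = [0.05, 0.15, 0.80]
realtime = [0.70, 0.25, 0.05]
flag = is_anomalous(historical, realtime)
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite when one distribution has empty bins, which is common for sparse real-time samples.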
9) Quantitative characterization of anomaly severity and release of anomaly information.
The severity of a traffic anomaly should be released to the public in a clear and concise form, so that drivers can avoid areas likely to be congested and urban traffic runs more efficiently. Severity is expressed by a traffic anomaly index ranging from 0 to 10, where 0 means no anomaly and 10 means a severe anomaly.
The location of each anomaly is projected onto an electronic map and published, for example through a smart mobile device app.
10) System performance evaluation.
System performance evaluation assesses the accuracy of traffic anomaly detection. The evaluation indicators are the false alarm rate and the missed detection rate; the lower both rates are, the better the system performs.
In step 1), the spatiotemporal sub-regions may be divided by the following methods:
11) Equidistant spatiotemporal division. The time dimension is divided into slices of fixed span, typically 30 min each; the spatial dimension is divided into cells of fixed span, typically one 200 m × 200 m grid cell each;
12) Non-equidistant spatiotemporal division based on road network density. Road network density is the decision criterion: where the density is at least 2 km/km², use 30 min time slices and 200 m × 200 m spatial cells; where it is below 2 km/km², use 30 min time slices and 400 m × 400 m spatial cells;
13) Non-equidistant spatiotemporal division based on peak-hour flow. Peak-hour flow is the decision criterion: where the peak-hour flow is at least 1000 vehicles/hour, use 30 min time slices and 200 m × 200 m spatial cells; where it is below 1000 vehicles/hour, use 30 min time slices and 400 m × 400 m spatial cells.
Step 3) specifically comprises the following steps:
31) Divide the spatial area to be processed into grid cells of a given size. The extent of each grid cell can be expressed as a coordinate set $A_s = \{(x_s, y_s) \mid \ldots\}$. Each grid cell contains several road segments; the set of these segments is denoted $R_S$, each segment in the set is denoted $r_{ij}$, and every segment is assigned a number;
32) Determine the grid cell in which a positioning point lies, then use distance and azimuth to search the segment set $R_S$ for the segment $r_{ij}$ on which positioning point A lies. The matching schemes include:
321) Single-point matching scheme:
Search for the road segment nearest to point A. The match is complete when the difference between the travel direction angle $\theta_A$ of point A and the direction angle $\theta_{ij}$ of segment $r_{ij}$ is below a threshold $\delta$, that is, when $|\theta_A - \theta_{ij}| < \delta$. The threshold $\delta$ may be 2.5°, 5°, 10°, and so on. If $|\theta_A - \theta_{ij}| < \delta$ does not hold, remove segment $r_{ij}$ from the search space and continue with the other segments until the condition is met. The matching method is shown in Figure 3.
322) Point-sequence matching scheme:
This scheme is suited to high-frequency floating car data. Write the GNSS sampling frequency of the floating car as $f_0 = 1/t_0$. The points adjacent to A in time, $P(t_A - t_0)$ and $P(t_A + t_0)$, are the 1-neighbors of A; $P(t_A - 2t_0)$ and $P(t_A + 2t_0)$ are the 2-neighbors of A; and so on, so that $P(t_A - kt_0)$ and $P(t_A + kt_0)$ are the $k$-neighbors of A. When $f_0 < 1$ Hz, take $k = 1$ or 2. Take the segment $r_{ij}$ whose distance to A and to the $k$-neighbors of A is smallest, and compute the mean travel direction angle $\bar{\theta}_A$ of A and its $k$-neighbors. If $|\bar{\theta}_A - \theta_{ij}| < \delta$, where $\theta_{ij}$ is the direction angle of the segment and $\delta$ is the angle threshold, the match is complete; otherwise, search the other segments until the condition is satisfied.
33) Use the straight-line equation of the road segment (a curved segment is approximated by several straight pieces) to compute the projection of the GNSS positioning point onto the segment, which reduces the error caused by GNSS positioning drift. The straight-line projection of a GNSS positioning point is computed as follows.
Determine the line equation of segment $r_{ij}$ (a curved segment is first divided into several straight segments):
$$y - y_i = k\,(x - x_i),$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are the endpoints of the segment and the slope is
$$k = \frac{y_j - y_i}{x_j - x_i}.$$
The projection line through the positioning point $A = (x_A, y_A)$ is
$$y - y_A = -\frac{1}{k}\,(x - x_A).$$
Solving for the projected coordinates $P = (x_P, y_P)$ gives
$$x_P = \frac{k y_A - k y_i + k^2 x_i + x_A}{k^2 + 1}, \qquad y_P = \frac{k^2 y_A + y_i + k x_A - k x_i}{k^2 + 1}.$$
After map matching, each positioning point is assigned to a spatiotemporal sub-region using the timestamp attached to its coordinates.
Step 5) may specifically adopt one of the following methods:
51) Full-sample scheme for speed information. All travel speed data of every floating car pass within a spatiotemporal sub-region form the population. The travel speed of each vehicle within the sub-region is computed as
$$v = \frac{d_{1,2} + d_{2,3} + \cdots + d_{n-1,n}}{t_n - t_1},$$
where $d_{1,2}, \ldots, d_{n-1,n}$ are the distances between the 1st and 2nd, ..., $(n-1)$-th and $n$-th GNSS positioning points within the sub-region, and $t_1, \ldots, t_n$ are the timestamps of the 1st, ..., $n$-th GNSS positioning points within the sub-region. The data of each sub-region is not filtered; it forms a set $V$ that is used in subsequent processing.
52) Time-smoothed sampling scheme for speed information. Specify the length of a time segment and an upper limit on the number of records per time segment. Search the speed data in each time segment of a spatiotemporal sub-region; if the number of speed records in a time segment exceeds the limit, randomly draw as many records as the limit allows for subsequent processing. In practice, the travel speed of each vehicle within the sub-region is computed as
$$v = \frac{d_{1,2} + d_{2,3} + \cdots + d_{n-1,n}}{t_n - t_1},$$
where $d_{1,2}, \ldots, d_{n-1,n}$ are the distances between the 1st and 2nd, ..., $(n-1)$-th and $n$-th GNSS positioning points within the sub-region, and $t_1, \ldots, t_n$ are the timestamps of the 1st, ..., $n$-th GNSS positioning points. With the time segment length and the per-segment record limit specified, the speed data of each time segment of the sub-region is searched; if a time segment holds more speed records than the limit, a random sample of that size is added to the set $V$ and used in subsequent processing.
Step 6) may specifically adopt one of the following methods:
61) Simple historical trajectory data fusion. The historical data recorded under anomaly-free traffic conditions is treated as a whole for traffic feature modeling and parameter estimation. The method builds the traffic feature model as a finite mixture model and estimates its parameters. One of the following three schemes may be used:
611) Gaussian mixture model with a fixed number of components
This scheme describes the probability distribution of vehicle speed with a Gaussian mixture model that has a fixed number of components $K$. The number of components is specified manually according to the typical distribution pattern of vehicle speed. To keep the probability distribution reliable, $K$ must not be too small; $K = 4$ to 6 is generally suitable.
612) Gaussian mixture model with a variable number of components
This scheme selects a suitable number of components by model evaluation, as follows:
Determine the largest number of components considered, $K$, and estimate the parameters of the Gaussian mixture models with 1, 2, ..., $K$ components. Among these $K$ models, the best one is chosen by the Bayesian information criterion (BIC). The maximum number of components is generally chosen according to the required accuracy, but the more components there are, the more slowly the expectation-maximization algorithm converges. Here the maximum is $K = 5$, so five mixture models are estimated, one for each component count in $\{1, 2, \ldots, 5\}$, and the BIC of each is computed, defined as
$$\mathrm{BIC} = -2 \ln L + k \ln n,$$
where $L$ is the maximized value of the likelihood function, $k$ is the number of parameters in the model, and $n$ is the total number of data points.
Then select the mixture model with the smallest BIC and record its parameter vectors $\eta$, $\mu$, and $\sigma$ as the feature record of this spatiotemporal sub-region, where $\eta$ is the vector of proportions of the components in the historical traffic feature model, $\mu$ is the vector of their means, and $\sigma$ is the vector of their standard deviations. The shape of the mixture density curve is shown in Figure 6.
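The BIC-based selection in 612) can be illustrated with a small sketch. This is not the patent's implementation: the function names (`fit_gmm`, `select_gmm`), the quantile-based starting values, the fixed iteration count, and the standard-deviation floor `min_sigma` are all assumptions made for the example, and a one-dimensional mixture with $K$ components is taken to have $3K - 1$ free parameters.

```python
import math


def normal_pdf(v, mu, sigma):
    """Density of N(mu, sigma^2) at v."""
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(v, eta, mu, sigma):
    """Density of the Gaussian mixture with proportions eta at v."""
    return sum(e * normal_pdf(v, m, s) for e, m, s in zip(eta, mu, sigma))


def fit_gmm(data, k, iters=100, min_sigma=0.5):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (eta, mu, sigma, log_likelihood). Start values and the
    sigma floor are illustrative assumptions, not taken from the source.
    """
    xs = sorted(data)
    n = len(xs)
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # evenly spaced quantiles
    sigma = [max((xs[-1] - xs[0]) / (2.0 * k), min_sigma)] * k
    eta = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each speed value.
        resp = []
        for v in xs:
            w = [eta[j] * normal_pdf(v, mu[j], sigma[j]) for j in range(k)]
            total = sum(w) or 1e-300
            resp.append([wj / total for wj in w])
        # M-step: re-estimate proportions, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            eta[j] = nj / n
            mu[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            var = sum(r[j] * (v - mu[j]) ** 2 for r, v in zip(resp, xs)) / nj
            sigma[j] = max(math.sqrt(var), min_sigma)
    log_l = sum(math.log(mixture_pdf(v, eta, mu, sigma) or 1e-300) for v in xs)
    return eta, mu, sigma, log_l


def bic(log_l, n_params, n):
    """BIC = -2 ln L + k ln n."""
    return -2.0 * log_l + n_params * math.log(n)


def select_gmm(data, k_max=5):
    """Fit mixtures with 1..k_max components and keep the one with the smallest BIC."""
    best = None
    for k in range(1, k_max + 1):
        eta, mu, sigma, log_l = fit_gmm(data, k)
        score = bic(log_l, 3 * k - 1, len(data))
        if best is None or score < best[0]:
            best = (score, k, eta, mu, sigma)
    return best
```

Calling `select_gmm(speeds)` on the speed set of one sub-region would return the BIC, the chosen component count, and the vectors $\eta$, $\mu$, $\sigma$ to store as that sub-region's feature record.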
613) Finite mixture model with a variable number of components and a variable distribution type
This scheme uses the same model-evaluation approach as 612), but both the distribution type of the components and their number may vary, as follows:
Select $M$ probability distribution models as candidate component types, including but not limited to the normal, gamma, and Weibull distributions. When the normal distribution is used, the component density is
$$f(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(v - \mu)^2}{2\sigma^2}\right).$$
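When the gamma distribution is used, the component density is
$$f(v) = \frac{v^{\alpha - 1} e^{-v/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad \text{where } \Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\,dt.$$
When the Weibull distribution is used, the component density is
$$f(v) = \frac{c}{\lambda} \left(\frac{v}{\lambda}\right)^{c - 1} e^{-(v/\lambda)^{c}}.$$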
Assume that all components of a mixture share the same distribution type, and determine the largest number of components considered, $K$. The $M$ candidate distribution types and the $K$ candidate component counts give $M \times K$ combinations in total; compute the BIC of each and take the model with the smallest BIC as the best model.
62) Context-based classification of historical trajectory data. The historical data recorded under anomaly-free conditions is divided into categories according to air temperature, precipitation, visibility, and traffic control measures; a model is built and its parameters are estimated for each category. The method is implemented as follows:
According to temperature, precipitation, visibility, and traffic control measures, traffic environments are divided into 5 to 8 categories, and each historical record is assigned to the category of its traffic environment. Each category is then processed as described in 5), which establishes the mapping $R(E, T)$, where $E$ is the traffic environment and $T$ is the traffic situation.
63) Historical data clustering. The spatiotemporal sub-regions in the historical data are compared pairwise to quantify the differences between spatiotemporal regions, and the quantified differences are used for clustering. Multinomial logit regression, with temperature, precipitation, visibility, and traffic control measures as feature factors, then establishes the mapping between traffic environment and category. The implementation flow is shown in Figure 4. The steps are as follows:
631) Build the traffic feature model by the method described in 5) and estimate its parameters.
632) From the finite mixture parameter estimates obtained above, write the probability density function $p_i(v)$ of the travel speed distribution of a spatiotemporal sub-region on each date. Taking the Gaussian mixture model as an example:
$$p_i(v) = \sum_{k=1}^{K} \eta_k\, N(v;\, \mu_k, \sigma_k),$$
where $K$ is the number of components of the travel speed distribution, $\eta_k$ is the proportion of a component, $\mu_k$ is the mean of that component, and $\sigma_k$ is its standard deviation.
633) Compute the Jensen-Shannon divergence $d_{ij}$ between every pair of distributions:
$$d_{ij} = \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D(P \,\|\, M) + \tfrac{1}{2} D(Q \,\|\, M),$$
where $P$ and $Q$ are two different probability distributions, $M = \tfrac{1}{2}(P + Q)$, and $D$ is the Kullback-Leibler divergence:
$$D(P \,\|\, Q) = \sum_{k} P(x_k) \log \frac{P(x_k)}{Q(x_k)}.$$
For finite mixture models this value has no closed form, but it can be approximated by Monte Carlo sampling:
$$D_{MC}(f \,\|\, g) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(x_i)}{g(x_i)} \;\to\; D(f \,\|\, g),$$
where $D_{MC}$ is the Kullback-Leibler divergence approximated by Monte Carlo sampling, $f$ and $g$ are any two distribution functions, and $x_1, \ldots, x_n$ are samples drawn from $f$.
634) Arrange the pairwise divergences as a distance matrix:
$$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}.$$
The matrix satisfies $d_{ij} = d_{ji}$ and $d_{ij} = 0$ for $i = j$.
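Steps 633) and 634) can be sketched as follows for Gaussian mixtures. The function names, the sample size, and the fixed random seed are assumptions made for the example; each model is represented as a tuple `(eta, mu, sigma)` of equal-length lists, and natural logarithms are used, so the estimate is bounded above by $\ln 2$.

```python
import math
import random


def mixture_pdf(v, eta, mu, sigma):
    """Density at v of the Gaussian mixture with proportions eta."""
    return sum(
        e * math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for e, m, s in zip(eta, mu, sigma)
    )


def sample_mixture(eta, mu, sigma, n, rng):
    """Draw n values: pick a component by its proportion, then sample from it."""
    comps = rng.choices(range(len(eta)), weights=eta, k=n)
    return [rng.gauss(mu[c], sigma[c]) for c in comps]


def jsd_mc(p, q, n=2000, seed=0):
    """Monte Carlo estimate of JSD(P || Q) for two mixtures p and q."""
    rng = random.Random(seed)

    def kl_to_midpoint(a, b):
        # Estimates D(A || M) with M = (A + B) / 2, using samples drawn from A.
        total = 0.0
        for x in sample_mixture(*a, n, rng):
            fa = mixture_pdf(x, *a)
            m = 0.5 * (fa + mixture_pdf(x, *b))
            total += math.log(fa / m)
        return total / n

    return 0.5 * kl_to_midpoint(p, q) + 0.5 * kl_to_midpoint(q, p)


def distance_matrix(models, n=2000):
    """Symmetric matrix of pairwise divergences with a zero diagonal."""
    size = len(models)
    d = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d[i][j] = d[j][i] = jsd_mc(models[i], models[j], n=n)
    return d
```

Because the result is a sampling estimate, two nearly identical distributions can yield a value slightly different from the exact divergence; a larger `n` reduces that noise at the cost of more density evaluations.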
635) Use the distance matrix as the input of the K-Medoids algorithm to obtain the clustering result, and index the resulting categories.
636) With the category index as the response variable and the traffic environment data (temperature, precipitation, visibility, and so on) as independent variables, perform multinomial logit regression to obtain the mapping $R(E, T)$ between the traffic environment $E$ and the traffic situation category $T$.
637) Aggregate the data of each category, rebuild the mixture model on the aggregated data set, and estimate its parameters to obtain the final historical traffic feature data set.
Step 7) may specifically adopt the following methods:
71) Simple real-time data processing. This method is used together with 61). A model is built and its parameters are estimated from the real-time traffic data to obtain the characteristic function of the current traffic condition. The steps are exactly those of 61), except that the data is real-time traffic data.
72) Classification-based processing. This method is used together with 62) or 63). The characteristic function of the traffic condition is obtained together with the current temperature, precipitation, visibility, traffic control measures, and similar information, and the category of the current traffic condition is determined. The implementation flow is shown in Figure 5. The steps are as follows:
721) Compute the travel speeds within the spatiotemporal sub-region to form the real-time travel speed population;
722) Build the travel speed probability distribution model $p(v^{rt}) = \sum_{k=1}^{K} \eta_k^{rt}\, N(v^{rt};\, \mu_k^{rt}, \sigma_k^{rt})$ and estimate its parameters;
723) With the current traffic environment data (temperature, precipitation, visibility, and so on) as input, use the mapping $R(E, T)$ to obtain the category $T$ of the current traffic situation.
Step 8) specifically comprises the following steps:
81) If step 72) is used, locate the historical traffic feature data of the category $T$ to which the current traffic situation belongs; otherwise no processing is needed;
82) From the parameters $\eta^{rt}$, $\mu^{rt}$, $\sigma^{rt}$ that describe the current traffic characteristics and the parameters $\eta$, $\mu$, $\sigma$ that describe the historical traffic characteristics, compute the difference between the two speed distributions:
$$\mathit{diff} = \mathrm{JSD}(P^{rt} \,\|\, P),$$
where $\eta^{rt}$ is the vector of component proportions in the real-time traffic feature model, $\mu^{rt}$ is the vector of component means, and $\sigma^{rt}$ is the vector of component standard deviations; $\eta$, $\mu$, and $\sigma$ are the corresponding vectors of the historical traffic feature model. When the historical and real-time traffic characteristics (that is, the historical and real-time travel speed distributions) are similar, the Jensen-Shannon divergence is small, meaning the two differ little. When they differ greatly, the Jensen-Shannon divergence is large, meaning the two differ substantially and an anomaly is more likely; see Figure 7.
Step 9) specifically comprises the following steps:
91) Normalize the speed distribution difference of each spatiotemporal sub-region to a value $\xi_i$ between 0 and 1:
$$\xi_i = \frac{\mathit{diff}_i - \min(\mathit{diff})}{\max(\mathit{diff}) - \min(\mathit{diff})};$$
92) Compute the traffic anomaly index of each spatiotemporal sub-region:
$$I_i = 10\,\xi_i;$$
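Steps 91) and 92) amount to a min-max rescaling followed by multiplication by 10. A minimal sketch is given here; the function names are hypothetical, the sub-region keys are arbitrary labels, and returning 0 for every sub-region when all differences are equal is an assumption, since the source does not cover that case.

```python
def anomaly_indices(diffs):
    """Map {sub-region: divergence} to {sub-region: anomaly index in [0, 10]}."""
    lo, hi = min(diffs.values()), max(diffs.values())
    if hi == lo:
        # Assumption: with no spread between sub-regions, report no anomaly.
        return {key: 0.0 for key in diffs}
    return {key: 10.0 * ((d - lo) / (hi - lo)) for key, d in diffs.items()}


def regions_to_publish(indices, cutoff=5.0):
    """Sub-regions whose anomaly index exceeds the publication cutoff."""
    return sorted(key for key, idx in indices.items() if idx > cutoff)
```

Note that the rescaling is relative: the sub-region with the largest difference always receives index 10, even when all differences are small in absolute terms.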
93) Project the locations of regions whose anomaly index exceeds 5 onto the electronic map and publish them, for example through a smart mobile device app, so that drivers can avoid potential congestion points and urban road traffic flows more efficiently.
Step 10) specifically comprises the following steps:
101) Compute the missed detection rate of traffic anomalies:
$$\alpha_1 = \frac{n_m}{n_a} \times 100\%$$
102) Compute the false alarm rate of traffic anomalies:
$$\alpha_2 = \frac{n_f}{n_a} \times 100\%$$
In the two formulas, $n_m$ is the total number of missed events per unit time, $n_f$ is the total number of false alarms per unit time, and $n_a$ is the total number of anomalous events that actually occurred per unit time.
Compared with similar techniques in the same field, the invention has the following advantages:
(1) It makes full use of existing floating car operating data (GNSS trajectory data). By extracting historical traffic features and analyzing the real-time traffic situation, it detects changes in traffic state and enables real-time, low-cost, intelligent detection of urban road traffic anomalies;
(2) It describes traffic characteristics by the probability distribution of traffic parameters, which reflects the characteristics more completely, avoids the one-sidedness and instability of a single index, and makes detection more reliable;
(3) Because traffic characteristics are affected by the traffic environment (for example, weather conditions), a combined clustering and multinomial logit regression algorithm is introduced to establish the mapping between traffic environment features and traffic situation categories;
(4) Tests on real data show that the floating-car-based urban road traffic anomaly detection technique proposed by the invention detects anomalous events with high accuracy: the detection rate exceeds 90%, the missed detection rate is below 15%, and the false alarm rate is below 20%. The detection results are good, and the technique can be applied to intelligent urban traffic management and services.
Brief description of the drawings
The details and advantages of the invention will become clear and easy to understand with reference to the following drawings, in which:
Figure 1 is a schematic diagram of the components and basic principle of the invention;
Figure 2 is a schematic diagram of the overall flow of the invention in implementation;
Figure 3 is a schematic diagram of an implementation of the fast map matching algorithm of the invention;
Figure 4 is a flow diagram of the historical traffic feature extraction scheme of the invention;
Figure 5 is a flow diagram of the real-time traffic feature extraction scheme of the invention;
Figure 6 is a schematic diagram of the shape of the probability distribution of a Gaussian mixture model;
Figure 7 is a schematic diagram of how the difference between historical and real-time traffic characteristics is measured.
Detailed description of embodiments
To state the objects, technical solutions, and advantages of the invention more clearly, specific embodiments of the invention are described in detail below.
As shown in Figure 1, the overall system architecture of the invention comprises on-board GNSS trajectory recorders carried by floating cars, a data center, GNSS satellites, and a communication system. GNSS here covers GPS, GLONASS, GALILEO, BeiDou, IRNSS, QZSS, and any similar satellite navigation and positioning system. The GNSS trajectory recorder carried by a floating car such as a taxi or bus records the position of the vehicle at each sampling instant at a given sampling frequency $f$ (generally at least 0.1 Hz) and sends the position information to the data center in real time over a GPRS mobile network (wireless technologies such as WCDMA or TD-LTE may also be used, at correspondingly higher cost). The data center preprocesses and fuses the data and builds a historical road traffic feature database with a specific algorithm. For the most recently received real-time data it builds a real-time traffic feature database. Through the mapping between the historical and real-time databases it determines whether the current traffic characteristics are abnormal, displays the result visually on a processing terminal, and generates a traffic anomaly report.
The overall flow of the scheme is shown in Figure 2. It comprises collecting and storing GNSS trajectory data, establishing spatiotemporal sub-regions, extracting historical traffic features, extracting real-time traffic features, and identifying anomalies. Collecting and storing GNSS trajectory data provides the data foundation of the whole scheme. Because the data volume is very large, distributed storage should be used; mature distributed storage technologies already exist and are not part of the invention. Establishing spatiotemporal sub-regions rests on the basic assumption that traffic characteristics are the same within a given area and a given time period; long-term observation shows that this assumption holds generally. Historical traffic feature extraction computes travel speeds from the GNSS trajectory data, builds a probability distribution model of vehicle speed from the large volume of travel speed data in one spatiotemporal sub-region, and estimates its parameters, so that the traffic characteristics are represented by a small number of parameters. Real-time traffic feature extraction processes and analyzes the speed data of the current time period and builds the current speed probability distribution model in the same way. Anomaly identification uses a difference measure to judge how far the real-time features have changed relative to the historical features, and decides whether a traffic anomaly has occurred according to whether the change reaches a threshold.
Embodiments are given below as combinations of the implementation methods described in the summary of the invention.
Embodiment 1
Step 11. Use equidistant spatiotemporal division. The time dimension is divided into slices of fixed span, typically 30 min each; the spatial dimension is divided into cells of fixed span, typically one 200 m × 200 m grid cell each.
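As an illustration of Step 11, a positioning fix could be assigned to its spatiotemporal sub-region by integer division. The sketch assumes that coordinates have already been projected to planar metres and that time is given in seconds since midnight; the function name and the key layout `(column, row, slot)` are hypothetical.

```python
SLOT_MINUTES = 30     # time-slice span of the equidistant scheme
CELL_METRES = 200.0   # grid-cell edge length of the equidistant scheme


def subregion_key(x_m, y_m, seconds_of_day, cell=CELL_METRES, slot_min=SLOT_MINUTES):
    """Return (column, row, slot) identifying the sub-region of one fix.

    x_m, y_m: planar coordinates in metres (assumed already projected).
    seconds_of_day: seconds elapsed since midnight.
    """
    col = int(x_m // cell)
    row = int(y_m // cell)
    slot = int(seconds_of_day // (slot_min * 60))
    return col, row, slot
```

For the density-based or flow-based schemes, the same function could be called with `cell=400.0` for sparse areas.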
Step 12. Preprocess the data: apply data cleaning, data integration, data transformation, and data reduction to the GNSS positioning data to make it more structured.
Step 13. Divide the spatial region to be processed into grid cells of a given size. The extent of each grid cell can be expressed as a point set $A = \{(x_s, y_s) \mid \ldots\}$.

Determine the grid cell in which the positioning point lies, then use distance and azimuth to search for the road segment on which the point lies. Search for the road segment nearest to point A, with the angle threshold set to 2.5°. If the absolute difference between the heading angle of point A and the direction angle of the road segment is less than the threshold, the match is complete. Otherwise, remove that road segment from the search space and continue searching the other road segments until the condition is met.

Using the straight-line equation of the road segment (a curved segment is approximated by splitting it into several straight segments), compute the projected coordinates of the GNSS positioning point on the road segment to reduce the error caused by GNSS positioning drift. The procedure is as follows.

Determine the straight-line equation of the road segment:

$$y - y_i = k\,(x - x_i)$$

where $(x_i, y_i)$ is an endpoint of the segment and the slope $k$ is the ratio of the difference in $y$ to the difference in $x$ between the two endpoints of the segment. The projection line through point A, with coordinates $(x_A, y_A)$, is:

$$y - y_A = -\frac{1}{k}\,(x - x_A)$$

Solving gives the projected coordinates $P$:

$$x_P = \frac{k y_A - k y_i + k^2 x_i + x_A}{k^2 + 1}, \qquad y_P = \frac{k^2 y_A + y_i + k x_A - k x_i}{k^2 + 1}$$
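The projection step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name and argument order are hypothetical, planar coordinates in metres are assumed, and the vertical-segment branch is an added assumption because the slope form is undefined there.

```python
def project_onto_segment_line(xa, ya, xi, yi, xj, yj):
    """Project point A = (xa, ya) onto the line through (xi, yi) and (xj, yj).

    Hypothetical helper: (xi, yi) and (xj, yj) are the segment endpoints.
    """
    if xi == xj:
        # Assumption: for a vertical segment the slope is undefined,
        # so the projection keeps A's y and takes the segment's x.
        return (xi, ya)
    k = (yj - yi) / (xj - xi)  # slope of the road segment
    denom = k * k + 1.0
    xp = (k * ya - k * yi + k * k * xi + xa) / denom
    yp = (k * k * ya + yi + k * xa - k * xi) / denom
    return (xp, yp)
```

For a segment from (0, 0) to (2, 2) and a drifted fix at (0, 2), the function returns the foot of the perpendicular on the line.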
After map matching, use the timestamp of each positioning point to assign the point to its spatio-temporal sub-region.
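Assigning a matched point to a spatio-temporal sub-region might look like the sketch below. It assumes planar coordinates in metres, timestamps in seconds, and a grid origin at (x0, y0, t0); the function name and all parameters are hypothetical, with defaults taken from the 200 m and 30 min values mentioned in the text.

```python
def subregion_key(x, y, t, cell_m=200.0, slice_s=1800.0, x0=0.0, y0=0.0, t0=0.0):
    """Return (column, row, time_slot) of the sub-region containing a point.

    Assumption: equidistant partitioning with square cells of side cell_m
    and time slices of slice_s seconds, both counted from the given origin.
    """
    col = int((x - x0) // cell_m)
    row = int((y - y0) // cell_m)
    slot = int((t - t0) // slice_s)
    return (col, row, slot)
```

Points that share a key belong to the same sub-region, so the key can be used directly as a dictionary key when grouping speed records.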
Step 14. The population consists of all travel-speed data of every floating-car trip within one spatio-temporal sub-region. Compute the travel speed of each vehicle within the sub-region:

$$v = \frac{d_{1,2} + \cdots + d_{n-1,n}}{t_n - t_1}$$

where $d_{1,2}, \ldots, d_{n-1,n}$ are the distances between the 1st and 2nd GNSS positioning points, ..., and between the $(n-1)$-th and $n$-th GNSS positioning points within the sub-region, and $t_1, \ldots, t_n$ are the timestamps of the 1st, ..., $n$-th GNSS positioning points within the sub-region. The data in each sub-region are not filtered and form a set $V$ for subsequent processing.
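A minimal sketch of the per-vehicle travel speed follows. It assumes the speed is the summed distance between consecutive positioning points divided by the time elapsed between the first and last timestamps, with planar coordinates in metres and timestamps in seconds; the function name and input layout are hypothetical.

```python
import math


def travel_speed(points):
    """Travel speed of one vehicle inside a sub-region.

    points: list of (x, y, t) tuples sorted by timestamp t.
    Assumption: speed = total path length / (last timestamp - first timestamp).
    """
    if len(points) < 2:
        raise ValueError("at least two positioning points are required")
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(points, points[1:])
    )
    elapsed = points[-1][2] - points[0][2]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return distance / elapsed
```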
Step 15. Taking the historical data recorded under conditions free of traffic anomalies as a whole, build the traffic feature model and estimate its parameters. The method uses a finite mixture model to build the traffic feature model and estimate the parameters. Set the maximum number of components to $K = 5$ and estimate the parameters of the Gaussian mixture model with $n = 1, 2, \ldots, K$ components in turn; among the $K$ models, the best one is selected by the Bayesian information criterion (BIC). Compute the 5 mixture models:

$$p(v) = \sum_{j=1}^{n} \eta_j\, f_j(v \mid \mu_j, \sigma_j), \qquad n \in \{1, 2, \ldots, 5\}$$

At the same time, compute for each of the 5 models

$$\mathrm{BIC} = -2 \ln L + k \ln n$$

where $L$ is the maximized value of the likelihood function, $k$ is the number of parameters in the model, and $n$ in this expression is the total number of data points (not the number of components).
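The BIC comparison can be illustrated with the short sketch below. It assumes the maximized log-likelihood of each candidate mixture has already been obtained by some fitting routine (for example EM, which is not shown), and that each model is a one-dimensional Gaussian mixture, so a model with m components has 3m - 1 free parameters. The function names are hypothetical.

```python
import math


def gmm_param_count(n_components):
    """Free parameters of a 1-D Gaussian mixture: means, std devs, weights."""
    return 3 * n_components - 1  # weights sum to 1, so one weight is not free


def bic(log_likelihood, n_params, n_samples):
    """BIC = -2 ln L + k ln n, with ln L passed in as log_likelihood."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_by_bic(log_likelihoods, n_samples):
    """Pick the component count with the smallest BIC.

    log_likelihoods: dict mapping component count -> maximized log-likelihood.
    Returns (best_component_count, dict of BIC scores).
    """
    scores = {
        m: bic(ll, gmm_param_count(m), n_samples)
        for m, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

With 1000 samples and log-likelihoods of -3000, -2900 and -2898 for one, two and three components, the small likelihood gain of the third component does not offset its parameter penalty, so two components are selected.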
The mixture model with the smallest BIC is then selected, and its parameter vectors $\eta$, $\mu$, $\sigma$ are recorded as the feature record of this spatio-temporal sub-region.
Step 16. Build the model and estimate the parameters for the real-time traffic data to obtain the characteristic function of the current traffic conditions. The method is the same as in Step 15. Record the parameter vectors $\eta_{rt}$, $\mu_{rt}$, $\sigma_{rt}$.
Step 17. From the parameters $\eta_{rt}$, $\mu_{rt}$, $\sigma_{rt}$ describing the current traffic characteristics and the parameters $\eta$, $\mu$, $\sigma$ describing the historical traffic characteristics, compute the difference between the two speed distributions:

$$\mathrm{diff}\left[(\eta_{rt}, \mu_{rt}, \sigma_{rt}),\ (\eta, \mu, \sigma)\right] = \mathrm{JSD}(P_{rt} \,\|\, P)$$
Step 18. Normalize the speed-distribution difference of each spatio-temporal sub-region to a value between 0 and 1. For sub-region $i$ the normalized value is:

$$\frac{\mathrm{diff}_i - \min(\mathrm{diff})}{\max(\mathrm{diff}) - \min(\mathrm{diff})}$$

Compute the traffic anomaly index of each spatio-temporal sub-region as the normalized value multiplied by 10.

Embodiment 2
Step 21. Using the equidistant spatio-temporal partitioning method, determine the segment scale of the time dimension: the time segment span is a fixed value, typically 30 min per time segment. Determine the segment scale of the spatial dimension: the spatial segment span is a fixed value, typically a 200 m × 200 m grid cell per spatial segment.
Step 22. Preprocess the data: apply data cleaning, data integration, data transformation, and data reduction to the GNSS positioning data to make the data more structured.
Step 23. Divide the spatial region to be processed into grid cells of a given size. The extent of each grid cell can be expressed as a point set $A = \{(x_s, y_s) \mid \ldots\}$.

Determine the grid cell in which the positioning point lies, then use distance and azimuth to search for the road segment on which the point lies. Search for the road segment nearest to point A, with the angle threshold set to 2.5°. If the absolute difference between the heading angle of point A and the direction angle of the road segment is less than the threshold, the match is complete. Otherwise, remove that road segment from the search space and continue searching the other road segments until the condition is met.

Using the straight-line equation of the road segment (a curved segment is approximated by splitting it into several straight segments), compute the projected coordinates of the GNSS positioning point on the road segment to reduce the error caused by GNSS positioning drift. The procedure is as follows.

Determine the straight-line equation of the road segment:

$$y - y_i = k\,(x - x_i)$$

where $(x_i, y_i)$ is an endpoint of the segment and the slope $k$ is the ratio of the difference in $y$ to the difference in $x$ between the two endpoints of the segment. The projection line through point A, with coordinates $(x_A, y_A)$, is:

$$y - y_A = -\frac{1}{k}\,(x - x_A)$$

Solving gives the projected coordinates $P$:

$$x_P = \frac{k y_A - k y_i + k^2 x_i + x_A}{k^2 + 1}, \qquad y_P = \frac{k^2 y_A + y_i + k x_A - k x_i}{k^2 + 1}$$
After map matching, use the timestamp of each positioning point to assign the point to its spatio-temporal sub-region.
Step 24. Compute the travel speed of each vehicle within the spatio-temporal sub-region:

$$v = \frac{d_{1,2} + \cdots + d_{n-1,n}}{t_n - t_1}$$

where $d_{1,2}, \ldots, d_{n-1,n}$ are the distances between the 1st and 2nd GNSS positioning points, ..., and between the $(n-1)$-th and $n$-th GNSS positioning points within the sub-region, and $t_1, \ldots, t_n$ are the timestamps of the 1st, ..., $n$-th GNSS positioning points within the sub-region. Specify a time-slice length and an upper limit $p$ on the number of records within one time slice. Scan the speed data in each time slice of a spatio-temporal sub-region; if the number of speed records in a time slice exceeds the upper limit $p$, randomly select $p$ of them and add them to the set $V$.
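The capped random selection per time slice could be sketched as follows. The text only states what happens when a slice exceeds the limit; keeping every record of a slice that is at or below the limit is an assumption made here, and the function name, the record layout and the optional seed are hypothetical.

```python
import random
from collections import defaultdict


def cap_per_slice(records, slice_s, p, seed=None):
    """Keep at most p speed records per time slice.

    records: iterable of (timestamp_seconds, speed) pairs.
    slice_s: time-slice length in seconds.
    Assumption: slices holding p records or fewer are kept unchanged.
    """
    rng = random.Random(seed)
    by_slice = defaultdict(list)
    for t, v in records:
        by_slice[int(t // slice_s)].append(v)
    kept = []
    for key in sorted(by_slice):
        speeds = by_slice[key]
        # random.sample draws without replacement, so no record is duplicated.
        kept.extend(rng.sample(speeds, p) if len(speeds) > p else speeds)
    return kept
```

Capping each slice in this way keeps a densely sampled period from dominating the fitted speed distribution of the sub-region.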
Step 25. Taking the historical data recorded under conditions free of traffic anomalies as a whole, build the traffic feature model and estimate its parameters. The method uses a finite mixture model to build the traffic feature model and estimate the parameters. Set the maximum number of components to $K = 5$ and estimate the parameters of the Gaussian mixture model with $n = 1, 2, \ldots, K$ components in turn; among the $K$ models, the best one is selected by the Bayesian information criterion (BIC). Compute the 5 mixture models:

$$p(v) = \sum_{j=1}^{n} \eta_j\, f_j(v \mid \mu_j, \sigma_j), \qquad n \in \{1, 2, \ldots, 5\}$$

At the same time, compute for each of the 5 models

$$\mathrm{BIC} = -2 \ln L + k \ln n$$

where $L$ is the maximized value of the likelihood function, $k$ is the number of parameters in the model, and $n$ in this expression is the total number of data points (not the number of components).
The mixture model with the smallest BIC is then selected, and its parameter vectors $\eta$, $\mu$, $\sigma$ are recorded as the feature record of this spatio-temporal sub-region. From the parameter estimates, write the probability density function $p_i(x)$ of the travel-speed distribution of the spatio-temporal sub-region for each date $i$:

$$p_i(x) = \sum_{j} \eta_j\, f_j(x \mid \mu_j, \sigma_j)$$
Compute the Jensen-Shannon divergence $d_{ij}$ between every pair of distributions:

$$d_{ij} = \mathrm{JSD}(P \,\|\, Q) = \frac{1}{2} D(P \,\|\, M) + \frac{1}{2} D(Q \,\|\, M)$$

where $P$ and $Q$ are two different probability distributions, $M = \frac{1}{2}(P + Q)$, and $D$ is the Kullback-Leibler divergence:

$$D(P \,\|\, Q) = \sum_{k} P(x_k) \log \frac{P(x_k)}{Q(x_k)}$$

With finite mixture models, the divergence is approximated by Monte Carlo sampling:

$$D_{MC}(f \,\|\, g) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(x_i)}{g(x_i)} \longrightarrow D(f \,\|\, g)$$

where $x_1, \ldots, x_n$ are samples drawn from $f$.
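A Monte Carlo estimate of the Jensen-Shannon divergence between two one-dimensional Gaussian mixtures might be sketched as below. This is an illustration under stated assumptions rather than the patented procedure: each mixture is passed as a hypothetical (weights, means, sigmas) tuple, the natural logarithm is used, and the sample size and seed are arbitrary choices.

```python
import math
import random


def gmm_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )


def gmm_sample(rng, weights, means, sigmas):
    """Draw one value: pick a component by weight, then sample its Gaussian."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[j], sigmas[j])


def kl_to_midpoint_mc(p, q, n, rng):
    """Monte Carlo estimate of D(P || M), M = (P + Q) / 2, from n samples of P."""
    total = 0.0
    for _ in range(n):
        x = gmm_sample(rng, *p)
        fp = gmm_pdf(x, *p)
        fm = 0.5 * (fp + gmm_pdf(x, *q))
        total += math.log(fp / fm)
    return total / n


def jsd_mc(p, q, n=20000, seed=0):
    """JSD(P || Q) = 0.5 * D(P || M) + 0.5 * D(Q || M), each term sampled."""
    rng = random.Random(seed)
    return 0.5 * kl_to_midpoint_mc(p, q, n, rng) + 0.5 * kl_to_midpoint_mc(q, p, n, rng)
```

With the natural logarithm the estimate lies between 0 (identical distributions) and ln 2 (distributions with no overlap), which bounds the values later fed into the distance matrix.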
Express the pairwise divergences between the distributions as a distance matrix:

$$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}$$

The matrix satisfies $d_{ij} = d_{ji}$ and $d_{ij} = 0$ for $i = j$.
Use the distance matrix as the input of the K-Medoids algorithm to obtain the clustering result, and index the resulting classes.
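Clustering on a precomputed distance matrix can be illustrated with the sketch below. It uses the simple alternating (assign, then update) variant of K-Medoids rather than the full PAM swap search, and it starts from the first k items as medoids; both choices, along with the function names, are assumptions made for the illustration.

```python
def assign_to_medoids(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids on a symmetric distance matrix (list of lists).

    Assumption: the first k items serve as the initial medoids.
    Returns (medoid item indices, class label per item).
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty class
                continue
            # The new medoid minimizes the summed distance to its class members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)
```

The returned labels can serve directly as the class index of each daily distribution.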
Taking the class index as the response variable and the traffic environment data (including temperature, precipitation, visibility, etc.) as independent variables, perform multinomial logit regression to obtain the mapping $R(E, T)$ between the traffic environment $E$ and the traffic situation class $T$.
Aggregate the data belonging to the same class, refit the mixture model on each aggregated data set, and estimate its parameters to obtain the final historical traffic feature data set.
Step 26. Obtain the characteristic function of the traffic conditions; at the same time obtain the current temperature, precipitation, visibility, traffic control measures and similar information, and determine the class of the current traffic conditions.
Compute the travel speeds within the spatio-temporal sub-region to form the real-time travel-speed population $V_{rt}$, build the travel-speed probability distribution model

$$p(v_{rt}) = \sum_{j} \eta_j\, f_j(v_{rt} \mid \mu_j, \sigma_j)$$

and estimate its parameters. Taking the current traffic environment data (including temperature, precipitation, visibility, etc.) as input parameters, use the mapping $R(E, T)$ to obtain the class $T$ of the current traffic situation.
Step 27. According to the class $T$ to which the current traffic situation belongs, locate the historical traffic feature data of that class. From the parameters $\eta_{rt}$, $\mu_{rt}$, $\sigma_{rt}$ describing the current traffic characteristics and the parameters $\eta$, $\mu$, $\sigma$ describing the historical traffic characteristics, compute the difference between the two speed distributions:

$$\mathrm{diff}\left[(\eta_{rt}, \mu_{rt}, \sigma_{rt}),\ (\eta, \mu, \sigma)\right] = \mathrm{JSD}(P_{rt} \,\|\, P)$$
Step 28. Normalize the speed-distribution difference of each spatio-temporal sub-region to a value between 0 and 1. For sub-region $i$ the normalized value is:

$$\frac{\mathrm{diff}_i - \min(\mathrm{diff})}{\max(\mathrm{diff}) - \min(\mathrm{diff})}$$

Compute the traffic anomaly index of each spatio-temporal sub-region as the normalized value multiplied by 10.

Embodiment 3
Step 31. Use the non-equidistant spatio-temporal partitioning method. For urban central areas where the road network density is greater than 2 km/km² or the peak-hour flow is greater than 1000 vehicles/hour, use 30 min time segments and 200 m × 200 m spatial segments. For suburban areas where the road network density is less than 2 km/km² or the peak-hour flow is less than 1000 vehicles/hour, use 30 min time segments and 400 m × 400 m spatial segments.
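The choice of partition scale can be sketched as a small rule. The two conditions in the text overlap (for example, a dense network with low peak flow satisfies both), so this sketch assumes the central-area test is checked first and that boundary values fall to the suburban scale; the function name and return layout are hypothetical.

```python
def partition_scale(road_density_km_per_km2, peak_flow_veh_per_h):
    """Return (time_segment_min, cell_side_m) for an area.

    Assumption: the central-area rule takes precedence when both rules apply,
    and values exactly at 2 km/km2 or 1000 veh/h count as suburban.
    """
    if road_density_km_per_km2 > 2.0 or peak_flow_veh_per_h > 1000:
        return (30, 200)  # urban central area
    return (30, 400)      # suburban area
```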
Step 32. Preprocess the data: apply data cleaning, data integration, data transformation, and data reduction to the GNSS positioning data to make the data more structured.
Step 33. Divide the spatial region to be processed into grid cells of a given size. The extent of each grid cell can be expressed as a point set $A = \{(x_s, y_s) \mid x_s \ldots\}$.

Denote the GNSS data sampling frequency of the floating car by $f_0$. The points sampled one sampling interval before and after point A are defined as the 1-neighbors of A, the points two sampling intervals before and after A as the 2-neighbors of A, and so on, so that the points $l$ sampling intervals before and after A are the $l$-neighbors of A. When $f_0 < 1$ Hz, take $l = 1$ or 2. Take the road segment with the smallest distance to A and to the $l$-neighbors of A, and compute the mean heading angle of A and its $l$-neighbors. With the threshold set to 5°, the match is complete if the absolute difference between this mean heading angle and the direction angle of the road segment is less than the threshold; otherwise, search the other road segments until the condition is met.
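The heading test on a point and its neighbors could be sketched as follows. The text only says "mean", so the use of a circular mean (which avoids the wrap-around at 0°/360°) is an assumption, as are the function names and the convention that all angles are in degrees.

```python
import math


def circular_mean_deg(angles):
    """Mean direction of angles in degrees, robust to wrap-around at 360."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two directions, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def heading_matches(headings, segment_bearing, threshold_deg=5.0):
    """True if the mean heading of A and its neighbors is within the threshold.

    headings: heading angles of point A and its l-neighbors, in degrees.
    """
    return angle_diff_deg(circular_mean_deg(headings), segment_bearing) < threshold_deg
```

An arithmetic mean of 350° and 10° would give 180°, whereas the circular mean gives a direction close to 0°, which is the intended behavior for a vehicle travelling roughly north.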
Using the straight-line equation of the road segment (a curved segment is approximated by splitting it into several straight segments), compute the projected coordinates of the GNSS positioning point on the road segment to reduce the error caused by GNSS positioning drift. The procedure is as follows.

Determine the straight-line equation of the road segment:

$$y - y_i = k\,(x - x_i)$$

where $(x_i, y_i)$ is an endpoint of the segment and the slope $k$ is the ratio of the difference in $y$ to the difference in $x$ between the two endpoints of the segment. The projection line through point A, with coordinates $(x_A, y_A)$, is:

$$y - y_A = -\frac{1}{k}\,(x - x_A)$$

Solving gives the projected coordinates $P$:

$$x_P = \frac{k y_A - k y_i + k^2 x_i + x_A}{k^2 + 1}, \qquad y_P = \frac{k^2 y_A + y_i + k x_A - k x_i}{k^2 + 1}$$
After map matching, use the timestamp of each positioning point to assign the point to its spatio-temporal sub-region.
Step 34. Compute the travel speed of each vehicle within the spatio-temporal sub-region:

$$v = \frac{d_{1,2} + \cdots + d_{n-1,n}}{t_n - t_1}$$

where $d_{1,2}, \ldots, d_{n-1,n}$ are the distances between the 1st and 2nd GNSS positioning points, ..., and between the $(n-1)$-th and $n$-th GNSS positioning points within the sub-region, and $t_1, \ldots, t_n$ are the timestamps of the 1st, ..., $n$-th GNSS positioning points within the sub-region. Specify a time-slice length and an upper limit $p$ on the number of records within one time slice. Scan the speed data in each time slice of a spatio-temporal sub-region; if the number of speed records in a time slice exceeds the upper limit $p$, randomly select $p$ of them and add them to the set $V$.
Step 35. Taking the historical data recorded under conditions free of traffic anomalies as a whole, build the traffic feature model and estimate its parameters. The method uses a finite mixture model to build the traffic feature model and estimate the parameters. Set the maximum number of components to $K = 5$ and estimate the parameters of the Gaussian mixture model with $n = 1, 2, \ldots, K$ components in turn; among the $K$ models, the best one is selected by the Bayesian information criterion (BIC). Compute the 5 mixture models:

$$p(v) = \sum_{j=1}^{n} \eta_j\, f_j(v \mid \mu_j, \sigma_j), \qquad n \in \{1, 2, \ldots, 5\}$$

At the same time, compute for each of the 5 models

$$\mathrm{BIC} = -2 \ln L + k \ln n$$

where $L$ is the maximized value of the likelihood function, $k$ is the number of parameters in the model, and $n$ in this expression is the total number of data points (not the number of components).
The mixture model with the smallest BIC is then selected, and its parameter vectors $\eta$, $\mu$, $\sigma$ are recorded as the feature record of this spatio-temporal sub-region. From the parameter estimates, write the probability density function $p_i(x)$ of the travel-speed distribution of the spatio-temporal sub-region for each date $i$:

$$p_i(x) = \sum_{j} \eta_j\, f_j(x \mid \mu_j, \sigma_j)$$

Compute the Jensen-Shannon divergence $d_{ij}$ between every pair of distributions:

$$d_{ij} = \mathrm{JSD}(P \,\|\, Q) = \frac{1}{2} D(P \,\|\, M) + \frac{1}{2} D(Q \,\|\, M)$$

where $P$ and $Q$ are two different probability distributions, $M = \frac{1}{2}(P + Q)$, and $D$ is the Kullback-Leibler divergence:

$$D(P \,\|\, Q) = \sum_{k} P(x_k) \log \frac{P(x_k)}{Q(x_k)}$$

With finite mixture models, the divergence is approximated by Monte Carlo sampling:

$$D_{MC}(f \,\|\, g) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(x_i)}{g(x_i)} \longrightarrow D(f \,\|\, g)$$

where $x_1, \ldots, x_n$ are samples drawn from $f$.
Express the pairwise divergences between the distributions as a distance matrix:

$$D = \begin{pmatrix} d_{11} & \cdots & d_{1n} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nn} \end{pmatrix}$$

The matrix satisfies $d_{ij} = d_{ji}$ and $d_{ij} = 0$ for $i = j$.
Use the distance matrix as the input of the K-Medoids algorithm to obtain the clustering result, and index the resulting classes.
Taking the class index as the response variable and the traffic environment data (including temperature, precipitation, visibility, etc.) as independent variables, perform multinomial logit regression to obtain the mapping $R(E, T)$ between the traffic environment $E$ and the traffic situation class $T$.
Aggregate the data belonging to the same class, refit the mixture model on each aggregated data set, and estimate its parameters to obtain the final historical traffic feature data set.
Step 36. Obtain the characteristic function of the traffic conditions; at the same time obtain the current temperature, precipitation, visibility, traffic control measures and similar information, and determine the class of the current traffic conditions.
Compute the travel speeds within the spatio-temporal sub-region to form the real-time travel-speed population $V_{rt}$, build the travel-speed probability distribution model

$$p(v_{rt}) = \sum_{j} \eta_j\, f_j(v_{rt} \mid \mu_j, \sigma_j)$$

and estimate its parameters. Taking the current traffic environment data (including temperature, precipitation, visibility, etc.) as input parameters, use the mapping $R(E, T)$ to obtain the class $T$ of the current traffic situation.
Step 37. According to the class $T$ to which the current traffic situation belongs, locate the historical traffic feature data of that class. From the parameters $\eta_{rt}$, $\mu_{rt}$, $\sigma_{rt}$ describing the current traffic characteristics and the parameters $\eta$, $\mu$, $\sigma$ describing the historical traffic characteristics, compute the difference between the two speed distributions:

$$\mathrm{diff}\left[(\eta_{rt}, \mu_{rt}, \sigma_{rt}),\ (\eta, \mu, \sigma)\right] = \mathrm{JSD}(P_{rt} \,\|\, P)$$
Step 38. Normalize the speed-distribution difference of each spatio-temporal sub-region to a value between 0 and 1. For sub-region $i$ the normalized value is:

$$\frac{\mathrm{diff}_i - \min(\mathrm{diff})}{\max(\mathrm{diff}) - \min(\mathrm{diff})}$$

Compute the traffic anomaly index of each spatio-temporal sub-region as the normalized value multiplied by 10.
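The min-max normalization and the anomaly index can be sketched together. The scale factor of 10 is read from the garbled index formula and should be treated as an assumption, as should the handling of the degenerate case in which every sub-region has the same difference; the function name and the dictionary layout are hypothetical.

```python
def anomaly_indices(diffs, scale=10.0):
    """Map each sub-region's distribution difference to an anomaly index.

    diffs: dict mapping sub-region key -> JSD-based difference.
    Assumption: index = scale * (diff - min) / (max - min), with scale = 10;
    if all differences are equal, every index is set to 0.
    """
    lo, hi = min(diffs.values()), max(diffs.values())
    if hi == lo:
        return {key: 0.0 for key in diffs}
    return {key: scale * (d - lo) / (hi - lo) for key, d in diffs.items()}
```

Because the normalization is relative to the other sub-regions, the index ranks sub-regions within one evaluation round rather than giving an absolute severity.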