WO2020108219A1 - Traffic safety risk based group division and difference analysis method and system

Publication number
WO2020108219A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
data
attribute
risk
traffic
Application number
PCT/CN2019/114373
Inventor
刘林
吕伟韬
陈凝
饶欢
Original Assignee
江苏智通交通科技有限公司
Application filed by 江苏智通交通科技有限公司

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/01: Detecting movement of traffic to be counted or controlled
    • G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125: Traffic data processing
    • G08G1/0129: Traffic data processing for creating historical data or processing based on historical data
    • G08G1/0133: Traffic data processing for classifying traffic situation
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0635: Risk analysis of enterprise or organisation activities

Definitions

  • The invention relates to a method and system for group division and difference analysis based on traffic safety risk.
  • The purpose of the invention is to provide a method and system for group division and difference analysis based on traffic safety risk. Statistical methods compensate for the weakness of ensemble learning algorithms in describing the risk calibration process and bring out how groups at different risk levels differ in accident causes and accident results, which addresses the limited application scenarios of existing traffic safety management aimed at individuals.
  • A group division and difference analysis method based on traffic safety risk takes drivers and motor vehicles as objects, calibrates each object's safety risk with an ensemble learning algorithm, divides the objects into groups on that basis, and identifies significantly different indicators through statistical methods. It includes the following steps:
  • S1: Determine the traffic participant objects, including drivers and motor vehicles; obtain their historical traffic violation and traffic accident records from the target object information as sample data.
  • S3: From the sample data obtained in step S1, determine the secondary attribute dimensions of the target object and divide them into a set of accident-cause secondary attributes and a set of accident-result secondary attributes; break each secondary attribute down to the third level and determine its corresponding tertiary attribute factors.
  • The construction of the risk prediction model specifically includes data label definition and data set division, feature variable selection with an embedded method, data set balancing, model training with cross-validation, and model performance evaluation based on the ROC (receiver operating characteristic) curve and the area under the curve (AUC), which selects the best-fitting model; the risk degree output by the model is the label classification probability of the data.
  • The fields of the group division data table include object information, time, tertiary attribute factor, risk degree, and group; the group field is determined by which group's risk-degree threshold interval the object's risk degree falls in.
  • In step S5, the result of Fisher's exact test is determined from p_value, and variables with significant differences are taken as group safety feature attributes. Specifically, if the approximate p-value p_value is less than the set value, the null hypothesis H0 (the attribute variable values differ significantly between groups) is accepted; otherwise H0 is rejected and hypothesis H1 (no significant difference) is accepted.
  • A system for group division and difference analysis based on traffic safety risk, implementing any of the methods above, includes a data docking module, a risk prediction module, an attribute factor analysis module, and a group feature recognition module.
  • Data docking module: extracts traffic accident records and traffic violation records from the database.
  • Risk prediction module: takes the historical traffic violation and traffic accident data from the data docking module as samples for model construction; defines data labels and divides the sample data set; selects model feature variables; balances the data set; trains models with cross-validation and selects the best-fitting model by ROC curve and AUC value. Once the risk prediction model is built, the module extracts the historical traffic violation records of a specified target object from the data docking module on user instruction, outputs the predicted risk degree of that object through the model, and generates a risk degree table.
  • Attribute factor analysis module: takes the sample data from the data docking module and determines the secondary attributes from the original sample data fields, then determines the tertiary attribute factors of each secondary attribute from the specific values of those fields. If a secondary attribute is discrete, its tertiary attribute factors are the corresponding value domain; if it is continuous, the tertiary attribute factors are determined by discretization. The module generates a secondary attribute table and a tertiary attribute table.
  • Group feature recognition module: receives the risk degree table from the risk prediction module and the secondary and tertiary attribute tables from the attribute factor analysis module; generates the group division data table according to the configured risk-degree threshold intervals; determines the p-value of each secondary attribute using Fisher's exact test with Monte Carlo simulation and writes it into the secondary attribute table; selects the secondary attributes whose p-value is below the set value as the differential features of the groups and generates a group feature table.
  • Visualization module: obtains the group division data table and the group feature table from the group feature recognition module, counts the samples of each group by the tertiary attributes corresponding to the differential features, and generates a differential feature table for each group; calls the visualization engine and uses thematic charts to visualize and display each group's differential features and tertiary attribute sample statistics.
  • Based on an ensemble learning algorithm, the method and system mine the characteristics of traffic participants from their traffic behavior and calibrate their degree of safety risk. To reduce the data granularity used in analysis and assessment, the participants are divided by risk degree into several target groups with different safety levels. To overcome the limited interpretability of ensemble learning in the risk calibration process, Fisher's exact test is used to identify indicators that differ significantly between groups, so that the characteristics of each risk-level group are described accurately and proactive traffic safety governance has data support.
  • The invention rates the traffic safety risk of traffic participants such as drivers and vehicles, takes the groups formed by participants of the same level as objects, and mines the differential characteristics between groups, which addresses the limited application scenarios of traffic safety management aimed at individuals.
  • FIG. 1 is a schematic flowchart of a method for group division and difference analysis based on traffic safety risks according to an embodiment of the present invention.
  • FIG. 2 is an explanatory diagram of a group division and difference analysis system based on traffic safety risks in an embodiment.
  • In the embodiment, the group division and difference analysis method based on traffic safety risk takes drivers and motor vehicles as objects, calibrates each object's safety risk with an ensemble learning algorithm, divides the objects into groups on that basis, and identifies significantly different indicators through statistical methods; as shown in FIG. 1, the specific steps are:
  • the target object information of the driver is the ID number
  • the target object information of the motor vehicle is the combination of the number plate type and the number plate number
  • the time range of historical records usually exceeds one year to ensure a sufficient sample size.
  • the risk degree is the label classification probability of the sample data after the model processing.
  • The risk prediction model is built by combining an improved sampling method with the random forest (RF) algorithm; the resulting model has a coverage rate of 0.06 and an accuracy of 0.889.
  • In step S3, with a person as the target object, the accident-cause secondary attributes include gender, age, nationality, household registration (hukou) type, person type, years of driving experience, identified accident cause, blood alcohol content, and seat belt or helmet use, among others.
  • With a vehicle as the target object, the accident-cause secondary attributes include vehicle type, mode of transportation, nature of vehicle use, mileage, legal status, insurance, whether the vehicle is overloaded, lighting condition, and load, among others.
  • The accident-result secondary attributes include accident form, accident level, direct property loss, and accident liability, among others.
  • S4: Combine the results of steps S2 and S3 and build the group division data table to determine which group each sample belongs to.
  • The fields of the group division data table include object information, time, tertiary attribute factor, risk degree, and group; the group field is determined by which group's risk-degree threshold interval the object's risk degree falls in.
  • the risk threshold interval of the general group is [0,0.15]
  • the risk group interval is (0.15,0.8)
  • the high-risk group interval is [0.8,1.0].
  • The script that tests between-group differences in attribute variable values is written in R. It calls the fisher.test function from the stats package with the parameter simulate.p.value set to TRUE and the number of Monte Carlo simulations B set to 105; if the p-value is less than 0.05, the null hypothesis H0 is accepted, otherwise H0 is rejected.
  • With a person as the target object, Fisher's exact test is performed on the accident-result secondary attributes.
  • the resulting R*C contingency table for the accident level, accident form, accident liability, and direct property loss is as follows:
  • In the embodiment, the system for traffic participant group division and feature assessment includes a data docking module, a risk prediction module, an attribute factor analysis module, a group feature recognition module, and a visualization module.
  • the data docking module extracts traffic accident records and traffic violation records from the database.
  • The set value for the p-value is preferably 0.05.
  • the thematic maps include word cloud, histogram, pie chart, doughnut chart, number chart and other proportional and comparative graphic forms.
  • Groups with different safety levels are divided, and Fisher's exact test on R*C contingency tables is used to identify the differential attribute factors.
  • The method and system of the embodiment use Monte Carlo simulation to obtain an approximate p-value, which effectively reduces the time cost of the algorithm.
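
The Monte Carlo form of Fisher's exact test mentioned in these definitions can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's R script: the function names `log_table_prob` and `fisher_mc` are hypothetical, random tables with fixed margins are drawn by permuting column labels (a swapped-in technique with the same null distribution, not necessarily the generator R uses), and the number of simulations `b` is left as a parameter.

```python
import math
import random

def log_table_prob(table):
    """Log-probability of a table under the null with row and column totals fixed."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    lf = math.lgamma  # lgamma(x + 1) == log(x!)
    return (sum(lf(x + 1) for x in rows) + sum(lf(x + 1) for x in cols)
            - lf(n + 1) - sum(lf(x + 1) for r in table for x in r))

def fisher_mc(table, b=2000, seed=0):
    """Estimate the Fisher exact p-value of an R*C table by simulation.

    Assumption: label permutation stands in for a dedicated table sampler.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per observation, in row-major order.
    row_labels = [i for i, r in enumerate(table) for x in r for _ in range(x)]
    col_labels = [j for r in table for j, x in enumerate(r) for _ in range(x)]
    observed = log_table_prob(table)
    hits = 0
    for _ in range(b):
        rng.shuffle(col_labels)  # keeps both margins fixed
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Count simulated tables that are no more probable than the observed one.
        if log_table_prob(sim) <= observed + 1e-7:
            hits += 1
    return (1 + hits) / (b + 1)

# Hypothetical 3 groups x 3 factors tables.
p_strong = fisher_mc([[10, 0, 0], [0, 10, 0], [0, 0, 10]], b=500)
p_flat = fisher_mc([[5, 5], [5, 5]], b=200)
```

A small estimate indicates that the attribute's distribution differs between groups, which is the outcome the method uses to select group feature attributes, whichever way the two hypotheses are labeled.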

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A traffic safety risk based group division and difference analysis method and system. Taking drivers and motor vehicles as objects, the method demarcates the safety risk of each object with an ensemble learning algorithm, divides the objects into groups on that basis, and identifies salient difference indexes through a statistical approach. Features of traffic participants are mined from their traffic behavior with the ensemble learning algorithm, and their safety risk degrees are demarcated. To reduce the data granularity of analysis and assessment, the participants are divided by risk degree into a plurality of target groups with different safety levels. To overcome the lack of explanation that the ensemble learning algorithm offers during safety risk demarcation, salient difference indexes among groups are identified with Fisher's exact test, which accurately describes the features of groups at each risk level and provides data support for active traffic safety management.

Description

Group division and difference analysis method and system based on traffic safety risk
Technical field
The invention relates to a method and system for group division and difference analysis based on traffic safety risk.
Background
Using machine learning to predict the probability that a traffic participant will be involved in a traffic accident makes it possible to assign a clear safety risk indicator to every driver and vehicle with traffic violation or accident records. In current practice, however, the application scenarios for traffic safety management aimed at individuals are relatively limited.
Under these conditions, reducing data granularity and identifying key safety features from a group perspective gives more practical guidance for proactive safety governance. A method and system for group division and difference analysis based on traffic safety risk is therefore urgently needed to achieve this.
Summary of the invention
The purpose of the invention is to provide a method and system for group division and difference analysis based on traffic safety risk. Statistical methods compensate for the weakness of ensemble learning algorithms in describing the risk calibration process and bring out how groups at different risk levels differ in accident causes and accident results, which addresses the limited application scenarios of existing traffic safety management aimed at individuals.
The technical solution of the invention is:
A group division and difference analysis method based on traffic safety risk takes drivers and motor vehicles as objects, calibrates each object's safety risk with an ensemble learning algorithm, divides the objects into groups on that basis, and identifies significantly different indicators through statistical methods. It includes the following steps:
S1. Determine the traffic participant objects, including drivers and motor vehicles; obtain their historical traffic violation and traffic accident records from the target object information as sample data.
S2. Build a risk prediction model for the target objects based on an ensemble learning algorithm; input the sample data into the model, which outputs the risk degree of each target object. The risk degree is the label classification probability of the sample data after model processing.
S3. From the sample data obtained in step S1, determine the secondary attribute dimensions of the target object and divide them into a set of accident-cause secondary attributes and a set of accident-result secondary attributes; break each secondary attribute down to the third level and determine its corresponding tertiary attribute factors.
S4. Combine the results of steps S2 and S3 and build the group division data table to determine which group each sample belongs to.
S5. Taking the group as the object and the secondary attribute as the statistical dimension, count the tertiary attribute data within each group. Combine the statistics of all groups into an R*C contingency table for each secondary attribute variable, where R is the number of groups and C is the number of tertiary attribute factors of that secondary attribute. Apply Fisher's exact test with the hypotheses H0: the attribute variable values differ significantly between groups, and H1: the attribute variable values do not differ significantly between groups. Use Monte Carlo simulation to obtain an approximate p-value, p_value, for Fisher's exact test; determine the test result from p_value and take the variables with significant differences as group safety feature attributes.
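
The R*C contingency table of step S5 can be illustrated with a short Python sketch. The record layout (dictionaries with `group` and `factor` keys), the group names, and the factor names are assumptions made for this example and do not come from the patent.

```python
from collections import Counter

def contingency_table(records, groups, factors):
    """Count records per (group, tertiary factor).

    Rows follow `groups` (R of them); columns follow `factors` (C of them).
    """
    counts = Counter((r["group"], r["factor"]) for r in records)
    return [[counts[(g, f)] for f in factors] for g in groups]

# Hypothetical rows of a group division data table for one secondary
# attribute (accident level) with two tertiary factors.
records = [
    {"group": "general", "factor": "minor"},
    {"group": "general", "factor": "minor"},
    {"group": "risk", "factor": "minor"},
    {"group": "risk", "factor": "serious"},
    {"group": "high_risk", "factor": "serious"},
]
table = contingency_table(
    records, ["general", "risk", "high_risk"], ["minor", "serious"]
)
```

One such table is built per secondary attribute and then passed to the significance test.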
Further, in step S2, the construction of the risk prediction model specifically includes data label definition and data set division, feature variable selection with an embedded method, data set balancing, model training with cross-validation, and model performance evaluation based on the ROC (receiver operating characteristic) curve and the area under the curve (AUC), which selects the best-fitting model; the risk degree output by the model is the label classification probability of the data.
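
The AUC-based selection at the end of step S2 can be sketched as below. This is a minimal illustration, assuming each candidate model has already produced predicted probabilities on held-out data; the names `auc`, `select_best`, `model_a`, and `model_b` are hypothetical, and AUC is computed with the rank (Mann-Whitney) formulation rather than by tracing the ROC curve.

```python
def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive sample scores higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

def select_best(candidates, labels):
    """Return the name of the candidate whose held-out scores give the highest AUC."""
    return max(candidates, key=lambda name: auc(labels, candidates[name]))

# Hypothetical held-out labels and predicted probabilities.
labels = [0, 0, 1, 1]
candidates = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
}
best = select_best(candidates, labels)
```

In a cross-validated setup the same comparison would typically use the mean AUC across folds.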
Further, in step S4, the fields of the group division data table include object information, time, tertiary attribute factor, risk degree, and group; the group field is determined by which group's risk-degree threshold interval the object's risk degree falls in.
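
A row of the group division data table and its group assignment might look like the following sketch. The interval boundaries are the ones given for the embodiment ([0, 0.15], (0.15, 0.8), [0.8, 1.0]); the class name, field names, and group labels (`general`, `risk`, `high_risk`) are assumptions chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class GroupRecord:
    object_id: str   # driver ID number, or plate type plus plate number
    time: str
    factor: str      # tertiary attribute factor
    risk: float      # label classification probability from the model
    group: str = ""

def assign_group(risk):
    """Map a risk degree in [0, 1] to a group using the embodiment's intervals."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk degree must lie in [0, 1]")
    if risk <= 0.15:
        return "general"    # [0, 0.15]
    if risk < 0.8:
        return "risk"       # (0.15, 0.8)
    return "high_risk"      # [0.8, 1.0]

# Hypothetical record.
record = GroupRecord("driver-001", "2018-06", "minor", 0.42)
record.group = assign_group(record.risk)
```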
Further, in step S5, the result of Fisher's exact test is determined from p_value, and variables with significant differences are taken as group safety feature attributes. Specifically, if the approximate p-value p_value is less than the set value, the null hypothesis H0 is accepted; otherwise H0 is rejected and hypothesis H1 is accepted.
A system for group division and difference analysis based on traffic safety risk, implementing any of the methods above, includes a data docking module, a risk prediction module, an attribute factor analysis module, and a group feature recognition module.
Data docking module: extracts traffic accident records and traffic violation records from the database.
Risk prediction module: takes the historical traffic violation and traffic accident data from the data docking module as samples for model construction; defines data labels and divides the sample data set; selects model feature variables; balances the data set; trains models with cross-validation and selects the best-fitting model by ROC curve and AUC value. Once the risk prediction model is built, the module extracts the historical traffic violation records of a specified target object from the data docking module on user instruction, outputs the predicted risk degree of that object through the model, and generates a risk degree table.
Attribute factor analysis module: takes the sample data from the data docking module and determines the secondary attributes from the original sample data fields, then determines the tertiary attribute factors of each secondary attribute from the specific values of those fields. If a secondary attribute is discrete, its tertiary attribute factors are the corresponding value domain; if it is continuous, the tertiary attribute factors are determined by discretization. The module generates a secondary attribute table and a tertiary attribute table.
Group feature recognition module: receives the risk degree table from the risk prediction module and the secondary and tertiary attribute tables from the attribute factor analysis module; generates the group division data table according to the configured risk-degree threshold intervals; determines the p-value of each secondary attribute using Fisher's exact test with Monte Carlo simulation and writes it into the secondary attribute table; selects the secondary attributes whose p-value is below the set value as the differential features of the groups and generates a group feature table.
Further, the system also includes a visualization module. Visualization module: obtains the group division data table and the group feature table from the group feature recognition module, counts the samples of each group by the tertiary attributes corresponding to the differential features, and generates a differential feature table for each group; calls the visualization engine and uses thematic charts to visualize and display each group's differential features and tertiary attribute sample statistics.
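
The discretization performed by the attribute factor analysis module for continuous secondary attributes can be sketched as follows. The bin edges and labels for age are hypothetical; the patent does not state which cut points are used.

```python
from bisect import bisect_right

def discretize(value, edges, labels):
    """Return the label of the bin containing `value`.

    Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf), so there must be
    one more label than there are edges.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    return labels[bisect_right(edges, value)]

# Hypothetical tertiary attribute factors for the secondary attribute "age".
AGE_EDGES = [25, 40, 60]
AGE_LABELS = ["under_25", "25_39", "40_59", "60_plus"]

factor = discretize(37, AGE_EDGES, AGE_LABELS)
```

Discrete secondary attributes need no such step, since their observed values already serve as tertiary attribute factors.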
The beneficial effects of the invention are:
1. Based on an ensemble learning algorithm, the method and system mine the characteristics of traffic participants from their traffic behavior and calibrate their degree of safety risk. To reduce the data granularity used in analysis and assessment, the participants are divided by risk degree into several target groups with different safety levels. To overcome the limited interpretability of ensemble learning in the risk calibration process, Fisher's exact test is used to identify indicators that differ significantly between groups, so that the characteristics of each risk-level group are described accurately and proactive traffic safety governance has data support.
2. Building on ensemble-learning-based prediction of the traffic safety risk of drivers, vehicles, and other traffic participants, groups with different safety levels are divided, and Fisher's exact test on R*C contingency tables is used to identify the differential attribute factors. Because these contingency tables have more than two rows and more than two columns, computing the exact solution is error-prone and time-consuming. The invention therefore uses Monte Carlo simulation to obtain an approximate p-value, which effectively reduces the time cost of the algorithm.
三、本发明对驾驶人、车辆等交通参与者进行交通安全风险评级,并以同等级交通参与者组成的群体为对象,挖掘群体间的差异性特征,解决了以个体为对象的交通安全管理应用场景较为有限的问题。3. The present invention performs traffic safety risk ratings on traffic participants such as drivers and vehicles, and takes groups of traffic participants of the same level as objects, and explores the differences between groups to solve the traffic safety management of individuals. The problem of relatively limited application scenarios.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the traffic safety risk based group division and difference analysis method according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the traffic safety risk based group division and difference analysis system according to the embodiment.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the drawings.
Embodiment
A traffic safety risk based group division and difference analysis method takes drivers and motor vehicles as its objects, calibrates each object's safety risk with an ensemble learning algorithm, divides the objects into groups on that basis, and identifies significantly different indicators by statistical methods. As shown in FIG. 1, the specific steps are:
S1. Determine the traffic participant objects, including drivers and motor vehicles; obtain their historical traffic violation and traffic accident records according to the target object information, as sample data.
In the embodiment, the target object information for a driver is the ID document number, and for a motor vehicle it is the combination of license plate type and license plate number; the historical records usually span more than one year to ensure a sufficiently large sample.
S2. Build a risk prediction model for the target objects based on an ensemble learning algorithm; feed the sample data into the model, which outputs the risk degree indicator of each target object. The risk degree is the label classification probability that the model produces for the sample data.
The model construction process comprises data label definition and data set splitting, selection of model feature variables by an embedded method, data set balancing, model training with cross-validation, and model performance evaluation based on the ROC curve (receiver operating characteristic curve) and the area under the curve (AUC) to select the best-fitting model. The risk degree indicator output by this model is the label classification probability of the data.
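The AUC-based model selection described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `roc_auc` and `select_best_model` and the example scores are hypothetical, and the AUC is computed with the standard pairwise-ranking identity rather than by tracing the ROC curve.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Over all positive/negative pairs: 1 for a correct ordering, 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best_model(candidates, labels):
    """candidates maps a (hypothetical) model name to its predicted risk degrees."""
    aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs
```

Here each score plays the role of the risk degree, that is, the predicted probability of the positive label, so the same values used for ranking models are later used for group division.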
In the embodiment, the risk prediction model is built by combining an improved sampling method with the random forest (RF) algorithm; the model has a recall of 0.06 and a precision of 0.889.
S3. Determine the secondary attribute dimensions of the target objects from the sample data obtained in step S1, and divide them into a set of accident-cause secondary attributes and a set of accident-outcome secondary attributes; break each secondary attribute down to the third level to determine its tertiary attribute factors.
In the embodiment, with drivers as the target objects, the accident-cause secondary attributes include gender, age, nationality, household registration (hukou) type, person type, years of driving experience, determined cause of the accident, blood alcohol content, and seat belt or helmet use. With vehicles as the target objects, the accident-cause secondary attributes include vehicle type, mode of transport, nature of vehicle use, mileage, legal status, insurance, whether overloaded, lighting status, and load capacity. The accident-outcome secondary attributes include accident form, accident level, direct property loss, and accident liability.
S4. Combine the results of steps S2 and S3 to build the group division data table and determine the group to which each sample belongs. The fields of the group division data table include object information, time, tertiary attribute factor, risk degree, and group. The group field is determined by which group's risk threshold interval contains the object's risk degree.
In the embodiment, three groups are defined: general, at-risk, and dangerous. The risk threshold interval is [0, 0.15] for the general group, (0.15, 0.8) for the at-risk group, and [0.8, 1.0] for the dangerous group.
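As a small illustration of the interval rule, the following Python sketch maps a risk degree to one of the three groups of the embodiment. The function name `assign_group` and the English group labels are assumptions made for this example; only the interval boundaries come from the text.

```python
def assign_group(risk):
    """Map a risk degree in [0, 1] to the group whose threshold interval contains it."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk degree must lie in [0, 1]")
    if risk <= 0.15:      # general group: [0, 0.15]
        return "general"
    if risk < 0.8:        # at-risk group: (0.15, 0.8)
        return "at-risk"
    return "dangerous"    # dangerous group: [0.8, 1.0]
```

The boundary handling follows the stated intervals: 0.15 falls in the general group and 0.8 in the dangerous group.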
S5. Taking each group as the object and each secondary attribute as the statistical dimension, tally the tertiary attribute data within each group. Combine the tallies of all groups to generate an R*C contingency table for each secondary attribute variable, where R is the number of groups and C is the number of tertiary attribute factors of that secondary attribute. Apply Fisher's exact test with the hypotheses H0: the attribute variable values differ significantly between groups, and H1: the attribute variable values do not differ significantly between groups. Because there are usually more than 2 tertiary attribute factors, and the row and column counts of the contingency table generally differ, Monte Carlo simulation is used to obtain p_value, an approximate solution of the Fisher test p-value. The result of Fisher's exact test is determined from p_value, and the variables that differ significantly are taken as the group safety feature attributes.
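The tallying step can be sketched in Python as follows. This is an illustrative assumption about the data layout, not the patent's schema: each record is taken to be a dict with a "group" key plus one key per secondary attribute, and the helper name `contingency_table` is hypothetical.

```python
from collections import Counter


def contingency_table(records, attribute, groups=None):
    """Build an R*C table: rows are groups, columns are tertiary factors of `attribute`."""
    counts = Counter((r["group"], r[attribute]) for r in records)
    rows = list(groups) if groups else sorted({g for g, _ in counts})
    cols = sorted({f for _, f in counts})
    # Counter returns 0 for group/factor combinations that never occur.
    table = [[counts[(g, f)] for f in cols] for g in rows]
    return rows, cols, table
```

Passing `groups` explicitly fixes the row order (for example general, at-risk, dangerous) and keeps a row for a group even if it has no samples for the attribute.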
In the embodiment, the script that tests for differences in attribute variable values between groups is written in R. It calls the fisher.test function in the stats package with the parameter simulate.p.value set to TRUE and the number of Monte Carlo simulations B set to 105. If the p-value is less than 0.05, hypothesis H0 is accepted; otherwise H0 is rejected.
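For readers without R, the idea behind a simulated Fisher p-value can be sketched in plain Python. This is an illustration of the technique, not the embodiment's script and not a port of R's internals: the names `fisher_mc_pvalue` and `_log_weight`, the default of 2000 replicates (which matches the default B of R's fisher.test), the fixed seed, and the numerical tolerance are all choices made for this example. Tables with the observed margins are sampled by shuffling column labels, and the p-value is the share of sampled tables that are no more probable than the observed one. Note that the test's own null hypothesis is independence of rows and columns, so a small p-value indicates that the groups differ, which is the outcome the text labels H0.

```python
import math
import random


def _log_weight(table):
    # With margins fixed, log P(table) equals a constant minus sum(log(n_ij!)).
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def fisher_mc_pvalue(table, B=2000, seed=0):
    """Monte Carlo p-value for an R*C table of non-negative integer counts."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row label, column label) pair per sample.
    row_labels = [i for i, row in enumerate(table) for n in row for _ in range(n)]
    col_labels = [j for row in table for j, n in enumerate(row) for _ in range(n)]
    observed = _log_weight(table)
    tol = 1e-7  # guards against floating-point ties
    hits = 0
    for _ in range(B):
        rng.shuffle(col_labels)  # keeps row and column totals unchanged
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if _log_weight(sim) <= observed + tol:
            hits += 1
    # Counting the observed table itself keeps the estimate strictly positive.
    return (hits + 1) / (B + 1)
```

The smallest value this estimator can return is 1 / (B + 1), so B bounds the resolution of the p-value; larger B gives finer resolution at proportionally higher cost.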
In one embodiment, with persons as the target objects, Fisher's exact test is performed on the accident-outcome secondary attributes. The R*C contingency tables generated for accident level, accident form, accident liability, and direct property loss are as follows:
Table 1. R*C contingency table for accident level
Figure PCTCN2019114373-appb-000001
Table 2. R*C contingency table for accident form
Figure PCTCN2019114373-appb-000002
Table 3. R*C contingency table for accident liability
Figure PCTCN2019114373-appb-000003
Table 4. R*C contingency table for direct property loss
Figure PCTCN2019114373-appb-000004
The p-values of Fisher's exact test computed by Monte Carlo simulation are:
Accident-outcome secondary attribute    Accident level    Accident form    Accident liability    Direct property loss
p_value                                 1.0               0.58790          0.03533               0.01469
Accident liability and direct property loss differ significantly between groups, so these two variables are taken as the group safety feature attribute variables.
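The selection in this example amounts to a threshold filter over the reported p-values. A minimal Python sketch, in which the dictionary keys are English renderings of the attribute names and the function name `distinguishing_attributes` is hypothetical:

```python
# p-values reported in the embodiment for the accident-outcome secondary attributes
P_VALUES = {
    "accident level": 1.0,
    "accident form": 0.58790,
    "accident liability": 0.03533,
    "direct property loss": 0.01469,
}


def distinguishing_attributes(p_values, alpha=0.05):
    """Keep the secondary attributes whose p-value is below the set value."""
    return [name for name, p in p_values.items() if p < alpha]
```

With the set value 0.05, `distinguishing_attributes(P_VALUES)` keeps accident liability and direct property loss, matching the conclusion stated in the text.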
The traffic safety risk based group division and difference analysis method rates the traffic safety risk of drivers, vehicles, and other traffic participants and, taking groups made up of participants at the same risk level as the object of analysis, mines the distinguishing features between groups. This addresses the limited range of application scenarios available to traffic safety management that targets individuals.
A traffic participant group division and feature assessment system, shown in FIG. 2, includes a data docking module, a risk prediction module, an attribute factor analysis module, a group feature recognition module, and a visualization module.
The data docking module extracts traffic accident records and traffic violation records from the database.
The risk prediction module takes the historical traffic violation data and traffic accident data from the data docking module as samples for model construction; defines the data labels and splits the sample data set; selects the model feature variables; balances the data set; trains models with cross-validation; and selects the best-fitting model according to the ROC curve and the AUC value. Having built the risk prediction model, the module extracts, on user instruction, the historical traffic violation records of a specified target object from the data docking module, processes them with the model to output the predicted risk degree of that object, and generates the risk degree table.
The attribute factor analysis module takes the sample data from the data docking module and determines the secondary attributes from the fields of the original sample data. It determines the tertiary attribute factors of each secondary attribute from the actual values of the sample data fields: if the secondary attribute is discrete, its tertiary attribute factors are the values in its data range; if the secondary attribute is continuous, its tertiary attribute factors are determined by discretization. The module generates the secondary attribute table and the tertiary attribute table.
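The two cases for deriving tertiary attribute factors can be sketched in Python. The text does not specify a discretization scheme, so the fixed bin edges below are an assumption for illustration, and the function names `discrete_factors` and `discretize` and the example bins for years of driving experience are hypothetical.

```python
from bisect import bisect_right


def discrete_factors(values):
    """Discrete secondary attribute: the factors are its distinct observed values."""
    return sorted(set(values))


def discretize(value, edges, labels):
    """Continuous secondary attribute: map a value to the label of its bin.

    edges must be sorted ascending; bin k covers [edges[k-1], edges[k]).
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


# Hypothetical bins for years of driving experience
EDGES = [3, 10]
LABELS = ["under 3 years", "3 to under 10 years", "10 years or more"]
```

For example, `discretize(3, EDGES, LABELS)` returns "3 to under 10 years", because each bin includes its lower edge.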
The group feature recognition module receives the risk degree table from the risk prediction module and the secondary and tertiary attribute tables from the attribute factor analysis module; generates the group division data table according to the configured risk threshold intervals; determines the p-value of each secondary attribute using Fisher's exact test with Monte Carlo simulation and writes it to the secondary attribute table; and selects the secondary attributes whose p-value is below the set value as the distinguishing features of the groups, generating the group feature table. The set value is preferably 0.05.
The visualization module obtains the group division data table and the group feature table from the group feature recognition module; tallies the samples of each group by the tertiary attributes corresponding to the distinguishing features to generate a distinguishing-feature table for each group; and calls a visualization engine to render and display, as thematic charts, each group's distinguishing features and tertiary-attribute sample statistics. The thematic charts include proportion-type and comparison-type graphics such as word clouds, bar charts, pie charts, doughnut charts, and number charts.
The traffic safety risk based group division and difference analysis method and system use an ensemble learning algorithm to mine features from traffic participants' traffic behavior and to calibrate their level of safety risk. To reduce the data granularity used in analysis and assessment applications, participants are divided into several target groups of different safety levels according to risk degree. To address the lack of interpretability of ensemble learning algorithms in safety risk calibration, Fisher's exact test is used to identify the indicators that differ significantly between groups, so that the characteristics of each risk-level group are described accurately, providing data support for proactive traffic safety governance.
Building on ensemble-learning-based prediction of the traffic safety risk of drivers, vehicles, and other traffic participants, groups of different safety levels are formed, and Fisher's exact test on R*C contingency tables is used to identify the attribute factors that differ between groups. Because both the row count and the column count of an R*C contingency table exceed 2, computing the exact solution incurs significant error and takes a long time; the method and system of the embodiment therefore use Monte Carlo simulation to obtain an approximate p-value, which effectively reduces the time cost of the algorithm.

Claims (6)

  1. A traffic safety risk based group division and difference analysis method, characterized in that: drivers and motor vehicles are taken as objects, the safety risk of each object is calibrated by an ensemble learning algorithm, the objects are divided into groups on that basis, and significantly different indicators are identified by statistical methods; the method comprises the following steps:
    S1. determining the traffic participant objects, including drivers and motor vehicles; obtaining their historical traffic violation and traffic accident records according to the target object information, as sample data;
    S2. building a risk prediction model for the target objects based on an ensemble learning algorithm; feeding the sample data into the model, the model outputting the risk degree indicator of each target object; wherein the risk degree is the label classification probability that the model produces for the sample data;
    S3. determining the secondary attribute dimensions of the target objects from the sample data obtained in step S1, and dividing them into a set of accident-cause secondary attributes and a set of accident-outcome secondary attributes; breaking each secondary attribute down to the third level to determine its tertiary attribute factors;
    S4. combining the results of steps S2 and S3 to build the group division data table and determine the group to which each sample belongs;
    S5. taking each group as the object and each secondary attribute as the statistical dimension, tallying the tertiary attribute data within each group; combining the tallies of all groups to generate an R*C contingency table for each secondary attribute variable, where R is the number of groups and C is the number of tertiary attribute factors of that secondary attribute; applying Fisher's exact test with the hypotheses H0: the attribute variable values differ significantly between groups, and H1: the attribute variable values do not differ significantly between groups; using Monte Carlo simulation to obtain p_value, an approximate solution of the p-value of Fisher's exact test; determining the result of Fisher's exact test from p_value, and taking the variables that differ significantly as the group safety feature attributes.
  2. The traffic safety risk based group division and difference analysis method according to claim 1, characterized in that: in step S2, the construction process of the risk prediction model specifically comprises data label definition and data set splitting, selection of model feature variables by an embedded method, data set balancing, model training with cross-validation, and model performance evaluation based on the ROC curve, that is, the receiver operating characteristic curve, and the area under the curve (AUC) to select the best-fitting model; the risk degree indicator output by the model is the label classification probability of the data.
  3. The traffic safety risk based group division and difference analysis method according to claim 1, characterized in that: in step S4, the fields of the group division data table include object information, time, tertiary attribute factor, risk degree, and group; wherein the group field is determined by which group's risk threshold interval contains the object's risk degree.
  4. The traffic safety risk based group division and difference analysis method according to any one of claims 1 to 3, characterized in that: in step S5, determining the result of Fisher's exact test from p_value and taking the variables that differ significantly as the group safety feature attributes is specifically: if p_value, the approximate solution of the p-value, is less than the set value, hypothesis H0 is accepted; otherwise hypothesis H0 is rejected and hypothesis H1 is accepted.
  5. A traffic safety risk based group division and difference analysis system implementing the traffic safety risk based group division and difference analysis method according to any one of claims 1 to 4, characterized by comprising a data docking module, a risk prediction module, an attribute factor analysis module, and a group feature recognition module, wherein:
    the data docking module extracts traffic accident records and traffic violation records from the database;
    the risk prediction module takes the historical traffic violation data and traffic accident data from the data docking module as samples for model construction; defines the data labels and splits the sample data set; selects the model feature variables; balances the data set; trains models with cross-validation and selects the best-fitting model according to the ROC curve and the AUC value; completes the construction of the risk prediction model and, on user instruction, extracts the historical traffic violation records of a specified target object from the data docking module and processes them with the model to output the predicted risk degree of that object; and generates the risk degree table;
    the attribute factor analysis module takes the sample data from the data docking module and determines the secondary attributes from the fields of the original sample data; determines the tertiary attribute factors of each secondary attribute from the actual values of the sample data fields, wherein if the secondary attribute is discrete, its tertiary attribute factors are the values in its data range, and if the secondary attribute is continuous, its tertiary attribute factors are determined by discretization; and generates the secondary attribute table and the tertiary attribute table;
    the group feature recognition module receives the risk degree table from the risk prediction module and the secondary and tertiary attribute tables from the attribute factor analysis module; generates the group division data table according to the configured risk threshold intervals; determines the p-value of each secondary attribute using Fisher's exact test with Monte Carlo simulation and writes it to the secondary attribute table; and selects the secondary attributes whose p-value is below the set value as the distinguishing features of the groups, generating the group feature table.
  6. The traffic safety risk based group division and difference analysis system according to claim 5, characterized by further comprising a visualization module, which obtains the group division data table and the group feature table from the group feature recognition module; tallies the samples of each group by the tertiary attributes corresponding to the distinguishing features to generate a distinguishing-feature table for each group; and calls a visualization engine to render and display, as thematic charts, each group's distinguishing features and tertiary-attribute sample statistics.
PCT/CN2019/114373 2018-11-30 2019-10-30 Traffic safety risk based group division and difference analysis method and system WO2020108219A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811463865.6A CN109598931B (en) 2018-11-30 2018-11-30 Group division and difference analysis method and system based on traffic safety risk
CN201811463865.6 2018-11-30

Publications (1)

Publication Number Publication Date
WO2020108219A1 true WO2020108219A1 (en) 2020-06-04

Family

ID=65959467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114373 WO2020108219A1 (en) 2018-11-30 2019-10-30 Traffic safety risk based group division and difference analysis method and system

Country Status (2)

Country Link
CN (1) CN109598931B (en)
WO (1) WO2020108219A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598931B (en) * 2018-11-30 2021-06-11 江苏智通交通科技有限公司 Group division and difference analysis method and system based on traffic safety risk
CN110458425A (en) * 2019-07-25 2019-11-15 腾讯科技(深圳)有限公司 Risk analysis method, device, readable medium and the electronic equipment of risk subject
CN110555277A (en) * 2019-09-09 2019-12-10 山东科技大学 method for evaluating sand burst risk of mining water under loose aquifer
CN110570655B (en) * 2019-09-19 2021-03-05 安徽百诚慧通科技有限公司 Vehicle feature evaluation method based on hierarchical clustering and decision tree
CN117274762B (en) * 2023-11-20 2024-02-06 西南交通大学 Real-time track extraction method based on vision under subway tunnel low-illumination scene

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104969216A (en) * 2013-02-04 2015-10-07 日本电气株式会社 Hierarchical latent variable model estimation device
US20160292599A1 (en) * 2015-04-06 2016-10-06 Fmr Llc Analyzing and remediating operational risks in production computing systems
CN106022787A (en) * 2016-04-25 2016-10-12 王琳 People-vehicle multifactorial assessment method and system based on big data
CN106126960A (en) * 2016-07-25 2016-11-16 东软集团股份有限公司 Driving safety appraisal procedure and device
CN108596409A (en) * 2018-07-16 2018-09-28 江苏智通交通科技有限公司 The method for promoting traffic hazard personnel's accident risk prediction precision
CN109598931A (en) * 2018-11-30 2019-04-09 江苏智通交通科技有限公司 Group based on traffic safety risk divides and difference analysis method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2778007B1 (en) * 2013-03-12 2022-09-07 INRIA - Institut National de Recherche en Informatique et en Automatique Method and system to assess abnormal driving behaviour of vehicles travelling on road
CN107886723B (en) * 2017-11-13 2021-07-20 深圳大学 Traffic travel survey data processing method
CN108573601B (en) * 2018-03-26 2021-05-11 同济大学 Traffic safety risk field construction method based on WIM data

Also Published As

Publication number Publication date
CN109598931A (en) 2019-04-09
CN109598931B (en) 2021-06-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889496

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/12/2021)
