CN113077625A - Road traffic accident form prediction method - Google Patents

Publication number
CN113077625A
Authority
CN
China
Prior art keywords
traffic accident
formula
association rule
variables
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110312213.8A
Other languages
Chinese (zh)
Other versions
CN113077625B (en)
Inventor
石琴
胡宗品
陈一锴
骆仁佳
于淑君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202110312213.8A
Publication of CN113077625A
Application granted
Publication of CN113077625B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/01: Detecting movement of traffic to be counted or controlled
    • G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00: Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08: Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/095: Predicting travel path or likelihood of collision
    • B60W30/0956: Predicting travel path or likelihood of collision the prediction being responsive to traffic or environmental parameters
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/16: Anti-collision systems
    • G08G1/166: Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes
    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/16: Anti-collision systems
    • G08G1/167: Driving aids for lane monitoring, lane changing, e.g. blind spot detection

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Transportation (AREA)
  • Quality & Reliability (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Automation & Control Theory (AREA)
  • Game Theory and Decision Science (AREA)
  • Mechanical Engineering (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a road traffic accident form prediction method comprising the following steps: 1. collecting and processing road traffic accident data; 2. discretizing the continuous independent variables in the traffic accident data using the minimum description length criterion; 3. mining interactions among the independent variables using an association-rule-based attribute selection method from the field of data mining; 4. establishing a mixed Logit model and estimating its parameters by maximum likelihood; 5. predicting traffic accident form probabilities with the constructed mixed Logit model. The invention makes full use of the information in the predicted variable when discretizing the continuous independent variables, and mines the influence of interactions among variables on the accident form. This reduces the information lost through discretization and avoids the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model and providing technical support for improving road traffic safety.

Description

Road traffic accident form prediction method
Technical Field
The invention relates to a road traffic accident form prediction method, and belongs to the technical field of road traffic safety analysis.
Background
According to the Global Status Report on Road Safety 2018, road traffic accidents now cause about 1.35 million deaths worldwide every year, of which 80% occur in middle-income countries. In China, the largest middle-income country, more than 240,000 traffic accidents occur every year and cause more than 60,000 deaths, so the traffic safety situation is severe. The influencing factors differ markedly among traffic accidents of different forms. Modelling the relationship between traffic accident form and influencing factors such as the driver, the road and the environment, and thereby predicting the accident form, is therefore an important measure for improving traffic safety.
For accident form prediction, fixed-parameter discrete choice models such as the Probit and multinomial Logit models are widely used. However, these methods ignore the unobserved heterogeneity that is prevalent in traffic accident data, which often leads to biased parameter estimates. Compared with fixed-parameter discrete choice models, the mixed Logit model captures the heterogeneity of traffic accident data by treating variable coefficients as random parameters. Nevertheless, it has the following problems in accident form prediction: (1) continuous independent variables in traffic accident data are mostly discretized with unsupervised algorithms, which cannot take into account the relationship between the independent variables and the predicted variable during discretization, so the discretized independent variables lose a great deal of information; (2) traffic accidents generally result from the combined action of several independent variables, but the method ignores the influence of interactions among variables on the accident form, which easily leads to wrong predictions and inferences.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a road traffic accident form prediction method that makes full use of the information in the predicted variable when discretizing continuous independent variables and mines the influence of interactions among variables on the accident form. This reduces the information lost through discretization and avoids the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a road traffic accident form prediction method, which is characterized by comprising the following steps:
step 1, collecting and processing road traffic accident data;
step 1.1, acquiring N traffic accidents from a road traffic accident database to form a traffic accident data set D; defining the set of independent variables that influence the traffic accident form in D as $X = \{x_1, x_2, \ldots, x_k, \ldots, x_K\}$, where $x_k$ denotes the k-th independent variable, $k = 1, 2, \ldots, K$; $\{x_1, x_2, \ldots, x_k\}$ is the set of categorical independent variables and $\{x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ is the set of continuous independent variables, $l = 1, 2, \ldots, K-k$;
step 1.2, according to the circumstances of each accident, dividing the traffic accident form into vehicle-to-vehicle accidents $y_1$, vehicle-pedestrian accidents $y_2$ and single-vehicle accidents $y_3$, thereby obtaining the predicted variable $Y = \{y_1, y_2, y_3\}$ formed by the three accident types;
Step 2, discretizing the set of continuous independent variables $\{x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ using the minimum description length criterion;
step 2.1, initializing $l = 1$;
step 2.2, collecting the values of the l-th continuous independent variable $x_{k+l}$ over all traffic accidents in the data set D to form a set of continuous independent variable values, and sorting that set in descending order;
step 2.3, for the continuous independent variable $x_{k+l}$, obtaining the information entropy $E(D)$ of the predicted variable Y using formula (1):
$$E(D) = -\sum_{i=1}^{|Y|} p_{y_i} \log_2 p_{y_i} \tag{1}$$
in formula (1), $|Y|$ is the number of kinds of the predicted variable Y in the data set D; $p_{y_i}$ is the proportion of accidents of the i-th type $y_i$ in D; $i = 1, 2, 3$;
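As a worked illustration of formula (1), assuming it is the base-2 Shannon entropy, consider a hypothetical data set (not taken from the patent) of 8 accidents, of which 4 are of type $y_1$, 2 of type $y_2$ and 2 of type $y_3$:

```latex
E(D) = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8}
            + \tfrac{2}{8}\log_2\tfrac{2}{8}
            + \tfrac{2}{8}\log_2\tfrac{2}{8}\right)
     = 0.5 + 0.5 + 0.5 = 1.5 \text{ bits}
```

A data set containing a single accident form would give $E(D) = 0$, which is why a cut point that separates the forms cleanly yields a large information gain.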
step 2.4, following the principle of information gain maximization, traversing the values of the l-th continuous independent variable $x_{k+l}$ to find the optimal cut point $b_l$; dividing the traffic accident data set D at $b_l$ into a first subset $D_1$ and a second subset $D_2$, and computing the information gain $G(b_l, D)$ obtained by this discretization using formula (2):
$$G(b_l, D) = E(D) - \frac{|D_1|}{|D|} E(D_1) - \frac{|D_2|}{|D|} E(D_2) \tag{2}$$
in formula (2), $|D_1|$ and $|D_2|$ are the numbers of traffic accident cases in the first subset $D_1$ and the second subset $D_2$, respectively, and $|D|$ is the number of cases in D; $E(D_1)$ and $E(D_2)$ are the information entropies of $D_1$ and $D_2$, respectively;
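step 2.5, computing the stop criterion S using formula (3):
$$S = \frac{\log_2(N-1)}{N} + \frac{\log_2\left(3^{|Y|}-2\right) - \left[\,|Y|\,E(D) - |Y_1|\,E(D_1) - |Y_2|\,E(D_2)\right]}{N} \tag{3}$$
in formula (3), $|Y_1|$ and $|Y_2|$ are the numbers of kinds of the predicted variable in the first subset $D_1$ and the second subset $D_2$, respectively;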
step 2.6, judging whether the information gain $G(b_l, D)$ is greater than the stop criterion S; if so, the optimal cut point $b_l$ is valid and is added to the cut point set B, the first subset $D_1$ and the second subset $D_2$ each take the place of the data set D, and steps 2.4 to 2.6 are repeated within $D_1$ and $D_2$ to search for the next optimal cut point; otherwise, executing step 2.7;
step 2.7, assigning $l + 1$ to $l$ and judging whether $l > K - k$; if so, all continuous independent variables in the set $\{x_{k+1}, x_{k+2}, \ldots, x_K\}$ have been discretized, and step 2.8 is executed after the cut point set B of each continuous independent variable is output; otherwise, returning to step 2.2;
step 2.8, discretizing each continuous independent variable based on its cut point set B so that all independent variables become categorical, thereby obtaining the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$, where $x'_{k+l}$ denotes the (k+l)-th categorical independent variable;
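Steps 2.2 to 2.7 can be sketched for a single continuous variable as follows. This is a minimal illustration rather than the patented implementation, and it rests on several assumptions: formulas (1) to (3) are taken to be the standard entropy, information gain and minimum description length stopping rule; candidate cut points are placed midway between neighbouring distinct values; the number of cases in the current subset is used where the text writes N; and the values are sorted in ascending order, which yields the same candidate boundaries as a descending sort. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (formula (1))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Scan sorted (value, label) pairs; return (gain, cut, left, right) or None."""
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a boundary
        left, right = pairs[:i], pairs[i:]
        gain = (base
                - len(left) / n * entropy([y for _, y in left])
                - len(right) / n * entropy([y for _, y in right]))
        if best is None or gain > best[0]:
            # Assumption: the cut lies midway between the two neighbours.
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, left, right)
    return best


def mdlp_cuts(values, labels):
    """Return the accepted cut points (the set B) for one continuous variable."""
    pairs = sorted(zip(values, labels))
    found = best_cut(pairs)
    if found is None:
        return []
    gain, cut, left, right = found
    ys, y1, y2 = ([y for _, y in p] for p in (pairs, left, right))
    n = len(pairs)
    delta = math.log2(3 ** len(set(ys)) - 2) - (
        len(set(ys)) * entropy(ys)
        - len(set(y1)) * entropy(y1)
        - len(set(y2)) * entropy(y2))
    stop = (math.log2(n - 1) + delta) / n  # stop criterion S
    if gain <= stop:
        return []  # the gain does not justify the split
    # Recurse into both subsets, as in step 2.6.
    return (mdlp_cuts([v for v, _ in left], y1) + [cut]
            + mdlp_cuts([v for v, _ in right], y2))


ages = [19, 21, 23, 25, 51, 53, 55, 57]   # hypothetical driver ages
forms = ["y1"] * 4 + ["y2"] * 4           # hypothetical accident forms
print(mdlp_cuts(ages, forms))             # [38.0]
```

In the hypothetical example a single cut at 38.0 separates the two accident forms completely, so the gain of 1 bit exceeds the stop criterion and no further cut is accepted inside either subset.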
step 3, mining interactions among the independent variables using an association-rule-based attribute selection method;
step 3.1, defining an association rule as A → B, where A is the rule antecedent, B is the rule consequent and → is the relation symbol; setting all factors in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$ as rule antecedents A, and all accident types of the predicted variable Y as rule consequents B;
step 3.2, defining the support Support(A → B), the confidence Confidence(A → B) and the lift Lift(A → B) of the association rule A → B as shown in formulas (4), (5) and (6), respectively:
Figure BDA0002990268090000031
Figure BDA0002990268090000032
Figure BDA0002990268090000033
in formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A ∩ B) is the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) are the frequencies with which factor A and factor B, respectively, occur in the traffic accident data;
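The three measures can be illustrated with a short sketch. It assumes the conventional definitions, namely support as the share of records containing both A and B, confidence as the share of records with A that also contain B, and lift as confidence divided by the share of records containing B; the exact normalization in formulas (4) to (6) may differ. The function name, the record layout and the data are hypothetical.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    records    : list of dicts mapping variable name -> categorical value
    antecedent : dict of variable -> value that must all hold (factor A)
    consequent : (variable, value) pair (factor B)
    """
    n = len(records)
    has_a = [all(r.get(k) == v for k, v in antecedent.items()) for r in records]
    has_b = [r.get(consequent[0]) == consequent[1] for r in records]
    n_a = sum(has_a)
    n_b = sum(has_b)
    n_ab = sum(a and b for a, b in zip(has_a, has_b))
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


# Hypothetical records: lighting condition and accident form.
records = (
    [{"lighting": "night", "form": "y2"}] * 4
    + [{"lighting": "night", "form": "y1"}] * 1
    + [{"lighting": "day", "form": "y2"}] * 2
    + [{"lighting": "day", "form": "y1"}] * 3
)
support, confidence, lift = rule_measures(
    records, {"lighting": "night"}, ("form", "y2"))
print(support, confidence, round(lift, 3))   # 0.4 0.8 1.333
```

A lift above 1 indicates that the accident form occurs more often together with the antecedent than it does in the data overall.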
step 3.3, defining and initializing the minimum support minSup, the minimum confidence minConf and the minimum lift minLift of the association rule A → B;
step 3.4, mining the association rules A → B and computing their support Support(A → B), confidence Confidence(A → B) and lift Lift(A → B);
step 3.5, defining the three constraint rules of the association-rule-based attribute selection method, namely the strong association rule SAR, the classification association rule CAR and the atomic association rule AAR;
step 3.5.1, obtaining the expression of the strong association rule SAR using formula (7):
$$\mathrm{Support}(A \to B) > \mathit{minSup} \;\wedge\; \mathrm{Confidence}(A \to B) > \mathit{minConf} \;\wedge\; \mathrm{Lift}(A \to B) > \mathit{minLift} \tag{7}$$
in formula (7), ∧ denotes logical AND;
step 3.5.2, in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$, letting $|x_k|$ be the value range of the k-th independent variable $x_k$, $1 \le k \le K$;
step 3.5.3, defining the influencing factor value set FVIS as the set of all possible values of all independent variables, i.e. $FVIS = \bigcup_{k=1}^{K} |x_k|$; defining the target value set TVIS as the set of all possible values of the predicted variable, i.e. $TVIS = |Y|$;
step 3.5.4, obtaining the expression of the classification association rule CAR using formula (8):
Figure BDA0002990268090000042
in formula (8), $|B|$ is the number of kinds of the predicted variable;
step 3.5.5, obtaining the expression of the atomic association rule AAR using formula (9):
Figure BDA0002990268090000043
in formula (9), $|A|$ is the number of kinds of independent variables;
step 3.6, forming the classification association rule set CARset from all association rules A → B that satisfy the classification association rule CAR, and forming the atomic association rule set AARset from all association rules A → B that satisfy the atomic association rule AAR;
step 3.7, sorting the association rules A → B in the atomic association rule set AARset in descending order of confidence;
step 3.8, taking the rules in AARset in that order and judging whether the consequent of the atomic association rule appears in the antecedent of any rule in CARset; if so, the consequent of the atomic association rule is a redundant variable, and all association rules A → B that contain that consequent are deleted from CARset;
step 3.9, repeating step 3.8 until the atomic association rule set AARset is empty;
step 3.10, mapping the association rules A → B remaining in CARset to their corresponding independent variables, thereby obtaining the independent variable set Set that contains the interactions among variables;
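One possible reading of steps 3.7 to 3.10 is sketched below. It is an interpretation, not the patent's code: an item is assumed to be a (variable, value) pair, an atomic rule is assumed to link one item to one other item, and a classification rule is dropped when its antecedent contains the consequent of an atomic rule. The function names and all data are hypothetical.

```python
def prune_cars(car_set, aar_set):
    """Remove classification rules made redundant by atomic rules.

    car_set : list of (antecedent, target); antecedent is a frozenset of
              (variable, value) items and target is an accident form
    aar_set : list of (antecedent_item, consequent_item, confidence)
    """
    remaining = list(car_set)
    # Visit atomic rules from highest to lowest confidence (step 3.7).
    ordered = sorted(aar_set, key=lambda rule: rule[2], reverse=True)
    for _antecedent, consequent, _confidence in ordered:
        # Step 3.8: the consequent is redundant if some antecedent uses it.
        if any(consequent in ante for ante, _ in remaining):
            remaining = [(ante, target) for ante, target in remaining
                         if consequent not in ante]
    return remaining


def to_variable_sets(car_set):
    """Step 3.10: map each remaining rule to the variables in its antecedent."""
    return {frozenset(var for var, _ in ante) for ante, _ in car_set}


cars = [
    (frozenset({("lighting", "night"), ("road", "no_median")}), "y1"),
    (frozenset({("lighting", "night")}), "y2"),
    (frozenset({("weather", "rain"), ("surface", "wet")}), "y3"),
]
aars = [(("weather", "rain"), ("surface", "wet"), 0.95)]

kept = prune_cars(cars, aars)
print(len(kept))   # 2
```

Antecedents with two or more variables, such as lighting combined with road, are the candidates for interaction terms in the model of step 4.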
step 4, constructing the accident form prediction model based on the mixed Logit principle;
step 4.1, establishing the mixed Logit model using formula (10):
$$P_n(y_i) = \int \frac{\exp\left(\beta_{y_i} X_{n,y_i}\right)}{\sum_{j=1}^{3} \exp\left(\beta_{y_j} X_{n,y_j}\right)}\, f(\beta \mid \varphi)\, d\beta \tag{10}$$
in formula (10), $P_n(y_i)$ is the probability that the accident form of the n-th traffic accident is $y_i$; $X_{n,y_i}$ is the vector of independent variables when the accident form of the n-th traffic accident is $y_i$; $\beta_{y_i}$ is the vector of estimated parameters of the independent variables $X_{n,y_i}$; $f(\beta \mid \varphi)$ is the probability density function of the random parameter $\beta$; $\varphi$ is the vector of the mean and variance parameters of that probability density function;
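The integral in a mixed Logit probability generally has no closed form and is commonly approximated by averaging ordinary Logit probabilities over random draws of the coefficients. The sketch below illustrates that idea under explicit assumptions: each random coefficient is independently normally distributed, the same attribute vector is used for every accident form, and the function name, argument layout and numbers are hypothetical. It is not the estimation procedure of the patent.

```python
import math
import random


def mixed_logit_probs(x, means, sds, draws=2000, seed=0):
    """Simulated mixed Logit probabilities for one accident record.

    x     : list of attribute values
    means : dict form -> list of coefficient means (one per attribute)
    sds   : dict form -> list of coefficient standard deviations
    """
    rng = random.Random(seed)
    forms = list(means)
    totals = {form: 0.0 for form in forms}
    for _ in range(draws):
        utilities = {}
        for form in forms:
            # Assumption: independent normal random coefficients.
            beta = [rng.gauss(m, s) for m, s in zip(means[form], sds[form])]
            utilities[form] = sum(b * xi for b, xi in zip(beta, x))
        top = max(utilities.values())  # subtract the maximum for stability
        weights = {f: math.exp(u - top) for f, u in utilities.items()}
        denom = sum(weights.values())
        for form in forms:
            totals[form] += weights[form] / denom
    return {form: totals[form] / draws for form in forms}


probs = mixed_logit_probs(
    x=[1.0, 0.0],
    means={"y1": [0.0, 0.0], "y2": [0.8, -0.3], "y3": [-0.5, 0.4]},
    sds={"y1": [0.0, 0.0], "y2": [0.5, 0.2], "y3": [0.3, 0.1]},
)
print(round(sum(probs.values()), 6))   # 1.0
```

Setting every standard deviation to zero reduces the model to a fixed-parameter multinomial Logit, which is a convenient check on an implementation.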
step 4.2, entering each influencing factor in the independent variable set Set, which contains the interactions among variables, into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood;
step 4.3, based on the parameter estimates Par of the mixed Logit model obtained in step 4.2, screening the parameters of the mixed Logit model by stepwise regression at the set confidence level to obtain the screened parameter estimates of the mixed Logit model;
Step 5, predicting traffic accident form probabilities with the constructed mixed Logit model;
step 5.1, obtaining in real time the independent variable information that influences the traffic accident form;
step 5.2, substituting the independent variable information obtained in step 5.1 into formula (11) to compute the utility function $U_{y_i}$ of accident form $y_i$ under that information:
$$U_{y_i} = \hat{\beta}_{y_i} X \tag{11}$$
in formula (11), $\hat{\beta}_{y_i}$ is the parameter vector for accident form $y_i$ in the screened parameter estimates of the mixed Logit model, and $X$ is the vector of independent variable information obtained in step 5.1;
step 5.3, obtaining the predicted probability $\hat{P}(y_i)$ of accident form $y_i$ under the real-time independent variable information using formula (12):
$$\hat{P}(y_i) = \frac{\exp\left(U_{y_i}\right)}{\sum_{j=1}^{3} \exp\left(U_{y_j}\right)} \tag{12}$$
in formula (12), $\sum_{j=1}^{3} \exp\left(U_{y_j}\right)$ represents the total utility function.
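Steps 5.2 and 5.3 can be sketched as a utility calculation followed by a normalization. The sketch assumes that formula (11) is a linear utility in the estimated parameters and that formula (12) divides the exponentiated utility of each form by the sum over all forms; the parameter values, the coding of the inputs and the function name are hypothetical and are not the estimates of the patent.

```python
import math


def predict_form_probs(x, beta_hat):
    """Predicted probability of each accident form for one record.

    x        : list of coded independent-variable values observed in real time
    beta_hat : dict form -> list of estimated parameters (one per variable)
    """
    # Formula (11), assumed linear: one utility per accident form.
    utilities = {form: sum(b * xi for b, xi in zip(beta, x))
                 for form, beta in beta_hat.items()}
    top = max(utilities.values())  # subtract the maximum for stability
    weights = {form: math.exp(u - top) for form, u in utilities.items()}
    total = sum(weights.values())  # plays the role of the total utility
    # Formula (12), assumed to be this normalization.
    return {form: w / total for form, w in weights.items()}


beta_hat = {"y1": [0.0, 0.0], "y2": [0.8, -0.3], "y3": [-0.5, 0.4]}
probs = predict_form_probs([1, 0], beta_hat)
print(max(probs, key=probs.get))   # y2
```

The form with the largest utility always receives the largest predicted probability, so ranking accident forms by utility or by probability gives the same order.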
Compared with the prior art, the invention has the beneficial effects that:
1. to address the serious information loss that traditional unsupervised discretization algorithms cause in discretized independent variables, the method discretizes continuous variables using the minimum description length criterion, a supervised discretization algorithm; this reduces the information loss of the discretized independent variables and helps find better cut points, thereby improving the prediction accuracy of the model;
2. the method mines interactions among the independent variables through the association-rule-based attribute selection method and incorporates them into the mixed Logit model, which helps to understand in depth how these interactions influence traffic accident form probabilities and overcomes the erroneous inference caused by ignoring them;
3. the method provides a mixed Logit model that contains interactions among variables, offering a new solution for predicting traffic accident form probabilities and technical support for improving road traffic safety.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the minimum description length criterion of the present invention;
FIG. 3 is a flowchart of an association rule based attribute selection method according to the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, the road traffic accident form prediction method is illustrated with fatal traffic accident data from Shenzhen, Guangdong Province, and is carried out according to the following steps:
step 1, collecting and processing the fatal road traffic accident data of Shenzhen;
step 1.1, acquiring the road traffic accident data of Shenzhen from 2014 to 2016 from a road traffic safety research information sharing platform, screening out the fatal traffic accident data, and deleting records that are incomplete (with blank items) or unreasonable, which yields 1264 fatal traffic accidents (N = 1264) as the traffic accident data set D; selecting 16 independent variables (K = 16) that potentially influence the accident form from five aspects, namely motor vehicle driver, vehicle, road, environment and time, to form the independent variable set $X = \{x_1, x_2, \ldots, x_k, \ldots, x_K\}$; driver age and accident time are continuous independent variables, so the categorical independent variable set is $\{x_1, x_2, \ldots, x_{14}\}$ (k = 14) and the continuous independent variable set is $\{x_{14+1}, x_{14+2}\}$; the descriptive statistics of each variable are shown in Table 1;
step 1.2, according to the circumstances of each accident, dividing the traffic accident form into vehicle-to-vehicle accidents $y_1$, vehicle-pedestrian accidents $y_2$ and single-vehicle accidents $y_3$, thereby obtaining the predicted variable $Y = \{y_1, y_2, y_3\}$ formed by the three accident types;
TABLE 1 descriptive statistics of independent variables
Figure BDA0002990268090000061
Figure BDA0002990268090000071
Note: drivers under 18 years of age were all unlicensed motorcycle drivers; the marked variable is the reference variable.
Step 2, discretizing the set of continuous independent variables $\{x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ using the minimum description length criterion;
step 2.1, FIG. 2 is the flowchart of the minimum description length criterion; with K = 16 and k = 14, initializing $l = 1$;
step 2.2, collecting the values of the l-th continuous independent variable $x_{k+l}$ over all traffic accidents in the data set D to form a set of continuous independent variable values, and sorting that set in descending order of the values of $x_{k+l}$;
step 2.3, for the continuous independent variable $x_{k+l}$, obtaining the information entropy $E(D)$ of the predicted variable Y using formula (1):
$$E(D) = -\sum_{i=1}^{|Y|} p_{y_i} \log_2 p_{y_i} \tag{1}$$
in formula (1), $|Y|$ is the number of kinds of the predicted variable Y in the data set D; $p_{y_i}$ is the proportion of accidents of the i-th type $y_i$ in D; $i = 1, 2, 3$;
step 2.4, following the principle of information gain maximization, traversing the values of the l-th continuous independent variable $x_{k+l}$ to find the optimal cut point $b_l$; dividing the traffic accident data set D at $b_l$ into a first subset $D_1$ and a second subset $D_2$, and computing the information gain $G(b_l, D)$ obtained by this discretization using formula (2):
$$G(b_l, D) = E(D) - \frac{|D_1|}{|D|} E(D_1) - \frac{|D_2|}{|D|} E(D_2) \tag{2}$$
in formula (2), $|D_1|$ and $|D_2|$ are the numbers of traffic accident cases in the first subset $D_1$ and the second subset $D_2$, respectively, and $|D|$ is the number of cases in D; $E(D_1)$ and $E(D_2)$ are the information entropies of $D_1$ and $D_2$, respectively;
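step 2.5, computing the stop criterion S using formula (3):
$$S = \frac{\log_2(N-1)}{N} + \frac{\log_2\left(3^{|Y|}-2\right) - \left[\,|Y|\,E(D) - |Y_1|\,E(D_1) - |Y_2|\,E(D_2)\right]}{N} \tag{3}$$
in formula (3), $|Y_1|$ and $|Y_2|$ are the numbers of kinds of the predicted variable in the first subset $D_1$ and the second subset $D_2$, respectively;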
step 2.6, judging whether the information gain $G(b_l, D)$ is greater than the stop criterion S; if so, the optimal cut point $b_l$ is valid and is added to the cut point set B, the first subset $D_1$ and the second subset $D_2$ each take the place of the data set D, and steps 2.4 to 2.6 are repeated within $D_1$ and $D_2$ to search for the next optimal cut point; otherwise, executing step 2.7;
step 2.7, assigning $l + 1$ to $l$ and judging whether $l > K - k$; if so, all continuous independent variables in the set $\{x_{k+1}, x_{k+2}, \ldots, x_K\}$ have been discretized, and step 2.8 is executed after the cut point set B of each continuous independent variable is output; otherwise, returning to step 2.2;
step 2.8, discretizing the continuous independent variables driver age and accident time based on the cut point set B so that all independent variables become categorical, thereby obtaining the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$, where $x'_{k+l}$ denotes the (k+l)-th categorical independent variable; the discretization results are shown in Table 2;
TABLE 2 discretization of continuous arguments
Figure BDA0002990268090000084
Note: the marked variable is the reference variable.
Step 3, mining interactions among the independent variables using an association-rule-based attribute selection method;
step 3.1, defining an association rule as A → B, where A is the rule antecedent, B is the rule consequent and → is the relation symbol; as shown in FIG. 2, setting all factors in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$ as rule antecedents A, and all accident types of the predicted variable Y as rule consequents B;
step 3.2, defining the support Support(A → B), the confidence Confidence(A → B) and the lift Lift(A → B) of the association rule A → B as shown in formulas (4), (5) and (6), respectively:
Figure BDA0002990268090000091
Figure BDA0002990268090000092
Figure BDA0002990268090000093
in formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A ∩ B) is the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) are the frequencies with which factor A and factor B, respectively, occur in the traffic accident data;
step 3.3, as shown in FIG. 3, defining and initializing the minimum support minSup = 10%, the minimum confidence minConf = 50% and the minimum lift minLift = 100% of the association rule A → B;
step 3.4, mining the association rules A → B and computing their support Support(A → B), confidence Confidence(A → B) and lift Lift(A → B);
step 3.5, defining the three constraint rules of the association-rule-based attribute selection method, namely the strong association rule SAR, the classification association rule CAR and the atomic association rule AAR;
step 3.5.1, obtaining the expression of the strong association rule SAR using formula (7):
$$\mathrm{Support}(A \to B) > \mathit{minSup} \;\wedge\; \mathrm{Confidence}(A \to B) > \mathit{minConf} \;\wedge\; \mathrm{Lift}(A \to B) > \mathit{minLift} \tag{7}$$
in formula (7), ∧ denotes logical AND;
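With the thresholds of this embodiment, the strong association rule test of formula (7) reduces to three strict comparisons. The sketch below applies them to a few hypothetical rules whose measures are invented for illustration; only the three threshold values come from step 3.3.

```python
# Thresholds from step 3.3 of this embodiment, written as fractions.
MIN_SUP, MIN_CONF, MIN_LIFT = 0.10, 0.50, 1.00


def is_strong(support, confidence, lift):
    """Formula (7): all three measures must strictly exceed their minimum."""
    return support > MIN_SUP and confidence > MIN_CONF and lift > MIN_LIFT


# Hypothetical rules and measures, for illustration only.
rules = [
    {"rule": "night -> y2", "support": 0.40, "confidence": 0.80, "lift": 1.33},
    {"rule": "rain -> y3", "support": 0.05, "confidence": 0.70, "lift": 1.50},
    {"rule": "day -> y1", "support": 0.30, "confidence": 0.50, "lift": 1.10},
]
strong = [r["rule"] for r in rules
          if is_strong(r["support"], r["confidence"], r["lift"])]
print(strong)   # ['night -> y2']
```

The second rule fails on support and the third fails because its confidence equals the minimum rather than exceeding it, since formula (7) uses strict inequalities.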
step 3.5.2, in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x'_{k+1}, x'_{k+2}, \ldots, x'_{k+l}, \ldots, x'_K\}$, letting $|x_k|$ be the value range of the k-th independent variable $x_k$, $1 \le k \le K$;
step 3.5.3, defining the influencing factor value set FVIS as the set of all possible values of all independent variables, i.e. $FVIS = \bigcup_{k=1}^{K} |x_k|$; defining the target value set TVIS as the set of all possible values of the predicted variable, i.e. $TVIS = |Y|$;
step 3.5.4, obtaining the expression of the classification association rule CAR using formula (8):
Figure BDA0002990268090000095
in formula (8), $|B|$ is the number of kinds of the predicted variable;
step 3.5.5, obtaining the expression of the atomic association rule AAR using formula (9):
Figure BDA0002990268090000101
in formula (9), $|A|$ is the number of kinds of independent variables;
step 3.6, forming the classification association rule set CARset from all association rules A → B that satisfy the classification association rule CAR, and forming the atomic association rule set AARset from all association rules A → B that satisfy the atomic association rule AAR;
step 3.7, sorting the association rules A → B in the atomic association rule set AARset in descending order of confidence;
step 3.8, taking the rules in AARset in that order and judging whether the consequent of the atomic association rule appears in the antecedent of any rule in CARset; if so, the consequent of the atomic association rule is a redundant variable, and all association rules A → B that contain that consequent are deleted from CARset;
step 3.9, repeating step 3.8 until the atomic association rule set AARset is empty;
step 3.10, mapping the association rules A → B remaining in CARset to their corresponding independent variables, thereby obtaining the independent variable set Set that contains the interactions among variables;
step 4, constructing the accident form prediction model based on the mixed Logit principle;
step 4.1, establishing the mixed Logit model using formula (10):
$$P_n(y_i) = \int \frac{\exp\left(\beta_{y_i} X_{n,y_i}\right)}{\sum_{j=1}^{3} \exp\left(\beta_{y_j} X_{n,y_j}\right)}\, f(\beta \mid \varphi)\, d\beta \tag{10}$$
in formula (10), $P_n(y_i)$ is the probability that the accident form of the n-th traffic accident is $y_i$; $X_{n,y_i}$ is the vector of independent variables when the accident form of the n-th traffic accident is $y_i$; $\beta_{y_i}$ is the vector of estimated parameters of the independent variables $X_{n,y_i}$; $f(\beta \mid \varphi)$ is the probability density function of the random parameter $\beta$; $\varphi$ is the vector of the mean and variance parameters of that probability density function;
step 4.2, entering each influencing factor in the independent variable set Set, which contains the interactions among variables, into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood using SAS 9.4 software;
step 4.3, based on the parameter estimates Par of the mixed Logit model obtained in step 4.2, screening the parameters of the mixed Logit model by stepwise regression at the 90% confidence level to obtain the screened parameter estimates of the mixed Logit model, which are shown in Table 3;
TABLE 3 hybrid Logit model parameter estimation results for death traffic accident morphology
Figure BDA0002990268090000111
Step 5, predicting the traffic accident form probability based on the constructed mixed Logit model;
step 5.1, obtaining independent variable information influencing traffic accident forms in real time;
step 5.1.1, as shown in Table 3, the parameter estimation results of the mixed Logit model show that the following independent variables are significantly related to the form of fatal traffic accidents: driver age, vehicle type, road isolation form, intersection and road section type, road alignment, accident occurrence time, lighting conditions, and weather; driver age and vehicle type data are acquired from the video data of the urban road intelligent traffic video monitoring system; road isolation form, intersection and road section type, and road alignment data are acquired from road design data; accident occurrence time, lighting conditions and weather data are acquired through the meteorological department;
step 5.2, inputting the independent variable information influencing the traffic accident form obtained in step 5.1 into formula (11), and calculating the utility function (Figure BDA0002990268090000121) of accident form y_i under the current traffic information conditions:
Figure BDA0002990268090000122
in formula (11), (Figure BDA0002990268090000123) denotes the parameter vector for accident form y_i within the mixed Logit model parameter estimates (Figure BDA0002990268090000124);
step 5.3, using formula (12) to obtain the predicted probability (Figure BDA0002990268090000125) of accident form y_i under the current independent variable information:
Figure BDA0002990268090000126
in formula (12), (Figure BDA0002990268090000127) denotes the total utility function;
step 5.4, obtaining, according to step 5.3, the predicted probability of each traffic accident form under the current independent variable information, transmitting this information to the vehicle-mounted communication equipment by means of Internet of Vehicles wireless communication technology, and warning the driver, through an intelligent voice broadcast device, of the traffic accident form that most needs to be guarded against. For example, on a non-straight road, the probability of a single-vehicle accident for a motorcycle driver aged 14 to 21 is as high as 96.38%, while the probabilities of an inter-vehicle accident and a vehicle-pedestrian accident are 1.49% and 2.14%, respectively. When a motorcycle driver aged 14 to 21 is about to enter a non-straight road, the intelligent voice broadcast device warns the driver of the high probability of a single-vehicle accident ahead and reminds the driver to slow down and drive carefully, thereby achieving precise management of the driver and ensuring driving safety.
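Steps 5.2 to 5.4 can be illustrated with a minimal sketch. It assumes that formula (11) is a linear utility (parameter vector times independent variable vector) and that formula (12) divides the exponentiated utility of one accident form by the sum over all forms; the formula images are not available here, so this reading is an assumption. The parameter values, the two indicator variables, and the function names are hypothetical and are not taken from Table 3.

```python
import math
from typing import Dict, List


def utilities(x: List[float], params: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed form of formula (11): utility = dot(parameter vector, x)."""
    return {form: sum(b * v for b, v in zip(beta, x)) for form, beta in params.items()}


def predicted_probabilities(u: Dict[str, float]) -> Dict[str, float]:
    """Assumed form of formula (12): exp(utility) over the sum of exp(utilities)."""
    shift = max(u.values())  # subtracting the maximum avoids overflow
    exp_u = {form: math.exp(value - shift) for form, value in u.items()}
    total = sum(exp_u.values())
    return {form: e / total for form, e in exp_u.items()}


# Hypothetical estimates for y1 (inter-vehicle), y2 (vehicle-pedestrian),
# y3 (single-vehicle); x holds two hypothetical indicator variables.
params = {"y1": [0.0, 0.0], "y2": [0.4, -0.2], "y3": [1.5, 0.8]}
x = [1.0, 1.0]

probs = predicted_probabilities(utilities(x, params))
warning_form = max(probs, key=probs.get)  # step 5.4: form to warn the driver about
print(probs, warning_form)
```

If the retained model contains random parameters, the probability would additionally have to be averaged over draws of those parameters; the sketch ignores this.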

Claims (1)

1. A road traffic accident form prediction method is characterized by comprising the following steps:
step 1, collecting and processing road traffic accident data;
step 1.1, acquiring N traffic accidents from a road traffic accident database to form a traffic accident data set D; defining the set of independent variables influencing the traffic accident form in the traffic accident data set D as X = {x_1, x_2, …, x_k, …, x_K}, where x_k denotes the k-th independent variable, k = 1, 2, …, K; {x_1, x_2, …, x_k} is the set of categorical independent variables, and {x_{k+1}, x_{k+2}, …, x_{k+l}, …, x_K} is the set of continuous independent variables, l = 1, 2, …, K - k;
step 1.2, according to the specific circumstances at the time of the accident, dividing the traffic accident forms into inter-vehicle accidents y_1, vehicle-pedestrian accidents y_2, and single-vehicle accidents y_3, thereby obtaining the prediction variable Y = {y_1, y_2, y_3} formed by the three types of accidents;
step 2, discretizing the set of continuous independent variables {x_{k+1}, x_{k+2}, …, x_{k+l}, …, x_K} using the minimum description length criterion;
step 2.1, initializing l = 1;
step 2.2, forming a set of continuous independent variable values (Figure FDA0002990268080000011) from the l-th continuous independent variable x_{k+l} of each traffic accident in the traffic accident data set D, and arranging the set of continuous independent variable values (Figure FDA0002990268080000012) in descending order;
step 2.3, using formula (1) to obtain the information entropy E(D) of the continuous independent variable x_{k+l} with respect to the prediction variable Y:
Figure FDA0002990268080000013
in formula (1), |Y| denotes the number of categories of the prediction variable Y in the traffic accident data set D; p_{y_i} denotes the proportion of the i-th type of accident y_i in the traffic accident data set D; i = 1, 2, 3;
step 2.4, traversing to find, according to the information gain maximization principle, the optimal discrete point b_l of the l-th continuous independent variable x_{k+l}; dividing the traffic accident data set D, with the optimal discrete point b_l as the boundary, into a first subset D_1 and a second subset D_2; and calculating the information gain G(b_l, D) obtained by the discretization using formula (2):
Figure FDA0002990268080000014
in formula (2), |D_1| and |D_2| are the numbers of traffic accident cases in the first subset D_1 and the second subset D_2 of the traffic accident data set D, respectively; E(D_1) and E(D_2) are the information entropies of the first subset D_1 and the second subset D_2, respectively;
step 2.5, calculating a stop criterion S by using formula (3):
Figure FDA0002990268080000015
in formula (3), |Y_1| and |Y_2| denote the numbers of categories of the prediction variable in the first subset D_1 and the second subset D_2, respectively;
step 2.6, judging whether the information gain G(b_l, D) is greater than the stop criterion S; if so, the optimal discrete point b_l is valid and is added to the discrete point set B, and the first subset D_1 and the second subset D_2 each replace the traffic accident data set D so that, following steps 2.4 to 2.6, the next optimal discrete point is searched for within the first subset D_1 and within the second subset D_2; otherwise, executing step 2.7;
step 2.7, assigning l + 1 to l and judging whether l is greater than K - k; if so, all continuous independent variables in the set of continuous independent variables (Figure FDA0002990268080000021) have been discretized, and step 2.8 is executed after the discrete point set B of each continuous independent variable is output; otherwise, returning to step 2.2;
step 2.8, discretizing each continuous independent variable based on the discrete point set B so that all independent variables are converted into categorical independent variables, thereby obtaining the discretized independent variable set X_MDLP = {x_1, x_2, …, x_k, x_{k+1}, x_{k+2}, …, x_{k+l}, …, x_K}, where x_{k+l} denotes the (k+l)-th categorical independent variable;
step 3, adopting an attribute selection method based on association rules to mine interaction among independent variables;
step 3.1, defining an association rule as A → B, where A is the rule front piece, B is the rule back piece, and → is the relation symbol; setting all factors in the discretized independent variable set X_MDLP = {x_1, x_2, …, x_k, x_{k+1}, x_{k+2}, …, x_{k+l}, …, x_K} as the rule front piece A, and setting all accident types of the prediction variable Y as the rule back piece B;
step 3.2, defining the support Support(A → B), the confidence Confidence(A → B), and the lift Lift(A → B) of the association rule A → B, as shown in formulas (4), (5) and (6), respectively:
Figure FDA0002990268080000022
Figure FDA0002990268080000023
Figure FDA0002990268080000024
in formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A ∩ B) denotes the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) denote the frequencies of factor A and factor B in the traffic accident data, respectively;
step 3.3, defining and initializing the minimum support minSup, the minimum confidence minConf, and the minimum lift minLift of the association rule A → B;
step 3.4, mining the support Support(A → B), the confidence Confidence(A → B), and the lift Lift(A → B) of the association rules A → B;
step 3.5, defining three rule constraints of the attribute selection method based on the association rule A → B, namely the strong association rule SAR, the classification association rule CAR, and the atomic association rule AAR;
step 3.5.1, obtaining an expression of the strong association rule SAR by using the formula (7):
Support(A→B)>minSup∧Confidence(A→B)>minConf∧Lift(A→B)>minLift (7)
in formula (7), ∧ denotes logical AND;
step 3.5.2, in the discretized independent variable set X_MDLP = {x_1, x_2, …, x_k, x_{k+1}, x_{k+2}, …, x_{k+l}, …, x_K}, letting |x_k| be the value range of the k-th independent variable x_k, where 1 ≤ k ≤ K;
step 3.5.3, defining the influencing factor value set FVIS as the set of all possible values of the independent variables, namely
Figure FDA0002990268080000031
and defining the target value set TVIS as the set of all possible values of the prediction variable, namely TVIS = |Y|;
step 3.5.4, obtaining an expression of the classification association rule CAR by using the formula (8):
Figure FDA0002990268080000032
in formula (8), |B| is the number of categories of the prediction variable;
step 3.5.5, obtaining the expression of the atomic association rule AAR by using formula (9):
Figure FDA0002990268080000033
in formula (9), |A| is the number of types of independent variables;
step 3.6, forming the classification association rule set CARset from all association rules A → B satisfying the classification association rule CAR, and forming the atomic association rule set AARset from the association rules A → B satisfying the atomic association rule AAR;
step 3.7, arranging the association rules A → B in the atomic association rule set AARset in descending order of confidence;
step 3.8, judging in turn whether the rule back piece of each atomic association rule AAR appears in the rule front piece of any rule in the classification association rule set CARset; if so, determining the rule back piece of that atomic association rule AAR to be a redundant variable, and deleting from the classification association rule set CARset all association rules A → B that contain the rule back piece of that atomic association rule AAR;
step 3.9, repeating the processing of step 3.8 until the atomic association rule set AARset is empty;
step 3.10, mapping the association rules A → B remaining in the classification association rule set CARset to the corresponding independent variables, thereby obtaining an independent variable set Set that contains the interactions among variables;
step 4, constructing an accident form prediction model based on a mixed Logit principle;
step 4.1, establishing a mixed Logit model by using formula (10):
Figure FDA0002990268080000034
in formula (10), P_n(y_i) is the probability that the accident form of the n-th traffic accident is y_i; (Figure FDA0002990268080000035) is the parameter vector of the independent variables when the accident form of the n-th traffic accident is y_i; (Figure FDA0002990268080000041) denotes the vector form of the estimated parameters of the independent variables (Figure FDA0002990268080000042); (Figure FDA0002990268080000043) denotes the probability density function of the random parameter β; β and (Figure FDA0002990268080000044) denote the vector forms of the mean and the variance parameters of the probability density function, respectively;
step 4.2, substituting each influencing factor in the independent variable set Set, which contains the interactions among variables, into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood estimation;
step 4.3, based on the parameter estimates Par of the mixed Logit model obtained in step 4.2, screening the parameters of the mixed Logit model by a stepwise regression method at a set confidence level to obtain the screened parameter estimates (Figure FDA0002990268080000045);
Step 5, predicting the traffic accident form probability based on the constructed mixed Logit model;
step 5.1, obtaining independent variable information influencing traffic accident forms in real time;
step 5.2, inputting the independent variable information obtained in step 5.1 into formula (11), and calculating the utility function (Figure FDA0002990268080000046) of accident form y_i under the corresponding independent variable information:
Figure FDA0002990268080000047
in formula (11), (Figure FDA0002990268080000048) denotes the parameter vector for accident form y_i within the mixed Logit model parameter estimates (Figure FDA0002990268080000049);
step 5.3, using formula (12) to obtain the predicted probability (Figure FDA00029902680800000410) of accident form y_i under the independent variable information influencing the traffic accident form obtained in real time:
Figure FDA00029902680800000411
in formula (12), (Figure FDA00029902680800000412) denotes the total utility function.
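As an illustration of the discretization in steps 2.3 to 2.6, the following sketch computes the information entropy, searches for the cut point with the largest information gain, and applies a stop criterion. The images of formulas (1) to (3) are not available in this text, so the sketch assumes they are the usual Shannon entropy, information gain, and the standard Fayyad-Irani minimum description length threshold; the function names and the sample data are hypothetical. The values are sorted in ascending order here, which does not affect which cut point is found.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy of the accident-form labels (assumed formula (1))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values: Sequence[float], labels: Sequence[str]
             ) -> Optional[Tuple[float, float, List[str], List[str]]]:
    """Return (gain, cut point, labels of D1, labels of D2) with maximal gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Assumed formula (2): entropy reduction achieved by the split.
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, left, right)
    return best


def mdlp_accepts(labels: Sequence[str], left: Sequence[str],
                 right: Sequence[str], gain: float) -> bool:
    """Assumed formula (3): accept the cut only if the gain exceeds the threshold."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (math.log2(n - 1) + delta) / n


# Hypothetical sample: driver ages and accident forms.
ages = [18, 19, 20, 21, 45, 50, 55, 60]
forms = ["y3"] * 4 + ["y1"] * 4
gain, cut, left, right = best_cut(ages, forms)
accepted = mdlp_accepts(forms, left, right, gain)
print(cut, gain, accepted)
```

A full discretizer would call these functions recursively on each subset, as step 2.6 describes, until no cut is accepted.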
CN202110312213.8A 2021-03-24 2021-03-24 Road traffic accident form prediction method Active CN113077625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110312213.8A CN113077625B (en) 2021-03-24 2021-03-24 Road traffic accident form prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110312213.8A CN113077625B (en) 2021-03-24 2021-03-24 Road traffic accident form prediction method

Publications (2)

Publication Number Publication Date
CN113077625A true CN113077625A (en) 2021-07-06
CN113077625B CN113077625B (en) 2022-03-15

Family

ID=76613618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110312213.8A Active CN113077625B (en) 2021-03-24 2021-03-24 Road traffic accident form prediction method

Country Status (1)

Country Link
CN (1) CN113077625B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09190422A (en) * 1996-01-11 1997-07-22 Toshiba Corp Device for predicting traffic condition
CN104821080A (en) * 2015-03-02 2015-08-05 北京理工大学 Intelligent vehicle traveling speed and time predication method based on macro city traffic flow
CN105931460A (en) * 2016-05-13 2016-09-07 东南大学 Variable speed limit control strategy optimization method for continuous bottleneck section of expressway
CN106709595A (en) * 2016-11-24 2017-05-24 北京交通大学 Accident delay time prediction method and system based on unformatted information
WO2019103197A1 (en) * 2017-11-23 2019-05-31 (주)에이텍티앤 System for predicting traffic accident on basis of artificial intelligence and method therefor
CN108717786A (en) * 2018-07-17 2018-10-30 南京航空航天大学 A kind of traffic accident causation method for digging based on universality meta-rule
CN109636053A (en) * 2018-12-20 2019-04-16 黄凤南 A kind of car accident solution optimization system
CN110555565A (en) * 2019-09-09 2019-12-10 南京东控智能交通研究院有限公司 Decision tree model-based expressway exit ramp accident severity prediction method
CN110782070A (en) * 2019-09-25 2020-02-11 北京市交通信息中心 Urban rail transit emergency passenger flow space-time distribution prediction method
CN110826244A (en) * 2019-11-15 2020-02-21 同济大学 Conjugate gradient cellular automata method for simulating influence of rail transit on urban growth
CN111768625A (en) * 2020-07-01 2020-10-13 中国计量大学 Traffic road event prediction method based on graph embedding
CN112224211A (en) * 2020-10-19 2021-01-15 中交第一公路勘察设计研究院有限公司 Driving simulation system based on multi-autonomous-body traffic flow
CN112149922A (en) * 2020-11-03 2020-12-29 南京信息职业技术学院 Method for predicting severity of accident in exit and entrance area of down-link of highway tunnel

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
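XIANGHAI MENG et al.: "Research on Accident Prediction Models for Freeways in Mountainous and Rolling Areas", 2015 SEVENTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION *
LIU Wenling et al.: "Research on prediction of injuries in bus accidents based on association rules", Control Engineering of China *
WANG Lei et al.: "Analysis of influencing factors and injury estimation of expressway traffic accidents", China Safety Science Journal *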

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762364A (en) * 2021-08-23 2021-12-07 东南大学 Unbalanced traffic accident data synthesis sampling method
CN113762364B (en) * 2021-08-23 2022-11-04 东南大学 Unbalanced traffic accident data synthesis sampling method

Also Published As

Publication number Publication date
CN113077625B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN110458244B (en) Traffic accident severity prediction method applied to regional road network
CN107958269A (en) A kind of driving risk factor Forecasting Methodology based on hidden Markov model
CN103971523A (en) Mountainous road traffic safety dynamic early-warning system
CN113837446B (en) Airport land side area traffic situation prediction method based on multi-source heterogeneous data
CN114463972B (en) Road section interval traffic analysis prediction method based on ETC portal communication data
CN106251642A (en) A kind of public transport road based on real-time bus gps data chain speed calculation method
CN113077625B (en) Road traffic accident form prediction method
CN110288826B (en) Traffic control subregion clustering division method based on multi-source data fusion and MILP
CN112509328B (en) Method for analyzing conflict behavior of intersection right-turning motor vehicle and electric bicycle
Wang et al. Energy consumption characteristics based driving conditions construction and prediction for hybrid electric buses energy management
Shang et al. Analyzing the effects of road type and rainy weather on fuel consumption and emissions: A mesoscopic model based on big traffic data
CN111907523A (en) Vehicle following optimization control method based on fuzzy reasoning
CN110097757B (en) Intersection group critical path identification method based on depth-first search
CN117746626A (en) Intelligent traffic management method and system based on traffic flow
CN112651666A (en) Driver risk assessment method based on driving mode transfer characteristics
CN116824868A (en) Method, device, equipment and medium for identifying illegal parking points and predicting congestion of vehicles
Jain et al. Enhance traffic flow prediction with real-time vehicle data integration
CN116453352A (en) Freight car traffic flow prediction method based on K clustering algorithm and neural network
CN115774942A (en) Driving style identification model modeling and statistical method based on Internet of vehicles real vehicle data and SVM
CN115587536A (en) Traffic accident severity prediction method, equipment and storage medium
CN113313941B (en) Vehicle track prediction method based on memory network and encoder-decoder model
CN112036709A (en) Random forest based rainfall weather expressway secondary accident cause analysis method
CN113945958A (en) Taxi GPS data-based method for identifying vehicle in passenger searching state in road section
CN110827446A (en) Method for predicting running state of electric automobile
CN111275241A (en) Bus passenger getting-off station inference method based on machine learning decision tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant