CN113077625A

CN113077625A - Road traffic accident form prediction method

Info

Publication number: CN113077625A
Application number: CN202110312213.8A
Authority: CN
Inventors: 石琴; 胡宗品; 陈一锴; 骆仁佳; 于淑君
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2021-07-06
Anticipated expiration: 2041-03-24
Also published as: CN113077625B

Abstract

The invention discloses a road traffic accident form prediction method comprising the following steps: 1. collecting and processing road traffic accident data; 2. discretizing the continuous independent variables in the traffic accident data by the minimum description length criterion; 3. mining the interactions among independent variables by an association-rule-based attribute selection method from the field of data mining; 4. establishing a mixed Logit model and estimating its parameters by maximum likelihood estimation; 5. predicting the traffic accident form probability with the constructed mixed Logit model. The invention makes full use of the information in the prediction variable while discretizing the continuous independent variables, and mines the influence of interactions among variables on the accident form. This reduces the information loss of the discretized variables and avoids the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model and providing technical support for improving the road traffic safety environment.

Description

Road traffic accident form prediction method

Technical Field

The invention relates to a road traffic accident form prediction method, and belongs to the technical field of road traffic safety analysis.

Background

Data from the Global Status Report on Road Safety 2018 show that road traffic deaths worldwide have risen to 1.35 million per year, 80% of which occur in middle-income countries. As the largest middle-income country, China records more than 240,000 traffic accidents every year, causing more than 60,000 deaths, and its traffic safety situation is severe. The influencing factors differ markedly among traffic accidents of different forms. Establishing the relationship between traffic accident form and influencing factors such as drivers, roads and environment, and thereby predicting the traffic accident form, is one of the important measures for improving traffic safety.

For accident form prediction, fixed-parameter discrete choice models such as Probit and multinomial Logit are widely applied. However, such methods ignore the unobserved heterogeneity that is prevalent in traffic accident data, which often results in biased parameter estimates. Compared with a fixed-parameter discrete choice model, the mixed Logit model reflects the heterogeneity of traffic accident data by treating variable coefficients as random parameters. However, this method has the following problems in accident form prediction: (1) continuous independent variables in traffic accident data are mostly discretized with an unsupervised discretization algorithm, which cannot take the relationship between the independent variables and the prediction variable into account, so the information loss of the discretized independent variables is severe; (2) the occurrence of a traffic accident generally depends on the combined action of several independent variables, yet the method ignores the influence of interactions among variables on the accident form, which easily leads to wrong prediction and inference.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a road traffic accident form prediction method. It aims to make full use of the information in the prediction variable while discretizing the continuous independent variables, and to mine the influence of interactions among variables on the accident form, so as to reduce the information loss of the discretized variables and avoid the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention relates to a road traffic accident form prediction method, which is characterized by comprising the following steps:

step 1, collecting and processing road traffic accident data;

step 1.1, acquiring N traffic accidents from a road traffic accident database to form a traffic accident data set D; defining the set of independent variables influencing the traffic accident form in the traffic accident data set D as X = {x¹, x², …, x^k, …, x^K}, where x^k denotes the k-th independent variable, k = 1, 2, …, K; {x¹, x², …, x^k} is the set of categorical independent variables and {x^(k+1), x^(k+2), …, x^(k+l), …, x^K} is the set of continuous independent variables, l = 1, 2, …, K − k;

step 1.2, according to the specific circumstances of each accident, dividing the traffic accident form into inter-vehicle accidents y₁, vehicle-pedestrian accidents y₂ and single-vehicle accidents y₃, thereby obtaining the prediction variable Y = {y₁, y₂, y₃} formed by the three accident types;

step 2, discretizing the set of continuous independent variables {x^(k+1), x^(k+2), …, x^(k+l), …, x^K} by the minimum description length criterion;

step 2.1, initializing l = 1;

step 2.2, forming a set of continuous independent variable values from the l-th continuous independent variable x^(k+l) of every traffic accident in the traffic accident data set D, and sorting that set of values in descending order;

step 2.3, obtaining, for the continuous independent variable x^(k+l), the information entropy E(D) with respect to the prediction variable Y by using formula (1):

$$E(D) = -\sum_{i=1}^{|Y|} p_{y_i} \log_2 p_{y_i} \tag{1}$$

in formula (1), |Y| is the number of categories of the prediction variable Y in the traffic accident data set D; p_{y_i} is the proportion of accidents of the i-th type y_i in the traffic accident data set D; i = 1, 2, 3;

step 2.4, traversing the values of the l-th continuous independent variable x^(k+l) to find its optimal discrete point b_l according to the principle of information gain maximization, dividing the traffic accident data set D at the optimal discrete point b_l into a first subset D₁ and a second subset D₂, and calculating the information gain G(b_l, D) obtained by the discretization by using formula (2):

$$G(b_l, D) = E(D) - \frac{|D_1|}{|D|} E(D_1) - \frac{|D_2|}{|D|} E(D_2) \tag{2}$$

in formula (2), |D₁| and |D₂| are the numbers of traffic accident cases of the traffic accident data set D that fall in the first subset D₁ and the second subset D₂ respectively, and |D| is the number of cases in D; E(D₁) and E(D₂) are the information entropies of the first subset D₁ and the second subset D₂ respectively;

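step 2.5, calculating a stop criterion S by using formula (3):

$$S = \frac{\log_2(|D| - 1)}{|D|} + \frac{\log_2\left(3^{|Y|} - 2\right) - \left[\,|Y|\,E(D) - |Y_1|\,E(D_1) - |Y_2|\,E(D_2)\right]}{|D|} \tag{3}$$

in formula (3), |Y₁| and |Y₂| are the numbers of categories of the prediction variable present in the first subset D₁ and the second subset D₂ respectively;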

step 2.6, judging whether the information gain G(b_l, D) is greater than the stop criterion S; if so, the optimal discrete point b_l is valid and is added to the discrete point set B, and the first subset D₁ and the second subset D₂ each replace the traffic accident data set D so that the next optimal discrete point is searched for within D₁ and within D₂ according to steps 2.4 to 2.6; otherwise, executing step 2.7;

step 2.7, assigning l + 1 to l and judging whether l is greater than K − k; if so, all continuous independent variables in the continuous independent variable set {x^(k+1), x^(k+2), …, x^K} have been discretized, and step 2.8 is executed after the discrete point set B of each continuous independent variable is output; otherwise, returning to step 2.2;

step 2.8, discretizing each continuous independent variable based on its discrete point set B so that all independent variables become categorical independent variables, thereby obtaining the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K}, where x^(k+l) now denotes the (k+l)-th categorical independent variable;
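Steps 2.2 to 2.7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, the stop criterion is assumed to take the standard Fayyad-Irani minimum description length form, and the values are sorted in ascending rather than descending order (the sort direction does not change which cut points are found).

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy E(D) of a list of accident-form labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (gain, cut, left, right) for the cut with maximal information gain.

    `pairs` is a list of (value, label) tuples sorted by value.
    Returns None when all values are identical.
    """
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left, right = pairs[:i], pairs[i:]
        gain = (base
                - len(left) / n * entropy([y for _, y in left])
                - len(right) / n * entropy([y for _, y in right]))
        if best is None or gain > best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint as cut value
            best = (gain, cut, left, right)
    return best


def stop_criterion(pairs, left, right):
    """MDL threshold S (assumed Fayyad-Irani form)."""
    n = len(pairs)
    ys, y1, y2 = ([y for _, y in part] for part in (pairs, left, right))
    k, k1, k2 = len(set(ys)), len(set(y1)), len(set(y2))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(ys) - k1 * entropy(y1) - k2 * entropy(y2))
    return (math.log2(n - 1) + delta) / n


def mdlp_cuts(pairs):
    """Recursively collect the accepted cut points (set B) for one variable."""
    if len(pairs) < 2:
        return []
    found = best_cut(pairs)
    if found is None:
        return []
    gain, cut, left, right = found
    if gain <= stop_criterion(pairs, left, right):
        return []  # cut rejected: stop splitting this subset
    return sorted(mdlp_cuts(left) + [cut] + mdlp_cuts(right))
```

A call such as `mdlp_cuts(sorted(zip(driver_ages, accident_forms)))`, where both lists are hypothetical inputs, would return the cut points for one continuous variable; an empty list means no cut passed the stop criterion.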

step 3, adopting an attribute selection method based on association rules to mine interaction among independent variables;

step 3.1, defining an association rule as A → B, where A is the rule antecedent (front piece), B is the rule consequent (back piece), and → is the relation symbol; setting all factors in the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K} as the rule front piece A, and setting all accident types of the prediction variable Y as the rule back piece B;

step 3.2, defining the support Support(A → B), the confidence Confidence(A → B) and the lift Lift(A → B) of the association rule A → B, as shown in formulas (4), (5) and (6):

in formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A ∩ B) represents the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) represent the frequencies of factor A and factor B in the traffic accident data respectively;

step 3.3, defining and initializing the minimum support degree minSup, the minimum confidence degree minConf and the minimum promotion degree minLift of the association rule A → B;

step 3.4, mining the association rules A → B and computing the support Support(A → B), the confidence Confidence(A → B) and the lift Lift(A → B) of each rule;

step 3.5, defining three constrained rules of the attribute selection method based on the association rule A → B, namely a strong association rule SAR, a classification association rule CAR and an atomic association rule AAR;

step 3.5.1, obtaining an expression of the strong association rule SAR by using the formula (7):

Support(A→B)>minSup∧Confidence(A→B)>minConf∧Lift(A→B)>minLift(7)

in formula (7), ∧ denotes logical AND;
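The three rule measures and the strong-rule test of formula (7) can be sketched as follows. This is an illustrative sketch under stated assumptions: records are plain dictionaries, the function names are hypothetical, support is taken as the share of records matching both sides, confidence as the share of antecedent matches that also match the consequent, and lift as confidence divided by the share of consequent matches. The default thresholds are the values used later in the embodiment (10%, 50% and 100%).

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    `records` is a list of dicts; `antecedent` and `consequent` are dicts of
    {variable: value} that must all match for a record to count.
    """
    def matches(rec, items):
        return all(rec.get(k) == v for k, v in items.items())

    n = len(records)
    n_a = sum(matches(r, antecedent) for r in records)
    n_b = sum(matches(r, consequent) for r in records)
    n_ab = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


def is_strong(metrics, min_sup=0.10, min_conf=0.50, min_lift=1.0):
    """Strong association rule test: all three thresholds must be exceeded."""
    support, confidence, lift = metrics
    return support > min_sup and confidence > min_conf and lift > min_lift
```

A lift above 1 means the antecedent makes the accident form more likely than its base rate, which is why the lift threshold complements support and confidence.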

step 3.5.2, in the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K}, letting |x^k| be the value range of the k-th independent variable x^k, 1 ≤ k ≤ K;

step 3.5.3, defining the influence-factor value set FVIS as the set of all possible values of the independent variables, and defining the target value set TVIS as the set of all possible values of the prediction variable, namely TVIS = |Y|;

step 3.5.4, obtaining an expression of the classification association rule CAR by using the formula (8):

in the formula (8), | B | is the kind of the predictor variable;

step 3.5.5, obtaining the expression of the atomic association rule AAR by using formula (9):

in formula (9), |A| is the number of independent variable types;

step 3.6, enabling all association rules A → B meeting the classification association rule CAR to form CARset, and enabling association rules A → B meeting the atomic association rule AAR to form an atomic association rule set AARset;

step 3.7, sorting the association rules A → B in the atomic association rule set AARset in descending order of confidence;

step 3.8, judging in turn whether the rule back piece of each atomic association rule AAR appears in the rule front piece of any rule in the classification association rule set CARset; if so, the rule back piece of that atomic association rule AAR is a redundant variable, and all association rules A → B containing it are deleted from the classification association rule set CARset;

step 3.9, processing according to the step 3.8 until the atomic type association rule set AARset is empty;

step 3.10, mapping the residual association rules A → B in the classification association rule Set CARset to corresponding independent variables, thereby obtaining an independent variable Set containing interaction among the variables;
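One possible reading of steps 3.7 to 3.9 is sketched below. The data layout and the function name are hypothetical: each classification rule is a pair of an antecedent item set and an accident form, and each atomic rule is a triple of antecedent item, consequent item and confidence. The sketch assumes that a classification rule is dropped whenever its antecedent contains a redundant item.

```python
def prune_car_rules(car_rules, atomic_rules):
    """Drop classification rules whose antecedent uses a redundant item.

    car_rules:    list of (antecedent frozenset, accident form) pairs.
    atomic_rules: list of (antecedent item, consequent item, confidence).
    Returns the remaining classification rules and the redundant items.
    """
    remaining = list(car_rules)
    redundant = []
    # Process atomic rules from highest to lowest confidence (step 3.7).
    ordered = sorted(atomic_rules, key=lambda rule: rule[2], reverse=True)
    for _, consequent, _ in ordered:
        # Step 3.8: is the atomic consequent used in any remaining antecedent?
        if any(consequent in antecedent for antecedent, _ in remaining):
            redundant.append(consequent)
            remaining = [(a, y) for a, y in remaining if consequent not in a]
    return remaining, redundant
```

The antecedents that survive correspond to the variable combinations that step 3.10 maps back to independent variables and interaction terms.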

step 4, constructing an accident form prediction model based on a mixed Logit principle;

step 4.1, establishing a hybrid Logit model by using the formula (10):

in formula (10), P_n(y_i) is the probability that the accident form of the n-th traffic accident is y_i; the remaining symbols denote, respectively, the parameter vector of the independent variables when the accident form of the n-th traffic accident is y_i, the vector of independent variables together with the vector of their parameters to be estimated, the probability density function of the random parameter β, and the vector of mean and variance parameters of that probability density function;

step 4.2, substituting each influence factor in the independent variable set Set containing the interactions among variables into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood estimation;

step 4.3, based on the parameter estimates Par of the mixed Logit model obtained in step 4.2, screening the parameters of the mixed Logit model by stepwise regression at the set confidence level, thereby obtaining the screened parameter estimates of the mixed Logit model;
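The mixed Logit probability integrates an ordinary Logit probability over the distribution of the random parameters, and in practice that integral is usually approximated by simulation. The sketch below is a minimal illustration, not the procedure used in the patent: it assumes independent normally distributed coefficients (the text does not name the density), and all function and argument names are hypothetical.

```python
import math
import random


def logit_probs(utilities):
    """Multinomial Logit probabilities for a list of utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_logit_probs(x, means, sds, draws=500, seed=0):
    """Simulated mixed Logit probabilities for one accident.

    x:     list of independent-variable values (for example 0/1 dummies).
    means: one list of coefficient means per accident form.
    sds:   one list of coefficient standard deviations per accident form
           (0.0 makes a coefficient fixed rather than random).
    """
    rng = random.Random(seed)
    acc = [0.0] * len(means)
    for _ in range(draws):
        utilities = []
        for mu_row, sd_row in zip(means, sds):
            beta = [rng.gauss(mu, sd) for mu, sd in zip(mu_row, sd_row)]
            utilities.append(sum(b * xi for b, xi in zip(beta, x)))
        for i, p in enumerate(logit_probs(utilities)):
            acc[i] += p
    return [a / draws for a in acc]  # average over the random draws


def simulated_log_likelihood(data, means, sds, draws=200):
    """Objective that a maximum likelihood routine would maximise.

    data: list of (x, observed form index) pairs.
    """
    return sum(math.log(mixed_logit_probs(x, means, sds, draws)[i])
               for x, i in data)
```

With all standard deviations set to zero the function reduces to a fixed-parameter multinomial Logit model, which is a convenient sanity check.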

Step 5, predicting the traffic accident form probability based on the constructed mixed Logit model;

step 5.1, obtaining independent variable information influencing traffic accident forms in real time;

step 5.2, substituting the independent variable information obtained in step 5.1 into formula (11) to calculate the utility function of accident form y_i under that independent variable information; formula (11) uses the parameter vector for accident form y_i taken from the screened parameter estimates of the mixed Logit model;

step 5.3, obtaining the predicted probability of accident form y_i under the real-time independent variable information influencing the traffic accident form by using formula (12), which also involves a term representing the total utility function.
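Steps 5.2 and 5.3 can be illustrated with a short sketch that assumes the usual linear utility (coefficient vector times variable vector) and the usual Logit form for the probability (exponentiated utility divided by the sum over all accident forms). The function name and any coefficient values passed to it are hypothetical and are not taken from Table 3.

```python
import math


def predict_form_probabilities(x, coefficients):
    """Return {accident form: probability} for one set of conditions.

    x:            list of independent-variable values.
    coefficients: dict mapping each accident form to its coefficient vector.
    """
    # Utility of each accident form: dot product of coefficients and variables.
    utilities = {form: sum(b * xi for b, xi in zip(beta, x))
                 for form, beta in coefficients.items()}
    m = max(utilities.values())  # stabilise the exponentials
    exps = {form: math.exp(u - m) for form, u in utilities.items()}
    total = sum(exps.values())
    return {form: e / total for form, e in exps.items()}
```

The probabilities always sum to one across the three accident forms, so a rise in one form's utility lowers the predicted share of the others.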

Compared with the prior art, the invention has the beneficial effects that:

1. to address the severe information loss of discretized independent variables caused by traditional unsupervised discretization algorithms, the method adopts the minimum description length criterion, a supervised discretization algorithm, to discretize the continuous variables; this reduces the information loss of the discretized independent variables and helps to find better discrete points, thereby improving the prediction accuracy of the model;

2. the method mines the interactions among independent variables by an association-rule-based attribute selection method and incorporates them into the mixed Logit model, which helps to understand in depth how interactions among independent variables influence the traffic accident form probability and avoids the erroneous inference caused by ignoring interactions among variables;

3. the method provides a mixed Logit model containing interaction among variables, provides a new solution for predicting traffic accident form probability, and provides technical support for improving road traffic safety environment.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a flow chart of the minimum description length criterion of the present invention;

FIG. 3 is a flowchart of an association rule based attribute selection method according to the present invention.

Detailed Description

In this embodiment, as shown in FIG. 1, the road traffic accident form prediction method takes fatal traffic accident data from Shenzhen, Guangdong Province as an example and is carried out according to the following steps:

step 1, collecting and processing fatal road traffic accident data for Shenzhen;

step 1.1, acquiring road traffic accident data for Shenzhen from 2014 to 2016 from a road traffic safety research information sharing platform, screening out the fatal traffic accident data, and deleting records in the traffic accident database that are incomplete (contain blank items) or unreasonable, thereby obtaining 1264 fatal traffic accident records (N = 1264) as the traffic accident data set D; selecting 16 independent variables (K = 16) that potentially influence the accident form from five aspects, namely motor vehicle driver, vehicle, road, environment and time, to form the independent variable set X = {x¹, x², …, x^k, …, x^K} influencing the traffic accident form; because driver age and accident occurrence time are continuous independent variables, the categorical variable set is {x¹, x², …, x¹⁴} with k = 14, and the continuous variable set is {x¹⁴⁺¹, x¹⁴⁺²}; the descriptive statistics of each variable are shown in Table 1;

TABLE 1 descriptive statistics of independent variables

Note: drivers under 18 years of age were all unlicensed motorcycle drivers; * denotes the reference variable.

step 2.1, as shown in the flowchart of the minimum description length criterion in FIG. 2, with K = 16 and k = 14, initializing l = 1;

and sorting the values of the continuous independent variable x^(k+l) in descending order;

step 2.5, calculating a stop criterion S by using the formula (3):

step 2.6, judging whether the information gain G(b_l, D) is greater than the stop criterion S; if so, the optimal discrete point b_l is valid and is added to the discrete point set B, and the first subset D₁ and the second subset D₂ each replace the traffic accident data set D so that the next optimal discrete point is searched for within D₁ and within D₂ according to steps 2.4 to 2.6; otherwise, executing step 2.7;

step 2.8, discretizing the continuous independent variables driver age and accident occurrence time based on the discrete point set B so that all independent variables become categorical independent variables, thereby obtaining the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K}, where x^(k+l) denotes the (k+l)-th categorical variable; the discretization results are shown in Table 2;

TABLE 2 discretization of continuous arguments

Note: * denotes the reference variable.

step 3.1, defining an association rule as A → B, where A is the rule antecedent (front piece), B is the rule consequent (back piece), and → is the relation symbol; as shown in FIG. 2, setting all factors in the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K} as the rule front piece A, and setting all accident types of the prediction variable Y as the rule back piece B;

step 3.3, as shown in FIG. 3, defining and initializing the minimum support minSup = 10%, the minimum confidence minConf = 50% and the minimum lift minLift = 100% of the association rule A → B;

Support(A→B)>minSup∧Confidence(A→B)>minConf∧Lift(A→B)>minLift(7)

in formula (7), ∧ denotes logical AND;

step 3.5.3, defining the influence-factor value set FVIS as the set of all possible values of the independent variables;

in formula (8), |B| is the number of categories of the prediction variable;

in formula (9), |A| is the number of independent variable types;

step 4.1, establishing a hybrid Logit model by using the formula (10):

in formula (10), the symbols denote, among others, the vector of independent variables, the vector of their parameters to be estimated, and the probability density function of the random parameter β;

step 4.2, substituting each influence factor in the independent variable set Set containing the interactions among variables into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood estimation using the software SAS 9.4;

step 4.3, based on the parameter estimates Par of the mixed Logit model obtained in step 4.2, screening the parameters of the mixed Logit model by stepwise regression at the 90% confidence level; the screened parameter estimates of the mixed Logit model are shown in Table 3;

TABLE 3 Mixed Logit model parameter estimation results for fatal traffic accident forms

step 5.1, obtaining independent variable information influencing traffic accident forms in real time;

step 5.1.1, as shown in Table 3, the parameter estimation results of the mixed Logit model show that the independent variables significantly related to the form of fatal traffic accidents are driver age, vehicle type, road separation form, intersection and road section type, road alignment, accident occurrence time, lighting conditions and weather; driver age and vehicle type data are acquired from the video data of the urban road intelligent traffic video monitoring system; road separation form, intersection and road section type, and road alignment data are acquired from road design data; accident occurrence time, lighting conditions and weather data are acquired from the meteorological department;

step 5.2, substituting the independent variable information influencing the traffic accident form obtained in step 5.1 into formula (11) to calculate the utility function of accident form y_i under the current traffic information; formula (11) uses the parameter vector for accident form y_i taken from the screened parameter estimates of the mixed Logit model;

step 5.3, obtaining the predicted probability of accident form y_i under the current independent variable information by using formula (12), which also involves a term representing the total utility function;

step 5.4, obtaining the predicted probability of each traffic accident form under the current independent variable information according to step 5.3, transmitting this information to vehicle-mounted communication equipment by Internet-of-Vehicles wireless communication technology, and using an intelligent voice broadcast device to warn the driver of the accident form that most needs to be guarded against. For example, on a non-straight road, the probability of a single-vehicle accident for a motorcycle driver aged 14 to 21 is as high as 96.38%, while the probabilities of an inter-vehicle accident and a vehicle-pedestrian accident are 1.49% and 2.14% respectively. When a motorcycle driver aged 14 to 21 is about to enter a non-straight road, the intelligent voice broadcast device warns the driver of the high probability of a single-vehicle accident ahead and reminds the driver to slow down and ride carefully, thereby achieving targeted management of the driver and ensuring driving safety.
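The warning logic of step 5.4 can be sketched as below, using the probabilities quoted in the example above. The function name, the message wording and the 0.5 threshold are hypothetical choices made for illustration; the text does not specify when a warning is triggered.

```python
def warning_message(probabilities, threshold=0.5):
    """Return a warning for the most likely accident form, or None.

    probabilities: dict mapping accident form to predicted probability.
    threshold:     hypothetical minimum probability that triggers a warning.
    """
    form, p = max(probabilities.items(), key=lambda item: item[1])
    if p < threshold:
        return None
    return f"Caution: high probability ({p:.2%}) of a {form} ahead; slow down."


example = {
    "single-vehicle accident": 0.9638,
    "inter-vehicle accident": 0.0149,
    "vehicle-pedestrian accident": 0.0214,
}
message = warning_message(example)
```

Only the dominant accident form is announced here, which keeps the voice prompt short; a deployment could equally announce every form above the threshold.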

Claims

1. A road traffic accident form prediction method is characterized by comprising the following steps:

step 1, collecting and processing road traffic accident data;

step 2.1, initializing l = 1;

and sorting the set of continuous independent variable values in descending order;

step 2.5, calculating a stop criterion S by using the formula (3):

all continuous independent variables in the set have been discretized, and step 2.8 is executed after the discrete point set B of each continuous independent variable is output; otherwise, returning to step 2.2;

step 2.8, discretizing each continuous independent variable based on its discrete point set B so that all independent variables become categorical independent variables, thereby obtaining the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K}, where x^(k+l) denotes the (k+l)-th categorical independent variable;

step 3.1, defining an association rule as A → B, where A is the rule antecedent (front piece), B is the rule consequent (back piece), and → is the relation symbol; setting all factors in the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K} as the rule front piece A, and setting all accident types of the prediction variable Y as the rule back piece B;

step 3.3, defining and initializing the minimum support minSup, the minimum confidence minConf and the minimum lift minLift of the association rule A → B;

Support(A→B)>minSup∧Confidence(A→B)>minConf∧Lift(A→B)>minLift (7)

in formula (7), ∧ denotes logical AND;

step 3.5.2, in the discretized independent variable set X_MDLP = {x¹, x², …, x^k, x^(k+1), x^(k+2), …, x^(k+l), …, x^K}, letting |x^k| be the value range of the k-th independent variable x^k, 1 ≤ k ≤ K;