CN110458244A

CN110458244A - A traffic accident severity prediction method for regional road networks

Info

Publication number: CN110458244A
Application number: CN201910770584.3A
Authority: CN
Inventors: 石琴; 杨慧敏; 陈一锴; 骆仁佳; 于淑君; 董满生
Original assignee: Hefei Polytechnic University
Current assignee: Hefei University of Technology; Hefei Polytechnic University
Priority date: 2019-08-20
Filing date: 2019-08-20
Publication date: 2019-11-15
Anticipated expiration: 2039-08-20
Also published as: CN110458244B

Abstract

The invention discloses a traffic accident severity prediction method for regional road networks. The steps are: 1. collect and preprocess regional road network traffic accident data; 2. build a latent class analysis model from the data; 3. build a CART decision tree model for each subclass identified by the latent class analysis; 4. build, for each subclass, an accident severity model based on binary logistic regression that includes both the independent variables and their interaction terms, and use the intersection of the sensitivity and specificity curves as the model's classification threshold. The invention reduces the adverse effect of accident data heterogeneity on the analysis results, overcomes the tendency of conventional accident severity prediction models to ignore interaction terms and to predict poorly on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.

Description

A traffic accident severity prediction method for regional road networks

Technical field

The present invention relates to a traffic accident severity prediction method for regional road networks and belongs to the technical field of traffic safety analysis.

Background technique

According to the Global Status Report on Road Safety, road traffic accidents are the eighth leading cause of death worldwide, killing more than 1.35 million people each year, and traffic safety has become a major concern around the world. Analyzing traffic accident data to determine the factors that influence accident severity, and proposing countermeasures that reduce the risk of fatal accidents, is currently one of the most practical ways to improve traffic safety. However, a road traffic accident is a complex event involving drivers' reactions to the external environment and interactions among vehicle, road, traffic, and environmental factors, and some influencing factors may go unobserved. As a result, traffic accident data are highly heterogeneous, and accident severity may be affected by interactions among the factors.

For analyzing accident severity (fatal versus non-fatal accidents), the binary logistic regression model is the most widely used method. However, it ignores the effect on the results of both the heterogeneity of accident data and the interactions among independent variables, which can lead to inaccurate parameter estimates or cause important hidden relationships to be missed. Yu Rongjie et al. used latent class analysis to divide accident data into several homogeneous latent classes and thereby reduce the effect of heterogeneity on the results (Yu R, Wang X, Abdel-Aty M. A Hybrid Latent Class Analysis Modeling Approach to Analyze Urban Expressway Crash Risk[J]. Accident Analysis and Prevention, 2017, 101: 37-43.). Rusli et al. used a decision tree to screen high-order interactions among independent variables and included the high-order interaction terms together with the main effects in the accident severity model to quantify how interactions affect accident severity; however, that method considers only the high-order interactions and ignores the interactions of other orders that exist among the independent variables (Rusdi Rusli, Md. Mazharul Haque, Mohammad Saifuzzaman, Mark King. Crash severity along rural mountainous highways in Malaysia: An application of a combined decision tree and logistic regression model[J]. Traffic Injury Prevention, 2018, 19(7): 741-748.). In addition, the conventional binary logistic regression model considers only overall prediction accuracy and uses 0.5 as the classification threshold. Fatal accidents, however, usually make up a small share of traffic accident data (that is, the data are imbalanced); using 0.5 as the threshold gives the model high overall accuracy but makes its sensitivity so low that its predictions lose practical value.

Summary of the invention

To overcome the shortcomings of the prior art, the present invention proposes a traffic accident severity prediction method for regional road networks that reduces the adverse effect of accident data heterogeneity on the analysis results, identifies the interaction terms among independent variables, and adjusts the classification threshold of the prediction model. It thereby overcomes the tendency of conventional accident severity prediction models to ignore interaction terms and to predict poorly on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.

In order to achieve the above objectives, the present invention adopts the following technical scheme:

The traffic accident severity prediction method for regional road networks of the present invention is characterized by the following steps:

Step 1: Acquire and preprocess regional road network traffic accident data;

N accident records are obtained from a road traffic accident database as the accident data set D. From any i-th accident record, K categorical variables are chosen to form the set X={x₁,x₂,…,x_k,…,x_K} that characterizes the i-th accident, where x_k denotes the k-th categorical variable and has C_k categories. The value that x_k takes among its C_k categories is denoted s_k, and s_ik denotes the value of the k-th categorical variable for the i-th accident, so the values of all K categorical variables for the i-th accident form the value set S_i={s_i1,s_i2,...,s_ik,...,s_iK}. A value pattern denotes any one combination among all possible values of the K categorical variables of the i-th accident; k=1,2,3,...,K; i=1,2,3,...,N;

The severity of the i-th accident is taken as the predicted variable, denoted y_i, where y_i=0 indicates a non-fatal accident and y_i=1 indicates a fatal accident;

Step 2: Build a latent class analysis model from the regional road network traffic accident data;

Step 2.1: Define a latent class variable V in the latent class analysis model. V has T classes, any one of which is denoted t, t=1,2,...,T; the value of V for the i-th accident is denoted V_i;

Step 2.1.1: Define the outer loop counter τ and the maximum number of outer iterations τ_max; let T_τ be the number of classes set in the τ-th iteration; initialize τ=1;

Step 2.1.2: Initialize t=1;

Step 2.1.3: Use formula (1) to obtain the conditional probability that, when V_i=t (that is, when the i-th accident belongs to the t-th latent class), the values of the i-th accident on the K categorical variables form a given value pattern.

In formula (1), P(s_ik=s_k|V_i=t) is the conditional probability that the k-th categorical variable takes the value s_k when the i-th accident belongs to the t-th latent class;

Step 2.1.4: Use formula (2) to obtain the unconditional probability that the values of the K categorical variables in the i-th accident form a given value pattern, that is, the joint probability of the latent class analysis model.

In formula (2), P(V_i=t) is the probability that the i-th accident belongs to the t-th latent class, that is, the proportion of the whole data set accounted for by latent class t;

Step 2.2: Estimate the model parameters by maximum likelihood to obtain the estimates of the latent class probabilities and of the conditional probabilities of the categorical variables, together with the maximized likelihood value L_τ of the latent class analysis model in the τ-th iteration;

Step 2.3: Use formula (3) to calculate the posterior probability that the i-th accident is assigned to the t-th latent class.
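
The quantities in steps 2.1.3 to 2.3 can be illustrated with a minimal Python sketch. It assumes the usual local-independence form of latent class analysis, in which the conditional probability of a value pattern is the product of the per-variable conditional probabilities, and it applies Bayes' rule for the posterior. The function names and the two-class example parameters are hypothetical and are not taken from the patent.

```python
from math import prod

def pattern_likelihood(pattern, cond_probs_t):
    """Conditional probability of a value pattern given one latent class.

    Assumes local independence: the product over the K variables of
    P(s_ik = s_k | V_i = t). cond_probs_t[k][value] holds that probability.
    """
    return prod(cond_probs_t[k][value] for k, value in enumerate(pattern))

def posterior(pattern, class_probs, cond_probs):
    """Posterior probability of each latent class for one value pattern."""
    joint = [class_probs[t] * pattern_likelihood(pattern, cond_probs[t])
             for t in range(len(class_probs))]
    marginal = sum(joint)  # unconditional probability of the pattern
    return [j / marginal for j in joint]

# Hypothetical model: two latent classes, two binary variables coded 0/1.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 1: P(value | class) for each variable
    [[0.2, 0.8], [0.3, 0.7]],  # class 2
]
post = posterior((0, 0), class_probs, cond_probs)
```

In practice the class probabilities and conditional probabilities would come from the maximum likelihood estimation of step 2.2 rather than being fixed by hand.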

Step 2.4: Assign t+1 to t and check whether t > T_τ holds; if so, go to step 2.5; otherwise, return to step 2.1.3;

Step 2.5: Use formulas (4), (5), (6), and (7) to obtain the model fit indices for the τ-th iteration: the Akaike information criterion AIC_τ, the Bayesian information criterion BIC_τ, the sample-size-adjusted Bayesian information criterion aBIC_τ, and the entropy.

AIC_τ=-2ln (L_τ)+2M (4)

BIC_τ=-2ln (L_τ)+ln(N)×M (5)

aBIC_τ=-2ln (L_τ)+ln(n^*)×M (6)

In formulas (4), (5), (6), and (7), M is the number of unknown parameters in the latent class analysis model; n^* is the adjusted sample size, with n^*=(N+2)/24;
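
Formulas (4) to (6) can be computed directly from the log-likelihood, as in the short Python sketch below. The entropy of formula (7) is not covered because its expression is not given in the text. The function name and the example log-likelihood and parameter count are hypothetical; only N=2595 is taken from the embodiment.

```python
from math import log

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, aBIC) for one latent class model.

    log_likelihood is ln(L), n_params is M, n_obs is N.
    """
    n_star = (n_obs + 2) / 24  # adjusted sample size n*
    aic = -2 * log_likelihood + 2 * n_params
    bic = -2 * log_likelihood + log(n_obs) * n_params
    abic = -2 * log_likelihood + log(n_star) * n_params
    return aic, bic, abic

# Hypothetical fit: ln(L) = -1000 with M = 10 parameters and N = 2595 records.
aic, bic, abic = information_criteria(-1000.0, 10, 2595)
```

Lower values of all three indices indicate a better fit, so candidate models with different numbers of classes can be compared by calling the function once per model.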

Step 2.6: Assign τ+1 to τ and check whether τ > τ_max holds; if so, go to step 2.7; otherwise, execute step 2.1.3;

Step 2.7: Among the τ_max sets of values of AIC, BIC, aBIC, and entropy R², select the number of latent classes at which the fit indices reach their optimum, denoted T^*; the accident data set D is divided into T^* accident subclasses, where D_{t^*} denotes the accident data in the t^*-th accident subclass, t^*=1,2,...,T^*;

Step 3: Based on the latent class analysis results, build a CART decision tree model for each of the T^* subclasses;

Step 3.1: Take the accident data D_{t^*} in the t^*-th accident subclass as the training sample set, and take the set X of the K categorical variables as the feature set of the CART decision tree model; let the node sample threshold be σ, the feature split point be α, and the Gini index threshold be ε;

Step 3.2: Initialize t^*=1;

Step 3.3: Input the training sample set D_{t^*}, the feature set X, the node sample threshold σ, and the Gini index threshold ε into the CART decision tree model;

Step 3.4: Assign t^*+1 to t^* and check whether t^* > T^* holds; if so, T^* decision trees have been obtained, and step 3.5 is executed; otherwise, return to step 3.3;

Step 3.5: Determine the interaction terms among the categorical variables from the tree diagrams of the T^* binary decision trees, where the interaction terms of the t^*-th accident subclass are those determined by its corresponding binary decision tree;

Step 4: Build an accident severity model based on binary logistic regression for each of the T^* subclasses;

Step 4.1: Take the accident data D_{t^*} in the t^*-th subclass as the fitting data of the accident severity model, and take the set X of the K categorical variables together with the interaction terms of the t^*-th subclass as the independent variables X^* of the accident severity model; let the t^*-th accident subclass contain J accident records, J being the number of records in D_{t^*}, and denote the predicted variable of the j-th accident as y_j;

Step 4.2: Initialize t^*=1;

Step 4.3: Use formula (11) to obtain, under binary logistic regression, the probability P(y=1|X^*) that a fatal accident occurs (y_j=1) given the independent variables X^*:

In formula (11), w^* denotes the regression coefficients of the independent variables X^*;

Step 4.4: Estimate the parameters w^* of the binary logistic regression accident severity model by maximum likelihood:

For the j-th accident, let P_j be the probability that y_j=1 given its independent variables; the probability that y_j=0 given the same independent variables is then 1-P_j. The likelihood function L(w^*) is obtained using formula (12):

By maximum likelihood estimation, find the parameter estimate w′ at which L(w^*) reaches its maximum;

From the estimate w′, obtain the predicted probability that y_j=1 for the j-th accident given its independent variables, and thereby the predicted probabilities of all J accidents; sort them in ascending order and denote the sorted set of predicted probabilities as {P₁′,...,P′_j,...,P′_J};
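
Steps 4.3 and 4.4 can be sketched in Python as follows. The sketch assumes the standard logistic form P = 1/(1+exp(-w·x)) with an intercept carried as a leading column of ones, and a Bernoulli log-likelihood; the weights, rows, and labels are hypothetical, and no optimizer is shown.

```python
from math import exp, log

def death_probability(weights, features):
    """P(y = 1 | x) under binary logistic regression (assumed standard form)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

def log_likelihood(weights, rows, labels):
    """ln L(w): sum over accidents of y*ln(P_j) + (1 - y)*ln(1 - P_j)."""
    total = 0.0
    for features, y in zip(rows, labels):
        p = death_probability(weights, features)
        total += y * log(p) + (1 - y) * log(1 - p)
    return total

# Hypothetical data: the first column is the intercept, the rest are dummies.
rows = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
labels = [0, 0, 1]
weights = (-1.0, 0.5, 1.5)  # stands in for the estimate w'

# Predicted probabilities sorted in ascending order, as at the end of step 4.4.
sorted_probs = sorted(death_probability(weights, r) for r in rows)
```

A maximum likelihood routine would search for the weights that maximize `log_likelihood`; the embodiment delegates that search to SPSS.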

Step 4.5: Adjust the classification threshold of the accident severity model;

Step 4.6: Assign t^*+1 to t^* and check whether t^* > T^* holds; if so, T^* accident severity prediction models have been obtained; otherwise, return to step 4.3.

The traffic accident severity prediction method of the present invention is further characterized in that step 3.3 proceeds as follows:

Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to split, and a binary decision tree model is built. According to a feature split point α, the training sample set D_{t^*} is divided into a first subset D_α1 and a second subset D_α2, and the Gini index Gini(D_α) of the split point α is obtained using formula (8):

In formula (8), |D_{t^*}|, |D_α1|, and |D_α2| denote the total numbers of accidents in the training sample set D_{t^*}, the first subset D_α1, and the second subset D_α2, respectively;

Gini(D_α1) denotes the Gini index of the first subset D_α1, given by:

In formula (9), the two probabilities are those of non-fatal and fatal accidents in the first subset D_α1, respectively;

In formula (8), Gini(D_α2) denotes the Gini index of the second subset D_α2, given by:

In formula (10), the two probabilities are those of non-fatal and fatal accidents in the second subset D_α2, respectively;

Step 3.3.2: Traverse the split points of every feature in the feature set X and calculate the Gini index of each split point. If the Gini index of every split point in X is less than the threshold ε, the CART decision tree model is a single-node tree, which is output; otherwise, go to step 3.3.3;

Step 3.3.3: Select the feature X_min whose split point has the smallest Gini index in the feature set X, together with the corresponding split point α_min; divide the training sample set D_{t^*} at α_min into two subsets D_min1 and D_min2, and assign them to the two child nodes of the parent node that holds D_{t^*};

If the sample counts of D_min1 and D_min2 are both less than the given node sample threshold σ, the child nodes holding D_min1 and D_min2 are leaf nodes, and the binary decision tree is output. If the sample count of D_min1 and/or D_min2 is greater than σ, the child node holding that subset is a non-leaf node that can be split further, and step 3.3.4 is executed;

Step 3.3.4: For each non-leaf node, set the training sample set D_{t^*} to the subset held by that node, delete the feature X_min from the feature set X, and return to step 3.3.1; repeat until the sample counts of all child nodes are less than σ or the feature set X is empty, then output the final binary decision tree.
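
The split selection in steps 3.3.1 to 3.3.3 can be sketched in Python as below. The sketch assumes the standard CART definitions: the Gini index of a binary-labelled subset is 1 minus the squared probabilities of the two outcomes, and the Gini index of a split point is the size-weighted average over the two subsets. Each candidate split takes one category of a variable against all other categories. The variable names and codings are hypothetical.

```python
def gini(labels):
    """Gini index of a subset with labels 0 (non-fatal) and 1 (fatal)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_fatal = sum(labels) / n
    return 1.0 - p_fatal ** 2 - (1.0 - p_fatal) ** 2

def split_gini(values, labels, cut):
    """Weighted Gini index when one category (cut) is split from the rest."""
    first = [y for v, y in zip(values, labels) if v == cut]
    second = [y for v, y in zip(values, labels) if v != cut]
    n = len(labels)
    return len(first) / n * gini(first) + len(second) / n * gini(second)

def best_split(features, labels):
    """Return (Gini index, variable name, split point) with the smallest Gini."""
    candidates = [(split_gini(column, labels, cut), name, cut)
                  for name, column in features.items()
                  for cut in sorted(set(column))]
    return min(candidates)

# Hypothetical subclass with two coded categorical variables.
features = {
    "vehicle_type": [1, 1, 2, 2, 2, 1],
    "lighting": [1, 2, 1, 2, 1, 2],
}
labels = [0, 0, 1, 1, 1, 0]
best = best_split(features, labels)
```

Growing the full tree would repeat `best_split` on each resulting subset, removing the chosen variable each time, until the stopping rules on σ and the feature set are met.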

Step 4.5 proceeds as follows:

Step 4.5.1: Define θ as the classification threshold of the model, with 0 < θ < 1; comparing the predicted probability of the j-th accident with θ determines whether the accident severity model predicts that accident as a fatal accident or as a non-fatal accident;

Step 4.5.2: Initialize j=1;

Step 4.5.3: Set the j-th classification threshold θ_j equal to P′_j, and use formula (13) to obtain the j-th sensitivity Se(θ_j) of the accident severity model, that is, the probability that fatal accidents in the accident data set are predicted as fatal:

In formula (13), the predicted-class term indicates that the s-th accident is predicted as a fatal accident, and y_s=1 indicates that the s-th accident is a fatal accident, 1≤s≤J;

The j-th specificity Sp(θ_j) of the accident severity model is obtained using formula (14), that is, the probability that non-fatal accidents in the accident data set are predicted as non-fatal:

In formula (14), the predicted-class term indicates that the s-th accident is predicted as a non-fatal accident, and y_s=0 indicates that the s-th accident is a non-fatal accident, 1≤s≤J;

Step 4.5.4: Assign j+1 to j and check whether j > J holds; if so, J pairs of sensitivity and specificity values have been obtained, and step 4.5.5 is executed; otherwise, return to step 4.5.3;

Step 4.5.5: With the j-th classification threshold θ_j on the horizontal axis and the corresponding sensitivity Se(θ_j) and specificity Sp(θ_j) on the vertical axis, plot the sensitivity and specificity curves, and take the threshold at their intersection as the optimal classification threshold θ′.
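
The threshold search of step 4.5 can be sketched in Python as follows. Two assumptions are made that the text does not state: an accident is predicted as fatal when its predicted probability is at or above θ, and, because the curves are evaluated only at the J candidate thresholds, the intersection is approximated by the candidate with the smallest gap between sensitivity and specificity. The function names and data are hypothetical.

```python
def sensitivity_specificity(probs, labels, theta):
    """Return (Se, Sp) at threshold theta; fatal is predicted when p >= theta."""
    true_pos = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= theta)
    true_neg = sum(1 for p, y in zip(probs, labels) if y == 0 and p < theta)
    positives = sum(labels)
    negatives = len(labels) - positives
    return true_pos / positives, true_neg / negatives

def intersection_threshold(probs, labels):
    """Candidate threshold where the Se and Sp curves are closest."""
    best_theta, best_gap = None, float("inf")
    for theta in sorted(set(probs)):  # each predicted probability is a candidate
        se, sp = sensitivity_specificity(probs, labels, theta)
        gap = abs(se - sp)
        if gap < best_gap:
            best_theta, best_gap = theta, gap
    return best_theta

# Hypothetical imbalanced sample: 3 fatal accidents among 8.
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 0, 1, 1]
theta_best = intersection_threshold(probs, labels)
```

On this sample the selected threshold is lower than 0.5, which illustrates how the adjustment trades some specificity for sensitivity when fatal accidents are rare.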

Compared with prior art, the beneficial effects of the present invention are:

1. Based on regional road network traffic accident data, the method first builds a latent class analysis model that divides the accident data into several homogeneous subclasses; second, it builds a CART decision tree model for each subclass to identify the interaction terms among the independent variables; then, it builds for each subclass an accident severity model based on binary logistic regression that includes the interaction terms, and sets the intersection of the sensitivity and specificity curves as the model's classification threshold. The method reduces the adverse effect of accident data heterogeneity on the analysis results, overcomes the tendency of conventional accident severity prediction models to ignore interaction terms and to predict poorly on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.

2. By dividing traffic accident data into several homogeneous subclasses through latent class analysis, the method both reflects the heterogeneity of the accident data and accurately identifies and analyzes latent patterns and mechanisms of road traffic accident occurrence;

3. The method identifies interaction terms of every order among the independent variables with the CART decision tree model and includes them in the binary logistic regression model, which improves the model's goodness of fit and identifies the important independent variables and interaction terms that affect accident severity on the regional road network, helping to raise the level of traffic safety on that network;

4. The method uses the threshold at the intersection of the sensitivity and specificity curves as the classification threshold of the binary logistic regression model, which addresses the imbalanced data classification problem and improves the prediction accuracy of the accident severity model.

Brief description of the drawings

Fig. 1 is the CART decision tree diagram for class 1 of the present invention;

Fig. 2 is the sensitivity and specificity curve for class 1 of the present invention;

Fig. 3 is the ROC curve for class 1 of the present invention;

Fig. 4 is the flow chart of the method of the present invention.

Specific embodiment

In this embodiment, as shown in Fig. 4, a traffic accident severity prediction method for regional road networks proceeds as follows:

Step 1.1: Collect the traffic accident data of a regional road network from a road traffic accident platform and delete records that are incomplete (containing empty fields) or implausible, leaving 2595 accident records (N=2595) as the analysis data set D. Choose 26 categorical variables covering five aspects (person, vehicle, accident characteristics, road, and environment) to form the set X={x₁,x₂,...,x₂₆} that characterizes the i-th accident, and use them as the independent variables of the prediction model; their values are listed in Table 1. Here x_k denotes the k-th categorical variable and has C_k categories, and the value x_k takes among its C_k categories is denoted s_k (for example, the first categorical variable x₁ has two categories, so C₁=2 and s₁ is 1 for female or 2 for male). Each accident can be expressed as the set of its 26 categorical variable values, S_i={s_i1,s_i2,...,s_ik,...,s_i26}; a value pattern denotes any one combination among all possible values of the K categorical variables of the i-th accident; k=1,2,3,...,K; i=1,2,3,...,N;

The accident severity of each accident is taken as the predicted variable, denoted y_i, where y_i=0 indicates a non-fatal accident and y_i=1 indicates a fatal accident;

Step 1.2: Test for multicollinearity using SPSS and delete any collinear categorical variables. The test shows that all variance inflation factors (VIF) are less than 5 and all corresponding tolerances (TOL) are greater than 0.1 (see Table 1), which indicates that the 26 categorical variables are not collinear and can all be included in the model analysis.

Table 1: Definitions, coding, and collinearity test results of the independent variables

Step 2.1: Define a latent class variable V in the latent class analysis model. V has T classes, any one of which is denoted t, t=1,2,...,T; the value of V for the i-th accident is denoted V_i;

Step 2.1.1: Define the outer loop counter τ and set the maximum number of outer iterations to 5; let T_τ be the number of classes set in the τ-th iteration, with T_τ=τ; initialize τ=1;

Step 2.1.2: Initialize t=1;

In addition, the basic constraints of the latent class analysis model are that the latent class probabilities sum to 1 and that the conditional probabilities of each categorical variable sum to 1, as shown in formulas (3) and (4):
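
Written out in the notation defined above, the two constraints would take the following form. This is a reconstruction from the verbal description only, since the formulas themselves do not appear in the text; the second constraint is assumed to hold for every variable k and every class t.

```latex
\[
\sum_{t=1}^{T} P(V_i = t) = 1 \tag{3}
\]
\[
\sum_{s_k=1}^{C_k} P(s_{ik} = s_k \mid V_i = t) = 1 \tag{4}
\]
```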

Step 2.3: According to Bayes' theorem, use formula (5) to calculate the posterior probability that the i-th accident is assigned to the t-th latent class.

Here, the quantity appearing in formula (5) is given by formula (6):

The i-th accident is assigned to the latent class for which its posterior probability is largest; the posterior probabilities of all N accident records are calculated and compared in this way to cluster the data;
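
The assignment rule can be sketched in Python as follows: each record goes to the class with the largest posterior probability, and the records are then grouped by class. The function names and the posterior values are hypothetical; ties are resolved in favor of the lower class index, which is an assumption.

```python
def assign_classes(posteriors):
    """posteriors[i][t] is the posterior of class t for accident i.

    Returns the index of the most probable class for each accident.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

def split_by_class(records, posteriors, n_classes):
    """Group records into n_classes subsets by their most probable class."""
    subsets = [[] for _ in range(n_classes)]
    for record, cls in zip(records, assign_classes(posteriors)):
        subsets[cls].append(record)
    return subsets

# Hypothetical posteriors for three accidents and three latent classes.
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
records = ["accident_a", "accident_b", "accident_c"]
assigned = assign_classes(posteriors)
subsets = split_by_class(records, posteriors, 3)
```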

Step 2.5: Use formulas (7), (8), (9), and (10) to obtain the model fit indices for the τ-th iteration: the Akaike information criterion AIC_τ, the Bayesian information criterion BIC_τ, the sample-size-adjusted Bayesian information criterion aBIC_τ, and the entropy.

AIC_τ=-2ln (L_τ)+2M (7)

BIC_τ=-2ln (L_τ)+ln(N)×M (8)

aBIC_τ=-2ln (L_τ)+ln(n^*)×M (9)

In formulas (7), (8), (9), and (10), M is the number of unknown parameters in the latent class analysis model; n^* is the adjusted sample size, with n^*=(N+2)/24;

Step 2.6: Assign τ+1 to τ and check whether τ > 5 holds; if so, go to step 2.7; otherwise, execute step 2.1.3;

Step 2.7: Build the latent class analysis model and estimate its parameters using Mplus version 7.4, with the number of latent classes T fixed for each run. Increase the number of latent classes step by step from T=1 to T=5 to obtain the estimated log-likelihood ln(L) of five different latent class analysis models, that is, τ takes values up to 5. Calculate the fit indices of the five models: the Akaike information criterion AIC_τ, the Bayesian information criterion BIC_τ, the sample-size-adjusted Bayesian information criterion aBIC_τ, and the entropy. The corresponding fit indices are listed in Table 2.

Table 2: Summary of model fit indices

In Table 2, smaller values of AIC, BIC, and aBIC indicate a better-fitting model; an entropy greater than 0.8 indicates a classification accuracy above 90%; LMR and BLRT are relative fit indices, for which a significant P value indicates that the T-class model fits significantly better than the (T-1)-class model. On this basis, the accident data are divided into 3 classes for analysis, that is, T^*=3. The estimation results of the latent class analysis model for T^*=3 are shown in Table 3. The accident characteristics of each subclass are identified from the conditional probability distributions: class 1 is named passenger car accidents on county roads, class 2 motor vehicle accidents on rural roads, and class 3 non-motor-vehicle accidents involving the elderly, which identifies the latent patterns of road traffic accident occurrence.

According to Bayes' theorem, use formula (5) to calculate the posterior probability that the i-th observed accident record is assigned to each of the 3 latent classes; calculating and comparing the posterior probabilities of all accident records divides the 2595 records into 3 accident subclasses, denoted {D₁,D₂,D₃}, which contain 1104, 485, and 1006 records, respectively;

Table 3: Latent class probabilities and conditional probabilities of the independent variables for T^*=3 (partial)

Step 3: Based on the latent class analysis results, build a CART decision tree model for each of the 3 subclasses;

Step 3.1: Take the accident data D_{t^*} in the t^*-th accident subclass as the training sample set, t^*=1,2,3, and take the set X of the 26 categorical variables as the feature set of the CART decision tree model; let the node sample threshold be σ, the feature split point be α, and the Gini index threshold be ε;

Step 3.2: Initialize t^*=1;

Step 3.3: Build the CART decision tree model in SPSS: input the accident data set D_{t^*}, set the feature set X to the variables identified as significant in step 3.1, and set the node sample threshold σ to 50 and the Gini index threshold ε to 0.001;

Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to split, and a binary decision tree model is built. According to a feature split point α, the training sample set D_{t^*} is divided into a first subset D_α1 and a second subset D_α2; that is, taking one category of the categorical variable x_k as the split point α divides the sample set into the two subsets D_α1 and D_α2. The Gini index Gini(D_α) of the split point α is obtained using formula (11):

In formula (11), |D_{t^*}|, |D_α1|, and |D_α2| denote the total numbers of accidents in the training sample set D_{t^*}, the first subset D_α1, and the second subset D_α2, respectively;

Gini(D_α1) denotes the Gini index of the first subset D_α1, given by:

In formula (12), the two probabilities are those of non-fatal and fatal accidents in the first subset D_α1, respectively;

In formula (11), Gini(D_α2) denotes the Gini index of the second subset D_α2, given by:

In formula (13), the two probabilities are those of non-fatal and fatal accidents in the second subset D_α2, respectively;

Step 3.3.2: Traverse the split points of every feature in the feature set X and calculate the Gini index of each split point. If the Gini index of every split point in X is less than the threshold 0.001, the CART decision tree model is a single-node tree, which is output, and there is no interaction term; otherwise, go to step 3.3.3;

Step 3.3.3: Select the feature X_min whose split point has the smallest Gini index in the feature set X, together with the corresponding split point α_min; divide the training sample set D_{t^*} at α_min into two subsets D_min1 and D_min2, and assign them to the two child nodes of the parent node that holds D_{t^*};

If the sample counts of D_min1 and D_min2 are both less than the given node sample threshold 50, the child nodes holding D_min1 and D_min2 are leaf nodes, and the binary decision tree is output, in which case only second-order interaction terms exist. If the sample count of D_min1 and/or D_min2 is greater than the node sample threshold 50, the child node holding that subset is a non-leaf node that can be split further, and step 3.3.4 is executed;

Step 3.3.4: For each non-leaf node, set the training sample set D_{t^*} to the subset held by that node, delete the feature X_min from the feature set X, and return to step 3.3.1; repeat until the sample counts of all child nodes are less than the node sample threshold 50 or the feature set X is empty, then output the final binary decision tree;

Step 3.4: Assign t^*+1 to t^* and check whether t^* > 3 holds; if so, 3 decision tree models have been obtained, and step 3.5 is executed; otherwise, return to step 3.3;

Step 3.5: Determine the interaction terms among the categorical variables from the tree diagrams of the 3 binary decision trees, where the interaction terms of the t^*-th accident subclass are those determined by its corresponding binary decision tree;

Fig. 1 shows the binary decision tree for class 1. The tree takes all the data in class 1 as its root node and has a height of 4 levels and 5 leaf nodes. The rectangle for each node gives the total number of accidents at that node, the numbers of fatal and non-fatal accidents, and their proportions. The tree diagram (Fig. 1) shows second-order interactions between vehicle type and passenger, between vehicle type and road technical grade, and between road technical grade and road alignment, and a third-order interaction among vehicle type, road technical grade, and road alignment;

Similarly, the second-order interaction terms in class 2 are crash characteristics with lighting condition and crash characteristics with vehicle type, and the second-order interaction term in class 3 is vehicle type with driver age.

Step 4: Build an accident severity model based on binary logistic regression for each of the 3 subclasses;

Step 4.1: Take the accident data D_{t^*} in the t^*-th accident subclass as the fitting data of the accident severity model, and take the set X of the K categorical variables together with the interaction terms of the t^*-th subclass as the independent variables X^* of the accident severity model; let the t^*-th accident subclass contain J accident records, J being the number of records in D_{t^*}, and denote the predicted variable of the j-th accident as y_j;

A univariate chi-square test is performed on each subclass using SPSS, where a P value below 0.05 indicates that the independent variable is significantly associated with the dependent variable. The results are listed in Table 4; in class 1, 16 variables are significantly associated with accident severity.

Table 4: Univariate chi-square test results for each subclass

Step 4.2, initialization t^*=1；

Step 4.3: Using formula (13), obtain the binary logistic regression probability P(y=1|X^*) that a fatal accident occurs, i.e. y_j=1, given the independent variables X^*:

In formula (13), w^* is the regression coefficient of the independent variables X^*;

Step 4.4: Estimate the parameter w^* of the binary logistic regression accident severity model by the maximum likelihood method:

For the j-th accident, P_j is the probability that y_j=1 given its independent variables, so the probability that y_j=0 given those independent variables is 1-P_j; the likelihood function L(w^*) is then obtained using formula (14):

Using maximum likelihood estimation, find the estimated parameter w′ at which L(w^*) attains its maximum. The parameters of the accident severity model are estimated with SPSS software, wherein each interaction term of categorical variables enters the model analysis as an independent variable in the form of a product of the categorical variables, and dummy variables are set for each independent variable to make the model results easier to interpret. The Wald test is used to decide whether an independent variable enters or is removed from the model, with entry and removal criteria of P<0.05 and P>0.1 respectively, and the number of iterations is set to 20;
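For reference, the probability of step 4.3 and the likelihood function of step 4.4 can be written in the standard binary logistic form. This is a sketch under that assumption, not a reproduction of the patent's own formulas; the symbol X_j^* for the independent variables of the j-th accident (with a leading 1 so that w^* can carry the constant term) is introduced here for illustration only.

```latex
P_j = P(y_j = 1 \mid X_j^*) = \frac{\exp\!\left(w^{*\top} X_j^*\right)}{1 + \exp\!\left(w^{*\top} X_j^*\right)},
\qquad
L(w^*) = \prod_{j=1}^{J} P_j^{\,y_j} \, (1 - P_j)^{1 - y_j}
```

Each accident contributes P_j to the product when y_j=1 and 1-P_j when y_j=0, which matches the two cases described in step 4.4.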

According to the estimated parameter w′, obtain the predicted probability that y_j=1 for the j-th accident given its independent variables, thereby obtaining the predicted probabilities of the J accidents; sort them in ascending order and denote the sorted set of predicted probabilities as {P′_1,...,P′_j,...,P′_J};
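Steps 4.3 and 4.4 can be illustrated with a minimal sketch. It assumes two binary dummy variables whose product serves as the interaction term, and it replaces the SPSS stepwise Wald procedure described above with plain gradient ascent on the log-likelihood. The toy counts and all function names (`design_row`, `predict`, `fit`) are hypothetical.

```python
import math

def design_row(x1, x2):
    # Constant term, two dummy variables, and their product as the interaction term.
    return [1.0, float(x1), float(x2), float(x1 * x2)]

def predict(w, row):
    # Binary logistic probability of a fatal accident (y = 1).
    z = sum(wi * xi for wi, xi in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, rows, ys):
    # Logarithm of the likelihood: each accident contributes P_j or 1 - P_j.
    total = 0.0
    for r, y in zip(rows, ys):
        p = predict(w, r)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(rows, ys, lr=0.5, iters=10000):
    # Gradient ascent on the mean log-likelihood (an assumption; not the SPSS routine).
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for r, y in zip(rows, ys):
            err = y - predict(w, r)
            for i, xi in enumerate(r):
                grad[i] += err * xi
        w = [wi + lr * gi / n for wi, gi in zip(w, grad)]
    return w

# Invented toy data: (x1, x2, number of accidents, number of fatal accidents).
cells = [(0, 0, 10, 2), (1, 0, 10, 4), (0, 1, 10, 3), (1, 1, 10, 8)]
rows, ys = [], []
for x1, x2, n, fatal in cells:
    for i in range(n):
        rows.append(design_row(x1, x2))
        ys.append(1 if i < fatal else 0)

w_hat = fit(rows, ys)
# Predicted probabilities of all J accidents, sorted in ascending order.
sorted_probs = sorted(predict(w_hat, r) for r in rows)
```

Because this toy model has one parameter per combination of the two dummies, the fitted probabilities approach the observed fatal-accident rate in each cell, which gives a simple check that the maximization behaves as expected.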

Step 4.5.1: Define θ as the classification threshold of the model prediction, with 0 < θ < 1; depending on how its predicted probability compares with θ, the accident severity model predicts the j-th accident as either a fatal accident or a non-fatal accident;

Step 4.5.2: Initialize j=1;

Step 4.5.3: Let the j-th classification threshold θ_j of the model equal P′_j, and use formula (15) to obtain the j-th sensitivity Se(θ_j) of the accident severity model prediction, i.e. the probability that a fatal accident in the accident data set is predicted as a fatal accident:

Formula (15) involves the probability that the s-th accident is predicted as a fatal accident; y_s=1 indicates that the s-th accident is a fatal accident, 1≤s≤J;

The j-th specificity Sp(θ_j) of the accident severity model prediction, i.e. the probability that a non-fatal accident in the accident data set is predicted as a non-fatal accident, is obtained using formula (16):

Formula (16) involves the probability that the s-th accident is predicted as a non-fatal accident; y_s=0 indicates that the s-th accident is a non-fatal accident, 1≤s≤J;

Step 4.5.5: With the j-th classification threshold θ_j as the abscissa and the corresponding sensitivity Se(θ_j) and specificity Sp(θ_j) values as the ordinates, draw the sensitivity curve and the specificity curve, and take the threshold corresponding to the intersection of the two curves as the optimal model prediction classification threshold θ′;
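The threshold search of steps 4.5.1 to 4.5.5 can be sketched as follows. Two assumptions are made that the text does not spell out: an accident is predicted fatal when its predicted probability is at least θ, and the intersection of the two curves is approximated by the candidate threshold where |Se - Sp| is smallest. The function names and the toy data are hypothetical.

```python
def sensitivity_specificity(probs, ys, theta):
    # Se: share of fatal accidents predicted fatal; Sp: share of non-fatal predicted non-fatal.
    fatal = [p for p, y in zip(probs, ys) if y == 1]
    nonfatal = [p for p, y in zip(probs, ys) if y == 0]
    se = sum(p >= theta for p in fatal) / len(fatal)
    sp = sum(p < theta for p in nonfatal) / len(nonfatal)
    return se, sp

def crossing_threshold(probs, ys):
    # Try every sorted predicted probability as a threshold and keep the one
    # where the sensitivity and specificity curves are closest to each other.
    best = None
    for theta in sorted(probs):
        se, sp = sensitivity_specificity(probs, ys, theta)
        gap = abs(se - sp)
        if best is None or gap < best[0]:
            best = (gap, theta, se, sp)
    return best[1], best[2], best[3]

# Invented predicted probabilities and observed outcomes (1 = fatal, 0 = non-fatal).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
theta_best, se_best, sp_best = crossing_threshold(probs, ys)
```

Choosing the crossing point rather than a fixed 0.5 cutoff balances the two error types, which matters when fatal accidents are a minority of the records.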

Step 4.6: Assign t^*+1 to t^* and judge whether t^*>3 holds; if so, 3 accident severity prediction models have been obtained; otherwise, return to step 4.3.

The 3 binary logistic regression models are obtained, and the parameter estimation results of the accident severity models are shown in Table 6. The regression coefficient w^* is a vector composed of the constant term β₀ and the independent-variable regression coefficients B, where the B value is the coefficient of an independent variable: a positive value indicates a positive effect on fatal accidents and a negative value indicates a negative effect. OR=exp(B) is the odds ratio, i.e. the factor by which the odds of a fatal accident increase or decrease when a given independent variable is present.

Table 6: Accident severity model estimation results

Note: B is the model regression coefficient; OR is the odds ratio, OR=exp(B);
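The relation OR=exp(B) can be made concrete with a small sketch. The helper names are hypothetical, and the only figure taken from the text is the OR value 12.036 reported for combined curve-and-slope sections; its coefficient is recovered here by taking the logarithm, not read from Table 6.

```python
import math

def odds_ratio(b):
    # OR = exp(B): multiplicative change in the odds of a fatal accident.
    return math.exp(b)

def percent_change_in_odds(b):
    # Positive B gives an increase in the odds, negative B a decrease.
    return (odds_ratio(b) - 1.0) * 100.0

# Coefficient implied by the OR value 12.036 quoted in the text (derived, not tabulated).
b_curve_slope = math.log(12.036)
```

A coefficient of zero gives OR=1, meaning the variable leaves the odds unchanged, which is why the sign of B alone determines the direction of the effect.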

Meanwhile jth is obtained according to estimation parameter w ' and plays accident in independent variableUnder the conditions of y_j=1 prediction probabilityFrom And obtain the susceptibility that classification 1 is illustrated in figure 2 with susceptibility prediction classification thresholds corresponding with specific intersection point and specificity Curve graph.To which the prediction classification thresholds for obtaining 3 subclass are respectively 0.2930,0.3928 and 0.4133, and are solved pair Answer the model prediction accuracy 68.8%, 75.5% and 66.3% under classification thresholds；

Step 4.6.1, accident severity model result is analyzed:

As shown in Table 6, there are significant differences between the factor of influence accident severity in each subclass, wherein no card is driven It sails, drunk driving, hypervelocity, central isolation facility, landform, the second order reciprocation and lorry and Class IV highway of motorcycle and passenger, The third-order interaction effect of road alignment is only significant in classification 1；Agricultural vehicle, hit fixture, off-peak period, road line style, Visibility is only significant in classification 2；The reciprocation for falling vehicle, substandard highway, traffic control device, age and non-motor vehicle only exists It is significant in classification 3.

It by taking classification 1 as an example, drives without a license, exceed the speed limit and the regression coefficient of drunk driving is positive, death by accident is sent out in the case of three kinds Raw probability increases separately about 132%, 140% and 124%.In terms of Crash characteristics, hitting on-fixed object makes the hair of death by accident Raw probability increases by 96%；There is death by accident probability of happening under passenger status to increase by 165%, lacking road center isolation facility makes extremely The probability for dying accident generation increases by 120%；The probability of happening of death by accident rises about 44% when night.

In terms of variable reciprocation, motorcycle, which carries death by accident probability of happening when passenger drives, reduces about 60%；Lorry On Class IV highway when driving, accident severity is influenced vulnerable to road line style, wherein curved slope combination section influences, maximum (OR value is 12.036), followed by bend section (OR value is 5.57).

Step 4.6.2: Model comparison:

To compare the method of the present invention with the traditional binary logistic regression model in accident severity analysis, model prediction accuracy and the ROC curve are used as two indicators of predictive performance, and the Hosmer-Lemeshow (HL) statistic is used to measure the goodness of fit of the model.

Model prediction accuracy is obtained with the intersection of the sensitivity and specificity curves as the classification threshold; a higher value indicates better model performance. The ROC curve is drawn with 1-specificity as the abscissa and sensitivity as the ordinate, and the area under the ROC curve (AUC) is used to evaluate the classification performance of the model: an AUC value greater than 0.5 indicates predictive value better than random guessing, and the closer the AUC value is to 1, the better the prediction classification ability of the model. Taking class 1 as an example, Fig. 2 shows the threshold corresponding to the intersection of the sensitivity and specificity curves used as the model prediction classification threshold, and Fig. 3 shows the ROC curve drawn with 1-specificity as the abscissa and sensitivity as the ordinate. In addition, the goodness of fit of the model is measured with the Hosmer-Lemeshow (HL) statistic, which follows a chi-square distribution; a non-significant P value (>0.05) indicates that the model fits the data well.
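The two predictive indicators above can be sketched as follows. The AUC is computed here through its pairwise-ranking equivalent (the share of fatal and non-fatal accident pairs in which the fatal accident receives the higher predicted probability, with ties counted as one half), which is a swapped-in shortcut for integrating the ROC curve. The rule "predict fatal when the probability is at least θ", the function names, and the toy data are assumptions.

```python
def accuracy(probs, ys, theta):
    # Share of accidents whose predicted class at threshold theta matches the observed class.
    correct = sum((1 if p >= theta else 0) == y for p, y in zip(probs, ys))
    return correct / len(ys)

def auc(probs, ys):
    # Pairwise-ranking form of the area under the ROC curve.
    fatal = [p for p, y in zip(probs, ys) if y == 1]
    nonfatal = [p for p, y in zip(probs, ys) if y == 0]
    wins = 0.0
    for pf in fatal:
        for pn in nonfatal:
            if pf > pn:
                wins += 1.0
            elif pf == pn:
                wins += 0.5
    return wins / (len(fatal) * len(nonfatal))

# Invented predicted probabilities and observed outcomes (1 = fatal, 0 = non-fatal).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
acc_at_threshold = accuracy(probs, ys, 0.6)
auc_value = auc(probs, ys)
```

Unlike accuracy, the AUC does not depend on the chosen threshold, so the two indicators answer different questions about the same model.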

Table 7: Summary of model test indicators

As shown in Table 7, the traffic accident severity prediction method applied to a regional road network proposed by the present invention outperforms the traditional binary logistic regression model in both model prediction accuracy and goodness of fit.

Claims

1. A traffic accident severity prediction method applied to a regional road network, characterized in that it is carried out as follows:

Obtain N accident records from a road traffic accident database as the accident data set D, and select K categorical variables from the data of any i-th accident to form the set X={x_1,x_2,…,x_k,…,x_K} that characterizes the i-th accident, where x_k denotes the k-th categorical variable and contains C_k categories; the value of the k-th categorical variable x_k among its C_k categories is denoted s_k, and s_ik denotes the value of the k-th categorical variable of the i-th accident, so that the set of values of all K categorical variables of the i-th accident is denoted S_i={s_i1,s_i2,...,s_ik,...,s_iK}; any one value set among all possible values of the K categorical variables of the i-th accident is given its own symbol; k=1,2,3,...,K; i=1,2,3,...,N;

Take the severity of the i-th accident as the predicted variable, denoted y_i, where the values "0" and "1" of y_i indicate a non-fatal accident and a fatal accident respectively;

Step 2.1: Define a latent class variable V in the latent class analysis model, where V contains T classes and any one class is denoted t, t=1,2,...,T; denote the value of the latent class variable V for the i-th accident as V_i;

Step 2.1.1: Define the outer loop count as τ and the maximum number of outer loop iterations as τ_max; let the number of classes set in the τ-th iteration be T_τ; initialize τ=1;

Step 2.1.2: Initialize t=1;

Step 2.1.3: Using formula (1), obtain the conditional probability of the value set of the i-th accident on the K categorical variables given that V_i takes the value t, i.e. that the i-th accident belongs to the t-th latent class:

In formula (1), P(s_ik=s_k|V_i=t) denotes the conditional probability that the i-th accident takes the value s_k on the k-th categorical variable when it belongs to the t-th latent class;

Step 2.1.4: Using formula (2), obtain the unconditional probability of the value set of the K categorical variables in the i-th accident, i.e. the joint probability of the latent class analysis model:

Step 2.2: Estimate the model parameters by the maximum likelihood method to obtain the estimated values of the latent class probabilities and the categorical-variable conditional probabilities, as well as the τ-th maximum likelihood function value L_τ of the latent class analysis model;

Step 2.4: Assign t+1 to t and judge whether t>T_τ holds; if so, execute step 2.5; otherwise, return to step 2.1.3;

Step 2.5: Using formulas (4), (5), (6) and (7), obtain the model-fit evaluation indices, comprising: the τ-th Akaike information criterion AIC_τ, the τ-th Bayesian information criterion BIC_τ, the τ-th sample-size-adjusted Bayesian information criterion aBIC_τ, and the τ-th entropy value:

AIC_τ=-2ln (L_τ)+2M (4)

BIC_τ=-2ln (L_τ)+ln(N)×M (5)

aBIC_τ=-2ln (L_τ)+ln(n^*)×M (6)

In formulas (4), (5), (6) and (7), M is the number of unknown parameters in the latent class analysis model; n^* is the adjusted sample size, with n^*=(N+2)/24;

Step 2.6: Assign τ+1 to τ and judge whether τ>τ_max holds; if so, execute step 2.7; otherwise, return to step 2.1.3;

Step 2.7: From the τ_max sets of Akaike information criterion AIC, Bayesian information criterion BIC, sample-size-adjusted Bayesian information criterion aBIC and entropy R² values, select the number of latent classes at which the model-fit evaluation indices attain their optimal values, denoted T^*; divide the accident data set D into T^* accident subclasses, each holding the accident data of the t^*-th accident subclass, t^*=1,2,...,T^*;

Step 3.1: Take the accident data in the t^*-th accident subclass as the training sample set, and take the set X composed of the K categorical variables as the feature set of the CART decision-tree model; let the node sample threshold be σ, the feature value split point be α, and the Gini index threshold be ε;

Step 3.2: Initialize t^*=1;

Step 3.3: Input the training sample set, the feature set X, the node sample threshold σ and the Gini index threshold ε into the CART decision-tree model;

Step 3.5: According to the tree diagrams of the T^* binary decision trees, determine the interaction terms between the categorical variables, wherein the binary decision tree corresponding to the t^*-th accident subclass determines the interaction terms of that subclass;

Step 4.1: Take the accident data in the t^*-th subclass as the fitting data of the accident severity model, and take the set X formed by the K categorical variables together with the interaction terms of the t^*-th subclass as the independent variables X^* of the accident severity model; define the t^*-th accident subclass as containing J accident records, where J is the number of accidents in that subclass, and denote the predicted variable of the j-th accident as y_j;

Step 4.2: Initialize t^*=1;

Step 4.3: Using formula (11), obtain the binary logistic regression probability P(y=1|X^*) that a fatal accident occurs, i.e. y_j=1, given the independent variables X^*:

In formula (11), w^* is the regression coefficient of the independent variables X^*;

Step 4.4: Estimate the parameter w^* of the binary logistic regression accident severity model by the maximum likelihood method:

For the j-th accident, P_j is the probability that y_j=1 given its independent variables, so the probability that y_j=0 given those independent variables is 1-P_j; the likelihood function L(w^*) is then obtained using formula (12):

According to the estimated parameter w′, obtain the predicted probability that y_j=1 for the j-th accident given its independent variables, thereby obtaining the predicted probabilities of the J accidents; sort them in ascending order and denote the sorted set of predicted probabilities as {P′_1,...,P′_j,...,P′_J};

Step 4.6: Assign t^*+1 to t^* and judge whether t^*>T^* holds; if so, T^* accident severity prediction models have been obtained; otherwise, return to step 4.3.
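The latent class quantities of steps 2.1.3 to 2.5 can be illustrated with a minimal sketch. It assumes the usual local-independence form, in which the conditional probability of a value set is the product of the per-variable conditional probabilities, and the joint probability is the class-weighted sum of those products. Formulas (4) to (6) are implemented as written, with `log_lik` standing for ln(L_τ). All names and numbers are hypothetical, and no parameter estimation is performed here.

```python
import math

def conditional_prob(values, cond_probs_t):
    # Assumed form of formula (1): product over k of P(s_ik = s_k | V_i = t).
    p = 1.0
    for k, s in enumerate(values):
        p *= cond_probs_t[k][s]
    return p

def joint_prob(values, class_probs, cond_probs):
    # Assumed form of formula (2): sum over classes t of P(V = t) times formula (1).
    return sum(pt * conditional_prob(values, cond_probs[t])
               for t, pt in enumerate(class_probs))

def information_criteria(log_lik, n_params, n_samples):
    # Formulas (4)-(6); n_star is the adjusted sample size (N + 2) / 24.
    n_star = (n_samples + 2) / 24
    return {
        "AIC": -2.0 * log_lik + 2.0 * n_params,
        "BIC": -2.0 * log_lik + math.log(n_samples) * n_params,
        "aBIC": -2.0 * log_lik + math.log(n_star) * n_params,
    }

# Invented two-class model over two binary categorical variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [{0: 0.9, 1: 0.1}, {0: 0.8, 1: 0.2}],  # class 1
    [{0: 0.3, 1: 0.7}, {0: 0.5, 1: 0.5}],  # class 2
]
p00 = joint_prob((0, 0), class_probs, cond_probs)
# 5 free parameters: 1 class probability + 2 classes x 2 conditional probabilities.
ic = information_criteria(log_lik=-100.0, n_params=5, n_samples=1000)
```

In the procedure of the claim, these criteria would be recomputed for each candidate number of classes T_τ and compared in step 2.7 to select T^*.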

2. The traffic accident severity prediction method according to claim 1, characterized in that step 3.3 is carried out as follows:

Step 3.3.1: The CART decision tree uses the Gini index as the basis for deciding whether to branch, and a binary decision-tree model is established; according to the feature value split point α, the training sample set is divided into a first subset D_α1 and a second subset D_α2, and the Gini index Gini(D_α) of the feature value split point α is obtained using formula (8):

In formula (8), |D_α1| and |D_α2| respectively denote the total numbers of accidents contained in the first subset D_α1 and the second subset D_α2 of the training sample set;

Gini(D_α1) denotes the Gini index of the first subset D_α1, where:

In formula (8), Gini(D_α2) denotes the Gini index of the second subset D_α2, where:

Step 3.3.2: Traverse the split points of every feature value in the feature set X and compute the Gini index of each split point; if the Gini index of every feature value split point in the feature set X is less than the threshold ε, the CART decision-tree model is a single-node tree, and the single-node tree is output; otherwise, execute step 3.3.3;

Step 3.3.3: Select the feature value X_min corresponding to the split point with the minimum Gini index in the feature set X and its corresponding split point α_min, and divide the training sample set into two subsets D_min1 and D_min2 according to the split point α_min; then assign subset D_min1 and subset D_min2 to the two child nodes whose parent node is the training sample set;

If the sample numbers of subset D_min1 and subset D_min2 are both less than the given node sample threshold σ, the child nodes holding the two subsets D_min1 and D_min2 are leaf nodes, and the binary decision tree is output; if the sample number of subset D_min1 and/or subset D_min2 is greater than the node sample threshold σ, the child node holding subset D_min1 or subset D_min2 is a non-leaf node that can be divided further, and step 3.3.4 is executed;

Step 3.3.4: For a non-leaf node, let the training sample set equal the subset corresponding to the non-leaf node, delete the feature value X_min corresponding to the split point with the minimum Gini index from the feature set X, and return to step 3.3.1, until the sample numbers of all child nodes are less than the node sample threshold σ or the feature set X is empty, and then output the final binary decision tree.
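The split selection of steps 3.3.1 to 3.3.3 can be sketched as follows. The sketch assumes the standard CART definitions: the Gini index of a subset is 1 minus the sum of squared class proportions, and the Gini index of a split point is the size-weighted average over the two subsets. A split point is taken to be "feature equals value" versus "feature differs from value". The function names and the toy records are hypothetical, and the stopping rules with σ and ε are not implemented.

```python
def gini(labels):
    # Gini index of a subset with binary labels (1 = fatal, 0 = non-fatal).
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def split_gini(rows, labels, feature, value):
    # Size-weighted Gini index of the two subsets produced by one split point.
    left = [y for r, y in zip(rows, labels) if r[feature] == value]
    right = [y for r, y in zip(rows, labels) if r[feature] != value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels, features):
    # Traverse every split point of every feature and keep the minimum Gini index.
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            g = split_gini(rows, labels, f, v)
            if best is None or g < best[0]:
                best = (g, f, v)
    return best

# Invented accident records with two categorical variables.
rows = [
    {"vehicle": "truck", "light": "day"},
    {"vehicle": "truck", "light": "night"},
    {"vehicle": "car", "light": "day"},
    {"vehicle": "car", "light": "night"},
]
labels = [1, 1, 0, 0]
best_gini, best_feature, best_value = best_split(rows, labels, ["light", "vehicle"])
```

In this toy data the vehicle type separates fatal from non-fatal accidents perfectly, so its split point has a Gini index of 0, while splitting on lighting leaves both subsets evenly mixed.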

3. The traffic accident severity prediction method according to claim 1, characterized in that step 4.5 is carried out as follows:

Step 4.5.1: Define θ as the prediction classification threshold of the model, with 0 < θ < 1; depending on how its predicted probability compares with θ, the accident severity model predicts the j-th accident as either a fatal accident or a non-fatal accident;

Step 4.5.2: Initialize j=1;

Step 4.5.3: Let the j-th classification threshold θ_j of the model equal P′_j, and use formula (13) to obtain the j-th sensitivity Se(θ_j) of the accident severity model prediction, i.e. the probability that a fatal accident in the accident data set is predicted as a fatal accident:

Formula (13) involves the probability that the s-th accident is predicted as a fatal accident; y_s=1 indicates that the s-th accident is a fatal accident, 1≤s≤J;

The j-th specificity Sp(θ_j) of the accident severity model prediction, i.e. the probability that a non-fatal accident in the accident data set is predicted as a non-fatal accident, is obtained using formula (14):

Formula (14) involves the probability that the s-th accident is predicted as a non-fatal accident; y_s=0 indicates that the s-th accident is a non-fatal accident, 1≤s≤J;

Step 4.5.4: Assign j+1 to j and judge whether j>J holds; if so, J pairs of sensitivity and specificity values have been obtained, and step 4.5.5 is executed; otherwise, return to step 4.5.3;

Step 4.5.5: With the j-th classification threshold θ_j as the abscissa and the corresponding sensitivity Se(θ_j) and specificity Sp(θ_j) values as the ordinates, draw the sensitivity curve and the specificity curve, and take the threshold corresponding to the intersection of the two curves as the optimal model prediction classification threshold θ′.