CN110458244A - Traffic accident severity prediction method applied to a regional road network - Google Patents
- Publication number
- CN110458244A
- CN201910770584.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Abstract
The invention discloses a traffic accident severity prediction method applied to a regional road network, comprising the following steps: 1. acquiring and preprocessing regional road network traffic accident data; 2. establishing a latent class analysis model based on the regional road network traffic accident data; 3. establishing a CART decision tree model for each subclass according to the latent class analysis results; 4. establishing, for each subclass, an accident severity model based on binary logistic regression that considers both the independent variables and their interaction terms, and taking the intersection of the sensitivity and specificity curves as the classification threshold of the model. The invention reduces the adverse effect of accident data heterogeneity on the analysis results, overcomes the problems that conventional traffic accident severity prediction models ignore interaction terms and perform poorly overall on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.
Description
Technical field
The present invention relates to a traffic accident severity prediction method applied to a regional road network, and belongs to the technical field of traffic safety analysis.
Background art
According to the Global Status Report on Road Safety, road traffic accidents are the eighth leading cause of death worldwide, causing more than 1.35 million deaths per year, and traffic safety has increasingly become a major issue of global concern. Analyzing traffic accident data to determine the factors that influence accident severity, and proposing countermeasures that reduce the risk of fatal accidents, is currently one of the most practical ways of improving traffic safety. However, a road traffic accident is a complex event involving the reactions of different drivers to the external environment and the interactions among vehicle, road conditions, traffic factors and environmental factors, and there may be unobserved factors that influence the accident. As a result, traffic accident data are highly heterogeneous, and accident severity may be affected by interactions between the factors.
Among analysis methods for accident severity (fatal versus non-fatal accidents), the binary logistic regression model is the most widely used. However, this method ignores the influence of accident data heterogeneity and of interactions between independent variables on the analysis results, which may lead to inaccurate parameter estimates or cause important hidden relationships to be missed. Yu Rongjie et al. used latent class analysis to divide accident data into several homogeneous latent classes, reducing the influence of accident data heterogeneity on the analysis results (Yu R, Wang X, Abdel-Aty M. A Hybrid Latent Class Analysis Modeling Approach to Analyze Urban Expressway Crash Risk[J]. Accident Analysis and Prevention, 2017, 101: 37-43.). Rusli et al. used a decision tree to screen high-order interactions between independent variables and included the high-order interaction terms together with the main effects in the accident severity model, so as to quantify the influence of interactions on accident severity; however, that method considers only the high-order interactions and ignores the interactions of other orders that exist between independent variables (Rusdi Rusli, Md. Mazharul Haque, Mohammad Saifuzzaman, Mark King. Crash severity along rural mountainous highways in Malaysia: An application of a combined decision tree and logistic regression model[J]. Traffic Injury Prevention, 2018, 19(7): 741-748.). In addition, the traditional binary logistic regression model considers only the overall prediction accuracy of the model and uses 0.5 as the classification threshold. In traffic accident data, however, fatal accidents usually account for a small share of the records (that is, the data are imbalanced). Using 0.5 as the classification threshold gives the model a high overall prediction accuracy but makes its sensitivity too low, so that the prediction loses practical significance.
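The effect of a 0.5 threshold on imbalanced data can be illustrated with a small arithmetic sketch. The data below are hypothetical (5 fatal accidents out of 100, with every predicted probability assumed to fall below 0.5) and are not taken from the patent.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (overall accuracy, sensitivity) for 0/1 labels, where 1 = fatal."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    fatal = sum(1 for t in y_true if t == 1)
    fatal_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return correct / len(y_true), fatal_found / fatal


# Hypothetical imbalanced sample: 5 fatal and 95 non-fatal accidents.
y_true = [1] * 5 + [0] * 95
# Assumed outcome of a 0.5 threshold: no record is classified as fatal.
y_pred = [0] * 100

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(acc, sens)
```

Under these assumptions the overall accuracy is high even though no fatal accident is detected, which is the situation the threshold adjustment is meant to avoid.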
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a traffic accident severity prediction method applied to a regional road network, which reduces the adverse effect of accident data heterogeneity on the analysis results, identifies the interaction terms of the independent variables, and adjusts the classification threshold of the prediction model. It thereby overcomes the problems that conventional traffic accident severity prediction models ignore interaction terms and perform poorly overall on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.
To achieve the above object, the present invention adopts the following technical solution:
The traffic accident severity prediction method applied to a regional road network according to the present invention is characterized in that it is carried out as follows:
Step 1: acquisition and preprocessing of regional road network traffic accident data;
Obtain N accident records from a road traffic accident database as the accident data set D, and for any i-th accident select K categorical variables to form the set $X=\{x_1,x_2,\dots,x_k,\dots,x_K\}$ that characterizes the i-th accident, where $x_k$ denotes the k-th categorical variable and the k-th categorical variable $x_k$ contains $C_k$ categories; the value taken by $x_k$ among its $C_k$ categories is denoted $s_k$; let $s_{ik}$ denote the value of the k-th categorical variable of the i-th accident, so that the set of values of all K categorical variables in the i-th accident is denoted $S_i=\{s_{i1},s_{i2},\dots,s_{ik},\dots,s_{iK}\}$; let $S=\{s_1,s_2,\dots,s_K\}$ denote any one value set among all possible values of the K categorical variables of the i-th accident; $k=1,2,3,\dots,K$; $i=1,2,3,\dots,N$;
The severity of the i-th accident is taken as the predicted variable and denoted $y_i$, where $y_i$ = "0" denotes a non-fatal accident and $y_i$ = "1" denotes a fatal accident;
Step 2: establish a latent class analysis model from the regional road network traffic accident data;
Step 2.1: define that the latent class analysis model contains one latent class variable V, that V contains T classes, and that any one class is denoted t, $t=1,2,\dots,T$; let the value of the latent class variable V in the i-th accident be denoted $V_i$;
Step 2.1.1: define the outer loop counter as τ and the maximum number of outer loop iterations as $\tau_{max}$; let the number of classes set in the τ-th iteration be $T_\tau$; initialize τ = 1;
Step 2.1.2: initialize t = 1;
Step 2.1.3: use formula (1) to obtain the conditional probability that the value set of the i-th accident on the K categorical variables is $S=\{s_1,\dots,s_K\}$ when $V_i$ takes the value t, that is, when the i-th accident belongs to the t-th latent class:
$$P(S_i=S\mid V_i=t)=\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{1}$$
In formula (1), $P(s_{ik}=s_k\mid V_i=t)$ denotes the conditional probability that the k-th categorical variable takes the value $s_k$ when the i-th accident belongs to the t-th latent class;
Step 2.1.4: use formula (2) to obtain the unconditional probability that the value set of the K categorical variables in the i-th accident is S, that is, the joint probability of the latent class analysis model:
$$P(S_i=S)=\sum_{t=1}^{T_\tau}P(V_i=t)\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{2}$$
In formula (2), $P(V_i=t)$ is the probability that the i-th accident belongs to the t-th latent class, that is, the proportion of latent class t in the population;
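Formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the dictionary layout of the conditional probabilities, and the toy parameter values (2 latent classes, 2 binary variables) are all hypothetical and not part of the patent.

```python
from math import prod


def conditional_prob(values, item_probs_t):
    """Formula (1): P(S_i = S | V_i = t) as a product over the K variables.

    values[k] is the category taken by variable k;
    item_probs_t[k][c] is P(s_ik = c | V_i = t) for one latent class t.
    """
    return prod(item_probs_t[k][c] for k, c in enumerate(values))


def marginal_prob(values, class_probs, item_probs):
    """Formula (2): mixture of the class-conditional probabilities."""
    return sum(
        class_probs[t] * conditional_prob(values, item_probs[t])
        for t in range(len(class_probs))
    )


# Hypothetical toy model: class proportions and conditional probabilities.
class_probs = [0.6, 0.4]
item_probs = [
    [{1: 0.9, 2: 0.1}, {1: 0.8, 2: 0.2}],  # latent class 0
    [{1: 0.2, 2: 0.8}, {1: 0.3, 2: 0.7}],  # latent class 1
]

p = marginal_prob((1, 1), class_probs, item_probs)
print(p)  # 0.6 * 0.9 * 0.8 + 0.4 * 0.2 * 0.3
```

Summing `marginal_prob` over all four possible value sets of this toy model gives 1, as expected of a probability distribution.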
Step 2.2: estimate the model parameters by the maximum likelihood method to obtain the estimated latent class probabilities $\hat{P}(V_i=t)$ and the estimated conditional probabilities of the categorical variables $\hat{P}(s_{ik}=s_k\mid V_i=t)$, as well as the τ-th maximum likelihood function value $L_\tau$ of the latent class analysis model;
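Step 2.3: use formula (3) to calculate the posterior probability that the i-th accident is classified into the t-th latent class:
$$\hat{P}(V_i=t\mid S_i=S)=\frac{\hat{P}(V_i=t)\prod_{k=1}^{K}\hat{P}(s_{ik}=s_k\mid V_i=t)}{\sum_{t'=1}^{T_\tau}\hat{P}(V_i=t')\prod_{k=1}^{K}\hat{P}(s_{ik}=s_k\mid V_i=t')} \tag{3}$$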
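The posterior probability of step 2.3 follows from Bayes' rule and can be sketched as below. The function name and the toy parameter values are hypothetical; in practice the class proportions and conditional probabilities would be the maximum likelihood estimates from step 2.2.

```python
from math import prod


def posterior(values, class_probs, item_probs):
    """Posterior P(V_i = t | S_i = S) for every latent class t (Bayes' rule)."""
    joint = [
        class_probs[t] * prod(item_probs[t][k][c] for k, c in enumerate(values))
        for t in range(len(class_probs))
    ]
    total = sum(joint)  # the unconditional probability P(S_i = S)
    return [j / total for j in joint]


# Hypothetical estimates for 2 latent classes and 2 binary variables.
class_probs = [0.6, 0.4]
item_probs = [
    [{1: 0.9, 2: 0.1}, {1: 0.8, 2: 0.2}],  # latent class 0
    [{1: 0.2, 2: 0.8}, {1: 0.3, 2: 0.7}],  # latent class 1
]

post = posterior((1, 1), class_probs, item_probs)
print(post)
```

The posteriors of one accident sum to 1 across the latent classes, so they can be compared directly when deciding which class the accident belongs to.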
Step 2.4: assign t+1 to t and judge whether $t>T_\tau$ holds; if so, execute step 2.5; otherwise, return to step 2.1.3;
Step 2.5: use formulas (4), (5), (6) and (7) to obtain the model fit indices, comprising the τ-th Akaike information criterion $AIC_\tau$, the τ-th Bayesian information criterion $BIC_\tau$, the τ-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$, and the τ-th entropy $R^2_\tau$:
$$AIC_\tau=-2\ln(L_\tau)+2M \tag{4}$$
$$BIC_\tau=-2\ln(L_\tau)+\ln(N)\times M \tag{5}$$
$$aBIC_\tau=-2\ln(L_\tau)+\ln(n^*)\times M \tag{6}$$
$$R^2_\tau=1-\frac{\sum_{i=1}^{N}\sum_{t=1}^{T_\tau}\left[-\hat{P}(V_i=t\mid S_i=S)\ln\hat{P}(V_i=t\mid S_i=S)\right]}{N\ln T_\tau} \tag{7}$$
In formulas (4), (5), (6) and (7), M is the number of unknown parameters in the latent class analysis model; $n^*$ is the adjusted sample size, with $n^*=(N+2)/24$;
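The fit indices of step 2.5 can be computed as in the following sketch. It assumes that the entropy index takes the usual relative-entropy form based on the posterior probabilities, and that the index is left undefined for a single class because ln(1) = 0; the function name and the example inputs are hypothetical.

```python
from math import log


def fit_indices(log_lik, n_params, posteriors):
    """AIC, BIC, sample-size-adjusted BIC and relative entropy of one model.

    log_lik    : maximised log-likelihood ln(L)
    n_params   : number of free parameters M
    posteriors : N rows, each holding the T posterior class probabilities
    """
    n_obs = len(posteriors)
    n_classes = len(posteriors[0])
    n_star = (n_obs + 2) / 24  # adjusted sample size n*
    if n_classes > 1:
        ent = sum(-p * log(p) for row in posteriors for p in row if p > 0)
        entropy = 1 - ent / (n_obs * log(n_classes))
    else:
        entropy = None  # assumption: not defined for a single class
    return {
        "AIC": -2 * log_lik + 2 * n_params,
        "BIC": -2 * log_lik + log(n_obs) * n_params,
        "aBIC": -2 * log_lik + log(n_star) * n_params,
        "entropy": entropy,
    }


# Hypothetical inputs: two accidents, each assigned with certainty.
idx = fit_indices(-10.0, 3, [[1.0, 0.0], [0.0, 1.0]])
print(idx)
```

Perfectly separated posteriors give an entropy of 1, while posteriors that are uniform across classes give an entropy of 0.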
Step 2.6: assign τ+1 to τ and judge whether $\tau>\tau_{max}$ holds; if so, execute step 2.7; otherwise, return to step 2.1.3;
Step 2.7: from the $\tau_{max}$ sets of Akaike information criterion AIC, Bayesian information criterion BIC, sample-size-adjusted Bayesian information criterion aBIC and entropy $R^2$ values, select the number of latent classes at which the model fit indices attain their optimal values, and denote it $T^*$; divide the accident data set D into $T^*$ accident subclasses, denoted $\{D_1,D_2,\dots,D_{t^*},\dots,D_{T^*}\}$, where $D_{t^*}$ denotes the accident data in the $t^*$-th accident subclass, $t^*=1,2,\dots,T^*$;
Step 3: according to the results of the latent class analysis model, establish a CART decision tree model for each of the $T^*$ subclasses;
Step 3.1: let the accident data $D_{t^*}$ in the $t^*$-th accident subclass be the training sample set, and let the set X formed by the K categorical variables be the feature set of the CART decision tree model; let the node sample threshold be σ, the feature value cut point be α, and the Gini index threshold be ε;
Step 3.2: initialize $t^*=1$;
Step 3.3: input the training sample set $D_{t^*}$, the feature set X, the node sample threshold σ and the Gini index threshold ε into the CART decision tree model;
Step 3.4: assign $t^*+1$ to $t^*$ and judge whether $t^*>T^*$ holds; if so, $T^*$ decision trees have been obtained, and step 3.5 is executed; otherwise, return to step 3.3;
Step 3.5: determine the interaction terms between the categorical variables from the tree diagrams of the $T^*$ binary decision trees, the interaction terms of the $t^*$-th accident subclass being those determined by its corresponding binary decision tree;
Step 4: establish an accident severity model based on binary logistic regression for each of the $T^*$ subclasses;
Step 4.1: take the accident data $D_{t^*}$ in the $t^*$-th subclass as the fitting data of the accident severity model, and take the set X formed by the K categorical variables together with the interaction terms of the $t^*$-th subclass as the independent variables $X^*$ of the accident severity model; define that the $t^*$-th accident subclass contains J accident records, with $J=|D_{t^*}|$; the predicted variable of the j-th accident is denoted $y_j$;
Step 4.2: initialize $t^*=1$;
Step 4.3: use formula (11) to obtain, based on binary logistic regression, the probability $P(y=1\mid X^*)$ that a fatal accident occurs, that is $y_j=1$, given the independent variables $X^*$:
$$P(y=1\mid X^*)=\frac{\exp(w^*X^*)}{1+\exp(w^*X^*)} \tag{11}$$
In formula (11), $w^*$ is the vector of regression coefficients of the independent variables $X^*$;
Step 4.4: estimate the parameters $w^*$ of the binary logistic regression accident severity model by the maximum likelihood method:
For the j-th accident, $P_j=P(y_j=1\mid X_j^*)$ is the probability that $y_j=1$ given the independent variables $X_j^*$, so the probability that $y_j=0$ given $X_j^*$ is $1-P_j$; the likelihood function $L(w^*)$ is obtained by formula (12):
$$L(w^*)=\prod_{j=1}^{J}P_j^{\,y_j}(1-P_j)^{1-y_j} \tag{12}$$
Using maximum likelihood estimation, find the estimated parameters $w'$ at which $L(w^*)$ attains its maximum;
According to the estimated parameters $w'$, obtain the predicted probability $\hat{P}_j$ that $y_j=1$ for the j-th accident given the independent variables $X_j^*$, thereby obtaining the predicted probabilities $\{\hat{P}_1,\dots,\hat{P}_J\}$ of the J accidents; sort them in ascending order, and denote the sorted set of predicted probabilities $\{P'_1,\dots,P'_j,\dots,P'_J\}$;
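Steps 4.3 and 4.4 can be sketched as follows. The source specifies maximum likelihood estimation but not the numerical procedure, so plain gradient ascent on the log-likelihood is used here as one simple option; the learning rate, the number of steps, the variable names `night` and `truck`, and the data are all hypothetical.

```python
from math import exp, log


def predict(w, x):
    """Logistic probability P(y = 1 | x); x starts with 1 for the intercept."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(w, X, y):
    """Logarithm of the likelihood: sum of y*ln(P) + (1 - y)*ln(1 - P)."""
    total = 0.0
    for xj, yj in zip(X, y):
        p = predict(w, xj)
        total += yj * log(p) + (1 - yj) * log(1 - p)
    return total


def fit(X, y, lr=0.1, steps=2000):
    """Maximise the log-likelihood by gradient ascent (assumed optimiser)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xj, yj in zip(X, y):
            err = yj - predict(w, xj)
            for d, xd in enumerate(xj):
                grad[d] += err * xd
        w = [wd + lr * gd for wd, gd in zip(w, grad)]
    return w


# Hypothetical records: (night, truck) indicators and severity (1 = fatal).
data = [
    ((0, 0), 0), ((0, 0), 0), ((0, 0), 0), ((0, 0), 1),
    ((1, 0), 0), ((1, 0), 0), ((1, 0), 1),
    ((0, 1), 0), ((0, 1), 0), ((0, 1), 1),
    ((1, 1), 1), ((1, 1), 1), ((1, 1), 0), ((1, 1), 1),
]
# Design rows: intercept, two main effects, and their interaction term.
X = [[1, a, b, a * b] for (a, b), _ in data]
y = [label for _, label in data]

w = fit(X, y)
sorted_probs = sorted(predict(w, x) for x in X)  # ascending, as in step 4.4
print(w, sorted_probs[0], sorted_probs[-1])
```

Because this toy model has one parameter per combination of the two indicators, the fitted probabilities approach the observed fatal share in each combination, which gives a simple check on the fit.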
Step 4.5: adjust the prediction classification threshold of the accident severity model;
Step 4.6: assign $t^*+1$ to $t^*$ and judge whether $t^*>T^*$ holds; if so, $T^*$ accident severity prediction models have been obtained; otherwise, return to step 4.3.
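The traffic accident severity prediction method of the present invention is further characterized in that step 3.3 is carried out as follows:
Step 3.3.1: the CART decision tree uses the Gini index as the criterion for deciding whether the tree branches, and a binary decision tree model is established; according to the feature value cut point α, the training sample set $D_{t^*}$ is divided into a first subset $D_{\alpha1}$ and a second subset $D_{\alpha2}$, and the Gini index $Gini(D_\alpha)$ of the feature value cut point α is obtained by formula (8):
$$Gini(D_\alpha)=\frac{|D_{\alpha1}|}{|D_{t^*}|}Gini(D_{\alpha1})+\frac{|D_{\alpha2}|}{|D_{t^*}|}Gini(D_{\alpha2}) \tag{8}$$
In formula (8), $|D_{t^*}|$, $|D_{\alpha1}|$ and $|D_{\alpha2}|$ denote the total number of accidents contained in the training sample set $D_{t^*}$, the first subset $D_{\alpha1}$ and the second subset $D_{\alpha2}$, respectively;
$Gini(D_{\alpha1})$ denotes the Gini index of the first subset $D_{\alpha1}$, with:
$$Gini(D_{\alpha1})=1-\left(p_0^{\alpha1}\right)^2-\left(p_1^{\alpha1}\right)^2 \tag{9}$$
In formula (9), $p_0^{\alpha1}$ and $p_1^{\alpha1}$ denote the probabilities of non-fatal and fatal accidents in the first subset $D_{\alpha1}$, respectively;
In formula (8), $Gini(D_{\alpha2})$ denotes the Gini index of the second subset $D_{\alpha2}$, with:
$$Gini(D_{\alpha2})=1-\left(p_0^{\alpha2}\right)^2-\left(p_1^{\alpha2}\right)^2 \tag{10}$$
In formula (10), $p_0^{\alpha2}$ and $p_1^{\alpha2}$ denote the probabilities of non-fatal and fatal accidents in the second subset $D_{\alpha2}$, respectively;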
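The Gini index of a cut point, as in formulas (8) to (10), can be sketched as follows. The function names and the two example subsets are hypothetical.

```python
def gini(labels):
    """Gini index of a set of 0/1 severity labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n  # share of fatal accidents
    p0 = 1.0 - p1         # share of non-fatal accidents
    return 1.0 - p0 * p0 - p1 * p1


def gini_split(labels_1, labels_2):
    """Size-weighted Gini index of the two subsets produced by a cut point."""
    n = len(labels_1) + len(labels_2)
    return (len(labels_1) / n) * gini(labels_1) + (len(labels_2) / n) * gini(labels_2)


# Hypothetical split of eight accidents into two subsets.
first_subset = [0, 0, 0, 1]   # one fatal accident out of four
second_subset = [1, 1, 1, 1]  # all fatal

g = gini_split(first_subset, second_subset)
print(g)  # 0.5 * 0.375 + 0.5 * 0.0
```

A lower value means the cut point separates fatal from non-fatal accidents more cleanly, which is why the cut point with the smallest Gini index is preferred.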
Step 3.3.2: traverse the cut points of every feature in the feature set X and calculate the Gini index of each cut point; if the Gini index of the cut point of every feature in the feature set X is less than the threshold ε, the CART decision tree model is a single-node tree, and the single-node tree is output; otherwise, execute step 3.3.3;
Step 3.3.3: select the feature $X_{min}$ whose cut point has the smallest Gini index in the feature set X, together with its corresponding cut point $\alpha_{min}$; according to the cut point $\alpha_{min}$, divide the training sample set $D_{t^*}$ into two subsets $D_{min1}$ and $D_{min2}$, and assign the subsets $D_{min1}$ and $D_{min2}$ to the two child nodes whose parent node is the training sample set $D_{t^*}$;
If the numbers of samples in the subsets $D_{min1}$ and $D_{min2}$ are both less than the given node sample threshold σ, the child nodes holding $D_{min1}$ and $D_{min2}$ are leaf nodes, and the binary decision tree is output; if the number of samples in subset $D_{min1}$ and/or subset $D_{min2}$ is greater than the node sample threshold σ, the child node holding $D_{min1}$ or $D_{min2}$ is a non-leaf node that can be split further, and step 3.3.4 is executed;
Step 3.3.4: for a non-leaf node, let the training sample set $D_{t^*}$ equal the subset corresponding to that non-leaf node, delete the feature $X_{min}$ whose cut point has the smallest Gini index from the feature set X, and return to step 3.3.1; when the numbers of samples in all child nodes are less than the node sample threshold σ or the feature set X is empty, output the final binary decision tree.
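The recursive construction in steps 3.3.1 to 3.3.4 can be sketched as below. Several choices are assumptions rather than the patent's specification: a cut point is taken to be a test of the form "feature equals value", a node stops splitting when its own Gini index is below ε (the conventional CART stopping rule, swapped in for the cut-point condition stated in step 3.3.2), and the feature names and data are hypothetical.

```python
def gini(labels):
    """Gini index of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 * p1 - (1.0 - p1) ** 2


def candidate_splits(rows, features):
    """All (gini, feature, value) cut points that leave both sides non-empty."""
    out = []
    for f in features:
        for v in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] == v]
            right = [y for x, y in rows if x[f] != v]
            if left and right:
                g = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
                out.append((g, f, v))
    return out


def build_tree(rows, features, sigma, eps):
    """Grow a binary tree; each chosen feature is removed before recursing."""
    labels = [y for _, y in rows]
    leaf = {"n": len(rows), "p_fatal": sum(labels) / len(rows)}
    # Assumed stopping rule: small node, no features left, or nearly pure node.
    if len(rows) < sigma or not features or gini(labels) < eps:
        return leaf
    cands = candidate_splits(rows, features)
    if not cands:
        return leaf
    _, f, v = min(cands)  # cut point with the smallest Gini index
    remaining = [name for name in features if name != f]
    eq_rows = [(x, y) for x, y in rows if x[f] == v]
    ne_rows = [(x, y) for x, y in rows if x[f] != v]
    return {
        "feature": f,
        "value": v,
        "eq": build_tree(eq_rows, remaining, sigma, eps),
        "ne": build_tree(ne_rows, remaining, sigma, eps),
    }


# Hypothetical subclass: fatal only when both indicators equal 1.
rows = (
    [({"night": 1, "truck": 1}, 1)] * 3
    + [({"night": 1, "truck": 0}, 0)] * 3
    + [({"night": 0, "truck": 1}, 0)] * 3
    + [({"night": 0, "truck": 0}, 0)] * 3
)
tree = build_tree(rows, ["night", "truck"], sigma=2, eps=0.01)
print(tree)
```

In this toy example the second feature appears only under one branch of the first, which is the kind of tree structure from which an interaction between two variables would be read off.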
Step 4.5 is carried out as follows:
Step 4.5.1: define θ as the prediction classification threshold of the model, with 0 < θ < 1; $\hat{y}_j=1$ indicates that the accident severity model predicts the j-th accident to be a fatal accident; $\hat{y}_j=0$ indicates that the accident severity model predicts the j-th accident to be a non-fatal accident;
Step 4.5.2: initialize j = 1;
Step 4.5.3: let the j-th classification threshold $\theta_j$ of the model equal $P'_j$, and use formula (13) to obtain the j-th sensitivity $Se(\theta_j)$ of the accident severity model, that is, the probability that a fatal accident in the accident data set is predicted as a fatal accident:
$$Se(\theta_j)=P(\hat{y}_s=1\mid y_s=1) \tag{13}$$
In formula (13), $\hat{y}_s=1$ indicates that the s-th accident is predicted as a fatal accident, and $y_s=1$ indicates that the s-th accident is a fatal accident, 1 ≤ s ≤ J;
Use formula (14) to obtain the j-th specificity $Sp(\theta_j)$ of the accident severity model, that is, the probability that a non-fatal accident in the accident data set is predicted as a non-fatal accident:
$$Sp(\theta_j)=P(\hat{y}_s=0\mid y_s=0) \tag{14}$$
In formula (14), $\hat{y}_s=0$ indicates that the s-th accident is predicted as a non-fatal accident, and $y_s=0$ indicates that the s-th accident is a non-fatal accident, 1 ≤ s ≤ J;
Step 4.5.4: assign j+1 to j and judge whether j > J holds; if so, J pairs of sensitivity and specificity values have been obtained, and step 4.5.5 is executed; otherwise, return to step 4.5.3;
Step 4.5.5: taking the j-th classification threshold $\theta_j$ as the abscissa and the corresponding sensitivity $Se(\theta_j)$ and specificity $Sp(\theta_j)$ as the ordinates, draw the sensitivity curve and the specificity curve, and take the threshold corresponding to the intersection of the two curves as the optimal prediction classification threshold θ′ of the model.
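The threshold search in step 4.5 can be sketched as follows. Two points are assumptions: an accident is classified as fatal when its predicted probability is at least θ (the source does not state whether the comparison is strict), and the intersection of the two curves is approximated by the candidate threshold at which sensitivity and specificity are closest. The probabilities and labels are hypothetical.

```python
def sensitivity_specificity(probs, labels, theta):
    """Se and Sp when an accident is classified as fatal if prob >= theta."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= theta)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < theta)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg


def crossing_threshold(probs, labels):
    """Try each predicted probability as a threshold and return the
    (theta, Se, Sp) triple where the Se and Sp curves are closest."""
    best = None
    for theta in sorted(set(probs)):
        se, sp = sensitivity_specificity(probs, labels, theta)
        gap = abs(se - sp)
        if best is None or gap < best[0]:
            best = (gap, theta, se, sp)
    return best[1:]


# Hypothetical predicted probabilities and true severities (1 = fatal).
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

theta, se, sp = crossing_threshold(probs, labels)
print(theta, se, sp)
print(sensitivity_specificity(probs, labels, 0.5))
```

For these hypothetical data the selected threshold lies below 0.5, and it yields a higher sensitivity than the 0.5 threshold at the cost of some specificity.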
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. Based on regional road network traffic accident data, the method of the present invention first establishes a latent class analysis model that divides the accident data into several homogeneous subclasses; second, it establishes a CART decision tree model for each subclass to identify the interaction terms between independent variables; then, it establishes for each subclass an accident severity model based on binary logistic regression that takes the interaction terms into account, and sets the intersection of the sensitivity and specificity curves as the prediction classification threshold of the accident severity model. The method reduces the adverse effect of accident data heterogeneity on the analysis results, overcomes the problems that conventional traffic accident severity prediction models ignore interaction terms and perform poorly overall on imbalanced data, and improves the prediction accuracy and goodness of fit of the accident severity model.
2. The method of the present invention divides traffic accident data into several homogeneous subclasses by latent class analysis, which both reflects the heterogeneity of the accident data and allows the latent patterns and mechanisms of road traffic accident occurrence to be identified and analyzed precisely;
3. The method of the present invention identifies the interaction terms of every order between independent variables by means of the CART decision tree model and includes them in the binary logistic regression model, which improves the goodness of fit of the model and identifies the important independent variables and interaction terms that influence the severity of regional road network traffic accidents, helping to raise the level of traffic safety of the regional road network;
4. The method of the present invention uses the threshold corresponding to the intersection of the sensitivity and specificity curves as the classification threshold of the binary logistic regression model, which addresses the classification problem for imbalanced data and improves the prediction accuracy of the accident severity model.
Brief description of the drawings
Fig. 1 is the CART decision tree diagram of class 1 of the present invention;
Fig. 2 is the sensitivity and specificity curve graph of class 1 of the present invention;
Fig. 3 is the ROC curve graph of class 1 of the present invention;
Fig. 4 is the flow chart of the method of the present invention.
Detailed description of the embodiments
In the present embodiment, as shown in Fig. 4, a traffic accident severity prediction method applied to a regional road network is carried out as follows:
Step 1: acquisition and preprocessing of regional road network traffic accident data;
Step 1.1: collect the traffic accident data of a regional road network from a road traffic accident platform, and delete the accident records in the traffic accident database that are incomplete (contain empty fields) or unreasonable, obtaining a total of 2595 accident records (N = 2595) as the accident data set D for analysis; select 26 categorical variables covering five aspects (person, vehicle, accident characteristics, road and environment) to form the set $X=\{x_1,x_2,\dots,x_{26}\}$ that characterizes the i-th accident, and use them as the independent variables of the prediction model; the specific values of the independent variables are shown in Table 1; here $x_k$ denotes the k-th categorical variable, the k-th categorical variable $x_k$ contains $C_k$ categories, and the value taken by $x_k$ among its $C_k$ categories is denoted $s_k$ (for example, the first categorical variable $x_1$ has two categories, that is, $C_1=2$, and $s_1$ is 1 for female or 2 for male); every accident can be expressed as the set of the values of the 26 categorical variables $S_i=\{s_{i1},s_{i2},\dots,s_{ik},\dots,s_{i26}\}$; let $S=\{s_1,s_2,\dots,s_K\}$ denote any one value set among all possible values of the K categorical variables of the i-th accident; $k=1,2,3,\dots,K$; $i=1,2,3,\dots,N$;
The severity of every accident is taken as the predicted variable and denoted $y_i$, where $y_i$ = "0" denotes a non-fatal accident and $y_i$ = "1" denotes a fatal accident;
Step 1.2: carry out a multicollinearity test with SPSS software and delete any categorical variables that exhibit collinearity; the test shows that the variance inflation factors (VIF) are all less than 5 and the corresponding tolerances (TOL) are all greater than 0.1 (as shown in Table 1), which indicates that there is no collinearity among the 26 categorical variables and that they can all be included in the model analysis.
Table 1 Definition, coding and collinearity test of the independent variables
Step 2: establish a latent class analysis model from the regional road network traffic accident data;
Step 2.1: define that the latent class analysis model contains one latent class variable V, that V contains T classes, and that any one class is denoted t, $t=1,2,\dots,T$; let the value of the latent class variable V in the i-th accident be denoted $V_i$;
Step 2.1.1: define the outer loop counter as τ and the maximum number of outer loop iterations as 5; let the number of classes set in the τ-th iteration be $T_\tau$, with $T_\tau=\tau$; initialize τ = 1;
Step 2.1.2: initialize t = 1;
Step 2.1.3: use formula (1) to obtain the conditional probability that the value set of the i-th accident on the K categorical variables is $S=\{s_1,\dots,s_K\}$ when $V_i$ takes the value t, that is, when the i-th accident belongs to the t-th latent class:
$$P(S_i=S\mid V_i=t)=\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{1}$$
In formula (1), $P(s_{ik}=s_k\mid V_i=t)$ denotes the conditional probability that the k-th categorical variable takes the value $s_k$ when the i-th accident belongs to the t-th latent class;
Step 2.1.4: use formula (2) to obtain the unconditional probability that the value set of the K categorical variables in the i-th accident is S, that is, the joint probability of the latent class analysis model:
$$P(S_i=S)=\sum_{t=1}^{T_\tau}P(V_i=t)\prod_{k=1}^{K}P(s_{ik}=s_k\mid V_i=t) \tag{2}$$
In formula (2), $P(V_i=t)$ is the probability that the i-th accident belongs to the t-th latent class, that is, the proportion of latent class t in the population;
In addition, the basic constraints of the latent class analysis model are that the latent class probabilities sum to 1 and that the conditional probabilities of each categorical variable sum to 1, as shown in formulas (3) and (4):
$$\sum_{t=1}^{T_\tau}P(V_i=t)=1 \tag{3}$$
$$\sum_{s_k=1}^{C_k}P(s_{ik}=s_k\mid V_i=t)=1 \tag{4}$$
Step 2.2: estimate the model parameters by the maximum likelihood method to obtain the estimated latent class probabilities $\hat{P}(V_i=t)$ and the estimated conditional probabilities of the categorical variables $\hat{P}(s_{ik}=s_k\mid V_i=t)$, as well as the τ-th maximum likelihood function value $L_\tau$ of the latent class analysis model;
Step 2.3: according to Bayesian theory, use formula (5) to calculate the posterior probability that the i-th accident is classified into the t-th latent class:
$$\hat{P}(V_i=t\mid S_i=S)=\frac{\hat{P}(V_i=t)\prod_{k=1}^{K}\hat{P}(s_{ik}=s_k\mid V_i=t)}{\hat{P}(S_i=S)} \tag{5}$$
where $\hat{P}(S_i=S)$ is given by formula (6):
$$\hat{P}(S_i=S)=\sum_{t=1}^{T_\tau}\hat{P}(V_i=t)\prod_{k=1}^{K}\hat{P}(s_{ik}=s_k\mid V_i=t) \tag{6}$$
The i-th accident is assigned to the subclass for which its posterior probability is largest; the posterior probabilities are calculated and compared for all N accident records, thereby achieving the purpose of clustering;
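The assignment rule described above, in which each accident goes to the latent class with the largest posterior probability, can be sketched as follows. The function names, record identifiers and posterior values are hypothetical.

```python
def assign_classes(posteriors):
    """Index of the latent class with the largest posterior, per accident."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def split_by_class(records, labels, n_classes):
    """Group the accident records into one subset per latent class."""
    subsets = [[] for _ in range(n_classes)]
    for record, label in zip(records, labels):
        subsets[label].append(record)
    return subsets


# Hypothetical posterior probabilities of four accidents over three classes.
posteriors = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
    [0.80, 0.10, 0.10],
]
records = ["a1", "a2", "a3", "a4"]

labels = assign_classes(posteriors)
subsets = split_by_class(records, labels, 3)
print(labels, subsets)
```

Each resulting subset then serves as the data for its own decision tree and logistic regression model in the later steps.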
Step 2.4: assign t+1 to t and judge whether $t>T_\tau$ holds; if so, execute step 2.5; otherwise, return to step 2.1.3;
Step 2.5: use formulas (7), (8), (9) and (10) to obtain the model fit indices, comprising the τ-th Akaike information criterion $AIC_\tau$, the τ-th Bayesian information criterion $BIC_\tau$, the τ-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$, and the τ-th entropy $R^2_\tau$:
$$AIC_\tau=-2\ln(L_\tau)+2M \tag{7}$$
$$BIC_\tau=-2\ln(L_\tau)+\ln(N)\times M \tag{8}$$
$$aBIC_\tau=-2\ln(L_\tau)+\ln(n^*)\times M \tag{9}$$
$$R^2_\tau=1-\frac{\sum_{i=1}^{N}\sum_{t=1}^{T_\tau}\left[-\hat{P}(V_i=t\mid S_i=S)\ln\hat{P}(V_i=t\mid S_i=S)\right]}{N\ln T_\tau} \tag{10}$$
In formulas (7), (8), (9) and (10), M is the number of unknown parameters in the latent class analysis model; $n^*$ is the adjusted sample size, with $n^*=(N+2)/24$;
Step 2.6: assign τ+1 to τ and judge whether τ > 5 holds; if so, execute step 2.7; otherwise, return to step 2.1.3;
Step 2.7: The latent class analysis model is built and its parameters are estimated with Mplus version 7.4, with the number of latent classes T fixed in each run. T is increased step by step from T = 1 to T = 5, giving 5 different latent class analysis models and their estimated log-likelihoods ln(L), i.e., $\tau$ runs up to 5. The fit evaluation indices of the 5 models are then calculated, namely the $\tau$-th Akaike information criterion $AIC_\tau$, the $\tau$-th Bayesian information criterion $BIC_\tau$, the $\tau$-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$ and the $\tau$-th entropy $R^2_\tau$. The resulting model fit indices are shown in Table 2.
Table 2 Summary of model fit indices
In Table 2, smaller values of AIC, BIC and aBIC indicate a better model fit; an entropy greater than 0.8 indicates a classification accuracy above 90%; LMR and BLRT are relative fit indices, for which a significant P value indicates that the model with T classes fits significantly better than the model with T-1 classes. On this basis, the accident data are divided into 3 classes for analysis, i.e., $T^* = 3$. The estimation results of the latent class analysis model for $T^* = 3$ are shown in Table 3. The accident characteristics of each subclass are identified from the conditional probability distributions: class 1 is named passenger vehicle accidents on county roads, class 2 motor vehicle accidents on rural roads, and class 3 non-motor-vehicle accidents involving the elderly, which identifies the latent patterns of road traffic accident occurrence.
According to Bayes' theorem, formula (5) is used to calculate the posterior probability that each observed accident is classified into each of the 3 latent classes; the posterior probabilities of all accidents are calculated and compared, so that the 2595 accidents are divided into 3 accident subclasses, denoted $\{D_1, D_2, D_3\}$, containing 1104, 485 and 1006 accidents respectively;
Table 3 Latent class probabilities and conditional probabilities of the independent variables for $T^* = 3$ (partial)
Step 3: According to the results of the latent class analysis model, establish a CART decision tree model for each of the 3 subclasses;
Step 3.1: Let the accident data $D_{t^*}$ in the $t^*$-th accident subclass be the training sample set, $t^* = 1, 2, 3$; let the set X composed of the 26 categorical variables be the feature set of the CART decision tree model; let the node sample threshold be $\sigma$, the split point of a feature value be $\alpha$, and the Gini index threshold be $\varepsilon$;
Step 3.2: Initialize $t^* = 1$;
Step 3.3: Build the CART decision tree model in SPSS: input the accident data set $D_{t^*}$, set the feature set X to the variables identified as significant in step 3.1, set the node sample threshold $\sigma$ to 50 and set the Gini index threshold $\varepsilon$ to 0.001;
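Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to branch, and a binary decision tree model is built. According to a feature split point $\alpha$, the training sample set $D_{t^*}$ is divided into a first subset $D_{\alpha 1}$ and a second subset $D_{\alpha 2}$; that is, taking one of the $C_k$ categories of the categorical variable $x_k$ as the split point $\alpha$ divides the sample set into the two subsets $D_{\alpha 1}$ and $D_{\alpha 2}$. The Gini index $Gini(D_\alpha)$ of the split point $\alpha$ is obtained with formula (11):
$$Gini(D_\alpha) = \frac{|D_{\alpha 1}|}{|D_{t^*}|}\,Gini(D_{\alpha 1}) + \frac{|D_{\alpha 2}|}{|D_{t^*}|}\,Gini(D_{\alpha 2}) \tag{11}$$
In formula (11), $|D_{t^*}|$, $|D_{\alpha 1}|$ and $|D_{\alpha 2}|$ denote the total number of accidents in the training sample set $D_{t^*}$, the first subset $D_{\alpha 1}$ and the second subset $D_{\alpha 2}$, respectively;
$Gini(D_{\alpha 1})$ denotes the Gini index of the first subset $D_{\alpha 1}$:
$$Gini(D_{\alpha 1}) = 1 - \left(p_0^{\alpha 1}\right)^2 - \left(p_1^{\alpha 1}\right)^2 \tag{12}$$
In formula (12), $p_0^{\alpha 1}$ and $p_1^{\alpha 1}$ denote the probabilities of non-fatal and fatal accidents in the first subset $D_{\alpha 1}$, respectively;
In formula (11), $Gini(D_{\alpha 2})$ denotes the Gini index of the second subset $D_{\alpha 2}$:
$$Gini(D_{\alpha 2}) = 1 - \left(p_0^{\alpha 2}\right)^2 - \left(p_1^{\alpha 2}\right)^2 \tag{13}$$
In formula (13), $p_0^{\alpha 2}$ and $p_1^{\alpha 2}$ denote the probabilities of non-fatal and fatal accidents in the second subset $D_{\alpha 2}$, respectively;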
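The Gini index of a single split point in step 3.3.1 can be sketched as follows. The function names and the toy accident records are assumptions for illustration; labels are coded 0 for non-fatal and 1 for fatal accidents, as in the method.

```python
def gini(labels):
    """Gini index of a list of binary labels (0 = non-fatal, 1 = fatal)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)  # share of fatal accidents
    p0 = 1.0 - p1                   # share of non-fatal accidents
    return 1.0 - p0 ** 2 - p1 ** 2


def gini_of_split(values, labels, category):
    """Weighted Gini index when one category is separated from all others."""
    first = [y for v, y in zip(values, labels) if v == category]
    second = [y for v, y in zip(values, labels) if v != category]
    n = len(labels)
    return len(first) / n * gini(first) + len(second) / n * gini(second)


# Toy data: vehicle type of six accidents and whether each was fatal.
vehicle = ["truck", "truck", "car", "car", "car", "bus"]
fatal = [1, 1, 0, 0, 1, 0]
split_gini = gini_of_split(vehicle, fatal, "truck")
```

A lower value of `split_gini` means the two subsets are purer with respect to accident severity, which is why the split point with the smallest Gini index is chosen in step 3.3.3.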
Step 3.3.2: Traverse the split points of every feature in the feature set X and calculate the Gini index of each split point; if the Gini index of every split point in the feature set X is less than the threshold 0.001, the CART decision tree model is a single-node tree, which is output, and in this case there is no interaction term; otherwise, execute step 3.3.3;
Step 3.3.3: Select the feature $X_{min}$ whose split point has the smallest Gini index in the feature set X, together with the corresponding split point $\alpha_{min}$; according to $\alpha_{min}$, divide the training sample set $D_{t^*}$ into two subsets $D_{min1}$ and $D_{min2}$, and assign $D_{min1}$ and $D_{min2}$ to the two child nodes of the parent node that holds $D_{t^*}$;
If the sample sizes of $D_{min1}$ and $D_{min2}$ are both smaller than the given node sample threshold 50, the child nodes holding $D_{min1}$ and $D_{min2}$ are leaf nodes and the binary decision tree is output; in this case only second-order interaction terms exist. If the sample size of $D_{min1}$ and/or $D_{min2}$ is greater than the node sample threshold 50, the child node holding $D_{min1}$ or $D_{min2}$ is a non-leaf node that can be divided further, and step 3.3.4 is executed;
Step 3.3.4: For each non-leaf node, let the training sample set $D_{t^*}$ be the subset held by that node, delete the feature $X_{min}$ corresponding to the smallest Gini index from the feature set X, and return to step 3.3.1; when the sample sizes of all child nodes are smaller than the node sample threshold 50 or the feature set X is empty, output the final binary decision tree;
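Steps 3.3.1 to 3.3.4 can be put together in a short recursive sketch. This is not the SPSS implementation: the function names, the dictionary-based tree and the toy data are assumptions, and the test against $\varepsilon$ is implemented here as a minimum reduction in Gini index, which is a common CART convention and only one possible reading of the threshold described above.

```python
def gini(labels):
    """Gini index of a list of binary labels (0 = non-fatal, 1 = fatal)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(rows, labels, features):
    """(weighted Gini, feature, category) of the lowest-Gini binary split, or None."""
    best = None
    for f in features:
        for c in sorted({r[f] for r in rows}):
            first = [y for r, y in zip(rows, labels) if r[f] == c]
            second = [y for r, y in zip(rows, labels) if r[f] != c]
            if not first or not second:
                continue  # the category does not partition the node
            score = (len(first) * gini(first) + len(second) * gini(second)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, c)
    return best


def build_tree(rows, labels, features, sigma, eps):
    """Grow a binary tree; each leaf stores its size and share of fatal accidents."""
    leaf = {"n": len(labels), "fatal_rate": sum(labels) / len(labels)}
    if len(labels) < sigma or not features:
        return leaf  # too few samples or no feature left
    split = best_split(rows, labels, features)
    if split is None or gini(labels) - split[0] < eps:
        return leaf  # assumed reading of the Gini threshold (see lead-in)
    _, f, c = split
    remaining = [g for g in features if g != f]  # step 3.3.4: drop the used feature
    mask = [r[f] == c for r in rows]

    def child(flag):
        sub_rows = [r for r, m in zip(rows, mask) if m == flag]
        sub_labels = [y for y, m in zip(labels, mask) if m == flag]
        return build_tree(sub_rows, sub_labels, remaining, sigma, eps)

    return {"feature": f, "category": c, "left": child(True), "right": child(False)}


# Toy data with two hypothetical categorical variables.
rows = [
    {"vehicle": "truck", "road": "curve"},
    {"vehicle": "truck", "road": "curve"},
    {"vehicle": "truck", "road": "straight"},
    {"vehicle": "car", "road": "curve"},
    {"vehicle": "car", "road": "straight"},
    {"vehicle": "car", "road": "straight"},
]
labels = [1, 1, 1, 0, 0, 0]
tree = build_tree(rows, labels, ["vehicle", "road"], sigma=2, eps=0.001)
```

In the embodiment the node sample threshold is 50 and the Gini index threshold is 0.001; the small `sigma` used here only keeps the toy example short.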
Step 3.4: Assign $t^*+1$ to $t^*$ and judge whether $t^* > 3$ holds; if so, 3 decision tree models have been obtained and step 3.5 is executed; otherwise, return to step 3.3;
Step 3.5: Determine the interaction terms between categorical variables from the tree diagrams of the 3 binary decision trees, where the binary decision tree of the $t^*$-th accident subclass determines the interaction terms of that subclass;
Fig. 1 shows the tree diagram of the binary decision tree of class 1. The tree takes all data in class 1 as the root node, has a height of 4 levels and contains 5 leaf nodes. The rectangular box of each node in the figure gives the total number of accidents in the node, the numbers of fatal and non-fatal accidents, and the ratio between the two. The tree diagram (Fig. 1) shows that second-order interactions exist between vehicle type and passenger, between vehicle type and road technical grade, and between road technical grade and road alignment, and that a third-order interaction exists among vehicle type, road technical grade and road alignment;
Similarly, the second-order interaction terms in class 2 are determined to be crash characteristics and lighting condition, and crash characteristics and vehicle type; the second-order interaction term in class 3 is vehicle type and driver age.
Step 4: Establish an accident severity model based on binary logistic regression for each of the 3 subclasses;
Step 4.1: Take the accident data $D_{t^*}$ in the $t^*$-th accident subclass as the fitting data of the accident severity model, and take the set X of the K categorical variables together with the interaction terms of the $t^*$-th subclass as the independent variables $X^*$ of the accident severity model; define the $t^*$-th accident subclass as containing J accidents, where J equals the number of accidents in $D_{t^*}$; the predicted variable of the j-th accident is denoted $y_j$;
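As an illustration of how the independent variables $X^*$ of step 4.1 might be assembled, the following sketch codes two categorical variables as dummy variables and adds their second-order interaction as products. The function names, the category lists and the choice of the first category as reference are assumptions of this sketch, not details given by the method.

```python
def dummy_code(value, categories):
    """0/1 dummy variables for one categorical value; categories[0] is the reference."""
    return [1 if value == c else 0 for c in categories[1:]]


def interaction(dummies_a, dummies_b):
    """All pairwise products of two dummy vectors (second-order interaction terms)."""
    return [a * b for a in dummies_a for b in dummies_b]


# Hypothetical category lists; the first entry serves as the reference category.
vehicle_types = ["car", "motorcycle", "lorry"]
passenger = ["no", "yes"]

accident = {"vehicle": "motorcycle", "passenger": "yes"}
d_vehicle = dummy_code(accident["vehicle"], vehicle_types)
d_passenger = dummy_code(accident["passenger"], passenger)

# Main effects followed by the vehicle-by-passenger interaction terms.
x_star = d_vehicle + d_passenger + interaction(d_vehicle, d_passenger)
```

A third-order interaction can be built the same way by applying `interaction` to the result of a second-order product and a third dummy vector.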
A univariate chi-square test is carried out on each subclass in SPSS, where a P value below 0.05 indicates that the independent variable is significantly associated with the dependent variable. The results of the univariate chi-square tests are shown in Table 4; in class 1, 16 variables are significantly associated with accident severity.
Table 4 Univariate chi-square test results for each subclass
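For a single two-category variable, the univariate chi-square test can be sketched as follows. The function name and the counts are made up for illustration; the sketch computes the Pearson statistic without continuity correction and is limited to 2x2 tables, where the statistic has one degree of freedom. Variables with more categories need the chi-square distribution with the corresponding degrees of freedom, which SPSS handles in the embodiment.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value for a 2x2 contingency table.

    table[i][j] -- number of accidents with variable category i and severity j
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Made-up counts: rows = variable present / absent, columns = fatal / non-fatal.
stat, p_value = chi_square_2x2([[30, 70], [10, 90]])
significant = p_value < 0.05
```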
Step 4.2: Initialize $t^* = 1$;
Step 4.3: Use formula (14) to obtain, based on binary logistic regression, the probability $P(y = 1 \mid X^*)$ that a fatal accident, i.e., $y_j = 1$, occurs given the independent variables $X^*$:
$$P(y = 1 \mid X^*) = \frac{\exp(w^* \cdot X^*)}{1 + \exp(w^* \cdot X^*)} \tag{14}$$
In formula (14), $w^*$ is the vector of regression coefficients of the independent variables $X^*$;
Step 4.4: Estimate the parameters $w^*$ of the binary logistic regression accident severity model by the maximum likelihood method:
For the j-th accident, let $P_j = P(y_j = 1 \mid X_j^*)$ be the probability that $y_j = 1$ given its independent variables $X_j^*$; the probability that $y_j = 0$ given $X_j^*$ is then $1 - P_j$; the likelihood function $L(w^*)$ is obtained with formula (15):
$$L(w^*) = \prod_{j=1}^{J} P_j^{\,y_j}\,(1 - P_j)^{1 - y_j} \tag{15}$$
Using maximum likelihood estimation, find the estimated parameters $w'$ at which $L(w^*)$ reaches its maximum. The parameters of the accident severity model are estimated in SPSS, where the interaction terms of the categorical variables enter the model as products of the categorical variables; to ease interpretation of the model results, dummy variables are set for each independent variable; the Wald test governs the entry and removal of independent variables, with an entry criterion of P < 0.05 and a removal criterion of P > 0.1, and the number of iterations is set to 20;
From the estimated parameters $w'$, obtain the predicted probability $\hat{P}_j$ that $y_j = 1$ for the j-th accident given its independent variables $X_j^*$, and hence the predicted probabilities of all J accidents; sort them in ascending order and denote the sorted set of predicted probabilities $\{P'_1, \ldots, P'_j, \ldots, P'_J\}$;
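The probability of formula (14), the likelihood of formula (15) and the sorted predicted probabilities can be sketched as follows. The function names and the toy data are assumptions, and the estimator shown is plain gradient ascent on the log-likelihood rather than the stepwise procedure with Wald tests that SPSS applies in the embodiment.

```python
from math import exp, log


def predict_proba(w, x):
    """Formula (14): probability of a fatal accident for one feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + exp(-z))


def log_likelihood(w, X, y):
    """Logarithm of the likelihood in formula (15)."""
    total = 0.0
    for xj, yj in zip(X, y):
        pj = predict_proba(w, xj)
        total += yj * log(pj) + (1 - yj) * log(1.0 - pj)
    return total


def fit(X, y, steps=20, lr=0.1):
    """Plain gradient ascent on the log-likelihood (an assumption, not SPSS's solver)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xj, yj in zip(X, y):
            err = yj - predict_proba(w, xj)
            for k, xk in enumerate(xj):
                grad[k] += err * xk
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


# Toy data: first column is the constant term, second a 0/1 dummy variable.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [1, 0]]
y = [0, 0, 1, 1, 0, 1]
w_hat = fit(X, y)

# Predicted probabilities in ascending order, as used for the threshold search.
sorted_probs = sorted(predict_proba(w_hat, xj) for xj in X)
```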
Step 4.5: Adjust the prediction classification threshold of the accident severity model;
Step 4.5.1: Define $\theta$ as the classification threshold of the model prediction, with $0 < \theta < 1$; $\hat{P}_j \ge \theta$ indicates that the accident severity model predicts the j-th accident as a fatal accident; $\hat{P}_j < \theta$ indicates that the accident severity model predicts the j-th accident as a non-fatal accident;
Step 4.5.2: Initialize $j = 1$;
Step 4.5.3: Let the j-th classification threshold $\theta_j$ of the model equal $P'_j$, and use formula (16) to obtain the j-th sensitivity $Se(\theta_j)$ of the accident severity model prediction, i.e., the probability that a fatal accident in the accident data set is predicted as a fatal accident:
$$Se(\theta_j) = \frac{\sum_{s=1}^{J} I(\hat{P}_s \ge \theta_j,\ y_s = 1)}{\sum_{s=1}^{J} I(y_s = 1)} \tag{16}$$
In formula (16), $I(\cdot)$ equals 1 when its condition holds and 0 otherwise; $\hat{P}_s \ge \theta_j$ indicates that the s-th accident is predicted as a fatal accident, and $y_s = 1$ indicates that the s-th accident is a fatal accident, $1 \le s \le J$;
Use formula (17) to obtain the j-th specificity $Sp(\theta_j)$ of the accident severity model prediction, i.e., the probability that a non-fatal accident in the accident data set is predicted as a non-fatal accident:
$$Sp(\theta_j) = \frac{\sum_{s=1}^{J} I(\hat{P}_s < \theta_j,\ y_s = 0)}{\sum_{s=1}^{J} I(y_s = 0)} \tag{17}$$
In formula (17), $\hat{P}_s < \theta_j$ indicates that the s-th accident is predicted as a non-fatal accident, and $y_s = 0$ indicates that the s-th accident is a non-fatal accident, $1 \le s \le J$;
Step 4.5.4: Assign $j+1$ to $j$ and judge whether $j > J$ holds; if so, J pairs of sensitivity and specificity values have been obtained and step 4.5.5 is executed; otherwise, return to step 4.5.3;
Step 4.5.5: With the j-th classification threshold $\theta_j$ as the abscissa and the corresponding sensitivity $Se(\theta_j)$ and specificity $Sp(\theta_j)$ as ordinates, draw the sensitivity curve and the specificity curve, and take the threshold at the intersection of the two curves as the optimal model prediction classification threshold $\theta'$;
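The threshold search of step 4.5 can be sketched as follows. The function names and the toy probabilities are assumptions; because the two curves are evaluated only at the J predicted probabilities, the sketch takes the threshold at which sensitivity and specificity are closest as the intersection, which is itself an assumption about how the crossing point is read off the plot.

```python
def sensitivity(probs, y, theta):
    """Share of fatal accidents (y = 1) predicted as fatal at threshold theta."""
    hits = sum(1 for p, yi in zip(probs, y) if yi == 1 and p >= theta)
    return hits / sum(y)


def specificity(probs, y, theta):
    """Share of non-fatal accidents (y = 0) predicted as non-fatal at threshold theta."""
    hits = sum(1 for p, yi in zip(probs, y) if yi == 0 and p < theta)
    return hits / (len(y) - sum(y))


def crossing_threshold(probs, y):
    """Candidate threshold at which sensitivity and specificity are closest."""
    return min(
        sorted(probs),
        key=lambda t: abs(sensitivity(probs, y, t) - specificity(probs, y, t)),
    )


# Toy predicted probabilities and observed outcomes (made up).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1]
theta_best = crossing_threshold(probs, y)
```

Choosing the crossing point balances the two error types, which matters when fatal accidents are much rarer than non-fatal ones and a default threshold of 0.5 would miss most of them.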
Step 4.6: Assign $t^*+1$ to $t^*$ and judge whether $t^* > 3$ holds; if so, 3 accident severity prediction models have been obtained; otherwise, return to step 4.3.
The 3 binary logistic regression models yield the parameter estimates of the accident severity models, as shown in Table 6. The regression coefficient vector $w^*$ consists of the constant term $\beta_0$ and the regression coefficients B of the independent variables, where B is the coefficient of an independent variable: a positive value indicates a positive effect on fatal accidents and a negative value indicates a negative effect; OR = exp(B) is the factor by which the presence of an independent variable increases or decreases the odds of a fatal accident.
Table 6 Estimation results of the accident severity models
Note: B is the model regression coefficient; OR is the odds ratio, OR = exp(B);
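The relation between a regression coefficient B, the odds ratio and the resulting change in odds can be sketched as follows. The function names are assumptions, and the coefficient in the example is back-calculated from an odds ratio of 12.036 rather than taken from Table 6.

```python
from math import exp, log


def odds_ratio(b):
    """OR = exp(B) for a regression coefficient B."""
    return exp(b)


def percent_change_in_odds(b):
    """Percentage by which the odds of a fatal accident change when the factor is present."""
    return (exp(b) - 1.0) * 100.0


def apply_odds_ratio(p, ratio):
    """Probability obtained after multiplying the odds of a baseline probability p."""
    odds = p / (1.0 - p) * ratio
    return odds / (1.0 + odds)


b_example = log(12.036)          # coefficient implied by an odds ratio of 12.036
or_example = odds_ratio(b_example)
change = percent_change_in_odds(b_example)
p_after = apply_odds_ratio(0.2, 2.0)  # baseline probability 0.2, odds doubled
```

Because the odds ratio acts on odds rather than on probability, doubling the odds of a baseline probability of 0.2 gives a probability of one third, not 0.4.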
Meanwhile jth is obtained according to estimation parameter w ' and plays accident in independent variableUnder the conditions of yj=1 prediction probabilityFrom
And obtain the susceptibility that classification 1 is illustrated in figure 2 with susceptibility prediction classification thresholds corresponding with specific intersection point and specificity
Curve graph.To which the prediction classification thresholds for obtaining 3 subclass are respectively 0.2930,0.3928 and 0.4133, and are solved pair
Answer the model prediction accuracy 68.8%, 75.5% and 66.3% under classification thresholds;
Step 4.6.1: Analysis of the accident severity model results:
As Table 6 shows, the factors affecting accident severity differ significantly between subclasses. Unlicensed driving, drunk driving, speeding, central isolation facilities, terrain, the second-order interaction between motorcycle and passenger, and the third-order interaction among lorry, Class IV highway and road alignment are significant only in class 1; agricultural vehicles, collision with a fixed object, off-peak period, road alignment and visibility are significant only in class 2; vehicle falls, substandard highways, traffic control devices and the interaction between age and non-motor vehicle are significant only in class 3.
Taking class 1 as an example, the regression coefficients of unlicensed driving, speeding and drunk driving are positive, and the odds of a fatal accident in these three cases increase by about 132%, 140% and 124%, respectively. In terms of crash characteristics, collision with a non-fixed object increases the odds of a fatal accident by 96%; carrying passengers increases the odds of a fatal accident by 165%; the absence of a central isolation facility increases the odds of a fatal accident by 120%; at night the odds of a fatal accident rise by about 44%.
In terms of variable interactions, a motorcycle carrying a passenger reduces the odds of a fatal accident by about 60%; when a lorry travels on a Class IV highway, accident severity is sensitive to road alignment, with combined curve-and-slope sections having the largest effect (OR = 12.036), followed by curved sections (OR = 5.57).
Step 4.6.2: Model comparison:
To compare the method of the invention with the traditional binary logistic regression model for accident severity analysis, prediction performance is measured by two indices, the model prediction accuracy and the ROC curve, and the goodness of fit of the model is measured by the Hosmer-Lemeshow (HL) statistic.
The model prediction accuracy is obtained using the intersection of the sensitivity and specificity curves as the classification threshold; a higher value indicates better model performance. The ROC curve is drawn with 1 - specificity as the abscissa and sensitivity as the ordinate, and the area under the ROC curve (AUC) is used to evaluate the classification performance of the model: an AUC greater than 0.5 indicates predictive value better than random guessing, and the closer the AUC is to 1, the better the predictive classification ability of the model. Taking class 1 as an example, the threshold at the intersection of the sensitivity and specificity curves, used as the model prediction classification threshold, is shown in Fig. 2, and the ROC curve drawn with 1 - specificity as the abscissa and sensitivity as the ordinate is shown in Fig. 3. In addition, the goodness of fit of the model is assessed with the Hosmer-Lemeshow (HL) statistic, which follows a chi-square distribution; a non-significant P value (> 0.05) indicates that the model fits the data well.
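The AUC used in the comparison can be computed without drawing the curve, since the area under the ROC curve equals the probability that a randomly chosen fatal accident receives a higher predicted probability than a randomly chosen non-fatal one, with ties counted as one half. The sketch below uses this equivalence; the function name and the toy data are assumptions.

```python
def roc_auc(probs, y):
    """Area under the ROC curve from pairwise comparisons of predicted probabilities."""
    fatal = [p for p, yi in zip(probs, y) if yi == 1]
    non_fatal = [p for p, yi in zip(probs, y) if yi == 0]
    # A pair counts 1 when the fatal accident ranks higher, 0.5 for a tie.
    wins = sum((pf > pn) + 0.5 * (pf == pn) for pf in fatal for pn in non_fatal)
    return wins / (len(fatal) * len(non_fatal))


# Toy predicted probabilities and observed outcomes (made up).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1]
auc = roc_auc(probs, y)
```

A model that ranks accidents no better than chance gives a value near 0.5, and a model that ranks every fatal accident above every non-fatal one gives 1.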
Table 7 Summary of model test indices
As Table 7 shows, the traffic accident severity prediction method applied to a regional road network proposed by the invention outperforms the traditional binary logistic regression model in both model prediction accuracy and goodness of fit.
Claims (3)
1. A traffic accident severity prediction method applied to a regional road network, characterized in that it is carried out according to the following steps:
Step 1: Acquisition and preprocessing of regional road network traffic accident data;
Obtain N accidents from a road traffic accident database as the accident data set D, and select K categorical variables from any i-th accident to form the set $X = \{x_1, x_2, \ldots, x_k, \ldots, x_K\}$ that characterizes the i-th accident, where $x_k$ denotes the k-th categorical variable and $x_k$ has $C_k$ categories; a value among the $C_k$ categories of $x_k$ is denoted $s_k$; let $s_{ik}$ denote the value of the k-th categorical variable of the i-th accident, so that the values of all K categorical variables of the i-th accident form the value set $S_i = \{s_{i1}, s_{i2}, \ldots, s_{ik}, \ldots, s_{iK}\}$; let $\tilde{S}_i$ denote any one value set among all possible values of the K categorical variables of the i-th accident; k = 1, 2, 3, ..., K; i = 1, 2, 3, ..., N;
Take the severity of the i-th accident as the predicted variable, denoted $y_i$, where $y_i$ takes the value 0 or 1, indicating a non-fatal accident and a fatal accident respectively;
Step 2: Establish a latent class analysis model from the regional road network traffic accident data;
Step 2.1: Define a latent class variable V in the latent class analysis model; V has T classes, any one of which is denoted t, t = 1, 2, ..., T; let the value of the latent class variable V for the i-th accident be denoted $V_i$;
Step 2.1.1: Define the outer loop counter as $\tau$ and the maximum number of outer loop iterations as $\tau_{max}$; let the number of classes set at the $\tau$-th iteration be $T_\tau$; initialize $\tau = 1$;
Step 2.1.2: Initialize t = 1;
Step 2.1.3: Use formula (1) to obtain the conditional probability $P(S_i = \tilde{S}_i \mid V_i = t)$ that the value set of the i-th accident on the K categorical variables is $\tilde{S}_i$ when $V_i$ takes the value t, i.e., when the i-th accident belongs to the t-th latent class:
$$P(S_i = \tilde{S}_i \mid V_i = t) = \prod_{k=1}^{K} P(s_{ik} = s_k \mid V_i = t) \tag{1}$$
In formula (1), $P(s_{ik} = s_k \mid V_i = t)$ denotes the conditional probability that the k-th categorical variable takes the value $s_k$ when the i-th accident belongs to the t-th latent class;
Step 2.1.4: Use formula (2) to obtain the unconditional probability $P(S_i = \tilde{S}_i)$ that the value set of the K categorical variables of the i-th accident is $\tilde{S}_i$, i.e., the joint probability of the latent class analysis model:
$$P(S_i = \tilde{S}_i) = \sum_{t=1}^{T} P(V_i = t) \prod_{k=1}^{K} P(s_{ik} = s_k \mid V_i = t) \tag{2}$$
In formula (2), $P(V_i = t)$ is the probability that the i-th accident belongs to the t-th latent class, i.e., the proportion of the population accounted for by latent class t;
Step 2.2: Estimate the model parameters by the maximum likelihood method to obtain the estimated latent class probabilities $\hat{P}(V_i = t)$ and the estimated conditional probabilities of the categorical variables $\hat{P}(s_{ik} = s_k \mid V_i = t)$, together with the maximum likelihood function value $L_\tau$ of the $\tau$-th latent class analysis model;
Step 2.3: Use formula (3) to calculate the posterior probability $\hat{P}(V_i = t \mid S_i = \tilde{S}_i)$ that the i-th accident is classified into the t-th latent class:
$$\hat{P}(V_i = t \mid S_i = \tilde{S}_i) = \frac{\hat{P}(V_i = t)\,\hat{P}(S_i = \tilde{S}_i \mid V_i = t)}{\hat{P}(S_i = \tilde{S}_i)} \tag{3}$$
Step 2.4: Assign $t+1$ to $t$ and judge whether $t > T_\tau$ holds; if so, execute step 2.5; otherwise, return to step 2.1.3;
Step 2.5: Use formula (4), formula (5), formula (6) and formula (7) to obtain the model fit evaluation indices, namely the $\tau$-th Akaike information criterion $AIC_\tau$, the $\tau$-th Bayesian information criterion $BIC_\tau$, the $\tau$-th sample-size-adjusted Bayesian information criterion $aBIC_\tau$ and the $\tau$-th entropy $R^2_\tau$:
$$AIC_\tau = -2\ln(L_\tau) + 2M \tag{4}$$
$$BIC_\tau = -2\ln(L_\tau) + \ln(N) \times M \tag{5}$$
$$aBIC_\tau = -2\ln(L_\tau) + \ln(n^*) \times M \tag{6}$$
$$R^2_\tau = 1 - \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_\tau}\left(-\hat{p}_{it}\ln\hat{p}_{it}\right)}{N\ln T_\tau} \tag{7}$$
In formula (4), formula (5), formula (6) and formula (7), M is the number of unknown parameters in the latent class analysis model; $n^*$ is the adjusted sample size, with $n^* = (N+2)/24$; $\hat{p}_{it}$ is the posterior probability, calculated in step 2.3, that the i-th accident belongs to the t-th latent class;
Step 2.6: Assign $\tau+1$ to $\tau$ and judge whether $\tau > \tau_{max}$ holds; if so, execute step 2.7; otherwise, execute step 2.1.3;
Step 2.7: From the $\tau_{max}$ values of the Akaike information criterion AIC, the Bayesian information criterion BIC, the sample-size-adjusted Bayesian information criterion aBIC and the entropy $R^2$, select the number of latent classes at which the model fit evaluation indices reach their optimal values, denoted $T^*$; divide the accident data set D into $T^*$ accident subclasses, denoted $\{D_1, \ldots, D_{t^*}, \ldots, D_{T^*}\}$, where $D_{t^*}$ denotes the accident data in the $t^*$-th accident subclass, $t^* = 1, 2, \ldots, T^*$;
Step 3: According to the results of the latent class analysis model, establish a CART decision tree model for each of the $T^*$ subclasses;
Step 3.1: Let the accident data $D_{t^*}$ in the $t^*$-th accident subclass be the training sample set, and let the set X composed of the K categorical variables be the feature set of the CART decision tree model; let the node sample threshold be $\sigma$, the split point of a feature value be $\alpha$, and the Gini index threshold be $\varepsilon$;
Step 3.2: Initialize $t^* = 1$;
Step 3.3: Input the training sample set $D_{t^*}$, the feature set X, the node sample threshold $\sigma$ and the Gini index threshold $\varepsilon$ into the defined CART decision tree model;
Step 3.4: Assign $t^*+1$ to $t^*$ and judge whether $t^* > T^*$ holds; if so, $T^*$ decision trees have been obtained and step 3.5 is executed; otherwise, return to step 3.3;
Step 3.5: Determine the interaction terms between categorical variables from the tree diagrams of the $T^*$ binary decision trees, where the binary decision tree of the $t^*$-th accident subclass determines the interaction terms of that subclass;
Step 4: Establish an accident severity model based on binary logistic regression for each of the $T^*$ subclasses;
Step 4.1: Take the accident data $D_{t^*}$ in the $t^*$-th subclass as the fitting data of the accident severity model, and take the set X of the K categorical variables together with the interaction terms of the $t^*$-th subclass as the independent variables $X^*$ of the accident severity model; define the $t^*$-th accident subclass as containing J accidents, where J equals the number of accidents in $D_{t^*}$; the predicted variable of the j-th accident is denoted $y_j$;
Step 4.2: Initialize $t^* = 1$;
Step 4.3: Use formula (11) to obtain, based on binary logistic regression, the probability $P(y = 1 \mid X^*)$ that a fatal accident, i.e., $y_j = 1$, occurs given the independent variables $X^*$:
$$P(y = 1 \mid X^*) = \frac{\exp(w^* \cdot X^*)}{1 + \exp(w^* \cdot X^*)} \tag{11}$$
In formula (11), $w^*$ is the vector of regression coefficients of the independent variables $X^*$;
Step 4.4: Estimate the parameters $w^*$ of the binary logistic regression accident severity model by the maximum likelihood method:
For the j-th accident, let $P_j = P(y_j = 1 \mid X_j^*)$ be the probability that $y_j = 1$ given its independent variables $X_j^*$; the probability that $y_j = 0$ given $X_j^*$ is then $1 - P_j$; the likelihood function $L(w^*)$ is obtained with formula (12):
$$L(w^*) = \prod_{j=1}^{J} P_j^{\,y_j}\,(1 - P_j)^{1 - y_j} \tag{12}$$
Using maximum likelihood estimation, find the estimated parameters $w'$ at which $L(w^*)$ reaches its maximum;
From the estimated parameters $w'$, obtain the predicted probability $\hat{P}_j$ that $y_j = 1$ for the j-th accident given its independent variables $X_j^*$, and hence the predicted probabilities of all J accidents; sort them in ascending order and denote the sorted set of predicted probabilities $\{P'_1, \ldots, P'_j, \ldots, P'_J\}$;
Step 4.5: Adjust the prediction classification threshold of the accident severity model;
Step 4.6: Assign $t^*+1$ to $t^*$ and judge whether $t^* > T^*$ holds; if so, $T^*$ accident severity prediction models have been obtained; otherwise, return to step 4.3.
2. The traffic accident severity prediction method according to claim 1, characterized in that step 3.3 is carried out as follows:
Step 3.3.1: The CART decision tree uses the Gini index as the criterion for deciding whether to branch, and a binary decision tree model is built; according to a feature split point $\alpha$, the training sample set $D_{t^*}$ is divided into a first subset $D_{\alpha 1}$ and a second subset $D_{\alpha 2}$, and the Gini index $Gini(D_\alpha)$ of the split point $\alpha$ is obtained with formula (8):
$$Gini(D_\alpha) = \frac{|D_{\alpha 1}|}{|D_{t^*}|}\,Gini(D_{\alpha 1}) + \frac{|D_{\alpha 2}|}{|D_{t^*}|}\,Gini(D_{\alpha 2}) \tag{8}$$
In formula (8), $|D_{t^*}|$, $|D_{\alpha 1}|$ and $|D_{\alpha 2}|$ denote the total number of accidents in the training sample set $D_{t^*}$, the first subset $D_{\alpha 1}$ and the second subset $D_{\alpha 2}$, respectively;
$Gini(D_{\alpha 1})$ denotes the Gini index of the first subset $D_{\alpha 1}$:
$$Gini(D_{\alpha 1}) = 1 - \left(p_0^{\alpha 1}\right)^2 - \left(p_1^{\alpha 1}\right)^2 \tag{9}$$
In formula (9), $p_0^{\alpha 1}$ and $p_1^{\alpha 1}$ denote the probabilities of non-fatal and fatal accidents in the first subset $D_{\alpha 1}$, respectively;
In formula (8), $Gini(D_{\alpha 2})$ denotes the Gini index of the second subset $D_{\alpha 2}$:
$$Gini(D_{\alpha 2}) = 1 - \left(p_0^{\alpha 2}\right)^2 - \left(p_1^{\alpha 2}\right)^2 \tag{10}$$
In formula (10), $p_0^{\alpha 2}$ and $p_1^{\alpha 2}$ denote the probabilities of non-fatal and fatal accidents in the second subset $D_{\alpha 2}$, respectively;
Step 3.3.2: Traverse the split points of every feature in the feature set X and calculate the Gini index of each split point; if the Gini index of every split point in the feature set X is less than the threshold $\varepsilon$, the CART decision tree model is a single-node tree, and the single-node tree is output; otherwise, execute step 3.3.3;
Step 3.3.3: Select the feature $X_{min}$ whose split point has the smallest Gini index in the feature set X, together with the corresponding split point $\alpha_{min}$; according to $\alpha_{min}$, divide the training sample set $D_{t^*}$ into two subsets $D_{min1}$ and $D_{min2}$, and assign $D_{min1}$ and $D_{min2}$ to the two child nodes of the parent node that holds $D_{t^*}$;
If the sample sizes of $D_{min1}$ and $D_{min2}$ are both smaller than the given node sample threshold $\sigma$, the child nodes holding $D_{min1}$ and $D_{min2}$ are leaf nodes and the binary decision tree is output; if the sample size of $D_{min1}$ and/or $D_{min2}$ is greater than the node sample threshold $\sigma$, the child node holding $D_{min1}$ or $D_{min2}$ is a non-leaf node that can be divided further, and step 3.3.4 is executed;
Step 3.3.4: For each non-leaf node, let the training sample set $D_{t^*}$ be the subset held by that node, delete the feature $X_{min}$ corresponding to the smallest Gini index from the feature set X, and return to step 3.3.1; when the sample sizes of all child nodes are smaller than the node sample threshold $\sigma$ or the feature set X is empty, output the final binary decision tree.
3. The traffic accident severity prediction method according to claim 1, characterized in that step 4.5 is carried out as follows:
Step 4.5.1: Define $\theta$ as the prediction classification threshold of the model, with $0 < \theta < 1$; $\hat{P}_j \ge \theta$ indicates that the accident severity model predicts the j-th accident as a fatal accident; $\hat{P}_j < \theta$ indicates that the accident severity model predicts the j-th accident as a non-fatal accident;
Step 4.5.2: Initialize $j = 1$;
Step 4.5.3: Let the j-th classification threshold $\theta_j$ of the model equal $P'_j$, and use formula (13) to obtain the j-th sensitivity $Se(\theta_j)$ of the accident severity model prediction, i.e., the probability that a fatal accident in the accident data set is predicted as a fatal accident:
$$Se(\theta_j) = \frac{\sum_{s=1}^{J} I(\hat{P}_s \ge \theta_j,\ y_s = 1)}{\sum_{s=1}^{J} I(y_s = 1)} \tag{13}$$
In formula (13), $I(\cdot)$ equals 1 when its condition holds and 0 otherwise; $\hat{P}_s \ge \theta_j$ indicates that the s-th accident is predicted as a fatal accident, and $y_s = 1$ indicates that the s-th accident is a fatal accident, $1 \le s \le J$;
Use formula (14) to obtain the j-th specificity $Sp(\theta_j)$ of the accident severity model prediction, i.e., the probability that a non-fatal accident in the accident data set is predicted as a non-fatal accident:
$$Sp(\theta_j) = \frac{\sum_{s=1}^{J} I(\hat{P}_s < \theta_j,\ y_s = 0)}{\sum_{s=1}^{J} I(y_s = 0)} \tag{14}$$
In formula (14), $\hat{P}_s < \theta_j$ indicates that the s-th accident is predicted as a non-fatal accident, and $y_s = 0$ indicates that the s-th accident is a non-fatal accident, $1 \le s \le J$;
Step 4.5.4: Assign $j+1$ to $j$ and judge whether $j > J$ holds; if so, J pairs of sensitivity and specificity values have been obtained and step 4.5.5 is executed; otherwise, return to step 4.5.3;
Step 4.5.5: With the j-th classification threshold $\theta_j$ as the abscissa and the corresponding sensitivity $Se(\theta_j)$ and specificity $Sp(\theta_j)$ as ordinates, draw the sensitivity curve and the specificity curve, and take the threshold at the intersection of the two curves as the optimal model prediction classification threshold $\theta'$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910770584.3A CN110458244B (en) | 2019-08-20 | 2019-08-20 | Traffic accident severity prediction method applied to regional road network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458244A true CN110458244A (en) | 2019-11-15 |
CN110458244B CN110458244B (en) | 2021-03-30 |
Family
ID=68488078
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910770584.3A Active CN110458244B (en) | 2019-08-20 | 2019-08-20 | Traffic accident severity prediction method applied to regional road network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458244B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942260A (en) * | 2019-12-12 | 2020-03-31 | 长安大学 | University traffic safety evaluation method based on Bayesian maximum entropy |
CN111476274A (en) * | 2020-03-16 | 2020-07-31 | 宜通世纪科技股份有限公司 | Big data prediction analysis method, system, device and storage medium |
CN111931861A (en) * | 2020-09-09 | 2020-11-13 | 北京志翔科技股份有限公司 | Anomaly detection method for heterogeneous data set and computer-readable storage medium |
CN111951550A (en) * | 2020-08-06 | 2020-11-17 | 华南理工大学 | Traffic safety risk monitoring method and device, storage medium and computer equipment |
CN112270994A (en) * | 2020-10-14 | 2021-01-26 | 中国医学科学院阜外医院 | Method, device, terminal and storage medium for constructing risk prediction model |
CN112349098A (en) * | 2020-11-03 | 2021-02-09 | 南京信息职业技术学院 | Method for estimating accident severity by environmental elements in exit ramp area of expressway |
CN112561175A (en) * | 2020-12-18 | 2021-03-26 | 深圳赛安特技术服务有限公司 | Traffic accident influence factor prediction method, device, equipment and storage medium |
CN112837533A (en) * | 2021-01-08 | 2021-05-25 | 合肥工业大学 | Highway accident frequency prediction method considering risk factor time-varying characteristics |
CN113762364A (en) * | 2021-08-23 | 2021-12-07 | 东南大学 | Unbalanced traffic accident data synthesis sampling method |
CN116882780A (en) * | 2023-07-05 | 2023-10-13 | 北京大学 | Rural space element extraction and locality classification planning method based on landscape pictures |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130332026A1 (en) * | 2012-06-12 | 2013-12-12 | Guardity Technologies, Inc. | Qualifying Automatic Vehicle Crash Emergency Calls to Public Safety Answering Points |
US20180060508A1 (en) * | 2016-08-26 | 2018-03-01 | International Business Machines Corporation | Personalized tolerance prediction of adverse drug events |
CN108154681A (en) * | 2016-12-06 | 2018-06-12 | 杭州海康威视数字技术股份有限公司 | Risk Forecast Method, the apparatus and system of traffic accident occurs |
CN109447306A (en) * | 2018-08-13 | 2019-03-08 | 上海海事大学 | Metro accidents delay time at stop prediction technique based on maximum likelihood regression tree |
CN109598929A (en) * | 2018-11-26 | 2019-04-09 | 北京交通大学 | A kind of multi-class the number of traffic accidents prediction technique |
Non-Patent Citations (4)
Title |
---|
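JIAN-FENG XI et al.: "The Model of Severity Prediction of Traffic Crash on the Curve", 《GREEN TRANSPORTATION SYSTEM AND SAFETY》 * |
YIKAI CHEN et al.: "Evaluation of the safety performance of highway alignments based on fault tree analysis and safety boundaries", 《TRAFFIC INJURY PREVENTION》 * |
FENG CHENGJIAN (冯成建): "Research on the prediction of pedestrian fatality risk and craniocerebral injury types in vehicle collisions", 《China Doctoral Dissertations Full-text Database, Engineering Science and Technology II》 * |
LI GENGPING (李庚凭): "Prediction of expressway traffic accident severity based on ordered Logit and multinomial Logit models", 《China Master's Theses Full-text Database, Engineering Science and Technology II》 * |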
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942260A (en) * | 2019-12-12 | 2020-03-31 | 长安大学 | University traffic safety evaluation method based on Bayesian maximum entropy |
CN110942260B (en) * | 2019-12-12 | 2024-02-13 | 长安大学 | College traffic safety evaluation method based on Bayesian maximum entropy |
CN111476274A (en) * | 2020-03-16 | 2020-07-31 | 宜通世纪科技股份有限公司 | Big data prediction analysis method, system, device and storage medium |
CN111476274B (en) * | 2020-03-16 | 2024-03-08 | 宜通世纪科技股份有限公司 | Big data predictive analysis method, system, device and storage medium |
CN111951550A (en) * | 2020-08-06 | 2020-11-17 | 华南理工大学 | Traffic safety risk monitoring method and device, storage medium and computer equipment |
CN111951550B (en) * | 2020-08-06 | 2021-10-29 | 华南理工大学 | Traffic safety risk monitoring method and device, storage medium and computer equipment |
CN111931861A (en) * | 2020-09-09 | 2020-11-13 | 北京志翔科技股份有限公司 | Anomaly detection method for heterogeneous data set and computer-readable storage medium |
CN112270994B (en) * | 2020-10-14 | 2021-08-17 | 中国医学科学院阜外医院 | Method, device, terminal and storage medium for constructing risk prediction model |
CN112270994A (en) * | 2020-10-14 | 2021-01-26 | 中国医学科学院阜外医院 | Method, device, terminal and storage medium for constructing risk prediction model |
CN112349098A (en) * | 2020-11-03 | 2021-02-09 | 南京信息职业技术学院 | Method for estimating accident severity by environmental elements in exit ramp area of expressway |
CN112561175A (en) * | 2020-12-18 | 2021-03-26 | 深圳赛安特技术服务有限公司 | Traffic accident influence factor prediction method, device, equipment and storage medium |
CN112837533A (en) * | 2021-01-08 | 2021-05-25 | 合肥工业大学 | Highway accident frequency prediction method considering risk factor time-varying characteristics |
CN112837533B (en) * | 2021-01-08 | 2021-11-19 | 合肥工业大学 | Highway accident frequency prediction method considering risk factor time-varying characteristics |
CN113762364A (en) * | 2021-08-23 | 2021-12-07 | 东南大学 | Unbalanced traffic accident data synthesis sampling method |
CN113762364B (en) * | 2021-08-23 | 2022-11-04 | 东南大学 | Unbalanced traffic accident data synthesis sampling method |
CN116882780A (en) * | 2023-07-05 | 2023-10-13 | 北京大学 | Rural space element extraction and locality classification planning method based on landscape pictures |
CN116882780B (en) * | 2023-07-05 | 2024-04-05 | 北京大学 | Rural space element extraction and locality classification planning method based on landscape pictures |
Also Published As
Publication number | Publication date |
---|---|
CN110458244B (en) | 2021-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458244A (en) | | A kind of traffic accident Severity forecasting method applied to Regional Road Network |
CN108550263B (en) | | Expressway traffic accident cause analysis method based on fault tree model |
CN106127586A (en) | | Vehicle insurance rate aid decision-making system under big data age |
CN108492557A (en) | | Highway jam level judgment method based on multi-model fusion |
CN107492251A (en) | | It is a kind of to be identified and driving condition supervision method based on the driver identity of machine learning and deep learning |
Jang et al. | | Evaluation of pedestrian safety: Pedestrian crash hot spots and risk factors for injury severity |
CN105809193B (en) | | A kind of recognition methods of the illegal vehicle in use based on kmeans algorithm |
CN110588658B (en) | | Method for detecting risk level of driver based on comprehensive model |
CN105046673B (en) | | High spectrum image and visual image fusion sorting technique based on self study |
CN101751438A (en) | | Theme webpage filter system for driving self-adaption semantics |
CN108682153B (en) | | Urban road traffic jam state discrimination method based on RFID electronic license plate data |
CN113436433B (en) | | Efficient urban traffic outlier detection method |
CN110562261B (en) | | Method for detecting risk level of driver based on Markov model |
CN105034986A (en) | | On-line identification method and on-line identification device for steering characteristics of drivers |
CN111563555A (en) | | Driver driving behavior analysis method and system |
CN112767684A (en) | | Highway traffic jam detection method based on charging data |
CN104809476A (en) | | Multi-target evolutionary fuzzy rule classification method based on decomposition |
Yuan et al. | | A roadway safety sustainable approach: modeling for real-time traffic crash with limited data and its reliability verification |
CN109635852A (en) | | A kind of building of user's portrait and clustering method based on multidimensional property |
CN111080158A (en) | | Urban intersection traffic danger index evaluation method based on composite weight |
CN106570537A (en) | | Random forest model selection method based on confusion matrix |
CN112884014A (en) | | Traffic speed short-time prediction method based on road section topological structure classification |
CN115422747A (en) | | Method and device for calculating discharge amount of pollutants in tail gas of motor vehicle |
CN113232669A (en) | | Driving style identification method based on machine learning |
CN113581188A (en) | | Commercial vehicle driver driving style identification method based on Internet of vehicles data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |