CN113077625A - Road traffic accident form prediction method
- Publication number
- CN113077625A (application CN202110312213.8A)
- Authority
- CN
- China
- Prior art keywords
- traffic accident
- formula
- association rule
- variables
- variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W30/095—Predicting travel path or likelihood of collision
- B60W30/0956—Predicting travel path or likelihood of collision the prediction being responsive to traffic or environmental parameters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/16—Anti-collision systems
- G08G1/166—Anti-collision systems for active traffic, e.g. moving vehicles, pedestrians, bikes
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/16—Anti-collision systems
- G08G1/167—Driving aids for lane monitoring, lane changing, e.g. blind spot detection
Abstract
The invention discloses a road traffic accident form prediction method comprising the following steps: 1. collecting and processing road traffic accident data; 2. discretizing the continuous independent variables in the traffic accident data using the minimum description length criterion; 3. mining the interactions among independent variables using an association-rule-based attribute selection method from the field of data mining; 4. establishing a mixed logit model and estimating its parameters by maximum likelihood estimation; 5. predicting the probability of each traffic accident form with the constructed mixed logit model. The invention makes full use of the information in the predictor variable when discretizing the continuous independent variables, and mines the influence of interactions among variables on the accident form. This reduces the information lost by the discretized variables and avoids the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model and providing technical support for improving road traffic safety.
Description
Technical Field
The invention relates to a road traffic accident form prediction method, and belongs to the technical field of road traffic safety analysis.
Background
According to the Global Status Report on Road Safety 2018, the number of deaths caused by traffic accidents worldwide has risen to 1.35 million per year, and 80% of these deaths occur in middle-income countries. China, the largest middle-income country, records more than 240,000 traffic accidents every year, causing more than 60,000 deaths, so its traffic safety situation is severe. The factors influencing traffic accidents differ markedly between accident forms. Modelling the relationship between the traffic accident form and influencing factors such as drivers, roads and the environment, and thereby predicting the accident form, is one of the important measures for improving traffic safety.
For accident form prediction, fixed-parameter discrete choice models such as the probit and multinomial logit models are widely used. However, these methods ignore the unobserved heterogeneity that is prevalent in traffic accident data, which often biases the parameter estimates. Compared with fixed-parameter discrete choice models, the mixed logit model captures the heterogeneity of traffic accident data by treating the variable coefficients as random parameters. Nevertheless, this method has the following problems in accident form prediction: (1) continuous independent variables in traffic accident data are mostly discretized with unsupervised algorithms, which cannot take the relationship between the independent variables and the predictor variable into account, so the discretized variables lose a great deal of information; (2) a traffic accident usually results from the combined action of several independent variables, but the method ignores the influence of interactions among variables on the accident form, which easily leads to wrong predictions and inferences.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a road traffic accident form prediction method. It aims to make full use of the information in the predictor variable when discretizing continuous independent variables and to mine the influence of interactions among variables on the accident form, so as to reduce the information lost by the discretized variables and avoid the erroneous inference caused by ignoring interactions among variables, thereby improving the prediction accuracy of the traffic accident form prediction model.
To achieve this purpose, the invention adopts the following technical solution:
The road traffic accident form prediction method of the invention is characterized by comprising the following steps:
Step 1.1, acquire N traffic accidents from a road traffic accident database to form the traffic accident data set D; define the set of independent variables that influence the traffic accident form in D as $X=\{x_1,x_2,\dots,x_k,\dots,x_K\}$, where $x_k$ denotes the k-th independent variable, $k=1,2,\dots,K$; $\{x_1,x_2,\dots,x_k\}$ is the set of categorical independent variables and $\{x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$ is the set of continuous independent variables, $l=1,2,\dots,K-k$;
Step 1.2, according to the circumstances of each accident, divide the traffic accident forms into vehicle-to-vehicle accidents $y_1$, vehicle-pedestrian accidents $y_2$ and single-vehicle accidents $y_3$, giving the predictor variable $Y=\{y_1,y_2,y_3\}$ made up of the three accident types;
Step 2.1, initialize $l=1$;
Step 2.2, collect the value of the l-th continuous independent variable $x_{k+l}$ from every traffic accident in the data set D to form a set of continuous values, and sort that set in descending order;
Step 2.3, compute the information entropy $E(D)$ of the continuous independent variable $x_{k+l}$ with respect to the predictor variable Y using formula (1):
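Formula (1) is not reproduced in this text. A reconstruction consistent with the symbols defined for it, assuming the base-2 Shannon entropy that is customary for the minimum description length criterion, would read:

```latex
E(D) = -\sum_{i=1}^{|Y|} p_{y_i} \log_2 p_{y_i} \tag{1}
```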
In formula (1), $|Y|$ is the number of categories of the predictor variable Y in the traffic accident data set D; $p_{y_i}$ is the proportion of accidents of type $y_i$ in D; $i=1,2,3$;
Step 2.4, following the principle of maximum information gain, search all candidate cut points of the l-th continuous independent variable $x_{k+l}$ for the optimal cut point $b_l$; split the traffic accident data set D at $b_l$ into a first subset $D_1$ and a second subset $D_2$, and compute the information gain $G(b_l,D)$ of the split using formula (2):
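Formula (2) is likewise missing from this text. Assuming the usual definition of information gain for a binary split, with $|D|$ the number of cases in the set being split (a symbol not defined in the original), it would take the form:

```latex
G(b_l, D) = E(D) - \frac{|D_1|}{|D|}\,E(D_1) - \frac{|D_2|}{|D|}\,E(D_2) \tag{2}
```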
In formula (2), $|D_1|$ and $|D_2|$ are the numbers of traffic accident cases in the first subset $D_1$ and the second subset $D_2$; $E(D_1)$ and $E(D_2)$ are the information entropies of $D_1$ and $D_2$;
Step 2.5, compute the stopping criterion S using formula (3):
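Formula (3) is not reproduced in this text. The symbols defined for it match the stopping rule of the entropy-based minimum description length discretization of Fayyad and Irani, so a plausible reconstruction, assuming that rule and writing N for the number of cases in the set D being split, is:

```latex
S = \frac{\log_2(N-1)}{N} + \frac{\Delta(b_l, D)}{N}, \qquad
\Delta(b_l, D) = \log_2\!\left(3^{|Y|} - 2\right)
  - \left[\,|Y|\,E(D) - |Y_1|\,E(D_1) - |Y_2|\,E(D_2)\,\right] \tag{3}
```

The auxiliary symbol $\Delta$ is introduced here only for readability and does not appear in the original text.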
In formula (3), $|Y_1|$ and $|Y_2|$ are the numbers of categories of the predictor variable in the first subset $D_1$ and the second subset $D_2$;
Step 2.6, check whether the information gain $G(b_l,D)$ exceeds the stopping criterion S. If it does, the optimal cut point $b_l$ is valid and is added to the cut-point set B; $D_1$ and $D_2$ each then take the place of the data set D, and steps 2.4 to 2.6 are repeated to search for the next optimal cut point within $D_1$ and within $D_2$. Otherwise, go to step 2.7;
Step 2.7, assign $l+1$ to $l$ and check whether $l>K-k$. If so, every continuous independent variable in the continuous variable set has been discretized; output the cut-point set B of each continuous independent variable and go to step 2.8. Otherwise, return to step 2.2;
Step 2.8, discretize each continuous independent variable using its cut-point set B so that all independent variables become categorical, giving the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$, where $x_{k+l}$ now denotes the (k+l)-th categorical independent variable;
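Steps 2.2 to 2.8 can be sketched for a single continuous variable as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the stopping criterion assumes the Fayyad-Irani form of the minimum description length rule, candidate cuts are placed midway between neighbouring distinct values, and the values are sorted in ascending rather than descending order (which yields the same cut points).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of accident-form labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return (gain, cut, left, right) for the cut with maximum information gain.

    pairs: list of (value, label) tuples sorted by value.
    Returns None when no cut separates two distinct values.
    """
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut must fall between two distinct values
        left, right = pairs[:i], pairs[i:]
        gain = (base
                - len(left) / n * entropy([y for _, y in left])
                - len(right) / n * entropy([y for _, y in right]))
        if best is None or gain > best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint (assumption)
            best = (gain, cut, left, right)
    return best


def mdl_threshold(pairs, left, right):
    """Stopping criterion S, assuming the Fayyad-Irani form."""
    n = len(pairs)
    ys, y1, y2 = ([y for _, y in p] for p in (pairs, left, right))
    k, k1, k2 = len(set(ys)), len(set(y1)), len(set(y2))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(ys) - k1 * entropy(y1) - k2 * entropy(y2))
    return (math.log2(n - 1) + delta) / n


def mdlp_cuts(pairs):
    """Recursively collect the accepted cut points of one continuous variable."""
    pairs = sorted(pairs)
    found = best_cut(pairs)
    if found is None:
        return []
    gain, cut, left, right = found
    if gain <= mdl_threshold(pairs, left, right):
        return []  # gain does not exceed S: stop splitting this subset
    return sorted(mdlp_cuts(left) + [cut] + mdlp_cuts(right))
```

Calling `mdlp_cuts` once per continuous variable, with that variable's values paired with the accident form of each record, plays the role of the loop over l in step 2.7.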
Step 3, mine the interactions among independent variables using an association-rule-based attribute selection method;
Step 3.1, define an association rule as A→B, where A is the rule antecedent, B is the rule consequent and → is the relation symbol; take all factors in the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$ as the rule antecedent A, and all accident types of the predictor variable Y as the rule consequent B;
Step 3.2, define the support Support(A→B), the confidence Confidence(A→B) and the lift Lift(A→B) of the association rule A→B as shown in formulas (4), (5) and (6):
In formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A∩B) is the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) are the frequencies of factor A and factor B in the traffic accident data;
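The three measures can be illustrated with a short sketch. It assumes the standard definitions, which match the symbols explained above: support is P(A∩B), confidence is P(A∩B)/P(A), and lift is confidence divided by P(B). The record layout and the function name are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    records: list of (factors, accident_form) pairs, where factors is a set
             of 'variable=value' strings (hypothetical encoding).
    antecedent: set of factor strings (A); consequent: accident form (B).
    """
    n = len(records)
    n_a = sum(1 for f, _ in records if antecedent <= f)
    n_b = sum(1 for _, y in records if y == consequent)
    n_ab = sum(1 for f, y in records if antecedent <= f and y == consequent)
    support = n_ab / n                          # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0     # P(A and B) / P(A)
    lift = confidence / (n_b / n) if n_b else 0.0  # confidence / P(B)
    return support, confidence, lift


# Toy data, invented purely for illustration.
records = [
    ({"road=curve", "vehicle=motorcycle"}, "y3"),
    ({"road=curve", "vehicle=car"}, "y3"),
    ({"road=straight", "vehicle=car"}, "y1"),
    ({"road=straight", "vehicle=motorcycle"}, "y1"),
]
print(rule_metrics(records, {"road=curve"}, "y3"))
```

A lift above 1 means the antecedent makes the accident form more likely than its overall frequency in the data.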
Step 3.3, define and initialize the minimum support minSup, the minimum confidence minConf and the minimum lift minLift of the association rule A→B;
Step 3.4, mine the support Support(A→B), the confidence Confidence(A→B) and the lift Lift(A→B) of each association rule A→B;
Step 3.5, define the three constraint rules of the association-rule-based attribute selection method, namely the strong association rule SAR, the classification association rule CAR and the atomic association rule AAR;
Step 3.5.1, express the strong association rule SAR by formula (7):
Support(A→B) > minSup ∧ Confidence(A→B) > minConf ∧ Lift(A→B) > minLift    (7)
In formula (7), ∧ denotes logical AND;
Step 3.5.2, in the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$, let $|x_k|$ be the value range of the k-th independent variable $x_k$, $1\le k\le K$;
Step 3.5.3, define the influencing-factor value set FVIS as the set of all possible values of the independent variables, i.e. $FVIS=\bigcup_{k=1}^{K}|x_k|$, and define the target value set TVIS as the set of all possible values of the predictor variable, i.e. $TVIS=|Y|$;
Step 3.5.4, express the classification association rule CAR by formula (8):
In formula (8), $|B|$ is the number of categories of the predictor variable;
Step 3.5.5, express the atomic association rule AAR by formula (9):
In formula (9), $|A|$ is the number of independent variable categories;
Step 3.6, collect all association rules A→B that satisfy the classification association rule CAR into the classification association rule set CARset, and collect the association rules A→B that satisfy the atomic association rule AAR into the atomic association rule set AARset;
Step 3.7, sort the association rules A→B in AARset in descending order of confidence;
Step 3.8, check in turn whether the rule consequent of each atomic association rule AAR appears in a rule antecedent in CARset; if so, treat the rule consequent of that atomic association rule as a redundant variable and delete from CARset all association rules A→B that contain it;
Step 3.9, repeat step 3.8 until AARset is empty;
Step 3.10, map the association rules A→B remaining in CARset to their corresponding independent variables, giving an independent variable set, Set, that contains the interactions among variables;
Step 4, construct the accident form prediction model based on the mixed logit principle;
Step 4.1, establish the mixed logit model using formula (10):
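Formula (10) is not reproduced in this text. Under the standard mixed logit specification, and with the symbol names $\beta_i$, $X_n$ and $\varphi$ chosen here as assumptions because the original symbols did not survive, the model would be written as:

```latex
P_n(y_i) = \int \frac{\exp\!\left(\beta_i X_n\right)}
                     {\sum_{j=1}^{|Y|} \exp\!\left(\beta_j X_n\right)}
           \, f(\beta \mid \varphi)\, d\beta \tag{10}
```

Here $f(\beta\mid\varphi)$ stands for the density of the random parameters, with $\varphi$ collecting its mean and variance parameters.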
In formula (10), $P_n(y_i)$ is the probability that the n-th traffic accident takes the form $y_i$; the formula also contains the parameter vector of the independent variables for the case where the n-th accident takes the form $y_i$, the vector of independent variables whose parameters are estimated, and the probability density function of the random parameter β, which is parameterized by the vectors of the mean and variance parameters of that density;
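Because the integral in a mixed logit model has no closed form, it is commonly approximated by averaging ordinary logit probabilities over random draws of the coefficients. The sketch below illustrates that idea only; it assumes independent normal distributions for the random coefficients (the text specifies only a mean and a variance), and all names and numbers are hypothetical.

```python
import math
import random


def logit_probs(betas, x):
    """Multinomial logit probabilities for one accident.

    betas: one coefficient list per accident form; x: independent variables.
    """
    utilities = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    m = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def mixed_logit_probs(means, sds, x, draws=2000, seed=0):
    """Approximate the mixed logit probability by simulation.

    means[i][j], sds[i][j]: mean and standard deviation of the coefficient of
    variable j in the utility of accident form i (normal mixing is assumed).
    """
    rng = random.Random(seed)
    acc = [0.0] * len(means)
    for _ in range(draws):
        betas = [[rng.gauss(mu, sd) for mu, sd in zip(m_i, s_i)]
                 for m_i, s_i in zip(means, sds)]
        for i, p in enumerate(logit_probs(betas, x)):
            acc[i] += p
    return [a / draws for a in acc]


# Hypothetical coefficients for three accident forms and two indicators.
means = [[0.5, -1.0], [0.0, 0.0], [1.0, 0.2]]
sds = [[0.3, 0.3], [0.3, 0.3], [0.3, 0.3]]
print(mixed_logit_probs(means, sds, [1.0, 1.0], draws=500))
```

Setting every standard deviation to zero collapses the simulation to a fixed-parameter multinomial logit, which is a convenient sanity check.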
Step 4.2, enter every influencing factor in the independent variable set Set, which contains the interactions among variables, into the mixed logit model, and estimate the model parameters by maximum likelihood estimation;
Step 4.3, starting from the parameter estimates Par of the mixed logit model obtained in step 4.2, screen the model parameters by stepwise regression at the set confidence level to obtain the screened parameter estimates of the mixed logit model;
Step 5, predict the probability of each traffic accident form with the constructed mixed logit model;
Step 5.1, acquire in real time the independent variable information that influences the traffic accident form;
Step 5.2, substitute the independent variable information acquired in step 5.1 into formula (11) to compute the utility function of accident form $y_i$ under that information;
In formula (11), the coefficient vector is the parameter vector for accident form $y_i$ in the screened mixed logit parameter estimates;
Step 5.3, use formula (12) to obtain the predicted probability of accident form $y_i$ under the independent variable information acquired in real time.
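Formulas (11) and (12) are not reproduced in this text. A minimal sketch of the prediction step, assuming a linear utility (the screened coefficients multiplied by the observed indicators) and a logit transformation of the utilities, with every screened coefficient treated as a fixed value, might look as follows. The dictionary layout and function name are hypothetical.

```python
import math


def predict_forms(params, x):
    """Return {accident form: predicted probability}.

    params: {form: coefficient list} (hypothetical layout of the screened
            estimates); x: 0/1 indicators of the categories observed in
            real time, in the same order as the coefficients.
    """
    # Assumed form of formula (11): linear utility per accident form.
    utility = {form: sum(b * v for b, v in zip(beta, x))
               for form, beta in params.items()}
    # Assumed form of formula (12): logit transformation of the utilities.
    m = max(utility.values())
    expu = {form: math.exp(u - m) for form, u in utility.items()}
    total = sum(expu.values())
    return {form: e / total for form, e in expu.items()}
```

For instance, `predict_forms({"y1": [0.0], "y2": [0.0], "y3": [0.0]}, [1])` assigns each of the three forms a probability of one third, since all utilities are equal.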
Compared with the prior art, the invention has the beneficial effects that:
1. To address the severe information loss that traditional unsupervised discretization algorithms cause in discretized independent variables, the method discretizes continuous variables with the minimum description length criterion, a supervised discretization algorithm. This reduces the information lost by the discretized independent variables and helps to find better cut points, thereby improving the prediction accuracy of the model;
2. The method mines the interactions among independent variables through the association-rule-based attribute selection method and incorporates them into the mixed logit model, which helps to understand in depth how interactions among independent variables affect the probability of each traffic accident form and avoids the erroneous inference caused by ignoring interactions among variables;
3. The method provides a mixed logit model that contains interactions among variables, offers a new solution for predicting the probability of each traffic accident form, and provides technical support for improving road traffic safety.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the minimum description length criterion of the present invention;
FIG. 3 is a flowchart of an association rule based attribute selection method according to the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a road traffic accident form prediction method takes fatal traffic accident data from Shenzhen, Guangdong Province as an example and proceeds according to the following steps:
Step 1.1, obtain road traffic accident data for Shenzhen from 2014 to 2016 from a road traffic safety research information sharing platform, select the fatal traffic accidents, and delete the records in the traffic accident database that are incomplete (contain blank items) or unreasonable, yielding 1264 fatal traffic accidents (N = 1264) as the traffic accident data set D. Select 16 independent variables (K = 16) that potentially influence the accident form from five aspects (motor vehicle driver, vehicle, road, environment and time) to form the independent variable set $X=\{x_1,x_2,\dots,x_k,\dots,x_K\}$. Driver age and accident time are continuous independent variables, so the categorical variable set is $\{x_1,x_2,\dots,x_{14}\}$ with k = 14, and the continuous variable set is $\{x_{14+1},x_{14+2}\}$; descriptive statistics of the variables are given in Table 1;
Step 1.2, according to the circumstances of each accident, divide the traffic accident forms into vehicle-to-vehicle accidents $y_1$, vehicle-pedestrian accidents $y_2$ and single-vehicle accidents $y_3$, giving the predictor variable $Y=\{y_1,y_2,y_3\}$ made up of the three accident types;
TABLE 1 descriptive statistics of independent variables
Note: drivers under 18 years of age were all unlicensed motorcycle drivers, indicating that this variable was the reference variable.
Step 2.1, following the flow chart of the minimum description length criterion shown in FIG. 2, with K = 16 and k = 14, initialize $l=1$;
Step 2.2, collect the value of the l-th continuous independent variable $x_{k+l}$ from every traffic accident in the data set D to form a set of continuous values, and sort it in descending order of the value of $x_{k+l}$;
Step 2.3, compute the information entropy $E(D)$ of the continuous independent variable $x_{k+l}$ with respect to the predictor variable Y using formula (1):
In formula (1), $|Y|$ is the number of categories of the predictor variable Y in the traffic accident data set D; $p_{y_i}$ is the proportion of accidents of type $y_i$ in D; $i=1,2,3$;
Step 2.4, following the principle of maximum information gain, search all candidate cut points of the l-th continuous independent variable $x_{k+l}$ for the optimal cut point $b_l$; split the traffic accident data set D at $b_l$ into a first subset $D_1$ and a second subset $D_2$, and compute the information gain $G(b_l,D)$ of the split using formula (2):
In formula (2), $|D_1|$ and $|D_2|$ are the numbers of traffic accident cases in the first subset $D_1$ and the second subset $D_2$; $E(D_1)$ and $E(D_2)$ are the information entropies of $D_1$ and $D_2$;
Step 2.5, compute the stopping criterion S using formula (3):
In formula (3), $|Y_1|$ and $|Y_2|$ are the numbers of categories of the predictor variable in the first subset $D_1$ and the second subset $D_2$;
Step 2.6, check whether the information gain $G(b_l,D)$ exceeds the stopping criterion S. If it does, the optimal cut point $b_l$ is valid and is added to the cut-point set B; $D_1$ and $D_2$ each then take the place of the data set D, and steps 2.4 to 2.6 are repeated to search for the next optimal cut point within $D_1$ and within $D_2$. Otherwise, go to step 2.7;
Step 2.7, assign $l+1$ to $l$ and check whether $l>K-k$. If so, every continuous independent variable in the continuous variable set has been discretized; output the cut-point set B of each continuous independent variable and go to step 2.8. Otherwise, return to step 2.2;
Step 2.8, discretize the continuous independent variables driver age and accident time using the cut-point set B so that all independent variables become categorical, giving the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$, where $x_{k+l}$ now denotes the (k+l)-th categorical variable; the discretization results are shown in Table 2;
TABLE 2 Discretization results for the continuous independent variables
Note: denotes this variable as a reference variable.
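Once a cut-point set is available, applying it to a continuous value in step 2.8 is a simple interval lookup. The sketch below uses purely hypothetical cut points for driver age, since the contents of Table 2 are not reproduced in this text, and assumes that each interval includes its upper boundary.

```python
from bisect import bisect_left


def to_category(value, cuts):
    """Map a continuous value to the index of its interval.

    cuts: ascending cut points. Index 0 covers values up to and including
    cuts[0], index 1 covers (cuts[0], cuts[1]], and so on (boundary
    handling is an assumption).
    """
    return bisect_left(cuts, value)


AGE_CUTS = [21, 45]  # hypothetical cut points, for illustration only

for age in (18, 21, 22, 60):
    print(age, "->", to_category(age, AGE_CUTS))
```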
Step 3, mine the interactions among independent variables using an association-rule-based attribute selection method;
Step 3.1, define an association rule as A→B, where A is the rule antecedent, B is the rule consequent and → is the relation symbol; as shown in FIG. 3, take all factors in the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$ as the rule antecedent A, and all accident types of the predictor variable Y as the rule consequent B;
Step 3.2, define the support Support(A→B), the confidence Confidence(A→B) and the lift Lift(A→B) of the association rule A→B as shown in formulas (4), (5) and (6):
In formulas (4), (5) and (6), N is the total number of traffic accident samples; P(A∩B) is the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) are the frequencies of factor A and factor B in the traffic accident data;
Step 3.3, as shown in FIG. 3, define and initialize the minimum support minSup = 10%, the minimum confidence minConf = 50% and the minimum lift minLift = 100% of the association rule A→B;
Step 3.4, mine the support Support(A→B), the confidence Confidence(A→B) and the lift Lift(A→B) of each association rule A→B;
Step 3.5, define the three constraint rules of the association-rule-based attribute selection method, namely the strong association rule SAR, the classification association rule CAR and the atomic association rule AAR;
Step 3.5.1, express the strong association rule SAR by formula (7):
Support(A→B) > minSup ∧ Confidence(A→B) > minConf ∧ Lift(A→B) > minLift    (7)
In formula (7), ∧ denotes logical AND;
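With the thresholds of this embodiment (minimum support 10%, minimum confidence 50%, minimum lift 100%), the test in formula (7) reduces to three strict comparisons. The following sketch expresses the percentages as fractions; the function and constant names are hypothetical.

```python
MIN_SUP, MIN_CONF, MIN_LIFT = 0.10, 0.50, 1.00  # thresholds from step 3.3


def is_strong(support, confidence, lift,
              min_sup=MIN_SUP, min_conf=MIN_CONF, min_lift=MIN_LIFT):
    """Formula (7): all three measures must strictly exceed their minimums."""
    return support > min_sup and confidence > min_conf and lift > min_lift


print(is_strong(0.12, 0.60, 1.30))  # exceeds all three thresholds
print(is_strong(0.10, 0.60, 1.30))  # support equals minSup, so not strong
```

Because the comparisons are strict, a rule whose lift is exactly 1, meaning the antecedent does not change the likelihood of the accident form, is rejected.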
Step 3.5.2, in the discretized independent variable set $X_{MDLP}=\{x_1,x_2,\dots,x_k,x_{k+1},x_{k+2},\dots,x_{k+l},\dots,x_K\}$, let $|x_k|$ be the value range of the k-th independent variable $x_k$, $1\le k\le K$;
Step 3.5.3, define the influencing-factor value set FVIS as the set of all possible values of the independent variables, i.e. $FVIS=\bigcup_{k=1}^{K}|x_k|$, and define the target value set TVIS as the set of all possible values of the predictor variable, i.e. $TVIS=|Y|$;
Step 3.5.4, express the classification association rule CAR by formula (8):
In formula (8), $|B|$ is the number of categories of the predictor variable;
Step 3.5.5, express the atomic association rule AAR by formula (9):
In formula (9), $|A|$ is the number of independent variable categories;
Step 3.6, collect all association rules A→B that satisfy the classification association rule CAR into the classification association rule set CARset, and collect the association rules A→B that satisfy the atomic association rule AAR into the atomic association rule set AARset;
Step 3.7, sort the association rules A→B in AARset in descending order of confidence;
Step 3.8, check in turn whether the rule consequent of each atomic association rule AAR appears in a rule antecedent in CARset; if so, treat the rule consequent of that atomic association rule as a redundant variable and delete from CARset all association rules A→B that contain it;
Step 3.9, repeat step 3.8 until AARset is empty;
Step 3.10, map the association rules A→B remaining in CARset to their corresponding independent variables, giving an independent variable set, Set, that contains the interactions among variables;
Step 4, construct the accident form prediction model based on the mixed logit principle;
Step 4.1, establish the mixed logit model using formula (10):
In formula (10), $P_n(y_i)$ is the probability that the n-th traffic accident takes the form $y_i$; the formula also contains the parameter vector of the independent variables for the case where the n-th accident takes the form $y_i$, the vector of independent variables whose parameters are estimated, and the probability density function of the random parameter β, which is parameterized by the vectors of the mean and variance parameters of that density;
Step 4.2, enter every influencing factor in the independent variable set Set, which contains the interactions among variables, into the mixed logit model, and estimate the model parameters by maximum likelihood estimation using SAS 9.4;
Step 4.3, starting from the parameter estimates Par of the mixed logit model obtained in step 4.2, screen the model parameters by stepwise regression at the 90% confidence level to obtain the screened parameter estimates of the mixed logit model, which are listed in Table 3;
TABLE 3 Mixed logit model parameter estimates for fatal traffic accident forms
Step 5, predicting the traffic accident form probability based on the constructed mixed Logit model;
step 5.1, obtaining independent variable information influencing traffic accident forms in real time; (ii) a
Step 5.1.1, as shown in table 3, the mixed Logit model parameter estimation result shows that the independent variables influencing the death traffic accident form include: the age of a driver, the type of a vehicle, the road isolation form, the type of a road section at an intersection, the road alignment, the accident occurrence time, the lighting condition, the weather and the traffic accident form are obviously related; acquiring driver age and vehicle type data from video data of an urban road intelligent traffic video monitoring system; acquiring road isolation form, intersection section type and road alignment data based on road design data; acquiring accident occurrence time, lighting conditions and weather data through a meteorological department;
Step 5.2, substituting the independent variable information obtained in step 5.1 into formula (11) to calculate the utility function of accident form $y_i$ under the current traffic information conditions;
In formula (11), the coefficient term is the parameter vector for accident form $y_i$ taken from the screened mixed Logit model parameter estimates;
Step 5.3, using formula (12) to obtain the predicted probability of accident form $y_i$ under the current independent variable information;
Step 5.4, according to step 5.3, obtaining the predicted probability of each traffic accident form under the current independent variable information, transmitting this information to the on-board communication device via Internet-of-Vehicles wireless communication, and using an intelligent voice broadcast device to warn the driver of the accident form that most needs to be guarded against. For example, on a non-straight road, the probability of a single-vehicle accident for a motorcycle driver aged 14 to 21 is as high as 96.38%, while the probabilities of an inter-vehicle accident and a vehicle-pedestrian accident are 1.49% and 2.14%, respectively. When a motorcycle driver aged 14 to 21 is about to enter a non-straight road, the intelligent voice broadcast device warns the driver that a single-vehicle accident is highly probable ahead and reminds the driver to slow down and proceed with caution, thereby achieving targeted management of drivers and safeguarding driving safety.
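The warning decision in step 5.4 can be illustrated with a minimal sketch. It assumes the warning is issued for the most probable accident form, and only when its probability reaches a threshold; the function name `form_to_warn` and the threshold of 0.5 are hypothetical and do not come from the patent. The probabilities are those of the motorcycle example above.

```python
def form_to_warn(probabilities, threshold=0.5):
    """Return the accident form to warn about, or None if no form is likely enough.

    probabilities: dict mapping accident form name -> predicted probability.
    threshold: hypothetical cut-off; the patent does not specify one.
    """
    form, p = max(probabilities.items(), key=lambda item: item[1])
    return form if p >= threshold else None


# Predicted probabilities quoted in the example for a motorcycle driver
# aged 14 to 21 on a non-straight road.
example = {
    "inter-vehicle": 0.0149,
    "vehicle-pedestrian": 0.0214,
    "single-vehicle": 0.9638,
}
warning = form_to_warn(example)
```

Returning `None` below the threshold is one possible design choice for avoiding a voice alert when no accident form clearly dominates.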
Claims (1)
1. A road traffic accident form prediction method, characterized by comprising the following steps:
Step 1, collecting and processing road traffic accident data;
Step 1.1, acquiring N traffic accidents from a road traffic accident database to form a traffic accident data set D; defining the set of independent variables that influence the traffic accident form in the traffic accident data set D as $X = \{x_1, x_2, \ldots, x_k, \ldots, x_K\}$, where $x_k$ denotes the kth independent variable, $k = 1, 2, \ldots, K$; $\{x_1, x_2, \ldots, x_k\}$ is the set of categorical independent variables, and $\{x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ is the set of continuous independent variables, $l = 1, 2, \ldots, K-k$;
Step 1.2, according to the circumstances of each accident, dividing the traffic accident form into inter-vehicle accident $y_1$, vehicle-pedestrian accident $y_2$, and single-vehicle accident $y_3$, thereby obtaining the prediction variable $Y = \{y_1, y_2, y_3\}$ formed by the three accident types;
Step 2, discretizing the set of continuous independent variables $\{x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ using the minimum description length criterion;
Step 2.1, initializing $l = 1$;
Step 2.2, forming a set of continuous independent variable values from the lth continuous independent variable $x_{k+l}$ of each traffic accident in the traffic accident data set D, and sorting that set in descending order;
Step 2.3, using formula (1) to obtain the information entropy $E(D)$ of the continuous independent variable $x_{k+l}$ with respect to the prediction variable Y:
In formula (1), $|Y|$ is the number of categories of the prediction variable Y in the traffic accident data set D; $p_{y_i}$ is the proportion of accidents of type $y_i$ in the traffic accident data set D; $i = 1, 2, 3$;
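The image of formula (1) is not reproduced in this text. The following is a minimal sketch, assuming formula (1) is the standard Shannon entropy $E(D) = -\sum_{i=1}^{|Y|} p_{y_i} \log_2 p_{y_i}$, which is consistent with the definitions of $|Y|$ and $p_{y_i}$ in step 2.3; the base-2 logarithm, the function name `entropy`, and the toy sample are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of accident-form labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical toy data set: two inter-vehicle accidents (y1),
# one vehicle-pedestrian accident (y2), one single-vehicle accident (y3).
sample = ["y1", "y1", "y2", "y3"]
sample_entropy = entropy(sample)
```

The entropy is zero when all accidents in a subset share one form and grows as the three forms become more evenly mixed.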
Step 2.4, following the information gain maximization principle, traversing the values of the lth continuous independent variable $x_{k+l}$ to find its optimal cut point $b_l$; splitting the traffic accident data set D at the optimal cut point $b_l$ into a first subset $D_1$ and a second subset $D_2$; and using formula (2) to calculate the information gain $G(b_l, D)$ obtained by the discretization:
In formula (2), $|D_1|$ and $|D_2|$ are the numbers of traffic accident cases in the first subset $D_1$ and the second subset $D_2$ of the traffic accident data set D, respectively; $E(D_1)$ and $E(D_2)$ are the information entropies of the first subset $D_1$ and the second subset $D_2$, respectively;
Step 2.5, calculating the stopping criterion S using formula (3):
In formula (3), $|Y_1|$ and $|Y_2|$ are the numbers of categories of the prediction variable in the first subset $D_1$ and the second subset $D_2$, respectively;
Step 2.6, judging whether the information gain $G(b_l, D)$ is greater than the stopping criterion S; if so, the optimal cut point $b_l$ is valid and is added to the cut point set B; the first subset $D_1$ and the second subset $D_2$ then each replace the traffic accident data set D, and steps 2.4 to 2.6 are repeated to search for the next optimal cut point within $D_1$ and within $D_2$; otherwise, executing step 2.7;
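Steps 2.4 to 2.6 can be sketched as a recursive search. The images of formulas (2) and (3) are not reproduced in this text, so the sketch assumes the usual minimum description length (Fayyad and Irani) forms: $G(b_l, D) = E(D) - \frac{|D_1|}{|D|}E(D_1) - \frac{|D_2|}{|D|}E(D_2)$ and $S = \frac{\log_2(|D|-1)}{|D|} + \frac{\Delta}{|D|}$ with $\Delta = \log_2(3^{|Y|}-2) - [\,|Y|E(D) - |Y_1|E(D_1) - |Y_2|E(D_2)\,]$. The function names, the midpoint placement of cut points, and the toy data are assumptions; values are sorted in ascending order here, which does not change the cut points found.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_cut(values, labels):
    """Return (cut, gain, left_labels, right_labels) with maximal gain, or None."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain, left, right)
    return best


def mdlp_threshold(labels, left, right):
    """Assumed form of the stopping criterion S in formula (3)."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return (math.log2(n - 1) + delta) / n


def mdlp_cuts(values, labels):
    """Recursively collect accepted cut points in ascending order."""
    best = best_cut(values, labels)
    if best is None:
        return []
    cut, gain, left, right = best
    if gain <= mdlp_threshold(labels, left, right):
        return []  # gain does not exceed S: stop splitting this subset
    low = [(v, y) for v, y in zip(values, labels) if v <= cut]
    high = [(v, y) for v, y in zip(values, labels) if v > cut]
    return mdlp_cuts(*zip(*low)) + [cut] + mdlp_cuts(*zip(*high))


# Hypothetical driver ages and accident forms.
ages = [16, 18, 19, 20, 21, 45, 50, 52, 60, 63]
forms = ["y3"] * 5 + ["y1"] * 5
cut_points = mdlp_cuts(ages, forms)
```

In this toy data the two age groups are perfectly separated, so a single cut between 21 and 45 is accepted and neither subset is split further.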
Step 2.7, assigning $l + 1$ to $l$ and judging whether $l > K - k$; if so, all continuous independent variables in the continuous independent variable set have been discretized, and step 2.8 is executed after the cut point set B of each continuous independent variable is output; otherwise, returning to step 2.2;
Step 2.8, discretizing each continuous independent variable according to its cut point set B so that all independent variables become categorical independent variables, thereby obtaining the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$, where $x_{k+l}$ denotes the (k+l)th categorical independent variable;
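Applying a cut point set B to a continuous variable, as in step 2.8, amounts to looking up which interval each value falls into. A minimal sketch, assuming values equal to a cut point belong to the lower interval; the function name `discretize` and the cut points shown are hypothetical.

```python
from bisect import bisect_left


def discretize(value, cut_points):
    """Map a continuous value to the index of its interval.

    Interval 0 is (-inf, b1], interval 1 is (b1, b2], and so on.
    """
    return bisect_left(sorted(cut_points), value)


# Hypothetical cut point set B for driver age.
age_cuts = [21.5, 45.0]
age_classes = [discretize(age, age_cuts) for age in (18, 30, 45, 70)]
```

The resulting integer codes can then be treated as the categories of a categorical independent variable.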
step 3, adopting an attribute selection method based on association rules to mine interaction among independent variables;
Step 3.1, defining an association rule as A → B, where A is the rule antecedent, B is the rule consequent, and → is the relation symbol; taking all factors in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$ as the rule antecedent A, and all accident types of the prediction variable Y as the rule consequent B;
Step 3.2, defining the support Support(A → B), the confidence Confidence(A → B), and the lift Lift(A → B) of the association rule A → B, as shown in formulas (4), (5), and (6), respectively:
In formulas (4), (5), and (6), N is the total number of traffic accident samples; P(A ∩ B) is the frequency with which factor A and factor B occur together in the traffic accident data; P(A) and P(B) are the frequencies with which factor A and factor B, respectively, occur in the traffic accident data;
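The images of formulas (4) to (6) are not reproduced in this text. The sketch below assumes the standard definitions: Support(A → B) = P(A ∩ B), Confidence(A → B) = P(A ∩ B) / P(A), and Lift(A → B) = P(A ∩ B) / (P(A) P(B)), with each frequency computed as a count divided by N. The record layout, the key `"form"`, and the function name `rule_metrics` are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    records: list of dicts, one per accident, with a "form" key for the accident form.
    antecedent: dict of variable name -> required value.
    consequent: accident form label.
    """
    n = len(records)
    has_a = [all(r.get(k) == v for k, v in antecedent.items()) for r in records]
    has_b = [r["form"] == consequent for r in records]
    p_a = sum(has_a) / n
    p_b = sum(has_b) / n
    p_ab = sum(a and b for a, b in zip(has_a, has_b)) / n
    confidence = p_ab / p_a if p_a else 0.0
    lift = confidence / p_b if p_b else 0.0
    return p_ab, confidence, lift


# Hypothetical accident records.
records = [
    {"road": "curve", "vehicle": "motorcycle", "form": "y3"},
    {"road": "curve", "vehicle": "motorcycle", "form": "y3"},
    {"road": "straight", "vehicle": "car", "form": "y1"},
    {"road": "curve", "vehicle": "car", "form": "y1"},
]
support, confidence, lift = rule_metrics(records, {"road": "curve"}, "y3")
```

A lift above 1 indicates that the antecedent and the accident form occur together more often than they would if they were independent.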
Step 3.3, defining and initializing the minimum support minSup, the minimum confidence minConf, and the minimum lift minLift of the association rule A → B;
Step 3.4, mining the support Support(A → B), the confidence Confidence(A → B), and the lift Lift(A → B) of each association rule A → B;
step 3.5, defining three constrained rules of the attribute selection method based on the association rule A → B, namely a strong association rule SAR, a classification association rule CAR and an atomic association rule AAR;
step 3.5.1, obtaining an expression of the strong association rule SAR by using the formula (7):
Support(A→B)>minSup∧Confidence(A→B)>minConf∧Lift(A→B)>minLift (7)
In formula (7), ∧ denotes logical AND;
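Formula (7) is a conjunction of three strict threshold tests, which can be sketched as a simple filter. The `Rule` container, the function name `is_strong`, and the threshold values are hypothetical; the patent does not state the thresholds it uses.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset
    consequent: str
    support: float
    confidence: float
    lift: float


def is_strong(rule, min_sup, min_conf, min_lift):
    """Formula (7): all three measures must strictly exceed their minimums."""
    return (
        rule.support > min_sup
        and rule.confidence > min_conf
        and rule.lift > min_lift
    )


# Hypothetical mined rules and thresholds.
rules = [
    Rule(frozenset({("road", "curve")}), "y3", 0.50, 0.67, 1.33),
    Rule(frozenset({("vehicle", "car")}), "y1", 0.02, 0.90, 1.80),
    Rule(frozenset({("weather", "rain")}), "y2", 0.10, 0.60, 0.95),
]
strong_rules = [r for r in rules if is_strong(r, min_sup=0.05, min_conf=0.5, min_lift=1.0)]
```

Only the first rule passes: the second fails the support test and the third fails the lift test.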
Step 3.5.2, in the discretized independent variable set $X_{MDLP} = \{x_1, x_2, \ldots, x_k, x_{k+1}, x_{k+2}, \ldots, x_{k+l}, \ldots, x_K\}$, letting $|x_k|$ be the value range of the kth independent variable $x_k$, $1 \le k \le K$;
Step 3.5.3, defining the influencing-factor value set FVIS as the set of all possible values of the independent variables, and defining the target value set TVIS as the set of all possible values of the prediction variable, that is, TVIS = |Y|;
step 3.5.4, obtaining an expression of the classification association rule CAR by using the formula (8):
In formula (8), $|B|$ is the number of categories of the prediction variable;
Step 3.5.5, obtaining the expression of the atomic association rule AAR by using formula (9):
In formula (9), $|A|$ is the number of categories of the independent variable;
Step 3.6, forming the classification association rule set CARset from all association rules A → B that satisfy the classification association rule CAR, and forming the atomic association rule set AARset from all association rules A → B that satisfy the atomic association rule AAR;
Step 3.7, sorting the association rules A → B in the atomic association rule set AARset in descending order of confidence;
Step 3.8, judging in turn whether the consequent of each atomic association rule AAR appears in the antecedent of a rule in the classification association rule set CARset; if so, the consequent of that atomic association rule is determined to be a redundant variable, and all association rules A → B that contain the consequent of that atomic association rule are deleted from the classification association rule set CARset;
Step 3.9, repeating step 3.8 until the atomic association rule set AARset is empty;
Step 3.10, mapping the association rules A → B remaining in the classification association rule set CARset to their corresponding independent variables, thereby obtaining the independent variable set Set, which includes the interactions among variables;
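Steps 3.7 to 3.10 can be sketched as follows. Because the images of formulas (8) and (9) are not reproduced in this text, the sketch rests on several assumptions: a classification rule has one or more (variable, value) items as antecedent and an accident form as consequent; an atomic rule links one (variable, value) item to another; and "deleting" means removing every classification rule whose antecedent contains the redundant item. The function name and the data are hypothetical.

```python
def prune_redundant(car_set, aar_set):
    """Return (variable terms kept, redundant items removed).

    car_set: list of (antecedent frozenset of (variable, value), accident form).
    aar_set: list of (antecedent item, consequent item, confidence).
    """
    remaining = list(car_set)
    # Step 3.7: process atomic rules in descending order of confidence.
    queue = sorted(aar_set, key=lambda rule: rule[2], reverse=True)
    redundant = []
    while queue:  # Step 3.9: continue until AARset is empty.
        _, implied_item, _ = queue.pop(0)
        # Step 3.8: is the atomic rule's consequent used by any classification rule?
        if any(implied_item in antecedent for antecedent, _ in remaining):
            redundant.append(implied_item)
            remaining = [(a, form) for a, form in remaining if implied_item not in a]
    # Step 3.10: map remaining rules to variables; multi-item antecedents
    # become interaction terms between the variables involved.
    terms = sorted({tuple(sorted(var for var, _ in a)) for a, _ in remaining})
    return terms, redundant


# Hypothetical rule sets.
car_set = [
    (frozenset({("lighting", "dark")}), "y3"),
    (frozenset({("time", "night")}), "y3"),
    (frozenset({("age", "14-21"), ("road", "curve")}), "y3"),
]
aar_set = [(("time", "night"), ("lighting", "dark"), 0.95)]
terms, redundant = prune_redundant(car_set, aar_set)
```

Here the lighting item is treated as redundant because it is implied by the time item, and the age and road items survive together as one interaction term.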
step 4, constructing an accident form prediction model based on a mixed Logit principle;
step 4.1, establishing a hybrid Logit model by using the formula (10):
In formula (10), $P_n(y_i)$ is the probability that the accident form of the nth traffic accident is $y_i$. The remaining terms are, in order: the vector of independent variables of the nth traffic accident when the accident form is $y_i$; the vector of estimated parameters for those independent variables; and the probability density function of the random parameter $\beta$, whose parameter vector holds the mean and variance of that density;
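The image of formula (10) is not reproduced in this text. In its usual form, a mixed Logit probability is a Logit probability averaged over the distribution of the random parameter $\beta$, and the integral is commonly approximated by simulation. The sketch below assumes independent normal distributions for the coefficients and one coefficient vector per accident form; the function names, the number of draws, and all numeric values are hypothetical and are not the estimates of Table 3.

```python
import math
import random


def softmax(utilities):
    """Logit probabilities from a list of utilities (numerically stable)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_probabilities(x, means, stds, draws=2000, seed=0):
    """Approximate mixed Logit probabilities by averaging over random draws.

    x: independent variable vector of one accident.
    means, stds: per accident form, the mean and standard deviation of each
    coefficient (assumed normally distributed and independent).
    """
    rng = random.Random(seed)
    totals = [0.0] * len(means)
    for _ in range(draws):
        utilities = []
        for mu_vec, sd_vec in zip(means, stds):
            beta = [rng.gauss(mu, sd) for mu, sd in zip(mu_vec, sd_vec)]
            utilities.append(sum(b * xi for b, xi in zip(beta, x)))
        for i, p in enumerate(softmax(utilities)):
            totals[i] += p
    return [t / draws for t in totals]


# Hypothetical coefficients for accident forms y1, y2, y3.
x = [1.0, 0.5]
means = [[0.0, 0.0], [-0.5, 0.2], [0.3, 1.2]]
stds = [[0.0, 0.0], [0.1, 0.3], [0.2, 0.5]]
probs = simulated_probabilities(x, means, stds)
```

Setting every standard deviation to zero reduces the simulation to an ordinary multinomial Logit, which is a convenient check on the implementation.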
Step 4.2, entering each influencing factor in the independent variable set Set, including the interactions among variables, into the mixed Logit model, and estimating the parameters of the mixed Logit model by maximum likelihood estimation;
Step 4.3, based on the mixed Logit model parameter estimates Par obtained in step 4.2, screening the parameters by stepwise regression at a set confidence level to obtain the screened mixed Logit model parameter estimates;
Step 5, predicting the traffic accident form probability based on the constructed mixed Logit model;
step 5.1, obtaining independent variable information influencing traffic accident forms in real time;
Step 5.2, substituting the independent variable information obtained in step 5.1 into formula (11) to calculate the utility function of accident form $y_i$ under the corresponding independent variable information;
In formula (11), the coefficient term is the parameter vector for accident form $y_i$ taken from the screened mixed Logit model parameter estimates;
Step 5.3, using formula (12) to obtain the predicted probability of accident form $y_i$ under the real-time independent variable information that influences the traffic accident form.
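The images of formulas (11) and (12) are not reproduced in this text. The sketch below assumes that formula (11) is a linear utility, the inner product of the screened coefficient vector for accident form $y_i$ with the independent variable vector, and that formula (12) is the Logit ratio $\exp(U_{y_i}) / \sum_j \exp(U_{y_j})$. The function name, the variable encoding, and the coefficients are hypothetical.

```python
import math


def predict_form_probabilities(x, coefficients):
    """Return a dict of accident form -> predicted probability.

    x: real-time independent variable vector (for example, dummy-coded factors).
    coefficients: dict of accident form -> coefficient vector of the same length.
    """
    # Assumed formula (11): linear utility of each accident form.
    utilities = {
        form: sum(b * xi for b, xi in zip(beta, x))
        for form, beta in coefficients.items()
    }
    # Assumed formula (12): Logit probability, shifted by the maximum for stability.
    m = max(utilities.values())
    exps = {form: math.exp(u - m) for form, u in utilities.items()}
    total = sum(exps.values())
    return {form: e / total for form, e in exps.items()}


# Hypothetical coefficients; y1 is taken as the reference form.
coefficients = {
    "y1": [0.0, 0.0, 0.0],
    "y2": [-0.5, 0.2, 0.1],
    "y3": [0.3, 1.2, -0.4],
}
current_x = [1.0, 1.0, 0.0]
form_probs = predict_form_probabilities(current_x, coefficients)
most_likely = max(form_probs, key=form_probs.get)
```

Fixing the coefficients of one form at zero is a common identification convention in Logit models and is used here only for illustration.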
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110312213.8A CN113077625B (en) | 2021-03-24 | 2021-03-24 | Road traffic accident form prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113077625A (en) | 2021-07-06 |
CN113077625B (en) | 2022-03-15 |
Family
ID=76613618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110312213.8A Active CN113077625B (en) | 2021-03-24 | 2021-03-24 | Road traffic accident form prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077625B (en) |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09190422A (en) * | 1996-01-11 | 1997-07-22 | Toshiba Corp | Device for predicting traffic condition |
CN104821080A (en) * | 2015-03-02 | 2015-08-05 | 北京理工大学 | Intelligent vehicle traveling speed and time predication method based on macro city traffic flow |
CN105931460A (en) * | 2016-05-13 | 2016-09-07 | 东南大学 | Variable speed limit control strategy optimization method for continuous bottleneck section of expressway |
CN106709595A (en) * | 2016-11-24 | 2017-05-24 | 北京交通大学 | Accident delay time prediction method and system based on unformatted information |
WO2019103197A1 (en) * | 2017-11-23 | 2019-05-31 | (주)에이텍티앤 | System for predicting traffic accident on basis of artificial intelligence and method therefor |
CN108717786A (en) * | 2018-07-17 | 2018-10-30 | 南京航空航天大学 | A kind of traffic accident causation method for digging based on universality meta-rule |
CN109636053A (en) * | 2018-12-20 | 2019-04-16 | 黄凤南 | A kind of car accident solution optimization system |
CN110555565A (en) * | 2019-09-09 | 2019-12-10 | 南京东控智能交通研究院有限公司 | Decision tree model-based expressway exit ramp accident severity prediction method |
CN110782070A (en) * | 2019-09-25 | 2020-02-11 | 北京市交通信息中心 | Urban rail transit emergency passenger flow space-time distribution prediction method |
CN110826244A (en) * | 2019-11-15 | 2020-02-21 | 同济大学 | Conjugate gradient cellular automata method for simulating influence of rail transit on urban growth |
CN111768625A (en) * | 2020-07-01 | 2020-10-13 | 中国计量大学 | Traffic road event prediction method based on graph embedding |
CN112224211A (en) * | 2020-10-19 | 2021-01-15 | 中交第一公路勘察设计研究院有限公司 | Driving simulation system based on multi-autonomous-body traffic flow |
CN112149922A (en) * | 2020-11-03 | 2020-12-29 | 南京信息职业技术学院 | Method for predicting severity of accident in exit and entrance area of down-link of highway tunnel |
Non-Patent Citations (3)
Title |
---|
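XIANGHAI MENG et al.: "Research on Accident Prediction Models for Freeways in Mountainous and Rolling Areas", 2015 Seventh International Conference on Measuring Technology and Mechatronics Automation * |
LIU Wenling et al.: "Research on Predicting Injury Outcomes of Bus Accidents Based on Association Rules", Control Engineering of China * |
WANG Lei et al.: "Analysis of Influencing Factors and Injury Estimation for Freeway Traffic Accidents", China Safety Science Journal * |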
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762364A (en) * | 2021-08-23 | 2021-12-07 | 东南大学 | Unbalanced traffic accident data synthesis sampling method |
CN113762364B (en) * | 2021-08-23 | 2022-11-04 | 东南大学 | Unbalanced traffic accident data synthesis sampling method |
Also Published As
Publication number | Publication date |
---|---|
CN113077625B (en) | 2022-03-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||