CN117238126A - Traffic accident risk assessment method under continuous flow road scene

Info

Publication number
CN117238126A
Authority
CN
China
Prior art keywords
data
accident
risk
traffic
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311103644.9A
Other languages
Chinese (zh)
Inventor
陆建
马潇驰
车忠兴
叶凡
夏萧菡
霍宗鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202311103644.9A
Publication of CN117238126A
Legal status: Pending

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a traffic accident risk assessment method for continuous-flow road scenarios, comprising the following steps: constructing a case-control data set, dividing it into historical sample data and test data, performing self-organizing map (SOM) cluster analysis on the historical sample data, and constructing a risk scenario recognition model; inputting the data to be tested, determining the risk scenario to which the data belong, training a base learner with the historical sample data of that risk scenario as the training set, and outputting a prediction result for the data to be tested; drawing an ROC curve for the prediction results on the test data, evaluating goodness of fit with the AUC index, and selecting an optimal risk threshold according to the Youden index; and acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scenario, and calculating the risk level of the road in real time. The invention provides accurate, real-time early warning of road traffic accident risk at relatively low cost, and improves the operational safety and reliability of the traffic system.

Description

Traffic accident risk assessment method under continuous flow road scene
Technical Field
The invention belongs to the technical field of intelligent transportation, and in particular relates to a traffic accident risk assessment method for continuous-flow road scenarios.
Background
Continuous-flow traffic facilities include freeways, urban expressways and other high-grade roads. They are characterized by controlled access, exclusive use by motor vehicles, full grade separation and full enclosure. An accident on such a road harms people and property and reduces transport efficiency. To reduce traffic accidents in continuous-flow scenarios and improve operational safety, researchers aim to build a transferable model at relatively low cost that provides short-term monitoring and early warning of risk on high-grade roads at large scale, which helps road managers issue control measures in time and helps travelers plan routes in advance. Accident causation and safety countermeasures have been studied extensively in China and abroad. Many studies have shown that accident occurrence is closely related to road traffic flow characteristics, and that parameters describing the traffic flow state before an accident, such as traffic volume, average speed and environmental information, exhibit certain high-risk characteristics, so accident risk can be estimated in advance from road traffic flow information. Nevertheless, traffic accidents remain frequent. The focus of traffic safety management is therefore shifting from post-accident analysis to pre-accident risk assessment. A real-time accident risk assessment system can prompt drivers and supervisors to take the necessary measures to avoid risk in time, which is of great practical significance for reducing traffic accidents and protecting people and property.
At present, some researchers build accident risk assessment models from traffic flow data using econometric models. Such models are simple to operate and easy to interpret, but their assessment accuracy is generally low. Complex machine learning models, represented by deep learning, further improve the accuracy of risk assessment. However, deep learning frameworks require specialized knowledge of model training and parameter tuning, and the resulting models transfer poorly: an expert usually has to retune and retrain them before they can be put into use elsewhere. A new risk assessment model is therefore needed that addresses both accuracy and transferability. An ideal model should be both accurate and transferable, and it should be possible to build and deploy it without complex expert knowledge.
The invention with patent publication number CN106991510A provides a method for predicting urban traffic accidents based on spatio-temporal distribution characteristics, and the invention with patent publication number CN110532298B provides a multi-attribute method for analyzing the weights of railway accident causes. However, the former focuses on spatio-temporal distribution characteristics, and its influencing factors involve personal information about the people involved in accidents, so the data are difficult to obtain and transferability is limited; the latter is designed for railway accidents and cannot be applied to continuous-flow road scenarios.
Disclosure of Invention
Technical problem to be solved: existing real-time risk assessment models are generally either low in accuracy or dependent on extensive expert knowledge for model tuning. To address this, the invention provides a traffic accident risk assessment method for continuous-flow road scenarios that allows an accident risk assessment module to be built quickly and put into use without complex parameter tuning and model training, achieves assessment accuracy on par with current mainstream complex machine learning models, and predicts road risk in real time.
The technical scheme is as follows:
A traffic accident risk assessment method for a continuous-flow road scenario, comprising the following steps:
S1: acquiring historical road accident data and historical traffic flow information for the road to be evaluated, collecting the accident data that occurred within a certain time range and the traffic flow parameters within that time range, taking whether an accident occurred as the dependent variable and the traffic flow information as the independent variables, and constructing a case-control data set; the accident data comprise the time of the accident, the location of the accident and the direction of travel (up or down direction) in which it occurred; the traffic flow parameters comprise the cross-section traffic volume, average speed and lane occupancy of the road section;
S2: dividing the data in the case-control data set into historical sample data and test data, performing self-organizing map (SOM) cluster analysis on the historical sample data, and constructing a risk scenario recognition model;
S3: inputting the data to be tested, determining the risk scenario to which the data belong, training a base learner with the historical sample data of that risk scenario as the training set, and outputting a prediction result for the data to be tested;
S4: drawing an ROC curve for the prediction results on the test data, evaluating goodness of fit with the AUC index, and selecting an optimal risk threshold according to the Youden index;
S5: acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scenario with the risk scenario recognition model from step S2, and calculating the risk level of the road in real time with the risk assessment model determined in steps S3 and S4.
Further, in step S1, collecting the accident data that occurred within a certain time range and the traffic flow parameters within that time range comprises the following sub-steps:
S101: processing the acquired historical road accident data, and distinguishing the upstream and downstream sides of each traffic accident location;
S102: acquiring the historical traffic flow information from sensors installed on the lanes; denoting the traffic flow sensor downstream of the accident location as K and the sensor upstream as K-1; according to the time of the traffic accident, collecting the traffic flow information of sensors K and K-1 for the periods 5-10 minutes and 10-15 minutes before the accident; and denoting the traffic flow parameter variable flowdata, by cross-section position and time period, as:
$$\mathrm{flowdata} = [f_{1,\mathrm{up}}, v_{1,\mathrm{up}}, o_{1,\mathrm{up}}, f_{2,\mathrm{up}}, v_{2,\mathrm{up}}, o_{2,\mathrm{up}}, f_{1,\mathrm{down}}, v_{1,\mathrm{down}}, o_{1,\mathrm{down}}, f_{2,\mathrm{down}}, v_{2,\mathrm{down}}, o_{2,\mathrm{down}}]$$
where f, v and o denote traffic volume, average vehicle speed and lane occupancy respectively, subscript 1 denotes the period 5-10 minutes before the accident, subscript 2 denotes the period 10-15 minutes before the accident, subscript up denotes upstream of the accident location, and subscript down denotes downstream of the accident location;
S103: labeling the accident data from step S101 with crash = 1, selecting normal driving data at the same location and the same time of day but on different dates and labeling them with crash = 0, and constructing the case-control data set, whose records have the basic structure:
$$\mathrm{data} = [\mathrm{crash}, \mathrm{flowdata}]$$
Further, in step S1, the historical road accident data comprise the time and location of each traffic accident, with the time accurate to the minute and the location accurate to the 100-meter level.
Further, in step S1, the ratio of the number of normal driving data to the number of accident data in the case control data set is 3:1.
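As an illustrative, non-limiting sketch of sub-steps S101 to S103 and the 3:1 ratio, the following Python fragment assembles the 12-element flowdata vector and draws matched controls. The record layout (`readings[(sensor_id, interval_start)] = (flow, speed, occupancy)`), the function names, the flooring of the accident time to the 5-minute grid and the exclusion of any date on which an accident occurred at the same sensor are all assumptions made for illustration; none of them is specified in the disclosure.

```python
import random
from datetime import datetime, timedelta


def flowdata(readings, k, accident_time):
    """Return [f1_up, v1_up, o1_up, f2_up, ..., o2_down] (12 values), or None if data are missing."""
    # Assumption: floor the accident time to the 5-minute grid used by the sensors.
    base = accident_time.replace(second=0, microsecond=0)
    base -= timedelta(minutes=base.minute % 5)
    vec = []
    for sensor in (k - 1, k):        # upstream sensor K-1 first, then downstream sensor K
        for lag in (10, 15):         # interval starts for 5-10 min and 10-15 min before the accident
            rec = readings.get((sensor, base - timedelta(minutes=lag)))
            if rec is None:
                return None
            vec.extend(rec)          # (flow, speed, occupancy)
    return vec


def build_case_control(readings, accidents, candidate_dates, ratio=3, seed=0):
    """accidents: list of (downstream_sensor_k, accident_datetime). Returns rows [crash, *flowdata]."""
    rng = random.Random(seed)
    rows = []
    for k, t in accidents:
        case = flowdata(readings, k, t)
        if case is None:
            continue
        rows.append([1] + case)
        # Assumption: a control date must have no accident at the same sensor.
        accident_days = {a_t.date() for a_k, a_t in accidents if a_k == k}
        pool = [d for d in candidate_dates if d not in accident_days]
        rng.shuffle(pool)
        picked = 0
        for d in pool:
            ctrl = flowdata(readings, k, datetime.combine(d, t.time()))
            if ctrl is not None:
                rows.append([0] + ctrl)   # same place, same time of day, different date
                picked += 1
                if picked == ratio:
                    break
    return rows
```

Each returned row has the structure `[crash, flowdata]` described in S103, so it can be passed directly to a later split or clustering step.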
Further, in step S2, performing SOM cluster analysis on the historical sample data and constructing the risk scenario recognition model comprises the following sub-steps:
S201: extracting historical sample data and test data from the case-control data set according to a preset split ratio, keeping the ratio of accident data to normal driving data in both the historical sample data set and the test data set consistent with that in the case-control data set;
S202: training a self-organizing map (SOM) network model on the historical sample data set, and selecting the SOM network side length M according to the number N of historical samples by the following formula:
where the SOM network has M neurons per side, that is, M rows and M columns for a total of $M^2$ neurons, and each neuron is connected to its neighbors in a hexagonal pattern, forming a honeycomb network structure;
S203: initializing the SOM network by randomly assigning each neuron a K-dimensional weight vector $w_j = [a_k]$, $k = 1, 2, \dots, K$, where K equals the number of independent-variable elements of flowdata in the historical sample data and j is the neuron index, $j = 1, 2, \dots, M^2$;
S204: inputting the samples of the historical sample data set $X = [x_1, x_2, \dots, x_i, \dots, x_N]$ in sequence for training; comparing the Euclidean distance between the i-th historical sample $x_i$ and the weight of each neuron, and taking the neuron with the shortest Euclidean distance as the cluster label of that sample, so that the category $c_i$ of historical sample $x_i$ is determined by:
$$c_i = \arg\min_{j} \lVert x_i - w_j \rVert_2$$
where $i = 1, 2, \dots, N$;
S205: updating the SOM network weights:
$$w \leftarrow w + \eta\, h(d)\,(x_i - w)$$
where w is the weight to be updated, $\eta$ is the learning rate, $h(\cdot)$ is the decay function, and d is the distance between the activated neuron and the other neurons;
S206: dividing risk scenarios according to the number of samples and the number of accident samples in each SOM neuron, and calculating the accident rate $r_c$ of each neuron:
$$r_c = \frac{N_{\mathrm{crash}}}{N_{\mathrm{sample}}}$$
where $N_{\mathrm{sample}}$ is the number of samples belonging to neuron c and $N_{\mathrm{crash}}$ is the number of accident samples in neuron c;
S207: comparing the accident proportion in each clustered neuron with the proportion of accident data in the historical sample data set; if the former exceeds the latter, the neuron is a high-risk scenario, otherwise it is a low-risk scenario.
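Sub-steps S203 to S207 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the fixed number of epochs, the Gaussian decay with unit width, the half-cell row offset used to mimic a hexagonal grid, the initial weights drawn from [0, 1) (which presumes features scaled to that range) and all function names are hypothetical.

```python
import math
import random


def nearest(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(samples, m, eta=0.5, epochs=10, seed=0):
    """Train an m-by-m SOM and return its m*m weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(m * m)]
    # Grid coordinates; odd rows are shifted by half a cell (assumed hexagonal layout).
    coords = [(col + 0.5 * (row % 2), row * math.sqrt(3) / 2)
              for row in range(m) for col in range(m)]
    for _ in range(epochs):
        for x in samples:
            c = nearest(weights, x)                    # activated neuron (S204)
            for j, w in enumerate(weights):
                d = math.dist(coords[c], coords[j])    # distance on the grid
                h = math.exp(-d * d / 2.0)             # Gaussian decay, unit width (assumption)
                for k in range(dim):
                    w[k] += eta * h * (x[k] - w[k])    # w <- w + eta * h(d) * (x - w) (S205)
    return weights


def label_scenes(weights, samples, crash):
    """Map neuron index -> True if its accident rate exceeds the overall rate (S206, S207)."""
    overall = sum(crash) / len(crash)
    counts, crashes = {}, {}
    for x, y in zip(samples, crash):
        c = nearest(weights, x)
        counts[c] = counts.get(c, 0) + 1
        crashes[c] = crashes.get(c, 0) + y
    return {c: crashes[c] / counts[c] > overall for c in counts}
```

Neurons that receive no historical sample do not appear in the returned mapping; how such empty neurons should be treated is not stated in the text and is left open here.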
Further, in step S3, the base learner is any one of a support vector machine model, a decision tree model and an artificial neural network model.
Further, in step S3, training the base learner with the historical sample data of the risk scenario as the training set and outputting the prediction result for the data to be tested comprises the following steps:
S301: selecting an artificial neural network as the base learner and determining the number of neurons in each layer: the input layer has as many units as the traffic flow parameter variable flowdata has elements, and there is one hidden layer with 8 hidden units; after each learning pass, the prediction produced by the output layer is compared with the observed value, and the difference is back-propagated as the error to update each synaptic weight;
S302: inputting the data to be tested from the test set and classifying them in the SOM network: comparing the Euclidean distance between the data to be tested $x_p$ and the weight of each neuron, and taking the neuron with the shortest Euclidean distance as the cluster label of the sample, which determines the category t of $x_p$;
S303: selecting from the historical sample data set all data that belong to the same category t as the data to be tested $x_p$ to form the training set; training the artificial neural network on this training set and finalizing the model once the preset accuracy requirement is met; predicting the data to be tested and outputting the predicted value; after the prediction is completed, discarding the trained artificial neural network, which completes the just-in-time learning (JITL) process.
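The just-in-time learning loop of S302 and S303 (classify the query, train a local model on same-category history, predict, discard) can be illustrated as follows. To keep the sketch short, a logistic regression fitted by gradient descent is swapped in as the base learner; this is a stand-in, not one of the three learners named in the text. The function names, the fallback to the full history when a category is empty, and the learning-rate and epoch defaults are likewise assumptions.

```python
import math


def nearest(weights, x):
    """Index of the SOM neuron whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Stand-in base learner (an assumption): logistic regression fitted by gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0

    def prob(x):
        z = sum(a * c for a, c in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamped for stability

    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = prob(xi) - yi                       # gradient of the log loss w.r.t. z
            w = [a - lr * g * c for a, c in zip(w, xi)]
            b -= lr * g
    return prob


def jitl_predict(som_weights, hist_X, hist_y, x_p, fit=fit_logistic):
    """Predict the risk of x_p with a model trained only on its own SOM category."""
    t = nearest(som_weights, x_p)                   # category t of the query (S302)
    idx = [i for i, x in enumerate(hist_X) if nearest(som_weights, x) == t]
    if not idx:                                     # assumption: use all history if the category is empty
        idx = list(range(len(hist_X)))
    model = fit([hist_X[i] for i in idx], [hist_y[i] for i in idx])
    return model(x_p)                               # the local model is not kept after this call
```

Because `fit` is a parameter, any learner that returns a callable mapping a feature vector to a risk value in [0, 1] can be substituted without changing the loop.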
Further, in step S4, drawing the ROC curve for the prediction results on the test data, evaluating goodness of fit with the AUC index and selecting the optimal risk threshold according to the Youden index comprises the following steps:
S401: calculating the ROC curve of the test set data: calculating the accident risk r for the case-control data set, determining the minimum $r_{\min}$ and maximum $r_{\max}$ of the predicted accident risk, and constructing a classification threshold sequence $C = \{c_j\}$ from $r_{\min}$ to $r_{\max}$ with a step size of 0.001; then calculating the confusion matrix for each element of C in turn, as follows:
converting the risk index into a prediction of whether an accident occurs according to:
$$\hat{y} = \begin{cases} 1, & r \ge c_j \\ 0, & r < c_j \end{cases}$$
where $\hat{y}$ is the predicted accident occurrence and y is the observed accident occurrence; $\hat{y} = y = 1$ is recorded as a true positive TP; $\hat{y} = 1, y = 0$ as a false positive FP; $\hat{y} = 0, y = 1$ as a false negative FN; and $\hat{y} = y = 0$ as a true negative TN; for each classification threshold $c_j$, calculating the recall $TPR_j$ and the false positive rate $FPR_j$ at that threshold:
$$TPR_j = \frac{TP}{TP + FN}, \qquad FPR_j = \frac{FP}{FP + TN}$$
plotting the points in turn in a two-dimensional coordinate system with FPR as the abscissa and TPR as the ordinate, and calculating the area under the receiver operating characteristic (ROC) curve, AUC, that is, the area enclosed by the curve and the points (0, 0), (1, 1) and (1, 0);
S402: selecting the classification threshold $c_j$ that maximizes $TPR_j - FPR_j$ as the final classification threshold $c_y$.
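The threshold sweep of S401 and the Youden-index selection of S402 can be sketched as below. The function name is hypothetical, the rule "predict an accident when the risk is greater than or equal to the threshold" is an assumption about the boundary case, the AUC is computed with the trapezoidal rule, and the input is assumed to contain at least one accident and one non-accident record.

```python
def roc_youden(risk, y, step=0.001):
    """Return (AUC, threshold maximizing TPR - FPR) for risk scores and 0/1 labels."""
    r_min, r_max = min(risk), max(risk)
    n_steps = int(round((r_max - r_min) / step))
    thresholds = [r_min + j * step for j in range(n_steps + 1)]
    pos = sum(y)                      # number of observed accidents
    neg = len(y) - pos                # number of observed non-accidents
    points = []
    best_j, best_c = -1.0, None
    for c in thresholds:
        tp = sum(1 for r, t in zip(risk, y) if r >= c and t == 1)
        fp = sum(1 for r, t in zip(risk, y) if r >= c and t == 0)
        tpr, fpr = tp / pos, fp / neg
        points.append((fpr, tpr))
        if tpr - fpr > best_j:        # Youden index J = TPR - FPR
            best_j, best_c = tpr - fpr, c
    # Close the curve at (0, 0) and (1, 1), then integrate with the trapezoidal rule.
    pts = sorted(set(points + [(0.0, 0.0), (1.0, 1.0)]))
    auc = sum((x2 - x1) * (y1 + y2) / 2.0
              for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return auc, best_c
```

When several thresholds tie on the Youden index, this sketch keeps the smallest one; the text does not say how ties should be broken.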
Further, in step S5, the corresponding traffic flow information is collected for the road to be predicted, including the traffic volume, average vehicle speed and lane occupancy of each cross-section;
the collected traffic flow information is fed into the risk scenario recognition model to determine the category to which it belongs and thereby identify the risk scenario; the base learner then performs just-in-time learning to evaluate the current road risk index in real time, and an early warning is issued for scenarios that exceed the final classification threshold.
The beneficial effects are as follows:
The traffic accident risk assessment method for continuous-flow road scenarios provides accurate, real-time early warning of road traffic accident risk at relatively low cost, provides a theoretical basis for formulating proactive risk prevention and control measures, and improves the operational safety and reliability of the traffic system.
Drawings
FIG. 1 is a flow chart of a traffic accident risk assessment method in a continuous flow road scenario of the present invention;
FIG. 2 is a schematic diagram of data acquisition;
FIG. 3 is a schematic diagram of SOM topology;
FIG. 4 is a flow chart comparing the just-in-time learning strategy with a conventional predictive model;
FIG. 5 is a schematic diagram of the risk assessment module based on the self-organizing map and the just-in-time learning strategy.
Detailed Description
The following examples will provide those skilled in the art with a more complete understanding of the invention, but are not intended to limit the invention in any way.
Referring to FIG. 1, the invention discloses a traffic accident risk assessment method for continuous-flow road scenarios, which provides a basis for predicting road risk in real time, issuing control measures and planning travel routes. The method comprises the following steps:
S1: acquiring historical road accident data and historical traffic flow information for the road to be evaluated, collecting the accident data that occurred within a certain time range and the traffic flow parameters within that time range, taking whether an accident occurred as the dependent variable and the traffic flow information as the independent variables, and constructing a case-control data set; the accident data comprise the time of the accident, the location of the accident and the direction of travel (up or down direction) in which it occurred; the traffic flow parameters comprise the cross-section traffic volume, average speed and lane occupancy of the road section;
S2: dividing the data in the case-control data set into historical sample data and test data, performing self-organizing map (SOM) cluster analysis on the historical sample data, and constructing a risk scenario recognition model;
S3: inputting the data to be tested, determining the risk scenario to which the data belong, training a base learner with the historical sample data of that risk scenario as the training set, and outputting a prediction result for the data to be tested;
S4: drawing an ROC curve for the prediction results on the test data, evaluating goodness of fit with the AUC index, and selecting an optimal risk threshold according to the Youden index;
S5: acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scenario with the risk scenario recognition model from step S2, and calculating the risk level of the road in real time with the risk assessment model determined in steps S3 and S4.
Specifically, step S1 comprises the following sub-steps:
S1.1: the historical road accident data comprise the time and location of each traffic accident, with the time accurate to the minute and the location accurate to the 100-meter level; the upstream and downstream sides of each accident location are distinguished;
S1.2: as shown in FIG. 2, the historical traffic flow information can be obtained from sensors installed on the lanes. The traffic flow sensor downstream of the accident location is denoted K and the sensor upstream is denoted K-1. According to the time of the accident, the traffic flow information of sensors K and K-1 is collected for the periods 5-10 minutes and 10-15 minutes before the accident, including the cross-section traffic volume, average vehicle speed and lane occupancy, and the variables are denoted as:
$$\mathrm{flowdata} = [f_{1,\mathrm{up}}, v_{1,\mathrm{up}}, o_{1,\mathrm{up}}, f_{2,\mathrm{up}}, v_{2,\mathrm{up}}, o_{2,\mathrm{up}}, f_{1,\mathrm{down}}, v_{1,\mathrm{down}}, o_{1,\mathrm{down}}, f_{2,\mathrm{down}}, v_{2,\mathrm{down}}, o_{2,\mathrm{down}}]$$
where f, v and o denote traffic volume, average vehicle speed and lane occupancy respectively, subscript 1 denotes the period 5-10 minutes before the accident, subscript 2 denotes the period 10-15 minutes before the accident, subscript up denotes upstream of the accident location, and subscript down denotes downstream of the accident location;
S1.3: the case-control data set is constructed by labeling the accident data from S1.1 with crash = 1 and selecting normal driving data at the same location and the same time of day but on different dates, labeled crash = 0, with a 3:1 ratio of normal driving data to accident data; the historical traffic flow variables are attached to both the accident and the non-accident data according to S1.2, giving a case-control data set with the basic structure:
$$\mathrm{data} = [\mathrm{crash}, \mathrm{flowdata}]$$
Specifically, step S2 comprises the following sub-steps:
S2.1: historical sample data and test data are extracted from the case-control data set of step S1 at a ratio of 7:3, keeping the ratio of accident data to normal driving data at approximately 1:3 in both the historical sample data set and the test data set;
S2.2: an SOM network model is trained on the historical sample data set, and the side length M of the SOM network is selected according to the number N of historical samples by the following formula:
that is, the SOM network has M neurons per side, i.e. M rows and M columns for a total of $M^2$ neurons, and each neuron is connected to its neighbors in a hexagonal pattern, forming a honeycomb network structure, as shown in FIG. 3;
S2.3: to train the SOM clustering network, the network is first initialized by randomly assigning each neuron a K-dimensional weight vector $w_j = [a_k]$, where K equals the number of independent-variable elements of flowdata in the historical sample data and can be taken as 12;
S2.4: the samples of the historical sample data set $X = [x_1, x_2, \dots, x_N]$ are input in sequence for training; the Euclidean distance between each historical sample $x_i$ and each neuron weight is compared, and the neuron with the shortest Euclidean distance is taken as the cluster label of the sample, i.e. the category $c_i$ of historical sample $x_i$ is determined by:
$$c_i = \arg\min_{j} \lVert x_i - w_j \rVert_2$$
S2.5: after each sample is classified, the SOM network weights are updated by:
$$w \leftarrow w + \eta\, h(d)\,(x_i - w)$$
where w is the weight to be updated; $\eta$ is the learning rate, which can be set to 0.5; $h(\cdot)$ is the decay function, for which a standard Gaussian function is often used in practice; and d is the distance between the activated neuron and another neuron, calculated as the Euclidean distance between the neurons' coordinates in the network;
S2.6: after all samples are classified, risk scenarios are divided according to the number of samples and the number of accident samples in each SOM neuron, and the accident rate of each neuron is calculated:
$$r_c = \frac{N_{\mathrm{crash}}}{N_{\mathrm{sample}}}$$
where $N_{\mathrm{sample}}$ is the number of samples belonging to neuron c and $N_{\mathrm{crash}}$ is the number of accident samples in neuron c. Since the ratio of accident data to normal driving data in the historical sample data set is kept at about 1:3, a neuron whose accident proportion exceeds 25% is regarded as a high-risk scenario, and all others as low-risk scenarios.
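The 7:3 split of S2.1, which preserves the roughly 1:3 accident to non-accident ratio in both subsets, amounts to a stratified split. A minimal sketch follows, assuming each record is a list whose first element is the crash label; the function name and the rounding of group sizes to whole records are assumptions.

```python
import random


def stratified_split(rows, train_frac=0.7, seed=0):
    """Split rows ([crash, ...]) into (historical, test) while keeping the class ratio."""
    rng = random.Random(seed)
    hist, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[0] == label]
        rng.shuffle(group)                       # random draw within each class
        cut = round(train_frac * len(group))     # assumption: round to whole records
        hist.extend(group[:cut])
        test.extend(group[cut:])
    return hist, test
```

Splitting each class separately, rather than shuffling the whole data set once, is what keeps the accident share at about 25% in both subsets, which the 25% rule in S2.6 relies on.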
Specifically, step S3 comprises the following sub-steps:
S3.1: FIG. 4 shows the basic flow of just-in-time learning. A base learner serving as the classifier is first determined; machine learning methods with few hyperparameters, such as a support vector machine, a CART decision tree or an artificial neural network (ANN), can be selected as the base learner. For the ANN, the number of neurons in each layer is determined: the input layer has 12 units, matching the number of elements of flowdata; there is one hidden layer with 8 hidden units; and the output layer has one unit. After each learning pass, the output prediction is compared with the observed value, and the difference is back-propagated as the error to update each synaptic weight;
S3.2: the data to be tested from the test set are input and first classified in the SOM network: the Euclidean distance between the data to be tested $x_p$ and each neuron weight is compared, and the neuron with the shortest Euclidean distance is taken as the cluster label of the sample, i.e. the category t of $x_p$ is determined by:
$$t = \arg\min_{j} \lVert x_p - w_j \rVert_2$$
S3.3: in this embodiment, the training set consists of all data in the historical sample data set that belong to the same category t as the data to be tested $x_p$. The ANN is trained on this training set and the model is finalized once the preset accuracy requirement is met; $x_p$ is then predicted and the predicted value is output. After the prediction is completed, the trained artificial neural network is discarded.
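For concreteness, a 12-8-1 network of the kind described in S3.1 can be written with the standard library alone. This is a sketch under assumptions the text does not fix: sigmoid activations in both layers, per-sample gradient descent on the log loss (so the back-propagated output error is simply prediction minus observation), a fixed number of epochs in place of the "preset accuracy requirement", and uniform initial weights in [-0.5, 0.5]. The class and method names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamped for stability


class TinyANN:
    """One hidden layer (8 units by default) and a single output unit."""

    def __init__(self, n_in=12, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * v for w, v in zip(self.w2, h)) + self.b2)
        return h, out

    def predict(self, x):
        return self.forward(x)[1]

    def fit(self, X, y, lr=0.3, epochs=300):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, out = self.forward(x)
                delta_out = out - t                      # output error (prediction minus observation)
                for j in range(len(self.w2)):
                    # Hidden error uses the output weight before it is updated.
                    delta_h = delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                    self.w2[j] -= lr * delta_out * h[j]
                    self.b1[j] -= lr * delta_h
                    for k in range(len(x)):
                        self.w1[j][k] -= lr * delta_h * x[k]
                self.b2 -= lr * delta_out
        return self
```

In a just-in-time setting a fresh `TinyANN` would be created, fitted on the same-category history and dropped for every query, so keeping the network this small also keeps per-query training cheap.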
Specifically, step S4 comprises the following sub-steps:
S4.1: the ROC curve of the test set data obtained in step S3 is calculated: the accident risk r of the case-control data set is calculated with the SOM-JITL-ANN model, the minimum $r_{\min}$ and maximum $r_{\max}$ of the predicted accident risk are determined, and a classification threshold sequence $C = \{c_j\}$ is constructed from $r_{\min}$ to $r_{\max}$ with a step size of 0.001; the confusion matrix is then calculated for each element of C in turn, as follows:
the risk index is converted into a prediction of whether an accident occurs according to:
$$\hat{y} = \begin{cases} 1, & r \ge c_j \\ 0, & r < c_j \end{cases}$$
where $\hat{y}$ is the predicted accident occurrence and y is the observed accident occurrence; $\hat{y} = y = 1$ is recorded as a true positive (TP); $\hat{y} = 1, y = 0$ as a false positive (FP); $\hat{y} = 0, y = 1$ as a false negative (FN); and $\hat{y} = y = 0$ as a true negative (TN); for each classification threshold $c_j$, the recall ($TPR_j$) and the false positive rate ($FPR_j$) at that threshold are calculated:
$$TPR_j = \frac{TP}{TP + FN}, \qquad FPR_j = \frac{FP}{FP + TN}$$
the points are plotted in turn in a two-dimensional coordinate system with FPR as the abscissa and TPR as the ordinate, and the area under the receiver operating characteristic (ROC) curve, AUC, is calculated, that is, the area enclosed by the curve and the points (0, 0), (1, 1) and (1, 0);
S4.2: the classification threshold $c_j$ that maximizes $TPR_j - FPR_j$ is selected as the final classification threshold $c_y$.
Specifically, step S5 comprises the following sub-steps:
S5.1: as shown in FIG. 5, the traffic flow information described in step S1 is collected for the road to be predicted, including the traffic volume, average vehicle speed and lane occupancy of each cross-section;
S5.2: as shown in FIG. 5, after the data to be tested are input, the SOM network obtained in step S2 first determines the category to which the data belong and identifies the risk scenario; the base learner of step S3 then performs just-in-time learning to evaluate the risk index of the current road in real time, and an early warning is issued for scenarios that exceed the final classification threshold $c_y$ obtained in step S4.
Examples of the invention
To demonstrate the practicality of the accident risk assessment method for continuous-flow road scenarios based on self-organizing maps and just-in-time learning, a specific example is described below.
Taking a certain expressway as an example, traffic flow and accident data over a 3-month period were collected according to step S1. The expressway is 13 km long, and 7 microwave sensors are distributed along it to collect traffic flow data, spaced about 1.6 km apart; every 5 min the sensors record the cross-section traffic volume, average vehicle speed and lane occupancy for that period. Accident data were collected and the time and location of each accident confirmed; 123 accidents were collected in total. For each accident, following step S1.2 of the detailed description, the traffic volume, occupancy and speed data for the periods 5-10 minutes and 10-15 minutes before the accident, collected by the sensors upstream and downstream of the road section containing the accident location, were taken as feature variables, giving 12 independent variables in total. Following step S1.3, a case-control data set was constructed: for each accident record, 3 non-accident records at the same observation cross-section and the same time of day but on different dates were randomly selected, and the corresponding feature variables were calculated. The final case-control data set contains 492 records.
In performance testing with this data set, all methods that apply the JITL strategy use 70% of the data as the historical sample database and 30% as test data, as described in step S2.1 of the detailed description. For the traditional modeling methods, the data are likewise divided into a training set and a test set at 70% and 30%, and the ratio of accident to non-accident data in both sets is again kept at about 1:3.
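As a check on the sample sizes in this example, the arithmetic can be written out as follows. The rounding of the subset sizes to whole records is an assumption; the text states only the 70% and 30% proportions.

```latex
\begin{align*}
N_{\text{total}} &= 123 + 3 \times 123 = 492,\\
N_{\text{hist}} &\approx 0.7 \times 492 = 344.4 \approx 344,\\
N_{\text{test}} &\approx 0.3 \times 492 = 147.6 \approx 148,\\
\frac{N_{\text{crash}}}{N_{\text{total}}} &= \frac{123}{492} = 25\%.
\end{align*}
```

The 25% accident share is the baseline against which the per-neuron accident rate is compared in step S2.6.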
Following step S2.2, an SOM network was constructed. The calculation gives $M \in (2.65, 4.04)$; M = 3 was selected, yielding an SOM network with 9 neurons, which was trained as described in steps S2.3 to S2.5. Risk scenarios were identified as in step S2.6: 2 of the 9 neurons had accident sample rates above 25%, at 29.2% and 37.1% respectively, and were identified as high-risk scenarios; the remaining neurons were identified as low-risk scenarios.
Following step S3.1 of the detailed description, an ANN is used as the base learner and the number of neurons in each layer is determined: the input layer has 12 units, matching the number of elements of flowdata; there is 1 hidden layer with 8 hidden units; and the output layer is a single unit. After each learning pass outputs a prediction, it is compared with the measured value, and the difference is back-propagated as the error to update each synaptic weight. Following steps S3.2 and S3.3, a similar-sample set based on the SOM clustering result is constructed for each piece of data to be tested and the prediction for that data is output; following step S4, the AUC index is calculated. The SOM-JITL-ANN result is compared with a plain ANN and with XGBoost. Using an ANN model directly for risk assessment performs poorly, with an AUC of only 0.665. After the SOM-JITL strategy is applied, the AUC reaches 0.830, a relative improvement of 24.8% over the traditional ANN ((0.830 − 0.665)/0.665 ≈ 0.248), and exceeds the AUC of 0.759 obtained by XGBoost, a more complex machine learning method.
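The just-in-time learning flow of steps S3.2 and S3.3 can be sketched as below. To stay dependency-free, the sketch swaps in a small logistic regression for the ANN base learner; that substitution, the fallback used when a scene has no historical samples, and all function names are assumptions made for illustration.

```python
import math

def nearest_neuron(x, weights):
    """Index of the SOM neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Stand-in base learner: logistic regression by SGD (not the ANN of step S3.1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wk * xk for wk, xk in zip(w, xi))
            g = 1.0 / (1.0 + math.exp(-z)) - yi      # gradient of log loss w.r.t. z
            w = [wk - lr * g * xk for wk, xk in zip(w, xi)]
            b -= lr * g

    def predict(x):
        z = b + sum(wk * xk for wk, xk in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))
    return predict

def jitl_predict(x_new, history_X, history_y, som_weights, fit=fit_logistic):
    """Train a throwaway model on the historical samples sharing x_new's scene."""
    t = nearest_neuron(x_new, som_weights)
    similar = [(x, y) for x, y in zip(history_X, history_y)
               if nearest_neuron(x, som_weights) == t]
    if not similar:
        # Assumption: fall back to the overall accident share for an empty scene.
        return sum(history_y) / len(history_y)
    model = fit([x for x, _ in similar], [y for _, y in similar])
    return model(x_new)   # the model is local and is discarded on return
```

Any learner exposing the same fit-then-predict shape could be passed through the `fit` argument in place of the stand-in.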
In the modeling process, SOM-JITL-ANN uses the default parameters of the base learner; parameters are adjusted only once, for the network structure configuration of the SOM. When the XGBoost algorithm is tested on the data set with default parameters, the AUC on the training set is close to 1 while the AUC on the test set is only 0.6. This large gap between training and test performance shows that XGBoost overfits the training set and that its hyperparameters must be tuned. The tuning uses a greedy algorithm: several recommended values are listed for each of 7 parameters (number of decision trees, learning rate, maximum depth, column sampling ratio, L1 regularization weight, L2 regularization weight and minimum split loss at leaf nodes), and the parameters are tried one by one; once one parameter has been tuned to its best value, the next is optimized. The optimized model finally obtained has a test-set AUC of 0.759, and the whole process must be repeated if the data set changes. Taken together with the model accuracy comparison, the SOM-JITL module greatly simplifies the parameter tuning required by complex machine learning models, allows risk assessment modeling to be completed with less parameter-tuning expertise, and is a practical and effective method when model portability matters and high accuracy is required.
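The one-parameter-at-a-time greedy search described above can be sketched generically. The function `greedy_tune` and the toy scoring function are hypothetical; in practice the score would be the validation AUC of a model trained with the trial parameters, which is not reproduced here.

```python
def greedy_tune(score, grid, start):
    """Tune parameters one by one, fixing each at its best value before moving on."""
    best = dict(start)
    best_score = score(best)
    for name, candidates in grid.items():
        for value in candidates:
            trial = dict(best, **{name: value})
            s = score(trial)
            if s > best_score:
                best, best_score = trial, s
    return best, best_score

def toy_score(p):
    """Toy stand-in for a validation AUC, peaking at max_depth=4, learning_rate=0.1."""
    return -(p["max_depth"] - 4) ** 2 - 100 * (p["learning_rate"] - 0.1) ** 2

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
start = {"max_depth": 6, "learning_rate": 0.3}
best_params, best_value = greedy_tune(toy_score, grid, start)
```

Because each parameter is fixed before the next is examined, the search evaluates the sum rather than the product of the candidate counts, at the cost of possibly missing interactions between parameters.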
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims (9)

1. The traffic accident risk assessment method under the continuous flow road scene is characterized by comprising the following steps of:
S1: acquiring historical road accident data and historical traffic flow information of the road to be evaluated, collecting the accident data occurring within a certain time range and the traffic flow parameters within that time range, taking whether an accident occurs as the dependent variable and the traffic flow information as the independent variables, and constructing a case-control data set; the accident data comprise the accident occurrence time, the accident position and the travel direction (up or down direction) of the accident; the traffic flow parameters comprise the cross-section traffic volume, the average vehicle speed and the lane occupancy of the road section;
S2: dividing the data in the case-control data set into historical sample data and test data, performing self-organizing map (SOM) cluster analysis on the historical sample data, and constructing a risk scene recognition model;
S3: inputting the data set to be tested, determining the risk scene to which each piece of data to be tested belongs, training a base learner model with the historical sample data of that risk scene as the training set, and outputting the prediction result for the data to be tested;
S4: drawing an ROC curve for the prediction results of the test data, evaluating the goodness of fit with the AUC index, and selecting the optimal risk threshold according to the Youden index;
S5: acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scene using the risk scene recognition model determined in step S2, and calculating the risk level on the road to be evaluated in real time using the risk assessment model determined in steps S3 and S4.
2. The traffic accident risk assessment method according to claim 1, wherein in step S1, the process of collecting accident data occurring within a certain time frame and traffic flow parameters within the time frame comprises the following sub-steps:
S101: processing the acquired historical road accident data, and distinguishing upstream from downstream according to the position of the traffic accident;
S102: the historical traffic flow information is acquired by sensors arranged on the lanes; the traffic flow sensor in the downstream direction of the traffic accident location is denoted K and the sensor in the upstream direction is denoted K-1; according to the time of the traffic accident, the traffic flow information recorded by sensors K and K-1 in the periods 5-10 minutes and 10-15 minutes before the accident is obtained, and the traffic flow parameter variable flowdata is written as follows according to cross-section position and time period:
$\text{flowdata} = [f_{1,\mathrm{up}}, v_{1,\mathrm{up}}, o_{1,\mathrm{up}}, f_{2,\mathrm{up}}, v_{2,\mathrm{up}}, o_{2,\mathrm{up}}, f_{1,\mathrm{down}}, v_{1,\mathrm{down}}, o_{1,\mathrm{down}}, f_{2,\mathrm{down}}, v_{2,\mathrm{down}}, o_{2,\mathrm{down}}]$
wherein the variable names f, v and o represent traffic volume, average vehicle speed and lane occupancy respectively, the subscript 1 represents 5-10 minutes before the accident, the subscript 2 represents 10-15 minutes before the accident, the subscript up represents upstream of the accident point, and the subscript down represents downstream of the accident point;
S103: labelling the accident data of step S101 with crash = 1, selecting normal driving data from the same location and the same time of day as the accident data but on different dates and labelling it with crash = 0, and constructing the case-control data set, in which the basic structure of each record is:
$\text{data} = [\text{crash}, \text{flowdata}]$.
3. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 2, wherein in step S1, the historical road accident data includes the time and the location of the occurrence of the traffic accident, the time being accurate to the minute and the location being accurate to the hundred-meter level.
4. The method according to claim 2, wherein in step S1, the ratio of the number of normal driving data to the number of accident data in the case-control data set is 3:1.
5. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S2, the process of constructing a risk scene recognition model by performing self-organizing map cluster analysis using historical sample data comprises the following sub-steps:
S201: extracting historical sample data and test data from the case-control data set according to the preset division ratio, and keeping the ratio of accident data to normal driving data in the historical sample data set and in the test data set consistent with that in the case-control data set;
S202: training a self-organizing map network model using the historical sample data set, and selecting the SOM network side length M according to the number N of samples in the historical sample data set by the following formula:
wherein each side of the SOM network has M neurons, arranged in M rows and M columns for a total of $M^2$ neurons, and each neuron is connected to its neighbouring neurons in a hexagonal pattern to form a honeycomb network structure;
S203: initializing the SOM network by randomly assigning a K-dimensional weight $w_j = [a_k]$ to each neuron, where the number of components k = 1, 2, ..., K is consistent with the number of flowdata independent-variable elements in the historical sample data, and j is the index of the neuron, $j = 1, 2, \dots, M^2$;
S204: for the historical sample data set $X = [x_1, x_2, \dots, x_i, \dots, x_N]$, inputting the samples sequentially for training, computing the Euclidean distance between the i-th historical sample $x_i$ and each neuron weight, taking the neuron with the shortest Euclidean distance as the cluster label of that historical sample, and determining the category $c_i$ of the historical sample data $x_i$ according to the following formula:
$c_i = \arg\min_{j} \lVert x_i - w_j \rVert_2$
where $i = 1, 2, \dots, N$;
S205: updating the SOM network weights:
$w \leftarrow w + \eta\, h(d)\,(x_i - w)$
wherein w is the weight to be updated, η is the learning rate, h(·) is the decay function, and d is the distance between the activated neuron and each of the other neurons;
S206: dividing risk scenes according to the number of samples and the number of accident samples in each SOM neuron, and calculating the accident rate $r_c$ of each neuron:
$r_c = \dfrac{N_{\mathrm{crash}}}{N_{\mathrm{sample}}}$
wherein $N_{\mathrm{sample}}$ is the number of all samples belonging to neuron c and $N_{\mathrm{crash}}$ is the number of accident samples in neuron c;
S207: comparing the accident rate in each clustered neuron with the share of accident data in the historical sample data set; if the accident rate in the neuron exceeds the share of accident data in the historical sample data set, the neuron is determined to be a high-risk scene, otherwise a low-risk scene.
6. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S3, the base learner model adopts any one of a support vector machine model, a decision tree model and an artificial neural network model.
7. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S3, the process of performing the training of the base learner model by using the historical sample data of the risk scene as the training set and outputting the prediction result of the data to be tested comprises the following steps:
S301: selecting an artificial neural network as the base learner and determining the number of neurons in each layer: the input layer has the same number of units as the traffic flow parameter variable flowdata has elements, there is 1 hidden layer with 8 hidden units, and the output layer is a single unit; after each learning pass outputs a prediction, it is compared with the measured value, and the difference is back-propagated as the error to update each synaptic weight;
S302: inputting the data to be tested from the test set and classifying it in the SOM network: computing the Euclidean distance between the data to be tested $x_p$ and each neuron weight, taking the neuron with the shortest Euclidean distance as the cluster label of the sample, and determining the category t of the data to be tested $x_p$;
S303: selecting all data in the historical sample data set that belong to the category t of the data to be tested $x_p$ to construct a training set, training the artificial neural network on this training set, completing the model once the preset accuracy requirement is met, predicting the data to be tested, and outputting the predicted value; after the prediction is completed, the trained artificial neural network is discarded, which completes the just-in-time learning process.
8. The method according to claim 1, wherein in step S4, the process of drawing an ROC curve for the prediction result of the test data, evaluating the goodness of fit using the AUC index, and selecting the optimal risk threshold according to the Youden index comprises the steps of:
S401: calculating the ROC curve of the test set data: calculating the accident risk r for the case-control data set, determining the minimum value $r_{\min}$ and the maximum value $r_{\max}$ of the predicted accident risk, constructing a classification threshold sequence $C = \{c_j\}$ with $r_{\min}$ as the minimum, $r_{\max}$ as the maximum and 0.001 as the step size, and calculating the corresponding confusion matrix for each element of the classification threshold sequence C in turn, specifically as follows:
converting the risk index into a prediction of whether an accident occurs according to the following formula:
$y^* = \begin{cases} 1, & r \ge c_j \\ 0, & r < c_j \end{cases}$
wherein $y^*$ is the prediction of whether an accident occurs and y is the observed outcome; when $y^* = y = 1$, the sample is recorded as a true positive TP; when $y^* = 1$ and $y = 0$, as a false positive FP; when $y^* = 0$ and $y = 1$, as a false negative FN; when $y^* = y = 0$, as a true negative TN; and for each classification threshold $c_j$, the recall $\mathrm{TPR}_j$ and the false positive rate $\mathrm{FPR}_j$ at that threshold are calculated:
$\mathrm{TPR}_j = \dfrac{TP}{TP + FN}, \qquad \mathrm{FPR}_j = \dfrac{FP}{FP + TN}$
taking FPR as the abscissa and TPR as the ordinate, plotting the points in turn in a two-dimensional coordinate system, and calculating the area under the receiver operating characteristic curve, AUC, enclosed by the plotted points, the origin, (1, 1) and (1, 0);
S402: selecting the classification threshold $c_j$ for which $\mathrm{TPR}_j - \mathrm{FPR}_j$ is largest as the final classification threshold $c_y$.
9. The method according to claim 1, wherein in step S5, corresponding traffic flow information data is collected for the road to be predicted, the collected data including the traffic volume, average vehicle speed and lane occupancy of each section;
importing the collected traffic flow information data into the risk scene recognition model to determine the category to which the data belong and thereby identify the risk scene; the base learner is then used for just-in-time learning, the current road risk index is evaluated in real time, and an early warning is issued for scenes exceeding the final classification threshold.
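The threshold sweep of steps S401 and S402 can be sketched as follows. The sketch assumes an accident is predicted when the risk is greater than or equal to the threshold, uses the trapezoidal rule for the area, and presumes both classes are present in the test data; the function names are hypothetical.

```python
def rates_at(risks, labels, c):
    """Recall (TPR) and false positive rate (FPR) when predicting an accident if risk >= c."""
    tp = sum(1 for r, y in zip(risks, labels) if r >= c and y == 1)
    fn = sum(1 for r, y in zip(risks, labels) if r < c and y == 1)
    fp = sum(1 for r, y in zip(risks, labels) if r >= c and y == 0)
    tn = sum(1 for r, y in zip(risks, labels) if r < c and y == 0)
    return tp / (tp + fn), fp / (fp + tn)

def roc_auc_youden(risks, labels, step=0.001):
    """Return (AUC, threshold maximizing TPR - FPR) over a stepped threshold sequence."""
    r_min, r_max = min(risks), max(risks)
    n = int(round((r_max - r_min) / step))
    thresholds = [r_min + j * step for j in range(n + 1)]
    scored = [(c,) + rates_at(risks, labels, c) for c in thresholds]  # (c, TPR, FPR)
    # ROC points as (FPR, TPR), closed with the origin and (1, 1).
    points = sorted({(fpr, tpr) for _, tpr, fpr in scored} | {(0.0, 0.0), (1.0, 1.0)})
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    best_c, _, _ = max(scored, key=lambda s: s[1] - s[2])   # Youden index
    return auc, best_c
```

In real-time use, the risk predicted for newly collected traffic flow data would be compared against the returned threshold to decide whether to issue a warning.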
CN202311103644.9A 2023-08-29 2023-08-29 Traffic accident risk assessment method under continuous flow road scene Pending CN117238126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311103644.9A CN117238126A (en) 2023-08-29 2023-08-29 Traffic accident risk assessment method under continuous flow road scene

Publications (1)

Publication Number Publication Date
CN117238126A true CN117238126A (en) 2023-12-15

Family

ID=89097630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311103644.9A Pending CN117238126A (en) 2023-08-29 2023-08-29 Traffic accident risk assessment method under continuous flow road scene

Country Status (1)

Country Link
CN (1) CN117238126A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015839A (en) * 2024-04-08 2024-05-10 清华大学 Expressway road domain risk prediction method and device
CN118097968A (en) * 2024-04-22 2024-05-28 哈尔滨学院 Road traffic safety assessment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination