CN117238126A

CN117238126A - Traffic accident risk assessment method under continuous flow road scene

Info

Publication number: CN117238126A
Application number: CN202311103644.9A
Authority: CN
Inventors: 陆建; 马潇驰; 车忠兴; 叶凡; 夏萧菡; 霍宗鑫
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2023-08-29
Filing date: 2023-08-29
Publication date: 2023-12-15

Abstract

The invention discloses a traffic accident risk assessment method for continuous flow road scenes, comprising the following steps: constructing a case-control data set and dividing it into historical sample data and test data; performing self-organizing map (SOM) cluster analysis on the historical sample data to construct a risk scene recognition model; inputting the data to be tested, determining the risk scene to which the data belongs, training a base learner with the historical sample data of that risk scene as the training set, and outputting a prediction for the data to be tested; drawing an ROC curve for the predictions on the test data, evaluating goodness of fit with the AUC index, and selecting the optimal risk threshold according to the Youden index; and acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scene, and calculating the risk level on the road in real time. The invention provides accurate, real-time early warning of road traffic accident risk at low cost and improves the operational safety and reliability of the traffic system.

Description

Traffic accident risk assessment method under continuous flow road scene

Technical Field

The invention belongs to the technical field of intelligent traffic, and particularly relates to a traffic accident risk assessment method in a continuous flow road scene.

Background

Continuous-flow traffic facilities include expressways, urban expressways and other high-grade roads. Their defining features are controlled access, exclusive use by motor vehicles, full grade separation and full enclosure. Accidents on such roads harm people and property and reduce transport efficiency. To reduce traffic accidents in continuous-flow scenes and improve operational safety, researchers aim to build a transferable model at low cost, so that short-term monitoring and early warning can be applied to high-grade roads at large scale; this helps road managers issue control measures in time and helps travelers plan routes in advance. Accident causation and safety countermeasures have been studied extensively in China and abroad. Many studies have shown that accident occurrence is closely related to the characteristics of road traffic flow: parameters such as traffic volume, average speed and environmental information describing the traffic flow state before an accident exhibit certain high-risk characteristics, so accident risk can be estimated in advance from road traffic flow information. Nevertheless, traffic accidents remain frequent. The focus of traffic safety management is therefore shifting from post-accident analysis to pre-accident risk assessment. A real-time accident risk assessment system can prompt drivers and supervisors to take the necessary measures to avoid risk in time, which is of great practical significance for reducing traffic accidents and protecting people and property.

At present, some researchers build accident risk assessment models on traffic flow data using econometric models. Such models are simple to operate and easy to interpret, but their assessment accuracy is generally low. Others improve accuracy with complex machine learning models, typified by deep learning. However, deep learning frameworks require specialized knowledge of model training and parameter tuning, and the resulting models transfer poorly: an expert usually has to retune and retrain them before they can be put into use. A new risk assessment model therefore needs to be built with both accuracy and transferability in mind. An ideal risk assessment model should be both accurate and transferable, and it should be possible to build and deploy it without complex expert knowledge.

The invention published as CN106991510A provides a method for predicting urban traffic accidents based on spatio-temporal distribution characteristics, and the invention published as CN110532298B provides a multi-attribute method for analyzing the weights of railway accident causes. However, the former focuses on spatio-temporal distribution characteristics, and its influencing factors involve personal information about the people involved in accidents, so data acquisition is difficult and transferability is insufficient; the latter is designed for railway accidents and cannot be applied to continuous flow road scenes.

Disclosure of Invention

The technical problem to be solved: existing real-time risk assessment models either have low accuracy or generally require a large amount of expert knowledge for model tuning. The invention aims to provide a traffic accident risk assessment method for continuous flow road scenes with which an accident risk assessment module can be built quickly and put into use without complex parameter tuning and model training, whose assessment accuracy reaches the level of current mainstream complex machine learning models, and which can predict road risk in real time.

The technical scheme is as follows:

A traffic accident risk assessment method in a continuous flow road scene, the method comprising the following steps:

S1: acquiring historical road accident data and historical traffic flow information for the road to be evaluated; collecting the accidents that occurred within a certain time range and the traffic flow parameters within that range; taking whether an accident occurred as the dependent variable and the traffic flow information as the independent variables; and constructing a case-control data set; the accident data comprise the time of the accident, the accident location and the direction of travel (up or down) in which it occurred; the traffic flow parameters comprise the cross-section traffic volume, average speed and lane occupancy of the road section;

S2: dividing the data in the case-control data set into historical sample data and test data, performing self-organizing map (SOM) cluster analysis on the historical sample data, and constructing a risk scene recognition model;

S3: inputting the data set to be tested, determining the risk scene to which the data to be tested belongs, training a base learner with the historical sample data of that risk scene as the training set, and outputting a prediction for the data to be tested;

S4: drawing an ROC curve for the predictions on the test data, evaluating goodness of fit with the AUC index, and selecting the optimal risk threshold according to the Youden index;

S5: acquiring traffic flow parameters on the road to be evaluated in real time, determining the risk scene with the risk scene recognition model determined in step S2, and calculating the risk level on the road to be evaluated in real time with the risk assessment model determined in steps S3 and S4.

Further, in step S1, the process of collecting accident data occurring within a certain time range and traffic flow parameters within the time range includes the following sub-steps:

S101: processing the acquired historical road accident data and distinguishing upstream from downstream relative to the location of each traffic accident;

S102: acquiring the historical traffic flow information from sensors installed on the lanes; denoting the traffic flow sensor downstream of the accident location as K and the upstream sensor as K-1; according to the time of the accident, extracting the traffic flow information recorded by sensors K and K-1 during the periods 5-10 minutes and 10-15 minutes before the accident; and denoting the traffic flow parameter vector flowdata, by section position and time period, as:

$$\mathrm{flowdata} = [\,f_{1,\mathrm{up}},\ v_{1,\mathrm{up}},\ o_{1,\mathrm{up}},\ f_{2,\mathrm{up}},\ v_{2,\mathrm{up}},\ o_{2,\mathrm{up}},\ f_{1,\mathrm{down}},\ v_{1,\mathrm{down}},\ o_{1,\mathrm{down}},\ f_{2,\mathrm{down}},\ v_{2,\mathrm{down}},\ o_{2,\mathrm{down}}\,]$$

where f, v and o denote traffic volume, average vehicle speed and lane occupancy respectively; subscript 1 denotes the period 5-10 minutes before the accident and subscript 2 the period 10-15 minutes before it; subscript up denotes upstream of the accident location and subscript down denotes downstream;

S103: labeling the accident records from step S101 with crash = 1, selecting normal-driving records at the same location and the same time of day on different dates and labeling them crash = 0, and constructing the case-control data set, in which each record has the basic structure:

$$\mathrm{data} = [\mathrm{crash},\ \mathrm{flowdata}].$$

Further, in step S1, the historical road accident data include the time and location of each traffic accident, with time accurate to the minute and location accurate to the hundred-meter level.

Further, in step S1, the ratio of the number of normal driving data to the number of accident data in the case control data set is 3:1.
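As an illustration of the record layout in steps S102 and S103, the following minimal sketch assembles the 12-element flowdata vector and prepends the crash label. The dictionary keys, function names and sensor readings are hypothetical; the patent fixes only the ordering of the values.

```python
# Ordering follows the flowdata definition: upstream before downstream,
# window 1 (5-10 min before) before window 2 (10-15 min before),
# and f, v, o within each window.
SIDES = ("up", "down")   # sensor K-1 (upstream), sensor K (downstream)
WINDOWS = (1, 2)


def build_flowdata(readings):
    """readings[(window, side)] -> (f, v, o); returns the 12-element list."""
    flowdata = []
    for side in SIDES:
        for window in WINDOWS:
            f, v, o = readings[(window, side)]
            flowdata.extend([f, v, o])
    return flowdata


def build_record(crash, readings):
    """Return data = [crash, flowdata] as one flat list of 13 values."""
    return [crash] + build_flowdata(readings)


# Made-up sensor readings, for illustration only.
readings = {
    (1, "up"): (120, 85.0, 0.12),
    (2, "up"): (110, 88.0, 0.10),
    (1, "down"): (95, 60.0, 0.25),
    (2, "down"): (100, 72.0, 0.18),
}
record = build_record(1, readings)
```

A normal-driving record is built the same way with crash set to 0.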

Further, in step S2, the process of constructing the risk scene recognition model by performing self-organizing map clustering analysis using the historical sample data includes the following sub-steps:

S201: extracting historical sample data and test data from the case-control data set according to a preset split ratio, keeping the ratio of accident data to normal driving data in both the historical sample data set and the test data set consistent with that in the case-control data set;

S202: training a self-organizing map (SOM) network model on the historical sample data set, and selecting the SOM network side length M according to the number N of samples in the historical sample data set by the following formula:

where the SOM network has M neurons per side, arranged in M rows and M columns for $M^2$ neurons in total, and each neuron is connected to its neighbors in a hexagonal pattern to form a honeycomb network structure;

S203: initializing the SOM network by randomly assigning each neuron a K-dimensional weight vector $w_j = [a_k]$, where K equals the number of independent-variable elements of flowdata in the historical sample data and j is the index of the neuron, $j = 1, 2, \ldots, M^2$;

S204: inputting the samples of the historical sample data set $X = [x_1, x_2, \ldots, x_i, \ldots, x_N]$ one at a time for training; comparing the Euclidean distance between the i-th historical sample $x_i$ and the weight of every neuron; taking the neuron with the shortest Euclidean distance as the cluster label of that sample; and determining the category t of the historical sample $x_i$ according to the following formula:

$$t = \arg\min_{j} \lVert x_i - w_j \rVert_2$$

where $i = 1, 2, \ldots, N$;

S205: updating the SOM network weights:

$$w \leftarrow w + \eta\, h(d)\,(x_i - w)$$

where w is the weight to be updated, $\eta$ is the learning rate, $h(\cdot)$ is the decay function, and d is the distance between the activated neuron and the other neurons;

S206: dividing the risk scenes according to the number of samples and the number of accident samples in each SOM neuron, and calculating the accident rate $r_c$ of each neuron:

$$r_c = \frac{N_{crash}}{N_{sample}}$$

where $N_{sample}$ is the number of samples belonging to neuron c and $N_{crash}$ is the number of accident samples in neuron c;

S207: comparing the accident proportion in each clustered neuron with the proportion of accident data in the historical sample data set; if the accident proportion in the neuron exceeds the proportion of accident data in the historical sample data set, the neuron is a high-risk scene, otherwise it is a low-risk scene.
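The training loop of steps S203 to S205 can be sketched in a few lines of plain Python. This is a simplified illustration, not the patented implementation: it assumes a square lattice instead of the hexagonal one, a Gaussian decay of unit width, a constant learning rate and a fixed number of passes over the data, and all function names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(samples, m, eta=0.5, epochs=20, seed=0):
    """Train an m x m SOM and return its list of m*m weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Neuron coordinates on a square grid (assumption: the patent uses hexagons).
    coords = [(row, col) for row in range(m) for col in range(m)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    for _ in range(epochs):
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                d = math.dist(coords[bmu], coords[j])
                h = math.exp(-d * d / 2.0)  # assumed Gaussian decay h(d)
                for k in range(dim):
                    # w <- w + eta * h(d) * (x_i - w)
                    w[k] += eta * h * (x[k] - w[k])
    return weights
```

Inputs are assumed to be scaled to comparable ranges (for example to [0, 1]) before training, since Euclidean distance is otherwise dominated by the variable with the largest magnitude.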

Further, in step S3, the base learner model adopts any one of a support vector machine model, a decision tree model, and an artificial neural network model.

Further, in step S3, the process of performing the model training of the base learner by using the historical sample data of the risk scene as the training set and outputting the prediction result of the data to be tested includes the following steps:

S301: selecting an artificial neural network as the base learner and determining the number of neurons in each layer: the input layer has as many elements as the traffic flow parameter vector flowdata, and there is one hidden layer with 8 hidden units; after each learning pass the output layer's prediction is compared with the observed value, and the difference is back-propagated as the error to update each synaptic weight;

S302: inputting the data to be tested from the test set and classifying it in the SOM network: comparing the Euclidean distance between the data to be tested $x_p$ and the weight of every neuron, taking the neuron with the shortest Euclidean distance as the cluster label of the sample, and thereby determining the category t of the data to be tested $x_p$;

S303: selecting from the historical sample data set all data belonging to the same category t as the data to be tested $x_p$ to construct the training set; training the artificial neural network on this training set; finalizing the model once the preset accuracy requirement is met; predicting the data to be tested and outputting the predicted value; after the prediction is completed, the trained artificial neural network is discarded, which completes the instant (just-in-time) learning process.
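The instant learning procedure of steps S302 and S303 can be sketched as follows. To keep the example short, the base learner is a stand-in that predicts the accident share of its training set rather than a support vector machine, decision tree or neural network; the class and function names are hypothetical, and the fallback to the whole history when a category is empty is an assumption not stated in the text.

```python
import math


def best_matching_unit(weights, x):
    """Index of the SOM neuron closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


class MeanRateLearner:
    """Stand-in base learner: predicts the accident share of its training set."""

    def fit(self, X, y):
        self.rate = sum(y) / len(y)
        return self

    def predict(self, x):
        return self.rate


def jitl_predict(weights, hist_X, hist_y, x_p, make_learner=MeanRateLearner):
    """Classify x_p, train a fresh learner on its category, predict, discard."""
    t = best_matching_unit(weights, x_p)
    idx = [i for i, x in enumerate(hist_X) if best_matching_unit(weights, x) == t]
    if not idx:  # assumption: use the whole history if the category is empty
        idx = list(range(len(hist_X)))
    learner = make_learner().fit([hist_X[i] for i in idx],
                                 [hist_y[i] for i in idx])
    risk = learner.predict(x_p)
    del learner  # the trained model is not kept after the prediction
    return t, risk
```

Any object with `fit` and `predict` methods of this shape can be passed as `make_learner`, which is how a real base learner would be swapped in.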

Further, in step S4, the process of drawing an ROC curve for the predictions on the test data, evaluating goodness of fit with the AUC index, and selecting the optimal risk threshold according to the Youden index includes the following steps:

S401: calculating the ROC curve of the test set data: calculating the accident risk r for the case-control data set, determining the minimum $r_{min}$ and maximum $r_{max}$ of the predicted accident risk, constructing a sequence of classification thresholds $C = \{c_j\}$ from $r_{min}$ to $r_{max}$ with a step size of 0.001, and calculating the confusion matrix for each element of C in turn, as follows:

converting the risk index into a prediction of whether an accident occurs according to the following formula:

$$y^{*} = \begin{cases} 1, & r > c_j \\ 0, & r \le c_j \end{cases}$$

where $y^{*}$ is the predicted occurrence of an accident and y is the observed occurrence; when $y^{*} = y = 1$ the sample is a true positive (TP); when $y^{*} = 1$ and $y = 0$ it is a false positive (FP); when $y^{*} = 0$ and $y = 1$ it is a false negative (FN); when $y^{*} = y = 0$ it is a true negative (TN); for each classification threshold $c_j$, the recall $TPR_j$ and the false positive rate $FPR_j$ at that threshold are calculated:

$$TPR_j = \frac{TP}{TP + FN}, \qquad FPR_j = \frac{FP}{FP + TN}$$

taking FPR as the abscissa and TPR as the ordinate, plotting the points in turn in a two-dimensional coordinate system, and calculating the area under the receiver operating characteristic (ROC) curve, AUC, that is, the area enclosed by the plotted curve, the origin, (1, 1) and (1, 0);

S402: selecting the classification threshold $c_j$ at which $TPR_j - FPR_j$ is largest as the final classification threshold $c_y$.
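The threshold scan of steps S401 and S402 can be sketched as below. The function names are hypothetical, a sample is predicted as an accident when its risk strictly exceeds the threshold (an assumption about the boundary case), and ties in the Youden index are resolved in favor of the smallest threshold.

```python
def confusion(risks, labels, c):
    """Return (TP, FP, FN, TN) when predicting an accident for risk > c."""
    tp = fp = fn = tn = 0
    for r, y in zip(risks, labels):
        pred = 1 if r > c else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_threshold(risks, labels, step=0.001):
    """Scan thresholds from min to max risk; return (best c, best TPR - FPR)."""
    r_min, r_max = min(risks), max(risks)
    n_steps = int(round((r_max - r_min) / step))
    best_c, best_j = r_min, -1.0
    for i in range(n_steps + 1):
        c = r_min + i * step
        tp, fp, fn, tn = confusion(risks, labels, c)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        if tpr - fpr > best_j:
            best_c, best_j = c, tpr - fpr
    return best_c, best_j


# Made-up risk scores: the two accident samples score highest.
risks = [0.1, 0.2, 0.3, 0.8, 0.9]
labels = [0, 0, 0, 1, 1]
c_y, j_max = youden_threshold(risks, labels)
```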

Further, in step S5, for the road to be predicted, corresponding traffic flow information data is collected, where the collected data includes traffic volume, average vehicle speed and lane occupancy of each section;

importing the collected traffic flow information data into the risk scene recognition model to determine the category to which the data belongs and thereby identify the risk scene; then performing instant learning with the base learner, evaluating the current road risk index in real time, and issuing an early warning for scenes whose risk index exceeds the final classification threshold.

The beneficial effects are that:

The traffic accident risk assessment method for continuous flow road scenes provides accurate, real-time early warning of road traffic accident risk at low cost, provides a theoretical basis for formulating proactive risk prevention and control measures, and improves the operational safety and reliability of the traffic system.

Drawings

FIG. 1 is a flow chart of a traffic accident risk assessment method in a continuous flow road scenario of the present invention;

FIG. 2 is a schematic diagram of data acquisition;

FIG. 3 is a schematic diagram of SOM topology;

FIG. 4 is a flow chart comparing an instant learning strategy with a conventional predictive model;

FIG. 5 is a risk assessment module based on self-organizing map and instant learning strategy.

Detailed Description

The following examples will provide those skilled in the art with a more complete understanding of the invention, but are not intended to limit the invention in any way.

Referring to fig. 1, the invention discloses a traffic accident risk assessment method in a continuous flow road scene, which provides basis for predicting road risk, issuing management and control measures and planning travel routes in real time. The traffic accident risk assessment method comprises the following steps:

Specifically, the step S1 includes the following substeps:

S1.1: the historical road accident data include the time and location of each traffic accident, with time accurate to the minute and location accurate to the hundred-meter level; upstream and downstream are distinguished relative to the accident location;

S1.2: as shown in FIG. 2, the historical traffic flow information can be obtained from sensors installed on the lanes. The traffic flow sensor downstream of the accident location is denoted K and the upstream sensor is denoted K-1. According to the time of the accident, the traffic flow information recorded by sensors K and K-1 during the periods 5-10 minutes and 10-15 minutes before the accident is extracted, including the cross-section traffic volume, average vehicle speed and lane occupancy of the road section, and the variables are denoted:

$$\mathrm{flowdata} = [\,f_{1,\mathrm{up}},\ v_{1,\mathrm{up}},\ o_{1,\mathrm{up}},\ f_{2,\mathrm{up}},\ v_{2,\mathrm{up}},\ o_{2,\mathrm{up}},\ f_{1,\mathrm{down}},\ v_{1,\mathrm{down}},\ o_{1,\mathrm{down}},\ f_{2,\mathrm{down}},\ v_{2,\mathrm{down}},\ o_{2,\mathrm{down}}\,];$$

S1.3: the case-control data set is constructed by labeling the accident records from S1.1 with crash = 1 and selecting normal-driving records at the same location and the same time of day on different dates, labeled crash = 0; the ratio of selected normal-driving records to accident records is 3:1; the historical traffic flow variables are attached to both accident and non-accident records as in S1.2, and each record of the resulting case-control data set has the basic structure:

$$\mathrm{data} = [\mathrm{crash},\ \mathrm{flowdata}].$$
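The control selection in S1.3 can be illustrated with a short sketch that draws three control timestamps for one accident: same clock time, different dates. The function name and arguments are hypothetical, and excluding dates on which another accident occurred at the same site is an added assumption; the text only requires normal-driving data on different dates.

```python
import random
from datetime import date, datetime, time


def pick_controls(crash_time, crash_dates_at_site, candidate_dates,
                  n_controls=3, seed=0):
    """Return n_controls timestamps at the accident's clock time on other dates."""
    rng = random.Random(seed)
    eligible = [d for d in candidate_dates
                if d != crash_time.date() and d not in crash_dates_at_site]
    # random.sample raises ValueError if fewer than n_controls dates are eligible.
    chosen = rng.sample(eligible, n_controls)
    return [datetime.combine(d, crash_time.time()) for d in sorted(chosen)]


# Made-up example: an accident at 08:30 on 15 March 2023.
crash = datetime(2023, 3, 15, 8, 30)
site_crash_dates = {date(2023, 3, 15), date(2023, 3, 20)}
candidates = [date(2023, 3, day) for day in range(1, 29)]
controls = pick_controls(crash, site_crash_dates, candidates)
```

The traffic flow variables for each control timestamp are then extracted from the same pair of sensors K and K-1 as for the accident record.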

Specifically, the step S2 includes the following substeps:

S2.1: historical sample data and test data are extracted from the case-control data set of step S1 at a ratio of 7:3, keeping the ratio of accident data to normal driving data at approximately 1:3 in both the historical sample data set and the test data set;

S2.2: an SOM network model is trained on the historical sample data set; the side length M of the SOM network is selected according to the number N of samples in the historical sample data set by the following formula:

that is, the SOM network has M neurons per side, arranged in M rows and M columns for $M^2$ neurons in total, and each neuron is connected to its neighbors in a hexagonal pattern to form a honeycomb network structure, as shown in FIG. 3;

S2.3: to train the SOM clustering network, the SOM network is first initialized by randomly assigning each neuron a K-dimensional weight vector $w_j = [a_k]$, where K equals the number of independent-variable elements of flowdata in the historical sample data and can be taken as 12;

S2.4: the samples of the historical sample data set $X = [x_1, x_2, \ldots, x_N]$ are input one at a time for training; the Euclidean distance between each historical sample $x_i$ and the weight of every neuron is compared, and the neuron with the shortest Euclidean distance is taken as the cluster label of the sample, that is, the category t of the historical sample $x_i$ is determined by the following formula:

$$t = \arg\min_{j} \lVert x_i - w_j \rVert_2$$

S2.5: after each sample is classified, the SOM network weights are updated by the following formula:

$$w \leftarrow w + \eta\, h(d)\,(x_i - w)$$

where w is the weight to be updated; $\eta$ is the learning rate, which can be set to 0.5; $h(\cdot)$ is the decay function, for which a standard Gaussian function is often used in practice; and d is the distance between the activated neuron and the other neurons, calculated as the Euclidean distance between the coordinates of the neurons in the network;

S2.6: after all samples are classified, the risk scenes are divided according to the number of samples and the number of accident samples in each SOM neuron, and the accident rate of each neuron is calculated:

$$r_c = \frac{N_{crash}}{N_{sample}}$$

where $N_{sample}$ is the number of samples belonging to neuron c and $N_{crash}$ is the number of accident samples in neuron c. Because the ratio of accident data to normal driving data in the historical sample data set is kept at about 1:3, accidents make up about 25% of the samples; a clustered neuron whose accident proportion exceeds 25% is therefore regarded as a high-risk scene, and the others as low-risk scenes.
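The scene labeling in S2.6 reduces to a per-neuron accident rate compared with the overall accident share. A minimal sketch, with a hypothetical function name and made-up cluster assignments:

```python
from collections import Counter


def label_risk_scenes(cluster_ids, crash_labels):
    """Return (baseline accident share, {neuron: (scene, accident rate)})."""
    n_sample = Counter(cluster_ids)
    n_crash = Counter(c for c, y in zip(cluster_ids, crash_labels) if y == 1)
    baseline = sum(crash_labels) / len(crash_labels)
    scenes = {}
    for c, n in n_sample.items():
        r_c = n_crash[c] / n  # accident rate of neuron c
        scenes[c] = ("high" if r_c > baseline else "low", r_c)
    return baseline, scenes


# Made-up assignments: 8 samples spread over two neurons.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
crash_labels = [1, 0, 0, 0, 1, 1, 0, 0]
baseline, scenes = label_risk_scenes(cluster_ids, crash_labels)
```

With a 1:3 case-control ratio the baseline computed here is close to the 25% figure used in the text.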

Specifically, the step S3 includes the following substeps:

S3.1: FIG. 4 shows the basic flow of instant learning. A base learner serving as the classifier is determined first; machine learning methods with few hyperparameters, such as a support vector machine, a CART decision tree or an artificial neural network (ANN), can be selected as the base learner. For the ANN, the number of neurons in each layer is determined: the input layer has 12 elements, matching flowdata; there is one hidden layer with 8 hidden units; and the output layer is a single unit. After each learning pass the predicted output is compared with the observed value, and the difference is back-propagated as the error to update each synaptic weight;

S3.2: the data to be tested from the test set is input and first classified in the SOM network: the Euclidean distance between the data to be tested $x_p$ and the weight of every neuron is compared, and the neuron with the shortest Euclidean distance is taken as the cluster label of the sample, that is, the category t of the data to be tested $x_p$ is determined by the following formula:

$$t = \arg\min_{j} \lVert x_p - w_j \rVert_2$$

S3.3: in this embodiment, the training set consists of all data in the historical sample data set that belong to the same category t as the data to be tested $x_p$; the ANN is trained on this training set, the model is finalized once the preset accuracy requirement is met, the data to be tested $x_p$ is predicted and the predicted value is output; after the prediction is completed, the trained artificial neural network is discarded.
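For readers who want to see the 12-8-1 network of S3.1 concretely, the following is a minimal pure-Python sketch. Sigmoid activations, the weight initialization range, the learning rate and the fixed epoch count are all assumptions; the text specifies only the layer sizes and that the output error is back-propagated.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """12-8-1 feed-forward network trained by per-sample backpropagation."""

    def __init__(self, n_in=12, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, out

    def predict(self, x):
        """Predicted accident risk in [0, 1] for one flowdata vector."""
        return self._forward(x)[1]

    def fit(self, X, y, lr=0.5, epochs=200):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h, out = self._forward(x)
                delta_out = out - target  # prediction minus observed value
                for j in range(len(self.w2)):
                    # Hidden-unit error uses w2[j] before it is updated.
                    delta_h = delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                    self.w2[j] -= lr * delta_out * h[j]
                    self.b1[j] -= lr * delta_h
                    for k in range(len(x)):
                        self.w1[j][k] -= lr * delta_h * x[k]
                self.b2 -= lr * delta_out
        return self
```

In the instant learning setting a fresh `TinyANN` would be fitted on the samples of one SOM category, used for a single prediction, and then discarded.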

Specifically, the step S4 includes the following substeps:

S4.1: the ROC curve of the test set data obtained in step S3 is calculated: the accident risk r of the case-control data set is calculated with the SOM-JITL-ANN model, the minimum $r_{min}$ and maximum $r_{max}$ of the predicted accident risk are determined, a sequence of classification thresholds $C = \{c_j\}$ is constructed from $r_{min}$ to $r_{max}$ with a step size of 0.001, and the confusion matrix is calculated for each element of C in turn, as follows:

$$y^{*} = \begin{cases} 1, & r > c_j \\ 0, & r \le c_j \end{cases}$$

where $y^{*}$ is the predicted occurrence of an accident and y is the observed occurrence; when $y^{*} = y = 1$ the sample is a true positive (TP); when $y^{*} = 1$ and $y = 0$ it is a false positive (FP); when $y^{*} = 0$ and $y = 1$ it is a false negative (FN); when $y^{*} = y = 0$ it is a true negative (TN); for each classification threshold $c_j$, the recall ($TPR_j$) and the false positive rate ($FPR_j$) at that threshold are calculated:

$$TPR_j = \frac{TP}{TP + FN}, \qquad FPR_j = \frac{FP}{FP + TN}$$

Taking FPR as the abscissa and TPR as the ordinate, the points are plotted in turn in a two-dimensional coordinate system, and the area under the receiver operating characteristic (ROC) curve, AUC, is calculated, that is, the area enclosed by the plotted curve, the origin, (1, 1) and (1, 0);

S4.2: the classification threshold $c_j$ at which $TPR_j - FPR_j$ is largest is selected as the final classification threshold $c_y$.
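The area calculation in S4.1 can be sketched with the trapezoidal rule over the plotted (FPR, TPR) points. The function names are hypothetical, the trapezoidal rule is one common way to compute the area and is assumed here, and the data are assumed to contain at least one accident and one non-accident sample.

```python
def roc_points(risks, labels, thresholds):
    """Return one (FPR, TPR) point per threshold, predicting 1 when risk > c."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for c in thresholds:
        tp = sum(1 for r, y in zip(risks, labels) if r > c and y == 1)
        fp = sum(1 for r, y in zip(risks, labels) if r > c and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve, with (0, 0) and (1, 1) added as end points."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up, perfectly separated risk scores.
risks = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
points = roc_points(risks, labels, [0.0, 0.15, 0.5, 0.85, 1.0])
auc = auc_trapezoid(points)
```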

Specifically, the step S5 includes the following substeps:

S5.1: as shown in FIG. 5, for the road to be predicted, the traffic flow information data described in step S1 are collected, including the traffic volume, average vehicle speed and lane occupancy of each section;

S5.2: as shown in FIG. 5, after the data to be tested are input, the SOM network obtained in step S2 first determines the category to which the data belong, thereby identifying the risk scene; the base learner of step S3 then performs instant learning to evaluate the risk index of the current road in real time, and an early warning is issued when the risk index exceeds the final classification threshold $c_y$ obtained in step S4.
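The real-time step S5.2 chains the pieces together: scene identification, risk evaluation and the threshold check. In the sketch below `risk_fn` is a hypothetical callback standing in for the instant-learning base learner, and the function and argument names are illustrative only.

```python
import math


def assess(flowdata, som_weights, high_risk_neurons, risk_fn, c_y):
    """Return (neuron, scene, risk, warn) for one real-time observation."""
    # Scene identification: nearest SOM neuron by Euclidean distance.
    t = min(range(len(som_weights)),
            key=lambda j: math.dist(som_weights[j], flowdata))
    scene = "high" if t in high_risk_neurons else "low"
    # Risk evaluation: delegated to the (hypothetical) base-learner callback.
    risk = risk_fn(t, flowdata)
    # Early warning when the risk index exceeds the final threshold c_y.
    return t, scene, risk, risk > c_y
```

For example, with two neurons at (0, 0) and (1, 1), neuron 1 marked high-risk and a callback returning 0.4, `assess([0.9, 1.1], [[0.0, 0.0], [1.0, 1.0]], {1}, lambda t, x: 0.4, c_y=0.3)` identifies neuron 1 as a high-risk scene and raises a warning.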

Examples of the invention

In order to show the practicability of the continuous flow road scene accident risk assessment method based on self-organizing mapping and instant learning, the following specific embodiments are utilized for further explanation.

Taking a certain expressway as an example, traffic flow and accident data covering 3 months were collected according to step S1. The expressway is 13 km long, and 7 microwave sensors are deployed along it to collect traffic flow data, spaced about 1.6 km apart; every 5 min each sensor records the cross-section traffic volume, average vehicle speed and lane occupancy for that period. Accident data were collected and the time and location of each accident confirmed; 123 accidents were collected in total. For each accident, following step S1.2 of the detailed description, the traffic volume, occupancy and speed recorded 5-10 minutes and 10-15 minutes before the accident by the sensors upstream and downstream of the accident location were taken as feature variables, giving 12 independent variables in total. Following step S1.3, a case-control data set was constructed: for each accident record, 3 non-accident records at the same observation section and the same time of day on different dates were randomly selected and the corresponding feature variables computed. The final case-control data set contains 492 records.

In performance testing using this dataset, all methods applying the JITL strategy use 70% of the data as the historical sample database and 30% of the data as the test data, as described in step S2.1 of the detailed description. For the traditional modeling method, the training set and the testing set are still divided, the data proportion of the two sets is 70% and 30%, and the proportion of accident data and non-accident data in the two sets is still controlled to be about 1:3.
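A stratified 70/30 split of this kind can be sketched as follows. The function name is hypothetical and the records are placeholders in which only the leading crash label matters.

```python
import random


def stratified_split(records, train_frac=0.7, seed=0):
    """Split records (crash label first) so both parts keep the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in records if r[0] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Placeholder records matching the example: 123 accidents, 369 controls.
records = [[1, i] for i in range(123)] + [[0, i] for i in range(369)]
history, test_set = stratified_split(records)
```

With 492 records this yields 344 historical samples and 148 test samples, each with an accident share of 25%.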

Following step S2.2 of the detailed description, an SOM network was constructed. The calculation gives M ∈ (2.65, 4.04), so M = 3 was selected, producing an SOM network with 9 neurons, which was trained as described in steps S2.3 to S2.5. Risk scenes were then identified as in step S2.6: 2 of the 9 neurons have accident sample rates above 25%, at 29.2% and 37.1% respectively, and are identified as high-risk scenes; the remaining neurons are identified as low-risk scenes.

Following step S3.1, an ANN was used as the base learner with the layer sizes determined as follows: the input layer has 12 elements, matching flowdata; there is one hidden layer with 8 hidden units; and the output layer is a single unit. After each learning pass the predicted output is compared with the observed value, and the difference is back-propagated as the error to update each synaptic weight. Following steps S3.2 and S3.3, a set of similar samples based on the SOM clustering result was constructed for each item of data to be tested and its prediction was output; the AUC index was then calculated as in step S4. The SOM-JITL-ANN result was compared with ANN and XGBoost. Using the ANN model directly for risk assessment performs poorly, with an AUC of only 0.665. After the SOM-JITL strategy is applied, performance improves substantially: the AUC reaches 0.830, a 24.8% improvement over the conventional ANN, and exceeds the AUC of 0.759 achieved by XGBoost, a complex machine learning model.
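The 24.8% figure is the relative improvement of the AUC, which can be checked directly from the two reported values:

$$\frac{0.830 - 0.665}{0.665} = \frac{0.165}{0.665} \approx 0.248 = 24.8\%$$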

In the modeling process, SOM-JITL-ANN uses the default parameters of the base learner; only one parameter adjustment is made, to the network structure configuration of the SOM. When the XGBoost algorithm was tested on the data set with default parameters, the AUC on the training set was close to 1 while the AUC on the test set was only 0.6. This large gap between training and test performance shows that XGBoost overfits the training set and that its hyperparameters must be tuned. Tuning used a greedy algorithm: several recommended values were listed for each of 7 parameters (number of decision trees, learning rate, maximum depth, column sampling ratio, L1 regularization weight, L2 regularization weight and minimum loss reduction for a leaf node split); the parameters were tried one at a time, moving on to the next parameter once the current one had been tuned to its best value. The optimized model finally obtained has a test set AUC of 0.759, and the whole process must be repeated if the data set changes. Taken together with the accuracy comparison, the SOM-JITL module greatly simplifies the parameter tuning that complex machine learning models require, allows risk assessment modeling to be completed with little parameter tuning knowledge, and is a practical and effective method when model portability matters and high accuracy is required.
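The one-parameter-at-a-time tuning described for XGBoost can be sketched generically. The `score` callback is hypothetical and stands for whatever validation metric is used (for instance a validation AUC); the parameter names and the toy scoring function below are illustrative and are not XGBoost's.

```python
def greedy_tune(param_grid, defaults, score):
    """Tune parameters one by one, fixing each at its best value in turn."""
    best = dict(defaults)
    best_score = score(best)
    for name, candidates in param_grid.items():
        for value in candidates:
            trial = dict(best, **{name: value})
            s = score(trial)
            if s > best_score:
                best, best_score = trial, s
    return best, best_score


# Toy score with its maximum at a = 3, b = 1 (illustration only).
def toy_score(p):
    return -((p["a"] - 3) ** 2) - (p["b"] - 1) ** 2


grid = {"a": [1, 2, 3, 4], "b": [0, 1, 2]}
best_params, best_value = greedy_tune(grid, {"a": 1, "b": 0}, toy_score)
```

Because each parameter is fixed before the next is tried, the search cost grows with the sum of the candidate counts rather than their product, but interactions between parameters can be missed.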

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.

Claims

1. The traffic accident risk assessment method under the continuous flow road scene is characterized by comprising the following steps of:

2. The traffic accident risk assessment method according to claim 1, wherein in step S1, the process of collecting accident data occurring within a certain time frame and traffic flow parameters within the time frame comprises the following sub-steps:

$$\mathrm{data} = [\mathrm{crash},\ \mathrm{flowdata}].$$

3. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 2, wherein in step S1, the historical road accident data include the time and location of each traffic accident, with time accurate to the minute and location accurate to the hundred-meter level.

4. The method according to claim 2, wherein in step S1, the ratio of the number of normal driving data to the number of accident data in the case-control data set is 3:1.

5. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S2, the process of constructing a risk scene recognition model by performing self-organizing map cluster analysis using historical sample data comprises the following sub-steps:

wherein the SOM network has M neurons per side, arranged in M rows and M columns for $M^2$ neurons in total, and each neuron is connected to its neighbors in a hexagonal pattern to form a honeycomb network structure;

where $i = 1, 2, \ldots, N$;

S205: updating the SOM network weights:

$$w \leftarrow w + \eta\, h(d)\,(x_i - w)$$

where $N_{sample}$ is the number of samples belonging to neuron c and $N_{crash}$ is the number of accident samples in neuron c;

6. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S3, the base learner model adopts any one of a support vector machine model, a decision tree model and an artificial neural network model.

7. The method for evaluating the risk of a traffic accident in a continuous flow road scene according to claim 1, wherein in step S3, the process of performing the training of the base learner model by using the historical sample data of the risk scene as the training set and outputting the prediction result of the data to be tested comprises the following steps:

8. The method according to claim 1, wherein in step S4, the process of drawing an ROC curve for the predictions on the test data, evaluating goodness of fit with the AUC index, and selecting the optimal risk threshold according to the Youden index comprises the steps of:

where $y^{*}$ is the predicted occurrence of an accident; when $y^{*} = y = 1$ the sample is a true positive (TP); when $y^{*} = 1$ and $y = 0$ it is a false positive (FP); when $y^{*} = 0$ and $y = 1$ it is a false negative (FN); when $y^{*} = y = 0$ it is a true negative (TN); for each classification threshold $c_j$, the recall $TPR_j$ and the false positive rate $FPR_j$ at that threshold are calculated:

9. The method according to claim 1, wherein in step S5, corresponding traffic flow information data is collected for the road to be predicted, the collected data including the traffic volume, average vehicle speed and lane occupancy of each section;