CN109034264B

CN109034264B - CSP-CNN model for predicting severity of traffic accident and modeling method thereof

Info

Publication number: CN109034264B
Application number: CN201810930337.0A
Authority: CN
Inventors: 李彤; 郑明�; 朱锐
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2018-08-15
Filing date: 2018-08-15
Publication date: 2021-11-19
Anticipated expiration: 2038-08-15
Also published as: CN109034264A

Abstract

The invention discloses a CSP-CNN model for predicting the severity of a traffic accident and a modeling method thereof. The CSP-CNN model comprises a model input layer, convolutional layers, a fully connected layer and a model output layer. The model input layer receives a set of traffic accident grayscale images converted from traffic accident data; the convolutional layers perform convolution on this input, and the feature vector extracted by the last convolutional layer is passed to the fully connected layer. The fully connected layer flattens the input feature vector into a one-dimensional vector and then applies linear processing; it comprises 3 hidden units and outputs 3 linear processing results to the model output layer. The model output layer sets 3 traffic accident severity levels and predicts the severity of the traffic accident using the Softmax activation function. The invention fully considers the spatio-temporal relationships, the combination relationships and the deeper intrinsic relationships among traffic accident features when predicting the severity of a traffic accident.

Description

CSP-CNN model for predicting severity of traffic accident and modeling method thereof

Technical Field

The invention belongs to the technical field of data mining, and particularly relates to a traffic accident severity prediction model based on deep learning and a modeling method thereof.

Background

Every year, more than 1.25 million people worldwide die in road traffic accidents, and 20 to 50 million people suffer non-fatal injuries, many of whom are left disabled as a result. Road traffic injuries bring huge economic losses to individuals, families and entire countries; road traffic collisions cost most countries about 3% of their gross domestic product.

Accident severity prediction is one of the important steps in accident management and provides emergency personnel with important information for assessing the severity of an accident, assessing the potential impact of an accident, and implementing an effective accident management program. The problem of predicting the severity of a traffic accident can be said to be a major challenge in the field of current intelligent transportation systems, as the task of correctly predicting the severity of a traffic accident will provide an extremely important aid in saving lives in those accidents.

At present, traffic accident severity prediction methods fall into two categories: statistical learning methods and deep learning methods. In recent years, deep learning, a new machine learning approach for interpreting text, images and sound, has been widely used in text, image and speech recognition and has attracted great attention from researchers and industry. As an efficient deep learning technique, the neural network is widely applied to traffic prediction problems owing to its ability to process multidimensional data, its flexibility of implementation, its generality and its strong predictive power.

For traffic accident severity prediction, Mehmet Metin Kunt et al., in "Prediction for Traffic Accident Severity: Comparing the Artificial Neural Network, Genetic Algorithm, Combined Genetic Algorithm and Pattern Search Methods", Transport, 2011, 26(4), 353-366, predicted the severity of freeway traffic accidents from 12 accident-related parameters using a genetic algorithm (GA), pattern search and an artificial neural network (ANN) with a multilayer perceptron (MLP) structure. The models were built on a data set of 1000 traffic accidents that occurred in 2007 on the Tehran-Ghom freeway, and the best-fit model was selected according to the R value, root mean square error (RMSE), mean absolute error (MAE) and sum of squared errors (SSE). The experimental results show that the MLP reached the highest R value, about 0.87, indicating that it provided the best prediction.

Zeng, Q. and Huang, H., in "A Stable and Optimized Neural Network Model for Crash Injury Severity Prediction", Accident Analysis & Prevention, 2014, 73, 351-358, proposed a convex combination (CC) method to train a neural network (NN) model for traffic accident severity prediction quickly and stably, and a modified NN pruning for function approximation (N2PFA) method to optimize the network structure. They compared these with an NN trained by the conventional back-propagation (BP) method and with an ordered logit (OL) model. The results show that the CC method outperforms the BP method in convergence ability and training speed, and that the optimized NN contains far fewer network nodes than the fully connected NN while achieving almost the same classification accuracy. Both NNs fit and predict better than the OL model, which again suggests that neural networks are superior to statistical models in predicting the severity of traffic accidents.

Sameen et al., in "Severity Prediction of Traffic Accidents with Recurrent Neural Networks", Applied Sciences, 2017, 7(6), analyzed 1130 traffic accidents that occurred on the North-South Expressway in Malaysia from 2009 to 2015 with a long short-term memory recurrent neural network (LSTM-RNN) and applied it to the prediction of traffic accident severity. Their experiments show that the LSTM-RNN model outperforms the MLP and Bayesian logistic regression (BLR) models: its validation accuracy is 71.77%, whereas the MLP and BLR models reach 65.48% and 58.30%, respectively.

CNN, a fast and effective feedforward neural network, has become a research hotspot in many scientific fields and is applied with remarkable results in computer vision, image recognition and speech recognition. CNN has the following characteristics in feature extraction: first, the convolutional layers are locally connected rather than fully connected, meaning that each output neuron is connected only to locally adjacent input neurons; second, the pooling layer, another layer type in CNN, keeps only the salient features from each receptive field, which greatly reduces the number of model parameters; third, fully connected layers are used only at the last stage of the CNN. The factors influencing the severity of a traffic accident mainly comprise the following five kinds of features: road surface features, accident features, vehicle features, driver features and environmental factors. However, the work cited above has not examined in detail the spatial relationships, the combination relationships and the deeper intrinsic relationships among these features that affect the casualty severity of traffic accidents.

Disclosure of Invention

The invention aims to provide a traffic accident severity prediction model and a modeling method thereof, which are used for converting a traffic accident data set into a gray image form according to the importance of traffic accident characteristics, constructing a traffic accident severity prediction CSP-CNN model based on deep learning, extracting the space, combination and deeper internal relation among the traffic accident casualty severity characteristics and predicting the traffic accident casualty severity.

The invention adopts the technical scheme that a CSP-CNN model for predicting the severity of a traffic accident consists of the following four parts: a model input layer, a convolution layer, a full connection layer and a model output layer;

the model input layer is used for inputting a traffic accident data gray level image set and providing input for the convolutional layer;

the convolution layer is used for extracting abstract characteristics of the traffic accident data set from the input gray images of the traffic accident data set;

the full-connection layer is used for converting the characteristic vector of the traffic accident data set extracted and learned by the last convolutional layer into a one-dimensional vector, performing linear processing based on the one-dimensional vector and outputting a linear processing result;

the model output layer is used for predicting the severity of the traffic accident by utilizing a Softmax activation function on the output of the full connection layer;

there are 4 convolutional layers, each with 256 filters, a convolution kernel size of 3, a stride of 1 and a zero-padding parameter (pad) of 1;

the fully connected layer comprises 1 flatten layer and 128 hidden units;

the model output layer, namely the Softmax fully connected layer, comprises 3 hidden units.
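As a rough check of the layer sizes listed above, the following sketch tallies output shapes and trainable parameter counts. It assumes a 5x5 single-channel input (the matrix size stated in the detailed description); the function names `conv_out` and `csp_cnn_summary` are illustrative and not part of the patent.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def csp_cnn_summary(side=5, in_channels=1, n_conv=4, filters=256,
                    kernel=3, dense_units=128, n_classes=3):
    """Return a list of (layer name, output shape, parameter count)."""
    layers = []
    channels = in_channels
    for i in range(1, n_conv + 1):
        side = conv_out(side, kernel)
        # Each filter has kernel*kernel*channels weights plus one bias.
        params = (kernel * kernel * channels + 1) * filters
        channels = filters
        layers.append((f"conv{i}", (side, side, channels), params))
    flat = side * side * channels
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense", (dense_units,), flat * dense_units + dense_units))
    layers.append(("softmax", (n_classes,), dense_units * n_classes + n_classes))
    return layers


summary = csp_cnn_summary()
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:8s} {str(shape):16s} {params}")
print("total", total_params)
```

Under these assumptions the spatial size stays 5x5 through all four convolutional layers, and most parameters sit in the second to fourth convolutional layers and the 128-unit dense layer.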

The modeling method of the CSP-CNN model for predicting the severity of the traffic accident comprises the following specific steps:

Step one: converting the traffic accident data into a set of traffic accident grayscale images based on the importance of the traffic accident features, and inputting the image set into the model input layer, wherein the input of the traffic accident severity prediction model CSP-CNN is expressed mathematically as:

$$x_d = \begin{bmatrix} p_{11} & \cdots & p_{1M} \\ \vdots & \ddots & \vdots \\ p_{M1} & \cdots & p_{MM} \end{bmatrix}, \quad d \in [1, N], \quad M = \max(PC, CC); \tag{1}$$

wherein d is the index of a record in the traffic accident data set x, N is the total number of records in x, PC is the number of parent features of x, CC is the maximum number of child features under any parent feature of x, max(PC, CC) is the larger of PC and CC, and p_{MM} is the element in row M and column M of the grayscale image pixel matrix x_d;

Step two: convolution calculation: the convolution is performed on the input provided by the model input layer using the activation function ReLU:

g(h)＝max(0,h)； (2)

wherein h is the input of the convolutional neuron;

the convolution calculation formula is:

$$a_{k,l} = g\left(\sum_{c=1}^{C}\sum_{e=1}^{F}\sum_{f=1}^{F} w_{c,e,f}\, p_{c,k+e,l+f} + w_b\right); \tag{3}$$

wherein: a_{k,l} is the element in row k and column l of the convolutional layer's Feature Map; e and f range over [1, F]; C is the number of channels, which is the same as the number of filters of the convolutional layer; F is the size of the filter, whose width and height are the same; w_{c,e,f} is the weight in row e and column f of the filter for channel c; p_{c,k,l} is the pixel in row k and column l of channel c of the input image; p_{c,k+e,l+f} is the pixel in row (k+e) and column (l+f) of channel c of the input image; w_b is the bias term of the filter, randomly initialized each time the model runs; and the argument of g, namely the weighted sum plus the bias term, is the input h of the convolutional neuron;
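The convolution with ReLU described above can be sketched in NumPy as follows. This is a minimal illustration for a single filter with stride 1, not the patented implementation; the function names and the test filter are assumptions, and indices are 0-based here whereas the text uses 1-based indices.

```python
import numpy as np


def relu(h):
    """Formula (2): g(h) = max(0, h)."""
    return np.maximum(0.0, h)


def conv_feature_map(p, w, w_b, pad=1):
    """Compute one feature map from input p (C, H, W) and filter w (C, F, F).

    Stride is fixed at 1; zero padding of width `pad` is applied on each side.
    """
    C, H, W = p.shape
    F = w.shape[1]
    padded = np.pad(p, ((0, 0), (pad, pad), (pad, pad)))
    out_h = H + 2 * pad - F + 1
    out_w = W + 2 * pad - F + 1
    a = np.zeros((out_h, out_w))
    for k in range(out_h):
        for l in range(out_w):
            # Weighted sum over channels and the F x F window, plus the bias.
            window = padded[:, k:k + F, l:l + F]
            a[k, l] = relu(np.sum(w * window) + w_b)
    return a


image = np.arange(25, dtype=float).reshape(1, 5, 5)
centre_filter = np.zeros((1, 3, 3))
centre_filter[0, 1, 1] = 1.0  # picks out the centre pixel of each window
feature_map = conv_feature_map(image, centre_filter, w_b=0.0)
```

With kernel size 3, stride 1 and pad 1, the feature map keeps the 5x5 size of the input, and a strongly negative bias drives every output to 0 through the ReLU.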

Step three: fully connected layer calculation: the feature vector extracted and learned by the last convolutional layer is converted into a one-dimensional vector by a flatten operation, using the following formula, and serves as the input of the fully connected layer:

a^flatten＝flatten([a₁,a₂,...,a_c]),c∈[1,C]； (4)

wherein a^flatten is the converted one-dimensional vector, namely the Feature Map of the fully connected layer after flattening; [a₁, a₂, …, a_C] are the feature vectors [Feature Map 1, Feature Map 2, …, Feature Map C] extracted and learned by the last convolutional layer;

the fully connected layer applies linear processing to a^flatten, its linear output being

$$w_{fl} \cdot a^{flatten} + b_{fl}; \tag{5}$$

wherein w_fl is the weight of the fully connected layer and b_fl is the bias term of the fully connected layer;
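A minimal NumPy sketch of the flatten and linear steps is shown below. It assumes 256 feature maps of size 5x5 and 128 hidden units, as stated for the layers earlier; the random values merely stand in for learned feature maps and weights, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the C feature maps of the last convolutional layer, shape (C, H, W).
feature_maps = rng.normal(size=(256, 5, 5))

# Formula (4): flatten into a one-dimensional vector.
a_flatten = feature_maps.reshape(-1)

# Linear processing with weight w_fl and bias term b_fl.
n_units = 128
w_fl = rng.normal(scale=0.01, size=(n_units, a_flatten.size))
b_fl = np.zeros(n_units)
linear_out = w_fl @ a_flatten + b_fl
```

The flattened vector has 256 x 5 x 5 = 6400 entries, so the weight matrix of the 128-unit layer has shape (128, 6400).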

Step four: predicting the severity of the traffic accident: the severity level of a traffic accident is set to one of three classes, namely slight, serious or fatal; the model output layer applies the Softmax activation function to the linear output of the fully connected layer and outputs a probability value for each severity level; the level with the largest probability value is the predicted severity of the traffic accident;
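The Softmax prediction step can be illustrated with the short sketch below. The label strings and the example input values are assumptions chosen for illustration; only the three-class structure and the use of the largest probability come from the text.

```python
import numpy as np

SEVERITY_LEVELS = ("slight", "serious", "fatal")  # illustrative label names


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / exp.sum()


def predict_severity(z):
    """Return (label, probabilities) for a vector of 3 linear outputs."""
    probs = softmax(np.asarray(z, dtype=float))
    return SEVERITY_LEVELS[int(np.argmax(probs))], probs


label, probs = predict_severity([0.2, 1.5, -0.3])
```

Subtracting the maximum before exponentiation does not change the result but avoids overflow for large inputs.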

Step five: training the CSP-CNN model for predicting the severity of traffic accidents and determining its combination of hyperparameters.

In step one, the traffic accident data is converted into the set of traffic accident grayscale images based on the importance of the traffic accident features, as follows:

step 1: acquiring the feature matrix FM of the preprocessed traffic accident data set;

step 2: allocating k threads according to the total number of records in the original traffic accident data set, each thread converting the corresponding feature vector FV in the feature matrix FM into a grayscale image;

step 3: storing the grayscale images obtained by the threads in a grayscale image linked list, and returning the list.
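The three steps above can be sketched with Python's standard thread pool. Note the swapped-in choices: a bounded pool replaces the one-thread-per-record allocation described in the text, and a Python list replaces the linked list. The function `convert_all` and the toy converter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all(feature_matrix, convert_one, max_workers=4):
    """Convert every feature vector to an image, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_one, feature_matrix))


def toy_convert(fv):
    """Hypothetical stand-in converter: wraps a vector as a 1-row 'image'."""
    return [list(fv)]


images = convert_all([(1, 2), (3, 4), (5, 6)], toy_convert)
```

`pool.map` returns results in the order of the inputs, so the image at position d corresponds to record d regardless of which thread finished first.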

The steps of preprocessing the traffic accident data set in the step 1 are as follows:

(1) deleting incomplete, wrong and repeated traffic accident data, and deleting sub-characteristics influencing the severity of casualties of traffic accidents;

(2) normalizing the traffic accident data set to remove the units of the data and convert them into dimensionless pure values: the traffic accident data set x is normalized with the statistical Z-score normalization method so that each feature has zero mean and unit standard deviation, wherein the conversion function of Z-score normalization is:

$$x^* = \frac{x - u}{\sigma}; \tag{6}$$

wherein x is a data value under a single feature, x^* is its normalized value, u is the mean of all data under that feature, and σ is the standard deviation of all data under that feature; each feature in the traffic accident data set x is calculated separately in turn.
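A minimal sketch of per-feature Z-score normalization follows. It assumes the population standard deviation and leaves constant features at 0 to avoid division by zero; neither detail is specified in the text.

```python
import numpy as np


def z_score(data):
    """Normalize each column (feature) to zero mean and unit standard deviation."""
    data = np.asarray(data, dtype=float)
    u = data.mean(axis=0)
    sigma = data.std(axis=0)
    # Constant columns have sigma == 0; divide by 1 so they become all zeros.
    safe_sigma = np.where(sigma == 0, 1.0, sigma)
    return (data - u) / safe_sigma


raw = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
normalised = z_score(raw)
```

Each column is processed independently, matching the statement that every feature is calculated separately in turn.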

The step 1 of obtaining the feature matrix FM of the traffic accident data set comprises the following specific steps:

step 1.1. determining all parent features fp of a certain data in the original traffic accident data set according to whether the parent features fp are related to traffic accident severity prediction:

fp＝{fp₁，…，fp_m}； (7)

wherein m represents the number of parent features of a piece of data in the original traffic accident data set;

step 1.2, acquiring all sub-features fc of a record in the original traffic accident data set, as confirmed by data preprocessing:

$$fc = \{fc_{1,1}, \ldots, fc_{i,j}\}; \tag{8}$$

wherein i ∈ [1, m], j ∈ [1, n], and fc_{i,j} is the j-th sub-feature of a record in the original traffic accident data set, whose parent feature is fp_i; regarding each parent feature as the set of its sub-features, the following conditions are satisfied:

$$fp_i \cap fp_j = \varnothing \quad \text{and} \quad \bigcup_{i=1}^{m} fp_i = fc,$$

wherein i ≠ j, i.e. each child feature belongs to one and only one parent feature; the number of child features of the i-th parent feature is denoted Np_i＝|fp_i|;

Step 1.3, determining importance weight vectors wc of all sub-features of a certain data in an original traffic accident data set:

wc＝(w_1,1，…，w_i,j)； (9)

wherein, w_i，jRepresenting the importance weight of the jth sub-feature of a piece of data in the original traffic accident data set, wherein the sub-feature belongs to the parent feature fp_i；

Step 1.4, determining a feature vector FV of a certain data in the traffic accident data set, wherein the feature vector FV is an expression form of a certain data feature in the traffic accident data set, and is a triple:

FV＝<fp,fc,wc>； (10)

step 1.5, determining a feature matrix FM of the traffic accident data set, wherein the feature matrix FM is an expression form of all data features of the traffic accident data set and is a set of feature vectors:

FM＝{FV₁, ..., FV_k} and FM ∈ R^{k×n}； (11)

wherein k is the total number of records in the original traffic accident data set, n is the number of sub-features of a record in the original traffic accident data set, and R^{k×n} denotes a k × n matrix.

In step 2, the feature vector FV of a record in the traffic accident data set is converted into a grayscale image by the following steps:

step 2.1, classifying the features of the original traffic accident data set: according to the feature vector FV of a record, its n sub-features are assigned to the corresponding m parent features, and the importance weight vector wc of all sub-features fc is initialized at the same time;

step 2.2, searching all parent features fp for the parent feature with the most child features, and returning its number of child features;

step 2.3, comparing the returned number of child features with m and defining the larger of the two as max_dim, then initializing an all-zero matrix Mat^{max_dim×max_dim} as the final storage unit for the traffic accident data;

step 2.4, filling the all-zero matrix Mat^{max_dim×max_dim} according to the original traffic accident data set;

step 2.5, calling the reshape function of the image processing library to add a channel to the matrix Mat^{max_dim×max_dim}, converting it into a grayscale image.
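Steps 2.2, 2.3 and 2.5 can be sketched as follows, using the grouping of 12 sub-features under 5 parent features given in this description. NumPy's `reshape` is used here as an assumed stand-in for the reshape function mentioned in step 2.5, and the dictionary keys are shortened labels.

```python
import numpy as np

# Number of sub-features under each parent feature (m = 5 parents, n = 12 children).
children_per_parent = {"accident": 5, "road surface": 1, "environment": 2,
                       "vehicle": 1, "driver": 3}

m = len(children_per_parent)
max_children = max(children_per_parent.values())  # step 2.2
max_dim = max(m, max_children)                    # step 2.3

mat = np.zeros((max_dim, max_dim))                # all-zero storage matrix
gray_image = mat.reshape(max_dim, max_dim, 1)     # step 2.5: add one channel
```

Because both the number of parent features and the largest group size equal 5, the storage matrix is 5x5 and the resulting image has shape (5, 5, 1).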

In step 2.4, the all-zero matrix Mat^{max_dim×max_dim} is filled according to the original traffic accident data set as follows:

step 2.4.1, sorting the parent features in descending order: all parent features fp are sorted in descending order of their weights wp, where the weight wp_i of a parent feature equals the sum of the importance weights w_{i,j} of all its sub-features, i.e. wp_i＝Σ_j w_{i,j};

step 2.4.2, row filling of Mat^{max_dim×max_dim}: the parent features fp, sorted in descending order of weight wp, are filled into the rows starting from the middle row and extending toward the top and bottom in decreasing order, on the principle that an upper row holds a larger weight than a lower row;

step 2.4.3, sorting the sub-features in descending order: the sub-features under each parent feature are sorted in descending order of their importance weights w_{i,j};

step 2.4.4, column filling of Mat^{max_dim×max_dim}: within the row of each parent feature, its sub-features, sorted in descending order, are filled into the columns on the principle that the left holds a larger weight than the right;

step 2.4.5, keeping all remaining cells of Mat^{max_dim×max_dim} unchanged at 0, which gives the final result matrix.
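The two sorting steps (2.4.1 and 2.4.3) can be illustrated with a small sketch. The feature names and weights below are hypothetical values chosen only to show the ordering logic.

```python
# Hypothetical importance weights, keyed by parent feature then sub-feature.
weights = {
    "driver": {"age": 0.10, "sex": 0.05},
    "vehicle": {"type": 0.13},
    "accident": {"time": 0.05, "easting": 0.17, "northing": 0.16},
}

# Step 2.4.1: the parent weight wp_i is the sum of its sub-feature weights.
parent_weight = {p: sum(children.values()) for p, children in weights.items()}
parents_desc = sorted(parent_weight, key=parent_weight.get, reverse=True)

# Step 2.4.3: sub-features of each parent in descending order of weight.
children_desc = {
    p: sorted(children, key=children.get, reverse=True)
    for p, children in weights.items()
}
```

Here the "accident" parent ranks first because its summed weight (0.38) exceeds those of "driver" (0.15) and "vehicle" (0.13).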

The number m of parent features of a record in the original traffic accident data set is 5, and the number n of sub-features of a record is 12:

fp = {accident features₁, road surface features₂, environmental factors₃, vehicle features₄, driver features₅};

fc = {easting position_{1,1}, northing position_{1,2}, 1st road class_{1,3}, accident time_{1,4}, number of vehicles involved_{1,5}, road surface condition_{2,6}, lighting conditions_{3,7}, weather conditions_{3,8}, type of vehicle_{4,9}, casualty class_{5,10}, sex of casualty_{5,11}, age of casualty_{5,12}}.

In step 1.3, the importance weight vector wc of all sub-features of a record in the original traffic accident data set is obtained by running 1000 iterations of the XGBoost method on the 12 sub-features of the traffic accident data set.

The hyperparameter combination of the CSP-CNN model in step five is as follows: the batch size is 128, the loss function is categorical cross-entropy, the optimizer is Adagrad (an adaptive gradient descent optimizer), the learning rate is 0.001, the epsilon term is 1e-07, and the convolution kernels are initialized with the Glorot normal initialization method (glorot normal).
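The listed settings can be collected in a configuration mapping, shown below together with a NumPy sketch of the loss. This reads the stated loss as categorical cross-entropy and the initializer as Glorot normal; the dictionary keys, the function name and the clipping constant are assumptions, not taken from the patent.

```python
import numpy as np

# Hyperparameters as listed in the text; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 128,
    "loss": "categorical cross-entropy",
    "optimizer": "Adagrad",
    "learning_rate": 0.001,
    "epsilon": 1e-07,
    "kernel_initializer": "Glorot normal",
}


def categorical_cross_entropy(y_true, y_pred, clip=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities.

    `clip` is an assumed safeguard against log(0), unrelated to the optimizer epsilon.
    """
    y_pred = np.clip(y_pred, clip, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
loss = categorical_cross_entropy(y_true, y_pred)
```

For one-hot labels only the probability assigned to the true class contributes, so the example loss is the mean of -ln(0.7) and -ln(0.6).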

The invention has the beneficial effects that:

(1) a method for converting traffic accident data into a grayscale image set based on the importance of traffic accident features is proposed for the first time;

(2) traffic accident data is, for the first time, converted into a grayscale image set and used as the input of a deep learning model to predict the severity of traffic accidents;

(3) with the method for converting traffic accident data into a grayscale image set based on the importance of traffic accident features and with the proposed CSP-CNN model, the spatio-temporal relationships, the combination relationships and the deeper intrinsic relationships among traffic accident features are fully considered when predicting the severity of traffic accidents.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic structural diagram of a CSP-CNN model for predicting the severity of a traffic accident;

FIG. 2 is a graph of casualty severity in the traffic accident data set for Leeds, UK, 2009-2016;

FIG. 3 is a traffic accident data set sub-feature importance distribution;

FIG. 4 is a schematic diagram illustrating a process of converting a feature vector of a piece of data in a traffic accident data set into a gray image;

FIG. 5 is the accuracy of the CSP-CNN model at different depths;

FIG. 6 shows the accuracy of different model experiments;

FIG. 7 is a partially aware traffic accident data set;

FIG. 8(a) is a plot of accuracy, recall and F1 Score for different model predictions under a light traffic accident test set;

FIG. 8(b) is a plot of accuracy, recall and F1 Score for different model predictions under a severe traffic accident test set;

FIG. 8(c) is a plot of accuracy, recall and F1 Score for different model predictions under the fatal traffic accident test set.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To predict the casualty severity of a traffic accident, a traffic accident data set with feature information must be considered comprehensively. The factors known to influence the severity of a traffic accident mainly comprise the following five parent features: road surface features, accident features, vehicle features, driver features and environmental factors. The method for converting the traffic accident data set into grayscale image form and the CSP-CNN model are elaborated below on the basis of these five parent features.

The method for converting the traffic accident data into a grayscale image set based on the importance of the traffic accident features, using the feature vector FV of each record in the traffic accident data set, comprises the following steps:

step one: acquiring the feature matrix FM of the traffic accident data set;

step two: allocating k threads according to the total number of records in the original traffic accident data set, each thread converting the corresponding feature vector FV in the feature matrix FM into a grayscale image;

step three: storing the grayscale images obtained by the threads in a grayscale image linked list, and returning the list.

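The feature matrix FM of the traffic accident data set is obtained by the following steps:

step 1.1. determining all parent features fp of a record in the original traffic accident data set:

fp＝{fp₁，…，fp_m}； (7)

step 1.2. determining all sub-features fc of a record in the original traffic accident data set according to whether the sub-features are relevant to traffic accident severity prediction:

$$fc = \{fc_{1,1}, \ldots, fc_{i,j}\}; \tag{8}$$

wherein i ∈ [1, m], j ∈ [1, n], and fc_{i,j} is the j-th sub-feature of a record in the original traffic accident data set, whose parent feature is fp_i; regarding each parent feature as the set of its sub-features, the following conditions are satisfied:

$$fp_i \cap fp_j = \varnothing \quad \text{and} \quad \bigcup_{i=1}^{m} fp_i = fc, \quad i \neq j;$$

step 1.3. determining the importance weight vector wc of all sub-features of a record:

wc＝(w_1,1，…，w_i,j)； (9)

step 1.4. determining the feature vector FV of a record, which is a triple:

FV＝<fp,fc,wc>； (10)

step 1.5. determining the feature matrix FM of the traffic accident data set, which is the set of feature vectors:

FM＝{FV₁, ..., FV_k} and FM ∈ R^{k×n}. (11)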

The feature vector FV of a record in the traffic accident data set is converted into a grayscale image by the following steps:

step 1, classifying the features of the original traffic accident data set: according to the feature vector FV of a record, its n sub-features are assigned to the corresponding m parent features, and the importance weight vector wc of all sub-features fc is initialized at the same time;

step 2, searching all parent features fp for the parent feature with the most child features, and returning its number of child features;

step 3, comparing the returned number of child features with m and defining the larger of the two as max_dim, then initializing an all-zero matrix Mat^{max_dim×max_dim} as the final storage unit for the traffic accident data;

step 4, filling the all-zero matrix Mat^{max_dim×max_dim} according to the original traffic accident data set;

step 5, calling the reshape function of the image processing library to add a channel to the matrix Mat^{max_dim×max_dim}, converting it into a grayscale image.

The all-zero matrix Mat^{max_dim×max_dim} is filled according to the original traffic accident data set as follows:

step 4.1, sorting the parent features in descending order: all parent features fp are sorted in descending order of their weights wp, where the weight wp_i of a parent feature equals the sum of the importance weights w_{i,j} of all its sub-features, i.e. wp_i＝Σ_j w_{i,j};

step 4.2, row filling of Mat^{max_dim×max_dim}: the parent features fp, sorted in descending order of weight wp, are filled into the rows starting from the middle row and extending toward the top and bottom in decreasing order, on the principle that an upper row holds a larger weight than a lower row. For example, if the number of rows of Mat^{max_dim×max_dim} is odd, the parent feature with the largest weight is placed in the middle row, the one with the second largest weight in the row directly above the middle row, and the one with the third largest weight in the second row above the middle row; once the rows above the middle row are filled, filling continues in decreasing order from the row directly below the middle row. If the number of rows of Mat^{max_dim×max_dim} is even, the parent feature with the largest weight is placed in the upper of the two middle rows, and the one with the second largest weight in the lower of the two middle rows;

step 4.3, sorting the sub-features in descending order: since the sub-features contained in each parent feature are unordered, after the above steps the sub-features under each parent feature must also be sorted in descending order of their importance weights w_{i,j};

step 4.4, column filling of Mat^{max_dim×max_dim}: within the row of each parent feature, its sub-features, sorted in descending order, are filled into the columns on the principle that the left holds a larger weight than the right. For example, if the parent feature with the second largest weight has 3 sub-features and is placed in the second row of Mat^{max_dim×max_dim}, the three sub-features are placed, in descending order of weight, in cells (2, 3), (2, 2) and (2, 4) of the matrix, respectively;

step 4.5, keeping all remaining cells of Mat^{max_dim×max_dim} unchanged at 0, which gives the final result matrix.

In the feature vector FV of a record in the traffic accident data set, m is 5 and n is 12:

fc = {easting position_{1,1}, northing position_{1,2}, 1st road class_{1,3}, accident time_{1,4}, number of vehicles involved_{1,5}, road surface condition_{2,6}, lighting conditions_{3,7}, weather conditions_{3,8}, type of vehicle_{4,9}, casualty class_{5,10}, sex of casualty_{5,11}, age of casualty_{5,12}};

wc＝(0.165774538_1,1，0.171530785_1,2，0.082228259_1,3，0.047771472_1,4，0.060763375_1,5，0.048847406_2,6，0.041826936_3,7，0.04354843_3,8，0.126314657_4,9，0.067057589_5,10，0.049116389_5,11，0.095220163_5,12)。
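With the weights above, the filling procedure can be sketched end to end. This is one reading of the description for an odd matrix size, not a definitive implementation: rows follow the stated odd-row rule (middle row, then upward, then downward), while columns follow the pattern of the worked example with cells (2, 3), (2, 2), (2, 4), i.e. middle column, then alternately left and right. The function names are illustrative, and each sub-feature's own index j is used as its cell value purely to make the layout visible.

```python
import numpy as np

# (parent index i, sub-feature index j) -> importance weight, as listed above.
WC = {
    (1, 1): 0.165774538, (1, 2): 0.171530785, (1, 3): 0.082228259,
    (1, 4): 0.047771472, (1, 5): 0.060763375, (2, 6): 0.048847406,
    (3, 7): 0.041826936, (3, 8): 0.04354843, (4, 9): 0.126314657,
    (5, 10): 0.067057589, (5, 11): 0.049116389, (5, 12): 0.095220163,
}


def row_order(n):
    """Middle row, then upward to the top, then downward (odd n, 0-based)."""
    mid = n // 2
    return [mid] + list(range(mid - 1, -1, -1)) + list(range(mid + 1, n))


def column_order(n):
    """Middle column, then alternately left and right (odd n, 0-based)."""
    mid = n // 2
    order = [mid]
    for step in range(1, mid + 1):
        order += [mid - step, mid + step]
    return order


def to_gray_matrix(values, wc=WC):
    """Place the value of each sub-feature j into a max_dim x max_dim matrix."""
    children = {}
    for (i, j), w in wc.items():
        children.setdefault(i, []).append((w, j))
    max_dim = max(len(children), max(len(c) for c in children.values()))
    if max_dim % 2 == 0:
        raise ValueError("this sketch only covers an odd matrix size")
    mat = np.zeros((max_dim, max_dim))
    # Parents in descending order of summed weight, mapped onto row_order.
    parents = sorted(children, key=lambda i: sum(w for w, _ in children[i]),
                     reverse=True)
    for row, i in zip(row_order(max_dim), parents):
        ranked = sorted(children[i], reverse=True)  # descending weight
        for col, (_, j) in zip(column_order(max_dim), ranked):
            mat[row, col] = values[j]
    return mat


# Use each sub-feature's own index j as its value to make the layout visible.
layout = to_gray_matrix({j: j for _, j in WC})
gray_image = layout.reshape(5, 5, 1)
```

In an actual conversion, `values` would hold the normalized data of one accident record; under this reading, the driver features (the parent with the second largest summed weight) land in the second row, with sub-feature 12 in the middle cell, which agrees with the cell pattern of the worked example.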

Establishing a CSP-CNN model for predicting the severity of traffic accidents

Because CNN is particularly effective at extracting key image features, it shows strong learning ability in image recognition and understanding. Compared with other deep learning models, CNN has two distinctive properties: local connectivity and weight sharing. Local connectivity means that each neuron is connected only to a small block of input neurons, called its receptive field; weight sharing means that the filter used by the neurons to extract image features is shared across positions. Together, these two properties give CNN fewer parameters than other deep learning models.
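To make the parameter saving concrete, consider, purely as an illustration, a 5x5 single-channel input mapped to a 5x5 output. One shared 3x3 filter with a bias, compared with a fully connected mapping from the 25 inputs to 25 outputs, requires:

$$\underbrace{3 \cdot 3 \cdot 1 + 1}_{\text{shared filter}} = 10 \quad \text{versus} \quad \underbrace{25 \cdot 25 + 25}_{\text{fully connected}} = 650 \ \text{parameters.}$$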

To suit the traffic setting, a CSP-CNN model is proposed herein. It differs from the traditional CNN model in the following respects: (1) the model input is different: the traffic accident image input to the CSP-CNN model has only one channel, i.e. it is a grayscale image, which is essentially a pixel matrix whose pixel values range from 0 to the normalized values of the traffic accident features. In contrast, in CNN models for image recognition and classification, the input image typically has three RGB channels and pixel values range from 0 to 255; (2) there is no pooling: the 12 sub-features fall under the 5 parent features selected herein (accident features, road surface features, environmental factors, vehicle features and driver features), so the converted matrix has a dimension of only 5x5, and a down-sampling operation on such a small matrix would destroy what little feature information on casualty severity it holds; the CSP-CNN therefore omits the down-sampling operation (pooling layer) of the traditional CNN model; (3) the model output is different: in the traffic setting the output of the CSP-CNN is a prediction of the casualty severity of a traffic accident, whereas in image recognition and classification the output of the CNN is an image class label.

As shown in fig. 1, the convolution kernel size is set to 3, the stride is set to 1, and the zero-padding parameter pad is set to 1. The structure of the CSP-CNN model for predicting the severity of a traffic accident includes four main parts: the model input layer, the convolutional layers, the fully connected layers, and the model output layer.
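With kernel size 3, stride 1 and zero padding 1, a convolution keeps the spatial size of its input, which matters for a 5x5 image that is never pooled. A minimal sketch of the standard output-size formula follows; the function name `conv_output_size` is an assumption made for illustration.

```python
def conv_output_size(size, kernel=3, stride=1, pad=1):
    """Output width (or height) of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


# The 5x5 traffic accident image keeps its size through every convolutional layer.
print(conv_output_size(5))

# Without padding the same kernel would shrink the image to 3x3.
print(conv_output_size(5, pad=0))
```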

First, the input of the CSP-CNN is the set of grayscale images obtained by converting the traffic accident data according to the importance of the traffic accident features, which comprise 5 parent features and 12 child features. Accordingly, the mathematical form of the model input is expressed as follows:

wherein d represents the index of the traffic accident data set x, N represents the total number of records in the traffic accident data set, PC represents the number of parent features of the traffic accident data set, CC represents the maximum number of child features under any parent feature, and P_MM represents the grayscale image pixel matrix of the traffic accident data x_d.

The convolutional layer is the core layer of the CSP-CNN; its purpose is to extract abstract features from the traffic accident data set. To describe the calculation process of the convolutional layer clearly, each pixel of the grayscale image of the traffic accident data set is first numbered, with p_{c,k,l} denoting the pixel element in the k-th row and l-th column of the c-th channel of the input image. Each weight of the filter is then numbered, with w_{c,e,f} denoting the weight in the e-th row and f-th column of the c-th channel of the filter. Finally, the convolution is calculated using the Rectified Linear Unit (ReLU) activation function:

The activation function ReLU is: g(h) = max(0, h) (2)

wherein h represents the input of the neuron, and the convolution formula is:

a_{k,l} = g(\sum_{c=1}^{C} \sum_{e=1}^{F} \sum_{f=1}^{F} w_{c,e,f} p_{c,k+e,l+f} + w_b) (3)

wherein: a_{k,l} represents the element in the k-th row and l-th column of the Feature Map; C is the number of channels, which is the same as the number of filters of the convolutional layer; F is the size of the filter (width or height, both the same); p_{c,k+e,l+f} represents the pixel element in the (k+e)-th row and (l+f)-th column of the c-th channel of the input image; e and f have a value range of [1, F]; w_b represents the bias term of the filter, which is randomly initialized each time the model runs;

and the argument of g, i.e. the weighted sum plus the bias term, is the input h to the convolution neuron.

Each convolutional layer can have multiple filters, and convolving each filter with the original traffic accident image yields one Feature Map. Therefore, the number of channels of the Feature Map after convolution equals the number of filters of the convolutional layer.
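The convolution described above can be sketched in plain Python for a single filter. This is an illustrative sketch rather than the patented implementation: it uses zero-based indices, applies zero padding explicitly, and the names `relu` and `conv2d_relu` are assumptions.

```python
def relu(h):
    """Rectified Linear Unit: g(h) = max(0, h)."""
    return max(0.0, h)


def conv2d_relu(image, filt, bias=0.0, pad=1):
    """Convolve one multi-channel filter over an image with stride 1.

    image: list of C channels, each a list of rows (H x W).
    filt:  list of C kernels, each F x F.
    Returns one Feature Map as a list of rows.
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    size = len(filt[0])
    padded_width = width + 2 * pad
    # Surround every channel with `pad` rows and columns of zeros.
    padded = [
        [[0.0] * padded_width for _ in range(pad)]
        + [[0.0] * pad + list(row) + [0.0] * pad for row in channel]
        + [[0.0] * padded_width for _ in range(pad)]
        for channel in image
    ]
    out_h = height + 2 * pad - size + 1
    out_w = width + 2 * pad - size + 1
    feature_map = []
    for k in range(out_h):
        row = []
        for l in range(out_w):
            h = bias  # input to the convolution neuron
            for c in range(channels):
                for e in range(size):
                    for f in range(size):
                        h += filt[c][e][f] * padded[c][k + e][l + f]
            row.append(relu(h))
        feature_map.append(row)
    return feature_map


# A filter whose only non-zero weight is the centre copies the image unchanged.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
identity = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]
print(conv2d_relu(image, identity))
```

A layer with several filters would call `conv2d_relu` once per filter, giving one Feature Map (one output channel) per filter.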

The fully connected layer: the feature maps extracted and learned by the last convolutional layer are converted into a one-dimensional vector by a flatten operation, using the following formula, and serve as the input of the first fully connected layer:

a^flatten＝flatten([a₁,a₂,...,a_c]),c∈[1,C]； (4)

wherein a^flatten represents the one-dimensional vector after the transformation, namely the Feature Map after flatten, and [a₁, a₂, …, a_c] are the feature maps extracted and learned by the last convolutional layer, [Feature Map 1, Feature Map 2, …, Feature Map c];
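The flatten step and a subsequent linear layer can be sketched as follows. The channel-by-channel, row-by-row ordering in `flatten` and the plain weighted sum in `dense` are assumptions made for illustration; the weights in the example are arbitrary.

```python
def flatten(feature_maps):
    """Concatenate all feature maps into one vector, channel by channel, row by row."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(vector, weights, biases):
    """Linear layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * v for w, v in zip(unit_weights, vector)) + b
        for unit_weights, b in zip(weights, biases)
    ]


# Two 2x2 feature maps become a vector of length 8.
maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flat = flatten(maps)
print(flat)

# A layer with 3 output units, as in the final fully connected layer.
weights = [[1] * 8, [0] * 8, [1, 0, 0, 0, 0, 0, 0, 1]]
print(dense(flat, weights, biases=[0, 1, 0]))
```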

The calculation formula of the full connection layer is as follows:

wherein:

Finally, the output of each fully connected layer is used as the input of the next fully connected layer, and the last fully connected layer passes its output to the model output layer.

The output layer classifies the casualty severity of the traffic accident using the Softmax activation function. The Softmax function outputs a probability value for each of the set categories, and the category with the largest probability value is the predicted category. The output of the model is the corresponding casualty severity level of the traffic accident: slight, serious or fatal.
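The output step can be illustrated with a small Softmax sketch. The label strings, the function names and the example input values are assumptions; only the rule "pick the category with the largest probability" comes from the description.

```python
import math

SEVERITY_LEVELS = ("slight", "serious", "fatal")


def softmax(logits):
    """Turn the 3 outputs of the last fully connected layer into probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_severity(logits):
    """Return the severity level with the largest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SEVERITY_LEVELS[best], probs


label, probs = predict_severity([0.5, 2.0, -1.0])
print(label, probs)
```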

In addition, Batch Normalization is used between successive convolutional layers, between the convolutional layers and the fully connected layers, and between successive fully connected layers, to accelerate model training and prevent overfitting.

Results and analysis of the experiments

The CSP-CNN model proposed herein is implemented in Python using TensorFlow, the open-source deep learning framework developed by Google, because TensorFlow is easy to use, flexible and efficient, and makes it convenient to define and execute a variety of deep learning networks. The experiments were run on a GPU server configured with an Intel Xeon E5-2682 v4 (Broadwell) processor with a 2.5 GHz base frequency and an Nvidia P100 GPU with 12 GiB of video memory, 9.3 TFLOPS of single-precision and 4.7 TFLOPS of double-precision floating-point computing power. Based on the TensorFlow framework, the CSP-CNN model was trained for 100 epochs on 39403 samples (80% of the data set) and validated on 9851 samples (20% of the data set).

(1) Data collection

Traffic accident data covering 8 years (2009-2016) from the city of Leeds, UK, was used in this experiment. The total number of accident records obtained for this period was 21436. Each accident record in the Leeds traffic accident data contains 15 different sub-features collected when the accident occurred, including the location, the number of people involved, the vehicle, the road surface, the weather conditions and the like. In order to examine the influence of the various factors on the casualty severity of traffic accidents, casualty severity is classified into three grades: slight, serious and fatal.

(2) Data pre-processing

Before the traffic accident data set is applied as input to the CSP-CNN, it needs to be preprocessed in three steps: preliminary data processing, class imbalance handling, and conversion of the data into images. The specific steps are as follows:

1) The preliminary data processing includes deleting incomplete, erroneous and duplicate traffic accident data, pruning the sub-features according to whether they affect the casualty severity of traffic accidents, and normalizing the traffic accident data set. After incomplete, erroneous and duplicate data are deleted, 18727 records remain in the data set for training. The proportion of each severity class in the total data set is shown in fig. 2: 88% of the traffic accidents are slight, 11% are serious, and 1% are fatal.

Next, the 15 different sub-features of the traffic accident data set are reduced to 12 according to whether they are correlated with traffic accident severity, covering road surface features, accident features, vehicle features, driver features and environmental factors, as shown in Table 1.

TABLE 1 12 sub-characteristics of a traffic accident data set and corresponding description

Because the 12 sub-features of a traffic accident have different dimensions (units), the data under each feature need to be normalized: the unit limits of the data are removed and the data are converted into dimensionless pure numerical values, so that features with different units or orders of magnitude can be compared. In addition, normalizing the traffic accident data set also improves the convergence speed and accuracy of the model. The traffic accident data set x is normalized by the statistical method Z-score Normalization (Zero-Mean Normalization), so that the normalized data have a mean of 0 and a standard deviation of 1, consistent with a standard normal distribution; the conversion function is:

wherein x^* represents a certain datum under a single feature, u is the mean of all data under that feature, and σ is the standard deviation of all data under that feature; the calculation is performed separately for each feature;
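Z-score normalization applied feature by feature can be sketched as below. The sketch assumes the population standard deviation and maps constant columns to 0; the function name `z_score_columns` and the toy rows are hypothetical.

```python
from statistics import fmean, pstdev


def z_score_columns(rows):
    """Normalize each column (feature) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population standard deviation
    return [
        [(value - u) / sigma if sigma else 0.0
         for value, u, sigma in zip(row, means, stds)]
        for row in rows
    ]


# Two toy features on very different scales end up directly comparable.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = z_score_columns(rows)
print(normalized)
```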

2) As can be seen from fig. 2, fatal and serious traffic accidents account for only a small part of all traffic accidents. If the imbalance of the traffic accident data set is not dealt with, training emphasizes the data category with a large share of the total data and neglects the category with a small share, so that the trained model over-fits the majority sample category and under-fits the minority sample category. Sampling methods generally handle unbalanced data in two ways, undersampling and oversampling. Undersampling discards part of the data set, so the data set cannot be fully utilized; to make full use of the traffic accident data set, oversampling is adopted to solve the data imbalance problem. The simplest oversampling method is random oversampling, which adds minority samples by simply copying samples, but this easily makes the information learned by the model too specific to generalize, i.e. the model over-fits. For this reason, the Borderline-SMOTE2 method, an improvement on the Synthetic Minority Oversampling Technique (SMOTE), is used to solve the problem. With this method, 49254 traffic accident records are finally obtained, in which the ratio of slight, serious and fatal accident records is 1:1:1, i.e. 16418 records each.
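The core idea shared by SMOTE-style methods is to create synthetic minority samples by interpolating between a minority sample and a nearby minority neighbour instead of copying samples. The sketch below shows only that basic interpolation step with a single nearest neighbour; it is not Borderline-SMOTE2, which additionally concentrates on minority samples near the class boundary. All names and the toy points are hypothetical.

```python
import random


def interpolate(point, neighbour, rng):
    """Return a synthetic sample on the segment between point and neighbour."""
    gap = rng.random()
    return [p + gap * (n - p) for p, n in zip(point, neighbour)]


def oversample_to(minority, target_count, seed=0):
    """Grow a minority class (at least 2 samples) to target_count samples."""
    rng = random.Random(seed)
    samples = [list(p) for p in minority]
    while len(samples) < target_count:
        point = rng.choice(minority)
        # Nearest other minority sample by squared Euclidean distance.
        neighbour = min(
            (q for q in minority if q is not point),
            key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)),
        )
        samples.append(interpolate(point, neighbour, rng))
    return samples


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = oversample_to(minority, target_count=10)
print(len(balanced))
```

Because each synthetic sample lies between two real minority samples, the enlarged class is less repetitive than one produced by random copying.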

3) In order to better extract the spatial, combinational and deeper internal relations among the features of the traffic accident data set, the five parent features and the corresponding sub-features of a traffic accident are converted into grayscale image form as the input variables of the CSP-CNN model. Exploiting the CNN's characteristic of learning layer by layer, from low-level to high-level features, the spatial, combinational and deeper internal relations among the traffic accident features are learned, and a model for predicting the casualty severity of traffic accidents is finally obtained. Converting the traffic accident data set into grayscale images mainly comprises the following steps: (1) performing 1000 iterations on the 12 sub-features of the traffic accident based on XGBoost to obtain their importance distribution, shown in fig. 3 and table 2; (2) taking the importance of the sub-features and the traffic accident data set as inputs to the method FM2GI, which outputs the traffic accident data set in grayscale image form.

TABLE 2 traffic accident data set importance values

Fig. 4 shows how the feature vector of a piece of data in a traffic accident data set is converted into a grayscale image.
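The conversion of one feature vector into a pixel matrix can be sketched as follows. This is a hedged reading of the ordering rules stated in the claims (parent features sorted by the sum of their sub-feature weights, sub-features sorted by weight from left to right, unused cells left at zero); the row order from top to bottom, the choice of the normalized feature value as the pixel value, the function name and the toy data are all assumptions, not the FM2GI listing itself.

```python
def feature_vector_to_image(values, weights, parents):
    """Arrange sub-feature values in a square matrix ordered by importance.

    values:  dict sub-feature name -> normalized value (used as the pixel value)
    weights: dict sub-feature name -> importance weight
    parents: dict parent feature name -> list of its sub-feature names
    """
    max_dim = max(len(parents), max(len(subs) for subs in parents.values()))
    image = [[0.0] * max_dim for _ in range(max_dim)]  # all-zero matrix

    # Weight of a parent feature = sum of the weights of its sub-features.
    parent_weight = {p: sum(weights[s] for s in subs) for p, subs in parents.items()}
    ordered_parents = sorted(parents, key=parent_weight.get, reverse=True)

    for row, parent in enumerate(ordered_parents):
        ordered_subs = sorted(parents[parent], key=weights.get, reverse=True)
        for col, sub in enumerate(ordered_subs):
            image[row][col] = values[sub]
    return image


# Toy example with 2 parent features and 3 sub-features (hypothetical numbers).
parents = {"A": ["a1", "a2"], "B": ["b1"]}
weights = {"a1": 0.1, "a2": 0.3, "b1": 0.6}
values = {"a1": 1.0, "a2": 2.0, "b1": 3.0}
print(feature_vector_to_image(values, weights, parents))
```

With 5 parent features and at most 5 sub-features under one parent, the same procedure yields the 5x5 matrix mentioned earlier.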

(3) Hyper-parameters of CSP-CNN

Through the interface provided by scikit-learn, the hyper-parameter combinations of the CSP-CNN are searched over 100 epochs by combining the GridSearchCV and RandomizedSearchCV methods, and the optimal CSP-CNN hyper-parameter combination is determined. Using only GridSearchCV incurs a high computational cost, while using only RandomizedSearchCV may find a merely locally optimal hyper-parameter combination. To make better use of both, RandomizedSearchCV is used to search globally for the optimal hyper-parameter combination and GridSearchCV is used to search locally, which reduces the required computational cost and makes it less likely to be trapped in a locally optimal combination; tuning the hyper-parameters with this hybrid method gives better results. Models are built with the various hyper-parameter combinations and each is evaluated with 5-fold cross-validation, and the combination with the highest accuracy is finally selected. Table 3 shows the hyper-parameter combination used in the CSP-CNN after searching with this hybrid method.
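The two-stage idea (a coarse random search over wide ranges, followed by a fine grid around the best random candidate) can be sketched without any machine learning library. The `score` function below is a made-up stand-in for the mean 5-fold cross-validated accuracy, and the parameter names, ranges and offsets are hypothetical; the actual experiments used scikit-learn's search classes.

```python
import itertools
import random


def score(params):
    """Hypothetical stand-in for cross-validated accuracy (higher is better)."""
    return -((params["log_lr"] + 3.0) ** 2) - ((params["log_batch"] - 7.0) ** 2)


def random_then_grid(n_random=20, seed=0):
    rng = random.Random(seed)

    # Stage 1: global random search over wide ranges.
    candidates = [
        {"log_lr": rng.uniform(-5.0, -1.0), "log_batch": rng.uniform(4.0, 10.0)}
        for _ in range(n_random)
    ]
    coarse_best = max(candidates, key=score)

    # Stage 2: local grid search around the best random candidate.
    offsets = (-0.5, 0.0, 0.5)
    grid = [
        {"log_lr": coarse_best["log_lr"] + d_lr,
         "log_batch": coarse_best["log_batch"] + d_batch}
        for d_lr, d_batch in itertools.product(offsets, offsets)
    ]
    fine_best = max(grid, key=score)
    return coarse_best, fine_best


coarse, fine = random_then_grid()
print(score(coarse), score(fine))
```

Since the grid contains the coarse winner itself, the second stage can only keep or improve the score found by the first.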

TABLE 3 hyper-parametric combinations of CSP-CNN models

(4) CSP-CNN depth analysis

Typically, multiple modules and layers can be stacked together in a deep learning model, so analysing the network depth is important for understanding network behaviour. In general, the depth of a CNN should be neither too large nor too small, so that the CNN can learn more complex relationships while the model still converges. Different depth values, from small to large, were assigned to the CSP-CNN model for testing. Table 4 lists the network structures of the CSP-CNN at different depths; experiments carried out with these structures give the training set and validation set accuracies at different depths shown in fig. 5. When the depth of the CSP-CNN is 5, the accuracies of the training set and the validation set are 96.24% and 92% respectively. When the depth is 7, the validation set accuracy reaches its highest value, 93.42%, with a corresponding training set accuracy of 97.45%. When the depth is greater than 7, the training set accuracy keeps increasing but the validation set accuracy gradually decreases, which indicates that the CSP-CNN model starts to overfit: at depths 9, 11 and 13 the training set accuracies are 97.91%, 98.03% and 98.27%, and the validation set accuracies are 93.36%, 93.34% and 93.23% respectively. The best accuracy was thus achieved with 4 convolutional layers with 256 filters, 1 flatten layer, 1 fully connected layer containing 128 hidden units and 1 softmax fully connected layer containing 3 hidden units, for which the training and validation set accuracies reached 97.45% and 93.42% respectively. The experiments herein were therefore performed using the CSP-CNN model with depth 7.
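The depth selection can be restated as a small calculation over the accuracies reported in the text: choose the depth with the highest validation accuracy and watch the gap between training and validation accuracy as a sign of overfitting. The variable names are assumptions; the numbers are those quoted above.

```python
# (depth, training accuracy %, validation accuracy %) as reported in the text.
results = [
    (5, 96.24, 92.00),
    (7, 97.45, 93.42),
    (9, 97.91, 93.36),
    (11, 98.03, 93.34),
    (13, 98.27, 93.23),
]

# Select the depth by validation accuracy, not training accuracy.
best_depth, best_train, best_val = max(results, key=lambda r: r[2])

# A widening train/validation gap beyond depth 7 points to overfitting.
gaps = {depth: round(train - val, 2) for depth, train, val in results}

print(best_depth, gaps)
```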

TABLE 4 CSP-CNN model at different depths

(5) Comparison of the results of the experiment with other models

To illustrate the effectiveness of the CSP-CNN model presented herein, the experiment compared the model with 6 statistical models and 3 deep learning models. The 6 statistical models are: the K-nearest neighbor method (KNN), a non-parametric method for classification and regression; Decision Trees (DT), which break a complex decision down into a combination of several simpler decisions, in the hope that the final solution obtained in this way is close to the intended solution; the Naive Bayes Classifier (NBC), a family of simple "probabilistic classifiers" based on applying Bayes' theorem with a strong (naive) independence assumption between features; Logistic Regression (LR), which measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function (i.e., the cumulative logistic distribution); Gradient Boosting (GB), a statistical learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, an idea that stems from an observation by Leo Breiman; and Support Vector Machines (SVM, also called support vector networks), supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
Accordingly, the 3 deep learning methods are: Neural Networks (NN), or connectionist systems, computing systems loosely inspired by the biological neural networks that make up animal brains, which here represent traditional neural networks and attempt to learn features through hidden layers; the long short-term memory recurrent neural network (LSTM-RNN), an extension of the RNN that has become popular because its architecture can handle long-term memory and avoids the vanishing gradient problem experienced by traditional RNNs; and one-dimensional convolution (Conv1D), a form of convolution in Convolutional Neural Networks (CNNs) commonly used in sequence modeling and natural language processing.

Furthermore, the above 6 statistical models are implemented through the interface provided by scikit-learn, with the parameters left at their defaults. The neural network model has 4 hidden layers with 245 neurons each and 1 softmax fully connected layer; the activation function is ReLU, the optimizer is stochastic gradient descent (SGD), and the parameters of each layer are initialized uniformly. The long short-term memory recurrent neural network comprises an LSTM layer with 128 hidden units and three hidden layers with 128, 256 and 512 neural units, followed by a softmax fully connected layer as the last layer; the optimizer is SGD with learning rate 0.01, decay 0.9 and momentum 0.8. Conv1D has 4 hidden layers with 256 hidden neural units each, followed by a softmax fully connected layer as the last layer; the activation function is ReLU and the optimizer is Adam.

Table 5 and fig. 6 show the training set and validation set accuracies obtained by applying the 6 statistical models, the 3 deep learning models and the CSP-CNN to the traffic accident data set. The results show that the proposed CSP-CNN model is superior to the other statistical and deep learning models in test set accuracy, which indicates that the CSP-CNN generalizes well to new traffic accident data. Although the CSP-CNN does not have the highest training set accuracy, the DT model, which does, clearly shows over-fitting, whereas the CSP-CNN, which ranks second on the training set, does not. One possible reason is that when a statistical model processes a traffic accident data vector, the traffic accident features are treated as having no local correlation, and the spatial, combinational and deeper internal relations among them are ignored. Similarly, the other deep learning models cannot, by virtue of their structure, analyze the spatial relationships among the traffic accident features, although these features are strongly correlated and intrinsically related. The CSP-CNN model, by contrast, has local perception and can fully extract the spatial, combinational and deeper internal relations among the traffic accident features, as illustrated in FIG. 7, which shows a traffic accident data image in pixel matrix form. As can be seen from FIG. 7, by passing a filter (convolution kernel) of a specific size over the image, the CSP-CNN model can, on the one hand, extract the corresponding traffic accident features according to the different importance of the sub-features (e.g. the 12 traffic accident sub-features in FIG. 7). On the other hand, the CSP-CNN model makes full use of its local perception: rather than treating the features as unrelated, it can extract features formed by combining sub-features that have spatial and intrinsic relations. For example, in FIG. 7 the filter under the sliding window is learning to extract the traffic accident feature formed by combining the sub-features lighting conditions, weather conditions, casualty class, casualty age, number of vehicles involved, easting and northing. This shows how the CSP-CNN model described herein extracts traffic accident features that are rich in spatial, combinational and deeper intrinsic relations.

TABLE 5 accuracy under different model experiments

The essential purpose of predicting the casualty severity of traffic accidents is to provide timely medical assistance to the people involved, reduce casualties, notify the relevant emergency decision-making departments promptly and avoid greater property loss. To this end, we further analyze the predictions for the three degrees of casualty severity: slight, serious and fatal traffic accidents. Since accuracy is not the only index for evaluating the predictive ability of a model, and in order to take the practical application scenarios of the model into account, we introduce precision, recall and F1 Score to analyze the traffic accident test set. Precision is calculated as follows:

Precision = TP / (TP + FP)

wherein TP (true positive) denotes the true positive cases, i.e. the true category is positive and the predicted category is positive; FP (false positive) denotes the false positive cases, i.e. the true category is negative and the predicted category is positive.

Recall is calculated as follows:

Recall = TP / (TP + FN)

wherein FN (false negative) denotes the false negative cases, i.e. the true category is positive and the predicted category is negative.

The formula for F1 Score is as follows:

F1 = 2 × Precision × Recall / (Precision + Recall)
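For a three-class problem these indices are computed per severity level by treating that level as the positive class and the other two as negative. A minimal sketch follows; the function name, the label strings and the toy predictions are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 Score for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels: one fatal accident is missed and one slight accident is over-predicted.
y_true = ["fatal", "fatal", "slight", "serious"]
y_pred = ["fatal", "slight", "fatal", "serious"]
for level in ("slight", "serious", "fatal"):
    print(level, precision_recall_f1(y_true, y_pred, level))
```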

Table 6 and fig. 8 show the precision, recall and F1 Score of the different models on the traffic accident test set for slight, serious and fatal traffic accidents.

TABLE 6 precision, recall and F1 Score of the different models for traffic accidents of different casualty severity

As can be seen from table 6 and fig. 8, on the slight traffic accident test set the CSP-CNN model has the highest precision of all the models, while the statistical model GB has the highest recall; on the serious traffic accident test set the CSP-CNN has the highest precision and recall compared with the other nine models; and on the fatal traffic accident test set the CSP-CNN, NN and Conv1D are tied for first place in precision and recall. Considering the practical situation, some prediction error can be tolerated for slight traffic accidents, because a slight traffic accident is unlikely to cause serious casualties or heavy property loss. For serious and fatal traffic accidents, however, the requirement on prediction accuracy must be higher, because even a slightly inaccurate prediction may mean that the corresponding emergency medical support and emergency department decisions are not provided, ultimately leading to serious casualties and property loss. From the perspective of the specific application, the CSP-CNN model therefore performs better than the other models. In general, precision and recall influence each other: when precision is high, recall tends to be low, and when recall is high, precision tends to be low. Ideally both would be high. For fairness and objectivity, the performance of a model is therefore generally evaluated using F1 Score, a comprehensive index that combines precision and recall. As can be seen from the results, on both the slight and the serious traffic accident test sets the F1 Score of the CSP-CNN model is higher than that of the other models, while on the fatal traffic accident test set the F1 Score of the CSP-CNN model is tied for first place with NN and Conv1D.

In conclusion, the CSP-CNN model performs better than the other models, whether judged by overall prediction accuracy or, taking specific application situations into account, by the analysis of traffic accidents of different severity.

A deep learning CSP-CNN model is presented to predict traffic accident severity. Unlike previous simple structures that focus only on the traffic accident data, the proposed method can capture representations relevant to traffic accident severity, such as the non-linear spatiotemporal relationships, combinational relationships and deeper intrinsic relationships among traffic accident features. Experiments were carried out on the Leeds City Council traffic accident data set for 2009 to 2016, comparing the proposed CSP-CNN model with the NBC, KNN, LR, DT, GB, SVM, Conv1D, NN and LSTM-RNN models; the experimental results show that the proposed model outperforms the other models.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. The modeling method of the CSP-CNN model for predicting the severity of the traffic accident is characterized by comprising the following specific steps of:

g(h)＝max(0，h)； (2)

wherein h is the input of the convolution neuron;

the convolution calculation formula is:

wherein: a_{k,l} represents the element in the k-th row and l-th column of the convolutional layer Feature Map, where the value ranges of k and l are [1, F]; C is the number of channels, which is the same as the number of filters of the convolutional layer; F is the size of the filter, whose width and height are the same; w_{c,e,f} represents the weight in the e-th row and f-th column of the c-th channel of the filter; p_{c,e,f} represents the pixel element in the e-th row and f-th column of the c-th channel grayscale image; w_b represents the bias term of the filter, which is randomly initialized each time the model runs;

An input that is a convolutional neuron;

a^flatten＝flatten([a₁，a₂，...，a_c])，c∈[1，C]； (4)

The calculation formula of the full connection layer is as follows:

wherein:

Predicting the severity of the traffic accident by using a Softmax activation function, outputting a probability value of a set traffic accident grade, wherein the traffic accident grade with the maximum probability value is the predicted severity of the traffic accident;

step five: training the CSP-CNN model for predicting the severity of the traffic accident, and confirming the hyperparametric combination of the CSP-CNN model;

2. The modeling method of the traffic accident severity prediction CSP-CNN model according to claim 1, wherein the step of preprocessing the traffic accident data set in step 1 is as follows:

3. The modeling method of the CSP-CNN model for predicting the severity of the traffic accident according to claim 1, wherein the feature matrix FM of the traffic accident data set is obtained in step 1, and the concrete steps are as follows:

fp＝{fp₁，…，fp_m}； (7)

and is

wc＝(w_{1,1}，…，w_{i,j})； (9)

FV＝<fp，fc，wc>； (10)

FM＝{FV₁，...，FV_k}, and FM ∈ R^{k×n}； (11)

4. The modeling method of the traffic accident severity prediction CSP-CNN model according to claim 1, wherein the step 2 of converting the feature vector FV of a certain data in the traffic accident data set into a gray image specifically comprises the following steps:

5. The modeling method of the CSP-CNN model for traffic accident severity prediction according to claim 4, characterized in that in step 2.4 the filling of the all-zero matrix Mat^{max_dim×max_dim} according to the original traffic accident data set comprises the following steps:

step 2.4.1, parent feature descending order: according to the weight wp of each parent feature, all the parent features fp are arranged in descending order, where the weight wp_i of a given parent feature equals the sum of the importance weights w_{i,j} of all sub-features under it, i.e. wp_i＝∑w_{i,j};

step 2.4.3, sub-feature descending order: according to the importance weights w_{i,j} of the sub-features under each parent feature, the sub-features under that parent feature are sorted in descending order;

step 2.4.4, Mat^{max_dim×max_dim} column filling: in the corresponding row of the all-zero matrix Mat^{max_dim×max_dim}, i.e. within each parent feature, all sub-features in descending order are filled column by column according to the principle that the left side is larger than the right side;

6. The modeling method of the CSP-CNN model for predicting the severity of the traffic accident according to any of the claims 3-5, characterized in that the number m of the parent features of a certain piece of data in the original traffic accident data set is 5; the number n of sub-features of a certain piece of data in the original traffic accident data set is 12;

fc＝{Easting_{1.1}, Northing_{1.2}, 1st road class_{1.3}, Time of accident_{1.4}, Number of vehicles involved_{1.5}, Road surface condition_{2.6}, Lighting conditions_{3.7}, Weather conditions_{3.8}, Type of vehicle_{4.9}, Casualty class_{5.10}, Sex of casualty_{5.11}, Age of casualty_{5.12}}.

7. The modeling method of the traffic accident severity prediction CSP-CNN model according to claim 3, wherein the importance weight vector wc of all the sub-features of a certain piece of data in the original traffic accident data set is determined in step 1.3 by performing 1000 iterations on the 12 sub-features of the traffic accident data set using the XGBoost method.

8. The modeling method of the CSP-CNN model for predicting the severity of the traffic accident according to claim 1, wherein the hyper-parameter combination of the CSP-CNN model in step five is as follows: the batch size is 128, the loss function is categorical cross-entropy, the optimizer is a gradient descent optimizer, the learning rate is 0.001, the error term (epsilon) is 1e-07, and the convolution kernels are initialized with the Glorot normal distribution initialization method.