CN112270355B

CN112270355B - Active safety prediction method based on big data technology and SAE-GRU

Info

Publication number: CN112270355B
Application number: CN202011172029.XA
Authority: CN
Inventors: 郝威; 吴其育; 戎栋磊; 张兆磊; 易可夫; 伍文广; 吴伟; 李永福; 王正武; 谷健
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2023-12-05
Anticipated expiration: 2040-10-28
Also published as: CN112270355A

Abstract

The invention discloses an active safety prediction method based on big data technology and SAE-GRU. First, an original data set is acquired and preprocessed to form a training data set. Dynamic traffic running state identification based on cluster analysis is then carried out on the training data set to obtain a sample data set with traffic running state labels. The labeled sample data set serves as prior knowledge for classification analysis to generate a traffic running state classifier. A data set for risk running state judgment is constructed from the training data set, and the risk running state is judged separately for each traffic running state to obtain a training data set with risk running state labels. An SAE-GRU model is trained on this data set, and an optimal SAE-GRU active safety prediction model is obtained through parameter tuning. Finally, active safety prediction is performed with the SAE-GRU active safety prediction model. The method has a wide application range and meets the requirements of high-precision, high-efficiency prediction.

Description

Active safety prediction method based on big data technology and SAE-GRU

Technical Field

The invention belongs to the technical field of traffic state identification and relates to an active safety prediction method for arterial roads based on traffic big data technology and SAE-GRU.

Background

With the rapid growth of vehicle ownership and the advance of urbanization, traffic congestion, frequent accidents, and lagging management have become the main obstacles to traffic development. Traffic decision-making based on big data and deep learning is more intelligent and supports relieving congestion, optimizing road resources, and improving safety indicators. Mining the characteristic parameters associated with traffic safety with big data technology, building safety models with deep learning, and establishing a sound active safety management method have therefore become research hotspots in Intelligent Transportation Systems (ITS).

In traffic running state identification, computation becomes complex when the sample size is large, and identification often fails. In risk running state identification, research has focused mainly on principal component analysis and multiple regression analysis, and the reliability, validity, and generality of the identification have not been evaluated. It is therefore necessary to construct a state identification model that supports transfer learning.

In recent years, the explosive growth of traffic data has increased both the difficulty of analysis and the value of the data. To support better analysis, mining, and modeling, various prediction methods based on state labels have been proposed. Although existing methods have improved the accuracy of traffic state identification and safety prediction to some extent, the following defects remain:

1. The traffic running state and the active safety prediction method are poorly linked, so the information is disjointed and the risk running state cannot be predicted accurately;

2. No model that generalizes within a given field has been developed, so the state identification methods have a narrow scope;

3. As the volume of traffic data grows rapidly, existing methods cannot meet the requirements of high-precision, high-efficiency prediction, and their applicability declines.

Disclosure of Invention

The embodiment of the invention aims to provide an active safety prediction method based on big data technology and SAE-GRU (stacked autoencoder and gated recurrent unit), so as to solve the following problems of existing state-label-based prediction methods: the traffic running state and the active safety prediction are poorly linked, so that information is disjointed and the risk running state cannot be predicted accurately; the application range is narrow; and the requirements of high-precision, high-efficiency prediction cannot be met.

The technical scheme adopted by the embodiment of the invention is an active safety prediction method based on big data technology and SAE-GRU, carried out according to the following steps:

Step S1, acquiring an original data set comprising a plurality of single data sets, wherein each single data set comprises 5 types of characteristic data within a Δt time interval (average speed, average acceleration, average occupancy, average queuing time and average travel time) and 1 type of conventional data (standard time), and preprocessing the original data set to form a training data set;

Step S2, carrying out dynamic traffic running state identification based on cluster analysis on the training data set obtained in step S1 to obtain a sample data set with traffic running state labels;

Step S3, taking the sample data set with traffic running state labels as prior knowledge for classification analysis to generate a traffic running state classifier;

Step S4, constructing a data set for risk running state judgment from the training data set formed in step S1, and judging the risk running state of this data set with a fuzzy comprehensive evaluation method according to the different traffic running states, to obtain a training data set with risk running state labels;

Step S5, building an SAE-GRU model, training it on the training data set with risk running state labels obtained in step S4, and obtaining an optimal SAE-GRU active safety prediction model through parameter tuning;

Step S6, performing active safety prediction on the arterial road with the optimal SAE-GRU active safety prediction model, and predicting the risk running state of the next phase, that is, of the next Δt time interval.

The embodiment of the invention has the beneficial effects that:

1) After the classification results for the different traffic running states are obtained from the dynamic traffic running state classifier, fuzzy comprehensive judgment is applied to the traffic flow attributes and vehicle operation attributes under each traffic running state to obtain the risk running state, and active safety prediction is finally performed by SAE-GRU on the basis of the risk running state labels. The traffic running state and the active safety prediction are thereby tightly linked, which solves the problem of existing state-label-based prediction methods that poor linkage between the two leaves information disjointed and prevents accurate prediction of the risk running state.

2) The SAE-GRU active safety prediction model provided by the embodiment of the invention can be applied to different scenes (arterial roads, intersections, highways and the like). This improves the generality and transferability of the prediction, effectively expands the application range, and solves the problem that existing state-label-based prediction methods have a narrow application range.

3) The embodiment of the invention first extracts data features with the SAE model and then performs high-efficiency, high-precision safety prediction with the GRU model. The combination of SAE and GRU therefore mines data features effectively, streamlines the prediction process, and improves training efficiency while preserving prediction accuracy, which solves the problem that existing state-label-based prediction methods cannot meet the requirements of high-precision, high-efficiency prediction and lose applicability.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic diagram of the dynamic state recognition model according to an embodiment of the invention.

FIG. 2 is a graph of the variation of the speed coefficient of variation according to an embodiment of the invention.

FIG. 3 is a graph of the variation of the occupancy coefficient of variation according to an embodiment of the invention.

FIG. 4 is a graph of the variation of the travel time coefficient of variation according to an embodiment of the invention.

FIG. 5 (a) is a distribution diagram of the speed coefficient of variation according to an embodiment of the invention.

FIG. 5 (b) is a distribution diagram of the occupancy coefficient of variation according to an embodiment of the invention.

FIG. 5 (c) is a distribution diagram of the travel time coefficient of variation according to an embodiment of the invention.

FIG. 6 is a diagram of the reset and update gates according to an embodiment of the invention.

FIG. 7 is a schematic diagram of the h' update according to an embodiment of the invention.

FIG. 8 is a schematic diagram of the operation of the GRU neural network according to an embodiment of the invention.

FIG. 9 (a) is a graph comparing prediction accuracy and RMSE for groups 1-5 according to an embodiment of the invention.

FIG. 9 (b) is a graph comparing prediction accuracy and RMSE for groups 5-10 according to an embodiment of the invention.

FIG. 9 (c) is a graph comparing prediction accuracy and RMSE for groups 10-15 according to an embodiment of the invention.

FIG. 10 is a diagram of the clustering result and state division of the arterial road traffic flow data.

FIG. 11 is a graph comparing the prediction time of SAE-GRU, GRU, LSTM and CNN-LSTM.

FIG. 12 is a graph comparing the MAE and RMSE of SAE-GRU, GRU, LSTM and CNN-LSTM.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention provides an active safety prediction method based on a big data technology and SAE-GRU, which comprises the following steps:

Step S1, acquiring an original data set comprising a plurality of single data sets, wherein each single data set comprises 5 types of characteristic data within a Δt time interval (average speed, average acceleration, average occupancy, average queuing time and average travel time) and 1 type of conventional data (standard time), and preprocessing the original data set to form a training data set:

The specific analysis takes Sunset, California as an example, as follows:

First, an original data set is acquired. Working days from 1 month in 2020 to 28 months in 2020 are taken as the study period. Traffic flow data are collected every 30 s from 5:00 to 24:00 on each working day and aggregated at 5-min intervals, forming 9805 data sets. Each data set contains 5 types of characteristic data (speed, acceleration, occupancy, queuing time and travel time) and 1 type of conventional data (standard time).

Then, the collected original data set is preprocessed to form a training data set:

Step 1, data cleaning and filling: the missing rate of each single data set in the sample data set is calculated. When the missing rate of a single data set is greater than or equal to 80%, that single data set is deleted. When the missing rate is less than 80%, non-characteristic data (such as time) in the single data set are filled in using statistics, and characteristic data (such as speed and acceleration) are filled in by the Lagrange interpolation method of formulas (1)-(2), which builds a polynomial function through several known data points and makes full use of the temporal order of the data:

$$L(x)=\sum_{j=1}^{k} y_j\,\ell_j(x) \tag{1}$$

$$\ell_j(x)=\prod_{i=1,\,i\neq j}^{k}\frac{x-x_i}{x_j-x_i} \tag{2}$$

where $L(x)$ is the missing value to be solved, $\ell_j(x)$ is the interpolation basis function, $x_j$ is the j-th position point, $x_i$ denotes the position points other than $x_j$, $x$ is the position point of the missing value to be solved, $y_j$ is the value at position point $x_j$, and $k$ is the number of given value points;

Step 2, feature transformation: the embodiment of the invention applies max-min normalization to every value of each type of characteristic data in the cleaned and filled sample data set, so that the values of each type of data fall in the range [0, 1]. The max-min normalization is given by formula (3):

$$y_{new}=\frac{y-y_{min}}{y_{max}-y_{min}} \tag{3}$$

where $y_{new}$ is the value after feature transformation, $y$ is the value before feature transformation, $y_{min}$ is the minimum value of each type of characteristic data in the sample data set, and $y_{max}$ is the maximum value of each type of characteristic data in the sample data set. The sample data set here refers to the original data set.
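The two preprocessing operations of step S1 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names, the sample positions and the speed values are all hypothetical, and the behavior for a constant column (mapped to 0.0) is an assumption the source does not address.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[j], ys[j]) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0  # interpolation basis function for point j
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += yj * basis
    return total


def min_max_normalize(values):
    """Scale values into [0, 1]; a constant column maps to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical 5-min average speeds; the reading at position 2 is missing.
positions = [0, 1, 3, 4]
speeds = [40.0, 42.0, 46.0, 48.0]
filled = lagrange_interpolate(positions, speeds, 2)

series = [40.0, 42.0, filled, 46.0, 48.0]
normalized = min_max_normalize(series)
```

With many known points a high-degree Lagrange polynomial can oscillate, so in practice only a few neighbors of the gap would typically be passed in; the source does not state how many points it uses.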

Step S2, carrying out dynamic traffic running state identification based on cluster analysis by utilizing the training data set obtained in the step S1 to obtain a sample data set with traffic running state labels, wherein the dynamic traffic running state identification is carried out based on a fuzzy C-means clustering algorithm in the embodiment of the invention, as shown in fig. 1, and the specific steps are as follows:

Step S21, constructing a traffic flow data set $X=[x_1,x_2,\dots,x_i,\dots,x_n]^{T}$ in which each sample item $x_i=(x_{i1},x_{i2},x_{i3},x_{i4},x_{i5})$ is composed of traffic flow attribute parameters and vehicle operating parameters, where $x_{i1}$, $x_{i2}$, $x_{i3}$, $x_{i4}$ and $x_{i5}$ represent, respectively, the average speed, average occupancy, average acceleration, average delay time and average travel time within the Δt time interval. The average speed and average occupancy are traffic flow attribute parameters; the average acceleration, average delay time and average travel time are vehicle operating parameters. All sample items in the traffic flow data set X are divided into k categories $C=\{c_1,c_2,\dots,c_k\}$, each category representing a traffic state, and a cluster loss function is constructed according to formula (4):

$$J(U,X,C)=\sum_{i=1}^{n}\sum_{j=1}^{k}u_{ij}^{m}\,\lVert x_i-C_j\rVert^{2},\quad U\in\zeta \tag{4}$$

where $J(U,X,C)$ is the cluster loss function, $x_i$ is the i-th sample item of the traffic flow data set X, n is the total number of samples of X, k is the number of clusters, $C_j$ is the fuzzy clustering center of the j-th category $c_j$, $U=\{u_{ij}\}$ is the membership matrix, $u_{ij}$ is the membership of sample $x_i$ in category $c_j$, m is the fuzzy weighting index, and ζ is the constraint space of the cluster loss function, defined by formula (5):

$$\zeta=\Bigl\{u_{ij}\in[0,1];\ \sum_{j=1}^{k}u_{ij}=1,\ \forall i;\ 0<\sum_{i=1}^{n}u_{ij}<n,\ \forall j\Bigr\} \tag{5}$$

Step S22, initializing the membership matrix U, and calculating and updating the fuzzy clustering center $C_j$ of the j-th category $c_j$ and the membership matrix U according to formulas (6)-(7):

$$C_j=\frac{\sum_{i=1}^{n}u_{ij}^{m}\,x_i}{\sum_{i=1}^{n}u_{ij}^{m}} \tag{6}$$

$$u_{ij}=\frac{1}{\sum_{h=1}^{k}\left(d_{ij}/d_{ih}\right)^{2/(m-1)}} \tag{7}$$

where $d_{ij}$ is the Euclidean distance between sample item $x_i$ and the fuzzy clustering center $C_j$ of the j-th category $c_j$, and $d_{ih}$ is the Euclidean distance between sample item $x_i$ and the fuzzy clustering center $C_h$ of the h-th category $c_h$;

Step S23, comparing the membership matrix $U^{\lambda+1}$ of the (λ+1)-th iteration with the membership matrix $U^{\lambda}$ of the λ-th iteration. If $\lVert U^{\lambda+1}-U^{\lambda}\rVert\le\varepsilon$ or $\lambda=\lambda_{max}$, the iteration stops and outputs the current fuzzy clustering centers $C_j$ together with the membership matrix U of each sample item $x_i$ of the traffic flow data set X with respect to them, 1 ≤ j ≤ 3; otherwise the method returns to step S22 and continues iterating. Here λ is the iteration number, $\lambda_{max}$ is the maximum number of iterations, and ε is the iteration termination threshold. The embodiment of the invention sets the number of clusters k to 3, representing the three traffic states of free flow, congestion and blockage, with fuzzy weighting index m = 2, maximum number of iterations $\lambda_{max}=1000$ and iteration termination threshold $\varepsilon=10^{-5}$. Substituting formulas (6) and (7) back into the constructed cluster loss function minimizes it; when $\lVert U^{\lambda+1}-U^{\lambda}\rVert\le\varepsilon$ or $\lambda=\lambda_{max}$ is satisfied, the objective function is optimal and the loss is minimal.

Step S24, from the fuzzy clustering centers $C_j$ output in step S23 and the membership matrix U of each sample item $x_i$ of the traffic flow data set X with respect to them, 1 ≤ j ≤ 3, determining by the maximum membership method the traffic running state category label set Y corresponding to the traffic flow data set X, $Y=\{y \mid y_i=\tau;\ i=1,2,\dots,n;\ \tau=1,2,3\}$, where $y_i$ is the traffic running state category label of sample item $x_i$, and 1, 2 and 3 are the cluster labels corresponding one to one to the current fuzzy clustering centers $C_1$ to $C_3$;

Step S25, visualizing the traffic flow data set X in the three-dimensional space of average speed, occupancy and travel time according to the traffic running state category label set Y obtained in step S24, to obtain a cluster analysis graph in which the sample items with $y_i=1$ are marked by "+" and the sample items with $y_j=2$ and $y_k=3$ are marked by two other distinct markers, where $i,j,k\in[1,n]$. The traffic running states corresponding to cluster labels 1, 2 and 3 in the category label set Y are determined from the cluster analysis graph, giving the traffic running state category label $y_i$ of each sample item $x_i$ of the traffic flow data set X and hence the sample data set with traffic running state labels $\Gamma=\{(X,Y)\mid(x_1,y_1),(x_2,y_2),\dots,(x_i,y_i),\dots,(x_n,y_n)\}$. In addition, to eliminate the interference of the different dimensions of the characteristic parameters with the classifier, the sample data set Γ with traffic running state labels is normalized using formula (3).
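Steps S22 to S24 can be sketched as a small fuzzy C-means routine. The code below is a minimal standard-library illustration, not the patented implementation. Several details are assumptions: the random initialization of U, the use of the largest element-wise change as the norm in the stopping test (the source does not say which norm is meant), the clamping of zero distances, and the three-feature sample values, which are hypothetical normalized (speed, occupancy, travel time) triples.

```python
import math
import random


def fuzzy_c_means(samples, k=3, m=2.0, eps=1e-5, max_iter=1000, seed=0):
    """Fuzzy C-means; returns (centers, U) after convergence or max_iter."""
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Random initial membership matrix whose rows sum to 1 (assumption).
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: membership-weighted mean of the samples.
        centers = []
        for j in range(k):
            weights = [U[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([sum(w * s[d] for w, s in zip(weights, samples)) / wsum
                            for d in range(dim)])
        # Membership update from relative Euclidean distances to the centers.
        new_U = []
        for s in samples:
            dists = [max(math.dist(s, c), 1e-12) for c in centers]
            new_U.append([1.0 / sum((dj / dh) ** (2.0 / (m - 1.0)) for dh in dists)
                          for dj in dists])
        # Stopping test: largest element-wise change (assumed norm).
        change = max(abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if change <= eps:
            break
    return centers, U


def max_membership_labels(U):
    """Maximum-membership rule; labels are 1-based cluster indices."""
    return [row.index(max(row)) + 1 for row in U]


# Hypothetical normalized samples: (speed, occupancy, travel time).
samples = [
    [0.90, 0.10, 0.10], [0.85, 0.15, 0.12],
    [0.50, 0.50, 0.50], [0.55, 0.45, 0.50],
    [0.10, 0.90, 0.90], [0.15, 0.85, 0.88],
]
centers, U = fuzzy_c_means(samples)
labels = max_membership_labels(U)
```

The cluster indices returned here carry no meaning by themselves. As in step S25, mapping labels 1, 2 and 3 to free flow, congestion and blockage still requires inspecting the cluster centers or the cluster analysis graph.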

Step S3, taking the sample data set with traffic running state labels as prior knowledge for classification analysis to generate a traffic running state classifier.

The embodiment of the invention selects a support vector machine (SVM), specifically a nonlinear support vector machine, as the classifier for traffic state identification. The sample data set Γ with traffic running state labels is used as the data set of the nonlinear support vector machine. The traffic states represented by the traffic flow parameters are learned through offline training of the classification model, which yields the traffic running state classifier; the classifier then performs online classification prediction of real-time or future traffic flow states. The specific implementation process is as follows:

Step S31, constructing and solving the convex quadratic programming problem of formula (8):

$$\min_{\alpha}\ Q(\alpha)=\frac{1}{2}\sum_{i=1}^{n}\sum_{z=1}^{n}\alpha_i\alpha_z y_i y_z K(x_i,x_z)-\sum_{i=1}^{n}\alpha_i,\quad \text{s.t.}\ \sum_{i=1}^{n}\alpha_i y_i=0,\ 0\le\alpha_i\le C,\ i=1,2,\dots,n \tag{8}$$

where $Q(\alpha)$ is the objective function over the Lagrange multipliers, $K(x_i,x_z)$ is the kernel function, n is the total number of samples of the sample data set Γ with traffic running state labels, $\alpha_i$ and $\alpha_z$ are Lagrange multipliers, and C is the penalty coefficient. In the embodiment of the invention, the penalty coefficient is set to C = 1.1 and the kernel function is the radial basis function (RBF), as shown in formula (9):

where γ denotes the coefficient of the kernel function, set to 0.2, which is the reciprocal of the number of features of the input traffic flow data set X, and g denotes the kernel function width.

Solving yields the optimal Lagrange multiplier solution $\alpha^{*}=(\alpha_1^{*},\alpha_2^{*},\dots,\alpha_n^{*})^{T}$.

Step S32, calculating the optimal bias value $b^{*}$:

A component $\alpha_l^{*}$ of the optimal Lagrange multiplier solution $\alpha^{*}$ from step S31 that satisfies $0<\alpha_l^{*}<C$ is selected. According to the subscript l of $\alpha_l^{*}$, $x_l$ and $y_l$ are selected from the sample data set Γ with traffic running state labels, and $b^{*}$ is then calculated according to formula (10):

$$b^{*}=y_l-\sum_{i=1}^{n}\alpha_i^{*}\,y_i\,K(x_i,x_l) \tag{10}$$

Step S33, solving the classification decision function $f(x_i)$:

$$f(x_i)=\operatorname{sgn}\Bigl(\sum_{z=1}^{n}\alpha_z^{*}\,y_z\,K(x_z,x_i)+b^{*}\Bigr) \tag{11}$$

where $f(x_i)$ is the generated traffic running state classifier and represents the traffic running state classification result of the i-th sample item in the sample data set Γ with traffic running state labels, and sgn() is the sign function, which returns the sign of its argument and thereby completes one binary classification. The traffic running state identification constructed by the embodiment of the invention is a three-class problem, so the support vector machine model must be extended: several support vector machine classifiers are established, and a multi-class classifier is built by combining several binary classifiers. The embodiment of the invention uses the LIBSVM toolbox to construct the multi-class classifier with the one-versus-one method.
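The way a sign-based binary decision function is combined into a three-class classifier by one-versus-one voting can be illustrated with a short standard-library sketch. It is not a trained SVM and does not call LIBSVM. The kernel is written in the Gaussian form $\exp(-\gamma\lVert a-b\rVert^{2})$ that LIBSVM uses for its RBF kernel, which is an assumption about the source's formula (9); the support vectors, multipliers and bias values below are hypothetical placeholders chosen only to make the voting visible.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_kernel(a, b, gamma=0.2):
    """Gaussian RBF kernel; gamma = 0.2 follows the embodiment."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def decide(x, support, alphas, labels, bias, gamma=0.2):
    """Binary decision: sign of the kernel expansion plus bias."""
    score = sum(a * y * rbf_kernel(s, x, gamma)
                for a, y, s in zip(alphas, labels, support)) + bias
    return 1 if score >= 0 else -1


def one_vs_one_predict(x, pairwise):
    """Majority vote over binary classifiers, one per pair of states."""
    votes = Counter()
    for (pos, neg), clf in pairwise.items():
        votes[pos if clf(x) > 0 else neg] += 1
    return votes.most_common(1)[0][0]


# Hypothetical state prototypes: 1 = free flow, 2 = congestion, 3 = blockage.
prototypes = {1: [0.9, 0.1, 0.1], 2: [0.5, 0.5, 0.5], 3: [0.1, 0.9, 0.9]}

# One toy binary classifier per pair, each with two "support vectors",
# unit multipliers and zero bias (placeholders, not trained values).
pairwise = {}
for a, b in combinations(sorted(prototypes), 2):
    pairwise[(a, b)] = (
        lambda x, a=a, b=b: decide(
            x, [prototypes[a], prototypes[b]], [1.0, 1.0], [1, -1], 0.0)
    )

state = one_vs_one_predict([0.85, 0.15, 0.10], pairwise)
```

For k = 3 states the one-versus-one scheme trains k(k-1)/2 = 3 binary classifiers, and the state that wins the most pairwise decisions is returned.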

Step S4, constructing a data set for risk running state judgment by adopting the training data set formed in the step S1, and judging the risk running state of the constructed data set for risk running state judgment based on a fuzzy comprehensive evaluation method according to different traffic running states to obtain the training data set with risk running state labels, wherein the specific implementation process is as follows:

Step S41, using the training data set of step S1, establishing a judgment object factor set P aggregated over time intervals of Δt, $P=[S_V,S_O,S_T]$, where $S_V$ is the coefficient of variation of vehicle speed, $S_V=\sigma_V/V$, V is the average vehicle speed within the Δt time interval, and $\sigma_V$ is the standard deviation of vehicle speed within the Δt time interval; $S_O$ is the coefficient of variation of occupancy, $S_O=\sigma_O/O$, O is the average occupancy within the Δt time interval, and $\sigma_O$ is the standard deviation of occupancy within the Δt time interval; $S_T$ is the coefficient of variation of travel time, $S_T=\sigma_T/T$, T is the average travel time within the Δt time interval, and $\sigma_T$ is the standard deviation of travel time within the Δt time interval. In the embodiment of the invention, Δt is 5 minutes.
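As a small illustration of step S41, the factor set for one Δt window can be computed with Python's `statistics` module. The function names and the readings are hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption, since the source does not say whether the population or sample form is used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean of one aggregation window."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / mu


def factor_set(speeds, occupancies, travel_times):
    """Return (S_V, S_O, S_T) for the readings of one window."""
    return (
        coefficient_of_variation(speeds),
        coefficient_of_variation(occupancies),
        coefficient_of_variation(travel_times),
    )


# Hypothetical readings inside one window.
P = factor_set(
    speeds=[10.0, 10.0, 10.0, 10.0],
    occupancies=[0.2, 0.4],
    travel_times=[55.0, 60.0, 65.0],
)
```

Because each coefficient is a ratio of two quantities in the same unit, the three factors are dimensionless and can be compared directly, which is what makes them usable as a common factor set.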

Step S42, establishing a judgment comment set F, $F=[f_1,f_2,f_3]$, where $f_1$ indicates a low-risk running state, $f_2$ indicates a medium-risk running state, and $f_3$ indicates a high-risk running state;

Step S43, establishing a fuzzy relation matrix R, where R is a fuzzy mapping from the judgment object factor set P to the judgment comment set F, $R=(R(S_V),R(S_O),R(S_T))^{T}$. From it, the fuzzy relation vectors $R(S_V)$, $R(S_O)$ and $R(S_T)$, which give the membership of each judgment factor $S_V$, $S_O$ and $S_T$ of P in the judgment comment set F, can be derived. Here $R(S_V)=(r_{11},r_{12},r_{13})$, where $r_{11}$, $r_{12}$ and $r_{13}$ are the memberships of the speed coefficient of variation in low, medium and high risk; $R(S_O)=(r_{21},r_{22},r_{23})$, where $r_{21}$, $r_{22}$ and $r_{23}$ are the memberships of the occupancy coefficient of variation in low, medium and high risk; and $R(S_T)=(r_{31},r_{32},r_{33})$, where $r_{31}$, $r_{32}$ and $r_{33}$ are the memberships of the travel time coefficient of variation in low, medium and high risk. The embodiment of the invention adopts trapezoidal membership functions, and R is as shown in formula (12):

$$R=\begin{pmatrix} r_{11} & r_{12} & r_{13}\\ r_{21} & r_{22} & r_{23}\\ r_{31} & r_{32} & r_{33}\end{pmatrix} \tag{12}$$

where the specific solving process of $r_{\mu\theta}$ (μ = 1, 2, 3; θ = 1, 2, 3) is shown in formulas (19) to (27).
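A trapezoidal membership function of the kind mentioned in step S43 can be sketched as follows. The breakpoints are hypothetical: the actual parameters are those of formulas (19) to (27), which are not reproduced here, so the numbers below only illustrate the shape (a left shoulder for low risk, a full trapezoid for medium risk, a right shoulder for high risk).

```python
INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def membership_row(cv, t1, t2, t3, t4):
    """Memberships of one coefficient of variation in (low, medium, high) risk."""
    low = trapezoid(cv, -INF, -INF, t1, t2)    # left shoulder
    medium = trapezoid(cv, t1, t2, t3, t4)     # full trapezoid
    high = trapezoid(cv, t3, t4, INF, INF)     # right shoulder
    return (low, medium, high)


# Hypothetical breakpoints for one factor, e.g. the speed coefficient of variation.
breaks = (0.1, 0.2, 0.3, 0.4)
row = membership_row(0.15, *breaks)
```

Stacking one such row per factor (speed, occupancy, travel time) gives a 3 x 3 fuzzy relation matrix; with overlapping shoulders as above, the memberships of a value in two adjacent risk levels sum to 1.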

Step S44, classifying the traffic running state of the sample items of the traffic flow data set X that correspond to the judgment object factor set P with the traffic running state classifier generated in step S3, and obtaining from the classification result the judgment object factor set P with traffic running state labels; then, according to the different traffic running states, establishing a fuzzy weight matrix $S=(s_1,s_2,s_3)$ from the labeled judgment object factor set P, where $s_1$, $s_2$ and $s_3$ represent the degree of influence on the risk running state, under the different traffic running states, of the speed coefficient of variation, the occupancy coefficient of variation and the travel time coefficient of variation in the fuzzy relation matrix R, respectively. The specific implementation process is as follows:

Step S441, classifying the traffic running state of the traffic flow data set X of step S2 with the traffic running state classifier generated in step S3, and matching the generated traffic running state labels one to one with the judgment object factor set P by the standard time feature, to obtain the judgment object factor set P with traffic running state labels. The entropy weight method is then applied to the labeled judgment object factor set P to determine the weight of each factor under the different traffic running states, and the weights are normalized to obtain the weight matrix $A=(A_f,A_c,A_j)^{T}$, where $A_f$, $A_c$ and $A_j$ are the weight vectors of the speed, occupancy and travel time coefficients of variation in the free flow, congested flow and blocked flow states, respectively. In the embodiment of the invention, solving gives $A_f=(0.30,0.29,0.41)$; the same method gives $A_c=(0.25,0.35,0.40)$ for the congested flow state and $A_j=(0.2,0.28,0.52)$ for the blocked flow state.
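The entropy weight method named in step S441 can be sketched in its common textbook form: each indicator column is turned into proportions, its normalized Shannon entropy is computed, and the weight is proportional to one minus that entropy. The source does not spell out its exact variant, so this form is an assumption, and the indicator values below are hypothetical rather than the data behind the reported weights. The sketch also assumes non-negative values, at least two rows, and at least one non-uniform column.

```python
import math


def entropy_weights(rows):
    """Entropy weight method: more dispersed columns receive larger weights."""
    n, cols = len(rows), len(rows[0])
    redundancy = []
    for c in range(cols):
        column = [row[c] for row in rows]
        total = sum(column)
        probs = [v / total for v in column]
        # Normalized Shannon entropy in [0, 1]; zero proportions contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        redundancy.append(1.0 - entropy)
    scale = sum(redundancy)
    return [r / scale for r in redundancy]


# Hypothetical (S_V, S_O, S_T) rows for samples classified into one traffic state.
indicators = [
    [0.10, 0.05, 0.20],
    [0.12, 0.30, 0.20],
    [0.11, 0.10, 0.30],
    [0.09, 0.40, 0.25],
]
weights = entropy_weights(indicators)
```

Running the function once per traffic state, on the rows carrying that state's label, would give one weight vector per state, which is how a three-row matrix such as A could be assembled.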

Step S442, the clustering result shows that some data points are not clearly assigned to a cluster. The embodiment of the invention therefore introduces a fuzzy influence vector $w_i=(w_{i1},w_{i2},w_{i3})$ to fuzzify the weight matrix A and eliminate the problem of unclear clustering, where $w_{i1}$, $w_{i2}$ and $w_{i3}$ are the memberships of the i-th sample item of the sample data set Γ with traffic running state labels in the free flow, congested flow and blocked flow cluster centers, respectively. The memberships are solved by formula (13):

where $I_{ij}$ is the Euclidean distance between the i-th sample item of the data set Γ and the cluster center of each traffic state;

Step S443, obtaining the fuzzy weight matrix $S=w_i\times A$.

Step S45, calculating the fuzzy synthetic value matrix B, $B=S\circ R$, where $\circ$ denotes a generalized fuzzy operator. Common fuzzy operators include $M(\wedge,\vee)$, $M(\cdot,\vee)$, $M(\wedge,\oplus)$ and $M(\cdot,\oplus)$. The embodiment of the invention adopts the $M(\cdot,\oplus)$ operator, which balances the effect of every evaluation factor on the evaluation set and is suitable for evaluating an overall index. This finally gives $B=(b_1,b_2,b_3)$, the fuzzy comprehensive evaluation result of the evaluation object, where $b_1$, $b_2$ and $b_3$ are the memberships of the fuzzy comprehensive evaluation result in the low-risk running state $f_1$, the medium-risk running state $f_2$ and the high-risk running state $f_3$ of the judgment comment set F, respectively. $b_\theta$, θ = 1, 2, 3, is calculated by formula (14):

$$b_\theta=\sum_{\mu=1}^{3}s_\mu\,r_{\mu\theta},\quad \theta=1,2,3 \tag{14}$$

Step S46, determining the fuzzy comprehensive judgment result z, that is, the risk running state category, from the fuzzy synthetic value matrix B:

The embodiment of the invention determines the fuzzy comprehensive judgment result by the maximum membership method: the risk running state indicated by the element of the judgment comment set F ($f_1$, $f_2$ or $f_3$) that corresponds to the maximum value in the fuzzy synthetic value matrix B is taken as the final judgment result, that is, $z=f_\theta=\max b_\theta$; θ = 1, 2, 3;
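Steps S443 to S46 reduce to two vector-matrix products followed by an arg-max. The sketch below assumes that the synthesis operator is the weighted-average type (multiply, then sum), which is one reading of the operator described above. The weight matrix A uses the values reported in step S441, while the fuzzy influence vector w and the fuzzy relation matrix R are hypothetical.

```python
# Weight matrix A from step S441: rows are free flow, congested flow, blocked flow.
A = [
    [0.30, 0.29, 0.41],
    [0.25, 0.35, 0.40],
    [0.20, 0.28, 0.52],
]


def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(v * row[c] for v, row in zip(vec, mat)) for c in range(len(mat[0]))]


def evaluate_risk(w, R, weight_matrix=A):
    """Return (S, B, z): fuzzy weights, synthetic values and 1-based risk class."""
    S = vec_mat(w, weight_matrix)   # S = w_i x A
    B = vec_mat(S, R)               # weighted-average synthesis (assumption)
    z = B.index(max(B)) + 1         # maximum membership rule
    return S, B, z


# Hypothetical memberships of one sample in the three traffic state clusters.
w = [0.7, 0.2, 0.1]
# Hypothetical fuzzy relation matrix: rows are S_V, S_O, S_T; columns are low, medium, high.
R = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]
S, B, z = evaluate_risk(w, R)
```

When the entries of w sum to 1 and every row of A and R sums to 1, the entries of B also sum to 1, so B can be read as a membership distribution over the three risk levels.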

Step S47, based on the method for establishing the factor set P of the judgment object in step S41, constructing a data set for risk running state judgment by adopting the training data set formed in step S1

Wherein P_i is the i-th evaluation object factor set, namely the i-th sample item of the data set Ω for risk running state judgment, and n is the total number of samples in Ω. Each sample item in Ω is calculated according to steps S43 to S46 to obtain the risk running state label data set Z corresponding to Ω, which is the set of fuzzy comprehensive judgment results of all sample items of Ω. Z is normalized using formula (3) to finally obtain the training data set Ψ with risk running state labels

Wherein z_i is the fuzzy comprehensive judgment result of the i-th sample item of the data set Ω for risk running state judgment.

To use it as label data for the subsequent active traffic safety prediction, the embodiment of the invention sets the low-risk running state f₁ = 1, the medium-risk running state f₂ = 2 and the high-risk running state f₃ = 3, i.e., the risk running state label data set Z = {z | z_i = f_θ = max b_θ; i = 1, 2, …, n; θ = 1, 2, 3}.

Step S5, an SAE-GRU model is built, the built SAE-GRU model is trained with the training data set Ψ with risk running state labels obtained in step S4, and the optimal SAE-GRU active safety prediction model is obtained through parameter adjustment. The specific implementation process is as follows:

Step S51, building an SAE model and inputting the training data set with risk running state labels obtained in step S4 into the SAE model to train it. The SAE model processes the training data set with risk running state labels and extracts abstract features, namely the data features in the high-risk running state, layer by layer. After training is completed, the feature data set output by the SAE model and the output loss value are obtained, and the optimal SAE model is determined according to the output loss value.

The autoencoder (AE) encodes an input x to obtain a new feature y, and expects the original input x to be reconstructable from the new feature y. The encoding is as follows:

y = f(Wx + b); (15)

That is, the encoding applies a nonlinear activation function f after a linear combination. Using the new feature y, the input x can be reconstructed, which is the decoding process:

x' = f(W'y + b'); (16)

The reconstructed x' should be as consistent with x as possible, and the model can be trained with a loss function that minimizes the negative log likelihood:

L = -log P(x|x'); (17)

In equations (15) to (17), W and b represent the weights and offsets at the time of encoding, and W' and b' represent the weights and offsets at the time of decoding.
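The encoding, decoding and loss of equations (15) to (17) can be sketched as a single forward pass. The sketch assumes a sigmoid activation for f and a Bernoulli likelihood, under which the negative log likelihood becomes the binary cross-entropy. Both choices, and all names and dimensions, are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b):
    # Equation (15): y = f(Wx + b), with f assumed to be the sigmoid.
    return sigmoid(W @ x + b)

def decode(y, W_dec, b_dec):
    # Equation (16): the reconstruction is computed from the code y.
    return sigmoid(W_dec @ y + b_dec)

def reconstruction_loss(x, x_rec, eps=1e-12):
    # Equation (17) under an assumed Bernoulli model: binary cross-entropy.
    x_rec = np.clip(x_rec, eps, 1.0 - eps)
    return float(-np.sum(x * np.log(x_rec) + (1.0 - x) * np.log(1.0 - x_rec)))

rng = np.random.default_rng(0)
x = rng.random(5)                                   # hypothetical 5-feature sample in [0, 1)
W, b = rng.normal(scale=0.1, size=(3, 5)), np.zeros(3)
W_dec, b_dec = rng.normal(scale=0.1, size=(5, 3)), np.zeros(5)

y = encode(x, W, b)            # 3-dimensional code
x_rec = decode(y, W_dec, b_dec)
loss = reconstruction_loss(x, x_rec)
print(y.shape, x_rec.shape, loss)
```

Training would adjust W, b, W' and b' to reduce this loss; only the forward computation is shown here.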

The stacked autoencoder (SAE) is formed by cascading a plurality of autoencoders, which complete the task of layer-by-layer feature extraction so that the finally obtained features are representative. The n autoencoders are trained in sequence: after the 1st AE is trained, the output of its encoder is used as the input of the 2nd AE, and so on, and the finally obtained features are used as the input of a classifier to complete the final classification training.

The embodiment of the invention processes the traffic data set based on the SAE, extracts the data features under high risk layer by layer with the stacked autoencoder, reduces the data dimension, and provides a low-dimensional, high-value data set for the safety prediction of the GRU model in the next stage. The SAE consists of multiple layers of sparse autoencoders and a Softmax classifier, and its training steps and flow are as follows:

Step S511, environment definition: defining the custom encoder, the output layer using the Softmax classifier, and the loss and the optimizer. Then initializing the parameters, creating a coordinator, using the Softmax classifier in the fine-tuning step, and traversing the two autoencoders to be applied;

Step S512, taking the data set after traffic state identification as the input of the SAE, executing the training of the first autoencoder, and taking the training result as the feature output of the first autoencoder;

Step S513, using the feature output of the upper network as the input of the lower network and repeating the training according to step S512; cycling through all batches, performing adaptation training with the batch data, and calculating the average loss;

Step S514, taking the feature output of step S513 as the input of the Softmax classifier, and training the Softmax classifier in combination with the initial data set;

Step S515, repeating steps S512 to S514, calculating the cost value of each Epoch, and storing it;

Step S516, obtaining the feature output of the data set after all data sets have been trained, and ending the SAE training.
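The layer-by-layer part of steps S512, S513 and S515 can be sketched in plain NumPy as follows. This is an illustrative sketch, not the TensorFlow implementation of the embodiment: it assumes sigmoid units, a squared-error reconstruction loss and mini-batch gradient descent, omits the sparsity penalty and the Softmax fine-tuning of step S514, and uses synthetic data. All function names, layer sizes and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=100, batch_size=64, seed=0):
    """Train one sigmoid autoencoder; return encoder (W, b) and per-epoch average loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden)); b = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d)); b_dec = np.zeros(d)
    history = []
    for _ in range(epochs):
        order = rng.permutation(n)
        batch_losses = []
        for start in range(0, n, batch_size):          # cycle over all batches
            xb = X[order[start:start + batch_size]]
            h = sigmoid(xb @ W + b)                    # encoder output
            out = sigmoid(h @ W_dec + b_dec)           # reconstruction
            err = out - xb
            batch_losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
            # Backpropagation of the mean (over samples) squared error.
            d_out = (2.0 / len(xb)) * err * out * (1.0 - out)
            d_h = (d_out @ W_dec.T) * h * (1.0 - h)
            W_dec -= lr * (h.T @ d_out); b_dec -= lr * d_out.sum(axis=0)
            W -= lr * (xb.T @ d_h);      b -= lr * d_h.sum(axis=0)
        history.append(float(np.mean(batch_losses)))   # average loss of this epoch
    return (W, b), history

def train_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise training: each encoder's output feeds the next autoencoder."""
    params, histories, feats = [], [], X
    for n_hidden in layer_sizes:
        (W, b), hist = train_autoencoder(feats, n_hidden, **kwargs)
        params.append((W, b)); histories.append(hist)
        feats = sigmoid(feats @ W + b)
    return params, feats, histories

# Synthetic stand-in for the normalized traffic features (hypothetical).
rng = np.random.default_rng(1)
Z = rng.normal(size=(256, 2))
X = sigmoid(Z @ rng.normal(size=(2, 8)) + np.linspace(-2.0, 2.0, 8))
params, features, histories = train_stack(X, layer_sizes=(6, 3))
print(features.shape, histories[0][0], histories[0][-1])
```

The returned `features` play the role of the low-dimensional feature data set that is passed on to the next stage.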

The SAE designed by the embodiment of the invention consists of a main function (SAE test) and three auxiliary functions (load data, init and autoencoder). SAE test is used for the training process and for extracting the feature data, load data is used for loading the data set based on traffic state identification, init is used for loading the initial parameters, and autoencoder is used for loading an autoencoder; the three auxiliary functions are called by the main function. To obtain an SAE suitable for active safety prediction, the parameters of the existing SAE need to be fine-tuned. The embodiment of the invention selects the Batch size and the Learning rate as the parameters to adjust. The Batch size influences the degree of SAE training by adjusting the number of samples grabbed in each training step, and its value is generally 2ⁿ, n = 5, 6, 7, 8. The Learning rate determines the learning step of each iteration and provides valid information for the next iteration. Therefore, the embodiment of the invention sets the parameter values shown in table 1 for testing, and takes the test precision before each fine tuning as the index for judging the training quality.

TABLE 1 value of Batch size and Learning rate parameters and training result table

In the first-stage test, the embodiment of the invention fixes the Batch size at 128 and sets the Learning rate to a gradient of values, 0.001/0.003/0.005/0.007/0.009, giving the first-stage test results of table 1. Among the five groups of experiments, the precision is highest, reaching 0.9149, when the Batch size is 128 and the Learning rate is 0.007. Based on the first-stage test results, the embodiment of the invention fixes the Learning rate at 0.007 and sets the Batch size to a gradient of values, 32/64/128/256, giving the second-stage test results of table 1. Among the four groups of experiments, the precision is highest, reaching 0.9487, when the Learning rate is 0.007 and the Batch size is 64. Thus, based on the two-stage test results, the SAE with Learning rate = 0.007 and Batch size = 64 is selected as the experimental model.

Step S52, building a GRU model based on TensorFlow and designing the loss function and the optimizer of the GRU neural network. The feature data set output by the optimal SAE model is divided into a training set and a prediction set. The GRU model is trained with the training set and, after a certain number of training iterations, tested with the prediction set; the prediction result is compared with the actual result to calculate the loss, and the hyperparameters are then adjusted according to the loss to obtain the optimal GRU model. After training is completed, the model outputs the risk running state in the next phase, namely in the next Δt time interval, and the optimal SAE-GRU active safety prediction model is obtained.

First, two gating states are obtained from the high-risk feature state h^(t-1) transmitted from the previous time step and the input of the current node, namely the feature data x^t with risk running state labels. As shown in fig. 6, r is the reset gate, z is the update gate, and σ is the Sigmoid function, which acts as the gating signal. After the gating signals are obtained, the reset gate r is used to obtain the reset state h^(t-1)' = h^(t-1) ⊙ r; then h^(t-1)' is spliced with the input x^t, and the data are scaled to the interval (-1, 1) through the activation function tanh() to obtain the candidate state h'^t. The overall principle is shown in fig. 7: the traffic running state is judged to obtain a data set with risk running state labels, the SAE model then extracts the feature data set with risk running state labels under high risk, and the GRU model uses this feature data set to predict the risk running state, with the result visualized through images. Finally, the GRU performs the update-memory phase, in which the forgetting and memorizing steps are carried out simultaneously based on the previously obtained update gate z. The update formula is as follows:

h^t = (1 - z) ⊙ h^(t-1) + z ⊙ h'^t; (18)

In formula (18), the value of the update gate z ranges from 0 to 1: the closer z is to 1, the more data are memorized, and the closer z is to 0, the more is forgotten. h^t represents the high-risk feature state at the current time. Based on the above interpretation of the GRU mechanism, the final GRU operation diagram is obtained, as shown in fig. 8. The optimal active safety prediction effect is then realized through parameter adjustment: Train hours influences the effect of model training by controlling the proportion of the training set to the test set, and the Epochs and the Batch size represent the cadence of the entire training process. The settings of the three parameters are shown in table 2.
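The gating and update of a single GRU time step described above can be sketched in NumPy. The weight names, shapes and random initialization are hypothetical, and the sketch follows the convention of formula (18), in which the update gate z weights the newly computed candidate state.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; p is a dict of hypothetical weights and biases."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(p["W_r"] @ hx + p["b_r"])          # reset gate
    z = sigmoid(p["W_z"] @ hx + p["b_z"])          # update gate
    h_reset = r * h_prev                           # reset state
    # Candidate state: splice the reset state with the input, squash to (-1, 1).
    h_cand = np.tanh(p["W_h"] @ np.concatenate([h_reset, x_t]) + p["b_h"])
    return (1.0 - z) * h_prev + z * h_cand         # update as in formula (18)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                                 # hypothetical sizes
p = {name: rng.normal(scale=0.5, size=(n_hid, n_hid + n_in))
     for name in ("W_r", "W_z", "W_h")}
p.update({name: np.zeros(n_hid) for name in ("b_r", "b_z", "b_h")})

h = np.zeros(n_hid)
for x_t in rng.random((5, n_in)):                  # five consecutive time steps
    h = gru_step(x_t, h, p)
print(h)
```

Because each new state is a convex combination of the previous state and a tanh output, the hidden state stays inside (-1, 1) when it starts from zero.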

The analysis process is divided into two stages:

Step S521: under the same Train hours, comparing the influence of different combinations of the Epochs and the Batch size on the prediction result;

Step S522: after the optimal combination of the Epochs and the Batch size is determined, studying the prediction results under different Train hours, and finally determining the optimal GRU model.

Step S521 compares groups 1-5, 6-10 and 11-15, using prediction accuracy and RMSE as the evaluation basis, to obtain FIGS. 9(a) to 9(c) respectively. First, under the same Train hours, the prediction accuracy differs relatively little across the different combinations of Epochs and Batch size, whereas it deviates more across different Train hours; the prediction accuracy of groups 1, 6 and 11 is the best within their respective sets. Second, as training deepens, the RMSE gradually reveals the relative merits of the different groups; groups 1, 6 and 11 are again the optimal combinations and are significantly better than the control groups. Therefore, based on the analysis of fig. 9, the SAE-GRU model proposed by the embodiment of the invention has high prediction accuracy, and the prediction effect of groups 1, 6 and 11 is optimal, i.e., the model predicts best with Epochs = 1960 and Batch size = 5.

Step S522 fixes Epochs = 1960 and Batch size = 5 as determined in step S521, and then compares the prediction accuracy, prediction time and RMSE of groups 1, 6 and 11, giving table 3. As can be seen from table 3, group 11 has the best value in prediction accuracy, prediction time and RMSE, which demonstrates its efficiency and accuracy. Thus, based on the above analysis, group 11 is the best choice among all groups, i.e., the prediction accuracy and stability are best when Train hours = 6384, Epochs = 1960 and Batch size = 5.

TABLE 2 parameter settings for Train hours/Epochs/Batch size

TABLE 3 prediction accuracy, prediction time and RMSE for experimental group 1/6/11

Group	Prediction accuracy (%)	Time (s)	RMSE
1	76.204	38054	0.031284
6	89.788	36466	0.014847
11	95.157	32265	0.005689

Status recognition results and analysis:

The traffic flow data used for the experiments were derived from the California Department of Transportation Performance Measurement System (PeMS) dataset. The study route is an urban arterial road in Los Angeles County, and the study period is the working days from January 1 to February 28, 2020. The embodiment of the invention carries out the experimental analysis after removing the holiday data in the study period. First, preprocessing is carried out using steps S1 and S2; the sample data are then aggregated at a time granularity of 5 min, and the early-morning data between 0:00 and 5:00 are removed. Finally, a total of 9804 experimental records covering the 43 working days in the study period are obtained, as shown in table 4.
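The stated sample count can be cross-checked with simple arithmetic. The sketch assumes that exactly five full hours per day are removed, leaving 19 hours of 5-minute records per working day; that interpretation is an assumption, and the variable names are illustrative.

```python
WORKING_DAYS = 43            # stated in the text
REMOVED_HOURS_PER_DAY = 5    # assumption: the early-morning window covers 5 full hours
GRANULARITY_MIN = 5          # stated in the text

samples_per_day = (24 - REMOVED_HOURS_PER_DAY) * 60 // GRANULARITY_MIN
total_samples = WORKING_DAYS * samples_per_day
print(samples_per_day, total_samples)   # 228 9804
```

Under this assumption the product matches the 9804 records reported for the study period.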

TABLE 4 urban arterial road partial traffic flow parameter data

(1) Traffic running state recognition

Cluster analysis is performed on the historical traffic data of the target road section using steps 11-14. The category of each sample is determined from its membership degree to each cluster center, and each sample is automatically assigned to the category with the largest membership degree. The clustering result divides the 9804 samples into 3 categories, which are represented by different symbols; the data of one day is taken as an example for visual display, as shown in fig. 10. As can be determined from fig. 10, the data of the "+" symbol portion has a lower travel time and a higher vehicle speed as a whole and can be determined as the clear state "1"; the "+" symbol portion is determined as the crowded state "2"; the "++" symbol portion has the highest overall travel time and is determined as the blocked state "3". The traffic running states corresponding to the historical data used for training the support vector machine classifier are thus obtained. On the basis of the fuzzy C-means clustering state identification experiment, offline training is performed on the historical traffic flow data using steps S21 to S23 to obtain an online real-time traffic state classifier. The data of the first 30 days are used as the training set and the data of the remaining 13 days as the test set. Common machine learning algorithms, namely decision tree (DT), gradient boosting decision tree (GBDT), random forest (RF), K-nearest neighbors (KNN), logistic regression (LR) and Gaussian naive Bayes (GNB), are also selected for prediction performance comparison. The resulting analysis of real-time traffic state recognition is shown in table 5.

Table 5 comparison of traffic state prediction performance for different methods

Algorithm	SVM	LR	RF	DT	GBDT	KNN	GNB
Accuracy (%)	98.92	98.04	96.06	92.93	95.87	94.63	91.33

As can be seen from Table 5, the SVM classifier has a prediction accuracy as high as 98.92%, which is superior to other 6 machine learning classifiers. Therefore, the method can be used for real-time traffic state classification prediction, the experimental result meets the actual traffic condition, and the on-line traffic state identification of the urban arterial road can be realized.

(2) Risk running state discriminant analysis

The embodiment of the invention takes the traffic parameter data detected on January 1 by two detectors at the same cross-section position of the Los Angeles sunset channel as an example for statistical analysis, and calculates the change relations among σ_V, σ_T, σ_O, S_V, S_T and S_O, as shown in fig. 2 to 4. The coefficient of variation measures the difference in the variation amplitude of an index; it is determined by the standard deviation and the average value, and objectively reflects the degree of dispersion of the index. The smaller the coefficient of variation, the smaller the degree of dispersion and the lower the risk; conversely, the greater the risk. Fig. 2 shows that the corresponding coefficient of variation takes a high value when the average speed fluctuates suddenly, and fig. 3 and 4 show the same phenomenon for the occupancy and the travel time. To establish the relationship between each coefficient of variation and the road risk running state, cumulative frequency distribution diagrams of the speed, occupancy and travel time coefficients of variation in different intervals are further made, as shown in fig. 5(a) to 5(c). From the change curves of the indexes in fig. 2 to 4 and the coefficient distribution diagrams in fig. 5(a) to 5(c), a risk degree classification table for each coefficient of variation can finally be constructed, so that the membership degrees of each coefficient of variation under the three traffic states, namely clear, crowded and blocked, can be determined; see table 6.
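The coefficient of variation described above is the standard deviation divided by the mean. A minimal sketch follows; it assumes the population standard deviation (`ddof=0`), which the text does not specify, and the sample values are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return std / mean for the observations within one aggregation interval."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return float(values.std() / mean)   # assumption: population std (ddof=0)

speeds = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical readings, mean 5 and std 2
print(coefficient_of_variation(speeds))  # 0.4
```

A perfectly steady interval gives 0, and larger values indicate stronger fluctuation within the interval.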

Table 6 the index of each coefficient of variation corresponds to a different risk level classification table

Evaluation grade	Low risk	Low-medium critical	Medium risk	Medium-high critical	High risk
Coefficient of speed variation	[0,0.36]	(0.36,0.4)	[0.4,0.48]	(0.48,0.52)	≥0.52
Coefficient of occupancy variation	[0,0.55]	(0.55,0.6)	[0.6,0.7]	(0.7,0.75)	≥0.75
Coefficient of variation of travel time	[0,0.366]	(0.366,0.4)	[0.4,0.5]	(0.5,0.566)	≥0.566
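The intervals of Table 6 can be expressed as a small lookup, sketched below. The thresholds are taken from the table; the function name, the dictionary keys and the grade strings are illustrative choices.

```python
# Interval boundaries from Table 6: (low upper, medium lower, medium upper, high lower).
THRESHOLDS = {
    "speed": (0.36, 0.40, 0.48, 0.52),
    "occupancy": (0.55, 0.60, 0.70, 0.75),
    "travel_time": (0.366, 0.40, 0.50, 0.566),
}

def risk_grade(kind, cv):
    """Map a coefficient of variation to its evaluation grade in Table 6."""
    if cv < 0:
        raise ValueError("a coefficient of variation cannot be negative here")
    low_hi, mid_lo, mid_hi, high_lo = THRESHOLDS[kind]
    if cv <= low_hi:
        return "low"
    if cv < mid_lo:
        return "low-medium critical"
    if cv <= mid_hi:
        return "medium"
    if cv < high_lo:
        return "medium-high critical"
    return "high"

print(risk_grade("speed", 0.38), "|", risk_grade("occupancy", 0.75))
```

The critical bands are where a membership function would assign partial membership to two adjacent risk grades.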

On the basis of the traffic running state identification, membership function formulas of the various variation coefficients in low-risk, medium-risk and high-risk running states are respectively constructed based on a risk level grading table, and the membership function formulas are shown in formulas (19) - (27).

Wherein, the formulas (19) - (21) respectively represent membership functions of the speed variation coefficient relative to low risk, medium risk and high risk; formulas (22) to (24) respectively represent membership functions of the occupancy coefficient of variation with respect to low risk, medium risk and high risk; equations (25) to (27) represent membership functions of the travel time variation coefficient with respect to low risk, medium risk, and high risk, respectively. The steps S41 to S47 can be used to determine a training data set for active traffic safety prediction.

(3) Active safety prediction method analysis

To evaluate the superiority of SAE-GRU for active safety prediction, prediction accuracy and prediction efficiency are analyzed based on the same traffic data and prediction environment. In the prediction accuracy analysis, the accuracies of GRU, LSTM, CNN-LSTM, SVM, KNN and random forest are reproduced experimentally and compared with that of SAE-GRU. GRU serves as the comparison object with the same attribute, and the other methods serve as comparison objects with different attributes. In the prediction efficiency analysis, the prediction time, MAE and RMSE of SAE-GRU, GRU, LSTM and CNN-LSTM are used as the comparison parameters.

Prediction accuracy analysis: existing prediction methods have gradually come to match the time-series characteristics of traffic data, which makes their prediction results more objective. The comparison methods selected by the embodiment of the invention are representative and have seen research progress in the traffic prediction field. Therefore, comparing SAE-GRU with the above methods is convincing. The prediction results are detailed in Table 7.

TABLE 7 active safety prediction accuracy under different methods

Method	SAE-GRU	GRU	LSTM	CNN-LSTM	SVM	KNN	Random forest
Accuracy (%)	95.157	89.584	86.756	90.757	87.066	89.772	91.340

As can be seen from Table 7, when the different methods are used for prediction under the same conditions, SAE-GRU achieves an accuracy of 95.157%, whereas LSTM achieves only 86.756%. Compared with GRU, which has the same attribute, SAE-GRU improves the accuracy markedly (95.157% versus 89.584%). The prediction accuracy of SAE-GRU is also better than that of the structurally complex LSTM and its optimized variant CNN-LSTM. Based on this comparative analysis, the active safety prediction method provided by the embodiment of the invention not only meets the safety prediction requirement but also outperforms the other representative methods.

Prediction efficiency analysis: to accurately evaluate the application effects of various prediction methods, embodiments of the present invention will analyze the efficiency of SAE-GRU, GRU, LSTM, CNN-LSTM. The time consumption is used as a core efficiency evaluation index, and the MAE and the RMSE are used as a result evaluation index. Based on the evaluation requirements, the comprehensive efficiency evaluation of each prediction method can be realized.

As can be seen from FIG. 11, the prediction time of SAE-GRU is significantly lower than that of GRU under the same environment, with a reduction of 10.45%. Compared with LSTM and CNN-LSTM, the prediction time of SAE-GRU is only 85.25% and 88.81% of theirs, respectively. Thus, FIG. 11 illustrates that SAE-GRU has the higher prediction efficiency. In fig. 12, MAE represents the average of the absolute errors, and RMSE represents the square root of the average of the squared differences between the predicted values and the actual observed values. The MAE and RMSE of SAE-GRU are the lowest among all methods, and its MAE is significantly lower than that of the other methods. The efficiency of SAE-GRU is thereby confirmed.
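The two error measures defined above can be computed as follows. This is a minimal sketch with hypothetical label sequences, not the evaluation code or data of the embodiment.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |prediction - observation|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared difference."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

y_true = [1, 2, 3, 3]   # hypothetical observed risk states
y_pred = [1, 2, 2, 1]   # hypothetical predicted risk states
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

RMSE penalizes the single two-level error more heavily than MAE does, which is why the two measures are usually reported together.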

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. The active safety prediction method based on the big data technology and the SAE-GRU is characterized by comprising the following steps:

Step S41, establishing a factor set P of the judgment object, aggregated with Δt as the time interval, by using the training data set of step S1, P = [S_V, S_O, S_T], wherein S_V represents the coefficient of variation of vehicle speed, S_V = σ_V / V, V represents the average vehicle speed within the Δt time interval, and σ_V represents the standard deviation of vehicle speed within the Δt time interval; S_O represents the coefficient of variation of occupancy, S_O = σ_O / O, O represents the average occupancy within the Δt time interval, and σ_O represents the standard deviation of occupancy within the Δt time interval; S_T represents the coefficient of variation of travel time, S_T = σ_T / T, T represents the average travel time within the Δt time interval, and σ_T represents the standard deviation of travel time within the Δt time interval;

Step S42, establishing a judgment comment set F, F = [f₁, f₂, f₃], wherein f₁ represents the low-risk running state, f₂ represents the medium-risk running state, and f₃ represents the high-risk running state;

Step S43, establishing a fuzzy relation matrix R, as shown in formula (12):

wherein R(S_V) represents the fuzzy relation matrix of the membership degrees of the vehicle speed coefficient of variation S_V with respect to the judgment comment set F, R(S_O) represents the fuzzy relation matrix of the membership degrees of the occupancy coefficient of variation S_O with respect to the judgment comment set F, and R(S_T) represents the fuzzy relation matrix of the membership degrees of the travel time coefficient of variation S_T with respect to the judgment comment set F; r₁₁, r₁₂ and r₁₃ represent the membership degrees of the speed coefficient of variation with respect to the low-risk running state f₁, the medium-risk running state f₂ and the high-risk running state f₃, respectively; r₂₁, r₂₂ and r₂₃ represent the membership degrees of the occupancy coefficient of variation with respect to f₁, f₂ and f₃, respectively; r₃₁, r₃₂ and r₃₃ represent the membership degrees of the travel time coefficient of variation with respect to f₁, f₂ and f₃, respectively;

Step S44, classifying the traffic running state of the sample items in the traffic flow data set X that correspond to the judgment object factor set P by using the traffic running state classifier generated in step S3, and then obtaining a judgment object factor set P with traffic running state labels based on the classification result; then, according to the different traffic running states, establishing a fuzzy weight matrix S = (s₁, s₂, s₃) by using the judgment object factor set P with traffic running state labels, wherein s₁ represents the degree of influence of the speed coefficient of variation in the fuzzy relation matrix R on the risk running state under different traffic running states, s₂ represents the degree of influence of the occupancy coefficient of variation in the fuzzy relation matrix R on the risk running state under different traffic running states, and s₃ represents the degree of influence of the travel time coefficient of variation in the fuzzy relation matrix R on the risk running state under different traffic running states. The specific implementation process is as follows:

Step S441, classifying the traffic running state of the traffic flow data set X of step S2 by using the traffic running state classifier generated in step S3, and matching the generated traffic running state labels one-to-one with the judgment object factor set P according to the standard time feature to obtain the judgment object factor set P with traffic running state labels; then determining, by the entropy weight method, the weights of all factors in the judgment object factor set P with traffic running state labels under the different traffic running states, and normalizing these weights to obtain the weight matrix A = (A_f, A_c, A_j)^T, wherein A_f represents the weight vector of the speed, occupancy and travel time coefficients of variation in the clear-flow state, A_c represents the weight vector of the speed, occupancy and travel time coefficients of variation in the crowded-flow state, and A_j represents the weight vector of the speed, occupancy and travel time coefficients of variation in the blocked-flow state;
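The entropy weight method named in step S441 can be sketched as follows. The claim does not spell out its formulas, so the sketch uses the commonly cited form (column-wise proportions, normalized Shannon entropy, weights proportional to one minus the entropy) as an assumption. The function name and sample matrix are hypothetical; applying the function separately to the samples of each traffic state would give the three rows A_f, A_c and A_j.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weights for the columns of a non-negative matrix X (samples x factors)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = X / X.sum(axis=0)                          # proportion of each sample per factor
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)     # convention: 0 * log(0) = 0
    e = -(P * logP).sum(axis=0) / np.log(n)        # normalized entropy in [0, 1]
    d = 1.0 - e                                    # degree of divergence
    return d / d.sum()                             # normalized weights (needs d.sum() > 0)

# Hypothetical coefficients of variation: columns = speed, occupancy, travel time.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 10.0]])
w = entropy_weights(X)
print(w)
```

A factor that never varies carries no information and receives a weight of zero, while the most dispersed factor receives the largest weight.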

Step S442, obtaining a fuzzy influence vector w_i, w_i = (w_i1, w_i2, w_i3), wherein w_i1 represents the membership degree of the i-th sample item in the sample data set Γ with traffic running state labels to the clear-flow cluster center, w_i2 represents the membership degree of the i-th sample item in the sample data set Γ with traffic running state labels to the crowded-flow cluster center, and w_i3 represents the membership degree of the i-th sample item in the sample data set Γ with traffic running state labels to the blocked-flow cluster center; the membership degrees are solved as follows:

Step S443, obtaining the fuzzy weight matrix S = w_i × A;
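Step S443 is a vector-matrix product: the membership of a sample to the three traffic states blends the three state-specific weight vectors into one weight vector for that sample. The numbers below are hypothetical.

```python
import numpy as np

# Hypothetical membership of sample i to the clear, crowded and blocked cluster centers.
w_i = np.array([0.7, 0.2, 0.1])

# Hypothetical weight matrix A = (A_f, A_c, A_j)^T; each row holds the weights of the
# speed, occupancy and travel-time coefficients of variation in one traffic state.
A = np.array([[0.5, 0.3, 0.2],    # A_f: clear flow
              [0.3, 0.4, 0.3],    # A_c: crowded flow
              [0.2, 0.3, 0.5]])   # A_j: blocked flow

S = w_i @ A                        # fuzzy weight matrix S = (s1, s2, s3)
print(S)
```

Because the memberships and each row of A sum to one in this example, the resulting weights also sum to one.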

Step S45, calculating a fuzzy synthetic value matrix B, B = S ∘ R, wherein ∘ represents a fuzzy operator, and the fuzzy synthetic value matrix B = (b₁, b₂, b₃), wherein b₁ represents the membership degree of the fuzzy comprehensive evaluation result relative to the low-risk running state f₁ of the judgment comment set F, b₂ represents its membership degree relative to the medium-risk running state f₂ of the judgment comment set F, and b₃ represents its membership degree relative to the high-risk running state f₃ of the judgment comment set F; the calculation of b_θ, θ = 1, 2, 3, is shown in formula (14):

step S46, determining a fuzzy comprehensive judgment result z, namely a risk running state category, by adopting a maximum membership method according to the fuzzy synthetic value matrix B;

Wherein P_i is the i-th evaluation object factor set, namely the i-th sample item of the data set Ω for risk running state judgment, and n is the total number of samples in Ω; each sample item in Ω is calculated according to steps S43 to S46 to obtain the risk running state label data set Z corresponding to Ω, the risk running state label data set Z = {z | z_i = f_θ = max b_θ; i = 1, 2, …, n; θ = 1, 2, 3}; the risk running state label data set Z is the set of fuzzy comprehensive judgment results of all sample items of Ω, and is normalized to finally obtain the training data set with risk running state labels

Wherein z_i represents the fuzzy comprehensive judgment result of the i-th sample item of the data set Ω for risk running state judgment;

2. The active safety prediction method based on big data technology and SAE-GRU according to claim 1, wherein the specific implementation procedure of step S6 is as follows:

firstly, preprocessing a predicted real-time traffic flow data set, then classifying traffic running states of the preprocessed predicted real-time traffic flow data set by using a generated traffic running state classifier, judging risk running states by using a step S4 based on traffic running state classification results to obtain a real-time traffic flow data set with risk running state labels, and then, carrying out active safety prediction on a main road by taking the real-time traffic flow data set with the risk running state labels as input data of an SAE-GRU active safety prediction model to predict and obtain a risk running state in the next phase, namely the next delta t time interval.

3. The active safety prediction method based on big data technology and SAE-GRU according to claim 2, wherein the preprocessing of the original data set in step S1 and the preprocessing of the predicted real-time traffic flow data set in step S6 is implemented as follows:

Step 1, data cleaning and filling: calculating the missing rate of each single data set in the sample data set; when the missing rate of a single data set is greater than or equal to 80%, deleting the single data set; when the missing rate of a single data set is less than 80%, filling the non-feature data in the single data set with statistics, and filling the feature data in the single data set by the Lagrange interpolation method of formulas (1) and (2):

wherein L(x) represents the missing value to be solved, L_j(x) is the interpolation basis function, x_j represents the j-th position point, x_i represents the position points other than x_j, x represents the position point of the missing value to be solved, y_j represents the value at position point x_j, and k represents the number of given value points;
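The Lagrange interpolation used for filling can be sketched with the textbook form, in which the basis function for point j is the product over i ≠ j of (x − x_i)/(x_j − x_i) and the interpolated value is the sum of y_j times its basis function. The function name and the sample points are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[j], ys[j]) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)   # basis function for point j
        total += yj * basis
    return total

# Hypothetical example: the reading at position 3 is missing; its neighbours follow y = x^2.
known_positions = [1, 2, 4]
known_values = [1.0, 4.0, 16.0]
filled = lagrange_interpolate(known_positions, known_values, 3)
print(filled)
```

In practice only a few neighbouring points around each gap would be used, since high-degree interpolating polynomials can oscillate.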

Step 2, feature transformation: carrying out feature transformation on each value of each type of feature data in the sample data set after data cleaning and filling, by the maximum-minimum normalization method shown in formula (3):

wherein y_new represents the value after feature transformation, y represents the value before feature transformation, y_min represents the minimum value of each type of feature data in the sample data set, and y_max represents the maximum value of each type of feature data in the sample data set, the sample data set being the original data set or the predicted real-time traffic flow data set.
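Maximum-minimum normalization maps each feature to [0, 1] through (y − y_min)/(y_max − y_min). A minimal sketch follows; returning zeros for a constant feature is a choice made here for illustration and is not taken from the claim.

```python
import numpy as np

def min_max_normalize(values):
    """Scale a 1-D feature to [0, 1] with (y - y_min) / (y_max - y_min)."""
    values = np.asarray(values, dtype=float)
    y_min, y_max = values.min(), values.max()
    if y_max == y_min:
        return np.zeros_like(values)   # assumption: a constant feature maps to 0
    return (values - y_min) / (y_max - y_min)

print(min_max_normalize([2, 4, 6]))          # [0.  0.5 1. ]
print(min_max_normalize([1, 2, 3, 3, 1]))    # risk labels 1/2/3 become 0/0.5/1
```

The second call shows the effect on risk running state labels coded as 1, 2 and 3.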

4. The active safety prediction method based on big data technology and SAE-GRU according to claim 1, wherein step S2 performs dynamic traffic running state identification based on the fuzzy C-means clustering algorithm, and the specific implementation process is as follows:

Step S21, constructing a traffic flow data set X = [x_1, x_2, …, x_i, …, x_n]^T, where each sample item in X is x_i = (x_i1, x_i2, x_i3, x_i4, x_i5) and consists of traffic flow attribute parameters and vehicle operation parameters; x_i1, x_i2, x_i3, x_i4 and x_i5 respectively represent the average speed, average occupancy, average acceleration, average delay time and average travel time within the Δt time interval, of which the average speed and average occupancy are traffic flow attribute parameters, and the average acceleration, average delay time and average travel time are vehicle operation parameters; dividing all sample items in the traffic flow data set X into k categories C = {c_1, c_2, …, c_k}, each category representing a traffic state, with k = 3 and the three categories representing the three traffic states of clear, crowded and blocked, respectively; and constructing a cluster loss function according to equation (4):

wherein J(U, X, C) represents the cluster loss function; x_i represents the i-th sample item in the traffic flow data set X; n is the total number of samples in the traffic flow data set X; k is the number of clusters; C_j represents the fuzzy cluster center of the j-th category c_j; U represents the membership matrix, U = {u_ij}, where u_ij represents the membership degree of sample x_i to category c_j; m is the fuzzy weighting index; ζ represents the constraint space of the cluster loss function, which is defined by formula (5):

Step S23, comparing the membership matrix U^(λ+1) of the (λ+1)-th iteration with the membership matrix U^(λ) of the λ-th iteration; if ||U^(λ+1) − U^(λ)|| ≤ ε or λ = λ_max, stopping the iteration and outputting the current fuzzy cluster centers C_j and the membership matrix U of each sample item x_i in the traffic flow data set X with respect to the current fuzzy cluster centers C_j, where 1 ≤ j ≤ 3; otherwise, returning to step S22 and continuing the iteration; where λ is the iteration number, λ_max is the maximum number of iterations, and ε is the iteration termination threshold;

Step S25, visualizing the traffic flow data set X in the three-dimensional space of average speed, occupancy and travel time according to the traffic running state category label set Y obtained in step S24, to obtain a cluster analysis chart; determining, from the cluster analysis chart, the traffic running states corresponding to cluster labels 1, 2 and 3 in the category label set Y, so as to obtain the traffic running state category label y_i of each sample item x_i in the traffic flow data set X, and thereby the sample data set with traffic running state labels Γ = {(X, Y) | (x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)}; and at the same time normalizing the sample data set Γ with traffic running state labels.
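
The clustering procedure of this claim can be illustrated with a minimal fuzzy C-means sketch. It assumes the standard loss J(U, X, C) = Σ_i Σ_j u_ij^m·||x_i − C_j||², with each row of U summing to 1, together with the usual alternating updates of centers and memberships. The update formulas, the Frobenius norm in the stopping test, the random initialization and the maximum-membership rule for assigning labels 1 to 3 are assumptions of this example rather than details stated in the text above.

```python
import numpy as np


def fcm_loss(X, U, C, m=2.0):
    """Cluster loss: sum over i, j of u_ij^m * ||x_i - C_j||^2."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return float(((U ** m) * d2).sum())


def fuzzy_c_means(X, k=3, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Return cluster centers, membership matrix and cluster labels 1..k."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(max_iter):          # stop at lambda = lambda_max
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.linalg.norm(U_new - U) <= eps  # ||U^(l+1) - U^(l)|| <= eps
        U = U_new
        if converged:
            break
    labels = U.argmax(axis=1) + 1  # assumed rule: highest membership wins
    return C, U, labels
```

Which of the labels 1, 2 and 3 corresponds to clear, crowded or blocked still has to be decided by inspecting the clusters, as step S25 describes.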

5. The active safety prediction method based on big data technology and SAE-GRU according to claim 4, wherein the specific implementation process of step S3 is as follows:

where Q(α) is the objective function for the optimal Lagrangian multipliers, n is the total number of samples in the sample data set Γ with traffic running state labels, α_i and α_z are Lagrangian multipliers, C is the penalty coefficient, and K(x_i, x_z) represents the kernel function;

solving to obtain the optimal Lagrangian multiplier solution α^*;

Step S32, calculating the optimal bias value b^*:

Selecting a component α_l^* of the optimal Lagrangian multiplier solution α^* obtained in step S31 that satisfies the condition 0 < α_l^* < C; according to the subscript l of α_l^*, selecting x_l and y_l from the sample data set Γ with traffic running state labels, and then calculating b^* according to equation (10):

Step S33, solving the classification decision function f(x_i):

wherein f(x_i) is the generated traffic running state classifier and represents the traffic running state classification result of the i-th sample item in the sample data set Γ with traffic running state labels, and sgn(·) represents the sign function.
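
Steps S32 and S33 can be illustrated with the following minimal sketch of a kernel support vector classifier evaluated from already-solved multipliers. It assumes the usual soft-margin forms b^* = y_l − Σ_i α_i^*·y_i·K(x_i, x_l), with l chosen so that 0 < α_l^* < C, and f(x) = sgn(Σ_i α_i^*·y_i·K(x_i, x) + b^*). The function names and the linear kernel are assumptions of this example; the kernel actually used is not specified in the text above, and a three-state classifier would need several such binary classifiers combined.

```python
import numpy as np


def linear_kernel(a, b):
    """Assumed kernel for illustration: K(a, b) = a . b"""
    return float(np.dot(a, b))


def optimal_bias(alpha, X, y, C, kernel=linear_kernel, tol=1e-8):
    """Compute b* from one multiplier lying strictly between 0 and C."""
    candidates = np.flatnonzero((alpha > tol) & (alpha < C - tol))
    if candidates.size == 0:
        raise ValueError("no multiplier satisfies 0 < alpha_l < C")
    l = candidates[0]
    k_col = np.array([kernel(x_i, X[l]) for x_i in X])
    return float(y[l] - np.sum(alpha * y * k_col))


def decide(x, alpha, X, y, b, kernel=linear_kernel):
    """Binary decision f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    k_col = np.array([kernel(x_i, x) for x_i in X])
    return int(np.sign(np.sum(alpha * y * k_col) + b))
```

For two training points x = 0 (label −1) and x = 2 (label +1) with a linear kernel, the multipliers are α = (0.5, 0.5), which gives b^* = −1 and a decision boundary at x = 1.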

6. The active safety prediction method based on big data technology and SAE-GRU according to claim 1, wherein the element R_μθ of the fuzzy relation matrix R is calculated according to formulas (19) to (27):

7. The active safety prediction method based on big data technology and SAE-GRU according to any one of claims 1 to 6, wherein the specific implementation process of step S5 is as follows:

Step S51, building an SAE model; inputting the training data set with risk running state labels obtained in step S4 into the SAE model to train it; processing the training data set through the SAE model to extract abstract features layer by layer, namely the data features of the high-risk running state; after training is completed, obtaining the feature data set output by the SAE model and the output loss value, and determining the optimal SAE model according to the output loss value;