CN112270355A - Active safety prediction method based on big data technology and SAE-GRU

Info

Publication number: CN112270355A
Application number: CN202011172029.XA
Authority: CN
Inventors: 郝威; 吴其育; 戎栋磊; 张兆磊; 易可夫; 伍文广; 吴伟; 李永福; 王正武; 谷健
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2020-10-28
Filing date: 2020-10-28
Publication date: 2021-01-26
Anticipated expiration: 2040-10-28
Also published as: CN112270355B

Abstract

The invention discloses an active safety prediction method based on big data technology and SAE-GRU. First, an original data set is obtained and preprocessed to form a training data set. Dynamic traffic running state identification based on cluster analysis is then carried out on the training data set to obtain a sample data set with traffic running state labels. This labeled sample data set is used as prior knowledge for classification analysis to generate a traffic running state classifier. The training data set is next used to construct a data set for risk running state judgment, and the risk running state is judged according to the different traffic running states to obtain a training data set with risk running state labels. An SAE-GRU model is trained with this data set, and an optimal SAE-GRU active safety prediction model is obtained by parameter adjustment. Finally, active safety prediction is carried out with the SAE-GRU active safety prediction model. The method has a wide application range and meets the requirements of high-precision, high-efficiency prediction.

Description

Active safety prediction method based on big data technology and SAE-GRU

Technical Field

The invention belongs to the technical field of traffic state identification, and relates to an arterial road active safety prediction method based on traffic big data technology and SAE-GRU.

Background

With the rapid growth of vehicle ownership and the advance of urbanization, traffic congestion, frequent accidents and lagging management have become the main problems hindering traffic development. Traffic decisions based on big data and deep learning are more intelligent, and provide support for relieving congestion, optimizing road resources and improving safety indexes. Therefore, using big data technology to mine characteristic parameters related to traffic safety, and applying deep learning to establish a safety model and so build a sound active safety management method, has become a research hotspot in Intelligent Transportation Systems (ITS).

In traffic running state identification, a large sample size makes the calculation complex and often causes the judgment to fail. In risk running state identification, existing work focuses mainly on principal component analysis and multiple regression analysis, and the reliability, effectiveness and generality of the identification are not evaluated. It is therefore necessary to construct a state recognition model with transfer learning capability.

In recent years, the explosion of traffic data has increased both the difficulty of analysis and the value of the data. For better analysis, mining and modeling, various prediction methods based on state labels have been proposed. Although existing research has improved the accuracy of traffic state identification and safety prediction to a certain extent, the following defects remain:

1. the traffic running state is poorly linked to the active safety prediction method, so information is disconnected and the risk running state cannot be predicted accurately;

2. no model with general applicability within a given field has been developed, so the scope of state identification methods is narrow;

3. traffic data volume is growing rapidly, and existing methods cannot meet the requirements of high-precision, high-efficiency prediction, which reduces their applicability.

Disclosure of Invention

The embodiment of the invention aims to provide an active safety prediction method based on big data technology and SAE-GRU (stacked autoencoder and gated recurrent unit). It addresses two problems of existing state-label-based prediction methods: the traffic running state is poorly linked to the active safety prediction method, so information is disconnected and the risk running state cannot be predicted accurately; and the application range is narrow, so the requirements of high-precision, high-efficiency prediction cannot be met.

The technical scheme adopted by the embodiment of the invention is that the active safety prediction method based on the big data technology and SAE-GRU is carried out according to the following steps:

step S1, obtaining an original data set, wherein the original data set comprises a plurality of single data sets, each single data set comprises 5 types of feature data (average vehicle speed, average acceleration, average occupancy, average queuing time and average travel time in a Δt time interval) and 1 type of conventional data (average standard time), and preprocessing the original data set to form a training data set;

step S2, performing dynamic traffic running state recognition based on cluster analysis by using the training data set obtained in step S1 to obtain a sample data set with traffic running state labels;

step S3, using the sample data set with traffic running state labels as the prior knowledge of classification analysis to generate a traffic running state classifier;

step S4, constructing a data set for risk running state judgment from the training data set formed in step S1, and judging the risk running state of the constructed data set according to the different traffic running states based on a fuzzy comprehensive evaluation method to obtain a training data set with risk running state labels;

step S5, establishing an SAE-GRU model, training the established SAE-GRU model with the training data set with risk running state labels obtained in step S4, and obtaining an optimal SAE-GRU active safety prediction model through parameter adjustment;

and step S6, performing active safety prediction on the arterial road by using the obtained optimal SAE-GRU active safety prediction model, and predicting the risk running state in the next stage, namely the next Δt time interval.

The embodiment of the invention has the beneficial effects that:

1) after different traffic running state classification results are obtained based on a dynamic traffic running state classifier, fuzzy comprehensive judgment is carried out on traffic flow attributes and vehicle running attributes in all traffic running states to obtain a risk running state, SAE-GRU is finally used for carrying out active safety prediction based on a risk running state label, the traffic running state is tightly connected with the active safety prediction, and the problems that information is disconnected and the risk running state cannot be accurately predicted due to the fact that the traffic running state and the active safety prediction method are poor in relevance in the existing prediction method based on the state label are solved.

2) The SAE-GRU active safety prediction model provided by the embodiment of the invention can be applied to different scenes (main roads, intersections, expressways and the like), so that the model is favorable for improving the universality and the mobility of prediction, effectively expands the application range and solves the problem of narrow application range of the conventional prediction method based on the state label.

3) The embodiment of the invention firstly obtains the data characteristics based on an SAE model, and then realizes high-efficiency and high-precision safety prediction based on a GRU model. Therefore, the combination of SAE and GRU effectively excavates data characteristics, optimizes the prediction process on the premise of ensuring prediction accuracy, improves training efficiency, and solves the problems that the existing prediction method based on the state label cannot meet the prediction requirements of high accuracy and high efficiency and the applicability is reduced.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic structural diagram of a dynamic state recognition model according to an embodiment of the present invention.

FIG. 2 is a graph of the variation of the coefficient of variation of velocity according to an embodiment of the present invention.

FIG. 3 is a graph showing the change in occupancy coefficient of variation according to an embodiment of the present invention.

FIG. 4 is a graph of variation of the travel time coefficient of variation according to an embodiment of the present invention.

FIG. 5(a) is a velocity coefficient of variation distribution plot according to an embodiment of the present invention.

FIG. 5(b) is a distribution diagram of occupancy coefficient of variation in accordance with an embodiment of the present invention.

FIG. 5(c) is a graph of the travel time coefficient of variation distribution according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating controlling reset/update gating according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating h' update according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of the GRU neural network operation according to an embodiment of the present invention.

FIG. 9(a) is a graph comparing the prediction accuracy and RMSE for groups 1-5 of an embodiment of the present invention.

FIG. 9(b) is a graph comparing the prediction accuracy and RMSE for groups 5-10 of an embodiment of the present invention.

FIG. 9(c) is a graph comparing the prediction accuracy and RMSE for groups 10-15 of an embodiment of the present invention.

Fig. 10 is a graph of clustering results and status division of trunk traffic flow data.

FIG. 11 is a graph comparing the prediction times of SAE-GRU, LSTM and CNN-LSTM.

FIG. 12 is a graph comparing MAE and RMSE for SAE-GRU, LSTM, CNN-LSTM.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides an active safety prediction method based on big data technology and SAE-GRU, which is carried out according to the following steps:

step S1, obtaining an original data set, where the original data set includes a plurality of single data sets, each single data set includes 5 types of feature data, i.e., average vehicle speed, average acceleration, average occupancy, average queuing time, and average travel time, and 1 type of conventional data, i.e., average standard time, in a Δ t time interval, and the original data set is preprocessed to form a training data set:

The specific analysis takes Sunset Avenue, California, as an example, with the following steps:

First, the original data set is acquired. Taking the working days from January 1, 2020 to February 28, 2020 as the research period, traffic flow data are collected every 30 s and aggregated at 5-min intervals to form 9805 data sets. Each data set contains 5 types of feature data (vehicle speed, acceleration, occupancy, queuing time and travel time) and 1 type of conventional data (standard time).

Then, the raw data set collected in step S11 is preprocessed to form a training data set:

step 1, data cleaning and filling: calculate the missing rate of each single data set in the sample data set, where the missing rate of a single data set = number of missing data items in the single data set / total number of data items in the single data set. When the missing rate of a single data set is greater than or equal to 80%, the single data set is deleted. When the missing rate is less than 80%, non-feature data (such as time) in the single data set are filled using statistics, and feature data (such as speed and acceleration) are filled by the Lagrange interpolation method of formulas (1) and (2), which establishes a polynomial function through several known data points and so makes full use of the time-series nature of the data:

$$L(x) = \sum_{j=1}^{k} y_j\, \ell_j(x) \tag{1}$$

$$\ell_j(x) = \prod_{i=1,\, i \neq j}^{k} \frac{x - x_i}{x_j - x_i} \tag{2}$$

where $L(x)$ represents the missing value to be found, $\ell_j(x)$ are the interpolation basis functions, $x_j$ denotes the j-th position point, $x_i$ denotes the position points other than $x_j$, $x$ represents the position point of the missing value to be found, $y_j$ denotes the value at position point $x_j$, and $k$ represents the number of given value points;
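The cleaning and filling rule of step 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the use of `None` for a missing value, and the choice of up to two known neighbours on each side as interpolation nodes (`window=2`) are all assumptions, since the text does not say how many points enter formulas (1) and (2).

```python
def missing_rate(record):
    """Missing rate of one single data set: missing items / total items."""
    return sum(v is None for v in record) / len(record)


def should_delete(record, limit=0.8):
    """A single data set is deleted when its missing rate is >= 80%."""
    return missing_rate(record) >= limit


def lagrange_interpolate(xs, ys, x):
    """Evaluate L(x) = sum_j y_j * l_j(x), formulas (1) and (2)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += yj * basis
    return total


def fill_missing(series, window=2):
    """Fill None entries of a time-ordered feature series.

    Assumption: the nodes are the nearest `window` known values on each
    side of the gap, indexed by their position in the series.
    """
    filled = list(series)
    known = [(t, v) for t, v in enumerate(series) if v is not None]
    for t, v in enumerate(series):
        if v is None:
            before = [p for p in known if p[0] < t][-window:]
            after = [p for p in known if p[0] > t][:window]
            nodes = before + after
            xs = [p[0] for p in nodes]
            ys = [p[1] for p in nodes]
            filled[t] = lagrange_interpolate(xs, ys, t)
    return filled


speeds = [10.0, 12.0, None, 16.0, 18.0]  # hypothetical 5-min average speeds
print(fill_missing(speeds))
```

Keeping the number of nodes small is a deliberate choice in this sketch, because high-degree interpolating polynomials can oscillate between nodes.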

step 2, feature transformation: in the embodiment of the invention, min-max normalization is applied to each value of each type of feature data in the sample data set after data cleaning and filling, so that the values of each type of data fall in the range [0,1]. The min-max normalization formula is shown in formula (3):

$$y_{new} = \frac{y - y_{min}}{y_{max} - y_{min}} \tag{3}$$

where $y_{new}$ represents the value after feature transformation, $y$ represents the value before feature transformation, $y_{min}$ represents the minimum value of each type of feature data in the sample data set, and $y_{max}$ represents the maximum value of each type of feature data in the sample data set. Here the sample data set refers to the original data set.
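Formula (3) is applied column by column, one feature type at a time. A minimal sketch follows; the function name and the handling of a constant column (mapped to 0) are assumptions not stated in the text.

```python
def min_max_normalize(values):
    """Scale one feature column to [0, 1] using formula (3)."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(y - y_min) / (y_max - y_min) for y in values]


speeds = [20.0, 30.0, 40.0]  # hypothetical average speeds
print(min_max_normalize(speeds))
```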

Step S2, performing dynamic traffic state recognition based on clustering analysis by using the training data set obtained in step S1, and obtaining a sample data set with a traffic state label, where in the embodiment of the present invention, the dynamic traffic state recognition is performed based on a fuzzy C-means clustering algorithm, as shown in fig. 1, and the specific steps are as follows:

step S21, constructing a traffic flow data set $X = [x_1, x_2, \ldots, x_i, \ldots, x_n]^T$ in which each sample item is $x_i = (x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5})$. Each sample item $x_i$ consists of traffic flow attribute parameters and vehicle operation parameters, where $x_{i1}$, $x_{i2}$, $x_{i3}$, $x_{i4}$ and $x_{i5}$ represent, respectively, the average speed, average occupancy, average acceleration, average delay time and average travel time in a Δt time interval. The average speed and average occupancy are the traffic flow attribute parameters, and the average acceleration, average delay time and average travel time are the vehicle operation parameters. All sample items in the traffic flow data set $X$ are divided into $k$ classes $C = \{c_1, c_2, \ldots, c_k\}$, each class representing one traffic state, and the clustering loss function is constructed according to formula (4):

$$J(U, X, C) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m}\, \lVert x_i - C_j \rVert^2, \quad U \in \xi \tag{4}$$

where $J(U, X, C)$ represents the clustering loss function, $x_i$ represents the i-th sample item in the traffic flow data set $X$, $n$ is the total number of samples in $X$, $k$ is the number of clusters, $C_j$ represents the fuzzy clustering center of the j-th class $c_j$, $U = \{u_{ij}\}$ represents the membership matrix, $u_{ij}$ represents the degree of membership of sample $x_i$ to class $c_j$, $m$ is the fuzzy weighting index, and $\xi$ represents the constraint space of the clustering loss function, which is defined in formula (5):

$$\xi = \Big\{ U : u_{ij} \in [0,1];\ \sum_{j=1}^{k} u_{ij} = 1,\ \forall i;\ 0 < \sum_{i=1}^{n} u_{ij} < n,\ \forall j \Big\} \tag{5}$$

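step S22, initializing the membership matrix $U$, and calculating and updating the fuzzy clustering center $C_j$ of the j-th class $c_j$ and the membership matrix $U$ according to formulas (6) and (7):

$$C_j = \frac{\sum_{i=1}^{n} u_{ij}^{m}\, x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \tag{6}$$

$$u_{ij} = \frac{1}{\sum_{h=1}^{k} \left( d_{ij} / d_{ih} \right)^{2/(m-1)}} \tag{7}$$

where $d_{ij}$ represents the Euclidean distance between sample item $x_i$ and the fuzzy clustering center $C_j$ of the j-th class $c_j$, and $d_{ih}$ represents the Euclidean distance between sample item $x_i$ and the fuzzy clustering center $C_h$ of the h-th class $c_h$;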

step S23, comparing the membership matrix $U^{\lambda+1}$ of iteration λ+1 with the membership matrix $U^{\lambda}$ of iteration λ. If $\lVert U^{\lambda+1} - U^{\lambda} \rVert \le \varepsilon$ or $\lambda = \lambda_{max}$, the iteration stops and outputs the current fuzzy clustering centers $C_j$ and the membership degree of each sample item $x_i$ in the traffic flow data set $X$ to the current fuzzy clustering centers $C_j$, 1 ≤ j ≤ 3; otherwise, the method returns to step S22 and continues iterating, where λ is the iteration number, $\lambda_{max}$ is the maximum number of iterations and ε is the iteration termination threshold. The embodiment of the invention sets the number of clusters k = 3, representing the three traffic states of smooth flow, congested flow and choked flow, the fuzzy weighting index m = 2, the maximum number of iterations $\lambda_{max}$ = 1000, and the iteration termination threshold ε = $10^{-5}$. Iterating formulas (6) and (7) drives the constructed clustering loss function toward its minimum; when $\lVert U^{\lambda+1} - U^{\lambda} \rVert \le \varepsilon$ or $\lambda = \lambda_{max}$ is satisfied, the objective function is taken to be optimal and the loss minimal.

Step S24, based on the fuzzy clustering centers $C_j$ output in step S23 and the membership degree of each sample item $x_i$ in the traffic flow data set $X$ to the current fuzzy clustering centers $C_j$, 1 ≤ j ≤ 3, determining by the maximum membership method the traffic running state class label set $Y$ corresponding to the traffic flow data set $X$, where $Y = \{y \mid y_i = \tau;\ i = 1, 2, \ldots, n;\ \tau = 1, 2, 3\}$, $y_i$ is the traffic running state class label of sample item $x_i$ in $X$, and the class labels 1, 2 and 3 are the clustering labels in one-to-one correspondence with the fuzzy clustering centers $C_1$ to $C_3$;
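Steps S22 to S24 can be sketched with the standard fuzzy C-means iteration, using the parameter values given above (k = 3, m = 2, at most 1000 iterations, threshold 1e-5). This is an illustrative sketch only: the random initialization, the fixed seed, the use of the largest absolute entry change as the matrix norm, and the small clamp on zero distances are assumptions, and the function name is hypothetical.

```python
import math
import random


def fuzzy_c_means(X, k=3, m=2.0, max_iter=1000, eps=1e-5, seed=0):
    """Return (centers, U, labels) for the sample items in X."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial membership matrix; every row is scaled to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(max_iter):
        # Cluster centers: membership-weighted means of the samples.
        centers = []
        for j in range(k):
            w = [U[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * X[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )
        # Memberships from the ratios of distances to all centers.
        new_U = []
        for i in range(n):
            dist = [max(math.dist(X[i], c), 1e-12) for c in centers]
            new_U.append(
                [
                    1.0 / sum((dist[j] / dist[h]) ** (2.0 / (m - 1.0)) for h in range(k))
                    for j in range(k)
                ]
            )
        # Assumed norm: largest absolute change of any membership entry.
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(k))
        U = new_U
        if change <= eps:
            break
    # Maximum membership method: labels 1..k follow the center index.
    labels = [max(range(k), key=lambda j: U[i][j]) + 1 for i in range(n)]
    return centers, U, labels


# Hypothetical normalized samples with two features each.
X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [10.0, 0.0], [9.9, 0.2]]
centers, U, labels = fuzzy_c_means(X)
print(labels)
```

Which numeric label ends up meaning smooth, congested or choked flow depends on the initialization, which is why the mapping from labels to traffic states is a separate step.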

step S25, according to the traffic running state class label set $Y$ obtained in step S24, visualizing the traffic flow data set $X$ in the three-dimensional space of average speed, occupancy and travel time, for example using "+" for the sample items with $y_i = 1$, a second marker for the sample items with $y_j = 2$, and "●" for the sample items with $y_k = 3$, where $i, j, k \in [1, n]$, to obtain a clustering analysis chart; determining from the clustering analysis chart the traffic running states corresponding to the clustering labels 1, 2 and 3 in the class label set $Y$, so as to obtain the traffic running state class label $y_i$ of each sample item $x_i$ in the traffic flow data set $X$; and then obtaining the sample data set with traffic running state labels $\Gamma = \{(X, Y) \mid (x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$. Meanwhile, to eliminate the interference of the different dimensions of the characteristic parameters on the classifier, formula (3) is used to normalize the sample data set Γ with traffic running state labels.

And step S3, using the sample data with the traffic running state label as the prior knowledge of classification analysis to generate a traffic running state classifier.

In the embodiment of the invention, a support vector machine (SVM), specifically a nonlinear SVM, is selected as the classifier for traffic state recognition. The sample data set Γ with traffic running state labels is used as the data set of the nonlinear SVM. The traffic states represented by the traffic flow parameters are learned through offline training of the classification evaluation model, which yields the traffic running state classifier; the classifier then performs online classification prediction of the real-time or future traffic flow state. The specific implementation process is as follows:

step S31, constructing and solving the convex quadratic programming problem shown in formula (8):

$$\min_{\alpha}\ Q(\alpha) = \frac{1}{2} \sum_{i=1}^{n} \sum_{z=1}^{n} \alpha_i \alpha_z y_i y_z K(x_i, x_z) - \sum_{i=1}^{n} \alpha_i, \quad \text{s.t. } \sum_{i=1}^{n} \alpha_i y_i = 0,\ 0 \le \alpha_i \le C,\ i = 1, 2, \ldots, n \tag{8}$$

where $Q(\alpha)$ is the objective function for the optimal Lagrange multipliers, $K(x_i, x_z)$ represents the kernel function, $n$ is the total number of samples in the sample data set Γ with traffic running state labels, $\alpha_i$ and $\alpha_z$ are Lagrange multipliers, and $C$ is the penalty coefficient. In the embodiment of the present invention, the penalty coefficient is set to C = 1.1, and the kernel function is the radial basis function (RBF), see formula (9):

where γ denotes the coefficient of the kernel function, set to 0.2, which is the reciprocal of the number of features of the input traffic flow data set $X$, and $g$ denotes the kernel function width.

Solving yields the optimal Lagrange multiplier solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_n^*)^T$.

Step S32, calculating the optimal offset value $b^*$:

Select a component $\alpha_l^*$ of the optimal Lagrange multiplier solution $\alpha^*$ obtained in step S31 that satisfies the condition $0 < \alpha_l^* < C$. According to the subscript $l$ of $\alpha_l^*$, select $x_l$ and $y_l$ from the sample data set Γ with traffic running state labels, then calculate $b^*$ according to formula (10):

$$b^* = y_l - \sum_{i=1}^{n} \alpha_i^*\, y_i\, K(x_i, x_l) \tag{10}$$

Step S33, solving the classification decision function $f(x_i)$:

$$f(x_i) = \operatorname{sgn}\Big( \sum_{z=1}^{n} \alpha_z^*\, y_z\, K(x_z, x_i) + b^* \Big) \tag{11}$$

In the formula, $f(x_i)$ is the generated traffic running state classifier and gives the traffic running state classification result of the i-th sample item in the sample data set Γ with traffic running state labels, and sgn(·) is the sign function, which returns the sign of its argument and so completes one binary classification. Traffic running state recognition in the embodiment of the invention is a three-class problem, so the support vector machine model must be extended: several support vector machine classifiers are established, and a multi-class classifier is constructed by combining multiple binary classifiers.
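The decision step of S33 can be sketched as below once the multipliers and offset are known. Several points are assumptions made for illustration: the RBF kernel is written in the common form exp(-γ‖x − z‖²) with γ = 0.2, the binary classifiers are combined by a one-vs-rest rule (the text does not name the combination scheme), and the support vectors, multipliers and offsets in the usage part are toy values, not trained results. Solving formula (8) itself is left to a quadratic programming solver.

```python
import math


def rbf_kernel(x, z, gamma=0.2):
    """Assumed RBF form: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(x, support_x, support_y, alphas, b, gamma=0.2):
    """Weighted kernel sum over the support vectors plus the offset b."""
    return sum(
        a * y * rbf_kernel(sx, x, gamma)
        for a, y, sx in zip(alphas, support_y, support_x)
    ) + b


def sgn(value):
    """Sign function used for one binary decision."""
    return 1 if value >= 0 else -1


def one_vs_rest_predict(x, binary_models, gamma=0.2):
    """Pick the state whose binary classifier gives the largest value.

    binary_models maps a state label to (support_x, support_y, alphas, b).
    """
    scores = {
        label: decision_value(x, *model, gamma=gamma)
        for label, model in binary_models.items()
    }
    return max(scores, key=scores.get)


# Toy models (hypothetical): one support vector per traffic state.
models = {
    1: ([[0.9, 0.1]], [1], [1.0], 0.0),
    2: ([[0.5, 0.5]], [1], [1.0], 0.0),
    3: ([[0.1, 0.9]], [1], [1.0], 0.0),
}
print(one_vs_rest_predict([0.85, 0.15], models))
```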

Step S4, constructing a data set for risk running state judgment by adopting the training data set formed in the step S1, and judging the risk running state of the constructed data set for risk running state judgment based on a fuzzy comprehensive evaluation method according to different traffic running states to obtain the training data set with a risk running state label, wherein the specific implementation process is as follows:

step S41, using the training data set of step S1 to establish an evaluation object factor set $P$ aggregated at Δt time intervals, $P = [S_V, S_O, S_T]$, where $S_V$ represents the coefficient of variation of vehicle speed, $S_V = \sigma_V / \bar{V}$, $\bar{V}$ represents the average vehicle speed within the Δt time interval, and $\sigma_V$ represents the standard deviation of the vehicle speed within the Δt time interval; $S_O$ denotes the coefficient of variation of occupancy, $S_O = \sigma_O / \bar{O}$, $\bar{O}$ represents the average occupancy within the Δt time interval, and $\sigma_O$ represents the standard deviation of occupancy within the Δt time interval; $S_T$ represents the coefficient of variation of travel time, $S_T = \sigma_T / \bar{T}$, $\bar{T}$ denotes the mean travel time within the Δt time interval, and $\sigma_T$ represents the standard deviation of travel time within the Δt time interval. In the embodiment of the present invention, Δt is 5 minutes.
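The factor set of step S41 can be computed per Δt interval as sketched below. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form is meant. With 30 s readings and Δt = 5 minutes, each interval would hold ten readings.

```python
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by the mean for one Δt interval."""
    mean = statistics.fmean(samples)
    if mean == 0:
        return 0.0  # assumption: treat an all-zero interval as no variation
    return statistics.pstdev(samples) / mean


def factor_set(speeds, occupancies, travel_times):
    """Return P = [S_V, S_O, S_T] for one Δt interval."""
    return [
        coefficient_of_variation(speeds),
        coefficient_of_variation(occupancies),
        coefficient_of_variation(travel_times),
    ]


# Hypothetical readings inside one interval.
P = factor_set(
    [42.0, 40.0, 38.0, 41.0],
    [0.20, 0.22, 0.25, 0.21],
    [95.0, 100.0, 110.0, 98.0],
)
print(P)
```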

Step S42, establishing an evaluation comment set $F$, where $F = [f_1, f_2, f_3]$, $f_1$ indicates that the risk running state is a low-risk running state, $f_2$ indicates that the risk running state is a medium-risk running state, and $f_3$ indicates that the risk running state is a high-risk running state;

step S43, establishing a fuzzy relation matrix $R$, where $R$ is a fuzzy mapping from the evaluation object factor set $P$ to the evaluation comment set $F$, $R: P \to F$, and $R = (R(S_V), R(S_O), R(S_T))^T$. This yields, for each evaluation factor $S_V$, $S_O$ and $S_T$ in the evaluation object factor set $P$, the fuzzy relation vectors $R(S_V)$, $R(S_O)$ and $R(S_T)$ of membership degrees to the evaluation comment set $F$, where $R(S_V) = (r_{11}, r_{12}, r_{13})$, $r_{11}$ represents the degree of membership of the speed coefficient of variation to low risk, $r_{12}$ its degree of membership to medium risk, and $r_{13}$ its degree of membership to high risk; $R(S_O) = (r_{21}, r_{22}, r_{23})$, $r_{21}$ represents the degree of membership of the occupancy coefficient of variation to low risk, $r_{22}$ its degree of membership to medium risk, and $r_{23}$ its degree of membership to high risk; $R(S_T) = (r_{31}, r_{32}, r_{33})$, $r_{31}$ represents the degree of membership of the travel time coefficient of variation to low risk, $r_{32}$ its degree of membership to medium risk, and $r_{33}$ its degree of membership to high risk. As shown in formula (12), trapezoidal membership functions are adopted in the embodiment of the present invention:

where the entries $r_{\mu\theta}$ are given by equations (19) to (27).
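A trapezoidal membership function of the general kind mentioned in step S43 can be sketched as follows. The breakpoints used in the usage part are purely hypothetical placeholders; the actual values for each factor and risk level come from equations (19) to (27), which are not reproduced here.

```python
def trapezoid(x, a, b, c, d):
    """Generic trapezoidal membership with breakpoints a <= b <= c <= d.

    Membership is 1 on [b, c], rises linearly on (a, b), falls linearly
    on (c, d), and is 0 elsewhere. Setting a == b or c == d gives a
    left or right shoulder.
    """
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical breakpoints for low / medium / high risk of one factor.
s_v = 0.25
row = [
    trapezoid(s_v, 0.0, 0.0, 0.1, 0.3),   # low risk (left shoulder)
    trapezoid(s_v, 0.1, 0.3, 0.4, 0.6),   # medium risk
    trapezoid(s_v, 0.4, 0.6, 1.0, 1.0),   # high risk (right shoulder)
]
print(row)
```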

step S44, classifying the traffic running state of the sample items in the traffic flow data set $X$ that correspond to the evaluation object factor set $P$ by using the traffic running state classifier generated in step S3, and obtaining from the classification result an evaluation object factor set $P$ with traffic running state labels; then, using this labeled factor set, establishing for the different traffic running states a fuzzy weight matrix $S = (s_1, s_2, s_3)$, where $s_1$ represents the degree of influence of the speed coefficient of variation in the fuzzy relation matrix $R$ on the risk running state under the different traffic running states, $s_2$ represents the degree of influence of the occupancy coefficient of variation in $R$ on the risk running state under the different traffic running states, and $s_3$ represents the degree of influence of the travel time coefficient of variation in $R$ on the risk running state under the different traffic running states. The specific implementation process is as follows:

step S441, the traffic running state classifier generated in step S3 is used to classify the traffic running state of the traffic flow data set $X$ of step S2, and the generated traffic running state labels are matched one-to-one with the evaluation object factor sets $P$ according to the standard time feature, giving the evaluation object factor set $P$ with traffic running state labels. The entropy weight method is then applied to the labeled factor set $P$ to determine the weight of each factor in $P$ under the different traffic running states, and the weights are normalized to obtain the weight matrix $A = (A_f, A_c, A_j)^T$, where $A_f$ represents the weight vector of the speed, occupancy and travel time coefficients of variation in the smooth flow state, $A_c$ represents the weight vector of these coefficients in the congested flow state, and $A_j$ represents the weight vector of these coefficients in the choked flow state. In the embodiment of the invention, solving gives $A_f = (0.30, 0.29, 0.41)$; in the same manner, the weight vector in the congested flow state is $A_c = (0.25, 0.35, 0.40)$ and the weight vector in the choked flow state is $A_j = (0.2, 0.28, 0.52)$.
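The text names the entropy weight method without giving its formulas. The sketch below uses the common textbook form, which is an assumption about what the embodiment computes: each factor column is turned into proportions, its information entropy is scaled by ln(n), and the weights are the normalized values of (1 − entropy). The function name and the sample rows are hypothetical.

```python
import math


def entropy_weights(rows):
    """Weights of the factor columns for one traffic running state.

    rows holds one [S_V, S_O, S_T] vector per Δt interval. Values are
    assumed non-negative, with at least two rows and a non-zero column sum.
    """
    n, cols = len(rows), len(rows[0])
    divergences = []
    for j in range(cols):
        column = [row[j] for row in rows]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    scale = sum(divergences)
    # Normalize so that the weights of the three factors sum to 1.
    return [d / scale for d in divergences]


# Hypothetical factor sets of intervals labeled as smooth flow.
smooth_rows = [
    [0.10, 0.20, 0.05],
    [0.12, 0.25, 0.20],
    [0.09, 0.15, 0.40],
]
print(entropy_weights(smooth_rows))
```

A factor that varies more across intervals has lower entropy and therefore receives a larger weight under this form.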

Step S442, the clustering result shows that some data points are not clearly clustered, so the embodiment of the invention innovatively proposes a fuzzy influence vector $w_i = (w_{i1}, w_{i2}, w_{i3})$ to fuzzify the weight matrix $A$ and eliminate the clustering ambiguity problem, where $w_{i1}$ represents the degree of membership of the i-th sample item in the sample data set Γ with traffic running state labels to the smooth flow clustering center, $w_{i2}$ represents its degree of membership to the congested flow clustering center, and $w_{i3}$ represents its degree of membership to the choked flow clustering center. The vector is solved according to formula (13):

In the formula, $l_{ij}$ represents the Euclidean distance between the i-th sample item in the data set Γ and the clustering centers of the different traffic states;

step S443, obtaining the fuzzy weight matrix $S = w_i \times A$.
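As a worked illustration of step S443, the product of the fuzzy influence vector and the weight matrix can be computed with the weight vectors reported in step S441. The influence vector `w_i` below is a hypothetical example (a sample that mostly belongs to the smooth flow state), and the function name is an assumption.

```python
# Weight matrix A from step S441: rows are A_f, A_c, A_j.
A = [
    [0.30, 0.29, 0.41],  # smooth flow
    [0.25, 0.35, 0.40],  # congested flow
    [0.20, 0.28, 0.52],  # choked flow
]


def fuzzy_weights(w, A):
    """Row vector times matrix: S = w x A, giving (s_1, s_2, s_3)."""
    return [
        sum(w[r] * A[r][c] for r in range(len(A)))
        for c in range(len(A[0]))
    ]


w_i = [0.7, 0.2, 0.1]  # hypothetical memberships to the three centers
S = fuzzy_weights(w_i, A)
print(S)  # approximately [0.28, 0.301, 0.419]
```

Because each row of A sums to 1, S also sums to 1 whenever the memberships in `w_i` sum to 1.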

Step S45, calculating the fuzzy composite value matrix $B$:

$$B = S \circ R$$

where $\circ$ represents a generalized fuzzy operator; common fuzzy operators include $M(\wedge, \vee)$, $M(\cdot, \vee)$, $M(\wedge, \oplus)$ and $M(\cdot, \oplus)$. The operator adopted in the embodiment of the invention takes all evaluation factors into account in a balanced manner according to their weights, and is suitable for evaluating overall indexes. This finally gives $B = (b_1, b_2, b_3)$. The fuzzy composite value matrix $B$ represents the fuzzy comprehensive evaluation result of the evaluation object, where $b_1$ represents the degree of membership of the fuzzy comprehensive evaluation result to the low-risk running state $f_1$ in the evaluation comment set $F$, $b_2$ represents its degree of membership to the medium-risk running state $f_2$ in $F$, and $b_3$ represents its degree of membership to the high-risk running state $f_3$ in $F$; $b_\theta$, θ = 1, 2, 3, is calculated as in equation (14):

step S46, determining the fuzzy comprehensive evaluation result $z$, namely the risk running state category, according to the fuzzy composite value matrix $B$:

In the embodiment of the invention, the maximum membership method is used to determine the fuzzy comprehensive evaluation result: the risk running state indicated by the element of the evaluation comment set $F$ ($f_1$, $f_2$ or $f_3$) that corresponds to the maximum value in the fuzzy composite value matrix $B$ is taken as the final evaluation result, i.e. $z = f_\theta$ with $b_\theta = \max(b_1, b_2, b_3)$, θ = 1, 2, 3;
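Steps S45 and S46 can be sketched together as below. The composition assumes a weighted-average type operator, in which each b is the weighted sum of one column of R capped at 1; this matches the description of an operator that balances all factors by weight, but it is an assumption rather than a quotation of equation (14). The matrix R and the vector S in the usage part are hypothetical values.

```python
def compose(S, R):
    """Assumed weighted-average composition: b = min(1, sum of s * r)."""
    return [
        min(1.0, sum(S[mu] * R[mu][theta] for mu in range(len(S))))
        for theta in range(len(R[0]))
    ]


def risk_state(B):
    """Maximum membership method: 1 = low, 2 = medium, 3 = high risk."""
    return max(range(len(B)), key=lambda theta: B[theta]) + 1


S = [0.28, 0.301, 0.419]  # hypothetical fuzzy weights (s_1, s_2, s_3)
R = [
    [0.8, 0.2, 0.0],  # hypothetical memberships of S_V to low/medium/high
    [0.1, 0.6, 0.3],  # hypothetical memberships of S_O
    [0.0, 0.3, 0.7],  # hypothetical memberships of S_T
]
B = compose(S, R)
print(B, risk_state(B))
```

In this example the travel time factor carries the largest weight and leans toward high risk, so the high-risk state obtains the largest membership.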

Step S47, based on the establishing method of the judgment object factor set P in the step S41, the training data set formed in the step S1 is adopted to construct a data set for risk operation state judgment

Wherein, P_iThe method comprises the steps that (1) the ith sample item of an ith evaluation object factor set, namely a data set omega for evaluating the risk running state is evaluated, and n is the total amount of the samples of the data set omega for evaluating the risk running state; and calculating each sample item in the data set omega for risk running state judgment according to the steps S43-S46 to obtain a risk running state label data set Z corresponding to the data set omega for risk running state judgment, wherein the risk running state label data set Z is a fuzzy comprehensive judgment result set of all sample items of the data set omega for risk running state judgment, and the risk running state label data set Z is normalized by adopting a formula (3) to finally obtain a training data set with risk running state labels

where z_i denotes the fuzzy comprehensive judgment result of the i-th sample item of the data set Ω for risk running state judgment.

To make the result usable as label data for the subsequent active traffic safety prediction, i.e. as the data set for active traffic safety prediction, the embodiment of the invention sets the low-risk running state f₁ = 1, the medium-risk running state f₂ = 2 and the high-risk running state f₃ = 3, which gives the risk running state label data set Z = {z | z_i = f_θ, b_θ = max b_θ; i = 1, 2, …, n, θ = 1, 2, 3}.
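The evaluation in steps S45 and S46 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that equation (14) is the weighted-average composition b_θ = Σ_i s_i·r_iθ, and the weight vector S and relation matrix R below are hypothetical values chosen only for the example.

```python
def fuzzy_composite(weights, relation):
    """Weighted-average composition B = S o R (assumed form of equation (14))."""
    n_states = len(relation[0])
    return [sum(w * row[theta] for w, row in zip(weights, relation))
            for theta in range(n_states)]


def max_membership(b):
    """Return the 1-based index theta of the largest membership b_theta."""
    return max(range(len(b)), key=lambda k: b[k]) + 1


# Hypothetical weights s1..s3 (speed, occupancy, travel time) and relation matrix R
S = [0.5, 0.3, 0.2]
R = [
    [0.7, 0.3, 0.0],   # speed CV: membership in low / medium / high risk
    [0.2, 0.6, 0.2],   # occupancy CV
    [0.1, 0.4, 0.5],   # travel-time CV
]

B = fuzzy_composite(S, R)   # fuzzy composite value matrix (b1, b2, b3)
z = max_membership(B)       # label: 1 = low, 2 = medium, 3 = high risk
```

Applying the same two functions to every sample item yields the label set Z with values in {1, 2, 3}.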

Step S5, establishing an SAE-GRU model, training it with the training data set Ψ with risk running state labels obtained in step S4, and obtaining the optimal SAE-GRU active safety prediction model through parameter adjustment. The specific implementation process is as follows:

Step S51, building an SAE model and training it on the training data set with risk running state labels obtained in step S4. The SAE model processes this data set and extracts abstract features, namely the data features of the high-risk running state, layer by layer. After training, the feature data set output by the SAE model and the output loss value are obtained, and the optimal SAE model is selected according to the output loss value.

An autoencoder (AE) encodes the input x to obtain a new feature y, such that the original input x can be reconstructed from y. The encoding is:

y = f(Wx + b); (15)

That is, encoding applies a nonlinear activation function f to a linear combination of the input. The input is then reconstructed from the new feature y, which is the decoding process:

x' = f(W'y + b'); (16)

The reconstructed x' should be as consistent with x as possible, so the model can be trained with a loss function that minimizes the negative log-likelihood:

L = -log P(x|x'); (17)

In equations (15) to (17), W and b denote the weight and bias used for encoding, and W' and b' denote the weight and bias used for decoding.
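The encode and decode passes of equations (15) to (17) can be sketched in plain Python as below. The sketch assumes a Sigmoid activation for f and a Bernoulli (cross-entropy) form of the negative log-likelihood; the layer sizes, the random weights and the sample values are hypothetical. The decoder operates on the encoded feature y.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, b):
    """Compute W x + b for a matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(x, W, b):
    """Equation (15): y = f(W x + b)."""
    return [sigmoid(v) for v in affine(W, x, b)]


def decode(y, W_dec, b_dec):
    """Equation (16): x' = f(W' y + b')."""
    return [sigmoid(v) for v in affine(W_dec, y, b_dec)]


def reconstruction_loss(x, x_rec):
    """Assumed cross-entropy form of equation (17) for inputs scaled to [0, 1]."""
    eps = 1e-12
    return -sum(xi * math.log(xr + eps) + (1 - xi) * math.log(1 - xr + eps)
                for xi, xr in zip(x, x_rec))


random.seed(0)
n_in, n_hidden = 5, 3   # five traffic features compressed to three (hypothetical)
W = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
W_dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
b_dec = [0.0] * n_in

x = [0.2, 0.8, 0.5, 0.1, 0.9]   # one normalized sample (hypothetical values)
y = encode(x, W, b)
x_rec = decode(y, W_dec, b_dec)
loss = reconstruction_loss(x, x_rec)
```

Training would adjust W, b, W' and b' to reduce this loss; the sketch only shows the forward computation.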

A stacked autoencoder (SAE) cascades several autoencoders to extract features layer by layer, so that the features finally obtained are representative. The n AEs are trained in sequence: after the first AE has been trained, the output of its encoder is used as the input of the second AE, and so on. The features finally obtained are used as the input of a classifier to complete the final classification training.

The embodiment of the invention processes the traffic data set with the SAE: the stacked autoencoder extracts the data features of the high-risk state layer by layer and reduces the data dimension, providing a low-dimensional, high-value data set for the safety prediction of the GRU model in the next stage. The SAE consists of a multilayer sparse autoencoder, a Softmax classifier and a multilayer AE, and is trained as follows:

Step S511, environment definition: defining the autoencoders, defining the output layer using the Softmax classifier, and defining the loss and the optimizer; then initializing the parameters and creating a coordinator, and, in the fine-tuning step, traversing the two autoencoders used together with the Softmax classifier;

Step S512, taking the data set after traffic state identification as the input of the SAE, training the first autoencoder, and taking the training result as the feature output of the first autoencoder;

Step S513, taking the feature output of the upper-layer network as the input of the lower-layer network and repeating the training according to step S511; looping over all batches, performing fitting training with the batch data, and calculating the average loss;

Step S514, using the feature output of step S513 as the input of the Softmax classifier, and training the Softmax classifier in combination with the initial data set;

Step S515, repeating steps S512 to S514, and calculating and storing the cost value of each epoch;

Step S516, obtaining the feature output of the data set after training on all data sets is complete, which ends the SAE training.

The SAE designed in the embodiment of the invention consists of a main function (SAE test) and three auxiliary functions (load data, init and Autoencoder). SAE test runs the training process and extracts the feature data, load data loads the data set produced by traffic state identification, init loads the initial parameters, and Autoencoder loads the autoencoder; the main function calls the three auxiliary functions. To adapt the SAE to active safety prediction, the parameters of the existing SAE need to be fine-tuned. The embodiment of the invention selects the Batch size and the Learning rate as the parameters to tune. The Batch size affects how well the SAE is trained by setting the amount of data taken in each training step, and its value is generally 2ⁿ with n = 5, 6, 7, 8. The Learning rate affects how much is learned in each iteration and provides valid information for the next iteration. The embodiment therefore tests the parameter values listed in Table 1 and uses the test accuracy of each setting as the index of training quality.

TABLE 1 Batch size and Learning rate parameter evaluation and training results Table

In the first-stage test, the embodiment of the invention fixes the Batch size at 128 and varies the Learning rate over the values 0.001/0.003/0.005/0.007/0.009, which gives the first-stage results in Table 1. Among these five experiments, the accuracy is highest, 0.9149, when the Batch size is 128 and the Learning rate is 0.007. Based on this result, the Learning rate is fixed at 0.007 and the Batch size is varied over the values 32/64/128/256, which gives the second-stage results in Table 1. Among these four experiments, the accuracy is highest, 0.9487, when the Learning rate is 0.007 and the Batch size is 64. Therefore, based on the two-stage test results, the SAE with a Learning rate of 0.007 and a Batch size of 64 is selected as the experimental model.
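The two-stage tuning procedure described above amounts to a coordinate-wise search, which might be sketched as follows. The function `toy_accuracy` is a hypothetical stand-in for training the SAE and measuring its test accuracy; it is deliberately constructed to peak at the setting reported in the text and does not reproduce the values of Table 1.

```python
def two_stage_search(evaluate, learning_rates, batch_sizes, fixed_batch_size):
    """Tune the learning rate first, then the batch size, as in the two test stages."""
    # Stage 1: fix the batch size and vary the learning rate
    best_lr = max(learning_rates, key=lambda lr: evaluate(lr, fixed_batch_size))
    # Stage 2: fix the best learning rate and vary the batch size
    best_bs = max(batch_sizes, key=lambda bs: evaluate(best_lr, bs))
    return best_lr, best_bs, evaluate(best_lr, best_bs)


def toy_accuracy(lr, batch_size):
    """Hypothetical accuracy surface; a real run would train and test the SAE."""
    return 1.0 - abs(lr - 0.007) * 10 - abs(batch_size - 64) / 1000


best = two_stage_search(
    toy_accuracy,
    learning_rates=[0.001, 0.003, 0.005, 0.007, 0.009],
    batch_sizes=[32, 64, 128, 256],
    fixed_batch_size=128,
)
```

This search evaluates 5 + 4 settings instead of the 20 that a full grid would need, at the cost of possibly missing an interaction between the two parameters.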

Step S52, building a GRU model based on TensorFlow and designing the loss function and optimizer of the GRU neural network. The feature data set output by the optimal SAE model is divided into a training set and a prediction set. The GRU model is trained on the training set and, after every fixed number of training iterations, tested on the prediction set; the prediction result is compared with the actual result to calculate the loss, and the hyperparameters are adjusted according to the loss to obtain the optimal GRU model. After training, the model outputs the risk running state of the next stage, namely the next Δt time interval, which gives the optimal SAE-GRU active safety prediction model.

First, the two gating states are obtained from the high-risk feature state h^{t-1} passed on from the previous moment and the input of the current node, namely the feature data x^t with risk running state labels. As shown in fig. 6, r is the reset gate, z is the update gate, and σ is the Sigmoid function, which produces the gating signals. After the gating signals are obtained, the reset gate r is used to obtain the reset state h^{t-1}' = h^{t-1} ⊙ r, where ⊙ denotes the element-wise product; h^{t-1}' is then concatenated with the input x^t, and the activation function tanh() scales the data to the interval (-1, 1), giving the candidate state h'. The overall principle is shown in fig. 7: a data set with risk running state labels is obtained by judgment according to the traffic running state, the SAE model then extracts the high-risk feature data set from it, and finally the GRU model predicts the risk running state from the feature data set with risk running state labels, with the result visualized as images. Finally, the GRU executes the memory update phase, in which forgetting and memorizing are performed simultaneously using the update gate z obtained earlier. The update formula is as follows:

h^t = (1 - z) ⊙ h^{t-1} + z ⊙ h'; (18)

As can be seen from formula (18), the update gate z takes values between 0 and 1; the closer z is to 1, the more data is memorized, and the closer z is to 0, the more is forgotten. h^t represents the high-risk feature state at the current moment. Based on the above interpretation of the GRU mechanism, the final GRU operation diagram is obtained, as shown in fig. 8. The best active safety prediction effect is then achieved through parameter adjustment: Train hours influences the training effect by controlling the proportion of the training set and the test set, while Epochs and Batch size represent the pace of the whole training process. The settings of the three parameters are shown in Table 2.
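A single GRU update of the kind described above can be sketched in plain Python. The sketch assumes the usual GRU formulation in which each gate is a Sigmoid of a linear map of the concatenation [h, x], and in which the new state blends the previous state with a tanh candidate state; bias terms are omitted, and all sizes and weight values are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def gru_step(h_prev, x, W_r, W_z, W_h):
    """One GRU update; each weight matrix acts on the concatenation [h, x]."""
    hx = h_prev + x                                    # list concatenation
    r = [sigmoid(v) for v in matvec(W_r, hx)]          # reset gate
    z = [sigmoid(v) for v in matvec(W_z, hx)]          # update gate
    h_reset = [hi * ri for hi, ri in zip(h_prev, r)]   # reset state h_prev * r
    # Candidate state: tanh scales the values to (-1, 1)
    h_cand = [math.tanh(v) for v in matvec(W_h, h_reset + x)]
    # Update: keep (1 - z) of the old state, take z of the candidate
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


# Hypothetical sizes and weights: 2 hidden units, 3 input features
W_r = [[0.1, -0.2, 0.3, 0.0, 0.2], [0.0, 0.1, -0.1, 0.2, 0.1]]
W_z = [[0.2, 0.1, -0.3, 0.1, 0.0], [-0.1, 0.3, 0.2, 0.0, 0.1]]
W_h = [[0.3, -0.1, 0.2, 0.1, -0.2], [0.1, 0.2, 0.0, -0.3, 0.1]]

h = [0.0, 0.0]
for x_t in ([0.2, 0.5, 0.1], [0.4, 0.4, 0.3], [0.9, 0.1, 0.7]):
    h = gru_step(h, x_t, W_r, W_z, W_h)
```

Because the new state is a convex combination of the previous state and a tanh output, it stays inside (-1, 1) when the initial state does.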

The analysis process is divided into two stages:

Step S521: under the same Train hours, examining the influence of different combinations of Epochs and Batch size on the prediction result;

Step S522: after determining the optimal combination of Epochs and Batch size, examining the prediction results under different Train hours, and finally determining the optimal GRU model.

In step S521, groups 1 to 5, 6 to 10 and 11 to 15 are compared, with prediction accuracy and RMSE as the evaluation criteria, giving FIGS. 9(a) to 9(c), respectively. First, under the same Train hours, the prediction accuracy differs little between combinations of Epochs and Batch size, whereas under different Train hours it differs greatly; groups 1, 6 and 11 have the best prediction accuracy. Second, as training deepens, the RMSE gradually reveals the strengths and weaknesses of the different groups; groups 1, 6 and 11 are the best combination in each case and are clearly superior to the control groups. Therefore, the analysis of fig. 9 shows that the SAE-GRU model proposed in the embodiment of the invention has high prediction accuracy, and that groups 1, 6 and 11 give the best prediction, i.e. the model predicts best with Epochs = 1960 and Batch size = 5.

In step S522, with Epochs = 1960 and Batch size = 5 determined in step S521, the prediction accuracy, prediction time and RMSE of groups 1, 6 and 11 are compared, giving Table 3. As Table 3 shows, group 11 has the best value for prediction accuracy, prediction time and RMSE alike, which demonstrates its efficiency and accuracy. Therefore, group 11 is the best choice among all groups: prediction accuracy and stability are best when Train hours is 6384, Epochs is 1960 and Batch size is 5.

TABLE 2 parameter settings for Train hours/Epochs/Batch size

TABLE 3 prediction accuracy, prediction time and RMSE for the experimental group 1/6/11

Group	Prediction accuracy (%)	Time (s)	RMSE
1	76.204	38054	0.031284
6	89.788	36466	0.014847
11	95.157	32265	0.005689

State identification results and analysis:

The traffic flow data used for the experiments come from the PeMS data set of the California Department of Transportation. The study road segment is an urban arterial road in Los Angeles County, and the study period covers the working days from January 1 to February 28, 2020. After removing holiday data from the study period, the embodiment of the invention first preprocesses the data using steps S1 and S2, then aggregates the sample data at a time granularity of 5 min and removes the data from 0:00 to 5:00 of each day. This yields 9804 experimental records over the 43 working days of the study period, as shown in Table 4.
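The sample count can be checked with a short calculation. The sketch below counts the Monday-to-Friday dates in the study period and multiplies by the number of 5-minute intervals left in a day after the first five hours are dropped; it does not model the holiday removal mentioned above, and simply shows that the stated figures of 43 days and 9804 records are mutually consistent.

```python
from datetime import date, timedelta

start, end = date(2020, 1, 1), date(2020, 2, 28)
days = [start + timedelta(days=k) for k in range((end - start).days + 1)]
weekdays = [d for d in days if d.weekday() < 5]   # Monday = 0 ... Friday = 4

hours_kept = 24 - 5                               # 0:00 to 5:00 is removed
samples_per_day = hours_kept * 60 // 5            # one sample every 5 minutes
total_samples = len(weekdays) * samples_per_day
```

With 19 hours kept per day, each day contributes 228 samples, and 43 × 228 = 9804.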

TABLE 4 partial traffic flow parameter data for urban arterial road

(1) Traffic state identification

Cluster analysis is performed on the historical traffic data of the target road section using steps 11 to 14. The category of each sample is determined from its degree of membership in each cluster center, and each sample is automatically assigned to the category with the greatest membership. According to the clustering result, the 9804 samples are divided into 3 categories, which are represented by different symbols and visualized using the data of one day as an example, as shown in fig. 10. As fig. 10 shows, the data marked + have a lower travel time and a higher vehicle speed overall and are identified as the smooth state "1"; the data marked ▲ are identified as the crowded state "2"; and the data marked ● have the highest travel time overall and are identified as the blocked state "3". This gives the traffic running state corresponding to the historical data used to train the support vector machine classifier. Building on the fuzzy C-means clustering state identification experiment, steps S21 to S23 are used to train on the historical traffic flow data offline, giving an online real-time traffic state classifier. The data of the first 30 days are used as the training set and the data of the last 13 days as the test set. Common machine learning algorithms, namely decision tree (DT), gradient boosting decision tree (GBDT), random forest (RF), K-nearest neighbor (KNN), logistic regression (LR) and Gaussian naive Bayes (GNB), are selected for a comparison of prediction performance. The resulting real-time traffic state identification results are shown in Table 5.

TABLE 5 comparison of traffic state prediction performance by different methods

Algorithm	SVM	LR	RF	DT	GBDT	KNN	GNB
Accuracy (%)	98.92	98.04	96.06	92.93	95.87	94.63	91.33

As Table 5 shows, the SVM classifier reaches a prediction accuracy of 98.92%, higher than that of the other 6 machine learning classifiers. It can therefore be used for real-time traffic state classification and prediction; the experimental results agree with actual traffic conditions, and online traffic state identification for urban arterial roads can be realized.

(2) Risk running state discriminant analysis

The embodiment of the invention takes as an example the traffic parameter data detected on January 1 by two detectors at the same cross-section of Sunset Boulevard in Los Angeles. σ_V, σ_T, σ_O, S_V, S_T and S_O are calculated, and the relationships between them are shown in FIGS. 2 to 4. The coefficient of variation measures the relative variability of an index; it is the ratio of the standard deviation to the mean and objectively reflects the degree of dispersion of the index. The smaller the coefficient of variation, the smaller the dispersion and the smaller the risk; conversely, the greater the risk. Fig. 2 shows that the coefficient of variation is high when the average speed fluctuates sharply, and figs. 3 and 4 show the same phenomenon for occupancy and travel time. To establish the relationship between each coefficient of variation and the road risk running state, cumulative frequency distributions of the speed, occupancy and travel time coefficients of variation over different intervals are further generated, as shown in figs. 5(a) to 5(c). From the index curves in figs. 2 to 4 and the distributions in figs. 5(a) to 5(c), a risk grade classification table for each coefficient of variation is constructed, from which the membership degrees of each coefficient of variation under the three traffic states of smooth, crowded and blocked are determined; see Table 6.

TABLE 6 Risk grades corresponding to each coefficient of variation index

Evaluation grade	Low risk	Low-medium critical	Medium risk	Medium-high critical	High risk
Speed coefficient of variation	[0,0.36]	(0.36,0.4)	[0.4,0.48]	(0.48,0.52)	≥0.52
Occupancy coefficient of variation	[0,0.55]	(0.55,0.6)	[0.6,0.7]	(0.7,0.75)	≥0.75
Travel time coefficient of variation	[0,0.366]	(0.366,0.4)	[0.4,0.5]	(0.5,0.566)	≥0.566
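The coefficient of variation used throughout this analysis is the standard deviation divided by the mean. A minimal sketch follows; it assumes the population standard deviation (the text does not say whether the population or sample form is used), and the speed readings are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean of the values in one interval."""
    return pstdev(values) / mean(values)


# Hypothetical speed readings within one aggregation interval
steady = [60, 61, 59, 60, 60]
unstable = [60, 20, 75, 35, 10]

s_v_steady = coefficient_of_variation(steady)
s_v_unstable = coefficient_of_variation(unstable)
```

With these values the steady interval falls in the low-risk band of the speed row of Table 6 (at most 0.36), while the unstable interval falls in the high-risk band (at least 0.52).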

On the basis of the identification of the traffic running state, membership function formulas of the variation coefficients in low-risk, medium-risk and high-risk running states are respectively constructed on the basis of a risk grade classification table, and see formulas (19) to (27).

Wherein, the expressions (19) to (21) respectively represent membership functions of the velocity variation coefficient relative to low risk, medium risk and high risk; expressions (22) to (24) represent membership functions of the occupancy coefficient of variation with respect to low risk, medium risk, and high risk, respectively; equations (25) to (27) represent membership functions of the travel time variation coefficient with respect to low risk, medium risk, and high risk, respectively. The training data set for active traffic safety prediction can be obtained through the steps S41-S47.
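Formulas (19) to (27) are not reproduced in this text, so the following sketch only illustrates how membership functions could be derived from the thresholds of Table 6. It assumes a trapezoidal shape: full membership inside each risk grade and a linear transition across the critical interval between two grades. The actual formulas of the embodiment may differ.

```python
def risk_memberships(s, low_end, mid_start, mid_end, high_start):
    """Return (low, medium, high) memberships of a coefficient of variation s.

    Assumed trapezoidal shape: full membership inside each grade of Table 6,
    linear transition across the critical intervals between grades.
    """
    def ramp(v, a, b):
        # 0 at or below a, 1 at or above b, linear in between
        return min(1.0, max(0.0, (v - a) / (b - a)))

    up_mid = ramp(s, low_end, mid_start)      # low -> medium transition
    up_high = ramp(s, mid_end, high_start)    # medium -> high transition
    low = 1.0 - up_mid
    medium = up_mid - up_high
    high = up_high
    return low, medium, high


# Thresholds taken from the speed row of Table 6
SPEED = (0.36, 0.4, 0.48, 0.52)
r_low = risk_memberships(0.30, *SPEED)
r_edge = risk_memberships(0.38, *SPEED)
r_high = risk_memberships(0.60, *SPEED)
```

Each returned triple would form one row of the fuzzy relation matrix R; under this assumed shape the three memberships always sum to 1.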

(3) Active safety prediction method analysis

To evaluate the superiority of SAE-GRU for active safety prediction, prediction accuracy and prediction efficiency are analyzed on the same traffic data and in the same prediction environment. For the prediction accuracy analysis, the accuracies of GRU, LSTM, CNN-LSTM, SVM, KNN and random forest are obtained by reproducing the experiments and are compared with that of SAE-GRU; GRU serves as the comparison object of the same type, and the other methods as comparison objects of different types. For the prediction efficiency analysis, the prediction time, MAE and RMSE of SAE-GRU, LSTM and CNN-LSTM are used as the comparison parameters.

Prediction accuracy analysis: Existing prediction methods increasingly take the time series characteristics of traffic data into account, which makes their predictions more objective. The comparison methods selected in the embodiment of the invention are representative and have made research progress in the field of traffic prediction, so comparing SAE-GRU with them is instructive. The prediction results are detailed in Table 7.

TABLE 7 active safety prediction accuracy under different methods

Method	SAE-GRU	GRU	LSTM	CNN-LSTM	SVM	KNN	Random forest
Accuracy (%)	95.157	89.584	86.756	90.757	87.066	89.772	91.340

As Table 7 shows, under the same conditions SAE-GRU reaches an accuracy of 95.157%, while the lowest, LSTM, reaches only 86.756%. Compared with GRU, the method of the same type, SAE-GRU is 5.573 percentage points more accurate. The prediction accuracy of SAE-GRU is also higher than that of the structurally more complex LSTM and its optimized variant CNN-LSTM. Based on this comparison, the active safety prediction method provided by the embodiment of the invention meets the safety prediction requirement and outperforms the other representative methods.

Prediction efficiency analysis: To evaluate the application effect of the prediction methods accurately, the embodiment of the invention analyzes the efficiency of SAE-GRU, LSTM and CNN-LSTM. Time consumption is the core efficiency index, and MAE and RMSE are the result evaluation indexes. Together these allow a comprehensive efficiency evaluation of each prediction method.

As fig. 11 shows, in the same environment the prediction time of SAE-GRU is significantly lower than that of GRU, a reduction of 10.45%. Compared with LSTM and CNN-LSTM, the prediction time of SAE-GRU is only 85.25% and 88.81% of theirs, respectively. Fig. 11 thus illustrates the advantage of SAE-GRU in prediction time. In fig. 12, MAE is the mean absolute error and RMSE is the square root of the mean squared difference between predicted and observed values. Both the MAE and the RMSE of SAE-GRU are the lowest among all methods, and its MAE is significantly lower than that of the other methods, which confirms the efficiency of SAE-GRU.
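The two result indexes can be computed as in the sketch below, which follows the definitions just given. The label sequences are hypothetical and are not the experimental data behind fig. 12.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared difference between prediction and observation."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical risk-state labels (1 = low, 2 = medium, 3 = high)
actual = [1, 2, 3, 2, 1]
predicted = [1, 2, 2, 2, 3]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

RMSE penalizes large errors more strongly than MAE, which is why the single two-level miss in this example raises RMSE above MAE.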

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. The active safety prediction method based on the big data technology and SAE-GRU is characterized by comprising the following steps:

2. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 1, wherein step S6 is implemented as follows:

firstly, preprocessing a real-time traffic flow data set of a predicted place, then classifying the traffic running state of the preprocessed real-time traffic flow data set of the predicted place by using a generated traffic running state classifier, judging the risk running state by using a step S4 based on the classification result of the traffic running state to obtain the real-time traffic flow data set with a risk running state label, then taking the real-time traffic flow data set with the risk running state label as input data of an SAE-GRU active safety prediction model to perform active safety prediction of a trunk road, and predicting to obtain the risk running state in the next stage, namely the next delta t time interval.

3. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 2, wherein the step S1 is implemented by preprocessing the original data set, and the step S6 is implemented by preprocessing the real-time traffic flow data set at the predicted location as follows:

step 1, data cleaning and filling: calculating the missing rate of each single data set in the sample data set, wherein the missing rate of the single data set is the missing number of the data of the single data set/the total number of the data of the single data set, and deleting the single data set when the missing rate of the single data set is more than or equal to 80%; when the single data set loss rate is less than 80%, filling the non-characteristic data in the single data set by adopting statistics, and filling the characteristic data in the single data set by a Lagrange interpolation method of formulas (1) to (2):

wherein L(x) represents the missing value to be determined, l_j(x) is the interpolation basis function, x_j denotes the j-th position point, x_i denotes the position points other than x_j, x represents the position point of the missing value to be determined, y_j denotes the value at position point x_j, and k represents the number of given value points;

step 2, feature transformation: performing feature transformation on each numerical value of each type of feature data in the sample data set after data cleaning and filling are completed by adopting a maximum-minimum normalization method shown in formula (3):

wherein y_new represents the value after feature transformation, y represents the value before feature transformation, y_min represents the minimum value of each type of feature data in the sample data set, and y_max represents the maximum value of each type of feature data in the sample data set; the sample data set refers to the original data set or the real-time traffic flow data set of the predicted place.
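The preprocessing of steps 1 and 2 can be sketched as follows. Formulas (1) to (3) are not reproduced in this text; the sketch assumes the standard Lagrange form L(x) = Σ_j y_j·l_j(x) with l_j(x) = Π_{i≠j} (x − x_i)/(x_j − x_i) and the standard maximum-minimum form y_new = (y − y_min)/(y_max − y_min). The function names and sample values are hypothetical.

```python
def missing_rate(values):
    """Share of missing entries (None) in one single data set."""
    return sum(v is None for v in values) / len(values)


def lagrange_fill(xs, ys, x):
    """Estimate the missing value at position x from known points (xs, ys)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0                        # interpolation basis function l_j(x)
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += yj * basis                # L(x) = sum_j y_j * l_j(x)
    return total


def min_max_normalize(values):
    """Scale each value to [0, 1] using the column minimum and maximum."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        return [0.0 for _ in values]       # assumption: a constant column maps to 0
    return [(y - y_min) / (y_max - y_min) for y in values]


# Hypothetical speed series with one missing reading at index 3
series = [50.0, 54.0, None, 62.0]
rate = missing_rate(series)                # 0.25, below the 80% deletion threshold
filled = lagrange_fill([1, 2, 4], [50.0, 54.0, 62.0], 3)

scaled = min_max_normalize([20.0, 35.0, 50.0, 80.0])
```

In practice only a few neighboring points would be passed to the interpolation, since high-degree Lagrange polynomials oscillate between distant points.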

4. The active safety prediction method based on big data technology and SAE-GRU according to claim 1, wherein the step S2 is to identify the dynamic traffic operation state based on a fuzzy C-means clustering algorithm, and obtain a sample data set with a traffic operation state label, and the specific implementation process is as follows:

step S21, constructing a traffic flow data set X = [x₁, x₂, …, x_i, …, x_n]^T in which each sample item is x_i = (x_i1, x_i2, x_i3, x_i4, x_i5); each sample item x_i is composed of traffic flow attribute parameters and vehicle operating parameters, wherein x_i1, x_i2, x_i3, x_i4 and x_i5 represent, respectively, the average speed, average occupancy, average acceleration, average delay time and average travel time within a Δt time interval; the average speed and average occupancy are traffic flow attribute parameters, and the average acceleration, average delay time and average travel time are vehicle operating parameters; dividing all sample items in the traffic flow data set X into k classes C = {c₁, c₂, …, c_k}, each class representing a traffic state, with k = 3, the three classes representing the three traffic states of smooth, crowded and blocked; and constructing a clustering loss function according to equation (4):

wherein J(U, X, C) represents the clustering loss function, x_i represents the i-th sample item in the traffic flow data set X, and n is the total number of samples in X; k is the number of clusters, and C_j represents the fuzzy cluster center of the j-th class c_j; U represents the membership matrix, U = {u_ij}, where u_ij represents the degree of membership of sample x_i in class c_j; m is the fuzzy weighting index; ξ represents the constraint space of the clustering loss function, which is defined as equation (5):

step S23, comparing the membership matrix U^{λ+1} of the (λ+1)-th iteration with the membership matrix U^λ of the λ-th iteration; if ||U^{λ+1} − U^λ|| ≤ ε or λ = λ_max, stopping the iteration and outputting the current fuzzy cluster centers C_j and the degree of membership of each sample item x_i in the traffic flow data set X in the current fuzzy cluster centers C_j, 1 ≤ j ≤ 3; otherwise, returning to step S22 and continuing the iteration, wherein λ is the iteration number, λ_max is the maximum number of iterations, and ε is the iteration termination threshold;

step S25, visualizing the traffic flow data set X in the three-dimensional space of average vehicle speed, occupancy and travel time according to the traffic running state category label set Y obtained in step S24, to obtain a cluster analysis chart; determining from the cluster analysis chart the traffic running states corresponding to cluster labels 1, 2 and 3 in the category label set Y, to obtain the traffic running state category label y_i of each sample item x_i in the traffic flow data set X, and thereby the sample data set with traffic running state labels Γ = {(X, Y) | (x₁, y₁), (x₂, y₂), …, (x_i, y_i), …, (x_n, y_n)}; meanwhile, normalizing the sample data set Γ with traffic running state labels.
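The membership computation and the stopping test of step S23 can be sketched as below. The sketch assumes the standard fuzzy C-means membership formula u_ij = 1 / Σ_k (d_ij/d_ik)^(2/(m−1)) with Euclidean distances, and a maximum-absolute-difference norm for comparing two membership matrices; the samples and cluster centers are hypothetical.

```python
def fcm_memberships(samples, centers, m=2.0):
    """Standard fuzzy C-means membership of each sample in each cluster."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    U = []
    for x in samples:
        d = [max(dist(x, c), 1e-12) for c in centers]   # avoid division by zero
        U.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d])
    return U


def converged(U_new, U_old, eps):
    """Stop when the largest membership change is at most eps (assumed norm)."""
    return max(abs(a - b) for rn, ro in zip(U_new, U_old)
               for a, b in zip(rn, ro)) <= eps


def hard_labels(U):
    """Assign each sample the 1-based index of its largest membership."""
    return [max(range(len(row)), key=lambda j: row[j]) + 1 for row in U]


# Hypothetical normalized samples and two cluster centers
samples = [[0.1, 0.1], [0.9, 0.9]]
centers = [[0.0, 0.0], [1.0, 1.0]]
U = fcm_memberships(samples, centers)
labels = hard_labels(U)
```

A full clustering run would alternate this membership update with a recomputation of the cluster centers until `converged` returns true or the maximum number of iterations is reached.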

5. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 4, wherein step S3 is implemented as follows:

in the formula, Q(α) is the objective function for the optimal Lagrange multipliers, n is the total number of samples in the sample data set Γ with traffic running state labels, α_i and α_z are Lagrange multipliers, and C is the penalty coefficient; K(x_i, x_z) represents the kernel function;

solving to obtain the optimal Lagrange multiplier solution

Step S32, calculating the optimal offset value b^*：

Satisfies the conditions

According to

Step S33, solving the classification decision function f(x_i):

In the formula, f(x_i) is the generated traffic running state classifier and gives the traffic running state classification result of the i-th sample item in the sample data set Γ with traffic running state labels, and sgn() represents the sign function.

6. The active safety prediction method based on big data technology and SAE-GRU according to any of claims 1-5, wherein step S4 is implemented as follows:

step S41, using the training data set of step S1 to establish an evaluation object factor set P aggregated at time intervals of Δt, P = [S_V, S_O, S_T], wherein S_V represents the vehicle speed coefficient of variation, S_V = σ_V/V, V represents the average vehicle speed within the Δt time interval, and σ_V represents the standard deviation of vehicle speed within the Δt time interval; S_O represents the occupancy coefficient of variation, S_O = σ_O/O, O represents the average occupancy within the Δt time interval, and σ_O represents the standard deviation of occupancy within the Δt time interval; S_T represents the travel time coefficient of variation, S_T = σ_T/T, T represents the average travel time within the Δt time interval, and σ_T represents the standard deviation of travel time within the Δt time interval;

step S42, establishing an evaluation comment set F, F = [f₁, f₂, f₃], wherein f₁ indicates the low-risk running state, f₂ indicates the medium-risk running state, and f₃ indicates the high-risk running state;

step S43, establishing a fuzzy relation matrix R, as shown in formula (12):

wherein R (S)_V) Represents the coefficient of variation S of vehicle speed_VFuzzy relation matrix of membership degree of evaluation item set F, R (S)_O) Represents the occupancy coefficient of variation S_OFuzzy relation matrix of membership degree of evaluation item set F, R (S)_T) Representing the coefficient of variation S of the travel time_TA fuzzy relation matrix of the membership degree of the evaluation comment set F; r is₁₁Representing the coefficient of variation of speed with respect to a low risk operating condition f₁Degree of membership of r₁₂Representing the coefficient of variation of velocity versus the medium risk operating condition f₂Degree of membership of r₁₃Representing the coefficient of variation of velocity with respect to a high risk operating condition f₃Degree of membership of; r is₂₁Representing the occupancy coefficient of variation versus a low risk operating condition f₁Degree of membership of r₂₂Representing occupancy coefficient of variation versus an at-risk operating condition f₂Degree of membership of r₂₃Representing the occupancy coefficient of variation versus the high-risk operating condition f₃Degree of membership of; r is₃₁Representing the coefficient of travel time variation with respect to a low risk operating condition f₁Degree of membership of r₃₂Representing the travel time coefficient of variation versus the medium risk operating condition f₂Degree of membership of r₃₃Representing the coefficient of travel time variation with respect to a high risk operating condition f₃Degree of membership of;

step S44, using the traffic running state classifier generated in step S3 to classify the traffic running state of the sample items in the traffic flow data set X that correspond to the evaluation object factor set P, and obtaining, from the classification result, an evaluation object factor set P with traffic running state labels; then using the evaluation object factor set P with traffic running state labels to establish, for the different traffic running states, a fuzzy weight matrix S = (s₁, s₂, s₃), where s₁ represents the degree of influence of the speed coefficient of variation in the fuzzy relation matrix R on the risk running state under the different traffic running states, s₂ represents the degree of influence of the occupancy coefficient of variation in the fuzzy relation matrix R on the risk running state under the different traffic running states, and s₃ represents the degree of influence of the travel time coefficient of variation in the fuzzy relation matrix R on the risk running state under the different traffic running states;

step S45, calculating a fuzzy synthetic value matrix B;

step S46, determining a fuzzy comprehensive evaluation result z, namely a risk running state category, by using a maximum membership method according to a fuzzy synthetic value matrix B;

wherein P_i is the i-th evaluation object factor set, namely the i-th sample item of the data set Ω for risk running state judgment, and n is the total number of samples in the data set Ω; each sample item in the data set Ω is processed according to steps S43 to S46 to obtain the risk running state label data set Z corresponding to the data set Ω, the risk running state label data set Z being the set of fuzzy comprehensive evaluation results of all sample items of the data set Ω; and the risk running state label data set Z is normalized to finally obtain the training data set with risk running state labels.
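The factor set of step S41 can be illustrated with a minimal Python sketch. It computes the three coefficients of variation for the records that fall into one Δt interval. The function names, the sample values, the use of the population standard deviation, and the handling of a zero mean are assumptions made for illustration only; the text does not specify them.

```python
import numpy as np

def coefficient_of_variation(values):
    """Return sigma / mean for the records of one aggregation interval."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0:
        # Assumption: a zero-mean interval is treated as having no variation.
        return 0.0
    # np.std defaults to the population standard deviation (ddof=0).
    return float(values.std() / mean)

def factor_set(speed, occupancy, travel_time):
    """Build P = [S_V, S_O, S_T] for the records within one interval."""
    return [
        coefficient_of_variation(speed),
        coefficient_of_variation(occupancy),
        coefficient_of_variation(travel_time),
    ]

# Hypothetical detector records within a single interval.
P = factor_set(
    speed=[58.0, 62.0, 60.0, 60.0],
    occupancy=[0.10, 0.14, 0.12, 0.12],
    travel_time=[30.0, 30.0, 30.0, 30.0],
)
print(P)
```

Repeating the call for every interval would yield the samples P_i that make up the data set for risk running state judgment.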

7. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 6, wherein the membership degree r_μθ in the fuzzy relation matrix R is calculated according to formulas (19) to (27):

8. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 6, wherein step S44 is implemented as follows:

step S441, using the traffic running state classifier generated in step S3 to classify the traffic running states of the traffic flow data set X of step S2, and matching the generated traffic running state labels one-to-one with the evaluation object factor sets P according to the standard time feature, to obtain the evaluation object factor set P with traffic running state labels; then applying the entropy weight method to the evaluation object factor set P with traffic running state labels to determine the weight of each factor of P under the different traffic running states, and normalizing these weights to obtain a weight matrix A = (A_f, A_c, A_j)^T, wherein A_f represents the weight vector of the speed coefficient of variation, the occupancy coefficient of variation and the travel time coefficient of variation in the smooth flow state, A_c represents the weight vector of the speed coefficient of variation, the occupancy coefficient of variation and the travel time coefficient of variation in the congested flow state, and A_j represents the weight vector of the speed coefficient of variation, the occupancy coefficient of variation and the travel time coefficient of variation in the choked flow state;

step S442, obtaining a fuzzy influence vector w_i, w_i = (w_i1, w_i2, w_i3), wherein w_i1, solved according to formula (13), represents the Euclidean distance from the i-th sample item in the sample data set Γ with traffic running state labels to the smooth flow cluster center;

step S443, obtaining the fuzzy weight matrix S = w_i × A.
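Steps S441 and S443 can be sketched in Python as follows. The sketch uses a common textbook form of the entropy weight method (column shares, Shannon entropy, normalized divergence), which may differ in detail from the variant intended here. Formula (13) is not reproduced in this text, so the influence vector w_i is simply supplied as a hypothetical value. All function names, state names and numbers are assumptions for illustration.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weights for an (n_samples, n_factors) non-negative array, n_samples >= 2."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    col_sum = X.sum(axis=0)
    # Share of each sample within its factor column; an all-zero column
    # is treated as uniform so that it carries no information.
    p = np.where(col_sum > 0, X / np.where(col_sum > 0, col_sum, 1.0), 1.0 / n)
    # 0 * log(0) is taken as 0, the usual convention.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(n)
    divergence = 1.0 - entropy
    total = divergence.sum()
    if total <= 0:
        # All factors equally uninformative: fall back to equal weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return divergence / total

def weight_matrix(P, labels, states=("smooth", "congested", "choked")):
    """Stack the per-state weight vectors into A = (A_f, A_c, A_j)^T."""
    P = np.asarray(P, dtype=float)
    labels = np.asarray(labels)
    return np.vstack([entropy_weights(P[labels == s]) for s in states])

def fuzzy_weight(w_i, A):
    """S = w_i x A: a 1x3 influence vector times the 3x3 weight matrix."""
    return np.asarray(w_i, dtype=float) @ A

# Hypothetical labeled factor sets [S_V, S_O, S_T], two per traffic state.
P = [[0.02, 0.10, 0.05], [0.03, 0.12, 0.04],
     [0.08, 0.20, 0.15], [0.12, 0.35, 0.10],
     [0.30, 0.40, 0.50], [0.25, 0.60, 0.30]]
labels = ["smooth", "smooth", "congested", "congested", "choked", "choked"]

A = weight_matrix(P, labels)
w_i = [0.6, 0.3, 0.1]  # hypothetical influence vector, not derived from formula (13)
S = fuzzy_weight(w_i, A)
print(A)
print(S)
```

Because each row of A sums to 1, S sums to the same total as w_i, so a normalized influence vector yields normalized fuzzy weights.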

9. The active safety prediction method based on big data technology and SAE-GRU as claimed in claim 6, wherein the fuzzy synthetic value matrix B = (b₁, b₂, b₃), wherein b₁ represents the membership degree of the fuzzy comprehensive evaluation result with respect to the low-risk running state f₁ in the evaluation comment set F, b₂ represents the membership degree of the fuzzy comprehensive evaluation result with respect to the medium-risk running state f₂ in the evaluation comment set F, and b₃ represents the membership degree of the fuzzy comprehensive evaluation result with respect to the high-risk running state f₃ in the evaluation comment set F; the membership degree b_θ is calculated according to formula (14):

the risk running state label data set Z = {z_i | z_i = f_θ, where b_θ = max(b₁, b₂, b₃); i = 1, 2, …, n; θ = 1, 2, 3}.
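The synthesis and labeling described in steps S45 and S46 can be sketched in Python as follows. Formula (14) is not reproduced in this text, so the sketch assumes the weighted-average composition b_θ = Σ_μ s_μ · r_μθ, one common fuzzy operator; other operators, such as max-min, would give different values. The function names and the numbers in S and R are hypothetical.

```python
import numpy as np

RISK_STATES = ("low risk", "medium risk", "high risk")  # f1, f2, f3

def fuzzy_synthesis(S, R):
    """Assumed weighted-average composition: b_theta = sum_mu s_mu * r_mu_theta."""
    return np.asarray(S, dtype=float) @ np.asarray(R, dtype=float)

def max_membership(B):
    """Return (theta, state name) for the largest membership degree in B."""
    theta = int(np.argmax(B))  # 0-based index of the maximum
    return theta + 1, RISK_STATES[theta]

# Hypothetical fuzzy weights (s1, s2, s3) and fuzzy relation matrix R for one sample.
S = [0.5, 0.3, 0.2]
R = [[0.7, 0.2, 0.1],   # speed coefficient of variation
     [0.1, 0.6, 0.3],   # occupancy coefficient of variation
     [0.2, 0.3, 0.5]]   # travel time coefficient of variation

B = fuzzy_synthesis(S, R)
z = max_membership(B)
print(B, z)
```

Applying the two functions to every sample P_i would produce the label data set Z. When two membership degrees tie, np.argmax returns the first, which here favors the lower risk state; that tie rule is a property of this sketch, not something stated in the text.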

10. The active safety prediction method based on big data technology and SAE-GRU according to any one of claims 1 to 5 or 7 to 9, wherein step S5 is implemented as follows:

step S51, building an SAE model, and inputting the training data set with risk running state labels obtained in step S4 into the SAE model to train it; the SAE model processes the training data set with risk running state labels and extracts abstract features layer by layer, namely the data features of the high-risk running state; after training is finished, the feature data set output by the SAE model and the output loss value are obtained, and the optimal SAE model is determined according to the output loss value;