CN111967712A

CN111967712A - Traffic risk prediction method based on complex network theory

Info

Publication number: CN111967712A
Application number: CN202010649490.3A
Authority: CN
Inventors: 李大庆; 郑参
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-11-20
Anticipated expiration: 2040-07-08
Also published as: CN111967712B

Abstract

The invention provides a traffic risk prediction method based on complex network theory, comprising the following steps: step A: dividing grids based on empirical data and constructing a double-layer traffic network model; step B: extracting and screening features based on complex network theory; step C: predicting risk based on ensemble learning theory; step D: evaluating and verifying the model. Through these steps, both the functional and the structural dimensions of the traffic system are considered, providing technical and theoretical support for identifying traffic risks, and supporting risk diagnosis of the traffic system, formulation of targeted management and control measures, and improvement of traffic operation reliability. The method is systematic, portable and easy to operate, and addresses the difficulty of identifying and predicting risks in a complex traffic system.

Description

Traffic risk prediction method based on complex network theory

Technical Field

The invention provides a traffic risk prediction method based on a complex network theory, and relates to the technical fields of risk analysis, network science and the like.

Background

Risk refers to the possible occurrence of an event that, if it occurs, can impede the development of a system or even lead to its collapse; it is also defined as the uncertainty of whether an event occurs. Risk exists objectively in a system: precautionary measures can prevent or reduce the loss it causes, but the risk itself cannot be eliminated. In a complex system, risks tend to occur suddenly, spread widely and be highly destructive, which makes system risks difficult to identify, predict and prevent, and poses new challenges for research on risk management, control and prevention in complex systems. Because the losses caused by system risks can greatly affect people's lives and even the operation of society, it is necessary to predict risks in complex systems accurately with a scientific and reasonable method. The traffic system plays an important role in travel and urban operation, and in recent years, with the rapid development of mobile internet and vehicle-mounted technology, it has become highly complex in both structure and function. In a complex and changeable environment and under changing demand, the traffic system faces man-made and natural risk events such as traffic accidents, construction closures, rainstorms and snow disasters. These traffic risk events often cause traffic congestion. Moreover, the traffic system evolves in space and time, so a risk event can spread through the system after it occurs, adding substantial extra cost to residents' travel and wasting social resources.

In current research on risk identification and prediction for traffic systems, the main approaches include model-based analysis methods, comprising qualitative and quantitative analysis. In particular, the structure and function of the system are described with a Process Flow Diagram (PFD) and grey correlation analysis, and risk is identified and predicted by analyzing and quantifying how system deviations arise and how strongly the influencing factors are correlated. In addition, with the advent of the big data era and the development of its technologies, knowledge-based analysis methods have emerged, mainly causal relationship models, machine learning models and deep learning models. These rely on empirical data generated by the traffic system, such as traffic flow and vehicle speed; by building a historical data set and applying a model to it, they discover unknown relationships and patterns in the data and thereby identify and predict the risk state of the traffic system. These methods predict the risks of the traffic system only from its state, using known models and data. They do not dynamically consider, at the network level, the correlations and evolution patterns among risks in the traffic system, and they struggle to explain the internal mechanism by which traffic system risk forms.

Therefore, for a traffic system that is highly complex in structure and function, the invention combines complex network theory with machine learning methods to identify and predict the risk of the traffic system. It provides a new perspective and a new method for research on risk identification, prediction, management and control in traffic systems, enriches the understanding of risk in traffic systems, and is of significance for ensuring the healthy and stable operation of the traffic system.

Disclosure of Invention

(I) Objects of the invention

The invention mainly addresses risk identification and prediction in the context of complex systems and network structures. Conventional methods mainly analyze the risk of a traffic system from the function of the system. In view of the high complexity and spatio-temporal evolution of the traffic system, and of the fact that conventional methods cannot identify and predict its risks well, the invention considers both the functional and the structural dimensions of the traffic system from the perspective of complex networks and provides a traffic risk prediction method based on complex network theory. The method can effectively identify and predict the risks of the traffic system, and supports risk diagnosis of the traffic system, formulation of targeted management and control measures, and improvement of traffic operation reliability.

(II) Technical scheme

In order to achieve the purpose, the method adopts the technical scheme that: a traffic risk prediction method based on a complex network theory is provided.

The invention relates to a traffic risk prediction method based on a complex network theory, which comprises the following steps:

step A: dividing grids based on empirical data and constructing a double-layer traffic network model;

step B: extracting and screening features based on complex network theory;

step C: predicting risk based on ensemble learning theory;

step D: evaluating and verifying the model.

Through the steps, the purpose of risk prediction of the traffic system can be achieved, the method is strong in systematicness, high in transportability and easy to operate, and the problem that risks in a complex traffic system are difficult to identify and predict is solved.

Wherein, "dividing grids based on empirical data and constructing a double-layer traffic network model" in step A is performed as follows. First, basic road information for the research area is acquired, mainly comprising two parts: road information of the traffic network and longitude and latitude information of road intersections. According to the area and extent of the research area and the longitude and latitude of road sections and intersections, the area is divided into N × M grid areas, and the grid areas are labeled. Second, on the microscopic level, for each grid area, a grid traffic congestion network model is constructed from actual traffic data using complex network theory and methods, with intersections in the grid as nodes, road sections as edges, and the relative speed of each road section as the edge weight. On the macroscopic level, each grid area is taken as a node, the existence of congested roads between two grids is the criterion for whether an edge connects them, and the number of congested roads between the grids is the edge weight; a grid node traffic network model is constructed using complex network theory and methods. The specific steps are as follows:

step A1: dividing grid areas based on geographic information;

step A2: preprocessing speed data to obtain the relative speed matrix;

step A3: constructing the grid traffic congestion network model G₁(N₁,L₁);

step A4: constructing the grid node traffic network model G₂(N₂,L₂).

Wherein, "dividing grid areas based on geographic information" in step A1 is performed as follows. First, the traffic network model and the traffic road information needed for dividing grid areas are extracted from a geographic information system (Mapinfo) file using Python. The extracted information mainly includes the vehicle speed on each road at each moment, the longitude and latitude of intersections, and the network topology of the traffic system under study. To obtain intersection coordinates, the invention uses Python to call the Baidu map Application Programming Interface (API) and traverses the road network sequentially, matching the road network topology with intersection names to obtain the longitude and latitude of each intersection. Roads for which coordinate acquisition fails because intersection names differ between the Baidu map and Mapinfo are processed separately, yielding an accurate and standardized longitude and latitude data set for the road network. Second, the area S and the longitude and latitude ranges of the research area are calculated from the obtained road information and intersection coordinates, and the number of grids, N × M, is determined according to the actual background of the research area, so that each grid has area S/(N × M). Finally, for each grid area, the intersections that fall inside it are identified from the longitude and latitude of each intersection in the traffic network and recorded;
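The last part of step A1, recording which intersections fall in which grid, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `assign_to_grid`, the dictionary layout, and the choice of equal longitude and latitude steps (which only approximates equal-area cells) are assumptions made for the example.

```python
from collections import defaultdict

def assign_to_grid(intersections, n_rows, n_cols):
    """Map each intersection to a (row, col) grid cell by longitude/latitude.

    intersections: dict name -> (lon, lat); names and layout are hypothetical.
    """
    lons = [lon for lon, _ in intersections.values()]
    lats = [lat for _, lat in intersections.values()]
    lon_min, lon_max = min(lons), max(lons)
    lat_min, lat_max = min(lats), max(lats)
    # Fall back to 1.0 when all points share a coordinate (zero extent).
    lon_step = (lon_max - lon_min) / n_cols or 1.0
    lat_step = (lat_max - lat_min) / n_rows or 1.0

    cells = defaultdict(list)
    for name, (lon, lat) in intersections.items():
        # Points on the upper boundary are clamped into the last row/column.
        col = min(int((lon - lon_min) / lon_step), n_cols - 1)
        row = min(int((lat - lat_min) / lat_step), n_rows - 1)
        cells[(row, col)].append(name)
    return dict(cells)

# Made-up coordinates, for illustration only.
demo = {
    "A": (116.30, 39.90),
    "B": (116.34, 39.91),
    "C": (116.45, 39.98),
    "D": (116.50, 40.00),
}
grid = assign_to_grid(demo, n_rows=2, n_cols=2)
```

In this toy example, A and B land in cell (0, 0) and C and D in cell (1, 1); the remaining cells are simply absent from the result.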

wherein "preprocessing speed data to obtain the relative speed matrix" in step A2 is performed as follows. First, according to actual traffic operation data from the vehicle Global Positioning System (GPS), at any time $t_i$ the speeds of all $R$ roads are expressed, in road order, as a vector $V_i = (v_1, v_2, \ldots, v_R)$. This is repeated for all $T$ moments, and the velocity vectors $V_i$ at all moments are combined to generate the initial velocity matrix

$$V = \begin{pmatrix} V_1 \\ \vdots \\ V_T \end{pmatrix} = \begin{pmatrix} v_1(t_1) & \cdots & v_R(t_1) \\ \vdots & \ddots & \vdots \\ v_1(t_T) & \cdots & v_R(t_T) \end{pmatrix}$$

Second, when the speed information of the traffic system is collected with floating car technology, the speed of every area at every moment cannot be completely collected and retained because of network communication limitations and human and natural factors. The original speed information therefore needs speed compensation: the original velocity matrix $V$ contains missing values (actually recorded as 0), and each missing value, that is, each element of $V$ equal to 0, must be found and compensated. To compensate the missing speed of road $R_j$ at time $t_i$, first find the set $\Gamma_j$ of roads neighboring $R_j$ in the road network $G(N, L)$, and check whether any road in the set has a speed record at that moment. If at least one element of the set has a speed record, the compensation value is the average of the recorded elements:

$$\hat{v}_j(t_i) = \frac{1}{J} \sum_{k \in \Gamma_j,\; v_k(t_i) \neq 0} v_k(t_i)$$

in the above formula, $\hat{v}_j(t_i)$ is the speed compensation value of the speed-missing road $R_j$ at time $t_i$, the sum runs over the non-zero element values in the neighboring road set $\Gamma_j$ of $R_j$, and $J$ is the number of non-zero elements in $\Gamma_j$;

if none of the neighboring roads of $R_j$ has a speed record, the speed of $R_j$ is compensated to 0 in this pass. After each compensation pass the velocity matrix is updated to the compensated one, and the process is repeated at each moment until all 0 values in the velocity matrix have been compensated, giving the completed velocity matrix $\hat{V}$.

After road speed compensation of the original absolute velocity matrix is complete, the compensated velocity matrix is normalized to relative speeds, because roads of different grades have different speed levels and the judgment standard must be unified. For any road $R_j$, the speed vector of the road over all moments, $(\hat{v}_j(t_1), \ldots, \hat{v}_j(t_T))$, is extracted from $\hat{V}$ together with the maximum speed limit $v_j^{\max}$ of the road section. Each element of the speed vector is divided by the maximum speed limit to obtain the normalized speed

$$v_{j\_ratio}(t_i) = \frac{\hat{v}_j(t_i)}{v_j^{\max}}$$

and the normalized velocity matrix is obtained as follows:

$$V_{ratio} = \begin{pmatrix} v_{1\_ratio}(t_1) & \cdots & v_{R\_ratio}(t_1) \\ \vdots & \ddots & \vdots \\ v_{1\_ratio}(t_T) & \cdots & v_{R\_ratio}(t_T) \end{pmatrix}$$
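The compensation and normalization of step A2 can be sketched in plain Python as follows. The function names, the list-of-lists matrix layout and the neighbor dictionary are assumptions of this example. The loop also stops when a pass fills nothing, so that a road with no reachable speed record stays at 0 instead of looping forever; that guard is an addition of this sketch.

```python
def compensate_speeds(V, neighbors):
    """Fill zero (missing) speeds with the mean of non-zero neighbor speeds.

    V: list of T rows, each a list of R speeds (0 marks a missing record).
    neighbors: dict road index -> list of adjacent road indices (hypothetical layout).
    """
    V = [row[:] for row in V]          # work on a copy
    for row in V:                      # one time step at a time
        while True:
            updates = {}
            for j, v in enumerate(row):
                if v != 0:
                    continue
                known = [row[k] for k in neighbors.get(j, []) if row[k] != 0]
                if known:              # J = len(known) non-zero neighbors
                    updates[j] = sum(known) / len(known)
            if not updates:            # nothing more can be filled
                break
            for j, v in updates.items():
                row[j] = v             # update the matrix after each pass
    return V

def relative_speeds(V, v_max):
    """Divide each road's speed by that road's maximum speed limit."""
    return [[v / v_max[j] for j, v in enumerate(row)] for row in V]

# Three roads in a chain 0 - 1 - 2, two time steps (made-up values).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
V0 = [[60.0, 0.0, 40.0],
      [0.0, 0.0, 30.0]]
V1 = compensate_speeds(V0, neighbors)
V_ratio = relative_speeds(V1, v_max=[80.0, 60.0, 60.0])
```

In the second time step, road 0 has no recorded neighbor in the first pass and is filled only in the second pass, after road 1 has been compensated. This mirrors the repeated updating described in the text.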

wherein "constructing the grid traffic congestion network model G₁(N₁,L₁)" in step A3 is performed as follows. For each grid area divided in step A, first, according to the actual map data of the grid area, software tools such as Python and Mapinfo are used to extract the structure information among roads and the road intersection information contained in the grid area. Second, a suitable geographical coverage of the traffic network is selected according to the needs of the actual research, for example the traffic network within the Fifth Ring Road of Beijing. Then, following the complex network method, each road intersection in the grid area is abstracted as a node, each road in the traffic network of the grid area is abstracted as an edge connecting nodes, and the relative speed of each road is taken as the edge weight, thereby establishing a grid traffic congestion network in each grid area. Because most roads in the traffic network carry traffic in both directions and are therefore directional, the grid traffic congestion network constructed by this method is a directed weighted network;

wherein "constructing the grid node traffic network model G₂(N₂,L₂)" in step A4 is performed as follows. First, an inter-grid intersection traffic network model is constructed from the intersection information contained in each grid area and the road topology of the traffic network of the whole research area (the whole network), that is, the road topology contained inside the grid areas is deleted from the whole network. Second, the number of congested roads between each pair of grid areas is counted and recorded. Finally, using complex network theory and methods, each grid area is abstracted as a node, an edge connects two grids if congested roads exist between them, and the number of congested roads between the grids is taken as the edge weight, thereby establishing the grid node traffic network model.
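Steps A3 and A4 together can be illustrated with a small sketch that splits a road list into the two layers. Several things here are assumptions for illustration: the function name `build_two_layer`, the tuple layout of `roads`, the use of a single congestion threshold `q` to decide whether an inter-grid road is congested, and the treatment of the macro layer as undirected.

```python
from collections import defaultdict

def build_two_layer(roads, cell_of, q):
    """Build the per-grid directed weighted networks and the grid node network.

    roads: list of (from_node, to_node, relative_speed), one entry per direction.
    cell_of: dict node -> grid label.
    q: congestion threshold applied to inter-grid roads (assumed here).
    """
    intra = defaultdict(dict)      # grid label -> {(u, v): relative speed}
    inter = defaultdict(int)       # (grid_a, grid_b) -> number of congested roads
    for u, v, ratio in roads:
        a, b = cell_of[u], cell_of[v]
        if a == b:
            intra[a][(u, v)] = ratio            # micro layer: directed, weighted
        elif ratio <= q:
            inter[tuple(sorted((a, b)))] += 1   # macro layer: count congested roads
    return dict(intra), dict(inter)

# Made-up example: two grids, six directed roads.
cell_of = {"A": 0, "B": 0, "C": 1, "D": 1}
roads = [("A", "B", 0.9), ("B", "A", 0.3), ("B", "C", 0.2),
         ("C", "B", 0.8), ("A", "D", 0.4), ("C", "D", 0.5)]
G1, G2 = build_two_layer(roads, cell_of, q=0.4)
```

Here `G1` holds one edge dictionary per grid, and `G2` contains a single edge between grids 0 and 1 with weight 2, because two of the three inter-grid roads are at or below the threshold.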

Wherein, "extracting and screening features based on complex network theory" in step B is performed as follows. First, for the grid traffic congestion network and the grid node traffic network (together referred to as the double-layer traffic network) at each time t_i, a seepage threshold q(t) is introduced, and its value is determined through seepage analysis of the double-layer traffic network. Second, under the seepage threshold q(t) at each moment, the features of each grid area are extracted from each grid traffic congestion network and from the nodes (grids) of the grid node traffic network using complex network theory and methods. These structural and functional features include the largest congested cluster, the node betweenness, the mean node degree, the average speed of the grid congestion network, the number of congested roads among first-order neighbors, and so on. The extracted features are then screened with a machine learning method, keeping those that contribute most to traffic risk identification and prediction, so as to construct a high-quality sample feature set and improve the effect and efficiency of traffic risk identification and prediction. At the same time, each grid area at time t is labeled according to the proportion of congested roads in that grid area at time t + Δt. The specific steps are as follows:

step B1: analyzing seepage of a traffic network;

step B2: extracting risk features based on a complex network;

step B3: screening risk characteristics based on machine learning;

wherein the "traffic network seepage analysis" in step B1 is performed as follows. Seepage (percolation) theory is applied to the double-layer traffic network. First, a control variable, the seepage threshold q(t), is given for the traffic network at each moment, so that each road in the network is in one of two states: unblocked (v_{i_ratio}(t) > q(t)) or congested (v_{i_ratio}(t) ≤ q(t)). Unblocked edges are deleted from the original network and congested edges are kept; the remaining network is the traffic network in the congested state at time t, referred to as the congestion network. Each value of q(t) at each moment corresponds to one congestion network. As q(t) decreases, only more severely congested roads remain, more edges fail and are removed, and the congestion network becomes sparser. An appropriate seepage threshold q(t) is therefore selected, namely the one at which the urban traffic network carries the richest congestion information, and the traffic congestion risk at the current moment is identified and predicted at that threshold;
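The edge-removal step of the seepage analysis can be sketched with a small union-find routine that returns the congested clusters for a given q. The text does not state the rule for choosing the final q(t), so this example only reports cluster sizes as q varies. Ignoring edge direction when grouping nodes is an assumption of the sketch, as are the function and variable names.

```python
def congestion_clusters(edges, q):
    """Keep edges with relative speed <= q and return the connected clusters.

    edges: list of (u, v, relative_speed). Direction is ignored when
    grouping nodes into clusters (an assumption of this sketch).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v, ratio in edges:
        if ratio <= q:                       # congested edge is kept
            parent[find(u)] = find(v)
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=len, reverse=True)

# Made-up chain of roads a-b-c-d-e-f with relative speeds.
edges = [("a", "b", 0.2), ("b", "c", 0.35), ("c", "d", 0.7),
         ("d", "e", 0.3), ("e", "f", 0.9)]
sizes_by_q = {q: [len(c) for c in congestion_clusters(edges, q)]
              for q in (0.25, 0.4, 0.8)}
```

For this toy chain the congestion network has one cluster of 2 nodes at q = 0.25, clusters of 3 and 2 nodes at q = 0.4, and a single cluster of 5 nodes at q = 0.8, which shows how lowering q makes the congestion network sparser.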

wherein "risk feature extraction based on complex network" in step B2 is performed as follows. For each moment, the grid traffic congestion network and the grid node traffic network are constructed under the seepage threshold q(t). From the viewpoint of statistical physics, complex network theory and methods are applied to preliminarily extract the micro and macro features of each grid area of the double-layer traffic network at each moment, from the two perspectives of structure and function. First, on the microscopic level, each grid traffic congestion network is taken as the research object, and the micro features of each grid area are calculated at the critical seepage threshold at each moment. The grid traffic congestion network has different features at different moments, and the congestion network within a grid area changes dynamically in space as time evolves, so it has spatio-temporal characteristics. Second, on the macroscopic level, the nodes (grid areas) of the constructed grid node traffic network model are taken as the research objects, and the macro features of each grid area (node) are calculated at each moment, as shown in fig. 2. Examples of micro features are the largest congested cluster of the grid traffic congestion network, the mean node betweenness, the mean node degree, the mean clustering coefficient, the average speed, and the growth rate of the congestion network. Examples of macro features are the node average path length, node strength, node betweenness, node degree and growth rate of the grid node traffic network;

the invention provides a method for extracting features from the perspective of complex networks, and the grid features above are examples. In practice, features can be preliminarily extracted in a targeted way from the structure and the function of the actual traffic system according to its background and actual situation, so as to construct a sample feature set and an initial feature matrix M_f;
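A few of the listed features can be computed with the standard library alone, as in the following sketch. It covers only three micro features (largest congested cluster, mean node degree, mean relative speed) and two macro features (node degree and node strength). The function names and data layouts are hypothetical, and taking the mean speed over all roads of the grid rather than only the congested ones is an assumption.

```python
def micro_features(edges, q):
    """Features of one grid's congestion network at threshold q.

    edges: list of (u, v, relative_speed) for directed roads inside the grid.
    """
    congested = [(u, v, r) for u, v, r in edges if r <= q]
    degree, adj = {}, {}
    for u, v, _ in congested:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Largest weakly connected cluster via depth-first search.
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        largest = max(largest, size)
    return {
        "largest_cluster": largest,
        "mean_degree": sum(degree.values()) / len(degree) if degree else 0.0,
        "mean_speed": sum(r for _, _, r in edges) / len(edges) if edges else 0.0,
    }

def macro_features(inter, grid):
    """Degree and strength of one grid in the grid node network.

    inter: dict (grid_a, grid_b) -> number of congested roads between them.
    """
    weights = [w for pair, w in inter.items() if grid in pair]
    return {"degree": len(weights), "strength": sum(weights)}

# Made-up inputs for illustration.
demo_edges = [("a", "b", 0.2), ("b", "c", 0.3), ("c", "a", 0.9), ("d", "e", 0.1)]
micro = micro_features(demo_edges, q=0.4)
macro = macro_features({(0, 1): 2, (1, 2): 1, (0, 2): 3}, grid=1)
```

One row of the initial feature matrix M_f would then be assembled by concatenating such micro and macro values for one grid at one moment.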

wherein "risk feature screening based on machine learning" in step B3 is performed as follows. In step B2, the functional and structural features of the grid areas at each moment are extracted based on complex network knowledge, and the initial feature matrix M_f is constructed. To improve the accuracy and precision of risk identification and prediction in the traffic system, this step applies machine learning feature selection to the preliminarily constructed sample feature set, so that a high-quality sample feature set is screened out. Screening the structural and functional features of the traffic system keeps the important features and removes the irrelevant ones, which alleviates the curse of dimensionality, reduces the difficulty of the learning task, reduces overfitting and enhances the generalization ability of the machine learning model. In view of the highly complex spatio-temporal evolution of the traffic system, and in order to optimize for a given learner, the invention performs feature selection with the classical wrapper-style LVW (Las Vegas Wrapper) method, as shown in fig. 3. The specific steps are as follows:

(1) setting an initial optimal error E to be infinite, setting the current optimal feature subset to be an attribute complete set A, and setting the repetition time t to be 0;

(2) randomly generating a feature subset A', and calculating the error E' of the classifier when this feature subset is used;

(3) if E' is smaller than E, setting A = A' and E = E' and repeating steps (2) and (3); otherwise setting t = t + 1, and exiting the loop when t is greater than or equal to the stop control parameter T;

in the calculation process, the LVW method directly uses the performance of the learner that will finally be used as the evaluation criterion for a feature subset, so the selected feature subset is tailored to the given learner and most favorable to its performance; a high-quality sample feature set is thus screened out and the screened feature matrix is constructed.
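Steps (1) to (3) of the LVW procedure can be sketched as follows. The function name `lvw`, the callable `error_of` and the toy error function are hypothetical; in practice `error_of` would train the chosen learner on the candidate subset and return, for example, a cross-validated error. The sketch follows the steps exactly as listed and does not add refinements found in some textbook variants.

```python
import random

def lvw(features, error_of, T, seed=0):
    """Las Vegas Wrapper following steps (1)-(3).

    features: list of feature names (the full set A).
    error_of: callable mapping a feature subset to the learner's error.
    T: stop control parameter.
    """
    rng = random.Random(seed)
    best_error = float("inf")      # step (1): E = infinity
    best_subset = list(features)   #           current best subset = A
    t = 0
    while t < T:
        # step (2): draw a random non-empty subset A'
        size = rng.randint(1, len(features))
        candidate = sorted(rng.sample(features, size))
        error = error_of(candidate)
        # step (3): accept an improvement, otherwise count a failure
        if error < best_error:
            best_error, best_subset = error, candidate
        else:
            t += 1
    return best_subset, best_error

# Toy error: two features help, every extra feature costs a little.
def toy_error(subset):
    gain = 0.2 * ("largest_cluster" in subset) + 0.2 * ("mean_degree" in subset)
    return 0.5 - gain + 0.05 * len(subset)

names = ["largest_cluster", "mean_degree", "mean_speed", "strength"]
chosen, err = lvw(names, toy_error, T=20)
```

Because the search is random, the result depends on the seed and on T; a larger T gives the search more chances to find a better subset at a higher computational cost.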

Wherein, "risk identification and prediction based on ensemble learning theory" in step C is performed as follows. In order to accurately identify and predict the congestion risk in the traffic system and control it effectively, first, an ensemble learning model is constructed using machine learning and the relevant mathematics. Second, to eliminate the influence of non-uniform dimensions (units and scales) among the feature vectors on the model, a feature scaling method is applied to the screened feature set to obtain a standard sample feature matrix. Finally, so that the model learns as much as possible about the characteristics of risks in the traffic system, the standard sample feature matrix is divided into a training set and a test set in a certain proportion (a:b); the ensemble learning model is trained on the training set, and the trained model is then used to identify and predict risks in the grid areas of the traffic system at the current moment. The specific steps are as follows:

step C1: constructing an ensemble learning model;

step C2: carrying out risk identification and prediction by using an ensemble learning model;

wherein "constructing an ensemble learning model" in step C1 is performed as follows. The invention aims to learn a more stable and better-performing model from historical risk data of the traffic system. An ensemble learning model generally learns better than a single classifier, so to make up for the shortcomings of a single classifier, the invention introduces ensemble learning theory and constructs an ensemble learning model to identify and predict the risk of the traffic system. Ensemble learning combines several weakly supervised models into a better and more comprehensive strongly supervised model; the underlying idea is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct the error. The current mainstream ensemble learning frameworks are Bagging, Boosting and Stacking. The invention uses the Bagging framework and the associated theory to construct a random forest model for identifying and predicting the risk of the traffic system, as shown in fig. 4. The implementation steps are as follows:

(1) assume a dataset $D = \{x_{i1}, x_{i2}, \ldots, x_{in}, y_i\}$ ($i \in [1, m]$) with $n$ features; sampling with replacement generates the sampling space $(m \times n)^{m \times n}$;

(2) building the base learners (decision trees): for each sample $d_j = \{x_{i1}, x_{i2}, \ldots, x_{ik}, y_i\}$ ($i \in [1, m]$, where $k < n$), a decision tree is generated and its result $h_j(x)$ is recorded;

(3) training $T$ times and combining the results $h_j(x)$ of the $T$ decision trees through $\varphi(x)$, where $\varphi(x)$ is a combination model such as absolute majority voting, relative majority voting or weighted voting;

through the above process a special binary classifier, namely a random forest model, is constructed to identify and predict risks in the traffic system. In this process, the classification function is a sign-type function whose output values 0 and 1 represent low risk and high risk in a grid area, respectively:

$$f(x_i) = \begin{cases} 0, & \text{grid area } i \text{ is low risk} \\ 1, & \text{grid area } i \text{ is high risk} \end{cases}$$

in the above formula, $f(x_i)$ represents the risk state of the i-th grid area, where 0 represents low risk and 1 represents high risk;

meanwhile, an ensemble learning model is constructed by applying an ensemble learning theory to identify and predict risks of the traffic system, and a proper ensemble learning framework and model can be selected according to the distribution characteristics of data samples to identify and predict the risks, so that the risk identification and prediction effects of the traffic system are further improved;
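The Bagging procedure of steps (1) to (3) can be illustrated with a compact standard-library sketch. To keep it short, one-level decision stumps stand in for full decision trees, so this is a simplified stand-in rather than a real random forest; relative majority voting is used as the combination rule. All names and the toy data are assumptions of the example.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            for high in (0, 1):            # label predicted when value > thr
                errors = sum((high if r[f] > thr else 1 - high) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, high)
    _, f, thr, high = best
    return lambda x: high if x[f] > thr else 1 - high

def train_bagging(rows, labels, n_trees, k, seed=0):
    """Bootstrap the samples, draw k random features per learner, train T learners."""
    rng = random.Random(seed)
    m, n = len(rows), len(rows[0])
    learners = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]      # sampling with replacement
        feats = rng.sample(range(n), k)                 # k < n features
        learners.append(train_stump([rows[i] for i in idx],
                                    [labels[i] for i in idx], feats))
    return learners

def predict(learners, x):
    """Relative majority vote over the base learners: 0 = low risk, 1 = high risk."""
    votes = Counter(h(x) for h in learners)
    return votes.most_common(1)[0][0]

# Made-up feature rows (three features per grid) and risk labels.
rows = [[0.9, 0.8, 0.1], [0.8, 0.7, 0.2], [0.85, 0.9, 0.15],
        [0.2, 0.3, 0.8], [0.3, 0.2, 0.9], [0.25, 0.1, 0.7]]
labels = [0, 0, 0, 1, 1, 1]
forest = train_bagging(rows, labels, n_trees=15, k=2)
risk = predict(forest, [0.2, 0.2, 0.85])
```

In a real application, a library implementation of random forests with full decision trees would normally replace the stumps used here.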

wherein "risk identification and prediction using the ensemble learning model" in step C2 is performed as follows. Based on the high-quality sample feature set extracted and screened in step B, that is, the screened feature matrix, risks in the traffic system are identified and predicted with the ensemble learning model constructed in step C1. Because differences in dimension among the features of the historical sample data set can affect the performance of the ensemble learning model, the sample feature set must first be scaled before the model is used. Feature scaling eliminates the influence of differing dimensions among feature vectors on model accuracy, improves the convergence speed of the model, and yields the standard sample feature matrix. The mainstream feature scaling methods in machine learning include min-max normalization, mean normalization, standardization, and scaling to unit length. For the sample feature set of a traffic system, a suitable method can be chosen among them in practice according to the condition of the actual traffic system, the characteristics of the feature set, and the characteristics of the machine learning method applied, so as to maximize the accuracy and precision of risk identification and prediction;

after the features of the sample data set have been scaled, risks in the traffic system are identified and predicted from the standard sample feature matrix using the ensemble learning model constructed in step C1. Since the ensemble learning model must learn the characteristics of risk in this process, the standard sample feature set is randomly divided into a training set and a test set in a certain proportion (a:b); the training set is used to train the random forest model so that it learns the characteristics of risk as fully as possible, and the test set is used to test the training effect of the model.
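Two of the scaling methods named above and the a:b split can be sketched as follows. The function names, the list-of-rows layout and the handling of constant columns (left at zero rather than dividing by zero) are choices made for this example only.

```python
import random

def min_max_scale(X):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo[j]) / span[j] for j, v in enumerate(row)] for row in X]

def z_score_scale(X):
    """Standardize every column to zero mean and unit (population) variance."""
    cols = list(zip(*X))
    mean = [sum(c) / len(c) for c in cols]
    std = [(sum((v - mu) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
           for c, mu in zip(cols, mean)]
    return [[(v - mean[j]) / std[j] for j, v in enumerate(row)] for row in X]

def split(X, y, a, b, seed=0):
    """Randomly split samples into training and test sets in the ratio a:b."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * a / (a + b))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Made-up feature matrix with two features on different scales.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0], [5.0, 50.0]]
y = [0, 0, 0, 1, 1]
X_mm = min_max_scale(X)
X_z = z_score_scale(X)
X_train, y_train, X_test, y_test = split(X_mm, y, a=4, b=1)
```

In a stricter workflow the scaling parameters would be estimated on the training set only and then applied to the test set; the order used here follows the text, which scales before splitting.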

Wherein, "model evaluation and verification" in step D is performed as follows. When the ensemble learning model constructed in step C is used to identify and predict risk in the traffic system, its performance must be evaluated accurately and scientifically. To this end, first, evaluation indexes are selected based on the condition of the actual traffic system and the final goal of the invention, for example accuracy, precision, recall and the F1 value, all of which are calculated from the confusion matrix. Second, to prevent overfitting and to assess the generalization ability of the model accurately, the ensemble learning model is evaluated by cross validation, which further improves the rigor and reliability of the evaluation. This step comprises the following substeps:

step D1: selecting a model evaluation index;

step D2: evaluating and analyzing the model;

wherein "selecting the model evaluation indexes" in step D1 is performed as follows. The invention identifies and predicts risk in the traffic system, and its final goal is to identify that risk accurately and scientifically with the ensemble learning model. In essence this is an anomaly detection problem in machine learning, whose main characteristic is class imbalance: the sample size of normal data is large and that of risk data is small. Accuracy alone therefore cannot objectively reflect model performance. For this risk identification and detection scenario, the invention evaluates the model with two indexes, recall and accuracy, given by:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

in the formula, Accuracy represents the accuracy and Recall the recall; TP is the number of correctly predicted positive cases, TN is the number of correctly predicted negative cases, FP is the number of negative cases predicted as positive, and FN is the number of positive cases predicted as negative;

the fewer truly risky units in the traffic system that are mispredicted, the better, because a real congestion risk that goes unidentified will, once it materializes, damage the traffic system to a great extent; the recall therefore deserves more attention. Meanwhile, to ensure that normal units are correctly predicted as normal, to reduce the error rate on normal samples, and to allow managers of the traffic system to control real risks as precisely as possible under limited resources, accuracy is introduced alongside recall as an evaluation index of the model;
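Both indexes follow directly from the four confusion-matrix counts. A minimal Python sketch, assuming binary labels where 1 marks a high-risk grid area and 0 a normal one; the function names and the toy labels are illustrative and not part of the invention:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = high risk, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all samples that were predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Share of truly risky samples that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced data: 8 normal grid areas and 2 risky ones.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a model that never flags any risk
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), recall(tp, fn))
```

On this toy set a model that never flags risk still reaches an accuracy of 0.8 while its recall is 0, which illustrates why accuracy alone is not a sufficient index on imbalanced data.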

the "evaluation analysis of the model" described in step D2 is specifically performed as follows: to prevent over-fitting and to accurately evaluate the generalization ability of the model, the ensemble learning model is evaluated with a cross validation method from machine learning, which further improves the scientific soundness and reliability of the evaluation. The classical methods are mainly the leave-one-out method, K-fold cross validation and the bootstrap method; the invention uses the bootstrap method, with the following steps:

(1) randomly selecting one sample at a time from a data set containing N samples and taking it as a training sample;

(2) putting the sample selected in (1) back into the original data set, and repeating this sampling with replacement N times to generate a data set of the same size as the original one; this new data set is the training set;

(3) after N draws, approximately $(1 - 1/N)^N \approx 1/e \approx 36.8\%$ of the samples in the original data set will not appear in the new data set; these samples are taken as the validation set;

(4) repeating the above steps M times, M models can be trained, the values of the evaluation indexes can be obtained, and then the performance evaluation value of the model can be obtained by taking the average value.
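The four steps above can be sketched in Python as follows. This is a minimal illustration using only the standard library; `fit_and_score` is a hypothetical callback standing in for "train a model on the training indices and return one evaluation index on the validation indices", and the data set size N = 1000 is an arbitrary example value:

```python
import random


def bootstrap_split(n_samples, rng):
    """Steps (1)-(3): draw n_samples indices with replacement for training;
    indices that were never drawn form the validation set."""
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train_idx)
    val_idx = [i for i in range(n_samples) if i not in drawn]
    return train_idx, val_idx


def bootstrap_evaluate(n_samples, fit_and_score, m_rounds=10, seed=0):
    """Step (4): repeat the split m_rounds times and average the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m_rounds):
        train_idx, val_idx = bootstrap_split(n_samples, rng)
        scores.append(fit_and_score(train_idx, val_idx))
    return sum(scores) / len(scores)


N = 1000  # hypothetical data set size
# Instead of training a model, this toy callback just reports the share of
# samples that ended up in the validation set.
mean_unseen = bootstrap_evaluate(N, lambda train_idx, val_idx: len(val_idx) / N)
print(round(mean_unseen, 3))
```

With a real learner, the callback would fit the model on `train_idx` and return, for example, the recall measured on `val_idx`.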

Through the above steps, based on complex network theory and ensemble learning theory, the function and the structure of the traffic system are considered together from the perspective of complex networks, providing scientific and reliable technical and theoretical support for the identification of traffic risks. The technical method provided by the invention can identify and predict the risk of the traffic system efficiently and accurately, and provides important support for risk diagnosis of the traffic system, formulation of targeted management and control measures, and improvement of traffic operation reliability.

(III) advantages and effects

The invention provides a traffic risk prediction method based on a complex network theory, which has the following advantages:

(1) global property: the traffic network model is constructed from the micro level and the macro level to extract the functional and structural characteristics of the traffic network model, so that the accuracy of the risk prediction of the traffic system is greatly improved, and the traffic network model has great significance for understanding the risk evolution mechanism of the traffic system and improving the reliability of the traffic system;

(2) Timeliness: the invention can monitor the traffic state and predict future risk in real time, and provides strong support for the formulation and implementation of risk control strategies for the traffic system, thereby ensuring the healthy and stable operation of the system;

(3) Extensibility: the risk prediction method provided by the invention can be extended to the risk identification and prediction of other types of complex systems, such as biological systems, communication systems and financial systems.

(4) The method of the invention is scientific, has good manufacturability and has wide popularization and application value.

Drawings

Fig. 1 is a flow chart of a traffic risk prediction method according to the present invention.

FIG. 2 is a traffic risk characterization hierarchy of the present invention.

FIG. 3 is a logic diagram of the wrapper feature selection process of the present invention.

Fig. 4 is a random forest model architecture diagram of the present invention.

FIG. 5 is a trend chart of evaluation indexes of the random forest model of the present invention.

The numbers, symbols and codes in the figures are explained as follows:

s: the area of the region of interest;

V_i: the speed vector of the R roads at moment t_i;

an initial velocity matrix;

compensating the normalized speed matrix;

G₁(N₁,L₁): a grid traffic congestion network model;

G₂(N₂,L₂): a mesh node traffic network model;

q(t): the percolation (seepage) threshold of the traffic network at time t;

V_{i_ratio}: a normalized velocity vector;

M_f: an initial feature matrix;

the screened high-quality characteristic matrix;

a high-quality feature matrix after feature scaling;

f(x_i): the risk status of the i-th grid area;

Accuracy: the model accuracy;

Recall: the model recall;

TP: the number of positive cases correctly predicted;

TN: the number of negative cases correctly predicted;

FP: the number of negative cases predicted as positive;

FN: the number of positive cases predicted as negative.

Detailed Description

In order to make the technical problems and technical solutions to be solved by the present invention clearer, the following detailed description is made with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments described herein are for purposes of illustration and explanation only and are not intended to limit the invention.

The invention is further described with reference to the following description and embodiments in conjunction with the accompanying drawings.

The actual traffic data used in this embodiment are real-time floating-car speed data for every road section of all roads within the Fifth Ring Road of Beijing, provided by QF Technology Company. The data are recorded at 1-minute intervals, a relatively fine time granularity, and cover 0:00-23:59, giving 1440 moments per day. The data of October 20, 2015 are used for research and analysis in this embodiment.

The traffic risk prediction method based on the complex network theory of the embodiment of the invention is shown in figure 1, and the specific implementation steps are as follows:

step A: dividing grids based on empirical data to construct a double-layer traffic network model;

step B: extracting and screening features based on the complex network theory;

step C: performing risk prediction based on the ensemble learning theory;

step D: evaluating and verifying the model.

The step A of "dividing grids based on empirical data to construct a double-layer traffic network model" is performed as follows. First, basic information on the roads in the study area is acquired, comprising mainly two parts: road information of the traffic network and the longitude and latitude of road intersections. According to the extent of the study area and the longitude and latitude of road sections and intersections, the area is divided into N×M grid areas, which are then labeled. Secondly, for each grid area, at the microscopic level a grid traffic congestion network model is constructed from actual traffic data using complex network theory and methods, with the intersections in the grid as nodes, road sections as edges, and the relative speed of each road section as the edge weight. At the macroscopic level, each grid area is taken as a node, the existence of congested roads between two grids is the criterion for connecting them with an edge, and the number of congested roads between the grids is the edge weight; a grid node traffic network model is thus constructed using complex network theory and methods.

Step A1: dividing grid areas based on geographic information;

step A2: preprocessing speed data to obtain the relative speed matrix;

step A3: constructing the grid traffic congestion network model G₁(N₁,L₁);

step A4: constructing the grid node traffic network model G₂(N₂,L₂);

In step A1, the "dividing grid areas based on geographic information" is specifically performed as follows. First, the traffic network model and the road information required for grid division are extracted from MapInfo files using the Python language; the extracted information mainly comprises the vehicle speed of each road at each moment, the longitude and latitude of intersections, and the network topology of the traffic system within the Fifth Ring Road of Beijing. Secondly, from the obtained road information and intersection coordinates, the area S within the Fifth Ring Road is calculated to be 667 square kilometers, with a longitude range of 116.20-116.56 and a latitude range of 39.76-40.03; according to the actual conditions in this area, the number of grids is set to 2500, so that each grid is approximately 516 m on a side. Finally, for each grid area, the intersections falling inside it are determined from the longitude and latitude of every intersection in the traffic network and recorded.
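The assignment of intersections to grid areas can be sketched in Python as below. The longitude and latitude ranges and the count of 2500 grids come from the text; arranging them as 50 × 50 cells, splitting the bounding box uniformly in degrees, and the names `grid_id` and `assign_intersections` are assumptions made only for illustration:

```python
import math

LON_MIN, LON_MAX = 116.20, 116.56
LAT_MIN, LAT_MAX = 39.76, 40.03
N_COLS = N_ROWS = 50  # assumption: 2500 grids laid out as 50 x 50


def grid_id(lon, lat):
    """Return the label (0..2499) of the grid containing a point, or None
    if the point lies outside the study area."""
    if not (LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX):
        return None
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N_COLS), N_COLS - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N_ROWS), N_ROWS - 1)
    return row * N_COLS + col


def assign_intersections(intersections):
    """Map each grid label to the list of intersection ids inside it."""
    cells = {}
    for node_id, (lon, lat) in intersections.items():
        gid = grid_id(lon, lat)
        if gid is not None:
            cells.setdefault(gid, []).append(node_id)
    return cells


# Hypothetical intersections: id -> (longitude, latitude)
intersections = {
    "a": (116.21, 39.77),
    "b": (116.2105, 39.7705),
    "c": (116.55, 40.02),
}
cells = assign_intersections(intersections)

# Side length of one grid if 667 km^2 is split into 2500 equal squares.
side_m = math.sqrt(667e6 / 2500)
print(cells, round(side_m, 1))
```

The last line reproduces the grid size arithmetic: 667 km² divided into 2500 equal squares gives a side of roughly 516 m.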

The "speed data preprocessing to obtain the relative speed matrix" described in step A2 is performed as follows. First, from the actual traffic operation data collected by vehicle-mounted Global Positioning System (GPS) devices, the speeds of all R roads at an arbitrary moment t_i are expressed, in road order, as a vector V_i = (v₁, v₂, …, v_R). This process is repeated for all T moments, and the velocity vectors V_i of all moments are stacked to generate the initial velocity matrix, with one row per moment and one column per road.

Secondly, when the speed information of the Beijing Fifth Ring traffic system is collected with floating-car technology, the speed of every area at every moment cannot be completely collected and retained, owing to the influence of network communication technology and of human and natural factors. The original speed information therefore needs speed compensation. For each road R_j in the original speed matrix that lacks a speed record at moment t_i, the set of its neighboring roads is examined for speed records at that moment; if at least one element of the set has a speed record, the missing speed is taken as the average of the recorded (non-zero) speeds in the set:

$$ v_j(t_i) = \frac{1}{n_j} \sum_{k \in \Gamma_j} v_k(t_i) $$

in the above formula, Γ_j denotes the set of neighboring roads of the road R_j that lacks a speed, and n_j is the number of non-zero elements in Γ_j. The original absolute velocity matrix is thereby updated to the compensated velocity matrix.

After the road speed compensation is completed, because roads are of different grades, the compensated speed matrix needs to be normalized to obtain relative speeds, so that the judgment standard is unified. For any road R_j, the speed vector of the road at all moments is extracted from the compensated velocity matrix, together with the maximum speed limit of the road; the speed at each moment is divided by the maximum speed limit to obtain the normalized speed, and the normalized velocity matrix is obtained as follows:

$$ v_{j\_ratio}(t_i) = \frac{v_j(t_i)}{v_j^{\max}} $$

where v_j^{max} is the maximum speed limit of road R_j.
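The two preprocessing operations of step A2 can be sketched for a single moment as follows. This is a minimal illustration, assuming that a speed of 0 marks a missing record and that neighbor lists and speed limits are available as plain dictionaries; the names `compensate` and `normalize` and the sample values are hypothetical:

```python
def compensate(speeds, neighbors):
    """Fill each missing speed (0) with the mean of the recorded (non-zero)
    speeds of its neighboring roads at the same moment."""
    filled = dict(speeds)
    for road, v in speeds.items():
        if v != 0:
            continue
        recorded = [speeds[n] for n in neighbors.get(road, [])
                    if speeds.get(n, 0) != 0]
        if recorded:  # leave the gap if no neighbor has a record
            filled[road] = sum(recorded) / len(recorded)
    return filled


def normalize(speeds, speed_limits):
    """Divide each road speed by that road's maximum speed limit."""
    return {road: v / speed_limits[road] for road, v in speeds.items()}


# Hypothetical speeds (km/h) of three roads at one moment; r2 has no record.
speeds = {"r1": 30.0, "r2": 0.0, "r3": 50.0}
neighbors = {"r2": ["r1", "r3"]}
limits = {"r1": 60.0, "r2": 80.0, "r3": 100.0}

filled = compensate(speeds, neighbors)
relative = normalize(filled, limits)
print(filled, relative)
```

Applying the same two functions to every moment yields the compensated and the normalized speed matrices row by row.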

The "construction of the grid traffic congestion network model G₁(N₁,L₁)" described in step A3 is performed as follows. For each grid area divided in step A, first, the structural information between roads and the road intersection information contained in the grid area are extracted from the actual map data within the Fifth Ring Road of Beijing, using software tools such as Python and MapInfo; secondly, the traffic network within the Fifth Ring Road of Beijing is selected. Then, following the complex network approach, the road intersections in each grid area are abstracted as nodes, the roads of the traffic network in the grid area are abstracted as edges connecting the nodes, and the relative speed of each road is used as the edge weight, so that a grid traffic congestion network is established in each grid area. Because most roads of the Beijing Fifth Ring traffic network carry two-way traffic and are therefore directional, the grid traffic congestion network constructed by the invention is a directed weighted network.

The "construction of the grid node traffic network model G₂(N₂,L₂)" described in step A4 is performed as follows. First, an inter-grid intersection traffic network model is constructed from the intersection information contained in each grid area and the road topology of the whole traffic network within the Fifth Ring Road of Beijing (the whole network); that is, the road topology contained inside the grid areas is deleted from the whole network. Secondly, the number of congested roads between each pair of grid areas is counted and recorded. Finally, using complex network theory and methods, each grid area is abstracted as a node, two grids are connected by an edge if congested roads exist between them, and the number of congested roads between the grids is taken as the edge weight, thereby establishing the grid node traffic network model.

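The counting that produces the macroscopic layer in step A4 can be sketched as follows. Two points are assumptions rather than statements of the source: a road is treated as congested when its relative speed is at most 0.5 (the threshold used later in step B), and the edges between grids are treated as undirected. All names and sample data are hypothetical:

```python
from collections import Counter


def build_grid_node_network(roads, node_to_grid, q=0.5):
    """Return {(grid_a, grid_b): number of congested roads between them}.

    roads: iterable of (from_node, to_node, relative_speed).
    node_to_grid: intersection id -> grid label.
    """
    weights = Counter()
    for u, v, ratio in roads:
        gu, gv = node_to_grid[u], node_to_grid[v]
        if gu == gv:
            continue  # road inside one grid: not part of the macroscopic layer
        if ratio <= q:  # assumed congestion criterion
            weights[tuple(sorted((gu, gv)))] += 1
    return dict(weights)


node_to_grid = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
roads = [
    ("a", "b", 0.3),  # inside grid 0, ignored
    ("b", "c", 0.4),  # congested, links grids 0 and 1
    ("a", "d", 0.2),  # congested, links grids 0 and 1
    ("b", "d", 0.8),  # unblocked, not counted
    ("d", "e", 0.5),  # congested, links grids 1 and 2
]
grid_edges = build_grid_node_network(roads, node_to_grid)
print(grid_edges)
```

Grid pairs with no congested road between them simply do not appear in the result, which matches the rule that an edge exists only where congested roads exist.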
The "extracting and screening features based on the complex network theory" in step B is performed as follows. First, for the grid traffic congestion network and the grid node traffic network (together referred to as the double-layer traffic network) at each moment t_i, a percolation (seepage) threshold q(t) is set for percolation analysis, and through percolation analysis of the double-layer traffic network the threshold is determined to be q(t) = 0.5. Secondly, for each grid traffic congestion network and for each node (grid) of the grid node traffic network at the threshold 0.5 at each moment, complex network theory and methods are applied to extract structural and functional features of each grid area, including the largest congested cluster, the mean node betweenness, the mean node degree, the average speed of the grid congestion network, and the number of congested roads among first-order neighbors. On this basis the features are screened with a machine learning method: those contributing most to traffic risk identification and prediction are selected to construct a high-quality sample feature set, improving the effect and efficiency of identification and prediction as far as possible. Meanwhile, each grid area at moment t is labeled according to the proportion of congested roads in that grid area at moment t + Δt. The specific steps of the process are as follows:

step B1: analyzing seepage of a traffic network;

step B2: extracting risk features based on a complex network;

step B3: screening risk characteristics based on machine learning;

the "seepage analysis of the traffic network" described in step B1 is specifically performed as follows. Percolation (seepage) theory is applied to the double-layer traffic network. First, a control variable, the percolation threshold q(t), is given for the traffic network at each moment, so that every road in the network is in one of two states: an unblocked state (v_{i_ratio}(t) > q(t)) or a congested state (v_{i_ratio}(t) ≤ q(t)). The unblocked edges are deleted from the original network and the congested edges are kept; the remaining network is the traffic network in a congested state at moment t, referred to as the congested network. Each value of q(t) at each moment thus corresponds to one congested network. As q(t) decreases, only more severely congested roads are retained, more edges are removed, and the congested network becomes sparser. The threshold q(t) = 0.5 is found to be suitable for identifying and predicting the traffic congestion risk at the current moment, because at this value the urban traffic network is in the stage where congestion information is most abundant;
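The thresholding described in step B1 can be sketched as follows: roads whose relative speed is at most q are kept, the rest are deleted, and the size of the largest connected cluster of the remaining network is measured. Ignoring edge direction when forming clusters is an assumption of this sketch, and the function names and sample edges are hypothetical:

```python
def congested_edges(edges, q):
    """Keep only the roads whose relative speed is at most the threshold q."""
    return [(u, v) for u, v, ratio in edges if ratio <= q]


def largest_cluster_size(edge_list):
    """Number of nodes in the largest connected cluster (direction ignored)."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one cluster
            node = stack.pop()
            size += 1
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


# Hypothetical roads: (from_node, to_node, relative_speed)
edges = [("a", "b", 0.2), ("b", "c", 0.45), ("c", "d", 0.7),
         ("d", "e", 0.3), ("e", "f", 0.9)]
for q in (0.8, 0.5, 0.25):
    kept = congested_edges(edges, q)
    print(q, len(kept), largest_cluster_size(kept))
```

Running the loop shows the behaviour described in the text: as q is lowered, fewer edges survive and the largest congested cluster shrinks.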

the "risk feature extraction based on complex network" described in step B2 is specifically performed as follows. With the grid traffic congestion network and the grid node traffic network constructed at each moment under the percolation threshold q(t) = 0.5, complex network theory and methods are used, from the viewpoint of statistical physics, to preliminarily extract microscopic and macroscopic features of the grid areas of the double-layer traffic network at each moment, in terms of both structure and function. First, at the microscopic level, each grid traffic congestion network is taken as the research object, and the microscopic features of each grid area are calculated at the critical percolation threshold at each moment. The grid traffic congestion network has different characteristics at different moments, and the congested network in a grid area evolves dynamically in space over time, so these features are spatio-temporal. Secondly, at the macroscopic level, the nodes (grid areas) of the constructed grid node traffic network model are taken as the research objects, and the macroscopic features of each grid area (node) are calculated at each moment. As shown in fig. 2, the microscopic features include the largest congested cluster of the grid traffic congestion network, the mean node betweenness, the mean node degree, the mean clustering coefficient, the average speed, and the growth rate of the congested network; the macroscopic features include the average path length, node strength, node betweenness, node degree, and growth rate of the nodes of the grid node traffic network.
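A few of the simpler microscopic features can be computed from the edge list of one grid area as sketched below. This covers only the mean node degree, the average relative speed of the congested network, and the proportion of congested roads; betweenness, clustering coefficient and growth rate are omitted. The function name, the dictionary keys and the sample data are hypothetical:

```python
def grid_features(edges, q=0.5):
    """Compute simple features of one grid's congested network.

    edges: list of (from_node, to_node, relative_speed) inside one grid.
    """
    congested = [(u, v, r) for u, v, r in edges if r <= q]
    degree = {}
    for u, v, _ in congested:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n_congested = len(congested)
    return {
        # average number of congested roads attached to a node
        "mean_degree": sum(degree.values()) / len(degree) if degree else 0.0,
        # average relative speed over the congested roads
        "mean_speed": (sum(r for _, _, r in congested) / n_congested
                       if n_congested else 0.0),
        # share of the grid's roads that are congested
        "congested_ratio": n_congested / len(edges) if edges else 0.0,
    }


edges = [("a", "b", 0.2), ("b", "c", 0.4), ("c", "d", 0.8), ("a", "c", 0.5)]
features = grid_features(edges)
print(features)
```

Evaluating such a function for every grid area at every moment gives one row of the sample feature set per grid and moment.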

The invention thus provides a method of extracting features from the perspective of complex networks, and the grid features listed above are examples. For the actual traffic system within the Fifth Ring Road of Beijing, features can be preliminarily extracted in a targeted manner according to the actual background and condition of the system, from both its structure and its function, to construct a sample feature set and an initial feature matrix M_f of dimension (8752,40,30), i.e. 8752 samples, each with 40 features.

The "risk feature screening based on machine learning" described in step B3 is specifically performed as follows. In step B2, the functional and structural features of the grid areas at each moment are extracted using complex network knowledge, and the initial feature matrix M_f is constructed. To improve the accuracy and precision of risk identification and prediction in the traffic system within the Fifth Ring Road of Beijing, machine learning methods are applied in this step to perform feature selection on the preliminarily constructed sample feature set and screen out a high-quality sample feature set, improving the effect of risk identification and prediction as far as possible. Screening the structural and functional features of the traffic system, keeping the important ones and removing the irrelevant ones, also alleviates the curse of dimensionality, lowers the difficulty of the learning task, reduces over-fitting, and enhances the generalization ability of the machine learning model. In view of the highly complex spatio-temporal evolution of the Beijing Fifth Ring traffic system, and in order to optimize for the given learner, the invention performs feature selection with LVW (Las Vegas Wrapper), a relatively classic wrapper method, as shown in figure 3. The high-quality features screened out by the LVW method include the node betweenness variance, the edge betweenness variance, the proportion of congested roads in the grid, and the node betweenness of the grid node traffic network, 10 features in total. From these a high-quality feature matrix is constructed, of dimension (8752,10,30), i.e. 8752 samples in total, each with 10 high-quality features.
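The LVW procedure can be sketched in Python as below. LVW repeatedly draws a random feature subset, scores it with the learner, and keeps it if the error is lower, or equal with fewer features; it stops after a fixed number of consecutive draws without improvement. In this sketch `evaluate_error` is a hypothetical callback that would normally return the cross-validated error of the learner on the chosen features; here a toy error function stands in for it, and the stopping parameter `max_stall` is an assumed value:

```python
import random


def lvw(features, evaluate_error, max_stall=50, seed=0):
    """Las Vegas Wrapper feature selection (sketch).

    features: list of feature names.
    evaluate_error: callable(subset) -> error of the learner on that subset.
    max_stall: stop after this many consecutive non-improving draws.
    """
    rng = random.Random(seed)
    best_subset, best_error = list(features), float("inf")
    stall = 0
    while stall < max_stall:
        k = rng.randint(1, len(features))          # random subset size
        candidate = sorted(rng.sample(features, k))
        error = evaluate_error(candidate)
        better = error < best_error
        smaller = error == best_error and len(candidate) < len(best_subset)
        if better or smaller:
            best_subset, best_error, stall = candidate, error, 0
        else:
            stall += 1
    return best_subset, best_error


# Toy stand-in for the learner: only f0 and f3 carry information, and the
# error is the number of informative features missing from the subset.
all_features = ["f0", "f1", "f2", "f3", "f4", "f5"]
informative = {"f0", "f3"}


def toy_error(subset):
    return len(informative - set(subset))


best_subset, best_error = lvw(all_features, toy_error)
print(best_subset, best_error)
```

Because each candidate must be scored by training the learner, the cost of LVW grows with the number of draws, which is why a stopping parameter is part of the method.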

Wherein, the step C of "risk identification and prediction based on ensemble learning theory" is performed as follows. To accurately identify and predict the congestion risk in the traffic system within the Fifth Ring Road of Beijing, and thereby control it effectively, first, an ensemble learning model is constructed using machine learning and related mathematical knowledge. Secondly, to eliminate the influence of non-uniform dimensions among feature vectors on the model, a feature scaling method is applied to the data feature set, yielding a standard sample feature matrix of dimension (8752,10,30). Finally, to ensure that the model learns as much as possible about the characteristics of risk in the Beijing Fifth Ring road traffic system, the standard sample feature matrix is divided into a training set and a test set at a ratio of 7:3, i.e. 6126 samples in the training set and 2626 samples in the test set. The ensemble learning model is trained with the training set, and the trained model is then used to identify and predict the risk of the grid areas of the traffic system at the current moment. The specific steps of the process are as follows:

step C1: constructing an ensemble learning model;

step C2: identifying and predicting risks using the ensemble learning model;

the "building ensemble learning model" described in step C1 is implemented as follows. The invention aims to learn a stable, well-performing model from historical risk data of the traffic system within the Fifth Ring Road of Beijing, and an ensemble learning model generally learns better than a single classifier. Ensemble learning combines several weakly supervised models to obtain a better and more comprehensive strongly supervised model; the underlying idea is that even if one weak classifier makes a wrong prediction, the other weak classifiers can correct the error. The current mainstream ensemble learning frameworks are Bagging, Boosting and Stacking.

Through the above process a special binary classifier, namely a random forest model, is constructed to identify and predict risks in the traffic system within the Fifth Ring Road of Beijing. The classification function is a sign (indicator) function whose output values 0 and 1 represent low risk and high risk of a grid area respectively:

$$ f(x_i) = \begin{cases} 0, & \text{low congestion risk} \\ 1, & \text{high congestion risk} \end{cases} $$

in the above formula, f(x_i) indicates the risk status of the i-th grid area, 0 representing a low congestion risk and 1 representing a high congestion risk.
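The idea that several weak classifiers can outvote one mistaken member can be sketched with a simple majority vote. This is not a random forest: in a real random forest the trees are learned from bootstrap samples with random feature subsets, whereas the three threshold rules below, their feature names, and their cut-off values are hand-written and purely hypothetical:

```python
def majority_vote(classifiers, x):
    """Return 1 (high risk) if more than half of the classifiers vote 1,
    otherwise 0 (low risk)."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0


# Three hypothetical weak classifiers, each looking at a single feature.
weak_classifiers = [
    lambda x: 1 if x["congested_ratio"] > 0.5 else 0,
    lambda x: 1 if x["mean_speed"] < 0.4 else 0,
    lambda x: 1 if x["cluster_size"] > 3 else 0,
]

risky_grid = {"congested_ratio": 0.7, "mean_speed": 0.3, "cluster_size": 2}
calm_grid = {"congested_ratio": 0.2, "mean_speed": 0.8, "cluster_size": 5}

print(majority_vote(weak_classifiers, risky_grid))
print(majority_vote(weak_classifiers, calm_grid))
```

In both toy cases one of the three rules votes against the other two, and the majority still gives the intended 0/1 risk status.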

Meanwhile, when an ensemble learning model is constructed with ensemble learning theory to identify and predict the risk of the traffic system within the Fifth Ring Road of Beijing, a suitable ensemble learning framework and model can be selected according to the distribution characteristics of the data samples, further improving the effect of risk identification and prediction.

In step C2, the "risk identification and prediction using the ensemble learning model" is performed as follows. In this step, based on the high-quality sample feature set extracted and screened in step B, i.e. the high-quality feature matrix, the ensemble learning model constructed in step C1 is used to identify and predict risks in the traffic system. Because differences of dimension among features in the historical sample data set affect the performance of the ensemble learning model, the sample feature set must first be feature-scaled before the model is used; this eliminates the influence of differing dimensions among feature vectors on model precision, improves the convergence rate of the model, and yields the standard sample feature matrix.

The mainstream feature scaling methods in machine learning mainly comprise min-max normalization, mean normalization, standardization, and scaling to unit length. For the sample feature set of the traffic system, the standardization method is selected from among these according to the actual conditions of the traffic system within the Fifth Ring Road of Beijing, the characteristics of the data feature set, and the machine learning method applied, so as to ensure the highest possible accuracy and precision of risk identification and prediction in the traffic system.
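Standardization rescales each feature column to zero mean and unit standard deviation. A minimal sketch with the Python standard library follows; using the population standard deviation and mapping constant columns to 0 are choices made for this illustration, and the sample rows are hypothetical:

```python
import statistics


def standardize_columns(rows):
    """Scale every column of a row-major table to mean 0 and std 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two hypothetical features on very different scales.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize_columns(rows)
print(scaled)
```

After scaling, both columns take the same values even though the raw features differ by a factor of 100, so neither dominates merely because of its units.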

After feature scaling of the sample data set of the Beijing Fifth Ring road traffic system, the random forest model constructed in step C1 is used, on the basis of the standard sample feature matrix of the traffic system, to identify and predict risks in the traffic system. In this process the random forest model needs to learn the characteristics of risk, so in this embodiment the standard sample feature set is randomly divided into a training set and a test set at a ratio of 7:3, with 6126 samples in the training set and 2626 samples in the test set; the training set is used to train the random forest model so that it learns the characteristics of congestion risk to the greatest extent.
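The 7:3 random split can be sketched as follows; the sample count 8752 comes from the text, while the function name, the fixed seed, and rounding the training size down are assumptions of this illustration:

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounded down
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(8752)
print(len(train_idx), len(test_idx))
```

With 8752 samples this gives 6126 training and 2626 test samples, the sizes stated in the embodiment.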

The "model evaluation and verification" in step D is performed as follows. When the ensemble learning model constructed in step C is used to identify and predict risk in the traffic system, its performance must be evaluated accurately and scientifically. First, evaluation indexes are reasonably selected based on the actual condition of the traffic system and the final objective of the invention, for example accuracy, precision, recall and F1 value, all of which are calculated from the confusion matrix. Secondly, to prevent over-fitting and to accurately evaluate the generalization ability of the model, the ensemble learning model is evaluated by cross validation, which further improves the scientific soundness and reliability of the evaluation. The step specifically comprises the following sub-steps:

step D1: selecting a model evaluation index;

step D2: evaluating and analyzing the model;

the "selection of model evaluation index" described in step D1 is specifically performed as follows: the invention identifies and predicts risk in the traffic system, and its final objective is to identify that risk accurately and scientifically with the ensemble learning model, which is in essence an anomaly detection problem in machine learning. Because the invention addresses a risk identification and detection scenario, the model is evaluated with two indexes, recall and accuracy, given by:

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} $$

in the formula, Accuracy represents the accuracy, Recall represents the recall, TP is the number of positive cases correctly predicted, TN is the number of negative cases correctly predicted, FP is the number of negative cases predicted as positive, and FN is the number of positive cases predicted as negative.

The fewer truly risky units in the Beijing Fifth Ring road traffic system that are mispredicted, the better, because a real congestion risk that goes unidentified will, once it materializes, damage the traffic system to a great extent; the recall therefore deserves more attention. Meanwhile, to ensure that normal units are correctly predicted as normal, to reduce the error rate on normal samples, and to allow managers of the traffic system to control real risks as precisely as possible under limited resources, accuracy is introduced as an evaluation index of the model. Using the random forest model from ensemble learning to identify and predict the congestion risk of the Beijing Fifth Ring road traffic system, the accuracy is 89.83% and the recall is 86.74%; both are at a high level, indicating good model performance.

The "evaluation analysis of the model" described in step D2 is specifically performed as follows: to prevent over-fitting and to accurately evaluate the generalization ability of the model, the ensemble learning model is evaluated with a cross validation method from machine learning, which further improves the scientific soundness and reliability of the evaluation. The classical methods are mainly the leave-one-out method, K-fold cross validation and the bootstrap method; the invention uses the bootstrap method, with the following steps:

(1) randomly selecting one sample at a time from the data set containing 8752 samples and taking it as a training sample;

(2) putting the sample selected in (1) back into the original data set, and repeating this sampling with replacement 8752 times to generate a data set of the same size as the original one; this new data set is the training set;

(3) after 8752 times of extraction, 3221 samples in the original data set do not appear in the new data set, and therefore, the samples which do not appear in the new data set are taken as a verification set;

(4) repeating the above steps 10 times, 10 models can be trained, and the values of the evaluation indexes can be obtained, and then averaging is performed, so that the performance evaluation value of the model can be obtained.
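As a quick check on the size of the validation set in step (3): a given sample is missed by a single draw with probability 1 − 1/N, so it is absent from all N draws with probability (1 − 1/N)^N, which tends to 1/e as N grows. With N = 8752 the expected number of samples that never appear is therefore about

$$ 8752 \left(1 - \frac{1}{8752}\right)^{8752} \approx 8752 \times 0.3679 \approx 3220, $$

which is in line with the 3221 samples reported for this particular draw.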

As shown in fig. 5, the random forest model is used to identify and predict the congestion risk of the Beijing Fifth Ring road traffic system, and the model is cross-validated 10 times with the bootstrap method. The average accuracy is about 92.84% and the average recall is about 92.45%, both at a high level. This indicates that the model has strong generalization ability and good performance, can accurately and reliably identify and predict congestion risk in the Beijing Fifth Ring road traffic system, and provides a strong guarantee for its safe, stable and healthy operation.

Matters not described in detail in the present invention belong to techniques well known to those skilled in the art.

The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims

1. A traffic risk prediction method based on complex network theory, characterized in that the method comprises the following steps:

step A: dividing grids based on empirical data to construct a double-layer traffic network model;

step B: extracting and screening features based on complex network theory;

step C: performing risk prediction based on ensemble learning theory;

step D: evaluating and verifying the model.

2. The traffic risk prediction method based on complex network theory as claimed in claim 1, characterized in that: the "dividing grids based on empirical data to construct a double-layer traffic network model" described in step A is performed as follows: firstly, basic information on the roads in the research area is acquired, comprising two parts, namely the road information of the traffic network and the longitude and latitude information of the road intersections; according to the area and extent of the research area and the longitude and latitude information of the road sections and intersections, the research area is divided into N × M grid areas, and the grid areas are labeled; secondly, for each grid area, on a microscopic level, a grid traffic congestion network model is constructed from actual traffic data by applying complex network theory and methods, with the intersections in the grid as nodes, the road sections as edges and the relative speed of each road section as the edge weight; on a macroscopic level, each grid area is taken as a node, whether congested roads exist between two grids is taken as the criterion for whether they are connected by an edge, the number of congested roads between the grids is taken as the edge weight, and a grid node traffic network model is constructed by applying complex network theory and methods; the specific steps are as follows:

step A1: dividing grid areas based on geographic information;

step A2: preprocessing speed data to obtain the relative speed matrix;

step A3: constructing the grid traffic congestion network model G₁(N₁,L₁);

step A4: constructing the grid node traffic network model G₂(N₂,L₂);

wherein the "dividing grid areas based on geographic information" described in step A1 is specifically performed as follows: firstly, the traffic network model and the traffic road information required for dividing the grid areas are extracted from the geographic information system (Mapinfo file) by using the programming language Python; the extracted information comprises the vehicle-mounted speed of each road at each moment, the longitude and latitude information of the intersections, and the network topology information of the traffic system under study; the Baidu Map application programming interface (API) is called from Python, and the topology of the road network and the names of the intersections are matched by sequential traversal to obtain the longitude and latitude of each intersection; the roads and intersections for which the longitude and latitude could not be acquired, because the intersection names differ between Baidu Map and Mapinfo, are processed separately, so as to obtain an accurate, standard longitude and latitude data set for the road network of the traffic system; secondly, the area S and the longitude and latitude value range of the research area are calculated from the obtained traffic road information and the longitude and latitude information of the intersections, and the number of grids is reasonably determined to be N × M according to the actual background of the research area, so that the area of each grid is S/(N × M); finally, for each divided grid area, the intersections located within that grid are determined from the longitude and latitude information of each intersection in the traffic network and recorded;
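The final part of step A1, recording which intersections fall in which grid, might be sketched as below. This is an illustrative sketch only: the function name, the parameter names `n_rows` and `n_cols` (standing in for N and M) and the sample coordinates are assumptions, and the grid is taken to be a uniform division of the longitude and latitude bounding box.

```python
def assign_to_grid(intersections, n_rows, n_cols):
    """Map each intersection to a (row, col) cell of an n_rows x n_cols grid.

    intersections: dict of name -> (longitude, latitude).
    Returns dict of (row, col) -> list of intersection names.
    """
    lons = [lon for lon, _ in intersections.values()]
    lats = [lat for _, lat in intersections.values()]
    lon_min, lon_max = min(lons), max(lons)
    lat_min, lat_max = min(lats), max(lats)
    # Fall back to a step of 1.0 when all points share one coordinate.
    lon_step = (lon_max - lon_min) / n_cols or 1.0
    lat_step = (lat_max - lat_min) / n_rows or 1.0

    cells = {}
    for name, (lon, lat) in intersections.items():
        # Points on the upper boundary are clamped into the last row/column.
        col = min(int((lon - lon_min) / lon_step), n_cols - 1)
        row = min(int((lat - lat_min) / lat_step), n_rows - 1)
        cells.setdefault((row, col), []).append(name)
    return cells

# Hypothetical intersections (coordinates invented for illustration).
example_intersections = {
    "A": (116.20, 39.80),
    "B": (116.35, 39.95),
    "C": (116.40, 40.00),
    "D": (116.39, 39.81),
}
grid_cells = assign_to_grid(example_intersections, n_rows=2, n_cols=2)
```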

wherein the "preprocessing speed data to obtain the relative speed matrix" described in step A2 is specifically performed as follows: in this step, firstly, from the actual traffic operation data of the vehicle-mounted global positioning system (GPS), the speeds of all R roads at an arbitrary time t_i are obtained and, following the order of the roads, expressed in vector form V_i = (v₁, v₂, …, v_R); the above process is repeated for all T moments, and the velocity vectors V_i of all moments are finally combined to generate the initial velocity matrix;

secondly, when the speed information of the traffic system is collected by floating car technology, the speed information of every area at every moment cannot be completely collected and retained, owing to the influence of network communication technology and of human and natural factors; the original speed information of the traffic system therefore needs to undergo speed compensation, that is, the original velocity matrix has some missing values and a compensated velocity matrix needs to be obtained; for a road R_j whose speed is missing, the compensation is computed from the set of roads neighboring R_j and the number of elements of that set that are not 0, and the original velocity matrix is updated to the compensated velocity matrix;

finally, from the absolute velocity matrix, the velocity vector of each road at all moments and the maximum speed limit of the road section are extracted; the speed at each moment is divided by the maximum speed limit to obtain the normalized speed, and the normalized (relative) velocity matrix is thereby obtained;
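The compensation and normalization of step A2 can be illustrated with the sketch below. It rests on two assumptions that the text suggests but does not state in full: a missing reading is stored as 0, and it is filled with the mean of the nonzero speeds of the neighboring roads. All names and numbers are hypothetical.

```python
def compensate_missing(speeds, neighbors):
    """Fill missing speeds (assumed to be stored as 0) from neighboring roads.

    speeds: list of T rows, each a list of R road speeds at one moment.
    neighbors: dict of road index -> list of adjacent road indices.
    Assumption: the fill value is the mean of the nonzero neighbor speeds.
    """
    filled = [row[:] for row in speeds]
    for t, row in enumerate(speeds):
        for j, v in enumerate(row):
            if v == 0:
                known = [row[k] for k in neighbors.get(j, []) if row[k] != 0]
                if known:  # leave the gap if no neighbor has a reading
                    filled[t][j] = sum(known) / len(known)
    return filled

def normalize(speeds, speed_limits):
    """Divide each road's speed by its maximum speed limit to get relative speed."""
    return [[v / speed_limits[j] for j, v in enumerate(row)] for row in speeds]

# Two moments (T = 2), three roads (R = 3); 0.0 marks a missing reading.
raw = [[60.0, 0.0, 40.0],
       [30.0, 20.0, 0.0]]
road_neighbors = {0: [1], 1: [0, 2], 2: [1]}
limits = [80.0, 50.0, 80.0]
relative = normalize(compensate_missing(raw, road_neighbors), limits)
```

In this toy case the missing speed of road 1 at the first moment becomes 50.0 (the mean of 60.0 and 40.0), so its relative speed is 1.0.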

wherein the "constructing the grid traffic congestion network model G₁(N₁,L₁)" described in step A3 is specifically performed as follows: for each grid area divided in step A1, firstly, the structure information among roads and the road intersection information contained in each grid area are extracted from the actual map data of that grid area by using the Python and Mapinfo software tools; secondly, a suitable geographical coverage of the traffic network is selected according to the needs of the actual research, and, following the complex network method, each road intersection in a grid area is abstracted as a node of the network, each road of the grid area's traffic network is abstracted as an edge between nodes, and the relative speed of each road is taken as the weight of the edge, so that a grid traffic congestion network is established in each grid area; at the same time, most roads in the traffic network carry traffic in both directions and are therefore directional, so the constructed grid traffic congestion network is a directed weighted network;
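A directed weighted network of this kind can be held in a plain adjacency dictionary, as in the following sketch. The function name, the road tuple layout and the sample data are assumptions made for illustration.

```python
def build_grid_network(roads, relative_speed, grid_nodes):
    """Build the directed weighted network of one grid area.

    roads: list of (road_id, from_intersection, to_intersection).
    relative_speed: dict of road_id -> relative speed at the current moment.
    grid_nodes: set of intersections located inside the grid area.
    Returns an adjacency dict: from_node -> {to_node: weight}.
    """
    adjacency = {node: {} for node in grid_nodes}
    for road_id, src, dst in roads:
        # Only roads with both endpoints inside the grid belong to this layer.
        if src in grid_nodes and dst in grid_nodes:
            adjacency[src][dst] = relative_speed[road_id]
    return adjacency

# Hypothetical roads; r1 and r2 are the two directions of the same street.
example_roads = [("r1", "A", "B"), ("r2", "B", "A"), ("r3", "B", "C"), ("r4", "C", "D")]
example_speed = {"r1": 0.8, "r2": 0.3, "r3": 0.5, "r4": 0.9}
grid_network = build_grid_network(example_roads, example_speed, {"A", "B", "C"})
```

Road r4 leaves the grid (intersection D is outside it), so it is not part of this grid's network; it would instead be considered at the macroscopic level.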

wherein the "constructing the grid node traffic network model G₂(N₂,L₂)" described in step A4 is specifically performed as follows: firstly, an inter-grid intersection traffic network model is constructed from the intersection information contained in the grid areas and the road topology information of the traffic network of the whole research area (the whole network), that is, the road topology information contained inside the grid areas is deleted from the whole network; secondly, the number of congested roads between grid areas is counted and recorded; finally, on the basis of this information and by applying complex network theory and methods, each grid area is abstracted as a node, the existence of congested roads between two grids is abstracted as an edge, and the number of congested roads between the grids is taken as the edge weight, so that the grid node traffic network model is established.
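The macroscopic layer can be sketched as a count of congested roads per pair of grids. In this illustration the congestion test (relative speed not exceeding a threshold `q`) is borrowed from step B1, the grid-level edges are treated as undirected, and all names and data are hypothetical.

```python
def build_grid_node_network(roads, relative_speed, node_to_grid, q):
    """Build the macroscopic network whose nodes are grid areas.

    An edge joins two grids if at least one congested road (relative speed <= q)
    runs between them; the edge weight is the number of such roads.
    Assumption: edges between grids are treated as undirected.
    """
    weights = {}
    for road_id, src, dst in roads:
        g_src, g_dst = node_to_grid[src], node_to_grid[dst]
        if g_src == g_dst:
            continue  # roads inside one grid belong to the microscopic layer
        if relative_speed[road_id] <= q:
            key = tuple(sorted((g_src, g_dst)))
            weights[key] = weights.get(key, 0) + 1
    return weights

inter_roads = [("r1", "A", "B"), ("r2", "B", "C"), ("r3", "C", "B"), ("r4", "C", "D")]
inter_speed = {"r1": 0.2, "r2": 0.3, "r3": 0.4, "r4": 0.9}
grid_of = {"A": "g1", "B": "g1", "C": "g2", "D": "g3"}
grid_node_network = build_grid_node_network(inter_roads, inter_speed, grid_of, q=0.5)
```

Here grids g1 and g2 are joined by an edge of weight 2, while g2 and g3 stay unconnected because the only road between them is not congested.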

3. The traffic risk prediction method based on complex network theory as claimed in claim 1, characterized in that: the "extracting and screening features based on complex network theory" described in step B is performed as follows: firstly, for the grid traffic congestion networks and the grid node traffic network at each time t_i (together referred to as the double-layer traffic network), a percolation (seepage) threshold q(t) is introduced, and q(t) is determined through percolation analysis of the double-layer traffic network; secondly, taking each grid traffic congestion network and each node (that is, each grid) of the grid node traffic network at each moment under the threshold q(t) as the research objects, the features of each grid area are extracted by applying complex network theory and methods, the features comprising structural and functional features such as the maximum congested sub-cluster, the mean node betweenness, the mean node degree, the average speed of the grid congestion network and the number of first-order neighbor congested roads; on this basis, a machine learning method is applied to screen the extracted features, selecting those that contribute most to traffic risk identification and prediction, so as to construct a high-quality sample feature set and improve the effect and efficiency of traffic risk identification and prediction to the greatest extent; meanwhile, each grid area at time t is labeled according to the proportion of congested roads in that grid area at time t + Δt; the specific steps of the process are as follows:

step B1: percolation (seepage) analysis of the traffic network;

step B2: extracting risk features based on a complex network;

step B3: screening risk characteristics based on machine learning;

wherein the "percolation (seepage) analysis of the traffic network" described in step B1 is specifically performed as follows: percolation theory is applied to analyze the double-layer traffic network; firstly, a control variable, namely the percolation threshold, is given for the traffic network at each moment and denoted q(t), so that each road in the traffic network is in one of two states: the unblocked state, v_{i_ratio}(t) > q(t), or the congested state, v_{i_ratio}(t) ≤ q(t); the unblocked edges are deleted from the original network and the congested edges are kept, and the remaining network is the traffic network in the congested state at time t, referred to as the congested network for short; at each moment, each value of q(t) corresponds to one congested network, and as q(t) decreases more edges fail and the network becomes sparser; therefore a suitable percolation threshold q(t) is selected, namely the one at which the urban traffic network is in the stage richest in congestion information, and the traffic congestion risk at the current moment is identified and predicted at that threshold;
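The thresholding described in step B1 can be sketched as below. The selection rule is an assumption: the text only asks for the threshold at which congestion information is richest, and this sketch takes that to be the candidate q at which the second-largest congested cluster is largest, a common marker of a percolation transition. Edge direction is ignored when forming clusters, and all names and data are hypothetical.

```python
def cluster_sizes(nodes, edges):
    """Sizes of connected clusters (edge direction ignored), largest first."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    counts = {}
    for n in nodes:
        root = find(n)
        counts[root] = counts.get(root, 0) + 1
    return sorted(counts.values(), reverse=True)

def congested_edges(roads, q):
    """Keep only roads whose relative speed is <= q (the congested state)."""
    return [(u, v) for u, v, speed in roads if speed <= q]

def pick_threshold(nodes, roads, candidates):
    """Assumed rule: pick the q whose congested network has the largest second cluster."""
    def second_largest(q):
        sizes = cluster_sizes(nodes, congested_edges(roads, q))
        return sizes[1] if len(sizes) > 1 else 0
    return max(candidates, key=second_largest)

perc_nodes = ["A", "B", "C", "D", "E", "F"]
perc_roads = [("A", "B", 0.2), ("B", "C", 0.25), ("D", "E", 0.3),
              ("C", "D", 0.6), ("E", "F", 0.9)]
q_star = pick_threshold(perc_nodes, perc_roads, [0.1, 0.3, 0.6, 0.9])
```

With these toy roads the clusters at q = 0.3 have sizes 3, 2 and 1, and raising q to 0.6 merges the two largest into one, so the sketch selects q = 0.3.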

wherein the "extracting risk features based on a complex network" described in step B2 is specifically performed as follows: the grid traffic congestion networks and the grid node traffic network at each moment are constructed under the percolation threshold q(t), and the microscopic and macroscopic features of the grid areas of the double-layer traffic network at each moment are preliminarily extracted from the viewpoints of structure and function, using complex network theory and methods from the perspective of statistical physics; firstly, on a microscopic level, each grid traffic congestion network is taken as the research object, and the microscopic features of each grid area are calculated at the critical percolation threshold of each moment, the microscopic features being: the maximum congested sub-cluster of the grid traffic congestion network, the mean node betweenness, the mean node degree, the mean clustering coefficient, the average speed of the congested network and its growth rate; the grid traffic congestion network has different characteristics at different moments, and as time evolves the congested network in a grid area shows dynamic characteristics in space, so it has spatio-temporal characteristics; secondly, on a macroscopic level, for the constructed grid node traffic network model, each node, namely each grid area, is taken as the research object and its macroscopic features at each moment are calculated, the macroscopic features being: the average path length, node strength, node betweenness and node degree of the grid node traffic network, and their growth rates;

the above provides a method for extracting features from the perspective of complex networks, taking the feature extraction of a grid as an example; according to the actual background and situation of the actual traffic system, features can be preliminarily extracted in a targeted manner from both its structure and its function, so that a sample feature set is constructed and an initial feature matrix M_f is built;
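As a rough illustration of the microscopic features, the sketch below computes three of them for one grid at one moment: the size of the largest congested cluster, the mean node degree and the average speed of the congested network. Betweenness and clustering coefficient are omitted for brevity; the function name, the undirected treatment of edges and the sample data are assumptions.

```python
def grid_features(nodes, roads, q):
    """Return [largest congested cluster size, mean node degree, mean speed]
    for one grid area at one moment.

    nodes: list of intersections in the grid.
    roads: list of (from_node, to_node, relative_speed).
    """
    congested = [(u, v, s) for u, v, s in roads if s <= q]

    # Mean degree of the congested network (each edge touches two nodes).
    mean_degree = 2 * len(congested) / len(nodes) if nodes else 0.0

    # Largest congested cluster via depth-first search, ignoring direction.
    adjacency = {n: set() for n in nodes}
    for u, v, _ in congested:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        largest = max(largest, size)

    mean_speed = sum(s for _, _, s in congested) / len(congested) if congested else 0.0
    return [largest, mean_degree, mean_speed]

feat_nodes = ["A", "B", "C", "D"]
feat_roads = [("A", "B", 0.25), ("B", "C", 0.5), ("C", "D", 0.875)]
# One row of an initial feature matrix; real use would stack one row per grid and moment.
feature_row = grid_features(feat_nodes, feat_roads, q=0.5)
```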

wherein the "screening risk features based on machine learning" described in step B3 is specifically performed as follows: in step B2, the functional and structural features of the grid areas at each moment were extracted on the basis of complex network knowledge and an initial feature matrix M_f was constructed; in order to improve the accuracy and precision of risk identification and prediction in the traffic system, in this step machine learning methods are used to perform feature selection on the preliminarily constructed sample feature set, so that a high-quality sample feature set is screened out and the effect of risk identification and prediction in the traffic system is improved to the greatest extent; at the same time, screening the structural and functional features of the traffic system to retain the important features and remove the irrelevant ones alleviates the curse of dimensionality, reduces the difficulty of the learning task, reduces over-fitting and enhances the generalization ability of the machine learning model; in view of the highly complex spatio-temporal evolution of the traffic system, and with the aim of optimizing a given learner, features are selected in a wrapper manner using the classic LVW (Las Vegas Wrapper) method, with the following specific steps:

in the calculation process, the LVW method directly takes the performance of the learner that will ultimately be used as the evaluation criterion for feature subsets, selects the feature subset that is most beneficial to the performance of, and tailored to, the given learner, screens out a high-quality sample feature set, and constructs the screened feature matrix.
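Since the detailed LVW steps are not reproduced in this text, the following sketch shows the general Las Vegas Wrapper procedure as it is commonly described, not necessarily the exact variant of the invention: random feature subsets are drawn, each is scored by the learner's error, and a subset replaces the current best if it has lower error, or equal error with fewer features. The `evaluate` callback, the stopping parameter `stall_limit` and the toy error function are hypothetical.

```python
import random

def lvw(n_features, evaluate, stall_limit=50, seed=0):
    """Las Vegas Wrapper feature selection (generic sketch).

    evaluate(subset) is a hypothetical callback returning the learner's
    validation error when trained on the given tuple of feature indices.
    Stops after `stall_limit` consecutive random subsets bring no improvement.
    """
    rng = random.Random(seed)
    best = tuple(range(n_features))
    best_error = evaluate(best)
    stalls = 0
    while stalls < stall_limit:
        candidate = tuple(i for i in range(n_features) if rng.random() < 0.5)
        if not candidate:
            continue  # an empty feature subset cannot train a learner
        error = evaluate(candidate)
        better = error < best_error
        smaller = error == best_error and len(candidate) < len(best)
        if better or smaller:
            best, best_error, stalls = candidate, error, 0
        else:
            stalls += 1
    return best, best_error

# Toy error function: only feature 0 is informative.
def toy_error(subset):
    return 0.0 if 0 in subset else 0.5

selected, selected_error = lvw(4, toy_error, stall_limit=50, seed=0)
```

In practice `evaluate` would train the chosen learner (here, the random forest) on the candidate columns of M_f and return its cross-validated error, which is what makes the method a wrapper around that learner.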

4. The traffic risk prediction method based on complex network theory as claimed in claim 1, characterized in that: the "risk identification and prediction based on ensemble learning theory" described in step C is performed as follows: in order to accurately identify and predict the congestion risk in the traffic system and effectively control it, firstly an ensemble learning model is constructed by using machine learning and related mathematical knowledge; secondly, in order to eliminate the influence on the model of inconsistent dimensions (units) among the feature vectors, a feature scaling method is applied to the data feature set, which is then divided into a training set and a test set according to a preset ratio (a:b); the ensemble learning model is trained with the training set data, and the trained ensemble learning model is then used to identify and predict the risk of the grid areas of the traffic system at the current moment; the specific steps of the process are as follows:

step C1: constructing an ensemble learning model;

step C2: performing risk identification and prediction using the ensemble learning model;

wherein the "constructing an ensemble learning model" described in step C1 is specifically performed as follows: a random forest model is constructed by using the Bagging framework and related ensemble learning methods to identify and predict risks of the traffic system, with the following implementation steps:

(2) constructing the base learners, namely decision trees: for each sample d_j = {x_i1, x_i2, …, x_ik, y_i} (i ∈ [1, m]), where K < M, a decision tree is generated and the result h_j(x) of each decision tree is recorded;

(3) training T times and combining the results, where φ(x) is the combining model, for which the options are: absolute majority voting, relative majority voting, and weighted voting;

at the same time, when an ensemble learning model is constructed by applying ensemble learning theory to identify and predict risks of the traffic system, a suitable ensemble learning framework and model can be selected according to the distribution characteristics of the data samples, so as to further improve the effect of identifying and predicting the risks of the traffic system;
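The Bagging idea behind step C1 can be sketched in a few lines. This is a deliberately simplified stand-in for a random forest: each base learner is a one-level decision tree (a stump) rather than a full decision tree, results are combined by relative majority (plurality) voting, and all function names, parameters and data are hypothetical.

```python
import random
from collections import Counter

def plurality_vote(labels):
    """Relative majority voting: return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(samples, features):
    """Fit a one-level decision tree on the given feature indices.

    samples: list of (feature_vector, label). Returns (feature, threshold, left, right).
    """
    default = plurality_vote([y for _, y in samples])
    best = (None, 0.0, default, default)
    best_errors = sum(1 for _, y in samples if y != default)
    for f in features:
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= threshold]
            right = [y for x, y in samples if x[f] > threshold]
            if not left or not right:
                continue
            l_label, r_label = plurality_vote(left), plurality_vote(right)
            errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
            if errors < best_errors:
                best, best_errors = (f, threshold, l_label, r_label), errors
    return best

def predict_stump(stump, x):
    feature, threshold, left, right = stump
    if feature is None:
        return left
    return left if x[feature] <= threshold else right

def train_forest(samples, n_trees, k, seed=0):
    """Bagging: each tree sees a bootstrap sample and k randomly chosen features."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]
        features = rng.sample(range(n_features), k)
        forest.append(train_stump(boot, features))
    return forest

def predict_forest(forest, x):
    return plurality_vote([predict_stump(s, x) for s in forest])

# Toy data: label 1 (risk) when all features are 1, label 0 otherwise.
toy_samples = [([0, 0, 0], 0)] * 10 + [([1, 1, 1], 1)] * 10
toy_forest = train_forest(toy_samples, n_trees=15, k=2, seed=0)
```

Choosing k smaller than the number of features mirrors the condition K < M in step (2): each tree sees only part of the feature set, which decorrelates the trees before their votes are combined.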

wherein the "risk identification and prediction using the ensemble learning model" described in step C2 is specifically performed as follows: in this step, feature scaling is first applied to the high-quality sample feature set extracted and screened in step B, namely the screened feature matrix; the mainstream feature scaling methods in machine learning include min-max normalization, mean normalization, standardization and scaling to unit length, and one of these mainstream methods is applied to the sample feature set of the traffic system;

after feature scaling of the sample data set of the traffic system, in this step the ensemble learning model constructed in step C1 is used, on the basis of the standard sample feature matrix of the traffic system, to identify and predict risks in the traffic system; in this process the ensemble learning model needs to learn the characteristics of risk, so the standard sample feature set is randomly divided according to the preset ratio a:b into a training set and a test set, wherein the training set is used to train the random forest model so that it learns the characteristics of risk to the greatest extent, and the test set is used to test the training effect of the model.
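Feature scaling and the a:b split of step C2 can be illustrated as follows. The sketch uses min-max normalization, one of the scaling methods listed above; which method the invention actually applies is not recoverable from this text, so that choice, like the function names and sample values, is an assumption.

```python
import random

def min_max_scale(matrix):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    # A constant column gets a span of 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in matrix]

def split(rows, labels, a, b, seed=0):
    """Randomly split samples into training and test sets in the ratio a:b."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = len(rows) * a // (a + b)
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

raw_features = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.6], [40.0, 0.8], [50.0, 1.0]]
raw_labels = [0, 0, 1, 1, 1]
scaled = min_max_scale(raw_features)
x_train, y_train, x_test, y_test = split(scaled, raw_labels, a=4, b=1)
```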

5. The traffic risk prediction method based on complex network theory as claimed in claim 1, characterized in that: the "model evaluation and verification" described in step D is performed as follows: in the process of identifying and predicting risks in the traffic system with the ensemble learning model constructed in step C, in order to evaluate the performance of the model accurately and scientifically, firstly, evaluation indexes are reasonably selected in this step on the basis of the actual traffic system conditions and the final goal, for example accuracy, precision, recall and the F1 value, all of which are essentially computed from the confusion matrix; secondly, in order to prevent the model from over-fitting and to accurately evaluate its generalization ability, the ensemble learning model is evaluated in this step by using a cross-validation method, which further improves the scientific rigor and reliability of the model evaluation; the method specifically comprises the following sub-steps:

step D1: selecting a model evaluation index;

step D2: evaluating and analyzing the model;

wherein the "selecting a model evaluation index" described in step D1 is specifically performed as follows: the invention aims to identify and predict risks in a traffic system, the final goal being to identify those risks accurately and scientifically with an ensemble learning model; in essence this is an anomaly detection problem in machine learning, and the data classes are imbalanced, that is, the sample size of normal data is large while that of risk data is small, so accuracy used alone cannot objectively reflect the quality of the model's performance; given that the scenario is one of risk identification and detection, the model is evaluated with two evaluation indexes, recall and accuracy, whose formulas are as follows:
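For reference, the standard confusion-matrix definitions of the four indexes named in this claim (accuracy, precision, recall and F1) can be computed as in the sketch below. These are the textbook definitions, offered as an illustration; the function names and the sample labels are hypothetical, and label 1 is assumed to mark the risk class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced toy labels: 3 risk samples among 8.
true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
pred_labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = metrics(true_labels, pred_labels)
```

In this toy case accuracy is 0.75 while recall is only 2/3, which shows why accuracy alone can look favorable on imbalanced data even when a third of the risk samples are missed.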

wherein the "evaluation and analysis of the model" described in step D2 is specifically performed as follows: in this step, in order to prevent the model from over-fitting and to accurately evaluate its generalization ability, the ensemble learning model is evaluated by using a cross-validation method from machine learning, which further improves the scientific rigor and reliability of the model evaluation; the classical cross-validation methods are the leave-one-out method, K-fold cross-validation and the bootstrap method; the bootstrap method is used for cross-validation, and the steps are as follows:

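(3) after N draws, approximately 36.8% of the samples in the original data set (the limit of (1 − 1/N)^N for large N, i.e. 1/e) do not appear in the new data set, and these samples are taken as the validation set;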

(4) repeating the above steps M times, so that M models are trained and the values of the evaluation indexes are obtained; these values are then averaged to obtain the performance evaluation value of the model.