CN111292534A

CN111292534A - Traffic state estimation method based on clustering and deep sequence learning

Info

Publication number: CN111292534A
Application number: CN202010090595.XA
Authority: CN
Inventors: 陈阳舟; 马鹏飞; 师泽宇
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2020-06-16

Abstract

The invention provides a traffic state estimation method based on k-means clustering and deep sequence learning. It belongs to the field of intelligent transportation systems and addresses the problem of estimating the traffic state of an entire urban expressway when traffic flow data for some of its road sections cannot be acquired in real time. The method comprises the following steps: (1) dividing the expressway network into sections; (2) modeling the expressway and acquiring data; (3) preprocessing and normalizing the data; (4) computing the Euclidean distances among the traffic flow data with the k-means clustering algorithm and determining the traffic state level of each data point; and (5) designing a deep sequence learning Seq2Seq model and recognizing the traffic state of the whole road network through iterative learning of the model. The invention takes full account of the relationship between traffic flows on different road sections, exploits the strengths of machine learning in the traffic field, obtains the traffic state of the whole road network in a timely manner, and can provide reliable traffic information to road users.

Description

Traffic state estimation method based on clustering and deep sequence learning

Technical Field

The invention relates to the field of intelligent traffic systems, in particular to a traffic state estimation method based on kmeans clustering and deep sequence learning.

Background

With China's continued socio-economic development and urban population growth, more and more households own one or more private cars. The rapidly growing number of vehicles increases traffic pressure in cities, seriously impairs the operating efficiency of urban traffic networks, and lengthens residents' travel times. In addition, low-speed driving in congestion wastes more energy, and frequent engine stops and restarts increase exhaust emissions and pollute the living environment. How to accurately estimate the traffic flow of an urban road network and relieve traffic pressure while still meeting travel demand has therefore become an important direction in traffic management and a research focus in academia.

Traffic state estimation refers to the process of inferring the traffic state of the whole road network from traffic flow data observed in the network. Current methods fall into two main categories, model-driven and data-driven. Model-driven algorithms generally describe the propagation relationship between road sections with a traffic flow model and derive the evolution of the traffic state from mathematical formulas. Data-driven algorithms typically use machine learning to analyze historical traffic flow data and mine the relationships within the data in order to estimate or predict the traffic state of road sections.

However, for technical and financial reasons, the road detectors in current urban road networks cannot achieve seamless coverage and can detect traffic flow data on only some road sections. As a result, most existing techniques are built around a single road section, cannot effectively estimate sections without detector data, and cannot meet travelers' need for network-wide traffic information. No effective solution to these problems has yet been proposed.

Disclosure of Invention

The invention aims to provide a traffic state estimation method based on k-means clustering and deep sequence learning that estimates the traffic states of all road sections when traffic flow data for some road sections of an urban expressway cannot be acquired in real time.

The technical scheme of the invention is implemented according to the following steps:

S1, expressway division: dividing an urban expressway into a plurality of balanced road sections according to Cell Transmission Model (CTM) theory, so that the traffic density within each divided section is uniformly distributed and the cross-section flow, traffic speed, and the like are approximately the same;

S2, data acquisition: modeling the selected expressway with simulation software, setting virtual detectors, and acquiring historical traffic flow parameter data for each road section, the features of the data comprising the cross-section traffic flow, the speed, and the time occupancy of the road section;

S3, data preprocessing: removing duplicate and abnormal records from the collected data, and normalizing the different traffic flow features so that their values fall in the interval [0, 1];

S4, traffic state division: dividing the traffic state into two levels, free flow and congested flow, according to the characteristics of the fundamental diagram of road traffic flow; performing cluster analysis on the historical traffic flow data of each road section separately with the k-means clustering algorithm; and determining the category of each data point from the Euclidean distances between data points in three-dimensional space, thereby labeling the data set of each road section;

S5, traffic state estimation: building a training data set from the labeled data in a given proportion, and designing a deep sequence learning Seq2Seq model whose input is the traffic flow data sequence of some road sections of the expressway and whose output is the traffic state sequence of all road sections of the expressway, the traffic state of the whole expressway being estimated through iterative learning to obtain the estimation result.

Further, in step S1, the section division rule of the expressway is:

S1.1, dividing the road network into a plurality of road sections according to the number and positions of ramps in the expressway network, the positions where the number of lanes changes, and the positions where the radius of curvature of the road changes, each such road section being called a link; the ramps comprise entrance ramps and exit ramps, and a change in the number of lanes is either an increase or a decrease;

S1.2, further dividing each link, according to its length, into a plurality of smaller sections of equal length, each small section being called a cell, and ensuring that each cell is balanced.
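Step S1.2 can be illustrated with a short sketch. The patent does not state how the number of cells per link is chosen, so the upper bound on cell length (`max_cell_length_m`) and the link lengths below are assumptions introduced purely for illustration.

```python
import math
from typing import List


def split_link_into_cells(link_length_m: float, max_cell_length_m: float) -> List[float]:
    """Split one link into equal-length cells no longer than max_cell_length_m.

    max_cell_length_m is a hypothetical tuning parameter, not a value from the patent.
    """
    if link_length_m <= 0 or max_cell_length_m <= 0:
        raise ValueError("lengths must be positive")
    n_cells = math.ceil(link_length_m / max_cell_length_m)
    return [link_length_m / n_cells] * n_cells


# Hypothetical link lengths in metres; the patent does not list them.
links = [900.0, 400.0, 1250.0]
cells = [split_link_into_cells(length, max_cell_length_m=500.0) for length in links]
```

Each inner list holds the cell lengths of one link, so cells within a link are always equal in length while different links may use different cell lengths.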

Further, in step S2, the expressway model is divided according to the section division rule of S1, and the traffic demand of each road section and the split ratio between the entrance ramp and the main road are set dynamically according to real data, so that the simulated traffic flow evolves in line with changes in the real traffic flow. The historical traffic flow parameter data collected for each road section may cover a plurality of consecutive working days and is generated by changing the random seed of the simulation software.

Further, in step S3, the feature variables of the traffic flow data differ greatly in scale: for example, traffic flow can reach hundreds or thousands, while traffic speed is only in the tens. Applying the data directly to subsequent model training may therefore give inaccurate results. The collected data is normalized so that the different feature variables fall within a specific interval, using the normalization formula:

x' = \frac{x - x_{min}}{x_{max} - x_{min}}    (1)

where x' is the value after normalization, x is the value before normalization, x_min is the minimum value in the data set, and x_max is the maximum value in the data set.
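A minimal sketch of the min-max normalization of step S3, applied to one feature column at a time. The sample values are illustrative only, and the handling of a constant column is an assumption that the patent does not address.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one feature column to [0, 1] with x' = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column maps to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Illustrative values only, not taken from the patent's data set.
flow = [600.0, 1200.0, 1800.0]   # cross-section flow
speed = [70.0, 62.0, 35.0]       # traffic speed
flow_norm = min_max_normalize(flow)
speed_norm = min_max_normalize(speed)
```

Normalizing each feature separately keeps a large-valued feature such as flow from dominating the Euclidean distances used later in the clustering step.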

Further, in step S4, the traffic state is classified according to the fundamental diagram formed by the traffic flow data within each cell, into two types: free flow and congested flow. In the free-flow state, traffic operation is relatively stable and moving vehicles are hardly affected by external factors; this state is denoted by the number 0. In the congested-flow state, traffic operation is extremely unstable and vehicles interfere with each other severely; this state is denoted by the number 1.

Further, in step S4, the k-means clustering algorithm is used to cluster the historical traffic flow parameter data, label each group of historical traffic flow parameter data with a state (0 or 1), compute the mean of the data in each traffic state category, and compare the differences. The clustering algorithm comprises the following steps:

S4.1, determining the number k of cluster categories, and randomly selecting k data points from the data set as the initial cluster centers;

S4.2, computing the Euclidean distance between each data point and each cluster center, and assigning each data point to the category of its nearest cluster center, the distance being computed as:

d(x_i, \mu_j) = \|x_i - \mu_j\|_2,  j = 1, 2, ..., k    (2)

where x_i is the ith data point in the data set, μ_j is the center point of the jth cluster category, and k is the number of cluster categories.

S4.3, computing the arithmetic mean of the data points in each category according to the clustering result and replacing the previous cluster center with that value, the update formula being:

\mu_j = \frac{1}{c_j} \sum_{x_i \in C_j} x_i    (3)

where c_j is the number of data points in the jth cluster category.

S4.4, comparing the difference between the current clustering center point and the center point before updating, if the current clustering center point and the center point before updating are the same, stopping iteration, and ending the algorithm; if not, the process returns to step S4.2 and continues the iteration.

Further, in step S4, the objective function of the algorithm is expressed as:

E = \sum_{j=1}^{k} \sum_{x_i \in C_j} \|x_i - \mu_j\|^2    (4)

where E is the squared error of the algorithm and C_j denotes the jth cluster.
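Steps S4.1 to S4.4 and the squared-error objective can be sketched as follows. This is a plain illustration rather than the patent's implementation: the sample points are made up, the fixed random seed and the iteration cap are added for reproducibility, and the rule that the cluster with the higher mean occupancy is labeled as congested (state 1) is an assumption introduced here.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points: List[Point], k: int = 2, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    rng = random.Random(seed)
    # S4.1: pick k data points as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # S4.2: assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # S4.3: replace each center with the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # S4.4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Objective: total squared error of points to their assigned centers.
    error = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, error


# Normalized (flow, speed, occupancy) samples; illustrative values only.
data = [(0.30, 0.90, 0.10), (0.50, 0.85, 0.20), (0.40, 0.95, 0.15),
        (0.60, 0.20, 0.80), (0.50, 0.10, 0.90), (0.70, 0.25, 0.75)]
labels, centers, error = kmeans(data, k=2)
# Assumption: the cluster with the higher mean occupancy is the congested one (state 1).
congested = max(range(2), key=lambda j: centers[j][2])
states = [1 if lab == congested else 0 for lab in labels]
```

Because k-means numbers its clusters arbitrarily, some rule of this kind is needed to map cluster indices onto the 0 and 1 state codes.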

Further, in step S5, the deep sequence Seq2Seq model has a "many-to-many" structure, and its input and output data must be prepared before the model is trained. The traffic flow data and state data of the expressway network are arranged in spatial order to obtain a (continuous) road network traffic flow data sequence and a (discrete) binary traffic state sequence whose entries correspond to each other road section by road section, and the Seq2Seq model is trained with the traffic flow data sequence as input and the state sequence as output. In addition, since the data of some road sections of an actual road network cannot be acquired in real time, the data of those road sections must be removed from the input sequence when the model is trained, so that in later use the trained model can estimate the state sequence of the whole expressway from the data of only some road sections.

Further, in step S5, the Seq2Seq model is composed of two LSTM neural network layers, so as to address the vanishing-gradient and exploding-gradient problems faced by RNN neural networks. The first LSTM layer serves as the encoder and is responsible for analyzing the input traffic flow data sequence; the encoder formula is:

h_t = f(h_{t-1}, x_t)    (5)

where f denotes the activation function of the encoder, x_t denotes the traffic flow data sequence at time t, and h_t denotes the hidden state of the encoder at time t.
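The encoder recurrence can be sketched with a hand-written LSTM cell. This is an untrained illustration under stated assumptions: the hidden size, the random weight initialization, and the standard gate layout (input, forget, output, and candidate gates) are choices made here, not details given in the patent. The name `cell_prev` is the LSTM's internal cell state and is unrelated to the encoder output c used in the decoder formulas.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights: List[Vector], v: Vector, bias: Vector) -> Vector:
    """Return weights @ v + bias for small list-based matrices."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> Dict[str, list]:
    """Random, untrained parameters for the four LSTM gates (illustrative only)."""
    rng = random.Random(seed)
    params: Dict[str, list] = {}
    for gate in "ifog":
        params["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                              for _ in range(n_hidden)]
        params["b" + gate] = [0.0] * n_hidden
    return params


def lstm_step(x_t: Sequence[float], h_prev: Vector, cell_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One application of h_t = f(h_{t-1}, x_t) with an LSTM cell as f."""
    z = list(x_t) + list(h_prev)  # concatenated input [x_t, h_{t-1}]
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]
    g = [math.tanh(a) for a in affine(params["Wg"], z, params["bg"])]
    cell_t = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, cell_prev, i, g)]
    h_t = [oj * math.tanh(cj) for oj, cj in zip(o, cell_t)]
    return h_t, cell_t


def encode(sequence: List[Sequence[float]], n_hidden: int = 4, seed: int = 0) -> List[Vector]:
    """Run the LSTM over the input sequence and return every hidden state h_t."""
    params = init_params(len(sequence[0]), n_hidden, seed)
    h, cell = [0.0] * n_hidden, [0.0] * n_hidden
    hidden_states = []
    for x_t in sequence:
        h, cell = lstm_step(x_t, h, cell, params)
        hidden_states.append(h)
    return hidden_states


# Eleven input positions with three normalized features each; made-up values.
inputs = [(0.1 * (t % 5), 0.6, 0.2) for t in range(11)]
hidden_states = encode(inputs)
```

In practice a deep learning framework would supply and train the LSTM; the sketch only makes the data flow of formula (5) concrete.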

The second LSTM layer serves as the decoder and is responsible for analyzing the output of the encoder and computing, according to a given rule, the probability distribution sequence of the traffic state of the whole expressway. The decoder and the model output are expressed as:

s_t = f(s_{t-1}, y_{t-1}, c)    (6)

p(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(s_t, y_{t-1}, c)    (7)

where f denotes the activation function of the decoder, c denotes the output of the encoder, s_t denotes the hidden state of the decoder at time t, y_t denotes the output of the decoder at time t, and g denotes the activation function of the output layer.
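The decoder recurrence and output layer can be sketched as below. To stay short, the sketch swaps in a plain tanh recurrent update for the LSTM f and a sigmoid for g, and it uses hand-picked weights; all of these are simplifying assumptions and none of the numbers come from the patent.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def decoder_step(s_prev: Vector, y_prev: float, c: Vector,
                 W_s: Matrix, w_y: Vector, W_c: Matrix) -> Vector:
    """s_t = f(s_{t-1}, y_{t-1}, c), with a tanh recurrence standing in for the LSTM."""
    from_state = matvec(W_s, s_prev)
    from_context = matvec(W_c, c)
    return [math.tanh(a + wy * y_prev + b)
            for a, wy, b in zip(from_state, w_y, from_context)]


def output_probability(s_t: Vector, y_prev: float, c: Vector,
                       v_s: Vector, v_y: float, v_c: Vector) -> float:
    """p(y_t = 1 | ...) = g(s_t, y_{t-1}, c), with a sigmoid output layer as g."""
    z = (sum(v * s for v, s in zip(v_s, s_t)) + v_y * y_prev
         + sum(v * ci for v, ci in zip(v_c, c)))
    return 1.0 / (1.0 + math.exp(-z))


def decode(c: Vector, n_steps: int, W_s: Matrix, w_y: Vector, W_c: Matrix,
           v_s: Vector, v_y: float, v_c: Vector) -> List[int]:
    """Emit one binary state per step, feeding each output back as y_{t-1}."""
    s = [0.0] * len(W_s)
    y_prev = 0.0
    states = []
    for _ in range(n_steps):
        s = decoder_step(s, y_prev, c, W_s, w_y, W_c)
        p = output_probability(s, y_prev, c, v_s, v_y, v_c)
        y_prev = 1.0 if p >= 0.5 else 0.0  # threshold of 0.5 is an assumption
        states.append(int(y_prev))
    return states


# Hand-picked, untrained weights and encoder output; purely illustrative.
W_s = [[0.5, 0.0], [0.0, 0.5]]
w_y = [0.1, -0.1]
W_c = [[1.0, 0.0], [0.0, 1.0]]
v_s, v_y, v_c = [1.0, -1.0], 0.2, [0.3, 0.3]
context = [0.4, -0.2]
states = decode(context, 18, W_s, w_y, W_c, v_s, v_y, v_c)
```

The loop shows the "many-to-many" behavior: a fixed encoder output drives as many output steps as there are road sections to estimate.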

Furthermore, so that the model retains traffic flow information from more road sections during encoding and the subsequent estimation accuracy improves, the invention introduces an attention mechanism to optimize the existing model. A dynamic weight is attached to the hidden state of the encoder at every time step, so that the encoder output c is updated dynamically with the hidden states and the model extracts the more important information from the input sequence. The update formulas are:

c_i = \sum_{j} a_{ij} h_j    (8)

a_{ij} = \frac{\exp(e_{ij})}{\sum_{m} \exp(e_{im})}    (9)

e_{ij} = a(s_{i-1}, h_j)    (10)

where c_i is the dynamically updated encoder output, a_ij is the weight between the jth hidden state of the encoder and the ith hidden state of the decoder, and e_ij is the alignment model, which measures the correlation between the jth hidden state of the encoder and the ith hidden state of the decoder; the sums run over all hidden states of the encoder.
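A small sketch of how the attention quantities c_i, a_ij, and e_ij can be computed for one decoder step. The patent does not specify the alignment model a, so a dot product is assumed here, and the scores are assumed to be normalized with a softmax; the vectors are made-up values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def alignment_score(s_prev: Vector, h_j: Vector) -> float:
    """e_ij = a(s_{i-1}, h_j); a dot product is an assumption made for this sketch."""
    return sum(si * hi for si, hi in zip(s_prev, h_j))


def attention_context(s_prev: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Return the attention weights a_ij and the context vector c_i for one decoder step."""
    scores = [alignment_score(s_prev, h) for h in encoder_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context


# Made-up decoder state and encoder hidden states.
s_prev = [1.0, 0.0]
encoder_states = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
weights, context = attention_context(s_prev, encoder_states)
```

The encoder state that aligns best with the current decoder state receives the largest weight, so the context vector leans toward the input positions most relevant to the road section being estimated.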

Compared with the prior art, the invention has the following beneficial effects:

A traffic state estimation method based on k-means clustering and deep sequence learning is provided for the case in which traffic flow data for some road sections of an urban expressway cannot be acquired in real time. First, the historical traffic flow data of the road is labeled with states by the k-means clustering algorithm. A deep sequence learning Seq2Seq model is then designed and trained on the labeled traffic flow data, and the traffic state sequence of the whole expressway is obtained through iterative learning. This addresses the problem in existing traffic state estimation that road detectors cannot achieve seamless coverage and can detect traffic flow data on only some road sections. The method takes full account of the relationship between traffic flows on different road sections, exploits the strengths of machine learning in the traffic field, obtains the traffic state of the whole road network in a timely manner, and provides reliable and accurate information to road users.

Drawings

FIG. 1 is a flow chart of a traffic state estimation method based on kmeans clustering and deep sequence learning according to the present invention;

FIG. 2 is a schematic diagram of the expressway division;

FIG. 3 is a schematic diagram of simulation modeling using the Jingtong Expressway as an example;

FIG. 4 is a traffic flow data cluster diagram of a kmeans clustering algorithm;

FIG. 5 is a block diagram of a deep sequence learning Seq2Seq model;

Detailed Description

In order to clearly illustrate the present invention, the present invention will be further described with reference to the following examples and the accompanying drawings. It is to be understood that the following detailed description is intended to be illustrative, but not restrictive, and is not intended to limit the scope of the invention.

As shown in FIG. 1, the invention discloses a traffic state estimation method based on k-means clustering and deep sequence learning, which comprises the following steps:

S1, expressway division: in this embodiment, a west-to-east section of the Beijing Jingtong Expressway, running to a distant bridge, is selected for analysis. The section is about 7 km long and has 7 exit ramps and 6 entrance ramps, with lane changes and curves along it. The section is divided into a plurality of balanced road sections according to CTM theory, with the division result shown in FIG. 2, so that the traffic density within each divided section is uniformly distributed and the cross-section flow, traffic speed, and the like are approximately the same. The specific division rule is as follows:

S1.1, dividing the road network into a plurality of road sections according to the number and positions of ramps (including entrance ramps and exit ramps), the positions where the number of lanes changes (increases or decreases), and the positions where the radius of curvature of the road changes in the expressway network, each such road section being called a link;

S1.2, further dividing each link, according to its length, into a plurality of smaller sections of equal length, each small section being called a cell.

Under these rules, the Jingtong Expressway is divided into 18 cells, so that each cell is balanced.

S2, data acquisition: as shown in FIG. 3, the selected Jingtong Expressway is modeled with simulation software, and the traffic demand and ramp split ratios are set dynamically according to changes in the traffic flow of the actual road network. The required traffic flow data is obtained by placing a virtual detector in each cell, with each detector reporting every 30 s. Traffic flow data for the morning peak from 6:00 to 10:00 on working days (Monday to Friday) is collected by changing the random seed, giving 43200 groups of data in total. The features of the data comprise the cross-section traffic flow, the speed, and the time occupancy of the road section. The data of the first 4 days is used as the training set, and the data of the last day is used as the test set.
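The stated total of 43200 groups is consistent with one record per cell per 30 s interval over the 4-hour window, 18 cells, and 5 working days. The short check below assumes exactly that reading, and assumes the 4-day/1-day split is made at whole-day boundaries.

```python
INTERVAL_S = 30   # detector reporting interval in seconds
HOURS = 4         # morning peak, 6:00 to 10:00
N_CELLS = 18      # cells on the modeled expressway
N_DAYS = 5        # Monday to Friday

intervals_per_day = HOURS * 3600 // INTERVAL_S        # records per cell per day
total_records = intervals_per_day * N_CELLS * N_DAYS  # all collected groups
train_records = intervals_per_day * N_CELLS * 4       # first four days
test_records = intervals_per_day * N_CELLS * 1        # last day
```

Under the same reading, 480 intervals per day over 5 days gives 2400 time steps, which matches the number of sequence pairs reported for the model in step S5.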

S3, data preprocessing: after data collection is finished, duplicate and abnormal records are first removed. Next, because the feature variables of the collected traffic flow data differ greatly in scale (for example, traffic flow can reach hundreds or thousands, while traffic speed is only in the tens), using the data directly for subsequent model training would give inaccurate results. The collected data is therefore normalized so that the different feature variables fall in the interval [0, 1], using the normalization formula:

x' = \frac{x - x_{min}}{x_{max} - x_{min}}    (1)

where x' is the value after normalization, x is the value before normalization, and x_min and x_max are the minimum and maximum values in the data set.

S4, traffic state division: according to the relationship between the features of the road traffic flow data, the data is divided into two state levels, free flow and congested flow, with the k-means clustering algorithm. In the free-flow state, traffic operation is relatively stable and moving vehicles are hardly affected by external factors; this state can be represented by the number 0. In the congested-flow state, traffic operation is extremely unstable and vehicles interfere with each other severely; this state can be represented by the number 1. The clustering algorithm comprises the following steps:

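S4.1, setting the number k of cluster categories to 2, and randomly selecting 2 data points from the traffic flow data set as the initial cluster centers;

S4.2, computing the Euclidean distance between each data point and each cluster center, and assigning each data point to the category of its nearest cluster center;

S4.3, computing the arithmetic mean of the data points in each category according to the clustering result and replacing the previous cluster center with that value, where c_j is the number of data points in the jth cluster category;

S4.4, comparing the current cluster centers with the centers before the update; if they are the same, the iteration stops and the algorithm ends; if not, the process returns to step S4.2 and the iteration continues.

Further, in step S4, the objective function of the algorithm is the squared error between the data points and the centers of the clusters to which they are assigned.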

Specifically, FIG. 4 shows the k-means clustering result for the traffic states of one of the road sections; in the figure, triangular data points represent congested flow and circular data points represent free flow. As the figure shows, in the free-flow state the occupancy of the road section is relatively low and approximately linear in the flow, moving vehicles are hardly disturbed by external factors, and high speeds can be maintained. In the congested-flow state, the occupancy of the road section begins to rise rapidly and the flow gradually declines after reaching its peak; traffic operation is then extremely unstable, the data is more scattered, and vehicles interfere with each other and can only travel at low speed. The k-means clustering therefore divides the road section traffic flow data into states well: the boundary between the classes is clear, the clustering effect is good, and the result agrees with the fundamental diagram of the road section and with the behavior of actual traffic flow.

S5, traffic state estimation: a deep sequence learning Seq2Seq model is designed. The model has a "many-to-many" structure, and its input and output data must be prepared before the model is trained. The traffic flow data and state data of the expressway network are arranged in spatial order to obtain a (continuous) road network traffic flow data sequence and a (discrete) binary traffic state sequence whose entries correspond one to one by road section, and the Seq2Seq model is trained with the traffic flow data sequence as input and the state sequence as output. In addition, since the data of some road sections of the actual road network cannot be obtained in real time, the data of those road sections must be removed from the input sequence when the model is trained. On this basis, the input of the model in this experiment is a traffic flow data sequence composed of cells 1, 2, 3, 4, 5, 7, 9, 11, 13, 15, and 17, and the output is a binary (0 or 1) traffic state sequence covering all 18 cells, for a total of 2400 sequence pairs.
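Building one input/output pair from a full snapshot of the 18 cells can be sketched as follows. The observed cell numbers are the ones listed in the embodiment; the feature layout and the snapshot values are placeholders introduced for illustration.

```python
from typing import List, Tuple

N_CELLS = 18
# 1-based numbers of the cells kept in the input sequence, as listed in the embodiment.
OBSERVED_CELLS = [1, 2, 3, 4, 5, 7, 9, 11, 13, 15, 17]

Features = Tuple[float, float, float]  # assumed layout: (flow, speed, occupancy), normalized


def make_sequence_pair(features_by_cell: List[Features],
                       states_by_cell: List[int]) -> Tuple[List[Features], List[int]]:
    """Build one (input, output) pair for a single time step.

    The input keeps only the observed cells, in spatial order; the output keeps
    the 0/1 state of every cell.
    """
    if len(features_by_cell) != N_CELLS or len(states_by_cell) != N_CELLS:
        raise ValueError("expected one entry per cell")
    inputs = [features_by_cell[c - 1] for c in OBSERVED_CELLS]
    return inputs, list(states_by_cell)


# Placeholder snapshot with made-up values, cells listed in spatial order.
snapshot = [(0.1 * (i % 10), 0.5, 0.2) for i in range(N_CELLS)]
snapshot_states = [0] * 12 + [1] * 6
x_seq, y_seq = make_sequence_pair(snapshot, snapshot_states)
```

Each pair therefore maps an 11-element input sequence to an 18-element output sequence, which is why a many-to-many model with unequal input and output lengths is needed.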

Specifically, the model structure designed in this experiment is shown in FIG. 5. It is composed of two LSTM neural network layers, so as to address the vanishing-gradient and exploding-gradient problems faced by RNN neural networks. The first LSTM layer serves as the encoder and is responsible for analyzing the input traffic flow data sequence; the encoder formula is:

h_t = f(h_{t-1}, x_t)    (5)

where f denotes the activation function of the encoder, x_t denotes the traffic flow data sequence at time t, and h_t denotes the hidden state of the encoder at time t.

s_t = f(s_{t-1}, y_{t-1}, c)    (6)

p(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(s_t, y_{t-1}, c)    (7)

where f denotes the activation function of the decoder, c denotes the output of the encoder, s_t denotes the hidden state of the decoder at time t, y_t denotes the output of the decoder at time t, and g denotes the activation function of the output layer.

e_{ij} = a(s_{i-1}, h_j)    (10)

where c_i is the dynamically updated encoder output, a_ij is the weight between the jth hidden state of the encoder and the ith hidden state of the decoder, and e_ij is the alignment model, which measures the correlation between the jth hidden state of the encoder and the ith hidden state of the decoder.

The designed Seq2Seq model is trained on the prepared traffic flow data sequences and validated by five-fold cross-validation. Once the tested model reaches the preset performance index, the traffic state of the road network can be estimated from real-time data collected on the actual road network, yielding the real-time traffic state sequence of the whole expressway.
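The five-fold validation split can be sketched as an index partition. Using contiguous, unshuffled folds is an assumption made here; the patent does not say how the folds are formed.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for each fold.

    Folds are contiguous and unshuffled, which is an assumption of this sketch.
    """
    if n_folds < 2 or n_folds > n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


# 2400 sequence pairs, as reported for the experiment.
folds = k_fold_indices(2400, 5)
```

Each fold trains on four fifths of the pairs and validates on the remaining fifth, so every pair is used for validation exactly once.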

In summary, the invention provides a traffic state estimation method based on k-means clustering and deep sequence learning. First, the historical traffic flow data of the road is labeled with states by the k-means clustering algorithm. A deep sequence learning Seq2Seq model is then designed and trained on the labeled traffic flow data, and the traffic state sequence of the whole expressway is obtained through iterative learning. This addresses the problem in existing traffic state estimation that road detectors cannot achieve seamless coverage and can detect traffic flow data on only some road sections. The invention takes full account of the relationship between traffic flows on different road sections, exploits the strengths of machine learning in the traffic field, obtains the traffic state of the whole road network in a timely manner, and can provide reliable and accurate information to road users.

Finally, it should be noted that the above embodiments are only examples given to illustrate the present invention clearly and do not limit its implementations. Those skilled in the art can make other variations or modifications on the basis of the above description. Not all expressway information can be listed exhaustively here, and all obvious variations or modifications derived from the technical solution of the present invention still fall within its scope of protection.

Claims

1. A traffic state estimation method based on clustering and deep sequence learning is characterized by comprising the following steps:

2. The traffic state estimation method based on clustering and deep sequence learning of claim 1, wherein in step S1, the segment division rule of the expressway is:

3. The method according to claim 1, wherein in step S2, the expressway model is divided according to the road segment division rule of S1, and the traffic demand of each road segment and the split ratio between the on-off ramp and the main road are dynamically set according to the real data, and the historical traffic flow parameter data collected by each road segment can be data of a plurality of working days continuously, and is simulated by changing the random seed of the software.

4. The traffic state estimation method based on clustering and deep sequence learning of claim 1, wherein in step S3, the collected data is normalized so that the different feature variables fall within a specific interval, the normalization formula being:

x' = \frac{x - x_{min}}{x_{max} - x_{min}}    (1)

wherein x' is the value after normalization, x is the value before normalization, x_min is the minimum value in the data set, and x_max is the maximum value in the data set.

5. The method according to claim 1, wherein in step S4, the traffic state is classified according to the fundamental diagram formed by the traffic flow data within each cell, into two types, free flow and congested flow, wherein traffic operation in the free-flow state is stable and moving vehicles are hardly affected by external factors, this state being denoted by the number 0; and traffic operation in the congested-flow state is extremely unstable and vehicles interfere with each other severely, this state being denoted by the number 1.

6. The traffic state estimation method based on clustering and deep sequence learning according to claim 1, wherein in step S4, a kmeans clustering algorithm is used to cluster the historical traffic flow parameter data, the historical traffic flow parameter data of each group is calibrated (0 or 1 state), and the mean value of different traffic state category data is calculated, and the difference is compared, wherein the clustering algorithm comprises the following specific steps:

wherein x_i is the ith data point in the data set, μ_j is the center point of the jth cluster category, and k is the number of cluster categories;

s4.4, comparing the difference between the current clustering center point and the center point before updating, if the current clustering center point and the center point before updating are the same, stopping iteration, and ending the algorithm; if not, returning to the step S4.2 and continuing iteration;

further, in step S4, the objective function of the algorithm is represented as:

7. The traffic state estimation method based on clustering and deep sequence learning of claim 1, wherein in step S5, the deep sequence Seq2Seq model is a model with a "many-to-many" structure, input and output data of the model need to be fitted in advance before training the model, traffic flow data and state data of the expressway network are arranged in a spatial order to obtain a traffic flow data continuous sequence and a traffic state discrete binary sequence corresponding to each other on road segments, the traffic flow data sequence is used as input, and the state sequence is used as output to train the Seq2Seq model; in addition, since data of a part of road segments on an actual road network cannot be acquired in real time, the data of the part of road segments in the input sequence needs to be removed when the model is trained, so that the trained model can estimate the state sequence of the whole express way by using the data of the part of road segments in the future estimation process.

8. The traffic state estimation method based on clustering and deep sequence learning of claim 1, wherein in step S5, the Seq2Seq model is composed of two LSTM neural network layers, so as to address the vanishing-gradient and exploding-gradient problems faced by RNN neural networks; the first LSTM layer serves as the encoder and is responsible for analyzing the input traffic flow data sequence, the encoder formula being:

h_t = f(h_{t-1}, x_t)    (5)

wherein f denotes the activation function of the encoder, x_t denotes the traffic flow data sequence at time t, and h_t denotes the hidden state of the encoder at time t;

s_t = f(s_{t-1}, y_{t-1}, c)    (6)

p(y_t | y_{t-1}, y_{t-2}, ..., y_1, c) = g(s_t, y_{t-1}, c)    (7)

9. The traffic state estimation method according to claim 1, wherein in step S5, an attention mechanism is introduced to optimize based on the existing model, and a dynamic weight is added to the hidden state of the encoder at all times, so that the output c of the encoder can be dynamically updated accordingly, and it is ensured that the model can obtain more important information from the input sequence, and the update formula is represented as:

e_{ij} = a(s_{i-1}, h_j)    (10)