CN112101117A - Expressway congestion identification model construction method and device and identification method - Google Patents
- Publication number
- CN112101117A (application number CN202010831009.2A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- stage
- weight
- highway
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
Abstract
The invention discloses a method and a device for constructing an expressway congestion identification model, and an identification method. The construction method comprises: step 1, acquiring expressway video frame images to obtain an initial image set, and labeling each image in the initial image set to obtain a label set, wherein each label is a traffic category: the congested, saturated, or unblocked state; step 2, training a deep convolutional neural network with the initial image set as input and the label set as output, wherein the network structure, on the basis of VGG-16, fuses in the classic Bottleneck Layer and Squeeze-and-Excitation (SE) Block structures used in the ResNet family and introduces an attention mechanism, constructing the SE-VGG16 classification network. The method accurately identifies the traffic congestion state of an expressway, applies to congestion identification under various traffic scenes and camera viewing angles, is end-to-end, and is simple to implement with high identification accuracy.
Description
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a method and a device for constructing a highway congestion identification model and an identification method.
Background
Traffic congestion detection is important for monitoring traffic conditions and optimizing road network performance. Early traffic monitoring systems used inductive loop detectors to count passing vehicles and collect traffic flow occupancy, but could provide only limited and coarse traffic information. To improve accuracy, researchers have detected traffic congestion by combining route maps with GPS data collected from GPS trackers or smartphones. However, loop detectors damage the road surface, these approaches depend on specially built and deployed resources, and the data are difficult to obtain.
As the installation cost of surveillance cameras keeps falling, the large number of cameras in a road network generates massive monitoring data every day. Extracting the required traffic flow parameters by analyzing this video data is of great practical significance for detecting the congestion state, and it is non-destructive. Traditional methods detect and track vehicles in video with image processing techniques and then compute traffic flow parameters; they are not robust to occlusion, easily miscompute the parameters, and thus degrade the congestion recognition result. In recent years, convolutional neural networks have achieved excellent results in image classification and recognition: they learn image features automatically and are robust to translation, scaling, and rotation. Some researchers have therefore applied convolutional neural networks to traffic congestion identification. One approach estimates traffic flow density with a network, but it is sensitive to changes in camera viewing angle; another computes the congestion degree directly with a classification network, but the data behind current methods cover only specific cities and camera viewing angles, so their generality is limited.
Disclosure of Invention
Aiming at the defects and shortcomings in the prior art, the invention provides a method and a device for constructing a highway congestion identification model, and an identification method, which realize end-to-end identification and overcome the shortcomings of the prior art, such as low efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for constructing a highway congestion identification model comprises the following steps:
step 1, acquiring highway video frame images to obtain an initial image set; labeling each image in the initial image set to obtain a label set, wherein each label is a traffic category: the congested, saturated, or unblocked state;
step 2, training a deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a plurality of feature extraction layers and classification layers which are sequentially arranged; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 full-connection layers;
the feature extraction Block comprises a Bottleneck Layer, a SE Block and a pooling Layer which are connected in series;
the Bottleneck Layer comprises 3 convolution layers with the sizes of 1 × 1, 3 × 3 and 1 × 1 which are sequentially connected in series;
the SE Block comprises 1 convolutional layer, which further extracts features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer;
the global average pooling layer corresponds to the Squeeze, Excitation, and Reweight stages, wherein the Squeeze and Excitation stages generate the weight parameters required by the attention mechanism, and the Reweight stage applies those weights to the feature map produced by the convolutional layer of the SE Block, generating feature maps that reflect the different importance of each channel;
the 5 feature extraction blocks of the feature extraction layer are connected in series and then are connected with the 3 full connection layers of the classification layer for final traffic class classification;
and obtaining a highway congestion identification model.
The invention also comprises the following technical characteristics:
specifically, the loss function in the highway congestion identification model adopts a cross entropy loss function.
Specifically, a deep convolutional neural network is constructed by selecting a deep learning frame Caffe, a random gradient descent method is selected to optimize the highway congestion identification model, the learning rate is set to be 0.01, the batch-size is set to be 32, and the iteration epoch is set to be 80 times.
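The cross-entropy loss named above can be sketched for a single sample as follows (a minimal numpy illustration; the patent itself trains with Caffe, and the function names here are only illustrative):

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability, then normalize to probabilities.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def cross_entropy(logits, true_class):
    # Negative log-likelihood of the labeled class
    # (e.g. 0 = congested, 1 = saturated, 2 = unblocked).
    return float(-np.log(softmax(logits)[true_class]))
```

A confident, correct prediction yields a loss near zero while a confident wrong one is penalized heavily; a stochastic gradient descent step with learning rate 0.01 then moves the weights against the gradient of this loss averaged over a batch of 32.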
Specifically, the global average pooling layer corresponds to the Squeeze, Excitation, and Reweight stages. The Squeeze stage uses global average pooling to convert each feature channel into one real number with a global receptive field; the Excitation stage generates a weight $w$ for each feature channel; the Reweight stage multiplies each feature channel from the Squeeze stage by the weight generated in the Excitation stage, obtaining feature maps that reflect different degrees of importance. The computation is as follows:
First a standard convolution, i.e. the conversion operation, is performed: $F_{tr}: X \to U$, where $X$ denotes the original input feature map, $U$ the output feature map, and $F_{tr}$ the conversion from the original input feature map to the output feature map;

wherein $X \in \mathbb{R}^{W' \times H' \times C'}$ and $U \in \mathbb{R}^{W \times H \times C}$, $\mathbb{R}$ denotes the real space, $W'$, $H'$, $C'$ denote the width, height, and number of channels of the feature map $X$, and $W$, $H$, $C$ those of the feature map $U$; the specific formula is:

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x_s \qquad (1)$$

wherein $*$ denotes the convolution operation; $c$ and $s$ are the serial numbers of the feature map channels, with ranges $1 \le c \le C$ and $1 \le s \le C'$; $x_s$ is the $s$-th feature layer in the feature map $X$; $v_c$ is the $c$-th convolution kernel, with the same number of channels as the feature map $X$; $v_c^s$ is the $s$-th layer of the $c$-th convolution kernel; and $u_c$ denotes the $c$-th feature layer in the feature map $U$;

the Squeeze stage performs global average pooling, taking the feature map $U$ as a new input and converting it into a new output feature map $Z \in \mathbb{R}^{1 \times 1 \times C}$, where $1$, $1$, and $C$ denote the width, height, and number of channels of $Z$; the specific formula is:

$$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j) \qquad (2)$$

wherein $F_{sq}$ denotes the global average pooling operation, $u_c(i, j)$ denotes the element at row $i$, column $j$ of the $c$-th feature layer in the feature map $U$, and $z_c$ denotes the $c$-th feature layer in the feature map $Z$;

in the Excitation stage, weights are computed from the feature map $Z$ output by the global average pooling of the Squeeze stage, obtaining the weight vector $w$ corresponding to each feature layer in $Z$; the formula is:

$$w = F_{ex}(Z, W) = \sigma(W_2 \, \delta(W_1 Z)) \qquad (3)$$

wherein $F_{ex}$ denotes the weight calculation operation; $W_1$ and $W_2$ both denote fully connected layers, with reduced dimension $C/r$, and $W$ denotes the group formed by $W_1$ and $W_2$ together; $W_1 Z$ is the first fully connected operation, whose output vector has dimension $1 \times C/r$, the scaling parameter $r$ serving mainly to reduce the dimension and the amount of calculation; $\delta$ is the ReLU activation function; $W_2 \, \delta(W_1 Z)$ is the second fully connected operation, whose output vector has dimension $1 \times C$; $\sigma$ is the Sigmoid activation function, through which the weight vector $w$ corresponding to each feature layer in $Z$ is finally obtained;

the specific formula of the Reweight stage is:

$$\tilde{x}_c = F_{scale}(u_c, w_c) = w_c \cdot u_c \qquad (4)$$

wherein $F_{scale}$ denotes the weight assignment operation, $u_c$ denotes the $c$-th feature layer in the feature map $U$, $w_c$ denotes the element at the $c$-th position of the weight vector $w$, and $\cdot$ denotes multiplication, i.e. $w_c$ is multiplied with each element of the feature layer $u_c$.
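The three stages above can be sketched end-to-end in a few lines (a hedged numpy sketch; `se_block`, `w1`, and `w2` are illustrative names, and biases are omitted for brevity):

```python
import numpy as np

def se_block(u, w1, w2):
    """Squeeze-and-Excitation over a feature map u of shape (C, H, W).

    w1: (C//r, C) weights of the first FC layer (dimension reduction);
    w2: (C, C//r) weights of the second FC layer (dimension restoration).
    Returns the reweighted feature map with the same shape as u.
    """
    # Squeeze: global average pooling turns each channel into one real number.
    z = u.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: FC -> ReLU -> FC -> Sigmoid gives one weight per channel.
    hidden = np.maximum(w1 @ z, 0.0)             # ReLU, shape (C//r,)
    w = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # Sigmoid, shape (C,)
    # Reweight: scale every element of channel c by its weight w_c.
    return u * w[:, None, None]
```

Each output channel is the corresponding input channel scaled by a weight in (0, 1), which is exactly formula (4) applied after formulas (2) and (3).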
A highway congestion identification model construction device comprises:
the data set acquisition and labeling module is used for acquiring a highway traffic monitoring video and storing video frame images as an initial image set, and for labeling each image in the initial image set to obtain a label set, wherein each label is a traffic category: the congested, saturated, or unblocked state;
the network training module is used for training the deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a plurality of feature extraction layers and classification layers which are sequentially arranged; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 full-connection layers;
the feature extraction Block comprises a Bottleneck Layer, a SE Block and a pooling Layer which are connected in series;
the Bottleneck Layer comprises 3 convolution layers with the sizes of 1 × 1, 3 × 3 and 1 × 1 which are sequentially connected in series;
the SE Block comprises 1 convolutional layer, which further extracts features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer;
the global average pooling layer corresponds to the Squeeze, Excitation, and Reweight stages, wherein the Squeeze and Excitation stages generate the weight parameters required by the attention mechanism, and the Reweight stage applies those weights to the feature map produced by the convolutional layer of the SE Block, generating feature maps that reflect the different importance of each channel;
the 5 feature extraction blocks of the feature extraction layer are connected in series and then are connected with the 3 full connection layers of the classification layer for final traffic class classification;
and obtaining a highway congestion identification model.
A highway congestion identification method comprises the steps of inputting a highway video frame image to be identified into a highway congestion identification model constructed by the highway congestion identification model construction method, and obtaining an identification result.
Compared with the prior art, the invention has the beneficial technical effects that:
the method is simple to implement, has the end-to-end characteristic, can be applied to traffic jam recognition under the conditions of various traffic scenes and various camera viewing angles, ensures the universality under the road monitoring environment, and meets the requirement of accurately judging the road state in an intelligent traffic monitoring system. The method using deep learning has wide application because of high stability and precision.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention;
FIG. 2 is a partial scene graph of a data set collected in accordance with the present invention;
FIG. 3 is a block diagram of bottleeck layers and SEs in the network of the present invention;
FIG. 4 is an overall block diagram of the network SE-VGG16 according to the present invention;
FIG. 5 is a graph of loss variation and accuracy during network training according to an embodiment of the present invention;
FIG. 6 is a graph of congestion/saturation/clear portion test results for an embodiment of the present invention;
Detailed Description
The following describes in detail specific embodiments of the present invention. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
The invention produces an expressway traffic congestion data set from a large number of collected expressway monitoring video frames and designs a classification network whose structure, on the basis of VGG-16, fuses in the classic Bottleneck and Squeeze-and-Excitation (SE) Block structures used in ResNet, so that the expressway congestion state can be identified effectively.
Example 1:
as shown in fig. 1 to 6, the invention discloses a method for constructing a highway congestion identification model, which comprises the following steps:
step 1, acquiring expressway video frame images to obtain an initial image set; labeling each image in the initial image set to obtain a label set, wherein each label is a traffic category: the congested, saturated, or unblocked state;
Specifically, in this embodiment the expressway video frame image data come from the transportation departments of Hangzhou and of the Xi'an ring area, mainly covering the Hangzhou Jinqu Expressway, the Xi'ang Expressway, and the Xi'an Ring Expressway. A total of 900 videos were shot with fixed cameras, 674 of them daytime scenes. Each video is 1–10 minutes long; one picture is cut from every 200 frames of the original video and put into the data set. Each picture in the data set is labeled as one of the three states congested/saturated/unblocked; for borderline cases whose traffic class is hard to determine, the saturated class is preferred. In the finally generated data set, the training set contains 21209 samples (7269 congested in 371 scenes, 6559 saturated in 278 scenes, 6921 unblocked in 407 scenes), the validation set contains 3661 samples (1283 congested in 87 scenes, 1157 saturated in 56 scenes, 1221 unblocked in 90 scenes), and the test set contains 6101 samples (2137 congested in 158 scenes, 1929 saturated in 132 scenes, 2035 unblocked in 107 scenes). Fig. 2 shows partial scene images from the data set.
Step 2, training a deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a plurality of feature extraction layers and classification layers which are sequentially arranged; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 full-connection layers;
the feature extraction Block comprises a Bottleneck Layer, an SE Block, and a pooling layer connected in series, wherein the Bottleneck Layer comprises 3 convolutional layers of sizes 1 × 1, 3 × 3, and 1 × 1 connected in series in sequence; the SE Block comprises 1 convolutional layer, which further extracts features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer; the global average pooling layer corresponds to the Squeeze, Excitation, and Reweight stages, wherein the Squeeze and Excitation stages generate the weight parameters required by the attention mechanism, and the Reweight stage applies those weights to the feature map produced by the convolutional layer of the SE Block, generating feature maps that reflect the different importance of each channel; the 3 fully connected layers are connected behind the 5 serial feature extraction blocks for the final congestion state classification;
and obtaining a highway congestion identification model.
Specifically, a deep convolutional neural network is built whose input is an RGB image of arbitrary size. The network structure contains 13 convolutional layers, 8 SE Block modules, 5 pooling layers, 3 fully connected layers, and a final Softmax layer, and finally outputs a 1 × 3 vector containing the probabilities of the 3 states congested/saturated/unblocked.
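The final 1 × 3 probability vector maps to a traffic state by taking the most probable class (a pure-Python sketch; the class ordering here is an assumption, as the text does not fix one):

```python
CLASSES = ("congested", "saturated", "unblocked")  # assumed output order

def classify(probs):
    # Return the traffic state with the highest softmax probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best]
```
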
The loss function adopts a cross entropy loss function which is most commonly used by classification tasks.
The Bottleneck Layer uses 1 × 1 convolution kernels mainly to reduce the number of model parameters and the amount of computation: across the two 1 × 1 convolutions, the dimensionality of the feature layers is first reduced and then restored, which allows features to be extracted more effectively. Fig. 3(a) shows the original convolutional layer and the Bottleneck Layer structure, from which it can be seen that each Bottleneck Layer is a three-layer structure, where 1 × 1 and 3 × 3 denote the convolution kernel sizes and 16 and 64 denote the channel numbers.
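The parameter saving can be checked with a little arithmetic, using the 64 → 16 → 64 channel widths of Fig. 3(a) (bias terms ignored; the helper names are illustrative):

```python
def conv_params(c_in, c_out, k):
    # Weight count of a k x k convolutional layer, biases ignored.
    return c_in * c_out * k * k

def bottleneck_params(c_in, c_mid, c_out):
    # 1x1 reduce, 3x3 convolve at the reduced width, 1x1 expand.
    return (conv_params(c_in, c_mid, 1)
            + conv_params(c_mid, c_mid, 3)
            + conv_params(c_mid, c_out, 1))
```

With 64 input and output channels reduced to 16 in the middle, the bottleneck needs 4352 weights against 36864 for a single plain 3 × 3 convolution, roughly an 8.5× reduction.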
The Squeeze-and-Excitation (SE) Block stands for compression and excitation. The traditional convolution process does not fully exploit channel dependencies: each convolution kernel learns within a local receptive field and therefore cannot use information outside a specific region of each feature map. The SE Block adds an attention mechanism where the network's lower layers have small receptive fields, enlarging the weight of the effective regions of each feature layer and compressing the weight of ineffective or weak regions, so that the model can exploit global information during training and achieve a better training effect.
The SE Block can be divided into 3 stages: Squeeze, Excitation, and Reweight (weight adjustment). Fig. 3(b) shows the structure of the SE Block. The Squeeze stage uses global average pooling to convert each two-dimensional feature channel into a real number with a global receptive field; the Excitation stage generates a weight $w$ for each feature channel; the Reweight stage multiplies each feature channel from the Squeeze stage by the weight generated in the Excitation stage, obtaining feature maps that reflect different degrees of importance.
As shown in Fig. 3(b), first a standard convolution, i.e. the conversion operation, is performed: $F_{tr}: X \to U$, where $X$ denotes the original input feature map, $U$ the output feature map, and $F_{tr}$ the conversion from the original input feature map to the output feature map;

wherein $X \in \mathbb{R}^{W' \times H' \times C'}$ and $U \in \mathbb{R}^{W \times H \times C}$, $\mathbb{R}$ denotes the real space, $W'$, $H'$, $C'$ denote the width, height, and number of channels of the feature map $X$, and $W$, $H$, $C$ those of the feature map $U$; the specific formula is:

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x_s \qquad (1)$$

wherein $*$ denotes the convolution operation; $c$ and $s$ are the serial numbers of the feature map channels, with ranges $1 \le c \le C$ and $1 \le s \le C'$; $x_s$ is the $s$-th feature layer in the feature map $X$; $v_c$ is the $c$-th convolution kernel, with the same number of channels as the feature map $X$; $v_c^s$ is the $s$-th layer of the $c$-th convolution kernel; and $u_c$ denotes the $c$-th feature layer in the feature map $U$;

the Squeeze stage performs global average pooling, taking the feature map $U$ as a new input and converting it into a new output feature map $Z \in \mathbb{R}^{1 \times 1 \times C}$, where $1$, $1$, and $C$ denote the width, height, and number of channels of $Z$; the specific formula is:

$$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j) \qquad (2)$$

wherein $F_{sq}$ denotes the global average pooling operation, $u_c(i, j)$ denotes the element at row $i$, column $j$ of the $c$-th feature layer in the feature map $U$, and $z_c$ denotes the $c$-th feature layer in the feature map $Z$;

in the Excitation stage, weights are computed from the feature map $Z$ output by the global average pooling of the Squeeze stage, obtaining the weight vector $w$ corresponding to each feature layer in $Z$; the formula is:

$$w = F_{ex}(Z, W) = \sigma(W_2 \, \delta(W_1 Z)) \qquad (3)$$

wherein $F_{ex}$ denotes the weight calculation operation; $W_1$ and $W_2$ both denote fully connected layers, with reduced dimension $C/r$, and $W$ denotes the group formed by $W_1$ and $W_2$ together; $W_1 Z$ is the first fully connected operation, whose output vector has dimension $1 \times C/r$, the scaling parameter $r$ serving mainly to reduce the dimension and the amount of calculation; $\delta$ is the ReLU activation function; $W_2 \, \delta(W_1 Z)$ is the second fully connected operation, whose output vector has dimension $1 \times C$; $\sigma$ is the Sigmoid activation function, through which the weight vector $w$ corresponding to each feature layer in $Z$ is finally obtained;

the specific formula of the Reweight stage is:

$$\tilde{x}_c = F_{scale}(u_c, w_c) = w_c \cdot u_c \qquad (4)$$

wherein $F_{scale}$ denotes the weight assignment operation, $u_c$ denotes the $c$-th feature layer in the feature map $U$, $w_c$ denotes the element at the $c$-th position of the weight vector $w$, and $\cdot$ denotes multiplication, i.e. $w_c$ is multiplied with each element of the feature layer $u_c$.
SE-VGG16 network architecture: the two structures above are effectively combined and added into the original VGG16 network to form the new convolutional neural network model SE-VGG16. The detailed model structure is shown in Fig. 4: the leftmost column gives the names of the convolutional layers, pooling layers, fully connected layers, and classifier, with the SE-Block module labeled behind each convolutional layer to which it is added; the middle column gives the size and number of convolution kernels; and the rightmost column gives the output size and channel number of the feature map. The final network structure contains 13 convolutional layers, 8 SE Block modules, 5 pooling layers, 3 fully connected layers, and a final Softmax layer.
Setting hyper-parameters, training the network: a deep learning framework Caffe is selected to build a network, a random gradient descent method is selected to optimize a network model, the learning rate is set to be 0.01, the batch-size is set to be 32, and the iteration epoch is set to be 80 times. Fig. 5 shows a graph of loss variation and accuracy during network training.
Testing the network performance: the test set of the produced data set is input into the network in batches for prediction, the network predictions are compared with the ground-truth labels, and the network's recognition performance on expressway congestion is calculated. Fig. 6 shows test results for the three states congested/saturated/unblocked.
Example 2:
the embodiment provides a device for constructing a highway congestion identification model, which comprises:
the data set acquisition and labeling module is used for acquiring a highway traffic monitoring video and storing video frame images as an initial image set, and for labeling each image in the initial image set to obtain a label set, wherein each label is a traffic category: the congested, saturated, or unblocked state;
the network training module is used for training the deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a plurality of feature extraction layers and classification layers which are sequentially arranged; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 full-connection layers;
the feature extraction Block comprises a Bottleneck Layer, a SE Block and a pooling Layer which are connected in series;
the Bottleneck Layer comprises 3 convolution layers with the sizes of 1 × 1, 3 × 3 and 1 × 1 which are sequentially connected in series;
the SE Block comprises 1 convolutional layer, which further extracts features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer;
the global average pooling layer corresponds to the Squeeze, Excitation, and Reweight stages, wherein the Squeeze and Excitation stages generate the weight parameters required by the attention mechanism, and the Reweight stage applies those weights to the feature map produced by the convolutional layer of the SE Block, generating feature maps that reflect the different importance of each channel;
the 5 feature extraction blocks of the feature extraction layer are connected in series and then are connected with the 3 full connection layers of the classification layer for final traffic class classification;
and obtaining a highway congestion identification model.
Example 3:
the embodiment provides a highway congestion identification method, which includes inputting a to-be-identified highway video frame image into a highway congestion identification model constructed by a highway congestion identification model construction method, and obtaining an identification result.
Example 4:
in order to verify the effectiveness of the method provided by the invention, the network model is trained and tested by using the self-labeling data set.
During training, the improved SE-VGG16 model is first pre-trained on the ImageNet data set and then trained on the self-labeled data set. Fig. 5(a) shows the loss curve of the validation set and Fig. 5(b) its accuracy curve, where the abscissa is the epoch and the ordinate the corresponding value. Before training on the self-labeled data set started, the validation accuracy was 81.36% and the loss was 0.4762. After 30 training epochs the validation accuracy of the model begins to rise slowly; after 60 epochs both the loss and the validation accuracy stabilize; after 80 epochs the validation accuracy reaches 98.19%, showing that the model has converged well.
As shown in fig. 6, partial test results are given for the congested/saturated/unblocked classes. Table 1 shows the overall experimental test results, and table 2 shows the test results for each type of congestion condition.
Table 1 Overall experimental test results
Table 2 Test results for each type of congestion condition
Table 3 compares the recognition results of the different algorithms: SE-VGG16 improves accuracy by 2.35% over the fine-tuned VGG16. The Bottleneck Layer reduces the number of parameters and extracts features more effectively, and the SE Block module exploits the interdependencies among channels to further improve network performance.
Table 3 Comparison of recognition results of different algorithms
The experimental results show that the expressway congestion identification performed by this method achieves high accuracy, which demonstrates the effectiveness of the proposed method to a certain extent.
Claims (6)
1. A method for constructing a highway congestion identification model is characterized by comprising the following steps:
step 1, acquiring a video frame image of a highway to obtain an initial image set;
labeling each image in the initial image set to obtain a label set, wherein each label comprises a traffic category, the traffic categories comprising a congested, saturated or unblocked state;
step 2, training a deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a feature extraction layer and a classification layer which are arranged in sequence; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 fully connected layers;
each feature extraction block comprises a Bottleneck Layer, an SE Block and a pooling layer which are connected in series;
the Bottleneck Layer comprises 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 connected in series;
the SE Block comprises 1 convolution layer, for further extracting features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer;
the global average pooling layer corresponds to the Squeeze stage; together, the Squeeze stage and the Excitation stage generate the weight parameters required by the attention mechanism, and the Reweight stage applies these weights to the feature map produced by the convolution layer of the SE Block to generate feature maps that reflect the different importance of each channel;
the 5 feature extraction blocks of the feature extraction layer are connected in series and are followed by the 3 fully connected layers of the classification layer, which perform the final traffic category classification;
and obtaining a highway congestion identification model.
2. The method for constructing the highway congestion identification model according to claim 1, wherein the loss function of the highway congestion identification model is a cross entropy loss function.
3. The method for constructing the highway congestion identification model according to claim 1, wherein the deep learning framework Caffe is used to construct the deep convolutional neural network, the stochastic gradient descent method is used to optimize the highway congestion identification model, the learning rate is set to 0.01, the batch size is set to 32, and the number of training epochs is set to 80.
4. The method according to claim 1, wherein the global average pooling layer corresponds to the Squeeze stage of the Squeeze, Excitation and Reweight stages; in the Squeeze stage, global average pooling converts each feature channel into a real number with a global receptive field; in the Excitation stage, a weight w is generated for each feature channel; in the Reweight stage, each feature channel of the feature map is multiplied by the weight generated in the Excitation stage to obtain feature maps reflecting different degrees of importance; the method specifically comprises the following steps:
first, a standard convolution operation, i.e. a transformation operation F_tr : X → U, is performed, where X denotes the original input feature map, U denotes the output feature map, and F_tr denotes the transformation from the original input feature map to the output feature map;

wherein X ∈ R^(W'×H'×C') and U ∈ R^(W×H×C), R denotes the real number space, W', H' and C' respectively denote the width, height and number of channels of the feature map X, and W, H and C respectively denote the width, height and number of channels of the feature map U; the specific formula is as follows:

u_c = v_c * X = Σ_{s=1}^{C'} v_c^s * x^s    (1)

wherein * denotes the convolution operation; c and s are serial numbers of feature map channels, with value ranges 1 to C and 1 to C' respectively; x^s is the s-th feature layer in the feature map X; v_c is the c-th convolution kernel, whose number of channels is the same as that of the feature map X; v_c^s is the s-th layer of the c-th convolution kernel; and u_c denotes the c-th feature layer in the feature map U;
in the Squeeze stage, global average pooling is performed: the feature map U is taken as a new input and converted into a new output feature map Z, where Z ∈ R^(1×1×C), and 1, 1 and C respectively denote the width, height and number of channels of the feature map Z; the specific formula is as follows:

z_c = F_sq(u_c) = (1/(W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (2)

wherein F_sq denotes the global average pooling operation, u_c(i, j) denotes the element at row i, column j of the c-th feature layer in the feature map U, and z_c denotes the c-th feature layer in the feature map Z;
in the Excitation stage, a weight calculation is performed on the feature map Z output by the global average pooling of the Squeeze stage to obtain the weight vector w corresponding to each feature layer in the feature map Z, with the formula:

w = F_ex(Z, W) = σ(W_2 δ(W_1 Z))    (3)

wherein F_ex denotes the weight calculation operation; W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)) are two fully connected layers, and W denotes the fully connected layer group formed by W_1 and W_2; W_1 Z is the first fully connected operation, whose output vector has dimension 1 × 1 × C/r, where r is a scaling parameter used mainly to reduce the dimension and the amount of calculation; δ is the ReLU activation function; W_2 δ(W_1 Z) is the second fully connected operation, whose output vector has dimension 1 × 1 × C; σ is the Sigmoid activation function, and applying σ finally yields the weight vector w corresponding to each feature layer in the feature map Z;
the specific formula of the Reweight stage is as follows:

ũ_c = F_scale(u_c, w_c) = w_c · u_c    (4)

wherein F_scale denotes the weight assignment operation, u_c denotes the c-th feature layer in the feature map U, w_c denotes the c-th element of the weight vector w, and · denotes multiplication, i.e. w_c is multiplied by every element of the feature layer u_c.
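As an illustration (not part of the claims), the Squeeze, Excitation and Reweight computations described above can be evaluated directly in NumPy. The matrices W1 and W2 below are random placeholders standing in for the trained fully connected layers, and the preceding convolution step is omitted:

```python
import numpy as np

def se_reweight(U, W1, W2):
    """Squeeze-Excitation-Reweight on a feature map U of shape (W, H, C).

    W1 has shape (C//r, C) and W2 has shape (C, C//r); here they are
    illustrative random values, not trained parameters.
    """
    # Squeeze: global average pooling over spatial dims -> z of shape (C,)
    z = U.mean(axis=(0, 1))
    # Excitation: w = sigmoid(W2 @ relu(W1 @ z)), one weight per channel
    relu = lambda v: np.maximum(v, 0.0)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    w = sigmoid(W2 @ relu(W1 @ z))
    # Reweight: scale each feature layer u_c by its weight w_c
    return U * w  # broadcasts w over the channel axis

rng = np.random.default_rng(0)
C, r = 8, 4                                # channels and scaling parameter
U = rng.standard_normal((6, 6, C))         # toy feature map
W1 = rng.standard_normal((C // r, C))      # dimension-reducing FC layer
W2 = rng.standard_normal((C, C // r))      # dimension-restoring FC layer
V = se_reweight(U, W1, W2)
```

Because the Sigmoid output lies in (0, 1), each channel of the result is the original channel scaled by a per-channel importance weight, which is exactly the channel-attention effect the claim describes.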
5. A device for constructing a highway congestion identification model is characterized by comprising the following steps:
the data set acquisition and labeling module is used for acquiring a highway traffic monitoring video and storing video frame images as an initial image set, and for labeling each image in the initial image set to obtain a label set, wherein each label comprises a traffic category, the traffic categories comprising a congested, saturated or unblocked state;
the network training module is used for training the deep convolutional neural network by taking the initial image set as input and the label set as output;
the deep convolutional neural network comprises a feature extraction layer and a classification layer which are arranged in sequence; the feature extraction layer comprises 5 feature extraction blocks, and the classification layer comprises 3 fully connected layers;
each feature extraction block comprises a Bottleneck Layer, an SE Block and a pooling layer which are connected in series;
the Bottleneck Layer comprises 3 convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 connected in series;
the SE Block comprises 1 convolution layer, for further extracting features from the feature map output by the Bottleneck Layer, and 1 global average pooling layer;
the global average pooling layer corresponds to the Squeeze stage; together, the Squeeze stage and the Excitation stage generate the weight parameters required by the attention mechanism, and the Reweight stage applies these weights to the feature map produced by the convolution layer of the SE Block to generate feature maps that reflect the different importance of each channel;
the 5 feature extraction blocks of the feature extraction layer are connected in series and are followed by the 3 fully connected layers of the classification layer, which perform the final traffic category classification;
and obtaining a highway congestion identification model.
6. A highway congestion identification method, characterized in that a highway video frame image to be identified is input into a highway congestion identification model constructed by the highway congestion identification model construction method according to any one of claims 1 to 4, and an identification result is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010831009.2A CN112101117A (en) | 2020-08-18 | 2020-08-18 | Expressway congestion identification model construction method and device and identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010831009.2A CN112101117A (en) | 2020-08-18 | 2020-08-18 | Expressway congestion identification model construction method and device and identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112101117A true CN112101117A (en) | 2020-12-18 |
Family
ID=73753918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010831009.2A Pending CN112101117A (en) | 2020-08-18 | 2020-08-18 | Expressway congestion identification model construction method and device and identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101117A (en) |
2020-08-18: CN202010831009.2A patent/CN112101117A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297297A (en) * | 2016-11-03 | 2017-01-04 | 成都通甲优博科技有限责任公司 | Traffic jam judging method based on degree of depth study |
US20190355128A1 (en) * | 2017-01-06 | 2019-11-21 | Board Of Regents, The University Of Texas System | Segmenting generic foreground objects in images and videos |
CN109447962A (en) * | 2018-10-22 | 2019-03-08 | 天津工业大学 | A kind of eye fundus image hard exudate lesion detection method based on convolutional neural networks |
AU2018102037A4 (en) * | 2018-12-09 | 2019-01-17 | Ge, Jiahao Mr | A method of recognition of vehicle type based on deep learning |
CN109858495A (en) * | 2019-01-16 | 2019-06-07 | 五邑大学 | A kind of feature extracting method, device and its storage medium based on improvement convolution block |
CN110796177A (en) * | 2019-10-10 | 2020-02-14 | 温州大学 | Method for effectively reducing neural network overfitting in image classification task |
Non-Patent Citations (2)
Title |
---|
JIE HU ET AL.: "Squeeze-and-Excitation Networks", arXiv [cs.CV], 16 May 2019 (2019-05-16), pages 1 - 13 *
ZHAO Ming et al.: "Intelligent Systems and Technology Series: Natural Language Processing Based on Deep Learning", China Machine Press, pages: 71 - 72 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112991719A (en) * | 2021-01-28 | 2021-06-18 | 北京奥泽尔科技发展有限公司 | Traffic congestion prediction method and system based on congestion portrait |
CN112991719B (en) * | 2021-01-28 | 2022-05-24 | 北京奥泽尔科技发展有限公司 | Traffic congestion prediction method and system based on congestion portrait |
CN113011500A (en) * | 2021-03-22 | 2021-06-22 | 华南理工大学 | Virtual reality scene data set classification method, system, device and medium |
CN113011500B (en) * | 2021-03-22 | 2023-08-22 | 华南理工大学 | Classification method, system, equipment and medium for virtual reality scene data set |
CN113191283A (en) * | 2021-05-08 | 2021-07-30 | 河北工业大学 | Driving path decision method based on emotion change of on-road travelers |
CN113569734A (en) * | 2021-07-28 | 2021-10-29 | 山东力聚机器人科技股份有限公司 | Image identification and classification method and device based on feature recalibration |
CN114429618A (en) * | 2022-01-06 | 2022-05-03 | 电子科技大学 | Congestion identification method based on improved AlexNet network model |
CN116612388A (en) * | 2023-07-17 | 2023-08-18 | 新疆华屹能源发展有限公司 | Blocking removing method and system for oil production well |
CN116612388B (en) * | 2023-07-17 | 2023-09-19 | 新疆华屹能源发展有限公司 | Blocking removing method and system for oil production well |
CN117152973A (en) * | 2023-10-27 | 2023-12-01 | 贵州宏信达高新科技有限责任公司 | Expressway real-time flow monitoring method and system based on ETC portal data |
CN117152973B (en) * | 2023-10-27 | 2024-01-05 | 贵州宏信达高新科技有限责任公司 | Expressway real-time flow monitoring method and system based on ETC portal data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101117A (en) | Expressway congestion identification model construction method and device and identification method | |
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN110111335B (en) | Urban traffic scene semantic segmentation method and system for adaptive countermeasure learning | |
CN111563508B (en) | Semantic segmentation method based on spatial information fusion | |
WO2022083784A1 (en) | Road detection method based on internet of vehicles | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN111461083A (en) | Rapid vehicle detection method based on deep learning | |
CN108537824B (en) | Feature map enhanced network structure optimization method based on alternating deconvolution and convolution | |
CN109784183B (en) | Video saliency target detection method based on cascade convolution network and optical flow | |
CN112785848B (en) | Traffic data prediction method and system | |
CN113255589B (en) | Target detection method and system based on multi-convolution fusion network | |
CN112581409B (en) | Image defogging method based on end-to-end multiple information distillation network | |
CN111461129B (en) | Context prior-based scene segmentation method and system | |
CN115731533A (en) | Vehicle-mounted target detection method based on improved YOLOv5 | |
CN112364855A (en) | Video target detection method and system based on multi-scale feature fusion | |
CN113780132A (en) | Lane line detection method based on convolutional neural network | |
CN111832453A (en) | Unmanned scene real-time semantic segmentation method based on double-path deep neural network | |
CN112819000A (en) | Streetscape image semantic segmentation system, streetscape image semantic segmentation method, electronic equipment and computer readable medium | |
CN114267025A (en) | Traffic sign detection method based on high-resolution network and light-weight attention mechanism | |
CN116206306A (en) | Inter-category characterization contrast driven graph roll point cloud semantic annotation method | |
CN112766378A (en) | Cross-domain small sample image classification model method focusing on fine-grained identification | |
CN113011308A (en) | Pedestrian detection method introducing attention mechanism | |
CN112084897A (en) | Rapid traffic large-scene vehicle target detection method of GS-SSD | |
CN113361528B (en) | Multi-scale target detection method and system | |
CN114639067A (en) | Multi-scale full-scene monitoring target detection method based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201218 |