Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart illustrating the operation of the intelligent traffic light control method according to a preferred embodiment of the present invention.
In step S1, road vehicle video is acquired in real time by cameras mounted on the traffic signal lamps, and video image information of the road vehicles is obtained. The acquisition targets are the road and the vehicles within a certain distance in each of the four directions of the intersection, which provides analysis data for the subsequent identification. Specifically, the method comprises the following steps:
First, a camera is installed on each traffic signal lamp to configure the video image information acquisition device. The crossroad of this embodiment is provided with four traffic signal lamps, and video is collected through the monitoring camera arranged on each of them. Alternatively, to save cost, only two traffic signal lamps may be arranged at a crossing with light traffic; the camera installation strategy then differs, and a camera is mounted on both the front and the back of each traffic signal lamp so that each camera collects the video image information of the vehicles on its respective road.
Then, the video image information acquisition angle of the camera is adjusted so that the m lanes of oncoming traffic are presented within the video acquisition range to the greatest extent.
Finally, the video image information acquisition range of the camera is set. During acquisition, the range of each camera is limited to n meters along the lane, and the vehicles on the road within this distance are identified.
Because the acquisition range of each camera is limited during image acquisition, only the image information within the m lanes is collected. This preprocessing reduces the workload of the subsequent identification process and improves identification accuracy and speed. In Fig. 2, m is 2. An illustrative acquisition configuration is sketched below.
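Purely as an illustration of the acquisition parameters above, the region of interest of one camera could be recorded as follows; the field names and values are assumptions for this sketch, not part of the embodiment:

```python
from dataclasses import dataclass

@dataclass
class CameraROI:
    """Acquisition settings for one signal-lamp camera (illustrative only)."""
    lanes_covered: int = 2       # m: number of oncoming lanes in view (m = 2 in Fig. 2)
    range_meters: float = 50.0   # n: detection range along the lane (assumed value)
    # Pixel rectangle, derived from the camera mounting geometry, that bounds
    # the m lanes up to n meters; frames are cropped to this region.
    roi: tuple = (0, 120, 640, 480)  # (x0, y0, x1, y1), assumed geometry
```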
In step S2, the obtained video image information is detected in order to identify the specific number of vehicles in each direction of the intersection, providing parameters for control of the intelligent traffic signal lamp. That is, the vehicles in each video frame image are first detected accurately in real time by the improved convolutional neural network, which draws a vehicle identification frame around each one; the identification frames in the video frame image are then counted. The method specifically comprises the following steps:
In step S21, the acquired video image information is preprocessed so that only the lane and vehicle information is retained, and the processed video image information is used for vehicle detection. Because the acquisition range of the video image information is limited, preprocessing the video image by methods such as cropping and background removal on this basis can further improve identification accuracy. After cropping, the background color defaults to the road surface color, which improves the detection accuracy of the improved convolutional neural network. A minimal preprocessing sketch follows.
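A possible implementation in Python with OpenCV, assuming a fixed polygonal lane region obtained from the camera setup and an assumed road-surface color; it is a sketch of the cropping and background-removal idea, not the embodiment's exact procedure:

```python
import cv2
import numpy as np

def preprocess_frame(frame, lane_polygon, road_color=(90, 90, 90)):
    """Keep only the lane region; fill everything else with the road color.

    frame        : BGR image from the signal-lamp camera
    lane_polygon : N x 2 int array of pixel vertices bounding the m lanes
    road_color   : assumed road-surface BGR value used as the default background
    """
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [lane_polygon.astype(np.int32)], 255)
    out = np.full_like(frame, road_color, dtype=np.uint8)
    out[mask == 255] = frame[mask == 255]
    # Crop to the bounding rectangle of the lane region to shrink the input.
    x, y, w, h = cv2.boundingRect(lane_polygon.astype(np.int32))
    return out[y:y + h, x:x + w]
```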
In step S22, a pre-training network model is constructed, a training network is built on the basis of the pre-training network model, the pre-training network is pre-trained, the training network is initialized with the pre-trained network parameters, and the training network is then trained with the constructed data set until the number of vehicles in the corresponding lanes of a picture can be detected in real time. In this embodiment, the improved convolutional neural network performs real-time vehicle detection on the preprocessed video image information, i.e., vehicle target detection and identification. The principle of the improved convolutional neural network is to detect video frame images within a deep network framework. The specific steps are as follows:
In step S221, pictures of the intersection traffic conditions collected by the monitoring cameras are selected and a data set is constructed.
Related data sets are currently lacking: although data sets such as ImageNet contain some vehicle pictures, their scenes and viewing angles differ greatly from the present application. To improve the system's effectiveness, this embodiment therefore constructs a small data set itself.
Intersection traffic condition pictures are acquired through the monitoring cameras deployed at various intersections. An appropriate number of pictures is selected (1000 to 10000 in this embodiment), chosen as uniformly as possible so as to cover various scenes and traffic densities and improve the generalization capability of the system. The selected pictures are then labeled manually, marking the vehicle information on the corresponding lanes. An example of the resulting annotation format is sketched below.
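Purely as an illustration, one manually labeled picture could be recorded as below; the field names and the normalized-coordinate convention are assumptions, chosen to match the (x, y, w, h) output convention described in step S224:

```python
# One labeled picture with its vehicles marked on the lanes (hypothetical format).
# Coordinates are normalized to [0, 1].
label = {
    "image": "intersection_0001.jpg",  # assumed file name
    "boxes": [
        {"x": 0.42, "y": 0.63, "w": 0.08, "h": 0.11},  # one vehicle
        {"x": 0.71, "y": 0.58, "w": 0.07, "h": 0.10},  # another vehicle
    ],
}
```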
In step S222, the pre-training network is constructed, and the training network is built on the basis of the pre-training network model.
(1) Pre-training network
In this embodiment, the pre-training network is a two-class network model in which N convolutional layers are followed by one fully connected layer. It only judges whether the picture contains a vehicle, outputting the probability that a vehicle is present and the probability that it is not. Taking the network in Fig. 3 as an example, N convolutional layers are constructed and then fully connected to an output layer with 2 nodes, forming the pre-training network model.
(2) Training network
A convolutional layer block A and a fully connected layer block B are added on top of the N pre-trained convolutional layers. To preserve the spatial structure of the data and reduce the number of network parameters, Tucker mode decomposition is applied to the fully connected layer computation, so the corresponding fully connected layer parameters become 3 factor matrices denoted U, V, W. This forms the detection network model shown in Fig. 4, which outputs an S x S x (5a) tensor corresponding to S x S grid cells in the picture: for each cell, the 4 coordinate parameters of each of its a bounding boxes and a confidence indicating whether a vehicle is present. A sketch of the pre-training network follows this list; the factorized full connection is sketched under step S224.
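A minimal PyTorch sketch of item (1), the backbone and the two-class pre-training model; the layer count, channel widths, and input size are assumptions, not values fixed by the embodiment:

```python
import torch
import torch.nn as nn

N_CONV = 4  # N: number of shared convolutional layers (assumed)

def make_backbone():
    """The N convolutional layers shared by pre-training and detection."""
    layers, c_in = [], 3
    for c_out in (16, 32, 64, 128):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2)]
        c_in = c_out
    return nn.Sequential(*layers)

class PreTrainNet(nn.Module):
    """Two-class model: outputs logits for (vehicle, no vehicle) per picture."""
    def __init__(self, img_size=112):
        super().__init__()
        self.backbone = make_backbone()
        feat = 128 * (img_size // 2 ** N_CONV) ** 2
        self.fc = nn.Linear(feat, 2)  # the single fully connected output layer
    def forward(self, x):
        return self.fc(self.backbone(x).flatten(1))

# Per item (1) of step S224, the trained backbone weights would be copied into
# the training network, and the added layers A and B initialized randomly.
```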
In step S223, the constructed pre-training network is trained.
Because deep-learning models have high-dimensional network parameters, a sufficient amount of input data is usually required to avoid overfitting. However, building a data set from scratch and collecting and labeling pictures manually is time-consuming and costly. For reasons of economy, applicability and feasibility, this embodiment therefore first pre-trains on the related pictures in the existing ImageNet and Pascal VOC data sets, and then fine-tunes on the constructed traffic intersection vehicle data set, obtaining good results in the specific scene at a lower cost.
In step S224, the training network is initialized with part of the trained pre-training network parameters and is trained with the constructed data set (a code sketch of items (4) to (6) follows this list):
(1) The training network parameters are initialized: the first N convolutional layers adopt the corresponding parameters obtained in pre-training, and the network parameters of the added convolutional layers A and fully connected layers B are then initialized randomly.
(2) The self-built traffic intersection vehicle data set is used as the training set of the network; the pictures in the data set are unified to one size specification and input into the model for training.
(3) The traffic intersection vehicle pictures are repeatedly convolved and pooled, and the last convolutional layer outputs an S x S x m tensor A. That is, the original picture is divided into an S x S grid, each grid cell corresponds to one part of the original traffic intersection picture, and the picture features in each cell correspond to one m-dimensional vector of the tensor.
(4) The 3 factor matrices U, V, W are multiplied, each along its own mode, with the convolutional layer output tensor A to obtain the core tensor B:
B = A ×1 U ×2 V ×3 W
(5) The core tensor B is input into a nonlinear activation function to find the corresponding latent vehicle features in the hidden nodes, outputting the feature tensor Z:
Z=h(B)
The activation function h(·) may be a sigmoid function, a hyperbolic tangent function, or a ReLU.
(6) The B factorized fully connected layers described above (i.e., the previous two steps repeated B times, with different parameters in each layer) output an S x S x (5a) tensor: for each grid cell, the coordinates (x, y, w, h) of each vehicle detection bounding box and the confidence that a vehicle is detected in that identification frame. Here x and y are the coordinates of the center point of the vehicle identification frame, w and h are its width and height respectively, and the coordinates are normalized to lie between 0 and 1.
(7) According to the loss function L formed by the error between the output predicted values and the true vehicle label values in the original image (a sum-of-squares error loss, described below), the network parameters are adjusted with the back propagation algorithm until a specified precision is reached, i.e., until pictures are correctly classified as containing a vehicle or not, and the network parameters are then saved. Here:
The loss function is a sum-of-squares error loss comprising 3 parts: a coordinate prediction term, a confidence prediction term for identification frames containing a vehicle, and a confidence prediction term for identification frames not containing a vehicle:

$$L = \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right] + \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2$$

where x, y are the coordinates of the center of the vehicle identification frame and w, h are its width and height; \mathbb{1}_{ij}^{obj} indicates whether the j-th identification frame in the i-th grid cell is responsible for the detection; \mathbb{1}_{i}^{obj} indicates whether a vehicle center falls within grid cell i; \lambda_{coord} is the coordinate prediction weight; and \lambda_{noobj} is the confidence weight for identification frames that do not contain a vehicle.
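The sketch referenced above for items (4) to (6): one Tucker-factorized full connection implemented with torch.einsum for the three mode products. All tensor sizes are assumptions for illustration, not values fixed by the embodiment:

```python
import torch

def tucker_fc(A, U, V, W, h=torch.relu):
    """One factorized fully connected layer: Z = h(A x1 U x2 V x3 W).

    A       : (S, S, m) feature tensor from the last convolutional layer
    U, V, W : factor matrices replacing the dense fully connected weights;
              the mode products preserve the spatial grid structure.
    """
    B = torch.einsum('pi,qj,rk,ijk->pqr', U, V, W, A)  # core tensor B
    return h(B)                                        # feature tensor Z

# Assumed sizes: S = 7 grid, m = 128 features, a = 2 boxes per cell.
S, m, a = 7, 128, 2
A = torch.randn(S, S, m)
U1, V1, W1 = torch.randn(S, S), torch.randn(S, S), torch.randn(64, m)
U2, V2, W2 = torch.randn(S, S), torch.randn(S, S), torch.randn(5 * a, 64)
Z = tucker_fc(A, U1, V1, W1)                     # first factorized layer
out = tucker_fc(Z, U2, V2, W2, h=torch.sigmoid)  # second layer: S x S x (5a)
```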
In step S225, the trained network is tested.
Vehicle detection is then carried out: target detection is performed on the vehicles in the video image by inputting the acquired intersection vehicle images into the trained detection network model, which outputs the coordinates of the detected vehicles in the image and the probability that each detection is a vehicle. Different thresholds can be set to tune the recognition precision according to actual requirements.
The resulting bounding boxes are then counted to obtain the number of vehicles on the lane, which serves as the control parameter of the traffic signal lamp at the crossroad; a counting sketch follows.
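A minimal post-processing sketch, assuming the raw output tensor of the detection network from step S224; the box memory layout and threshold value are assumptions:

```python
import torch

def count_vehicles(pred, conf_threshold=0.5, S=7, A=2):
    """Count detections above a confidence threshold.

    pred : tensor of shape (S, S, 5 * A), each box stored as (x, y, w, h, conf)
           with coordinates normalized to [0, 1], as in step S224.
    """
    boxes = pred.reshape(S * S * A, 5)
    keep = boxes[:, 4] > conf_threshold  # raise or lower to tune precision
    return int(keep.sum()), boxes[keep, :4]  # vehicle count and their boxes
```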
In step S3, according to the obtained control parameters, the number of vehicles in each direction of the intersection is substituted as a parameter into a pre-established intersection traffic signal timing standard on the basis of the original passing time, and the passing time of the intersection traffic signal lamps is adjusted in real time. This realizes real-time control of the traffic signal lamps and improves traffic efficiency while saving energy and reducing emissions. Specifically:
First, a standard for adjusting the timing of the traffic signal lamps at the crossroad is established.
Then, the number of vehicles on the lanes in each direction, as identified at the intersection, is substituted into the standard system, which finally outputs the required control result, namely the traffic light timing adjusted in real time according to the current number of vehicles. One possible form of such a rule is sketched below.
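The embodiment does not specify the timing standard itself; purely as an illustration, a proportional rule bounded by minimum and maximum green times might look as follows, with all constants assumed:

```python
def adjust_green_time(vehicle_counts, base_green=30.0,
                      per_vehicle=1.5, g_min=15.0, g_max=60.0):
    """Scale each approach's green time with its detected vehicle count.

    vehicle_counts : dict mapping approach name -> vehicles detected (step S2)
    base_green     : original passing time in seconds (assumed)
    per_vehicle    : extra seconds granted per queued vehicle (assumed)
    """
    mean = sum(vehicle_counts.values()) / len(vehicle_counts)
    return {
        approach: max(g_min, min(g_max, base_green + per_vehicle * (n - mean)))
        for approach, n in vehicle_counts.items()
    }

# Example: heavier north-south traffic receives longer green phases.
print(adjust_green_time({"N": 12, "S": 10, "E": 3, "W": 2}))
```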
Referring to Fig. 5, a system architecture diagram of the intelligent traffic signal control system 10 of the present invention is shown. The system comprises an image acquisition module 101, an identification module 102 and a control module 103.
The image acquisition module 101 is used to acquire road vehicle video in real time through cameras mounted on the traffic signal lamps, obtaining video image information of the road vehicles. The acquisition targets are the road and the vehicles within a certain distance in each of the four directions of the intersection, which provides analysis data for the subsequent identification. Specifically:
First, a camera is installed on each traffic signal lamp to configure the video image information acquisition device. The crossroad of this embodiment is provided with four traffic signal lamps, and video is collected through the monitoring camera arranged on each of them. Alternatively, to save cost, only two traffic signal lamps may be arranged at a crossing with light traffic; the camera installation strategy then differs, and a camera is mounted on both the front and the back of each traffic signal lamp so that each camera collects the video image information of the vehicles on its respective road.
Then, the video image information acquisition angle of the camera is adjusted so that the m lanes of oncoming traffic are presented within the video acquisition range to the greatest extent.
Finally, the video image information acquisition range of the camera is set. During acquisition, the range of each camera is limited to n meters along the lane, and the vehicles on the road within this distance are identified.
Because the acquisition range of each camera is limited during image acquisition, only the image information within the m lanes is collected. This preprocessing reduces the workload of the subsequent identification process and improves identification accuracy and speed. In Fig. 2, m is 2.
The identification module 102 is configured to detect the acquired video image information so as to identify the specific number of vehicles in each direction of the intersection, providing parameters for control of the intelligent traffic signal lamp. That is, the vehicles in each video frame image are first detected accurately in real time by the improved convolutional neural network, which draws a vehicle identification frame around each one; the identification frames in the video frame image are then counted. The specific steps are as follows:
the recognition module 102 preprocesses the acquired video image information, only reserves the information of the lane and the vehicle thereof, and uses the processed video image information for vehicle detection. The acquisition range of video image information is limited, and on the basis, the video image is preprocessed by methods such as cutting and background removal, so that the identification accuracy can be improved. The ground color after cutting is defaulted as the road ground color so as to improve the detection accuracy of the improved convolutional neural network.
The identification module 102 constructs a pre-training network model, builds a training network on the basis of the pre-training network model, pre-trains the pre-training network, initializes the training network with the pre-trained network parameters, and trains the training network with the constructed data set until the number of vehicles in the corresponding lanes of a picture can be detected in real time. In this embodiment, the improved convolutional neural network performs real-time vehicle detection on the preprocessed video image information, i.e., vehicle target detection and identification. The principle of the improved convolutional neural network is to detect video frame images within a deep network framework. The specific steps are as follows:
and selecting pictures of the intersection traffic conditions collected by the monitoring camera to construct a data set.
Related data sets are currently lacking: although data sets such as ImageNet contain some vehicle pictures, their scenes and viewing angles differ greatly from the present application. To improve the system's effectiveness, this embodiment therefore constructs a small data set itself.
Intersection traffic condition pictures are acquired through the monitoring cameras deployed at various intersections. An appropriate number of pictures is selected (1000 to 10000 in this embodiment), chosen as uniformly as possible so as to cover various scenes and traffic densities and improve the generalization capability of the system. The selected pictures are then labeled manually, marking the vehicle information on the corresponding lanes.
The pre-training network is constructed, and the training network is built on the basis of the pre-training network model.
(1) Pre-training network
In this embodiment, the pre-training network is a two-class network model in which N convolutional layers are followed by one fully connected layer. It only judges whether the picture contains a vehicle, outputting the probability that a vehicle is present and the probability that it is not. Taking the network in Fig. 3 as an example, N convolutional layers are constructed and then fully connected to an output layer with 2 nodes, forming the pre-training network model.
(2) Training network
A convolutional layer block A and a fully connected layer block B are added on top of the N pre-trained convolutional layers. To preserve the spatial structure of the data and reduce the number of network parameters, Tucker mode decomposition is applied to the fully connected layer computation, so the corresponding fully connected layer parameters become 3 factor matrices denoted U, V, W. This forms the detection network model shown in Fig. 4, which outputs an S x S x (5a) tensor corresponding to S x S grid cells in the picture: for each cell, the 4 coordinate parameters of each of its a bounding boxes and a confidence indicating whether a vehicle is present.
The pre-training network constructed above is trained.
Because deep-learning models have high-dimensional network parameters, a sufficient amount of input data is usually required to avoid overfitting. However, building a data set from scratch and collecting and labeling pictures manually is time-consuming and costly. For reasons of economy, applicability and feasibility, this embodiment therefore first pre-trains on the related pictures in the existing ImageNet and Pascal VOC data sets, and then fine-tunes on the constructed traffic intersection vehicle data set, obtaining good results in the specific scene at a lower cost.
The training network is initialized with part of the trained pre-training network parameters and is trained with the constructed data set.
(1) The training network parameters are initialized: the first N convolutional layers adopt the corresponding parameters obtained in pre-training, and the network parameters of the added convolutional layers A and fully connected layers B are then initialized randomly.
(2) The self-built traffic intersection vehicle data set is used as the training set of the network; the pictures in the data set are unified to one size specification and input into the model for training.
(3) The traffic intersection vehicle pictures are repeatedly convolved and pooled, and the last convolutional layer outputs an S x S x m tensor A. That is, the original picture is divided into an S x S grid, each grid cell corresponds to one part of the original traffic intersection picture, and the picture features in each cell correspond to one m-dimensional vector of the tensor.
(4) The 3 factor matrices U, V, W are multiplied, each along its own mode, with the convolutional layer output tensor A to obtain the core tensor B:
B = A ×1 U ×2 V ×3 W
(5) The core tensor B is input into a nonlinear activation function to find the corresponding latent vehicle features in the hidden nodes, outputting the feature tensor Z:
Z=h(B)
The activation function h(·) may be a sigmoid function, a hyperbolic tangent function, or a ReLU.
(6) The B factorized fully connected layers described above (i.e., the previous two steps repeated B times, with different parameters in each layer) output an S x S x (5a) tensor: for each grid cell, the coordinates (x, y, w, h) of each vehicle detection bounding box and the confidence that a vehicle is detected in that identification frame. Here x and y are the coordinates of the center point of the vehicle identification frame, w and h are its width and height respectively, and the coordinates are normalized to lie between 0 and 1.
(7) According to the loss function L formed by the error between the output predicted values and the true vehicle label values in the original image (a sum-of-squares error loss, described below), the network parameters are adjusted with the back propagation algorithm until a specified precision is reached, and the network parameters are then saved. Here:
The loss function is a sum-of-squares error loss comprising 3 parts: a coordinate prediction term, a confidence prediction term for identification frames containing a vehicle, and a confidence prediction term for identification frames not containing a vehicle:

$$L = \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right] + \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{a} \mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2$$

where x, y are the coordinates of the center of the vehicle identification frame and w, h are its width and height; \mathbb{1}_{ij}^{obj} indicates whether the j-th identification frame in the i-th grid cell is responsible for the detection; \mathbb{1}_{i}^{obj} indicates whether a vehicle center falls within grid cell i; \lambda_{coord} is the coordinate prediction weight; and \lambda_{noobj} is the confidence weight for identification frames that do not contain a vehicle.
The trained network is then tested.
Vehicle detection is carried out: target detection is performed on the vehicles in the video image by inputting the acquired intersection vehicle images into the trained detection network model, which outputs the coordinates of the detected vehicles in the image and the probability that each detection is a vehicle. Different thresholds can be set to tune the recognition precision according to actual requirements.
The resulting bounding boxes are then counted to obtain the number of vehicles on the lane, which serves as the control parameter of the traffic signal lamp at the crossroad.
The control module 103 is used to substitute, according to the obtained control parameters and on the basis of the original passing time, the number of vehicles in each direction of the intersection as a parameter into the pre-established intersection traffic signal timing standard, and to adjust the passing time of the intersection traffic signal lamps in real time, realizing real-time control of the traffic signal lamps and improving traffic efficiency while saving energy and reducing emissions. Specifically:
First, a standard for adjusting the timing of the traffic signal lamps at the crossroad is established.
Then, the number of vehicles on the lanes in each direction, as identified at the intersection, is substituted into the standard system, which finally outputs the required control result, namely the traffic light timing adjusted in real time according to the current number of vehicles.
Compared with traditional detection control methods, the intelligent traffic signal lamp control method and system provided by the invention have the following advantages:
(1) The invention is flexible to install and configure, low in cost, and highly applicable and easy to popularize.
(2) The improved convolutional neural network greatly improves the analysis speed and efficiency of vehicle identification and can process real-time pictures of traffic intersections. The improvement introduces Tucker decomposition into the fully connected layers of a CNN and designs the network structure for the target detection task; it can efficiently detect the vehicles in a picture in real time, distinguishes detected targets from the background well, and has the advantage of high recognition speed.
(3) The method is highly robust in practical application. The basis for adjusting the signal timing at the crossroad is the number of vehicles within a certain distance on the lanes in each direction. Because the intersection traffic signal lamps operate as a system, a certain false detection rate in the recognition algorithm is tolerable, so the system is robust.
(4) The invention simplifies complex traffic control models. With the number of vehicles in each direction of the intersection identified in real time, statistics on the vehicle identification results are added on top of vehicle identification, simplifying the traffic control model of the intersection. If the detection data of all intersections equipped with the system are further aggregated, urban traffic can easily be optimized and scheduled, simplifying the complex problem of smart urban traffic.
Although the present invention has been described with reference to the presently preferred embodiments, it will be understood by those skilled in the art that the foregoing description is illustrative only and is not intended to limit the scope of the invention, as claimed.