CN111444975B - Traffic light identification method based on image processing and deep learning - Google Patents


Info

Publication number
CN111444975B
CN111444975B (application CN202010255239.9A)
Authority
CN
China
Prior art keywords
traffic light
candidate
processing
frame
deep learning
Prior art date
Legal status
Active
Application number
CN202010255239.9A
Other languages
Chinese (zh)
Other versions
CN111444975A (en)
Inventor
车明亮
王英利
王晓文
张驰
曹鑫亮
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
Filing date
Publication date
Application filed by Nantong University
Priority to CN202010255239.9A
Publication of CN111444975A
Application granted
Publication of CN111444975B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a traffic light identification method based on image processing and deep learning, which comprises the following steps: (1) color space transformation and extraction of candidate regions; (2) target boundary processing and extraction of candidate boxes; (3) size noise processing; (4) overlap processing; (5) texture noise processing; (6) parallel classification of candidate boxes; (7) post-processing. The invention effectively improves the accuracy of traffic light identification, shortens the running time of the model, and reduces the size of the model.

Description

Traffic light identification method based on image processing and deep learning
Technical Field
The invention relates to the field of intelligent traffic, in particular to a traffic light identification method based on image processing and deep learning.
Background
Traffic light identification in urban environments is critical for vehicle control and for helping visually impaired pedestrians cross intersections. Existing research on traffic light identification falls into two broad categories: two-stage methods and one-stage methods. In the former, a proposal region is first extracted and then classified. Many methods exist for extracting proposal regions: the sliding window is the simplest but most time-consuming; faster extraction methods such as selective search and edge detection were proposed later. However, these methods are insensitive to small objects and do not segment individual targets well. Other studies have used color and shape features, map information, and similar cues to extract proposal regions, but these approaches often require strong assumptions and can produce redundant proposals, increasing runtime cost.
In the two-stage approach, typical classifiers include template matching, support vector machines, hidden Markov models, and deep learning. 1) Template matching is simple and easy to apply, but its classification accuracy is closely tied to the quality of the templates; 2) support vector machines classify well on small training sets, but become very time-consuming as the training data grows, and their accuracy is limited on multi-class problems; 3) hidden Markov models can infer the currently detected traffic light state from previously observed states, but their recognition accuracy is low; 4) convolutional neural networks in deep learning are widely used for image classification and achieve good results, but deep networks generally carry huge numbers of parameters, and their model files typically occupy large amounts of storage.
In contrast to two-stage methods, one-stage methods such as the YOLO and SSD models output the target class and its location directly, without any prior knowledge of the target position. 1) YOLO is a unified, real-time detector, but it produces more localization errors and its accuracy needs further improvement; 2) SSD completely eliminates proposal extraction and subsequent feature resampling, which makes it faster at identifying objects and easy to train and optimize, but SSD is poor at, or even incapable of, identifying small objects, and, like YOLO, its model is very large.
In general, two-stage methods achieve higher recognition accuracy at the cost of longer run times, while one-stage methods run faster but are less accurate. In both families, deep learning shows the best performance, but its models are typically very large and demand powerful hardware, which is a serious drawback for ordinary devices with limited memory and weak processors. Moreover, deep learning struggles to deliver accuracy and time efficiency at the same time.
Existing traffic light identification methods therefore still face challenges in accuracy, run time, and model size. Unless these shortcomings are addressed, existing methods cannot guarantee accuracy and timeliness at the application level; this lack of practicality limits the application and development of traffic light identification technology in related fields. The above problems need to be solved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a traffic light identification method based on image processing and deep learning, which effectively improves the accuracy of traffic light identification, shortens the running time of the model, and reduces the size of the model.
In order to solve the technical problems, the invention adopts the following technical scheme: the invention relates to a traffic light identification method based on image processing and deep learning, which is characterized by comprising the following steps:
(1) Color space transformation and extraction of candidate regions: performing color conversion on the input image, extracting the area where the traffic light is located according to the color of the traffic light, and generating a candidate area;
(2) Target boundary processing and extraction of candidate boxes: performing binarization on the generated candidate regions, estimating the contours of the traffic lights in the candidate regions with a contour detection algorithm, and calculating the corresponding bounding boxes to obtain target candidate frames;
(3) Size noise processing: adopting a traffic light size empirical estimation method to eliminate size noise from the candidate frames generated in the preceding steps;
in the above step, the calculation formula of the traffic light size empirical estimation method is:
bw′=p·y+q+t
wherein bw′ is the estimated frame width, y is the ordinate of the candidate frame, p and q are linear coefficients, and t is the tolerance; when bw is smaller or larger than bw′, the candidate frame is noise and is eliminated;
(4) Overlapping: the plurality of candidate frames are covered around the same traffic light, and overlapping candidate frames are deleted by adopting an intersection comparison method;
in the above steps, the calculation formula of the cross-correlation method is:
wherein B is a candidate frame, and subscripts i and j are numbers of the candidate frame;
(5) Texture noise processing: adopting the fractal dimension and the R ratio to eliminate texture noise from the candidate frames generated in the preceding steps;
in the above step, the fractal dimension adopts the box-counting method, with the calculation formula:
F = lim_{r→0} log N_r / log(1/r)
wherein F is the fractal dimension and N_r is the minimum number of boxes of scale r needed to cover the pattern; before calculating the F value, the image in the candidate frame is first subjected to Sobel filtering and binarization;
the calculation formula of the R ratio is:
R = N_w / (N_w + N_b)
wherein R is the ratio, and N_b and N_w denote the numbers of black and white pixels respectively;
eliminating background images that do not contain traffic lights through the F-value and R-value threshold intervals;
(6) Parallel classification of candidate boxes: after the above processing, classifying the remaining candidate frames using a classifier together with a parallelization technique, and generating classification frames;
(7) Post-processing: performing post-processing on the classification frames produced by the preceding steps to obtain an accurate classification image and an accurate target position.
Preferably, in the step (1), the color is converted from Red-Green-Blue (RGB) mode to Hue-Saturation-Value (HSV) mode; in HSV mode, the area where the traffic light is located is extracted using a color threshold method, generating candidate regions.
Preferably, in the step (1), the threshold for red is: 0 ≤ h_red1 < 15, 115 ≤ s_red1 ≤ 255, 115 ≤ v_red1 ≤ 255, and 165 ≤ h_red2 ≤ 180, 120 ≤ s_red2 ≤ 255, 90 ≤ v_red2 ≤ 255; the threshold for green is: 55 ≤ h_green ≤ 90, 60 ≤ s_green ≤ 255, and 90 ≤ v_green ≤ 255; the threshold for yellow is: 15 ≤ h_yellow ≤ 25, 195 ≤ s_yellow ≤ 255, and 205 ≤ v_yellow ≤ 255.
Preferably, in the step (2), the contour detection algorithm uses the CHAIN_APPROX_SIMPLE algorithm, and the size of the candidate frame depends on the width and height of the bounding box; the calculation formula is:
bw = c·w·k, bh = c·h·k
wherein bw and bh are the width and height of the candidate frame, respectively; w and h are the width and height of the bounding box, respectively; the parameter c is the ratio of the width of the traffic light to the width of the lamp holder frame, set to c = 2.0; the scaling factor k is used to expand the background pixel information around the traffic light, and is set to k = 1.0 to 1.5.
Preferably, in the step (6), the classifier uses the lightweight deep convolutional network SqueezeNet model; its core module is the fire module, composed of a squeeze convolutional layer and an expand layer. The input image size is set to 64 × 64 pixels, the number of channels in each layer keeps its original value, the number of classes is set to 4, and the learning rate is set to 0.0001; all input images are subjected to equalization processing before the SqueezeNet model is trained.
Preferably, in the step (6), the parallelization is implemented with a multi-process pool, using its map, close, and join methods.
Preferably, in the step (7), non-maximum suppression is used for frame optimization, and the traffic light size empirical estimation method is used for fine adjustment.
The invention has the beneficial effects that:
(1) The invention adopts image processing techniques for proposal region detection, which are simple, effective, and easy to operate, and can extract as much of the area where the traffic light is located as possible, thereby safeguarding the recognition accuracy at the proposal detection stage;
(2) The invention removes invalid proposal regions to the greatest extent through size noise processing and texture noise processing, and effectively reduces redundant proposal regions through overlap processing, thereby improving overall runtime efficiency;
(3) The invention classifies with the lightweight deep convolutional neural network SqueezeNet model, which minimizes the model size while preserving the classification accuracy of the proposal regions;
(4) The invention adopts parallelization in the classification stage, which effectively shortens the time for classifying proposal regions and thus reduces the total running time cost of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a traffic light recognition method based on image processing and deep learning.
Fig. 2 is a schematic diagram of a candidate box detection process in an embodiment of the invention.
Fig. 3 is a diagram of the actual effect of an embodiment of the invention on traffic light identification on different data sets.
Detailed Description
The technical scheme of the present invention will be clearly and completely described in the following detailed description.
The invention relates to a traffic light identification method based on image processing and deep learning, which is shown in fig. 1 and comprises the following steps:
(1) Color space transformation and extraction of candidate regions: performing color conversion on the input image (shown in fig. 2a), extracting the area where the traffic light is located according to its color, and generating candidate regions;
in the above step, the color conversion is mainly from Red-Green-Blue (RGB) mode to Hue-Saturation-Value (HSV) mode, but is not limited thereto;
in the above step, the specific traffic light colors are red, green and yellow; in HSV mode, the area where the traffic light is located is extracted using a color threshold method, generating candidate regions; the threshold for red is: 0 ≤ h_red1 < 15, 115 ≤ s_red1 ≤ 255, 115 ≤ v_red1 ≤ 255, and 165 ≤ h_red2 ≤ 180, 120 ≤ s_red2 ≤ 255, 90 ≤ v_red2 ≤ 255; the threshold for green is: 55 ≤ h_green ≤ 90, 60 ≤ s_green ≤ 255, and 90 ≤ v_green ≤ 255; the threshold for yellow is: 15 ≤ h_yellow ≤ 25, 195 ≤ s_yellow ≤ 255, and 205 ≤ v_yellow ≤ 255.
The color thresholds are obtained mainly from statistics over traffic light training samples; in practice they may be adjusted slightly so as to extract as many traffic light areas as possible, as shown in fig. 2b.
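For illustration, a minimal Python/OpenCV sketch of this step, assuming the thresholds above; the function name and the use of cv2.inRange are choices of this sketch rather than part of the patent (note that OpenCV stores hue on a 0–180 scale, matching the stated ranges):

```python
import cv2

def extract_candidate_mask(bgr_image):
    """Extract traffic-light-colored pixels via HSV thresholding (step 1)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    ranges = [
        ((0, 115, 115), (14, 255, 255)),    # red, low hue band
        ((165, 120, 90), (180, 255, 255)),  # red, high hue band
        ((55, 60, 90), (90, 255, 255)),     # green
        ((15, 195, 205), (25, 255, 255)),   # yellow
    ]
    mask = None
    for lo, hi in ranges:
        m = cv2.inRange(hsv, lo, hi)
        mask = m if mask is None else cv2.bitwise_or(mask, m)
    return mask  # binary mask of candidate regions
```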
(2) Target boundary processing and extraction of candidate boxes: performing binarization on the generated candidate regions, estimating the contours of the traffic lights with a contour detection algorithm, and calculating the corresponding bounding boxes to obtain target candidate frames, as shown in fig. 2c;
in the above steps, the contour detection algorithm mainly uses the CHAIN_APPROX_SIMPLE algorithm, but is not limited thereto;
in the above step, the size of the candidate frame depends on the width and height of the bounding box; the calculation formula is:
bw = c·w·k, bh = c·h·k
wherein bw and bh are the width and height of the candidate frame, respectively; w and h are the width and height of the bounding box, respectively; the parameter c is the ratio of the width of the traffic light to the width of the lamp holder rim, typically set to c = 2.0; the scaling factor k is used to expand the background pixel information around the traffic light and is set to k = 1.0 to 1.5, which helps determine whether the candidate area contains a traffic light.
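A sketch of step (2); centering the candidate frame on the detected lamp and the bw = c·w·k, bh = c·h·k scaling follow the formula above, and the helper name is an assumption of this sketch:

```python
def extract_candidate_boxes(mask, c=2.0, k=1.2):
    """Derive target candidate frames from the binary mask (step 2)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)  # bounding box of the lit lamp
        # Expand the lamp box into a candidate frame that also covers the
        # holder and some background context.
        bw, bh = int(c * w * k), int(c * h * k)
        cx, cy = x + w // 2, y + h // 2
        boxes.append((cx - bw // 2, cy - bh // 2, bw, bh))
    return boxes
```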
(3) Size noise processing: some candidate frames generated above contain traffic lights while others are size noise; a traffic light size empirical estimation method is adopted to eliminate the size noise;
in the above step, size noise refers to candidate boxes that can clearly be ruled out as traffic lights because of inappropriate size;
in the above step, the traffic light size empirical estimation follows the perspective rule that nearer objects appear larger and farther objects appear smaller; the calculation formula is:
bw′=p·y+q+t
wherein bw′ is the estimated frame width, y is the ordinate of the candidate frame, p and q are linear coefficients, and t is the tolerance; when bw is smaller or larger than bw′, the candidate frame is noise. This process eliminates noise frames that are noticeably smaller or larger than the normal size, and also removes some otherwise hard-to-distinguish interference, such as traffic light countdown timers, as shown in fig. 2d.
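A sketch of the size filter, reading the tolerance t as a symmetric band around the linear estimate p·y + q; the coefficients p, q and t would be fitted from training data:

```python
def filter_size_noise(boxes, p, q, t):
    """Remove boxes whose width deviates from the empirical estimate (step 3)."""
    kept = []
    for (x, y, bw, bh) in boxes:
        bw_est = p * y + q  # empirical width for a box at ordinate y
        if abs(bw - bw_est) <= t:
            kept.append((x, y, bw, bh))
    return kept
```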
(4) Overlapping: in the above step, a plurality of candidate frames may exist around the same traffic light to overlap each other, and overlapping candidate frames are deleted by an overlap comparison method;
in the above steps, the calculation formula of the cross-correlation method is:
wherein B is a candidate frame, and subscripts i and j are numbers of the candidate frame; as shown in fig. 2e, a large number of redundant overlapping candidate boxes are eliminated through this process.
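A sketch of the overlap processing; the 0.5 IoU cutoff is an assumed value, as the patent does not state the threshold:

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def remove_overlaps(boxes, thresh=0.5):
    """Delete candidate frames that overlap an already-kept frame (step 4)."""
    kept = []
    for b in boxes:
        if all(iou(b, k) < thresh for k in kept):
            kept.append(b)
    return kept
```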
(5) Texture noise processing: some candidate frames remaining after the above steps are texture noise; the fractal dimension and the R ratio are adopted to eliminate it;
in the above step, texture noise refers to candidate boxes whose texture clearly rules them out as traffic lights;
in the above step, the fractal dimension is computed with the box-counting method:
F = lim_{r→0} log N_r / log(1/r)
wherein F is the fractal dimension and N_r is the minimum number of boxes of scale r needed to cover the pattern; before calculating the F value, the image in the candidate frame is first subjected to Sobel filtering and binarization;
in the above step, the R ratio is the ratio of white pixels to total pixels in the binary image:
R = N_w / (N_w + N_b)
wherein R is the ratio, and N_b and N_w denote the numbers of black and white pixels respectively;
texture differences between traffic light images and non-traffic light images can be quantified through the F value and the R value, and background images which do not contain traffic lights can be effectively removed through the corresponding threshold intervals, as shown in fig. 2F.
(6) Parallel classification of candidate boxes: after the above processing, the remaining candidate frames are classified using a classifier together with a parallelization technique, generating classification frames tagged with class labels;
in the above step, the candidate frame states mainly include background, red light, green light and yellow light, but are not limited thereto;
in the above step, the classifier adopts the lightweight deep convolutional network SqueezeNet model; its core module is the fire module, composed of a squeeze convolutional layer and an expand layer, which lets the network maintain comparable accuracy with a limited number of parameters; the input image size of the SqueezeNet model is set to 64 × 64 pixels, the number of channels in each layer keeps its original value, and the number of classes is set to 4, but it is not limited thereto; the learning rate is set to 0.0001, but is not limited thereto; before training the SqueezeNet model, all input images are subjected to equalization processing;
in the above step, the parallelization is implemented with a multi-process pool, using its map, close and join methods, which shortens the time spent classifying the candidate frames.
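A minimal sketch of the parallel classification driver, assuming Python's multiprocessing.Pool (whose map, close and join methods match the "mapping", "ending" and "joining" methods named above); classify_patch is a placeholder for the trained SqueezeNet inference, not part of the patent:

```python
from multiprocessing import Pool

def classify_patch(patch):
    """Stand-in for SqueezeNet inference on one 64x64 candidate patch.

    A real implementation would run the trained SqueezeNet model here and
    return one of: background, red light, green light, yellow light.
    """
    return "background"  # placeholder result

def classify_candidates(patches, workers=4):
    """Classify candidate patches in parallel with a process pool (step 6)."""
    pool = Pool(processes=workers)
    labels = pool.map(classify_patch, patches)  # the "mapping" method
    pool.close()                                # the "ending" method
    pool.join()                                 # the "joining" method
    return labels
```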
(7) Post-processing: performing post-processing on the classification frames produced above to obtain an accurate classification image and an accurate target position.
In the above step, the classification frames may still overlap to some degree, so non-maximum suppression is used for frame optimization; meanwhile, to further refine the classification box boundary, it is fine-tuned using the traffic light size empirical estimation method.
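A sketch of the non-maximum suppression step, reusing the iou helper from the step (4) sketch; the 0.5 threshold is again an assumed value:

```python
def non_max_suppression(boxes, scores, thresh=0.5):
    """Keep the highest-scoring frame among mutually overlapping ones (step 7)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```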
The actual effect of traffic light identification on different data sets is shown in fig. 3, where panel a shows a self-collected data set and panel b the LaRA data set.
Data verification shows that the invention has advantages in traffic light recognition performance: 1) in recognition accuracy, the invention achieves high precision and recall; 2) in time efficiency, classification with the classifier plus parallelization runs at more than 20 frames per second overall, versus 5–10 frames per second with a single process on a single-core CPU; 3) in model size, candidate frame detection uses image processing techniques whose code is very small, below 1 MB, while candidate frame classification uses the lightweight SqueezeNet model whose code and weight file together occupy less than 8 MB; the total file footprint of the invention is therefore under 10 MB.
The above embodiments merely illustrate preferred embodiments of the present invention, which is not limited thereto. Various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from its design concept shall fall within the protection scope of the present invention; the claimed technical content is fully described in the claims.

Claims (7)

1. The traffic light identification method based on image processing and deep learning is characterized by comprising the following steps of:
(1) Color space transformation and extraction of candidate regions: performing color conversion on the input image, extracting the area where the traffic light is located according to the color of the traffic light, and generating a candidate area;
(2) Target boundary processing and extraction of candidate boxes: performing binarization on the generated candidate regions, estimating the contours of the traffic lights in the candidate regions with a contour detection algorithm, and calculating the corresponding bounding boxes to obtain target candidate frames;
(3) Size noise processing: adopting a traffic light size empirical estimation method to eliminate size noise from the candidate frames generated in the preceding steps;
in the above step, the calculation formula of the traffic light size empirical estimation method is:
bw′ = p·y + q + t
wherein bw′ is the estimated frame width, y is the ordinate of the candidate frame, p and q are linear coefficients, and t is the tolerance; when bw is smaller or larger than bw′, the candidate frame is noise and is eliminated;
(4) Overlapping: the plurality of candidate frames are covered around the same traffic light, and overlapping candidate frames are deleted by adopting an intersection comparison method;
in the above steps, the calculation formula of the cross-correlation method is:
wherein B is a candidate frame, and subscripts i and j are numbers of the candidate frame;
(5) Texture noise processing: adopting the fractal dimension and the R ratio to eliminate texture noise from the candidate frames generated in the preceding steps;
in the step (5), the fractal dimension adopts the box-counting method, with the calculation formula:
F = lim_{r→0} log N_r / log(1/r)
wherein F is the fractal dimension and N_r is the minimum number of boxes of scale r needed to cover the pattern; before calculating the F value, the image in the candidate frame is first subjected to Sobel filtering and binarization;
the calculation formula of the R ratio is:
R = N_w / (N_w + N_b)
wherein R is the ratio, and N_b and N_w denote the numbers of black and white pixels respectively;
eliminating background images that do not contain traffic lights through the F-value and R-value threshold intervals;
(6) Parallel classification of candidate boxes: after the above processing, classifying the remaining candidate frames using a classifier together with a parallelization technique, and generating classification frames;
(7) Post-processing: performing post-processing on the classification frames produced by the preceding steps to obtain an accurate classification image and an accurate target position.
2. The traffic light identification method based on image processing and deep learning according to claim 1, wherein: in the step (1), the color is converted from Red-Green-Blue (RGB) mode to Hue-Saturation-Value (HSV) mode; in HSV mode, the area where the traffic light is located is extracted using a color threshold method, generating candidate regions.
3. The traffic light identification method based on image processing and deep learning according to claim 2, wherein: in the step (1), the threshold for red is: 0 ≤ h_red1 < 15, 115 ≤ s_red1 ≤ 255, 115 ≤ v_red1 ≤ 255, and 165 ≤ h_red2 ≤ 180, 120 ≤ s_red2 ≤ 255, 90 ≤ v_red2 ≤ 255; the threshold for green is: 55 ≤ h_green ≤ 90, 60 ≤ s_green ≤ 255, and 90 ≤ v_green ≤ 255; the threshold for yellow is: 15 ≤ h_yellow ≤ 25, 195 ≤ s_yellow ≤ 255, and 205 ≤ v_yellow ≤ 255.
4. The traffic light identification method based on image processing and deep learning according to claim 1, wherein: in the step (2), the contour detection algorithm adopts the CHAIN_APPROX_SIMPLE algorithm, and the size of the candidate frame depends on the width and height of the bounding box; the calculation formula is:
bw = c·w·k, bh = c·h·k
wherein bw and bh are the width and height of the candidate frame, respectively; w and h are the width and height of the bounding box, respectively; the parameter c is the ratio of the width of the traffic light to the width of the lamp holder frame, set to c = 2.0; the scaling factor k is used to expand the background pixel information around the traffic light, and is set to k = 1.0 to 1.5.
5. The traffic light identification method based on image processing and deep learning according to claim 1, wherein: in the step (6), the classifier adopts the lightweight deep convolutional network SqueezeNet model; its core module is the fire module, composed of a squeeze convolutional layer and an expand layer; the input image size is set to 64 × 64 pixels, the number of channels in each layer keeps its original value, the number of classes is set to 4, the learning rate is set to 0.0001, and all input images are subjected to equalization processing before the SqueezeNet model is trained.
6. The traffic light identification method based on image processing and deep learning according to claim 1, wherein: in the step (6), the parallelization is implemented with a multi-process pool, using its map, close, and join methods.
7. The traffic light identification method based on image processing and deep learning according to claim 1, wherein: in the step (7), non-maximum suppression is used for frame optimization, and the traffic light size empirical estimation method is used for fine adjustment.
CN202010255239.9A 2020-04-02 2020-04-02 Traffic light identification method based on image processing and deep learning Active CN111444975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010255239.9A CN111444975B (en) 2020-04-02 2020-04-02 Traffic light identification method based on image processing and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010255239.9A CN111444975B (en) 2020-04-02 2020-04-02 Traffic light identification method based on image processing and deep learning

Publications (2)

Publication Number Publication Date
CN111444975A CN111444975A (en) 2020-07-24
CN111444975B true CN111444975B (en) 2024-02-23

Family

ID=71649681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010255239.9A Active CN111444975B (en) 2020-04-02 2020-04-02 Traffic light identification method based on image processing and deep learning

Country Status (1)

Country Link
CN (1) CN111444975B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808117A (en) * 2021-09-24 2021-12-17 北京市商汤科技开发有限公司 Lamp detection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study
CN108921875A (en) * 2018-07-09 2018-11-30 哈尔滨工业大学(深圳) A kind of real-time traffic flow detection and method for tracing based on data of taking photo by plane
CN109325438A (en) * 2018-09-18 2019-02-12 桂林电子科技大学 The real-time identification method of live panorama traffic sign

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study
CN108921875A (en) * 2018-07-09 2018-11-30 哈尔滨工业大学(深圳) A kind of real-time traffic flow detection and method for tracing based on data of taking photo by plane
CN109325438A (en) * 2018-09-18 2019-02-12 桂林电子科技大学 The real-time identification method of live panorama traffic sign

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Jianming et al. A forward vehicle detection and tracking method combining multiple features. Computer Engineering and Applications. 2011, pp. 220–223. *
Jin Chongchong et al. Traffic light localization and recognition algorithm in haze weather. Journal of Zhejiang Wanli University. Vol. 31, pp. 61–68. *

Also Published As

Publication number Publication date
CN111444975A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN109840521B (en) Integrated license plate recognition method based on deep learning
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN113221639B (en) Micro-expression recognition method for representative AU (AU) region extraction based on multi-task learning
CN109190752A (en) The image, semantic dividing method of global characteristics and local feature based on deep learning
CN109919159A (en) A kind of semantic segmentation optimization method and device for edge image
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN109446922B (en) Real-time robust face detection method
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN111914698A (en) Method and system for segmenting human body in image, electronic device and storage medium
CN108734200B (en) Human target visual detection method and device based on BING (building information network) features
CN109034066A (en) Building identification method based on multi-feature fusion
CN111914839A (en) Synchronous end-to-end license plate positioning and identifying method based on YOLOv3
CN112068555A (en) Voice control type mobile robot based on semantic SLAM method
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN103679187A (en) Image identifying method and system
CN105893941B (en) A kind of facial expression recognizing method based on area image
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN109543498B (en) Lane line detection method based on multitask network
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111444975B (en) Traffic light identification method based on image processing and deep learning
Xu et al. License plate recognition system based on deep learning
CN111275732B (en) Foreground object image segmentation method based on depth convolution neural network
CN109117841B (en) Scene text detection method based on stroke width transformation and convolutional neural network
DeRong et al. Remote traffic light detection and recognition based on deep learning
CN113610088A (en) Self-attention-based license plate character recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant