CN110826457A

CN110826457A - Vehicle detection method and device under complex scene

Info

Publication number: CN110826457A
Application number: CN201911050728.4A
Authority: CN
Inventors: 张焕芹; 罗国慧; 毛士杰
Original assignee: Science And Technology Ltd Of Upper Hiroad Army
Current assignee: Science And Technology Ltd Of Upper Hiroad Army
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2020-02-21
Anticipated expiration: 2039-10-31
Also published as: CN110826457B

Abstract

The invention discloses a vehicle detection method and a device under a complex scene, which comprises the following steps: processing the self-collected vehicle image according to the generated countermeasure network to generate a transformed vehicle image; generating an on-line shielded vehicle image by performing self-adaptive shielding on the vehicle image, and expanding the on-line shielded vehicle image on line to form a shielding module; processing the vehicle image after online expansion by utilizing a multi-scale feature extraction technology to obtain region candidate frames with different scales, fusing feature information with different scales, inputting the feature information into a target detection network, and performing target classification and accurate frame regression to obtain a first vehicle detection network; adding a shielding module consisting of an on-line shielded vehicle image expansion network, and training a first vehicle detection network to form a second vehicle detection network; and detecting the vehicle by utilizing a second vehicle detection network. The invention improves the detection precision and is more suitable for the detection of the shielded target.

Description

Vehicle detection method and device under complex scene

Technical Field

The invention relates to the technical field of vehicle detection, in particular to a vehicle detection method and device under a complex scene.

Background

In both the traditional vision field and the computer vision field, target detection and identification are always one of the most fierce and hot tasks. The target detection task is to classify and identify all targets in one image, and can accurately position different targets with any shape and size at any position in the image. With the continuous development of deep learning technology in recent years, the target detection and identification algorithm is also shifted to the detection technology based on the deep neural network from the traditional detection algorithm based on manual features. On the basis of the target detection technology based on deep learning, various new methods are emerging continuously, and the method can be divided into a one-stage series method based on a region suggestion box and a two-stage series method based on regression according to the processing mode.

The two-stage series method is mainly divided into two steps to complete the target detection task. Firstly, automatically selecting a region with a possibility of having a target in an image to generate a region candidate frame, wherein the process mainly comprises the steps of generating a series of frames with different sizes and scales, extracting features according to the image in the frames, carrying out background-foreground secondary classification, screening the frames classified as foreground, and generating a final candidate frame; then, the image in the candidate frame is judged according to the detailed category and the position of the candidate frame is adjusted. The detection precision is high, but the detection speed is reduced. The method for detecting the small dense targets in the image in the first stage does not need a network to extract candidate frames in advance, the classification and positioning are uniformly regarded as the regression problem of the image to solve, and compared with a method in a second stage series, the method has the advantages of obvious speed and higher detection precision, but the method has a poor effect on detecting the small dense targets.

Disclosure of Invention

The invention provides a vehicle detection method and a device under a complex scene aiming at the problems in the prior art, and compared with a public data set, the method and the device have diversity and complexity, the detection precision is greatly improved, the method and the device are more suitable for detecting the shielded target, and the detection effect on the small target is improved through a multi-scale fused target detection network.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention provides a vehicle detection method under a complex scene, which comprises the following steps:

s11: processing the self-collected vehicle image according to the generated countermeasure network to generate a transformed vehicle image;

s12: inputting the transformed vehicle image in the S11 into a target detection network, generating an online occluded vehicle image by performing adaptive occlusion on the input image, and performing online expansion on the transformed vehicle image in the S11 according to an online occluded vehicle image expansion technology to form an occlusion module;

s13: processing the vehicle image after online expansion in the S12 by utilizing a multi-scale feature extraction technology to obtain region candidate frames with different scales, obtain feature layers with different scales, fusing feature information in the feature layers with different scales, inputting the fused feature information into the target detection network in the S12, and performing target classification and accurate frame regression to obtain a first vehicle detection network;

s14: adding an occlusion module consisting of an online occluded vehicle image expansion network in the S12, training the first vehicle detection network formed in the S13, and returning data of the generated training sample to form a second vehicle detection network;

s15: and detecting the vehicle by using the second vehicle detection network formed by the S14.

Preferably, the S11 further includes: and carrying out de-duplication processing on the training samples in the vehicle detection network.

Preferably, the adaptively blocking the input image in S12 further includes: adding a full connection layer and a shielding mask layer on the last layer of feature map of the target detection network; and carrying out convolution on the occlusion mask layer and the feature map to generate an occlusion feature map, training the target detection network by taking the occlusion feature map as input so as to continuously learn how to occlude the image, and mapping the optimal occlusion feature map generated by the trained online difficult sample generation network back to a sample generated by an original image, namely the difficult sample in the training process of the target detection network.

Preferably, the feature layers of different scales in S13 include: the VGG16 convolutional neural network comprises four feature layers with different scales, a large target for predicting the feature information of a high layer is realized, and a small target for predicting the feature information of a bottom layer is realized.

Preferably, the S14 further comprises: and a shielding module formed by an online shielded vehicle image expansion network is used for respectively shielding the characteristic layer of each scale partially, so that self-adaptive shielding of targets of different scales is realized, shielding of a large target on a high-resolution characteristic layer is realized, and shielding of a small target on a low-resolution characteristic layer is realized.

The invention also provides a vehicle detection device under the complex scene, which comprises: the system comprises a self-collected vehicle image processing module, a shielding module generating module, a first vehicle detection network forming module, a second vehicle detection network forming module and a vehicle detection module; wherein the content of the first and second substances,

the self-collected vehicle image processing module is used for processing the self-collected vehicle images according to the generated countermeasure network to generate transformed vehicle images;

the shielding module generation module is used for inputting the vehicle images which are converted in the self-collected vehicle image processing module into a target detection network, generating on-line shielded vehicle images by carrying out self-adaptive shielding on the input images, and carrying out on-line expansion on the vehicle images which are converted in the self-collected vehicle image processing module according to an on-line shielded vehicle image expansion technology to form a shielding module;

the first vehicle detection network forming module is used for processing the vehicle image after online expansion in the shielding module generating module by utilizing a multi-scale feature extraction technology to obtain region candidate frames with different scales, obtain feature layers with different scales, fuse feature information in the feature layers with different scales, input the fused feature information into a target detection network in the shielding module generating module, and perform target classification and accurate frame regression to obtain a first vehicle detection network;

the second vehicle detection network forming module is used for adding a shielding module which is formed by an online shielded vehicle image expansion network in the shielding module generating module, training the first vehicle detection network formed by the first vehicle detection network forming module, and returning data of the generated training sample to form a second vehicle detection network;

the vehicle detection module is used for detecting the vehicle by utilizing the second vehicle detection network formed by the second vehicle detection network forming module.

Preferably, the self-collected vehicle image processing module is further configured to perform deduplication processing on training samples in the vehicle detection network.

Preferably, the second vehicle detection network forming module is further configured to add a full connection layer and a shielding mask layer on the last layer of feature map of the target detection network; and carrying out convolution on the occlusion mask layer and the feature map to generate an occlusion feature map, training the target detection network by taking the occlusion feature map as input so as to continuously learn how to occlude the image, and mapping the optimal occlusion feature map generated by the trained online difficult sample generation network back to a sample generated by an original image, namely the difficult sample in the training process of the target detection network.

Preferably, the different-scale feature layers in the first vehicle detection network formation module include: the VGG16 convolutional neural network comprises four feature layers with different scales, a large target for predicting the feature information of a high layer is realized, and a small target for predicting the feature information of a bottom layer is realized.

Preferably, the second vehicle detection network forming module is further configured to perform partial shielding on the feature layer of each scale by using a shielding module formed by an online shielded vehicle image expansion network, so as to implement adaptive shielding on targets of different scales, further implement shielding on a large target on a high-resolution feature layer, and shield a small target on a low-resolution feature layer.

Compared with the prior art, the invention has the following advantages:

(1) according to the vehicle detection method and device under the complex scene, the self-built data set contains a large number of images generated through the countermeasure network and the shielded target images, so that the method and device are more diverse and complex compared with the public data set, the detection precision is greatly improved, and the method and device are more suitable for detecting the shielded target;

(2) according to the vehicle detection method and device under the complex scene, disclosed by the invention, through multi-scale feature fusion, a large target can be predicted by using high-level feature information with a large receptive field, and a small target can be predicted by using bottom-level feature information with a small receptive field, so that fusion of high-level semantic information and bottom-level detail information is completed, and the detection precision is further improved;

(3) according to the vehicle detection method and device under the complex scene, the shielding module formed by the on-line shielded vehicle image expansion network is used for respectively shielding the characteristic layer of each scale partially, and due to the fact that different characteristic layers have different target sensitivities, the shielding of a large target on a high-resolution characteristic layer and the shielding of a small target on a low-resolution characteristic layer can be achieved, and the detection performance of shielded vehicles is improved.

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

Embodiments of the invention are further described below with reference to the accompanying drawings:

FIG. 1 is a flow chart of a vehicle detection method in a complex scenario according to an embodiment of the present invention;

FIG. 2 is a vehicle image before the generation of the countermeasure network augmentation based on deep convolution;

FIG. 3 is a corresponding image 1 of random noise generation;

FIG. 4 is a corresponding image 2 of random noise generation;

FIG. 5 is an online occluded original vehicle image based on a reinforcement learning mechanism;

FIG. 6 is an occlusion gray scale map automatically generated by the network;

FIG. 7 is the occlusion result;

FIG. 8 is an original image of the occluded large target;

FIG. 9 shows the detection results of a conventional SSD 300;

FIG. 10 shows the results of a conventional YOLOv3 assay;

FIG. 11 shows the results of the conventional fast R-CNN assay;

FIG. 12 shows the results of the conventional Cascade R-CNN assay;

FIG. 13 is a detection result of a vehicle detection method in a complex scenario according to an embodiment of the present invention;

FIG. 14 is an original image of occluded small targets;

FIG. 15 shows the detection results of a conventional SSD 300;

FIG. 16 shows the results of a conventional YOLOv3 test;

FIG. 17 shows the results of the conventional fast R-CNN assay;

FIG. 18 shows the results of the conventional Cascade R-CNN assay;

fig. 19 shows a detection result of the vehicle detection method in a complex scene according to an embodiment of the present invention.

Detailed Description

The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.

Fig. 1 is a flowchart illustrating a vehicle detection method in a complex scenario according to an embodiment of the present invention.

Referring to fig. 1, the vehicle detection method in the complex scene of the present embodiment includes the following steps:

s12: inputting the transformed vehicle image in S11 into a target detection network, generating an online occluded vehicle image by performing adaptive occlusion on the input image, and performing online expansion on the transformed vehicle image in S11 according to an online occluded vehicle image expansion technology to form an occlusion module;

s13: processing the vehicle image after online expansion in the S12 by utilizing a multi-scale feature extraction technology to obtain region candidate frames with different scales, obtain feature layers with different scales, fusing feature information in the feature layers with different scales, inputting the fused feature information into a target detection network in the S12, and performing target classification and accurate frame regression to obtain a first vehicle detection network;

s15: the vehicle is detected using the second vehicle detection network formed at S14.

In an embodiment, S11 specifically includes: and the vehicle image expansion of the countermeasure network is generated based on the depth convolution, so that stable training can be completed on a deeper generation model with higher resolution. Another implementation is vehicle image augmentation based on a cyclic generation countermeasure network, which can achieve inter-transformation between images of two domains.

In an embodiment, S12 specifically includes: the method comprises the steps of generating an optimal shielding mask for a target in an image, realizing online expansion of difficult samples, wherein the samples can cause false detection of a target detection network; meanwhile, the difficult samples can be sent to a target detection network again for training, and the detection performance of the difficult samples is improved. The two networks are mutually confronted and learned, so that the performance of the target detection network is improved while a large number of difficult samples are generated.

In an embodiment, S13 specifically includes: the target detection is carried out by adopting four feature maps with different scales in a VGG16 network, a large object is predicted by using high-level feature information with a large receptive field, and a small target is predicted by using low-level feature information with a small receptive field. The invention introduces a multi-scale feature fusion module to complete the fusion of high-level semantic information and low-level detail information and improve the task of detection precision.

In an embodiment, S14 specifically includes: and an occlusion mask module is introduced to further train the network, so that the problem of detection of occluded targets is solved, and the target detection precision is further improved. The network is respectively applied to four different feature layers in parallel, partial shielding is carried out on the feature map on each scale, self-adaptive shielding of targets with different scales is achieved, meanwhile, data returning is carried out on the difficult samples generated on line, and the detection capability of the network on the shielded targets is improved. The network is distributed over four profiles in parallel. Because the four characteristic layers have different sensibility to the targets with different scales in the original image and the four shielding mask layers act independently, the module can shield the large target on the high-resolution characteristic layer and shield the small target on the low-resolution characteristic layer.

In a preferred embodiment, S11 specifically includes: the size of an original data collection and interception target is 96 multiplied by 48, all training data do not need to be preprocessed, the size of an image output generated by a generation network G is converted to [ -1,1], a mini-batch SGD is adopted in the network training process to carry out gradient descent training, the size of the image batch input into the network each time is 128, all input parameters adopt an initialization mode that the average value is 0 and the normalization is 0.02. And (3) carrying out network optimization by using a modified Adam optimizer, carrying out fine adjustment on parameters, wherein the learning rate is modified from 0.001 to 0.0002, and carrying out deduplication processing on the training sample in order to prevent overfitting of the model, namely that the model remembers some simple input features to generate similar pictures, namely images with higher similarity in the training sample are removed.

In a preferred embodiment, S12 specifically includes: the network structure is that a full connection layer and a shielding mask layer are added on the last layer of feature map of the target detection network, and for an input original vehicle image, the original feature map is directly shielded by using the binary shielding mask layer, so that a corresponding feature map is formed. Its loss function expression

Wherein i and j represent horizontal and vertical coordinates on the feature map respectively, n represents the number of training sample pairs, d represents the dimension of the feature map, and X^pRepresents the p-th image of the n original images, and A () represents the characteristic image X of the network pair^pThe occlusion feature map obtained by the occlusion operation is carried out,

and (i, j) position output representing the binary shielding mask is in a value range of 0 or 1.

In a preferred embodiment, S13 specifically includes: and performing 3 × 3 convolution operation with the stride of 1 twice on the features in the candidate frame, and further performing feature extraction while reducing the scale of the feature map. 4 x 4 deconvolution with step of 2 is carried out on the characteristic information of the high-level candidate frame^[55]And operation, generating a feature map with the same size as the previous layer, summing based on pixel values, adding an activation function and convolution operation to the summation result, and ensuring that the fused features have identifiability. The size of the deconvolution kernel is set according to the image sizes of the four characteristic layers of the VGG network. Assuming that the sizes of the feature maps of the last two layers are m × m and n × n respectively, the sizes of the feature maps generated after two times of 3 × 3 convolution with the step size of 1 are (m-2) × (m-2) and (n-2) × (n-2), and the size of the feature map obtained by deconvolution of the high-layer features with the step size of 4 × 4 of 2 is (2n-2) × (2 n-2). Because the sizes of the four feature maps in the selected VGG network are obtained through the pooling operation in sequence, if m is 2n, the feature maps with the same scale are generated through two operations after the deconvolution of the high-level features and the convolution of the low-level features, and the feature maps can be directly summed based on the pixel values.

In a preferred embodiment, S14 specifically includes: the shielding and shielding mask module is used for pre-training other networks, fixing parameters, accessing the shielding and shielding mask module, carrying out forward propagation on all data in the data set, wherein the training process is the same as that of the previous chapter, training is carried out by utilizing the image and shielding mask data pair, the parameters of the shielding and shielding mask module are trained independently in the process, and the two networks are jointly trained. After the input image passes through the shielding mask module, a shielding characteristic graph is generated, the input image is distinguished by using a loss function, a classification result is compared with a GT label of the input image, back propagation is carried out, and all parameters in the two networks are finely adjusted.

In order to verify that the method provided by the embodiment is more suitable for detecting an occluded target and to prove the effectiveness of the self-established data set of the embodiment, a network model trained by the self-established data set of the embodiment is compared and evaluated with a network model trained by public data sets (VOC07+12 and COCO), wherein the comparison algorithms are SSD300, YOLov3, Faster R-CNN (VGG-16 framework) and Cascade R-CNN.

TABLE 1 training results for different data sets for different methods

As can be seen from the data in table 1, the detection accuracy of different methods in the COCO data set is much lower than that in the VOC data set because the COCO data set contains images acquired in more categories of real scenes relative to the VOC data set, and at the same time, there are many images of each target and many targets in each image, so the network model obtained by training the COCO data set has much lower detection accuracy relative to the network model obtained by training the VOC data set; because the self-built data set of the invention is provided with a large number of images generated by GAN and occluded target images, the detection precision of the comparison method under the data set is generally reduced compared with the COCO data set, but the method of the invention is greatly improved, which shows that the self-built data set of the invention has more diversity and complexity compared with the public data set, and the method of the invention is more suitable for detecting the occluded target.

For a more intuitive display, reference is made to fig. 2-19, which are illustrations of an example process for vehicle detection in complex scenes using the above method, wherein: FIG. 2 is an example of the generation of an image of a vehicle prior to augmentation of a countermeasure network based on deep convolution; fig. 3 is a corresponding image 1 of random noise generation in this example, which increases the difficulty of detection compared to fig. 2. Fig. 4 is an image 2 of random noise generation in this example, which is more difficult to detect than the random noise generated in fig. 3.

FIG. 5 is an online occluded original vehicle image based on a reinforcement learning mechanism in this example; FIG. 6 is an occlusion gray scale map automatically generated by the network in this example; FIG. 7 is the occlusion result for this example, which generates the optimal occlusion image produced by the network for online difficult samples. Fig. 8 is an original image of the occlusion large object detection in this example.

For comparison with the prior art method, fig. 9 shows the detection result of the prior art SSD 300; FIG. 10 shows the results of a conventional YOLOv3 assay; FIG. 11 shows the results of the conventional fast R-CNN assay; FIG. 12 shows the results of the conventional Cascade R-CNN assay; FIG. 13 is a detection result of a vehicle detection method in a complex scenario according to an embodiment of the present invention; as can be seen directly from the comparison, the embodiment of the invention has better detection effect on the shielded vehicle and higher detection precision.

FIG. 14 is an original image of occluded small targets; FIG. 15 shows the detection results of a conventional SSD 300; FIG. 16 shows the results of a conventional YOLOv3 test; FIG. 17 shows the results of the conventional fast R-CNN assay; FIG. 18 shows the results of the conventional Cascade R-CNN assay; FIG. 19 is a detection result of a vehicle detection method in a complex scenario according to an embodiment of the present invention; as can be seen directly from the comparison, the embodiment of the invention has better detection effect on small targets.

In another embodiment, the present invention further provides a vehicle detection apparatus in a complex scene, which is used to implement the vehicle detection method of the foregoing embodiment, and includes: the system comprises a self-collected vehicle image processing module, a shielding module generating module, a first vehicle detection network forming module, a second vehicle detection network forming module and a vehicle detection module; wherein the content of the first and second substances,

the shielding module generation module is used for inputting the vehicle images which are subjected to conversion in the self-collected vehicle image processing module into a target detection network, generating on-line shielded vehicle images by carrying out self-adaptive shielding on the input images, and carrying out on-line expansion on the vehicle images which are subjected to conversion in the self-collected vehicle image processing module according to an on-line shielded vehicle image expansion technology to form a shielding module;

the vehicle detection module is used for detecting the vehicle by utilizing a second vehicle detection network formed by the second vehicle detection network forming module.

The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and not to limit the invention. Any modifications and variations within the scope of the description, which may occur to those skilled in the art, are intended to be within the scope of the invention.

Claims

1. A vehicle detection method under a complex scene is characterized by comprising the following steps:

2. The vehicle detection method under the complex scene according to claim 1, wherein the S11 further comprises: and carrying out de-duplication processing on the training samples in the vehicle detection network.

3. The method for detecting vehicles in complex scenes according to claim 1, wherein the adaptively blocking the input image in S12 further comprises: adding a full connection layer and a shielding mask layer on the last layer of feature map of the target detection network; and carrying out convolution on the occlusion mask layer and the feature map to generate an occlusion feature map, training the target detection network by taking the occlusion feature map as input so as to continuously learn how to occlude the image, and mapping the optimal occlusion feature map generated by the trained online difficult sample generation network back to a sample generated by an original image, namely the difficult sample in the training process of the target detection network.

4. The method for detecting the vehicle under the complex scene according to claim 1, wherein the feature layers with different scales in the S13 comprise: the VGG16 convolutional neural network comprises four feature layers with different scales, a large target for predicting the feature information of a high layer is realized, and a small target for predicting the feature information of a bottom layer is realized.

5. The vehicle detection method under the complex scene according to claim 1, wherein the S14 is further: and a shielding module formed by an online shielded vehicle image expansion network is used for respectively shielding the characteristic layer of each scale partially, so that self-adaptive shielding of targets of different scales is realized, shielding of a large target on a high-resolution characteristic layer is realized, and shielding of a small target on a low-resolution characteristic layer is realized.

6. A vehicle detection device under a complex scene is characterized by comprising: the system comprises a self-collected vehicle image processing module, a shielding module generating module, a first vehicle detection network forming module, a second vehicle detection network forming module and a vehicle detection module; wherein the content of the first and second substances,

7. The vehicle detection apparatus under the complex scene of claim 6, wherein the self-collected vehicle image processing module is further configured to perform de-duplication processing on training samples in a vehicle detection network.

8. The vehicle detection device under the complex scene according to claim 6, wherein the second vehicle detection network forming module is further configured to add a full connection layer and an occlusion mask layer on the last layer of feature map of the target detection network; and carrying out convolution on the occlusion mask layer and the feature map to generate an occlusion feature map, training the target detection network by taking the occlusion feature map as input so as to continuously learn how to occlude the image, and mapping the optimal occlusion feature map generated by the trained online difficult sample generation network back to a sample generated by an original image, namely the difficult sample in the training process of the target detection network.

9. The vehicle detection device under the complex scene of claim 6, wherein the feature layers of different scales in the first vehicle detection network formation module comprise: the VGG16 convolutional neural network comprises four feature layers with different scales, a large target for predicting the feature information of a high layer is realized, and a small target for predicting the feature information of a bottom layer is realized.

10. The vehicle detection device under the complex scene according to claim 6, wherein the second vehicle detection network forming module is further configured to respectively perform partial occlusion on the feature layer of each scale by using an occlusion module formed by an online occluded vehicle image expansion network, so as to achieve adaptive occlusion on targets of different scales, further achieve occlusion on a large target in a high resolution feature layer, and occlusion on a small target in a low resolution feature layer.