CN110826457B - Vehicle detection method and device under complex scene - Google Patents


Info

Publication number
CN110826457B
Authority
CN
China
Prior art keywords
vehicle, feature, network, module, detection network
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN201911050728.4A
Other languages
Chinese (zh)
Other versions
CN110826457A
Inventor
张焕芹 (Zhang Huanqin)
罗国慧 (Luo Guohui)
毛士杰 (Mao Shijie)
Current Assignee
Shanghai Rongjun Technology Co ltd
Original Assignee
Shanghai Rongjun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Rongjun Technology Co., Ltd.
Priority claimed from CN201911050728.4A
Publication of CN110826457A
Application granted
Publication of CN110826457B
Legal status: Active
Anticipated expiration

Classifications

    • G06V20/10 Terrestrial scenes (G06V20/00 Scenes; scene-specific elements)
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F18/00 Pattern recognition)
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks (G06N3/04 Neural network architecture)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI] (G06V10/20 Image preprocessing)
    • G06V10/464 Salient features, e.g. scale-invariant feature transform [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations (G06V10/40 Extraction of image or video features)
    • G06V2201/08 Detecting or categorising vehicles
    • Y02T10/40 Engine management systems (Y02T Climate change mitigation technologies related to transportation)

Abstract

The invention discloses a vehicle detection method and device for complex scenes, comprising the following steps: process self-collected vehicle images with a generative adversarial network to produce transformed vehicle images; generate online-occluded vehicle images by adaptively occluding the vehicle images, and expand them online to form an occlusion module; process the online-expanded vehicle images with a multi-scale feature extraction technique to obtain region candidate boxes at different scales, fuse the feature information across scales, input it into a target detection network, and perform target classification and precise box regression to obtain a first vehicle detection network; add the occlusion module, composed of the online occluded-vehicle-image expansion network, and train the first vehicle detection network to form a second vehicle detection network; and detect vehicles with the second vehicle detection network. The invention improves detection accuracy and is better suited to detecting occluded targets.

Description

Vehicle detection method and device under complex scene
Technical Field
The invention relates to the technical field of vehicle detection, and in particular to a vehicle detection method and device for complex scenes.
Background
In both traditional machine vision and modern computer vision, target detection and recognition has long been one of the most active and popular tasks. The target detection task is to classify and identify all targets in an image, accurately locating targets of arbitrary shape and size at any position. With the continuous development of deep learning in recent years, target detection and recognition algorithms have shifted from traditional detection based on hand-crafted features to detection based on deep neural networks. Building on deep-learning-based target detection, new methods are constantly emerging; by processing mode they can be divided into a two-stage series of methods based on region proposal boxes and a one-stage series based on regression.
The two-stage series completes the target detection task in two steps. First, regions of the image that may contain targets are selected automatically to generate region candidate boxes: a series of boxes of different sizes and scales is generated, features are extracted from the image inside each box, a background/foreground classification is performed, boxes classified as foreground are screened, and the final candidate boxes are produced. Then, the image inside each candidate box is classified into a detailed category and the position of the box is refined. Detection accuracy is high, but detection speed suffers. One-stage methods do not require the network to extract candidate boxes in advance; classification and localization are treated uniformly as a regression problem over the image.
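The screening of overlapping foreground boxes described above is typically done with non-maximum suppression (NMS). A minimal Python sketch follows; the box format, scores, and the 0.5 threshold are illustrative assumptions, not values taken from the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thr=0.5):
    """Greedily keep the highest-scoring box, drop boxes overlapping it above thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thr]
    return keep
```

For example, two heavily overlapping boxes and one distinct box reduce to two kept boxes: `nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])` keeps indices `[0, 2]`.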
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a vehicle detection method and device for complex scenes. Compared with public data sets, the self-built data set has greater diversity and complexity; detection accuracy is greatly improved, the method is better suited to detecting occluded targets, and the detection of small targets is improved through a multi-scale fusion target detection network.
In order to solve the technical problems, the invention is realized by the following technical scheme:
The invention provides a vehicle detection method for complex scenes, comprising the following steps:
S11: processing self-collected vehicle images with the generative adversarial network to generate transformed vehicle images;
S12: inputting the transformed vehicle images of S11 into a target detection network, generating online-occluded vehicle images by adaptively occluding the input images, and expanding the transformed vehicle images of S11 online according to an online occluded-vehicle-image expansion technique to form an occlusion module;
S13: processing the online-expanded vehicle images of S12 with a multi-scale feature extraction technique to obtain region candidate boxes at different scales and feature layers at different scales, fusing the feature information in the feature layers of different scales, inputting the fused feature information into the target detection network of S12, and performing target classification and precise box regression to obtain a first vehicle detection network;
S14: adding the occlusion module composed of the online occluded-vehicle-image expansion network of S12, training the first vehicle detection network formed in S13, and feeding the generated training samples back into the data, forming a second vehicle detection network;
S15: detecting vehicles with the second vehicle detection network formed in S14.
Preferably, S11 further includes: performing de-duplication on the training samples of the vehicle detection network.
Preferably, adaptively occluding the input image in S12 further includes: adding a fully connected layer and an occlusion mask layer on top of the last feature map of the target detection network; convolving the occlusion mask layer with the feature map to generate an occluded feature map; training the target detection network with the occluded feature map as input so that it continuously learns how to occlude the image; and mapping the optimal occluded feature map generated by the trained online hard-sample generation network back to the original image, yielding the hard samples used in training the target detection network.
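The core occlusion operation, zeroing out feature-map positions where a binary mask is 1, can be sketched with NumPy. Shapes and the mask here are illustrative; in the patent the mask is learned, not hand-set:

```python
import numpy as np

def apply_occlusion(feature_map, mask):
    """Zero out feature-map positions selected by a binary occlusion mask.

    feature_map: array of shape (C, H, W)
    mask:        binary array of shape (H, W); 1 marks an occluded position
    """
    return feature_map * (1 - mask)  # mask broadcasts across the channel axis

# Toy example: 2-channel 3x3 feature map, occlude the centre position.
fm = np.ones((2, 3, 3))
mask = np.zeros((3, 3))
mask[1, 1] = 1
occluded = apply_occlusion(fm, mask)
```

The centre position is zeroed in every channel while the other 16 values are unchanged.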
Preferably, the feature layers of different scales in S13 include four feature layers of different scales in the VGG16 convolutional neural network, so that high-level feature information is used to predict large targets and low-level feature information is used to predict small targets.
Preferably, S14 further comprises: using the occlusion module formed by the online occluded-vehicle-image expansion network to partially occlude the feature layer at each scale, realizing adaptive occlusion of targets of different scales: occlusion of large targets on high-resolution feature layers and occlusion of small targets on low-resolution feature layers.
The invention also provides a vehicle detection device for complex scenes, comprising: a self-collected vehicle image processing module, an occlusion module generation module, a first vehicle detection network forming module, a second vehicle detection network forming module, and a vehicle detection module; wherein:
the self-collected vehicle image processing module is used for processing the self-collected vehicle images with the generative adversarial network to generate transformed vehicle images;
the occlusion module generation module is used for inputting the transformed vehicle images from the self-collected vehicle image processing module into a target detection network, generating online-occluded vehicle images by adaptively occluding the input images, and expanding the transformed vehicle images online according to an online occluded-vehicle-image expansion technique to form an occlusion module;
the first vehicle detection network forming module is used for processing the online-expanded vehicle images from the occlusion module generation module with a multi-scale feature extraction technique to obtain region candidate boxes at different scales and feature layers at different scales, fusing the feature information in the feature layers of different scales, inputting the fused feature information into the target detection network in the occlusion module generation module, and performing target classification and precise box regression to obtain a first vehicle detection network;
the second vehicle detection network forming module is used for adding the occlusion module composed of the online occluded-vehicle-image expansion network in the occlusion module generation module, training the first vehicle detection network formed by the first vehicle detection network forming module, and feeding the generated training samples back into the data to form a second vehicle detection network;
the vehicle detection module is used for detecting vehicles with the second vehicle detection network formed by the second vehicle detection network forming module.
Preferably, the self-collected vehicle image processing module is further configured to de-duplicate the training samples of the vehicle detection network.
Preferably, the second vehicle detection network forming module is further configured to add a fully connected layer and an occlusion mask layer on top of the last feature map of the target detection network; to convolve the occlusion mask layer with the feature map to generate an occluded feature map; to train the target detection network with the occluded feature map as input so that it continuously learns how to occlude the image; and to map the optimal occluded feature map generated by the trained online hard-sample generation network back to the original image, yielding the hard samples used in training the target detection network.
Preferably, the feature layers of different scales in the first vehicle detection network forming module include four feature layers of different scales in the VGG16 convolutional neural network, so that large targets are predicted with high-level feature information and small targets with low-level feature information.
Preferably, the second vehicle detection network forming module is further configured to use the occlusion module formed by the online occluded-vehicle-image expansion network to partially occlude the feature layer at each scale, realizing adaptive occlusion of targets of different scales: occlusion of large targets on high-resolution feature layers and occlusion of small targets on low-resolution feature layers.
Compared with the prior art, the invention has the following advantages:
(1) In the vehicle detection method and device for complex scenes, the self-built data set contains a large number of GAN-generated images and occluded-target images, making it more diverse and complex than public data sets; detection accuracy is greatly improved, and the method is better suited to detecting occluded targets;
(2) Through multi-scale feature fusion, large targets are predicted with high-level feature information having a large receptive field, and small targets with low-level feature information having a small receptive field, completing the fusion of high-level semantic information with low-level detail information and further improving detection accuracy;
(3) The occlusion module formed by the online occluded-vehicle-image expansion network partially occludes the feature layer at each scale. Because different feature layers have different sensitivities to targets of different scales, large targets can be occluded on high-resolution feature layers and small targets on low-resolution feature layers, improving the detection of occluded vehicles.
Of course, it is not necessary for any product to practice the invention to achieve all of the above-described advantages at the same time.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a flowchart of a vehicle detection method in a complex scenario according to an embodiment of the present invention;
FIG. 2 is a vehicle image before augmentation by the deep-convolution-based generative adversarial network;
FIG. 3 is a corresponding image 1 of random noise generation;
FIG. 4 is a corresponding image 2 of random noise generation;
FIG. 5 is an online occluded original vehicle image based on a reinforcement learning mechanism;
FIG. 6 is an occlusion gray scale map automatically generated by the network;
FIG. 7 is the occlusion result;
FIG. 8 is an original image of the occluded large target;
FIG. 9 shows the detection results of the conventional SSD300;
FIG. 10 shows the detection results of the conventional YOLOv3;
FIG. 11 shows the detection results of the conventional Faster R-CNN;
FIG. 12 shows the detection results of the conventional Cascade R-CNN;
FIG. 13 is the detection result of the vehicle detection method in a complex scene according to an embodiment of the present invention;
FIG. 14 is an original image of occluded small targets;
FIG. 15 shows the detection results of the conventional SSD300;
FIG. 16 shows the detection results of the conventional YOLOv3;
FIG. 17 shows the detection results of the conventional Faster R-CNN;
FIG. 18 shows the detection results of the conventional Cascade R-CNN;
FIG. 19 shows the detection result of the vehicle detection method in a complex scene according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Fig. 1 is a flowchart illustrating a vehicle detection method in a complex scenario according to an embodiment of the present invention.
Referring to fig. 1, the vehicle detection method in the complex scene of the present embodiment includes the following steps:
S11: processing self-collected vehicle images with the generative adversarial network to generate transformed vehicle images;
S12: inputting the transformed vehicle images of S11 into a target detection network, generating online-occluded vehicle images by adaptively occluding the input images, and expanding the transformed vehicle images of S11 online according to an online occluded-vehicle-image expansion technique to form an occlusion module;
S13: processing the online-expanded vehicle images of S12 with a multi-scale feature extraction technique to obtain region candidate boxes at different scales and feature layers at different scales, fusing the feature information in the feature layers of different scales, inputting the fused feature information into the target detection network of S12, and performing target classification and precise box regression to obtain a first vehicle detection network;
S14: adding the occlusion module composed of the online occluded-vehicle-image expansion network of S12, training the first vehicle detection network formed in S13, and feeding the generated training samples back into the data, forming a second vehicle detection network;
S15: detecting vehicles with the second vehicle detection network formed in S14.
In an embodiment, S11 specifically includes: vehicle image augmentation based on a deep convolutional generative adversarial network, enabling stable training of a deeper, higher-resolution generative model. Another implementation is vehicle image augmentation based on a cycle-consistent generative adversarial network, which can realize transformation between images of two domains.
In an embodiment, S12 specifically includes: generating an optimal occlusion mask for the target in the image, realizing online expansion of hard samples, that is, samples that cause false detections in the target detection network. These hard samples are then fed back into the target detection network for training, improving its performance on them. The two networks learn by confronting each other, so a large number of hard samples are generated while the performance of the target detection network improves.
In an embodiment, S13 specifically includes: performing target detection with four feature maps of different scales from the VGG16 network, predicting large objects with high-level feature information having a large receptive field and small targets with low-level feature information having a small receptive field. The invention introduces a multi-scale feature fusion module to fuse high-level semantic information with low-level detail information and improve detection accuracy.
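A minimal NumPy sketch of this kind of fusion: the high-level map is upsampled to the low-level map's resolution and the two are summed pixel-wise. Nearest-neighbour upsampling stands in for the patent's learned deconvolution, and the shapes are illustrative:

```python
import numpy as np

def fuse(low, high):
    """Fuse a low-level map (C, 2H, 2W) with a high-level map (C, H, W).

    The high-level map is upsampled 2x (nearest neighbour, standing in for a
    learned deconvolution), summed pixel-wise with the low-level map, and
    passed through a ReLU so the fused features remain discriminative.
    """
    up = high.repeat(2, axis=1).repeat(2, axis=2)  # now (C, 2H, 2W)
    assert up.shape == low.shape, "scales must match before pixel-wise sum"
    return np.maximum(low + up, 0.0)  # ReLU

low = np.full((1, 4, 4), 0.5)   # low-level detail features
high = np.full((1, 2, 2), 1.0)  # high-level semantic features
fused = fuse(low, high)
```

Every fused value is 0.5 + 1.0 = 1.5, and the output keeps the low-level map's resolution.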
In an embodiment, S14 specifically includes: introducing an occlusion mask module to further train the network, addressing the detection of occluded targets and further improving detection accuracy. The module is applied in parallel to four different feature layers, partially occluding the feature map at each scale to realize adaptive occlusion of targets of different scales; at the same time, the hard samples generated online are fed back into the data, improving the network's detection of occluded targets. Because the four feature layers have different sensitivities to targets of different scales in the original image, and the four occlusion mask layers act independently, the module can occlude large targets on high-resolution feature layers and small targets on low-resolution feature layers.
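One way to picture this scale-to-layer assignment is as a simple lookup from target size to feature-layer resolution. The resolutions and size thresholds below are illustrative assumptions, not values from the patent:

```python
def occlusion_layer(target_size, layer_resolutions=(75, 38, 19, 10)):
    """Pick which feature layer should occlude a target of the given size.

    target_size: fraction of the image height the target occupies, in (0, 1].
    Following the patent's description, larger targets map to higher-resolution
    (earlier) layers and smaller targets to lower-resolution (later) layers.
    The thresholds and layer resolutions are hypothetical.
    """
    thresholds = (0.5, 0.25, 0.1)  # hypothetical size cut-offs
    for idx, thr in enumerate(thresholds):
        if target_size >= thr:
            return layer_resolutions[idx]
    return layer_resolutions[-1]
```

Under these assumed thresholds, a target covering 60% of the image height is occluded on the 75-resolution layer, while one covering 5% is occluded on the 10-resolution layer.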
In a preferred embodiment, S11 specifically includes: collecting the raw data and cropping targets at a size of 96 × 48. No preprocessing of the training data is required, and the images output by the generator network G are scaled to [-1, 1]. During network training, gradient descent is performed with mini-batch SGD, with 128 images input to the network per batch; all parameters are initialized from a distribution with mean 0 and standard deviation 0.02. Network optimization uses a modified Adam optimizer with fine-tuned parameters, the learning rate being changed from 0.001 to 0.0002. To prevent the model from overfitting, that is, memorizing simple input features and generating near-identical pictures, the training samples are de-duplicated: images with high similarity are removed from the training set.
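These settings can be sketched in NumPy: the N(0, 0.02) initialization and a naive similarity-based de-duplication pass. The cosine-similarity measure and its 0.95 threshold are assumptions; the patent does not specify how similarity is computed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Initialization as described in the text: mean 0, standard deviation 0.02.
def init_weights(shape):
    return rng.normal(loc=0.0, scale=0.02, size=shape)

# Training settings from the text (batch 128, Adam with lr lowered to 0.0002).
config = {"batch_size": 128, "optimizer": "Adam", "lr": 0.0002}

def deduplicate(images, threshold=0.95):
    """Greedily drop images whose cosine similarity to an already-kept
    image exceeds the (assumed) threshold; return kept indices."""
    kept_idx, kept_vecs = [], []
    for i, img in enumerate(images):
        v = img.ravel().astype(float)
        v = v / (np.linalg.norm(v) + 1e-12)
        if all(float(v @ k) < threshold for k in kept_vecs):
            kept_idx.append(i)
            kept_vecs.append(v)
    return kept_idx

# An image, its exact duplicate, and an unrelated image: the duplicate is dropped.
a = rng.normal(size=(8, 8))
images = [a, a.copy(), rng.normal(size=(8, 8))]
unique = deduplicate(images)
```

The duplicate at index 1 has cosine similarity 1.0 to the kept image at index 0 and is removed; the unrelated image at index 2 survives.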
In a preferred embodiment, S12 specifically includes: the network structure adds a fully connected layer and an occlusion mask layer on top of the last feature map of the target detection network; for an input original vehicle image, the binary occlusion mask layer is applied directly to the original feature map, forming the corresponding occluded feature map. The loss function (rendered as an image in the original document) is defined in terms of the following quantities: i and j denote the horizontal and vertical coordinates on the feature map, n the number of training sample pairs, d the dimension of the feature map, X_p the p-th of the n original images, A(·) the occlusion operation the network applies to the feature map of X_p, and the output of the binary occlusion mask at position (i, j), which takes the value 0 or 1.
In a preferred embodiment, S13 specifically includes: applying a 3 × 3 convolution with stride 1 to the features in the candidate box on each of the two feature layers, extracting further features while reducing the feature-map scale; applying a 4 × 4 deconvolution with stride 2 to the high-level candidate-box features, generating a feature map of the same size as the preceding layer; summing the two based on pixel values; and applying an activation function and a convolution to the sum so that the fused features remain discriminative. The deconvolution kernel size is set according to the image sizes of the four feature layers of the VGG network. Suppose the last two feature maps have sizes m × m and n × n respectively; after the 3 × 3, stride-1 convolutions the generated maps have sizes (m-2) × (m-2) and (n-2) × (n-2), and the 4 × 4, stride-2 deconvolution of the high-level features yields a map of size (2n-2) × (2n-2). Because the four feature maps in the selected VGG network are obtained successively by pooling, m = 2n, so after these operations the deconvolved high-level features and the convolved low-level features have the same scale and can be summed directly by pixel value.
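The size bookkeeping above can be checked with a few lines of Python; the VGG-style sizes m = 38 and n = 19 are illustrative, not taken from the patent:

```python
def conv_out(size, kernel=3, stride=1):
    """Output side length of an unpadded convolution."""
    return (size - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2):
    """Output side length of an unpadded transposed convolution."""
    return (size - 1) * stride + kernel

m, n = 38, 19  # illustrative adjacent feature-map sizes, with m = 2n
low = conv_out(m)               # (m - 2) = 36
high = deconv_out(conv_out(n))  # (2n - 2) = 36
assert low == high  # scales match, so pixel-wise summation is valid
```

With m = 2n the two branches always land on the same size: (m - 2) = (2n - 2), which is exactly what the pixel-wise sum requires.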
In a preferred embodiment, S14 specifically includes: pre-training the other networks, fixing their parameters, and attaching the occlusion mask module; all data in the data set are propagated forward, and training proceeds as before, using image/occlusion-mask data pairs. In this process the occlusion mask module's parameters are first trained alone, and then the two networks are trained jointly. After an input image passes through the occlusion mask module, an occluded feature map is generated; the loss function discriminates the input, the classification result is compared with the input image's ground-truth label, back-propagation is performed, and all parameters in both networks are fine-tuned.
To verify that the method of this embodiment is better suited to detecting occluded targets, and to demonstrate the effectiveness of its self-built data set, the network model trained on the self-built data set is compared and evaluated against network models trained on public data sets (VOC07+12 and COCO); the comparison algorithms are SSD300, YOLOv3, Faster R-CNN (VGG-16 backbone), and Cascade R-CNN.
TABLE 1 Training results of different methods on different data sets (table rendered as images in the original document)
As can be seen from Table 1, the detection accuracy of the different methods on the COCO data set is much lower than on the VOC data set: relative to VOC, COCO contains images collected from more categories of real scenes, with many images per target and many targets per image, so a network model trained on COCO shows much lower detection accuracy than one trained on VOC. Because the self-built data set contains a large number of GAN-generated images and occluded-target images, the detection accuracy of the comparison methods on this data set generally drops relative to COCO, while the method of the invention improves considerably. This shows that the self-built data set is more diverse and complex than the public data sets, and that the method of the invention is better suited to detecting occluded targets.
For a more intuitive display, refer to FIGS. 2-19, which illustrate an example of vehicle detection in complex scenes using the above method. FIG. 2 is an example vehicle image before augmentation by the deep-convolution-based generative adversarial network; FIG. 3 is image 1 generated from random noise in this example, which is harder to detect than FIG. 2; FIG. 4 is image 2 generated from random noise, which has more random noise and is harder to detect than FIG. 3.
FIG. 5 is an original vehicle image occluded online based on a reinforcement-learning mechanism in this example; FIG. 6 is the occlusion grayscale map automatically generated by the network; FIG. 7 is the occlusion result, the optimal occluded image produced by the online hard-sample generation network. FIG. 8 is the original image for occluded large-target detection.
For comparison with prior-art methods, FIG. 9 shows the detection results of the conventional SSD300; FIG. 10 the results of the conventional YOLOv3; FIG. 11 the results of the conventional Faster R-CNN; FIG. 12 the results of the conventional Cascade R-CNN; and FIG. 13 the detection results of the vehicle detection method in complex scenes according to an embodiment of the invention. The comparison shows directly that the embodiment of the invention detects the occluded vehicle better, with higher accuracy.
FIG. 14 is an original image of occluded small targets; FIG. 15 shows the detection results of the conventional SSD300; FIG. 16 the results of the conventional YOLOv3; FIG. 17 the results of the conventional Faster R-CNN; FIG. 18 the results of the conventional Cascade R-CNN; and FIG. 19 the detection results of the vehicle detection method in complex scenes according to an embodiment of the invention. The comparison shows directly that the embodiment of the invention detects small targets better.
In another embodiment, the present invention further provides a vehicle detection apparatus for complex scenes, used to implement the vehicle detection method of the foregoing embodiment, and comprising: a self-collected vehicle image processing module, an occlusion module generation module, a first vehicle detection network forming module, a second vehicle detection network forming module, and a vehicle detection module; wherein:
the self-collected vehicle image processing module is used for processing self-collected vehicle images with a generative adversarial network to generate transformed vehicle images;
the occlusion module generation module is used for inputting the vehicle images transformed by the self-collected vehicle image processing module into a target detection network, generating online occluded vehicle images by adaptively occluding the input images, and expanding the transformed vehicle images online according to an online occluded vehicle image expansion technique, thereby forming an occlusion module;
the first vehicle detection network forming module is used for processing the vehicle images expanded online by the occlusion module generation module with a multi-scale feature extraction technique to obtain region candidate frames of different scales and feature layers of different scales, fusing the feature information of the different-scale feature layers, inputting the fused feature information into the target detection network of the occlusion module generation module, and performing target classification and accurate frame regression to obtain a first vehicle detection network;
the second vehicle detection network forming module is used for adding the occlusion module formed by the online occluded vehicle image expansion network of the occlusion module generation module, training the first vehicle detection network formed by the first vehicle detection network forming module, and returning the data of the generated training samples to form a second vehicle detection network;
the vehicle detection module is used for detecting vehicles with the second vehicle detection network formed by the second vehicle detection network forming module.
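The adaptive occlusion described above convolves a learned occlusion mask with a feature map so that the most discriminative regions are suppressed, yielding online hard samples for training. A minimal sketch of the mask-application idea, assuming a simple thresholded mask and plain-list feature maps; the threshold value and function name are illustrative assumptions, not part of the patent:

```python
# Hedged sketch: apply a learned occlusion mask (values in [0, 1]) to a
# feature map by zeroing every cell where the mask exceeds a threshold.
# In the patent the mask is produced by a trained mask layer; here the
# mask values are supplied by hand for illustration.

def apply_occlusion_mask(feature_map, mask, threshold=0.5):
    """Zero out feature-map cells wherever the mask fires."""
    occluded = []
    for f_row, m_row in zip(feature_map, mask):
        occluded.append([0.0 if m >= threshold else f
                         for f, m in zip(f_row, m_row)])
    return occluded

# Toy 3x3 feature map; the mask occludes the centre cell only.
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]
mask = [[0.1, 0.2, 0.1],
        [0.3, 0.9, 0.2],
        [0.1, 0.4, 0.1]]

hard_sample = apply_occlusion_mask(features, mask)
print(hard_sample[1][1])  # 0.0 -- the most discriminative cell is suppressed
```

Training then proceeds on such occluded feature maps, so the detector keeps seeing the hardest occlusion pattern the mask network can currently produce.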
The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and not to limit the invention. Any modifications and variations within the scope of the description that may occur to those skilled in the art are intended to fall within the scope of the invention.

Claims (4)

1. A vehicle detection method for complex scenes, characterized by comprising the following steps:
S11: processing self-collected vehicle images with a generative adversarial network to generate transformed vehicle images, the transformed vehicle images being the vehicle images after expansion based on the generative adversarial network;
S12: inputting the vehicle images transformed in S11 into a target detection network, generating online occluded vehicle images by adaptively occluding the input images, and expanding the vehicle images transformed in S11 online according to an online occluded vehicle image expansion technique, thereby forming an occlusion module;
S13: processing the vehicle images expanded online in S12 with a multi-scale feature extraction technique to obtain region candidate frames of different scales and feature layers of different scales, fusing the feature information of the different-scale feature layers, inputting the fused feature information into the target detection network of S12, and performing target classification and accurate frame regression to obtain a first vehicle detection network;
S14: adding the occlusion module formed in S12 by the online occluded vehicle image expansion network, training the first vehicle detection network formed in S13, and returning the data of the generated training samples to form a second vehicle detection network;
S15: detecting vehicles with the second vehicle detection network formed in S14;
adaptively occluding the input images in S12 further comprises: adding a fully connected layer and an occlusion mask layer on the last feature map of the target detection network; convolving the occlusion mask layer with the feature map to generate an occlusion feature map; training the target detection network with the occlusion feature map as input so that it continuously learns how to occlude images; and mapping the optimal occlusion feature map produced by the trained online hard-sample generation network back onto the original image to obtain the generated sample, i.e. the hard sample used in training the target detection network;
the different-scale feature layers in S13 comprise four feature layers of different scales in a VGG16 convolutional neural network, such that high-level feature information predicts large targets and low-level feature information predicts small targets;
S13 specifically comprises: performing two 3×3 convolutions with stride 1 on the features in the candidate frames, extracting further features while reducing the scale of the feature map; performing a 4×4 deconvolution with stride 2 on the feature information of the high-level candidate frames to generate a feature map of the same size as the preceding layer, summing pixel-wise, and applying an activation function and a convolution to the sum to ensure that the fused features remain discriminative; the last two feature maps are m×m and n×n respectively, and after the two 3×3 stride-1 convolutions they become (m-2)×(m-2) and (n-2)×(n-2); the 4×4 stride-2 deconvolution of the high-level feature yields a (2n-2)×(2n-2) feature map; since the sizes of the four feature maps in the selected VGG network are all obtained through pooling, m = 2n, so the deconvolution of the high-level feature and the convolutions of the low-level feature produce feature maps of exactly the same scale, which are summed directly pixel-wise;
S14 further comprises: partially occluding the feature layer of each scale with the occlusion module formed by the online occluded vehicle image expansion network, thereby adaptively occluding targets of different scales, occluding large targets on the high-resolution feature layers and small targets on the low-resolution feature layers.
2. The vehicle detection method for complex scenes according to claim 1, characterized in that S11 further comprises: performing deduplication on the training samples of the vehicle detection network.
3. A vehicle detection apparatus for complex scenes, characterized by comprising: a self-collected vehicle image processing module, an occlusion module generation module, a first vehicle detection network forming module, a second vehicle detection network forming module, and a vehicle detection module; wherein:
the self-collected vehicle image processing module is used for processing self-collected vehicle images with a generative adversarial network to generate transformed vehicle images, the transformed vehicle images being the vehicle images after expansion based on the generative adversarial network;
the occlusion module generation module is used for inputting the vehicle images transformed by the self-collected vehicle image processing module into a target detection network, generating online occluded vehicle images by adaptively occluding the input images, and expanding the transformed vehicle images online according to an online occluded vehicle image expansion technique, thereby forming an occlusion module;
the first vehicle detection network forming module is used for processing the vehicle images expanded online by the occlusion module generation module with a multi-scale feature extraction technique to obtain region candidate frames of different scales and feature layers of different scales, fusing the feature information of the different-scale feature layers, inputting the fused feature information into the target detection network of the occlusion module generation module, and performing target classification and accurate frame regression to obtain a first vehicle detection network;
the second vehicle detection network forming module is used for adding the occlusion module formed by the online occluded vehicle image expansion network of the occlusion module generation module, training the first vehicle detection network formed by the first vehicle detection network forming module, and returning the data of the generated training samples to form a second vehicle detection network;
the vehicle detection module is used for detecting vehicles with the second vehicle detection network formed by the second vehicle detection network forming module;
the second vehicle detection network forming module is further used for adding a fully connected layer and an occlusion mask layer on the last feature map of the target detection network; convolving the occlusion mask layer with the feature map to generate an occlusion feature map; training the target detection network with the occlusion feature map as input so that it continuously learns how to occlude images; and mapping the optimal occlusion feature map produced by the trained online hard-sample generation network back onto the original image to obtain the generated sample, i.e. the hard sample used in training the target detection network;
the different-scale feature layers in the first vehicle detection network forming module comprise four feature layers of different scales in the VGG16 convolutional neural network, such that high-level feature information predicts large targets and low-level feature information predicts small targets; two 3×3 convolutions with stride 1 are performed on the features in the candidate frames, extracting further features while reducing the scale of the feature map; a 4×4 deconvolution with stride 2 is performed on the feature information of the high-level candidate frames to generate a feature map of the same size as the preceding layer, a pixel-wise summation is performed, and an activation function and a convolution are applied to the sum to ensure that the fused features remain discriminative; the last two feature maps are m×m and n×n respectively, and after the two 3×3 stride-1 convolutions they become (m-2)×(m-2) and (n-2)×(n-2); the 4×4 stride-2 deconvolution of the high-level feature yields a (2n-2)×(2n-2) feature map; since the sizes of the four feature maps in the selected VGG network are all obtained through pooling, m = 2n, so the deconvolution of the high-level feature and the convolutions of the low-level feature produce feature maps of exactly the same scale, which are summed directly pixel-wise;
the second vehicle detection network forming module is further used for partially occluding the feature layer of each scale with the occlusion module formed by the online occluded vehicle image expansion network, thereby adaptively occluding targets of different scales, occluding large targets on the high-resolution feature layers and small targets on the low-resolution feature layers.
4. The vehicle detection apparatus for complex scenes according to claim 3, characterized in that the self-collected vehicle image processing module is further used for performing deduplication on the training samples of the vehicle detection network.
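The scale matching asserted in claims 1 and 3 (with m = 2n, the deconvolved high-level map and the convolved low-level map come out the same size, so they can be summed pixel-wise) can be checked with the standard convolution and transposed-convolution output-size formulas. A sketch under one assumption not stated in the patent: the net 2-pixel reduction from the two 3×3 convolutions is taken to come from one padding-1 convolution followed by one unpadded convolution, which reproduces the (m-2)×(m-2) and (n-2)×(n-2) sizes given in the claims:

```python
# Output-size arithmetic for the multi-scale fusion step described in the claims.

def conv_out(size, kernel, stride=1, padding=0):
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, padding=0):
    # Transposed-convolution (deconvolution) output-size formula.
    return (size - 1) * stride - 2 * padding + kernel

n = 19       # high-level feature map is n x n (illustrative value)
m = 2 * n    # low-level feature map is m x m; pooling halves each scale

# Low-level branch: two 3x3 stride-1 convolutions with a net 2-pixel
# reduction (assumed: one padded, one unpadded) -> (m-2) x (m-2).
low = conv_out(conv_out(m, 3, 1, 1), 3, 1, 0)

# High-level branch: same convolutions to (n-2) x (n-2), then a
# 4x4 stride-2 deconvolution -> (2n-2) x (2n-2).
high = deconv_out(conv_out(conv_out(n, 3, 1, 1), 3, 1, 0), 4, 2, 0)

print(low, high)  # prints: 36 36 -- identical scales, pixel-wise sum is valid
```

Because m - 2 = 2n - 2 whenever m = 2n, the two branches always align, which is exactly the condition the claims rely on for direct pixel-wise summation.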
CN201911050728.4A 2019-10-31 2019-10-31 Vehicle detection method and device under complex scene Active CN110826457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911050728.4A CN110826457B (en) 2019-10-31 2019-10-31 Vehicle detection method and device under complex scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911050728.4A CN110826457B (en) 2019-10-31 2019-10-31 Vehicle detection method and device under complex scene

Publications (2)

Publication Number Publication Date
CN110826457A CN110826457A (en) 2020-02-21
CN110826457B true CN110826457B (en) 2022-08-19

Family

ID=69551758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911050728.4A Active CN110826457B (en) 2019-10-31 2019-10-31 Vehicle detection method and device under complex scene

Country Status (1)

Country Link
CN (1) CN110826457B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012176B (en) * 2021-03-17 2023-12-15 阿波罗智联(北京)科技有限公司 Sample image processing method and device, electronic equipment and storage medium
CN114882449B (en) * 2022-04-11 2023-08-22 淮阴工学院 Car-Det network model-based vehicle detection method and device
CN115082758B (en) * 2022-08-19 2022-11-11 深圳比特微电子科技有限公司 Training method of target detection model, target detection method, device and medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN109284704A (en) * 2018-09-07 2019-01-29 中国电子科技集团公司第三十八研究所 Complex background SAR vehicle target detection method based on CNN

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN108229455B (en) * 2017-02-23 2020-10-16 北京市商汤科技开发有限公司 Object detection method, neural network training method and device and electronic equipment


Non-Patent Citations (2)

Title
A-Fast-RCNN:Hard Positive Generation via Adversary for Object Detection;Xiaolong Wang et al.;《arXiv:1704.03414v1》;20170411;第2-6页 *
Single-Shot Refinement Neural Network for Object Detection;Shifeng Zhang et al.;《arXiv:1711.06897v3》;20180103;第3-5页 *

Also Published As

Publication number Publication date
CN110826457A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN108764292B (en) Deep learning image target mapping and positioning method based on weak supervision information
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN110826457B (en) Vehicle detection method and device under complex scene
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN110826596A (en) Semantic segmentation method based on multi-scale deformable convolution
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN114202743A (en) Improved fast-RCNN-based small target detection method in automatic driving scene
CN112613350A (en) High-resolution optical remote sensing image airplane target detection method based on deep neural network
CN111368634B (en) Human head detection method, system and storage medium based on neural network
García‐Aguilar et al. Improved detection of small objects in road network sequences using CNN and super resolution
CN115995042A (en) Video SAR moving target detection method and device
Liang et al. Car detection and classification using cascade model
US10643092B2 (en) Segmenting irregular shapes in images using deep region growing with an image pyramid
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN111582057B (en) Face verification method based on local receptive field
Li A deep learning-based text detection and recognition approach for natural scenes
US10776923B2 (en) Segmenting irregular shapes in images using deep region growing
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
CN113807407A (en) Target detection model training method, model performance detection method and device
CN113362372B (en) Single target tracking method and computer readable medium
Dong et al. Intelligent pixel-level pavement marking detection using 2D laser pavement images
Azim et al. Data-Driven Defenses Against Adversarial Attacks for Autonomous Vehicles
Das et al. High-Performance Image Splicing Detection utilizing Image Augmentation and Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant