CN111414910B - Small target enhancement detection method and device based on double convolution neural network


Info

Publication number
CN111414910B
Authority
CN
China
Prior art keywords
target
neural network
detection
confidence threshold
deconvolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010193865.XA
Other languages
Chinese (zh)
Other versions
CN111414910A
Inventor
李华伟
王斯建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiawo Photoelectric Technology Co ltd
Original Assignee
Shanghai Jiawo Photoelectric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiawo Photoelectric Technology Co ltd filed Critical Shanghai Jiawo Photoelectric Technology Co ltd
Priority to CN202010193865.XA priority Critical patent/CN111414910B/en
Publication of CN111414910A publication Critical patent/CN111414910A/en
Application granted granted Critical
Publication of CN111414910B publication Critical patent/CN111414910B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The small target enhancement detection method based on the double convolution neural network improves detection accuracy while keeping a detection speed close to that of a general detection model. The method cascades two convolutional neural networks. The first convolutional neural network adopts a MobileNet-SSD network with both target localization and target recognition functions; candidate targets are screened with a low confidence threshold, and the low threshold ensures that small targets are not missed. The second convolutional neural network takes the candidate target regions from the first round of detection, crops image sub-blocks one by one, and applies a MobileNet network with only a target recognition function; targets are screened with a high confidence threshold, and the high threshold ensures that small targets are reliably confirmed. With the two-round enhanced detection method provided by the invention, the detection accuracy for small targets is improved, and both the false detection rate and the missed detection rate are reduced.

Description

Small target enhancement detection method and device based on double convolution neural network
Technical Field
The invention relates to the technical field of target detection, and in particular to a small target enhancement detection method and device based on a double convolution neural network.
Background
Target detection is a fundamental technology of computer vision: given an image, it locates the region where a target lies and identifies the target's category. In real images, target sizes span a wide range; targets differ in angle, pose, and position within the image and may overlap, all of which makes target detection difficult. Current target detection algorithms in academia and industry fall into two main categories:
Traditional methods based on hand-crafted features adopt a sliding-window region selection strategy and extract target feature vectors with the HOG operator; typical algorithms combine a cascade classifier with HOG or Haar features and an SVM, with many subsequent improvements and optimizations. The main problems are that sliding-window region selection is untargeted, with high time complexity and redundant windows, and that hand-designed features are not robust to diverse variations.
In recent years, automatic feature extraction based on convolutional neural networks has developed in two stages. The first stage is the two-step approach of candidate regions plus deep-learning classification: candidate regions are extracted and then classified with a deep-learning model on each region, as in the R-CNN, Fast R-CNN, and Faster R-CNN series. The R-CNN family alleviates both problems of the traditional methods: the convolutional neural network replaces hand-crafted features and improves the robustness of the algorithm. Although Faster R-CNN improved the speed, it is still not satisfactory for real-time use, and multiple neural network models must be trained. Convolutional detection therefore advanced to a new stage. The second stage is the one-step approach based on deep-learning regression; these algorithms do not extract candidate regions separately but fold region identification into the deep network, achieving end-to-end target detection, e.g. YOLO, SSD, and MobileNet-SSD. MobileNet-SSD replaces the VGG front-end feature extraction network of SSD with MobileNet, a convolutional neural network model proposed by Google that is well suited to target recognition on mobile devices.
Small target detection has long been a challenge for deep-learning convolutional neural network models. Although a series of improvements from traditional methods to deep learning has advanced both the accuracy and the real-time performance of target detection, accuracy on small targets remains poor. There are two ways to define a small target: a relative definition, e.g. a target whose width and height are at most 0.1 of the original image size, and an absolute definition, e.g. a target smaller than 32x32 pixels. The invention adopts the first definition. Early small target detection methods were mostly general methods oriented toward conventional medium and large targets, such as the classical single-step SSD and the two-step Faster R-CNN. These were designed mainly for general-purpose target datasets and are good at detecting medium and large targets, so their detection performance on small targets in images is not ideal.
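The two definitions can be sketched as a simple predicate; the function name and parameter defaults below are illustrative, not part of the patent text:

```python
def is_small_target(box_w, box_h, img_w, img_h,
                    rel_thresh=0.1, abs_thresh=32, mode="relative"):
    """Return True if a detection counts as a 'small target' under either
    definition: relative (width and height at most rel_thresh of the image)
    or absolute (smaller than abs_thresh pixels on each side)."""
    if mode == "relative":
        return box_w <= rel_thresh * img_w and box_h <= rel_thresh * img_h
    return box_w < abs_thresh and box_h < abs_thresh
```

Under the relative definition adopted by the invention, a 60x40 box in a 1000x800 image is small (60 <= 100 and 40 <= 80), while a 200x40 box is not.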
In recent years a second family of methods based on multi-layer feature maps has been proposed, which markedly improves small target detection. These methods fall into three types: feature pyramids, layer-by-layer prediction, and RNN-inspired designs. Feature pyramids consult feature information from multi-scale feature maps, combining strong semantic features with precise location features; since the multi-scale feature maps are intermediate outputs intrinsic to the convolutional network, stacking them adds little complexity to the algorithm. Layer-by-layer prediction runs a prediction on the feature-map output of every layer of the network and combines the results, which demands extremely high hardware performance. RNN-inspired designs borrow the gating mechanism and long short-term memory of RNN algorithms to record multi-level feature information, but RNNs have the inherent disadvantage of slow training.
Among these two main categories of prior art, the first uses a general detection model oriented toward conventional medium and large targets, such as MobileNet-SSD, so detection accuracy on small targets is low and false detections or missed detections occur easily. The second uses multi-layer feature maps; the added feature maps greatly increase training and inference time and hurt the real-time performance of the algorithm.
Disclosure of Invention
The invention provides a small target enhancement detection method and device based on a double convolution neural network that solve the above technical problems: detection accuracy is improved while a detection speed close to that of a general detection model is maintained.
The invention provides a small target enhancement detection method based on a double convolution neural network, which comprises the following steps:
the first convolutional neural network adopts a reduced MobileNet-SSD detection network with both target localization and target recognition functions;
candidate target regions are screened from the detection result of the first convolutional neural network with a low confidence threshold (0.01-0.1); the low threshold ensures that small targets are not missed;
the candidate target regions obtained from the first convolutional neural network are cropped into target image sub-blocks one by one, which raises the target pixel ratio and turns a small target into a target with a higher pixel ratio; the image sub-blocks are fed into the second convolutional neural network;
the second convolutional neural network adopts a reduced MobileNet recognition network with only a target recognition function;
the second convolutional neural network screens out targets with a high confidence threshold (0.85-0.95); the high threshold ensures that small targets are reliably confirmed.
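The steps above can be sketched as a cascade, with the stage-1 and stage-2 models passed in as callables. All names, the tuple formats, and the use of plain nested lists as images are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, w, h

@dataclass
class Detection:
    box: Box
    label: str
    score: float

def crop(image, box):
    """Cut the sub-block for one candidate box out of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def two_stage_detect(image,
                     detector: Callable,    # stage 1: localization + recognition
                     classifier: Callable,  # stage 2: recognition only
                     low_thresh: float = 0.1,
                     high_thresh: float = 0.9) -> List[Detection]:
    """A permissive first pass keeps small targets as candidates; a strict
    second pass on cropped sub-blocks confirms or rejects each of them."""
    confirmed = []
    for det in detector(image):
        if det.score < low_thresh:       # low threshold: do not miss small targets
            continue
        label, score = classifier(crop(image, det.box))
        if score >= high_thresh:         # high threshold: reject false candidates
            confirmed.append(Detection(det.box, label, score))
    return confirmed
```

The design choice is that the expensive, permissive localization runs once on the whole frame, while the strict recognition runs only on the few surviving sub-blocks.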
The advantage of the method is that, compared with a standard MobileNet-SSD network whose network shrink factor a defaults to 1.0, performing target detection with the double convolution neural network does not reduce detection speed while improving detection accuracy on small targets; taking the category bottle in the Pascal VOC dataset as an example, AP improves by 40%.
Drawings
FIG. 1 is a flow chart of a small target enhancement detection method based on a double convolution neural network.
Detailed Description
The invention is further described below with reference to the drawings and examples.
In the small target enhancement detection method based on the double convolution neural network, the picture to be detected is fed into the double convolution neural network, and a target list containing target positions and categories is finally obtained. The flow of the invention is shown in figure 1; the specific implementation steps are as follows:
The double convolution neural network comprises a first convolutional neural network and a second convolutional neural network. The first convolutional neural network adopts a reduced MobileNet-SSD detection network with both target localization and target recognition functions; that is, the network shrink factor a is reduced from the 1.0 of the standard MobileNet-SSD to 0.5, and the detection time of the reduced network is about 1/4 that of the reference detection network;
The picture to be detected is fed into the first convolutional neural network to run inference. The first convolutional neural network outputs a detection result; the detected targets in the result queue are sorted from high to low by confidence probability, and candidate targets are screened with a low confidence threshold (0.1);
Candidate target regions obtained from the first convolutional neural network are cropped into target image sub-blocks one by one, which raises the target pixel ratio: a small target with a low pixel ratio (below 1%) becomes a large target with a high pixel ratio (above 50%). The image sub-blocks are fed into the second convolutional neural network;
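The effect of cropping on the pixel ratio can be checked directly. In this sketch, `pad_ratio` (a small context margin around the box) is an assumed parameter, not stated in the patent:

```python
def crop_sub_block(image, box, pad_ratio=0.25):
    """Crop a candidate region from a row-major image, with an optional
    context margin, clamped to the image borders."""
    img_h, img_w = len(image), len(image[0])
    x, y, w, h = box
    px, py = int(w * pad_ratio), int(h * pad_ratio)
    x0, y0 = max(0, x - px), max(0, y - py)
    x1, y1 = min(img_w, x + w + px), min(img_h, y + h + py)
    return [row[x0:x1] for row in image[y0:y1]]

def pixel_ratio(box_w, box_h, img_w, img_h):
    """Fraction of the image area occupied by the target."""
    return (box_w * box_h) / (img_w * img_h)
```

A 10x10 target occupies 1% of a 100x100 frame; inside its own cropped sub-block (with no padding) it occupies 100% of the area, which is the pixel-ratio boost the second network relies on.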
The second convolutional neural network adopts a MobileNet recognition network with only a target recognition function; the computation time of this recognition network is 1/8 that of the reference detection network. Assuming the second convolutional neural network identifies 4 targets, the recognition computation time amounts to 1/2 of the reference.
The image sub-blocks with a high target pixel ratio are fed one by one into the second convolutional neural network to run inference, yielding a target queue sorted from high to low by confidence probability. Targets are screened with a high confidence threshold (0.85-0.95): if the confidence of a target in the queue is above the threshold, the image sub-block is confirmed as a target; otherwise it is not.
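The stage-2 screening rule reduces to a sort plus a cutoff; this is a sketch, and the (label, confidence) tuple format is an assumption:

```python
def screen_targets(queue, high_thresh=0.9):
    """Sort (label, confidence) results from high to low by confidence and
    keep only those at or above the high confidence threshold."""
    ranked = sorted(queue, key=lambda t: t[1], reverse=True)
    return [t for t in ranked if t[1] >= high_thresh]
```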
The total computation time of the double convolution neural network is thus 3/4 (1/4 + 1/2) of the reference detection network. The second network above uses the standard MobileNet with a network shrink factor of 1.0; if the second network also used a reduced MobileNet, e.g. with the shrink factor set to 0.5, the number of targets that can be identified within the same time budget could be further increased.
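The timing budget above can be reproduced with a small cost model. The per-stage ratios (1/4 for the shrunk detector, 1/8 per sub-block for the classifier) are the ones stated in the text; the quadratic scaling of compute with the shrink factor is an approximation:

```python
def cascade_cost(shrink_factor=0.5, classifier_cost=0.125, num_candidates=4):
    """Relative compute of the two-stage cascade vs. the full-size
    MobileNet-SSD baseline (whose cost is 1.0)."""
    stage1 = shrink_factor ** 2                 # 0.5^2 = 1/4 of the baseline
    stage2 = classifier_cost * num_candidates   # 1/8 per sub-block x 4 = 1/2
    return stage1 + stage2
```

With the defaults this gives 0.25 + 0.5 = 0.75, i.e. the claimed 3/4 of the reference detector's time.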
According to the embodiment of the invention, performing target detection with the double convolution neural network does not reduce detection speed compared with a MobileNet-SSD network whose network shrink factor a defaults to 1.0, while improving detection accuracy on small targets; taking the category bottle in the Pascal VOC dataset as an example, the Average Precision (AP) over all samples in the test set improves by 40%.
The embodiment also provides a small target enhancement detection device based on a double convolution neural network. The detection device comprises a first convolutional neural network unit and a second convolutional neural network unit.
The first convolutional neural network unit adopts a reduced MobileNet-SSD detection network with both target localization and target recognition functions and screens out candidate target regions; the screened candidate target regions are cropped into target image sub-blocks one by one, which raises the target pixel ratio and forms high-pixel-ratio target image sub-blocks that are fed into the second convolutional neural network unit;
The second convolutional neural network unit adopts a reduced MobileNet recognition network with only a target recognition function to perform target recognition on the high-pixel-ratio target pictures and screen out the targets.
Preferably, the first convolutional neural network unit screens out candidate target regions by setting a low confidence threshold in the range 0.01-0.1.
Preferably, the second convolutional neural network unit screens out targets by setting a high confidence threshold in the range 0.85-0.95.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing the related hardware through a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of each method embodiment described above are implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
A memory stores a computer program that performs the following steps:
the first convolutional neural network adopts a reduced MobileNet-SSD detection network with target localization and target recognition functions, and screens out candidate target regions by setting a low confidence threshold;
the candidate target regions screened by the first convolutional neural network are cropped into target image sub-blocks one by one, raising the target pixel ratio and forming high-pixel-ratio target image sub-blocks that are fed into the second convolutional neural network;
the second convolutional neural network adopts a reduced MobileNet recognition network with only a target recognition function to perform target recognition on the high-pixel-ratio target pictures, and targets are screened out by setting a high confidence threshold.
Although the present invention has been described in terms of preferred embodiments, it is not limited to them. Any person skilled in the art can make possible variations and modifications to the technical solution of the present invention using the methods and technical content disclosed above without departing from its spirit and scope; therefore, any simple modifications, equivalent variations, and refinements of the above embodiments made according to the technical substance of the present invention fall within the scope of protection of the present invention.

Claims (10)

1. A small target enhancement detection method based on a double convolution neural network, characterized in that the method comprises the following steps:
the first convolutional neural network adopts a reduced MobileNet-SSD detection network with target localization and target recognition functions, and screens out candidate target regions by setting a low confidence threshold;
the candidate target regions screened by the first convolutional neural network are cropped into target image sub-blocks one by one, raising the target pixel ratio and forming high-pixel-ratio target image sub-blocks that are fed into the second convolutional neural network;
the second convolutional neural network adopts a reduced MobileNet recognition network with only a target recognition function to perform target recognition on the high-pixel-ratio target pictures, and targets are screened out by setting a high confidence threshold.
2. The method of claim 1, wherein the first convolutional neural network sets a low confidence threshold in the range of 0.01-0.1.
3. The small target enhancement detection method based on a double convolution neural network according to claim 2, wherein the high confidence threshold set by the second convolutional neural network ranges from 0.85 to 0.95.
4. The small target enhancement detection method based on a double convolution neural network according to claim 3, wherein the targets screened by the second convolutional neural network include target positions and target categories.
5. The small target enhancement detection method based on a double convolution neural network according to claim 4, wherein the network shrink factor a of the reduced MobileNet-SSD detection network is set to 0.5.
6. The method of claim 5, wherein the first convolutional neural network sets a low confidence threshold of 0.1.
7. A small target enhancement detection device based on a double convolution neural network, characterized by comprising a first convolutional neural network unit and a second convolutional neural network unit;
the first convolutional neural network unit adopts a reduced MobileNet-SSD detection network with target localization and target recognition functions and screens out candidate target regions; the screened candidate target regions are cropped into target image sub-blocks one by one, raising the target pixel ratio and forming high-pixel-ratio target image sub-blocks that are fed into the second convolutional neural network unit;
the second convolutional neural network unit adopts a reduced MobileNet recognition network with only a target recognition function to perform target recognition on the high-pixel-ratio target pictures and screen out the targets.
8. The small target enhancement detection device based on a double convolution neural network according to claim 7, wherein the first convolutional neural network unit screens out candidate target regions by setting a low confidence threshold in the range 0.01-0.1.
9. The small target enhancement detection device based on a double convolution neural network according to claim 8, wherein the second convolutional neural network unit screens out targets by setting a high confidence threshold in the range 0.85-0.95.
10. A memory storing a computer program, characterized in that the computer program performs the steps of:
the first convolutional neural network adopts a reduced MobileNet-SSD detection network with target localization and target recognition functions, and screens out candidate target regions by setting a low confidence threshold;
the candidate target regions screened by the first convolutional neural network are cropped into target image sub-blocks one by one, raising the target pixel ratio and forming high-pixel-ratio target image sub-blocks that are fed into the second convolutional neural network;
the second convolutional neural network adopts a reduced MobileNet recognition network with only a target recognition function to perform target recognition on the high-pixel-ratio target pictures, and targets are screened out by setting a high confidence threshold.
CN202010193865.XA 2020-03-18 2020-03-18 Small target enhancement detection method and device based on double convolution neural network Active CN111414910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010193865.XA CN111414910B (en) 2020-03-18 2020-03-18 Small target enhancement detection method and device based on double convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010193865.XA CN111414910B (en) 2020-03-18 2020-03-18 Small target enhancement detection method and device based on double convolution neural network

Publications (2)

Publication Number Publication Date
CN111414910A CN111414910A (en) 2020-07-14
CN111414910B true CN111414910B (en) 2023-05-02

Family

ID=71493011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010193865.XA Active CN111414910B (en) 2020-03-18 2020-03-18 Small target enhancement detection method and device based on double convolution neural network

Country Status (1)

Country Link
CN (1) CN111414910B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634201B (en) * 2020-12-02 2023-12-05 歌尔股份有限公司 Target detection method and device and electronic equipment
CN112966609B (en) * 2021-03-05 2023-08-11 北京百度网讯科技有限公司 Target detection method and device
CN113327253B (en) * 2021-05-24 2024-05-24 北京市遥感信息研究所 Weak and small target detection method based on satellite-borne infrared remote sensing image
CN115501050A (en) * 2022-10-31 2022-12-23 江苏理工学院 Wheelchair user health monitoring system based on computer vision and training method of detection network thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013113101A1 (en) * 2012-02-02 2013-08-08 Smart Technologies Ulc Interactive input system and method of detecting objects
CN103971127A (en) * 2014-05-16 2014-08-06 华中科技大学 Forward-looking radar imaging sea-surface target key point detection and recognition method
CN107451602A (en) * 2017-07-06 2017-12-08 浙江工业大学 A kind of fruits and vegetables detection method based on deep learning
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229163B2 (en) * 2007-08-22 2012-07-24 American Gnc Corporation 4D GIS based virtual reality for moving target prediction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013113101A1 (en) * 2012-02-02 2013-08-08 Smart Technologies Ulc Interactive input system and method of detecting objects
CN103971127A (en) * 2014-05-16 2014-08-06 华中科技大学 Forward-looking radar imaging sea-surface target key point detection and recognition method
CN107451602A (en) * 2017-07-06 2017-12-08 浙江工业大学 A kind of fruits and vegetables detection method based on deep learning
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
姚群力; 胡显; 雷宏. Research progress of deep convolutional neural networks in target detection. Computer Engineering and Applications, 2018, (17), full text. *
段仲静; 李少波; 胡建军; 杨静; 王铮. A survey of deep learning target detection methods and their mainstream frameworks. Laser & Optoelectronics Progress, (12), full text. *
熊丽婷; 张青苗; 沈克永. An accurate target detection method based on a search-region conditional-probability CNN. Computer Engineering and Applications, 2017, (20), full text. *
王志; 陈平; 潘晋孝. Target detection in complex background based on deep learning. Journal of Chongqing University of Technology (Natural Science), 2018, (04), full text. *

Also Published As

Publication number Publication date
CN111414910A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN111414910B (en) Small target enhancement detection method and device based on double convolution neural network
CN109977943B (en) Image target recognition method, system and storage medium based on YOLO
WO2021169723A1 (en) Image recognition method and apparatus, electronic device, and storage medium
CN109035304B (en) Target tracking method, medium, computing device and apparatus
EP3493101B1 (en) Image recognition method, terminal, and nonvolatile storage medium
CN111488791A (en) On-device classification of fingertip movement patterns as gestures in real time
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN111368636B (en) Object classification method, device, computer equipment and storage medium
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
WO2024001123A1 (en) Image recognition method and apparatus based on neural network model, and terminal device
CN115410030A (en) Target detection method, target detection device, computer equipment and storage medium
CN111444976A (en) Target detection method and device, electronic equipment and readable storage medium
CN111126401A (en) License plate character recognition method based on context information
CN108830377B (en) Neural network circuit and self-circulation multi-stage iteration method thereof
CN112396035A (en) Object detection method and device based on attention detection model
CN113903036A (en) Text recognition method and device, electronic equipment, medium and product
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
JP2024508867A (en) Image clustering method, device, computer equipment and computer program
CN113283351B (en) Video plagiarism detection method using CNN optimization similarity matrix
WO2015031350A1 (en) Systems and methods for memory utilization for object detection
CN109977738B (en) Video scene segmentation judging method, intelligent terminal and storage medium
CN112861803A (en) Image identification method, device, server and computer readable storage medium
CN111079527B (en) Shot boundary detection method based on 3D residual error network
CN116246298A (en) Space occupation people counting method, terminal equipment and storage medium
CN114170269B (en) Multi-target tracking method, equipment and storage medium based on space-time correlation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant