CN111985284A

CN111985284A - Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision

Info

Publication number: CN111985284A
Application number: CN201910443385.1A
Authority: CN
Inventors: 胡志强
Original assignee: Tianjin University of Science and Technology
Current assignee: Tianjin University of Science and Technology
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2020-11-24

Abstract

The invention relates to an anchor-box-free single-stage target detection device based on an attention mechanism and weak semantic supervision. It improves the accuracy of target detection by adopting a new method for estimating object center points from a heat map. Compared with traditional methods, it greatly reduces the influence of manually designed anchor box structures on detection accuracy and can train the deep convolutional neural network with better features, thereby achieving higher accuracy. The method provides a theoretical basis for future single-stage target detection frameworks.

Description

Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision

Technical Field

The invention belongs to the technical field of image recognition, and particularly relates to a target recognition algorithm based on deep learning.

Background

Object recognition can be applied in many fields, such as assisted and automated driving, where it is used to recognize motor vehicles and pedestrians on the road. In recent years, target recognition algorithms have gradually evolved from manually designed image features followed by machine-learning classifiers to deep-learning-based methods. Deep-learning-based target detection algorithms are further divided into single-stage and two-stage algorithms.

Girshick et al. proposed the two-stage Faster R-CNN target detection algorithm: in the first stage, a Region Proposal Network (RPN) detects regions of the image where an object may exist; in the second stage, the deep features of those regions are used to classify the object and regress its position. In contrast to two-stage algorithms, W. Liu et al. proposed a single-stage target detection algorithm, in which manually designed anchor boxes traverse the whole feature map and objects are finally detected by feature classification and position regression.

In summary, the performance of existing target detection algorithms depends heavily on how the anchor boxes are designed and on the choice of hyper-parameters, and the anchor box design greatly limits further improvement in detection accuracy.
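To make the dependence on hand-chosen hyper-parameters concrete, the following is a minimal sketch of how an anchor-based detector might enumerate its anchor boxes. The function name `make_anchors` and the parameters `stride`, `scales`, and `ratios` are illustrative assumptions, not taken from any of the cited algorithms.

```python
from itertools import product


def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Enumerate anchor boxes (cx, cy, w, h) over a feature map.

    All parameter names are hypothetical; they stand in for the
    hand-tuned hyper-parameters that anchor-based detectors rely on.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell in input-image pixels.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio = width / height; the box area stays scale ** 2.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors(feat_h=4, feat_w=4, stride=16,
                       scales=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 4 * 4 cells * 2 scales * 3 ratios = 96
```

Every choice of `scales` and `ratios` changes which object shapes can be matched well, which is the sensitivity the paragraph above describes.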

Disclosure of Invention

The invention aims to overcome the defects of the prior art by improving on traditional target detection algorithms that rely on manually designed anchor boxes. First, an attention mechanism module is added after the detection feature map, so that the weights of feature map regions containing a target are increased and the weights of regions without a target are decreased. Second, to preserve the resolution of the heat map, the heat map is generated on the deconvolved feature map, and the feature maps of different stages are fused in a pyramid fashion, combining high-resolution but semantically weak information with low-resolution but semantically strong information; this can greatly improve detection accuracy. Third, a new algorithm is provided that uses box-level semantic segmentation as weak supervision to guide the target detection module without introducing new annotation information. The weak semantic information gives the detection network more global information and, from a different angle, makes the feature extraction network concentrate on the feature map regions where targets are located, thereby achieving a better detection result. To this end, the technical scheme of the invention comprises the following steps:

Step 1: Extract input image features using a generic feature extraction backbone network (e.g., VGG, ResNet, GoogLeNet, MobileNet, ShuffleNet, etc.).

Step 2: Extract feature maps of different scales.

Step 3: Sequentially upsample the feature maps of different scales to the same scale by deconvolution.

Step 4: Sequentially fuse the deconvolved feature maps.

Step 5: Perform target detection on each fused feature map.

Step 6: The detection head outputs: 1. a heat map, used to estimate the center position of each object; 2. a classification output, used to estimate the object class; 3. the width, height, and position refinement of each object; 4. box-level semantic information, used to assist the detection process.

Step 7: Apply non-maximum suppression to the detection results output by each DP in step 6 to obtain the final detection result.

The results of the invention provide a theoretical basis for designing fast, reliable, and easily trained target detection algorithms.
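Steps 2 to 4 can be illustrated with a minimal sketch of pyramid-style fusion. It rests on several assumptions not stated in the source: feature maps are single-channel nested lists, each level is half the size of the previous one, nearest-neighbour upsampling stands in for the learned deconvolution, and fusion is element-wise addition. The names `upsample2x` and `fuse_top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows).

    Assumption: this stands in for a learned deconvolution layer.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def fuse_top_down(pyramid):
    """Fuse maps ordered fine -> coarse, each half the size of the last.

    Returns one fused map per level, ordered fine -> coarse.
    Assumption: fusion is element-wise addition.
    """
    fused = [pyramid[-1]]  # the coarsest map is used as-is
    for fmap in reversed(pyramid[:-1]):
        up = upsample2x(fused[0])  # enlarge the coarser fused map
        merged = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(fmap, up)]
        fused.insert(0, merged)
    return fused


pyramid = [
    [[1] * 4 for _ in range(4)],  # high resolution, weak semantics
    [[1, 2], [3, 4]],
    [[10]],                       # low resolution, strong semantics
]
fused = fuse_top_down(pyramid)
print(fused[1])  # [[11, 12], [13, 14]]
```

In this sketch each fused level carries information from all coarser levels, so a detection head attached to the finest level sees both fine spatial detail and coarse semantic context.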

Drawings

FIG. 1 is a block diagram of the overall object detection algorithm of the present invention.

FIG. 2 shows details of the detection head and output features.

FIG. 3 is a schematic diagram of a target detection result, in which target center point information is estimated.

FIG. 4 is a specific example of weak semantic supervision information.

FIG. 5 is a schematic illustration of an attention mechanism.
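As a rough illustration of box-level weak semantic supervision, the sketch below assumes that the supervision signal is a mask obtained by filling each annotated bounding box with its class label, so that no annotation beyond the existing boxes is needed. The function name `box_mask`, the `(x1, y1, x2, y2, class_id)` box layout, and the half-open pixel ranges are assumptions made for illustration.

```python
def box_mask(height, width, boxes):
    """Rasterise boxes (x1, y1, x2, y2, class_id) into a class-id mask.

    0 means background; class ids start at 1. Pixel ranges are
    half-open: [x1, x2) by [y1, y2). Where boxes overlap, the later
    box overwrites the earlier one (an arbitrary tie-breaking choice).
    """
    mask = [[0] * width for _ in range(height)]
    for x1, y1, x2, y2, class_id in boxes:
        # Clip the box to the image so out-of-range boxes are safe.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                mask[y][x] = class_id
    return mask


mask = box_mask(4, 6, [(1, 1, 4, 3, 2)])
for row in mask:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 2, 2, 2, 0, 0]
# [0, 2, 2, 2, 0, 0]
# [0, 0, 0, 0, 0, 0]
```

Such a mask is only a coarse approximation of a true segmentation, which is why it serves as weak rather than full supervision.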

Claims

1. An anchor-box-free single-stage target detection device based on an attention mechanism and weak semantic supervision, characterized by comprising the following steps:

Step 2: Extract feature maps of different scales.

Step 4: Sequentially fuse the deconvolved feature maps.

Step 5: Add an attention mechanism module after each fused feature map, then perform target detection.

Step 6: The detection head outputs: 1. a heat map, used to estimate the center position of each object; 2. a classification output, used to estimate the object class; 3. the width, height, and position refinement of each object; 4. a weakly supervised semantic output.

Step 7: Apply non-maximum suppression to the detection results output by each DP in step 6 to obtain the final detection result.

2. The deconvolution processing method for the feature maps according to step 3 of claim 1.

3. The fusion processing of the deconvolved feature maps and the use of the attention module according to claim 2 and step 4.

4. The detection head according to claim 3 or 6, which outputs a heat map and takes the local extreme points of the heat map as object centers.

5. The detection head output structure according to claim 4 or 6.

6. The weakly supervised semantic segmentation branch according to step 6 of claim 5.

7. The detection head according to step 6 of claim 6, which outputs the object width and height and the position refinement.
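The rule of taking local extreme points of the heat map as object centers can be sketched as follows. This is a minimal illustration under stated assumptions: the heat map is a single-class 2-D nested list, a peak is a cell whose score is at least as large as all of its 8 neighbours, and a score threshold (the hypothetical parameter `threshold`) filters weak responses. The function name `find_centers` is likewise hypothetical.

```python
def find_centers(heatmap, threshold=0.3):
    """Return (row, col, score) for each local maximum >= threshold.

    A cell counts as a peak if its score is >= every one of its
    (up to 8) neighbours, so a flat plateau can yield several peaks.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the neighbours that lie inside the map.
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.5],
]
centers = find_centers(heatmap)
print(centers)  # [(1, 1, 0.9), (2, 3, 0.6)]
```

Because each object is represented by a single peak rather than by many overlapping anchor boxes, no anchor shapes or scales need to be chosen by hand.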