CN111985284A - Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision - Google Patents
Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision
- Publication number
- CN111985284A
- Authority
- CN
- China
- Prior art keywords
- target detection
- semantic
- attention mechanism
- weak supervision
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to an anchor-box-free single-stage target detection device based on an attention mechanism and weak semantic supervision. It improves the accuracy of target detection by adopting a new method that estimates object center points from a heatmap. Compared with traditional methods, it greatly reduces the influence of manually designed anchor-box structures on detection accuracy and allows the deep convolutional neural network to be trained on better features, yielding higher accuracy. The method provides a theoretical basis for future single-stage target detection frameworks.
Description
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a target recognition algorithm based on deep learning.
Background
Object recognition can be applied in many fields, such as driver assistance and autonomous driving, where it is used to recognize motor vehicles and pedestrians on the road. In recent years, target recognition algorithms have gradually evolved from manually designed image features followed by machine-learning classifiers to deep-learning-based methods. Deep-learning-based target detection algorithms are further divided into single-stage and two-stage algorithms.
Girshick et al. proposed the two-stage Faster R-CNN target detection algorithm: in the first stage, an RPN (Region Proposal Network) detects regions of the image where an object may exist; in the second stage, the deep features of those regions are used to classify the object and regress its position. In contrast to the two-stage approach, W. Liu et al. proposed a single-stage target detection algorithm in which manually designed anchor boxes traverse the whole feature map and objects are detected by feature classification and position regression.
In summary, the performance of existing target detection algorithms depends heavily on the design of the anchor boxes and the choice of hyperparameters, and the anchor-box design greatly limits further improvement in their accuracy.
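To make the dependence on hand-chosen hyperparameters concrete, the following is a minimal sketch of how anchor boxes are typically tiled over a feature map. It is an illustration only, not code from the patent or from any cited detector; the function name `make_anchors` and the stride, scale, and aspect-ratio values are assumptions.

```python
import math

def make_anchors(fmap_h, fmap_w, stride, scales, ratios):
    """Return one (cx, cy, w, h) anchor per cell, scale and aspect ratio.

    All names and values here are hypothetical; they only illustrate the
    hand-designed choices that an anchor-based detector has to make.
    """
    anchors = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Centre of this feature-map cell in input-image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors

anchors = make_anchors(fmap_h=4, fmap_w=4, stride=16,
                       scales=(32, 64), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 4 * 4 * 2 * 3 = 96
```

Every entry in `scales` and `ratios` is a design decision that has to be tuned per dataset, which is the limitation the anchor-free approach is meant to remove.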
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by improving on traditional target detection algorithms that rely on manually designed anchor boxes. First, an attention mechanism module is added after each detection feature map, so that regions of the feature map containing a target receive higher weight and regions without a target receive lower weight. Second, to preserve the resolution of the heatmap, the heatmap is generated on deconvolved feature maps, and feature maps from different stages are fused in a pyramid fashion, combining high-resolution but semantically weak information with low-resolution but semantically strong information; this can greatly improve detection accuracy. Third, without introducing new annotation information, a new algorithm is proposed that uses box-level semantic segmentation as weak supervision to guide the target detection module. The weak semantic information gives the detection network a more global view and, from a different angle, makes the feature extraction network focus more on the feature-map regions where targets are located, thereby achieving better detection results. To achieve this, the technical scheme of the invention comprises the following steps:
Step 1: Extract features from the input image using a generic feature-extraction backbone network (e.g., VGG, ResNet, GoogLeNet, MobileNet, ShuffleNet).
Step 2: Extract feature maps at different scales.
Step 3: Successively upsample the feature maps of different scales to the same scale by deconvolution.
Step 4: Successively fuse the deconvolved feature maps.
Step 5: Perform target detection on each fused feature map.
Step 6: Detection head output: (1) a heatmap, used to estimate the center position of each object; (2) a classification output, used to estimate the object class; (3) the width, height, and position refinement of each object; (4) box-level semantic information, which assists the detection process.
Step 7: Apply non-maximum suppression to the detection results output by each DP in step 6 to obtain the final detection result.
The results of the invention provide a theoretical basis for designing a fast, reliable, and easily trained target detection algorithm.
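Steps 6 and 7 can be sketched as follows: take local maxima of the heatmap as object centers, decode a box from the width/height and position-refinement outputs at each center, then apply non-maximum suppression to the pooled results. This is a minimal illustration under stated assumptions, not the patent's implementation: the 3x3 neighbourhood, the score threshold, the offset convention (in feature-map cells), and all function names are hypothetical.

```python
def find_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for strict local maxima in a 3x3 window."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(h, r + 2))
                          for cc in range(max(0, c - 1), min(w, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c, v))
    return peaks

def decode(peaks, sizes, offsets, stride):
    """Turn peaks into (x1, y1, x2, y2, score) boxes in image pixels."""
    boxes = []
    for r, c, score in peaks:
        bw, bh = sizes[r][c]      # predicted width and height (pixels)
        dx, dy = offsets[r][c]    # assumed sub-cell position refinement
        cx, cy = (c + dx) * stride, (r + dy) * stride
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score))
    return boxes

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_threshold=0.5):
    """Greedy non-maximum suppression, highest score first."""
    keep = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in keep):
            keep.append(box)
    return keep

heatmap = [[0.1, 0.2, 0.1, 0.0],
           [0.2, 0.9, 0.3, 0.1],
           [0.1, 0.3, 0.2, 0.8],
           [0.0, 0.1, 0.1, 0.2]]
sizes = [[(8.0, 8.0)] * 4 for _ in range(4)]
offsets = [[(0.5, 0.5)] * 4 for _ in range(4)]

peaks = find_peaks(heatmap)
boxes = decode(peaks, sizes, offsets, stride=4)
# Pretend a second detection output produced an overlapping duplicate.
boxes.append((3.0, 3.0, 11.0, 11.0, 0.6))
final = nms(boxes)
print(final)
```

In this toy input the heatmap has two local maxima, and the lower-scoring duplicate box is suppressed because it overlaps the first detection with IoU above 0.5.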
Drawings
FIG. 1 is a block diagram of the overall target detection algorithm of the invention.
FIG. 2 shows details of the detection head and its output features.
FIG. 3 is a schematic diagram of a target detection result, in which target center-point information is estimated.
FIG. 4 is a specific example of the weak semantic supervision information.
FIG. 5 is a schematic illustration of the attention mechanism.
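The text describes the attention mechanism of FIG. 5 only by its effect: it raises the weight of feature-map regions that contain a target and lowers the weight elsewhere. One common way to obtain that effect is a sigmoid-gated spatial mask, sketched below. The gating form is an assumption, since the patent text does not specify it, and `apply_spatial_attention` and its arguments are hypothetical names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apply_spatial_attention(feature_map, attention_logits):
    """Reweight a [C][H][W] feature map by a [H][W] attention mask.

    Each logit is squashed to (0, 1) and multiplied into every channel at
    that location, so high-logit (likely target) cells are kept and
    low-logit cells are attenuated.
    """
    weights = [[sigmoid(v) for v in row] for row in attention_logits]
    return [[[f * wgt for f, wgt in zip(frow, wrow)]
             for frow, wrow in zip(channel, weights)]
            for channel in feature_map]

features = [[[1.0, 1.0],
             [1.0, 1.0]]]            # one channel, 2x2
logits = [[4.0, -4.0],
          [0.0, 0.0]]                # hypothetical attention branch output
attended = apply_spatial_attention(features, logits)
print(attended)
```

In a real network the logits would come from a small learned branch on the fused feature map; here they are fixed numbers chosen to show the reweighting.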
Claims (7)
1. An anchor-box-free single-stage target detection device based on an attention mechanism and weak semantic supervision, characterized by comprising the following steps:
Step 1: Extract features from the input image using a generic feature-extraction backbone network (e.g., VGG, ResNet, GoogLeNet, MobileNet, ShuffleNet).
Step 2: Extract feature maps at different scales.
Step 3: Successively upsample the feature maps of different scales to the same scale by deconvolution.
Step 4: Successively fuse the deconvolved feature maps.
Step 5: Add an attention mechanism module after each fused feature map and then perform target detection.
Step 6: Detection head output: (1) a heatmap, used to estimate the center position of each object; (2) a classification output, used to estimate the object class; (3) the width, height, and position refinement of each object; (4) a weakly supervised semantic output.
Step 7: Apply non-maximum suppression to the detection results output by each DP in step 6 to obtain the final detection result.
2. The deconvolution processing of the feature maps according to step 3 of claim 1.
3. The fusion of the deconvolved feature maps and the use of the attention module according to claim 2 and step 4.
4. The detection head according to claim 3 or 6, which outputs a heatmap and takes the local extreme points of the heatmap as object centers.
5. The detection head output structure according to claim 4 or 6.
6. The weakly supervised semantic segmentation branch according to step 6 of claim 5.
7. The detection head according to step 6 of claim 6, which outputs object width, height, and position refinements.
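The box-level weak supervision in claim 6 needs no annotation beyond the bounding boxes already used for detection. A minimal sketch of how such a training target could be derived is shown below: each ground-truth box is rasterised into a coarse label mask. The function name `boxes_to_mask`, the `(x1, y1, x2, y2, class_id)` box layout, and the rule that smaller boxes are painted over larger ones are assumptions made for illustration.

```python
import math

def boxes_to_mask(boxes, height, width, stride=1):
    """Rasterise boxes into a [height][width] label mask (0 = background).

    boxes: iterable of (x1, y1, x2, y2, class_id) in image pixels.
    stride: image pixels per mask cell.
    """
    mask = [[0] * width for _ in range(height)]
    # Paint large boxes first so that small overlapping objects stay visible.
    by_area = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]),
                     reverse=True)
    for x1, y1, x2, y2, class_id in by_area:
        c0 = max(0, math.floor(x1 / stride))
        c1 = min(width, math.ceil(x2 / stride))
        r0 = max(0, math.floor(y1 / stride))
        r1 = min(height, math.ceil(y2 / stride))
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = class_id
    return mask

mask = boxes_to_mask([(0, 0, 8, 8, 1), (4, 4, 16, 12, 2)],
                     height=4, width=4, stride=4)
for row in mask:
    print(row)
```

The resulting mask is only a rough stand-in for a true segmentation (every pixel inside a box is labelled as the object), which is why the supervision is described as weak.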
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910443385.1A CN111985284A (en) | 2019-05-21 | 2019-05-21 | Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910443385.1A CN111985284A (en) | 2019-05-21 | 2019-05-21 | Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111985284A (en) | 2020-11-24 |
Family
ID=73436857
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910443385.1A Pending CN111985284A (en) | 2019-05-21 | 2019-05-21 | Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985284A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255759A (en) * | 2021-05-20 | 2021-08-13 | 广州广电运通金融电子股份有限公司 | Attention mechanism-based in-target feature detection system, method and storage medium |
CN113255759B (en) * | 2021-05-20 | 2023-08-22 | 广州广电运通金融电子股份有限公司 | In-target feature detection system, method and storage medium based on attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dharneeshkar et al. | Deep Learning based Detection of potholes in Indian roads using YOLO | |
CN109615016B (en) | Target detection method of convolutional neural network based on pyramid input gain | |
Börcs et al. | Instant object detection in lidar point clouds | |
CN103927526B (en) | Vehicle detecting method based on Gauss difference multi-scale edge fusion | |
CN102646199B (en) | Motorcycle type identifying method in complex scene | |
CN105046196A (en) | Front vehicle information structured output method base on concatenated convolutional neural networks | |
CN102073852B (en) | Multiple vehicle segmentation method based on optimum threshold values and random labeling method for multiple vehicles | |
JP6095817B1 (en) | Object detection device | |
CN114627437B (en) | Traffic target identification method and system | |
CN112270383B (en) | Tunnel large-scale rivet hole extraction method based on full convolution neural network | |
CN111985286A (en) | Target detection algorithm without anchor box based on Gaussian thermodynamic diagram attention mechanism and semantic weak supervision | |
Aneesh et al. | Real-time traffic light detection and recognition based on deep retinanet for self driving cars | |
CN109543498B (en) | Lane line detection method based on multitask network | |
KR20180062683A (en) | Apparatus and Method for Detecting Vehicle using Image Pyramid | |
Mijić et al. | Traffic sign detection using YOLOv3 | |
CN111985284A (en) | Single-stage target detection device without anchor box based on attention mechanism and semantic weak supervision | |
CN111985493A (en) | Anchor-box-free target detection algorithm based on Gaussian thermodynamic diagram and attention mechanism | |
CN113701642A (en) | Method and system for calculating appearance size of vehicle body | |
CN117197146A (en) | Automatic identification method for internal defects of castings | |
CN111986140A (en) | Single-stage face detection device without anchor box and using semantic information weak supervision | |
CN111985285A (en) | Target detection device without anchor box based on Gaussian attention mechanism and semantic weak supervision | |
CN111985515A (en) | Single-stage target detection algorithm using semantic information weak supervision deconvolution feature layer fusion | |
CN111985516A (en) | Single-stage target detection algorithm based on attention mechanism | |
CN111985288A (en) | Single-stage unmanned vehicle detection device without anchor box | |
CN111985287A (en) | Quick target detection device based on Gaussian thermodynamic diagram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | DD01 | Delivery of document by public notice | Addressee: Hu Zhiqiang; Document name: Notice of publication of application for patent for invention |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201124 |