CN112381106B - Target detection method based on global area prior attention - Google Patents
Target detection method based on global area prior attention
- Publication number: CN112381106B (application CN202011365545.4A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption, not a legal conclusion)
Classifications
- G06V10/44 (Physics; Computing; Image or video recognition or understanding; extraction of image or video features) — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06N3/045 (Neural networks; architecture, e.g. interconnection topology) — Combinations of networks
- G06N3/048 (Neural networks; architecture) — Activation functions
- G06N3/084 (Neural networks; learning methods) — Backpropagation, e.g. using gradient descent
- G06T3/4007 (Image data processing; geometric image transformations; scaling) — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
Abstract
The invention discloses a target detection method based on a global area prior attention mechanism, comprising the following steps: traverse all training samples and count how frequently targets appear at each image position, yielding the global prior attention; acquire an image to be detected, extract its features with the feature extraction network obtained by training, extract adaptive attention with a convolutional neural network, correct and enhance the global prior attention to obtain an adaptive global prior attention, and use it to enhance the feature map; finally, perform target detection on the enhanced feature map. The proposed global area prior attention network speeds up training convergence and improves target detection precision while preserving detection speed; the improvement is most evident in application scenarios where few target classes appear at specific, predictable positions.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a target detection method based on global area prior attention.
Background
In recent years, computer vision has been widely applied across many fields, boosting productivity. Object detection is one of the most important tasks in computer vision. With the development of deep learning, object detection has been studied extensively, further promoting the application of computer vision in various fields. Object detection marks the position and category of the objects we want to detect in an image or video, a capability that extends to many practical applications such as autonomous driving, pedestrian tracking, and text recognition.
Object detection based on traditional methods suffers from low speed and low accuracy. With advances in deep learning and computer hardware, deep-learning-based object detection has developed greatly, improving precision and speed qualitatively and reaching levels suitable for practical use. Nevertheless, current methods still leave considerable room for improvement in difficult scenes.
In the human eye, only the fovea of the retina has a high density of cone cells, so people bring a region of interest onto the fovea to perceive it better; computational visual attention was born from this observation. In recent years, researchers have used attention models to enhance feature extraction networks in object detection. General object detection faces the following difficulties: 1) during feature extraction, the background and the target are treated equally, making it hard to highlight the features of the target to be detected; 2) common attention models cannot guarantee that the attention itself is correct; 3) small targets are difficult to detect. Combining a generic attention model with object detection therefore has its limitations.
Given the above, how to obtain an accurate attention mask to guide feature extraction is a problem that urgently needs solving.
Disclosure of Invention
The invention aims to provide a target detection method based on global area prior attention that improves the accuracy of the attention mask and, ultimately, the precision of target detection.
The technical solution realizing this aim is a target detection method based on global area prior attention, comprising the following steps:
step (A): traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention;
step (B): acquiring an image to be detected and extracting features from the original image with the feature extraction network obtained by training;
step (C): branching an adaptive attention network off the backbone network, correcting and enhancing the global area prior attention with the adaptive attention network to obtain an adaptive global area prior attention mask, point-multiplying the attention mask with the feature map, and then performing target detection on the feature map.
Further, step (A) — traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention — includes the following steps:
(A1) manually draw a bounding box around each target in the training sample pictures and store the position information of each target region as a label;
(A2) scale the training sample pictures and the global area prior attention map to the set size;
(A3) initialize every value of the global area prior attention map to zero, then traverse the training samples and add 1 to every pixel of the map lying inside a region defined in the label of a training sample picture;
(A4) normalize the accumulated map with a sigmoid function to obtain the global area prior attention.
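A minimal sketch of steps (A1)-(A4), assuming boxes are given as (x1, y1, x2, y2) integer indices already scaled to the prior map's grid; the coordinate convention and map size are illustrative assumptions, not fixed by the text:

```python
import numpy as np

def build_global_prior_attention(boxes_per_image, size=(13, 13)):
    """Steps (A1)-(A4): accumulate labelled target regions into a
    count map, then squash the counts with a sigmoid.

    boxes_per_image: per training image, a list of (x1, y1, x2, y2)
    integer boxes scaled to `size` (an assumed convention).
    """
    prior = np.zeros(size, dtype=np.float64)
    for boxes in boxes_per_image:            # traverse the training set
        for x1, y1, x2, y2 in boxes:
            prior[y1:y2, x1:x2] += 1.0       # (A3): +1 inside each labelled region
    return 1.0 / (1.0 + np.exp(-prior))      # (A4): sigmoid normalisation
```

Note that the sigmoid maps a count of zero to 0.5, so background cells keep a non-zero baseline while frequently occupied cells saturate toward 1.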
Further, step (C) includes the following steps:
(C1) take features from the backbone feature extraction network as the input of the global area prior attention correction and enhancement network;
(C2) apply convolution, pooling and deconvolution to the input features to obtain more abstract semantic information;
(C3) normalize the calculation result with a sigmoid function and point-multiply it with the global area prior attention to enhance and correct it;
(C4) add 1 to the calculation result and point-multiply it with the features of the backbone feature extraction network to enhance them; the calculation formula is:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
where x is the input feature, F(x) is the series calculation of convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map w to enhance it, after which target detection is performed on the enhanced feature map.
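The formula above can be written out directly. This element-wise sketch assumes all three inputs share one spatial shape; channel handling is an implementation detail the text leaves open:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def enhance_features(fx, pre, w):
    """y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w, element-wise.

    fx  : raw output F(x) of the convolution/pooling/deconvolution branch
    pre : global area prior attention (before sigmoid normalisation)
    w   : backbone feature map to be enhanced
    """
    mask = sigmoid(fx) * sigmoid(pre)   # (C3): corrected, enhanced prior attention
    return (mask + 1.0) * w             # (C4): +1 keeps the original features as a floor
```

Because of the +1 term, the mask can only amplify features (by a factor between 1 and 2), never suppress them below their original values.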
Compared with the prior art, the invention has the following notable advantages: the method makes full use of the position information of targets in the training set to generate a prior attention, corrects and enhances this global prior attention with a neural network, and finally uses it to enhance the features produced by the backbone network, improving target detection precision. The improvement is most evident in detection scenes with few target classes and regular target positions, which are common in industrial applications. Specifically:
(1) when image quality is poor, the prior attention can still determine the approximate position of the target from global information, improving localization precision;
(2) the global prior attention enhancement network can adjust the prior attention according to the specific characteristics of each image, forming a final, more accurate attention.
Drawings
FIG. 1 is a flowchart of a global area prior attention based target detection method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, a target detection method based on a global area prior attention mechanism includes the following steps:
step (A): traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention, which includes the following steps:
(A1) manually draw a bounding box around each target in the training sample pictures and store the position information of each target region as a label;
(A2) scale the training images to a set size and give the global area prior attention map the same size;
(A3) initialize every pixel value of the global area prior attention map to zero, then traverse each training sample and add one to the pixel values inside all labelled rectangular boxes, yielding a heat map;
(A4) normalize the heat map with a sigmoid function to obtain the global area prior attention.
Step (B): acquiring the image to be detected and extracting features from the original image with the feature extraction network obtained by training, which includes the following steps:
(B1) scale each picture to be detected to the same set size using bilinear interpolation;
(B2) extract features from the input with 3 × 3 and 1 × 1 convolution kernels, finally generating a 13 × 13 multi-dimensional output that divides the original image into a 13 × 13 grid; each of the 169 grid cells is assigned an n-dimensional vector responsible for target detection in that cell, containing classification information and target position information;
(B3) process the label information of the training samples into a matrix form matching the network output, compute the loss value, and train the entire network with back propagation.
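A sketch of how labels could be encoded into the 13 × 13 target matrix described in (B2)-(B3). The 416-pixel input size and the n = 6 vector layout (objectness, class, centre, size) are assumptions for illustration; the text does not fix the vector's contents:

```python
import numpy as np

def boxes_to_grid_targets(labels, img_size=416, grid=13, n_dim=6):
    """Encode ground-truth boxes as a (grid, grid, n_dim) target matrix
    matching the 13x13 network output, one n-dim vector per cell.

    labels: list of (class_index, (x1, y1, x2, y2)) boxes in pixels.
    """
    target = np.zeros((grid, grid, n_dim))
    for cls, (x1, y1, x2, y2) in labels:
        cx = (x1 + x2) / 2.0 / img_size          # normalised box centre
        cy = (y1 + y2) / 2.0 / img_size
        gx, gy = int(cx * grid), int(cy * grid)  # cell responsible for this target
        target[gy, gx, 0] = 1.0                  # objectness
        target[gy, gx, 1] = cls                  # class index
        target[gy, gx, 2:] = [cx, cy, (x2 - x1) / img_size, (y2 - y1) / img_size]
    return target
```

The loss in (B3) is then computed cell by cell between this matrix and the network's 13 × 13 × n prediction before back-propagating.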
Step (C): branching a global area prior attention correction and enhancement network off the feature map, correcting and enhancing the global area prior attention with this attention network, and finally obtaining the attention mask, which includes the following steps:
(C1) take any layer of features from the backbone feature extraction network as the input of the global area prior attention correction and enhancement network;
(C2) apply convolution, pooling, deconvolution and similar calculations to the input features to obtain more abstract semantic information;
(C3) normalize the calculation result with a sigmoid function and point-multiply it with the global area prior attention to enhance and correct it;
(C4) add 1 to the calculation result and point-multiply it with the features of the backbone feature extraction network to enhance them. The calculation formula is:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
where x is the input feature, F(x) is a series of calculations such as convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map to enhance it, after which target detection is performed on the feature map;
(C5) the features that are point-multiplied are not the original input features but features after several further convolution layers, i.e. the enhancement is applied in a skip-connection manner.
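The attention-branch steps above can be sketched end to end with simple stand-ins: average pooling and nearest-neighbour upsampling replace the learned convolution/pooling/deconvolution, and the "later" feature map passed in separately models the skip connection; all of these substitutions are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_pool2(x):
    """2x2 average pooling, stride 2 (stand-in for the pooling step)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (stand-in for deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def attention_branch(feat, prior, later_feat):
    """(C1) take a backbone feature; (C2) pool then upsample it;
    (C3) gate the prior; then enhance a *later* feature via the skip path."""
    fx = upsample2(avg_pool2(feat))       # crude stand-in for F(x)
    mask = sigmoid(fx) * sigmoid(prior)   # corrected, enhanced prior attention
    return (mask + 1.0) * later_feat      # applied to the deeper feature map
```

In a real network the pool/upsample pair would be learned layers, but the data flow — branch off, gate the prior, re-enter the backbone one step later — is the same.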
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (3)
1. A target detection method based on global area prior attention is characterized by comprising the following steps:
step (A), traversing a training set, and counting position information of a target in each training sample to obtain the prior attention of a global area;
step (B), acquiring an image to be detected, and extracting the features of the original image by using a feature extraction network obtained by training;
step (C), branching from a backbone network to a self-adaptive attention network branch, and correcting and enhancing the prior attention of the global area by using the self-adaptive attention network to obtain a self-adaptive global area prior attention mask; multiplying the attention mask points on the feature map, and then carrying out target detection on the feature map;
the step (C) comprises the following steps:
(C1) extracting features from features in the trunk feature extraction network as input of a global area prior attention correction enhancement network;
(C2) performing convolution, pooling and deconvolution on the input features to obtain more abstract semantic information;
(C3) normalizing the calculation result obtained in the step (C2) by a sigmoid function, and then multiplying the normalized result by the prior attention of the global area to enhance and correct the prior attention of the global area;
(C4) adding 1 to the calculation result of the step (C3), and then performing point multiplication on the features of the trunk feature extraction network to enhance the features; the calculation formula is as follows:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
wherein x is the input feature, F(x) is the series calculation of convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map w to enhance it, after which target detection is performed on the feature map.
2. The target detection method based on global area prior attention as claimed in claim 1, wherein in step (a), the training set is traversed, and the position information of the target appearing in each training sample is counted to obtain the global area prior attention, including the following steps:
(A1) selecting an area where the target is located in the training sample picture by a manual frame, and acquiring position information of the target area where the target is located in the training sample picture to be stored as a label;
(A2) scaling the training sample picture and the global area prior attention to the set size;
(A3) initializing each value of the prior attention of the global area to be zero, traversing the training sample, and adding 1 to the pixel values of all pixels in the prior attention of the global area at the same position as the area defined in the label of the picture of the training sample to obtain the prior attention of the global area;
(A4) and normalizing the prior attention of the global area by using a sigmoid function to obtain the prior attention of the global area.
3. The global area prior attention-based target detection method according to claim 1, wherein in the step (B), the image to be detected is obtained, and feature extraction is performed on the original image by using a feature extraction network obtained by training, and the method comprises the following steps:
(B1) using a bilinear interpolation method to scale the picture to be detected to the same size;
(B2) extract features from the input with 3 × 3 and 1 × 1 convolution kernels, finally generating a 13 × 13 multi-dimensional matrix that divides the original image into a 13 × 13 grid; each of the 169 grid cells is assigned an n-dimensional vector responsible for target detection in that cell, containing classification information and target position information;
(B3) the label information in the training samples is processed into a matrix form corresponding to the network output, and the loss value is calculated, and then the entire network is trained using back propagation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011365545.4A CN112381106B (en) | 2020-11-28 | 2020-11-28 | Target detection method based on global area prior attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112381106A CN112381106A (en) | 2021-02-19 |
CN112381106B true CN112381106B (en) | 2022-09-09 |
Family
ID=74588595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011365545.4A Active CN112381106B (en) | 2020-11-28 | 2020-11-28 | Target detection method based on global area prior attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381106B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111489334A (en) * | 2020-04-02 | 2020-08-04 | 暖屋信息科技(苏州)有限公司 | Defect workpiece image identification method based on convolution attention neural network |
CN111563415A (en) * | 2020-04-08 | 2020-08-21 | 华南理工大学 | Binocular vision-based three-dimensional target detection system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8266185B2 (en) * | 2005-10-26 | 2012-09-11 | Cortica Ltd. | System and methods thereof for generation of searchable structures respective of multimedia data content |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |