CN112381106B - Target detection method based on global area prior attention - Google Patents

Target detection method based on global area prior attention

Info

Publication number
CN112381106B
Authority
CN
China
Prior art keywords
attention
global area
prior attention
prior
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011365545.4A
Other languages
Chinese (zh)
Other versions
CN112381106A (en)
Inventor
吴泽彬
龚航
徐伟
赵朝蓬
刘建新
陈圣堂
徐洋
陈刚
夏雷
顾涛
丁道华
晁京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Nanjing Power Supply Section of China Railway Shanghai Group Co Ltd
Original Assignee
Nanjing University of Science and Technology
Nanjing Power Supply Section of China Railway Shanghai Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology and Nanjing Power Supply Section of China Railway Shanghai Group Co Ltd
Priority to CN202011365545.4A
Publication of CN112381106A
Application granted
Publication of CN112381106B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on a global area prior attention mechanism, comprising the following steps: traversing all training samples and counting how often the target appears in the image to obtain a global prior attention; acquiring an image to be detected and extracting its features with a trained feature extraction network; extracting an adaptive attention with a convolutional neural network and using it to correct and enhance the global prior attention, yielding an adaptive global prior attention; enhancing the feature map with the adaptive global prior attention; and finally performing target detection. The proposed global area prior attention network speeds up training convergence and improves detection precision while preserving detection speed. The improvement is most pronounced in application scenarios where there are few target types and the targets appear at specific positions.

Description

Target detection method based on global area prior attention
Technical Field
The invention relates to the technical field of image processing, and in particular to a target detection method based on global area prior attention.
Background
In recent years, computer vision has been widely applied in many fields, freeing up productive capacity across society. Target detection is a very important task in computer vision. With the development of deep learning, target detection has been widely studied, which has promoted the application of computer vision in various fields. Target detection marks the position and category of each object of interest in an image or video. This capability extends to many practical applications, such as autonomous driving, pedestrian tracking, and text recognition.
Target detection based on traditional methods suffers from low speed and low precision. With the development of deep learning and computer hardware, deep-learning-based target detection has advanced greatly: precision and speed have improved qualitatively and now meet the requirements of practical use. However, current methods still leave considerable room for improvement when detecting targets in difficult scenes.
In the human eye, only the fovea of the retina has a high density of cone cells, so a person directs the region of interest onto the fovea in order to perceive the environment better. Visual attention in computer vision is inspired by this mechanism. In recent years, researchers have used attention models to enhance the feature extraction networks used in target detection. General target detection faces the following difficulties: 1) during feature extraction, the background and the target are treated equally, so the features of the target to be detected are hard to highlight; 2) a common attention model can hardly guarantee that its attention is correct; 3) small targets are hard to detect. Combining a common attention model with target detection therefore has its limitations.
In view of the above, how to obtain an accurate attention mask so that features can be extracted better is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a target detection method based on global area prior attention that improves the accuracy of the attention mask and thereby the precision of target detection.
The technical solution that achieves this aim is a target detection method based on global area prior attention, comprising the following steps:
step (A), traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention;
step (B), acquiring an image to be detected, and extracting features from the original image with a trained feature extraction network;
step (C), branching an adaptive attention network off the backbone network, and using the adaptive attention network to correct and enhance the global area prior attention to obtain an adaptive global area prior attention mask; multiplying the feature map element-wise by the attention mask, and then performing target detection on the feature map.
Further, step (A), traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention, includes the following steps:
(A1) manually drawing a bounding box around the area where the target is located in each training sample picture, and storing the position information of that target area as the label;
(A2) scaling the training sample picture and the global area prior attention to the set size;
(A3) initializing every value of the global area prior attention to zero, traversing the training samples, and, for each training sample picture, adding 1 to every pixel of the global area prior attention that lies at the same position as an area defined in the label of that picture, thereby accumulating the global area prior attention;
(A4) normalizing the accumulated global area prior attention with a sigmoid function to obtain the final global area prior attention.
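Steps (A1) to (A4) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the patented implementation: the function name `build_prior_attention`, the half-open `(x0, y0, x1, y1)` box format, and the tiny 4 × 4 map are hypothetical choices made for the example, and the boxes are assumed to be already scaled to the set size.

```python
import math


def build_prior_attention(boxes_per_image, height, width):
    """Count how often each pixel is covered by a labeled box, then apply a sigmoid."""
    # (A3) start from an all-zero map of the set size
    counts = [[0] * width for _ in range(height)]
    for boxes in boxes_per_image:            # one list of boxes per training sample
        for x0, y0, x1, y1 in boxes:         # half-open pixel box (assumed format)
            for r in range(max(0, y0), min(height, y1)):
                for c in range(max(0, x0), min(width, x1)):
                    counts[r][c] += 1        # add 1 to every pixel inside the box
    # (A4) normalize the accumulated counts with the logistic sigmoid
    prior = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in counts]
    return counts, prior


# Three hypothetical training samples, one labeled box each, on a 4 x 4 map.
labels = [
    [(1, 1, 3, 3)],
    [(2, 1, 4, 3)],
    [(1, 0, 3, 2)],
]
counts, prior = build_prior_attention(labels, height=4, width=4)
```

Because the counts are non-negative, the sigmoid maps pixels that are never covered to 0.5 and frequently covered pixels to values close to 1.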
Further, step (C) includes the following steps:
(C1) taking features from the backbone feature extraction network as the input of a global area prior attention correction and enhancement network;
(C2) applying convolution, pooling and deconvolution to the input features to obtain more abstract semantic information;
(C3) normalizing the result with a sigmoid function, and then multiplying it by the global area prior attention to enhance and correct the global area prior attention;
(C4) adding 1 to the result, and then multiplying the features of the backbone feature extraction network element-wise by it to enhance the features; the calculation formula is as follows:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
wherein x is the input feature; F(x) is the series of convolution, pooling and deconvolution calculations; Sigmoid(pre) is the area prior attention, which is multiplied element-wise by the mask Sigmoid(F(x)) generated in the attention correction and enhancement network so that the area prior attention is corrected and enhanced; 1 is then added to every element; and finally the result is multiplied element-wise onto the feature map w to enhance it, after which target detection is performed on the feature map.
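As a rough check on what the formula permits, the range of the multiplicative mask can be worked out. This assumes that Sigmoid is the standard logistic function and that pre holds the non-negative counts accumulated in step (A3); the symbols M, σ and ⊙ (element-wise product) are introduced here only for illustration.

```latex
\[
y = M \odot w, \qquad
M = \sigma\bigl(F(x)\bigr) \odot \sigma(\mathrm{pre}) + 1, \qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
\]
\[
0 < \sigma\bigl(F(x)\bigr) < 1, \qquad
\mathrm{pre} \ge 0 \;\Rightarrow\; \tfrac{1}{2} \le \sigma(\mathrm{pre}) < 1
\qquad\Longrightarrow\qquad
1 < M < 2
\]
```

Under these assumptions every feature value is scaled by a factor strictly between 1 and 2, so the mask can emphasize a location but never pushes a feature below its original magnitude.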
Compared with the prior art, the invention has the following notable advantages: the target detection method, based on an attention model that enhances the global area prior attention, makes full use of the position information of the targets in the training set to generate a prior attention, then corrects and enhances the global prior attention through a neural network, and finally enhances the features generated by the backbone network, thereby improving target detection precision. The method is particularly effective in detection scenarios with few target types and regular target positions. It is well suited to industrial applications and has the following advantages:
(1) when image quality is poor, the prior attention can determine the approximate position of the target from global information, which improves localization precision;
(2) the global prior attention enhancement network can enhance and adjust the prior attention according to the specific characteristics of each image, forming a final, more accurate attention.
Drawings
FIG. 1 is a flowchart of a global area prior attention based target detection method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, a target detection method based on a global area prior attention mechanism includes the following steps:
step (A), traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention, which comprises the following steps:
(A1) manually drawing a bounding box around the area where the target is located in each training sample picture, and storing the position information of that target area as the label;
(A2) scaling the training image to a set size, and setting the size of the global area prior attention to the same size;
(A3) initializing every pixel value of the global area prior attention to zero, traversing each training sample, and adding one to the values of all pixels inside each labeled rectangular box to obtain a heat map;
(A4) normalizing the heat map with a sigmoid function to obtain the global area prior attention.
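Step (A2) implies that the labeled boxes have to be rescaled together with the picture before they are accumulated. A minimal sketch, assuming pixel boxes in half-open `(x0, y0, x1, y1)` form; the helper name `scale_box` and the 416 × 416 target size are hypothetical, since the description only speaks of "a set size".

```python
def scale_box(box, src_w, src_h, dst_w, dst_h):
    """Map a box from the original picture size to the set size.

    Integer arithmetic is used so the result is exact; the box is rounded
    outward (floor for the minimum corner, ceiling for the maximum corner)
    so that no labeled pixel is lost. Outward rounding is an assumption.
    """
    x0, y0, x1, y1 = box
    new_x0 = (x0 * dst_w) // src_w
    new_y0 = (y0 * dst_h) // src_h
    new_x1 = -((-x1 * dst_w) // src_w)   # ceiling division
    new_y1 = -((-y1 * dst_h) // src_h)   # ceiling division
    return new_x0, new_y0, new_x1, new_y1


# A box labeled on a 1920 x 1080 picture, mapped to a 416 x 416 map.
scaled = scale_box((480, 270, 960, 540), 1920, 1080, 416, 416)
# A box whose corners do not land on whole pixels after scaling.
scaled_rounded = scale_box((100, 50, 333, 777), 1000, 1000, 416, 416)
```

The scaled boxes can then be accumulated into the heat map of step (A3) at the set size.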
step (B), acquiring an image to be detected, and extracting features from the original image with the trained feature extraction network, which comprises the following steps:
(B1) scaling the picture to be detected to the same set size by bilinear interpolation;
(B2) extracting features from the input with 3 × 3 and 1 × 1 convolution kernels, finally producing a 13 × 13 multi-dimensional matrix that divides the original image into a 13 × 13 grid; each of the 13 × 13 = 169 n-dimensional vectors is responsible for target detection in its grid cell and contains classification information and target position information;
(B3) processing the label information of the training samples into a matrix form corresponding to the network output, calculating the loss value, and then training the entire network by back propagation.
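The 13 × 13 grid of step (B2) can be illustrated with a short sketch that maps a labeled box to one of the 169 cells. The description does not say how a target is assigned to a cell; the sketch assumes the common convention that the cell containing the box center is responsible. The function name `responsible_cell` and the 416 × 416 image size are likewise assumptions.

```python
GRID = 13  # the description uses a 13 x 13 output grid, i.e. 169 cells


def responsible_cell(box, image_w, image_h, grid=GRID):
    """Return (row, col, flat_index) of the grid cell containing the box center."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2.0
    cy = (y0 + y1) / 2.0
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(grid - 1, int(cx * grid / image_w))
    row = min(grid - 1, int(cy * grid / image_h))
    return row, col, row * grid + col


# Hypothetical box on a 416 x 416 input: its center is (140, 230).
row, col, flat = responsible_cell((100, 200, 180, 260), image_w=416, image_h=416)
```

The flat index identifies which of the 169 n-dimensional output vectors would carry the class and position information for this target when the labels are arranged into matrix form in step (B3).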
step (C), branching a global area prior attention correction and enhancement network off the feature map, and using this attention network to correct and enhance the global area prior attention to finally obtain an attention mask, which comprises the following steps:
(C1) taking the features of any layer in the backbone feature extraction network as the input of the global area prior attention correction and enhancement network;
(C2) applying convolution, pooling, deconvolution and similar calculations to the input features to obtain more abstract semantic information;
(C3) normalizing the result with a sigmoid function, and then multiplying the normalized result by the global area prior attention to enhance and correct the global area prior attention;
(C4) adding 1 to the result, and then multiplying the features of the backbone feature extraction network element-wise by it to enhance the features. The calculation formula is as follows:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
wherein x is the input feature; F(x) is the series of calculations such as convolution, pooling and deconvolution; Sigmoid(pre) is the area prior attention, which is multiplied element-wise by the mask Sigmoid(F(x)) generated in the attention correction and enhancement network so that the area prior attention is corrected and enhanced; 1 is then added to every element; w is the feature map; and finally the result is multiplied element-wise onto the feature map to enhance it, after which target detection is performed on the feature map.
In step (C4), the features that are multiplied by the mask are not the original features fed into the branch but the backbone features obtained after several further convolution layers, so the enhancement is applied in the manner of a skip connection.
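The formula of step (C4) can be sketched in a few lines. This is an illustration under assumptions rather than the patented network: the branch output F(x) is represented by a plain 2-D list of hypothetical logits instead of real convolution, pooling and deconvolution layers, `pre` holds the accumulated counts from step (A3), and the names `sigmoid` and `enhance` are made up for the example.

```python
import math


def sigmoid(v):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-v))


def enhance(features, branch_logits, prior_counts):
    """Compute y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w element-wise.

    features:      w, a list of channels, each an H x W list of lists
    branch_logits: F(x), an H x W list of lists (stand-in for the branch output)
    prior_counts:  pre, an H x W list of lists of accumulated counts
    """
    mask = [
        [sigmoid(f) * sigmoid(p) + 1.0 for f, p in zip(f_row, p_row)]
        for f_row, p_row in zip(branch_logits, prior_counts)
    ]
    # The same H x W mask is applied to every channel of the feature map.
    enhanced = [
        [[v * m for v, m in zip(w_row, m_row)] for w_row, m_row in zip(channel, mask)]
        for channel in features
    ]
    return mask, enhanced


# One-channel 2 x 2 feature map with hypothetical branch logits and counts.
w = [[[1.0, 2.0], [3.0, 4.0]]]
fx = [[0.0, 2.0], [-2.0, 0.0]]
pre = [[0, 3], [0, 1]]
mask, y = enhance(w, fx, pre)
```

With zero logits and zero counts, as in the top-left pixel here, both sigmoids equal 0.5, so the mask is 1.25 and the feature value grows by a quarter.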
The above is only a preferred embodiment of the present invention and is not intended to limit its scope. All equivalent structures or equivalent processes derived from the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.

Claims (3)

1. A target detection method based on global area prior attention, characterized by comprising the following steps:
step (A), traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention;
step (B), acquiring an image to be detected, and extracting features from the original image with a trained feature extraction network;
step (C), branching an adaptive attention network off the backbone network, and using the adaptive attention network to correct and enhance the global area prior attention to obtain an adaptive global area prior attention mask; multiplying the feature map element-wise by the attention mask, and then performing target detection on the feature map;
wherein step (C) comprises the following steps:
(C1) taking features from the backbone feature extraction network as the input of a global area prior attention correction and enhancement network;
(C2) applying convolution, pooling and deconvolution to the input features to obtain more abstract semantic information;
(C3) normalizing the result obtained in step (C2) with a sigmoid function, and then multiplying the normalized result by the global area prior attention to enhance and correct the global area prior attention;
(C4) adding 1 to the result of step (C3), and then multiplying the features of the backbone feature extraction network element-wise by it to enhance the features; the calculation formula is as follows:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
wherein x is the input feature; F(x) is the series of convolution, pooling and deconvolution calculations; Sigmoid(pre) is the area prior attention, which is multiplied element-wise by the mask generated in the attention correction and enhancement network so that the area prior attention is corrected and enhanced; 1 is then added to every element; and finally the result is multiplied element-wise onto the feature map w to enhance the feature map, after which target detection is performed on the feature map.
2. The target detection method based on global area prior attention according to claim 1, wherein step (A), traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention, comprises the following steps:
(A1) manually drawing a bounding box around the area where the target is located in each training sample picture, and storing the position information of that target area as the label;
(A2) scaling the training sample picture and the global area prior attention to the set size;
(A3) initializing every value of the global area prior attention to zero, traversing the training samples, and, for each training sample picture, adding 1 to every pixel of the global area prior attention that lies at the same position as an area defined in the label of that picture, thereby accumulating the global area prior attention;
(A4) normalizing the accumulated global area prior attention with a sigmoid function to obtain the final global area prior attention.
3. The target detection method based on global area prior attention according to claim 1, wherein step (B), acquiring the image to be detected and extracting features from the original image with a trained feature extraction network, comprises the following steps:
(B1) scaling the picture to be detected to the same set size by bilinear interpolation;
(B2) extracting features from the input with 3 × 3 and 1 × 1 convolution kernels, finally producing a 13 × 13 multi-dimensional matrix that divides the original image into a 13 × 13 grid, wherein each of the 13 × 13 = 169 n-dimensional vectors is responsible for target detection in its grid cell and contains classification information and target position information;
(B3) processing the label information of the training samples into a matrix form corresponding to the network output, calculating the loss value, and then training the entire network by back propagation.
CN202011365545.4A 2020-11-28 2020-11-28 Target detection method based on global area prior attention Active CN112381106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011365545.4A CN112381106B (en) 2020-11-28 2020-11-28 Target detection method based on global area prior attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011365545.4A CN112381106B (en) 2020-11-28 2020-11-28 Target detection method based on global area prior attention

Publications (2)

Publication Number Publication Date
CN112381106A CN112381106A (en) 2021-02-19
CN112381106B true CN112381106B (en) 2022-09-09

Family

ID=74588595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011365545.4A Active CN112381106B (en) 2020-11-28 2020-11-28 Target detection method based on global area prior attention

Country Status (1)

Country Link
CN (1) CN112381106B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489334A (en) * 2020-04-02 2020-08-04 暖屋信息科技(苏州)有限公司 Defect workpiece image identification method based on convolution attention neural network
CN111563415A (en) * 2020-04-08 2020-08-21 华南理工大学 Binocular vision-based three-dimensional target detection system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8266185B2 (en) * 2005-10-26 2012-09-11 Cortica Ltd. System and methods thereof for generation of searchable structures respective of multimedia data content

Also Published As

Publication number Publication date
CN112381106A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
US11151403B2 (en) Method and apparatus for segmenting sky area, and convolutional neural network
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
CN108734723B (en) Relevant filtering target tracking method based on adaptive weight joint learning
WO2022111355A1 (en) License plate recognition method and apparatus, storage medium and terminal
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
CN111260688A (en) Twin double-path target tracking method
CN112001403B (en) Image contour detection method and system
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN110135446B (en) Text detection method and computer storage medium
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN110619638A (en) Multi-mode fusion significance detection method based on convolution block attention module
CN111967464B (en) Weak supervision target positioning method based on deep learning
CN112927209A (en) CNN-based significance detection system and method
CN111310609A (en) Video target detection method based on time sequence information and local feature similarity
CN107729885B (en) Face enhancement method based on multiple residual error learning
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN112381106B (en) Target detection method based on global area prior attention
CN116740399A (en) Training method, matching method and medium for heterogeneous image matching model
CN115908995A (en) Digital instrument reading identification method and device, electronic equipment and storage medium
CN113256528B (en) Low-illumination video enhancement method based on multi-scale cascade depth residual error network
CN113269808B (en) Video small target tracking method and device
CN111931793B (en) Method and system for extracting saliency target
CN114842506A (en) Human body posture estimation method and system
CN113112522A (en) Twin network target tracking method based on deformable convolution and template updating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant