CN112381106B - Target detection method based on global area prior attention - Google Patents
Target detection method based on global area prior attention
- Publication number: CN112381106B (application CN202011365545.4A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption, not a legal conclusion)
Classifications
- G06V10/44 (Physics; Computing; Image or video recognition or understanding; extraction of image or video features) — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06N3/045 (Neural networks; architecture, e.g. interconnection topology) — Combinations of networks
- G06N3/048 (Neural networks; architecture) — Activation functions
- G06N3/084 (Neural networks; learning methods) — Backpropagation, e.g. using gradient descent
- G06T3/4007 (Image data processing; geometric image transformations; scaling) — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
Abstract
The invention discloses a target detection method based on a global area prior attention mechanism, comprising the following steps: traverse all training samples and count how frequently targets appear at each image position, yielding the global prior attention; acquire an image to be detected, extract its features with the feature extraction network obtained by training, extract adaptive attention with a convolutional neural network, correct and enhance the global prior attention to obtain an adaptive global prior attention, and use it to enhance the feature map; finally, perform target detection on the enhanced feature map. The proposed global area prior attention network speeds up training convergence and improves target detection precision while preserving detection speed; the improvement is most evident in application scenarios where few target classes appear at specific, predictable positions.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a target detection method based on global area prior attention.
Background
In recent years, computer vision has been widely applied across many fields, boosting productivity. Object detection is one of the most important tasks in computer vision. With the development of deep learning, object detection has been studied extensively, further promoting the application of computer vision in various fields. Object detection marks the position and category of the objects we want to detect in an image or video, a capability that extends to many practical applications such as autonomous driving, pedestrian tracking, and text recognition.
Object detection based on traditional methods suffers from low speed and low accuracy. With advances in deep learning and computer hardware, deep-learning-based object detection has developed greatly, improving precision and speed qualitatively and reaching levels suitable for practical use. Nevertheless, current methods still leave considerable room for improvement in difficult scenes.
In the human eye, only the fovea of the retina has a high density of cone cells, so people bring a region of interest onto the fovea to perceive it better; computational visual attention was born from this observation. In recent years, researchers have used attention models to enhance feature extraction networks in object detection. General object detection faces the following difficulties: 1) during feature extraction, the background and the target are treated equally, making it hard to highlight the features of the target to be detected; 2) common attention models cannot guarantee that the attention itself is correct; 3) small targets are difficult to detect. Combining a generic attention model with object detection therefore has its limitations.
Given the above, how to obtain an accurate attention mask to guide feature extraction is a problem that urgently needs solving.
Disclosure of Invention
The invention aims to provide a target detection method based on global area prior attention that improves the accuracy of the attention mask and, ultimately, the precision of target detection.
The technical solution realizing this aim is a target detection method based on global area prior attention, comprising the following steps:
step (A): traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention;
step (B): acquiring an image to be detected and extracting features from the original image with the feature extraction network obtained by training;
step (C): branching an adaptive attention network off the backbone network, correcting and enhancing the global area prior attention with the adaptive attention network to obtain an adaptive global area prior attention mask, point-multiplying the attention mask with the feature map, and then performing target detection on the feature map.
Further, step (A) — traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention — includes the following steps:
(A1) manually draw a bounding box around each target in the training sample pictures and store the position information of each target region as a label;
(A2) scale the training sample pictures and the global area prior attention map to the set size;
(A3) initialize every value of the global area prior attention map to zero, then traverse the training samples and add 1 to every pixel of the map lying inside a region defined in the label of a training sample picture;
(A4) normalize the accumulated map with a sigmoid function to obtain the global area prior attention.
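A minimal sketch of steps (A1)-(A4), assuming boxes are given as (x1, y1, x2, y2) integer indices already scaled to the prior map's grid; the coordinate convention and map size are illustrative assumptions, not fixed by the text:

```python
import numpy as np

def build_global_prior_attention(boxes_per_image, size=(13, 13)):
    """Steps (A1)-(A4): accumulate labelled target regions into a
    count map, then squash the counts with a sigmoid.

    boxes_per_image: per training image, a list of (x1, y1, x2, y2)
    integer boxes scaled to `size` (an assumed convention).
    """
    prior = np.zeros(size, dtype=np.float64)
    for boxes in boxes_per_image:            # traverse the training set
        for x1, y1, x2, y2 in boxes:
            prior[y1:y2, x1:x2] += 1.0       # (A3): +1 inside each labelled region
    return 1.0 / (1.0 + np.exp(-prior))      # (A4): sigmoid normalisation
```

Note that the sigmoid maps a count of zero to 0.5, so background cells keep a non-zero baseline while frequently occupied cells saturate toward 1.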
Further, step (C) includes the following steps:
(C1) take features from the backbone feature extraction network as the input of the global area prior attention correction and enhancement network;
(C2) apply convolution, pooling and deconvolution to the input features to obtain more abstract semantic information;
(C3) normalize the calculation result with a sigmoid function and point-multiply it with the global area prior attention to enhance and correct it;
(C4) add 1 to the calculation result and point-multiply it with the features of the backbone feature extraction network to enhance them; the calculation formula is:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
where x is the input feature, F(x) is the series calculation of convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map w to enhance it, after which target detection is performed on the enhanced feature map.
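The formula above can be written out directly. This element-wise sketch assumes all three inputs share one spatial shape; channel handling is an implementation detail the text leaves open:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def enhance_features(fx, pre, w):
    """y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w, element-wise.

    fx  : raw output F(x) of the convolution/pooling/deconvolution branch
    pre : global area prior attention (before sigmoid normalisation)
    w   : backbone feature map to be enhanced
    """
    mask = sigmoid(fx) * sigmoid(pre)   # (C3): corrected, enhanced prior attention
    return (mask + 1.0) * w             # (C4): +1 keeps the original features as a floor
```

Because of the +1 term, the mask can only amplify features (by a factor between 1 and 2), never suppress them below their original values.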
Compared with the prior art, the invention has the following notable advantages: the method makes full use of the position information of targets in the training set to generate a prior attention, corrects and enhances this global prior attention with a neural network, and finally uses it to enhance the features produced by the backbone network, improving target detection precision. The improvement is most evident in detection scenes with few target classes and regular target positions, which are common in industrial applications. Specifically:
(1) when image quality is poor, the prior attention can still determine the approximate position of the target from global information, improving localization precision;
(2) the global prior attention enhancement network can adjust the prior attention according to the specific characteristics of each image, forming a final, more accurate attention.
Drawings
FIG. 1 is a flowchart of a global area prior attention based target detection method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, a target detection method based on a global area prior attention mechanism includes the following steps:
step (A): traversing the training set and counting the position information of the target in each training sample to obtain the global area prior attention, which includes the following steps:
(A1) manually draw a bounding box around each target in the training sample pictures and store the position information of each target region as a label;
(A2) scale the training images to a set size and give the global area prior attention map the same size;
(A3) initialize every pixel value of the global area prior attention map to zero, then traverse each training sample and add one to the pixel values inside all labelled rectangular boxes, yielding a heat map;
(A4) normalize the heat map with a sigmoid function to obtain the global area prior attention.
Step (B): acquiring the image to be detected and extracting features from the original image with the feature extraction network obtained by training, which includes the following steps:
(B1) scale each picture to be detected to the same set size using bilinear interpolation;
(B2) extract features from the input with 3 × 3 and 1 × 1 convolution kernels, finally generating a 13 × 13 multi-dimensional output that divides the original image into a 13 × 13 grid; each of the 169 grid cells is assigned an n-dimensional vector responsible for target detection in that cell, containing classification information and target position information;
(B3) process the label information of the training samples into a matrix form matching the network output, compute the loss value, and train the entire network with back propagation.
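A sketch of how labels could be encoded into the 13 × 13 target matrix described in (B2)-(B3). The 416-pixel input size and the n = 6 vector layout (objectness, class, centre, size) are assumptions for illustration; the text does not fix the vector's contents:

```python
import numpy as np

def boxes_to_grid_targets(labels, img_size=416, grid=13, n_dim=6):
    """Encode ground-truth boxes as a (grid, grid, n_dim) target matrix
    matching the 13x13 network output, one n-dim vector per cell.

    labels: list of (class_index, (x1, y1, x2, y2)) boxes in pixels.
    """
    target = np.zeros((grid, grid, n_dim))
    for cls, (x1, y1, x2, y2) in labels:
        cx = (x1 + x2) / 2.0 / img_size          # normalised box centre
        cy = (y1 + y2) / 2.0 / img_size
        gx, gy = int(cx * grid), int(cy * grid)  # cell responsible for this target
        target[gy, gx, 0] = 1.0                  # objectness
        target[gy, gx, 1] = cls                  # class index
        target[gy, gx, 2:] = [cx, cy, (x2 - x1) / img_size, (y2 - y1) / img_size]
    return target
```

The loss in (B3) is then computed cell by cell between this matrix and the network's 13 × 13 × n prediction before back-propagating.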
Step (C): branching a global area prior attention correction and enhancement network off the feature map, correcting and enhancing the global area prior attention with this attention network, and finally obtaining the attention mask, which includes the following steps:
(C1) take any layer of features from the backbone feature extraction network as the input of the global area prior attention correction and enhancement network;
(C2) apply convolution, pooling, deconvolution and similar calculations to the input features to obtain more abstract semantic information;
(C3) normalize the calculation result with a sigmoid function and point-multiply it with the global area prior attention to enhance and correct it;
(C4) add 1 to the calculation result and point-multiply it with the features of the backbone feature extraction network to enhance them. The calculation formula is:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
where x is the input feature, F(x) is a series of calculations such as convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map to enhance it, after which target detection is performed on the feature map;
(C5) the features that are point-multiplied are not the original input features but features after several further convolution layers, i.e. the enhancement is applied in a skip-connection manner.
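The attention-branch steps above can be sketched end to end with simple stand-ins: average pooling and nearest-neighbour upsampling replace the learned convolution/pooling/deconvolution, and the "later" feature map passed in separately models the skip connection; all of these substitutions are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_pool2(x):
    """2x2 average pooling, stride 2 (stand-in for the pooling step)."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (stand-in for deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def attention_branch(feat, prior, later_feat):
    """(C1) take a backbone feature; (C2) pool then upsample it;
    (C3) gate the prior; then enhance a *later* feature via the skip path."""
    fx = upsample2(avg_pool2(feat))       # crude stand-in for F(x)
    mask = sigmoid(fx) * sigmoid(prior)   # corrected, enhanced prior attention
    return (mask + 1.0) * later_feat      # applied to the deeper feature map
```

In a real network the pool/upsample pair would be learned layers, but the data flow — branch off, gate the prior, re-enter the backbone one step later — is the same.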
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (3)
1. A target detection method based on global area prior attention is characterized by comprising the following steps:
step (A), traversing a training set, and counting position information of a target in each training sample to obtain the prior attention of a global area;
step (B), acquiring an image to be detected, and extracting the features of the original image by using a feature extraction network obtained by training;
step (C), branching from a backbone network to a self-adaptive attention network branch, and correcting and enhancing the prior attention of the global area by using the self-adaptive attention network to obtain a self-adaptive global area prior attention mask; multiplying the attention mask points on the feature map, and then carrying out target detection on the feature map;
the step (C) comprises the following steps:
(C1) extracting features from features in the trunk feature extraction network as input of a global area prior attention correction enhancement network;
(C2) performing convolution, pooling and deconvolution on the input features to obtain more abstract semantic information;
(C3) normalizing the calculation result obtained in the step (C2) by a sigmoid function, and then multiplying the normalized result by the prior attention of the global area to enhance and correct the prior attention of the global area;
(C4) adding 1 to the calculation result of the step (C3), and then performing point multiplication on the features of the trunk feature extraction network to enhance the features; the calculation formula is as follows:
y = (Sigmoid(F(x)) * Sigmoid(pre) + 1) * w
wherein x is the input feature, F(x) is the series calculation of convolution, pooling and deconvolution, Sigmoid(pre) is the area prior attention, and w is the feature map; the mask generated in the attention correction and enhancement network is point-multiplied with the area prior attention to correct and enhance it, 1 is added to all elements, and the result is point-multiplied with the feature map w to enhance it, after which target detection is performed on the feature map.
2. The target detection method based on global area prior attention as claimed in claim 1, wherein in step (a), the training set is traversed, and the position information of the target appearing in each training sample is counted to obtain the global area prior attention, including the following steps:
(A1) selecting an area where the target is located in the training sample picture by a manual frame, and acquiring position information of the target area where the target is located in the training sample picture to be stored as a label;
(A2) scaling the training sample picture and the global area prior attention to the set size;
(A3) initializing each value of the prior attention of the global area to be zero, traversing the training sample, and adding 1 to the pixel values of all pixels in the prior attention of the global area at the same position as the area defined in the label of the picture of the training sample to obtain the prior attention of the global area;
(A4) and normalizing the prior attention of the global area by using a sigmoid function to obtain the prior attention of the global area.
3. The global area prior attention-based target detection method according to claim 1, wherein in the step (B), the image to be detected is obtained, and feature extraction is performed on the original image by using a feature extraction network obtained by training, and the method comprises the following steps:
(B1) using a bilinear interpolation method to scale the picture to be detected to the same size;
(B2) extract features from the input with 3 × 3 and 1 × 1 convolution kernels, finally generating a 13 × 13 multi-dimensional matrix that divides the original image into a 13 × 13 grid; each of the 169 grid cells is assigned an n-dimensional vector responsible for target detection in that cell, containing classification information and target position information;
(B3) the label information in the training samples is processed into a matrix form corresponding to the network output, and the loss value is calculated, and then the entire network is trained using back propagation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011365545.4A CN112381106B (en) | 2020-11-28 | 2020-11-28 | Target detection method based on global area prior attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112381106A CN112381106A (en) | 2021-02-19 |
CN112381106B true CN112381106B (en) | 2022-09-09 |
Family
ID=74588595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011365545.4A Active CN112381106B (en) | 2020-11-28 | 2020-11-28 | Target detection method based on global area prior attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381106B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111489334A (en) * | 2020-04-02 | 2020-08-04 | 暖屋信息科技(苏州)有限公司 | Defect workpiece image identification method based on convolution attention neural network |
CN111563415A (en) * | 2020-04-08 | 2020-08-21 | 华南理工大学 | Binocular vision-based three-dimensional target detection system and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8266185B2 (en) * | 2005-10-26 | 2012-09-11 | Cortica Ltd. | System and methods thereof for generation of searchable structures respective of multimedia data content |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |