CN111079739B - Multi-scale attention feature detection method - Google Patents


Info

Publication number
CN111079739B
Authority
CN
China
Prior art keywords
attention
feature
layer
scale
detection
Prior art date
Legal status
Active
Application number
CN201911189274.9A
Other languages
Chinese (zh)
Other versions
CN111079739A (en
Inventor
周书仁
Current Assignee
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN201911189274.9A priority Critical patent/CN111079739B/en
Publication of CN111079739A publication Critical patent/CN111079739A/en
Application granted granted Critical
Publication of CN111079739B publication Critical patent/CN111079739B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

The invention discloses a multi-scale attention feature detection method, which comprises the following steps: constructing a single-shot target detector on the hardware resources of a computer, the detector comprising a base network, newly added convolutional layers and a prediction layer; adding a plurality of attention branches to the newly added convolutional layers to enhance the detection features and constructing a parallel multi-scale attention feature detection model; training the single-shot target detector; training the parallel multi-scale attention feature detection model with the parameters obtained from training the single-shot target detector; and inputting an image to be detected into the multi-scale attention feature detection model and computing the detection result. Because the attention branches added to the newly added convolutional layers of the single-shot target detector combine context information with attention features, the detection effect is improved; in particular, the method can reach 78.6% on the VOC2007 data set.

Description

Multi-scale attention feature detection method
Technical Field
The invention relates to the field of computer vision and intelligent image recognition, in particular to a multi-scale attention feature detection method.
Background
Object detection is a basic and core technique in today's machine vision. Its task is to find the objects or persons of interest in large numbers of images and to determine the category and position of each object in the image.
The target detection has extremely high application value and wide application prospect. The application fields include: the method comprises the steps of unmanned driving, target detection in intelligent traffic, an intelligent question-answering system, face detection, medical image detection and the like. In addition, in the field of intelligent security, the target detection can realize the dynamic detection of targets such as safety helmets, life jackets and the like, and can also realize the functions of target intrusion, departure detection and the like.
At present the field of target detection is well developed, with methods such as RCNN, SPP-Net, Fast/Faster RCNN and YOLO; according to the number of stages in the detection process, these methods fall into two categories: one-stage (One stage) target detection methods and two-stage (Two stage) target detection methods.
The main difference between the two types of methods is that a Two stage target detection algorithm must first generate pre-selected regions (candidate boxes that may contain a target) on the image and then classify and localize the target, whereas a One stage algorithm skips the pre-selected regions and extracts features directly from the image to classify and localize the target.
The most typical Two stage target detection algorithms are Fast/Faster RCNN, which comprise feature extraction, region selection, and target classification and localization. They achieve high detection precision but poor detection speed. For example, the Faster RCNN method (2016) detects targets accurately but needs approximately 0.2 s per image, which is impractical for real-time detection. The reason is that Faster RCNN generates about 2000 candidate regions in the forward pass, which causes a large amount of computation and ultimately limits the detection speed.
The One stage target detection algorithm omits the generation of pre-selected regions by using simple anchor points, so target detection becomes an end-to-end process. One stage methods therefore have a large speed advantage over Two stage methods; for example, the YOLO algorithm can reach 155 FPS, but its detection precision is low.
SSD, for example, is one of the One stage target detection algorithms, but its detection effect is mediocre.
Disclosure of Invention
The invention mainly aims to provide a multi-scale attention feature detection method, and aims to solve the problem that the detection effect of SSD is general in the prior art.
A multi-scale attention feature detection method, comprising:
constructing a single target detector through hardware resources of a computer, wherein the single target detector comprises a basic network, a newly added convolution layer and a prediction layer;
adding a plurality of attention branches to the newly added convolutional layers to enhance the detection features and constructing a parallel multi-scale attention feature detection model, wherein each attention branch provides an attention-region mask that is multiplied element-wise with the feature detected at the upper layer, so that during detection each detected feature contains both upper-layer information and lower-layer information;
training the single-shot target detector;
training the parallel multi-scale attention feature detection model according to the parameters obtained by training the single-shot target detector;
and inputting the image to be detected into the multi-scale attention feature detection model, and calculating to obtain a detection result.
Preferably, the adding a plurality of attention branches to the newly added convolutional layer to enhance the characteristics of the detection features and constructing a parallel multi-scale attention feature detection model further includes:
and taking the next detection feature obtained by a down-sampling layer in a shared network as the input of the attention branch, wherein the shared network comprises the base network and the newly added convolutional layer.
Preferably, the depth of the hourglass network of the attention branch is set to 1.
Preferably, the attention branch includes a feature layer, wherein the probability value of a channel of the feature layer is calculated by the formula:

$$p_{ij}^{c}=\frac{\lambda_{ij}\exp\left(x_{ij}^{c}\right)}{\sum_{c'=1}^{C}\lambda_{ij}\exp\left(x_{ij}^{c'}\right)}$$

wherein $\lambda_{ij}$ is a weight of the previous feature, set to 1; $c$ denotes the current channel; $x_{ij}^{c}$ denotes the feature value of pixel $(i,j)$ on the $c$-th feature map, with $0\le i,j\le k$; $C$ denotes the number of feature channels of the layer; and $p_{ij}^{c}$ denotes the channel probability value of the pixel.

The probability value of the pixels of the feature layer is calculated by the formula:

$$x'_{ij}=\frac{\lambda_{ij}\exp\left(x_{ij}\right)}{\sum_{0\le i,j\le k}\lambda_{ij}\exp\left(x_{ij}\right)}$$

wherein $\lambda_{ij}$ is a weight of the previous feature, set to 1; $x_{ij}$ denotes the feature value of pixel $(i,j)$ on the feature map; $\sum_{0\le i,j\le k}\lambda_{ij}\exp(x_{ij})$ is the sum of the weighted pixel values of the pixels in a channel; $x'_{ij}$ denotes the probability value of pixel $(i,j)$; and $k$ denotes the size of the feature map.
Preferably, the loss function of the multi-scale attention feature detection model comprises two parts, a localization loss and a classification loss, calculated by the formula:

$$L_{loss}=\frac{1}{N}\left(L_{cls}+\alpha L_{loc}\right)$$

wherein $L_{loss}$ is the loss function; $L_{loc}$ is the localization loss and $L_{cls}$ is the classification loss; $N$ denotes the number of matched prediction boxes, and the loss is set to 0 if $N$ is 0; $\alpha$ denotes the weight between the localization loss and the classification loss, and is set to 1.
Preferably, the localization loss is calculated by the formula:

$$L_{loc}(b,p,t)=\sum_{i\in Pos}^{N}\sum_{m\in\{x,y,w,h\}}x_{ij}^{k}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

wherein $x_{ij}^{k}$ denotes the degree of matching between the $i$-th prediction box and the $j$-th ground-truth box in the $k$-th class; $l_{i}^{m}$ denotes the $i$-th positive prediction, directly represented by its bounding box; $\hat{g}_{j}^{m}$ denotes the distance between the default box and the correct box; $L_{loc}(b,p,t)$ denotes the localization loss, wherein $b$ denotes the bounding box, $p$ denotes the prediction box, i.e. the predicted candidate box, and $t$ denotes the ground truth, i.e. the real bounding box; $Pos$ denotes the positive samples; $x,y$ denote the abscissa and ordinate of the center point, and $w,h$ denote the width and height of the box, respectively.
Preferably, the number of said attention branches is 5.
Preferably, the base network is a VGG-16 model, namely a pre-trained ILSVRC classification model with the last two fully-connected layers removed; the VGG-16 includes 5 groups of convolutional layers.
Preferably, the constructing the single-pass object detector comprises:
and taking the multi-scale convolutional layer of the newly-added convolutional layer as the input of the prediction layer, and respectively and independently calculating classification and positioning results by using two convolutional kernels with the same size.
Preferably, after the two convolution kernels with the same size are used for independently calculating the classification and localization results, the method further includes:
highly repetitive predictions are eliminated by non-maxima suppression.
Through the above technical scheme, a plurality of attention branches are added to the newly added convolutional layers of the single-shot target detector, so that context information and attention features are combined and the detection effect is improved; in particular, the method can reach 78.6% on the VOC2007 data set.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from the structures shown in them without creative effort.
FIG. 1 is a flow chart of a first embodiment of a multi-scale attention feature detection method of the present invention;
FIG. 2 is a schematic diagram of a basic architecture of an SSD according to a first embodiment of the multi-scale attention feature detection method of the present invention;
FIG. 3 is a diagram illustrating feature maps with different sizes according to a first embodiment of a multi-scale attention feature detection method of the present invention.
FIG. 4 is a schematic diagram of a prior block of an SSD in a first embodiment of a method for multi-scale attention feature detection in accordance with the present invention;
FIG. 5 is a schematic diagram of a network structure of an SSD in a first embodiment of a method for detecting multi-scale attention characteristics according to the present invention;
FIG. 6 is a schematic diagram of a MA-SSD in the first embodiment of the multi-scale attention feature detection method of the present invention;
FIG. 7 is a schematic view of an attention module in a first embodiment of a multi-scale attention feature detection method according to the present invention;
fig. 8 is a schematic diagram illustrating a computing manner of a feature layer in a multi-scale attention feature detection method according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The invention provides a multi-scale attention feature detection method.
As shown in fig. 1, in a first embodiment of the multi-scale attention feature detection method provided by the present invention, the method includes the following steps:
step S110: and constructing a single target detector through hardware resources of a computer, wherein the single target detector comprises a basic network, a newly added convolutional layer and a prediction layer.
Specifically, the method is executed by a multi-scale attention feature detection system that includes a computer. In step S110 the processor of the computer constructs a single-shot target detector (i.e. SSD). The main idea of SSD is to sample densely and uniformly at different positions of the image, using different scales and aspect ratios during sampling, and then to extract features with a CNN and perform classification and regression directly. Compared with a Two stage method, the whole process has one step fewer and is therefore faster; the drawback is that the uniform dense sampling makes training difficult, so the final model accuracy is not high.
SSD proposes prior boxes of different scales and aspect ratios based on the anchor points of Faster RCNN (faster region convolutional neural network). Meanwhile, SSD generates predictions of different proportions from multi-scale features and separates them explicitly according to aspect ratio: a large-scale feature map can be used to detect small objects, and a small-scale feature map to detect large objects. As shown in fig. 2, SSD employs multi-scale feature maps.
Regarding multi-scale feature maps: a CNN (convolutional neural network) generally has larger feature maps at the front and gradually reduces the feature-map size with stride-2 convolution or pooling. As shown in fig. 2, both a larger and a smaller feature map are used for detection. The advantage is that the larger feature map detects relatively small targets while the smaller feature map is responsible for the larger targets; as shown in fig. 3, the 8 × 8 feature map can be divided into more cells, but the prior-box scale of each cell is smaller.
In addition, unlike YOLO (You Only Look Once), which ends with a fully connected layer, SSD directly uses convolution to extract detection results from the different feature maps. For a feature map of shape m × n × p, a detection value can be obtained with a single convolution kernel of size 3 × 3 × p.
Furthermore, in YOLO each unit predicts multiple bounding boxes, but all of them are relative to the unit itself (a square block), whereas the shape of a real target is variable, so YOLO has to adapt to target shapes during training. SSD borrows the anchor-point concept of Faster R-CNN: each unit is given prior boxes of different scales and aspect ratios, and the predicted bounding boxes are based on these prior boxes, which reduces the training difficulty to some extent. In general each cell is given several prior boxes with different scales and aspect ratios; as shown in fig. 4, each cell uses 4 different prior boxes, and in the picture the cat and the dog are each trained with the prior box best suited to their shape.
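The prior-box layout described above can be sketched in a few lines of numpy. This is only an illustration: the function name, the scale value and the aspect-ratio set are our assumptions, and a full SSD implementation also adds an extra scale for the ratio-1 box.

```python
import numpy as np

def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style prior boxes as (cx, cy, w, h) in normalised [0, 1]
    image coordinates: every cell of the feature map gets one box per ratio."""
    boxes = []
    for i in range(fmap_size):          # rows of the feature map
        for j in range(fmap_size):      # columns
            cx = (j + 0.5) / fmap_size  # box centre sits at the cell centre
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:    # same area, different width/height
                boxes.append([cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)])
    return np.array(boxes)

priors = prior_boxes(fmap_size=8, scale=0.2)  # 8 x 8 cells, 3 boxes each
```

An 8 × 8 map with 3 ratios yields 192 priors; a smaller map with a larger `scale` would cover the large targets instead, matching the division of labour between feature maps described above.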
The SSD model consists of three parts: the base network, the newly added convolutional layers and the prediction layer; the model structure is shown in fig. 5.
SSD detects small targets relatively poorly, because small targets do not retain enough information in the high-level feature maps.
Step S120: adding a plurality of attention branches to the newly added convolutional layers to enhance the detection features and constructing a parallel multi-scale attention feature detection model, wherein each attention branch provides an attention-region mask that is multiplied element-wise with the feature detected at the upper layer, so that during detection each detected feature contains both upper-layer information and lower-layer information.
Specifically, step S120 is completed by the processor, and the newly added convolutional layers form the multi-scale layers; a structural diagram of the parallel multi-scale attention feature detection model (MA-SSD) is shown in fig. 6. Because each attention branch provides an attention-region mask that is multiplied element-wise with the feature detected at the upper layer, the detection features are enhanced. Each detected feature therefore contains both upper-layer and lower-layer information, which introduces context information into the detection process and improves the accuracy of target detection.
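The element-wise masking step can be illustrated with numpy. This is a sketch under our own assumptions: the sigmoid of the channel mean stands in for the real attention branch, which in the model is a small hourglass network.

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 8, 8))    # one detection feature (C, H, W)

# Toy attention branch: collapse channels, squash to (0, 1).
logits = feat.mean(axis=0, keepdims=True)  # shape (1, 8, 8)
mask = 1.0 / (1.0 + np.exp(-logits))       # attention-region mask

# Element-wise product; the mask broadcasts over all 256 channels,
# keeping positions the branch considers important and damping the rest.
attended = feat * mask
```

The same broadcasting pattern applies whatever network produces the mask, as long as its spatial size matches the detection feature.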
Step S130: training the single-shot target detector.
Specifically, the model is trained by the processor after being built.
Step S140: training the parallel multi-scale attention feature detection model according to the parameters obtained by training the single-shot target detector.
Specifically, parameters of the SSD backbone network are fixed, and then a multi-scale attention layer is trained through a processor, so that the parallel multi-scale attention feature detection model (MA-SSD) is trained.
Step S150: and inputting the image to be detected into the multi-scale attention feature detection model, and calculating to obtain a detection result.
Specifically, the image to be detected is input to a parallel multi-scale attention feature detection model (MA-SSD) through an input module of the computer, so that the image to be detected can be detected, and the detection result is better than that of the SSD model.
Through the above technical scheme, a plurality of attention branches are added to the newly added convolutional layers of the single-shot target detector, so that context information and attention features are combined and the detection effect is improved; in particular, the method can reach 78.6% on the VOC2007 data set.
In addition, the existing SSD model is complex, the training process is complex and long, and the training cost is high.
In order to solve the above technical problem, in a second embodiment of the multi-scale attention feature detection method provided by the present invention, based on the first embodiment, step S120 further includes:
step S210: and taking the next detection feature obtained by a down-sampling layer in a shared network as the input of the attention branch, wherein the shared network comprises the base network and the newly added convolutional layer.
Specifically, in order to make the MA-SSD model smaller and faster, the down-sampling layers of the model are shared; the MA-SSD uses the detected features as input to the encoding-decoding structure and feeds the next detection feature, obtained through a down-sampling layer of the shared network, to the attention branch, wherein the shared network comprises the base network and the newly added convolutional layers. This reduces the computation of the model, shortens the training process and training time, and lowers the training cost. In addition, sharing the multi-scale layers increases the target detection speed.
Furthermore, this parallel structure is less computationally expensive and more loosely coupled than algorithms such as DSSD (Deconvolutional Single Shot Detector) and FPN (feature pyramid network). In a serial encoding-decoding structure, the up-sampled features of a lower level depend on the top-level features, which form the multi-scale decoding structure of the higher levels. In the parallel structure, by contrast, an up-sampled feature at a lower level depends only on the features of the next higher level, so the coupling between up-sampled features is low.
In addition, in the third embodiment of the multi-scale attention feature detection method proposed by the present invention, based on the second embodiment, the depth of the hourglass network of the above-mentioned attention branches is set to 1.
Specifically, the design of the attention branch mainly draws on the residual attention structure and the soft attention structure in natural language processing (NLP); the specific structure of the attention branch is shown in fig. 7. The present invention improves the residual attention structure to meet the speed requirement: in the attention branch, the depth of the hourglass network is set to 1. Unlike NLP, a single image has no time dimension, but it has many different feature maps and pixels; the attention branch therefore computes a region of interest from the feature map to raise the importance of the target region in the features.
In a fourth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the second embodiment, the attention branch includes a feature layer, wherein the probability value of a channel of the feature layer is calculated by the formula:

$$p_{ij}^{c}=\frac{\lambda_{ij}\exp\left(x_{ij}^{c}\right)}{\sum_{c'=1}^{C}\lambda_{ij}\exp\left(x_{ij}^{c'}\right)}$$

wherein $\lambda_{ij}$ is a weight of the previous feature, set to 1; $c$ denotes the current channel; $x_{ij}^{c}$ denotes the feature value of pixel $(i,j)$ on the $c$-th feature map, with $0\le i,j\le k$; $C$ denotes the number of feature channels of the layer; and $p_{ij}^{c}$ denotes the channel probability value of the pixel.

Specifically, this channel probability formula is C-Softmax (channel Softmax).

The probability value of the pixels of the feature layer is calculated by the formula:

$$x'_{ij}=\frac{\lambda_{ij}\exp\left(x_{ij}\right)}{\sum_{0\le i,j\le k}\lambda_{ij}\exp\left(x_{ij}\right)}$$

wherein $\lambda_{ij}$ is a weight of the previous feature, set to 1; $x_{ij}$ denotes the feature value of pixel $(i,j)$ on the feature map; $\sum_{0\le i,j\le k}\lambda_{ij}\exp(x_{ij})$ is the sum of the weighted pixel values of the pixels in a channel; $x'_{ij}$ denotes the probability value of pixel $(i,j)$; and $k$ denotes the size of the feature map.
Specifically, the formula for calculating the probability value of the pixel point is F-Softmax (characteristic-Softmax), and the schematic diagram of the formulas for calculating C-Softmax and F-Softmax refers to fig. 8, where the dark portion in the diagram is a summation area.
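Under the stated assumption that λ_ij = 1 everywhere, both formulas reduce to an ordinary softmax taken over different axes of a (C, H, W) tensor. A numpy sketch (the max-subtraction is a standard numerical-stability trick that leaves the result unchanged):

```python
import numpy as np

def c_softmax(x):
    """C-Softmax: at each pixel (i, j), normalise across the C channels."""
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def f_softmax(x):
    """F-Softmax: within each channel, normalise across all k*k pixels."""
    e = np.exp(x - x.max(axis=(1, 2), keepdims=True))
    return e / e.sum(axis=(1, 2), keepdims=True)

x = np.arange(24, dtype=float).reshape(2, 3, 4)  # toy (C, H, W) feature
pc = c_softmax(x)  # channel probabilities: sum to 1 at every pixel
pf = f_softmax(x)  # pixel probabilities: sum to 1 inside every channel
```

The two normalisations are complementary: C-Softmax scores which channel responds most at a location, while F-Softmax scores which locations matter most within a channel.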
By the calculation method, the MA-SSD model is simplified, and the time cost of training and the calculation amount of the model are reduced.
In a fifth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the first embodiment, the loss function of the multi-scale attention feature detection model comprises two parts, a localization loss and a classification loss, calculated by the formula:

$$L_{loss}=\frac{1}{N}\left(L_{cls}+\alpha L_{loc}\right)$$

wherein $L_{loss}$ is the loss function; $L_{loc}$ is the localization loss and $L_{cls}$ is the classification loss; $N$ denotes the number of matched prediction boxes, and the loss is set to 0 if $N$ is 0; $\alpha$ denotes the weight between the localization loss and the classification loss, and is set to 1.
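The weighted combination can be written directly as a small helper. A minimal sketch under the stated definitions: the classification and localization losses are passed in as already-computed scalars, and the zero-match guard from the text is made explicit.

```python
def total_loss(l_cls, l_loc, n_matched, alpha=1.0):
    """L_loss = (1/N) * (L_cls + alpha * L_loc); if no prediction box
    matched (N == 0), the loss is defined as 0."""
    if n_matched == 0:
        return 0.0
    return (l_cls + alpha * l_loc) / n_matched

loss = total_loss(l_cls=2.0, l_loc=4.0, n_matched=2)  # (2 + 4) / 2 = 3.0
```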
In a sixth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the fifth embodiment, the localization loss is calculated by the formula:

$$L_{loc}(b,p,t)=\sum_{i\in Pos}^{N}\sum_{m\in\{x,y,w,h\}}x_{ij}^{k}\,\mathrm{smooth}_{L1}\!\left(l_{i}^{m}-\hat{g}_{j}^{m}\right)$$

wherein $x_{ij}^{k}$ denotes the degree of matching between the $i$-th prediction box and the $j$-th ground-truth box in the $k$-th class; $l_{i}^{m}$ denotes the $i$-th positive prediction, directly represented by its bounding box; $\hat{g}_{j}^{m}$ denotes the distance between the default box and the correct box; $L_{loc}(b,p,t)$ denotes the localization loss, wherein $b$ denotes the bounding box, $p$ denotes the prediction box, i.e. the predicted candidate box, and $t$ denotes the ground truth, i.e. the real bounding box; $Pos$ denotes the positive samples; $x,y$ denote the abscissa and ordinate of the center point, and $w,h$ denote the width and height of the box, respectively.
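A numpy sketch of the localization term, assuming the standard SSD smooth-L1 form; the 0/1 indicator array plays the role of the matching degree, and box offsets are given as (x, y, w, h).

```python
import numpy as np

def smooth_l1(d):
    """smooth_L1(d) = 0.5 * d^2 if |d| < 1, else |d| - 0.5 (element-wise)."""
    d = np.asarray(d, dtype=float)
    return np.where(np.abs(d) < 1.0, 0.5 * d * d, np.abs(d) - 0.5)

def localization_loss(pred, target, positive):
    """Sum of smooth-L1 over the (x, y, w, h) offsets of positive boxes.

    pred, target: (N, 4) predicted and encoded ground-truth offsets
    positive:     (N,) 0/1 indicator of matched (positive) prediction boxes
    """
    return float((positive[:, None] * smooth_l1(pred - target)).sum())

pred = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
loss = localization_loss(pred, np.zeros((2, 4)), np.array([1.0, 1.0]))
```

Smooth-L1 behaves quadratically near zero and linearly for large errors, which keeps gradients bounded for badly mislocalized boxes.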
By the calculation method, the MA-SSD model is simplified, and the training time cost and the calculation amount of the model are reduced.
In a seventh embodiment of the multi-scale attention feature detection method proposed by the present invention, based on the first embodiment, the number of attention branches is set to 5, so as to improve the accuracy of target detection.
In an eighth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the seventh embodiment, the basic network is a VGG-16 (visual geometry group network-16) model, which is a pre-trained ILSVRC (ImageNet large-scale visual recognition challenge race) classification model with the last two fully-connected layers removed; the VGG-16 includes 5 sets of convolutional layers.
In a ninth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the eighth embodiment, step S110 includes:
step S910: and taking the multi-scale convolution layer of the newly-added convolution layer as the input of the prediction layer, and respectively calculating a classification result and a positioning result by using two convolution kernels with the same size.
Specifically, the convolution kernels of the same size are preferably 3 × 3 convolution kernels.
In a tenth embodiment of the multi-scale attention feature detection method provided by the present invention, based on the ninth embodiment, after step S910, the method further includes:
step S1010: highly repetitive predictions are eliminated by non-maxima suppression.
Specifically, the optimal prediction effect is obtained by non-maximum suppression.
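Greedy non-maximum suppression can be sketched in numpy. This is an illustration, not the patent's implementation: boxes are (x1, y1, x2, y2) corners and the 0.5 IoU threshold is our assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) corners as (x1, y1, x2, y2)
    scores: (N,) confidence of each prediction
    Returns indices of the predictions kept, highest score first."""
    order = np.argsort(scores)[::-1]          # process best-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop highly repetitive boxes
    return keep

boxes = np.array([[0., 0., 10., 10.], [1., 1., 10., 10.], [20., 20., 30., 30.]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the two overlapping boxes collapse to one
```

The two heavily overlapping boxes (IoU 0.81) are merged into the higher-scoring one, while the distant box survives; raising `iou_thresh` keeps more near-duplicates.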
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, wherein the software product is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the particular illustrative embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalent arrangements, and equivalents thereof, which may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A multi-scale attention feature detection method is characterized by comprising the following steps:
constructing a single target detector through hardware resources of a computer, wherein the single target detector comprises a basic network, a newly added convolution layer and a prediction layer;
adding a plurality of attention branches to the newly added convolutional layers to enhance the detection features, and constructing a parallel multi-scale attention feature detection model, wherein each attention branch provides an attention-region mask that is multiplied element-wise with the feature detected at the upper layer, and each detected feature contains both upper-layer information and lower-layer information during detection;
training the single pass target detector;
training the parallel multi-scale attention feature detection model according to parameters obtained by training the single target detector;
inputting an image to be detected into the multi-scale attention feature detection model, and calculating to obtain a detection result;
adding a plurality of attention branches to the newly added convolutional layer to enhance the characteristic of the detection feature and constructing a parallel multi-scale attention feature detection model, further comprising:
and taking the next detection feature obtained by a down-sampling layer in a shared network as the input of the attention branch, wherein the shared network comprises the base network and the newly added convolutional layer.
2. The multi-scale attention feature detection method of claim 1, wherein a depth of the hourglass network of attention branches is set to 1.
3. A multi-scale attention feature detection method as claimed in claim 1, wherein the attention branch comprises a feature layer, and the channel probability value of the feature layer is calculated by the formula:

p_c = exp(λ·f_{c+1}(i,j)) / Σ_{c=1}^{C} exp(λ·f_{c+1}(i,j))

wherein λ represents the weight of the previous feature and is set to 1; c denotes the current channel; f_{c+1}(i,j) represents the feature value of the (i,j) pixel point on the (c+1)-th feature map; C denotes the number of feature channels of the layer; p_c represents the channel probability value of the pixel point.

The probability value of the pixel points of the feature layer is calculated by the formula:

p(i,j) = exp(λ·f(i+1,j+1)) / Σ_{i=1}^{K} Σ_{j=1}^{K} exp(λ·f(i,j))

wherein λ represents the weight of the previous feature and is set to 1; f(i+1,j+1) represents the feature value of the (i+1,j+1) pixel point on the feature map; the denominator is the sum of the weighted pixel values over the different pixels of a channel; p(i,j) represents the probability value of the (i,j) pixel point; K represents the size of the feature map.
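The two claim-3 formulas are softmax normalizations, one across channels at a fixed pixel and one across all pixels of a single channel. A minimal NumPy sketch (the toy shapes and the max-subtraction stabilization are assumptions for illustration):

```python
import numpy as np

def channel_probability(feats, i, j, lam=1.0):
    """Softmax over the C channels at pixel (i, j); lam is the
    previous-feature weight, set to 1 as in the claim."""
    v = lam * feats[:, i, j]               # feature values at (i, j) across channels
    e = np.exp(v - v.max())                # stabilized exponentials
    return e / e.sum()                     # one probability per channel

def pixel_probability(feat_map, lam=1.0):
    """Softmax over all K*K pixels of one channel's (K, K) feature map."""
    v = lam * feat_map
    e = np.exp(v - v.max())
    return e / e.sum()                     # (K, K) probabilities summing to 1

feats = np.random.rand(3, 5, 5)            # C=3 channels, K=5
pc = channel_probability(feats, 2, 2)
pp = pixel_probability(feats[0])
```

Both outputs are valid probability distributions: `pc` sums to 1 over the channels, and `pp` sums to 1 over the spatial grid.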
4. A multi-scale attention feature detection method according to claim 1, wherein the loss function of the multi-scale attention feature detection model comprises two parts, a localization loss and a classification loss, and the calculation formula is:

L = (1/N) · (L_conf + α·L_loc)

wherein L is the loss function; L_loc is the localization loss; L_conf is the classification loss; N represents the number of matched prediction boxes, and if N is 0 the loss is set to 0; α represents the weight between the localization and classification losses and is set to 1.
5. A multi-scale attention feature detection method as claimed in claim 4, wherein the localization loss is calculated by the formula:

L_loc = Σ_{i∈Pos}^{N} Σ_{m∈{x,y,w,h}} x_ij^k · smooth_L1(p_i^m − t̂_j^m)

wherein x_ij^k represents the degree of matching between the i-th prediction box and the j-th real box for the k-th class; p_i^m represents the i-th positive prediction distance, directly replaced by the bounding box; t̂_j^m represents the distance between the default box and the correct box; L_loc represents the localization loss; b denotes a bounding box, i.e. a frame; p denotes a prediction box, i.e. a predicted candidate box; t denotes the ground truth, i.e. the real box; Pos denotes the positive samples; x and y denote the abscissa and ordinate of the center point, and w and h denote the width and height of the box, respectively.
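Claims 4 and 5 together describe an SSD-style objective: a smooth-L1 penalty summed over matched boxes and the four box offsets, combined with the classification loss and averaged over the N matches. A hedged NumPy sketch (the offset encoding and the toy inputs are illustrative assumptions; the patent defines only the formulas):

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x * x, ax - 0.5)

def localization_loss(pred_offsets, gt_offsets, match):
    """Sum of smooth-L1 over matched (positive) boxes and the
    four offsets (x, y, w, h)."""
    diff = smooth_l1(pred_offsets - gt_offsets)    # (num_boxes, 4)
    return float((match[:, None] * diff).sum())

def total_loss(loc_loss, conf_loss, n_matched, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N; set to 0 if no boxes matched."""
    if n_matched == 0:
        return 0.0
    return (conf_loss + alpha * loc_loss) / n_matched

pred  = np.array([[0.1, 0.2, 0.0, 0.5], [2.0, 0.0, 0.0, 0.0]])
gt    = np.zeros((2, 4))
match = np.array([1.0, 1.0])               # both prediction boxes matched
l = localization_loss(pred, gt, match)     # 0.005 + 0.02 + 0.125 + 1.5 = 1.65
```

Note how the 2.0 residual falls on the linear part of smooth-L1 (contributing 1.5 rather than 2.0), which is what keeps large localization errors from dominating the gradient.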
6. The multi-scale attention feature detection method of claim 1, wherein the number of said attention branches is 5.
7. The multi-scale attention feature detection method of claim 1, wherein the base network is a VGG-16 model, namely a pre-trained ILSVRC classification model with its two fully connected layers removed; the VGG-16 comprises 5 convolutional layers.
8. The multi-scale attention feature detection method of claim 7, wherein said constructing a single-shot target detector comprises:
taking the multi-scale convolutional layers of the newly added convolutional layers as the input of the prediction layer, and independently calculating the classification and localization results with two convolution kernels of the same size.
9. The multi-scale attention feature detection method of claim 8, further comprising, after the two convolution kernels of the same size are used to independently compute the classification and localization results:
eliminating highly repetitive predictions by non-maximum suppression.
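The patent names non-maximum suppression but does not spell it out; the standard greedy form (box format and the 0.5 overlap threshold are assumptions for illustration) is:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it by more than `thresh`, repeat on the remainder."""
    order = np.argsort(scores)[::-1].tolist()   # indices, best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

boxes  = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, 0.5)             # → [0, 2]
```

The second box (IoU 0.81 with the first) is suppressed as a highly repetitive prediction, while the disjoint third box survives.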
CN201911189274.9A 2019-11-28 2019-11-28 Multi-scale attention feature detection method Active CN111079739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911189274.9A CN111079739B (en) 2019-11-28 2019-11-28 Multi-scale attention feature detection method


Publications (2)

Publication Number Publication Date
CN111079739A CN111079739A (en) 2020-04-28
CN111079739B true CN111079739B (en) 2023-04-18

Family

ID=70312155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911189274.9A Active CN111079739B (en) 2019-11-28 2019-11-28 Multi-scale attention feature detection method

Country Status (1)

Country Link
CN (1) CN111079739B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626176B (en) * 2020-05-22 2021-08-06 中国科学院空天信息创新研究院 Remote sensing target rapid detection method and system based on dynamic attention mechanism
CN111797712B (en) * 2020-06-16 2023-09-15 南京信息工程大学 Remote sensing image cloud and cloud shadow detection method based on multi-scale feature fusion network
CN111860398B (en) * 2020-07-28 2022-05-10 河北师范大学 Remote sensing image target detection method and system and terminal equipment
CN111914758A (en) * 2020-08-04 2020-11-10 成都奥快科技有限公司 Face in-vivo detection method and device based on convolutional neural network
CN111985552B (en) * 2020-08-17 2022-07-29 中国民航大学 Method for detecting diseases of thin strip-shaped structure of airport pavement under complex background
CN112200045B (en) * 2020-09-30 2024-03-19 华中科技大学 Remote sensing image target detection model establishment method based on context enhancement and application
CN112949635B (en) * 2021-03-12 2022-09-16 北京理工大学 Target detection method based on feature enhancement and IoU perception
CN113128476A (en) * 2021-05-17 2021-07-16 广西师范大学 Low-power consumption real-time helmet detection method based on computer vision target detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108694471A (en) * 2018-06-11 2018-10-23 深圳市唯特视科技有限公司 A kind of user preference prediction technique based on personalized attention network
CN109886359A (en) * 2019-03-25 2019-06-14 西安电子科技大学 Small target detecting method and detection model based on convolutional neural networks
CN110245655A (en) * 2019-05-10 2019-09-17 天津大学 A kind of single phase object detecting method based on lightweight image pyramid network
CN110263819A (en) * 2019-05-28 2019-09-20 中国农业大学 A kind of object detection method and device for shellfish image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262237B2 (en) * 2016-12-08 2019-04-16 Intel Corporation Technologies for improved object detection accuracy with multi-scale representation and training
US11373018B2 (en) * 2018-01-25 2022-06-28 Kioxia Corporation Method of displaying model and designing pattern, and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wei Liu. SSD: Single Shot MultiBox Detector. arXiv, 2016, 1-17. *
Yu Chunyan. An improved SSD model for salient object detection. Journal of Electronics & Information Technology, 2018, 40(40), 2554-2561. *


Similar Documents

Publication Publication Date Title
CN111079739B (en) Multi-scale attention feature detection method
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN112308019B (en) SAR ship target detection method based on network pruning and knowledge distillation
CN114202672A (en) Small target detection method based on attention mechanism
CN110458165B (en) Natural scene text detection method introducing attention mechanism
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN111753677B (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
Chen et al. Research on recognition of fly species based on improved RetinaNet and CBAM
CN111612008A (en) Image segmentation method based on convolution network
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN113052834B (en) Pipeline defect detection method based on convolution neural network multi-scale features
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN111797841B (en) Visual saliency detection method based on depth residual error network
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN112288026B (en) Infrared weak and small target detection method based on class activation diagram
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114005094A (en) Aerial photography vehicle target detection method, system and storage medium
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN114565824A (en) Single-stage rotating ship detection method based on full convolution network
Dai et al. GCD-YOLOv5: An armored target recognition algorithm in complex environments based on array lidar
CN117079095A (en) Deep learning-based high-altitude parabolic detection method, system, medium and equipment
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant