CN115294356A - Target detection method based on wide-area receptive-field spatial attention

Target detection method based on wide-area receptive-field spatial attention

Info

Publication number
CN115294356A
Authority
CN
China
Prior art keywords
wide area
receptive field
target detection
attention
field space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882431.XA
Other languages
Chinese (zh)
Inventor
王改华
曹清程
翟乾宇
甘鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN202210882431.XA
Publication of CN115294356A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on wide-area receptive-field spatial attention, which comprises the following steps: preparing an image data set for training and testing; constructing a target detection network based on wide-area receptive-field spatial attention, the network comprising four parts, namely a Backbone part, a Neck part, a Head part and an MSA part; and performing target detection on the test-set images with the trained network. The invention captures pixel-level feature information from the perspective of a wide-area receptive field while accounting for the interaction among different pieces of feature information, and greatly improves the feature extraction effect without noticeably increasing the parameter count or the computational cost.

Description

Target detection method based on wide-area receptive-field spatial attention
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a target detection method based on wide-area receptive-field spatial attention.
Background
Against the background of deep learning, convolutional neural networks (CNNs) have gained wide acceptance and increasingly broad application. Deep-learning-based target detection algorithms use a CNN to extract features automatically; the features are then fed into a detector to classify and localize targets.
In neural network learning, the more parameters a model has, the stronger its expressive power generally is and the more information it can store, but this can lead to information overload. By introducing an attention mechanism that focuses on the information most relevant to the current task among the many inputs, reduces the attention paid to other information, and even filters out irrelevant information, the information-overload problem can be alleviated and the efficiency and accuracy of task processing improved.
In recent years, attention mechanisms have been widely used in different deep learning tasks such as object detection, semantic segmentation, and pose estimation. Attention is divided into soft and hard attention, and the soft attention mechanism into three attention domains: the spatial domain, the channel domain, and the hybrid domain. The spatial domain refers to the corresponding spatial transformation in the image; the channel domain directly aggregates information across the global channels; the hybrid domain contains both channel and spatial attention. In order to let the network focus more attention on the area around a significant target, the present invention proposes a wide-area receptive-field spatial attention module to process the extracted feature maps.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a target detection method based on wide-area receptive-field spatial attention, which improves the feature expression capability of the network without excessively increasing the number of model parameters. The method mainly comprises pooling operations, reshaping operations, dilated convolution blocks, up-sampling operations and the like, and greatly enhances the expression of important feature information.
In order to achieve this purpose, the technical scheme provided by the invention is a target detection method based on wide-area receptive-field spatial attention, comprising the following steps:
Step 1: an image data set is prepared for testing and training.
Step 2: a target detection network based on wide-area receptive-field spatial attention is constructed.
Step 3: the wide-area receptive-field spatial attention target detection network model is trained with the training-set images.
Step 4: target detection is performed on the test-set images with the network model trained in step 3.
In step 1, all images are resized to 512 × 512 for multi-scale training, and a series of data-enhancement operations are applied to the image data set: random flipping, padding, random cropping, normalization, and image distortion.
In step 2, the target detection network based on wide-area receptive-field spatial attention consists of a Backbone, a Neck, a Head and the MSA wide-area receptive-field spatial attention module. The Backbone adopts a ResNet50 backbone network to extract picture features; the Neck connects the Backbone and the Head and fuses features; the Head detects objects and performs target classification and regression; MSA modules are arranged between the Backbone and the Neck and between the Neck and the Head.
The ResNet50 backbone network outputs 4 feature maps [C1, C2, C3, C4] of different sizes, with strides [4, 8, 16, 32] and channel counts [256, 512, 1024, 2048]. The Neck takes the three Backbone feature maps [C2, C3, C4], reduces each to 256 channels by 1 × 1 convolution, performs feature fusion as [P1, P2, P3] in an FPN structure, then downsamples P3 twice to obtain P4 and P5, and finally applies 3 × 3 convolutions to the feature maps to suppress aliasing, outputting 5 feature maps of different sizes with strides [8, 16, 32, 64, 128] and 256 channels each.
The structure of the MSA module is as follows: let F ∈ R^(C×H×W) be the input tensor, where C, H and W are the channel count, height and width, respectively. A 3 × 3 convolution halves the height and width of F to give F′ ∈ R^(C×H/2×W/2). F′ then passes through an ordinary convolution branch to give F0 ∈ R^(1×H/2×W/2) and through three depthwise-separable convolution branches to give F1, F2, F3 ∈ R^(C/2×H/2×W/2). F1, F2 and F3 are then reshaped from three dimensions to two, giving M1, M2 and M3:

Mi = reshape(Fi), Mi ∈ R^((H/2·W/2)×(C/2)), i = 1, 2, 3    (1)
M1, M2 and M3 have the same matrix shape [H/2·W/2, C/2], where H/2·W/2 and C/2 denote the rows and columns of the matrix. Multiplying each matrix with its own transpose yields three relation matrices N1, N2 and N3, in which every value represents the relation between a pair of pixels in the features. N1, N2 and N3 are computed as

Ni = Mi ⊗ Mi^T, i = 1, 2, 3    (2)

where ⊗ denotes matrix multiplication and Mi^T is the transpose of Mi; N1, N2 and N3 have shape [H/2·W/2, H/2·W/2], whose two dimensions are the rows and columns of the matrix.
N1, N2 and N3 are reshaped into T1, T2 and T3 of shape [H/2·W/2, H/2, W/2], whose dimensions are channel, height and width. To obtain an output containing a more useful global prior, F0 is concatenated with T1, T2 and T3 to give the feature F_M:

F_M = concat[F0, T1, T2, T3]    (3)

where F_M ∈ R^((H/2·W/2)·3×H/2×W/2), whose dimensions (H/2·W/2)·3, H/2 and W/2 are channel, height and width. F_M is reshaped to Y1 to generate an attention weight; an interpolation algorithm then resizes Y1 to Y2 with the same spatial size as the input feature Input; a reshaping operation turns Y2 into a three-dimensional tensor of size [1, H, W]; finally this map is passed through a Sigmoid function and multiplied with Input to obtain the final Output.
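The reshape-multiply-reshape chain at the heart of the module can be written compactly. The following is a minimal PyTorch sketch of the relation-matrix step with illustrative sizes; the variable names are not from the patent:

    import torch

    B, C, H, W = 2, 64, 32, 32                    # illustrative sizes
    f1 = torch.randn(B, C // 2, H // 2, W // 2)   # one branch output, e.g. F1

    m1 = f1.flatten(2).transpose(1, 2)            # M1: (B, H/2*W/2, C/2)
    n1 = torch.bmm(m1, m1.transpose(1, 2))        # N1 = M1 M1^T: (B, H/2*W/2, H/2*W/2)
    t1 = n1.view(B, -1, H // 2, W // 2)           # T1: (B, H/2*W/2, H/2, W/2)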
In step 3, the training-set images are resized uniformly to 512 × 512, the learning rate is set to 0.001, the batch_size is set to 4, training runs for 12 epochs, and the learning rate is reduced to 1/10 of its value at the 8th and 11th epochs.
Compared with the prior art, the invention has the following advantages:
compared with the common spatial attention, the method provided by the invention captures the pixel-level characteristic information from the angle of the wide area receptive field, considers the mutual intersection among different characteristic information, and greatly improves the characteristic extraction effect under the condition of not obviously increasing the parameter quantity and the calculated quantity.
Drawings
Fig. 1 is a schematic diagram of a network structure according to the present invention.
Fig. 2 is a schematic view of the wide-area receptive-field spatial attention structure.
Fig. 3 is a schematic diagram of the network detection effect according to the present invention.
Detailed Description
The invention provides a target detection method based on wide-area receptive-field spatial attention; the technical scheme of the invention is further explained below with reference to the drawings and an embodiment.
As shown in fig. 1, the process of the embodiment of the present invention includes the following steps:
Step 1: an image data set is prepared for testing and training.
The COCO 2017 data set is selected. It is a large and rich object detection, segmentation and captioning data set that covers 80 detection categories, i.e. 80 object classes common in daily life such as people, bicycles, cars, motorcycles, airplanes, buses, trains, trucks, ships and traffic lights. It comprises four parts: annotations, test2017, train2017 and val2017, where train2017 contains 118,287 images, val2017 contains 5,000 images and test2017 contains 28,660 images; the annotations are a set of label types (object instances, object keypoints and image captions) stored as JSON files.
All images are resized to 512 × 512 for multi-scale training, and a series of data-enhancement operations are applied to the image data set: random flipping, padding, random cropping, normalization, and image distortion.
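Since the embodiment is implemented on mmdetection (see the experimental environment below), these augmentations map naturally onto an mmdetection 2.x data pipeline. The following is a hedged sketch; the transform parameters and their exact ordering are assumptions, not taken from the patent:

    # Hypothetical mmdetection 2.x training pipeline covering the described
    # augmentations: flipping, distortion, cropping, normalization, padding.
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='Resize', img_scale=(512, 512), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.5),        # random flipping
        dict(type='PhotoMetricDistortion'),             # image distortion
        dict(type='RandomCrop', crop_size=(512, 512)),  # random cropping
        dict(type='Normalize', **img_norm_cfg),         # normalization
        dict(type='Pad', size_divisor=32),              # padding
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
    ]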
Step 2: a target detection network based on wide-area receptive-field spatial attention is constructed.
As shown in Fig. 1, the target detection network based on wide-area receptive-field spatial attention is composed of four parts: Backbone, Neck, Head and MSA (Multiple-receptive-field Spatial Attention). The Backbone adopts a ResNet50 backbone network to extract picture features; it outputs 4 feature maps [C1, C2, C3, C4] of different sizes, with strides [4, 8, 16, 32] and channel counts [256, 512, 1024, 2048]. The Neck connects the Backbone and the Head and fuses features: it takes the three Backbone feature maps [C2, C3, C4], reduces each to 256 channels by 1 × 1 convolution, performs feature fusion as [P1, P2, P3] in the FPN structure, then downsamples P3 twice to obtain P4 and P5, and finally applies 3 × 3 convolutions to the feature maps to suppress aliasing, outputting 5 feature maps of different sizes with strides [8, 16, 32, 64, 128] and 256 channels each. The Head detects objects and performs target classification and regression. A sketch of this Neck follows.
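One plausible reading of the described Neck as a PyTorch sketch; layer choices such as nearest-neighbour upsampling in the FPN fusion and stride-2 3 × 3 convolutions for the P4/P5 downsampling are assumptions:

    import torch.nn as nn
    import torch.nn.functional as F

    class Neck(nn.Module):
        """FPN-style Neck: 1x1 lateral convs on [C2, C3, C4], top-down fusion
        to [P1, P2, P3], two stride-2 convs for P4/P5, 3x3 smoothing convs."""

        def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
            super().__init__()
            self.lateral = nn.ModuleList(
                nn.Conv2d(c, out_channels, 1) for c in in_channels)
            self.smooth = nn.ModuleList(
                nn.Conv2d(out_channels, out_channels, 3, padding=1)
                for _ in range(3))
            self.down4 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
            self.down5 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

        def forward(self, c2, c3, c4):
            p3 = self.lateral[2](c4)
            p2 = self.lateral[1](c3) + F.interpolate(p3, scale_factor=2, mode='nearest')
            p1 = self.lateral[0](c2) + F.interpolate(p2, scale_factor=2, mode='nearest')
            p1, p2, p3 = (s(p) for s, p in zip(self.smooth, (p1, p2, p3)))
            p4 = self.down4(p3)            # stride 64
            p5 = self.down5(p4)            # stride 128
            return p1, p2, p3, p4, p5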
The MSA wide-area receptive-field spatial attention modules are placed between the Backbone and the Neck and between the Neck and the Head, i.e. the "MSA" blocks in Fig. 1, at 8 positions in total: one for each of the three Backbone feature maps fed to the Neck and one for each of the five Neck feature maps fed to the Head.
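A sketch of how the 8 modules could be wired; `MSA` refers to the module sketched after equation (3) below, and `backbone`, `neck` and `head` are stand-ins for ResNet50, the Neck above and a detection head:

    import torch.nn as nn

    class MSADetector(nn.Module):
        # Hypothetical wiring: one MSA per feature map on the Backbone->Neck
        # interface (C2, C3, C4) and on the Neck->Head interface (P1..P5).
        def __init__(self, backbone, neck, head):
            super().__init__()
            self.backbone, self.neck, self.head = backbone, neck, head
            self.msa_in = nn.ModuleList(MSA(c) for c in (512, 1024, 2048))
            self.msa_out = nn.ModuleList(MSA(256) for _ in range(5))

        def forward(self, img):
            _, c2, c3, c4 = self.backbone(img)     # C1 is not used by the Neck
            c2, c3, c4 = (m(x) for m, x in zip(self.msa_in, (c2, c3, c4)))
            feats = self.neck(c2, c3, c4)          # P1..P5, strides 8..128
            feats = [m(p) for m, p in zip(self.msa_out, feats)]
            return self.head(feats)

MSA preserves the shape of its input (it halves the spatial size internally and restores it by interpolation), so it can be dropped onto each interface without changing the rest of the network.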
The structure of the MSA module is shown in Fig. 2. Let F ∈ R^(C×H×W) be the input tensor, where C, H and W are the channel count, height and width, respectively. To reduce the parameter count and the computation, a 3 × 3 convolution halves the height and width of F to give F′ ∈ R^(C×H/2×W/2). F′ then passes through an ordinary convolution branch to give F0 ∈ R^(1×H/2×W/2) and through three depthwise-separable convolution branches to give F1, F2, F3 ∈ R^(C/2×H/2×W/2). F1, F2 and F3 are then reshaped from three dimensions to two, giving M1, M2 and M3:

Mi = reshape(Fi), Mi ∈ R^((H/2·W/2)×(C/2)), i = 1, 2, 3    (1)
M1, M2 and M3 have the same matrix shape [H/2·W/2, C/2], where H/2·W/2 and C/2 denote the rows and columns of the matrix. Multiplying each matrix with its own transpose yields three relation matrices N1, N2 and N3, in which every value represents the relation between a pair of pixels in the features. N1, N2 and N3 are computed as

Ni = Mi ⊗ Mi^T, i = 1, 2, 3    (2)

where ⊗ denotes matrix multiplication and Mi^T is the transpose of Mi; N1, N2 and N3 have shape [H/2·W/2, H/2·W/2], whose two dimensions are the rows and columns of the matrix. Matrix multiplication helps fuse richer feature information while examining the features more carefully at the pixel level.
N1, N2 and N3 are reshaped into T1, T2 and T3 for the next feature-fusion operation; T1, T2 and T3 have shape [H/2·W/2, H/2, W/2], whose dimensions are channel, height and width.
To obtain an output containing a more useful global prior, F0 is concatenated with T1, T2 and T3 to give the feature F_M ∈ R^((H/2·W/2)·3×H/2×W/2), whose dimensions (H/2·W/2)·3, H/2 and W/2 are channel, height and width:

F_M = concat[F0, T1, T2, T3]    (3)

F_M is reshaped to Y1 to generate an attention weight; an interpolation algorithm then resizes Y1 to Y2 with the same spatial size as the input feature Input; a reshaping operation turns Y2 into a three-dimensional tensor of size [1, H, W]; finally this map is passed through a Sigmoid function and multiplied with Input to obtain the Output.
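Putting the module together, the following PyTorch sketch follows Fig. 2 and equations (1)-(3). Two points the text leaves open are filled with labeled assumptions: the three depthwise-separable branches are given dilation rates 1, 2 and 3 (the dilated convolution blocks mentioned in the disclosure), and F_M is collapsed to the single-channel Y1 by a mean over channels:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MSA(nn.Module):
        """Sketch of the wide-area receptive-field spatial attention module.
        Branch dilation rates and the F_M -> Y1 reduction are assumptions."""

        def __init__(self, channels):
            super().__init__()
            # 3x3 stride-2 convolution: halves height and width, F -> F'
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            # ordinary convolution branch: F' -> F0 with a single channel
            self.plain = nn.Conv2d(channels, 1, 3, padding=1)
            # three depthwise-separable branches: F' -> F1, F2, F3 with C/2 channels
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                              groups=channels),             # depthwise (dilated)
                    nn.Conv2d(channels, channels // 2, 1),  # pointwise
                )
                for d in (1, 2, 3))                         # assumed dilation rates

        def forward(self, x):
            b, c, h, w = x.shape
            f = self.down(x)                             # F': (B, C, H/2, W/2)
            h2, w2 = f.shape[2], f.shape[3]
            feats = [self.plain(f)]                      # F0: (B, 1, H/2, W/2)
            for branch in self.branches:
                m = branch(f).flatten(2).transpose(1, 2)  # Mi: (B, H/2*W/2, C/2)
                n = torch.bmm(m, m.transpose(1, 2))       # Ni = Mi Mi^T, eq. (2)
                feats.append(n.view(b, h2 * w2, h2, w2))  # Ti: (B, H/2*W/2, H/2, W/2)
            fm = torch.cat(feats, dim=1)     # F_M, eq. (3): 3*(H/2*W/2)+1 channels
            y1 = fm.mean(dim=1, keepdim=True)             # assumed reduction to Y1
            y2 = F.interpolate(y1, size=(h, w), mode='bilinear',
                               align_corners=False)       # Y2: (B, 1, H, W)
            return torch.sigmoid(y2) * x                  # Output = sigmoid(Y2) * Input

A quick shape check: MSA(256) applied to a (1, 256, 64, 64) tensor returns a (1, 256, 64, 64) tensor, i.e. the module is a drop-in reweighting of its input.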
Step 3: the wide-area receptive-field spatial attention target detection network model is trained with the training-set images.
The training-set images are resized uniformly to 512 × 512, the learning rate is set to 0.001, the batch_size is set to 4, training runs for 12 epochs, and the learning rate is reduced to 1/10 of its value at the 8th and 11th epochs.
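In mmdetection 2.x terms, this schedule corresponds to a config fragment like the following sketch; the SGD optimizer type, momentum and weight decay are assumptions, as the patent only fixes the learning rate, batch size, epoch count and decay points:

    optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
    lr_config = dict(policy='step', step=[8, 11])  # lr -> lr/10 at epochs 8 and 11
    runner = dict(type='EpochBasedRunner', max_epochs=12)
    data = dict(samples_per_gpu=4)                 # batch_size = 4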
Step 4: target detection is performed on the test-set images with the network model trained in step 3.
Experimental environment: a Python environment is built with PyTorch 1.6, torchvision 0.7.0, CUDA 10.0 and cuDNN 7.4 as the deep-learning framework, implemented on the mmdetection 2.6 platform.
Experimental equipment: CPU: Intel Xeon E5-2683 v3 @ 2.00 GHz; RAM: 16 GB; graphics card: Nvidia GTX 2060 Super; hard disk: 500 GB.
To test the influence of the MSA wide-area receptive-field spatial attention structure on detection precision, comparison experiments are carried out on several networks. Average Precision (AP) is adopted as the evaluation standard, with AP50, AP75, APS, APM and APL as the main criteria: AP50 and AP75 are the detector's results at IoU thresholds of 0.50 and 0.75, respectively, and APS, APM and APL correspond to the detection accuracy on small, medium and large targets. The experimental results are shown in Table 1.
Table 1. Effect of MSA spatial attention on different networks
(Table 1 is provided as an image in the original publication; its contents are not reproduced here.)
Table 1 shows the effect of the MSA wide-area receptive-field spatial attention structure on the COCO 2017 data set. As the table shows, the gain on each network is between 0.7% and 0.9%. COCO 2017 images often contain a large number of complex objects, and the category, scale and pose of the targets are often uncertain, which complicates detection; for example, after MSA spatial attention is added to ATSS and VFNet, the detection of large targets is slightly worse than with the original networks. Overall, the wide-area receptive-field spatial attention mechanism extracts important features well.
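For reference, these COCO metrics can be computed with pycocotools as follows; 'results.json' is a hypothetical detection-results file in COCO format:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO('annotations/instances_val2017.json')   # ground truth
    coco_dt = coco_gt.loadRes('results.json')              # detections
    ev = COCOeval(coco_gt, coco_dt, iouType='bbox')
    ev.evaluate()
    ev.accumulate()
    ev.summarize()   # prints AP at IoU .50:.95, AP50, AP75, APS, APM, APL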
Some test pictures are selected to examine the final results. As Fig. 3 shows, the proposed target detection network achieves good results. When only one bird is present, as in the third picture, the network detects the object accurately; when several objects are present in the other pictures, a good detection effect is also achieved. In the fourth and fifth pictures, the categories are still identified accurately even though some objects are occluded. In addition, the proposed detection network also handles small objects and blurred images well, such as the ships in the fourth picture and the horses in the eighth picture. In general, the proposed network completes the target detection task accurately and recognizes objects at image edges well.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (5)

1. A target detection method based on wide-area receptive-field spatial attention, characterized by comprising the following steps:
step 1, preparing an image data set for testing and training;
step 2, constructing a target detection network based on wide-area receptive-field spatial attention, the network being composed of a Backbone, a Neck, a Head and the MSA wide-area receptive-field spatial attention module, wherein the Backbone adopts a ResNet50 backbone network to extract picture features, the Neck connects the Backbone and the Head and fuses features, the Head detects objects and performs target classification and regression, and MSA modules are arranged between the Backbone and the Neck and between the Neck and the Head;
step 3, training the target detection network model based on wide-area receptive-field spatial attention with the training-set images;
step 4, performing target detection on the test-set images with the network model trained in step 3.
2. The target detection method based on wide-area receptive-field spatial attention as claimed in claim 1, characterized in that: in step 1, all images are resized to 512 × 512 for multi-scale training, and a series of data-enhancement operations are applied to the image data set: random flipping, padding, random cropping, normalization, and image distortion.
3. The target detection method based on wide-area receptive-field spatial attention as claimed in claim 1, characterized in that: in step 2, the ResNet50 backbone network outputs 4 feature maps [C1, C2, C3, C4] of different sizes, with strides [4, 8, 16, 32] and channel counts [256, 512, 1024, 2048]; the Neck takes the three Backbone feature maps [C2, C3, C4], reduces each to 256 channels by 1 × 1 convolution, performs feature fusion as [P1, P2, P3] in the FPN structure, then downsamples P3 twice to obtain P4 and P5, and finally applies 3 × 3 convolutions to the feature maps to suppress aliasing, outputting 5 feature maps of different sizes with strides [8, 16, 32, 64, 128] and 256 channels each.
4. The target detection method based on wide-area receptive-field spatial attention as claimed in claim 1, characterized in that: the structure of the MSA in step 2 is as follows: let F ∈ R^(C×H×W) be the input tensor, where C, H and W are the channel count, height and width, respectively; a 3 × 3 convolution halves the height and width of F to give F′ ∈ R^(C×H/2×W/2); F′ then passes through an ordinary convolution branch to give F0 ∈ R^(1×H/2×W/2) and through three depthwise-separable convolution branches to give F1, F2, F3 ∈ R^(C/2×H/2×W/2); F1, F2 and F3 are then reshaped from three dimensions to two, giving M1, M2 and M3:

Mi = reshape(Fi), Mi ∈ R^((H/2·W/2)×(C/2)), i = 1, 2, 3    (1)
M1, M2 and M3 have the same matrix shape [H/2·W/2, C/2], where H/2·W/2 and C/2 denote the rows and columns of the matrix; multiplying each matrix with its own transpose yields three relation matrices N1, N2 and N3, in which every value represents the relation between a pair of pixels in the features; N1, N2 and N3 are computed as

Ni = Mi ⊗ Mi^T, i = 1, 2, 3    (2)

where ⊗ denotes matrix multiplication, Mi^T is the transpose of Mi, and N1, N2 and N3 have shape [H/2·W/2, H/2·W/2], whose two dimensions are the rows and columns of the matrix;
N1, N2 and N3 are reshaped into T1, T2 and T3 of shape [H/2·W/2, H/2, W/2], whose dimensions are channel, height and width; to obtain an output containing a more useful global prior, F0 is concatenated with T1, T2 and T3 to give the feature F_M:

F_M = concat[F0, T1, T2, T3]    (3)

where F_M ∈ R^((H/2·W/2)·3×H/2×W/2), whose dimensions (H/2·W/2)·3, H/2 and W/2 are channel, height and width; F_M is reshaped to Y1 to generate an attention weight; an interpolation algorithm then resizes Y1 to Y2 with the same spatial size as the input feature Input; a reshaping operation turns Y2 into a three-dimensional tensor of size [1, H, W]; finally this map is passed through a Sigmoid function and multiplied with Input to obtain the final Output.
5. The target detection method based on wide-area receptive-field spatial attention as claimed in claim 4, characterized in that: in step 3, the training-set images are resized uniformly to 512 × 512, the learning rate is set to 0.001, the batch_size is set to 4, training runs for 12 epochs, and the learning rate is reduced to 1/10 of its value at the 8th and 11th epochs.
CN202210882431.XA 2022-07-26 2022-07-26 Target detection method based on wide area receptive field space attention Pending CN115294356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210882431.XA CN115294356A (en) 2022-07-26 2022-07-26 Target detection method based on wide area receptive field space attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210882431.XA CN115294356A (en) 2022-07-26 2022-07-26 Target detection method based on wide area receptive field space attention

Publications (1)

Publication Number Publication Date
CN115294356A (en) 2022-11-04

Family

ID=83824991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882431.XA Pending CN115294356A (en) 2022-07-26 2022-07-26 Target detection method based on wide area receptive field space attention

Country Status (1)

Country Link
CN (1) CN115294356A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810183A (en) * 2022-12-09 2023-03-17 Yanshan University Traffic sign detection method based on improved VFNet algorithm
CN115810183B (en) * 2022-12-09 2023-10-24 Yanshan University Traffic sign detection method based on improved VFNet algorithm
CN115690522A (en) * 2022-12-29 2023-02-03 Hubei University of Technology Target detection method based on multi-pooling fusion channel attention and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination