CN113743521A - Target detection method based on multi-scale context awareness

Target detection method based on multi-scale context awareness

Info

Publication number
CN113743521A
CN113743521A
Authority
CN
China
Prior art keywords
features
feature
pyramid
scale
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111061082.7A
Other languages
Chinese (zh)
Other versions
CN113743521B (en)
Inventor
王伯英
汲如意
张立波
武延军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202111061082.7A
Publication of CN113743521A
Application granted
Publication of CN113743521B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on multi-scale context awareness, which comprises the following steps: 1) extracting features of an image at a plurality of scales; 2) enhancing the top-level feature among the multi-scale features through a hole residual block to obtain a top-level feature with high-level semantics; 3) fusing the features of adjacent layers to generate pyramid features; 4) aggregating the pyramid features to obtain a feature X_m; 5) further enhancing the feature X_m through a dependency enhancement module to generate an enhanced feature X_o; 6) matching the feature X_o to each pyramid feature by upsampling or downsampling respectively, and adding the matched features element-wise; 7) inputting the features obtained in step 6) into a candidate region generation network to generate candidate boxes, and extracting the features of the candidate boxes; 8) inputting the candidate box features into a head detection module for prediction, and then filtering the detection results of the candidate boxes by non-maximum suppression to obtain the category and position information of the objects.

Description

Target detection method based on multi-scale context awareness
Technical Field
The invention relates to the technical field of computer vision, in particular to target detection, and specifically to a target detection method based on multi-scale context awareness.
Background
Object detection is a practical and challenging computer vision task whose purpose is to identify and locate objects in an image. In recent years, with advances in deep learning, object detection has developed rapidly and is widely applied in robot navigation, intelligent video surveillance, industrial inspection, aerospace, and other fields. General-purpose detectors fall into two categories: single-stage and two-stage. Single-stage detectors process the input image directly to produce detection results, while two-stage detectors first extract candidate regions through a region proposal network (RPN) and then refine the detection results from those candidates. Early work detected objects directly on the highest-level features; however, because of their small spatial resolution, the highest-level features alone are poorly suited to detection. To address this, feature pyramid techniques that exploit multi-scale features have emerged. Mainstream feature pyramid work divides into two categories: neural architecture search (NAS) based and manually designed. NAS-FPN is representative of the NAS-based methods: it defines a search space and uses a reinforcement learning strategy to find the best-performing pyramid structure. NAS-based methods achieve higher performance but have clear drawbacks. First, the resulting structures are extremely complex and hard to interpret. Second, the structures are typically multi-layer stacks, which imposes a heavy parameter and computation burden. Third, the search cost is prohibitive, often amounting to thousands of TPU-hours. In contrast, non-NAS feature pyramids are designed by hand. FPN is the most widely applied manually designed module, and current FPN-based methods suffer from three problems: (1) Loss of highest-level context. Before fusion, a 1 × 1 convolutional layer reduces the number of feature channels. The top-level features typically have thousands of channels carrying rich contextual information, and this channel reduction discards much of it. (2) Insufficient context fusion. During fusion, high-level features are matched to shallower features by upsampling and then fused by element-wise addition. This simple aggregation strategy is suboptimal: different levels carry different context and should not be treated identically. (3) Semantic gaps between levels. Because feature propagation is unidirectional, low-level features cannot propagate to higher levels; moreover, high-level semantic information is diluted during propagation, producing semantic differences between levels after fusion.
Disclosure of Invention
In order to overcome the above problems, an object of the present invention is to provide a target detection method based on multi-scale context awareness, an electronic device, and a computer-readable storage medium. First, a hole residual block produces enhanced high-level features with a richer receptive field. Second, an interactive fusion method is adopted to better fuse the context information of adjacent layers. Third, an adaptive context aggregation block is proposed to solve the semantic-gap problem: under channel and spatial guidance, the network adaptively learns the weights of different layers to generate a discriminative context. Our method enables the network to obtain significant performance gains, and on this basis the present invention has been completed.
To achieve the above object, the present invention employs the following steps:
1) inputting the sample image into a backbone network to extract multi-scale features {C2, C3, C4, C5};
2) applying the hole residual block to the top-level feature C5 extracted by the backbone network, thereby generating an enhanced high-level feature P5 with a richer receptive field to compensate for the loss of high-level information;
3) generating features {P2, P3, P4, P5} through a cross-scale context aggregation module, which better fuses the context information of adjacent levels;
4) applying the adaptive context aggregation module to the features {P2, P3, P4, P5}, so that the network learns channel and spatial weights for the multi-scale features, and obtaining a feature X_m by weighted summation;
5) further enhancing the feature X_m through the dependency enhancement module to generate an enhanced feature X_o;
6) matching the feature X_o to the scales of the features {P2, P3, P4, P5} by upsampling or downsampling respectively, and adding the matched features element-wise to obtain features {O2, O3, O4, O5};
7) inputting the features {O2, O3, O4, O5} into a candidate region generation network to generate candidate boxes, while extracting the candidate box features with an RoI-Pooling layer;
8) the candidate box features are input to a head detection module (such as that of Faster R-CNN or Mask R-CNN) for prediction. The head detection module comprises a classification module and a regression module: the classification module generates the category of each candidate box, and the regression module predicts position-coordinate offsets. The offsets are used to correct the positions of the candidate boxes generated in step 7). Finally, the detection results are filtered by non-maximum suppression (NMS) to obtain the category and position of each object, and whether the category is a target category is judged.
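As an illustrative sketch of the NMS filtering at the end of step 8), the snippet below uses torchvision's nms operator; the IoU threshold of 0.5 and the dummy tensors are assumptions of this illustration, not values fixed by the patent.

```python
import torch
from torchvision.ops import nms

# Hypothetical inputs: boxes (K, 4) are corrected candidate boxes in (x1, y1, x2, y2)
# format; scores (K,) are classification confidences from the head detection module.
boxes = torch.rand(100, 4) * 400
boxes[:, 2:] += boxes[:, :2]                   # guarantee x2 > x1 and y2 > y1
scores = torch.rand(100)

keep = nms(boxes, scores, iou_threshold=0.5)   # indices of boxes surviving NMS
final_boxes, final_scores = boxes[keep], scores[keep]
```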
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention has the advantages that:
1) the invention provides a novel feature pyramid network, namely a multi-scale context-aware network, which comprises three modules: a hole residual block, a cross-scale context aggregation module, and an adaptive context aggregation module;
2) the target detection method based on multi-scale context awareness can obtain an obvious performance improvement over target detection baselines.
drawings
FIG. 1 is a flowchart of the target detection method based on multi-scale context awareness according to an embodiment of the present invention;
FIG. 2 is a diagram of the multi-scale context-aware target detection framework of the present invention, with the structure of the hole residual block on the right, where CCAB is the cross-scale context aggregation module, CAB is the channel-guided aggregation module, and SAB is the spatial-guided aggregation module;
FIG. 3 illustrates a network architecture diagram of a cross-scale context aggregation module;
FIG. 4 shows a network structure diagram of the adaptive context aggregation module, wherein (a) is the channel-guided aggregation module and (b) is the spatial-guided aggregation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. The described embodiments are only some embodiments of the invention, not all embodiments.
Example 1
The invention discloses a target detection method based on multi-scale context awareness, which comprises the following steps:
step S1: constructing a backbone network, and performing pre-training on a large-scale classification data set to extract multi-scale features { C2, C3, C4, C5} of the input image; the backbone network can select an existing deep learning based neural network, such as a residual error network (ResNet) or a multi-branch residual error network (ResNeXt). The backbone network is pre-trained on large-scale taxonomic datasets (such as ImageNet or Microsoft COCO).
Step S2: construct the multi-scale context-aware network. Firstly, the hole residual block generates enhanced high-level features with richer receptive fields by stacking a plurality of residual blocks with different hole (dilation) rates, which reduces the loss of context information in the highest-level features; the residual block with the smallest hole rate comes first and the hole rates increase in order, i.e., the residual blocks are stacked from the smallest rate to the largest. Secondly, the cross-scale context aggregation block adopts an interactive fusion method to better fuse the context information of adjacent layers, providing a more effective supplement to the current layer. Thirdly, an adaptive context aggregation block is proposed to solve the semantic-gap problem: under channel and spatial guidance, the network adaptively learns the weights of different layers to generate a discriminative context.
The hole residual block. As shown in FIG. 2, after the backbone network extracts the top-level feature C5, we input it into the hole residual block to obtain the context-rich feature P5. Each residual block first uses a 1 × 1 convolutional layer to reduce the number of channels, then enhances context semantic information through a 3 × 3 dilated convolutional layer, whose enlarged effective kernel expands the receptive field so that the extracted features carry rich context semantics. Finally, a 1 × 1 convolutional layer restores the number of channels. Note that each 3 × 3 convolutional layer has a different hole rate, e.g., 2, 4, 6, 8.
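A minimal PyTorch sketch of such a block stack, assuming a bottleneck layout; the channel widths (2048 in, 256 reduced), the ReLU placements, and the residual addition are assumptions consistent with common ResNet practice rather than details fixed by the patent.

```python
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Bottleneck residual block whose 3x3 convolution is a dilated ('hole') convolution."""
    def __init__(self, channels, reduced, dilation):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, 1),                   # 1x1: reduce channels
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3,
                      padding=dilation, dilation=dilation),    # 3x3 dilated: larger receptive field
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1))                   # 1x1: restore channels

    def forward(self, x):
        return x + self.body(x)                                # residual connection

class HoleResidualBlock(nn.Module):
    """Stack of dilated residual blocks with hole rates in increasing order."""
    def __init__(self, channels=2048, reduced=256, rates=(2, 4, 6, 8)):
        super().__init__()
        self.blocks = nn.Sequential(
            *[DilatedResidualBlock(channels, reduced, r) for r in rates])

    def forward(self, c5):
        return self.blocks(c5)                                 # context-enriched P5
```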
The cross-scale context aggregation block. Features of adjacent levels are fused through the cross-scale context aggregation module; for example, the feature P4 is obtained by applying the module to the features P5 and C4. As shown in FIG. 3, let the inputs of the cross-scale aggregation block be f(i+1) and f(i); first, each input feature is enhanced by a 3 × 3 convolutional layer:
f(i+1)=Conv(f(i+1))
f(i)=Conv(f(i))
The two branches are then cross-fused: f(i+1) is matched to f(i) by upsampling, and f(i) is matched to f(i+1) by downsampling. The fusion is performed as follows:
h(i+1)=Conv(Down(f(i)))+Conv(f(i+1))
h(i)=Conv(Up(f(i+1)))+Conv(f(i))
o(i)=Conv(h(i))+Conv(Up(h(i+1)))
P(i)=Conv(o(i)+f(i))
Finally, we obtain the enhanced features {P2, P3, P4, P5} through the cross-scale context aggregation blocks.
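A sketch of one cross-scale context aggregation block following the four equations above. Assumptions of this illustration (not fixed by the patent): every Conv is a 3 × 3 convolution, Up is nearest-neighbor interpolation, and Down is adaptive max pooling.

```python
import torch.nn as nn
import torch.nn.functional as F

def conv3x3(c):
    return nn.Conv2d(c, c, 3, padding=1)

class CrossScaleContextAggregation(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.enhance_hi = conv3x3(channels)   # enhances f(i+1)
        self.enhance_lo = conv3x3(channels)   # enhances f(i)
        self.conv_down = conv3x3(channels)    # on Down(f(i))
        self.conv_hi = conv3x3(channels)      # on f(i+1)
        self.conv_up = conv3x3(channels)      # on Up(f(i+1))
        self.conv_lo = conv3x3(channels)      # on f(i)
        self.conv_h_lo = conv3x3(channels)    # on h(i)
        self.conv_h_hi = conv3x3(channels)    # on Up(h(i+1))
        self.conv_out = conv3x3(channels)     # on o(i) + f(i)

    def forward(self, f_hi, f_lo):            # f_hi = f(i+1) (coarser), f_lo = f(i)
        f_hi = self.enhance_hi(f_hi)
        f_lo = self.enhance_lo(f_lo)
        up = lambda x, ref: F.interpolate(x, size=ref.shape[-2:], mode="nearest")
        down = lambda x, ref: F.adaptive_max_pool2d(x, ref.shape[-2:])
        h_hi = self.conv_down(down(f_lo, f_hi)) + self.conv_hi(f_hi)   # h(i+1)
        h_lo = self.conv_up(up(f_hi, f_lo)) + self.conv_lo(f_lo)       # h(i)
        o = self.conv_h_lo(h_lo) + self.conv_h_hi(up(h_hi, f_lo))      # o(i)
        return self.conv_out(o + f_lo)                                 # P(i)
```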
The adaptive context aggregation module. As shown in FIG. 2, the multi-scale features {P2, P3, P4, P5} are input into the channel-guided aggregation module and the spatial-guided aggregation module respectively, generating the corresponding features X_c and X_s. The two features are then fused by element-wise addition to obtain the enhanced feature X_m. Note that we first unify the multi-scale features to a common size (the P4 scale was chosen experimentally) before inputting them into the adaptive context aggregation block.
The channel-guided aggregation module. As shown in FIG. 4(a), given the pyramid features {P2, P3, P4, P5} output by the cross-scale context aggregation block, we obtain their global semantic representation through element-wise addition and input it into a global average pooling (GAP) layer, which outputs global channel information. A 1 × 1 convolutional layer is then used to compress the global channel information. Next, N convolutional layers act on the compressed global channel information to produce the channel weights of the pyramid features, and finally the channel weights and the pyramid features are combined by weighted summation to obtain the feature X_c, where N is the number of pyramid feature levels.
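A sketch of the channel-guided aggregation module under stated assumptions: the compressed width, the 1 × 1 weight convolutions, and the softmax that normalizes the N per-level channel weights are choices of this illustration, not details given in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelGuidedAggregation(nn.Module):
    """Channel-guided aggregation over N pyramid levels already resized to a common scale."""
    def __init__(self, channels=256, levels=4, reduced=64):
        super().__init__()
        self.compress = nn.Conv2d(channels, reduced, 1)              # 1x1 compression
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(reduced, channels, 1) for _ in range(levels)])  # N weight branches

    def forward(self, feats):                    # feats: list of N (B, C, H, W) tensors
        g = torch.stack(feats).sum(dim=0)        # element-wise addition -> global semantics
        g = F.adaptive_avg_pool2d(g, 1)          # GAP -> global channel information
        g = self.compress(g)
        w = torch.stack([conv(g) for conv in self.weight_convs])  # (N, B, C, 1, 1)
        w = torch.softmax(w, dim=0)              # normalize weights across levels (assumption)
        return sum(wi * fi for wi, fi in zip(w, feats))           # X_c
```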
The spatial-guided aggregation module. As shown in FIG. 4(b), a global semantic representation of the pyramid features {P2, P3, P4, P5} is first obtained by element-wise addition. Average pooling and max pooling operations are then used to generate two different kinds of spatial context information, which are fused with a Concat operation. Next, N 7 × 7 convolutional layers act on the fused context information to obtain the spatial weights of the pyramid features, and finally the feature X_s is obtained by weighted summation of the spatial weights and the pyramid features, where N is the number of pyramid feature levels.
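A sketch of the spatial-guided aggregation module; here the average and max pooling are interpreted as channel-wise pooling (CBAM-style), and the softmax normalization of the N spatial weight maps is an assumption.

```python
import torch
import torch.nn as nn

class SpatialGuidedAggregation(nn.Module):
    """Spatial-guided aggregation over N pyramid levels already resized to a common scale."""
    def __init__(self, levels=4):
        super().__init__()
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(2, 1, 7, padding=3) for _ in range(levels)])  # N 7x7 branches

    def forward(self, feats):                    # feats: list of N (B, C, H, W) tensors
        g = torch.stack(feats).sum(dim=0)        # element-wise addition -> global semantics
        avg = g.mean(dim=1, keepdim=True)        # average pooling across channels
        mx = g.max(dim=1, keepdim=True).values   # max pooling across channels
        ctx = torch.cat([avg, mx], dim=1)        # Concat -> (B, 2, H, W) spatial context
        w = torch.stack([conv(ctx) for conv in self.weight_convs])  # (N, B, 1, H, W)
        w = torch.softmax(w, dim=0)              # normalize weights across levels (assumption)
        return sum(wi * fi for wi, fi in zip(w, feats))             # X_s
```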
The dependency enhancement module. We apply a dependency enhancement module to the feature X_m to generate the more discriminative feature X_o. Experiments with existing attention blocks (such as SEBlock, CBAM, Non-local, and GCBlock) show that both GCBlock and Non-local work well, but Non-local imposes a significant parameter and computation burden compared with GCBlock. GCBlock (the global context block) is therefore selected as the default setting; by effectively capturing long-range dependencies, it further improves accuracy.
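Since GCBlock is named as the default dependency enhancement module, a simplified global context block in the style of the public GCNet design is sketched below; the reduction width and the LayerNorm placement follow GCNet conventions and are not details given in the patent.

```python
import torch
import torch.nn as nn

class GCBlock(nn.Module):
    """Simplified global context block (GCNet-style)."""
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, 1)      # per-position attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, reduced, 1),
            nn.LayerNorm([reduced, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1))

    def forward(self, x):                                  # x = X_m, shape (B, C, H, W)
        b, c, h, w = x.shape
        attn = self.context_mask(x).view(b, 1, h * w)
        attn = torch.softmax(attn, dim=-1)                 # attention over all positions
        ctx = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
        return x + self.transform(ctx)                     # X_o
```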
The feature X_o is then matched to the scales of the features {P2, P3, P4, P5} by upsampling or downsampling respectively, and the features {O2, O3, O4, O5} are finally obtained by element-wise addition. The operation applied to X_o is determined by the scale of each layer's feature: for the i-th layer feature P_i, X_o is upsampled if it is smaller than P_i and downsampled if it is larger.
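A sketch of this redistribution step; the choice of nearest-neighbor upsampling and adaptive max pooling for downsampling is an assumption of the illustration.

```python
import torch.nn.functional as F

def redistribute(x_o, pyramid):
    """Match X_o to each pyramid level by resampling, then fuse by element-wise addition."""
    outs = []
    for p in pyramid:                                   # pyramid = [P2, P3, P4, P5]
        if x_o.shape[-1] < p.shape[-1]:                 # X_o smaller than P_i -> upsample
            r = F.interpolate(x_o, size=p.shape[-2:], mode="nearest")
        elif x_o.shape[-1] > p.shape[-1]:               # X_o larger than P_i -> downsample
            r = F.adaptive_max_pool2d(x_o, p.shape[-2:])
        else:
            r = x_o
        outs.append(p + r)                              # O_i = P_i + resampled X_o
    return outs                                         # [O2, O3, O4, O5]
```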
Step S3: construct the candidate region generation network, which generates detection boxes. For each point on the feature maps {O2, O3, O4, O5} obtained in step S2, it generates detection boxes with different scales and aspect ratios. The features of these detection boxes are then extracted through an RoI Align layer and input into two network layers: one performs classification, i.e., decides whether the object contained in the box belongs to the foreground; the other outputs the offset of the detection box relative to the ground-truth object box. The detection boxes are preliminarily corrected using the predicted offsets.
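For the RoI feature extraction, a sketch using torchvision's roi_align; the output size (7 × 7), the spatial scale (1/4 for the stride-4 map), and the single-level lookup are illustrative assumptions, since in practice candidate boxes are assigned across pyramid levels.

```python
import torch
from torchvision.ops import roi_align

# Hypothetical inputs: o2 is the finest fused map (stride 4); boxes is a (K, 5) tensor
# whose rows are (batch_index, x1, y1, x2, y2) in image coordinates.
o2 = torch.randn(1, 256, 200, 200)
boxes = torch.tensor([[0.0, 32.0, 48.0, 96.0, 128.0]])

roi_feats = roi_align(o2, boxes, output_size=(7, 7),
                      spatial_scale=0.25,    # image coords -> stride-4 feature coords
                      sampling_ratio=2)      # 7x7 feature per candidate box
```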
Step S4: construct the head detection module, which classifies and regresses the corrected detection boxes again. The head detection module includes a classification module, which outputs the classification result for each detection box, and a position regression module, which outputs the offset of each detection box relative to the ground-truth target.
Step S5: train the network with a gradient descent algorithm; training stops when a pre-specified number of epochs is reached.
Step S6: test the network.
Example 2
Embodiment 2 of the present invention provides an electronic device comprising a memory and a processor. The memory stores a target detection program based on multi-scale context awareness which, when executed by the processor, causes the processor to perform a target detection method based on multi-scale context awareness comprising the following steps:
1) performing multi-scale feature extraction on an input image by using a pre-trained backbone network;
2) fusing the extracted multi-scale features by adopting a multi-scale context-aware network;
3) inputting the fused features into a candidate region generation network to extract candidate boxes, and extracting the features of the candidate boxes through an RoI-Pooling layer;
4) inputting the extracted candidate box features into a head detector to obtain the category and position offset of each detection box. The offsets are used to correct the positions of the candidate boxes generated in step 3). Finally, the final detection result, namely the category and position of the object, is obtained by non-maximum suppression.
Example 3
Embodiment 3 of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, causes the processor to perform a target detection method based on multi-scale context awareness, the method comprising:
1) performing multi-scale feature extraction on an input image by using a pre-trained backbone network;
2) fusing the extracted multi-scale features by adopting a multi-scale context-aware network;
3) inputting the fused features into a candidate region generation network to extract candidate boxes, and extracting the features of the candidate boxes through an RoI-Pooling layer;
4) inputting the extracted candidate box features into a head detector to obtain the category and position information of the detection boxes.
The above description is only a preferred example of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A target detection method based on multi-scale context awareness, comprising the following steps:
1) extracting a plurality of scale features of the image by using a backbone network;
2) enhancing the top-level feature among the multi-scale features through a hole residual block to obtain a top-level feature with high-level semantics;
3) fusing the features of adjacent layers through a cross-scale context aggregation module to generate pyramid features;
4) aggregating the pyramid features through an adaptive context aggregation module to obtain a feature X_m;
5) further enhancing the feature X_m through a dependency enhancement module to generate an enhanced feature X_o;
6) matching the feature X_o to the pyramid features by upsampling or downsampling respectively, and adding the matched features element-wise;
7) inputting the features obtained in the step 6) into a candidate area generation network to generate a candidate frame, and extracting the features of the candidate frame;
8) inputting the candidate box features into a head detection module for prediction to obtain the category and position coordinates of each candidate box; and then filtering the detection results of the candidate boxes by non-maximum suppression to obtain the category and position information of the objects in the candidate boxes.
2. The method of claim 1, wherein the hole residual block comprises a plurality of residual blocks with different hole rates; the top-level feature among the multi-scale features is input into the residual blocks in sequence, wherein each residual block first adopts a 1 × 1 convolutional layer to reduce the number of channels of the input data, then enhances the context semantic information of the input data through a 3 × 3 convolutional layer, and then restores the number of channels of the input data using a 1 × 1 convolutional layer; wherein the 3 × 3 convolutional layers in different residual blocks have different hole rates.
3. The method of claim 1 or 2, wherein the cross-scale context aggregation module generates pyramid features by:
31) respectively enhancing the two input adjacent-layer features f(i+1) and f(i) through a 3 × 3 convolutional layer;
32) upsampling the enhanced feature f(i+1) and matching and fusing it with the enhanced feature f(i) to obtain a feature h(i); downsampling the enhanced feature f(i) and matching and fusing it with the enhanced feature f(i+1) to obtain a feature h(i+1);
33) upsampling the feature h(i+1) and matching and fusing it with the feature h(i) to obtain a feature o(i);
34) matching and fusing the feature o(i) with the feature f(i) of the i-th layer to generate the pyramid feature.
4. The method of claim 1 or 2, wherein the adaptive context aggregation module comprises a channel-guided aggregation module and a spatial-guided aggregation module; the pyramid features are input into the channel-guided aggregation module and the spatial-guided aggregation module respectively to generate corresponding features X_c and X_s; the features X_c and X_s are then fused by element-wise addition to obtain an enhanced feature X_m.
5. The method of claim 4, wherein the channel-guided aggregation module first obtains a global semantic representation of the pyramid features and inputs it into a global average pooling layer; the global average pooling layer processes the global semantic representation to output global channel information; a 1 × 1 convolutional layer is then used to compress the global channel information, N convolutional layers act on the compressed global channel information to obtain the channel weights of the pyramid features, and the channel weights and the pyramid features are then combined by weighted summation to obtain the feature X_c; wherein N is the number of pyramid feature levels.
6. The method of claim 4, wherein the spatial-guided aggregation module first obtains a global semantic representation of the pyramid features; average pooling and maximum pooling operations are then applied to the global semantic representation respectively to generate two different kinds of spatial context information; the two kinds of spatial context information are then fused; N 7 × 7 convolutional layers then act on the fused spatial context information to obtain the spatial weights of the pyramid features, and the feature X_s is finally obtained by weighted summation of the spatial weights and the pyramid features; wherein N is the number of pyramid feature levels.
7. The method of claim 1, wherein the dependency enhancement module is an attention module (GCBlock).
8. The method of claim 1, wherein the candidate region generation network generates detection boxes with different scales and aspect ratios for each point on the features obtained in step 6); the features of these detection boxes are then extracted and input into two network layers, one for classification, i.e., identifying whether the object contained in the detection box belongs to the foreground, and the other for predicting and outputting the offset of the detection box relative to the ground-truth object box; the detection boxes are then corrected using the predicted offsets; and the corrected detection boxes are then classified and regressed again.
9. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202111061082.7A 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness Active CN113743521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111061082.7A CN113743521B (en) 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness


Publications (2)

Publication Number Publication Date
CN113743521A 2021-12-03
CN113743521B 2023-06-27

Family

ID=78737903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111061082.7A Active CN113743521B (en) 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness

Country Status (1)

Country Link
CN (1) CN113743521B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200104584A1 (en) * 2018-09-28 2020-04-02 Aptiv Technologies Limited Object detection system of a vehicle
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area
CN111461110A (en) * 2020-03-02 2020-07-28 华南理工大学 Small target detection method based on multi-scale image and weighted fusion loss
CN111401201A (en) * 2020-03-10 2020-07-10 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN111738110A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
T. WANG et al.: "SSFENet: Spatial and Semantic Feature Enhancement Network for Object Detection", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
XIN, Y. et al.: "Reverse Densely Connected Feature Pyramid Network for Object Detection", Asian Conference on Computer Vision (ACCV 2018) *
TIAN Tingting et al.: "Remote sensing image object detection based on a multi-scale feature fusion network", Laser & Optoelectronics Progress *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920468A (en) * 2021-12-13 2022-01-11 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN113920468B (en) * 2021-12-13 2022-03-15 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN116052026A (en) * 2023-03-28 2023-05-02 石家庄铁道大学 Unmanned aerial vehicle aerial image target detection method, system and storage medium

Also Published As

Publication number Publication date
CN113743521B (en) 2023-06-27

Similar Documents

Publication Title
CN109522966B (en) Target detection method based on dense connection convolutional neural network
CN108764063B (en) Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN113743521B (en) Target detection method based on multi-scale context awareness
CN113628294A (en) Image reconstruction method and device for cross-modal communication system
CN109522958A (en) Based on the depth convolutional neural networks object detection method merged across scale feature
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN114283120B (en) Domain-adaptive-based end-to-end multisource heterogeneous remote sensing image change detection method
CN115222946B (en) Single-stage instance image segmentation method and device and computer equipment
CN112307982A (en) Human behavior recognition method based on staggered attention-enhancing network
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN111507359A (en) Self-adaptive weighting fusion method of image feature pyramid
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN116012722A (en) Remote sensing image scene classification method
CN115830449A (en) Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN114758255A (en) Unmanned aerial vehicle detection method based on YOLOV5 algorithm
CN113723553A (en) Contraband detection method based on selective intensive attention
CN114170526A (en) Remote sensing image multi-scale target detection and identification method based on lightweight network
CN111582057A (en) Face verification method based on local receptive field
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant