CN113743521B - Target detection method based on multi-scale context awareness


Info

Publication number
CN113743521B
Authority
CN
China
Prior art keywords
features
feature
pyramid
scale
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111061082.7A
Other languages
Chinese (zh)
Other versions
CN113743521A (en)
Inventor
王伯英
汲如意
张立波
武延军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202111061082.7A priority Critical patent/CN113743521B/en
Publication of CN113743521A publication Critical patent/CN113743521A/en
Application granted granted Critical
Publication of CN113743521B publication Critical patent/CN113743521B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G06F18/254 - Fusion techniques of classification results, e.g. of results related to same input data
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on multi-scale context awareness, which comprises the following steps: 1) extracting features at a plurality of scales from an image; 2) enhancing the top-level feature among the multi-scale features through a dilated residual block to obtain an enhanced top-level feature with richer high-level semantics; 3) fusing the features of adjacent layers to generate pyramid features; 4) aggregating the pyramid features to obtain a feature X_m; 5) further enhancing the feature X_m through a dependency enhancement module to generate an enhanced feature X_o; 6) matching the feature X_o with the pyramid features by upsampling or downsampling, respectively, and adding them element-wise; 7) inputting the features obtained in step 6) into a candidate region generation network to generate candidate boxes, and extracting the features of the candidate boxes; 8) inputting the candidate-box features into a head detection module for prediction, and filtering the detection results of the candidate boxes with a non-maximum suppression method to obtain the category and position information of the objects.

Description

Target detection method based on multi-scale context awareness
Technical Field
The invention relates to the technical field of computer vision, in particular to a target detection method based on multi-scale context awareness.
Background
Object detection is a practical and challenging computer vision task whose purpose is to identify the objects in an image and localize them. In recent years, with the deepening of deep learning research, object detection has developed rapidly and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace. General-purpose object detection is usually divided into two categories: single-stage and two-stage detection. Single-stage detectors process the input image directly to produce detection results. Two-stage detectors first extract candidate regions through an RPN and then refine the detection results based on those candidates. In early studies, object detection used the highest-level features directly to detect objects. However, because of their small spatial scale, the highest-level features are not well suited to detection. To address this problem, feature pyramid techniques that exploit multi-scale features were developed. Mainstream feature pyramid work falls into two categories: neural architecture search and hand-designed methods. NAS-FPN is representative of the search-based approach: it defines a search space and explores the best-performing pyramid structure with a reinforcement learning strategy. Search-based methods achieve higher performance but also have notable drawbacks. First, the resulting structures are extremely complex and hard to interpret. Second, the structures are typically multi-layered and therefore impose a heavy parameter and computation burden. Third, the search cost is prohibitive, involving thousands of TPU hours. In contrast, non-NAS feature pyramid methods are designed manually. FPN is a widely applied hand-designed module, and current FPN-based methods suffer from three problems. (1) The highest-level context information is lost: before fusion, a 1×1 convolution layer reduces the number of feature channels, and since the highest-level features typically have thousands of channels containing rich context information, this reduction discards a great deal of it. (2) The context fusion strategy is inadequate: during fusion, high-level features are matched to shallow features by upsampling and then fused by element-wise addition, but this simple aggregation strategy is not optimal, because levels containing different context information should not be treated identically. (3) There is a semantic gap between features of different levels: feature propagation is unidirectional, so bottom-level features cannot be propagated to higher levels; moreover, high-level semantic information is diluted during propagation, producing semantic differences between layers after fusion.
Disclosure of Invention
To overcome the above problems, the invention aims to provide a target detection method based on multi-scale context awareness, together with an electronic device and a computer-readable storage medium. First, enhanced high-level features with richer receptive fields are generated by a dilated residual block. Second, an interactive fusion method is adopted to better fuse the context information of adjacent layers. Third, an adaptive context aggregation block is proposed to address the semantic-gap problem: under channel and spatial guidance, the network adaptively learns weights for different layers to generate discriminative context. The method brings a significant performance gain to the detection network, which led to the completion of the present invention.
To achieve this object, the invention adopts the following steps:
1) Inputting the sample image into a backbone network to extract multi-scale features {C2, C3, C4, C5};
2) Applying the dilated residual block to the top-level feature C5 extracted by the backbone network to generate an enhanced high-level feature P5 with a richer receptive field, compensating for the loss of high-level information;
3) Fusing the context information of adjacent layers with the cross-scale context aggregation module to obtain the features {P2, P3, P4, P5};
4) Applying the adaptive context aggregation module to the features {P2, P3, P4, P5}, so that the network learns channel and spatial weights for the multi-scale features, and obtaining the feature X_m by weighted summation;
5) Further enhancing the feature X_m through the dependency enhancement module to generate the enhanced feature X_o;
6) Matching the feature X_o to the scale of each feature in {P2, P3, P4, P5} by upsampling or downsampling, and finally adding the matched features element-wise to obtain the features {O2, O3, O4, O5};
7) Inputting the features {O2, O3, O4, O5} into the candidate region generation network to generate candidate boxes, while extracting the features of the candidate boxes with a RoI-Pooling layer;
8) Inputting the candidate-box features into a head detection module (such as the detection head of Faster R-CNN or Mask R-CNN) for prediction. The head detection module comprises a classification module, which produces the category of each candidate box, and a regression module, which predicts position-coordinate offsets used to correct the candidate boxes generated in step 7). Finally, a non-maximum suppression method yields the final detection result, i.e., the category and position of each object, from which it is judged whether an object belongs to a target category.
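As an illustrative sketch of the final filtering in step 8), torchvision's non-maximum suppression can be applied to the corrected, scored boxes; the IoU threshold of 0.5 and the example boxes are assumptions, not values fixed by the invention:

```python
import torch
from torchvision.ops import nms

# Two heavily overlapping boxes and one separate box; NMS keeps the
# highest-scoring box of each overlapping group.
boxes = torch.tensor([[10., 10., 100., 100.],
                      [12., 12., 102., 102.],
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.90, 0.80, 0.75])

keep = nms(boxes, scores, iou_threshold=0.5)   # -> tensor([0, 2])
final_boxes = boxes[keep]                      # filtered detection results
```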
A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the above method.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the above method.
The invention has the following beneficial effects:
1) The invention provides a novel feature pyramid network, the multi-scale context awareness network, comprising three modules: a dilated residual block, a cross-scale context aggregation module, and an adaptive context aggregation module;
2) The target detection method based on multi-scale context awareness obtains a significant performance improvement over target detection baselines.
drawings
FIG. 1 is a flow chart of a target detection method based on multi-scale context awareness according to an embodiment of the present invention;
FIG. 2 shows the multi-scale context-aware target detection framework, with the structure of the dilated residual block on the right side, where CCAB is the cross-scale context aggregation module, CAB is the channel-guided aggregation module, and SAB is the spatial-guided aggregation module;
FIG. 3 shows the network structure of the cross-scale context aggregation module;
FIG. 4 shows the network structure of the adaptive context aggregation module, where (a) is the channel-guided aggregation module and (b) is the spatial-guided aggregation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. The described embodiments are only some, but not all, embodiments of the invention.
Example 1
The target detection method based on multi-scale context awareness comprises the following steps:
step S1: constructing a backbone network, and pre-training on a large-scale classification dataset for extracting multi-scale features { C2, C3, C4, C5} of an input image; the backbone network may select existing deep learning based neural networks such as residual network (ResNet) or multi-branch residual network (ResNeXt) and the like. The backbone network is pre-trained on large-scale classification datasets (such as ImageNet or Microsoft COCO).
Step S2: constructing the multi-scale context awareness network. First, the dilated residual block stacks several residual blocks with different dilation rates, ordered from the smallest rate to the largest, to generate enhanced high-level features with richer receptive fields; this alleviates the loss of context information in the highest-level features. Second, the cross-scale context aggregation block adopts an interactive fusion method to better fuse the context information of adjacent layers, providing a more effective supplement to the current layer. Third, an adaptive context aggregation block is proposed to address the semantic-gap problem: under channel and spatial guidance, the network adaptively learns weights for different layers to generate discriminative context.
The dilated residual block. After the backbone network produces the top-level feature C5, it is input into the dilated residual block to obtain the context-rich feature P5, as shown in Fig. 2. Each residual block first uses a 1×1 convolution layer to reduce the number of channels, then enhances context semantic information with a 3×3 dilated convolution layer; the dilation enlarges the receptive field, so the extracted features carry rich context semantics. Finally, a 1×1 convolution layer restores the number of channels. Notably, each 3×3 convolution layer has a different dilation rate, such as 2, 4, 6, 8.
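A minimal PyTorch sketch of this block follows; the channel-reduction factor of 4 is an assumption, since the text only fixes the 1×1 / dilated 3×3 / 1×1 structure and the example dilation rates 2, 4, 6, 8:

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Bottleneck residual block with a dilated 3x3 convolution."""
    def __init__(self, channels, dilation, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1),                 # 1x1: reduce channels
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=dilation,     # 3x3 dilated conv:
                      dilation=dilation),                # enlarged receptive field
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),                 # 1x1: restore channels
        )

    def forward(self, x):
        return x + self.body(x)                          # residual connection

def make_dilated_residual_stack(channels, rates=(2, 4, 6, 8)):
    # Blocks stacked from small to large dilation rate, as described above.
    return nn.Sequential(*[DilatedResidualBlock(channels, r) for r in rates])

# Usage sketch: P5 = make_dilated_residual_stack(256)(c5_reduced)
```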
The cross-scale context aggregation block. Features of adjacent layers are fused by the cross-scale context aggregation module; for example, the module acts on features P5 and C4 to produce P4. As shown in Fig. 3, suppose the inputs to the cross-scale aggregation block are f(i+1) and f(i); first, each input feature is enhanced by a 3×3 convolution layer:
f(i+1)=Conv(f(i+1))
f(i)=Conv(f(i))
The two branches are then cross-fused: f(i+1) is matched to f(i) by upsampling, and f(i) is matched to f(i+1) by downsampling. The fusion is performed as follows:
h(i+1)=Conv(Down(f(i)))+Conv(f(i+1))
h(i)=Conv(Up(f(i+1)))+Conv(f(i))
o(i)=Conv(h(i))+Conv(Up(h(i+1)))
P(i)=Conv(o(i)+f(i))
Finally, we obtain the enhanced features {P2, P3, P4, P5} through the cross-scale context aggregation block.
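A minimal sketch implementing the four formulas above is given below; treating every Conv as a 3×3 convolution with an unchanged channel count, and realizing Up/Down with nearest-neighbor interpolation and adaptive max pooling, are assumptions beyond what the text specifies:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleAggregation(nn.Module):
    """Cross-scale context aggregation of two adjacent levels."""
    def __init__(self, channels):
        super().__init__()
        conv = lambda: nn.Conv2d(channels, channels, 3, padding=1)
        self.enh_hi, self.enh_lo = conv(), conv()      # input enhancement
        self.c_down, self.c_hi = conv(), conv()        # for h(i+1)
        self.c_up, self.c_lo = conv(), conv()          # for h(i)
        self.c_h_lo, self.c_h_hi = conv(), conv()      # for o(i)
        self.c_out = conv()                            # for P(i)

    def forward(self, f_hi, f_lo):                     # f_hi = f(i+1), f_lo = f(i)
        f_hi, f_lo = self.enh_hi(f_hi), self.enh_lo(f_lo)
        up = lambda x, ref: F.interpolate(x, size=ref.shape[-2:], mode="nearest")
        down = lambda x, ref: F.adaptive_max_pool2d(x, ref.shape[-2:])
        h_hi = self.c_down(down(f_lo, f_hi)) + self.c_hi(f_hi)   # h(i+1)
        h_lo = self.c_up(up(f_hi, f_lo)) + self.c_lo(f_lo)       # h(i)
        o = self.c_h_lo(h_lo) + self.c_h_hi(up(h_hi, h_lo))      # o(i)
        return self.c_out(o + f_lo)                              # P(i)
```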
The adaptive context aggregation module. As shown in Fig. 2, the multi-scale features {P2, P3, P4, P5} are input into the channel-guided aggregation module and the spatial-guided aggregation module, respectively, to generate the corresponding features X_c and X_s. The two features are then fused by element-wise addition to obtain the enhanced feature X_m. Note that the multi-scale features must first be unified to a common scale (the P4 scale in our experiments) before being input into the adaptive context aggregation block.
The channel-guided aggregation module. As shown in Fig. 4(a), given the pyramid features {P2, P3, P4, P5} output by the cross-scale context aggregation block, their global semantic representation is obtained by element-wise addition. A global average pooling (GAP) layer then processes this global semantic representation and outputs global channel information, which is compressed by a 1×1 convolution layer. N convolution layers then act on the compressed global channel information to produce channel weights for the pyramid features, and the feature X_c is finally obtained by weighted summation of the channel weights and the pyramid features, where N is the number of pyramid feature layers.
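The following sketch illustrates one plausible reading of this module; the compression ratio and the sigmoid gating of the per-level channel weights are assumptions, since the text only specifies GAP, 1×1 compression, N convolution layers, and a weighted sum:

```python
import torch
import torch.nn as nn

class ChannelGuidedAggregation(nn.Module):
    """Channel-guided aggregation of N same-scale pyramid features."""
    def __init__(self, channels, num_levels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.gap = nn.AdaptiveAvgPool2d(1)            # global average pool
        self.compress = nn.Conv2d(channels, mid, 1)   # 1x1 channel compression
        self.heads = nn.ModuleList(                   # one conv per pyramid level
            [nn.Conv2d(mid, channels, 1) for _ in range(num_levels)]
        )

    def forward(self, feats):                         # feats: N maps at P4 scale
        g = sum(feats)                                # global semantic representation
        g = torch.relu(self.compress(self.gap(g)))    # compressed channel info
        weights = [torch.sigmoid(h(g)) for h in self.heads]
        return sum(w * f for w, f in zip(weights, feats))   # X_c
```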
The spatial-guided aggregation module. As shown in Fig. 4(b), a global semantic representation of the pyramid features {P2, P3, P4, P5} is first obtained by element-wise addition. Two kinds of spatial context information are then generated by average pooling and max pooling operations and fused by a concat operation. N 7×7 convolution layers then act on the fused context information to produce spatial weights for the pyramid features, and the feature X_s is finally obtained by weighted summation of the spatial weights and the pyramid features.
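A corresponding sketch, reading the average/max pooling as channel-wise pooling in the CBAM style and gating the spatial weights with a sigmoid (both assumptions), could look like this:

```python
import torch
import torch.nn as nn

class SpatialGuidedAggregation(nn.Module):
    """Spatial-guided aggregation of N same-scale pyramid features."""
    def __init__(self, num_levels):
        super().__init__()
        self.heads = nn.ModuleList(                       # one 7x7 conv per level
            [nn.Conv2d(2, 1, 7, padding=3) for _ in range(num_levels)]
        )

    def forward(self, feats):                             # feats: N maps at P4 scale
        g = sum(feats)                                    # global semantic rep.
        avg = g.mean(dim=1, keepdim=True)                 # average pool over channels
        mx = g.max(dim=1, keepdim=True).values            # max pool over channels
        ctx = torch.cat([avg, mx], dim=1)                 # fused spatial context
        weights = [torch.sigmoid(h(ctx)) for h in self.heads]
        return sum(w * f for w, f in zip(weights, feats))  # X_s
```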
The dependency enhancement module. A dependency enhancement module acts on the feature X_m to generate a more discriminative feature X_o. Experiments with existing attention blocks (e.g., SEBlock, CBAM, Non-local, and GCBlock) show that GCBlock and Non-local both work well, but Non-local brings a large number of parameters and heavy computation compared with GCBlock. Accordingly, GCBlock (the global context block) is selected as the default setting here; by effectively capturing long-range dependencies, it further improves accuracy.
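For illustration, a simplified global context block in the style of GCNet (Cao et al.) is sketched below; the bottleneck ratio is an assumption:

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Minimal GCBlock: global context modelling plus bottleneck transform."""
    def __init__(self, channels, ratio=4):
        super().__init__()
        mid = channels // ratio
        self.attn = nn.Conv2d(channels, 1, 1)          # context modelling
        self.transform = nn.Sequential(                # bottleneck transform
            nn.Conv2d(channels, mid, 1),
            nn.LayerNorm([mid, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1),
        )

    def forward(self, x):                              # x = X_m
        b, c, h, w = x.shape
        a = self.attn(x).view(b, 1, h * w).softmax(dim=-1)        # spatial attention
        ctx = torch.bmm(x.view(b, c, h * w), a.transpose(1, 2))   # (b, c, 1)
        ctx = ctx.view(b, c, 1, 1)                                # global context
        return x + self.transform(ctx)                            # X_o
```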
The feature X_o is matched to the scale of each feature in {P2, P3, P4, P5} by upsampling or downsampling, and the features {O2, O3, O4, O5} are then obtained by element-wise addition. The operation is performed separately for each layer: for the i-th layer feature Pi, if the scale of X_o is smaller than that of Pi, X_o is upsampled; if larger, it is downsampled.
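A short sketch of this scale-matching step, using nearest-neighbor interpolation for both directions (an assumption), is:

```python
import torch.nn.functional as F

def broadcast_and_add(x_o, pyramid):
    """Resize X_o to each pyramid level P_i and add element-wise.

    F.interpolate handles both upsampling (X_o smaller than P_i) and
    downsampling (X_o larger than P_i)."""
    outs = []
    for p in pyramid:                                   # {P2, P3, P4, P5}
        x = F.interpolate(x_o, size=p.shape[-2:], mode="nearest")
        outs.append(p + x)                              # {O2, O3, O4, O5}
    return outs
```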
Step S3: constructing the candidate region generation network, which generates detection boxes. For each point on the feature maps {O2, O3, O4, O5} obtained in step S2, it generates detection boxes with different scales and aspect ratios (an illustrative layout is sketched below). The features of the detection boxes are extracted through the RoI Align layer and finally input into two network layers: one classifies whether the object contained in a box belongs to the foreground; the other outputs the offset of the detection box relative to the real object box. The detection boxes are preliminarily corrected with the predicted offsets.
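For illustration only, the detection boxes generated at each feature-map point might be laid out as follows; the particular scales, aspect ratios, and stride handling are assumptions:

```python
import torch

def make_anchors(feat_h, feat_w, stride, scales=(32, 64, 128),
                 ratios=(0.5, 1.0, 2.0)):
    """Boxes of several scales and aspect ratios centred on each point.

    ratio is taken as width/height, so w = s*sqrt(r) and h = s/sqrt(r)."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s * r ** 0.5, s / r ** 0.5
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return torch.tensor(anchors)   # (feat_h*feat_w*len(scales)*len(ratios), 4)
```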
Step S4: constructing the head detection module and reclassifying the corrected detection boxes. The head detection module comprises a classification module, which outputs the classification result of each detection box, and a position regression module, which outputs the offset of each detection box relative to the real target.
Step S5: training the network with a gradient descent algorithm. Training stops when a pre-specified number of epochs is reached.
Step S6: network testing.
Example 2
Embodiment 2 of the present invention provides an electronic device comprising a memory and a processor; when a target detection program based on multi-scale context awareness is executed by the processor, the processor performs a target detection method based on multi-scale context awareness comprising the following steps:
1) Performing multi-scale feature extraction on the input image by using a pre-trained backbone network;
2) Fusing the extracted multi-scale features by adopting a multi-scale context sensing network;
3) Inputting the fused features into a candidate region generation network to generate candidate boxes, and extracting the features of the candidate boxes through a RoI-Pooling layer;
4) The extracted candidate-box features are input to a detection head to obtain the category and position offset of each detection box. The offsets are used to correct the positions of the candidate boxes generated in step 3). Finally, the final detection result, i.e., the category and position of each object, is obtained through a non-maximum suppression method.
Example 3
Embodiment 3 of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, causes the processor to perform a target detection method based on multi-scale context awareness comprising the following steps:
1) Performing multi-scale feature extraction on the input image by using a pre-trained backbone network;
2) Fusing the extracted multi-scale features by adopting a multi-scale context sensing network;
3) Inputting the fused features into a candidate region generation network to generate candidate boxes, and extracting the features of the candidate boxes through a RoI-Pooling layer;
4) The extracted candidate-box features are input to a detection head to obtain the category and position information of each detection box.
The foregoing is merely a preferred example of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in its protection scope.

Claims (9)

1. A target detection method based on multi-scale context awareness, comprising the following steps:
1) Extracting features at a plurality of scales from an image by using a backbone network;
2) Enhancing the top-level feature among the multi-scale features through dilated residual blocks to obtain an enhanced top-level feature with richer high-level semantics;
3) Fusing the features of adjacent layers through a cross-scale context aggregation module to generate pyramid features; the method for generating pyramid features by the cross-scale context aggregation module comprises: 31) enhancing the two input adjacent-layer features f(i+1) and f(i) through a 3×3 convolution layer, respectively; 32) upsampling the enhanced feature f(i+1) and matching and fusing it with the enhanced feature f(i) to obtain a feature h(i); downsampling the enhanced feature f(i) and matching and fusing it with the enhanced feature f(i+1) to obtain a feature h(i+1); 33) upsampling the feature h(i+1) and then matching and fusing it with the feature h(i) to obtain a feature o(i); 34) matching and fusing the feature o(i) and the i-th layer feature f(i) to generate the pyramid features;
4) Aggregating the pyramid features through an adaptive context aggregation module to obtain a feature X_m;
5) Further enhancing the feature X_m through a dependency enhancement module to generate an enhanced feature X_o;
6) Matching the feature X_o with the pyramid features by upsampling or downsampling, respectively, and adding the matched features element-wise;
7) Inputting the features obtained in step 6) into a candidate region generation network to generate candidate boxes, and extracting the features of the candidate boxes;
8) Inputting the features of the candidate boxes into a head detection module for prediction to obtain the category and position coordinates of each candidate box; and filtering the detection results of the candidate boxes by a non-maximum suppression method to obtain the category and position information of the objects in the candidate boxes.
2. The method of claim 1, wherein the dilated residual block comprises a plurality of residual blocks having different dilation rates; the top-level feature among the multi-scale features is input into each residual block in turn, wherein each residual block first adopts a 1×1 convolution layer to reduce the number of channels of the input data, then enhances the context semantic information of the input data through a 3×3 convolution layer, and then uses a 1×1 convolution layer to restore the number of channels; wherein the 3×3 convolution layers in different residual blocks have different dilation rates.
3. The method of claim 1 or 2, wherein the adaptive context aggregation module comprises a channel-guided aggregation module and a spatial-guided aggregation module; the pyramid features are respectively input into the channel-guided aggregation module and the spatial-guided aggregation module to generate corresponding features X_c and X_s; the features X_c and X_s are then fused by element-wise addition to obtain the enhanced feature X_m.
4. The method of claim 3, wherein the channel-guided aggregation module first obtains a global semantic representation of the pyramid features and inputs it into a global average pooling layer; the global average pooling layer processes the global semantic representation to output global channel information; a 1×1 convolution layer then compresses the global channel information, N convolution layers act on the compressed global channel information to obtain channel weights for the pyramid features, and the feature X_c is obtained by weighted summation of the channel weights and the pyramid features; where N is the number of pyramid feature layers.
5. The method of claim 3, wherein the spatial-guided aggregation module first obtains a global semantic representation of the pyramid features; average pooling and max pooling operations are then applied to the global semantic representation respectively to generate two kinds of spatial context information, which are then fused; N 7×7 convolution layers then act on the fused spatial context information to obtain spatial weights for the pyramid features, and the feature X_s is finally obtained by weighted summation of the spatial weights and the pyramid features; where N is the number of pyramid feature layers.
6. The method of claim 1, wherein the dependency enhancement module is an attention module GCBlock.
7. The method of claim 1, wherein the candidate region generation network generates detection boxes with different scales and aspect ratios for each point on the features obtained in step 6); the features of the detection boxes are then extracted and input into two network layers, one network layer being used for classification, i.e., identifying whether the object contained in a detection box belongs to the foreground, and the other network layer predicting and outputting the offset of the detection box relative to the real object box; the detection boxes are then corrected with the predicted offsets;
and the corrected detection boxes are then reclassified and regressed.
8. A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of any of claims 1 to 7.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202111061082.7A 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness Active CN113743521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111061082.7A CN113743521B (en) 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111061082.7A CN113743521B (en) 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness

Publications (2)

Publication Number Publication Date
CN113743521A (en) 2021-12-03
CN113743521B (en) 2023-06-27

Family

ID=78737903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111061082.7A Active CN113743521B (en) 2021-09-10 2021-09-10 Target detection method based on multi-scale context awareness

Country Status (1)

Country Link
CN (1) CN113743521B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920468B (en) * 2021-12-13 2022-03-15 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN116052026B (en) * 2023-03-28 2023-06-09 石家庄铁道大学 Unmanned aerial vehicle aerial image target detection method, system and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area
CN111401201A (en) * 2020-03-10 2020-07-10 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN111461110A (en) * 2020-03-02 2020-07-28 华南理工大学 Small target detection method based on multi-scale image and weighted fusion loss
CN111738110A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936861B2 (en) * 2018-09-28 2021-03-02 Aptiv Technologies Limited Object detection system of a vehicle

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
CN111259758A (en) * 2020-01-13 2020-06-09 中国矿业大学 Two-stage remote sensing image target detection method for dense area
CN111461110A (en) * 2020-03-02 2020-07-28 华南理工大学 Small target detection method based on multi-scale image and weighted fusion loss
CN111401201A (en) * 2020-03-10 2020-07-10 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN111738110A (en) * 2020-06-10 2020-10-02 杭州电子科技大学 Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN112347859A (en) * 2020-10-15 2021-02-09 北京交通大学 Optical remote sensing image saliency target detection method
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Reverse Densely Connected Feature Pyramid Network for Object Detection; Xin, Y. et al.; Asian Conference on Computer Vision (ACCV 2018); 530-545 *
SSFENet: Spatial and Semantic Feature Enhancement Network for Object Detection; T. Wang et al.; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 1500-1504 *
Remote sensing image object detection based on a multi-scale feature fusion network; Tian Tingting et al.; Laser & Optoelectronics Progress, Vol. 59, No. 16; 427-435 *

Also Published As

Publication number Publication date
CN113743521A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN110929736B (en) Multi-feature cascading RGB-D significance target detection method
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN113743521B (en) Target detection method based on multi-scale context awareness
CN110674685B (en) Human body analysis segmentation model and method based on edge information enhancement
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN115222946A (en) Single-stage example image segmentation method and device and computer equipment
CN111507359A (en) Self-adaptive weighting fusion method of image feature pyramid
CN116012722A (en) Remote sensing image scene classification method
EP3671635B1 (en) Curvilinear object segmentation with noise priors
US20230154005A1 (en) Panoptic segmentation with panoptic, instance, and semantic relations
CN117079098A (en) Space small target detection method based on position coding
CN115830449A (en) Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN114359709A (en) Target detection method and device for remote sensing image
Xu et al. Scale-aware squeeze-and-excitation for lightweight object detection
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism
Liu et al. Global-local attention mechanism based small object detection
CN114462490A (en) Retrieval method, retrieval device, electronic device and storage medium of image object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant