GB2614954A - Object detection method based on attention-enhanced bidirectional feature pyramid network (A-BiFPN) - Google Patents

Object detection method based on attention-enhanced bidirectional feature pyramid network (A-BiFPN)

Info

Publication number
GB2614954A
GB2614954A GB2217717.4A GB202217717A GB2614954A GB 2614954 A GB2614954 A GB 2614954A GB 202217717 A GB202217717 A GB 202217717A GB 2614954 A GB2614954 A GB 2614954A
Authority
GB
United Kingdom
Prior art keywords
attention
features
feature
bifpn
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2217717.4A
Other versions
GB202217717D0 (en)
Inventor
Zhang Huanlong
Zhang Jianwei
Shi Kunfeng
Du Qifan
Zhang Jie
Zhang Xuncai
Han Dongwei
Tian Yangyang
Guo Zhimin
Wang Fengxian
Qiao Jianwei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Yangtze River Delta Research Institute of UESTC Huzhou
Publication of GB202217717D0 publication Critical patent/GB202217717D0/en
Publication of GB2614954A publication Critical patent/GB2614954A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/72Data preparation, e.g. statistical preprocessing of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

Object detection method based on an attention-enhanced bidirectional feature pyramid network (A-BiFPN), comprising: inputting an image to a Visual Geometry Group (VGG) network to obtain four feature layers Pin3-6 of different resolutions; inputting the feature layers to a BiFPN and fusing the features at different dimensions through top-bottom and bottom-top path branches, thereby obtaining fused features Pout3-6 containing rich semantic information and detailed information; processing the fused features with a coordinate attention mechanism (grey box) to obtain attention feature maps Y3-6 (not shown); inputting the attention feature maps into a prediction module (predict) for classification and location; and filtering out redundant prediction boxes through non-maximum suppression (NMS) to obtain a final prediction result. Utilising the above method may improve detection of small objects.

Description

Intellectual Property Office Application No GB2217717.4 RTM Date: 22 May 2023 The following terms are registered trade marks and should be read as such wherever they occur in this document: Intel NVIDIA Python Intellectual Property Office is an operating name of the Patent Office www.gov.uk/ipo
OBJECT DETECTION METHOD BASED ON ATTENTION-ENHANCED
BIDIRECTIONAL FEATURE PYRAMID NETWORK (A-BiFPN)
TECHNICAL FIELD
[0001] The present disclosure relates to the technical field of object detection, and in particular, to an object detection method based on an attention-enhanced bidirectional feature pyramid network (A-BiFPN).
BACKGROUND
[0002] Object detection is a popular computer vision technique that works to identify, locate, and label specific objects in input images. It has been used in various computer vision applications such as face recognition, self-driving, etc. In recent years, owing to the development of Convolutional Neural Networks (CNNs) and hardware computing power, object detection based on deep learning has made significant breakthroughs.
[0003] Although great progress has been made in object detection, small object detection, which is widely needed in practical production, remains an unsolved challenge. This is mainly because small objects take up less space and have limited pixels. In addition, after convolution and pooling are conducted multiple times, feature information of the small object in the feature map is seriously lost, resulting in the failure of a detector to accurately detect the small object. Therefore, Liu et al. proposed a typical pyramid structure in the Single Shot Detector (SSD). This pyramid structure creatively uses shallow features for smaller object detection and deep features for larger object detection. However, shallow features contain rich detailed information, while deep features contain more semantic information. Therefore, the SSD algorithm cannot obtain enough detailed and semantic information of a small object from a single feature mapping, making it difficult to effectively detect the small object. To address this problem, many researchers have focused on developing multi-scale feature fusion to obtain richer feature representations. In addition to multi-scale feature fusion, the attention mechanism is also helpful for small object detection. The attention mechanism can learn to generate different weights according to the ability of different channels and positions to represent the object, and locally enhance the important channels and positions, which is conducive to the location and recognition of small objects.
SUMMARY
[0004] In view of the problems existing in the prior art, the present disclosure provides an object detection method based on an attention-enhanced bidirectional feature pyramid network (A-BiFPN). According to the method, firstly, a BiFPN fuses features at different dimensions such that the output features have rich semantic information and detailed information; and then a coordinate attention mechanism enables the network to pay attention to channels and locations related to objects, thereby improving the small object detection performance of the object detection algorithm.
[0005] The technical solution of the present disclosure is implemented as follows.
[0006] The method includes the following steps: S1: inputting an image to a Visual Geometry Group (VGG) network to obtain features $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ of 4 layers;
[0007] S2: inputting $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ to a BiFPN, and fusing the features at different dimensions through top-bottom and bottom-top path branches so as to obtain features $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information;
[0008] S3: processing $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ by a coordinate attention mechanism respectively to obtain attention feature maps Y3, Y4, Y5 and Y6;
[0009] S4: putting the attention feature maps Y3, Y4, Y5 and Y6 of the four layers output by the coordinate attention mechanism into a prediction module for classification and location; and
[0010] S5: filtering out redundant prediction boxes through non-maximum suppression (NMS) to obtain a final prediction result.
[0011] In step S2, the features of different layers are subject to weighted fusion as follows:
[0012] fusing the features of different layers in a fast and normalized manner, where the calculation formula for weighted feature fusion is as follows:
[0013] $O = \dfrac{\sum_i w_i \cdot I_i}{\epsilon + \sum_j w_j}$
[0014] where $w_i \ge 0$ is ensured by using a rectified linear unit (ReLU) after each $w_i$, $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001, and $I_i$ represents the value of the ith input feature.
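By way of illustration, the fast normalized fusion above can be rendered as a small PyTorch module; the use of PyTorch and the name FastNormalizedFusion are assumptions made for this sketch, since the disclosure does not name a framework:

```python
import torch
import torch.nn as nn


class FastNormalizedFusion(nn.Module):
    """Fast normalized weighted fusion: O = sum_i(w_i * I_i) / (eps + sum_j w_j)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # learnable w_i
        self.eps = eps                                        # 0.0001, avoids numerical uncertainty

    def forward(self, inputs):
        # inputs: list of feature maps of identical shape
        w = torch.relu(self.weights)          # ReLU keeps every w_i >= 0
        w = w / (self.eps + w.sum())          # fast normalization
        return sum(wi * x for wi, x in zip(w, inputs))
```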
[0015] In step S2, a process of fusing the features of different layers by the BiFPN is performed specifically as follows:
[0016] For example, a calculation process for fusing features at the fifth layer by the top-bottom path branch is as follows:
[0017] $P_5^{td} = \dfrac{w_1 \cdot P_5^{in} + w_2 \cdot F_{up}(P_6^{in})}{w_1 + w_2 + \epsilon}$
[0018] where $F_{up}$ denotes an upsampling process, $P_5^{in}$ and $P_6^{in}$ respectively denote the input feature at the fifth layer and the input feature at the sixth layer in the BiFPN, $w_1$ and $w_2$ denote weights when $P_5^{in}$ and $F_{up}(P_6^{in})$ are fused, and $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001.
[0019] For example, a calculation process for fusing features at the fifth layer by the bottom-top path branch is as follows:
[0020] $P_5^{out} = \dfrac{w_1' \cdot P_5^{in} + w_2' \cdot P_5^{td} + w_3' \cdot F_{down}(P_4^{out})}{w_1' + w_2' + w_3' + \epsilon}$
[0021] where $F_{down}$ represents a downsampling process. Finally, $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ are fused in the above fusing manner to obtain $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information.
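A minimal sketch of one BiFPN level built from the two formulas above, assuming nearest-neighbour interpolation for $F_{up}$, adaptive max pooling for $F_{down}$, and feature maps that already share a common channel width (all of these choices are assumptions; the disclosure does not fix them):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse(weights, feats, eps=1e-4):
    # fast normalized weighted fusion of same-shaped feature maps
    w = F.relu(weights)
    return sum(wi * x for wi, x in zip(w / (eps + w.sum()), feats))


class BiFPNLevel5(nn.Module):
    """Fusion at the fifth layer: a top-bottom step giving P5_td, then a bottom-top step giving P5_out."""

    def __init__(self):
        super().__init__()
        self.w_td = nn.Parameter(torch.ones(2))   # w1, w2 for P5_in and F_up(P6_in)
        self.w_out = nn.Parameter(torch.ones(3))  # w1', w2', w3' for P5_in, P5_td and F_down(P4_out)

    def forward(self, p4_out, p5_in, p6_in):
        f_up = F.interpolate(p6_in, size=p5_in.shape[-2:], mode="nearest")   # F_up
        p5_td = fuse(self.w_td, [p5_in, f_up])                               # top-bottom branch
        f_down = F.adaptive_max_pool2d(p4_out, p5_in.shape[-2:])             # F_down
        return fuse(self.w_out, [p5_in, p5_td, f_down])                      # bottom-top branch
```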
[0022] The processing of the fused features by a coordinate attention mechanism in step S3 specifically includes:
[0023] S3.1: when an input X has a size of (C×H×W), setting pooling kernels with sizes of (H,1) and (1,W) to encode information of different channels in the horizontal and vertical directions; for the cth channel in the features, calculating the output of a feature with a height of h after pooling as follows:
[0024] $z_c^h(h) = \dfrac{1}{W}\sum_{0 \le i < W} x_c(h, i)$, and
[0025] calculating the output of a feature with a width of w after pooling as follows:
[0026] $z_c^w(w) = \dfrac{1}{H}\sum_{0 \le j < H} x_c(j, w)$;
[0027] S3.2: after performing pooling in the horizontal and vertical directions, transforming from C×W×H to C×W×1 and C×1×H, and transforming C×W×1 to C×1×H for the purpose of integration (the feature maps here have equal height and width, so W = H);
[0028] S3.3: performing concatenation at the third dimension (H + H = 2H) to obtain an attention feature map of C×1×2H;
[0029] S3.4: giving the attention feature map as an input to a 1×1 convolutional layer, after which the channel number changes to C/r and the dimension of the attention feature map changes to C/r×1×2H;
[0030] S3.5: decomposing the attention feature map of C/r×1×2H into two independent tensors $f^h \in R^{C/r \times 1 \times H}$ and $f^w \in R^{C/r \times 1 \times W}$ along the spatial dimension;
[0031] S3.6: then restoring the channel numbers of the two tensors to C through two 1×1 convolutional layers $F_h$ and $F_w$, and using a sigmoid activation function to obtain weight matrices $g^h$ and $g^w$ as follows:
[0032] $g^h = \sigma(F_h(f^h))$
[0033] $g^w = \sigma(F_w(f^w))$, and
[0034] S3.7: multiplying the input feature X by the weight matrices to obtain the final output Y of the coordinate attention module as follows:
[0035] $y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$
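A compact PyTorch rendering of steps S3.1-S3.7 is sketched below; the module and parameter names, the use of adaptive average pooling for the (H,1)/(1,W) kernels, and the default reduction ratio r = 32 are assumptions made for illustration:

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Coordinate attention over an input of size C x H x W (steps S3.1-S3.7)."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)                     # C/r channels after the first 1x1 conv
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))           # (1,W) kernel: average over the width -> C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))           # (H,1) kernel: average over the height -> C x 1 x W
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)    # S3.4: channel reduction to C/r
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)   # F_h
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)   # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                                    # S3.1: height-wise encoding, C x H x 1
        x_w = self.pool_w(x).permute(0, 1, 3, 2)                # S3.1-S3.2: width-wise encoding, permuted to C x W x 1
        y = self.conv1(torch.cat([x_h, x_w], dim=2))            # S3.3-S3.4: concatenation -> C/r x (H+W) x 1
        f_h, f_w = torch.split(y, [h, w], dim=2)                # S3.5: split back along the spatial dimension
        g_h = torch.sigmoid(self.conv_h(f_h))                   # S3.6: g^h, shape C x H x 1
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # S3.6: g^w, shape C x 1 x W
        return x * g_h * g_w                                    # S3.7: y_c(i,j) = x_c(i,j) * g^h_c(i) * g^w_c(j)
```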
[0036] Compared with the prior art, the present disclosure has the following beneficial effects: the A-BiFPN fuses the features at different dimensions through top-bottom and bottom-top path branches so as to obtain features containing rich semantic information and detailed information. In addition, each feature output branch is processed by coordinate attention such that the network can easily pay attention to the channels and locations related to objects in the feature maps, thereby achieving precise classification and location of the objects.
[0037] A Visual Geometry Group (VGG) network or VGGNet is a neural network, in particular a convolutional neural network, such as a deep convolutional neural network for image recognition. Such a neural network may be trained by the Visual Geometry Group (VGG) at the University of Oxford. This is set out at: https://machinelearning.wtf/terms/vggnet/.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] FIG. 1 is a network structure diagram according to the present disclosure;
[0039] FIG. 2(a) is a network structure diagram of the coordinate attention model (coordinate attention mechanism);
[0040] FIG. 2(b) is a flowchart of the coordinate attention model (coordinate attention mechanism);
[0041] FIG. 3 shows a comparison of detection results on the NWPU VHR-10 dataset between the present disclosure and the original SSD algorithm, where the detection results of the original SSD algorithm are shown; and
[0042] FIG. 4 shows a comparison of detection results on the NWPU VHR-10 dataset between the present disclosure and the original SSD algorithm, where the detection results of the improved SSD algorithm are shown.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0043] The technical solutions of the embodiments of the present disclosure are clearly and completely described below with reference to the accompanying drawings. Apparently, the described embodiments are merely a part rather than all of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts should fall within the protection scope of the present disclosure.
[0044] As shown in FIG. 1, an embodiment of the present disclosure provides an object detection method based on an A-BiFPN, where the method includes the following steps:
[0045] S1: Input an image to a Visual Geometry Group (VGG) network; the VGG network conducts feature extraction on the input image to obtain features $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ of four layers.
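A sketch of this backbone step under stated assumptions: a torchvision VGG-16 trunk with illustrative tap points after the conv3, conv4 and conv5 stages plus one extra stride-2 block for the fourth map. The disclosure does not specify which VGG layers yield $P_3^{in}$-$P_6^{in}$, nor the channel projections that would normally bring the maps to a common width before BiFPN fusion:

```python
import torch.nn as nn
import torchvision


class VGGBackbone(nn.Module):
    """Extracts four feature maps of decreasing resolution from a VGG-16 trunk (tap points are illustrative)."""

    def __init__(self):
        super().__init__()
        features = torchvision.models.vgg16().features
        self.stage3 = features[:16]     # through relu3_3 -> P3_in (256 channels)
        self.stage4 = features[16:23]   # through relu4_3 -> P4_in (512 channels)
        self.stage5 = features[23:30]   # through relu5_3 -> P5_in (512 channels)
        self.stage6 = nn.Sequential(    # extra stride-2 block -> P6_in (512 channels)
            nn.MaxPool2d(2), nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True)
        )

    def forward(self, image):
        p3_in = self.stage3(image)
        p4_in = self.stage4(p3_in)
        p5_in = self.stage5(p4_in)
        p6_in = self.stage6(p5_in)
        return p3_in, p4_in, p5_in, p6_in
```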
[0046] S2: Input $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ to a BiFPN, and fuse the features at different dimensions through top-bottom and bottom-top path branches so as to obtain features $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information.
[0047] The features of different layers are subject to weighted fusion as follows:
[0048] Fuse the features of different layers in a fast and normalized manner, where the calculation formula for weighted feature fusion is as follows:
[0049] $O = \dfrac{\sum_i w_i \cdot I_i}{\epsilon + \sum_j w_j}$
[0050] where $w_i \ge 0$ is ensured by using a rectified linear unit (ReLU) after each $w_i$, $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001, and $I_i$ represents the value of the ith input feature.
[0051] A process of fusing the features of different layers by the BiFPN is performed specifically as follows:
[0052] For example, a calculation process for fusing features at the fifth layer by the top-bottom path branch is as follows:
[0053] $P_5^{td} = \dfrac{w_1 \cdot P_5^{in} + w_2 \cdot F_{up}(P_6^{in})}{w_1 + w_2 + \epsilon}$
[0054] where $F_{up}$ denotes an upsampling process, $P_5^{in}$ and $P_6^{in}$ respectively denote the input feature at the fifth layer and the input feature at the sixth layer in the BiFPN, $w_1$ and $w_2$ denote weights when $P_5^{in}$ and $F_{up}(P_6^{in})$ are fused, and $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001.
[0055] For example, a calculation process for fusing features at the fifth layer by the bottom-top path branch is as follows:
[0056] $P_5^{out} = \dfrac{w_1' \cdot P_5^{in} + w_2' \cdot P_5^{td} + w_3' \cdot F_{down}(P_4^{out})}{w_1' + w_2' + w_3' + \epsilon}$
[0057] where $F_{down}$ denotes a downsampling process. Finally, $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ are fused in the above fusing manner to obtain $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information.
[0058] S3: Process $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ by a coordinate attention mechanism respectively to obtain attention feature maps Y3, Y4, Y5 and Y6. For example, the processing of the input feature map $P_3^{out}$ by the coordinate attention model is specifically as follows:
[0059] S3.1: When $P_3^{out}$ has a size of (256×10×10), set pooling kernels with sizes of (10,1) and (1,10) to encode information of different channels in the horizontal and vertical directions; and for the cth channel in the features, calculate the output of a feature with a height of h after pooling as follows:
[0060] $z_c^h(h) = \dfrac{1}{10}\sum_{0 \le i < 10} x_c(h, i)$
[0061] Calculate the output of a feature with a width of w after pooling as follows:
[0062] $z_c^w(w) = \dfrac{1}{10}\sum_{0 \le j < 10} x_c(j, w)$
[0063] S3.2: After performing pooling in the horizontal and vertical directions, transform from 256×10×10 to 256×10×1 and 256×1×10. Transform 256×10×1 to 256×1×10 for the purpose of integration.
[0064] S3.3: Perform concatenation at the third dimension (10 + 10 = 20) to obtain an attention feature map of 256×1×20.
[0065] S3.4: Give the attention feature map as an input to a 1×1 convolutional layer, after which the channel number changes to 8 and the dimension of the attention feature map changes to 8×1×20.
[0066] S3.5: Decompose the attention feature map of 8×1×20 into two independent tensors $f^h \in R^{8 \times 1 \times 10}$ and $f^w \in R^{8 \times 1 \times 10}$ along the spatial dimension.
[0067] S3.6: Then restore the channel numbers of the two tensors to 256 through two 1×1 convolutional layers $F_h$ and $F_w$, and use a sigmoid activation function to obtain weight matrices $g^h$ and $g^w$ as follows:
[0068] $g^h = \sigma(F_h(f^h))$
[0069] $g^w = \sigma(F_w(f^w))$
[0070] S3.7: Multiply the input feature $P_3^{out}$ by the weight matrices to obtain the final output Y3 of the coordinate attention module as follows:
[0071] $y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$
[0072] S3.8: Sequentially process $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ according to steps S3.1-S3.7 to obtain attention feature maps Y4, Y5 and Y6.
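Reusing the CoordinateAttention sketch given after step S3.7 in the Summary, the shapes of this embodiment can be checked as follows (the batch size of 1 and the reduction ratio r = 32, giving 256/32 = 8 internal channels, are assumptions):

```python
import torch

# P3_out of this embodiment: 256 channels at 10 x 10 spatial resolution.
ca = CoordinateAttention(channels=256, reduction=32)   # 256/32 = 8 channels inside, as in S3.4
p3_out = torch.randn(1, 256, 10, 10)
y3 = ca(p3_out)
print(y3.shape)   # torch.Size([1, 256, 10, 10]) -- Y3 keeps the input size
```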
[0073] S4: Put the attention feature maps Y3, Y4, Y5 and Y6 of the four layers output by the coordinate attention mechanism into a prediction module for classification and location; and
[0074] S5: Filter out redundant prediction boxes through non-maximum suppression (NMS) to obtain a final prediction result.
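For step S5, a minimal sketch using torchvision's NMS operator; the (x1, y1, x2, y2) box format and the 0.45 IoU threshold are assumptions, since the disclosure does not specify them:

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 60., 60.],      # predicted boxes in (x1, y1, x2, y2) format
                      [12., 12., 62., 62.],      # heavily overlaps the first box
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.90, 0.75, 0.80])        # classification confidences
keep = nms(boxes, scores, iou_threshold=0.45)    # indices of retained boxes, sorted by decreasing score
print(keep)                                      # tensor([0, 2]): the redundant second box is filtered out
```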
[0075] As shown in FIGs. 3-4, FIG. 3 shows a comparison of detection results on the NWPU VHR-10 dataset between the original SSD object detection algorithm and the object detection method based on the A-BiFPN in the present disclosure. The comparison shows a 7.92% improvement in detection performance. The example of the present disclosure is implemented on a computer with an Intel Platinum 8163 CPU (2.50 GHz), 256 GB RAM and an NVIDIA TITAN RTX using Python 3.6. The present disclosure uses the NWPU VHR-10 dataset as the experimental material and mean average precision (mAP) as the evaluation indicator. The dataset includes 10 object types, namely airplanes, ships, storage tanks, baseball diamonds, tennis courts, basketball courts, ground track fields, harbors, bridges and vehicles, with a total of 520 training samples and 280 test samples. The training samples are used to train the object detection model, and the test samples are used to evaluate the detection effect of the model.
[0076] The above described are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modifications, equivalent replacements and improvements made within the principle of the present disclosure should all fall within the scope of protection of the present disclosure.

Claims (4)

  1. WHAT IS CLAIMED IS: 1. An object detection method based on an attention-enhanced bidirectional feature pyramid network (A-BiFPN), comprising the following steps: S1: inputting an image to a Visual Geometry Group (VGG) network to obtain features $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ of 4 layers; S2: inputting $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ to a BiFPN, and fusing the features at different dimensions through top-bottom and bottom-top path branches so as to obtain features $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information; S3: processing $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ by a coordinate attention mechanism respectively to obtain attention feature maps Y3, Y4, Y5 and Y6; S4: putting the attention feature maps Y3, Y4, Y5 and Y6 of four layers output by the coordinate attention mechanism into a prediction module for classification and location; and S5: filtering out redundant prediction boxes through non-maximum suppression (NMS) to obtain a final prediction result.
  2. The object detection method based on an A-BiFPN according to claim 1, wherein the fusing in step S2 specifically comprises: fusing the features of different layers in a fast and normalized manner, wherein the calculation formula for weighted feature fusion is as follows: $O = \dfrac{\sum_i w_i \cdot I_i}{\epsilon + \sum_j w_j}$, wherein $w_i \ge 0$ is ensured by using a rectified linear unit (ReLU) after each $w_i$, $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001, and $I_i$ represents the value of the ith input feature.
  3. The object detection method based on an A-BiFPN according to claim 1 or claim 2, wherein in step S2, a process of fusing features at the fifth layer through a top-bottom path branch is expressed as follows: $P_5^{td} = \dfrac{w_1 \cdot P_5^{in} + w_2 \cdot F_{up}(P_6^{in})}{w_1 + w_2 + \epsilon}$, wherein $F_{up}$ denotes an upsampling process, $P_5^{in}$ and $P_6^{in}$ respectively denote an input feature at a fifth layer and an input feature at a sixth layer in the BiFPN, $w_1$ and $w_2$ denote weights when $P_5^{in}$ and $F_{up}(P_6^{in})$ are fused, and $\epsilon$ is configured to avoid numerical uncertainty with a value of 0.0001; and a process of fusing features at the fifth layer through a bottom-top path branch is expressed as follows: $P_5^{out} = \dfrac{w_1' \cdot P_5^{in} + w_2' \cdot P_5^{td} + w_3' \cdot F_{down}(P_4^{out})}{w_1' + w_2' + w_3' + \epsilon}$, wherein $F_{down}$ denotes a downsampling process; and finally, $P_3^{in}$, $P_4^{in}$, $P_5^{in}$ and $P_6^{in}$ are fused in the above fusing manner to obtain $P_3^{out}$, $P_4^{out}$, $P_5^{out}$ and $P_6^{out}$ containing rich semantic information and detailed information.
  4. The object detection method based on an A-BiFPN according to any preceding claim, wherein the processing of the fused features by a coordinate attention mechanism in step S3 specifically comprises: S3.1: when an input X has a size of C×H×W, setting pooling kernels with sizes of (H,1) and (1,W) to encode information of different channels in horizontal and vertical directions; and for the cth channel in the features, calculating the output of a feature with a height of h after pooling as follows: $z_c^h(h) = \dfrac{1}{W}\sum_{0 \le i < W} x_c(h, i)$, and calculating the output of a feature with a width of w after pooling as follows: $z_c^w(w) = \dfrac{1}{H}\sum_{0 \le j < H} x_c(j, w)$; S3.2: after performing pooling in the horizontal and vertical directions, transforming from C×W×H to C×W×1 and C×1×H, and transforming C×W×1 to C×1×H; S3.3: performing concatenation at the third dimension to obtain an attention feature map of C×1×2H; S3.4: giving the attention feature map as an input to a 1×1 convolutional layer, wherein afterwards, the channel number changes to C/r and the dimension of the attention feature map changes to C/r×1×2H; S3.5: decomposing the attention feature map of C/r×1×2H into two independent tensors $f^h \in R^{C/r \times 1 \times H}$ and $f^w \in R^{C/r \times 1 \times W}$ along a spatial dimension; S3.6: then restoring the channel numbers of the two tensors to C through two 1×1 convolutional layers $F_h$ and $F_w$, and using a sigmoid activation function to obtain weight matrices $g^h$ and $g^w$ as follows: $g^h = \sigma(F_h(f^h))$ and $g^w = \sigma(F_w(f^w))$; and S3.7: multiplying the input feature X by the weight matrices to obtain a final output Y of the coordinate attention module as follows: $y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$.
GB2217717.4A 2022-05-23 2022-11-25 Object detection method based on attention-enhanced bidirectional feature pyramid network (A-BiFPN) Pending GB2614954A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210567741.2A CN114972860A (en) 2022-05-23 2022-05-23 Target detection method based on attention-enhanced bidirectional feature pyramid network

Publications (2)

Publication Number Publication Date
GB202217717D0 GB202217717D0 (en) 2023-01-11
GB2614954A true GB2614954A (en) 2023-07-26

Family

ID=82984798

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2217717.4A Pending GB2614954A (en) 2022-05-23 2022-11-25 Object detection method based on attention-enhanced bidirectional feature pyramid network (A-BiFPN)

Country Status (2)

Country Link
CN (1) CN114972860A (en)
GB (1) GB2614954A (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565077A (en) * 2022-09-29 2023-01-03 哈尔滨天枢问道技术有限公司 Remote sensing image small target detection algorithm based on spatial feature integration
CN116189021B (en) * 2023-02-27 2024-04-09 中国人民解放军国防科技大学 Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method
CN117315458B (en) * 2023-08-18 2024-07-12 北京观微科技有限公司 Target detection method and device for remote sensing image, electronic equipment and storage medium
CN117351359B (en) * 2023-10-24 2024-06-21 中国矿业大学(北京) Mining area unmanned aerial vehicle image sea-buckthorn identification method and system based on improved Mask R-CNN
CN117636172B (en) * 2023-12-06 2024-06-21 中国科学院长春光学精密机械与物理研究所 Target detection method and system for weak and small target of remote sensing image
CN117876831A (en) * 2024-01-15 2024-04-12 国家粮食和物资储备局科学研究院 Target detection and identification method, device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591648A (en) * 2021-07-22 2021-11-02 北京工业大学 Method, system, device and medium for detecting real-time image target without anchor point
CN114332620A (en) * 2021-12-30 2022-04-12 杭州电子科技大学 Airborne image vehicle target identification method based on feature fusion and attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401201B (en) * 2020-03-10 2023-06-20 南京信息工程大学 Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN111914917A (en) * 2020-07-22 2020-11-10 西安建筑科技大学 Target detection improved algorithm based on feature pyramid network and attention mechanism
CN112396115B (en) * 2020-11-23 2023-12-22 平安科技(深圳)有限公司 Attention mechanism-based target detection method and device and computer equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591648A (en) * 2021-07-22 2021-11-02 北京工业大学 Method, system, device and medium for detecting real-time image target without anchor point
CN114332620A (en) * 2021-12-30 2022-04-12 杭州电子科技大学 Airborne image vehicle target identification method based on feature fusion and attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG et al., Recognition and detection of Wolfberry in the natural background based on improved YOLOv5 network, 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), IEEE, 20 May 2022, pp256 *

Also Published As

Publication number Publication date
CN114972860A (en) 2022-08-30
GB202217717D0 (en) 2023-01-11

Similar Documents

Publication Publication Date Title
GB2614954A (en) Object detection method based on attention-enhanced bidirectional feature pyramid network (A-BiFPN)
CN109902677B (en) Vehicle detection method based on deep learning
CN109522966B (en) Target detection method based on dense connection convolutional neural network
CN110176027A (en) Video target tracking method, device, equipment and storage medium
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN110287960A (en) The detection recognition method of curve text in natural scene image
CN109977997A (en) Image object detection and dividing method based on convolutional neural networks fast robust
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN105224935A (en) A kind of real-time face key point localization method based on Android platform
Lu et al. A cnn-transformer hybrid model based on cswin transformer for uav image object detection
CN112434586A (en) Multi-complex scene target detection method based on domain adaptive learning
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN112070040A (en) Text line detection method for video subtitles
Zhang et al. Sam3d: Zero-shot 3d object detection via segment anything model
CN114742799A (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN113239914A (en) Classroom student expression recognition and classroom state evaluation method and device
CN116434230A (en) Ship water gauge reading method under complex environment
CN116168246A (en) Method, device, equipment and medium for identifying waste slag field for railway engineering
Yuan et al. Faster light detection algorithm of traffic signs based on YOLOv5s-A2
CN114943888A (en) Sea surface small target detection method based on multi-scale information fusion, electronic equipment and computer readable medium
Yang et al. Automatic detection of bridge surface crack using improved Yolov5s
CN115410102A (en) SAR image airplane target detection method based on combined attention mechanism
Xie et al. Lightweight and anchor-free frame detection strategy based on improved CenterNet for multiscale ships in SAR images
Hou et al. The Improved CenterNet for Ship Detection in Scale-Varying Images

Legal Events

Date Code Title Description
COOA Change in applicant's name or ownership of the application

Owner name: YANGTZE DELTA REGION INSTITUTE (HUZHOU), UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA

Free format text: FORMER OWNER: ZHENGZHOU UNIVERSITY OF LIGHT INDUSTRY