CN114037885A - Target detection method based on size of selectable expansion convolution kernel - Google Patents

Target detection method based on the size of a selectable dilated convolution kernel

Info

Publication number
CN114037885A
CN114037885A (application CN202010705702.5A)
Authority
CN
China
Prior art keywords
convolution
feature
characteristic
sdcm
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010705702.5A
Other languages
Chinese (zh)
Other versions
CN114037885B (en)
Inventor
何小海
熊书琪
吴晓红
陈洪刚
卿粼波
滕奇志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202010705702.5A priority Critical patent/CN114037885B/en
Publication of CN114037885A publication Critical patent/CN114037885A/en
Application granted granted Critical
Publication of CN114037885B publication Critical patent/CN114037885B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on the size of a selectable dilated convolution kernel, and relates to the fields of computer vision and artificial intelligence. First, features are extracted by a convolutional neural network and fused through a feature pyramid; next, the fused feature maps of each layer are passed through selectable-dilation-coefficient convolution modules, which refine them into better features; finally, multi-classification and bounding-box regression are performed on the fused feature layers, and the model is trained iteratively to obtain a multi-scale fused detection result. The method delivers an effective improvement in accuracy while retaining real-time performance for input pictures of suitable size, and can be applied in machine vision, face recognition, autonomous driving, intelligent video surveillance, medical detection, and similar settings.

Description

Target detection method based on the size of a selectable dilated convolution kernel
Technical Field
The invention relates to a target detection method based on the size of a selectable dilated convolution kernel, and belongs to the technical fields of computer vision and intelligent information processing.
Background
Object detection is a foundation of computer vision tasks and of many applications in the field of artificial intelligence. It is defined as follows: given an input RGB image, object detection accomplishes two tasks, detection and recognition, that is, determining what category an object belongs to and finding its position in the picture. The category may be one commonly found in everyday scenes, such as people, animals, or vehicles, and localization is expressed with a bounding box. Object detection is widely used in face recognition, autonomous driving, human-computer interaction, content-based image retrieval, intelligent video surveillance, and so on.
Existing detectors fall mainly into two types: single-stage detectors and two-stage detectors. The two-stage approach splits detection into two processes: region proposals are generated first, and candidate regions are then classified and refined by bounding-box regression. This line of work began with the R-CNN algorithm proposed in 2014, but because the two stages generate a large number of boxes, computational complexity grows greatly and real-time detection is difficult to achieve. The single-stage detector adopts a regression-based idea: it abandons the region-proposal stage and predicts class probabilities and position coordinates directly through anchor points (Anchors), obtaining the final detection result by end-to-end learning. Dropping the proposal stage greatly reduces computational complexity, so real-time detection becomes possible at suitable input resolutions; representative algorithms include YOLOv3, DSSD, and RefineDet. In recent years, the receptive-field problem has drawn increasing attention in computer vision: a larger field of view means the network attends to more context and captures objects more completely, but it often overlooks fine details. A convolution method with a selectable dilation coefficient is therefore proposed to learn the size of the field of view adaptively, so that the network focuses on regions of interest on its own.
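The trade-off sketched above follows from how the dilation rate widens a kernel's field of view without adding parameters: a k × k kernel with dilation d covers k + (k − 1)(d − 1) input positions per side. A small illustrative helper (not part of the patent):

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Side length of the input patch covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

if __name__ == "__main__":
    # A 3x3 kernel: dilation 1 covers 3x3, dilation 2 covers 5x5, dilation 3 covers 7x7.
    for d in (1, 2, 3):
        print(effective_kernel_size(3, d))  # 3, then 5, then 7
```

This is why selecting between dilation coefficients lets a network trade detail for context without changing its parameter count.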
Disclosure of Invention
The invention provides a target detection method based on the size of a selectable dilated convolution kernel. Its aim is to design a selectable dilated convolution network structure, apply it to a feature pyramid so as to make better use of feature fusion, and then perform target detection.
The invention realizes the purpose through the following technical scheme:
(1) extracting features with the reference network Darknet-53, obtaining multi-scale feature layers after 5 downsampling convolutions and 3 upsampling operations, performing a weighted fusion operation, and finally performing classification and bounding-box regression;
(2) constructing a Selectable Dilated Convolution Module (SDCM);
(3) introducing the SDCM into the feature pyramid fusion: the top-level features are fused with the bottom-level features and then refined with the attention information of the SDCM to obtain more effective features P5, P4, and P3 for multi-classification and regression;
(4) finally, applying these features directly to multi-classification and regression, and training the model iteratively to obtain the final detection result.
Drawings
FIG. 1 is a block diagram of a method for detecting an object according to the present invention.
Fig. 2 is a block diagram of the selectable dilated convolution module of the present invention.
FIG. 3 is a diagram of a feature pyramid fusion module according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings in which:
the selectable dilated convolution module is constructed as follows:
a series of convolution, pooling, and activation layers of the Darknet-53 network produce the feature layers; P5 and P4 of Darknet-53 are fed into the SDCM in turn, and the SDCM outputs are sent to the corresponding feature layers in the upsampling stage for weighted fusion.
The module consists of a convolution function (conv), an activation function (LeakyReLU), and an average pooling function (avgpool). Given an input F, the selectable-dilation-coefficient formula produces the output F′; a channel attention mechanism applied to F′ yields F_C, and a spatial attention mechanism then yields the final output F″.
F′ = w × conv1(F) + (1 − w) × conv2(F)    (1)
where the dilation coefficient of the conv1 function is 1 and that of the conv2 function is 2. The finally output feature map F″ is:
F_C = N_c(F′) ⊗ F′,  F″ = N_s(F_C) ⊗ F_C    (2)
where N_c denotes the channel attention operation, N_s denotes the spatial attention operation, and ⊗ denotes a convolution operation.
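Equation (1) blends a dilation-1 and a dilation-2 convolution of the same input with a scalar weight w. A minimal single-channel NumPy sketch of that selection step (the function names and the fixed 3 × 3 kernel size are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def dilated_conv2d(x, kernel, d):
    """'Same'-padded 2D convolution of x (H x W) with a 3x3 kernel and dilation d.

    With a 3x3 kernel, 'same' padding for dilation d is exactly d pixels per side.
    """
    H, W = x.shape
    xp = np.pad(x, d)
    out = np.zeros((H, W))
    for i in range(3):
        for j in range(3):
            # Tap (i, j) of the kernel samples the padded input at offset (i*d, j*d).
            out += kernel[i, j] * xp[i * d : i * d + H, j * d : j * d + W]
    return out

def sdcm_select(x, k1, k2, w):
    """Eq. (1): F' = w * conv1(F) + (1 - w) * conv2(F), with dilations 1 and 2."""
    return w * dilated_conv2d(x, k1, 1) + (1 - w) * dilated_conv2d(x, k2, 2)

if __name__ == "__main__":
    x = np.arange(16.0).reshape(4, 4)
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0  # identity kernel: both branches return x unchanged
    print(np.allclose(sdcm_select(x, identity, identity, 0.3), x))  # True
```

In the patent the weight w would itself be learned so that the network chooses its receptive field adaptively; here it is just a scalar argument.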
The feature pyramid fusion module is constructed as follows:
The input feature map has size (C × H × W), where H and W are the input height and width and C is the number of channels. The top-level feature P′5 is reduced from 1024 to 512 dimensions by a 3 × 3 convolutional layer; for the detected feature P′4, dimension reduction is likewise performed by a 3 × 3 convolutional layer while the feature map is enlarged. The resulting output P′5 has dimensions (19 × 19 × 255); a weighted element-wise summation of the outputs then gives the fused feature map P′4 with dimensions (38 × 38 × 255). P′3 is then obtained from P′4 by down-sampling, and finally classification and regression operations are performed separately on P′5, P′4, and P′3.
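The weighted element-wise summation above can be sketched as follows; the nearest-neighbour upsampling and the scalar fusion weight are illustrative assumptions, since the text does not fix either choice:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(top, lateral, w=0.5):
    """Weighted element-wise fusion of an upsampled top-level map with a lateral map."""
    up = upsample2x(top)
    assert up.shape == lateral.shape, "maps must align spatially before summation"
    return w * up + (1 - w) * lateral

if __name__ == "__main__":
    p5 = np.random.rand(255, 19, 19)         # top-level map, as in the text
    p4_lateral = np.random.rand(255, 38, 38)  # next pyramid level
    p4 = fuse(p5, p4_lateral)
    print(p4.shape)  # (255, 38, 38)
```

The 19 × 19 to 38 × 38 step matches the dimensions quoted above for a 608 × 608 input.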
To verify the effectiveness of the target detection method based on selectable dilated convolution, experiments were conducted on the MS COCO2017 dataset. The experimental platform is: Ubuntu 20.04, an Nvidia RTX 2080Ti GPU, and an Intel(R) Core(TM) i7-9700 CPU. The deep learning framework is PyTorch; the accuracy metrics are mAP (mean average precision) and AP (average precision), and the speed metric is fps (frames per second).
The MS COCO2017 dataset comprises 118282 training images and 5000 test images. Experiments were trained on the COCO2017 trainval split and tested on the COCO2017 testval split. All experiments were pre-trained on the VGG16 reference network. The learning rate is adjusted with the cosine approach: it is set to 10^-2 for the first 50 epochs, and then to 10^-3 for the following 100 epochs. When the input picture size is 608 × 608, the batch size during training is 16 with 2 GPUs; when the picture size is 416 × 416, the batch size is 32 with 2 GPUs. For testing the batch size is 1, and PyTorch acceleration is not used. The experimental results are shown in Tables 1 and 2: for 608 × 608 input the mAP is 36.2% at a detection speed of 50 fps; for 416 × 416 input the mAP is 36.1% at 60 fps, which is superior to existing one-stage detectors.
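Although the text names a cosine approach, the values it gives describe a stepped schedule; a tiny sketch of the stated steps (the function name is illustrative, and a cosine variant would anneal within each phase):

```python
def learning_rate(epoch: int) -> float:
    """Stepped learning rate as stated in the text: 1e-2 for the first 50
    epochs, then 1e-3 for the following 100 epochs."""
    return 1e-2 if epoch < 50 else 1e-3

if __name__ == "__main__":
    print(learning_rate(0), learning_rate(49), learning_rate(50))  # 0.01 0.01 0.001
```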
TABLE 1 Detection results of different algorithms on the COCO test-dev2017 dataset
TABLE 2 AP comparison of different algorithms on the COCO test-dev2017 dataset for small (AP_s), medium (AP_m), and large (AP_l) targets

Claims (4)

1. A target detection method based on the size of a selectable dilated convolution kernel, comprising the steps of:
(1) extracting features with the reference network Darknet-53, obtaining multi-scale feature layers after 5 downsampling convolutions and 3 upsampling operations, performing a weighted fusion operation, and finally performing classification and bounding-box regression;
(2) constructing a Selectable Dilated Convolution Module (SDCM);
(3) introducing the SDCM into the feature pyramid fusion: the top-level features are fused with the bottom-level features and then refined with the attention information of the SDCM to obtain more effective features P5, P4, and P3 for multi-classification and regression.
2. The method of claim 1, wherein in step (1) the selectable dilated convolution module is introduced as follows:
a series of convolution, pooling, and activation layers of the Darknet-53 network produce the feature layers; P4 and P5 of Darknet-53 are fed into the SDCM in turn, and the SDCM outputs are sent to the corresponding feature layers in the upsampling stage for weighted fusion.
3. The method of claim 1, wherein the selectable dilated convolution module in (2) is constructed as follows:
the network consists of a convolution function (conv), an activation function (LeakyReLU), and an average pooling function (avgpool); given an input F, the selectable-dilation-coefficient formula produces the output F′, a channel attention mechanism applied to F′ yields F_C, and a spatial attention mechanism then yields the final output F″:
F′ = w × conv1(F) + (1 − w) × conv2(F)    (1)
where the dilation coefficient of the conv1 function is 1 and that of the conv2 function is 2; the finally output feature map F″ is:
F_C = N_c(F′) ⊗ F′,  F″ = N_s(F_C) ⊗ F_C    (2)
where N_c denotes the channel attention operation, N_s denotes the spatial attention operation, and ⊗ denotes a convolution operation.
4. The method of claim 1, wherein the feature pyramid fusion with the selectable dilated convolution module in (3) is performed as follows:
the input feature map has size (C × H × W), where H and W are the input height and width and C is the number of channels; the top-level feature P′5 is reduced from 1024 to 512 dimensions by a 3 × 3 convolutional layer; for the feature P′4, dimension reduction is likewise performed by a 3 × 3 convolutional layer while the feature map is enlarged; the resulting output P′5 has dimensions (19 × 19 × 255); a weighted element-wise summation of the outputs then gives the fused feature map P′4 with dimensions (38 × 38 × 255); P′3 is then obtained from P′4 by down-sampling, and finally classification and regression operations are performed separately on P′5, P′4, and P′3.
CN202010705702.5A 2020-07-21 2020-07-21 Target detection method based on selectable expansion convolution kernel size Active CN114037885B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010705702.5A CN114037885B (en) 2020-07-21 2020-07-21 Target detection method based on selectable expansion convolution kernel size

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010705702.5A CN114037885B (en) 2020-07-21 2020-07-21 Target detection method based on selectable expansion convolution kernel size

Publications (2)

Publication Number Publication Date
CN114037885A true CN114037885A (en) 2022-02-11
CN114037885B CN114037885B (en) 2023-06-20

Family

ID=80134077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705702.5A Active CN114037885B (en) 2020-07-21 2020-07-21 Target detection method based on selectable expansion convolution kernel size

Country Status (1)

Country Link
CN (1) CN114037885B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118070851A (en) * 2024-04-22 2024-05-24 Changchun University of Science and Technology Conformer-based multi-scale fusion convolution system for streaming speech recognition

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510012A (en) * 2018-05-04 2018-09-07 四川大学 A kind of target rapid detection method based on Analysis On Multi-scale Features figure
CN109034210A (en) * 2018-07-04 2018-12-18 国家新闻出版广电总局广播科学研究院 Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
WO2018229490A1 (en) * 2017-06-16 2018-12-20 Ucl Business Plc A system and computer-implemented method for segmenting an image
CN109815785A (en) * 2018-12-05 2019-05-28 四川大学 A kind of face Emotion identification method based on double-current convolutional neural networks
CN110276269A (en) * 2019-05-29 2019-09-24 西安交通大学 A kind of Remote Sensing Target detection method based on attention mechanism
CN110532955A (en) * 2019-08-30 2019-12-03 中国科学院宁波材料技术与工程研究所 Example dividing method and device based on feature attention and son up-sampling
CN110705457A (en) * 2019-09-29 2020-01-17 核工业北京地质研究院 Remote sensing image building change detection method
CN110796168A (en) * 2019-09-26 2020-02-14 江苏大学 Improved YOLOv 3-based vehicle detection method
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN等: "Rethinking Atrous Convolution for Semantic Image Segmentation", 《COMPUTER VISION AND PATTERN RECOGNITION》 *
SANGHYUN WOO等: "CBAM: Convolutional Block Attention Module", 《COMPUTER VISION AND PATTERN RECOGNITION》 *
TSUNG-YI LIN等: "Feature Pyramid Networks for Object Detection", 《COMPUTER VISION AND PATTERN RECOGNITION》 *
YAN ZHAO等: "Pyramid Attention Dilated Network for Aircraft Detection in SAR Images", 《IEEE GEOSCIENCE AND REMOTE SENSING LETTERS 》 *
SHAN Qianwen (单倩文) et al.: "Fast target detection and recognition algorithm based on improved multi-scale feature maps", Laser & Optoelectronics Progress *
SHEN Wenxiang (沈文祥); QIN Pinle (秦品乐); ZENG Jianchao (曾建潮): "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications *


Also Published As

Publication number Publication date
CN114037885B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
CN109447034B (en) Traffic sign detection method in automatic driving based on YOLOv3 network
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN109784150B (en) Video driver behavior identification method based on multitasking space-time convolutional neural network
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN110929578A (en) Anti-blocking pedestrian detection method based on attention mechanism
CN110348357B (en) Rapid target detection method based on deep convolutional neural network
CN108009526A (en) A kind of vehicle identification and detection method based on convolutional neural networks
CN110619638A (en) Multi-mode fusion significance detection method based on convolution block attention module
CN111723829B (en) Full-convolution target detection method based on attention mask fusion
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN111915583B (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN106295532B (en) A kind of human motion recognition method in video image
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN113468994A (en) Three-dimensional target detection method based on weighted sampling and multi-resolution feature extraction
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN116977738A (en) Traffic scene target detection method and system based on knowledge enhancement type deep learning
CN115661607A (en) Small target identification method based on improved YOLOv5
Fan et al. Covered vehicle detection in autonomous driving based on faster rcnn
CN114037885B (en) Target detection method based on selectable expansion convolution kernel size
CN112085164B (en) Regional recommendation network extraction method based on anchor-free frame network
CN118115934A (en) Dense pedestrian detection method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant