CN111612065A - Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling - Google Patents

Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling

Info

Publication number
CN111612065A
Authority
CN
China
Prior art keywords
rois
training
feature
fpn
rpn
Prior art date
Legal status
Pending
Application number
CN202010433145.6A
Other languages
Chinese (zh)
Inventor
朱勉春
许曼玲
戴宪华
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010433145.6A priority Critical patent/CN111612065A/en
Publication of CN111612065A publication Critical patent/CN111612065A/en
Pending legal-status Critical Current


Classifications

    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Neural network architectures; combinations of networks
    • G06N3/08 Neural network learning methods
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-scale feature object detection algorithm based on ratio adaptive pooling (RAP). The method comprises the following steps: (1) collecting a large number of images, dividing them into a training set and a test set in a fixed proportion, and preprocessing the training set; (2) inputting the training set into a pre-trained neural network (ResNet50) for feature extraction to obtain the corresponding feature maps; (3) embedding an RPN in the RAP-combined FPN structure to generate features of different scales and training the RPN; (4) performing RoI Pooling on the RoIs of different scales generated in step (3), then computing the loss and performing classification and finer bounding-box regression; (5) inputting the test-set images into the trained detection model to output detection results. The method effectively alleviates the semantic-information loss of the FPN during fusion and improves detection accuracy.

Description

Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling
Technical Field
The invention relates to the field of image processing, in particular to a multi-scale feature object detection algorithm based on ratio adaptive pooling (hereinafter referred to as RAP).
Background
Object detection is widely applied in pedestrian detection, intelligent driving assistance, intelligent surveillance, flame and smoke detection, intelligent robotics, and other fields. Although object detection technology has developed rapidly, many problems remain, and improving detection accuracy has always been one of its difficulties.
The feature pyramid network (hereinafter abbreviated as FPN) is mainly used to address the multi-scale problem of targets. It is a structure built from top-down feature layers. Because the FPN reduces dimensionality through 1 × 1 convolutions in its top-down pathway and lateral connections, channels are reduced and semantic information may be lost. In particular, the top layer of the FPN is directly reduced in dimension by a 1 × 1 convolution and then passed through a 3 × 3 convolution to generate a new top layer for final prediction; since this layer is not fused with information from any other layer, the reduction in channel count loses semantic information.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-scale feature object detection algorithm based on ratio self-adaptive pooling, which enhances semantic information in the FPN's top-down and lateral-connection fusion process and improves multi-scale object detection accuracy.
The technical scheme of the invention is as follows:
a multi-scale feature object detection algorithm based on ratio adaptive pooling improves a traditional top-down feature pyramid network fusion structure, and the method comprises the following steps:
the image to be measured is input to a convolutional neural network (ResNet50), CxC ═ C, characteristic map generated on behalf of each module of ResNet2,C3,C4,C5The overall framework retains the original structure of the FPN, and the transverse connection is that C is reduced to 256 dimension through convolution of 1 × 1 to M (M) as the transverse connection2,M3,M4,M5Then through the top most M5Enhanced by RAP module, output is recorded as P5
P5 is upsampled and merged with the laterally connected M4, and the output is denoted P4. The operation proceeds layer by layer until the last feature map P2 is output, completing the enhancement process. P = {P2, P3, P4, P5} is sent to subsequent detection.
The specific operation of the RAP enhancement is as follows: the topmost FPN feature M5 passes through a ratio-adaptive stage whose pooling mode is adaptive average pooling; here the pooling coefficients are chosen as α = [0.1, 0.2, 0.3], and the three output feature maps of different resolutions are denoted {A, B, C};
then {A, B, C} each undergo a 3 × 3 convolution followed by bilinear-interpolation upsampling back to the original input resolution, with the outputs denoted {E, F, G};
finally, the three feature maps of equal resolution are concatenated, the channel count is reduced to 256 through a 1 × 1 convolution, and the final enhancement result is output. At this point, the ratio-adaptive pooling process is complete.
An RPN is embedded in the improved FPN for multi-scale feature fusion. For the layers P2, P3, P4, P5, the anchor sizes are defined as 32², 64², 128², 256², 512², and each layer additionally uses 3 aspect ratios {1:2, 1:1, 2:1}, so the whole feature pyramid has 15 kinds of anchors.
when training the RPN, only 256 anchors are selected for training, and the proportion of positive samples to negative samples is 1:1 approximately. The positive and negative samples are delimited, in order to ensure that at least a positive sample participates in training, for each real frame, the anchor with the maximum IoU (cross-over ratio) is taken as the positive sample; in the remaining anchors, if IoU of the corresponding anchor and any one of the real boxes is greater than 0.7, the corresponding anchor is taken as a positive sample of training, and if IoU of the anchor and any one of the real boxes is less than 0.3, the anchor is taken as a negative sample of training.
While the RPN is being trained, it also generates RoIs to feed into the Fast R-CNN network: the RPN selects a large number of anchors (e.g. 12000) and roughly corrects their positions to obtain RoIs, and then non-maximum suppression (NMS) keeps the 2000 highest-scoring RoIs among the 12000.
The network selects 128 RoIs from the 2000 for training. The sampling rule takes RoIs whose IoU with a ground-truth box is greater than 0.5 as positive samples (say N of them); the remaining (128 − N) negative samples are drawn from RoIs whose IoU with every ground-truth box is below 0.5 and at least 0 (or 0.1), giving an approximate 1:3 positive-to-negative ratio.
The 128 RoIs obtained are used for the subsequent operations of RoI Pooling, loss computation, classification, and finer bounding-box regression. The loss function typically adopted for position regression is Smooth L1 loss, and the cross-entropy loss function is typically adopted for the classification problem. When computing the position-regression Smooth L1 loss, the negative samples do not participate.
The training sample set is the 5011 pictures of the open-source PASCAL VOC2007 trainval dataset. Data preprocessing consists only of horizontal and vertical flips applied with probability 0.5, with the bounding-box coordinate information changed accordingly.
The data is input to the improved network for training. The experimental environment runs Ubuntu 16.04 with two GeForce GTX 1080Ti graphics cards (11 GB of video memory each), and the deep learning framework is PyTorch. In the experiment, ResNet50 serves as the feature-extraction network, Faster R-CNN + FPN + RAP as the target-detection framework, and mAP (mean Average Precision) as the evaluation index. The network weights are initialized with the ImageNet-pretrained weights from the official website and optimized by stochastic gradient descent (SGD) with an initial learning rate of 4e-3 (see Table 1). Training runs for 14 epochs, one epoch being one traversal of the dataset (5011 images); the learning rate is decayed at the tenth epoch, and the weight-decay coefficient is 0.0001.
The test set is the 4952 pictures of the open-source PASCAL VOC2007 test dataset, which are input to the trained improved network for testing; the network weights are those saved at the last epoch. The picture size set during training and testing is (1000, 600): the shortest side is at least 600 and the longest side at most 1000.
The beneficial technical effects of the invention are as follows:
1. The application discloses a multi-scale feature object detection algorithm based on ratio self-adaptive pooling (RAP). The classic feature pyramid network (FPN) multi-layer feature-information fusion structure is improved: a ratio self-adaptive module is added and the position at which the 3 × 3 convolution acts in the original FPN is changed, enhancing the semantic information of the FPN top layer and improving detection accuracy.
2. The RAP structure is particularly simple and effectively improves object detection accuracy while adding only a small amount of computation compared with the FPN structure;
3. The RAP is also a portable structure that can act on any multi-scale feature process, not just the feature pyramid.
Drawings
FIG. 1 is a flow chart of the object detection algorithm in the present application.
Fig. 2 is a schematic diagram of an FPN binding Ratio Adaptive Pooling (RAP) structure.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The present application discloses a multi-scale feature object detection algorithm based on ratio adaptive pooling (RAP). Because the FPN reduces dimensionality through 1 × 1 convolutions in its top-down pathway and lateral connections, channels are reduced and semantic information is lost; in particular, the uppermost layer of the FPN is directly reduced in dimension by a 1 × 1 convolution and then passed through a 3 × 3 convolution to generate a new top layer for final prediction, and since this layer is not fused with information from other layers, the reduced channel count loses semantic information. The method improves the classic FPN multi-layer feature-information fusion structure: a ratio self-adaptive module is added and the acting position of the 3 × 3 convolution in the original FPN is changed, enhancing the semantic information of the FPN top layer and improving detection accuracy.
Before the method disclosed by the invention uses Faster R-CNN to detect targets, the Faster R-CNN must be trained, so the method is divided into two parts: the first is the model-training part and the second is the target-detection part on the test set; the main flow is shown in FIG. 1. The method mainly comprises the following steps:
the first part mainly comprises the following steps:
(1) Acquire the sample sets. The training sample set is the 5011 pictures of the open-source PASCAL VOC2007 trainval dataset, and the test sample set is the 4952 pictures of the open-source PASCAL VOC2007 test dataset. Data preprocessing consists only of horizontal and vertical flips applied with probability 0.5, with the bounding-box coordinate information changed accordingly, as sketched below.
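For illustration only, the flip preprocessing with its coordinate changes can be sketched as follows; this is a minimal NumPy sketch assuming boxes in [x1, y1, x2, y2] pixel format, and the function name `random_flip` is our own (the patent provides no code):

```python
import random
import numpy as np

def random_flip(image, boxes, p=0.5):
    """Flip an HWC image and its [x1, y1, x2, y2] boxes, each with probability p."""
    h, w = image.shape[:2]
    if random.random() < p:  # horizontal flip
        image = image[:, ::-1, :].copy()
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]  # new x1 = w - x2, new x2 = w - x1
    if random.random() < p:  # vertical flip
        image = image[::-1, :, :].copy()
        boxes = boxes.copy()
        boxes[:, [1, 3]] = h - boxes[:, [3, 1]]  # new y1 = h - y2, new y2 = h - y1
    return image, boxes
```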
(2) Read in the weights of the base network ResNet50 pre-trained on ImageNet and use them to initialize the parameters of the convolutional neural network, then input the training data into the network. The convolutional neural network extracts features from the image: feature maps computed by successive convolution kernels generally become smaller and smaller, and feature layers whose outputs have the same size are said to belong to the same network stage. For ResNet, the feature-activation output of the final residual structure of each stage is used; these residual-block outputs are denoted {C2, C3, C4, C5}, corresponding to the outputs of conv2, conv3, conv4 and conv5 (see the sketch below).
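As a sketch of this step, the per-stage outputs {C2, ..., C5} can be pulled from a torchvision ResNet-50 pre-trained on ImageNet; only standard torchvision attribute names are used, and the helper `extract_features` is illustrative:

```python
import torch
from torchvision.models import resnet50

backbone = resnet50(pretrained=True)  # ImageNet-pretrained weights

def extract_features(x):
    # stem: conv1 / bn1 / relu / maxpool, then the four residual stages
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    c2 = backbone.layer1(x)   # conv2 output, stride 4,  256 channels
    c3 = backbone.layer2(c2)  # conv3 output, stride 8,  512 channels
    c4 = backbone.layer3(c3)  # conv4 output, stride 16, 1024 channels
    c5 = backbone.layer4(c4)  # conv5 output, stride 32, 2048 channels
    return c2, c3, c4, c5

c2, c3, c4, c5 = extract_features(torch.randn(1, 3, 600, 1000))
```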
(3) Add the RAP structure to the FPN. The implementation steps are as follows:
(3.1) The image to be detected is input to a convolutional neural network (ResNet50); let C = {C2, C3, C4, C5} denote the feature maps generated by each ResNet stage. The overall framework retains the original structure of the FPN: in the lateral connections, each C is reduced to 256 dimensions through a 1 × 1 convolution, denoted M = {M2, M3, M4, M5}; the topmost feature M5 is then enhanced by the RAP module, and the output is denoted P5.
(3.2) P5 is upsampled and merged with the laterally connected M4, and the output is denoted P4. The operation proceeds layer by layer until the last feature map P2 is output, completing the enhancement process. P = {P2, P3, P4, P5} is sent to subsequent detection.
(3.3) The specific operation of the RAP enhancement is as follows: the topmost FPN feature M5 passes through a ratio-adaptive stage whose pooling mode is adaptive average pooling; here the pooling coefficients are chosen as α = [0.1, 0.2, 0.3], and the three output feature maps of different resolutions are denoted {A, B, C};
(3.4) then {A, B, C} each undergo a 3 × 3 convolution followed by bilinear-interpolation upsampling back to the original input resolution, with the outputs denoted {E, F, G};
and (3.5) carrying out feature splicing on the last three feature maps with the same resolution, reducing the channel number to 256 through 1 × 1 convolution, and outputting a final enhancement result. By this point, the rate adaptive pooling process is complete.
(4) Train the RPN network. The concrete steps are as follows:
(4.1) An RPN is embedded in the improved FPN for multi-scale feature fusion. For the layers P2, P3, P4, P5, the anchor sizes are defined as 32², 64², 128², 256², 512², and each layer additionally uses 3 aspect ratios {1:2, 1:1, 2:1}, so the whole feature pyramid has 15 kinds of anchors, as enumerated in the sketch below.
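The 15 anchor shapes can be enumerated as in this sketch; assigning one base size per pyramid level starting at P2, and preserving the anchor area across aspect ratios, are assumptions of ours:

```python
def make_anchors(base_sizes=(32, 64, 128, 256, 512),
                 ratios=(0.5, 1.0, 2.0)):
    """Illustrative anchor shapes: one base size per pyramid level with
    three aspect ratios, i.e. 5 x 3 = 15 anchor types in total."""
    anchors = {}
    for level, s in enumerate(base_sizes, start=2):
        shapes = []
        for r in ratios:
            # keep the anchor area at s**2 while varying height:width = r
            w = s / r ** 0.5
            h = s * r ** 0.5
            shapes.append((round(w), round(h)))
        anchors[f'P{level}'] = shapes
    return anchors

print(make_anchors())  # e.g. P2 -> [(45, 23), (32, 32), (23, 45)]
```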
(4.2) When training the RPN, only 256 anchors are selected for training, with positive and negative samples in an approximate 1:1 ratio. Positive and negative samples are delimited as follows: to ensure that at least one positive sample participates in training, for each ground-truth box the anchor with the maximum IoU (intersection over union) is taken as a positive sample; among the remaining anchors, an anchor whose IoU with any ground-truth box is greater than 0.7 is taken as a positive training sample, and an anchor whose IoU with every ground-truth box is less than 0.3 is taken as a negative training sample (see the sketch below).
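A sketch of this labeling rule plus the 256-anchor subsampling, assuming an IoU matrix of shape (num_anchors, num_ground_truth) has been computed, and using the common convention of −1 for ignored anchors (the function name is ours):

```python
import torch

def label_anchors(ious, num_samples=256):
    """Return 1/0/-1 per anchor for positive / negative / ignored."""
    labels = ious.new_full((ious.size(0),), -1, dtype=torch.long)
    max_iou, _ = ious.max(dim=1)
    labels[max_iou < 0.3] = 0          # negatives: IoU below 0.3 with every GT box
    labels[max_iou > 0.7] = 1          # positives: IoU above 0.7 with some GT box
    labels[ious.argmax(dim=0)] = 1     # best anchor per GT box is always positive
    # subsample to 256 anchors at roughly 1:1 positive:negative
    pos = torch.nonzero(labels == 1).flatten()
    neg = torch.nonzero(labels == 0).flatten()
    num_pos = min(pos.numel(), num_samples // 2)
    num_neg = min(neg.numel(), num_samples - num_pos)
    keep_pos = pos[torch.randperm(pos.numel())[:num_pos]]
    keep_neg = neg[torch.randperm(neg.numel())[:num_neg]]
    out = torch.full_like(labels, -1)
    out[keep_pos] = 1
    out[keep_neg] = 0
    return out
```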
(4.3) While the RPN is being trained, it also generates RoIs to feed into the Fast R-CNN network: the RPN selects a large number of anchors (e.g. 12000) and roughly corrects their positions to obtain RoIs, and then non-maximum suppression (NMS) keeps the 2000 highest-scoring RoIs among the 12000.
(4.4) The network selects 128 RoIs from the 2000 for training. The sampling rule takes RoIs whose IoU with a ground-truth box is greater than 0.5 as positive samples (say N of them); the remaining (128 − N) negative samples are drawn from RoIs whose IoU with every ground-truth box is below 0.5 and at least 0 (or 0.1), giving an approximate 1:3 positive-to-negative ratio (see the sketch below).
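The proposal selection of (4.3) and the 1:3 sampling of (4.4) might look as follows. In this sketch `torchvision.ops.nms` stands in for the NMS step; the NMS IoU threshold of 0.7 and the [0.1, 0.5) negative-IoU window are assumptions, and the proposals are assumed to be already decoded from the top 12000 anchors:

```python
import torch
from torchvision.ops import nms

def sample_rois(proposals, scores, gt_ious, num=128, pos_fraction=0.25):
    """NMS keeps the top-2000 proposals, then 128 RoIs are drawn at ~1:3."""
    keep = nms(proposals, scores, iou_threshold=0.7)[:2000]
    proposals, gt_ious = proposals[keep], gt_ious[keep]
    max_iou, _ = gt_ious.max(dim=1)
    pos = torch.nonzero(max_iou > 0.5).flatten()
    neg = torch.nonzero((max_iou < 0.5) & (max_iou >= 0.1)).flatten()
    n_pos = min(pos.numel(), int(num * pos_fraction))
    n_neg = min(neg.numel(), num - n_pos)
    idx = torch.cat([pos[torch.randperm(pos.numel())[:n_pos]],
                     neg[torch.randperm(neg.numel())[:n_neg]]])
    return proposals[idx], max_iou[idx] > 0.5  # sampled RoIs + positive mask
```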
(4.5) The 128 RoIs obtained are used for subsequent RoI Pooling, where RoIs of different scales use different feature layers as input to the RoI Align layer: a large-scale RoI uses a later pyramid layer such as P5, and a small-scale RoI uses an earlier layer such as P4. A coefficient k is defined to determine which layer's output an RoI is assigned to:

k = ⌊k_0 + log_2(√(wh) / 224)⌋

where 224 is the standard ImageNet input size, k_0 is a reference value set to 5 (representing output at the P5 layer, the original top level), and w and h are the width and height of the RoI. For example, if the RoI is 112 × 112, then k = k_0 − 1 = 5 − 1 = 4, meaning that this RoI should use the P4 feature layer. The value of k is rounded down because the result may not be an integer.
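A worked sketch of this assignment rule; clamping k to the available levels P2-P5 is our addition:

```python
import math

def roi_level(w, h, k0=5, canonical=224):
    """Level-assignment rule: k = floor(k0 + log2(sqrt(w*h) / 224))."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(2, min(5, k))  # clamp to the available levels P2..P5

print(roi_level(112, 112))  # -> 4, i.e. a 112x112 RoI is pooled from P4
```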
(5) Compute the loss, classification and regression; when computing the position-regression Smooth L1 loss, the negative samples do not participate. The loss function typically adopted for position regression is Smooth L1 loss, and the cross-entropy loss function is typically adopted for the classification problem. The formulas are as follows:

L_loc(t, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t_i − v_i)

smooth_L1(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise

where v = (v_x, v_y, v_w, v_h) are the ground-truth translation-scaling parameters and t = (t_x, t_y, t_w, t_h) are the predicted translation-scaling parameters.
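These two losses correspond to built-in PyTorch functions, as in the following sketch; the argument names and the normalization by the number of sampled RoIs are assumptions of ours:

```python
import torch
import torch.nn.functional as F

def detection_losses(cls_logits, labels, box_pred, box_target, positive_mask):
    """Cross-entropy for classification; Smooth L1 on positive samples only."""
    cls_loss = F.cross_entropy(cls_logits, labels)
    loc_loss = F.smooth_l1_loss(box_pred[positive_mask],
                                box_target[positive_mask],
                                reduction='sum') / max(1, labels.numel())
    return cls_loss, loc_loss
```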
During training, the experimental environment runs Ubuntu 16.04 with two GeForce GTX 1080Ti graphics cards (11 GB of video memory each), and the deep learning framework is PyTorch. In the experiment, ResNet50 serves as the feature-extraction network, Faster R-CNN + FPN + RAP as the target-detection framework, and mAP (mean Average Precision) as the evaluation index. The network weights are initialized with the ImageNet-pretrained weights from the official website and optimized by stochastic gradient descent (SGD) with an initial learning rate of 4e-3 (see Table 1). Training runs for 14 epochs, one epoch being one traversal of the dataset (5011 images); the learning rate is decayed at the tenth epoch, and the weight-decay coefficient is 0.0001.
The learning-rate schedule used in training is shown in Table 1.

Table 1: Network training parameters for this experiment

Epoch    Learning rate    Momentum    Weight decay
1~10     4e-3             0.9         0.0001
10~14    4e-4             0.9         0.0001
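Table 1 maps onto PyTorch's SGD and MultiStepLR as in this sketch; the ×0.1 decay at epoch 10 is read off the 4e-3 to 4e-4 change in the table, and the stand-in model is a placeholder:

```python
import torch

model = torch.nn.Linear(1, 1)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=4e-3,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[10], gamma=0.1)
for epoch in range(14):
    # ... one pass over the 5011 trainval images ...
    scheduler.step()
```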
The second part is the target-detection part. After the fused Faster R-CNN + FPN + RAP network is obtained through training, target detection is carried out on the image to be detected through this network, comprising the following steps:
(1) Preprocess the image to be detected; the training set and the test set use single-scale pictures, with the long edge at most 1000 and the short edge at least 600 (a sketch of this resize follows).
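A sketch of this single-scale resize, assuming a CHW image tensor; the function name and the rule of shrinking further when the long side would exceed 1000 follow common Faster R-CNN practice rather than the patent text:

```python
import torch.nn.functional as F

def resize_keep_ratio(image, min_size=600, max_size=1000):
    """Scale so the short side reaches 600 unless the long side would exceed 1000."""
    h, w = image.shape[-2:]
    scale = min_size / min(h, w)
    if scale * max(h, w) > max_size:
        scale = max_size / max(h, w)
    resized = F.interpolate(image.unsqueeze(0), scale_factor=scale,
                            mode='bilinear', align_corners=False).squeeze(0)
    return resized, scale  # scale is also needed to rescale the boxes
```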
(2) Import the image under test into the convolutional neural network ResNet50, which extracts features from the input image. With the RPN embedded in the FPN, the test data's features generate feature maps in the RPN and undergo rough foreground/background classification and rough box regression; finally, finer classification is performed through softmax along with finer bounding-box regression.
(3) The number of convolutional-layer channels in all the above operations is 256, the convolutional neural network is the residual network ResNet50, the classifier used is a softmax classifier, and mAP (mean Average Precision) is used as the evaluation index.
(4) The training sample set is the open-source PASCAL VOC2007 dataset, whose training targets cover all 20 categories, including common categories such as car, cat, airplane, person, bicycle and horse. The test results are shown in Table 2:
Table 2: Detection results based on the improved FPN fusion structure

[Table 2 appeared as an image in the original publication; the per-category AP values are not reproduced here.]
Here, Baseline denotes the detection result of Faster R-CNN (ResNet-50) + FPN, and Ours denotes the detection result of the Faster R-CNN (ResNet-50) + FPN + RAP algorithm.
(4.1) As can be seen from Table 2, the RAP network effectively improves object detection accuracy: using the FPN of FIG. 2 in combination with the RAP structure, the corresponding mAP improves by 1.6% relative to the Baseline (original FPN structure), and the accuracy of every category is higher than the Baseline's, proving the effectiveness of the improved RAP structure.
What has been described above is only a preferred embodiment of the present application, and the present invention is not limited to the above embodiment. It is to be understood that other modifications and variations directly derivable or suggested by those skilled in the art without departing from the spirit and concept of the present invention are to be considered as included within the scope of the present invention.

Claims (6)

1. A multi-scale feature object detection algorithm based on ratio adaptive pooling, which improves the traditional top-down feature pyramid network fusion structure, the method comprising the following steps:

the image to be detected is input to a convolutional neural network (ResNet50); let C = {C2, C3, C4, C5} denote the feature maps generated by each ResNet stage; the overall framework retains the original structure of the FPN: in the lateral connections, each C is reduced to 256 dimensions through a 1 × 1 convolution, denoted M = {M2, M3, M4, M5}; the topmost feature M5 is then enhanced by the RAP module, and the output is denoted P5;

P5 is upsampled and merged with the laterally connected M4, and the output is denoted P4; the operation proceeds layer by layer until the last feature map P2 is output, completing the enhancement process; P = {P2, P3, P4, P5} is sent to subsequent detection;

wherein the specific operation of the RAP enhancement is: the topmost FPN feature M5 passes through a ratio-adaptive stage whose pooling mode is adaptive average pooling; here the pooling coefficients are chosen as α = [0.1, 0.2, 0.3], and the three output feature maps of different resolutions are denoted {A, B, C};

then {A, B, C} each undergo a 3 × 3 convolution followed by bilinear-interpolation upsampling back to the original input resolution, with the outputs denoted {E, F, G};

finally, the three feature maps of equal resolution are concatenated, the channel count is reduced to 256 through a 1 × 1 convolution, and the final enhancement result is output; at this point, the ratio-adaptive pooling process is complete.
2. The method of claim 1, wherein an RPN is embedded in the improved FPN for multi-scale feature fusion; for the layers P2, P3, P4, P5, the anchor sizes are defined as 32², 64², 128², 256², 512², and each layer additionally uses 3 aspect ratios {1:2, 1:1, 2:1}, so the whole feature pyramid has 15 kinds of anchors.
3. The method as claimed in claims 1 and 2, wherein when training the RPN only 256 anchors are selected for training, with positive and negative samples in an approximate 1:1 ratio; positive and negative samples are delimited as follows: to ensure that at least one positive sample participates in training, for each ground-truth box the anchor with the maximum IoU (intersection over union) is taken as a positive sample; among the remaining anchors, an anchor whose IoU with any ground-truth box is greater than 0.7 is taken as a positive training sample, and an anchor whose IoU with every ground-truth box is less than 0.3 is taken as a negative training sample.
4. The method according to claim 3, wherein while the RPN is trained it also generates RoIs that are sent to the Fast R-CNN network: the RPN selects a large number of anchors (e.g. 12000) and roughly corrects their positions to obtain RoIs, and then non-maximum suppression (NMS) keeps the 2000 highest-scoring RoIs among the 12000.
5. The method of claim 4, wherein the network selects 128 RoIs from the 2000 RoIs for training; the sampling rule takes RoIs whose IoU with a ground-truth box is greater than 0.5 as positive samples (say N), and the remaining (128 − N) negative samples are drawn from RoIs whose IoU with every ground-truth box is below 0.5 and at least 0 (or 0.1), giving an approximate 1:3 positive-to-negative ratio.
6. The method of claim 5, wherein the 128 RoIs are selected for subsequent RoI Pooling, loss computation, classification, finer bounding-box regression and the like; the loss function typically adopted for position regression is Smooth L1 loss, and the cross-entropy loss function is typically adopted for the classification problem; when the position-regression Smooth L1 loss is computed, the negative samples do not participate.
CN202010433145.6A 2020-05-21 2020-05-21 Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling Pending CN111612065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010433145.6A CN111612065A (en) 2020-05-21 2020-05-21 Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling

Publications (1)

Publication Number Publication Date
CN111612065A true CN111612065A (en) 2020-09-01

Family

ID=72201920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010433145.6A Pending CN111612065A (en) 2020-05-21 2020-05-21 Multi-scale characteristic object detection algorithm based on ratio self-adaptive pooling

Country Status (1)

Country Link
CN (1) CN111612065A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886357A (en) * 2019-03-13 2019-06-14 哈尔滨工程大学 A kind of adaptive weighting deep learning objective classification method based on Fusion Features
CN110084124A (en) * 2019-03-28 2019-08-02 北京大学 Feature based on feature pyramid network enhances object detection method
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHAOXU GUO ET AL: "AugFPN: Improving Multi-scale Feature Learning for Object Detection", 《HTTPS://ARXIV.ORG/ABS/1912.05384V1》 *
ROSS GIRSHICK: "Fast R-CNN", 《2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 *
SHAOQING REN ET AL: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
YIQING ZHANG ET AL: "Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation", 《SENSORS》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200901