CN111126202B - Optical remote sensing image target detection method based on void feature pyramid network - Google Patents

Optical remote sensing image target detection method based on void feature pyramid network

Info

Publication number
CN111126202B
CN111126202B (application CN201911271302.1A)
Authority
CN
China
Prior art keywords
network
remote sensing
feature
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911271302.1A
Other languages
Chinese (zh)
Other versions
CN111126202A (en)
Inventor
应翔
申继宁
高洁
刘志强
于健
李雪威
喻梅
于瑞国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201911271302.1A
Publication of CN111126202A
Application granted
Publication of CN111126202B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The invention relates to an optical remote sensing image target detection method based on an atrous (void) feature pyramid network, comprising the following steps: S1, dividing the adopted optical remote sensing image data set into a training set and a test set; S2, performing size transformation, standardization, and normalization on the optical remote sensing images in the data set, and applying data augmentation to the training set; S3, constructing an atrous feature pyramid network with atrous convolution, and training the network model with the images in the training set; S4, detecting remote sensing images with the trained target detection model, and analyzing and comparing the detection results. The method is scientifically and reasonably designed; by constructing a novel atrous feature fusion module, it improves the detection performance for multi-scale targets in optical remote sensing images and obtains better generalization ability.

Description

Optical remote sensing image target detection method based on void feature pyramid network
Technical Field
The invention belongs to the field of computer vision, relates to multi-scale target detection in computer vision tasks, and in particular relates to an optical remote sensing image target detection method based on an atrous feature pyramid network (the "void"/"hole" feature pyramid network of the original translation).
Background
Object detection is one of the basic problems in computer vision recognition tasks and has wide application in many fields. Target detection in optical remote sensing images has broad application prospects in military applications, urban planning, environmental management, and other areas. Unlike target detection on natural images, targets in optical remote sensing images are much smaller than those in natural images, and their sizes and orientations are diverse (e.g., playgrounds, cars, bridges). Furthermore, the visual appearance of target instances in remote sensing images varies greatly due to occlusion, shadow, illumination, resolution, and viewpoint changes. Therefore, detecting objects in remote sensing images is much more difficult than in natural images. At present, target detection algorithms for optical remote sensing images fall mainly into two categories: methods based on traditional image processing and machine learning algorithms, and methods based on deep learning.
Among them, the image features used by target detection methods based on traditional image processing and machine learning algorithms are designed manually, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and SURF (Speeded-Up Robust Features). These methods extract features from the input image with a manually designed feature extractor, identify the target according to the features, and localize the target with a corresponding strategy. For a long time, object detection algorithms based on manually designed features dominated the computer vision field. However, such manually designed features are not very robust to the diversity of targets.
Target detection algorithms based on deep learning use deep convolutional neural networks to learn feature representations from data automatically, and can learn representations with good robustness and strong expressive power. With the rapid development of deep learning, target detection has shifted from traditional algorithms based on manual features to algorithms based on deep convolutional neural networks, and the object detection task has advanced greatly in both speed and accuracy over the past few years. At present, target detection algorithms based on deep convolutional neural networks fall mainly into two categories: two-stage methods and one-stage methods. Two-stage methods first extract candidate regions from a given image, then classify and regress each extracted candidate region. One-stage methods use a single convolutional neural network that reformulates object detection as a regression problem to predict the class and location of targets directly. In general, two-stage methods have an advantage in accuracy, while one-stage methods have an advantage in speed; however, as target detection algorithms continue to evolve, both types increasingly balance speed and accuracy.
At present, multi-scale target detection remains a challenging problem. The object detection task is an extension of the classification task. Deep features in a deep convolutional neural network contain rich semantic information, which benefits image classification but lacks the detailed information that benefits small-target detection. Shallow features in the network have higher spatial resolution, which benefits bounding-box regression but lacks the high-level semantic information that benefits target classification. Therefore, to handle target scale variation in detection tasks, many convolutional-neural-network-based detection algorithms gradually blend the semantic information of deep features into shallow features, and various feature pyramid structures have been proposed. The image pyramid resizes the input image to several scales and feeds the images of different scales to the feature extraction network to generate feature maps of different scales; this approach significantly increases memory and computational cost and is inefficient. Methods such as Faster R-CNN [1], YOLOv1 [2], and R-FCN [3] detect targets using the feature map output by the last convolutional layer of the feature extraction network; because they predict from only a single-scale feature map, their detection performance for multi-scale targets, especially small targets, is poor. The SSD [4] algorithm builds a feature pyramid architecture by extracting multi-level features from the backbone network. Although it selects features of multiple levels and scales in the network for prediction, its detection of small targets, which depends on deep semantic information, is poor because contextual semantic information is not fused.
The FPN [5] target detection algorithm adopts a top-down path and lateral connections, fuses the semantic information of shallow and deep features, and detects targets of different sizes with features of different levels. However, the semantic information of the shallow features in this structure still cannot meet the detection requirements of multi-scale targets.
Although the above-mentioned target detection algorithms have achieved good results on natural images, their detection accuracy on optical remote sensing images still needs improvement; in particular, their performance on multi-scale targets in optical remote sensing images is not ideal.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an optical remote sensing image target detection method based on an atrous feature pyramid network, which improves the detection performance for multi-scale targets in optical remote sensing images by constructing a novel atrous feature fusion module and obtains better generalization ability.
The technical problem to be solved by the invention is realized by the following technical scheme:
A method for detecting targets in optical remote sensing images based on an atrous feature pyramid network, comprising the following steps:
s1: dividing the adopted image data set into a training set (80%, used for network model training) and a test set (20%, used for model testing), keeping the data distribution of the different sample classes as consistent as possible between the training and test sets;
s2: performing size transformation, standardization, and normalization on the optical remote sensing images in the data set, and applying data augmentation to the training set;
s201, preprocessing the images in the data set on the basis of S1: the adopted data set is resized, setting the shortest and longest edges of the input image to 600 and 1000 pixels, respectively;
s202, calculating the RGB mean of the selected data set over the divided training set, and subtracting the RGB mean from all samples in the training and test sets to highlight the feature differences among individuals in the images;
s203, standardizing and normalizing the images in the data set: following convex optimization theory and knowledge of data probability distributions, the data are centered by a mean-removal operation to standardize the images; normalization is realized by mapping each pixel value in the image to the range 0-1;
s204, augmenting the data with simple horizontal flipping and random cropping operations, thereby increasing the number of training samples in the training set and improving the robustness of the target detection model;
s3: constructing an atrous feature pyramid network using atrous convolution, and training the network model with the images in the training set;
s301, constructing the atrous feature pyramid network: ResNet-101 is selected as the backbone of the target detection network; ResNet-101 extracts feature maps of different scales with residual blocks, and the outputs of the last residual structure of its last four convolution blocks are extracted as basic features, the basic feature maps being denoted {C2, C3, C4, C5};
s302, in the atrous feature fusion module (AFFM) of the pyramid network, C2 has its number of feature channels reduced to 256 by Conv1×1; {C3, C4, C5} are each upsampled via bilinear interpolation to the size of the C2 feature map, and the number of channels of each upsampled feature map is reduced to 256 by a Conv1×1 operation; the feature maps obtained above are then concatenated by a Concat operation to obtain a multi-level fusion feature, and Conv1×1 is applied to reduce the feature dimension of the multi-level fusion feature to 256;
s303, constructing the atrous lateral connection module: three convolution operations, Conv1×1, Conv3×3, and Conv5×5, give the module's three branches receptive fields of different sizes; Conv3×3 operations with different atrous rates are added after the three branches, and the feature maps generated by each branch are concatenated by a Concat operation, yielding a lateral connection feature map with stronger multi-scale expressive capability;
s304, several groups of feature maps of different scales are generated from bottom to top through multi-level downsampling and the atrous lateral connection module, denoted {P2, P3, P4, P5} and corresponding to {C2, C3, C4, C5}, respectively; the multi-level fusion feature and the features generated by the atrous lateral connection module are integrated by a channel concatenation operation, and {P2, P3, P4, P5} are obtained by a Conv1×1 operation; these feature maps are computed as:
P2 = Conv1×1(Concat(ALCB(C2), M)), where M is the multi-level fusion feature produced by the AFFM;
Pi = Conv1×1(Concat(ALCB(Ci), Conv3×3(Pi−1))), i = 3, 4, 5
wherein: piMulti-level features for input to a detection network header to predict results;
ALCB (Ci) is a multi-branch convolution operation function with convolution kernels of different sizes and hole rate;
Conv3×3(Pi-1) For convolution operations with a convolution kernel size of 3 x 3 and a step size of 2, i.e. for Pi-1Carrying out down-sampling operation;
s305, the {P2, P3, P4, P5} generated by the atrous feature fusion module are input into the region proposal network and the detection network head behind the network model to generate candidate regions and compute the detection results;
s306, training the constructed atrous feature pyramid network with the obtained training set, adopting an approximate joint training strategy: the network model is trained for 100K iterations in total, with a learning rate of 10^-3 for the first 60K iterations and 10^-4 for the next 20K iterations; the weight decay and momentum are 0.00004 and 0.9, respectively;
s4: detecting remote sensing images with the trained target detection model, and analyzing and comparing the detection results; the obtained detections are de-duplicated by a non-maximum suppression operation with its IoU threshold set to 0.7, and mAP is selected as the evaluation metric for measuring remote sensing image target detection, with its IoU threshold set to 0.5.
The invention has the advantages and beneficial effects that:
1. The optical remote sensing image target detection method based on the atrous feature pyramid network, aimed at the problem of multi-scale target detection in remote sensing images, uses the atrous feature fusion module to construct an atrous feature pyramid network, can significantly improve the detection performance of the Faster R-CNN target detection algorithm, and realizes accurate recognition and detection of multi-scale targets in optical remote sensing images.
2. The optical remote sensing image target detection method based on the atrous feature pyramid network can realize 96.70% mAP on the NWPU VHR-10 optical remote sensing image data set on a single Tesla K80 GPU, and can reach 96.75% mAP on the RSOD data set.
3. The optical remote sensing image target detection method based on the atrous feature pyramid network is also superior to traditional two-stage methods such as Faster R-CNN and FPN on the PASCAL VOC natural image data set, reaching 81.7% mAP.
4. The optical remote sensing image target detection method based on the atrous feature pyramid network can improve the detection performance for multi-scale targets and targets of complex appearance in remote sensing image data sets; the method has good generalization performance and good robustness to targets with scale changes.
Drawings
FIG. 1 is a flow chart of the detection method of the present invention;
FIG. 2 is a schematic diagram of the atrous feature fusion module (AFFM) of the present invention;
FIG. 3 is a schematic diagram of the atrous lateral connection module (ALCB) of the present invention;
FIG. 4 is a diagram illustrating the detection results of the atrous feature pyramid network of the present invention.
Detailed Description
The present invention is further illustrated by the following specific examples, which are intended to be illustrative, not limiting and are not intended to limit the scope of the invention.
A method for detecting targets in optical remote sensing images based on an atrous feature pyramid network, comprising the following steps:
s1: dividing the adopted image data set into a training set (80%, used for network model training) and a test set (20%, used for model testing), keeping the data distribution of the different sample classes as consistent as possible between the training and test sets;
s2: performing size transformation, standardization, and normalization on the optical remote sensing images in the data set, and applying data augmentation to the training set;
s201, preprocessing the images in the data set on the basis of S1: the adopted data set is resized, setting the shortest and longest edges of the input image to 600 and 1000 pixels, respectively;
s202, calculating the RGB mean of the selected data set over the divided training set, and subtracting the RGB mean from all samples in the training and test sets to highlight the feature differences among individuals in the images;
s203, standardizing and normalizing the images in the data set: following convex optimization theory and knowledge of data probability distributions, the data are centered by a mean-removal operation to standardize the images; normalization is realized by mapping each pixel value in the image to the range 0-1;
s204, augmenting the data with simple horizontal flipping and random cropping operations, thereby increasing the number of training samples in the training set and improving the robustness of the target detection model;
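The resize rule of S201 (shortest edge scaled to 600 pixels, longest edge capped at 1000) can be sketched as a small helper; the function name and the rounding behavior are illustrative assumptions, not taken from the patent:

```python
def resize_shape(h, w, shortest=600, longest=1000):
    """Scale (h, w) so the shortest edge becomes `shortest`,
    but never let the longest edge exceed `longest`."""
    scale = shortest / min(h, w)
    if scale * max(h, w) > longest:
        scale = longest / max(h, w)  # cap: longest edge wins
    return round(h * scale), round(w * scale)
```

For example, a 600x1500 image cannot reach a 600-pixel shortest edge without its longest edge exceeding 1000, so the cap applies and it is scaled to 400x1000.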
s3: constructing an atrous feature pyramid network using atrous convolution, and training the network model with the images in the training set;
s301, constructing the atrous feature pyramid network: ResNet-101 is selected as the backbone of the target detection network; ResNet-101 extracts feature maps of different scales with residual blocks, and the outputs of the last residual structure of its last four convolution blocks are extracted as basic features, the basic feature maps being denoted {C2, C3, C4, C5};
s302, in the atrous feature fusion module AFFM (Atrous Feature Fusion Module) of the pyramid network, C2 has its number of feature channels reduced to 256 by Conv1×1; {C3, C4, C5} are each upsampled via bilinear interpolation to the size of the C2 feature map, and the number of channels of each upsampled feature map is reduced to 256 by a Conv1×1 operation; the feature maps obtained above are then concatenated by a Concat operation to obtain a multi-level fusion feature, and Conv1×1 is applied to reduce the feature dimension of the multi-level fusion feature to 256;
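A minimal PyTorch sketch of the AFFM fusion in S302. The 256-channel reduction, bilinear upsampling to C2's size, and Concat-then-Conv1×1 follow the text above; the input channel tuple (ResNet-101's typical C2-C5 widths) and the module name are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFFM(nn.Module):
    """Atrous Feature Fusion Module (sketch): upsample C3-C5 to C2's
    spatial size, reduce each map to 256 channels with Conv1x1, concat
    them, then reduce the concatenation back to 256 channels."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_ch=256):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_channels)
        self.fuse = nn.Conv2d(out_ch * len(in_channels), out_ch, 1)

    def forward(self, feats):          # feats = [C2, C3, C4, C5]
        size = feats[0].shape[-2:]     # spatial size of C2
        maps = []
        for f, conv in zip(feats, self.reduce):
            if f.shape[-2:] != size:   # C3-C5: bilinear upsample to C2's size
                f = F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            maps.append(conv(f))       # Conv1x1 channel reduction to 256
        return self.fuse(torch.cat(maps, dim=1))
```

With C2 at 64x64 and C5 at 8x8, the output is a single 256-channel map at 64x64.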
s303, constructing the atrous lateral connection module ALCB (Atrous Lateral Connection Block): three convolution operations, Conv1×1, Conv3×3, and Conv5×5, give the module's three branches receptive fields of different sizes; Conv3×3 operations with different atrous rates are added after the three branches, and the feature maps generated by each branch are concatenated by a Concat operation, yielding a lateral connection feature map with strong multi-scale expressive capability;
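The three-branch structure of S303 can be sketched as follows. The Conv1×1/Conv3×3/Conv5×5 branches each followed by a 3×3 atrous convolution and the final Concat come from the text; the specific dilation rates (1, 2, 4), the 256-channel branch width, and the omission of the global-average-pooling branch mentioned in the claims are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ALCB(nn.Module):
    """Atrous Lateral Connection Block (sketch): Conv1x1 / Conv3x3 /
    Conv5x5 branches, each followed by a 3x3 atrous conv with a
    different dilation rate; branch outputs are concatenated."""
    def __init__(self, in_ch, branch_ch=256, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding = k // 2 keeps the spatial size unchanged
                nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),
                # atrous 3x3: padding = dilation keeps the size unchanged
                nn.Conv2d(branch_ch, branch_ch, 3, padding=d, dilation=d),
            )
            for k, d in zip((1, 3, 5), dilations)
        )

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)
```

Three 256-channel branches yield a 768-channel lateral map, later reduced by the Conv1×1 of S304.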
s304, several groups of feature maps of different scales are generated from bottom to top through multi-level downsampling and the atrous lateral connection module, denoted {P2, P3, P4, P5} and corresponding to {C2, C3, C4, C5}, respectively; the multi-level fusion feature and the features generated by the atrous lateral connection module are integrated by a channel concatenation operation, and {P2, P3, P4, P5} are obtained by a Conv1×1 operation; these feature maps are computed as:
P2 = Conv1×1(Concat(ALCB(C2), M)), where M is the multi-level fusion feature produced by the AFFM;
Pi = Conv1×1(Concat(ALCB(Ci), Conv3×3(Pi−1))), i = 3, 4, 5
wherein: piMulti-level features for input to a detection network header to predict results;
ALCB (Ci) is a multi-branch convolution operation function with convolution kernels of different sizes and hole rate;
Conv3×3(Pi-1) For convolution operations with a convolution kernel size of 3 x 3 and a step size of 2, i.e. for Pi-1Carrying out down-sampling operation;
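One level of the bottom-up recursion above can be checked shape-wise with stand-in layers; all 256-channel widths are assumptions, and ALCB is replaced here by a plain Conv1×1 for brevity:

```python
import torch
import torch.nn as nn

# Stand-ins for one step Pi = Conv1x1(Concat(ALCB(Ci), Conv3x3(Pi-1)))
lateral = nn.Conv2d(256, 256, 1)                    # stand-in for ALCB(Ci)
down = nn.Conv2d(256, 256, 3, stride=2, padding=1)  # Conv3x3, stride 2
fuse = nn.Conv2d(512, 256, 1)                       # Conv1x1 after Concat

p_prev = torch.randn(1, 256, 64, 64)                # Pi-1
c_i = torch.randn(1, 256, 32, 32)                   # Ci (already 256-d)
p_i = fuse(torch.cat([lateral(c_i), down(p_prev)], dim=1))
```

The stride-2 Conv3×3 halves the spatial size of Pi−1 so it aligns with Ci before concatenation.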
s305, the {P2, P3, P4, P5} generated by the atrous feature fusion module are input into the region proposal network and the detection network head behind the network model to generate candidate regions and compute the detection results;
s306, training the constructed atrous feature pyramid network with the obtained training set, adopting an approximate joint training strategy: the network model is trained for 100K iterations in total, with a learning rate of 10^-3 for the first 60K iterations and 10^-4 for the next 20K iterations; the weight decay and momentum are 0.00004 and 0.9, respectively;
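The piecewise schedule of S306 can be written as a small helper. Note the patent only states the rates for the first 80K iterations; the value used for the final 20K (`final_lr`) is an assumed placeholder:

```python
def learning_rate(iteration, final_lr=1e-5):
    """Learning rate per the stated schedule: 1e-3 for the first 60K
    iterations, 1e-4 for the next 20K. The rate for the last 20K is
    NOT specified in the patent; `final_lr` is an assumption."""
    if iteration < 60_000:
        return 1e-3
    if iteration < 80_000:
        return 1e-4
    return final_lr
```

The stated momentum (0.9) and weight decay (0.00004) would be passed to the SGD optimizer alongside this schedule.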
s4: detecting remote sensing images with the trained target detection model, and analyzing and comparing the detection results; the obtained detections are de-duplicated by a non-maximum suppression operation with its IoU threshold set to 0.7, and mAP is selected as the evaluation metric for measuring remote sensing image target detection, with its IoU threshold set to 0.5.
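The de-duplication step of S4 is standard greedy non-maximum suppression; a minimal pure-Python sketch with the stated IoU threshold of 0.7 (the (x1, y1, x2, y2) box format is an assumption):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it by more than iou_thresh, repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
    return keep
```

mAP evaluation at IoU 0.5 would then count a detection as correct when it overlaps a ground-truth box of the same class by at least 0.5.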
The detection results of the method provided by the invention are compared with those of existing methods to further analyze the strengths and weaknesses of the model.
The AP for each single-class target and the mAP over all target classes for each method are shown in Table 1.
TABLE 1 comparison of test results on NWPU VHR-10 dataset
As can be seen from Table 1, the method of the present invention achieves 96.89% mAP on the NWPU VHR-10 dataset, an improvement of about 10% over the original Faster R-CNN. In addition, the detection result of the atrous feature pyramid network is also superior to FPN, by about 3.8%, and on the NWPU VHR-10 dataset it further improves by 14.5% and 5.6%, respectively, over the one-stage SSD detectors.
Table 2 compares the detection results on the RSOD dataset, and Table 3 compares the detection results on the PASCAL VOC dataset. As can be seen from Tables 2 and 3, the detection results of the method of the present invention are also superior to those of the other methods on the RSOD and PASCAL VOC datasets.
Table 2 comparison of test results on RSOD dataset
TABLE 3 comparison of test results on PASCAL VOC data set
The detection results of the atrous feature pyramid network are shown in FIG. 4; the method provided by the invention improves the detection accuracy for targets of multiple sizes and complex appearance in the data set, such as oil storage tanks, bridges, and playgrounds.
The invention, a method for detecting targets in optical remote sensing images based on an atrous feature pyramid network, addresses the scale-variation problem of target detection in optical remote sensing images and improves the detection precision for multi-scale targets; a feature pyramid network with multi-scale feature expressive capability is constructed through the atrous feature fusion module AFFM, and within the AFFM a multi-branch atrous convolution module is used to make full use of image feature information.
The atrous feature pyramid network not only achieves 96.89% mAP on the NWPU VHR-10 dataset but also achieves good detection performance on other optical remote sensing image datasets and on natural image datasets. The experimental results show that the method not only improves detection performance on remote sensing images but also handles natural images well, demonstrating good generalization.
Although the embodiments of the present invention and the accompanying drawings are disclosed for illustrative purposes, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims, and therefore the scope of the invention is not limited to the disclosure of the embodiments and the accompanying drawings.

Claims (1)

1. A method for detecting targets in optical remote sensing images based on an atrous feature pyramid network, characterized by comprising the following steps:
s1: dividing the adopted image data set into a training set (80%, used for network model training) and a test set (20%, used for model testing), keeping the data distribution of the different sample classes as consistent as possible between the training and test sets;
s2: performing size transformation, standardization, and normalization on the optical remote sensing images in the data set, and applying data augmentation to the training set;
s201, preprocessing the images in the data set on the basis of S1: the adopted data set is resized, setting the shortest and longest edges of the input image to 600 and 1000 pixels, respectively;
s202, calculating the RGB mean of the selected data set over the divided training set, and subtracting the RGB mean from all samples in the training and test sets to highlight the feature differences among individuals in the images;
s203, standardizing and normalizing the images in the data set: following convex optimization theory and knowledge of data probability distributions, the data are centered by a mean-removal operation to standardize the images; normalization is realized by mapping each pixel value in the image to the range 0-1;
s204, augmenting the data with simple horizontal flipping and random cropping operations, thereby increasing the number of training samples in the training set and improving the robustness of the target detection model;
s3: constructing a hole characteristic pyramid network by using hole convolution, and training a network model by using images in a training set;
s301, constructing a hole characteristic pyramid network, selecting ResNet-101 as a basic network of a target detection network, extracting characteristic graphs with different scales by using residual blocks by the ResNet-101, extracting the output of the last residual structure of the following four rolling blocks from the ResNet-101 basic network as basic characteristics, and representing the basic characteristic graphs as { C }2,C3,C4,C5};
S302, in a hole feature fusion module AFFM of the pyramid network, C2Reducing the number of feature channels to 256 dimensions by Conv1 × 1, { C3,C4,C5Respectively interpolating each feature map bilinearly to C by an upsampling operation2Reducing the channel number of the feature map after upsampling to 256 dimensions by Conv1 multiplied by 1 operation, then connecting the obtained feature maps in series by Concat operation to obtain multi-level fusion features, and then reducing the feature dimensions of the multi-level fusion features to 256 dimensions by applying Conv1 multiplied by 1;
s303, constructing a cavity transverse connection module, enabling three branches of the cavity transverse connection module to have different-size receptive fields by adopting three convolution operations of Conv1 multiplied by 1, Conv3 multiplied by 3 and Conv5 multiplied by 5, adding Conv3 multiplied by 3 operations with different cavity rates behind the three branches, and splicing a feature map generated by each branch and a Global Average Pooling branch together through a Concat operation so as to obtain a transverse connection feature map with stronger multi-scale expression capability;
S304, the fused feature maps pass bottom-up through multiple layers of downsampling and atrous lateral connection blocks to generate several groups of feature maps of different scales, denoted {P2, P3, P4, P5} and corresponding to {C2, C3, C4, C5} respectively; the features derived from the multi-level fused feature and those produced by the atrous lateral connection block are integrated by a channel concatenation operation, and {P2, P3, P4, P5} are obtained by a Conv1×1 operation. These feature maps are computed as:
Pi = Conv1×1(Concat(ALCB(Ci), Conv3×3(Pi-1)))
wherein: piMulti-level features for input to a detection network header to predict results;
ALCB(Ci) Performing a multi-branch convolution operation function with convolution kernels of different sizes and a void rate;
Conv3×3(Pi-1) For convolution operations with a convolution kernel size of 3 x 3 and a step size of 2, i.e. for Pi-1Carrying out down-sampling operation;
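The recursive construction defined by the three terms above can be sketched with stand-in operations (2×2 average pooling replaces the stride-2 Conv3×3, and a random 1×1 projection replaces the full multi-branch ALCB; both substitutions are for brevity only):

```python
import numpy as np

def conv1x1(x, out_ch, rng):
    """Random-weight 1x1 convolution: (Cin,H,W) -> (out_ch,H,W)."""
    w = rng.random((out_ch, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

def downsample2(x):
    """Stand-in for the stride-2 Conv3x3: 2x2 average pooling halves H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def alcb(x, out_ch, rng):
    """Placeholder for the multi-branch dilated block: here just a 1x1 projection."""
    return conv1x1(x, out_ch, rng)

rng = np.random.default_rng(0)
# Toy {C2..C5} at strides 4/8/16/32, already reduced to 256 channels.
C = {i: rng.random((256, 64 // 2 ** (i - 2), 64 // 2 ** (i - 2))) for i in range(2, 6)}

P = {2: conv1x1(alcb(C[2], 256, rng), 256, rng)}       # base level from the fused features
for i in range(3, 6):
    # Pi = Conv1x1(Concat(ALCB(Ci), Conv3x3_stride2(P_{i-1})))
    merged = np.concatenate([alcb(C[i], 256, rng), downsample2(P[i - 1])], axis=0)
    P[i] = conv1x1(merged, 256, rng)
```

Each Pi ends up at the same resolution as its Ci, so every pyramid level carries both the multi-level fused semantics and the scale-specific lateral features.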
S305, the {P2, P3, P4, P5} generated by the atrous feature fusion module are fed into the region proposal network and the detection network head that follow in the network model, which then generate the candidate regions and compute the detection results;
S306, training the constructed atrous feature pyramid network on the obtained training set with an approximate joint training strategy: the network model is trained for 100K iterations in total, with a learning rate of 10^-3 for the first 60K iterations and 10^-4 for the next 20K iterations; the weight decay and momentum are 0.00004 and 0.9, respectively;
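The step schedule of S306 reduces to a one-line function (weight decay 0.00004 and momentum 0.9 belong to the optimizer configuration and are not shown here):

```python
def learning_rate(iteration):
    """Step schedule from the training strategy: 1e-3 for the first 60K
    iterations, 1e-4 afterwards."""
    return 1e-3 if iteration < 60_000 else 1e-4
```

Dropping the rate by a factor of ten once the loss plateaus is the standard fine-tuning step for this kind of detector training.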
S4: detecting the remote sensing images with the trained target detection model, and analysing and comparing the detection results; duplicates in the obtained detection results are removed by non-maximum suppression (NMS) with the IoU threshold of NMS set to 0.7, and mAP, computed at an IoU threshold of 0.5, is selected as the evaluation metric for measuring the remote sensing image target detection performance.
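The de-duplication in S4 is ordinary greedy NMS; a self-contained NumPy version at the stated 0.7 IoU threshold:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    every remaining box that overlaps it above iou_thresh, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # the middle box overlaps the first above 0.7 IoU
```

The same `iou` helper, at the looser 0.5 threshold, is what decides true versus false positives when the mAP metric is computed.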
CN201911271302.1A 2019-12-12 2019-12-12 Optical remote sensing image target detection method based on void feature pyramid network Active CN111126202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911271302.1A CN111126202B (en) 2019-12-12 2019-12-12 Optical remote sensing image target detection method based on void feature pyramid network

Publications (2)

Publication Number Publication Date
CN111126202A CN111126202A (en) 2020-05-08
CN111126202B true CN111126202B (en) 2022-03-04

Family

ID=70499561






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant