CN113177456B - Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion - Google Patents

Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion

Info

Publication number
CN113177456B
Authority
CN
China
Prior art keywords
image
target
data set
feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110442872.3A
Other languages
Chinese (zh)
Other versions
CN113177456A (en)
Inventor
白静
温征
唐晓川
董泽委
郭亚泽
裴晓龙
闫逊
孙放
张秀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110442872.3A
Publication of CN113177456A
Application granted
Publication of CN113177456B
Legal status: Active

Classifications

    • G06V 20/13: Satellite images (Scenes; Terrestrial scenes)
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/08: Neural networks; Learning methods
    • G06T 5/30: Erosion or dilatation, e.g. thinning (image enhancement or restoration by local operators)
    • G06T 7/13: Edge detection (segmentation)
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G06T 2207/20064: Wavelet transform [DWT]
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06V 2201/07: Target detection

Abstract

The invention discloses a remote sensing target detection method based on a single-stage full convolution network and multi-feature fusion, which mainly addresses the insufficient extraction of target features from optical remote sensing images in the prior art. The scheme is implemented as follows: 1) extract the mathematical morphology features, linear scale-space features and nonlinear scale-space features of the original data respectively, and fuse the three to obtain a fused feature map; 2) divide the fused feature maps into a training data set and a test data set, and perform small-target expansion on the training data set; 3) construct a target detection network and train it on the expanded training data set with a gradient descent algorithm; 4) test the test data set with the trained network to obtain detection results. The method enhances the contour and edge features of targets, helps improve target detection accuracy, and can be used for resource exploration, natural disaster assessment and target recognition.

Description

Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion
Technical Field
The invention belongs to the technical field of optical remote sensing images, and particularly relates to a remote sensing target detection method based on a single-stage full convolution network and multi-feature fusion, which can be used for resource exploration, natural disaster assessment and target identification.
Background
Remote sensing images are characterized by complex backgrounds, unbalanced target categories, large variations in target scale and unusual shooting angles, which makes target detection in remote sensing images highly difficult and challenging.
Traditional target detection methods rely mainly on hand-designed feature extraction operators, including V-J detection, HOG detection and the DPM algorithm. Because such a detector can only fit a single type of image feature with a fixed extraction algorithm, these methods suit only scenes with obvious features and simple backgrounds, and cannot meet the requirements of remote sensing image target detection.
Deep-learning-based target detection methods use convolutional networks to extract image features and can extract a variety of rich features from the same target, so their detection accuracy is far higher than that of traditional hand-designed methods. They have become the mainstream approach in industry and are widely used in remote sensing image target detection tasks.
Patent CN112580439A proposes a remote sensing image ship target detection method using a YOLOv5 network structure and the attention mechanism of SENet. The method first trains the network model effectively with a small batch of image target samples, then obtains a test model through transfer learning, improving the network's detection speed on large-format images while maintaining the accuracy and robustness of ship target detection.
Patent CN110378297A designs a multi-scale feature extraction network that extracts multi-scale image features and predicts candidate regions on the feature map of each image scale, effectively improving the accuracy of remote sensing image target detection.
Patent CN112070729A adopts an anchor-free target detection network: it first applies linear enhancement to the acquired remote sensing image data set with a balance-coefficient mixed enhancement method, then performs feature extraction and fusion with a deep residual network ResNet-50 and a feature pyramid network FPN. The invention makes full use of a contextual multi-feature fusion method, strengthens the network's feature extraction and category prediction capabilities, and improves detection precision.
However, the above methods based on deep convolutional neural networks either apply convolution directly to the original input image or preprocess the data with simple linear enhancement. Neither approach eases the difficulty of feature extraction for the deep convolutional network; in particular, for target detection against the complex backgrounds of remote sensing images, such methods cannot accurately extract the feature information of the target, which limits detection performance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a remote sensing target detection method based on a single-stage full convolution network and multi-feature fusion that accurately extracts the feature information of targets and improves detection performance.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Respectively extract the morphological features, linear scale-space features and nonlinear scale-space features of the optical remote sensing image:
1a) Perform opening and closing operations on the original image to obtain 2n initial feature maps, then add all initial feature maps pixel by pixel and take the average to obtain the morphological feature map of the original image, where n is the number of opening (or closing) operations;
1b) Filter the original image with a Gaussian filter and with the Sobel edge extraction operator, obtaining a three-channel Gaussian blur feature map and four single-channel local edge feature maps; sum the four local edge feature maps pixel by pixel and average them to obtain an overall edge feature map; fuse each channel component of the three-channel Gaussian blur feature map with the overall edge feature map pixel by pixel to obtain a linear multi-scale spatial feature map;
1c) Convert the original optical remote sensing image into a single-channel grayscale image and apply a two-dimensional single-level wavelet transform to it, obtaining four single-channel subgraphs: a low-frequency component map and horizontal, vertical and diagonal high-frequency component maps; discard the low-frequency component subgraph and concatenate the remaining three high-frequency component subgraphs along the channel dimension to obtain a nonlinear multi-scale spatial feature map;
(2) Constructing a fusion feature map:
2a) Fuse the morphological feature map and the linear multi-scale spatial feature map pixel by pixel with weights α and β, where α + β = 0.5, to obtain an initial fused image;
2b) Multiply the original image by a coefficient of 0.5, sum it pixel by pixel with the initial fused image, then add the nonlinear multi-scale spatial feature map pixel by pixel to obtain the final feature-fused image;
(3) Data set partitioning and small target expansion:
3a) For all optical remote sensing images, compute from the annotation information the maximum and minimum target areas over all targets to be detected, denoted S_max and S_min, and set a threshold S derived from S_max and S_min (the threshold formula is given as an image in the original document);
3b) Randomly split all optical remote sensing images into a training data set and a test data set at a ratio of 8:2;
3c) For each original image in the training set, traverse all targets to be detected in the image: if a target's area S_i is less than the threshold S, select a target-free position in the original image and copy the minimal square region containing the target to the selected position, obtaining a new training image; otherwise leave the original image unchanged; after traversal a new training data set is obtained;
(4) Training and detecting by using a target detection network based on deep learning:
4a) Respectively extracting and fusing the characteristics of the test data set and the new training data set according to the operations (1) and (2) to obtain a training data set and a test data set after the characteristics are fused;
4b) Training the existing single-stage full convolution target detection network by using a training data set after feature fusion through a gradient descent algorithm until the overall loss of the network is not changed any more, and obtaining a trained target detection network;
4c) And inputting the test data set into a trained target detection network to obtain a target detection result of the optical remote sensing image.
Compared with the prior art, the invention has the following advantages:
First, multi-feature extraction and fusion are applied to the original image before any convolution by the neural network, enhancing the contour and edge features of the target. As a result, the deep convolutional network is more sensitive to the target when extracting features, the extracted features are more accurate, and the accuracy of target detection improves.
Second, fusing the image morphological features with the linear and nonlinear multi-scale features enhances target saliency compared with existing simple linear data enhancement. Especially for small targets and complex background regions, target features are effectively strengthened and the background is suppressed, which improves the accuracy with which the deep convolutional neural network extracts target features and thus improves detection performance.
Drawings
FIG. 1 is a general flow chart of an implementation of the present invention;
FIG. 2 is a sub-flow diagram of the present invention for constructing morphological features of an image;
FIG. 3 shows the directional templates of the Sobel operator used by the present invention;
FIG. 4 is a sub-flow diagram of the construction of a linear scale spatial feature map according to the present invention;
FIG. 5 is a sub-flow diagram of the construction of a non-linear scale space feature map according to the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this example are as follows:
Remote sensing images contain rich spatial information and exhibit scale effects; multi-scale structure is a natural characteristic of ground-object observation in remote sensing. Analyzing at different scales yields ground-object features and spatial-relation rules at different levels, so multi-scale spatial information is crucial for accurate ground-object recognition. Deep learning methods, however, generally extract and classify remote sensing image features at a single preset scale and lack comprehensive consideration of multi-scale spatial information. More and more researchers are therefore studying how to combine the multi-scale spatial features of remote sensing images to improve comprehensive spatial feature recognition.

In addition, mathematical morphology is a classical nonlinear spatial information processing technique: it can extract meaningful shape components from the complex content of optical remote sensing images while preserving the spatial geometric structure in the images, which suits the characteristics of remote sensing ground-object classification. A large body of research shows that mathematical morphology can accurately describe ground-object contours and spatial relationships, and that computation and processing based on morphological methods can effectively extract rich spatial information from remote sensing images.

The invention therefore designs a multi-feature extraction and fusion technique based on the mathematical morphology features and multi-scale features of remote sensing images. The specific implementation steps are as follows:
step 1, extracting mathematical morphology characteristics of an image.
Mathematical morphology includes four basic operations: dilation, erosion, opening and closing. Dilation convolves an operation kernel point by point with the original image and takes the maximum pixel value under the kernel as the new pixel value at that position; erosion likewise takes the minimum. Opening is erosion followed by dilation; closing is dilation followed by erosion. To effectively remove image noise and smooth image edges, the mathematical morphology features are extracted with the opening and closing operations.
Referring to fig. 2, the specific implementation of this step is as follows:
1.1) Perform opening and closing on the original image with operation kernels of sizes 3 × 3 and 5 × 5, obtaining two opening feature maps open_3 and open_5 and two closing feature maps close_3 and close_5;
1.2) Sum the feature maps open_3, open_5, close_3 and close_5 pixel by pixel and take the average to obtain a three-channel morphological feature map I_M, which has the same resolution and dimensions as the original image.
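As a concrete illustration, the following Python sketch reproduces steps 1.1-1.2 with OpenCV; the kernel shape (square, all-ones) is an assumption, since the text only specifies the kernel sizes 3 × 3 and 5 × 5:

import cv2
import numpy as np

def morphological_feature_map(img):
    # Steps 1.1-1.2: opening and closing with 3x3 and 5x5 kernels,
    # then a pixel-wise average of the four resulting maps (I_M).
    maps = []
    for k in (3, 5):
        kernel = np.ones((k, k), np.uint8)
        maps.append(cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel))   # open_k
        maps.append(cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel))  # close_k
    # Same resolution and channel count as the input image.
    return np.mean(np.stack(maps).astype(np.float32), axis=0)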
And 2, extracting the linear multi-scale spatial features of the image.
In computer vision, multi-scale features can effectively improve the results of tasks such as image classification and target detection. How to construct multi-scale image features in a suitable way, and how to fuse and exploit them effectively, have long been key questions for researchers.
The Gaussian kernel is the only kernel that can generate a multi-scale space, and filtering an image with a Gaussian filter effectively establishes such a space. For a two-dimensional image I(x, y), the Gaussian-filtered image is:
L(x, y, δ) = G(x, y, δ) * I(x, y)
where G(x, y, δ) is the Gaussian function
G(x, y, δ) = (1 / (2πδ²)) · exp(−((x − x₀)² + (y − y₀)²) / (2δ²))
in which (x₀, y₀) is the center-point coordinate and δ is the scale parameter that determines the degree of smoothing of the transformed image: the larger δ, the smoother the filtered result.
The Sobel operator is a discrete differential operator commonly used for edge detection in image processing. In its implementation, a 3 × 3 template is used as a convolution kernel and convolved with each pixel of the image; templates of different orientations yield edge detection feature maps in different directions.
The linear multi-scale spatial feature construction used in this example extracts edge features with a Gaussian filter and the Sobel edge extraction operator respectively, where the Sobel operator uses four directional templates (0°, 45°, 90°, 135°), as shown in FIG. 3.
Referring to fig. 4, the specific implementation of this step is as follows:
2.1) Filter the image with a Gaussian filter to obtain the Gaussian blur feature map I_G;
2.2) Convert the original image to a grayscale image and extract its four directional edge features ∇_h, ∇_v, ∇_r and ∇_l with the Sobel operator, denoting the horizontal, vertical and two diagonal edge feature maps respectively;
2.3) Fuse the four extracted edge feature maps pixel by pixel to obtain the overall edge feature map I_S:
I_S = (∇_h + ∇_v + ∇_r + ∇_l) / 4
Fusing the edge feature maps of the four directions enhances the edge regions of the original picture, whose pixel values become much greater than 0, while suppressing non-edge regions, whose pixel values stay close to 0;
2.4) Fuse the overall edge feature map I_S with the Gaussian blur feature map I_G to obtain the final linear multi-scale spatial feature map I_L:
I_Li = (1 − r) · I_Gi + r · I_S,  i = 1, 2, 3
where r = 0.3 and I_Gi is the i-th channel component of I_G (the exact fusion formula is given as an image in the original; the weighted sum above is the form consistent with the surrounding description).
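The following sketch assembles steps 2.1-2.4 in Python with OpenCV. FIG. 3 is not reproduced in this text, so the four directional templates below are the standard 0°/45°/90°/135° Sobel kernels; the Gaussian kernel size, sigma and the (1 − r)/r weighting are likewise assumptions consistent with the description:

import cv2
import numpy as np

SOBEL_KERNELS = [  # assumed standard directional templates (cf. FIG. 3)
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], np.float32),   # 0 deg
    np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], np.float32),   # 45 deg
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32),   # 90 deg
    np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], np.float32),   # 135 deg
]

def linear_multiscale_feature_map(img, r=0.3, sigma=1.5):
    i_g = cv2.GaussianBlur(img, (5, 5), sigma).astype(np.float32)     # I_G
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    edges = [cv2.filter2D(gray, -1, k) for k in SOBEL_KERNELS]
    i_s = np.mean(edges, axis=0)                                      # I_S
    # Fuse I_S into each channel of I_G; the convex (1-r)/r split is an
    # assumption matching r = 0.3 in step 2.4.
    return (1.0 - r) * i_g + r * i_s[..., None]                       # I_L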
And 3, extracting the nonlinear multi-scale spatial features of the image.
This example employs the wavelet transform as the primary method for constructing the nonlinear multi-scale spatial features of the image, using the two-dimensional single-level wavelet transform function dwt2().
Referring to fig. 5, the specific implementation of this step is as follows:
3.1) Convert the ordinary three-channel optical image to a single-channel grayscale image;
3.2) Apply the two-dimensional single-level wavelet transform to the grayscale image of 3.1), obtaining its low-frequency component subgraph and its horizontal, vertical and diagonal high-frequency component subgraphs, each with only one quarter the resolution of the original image;
3.3) Discard the low-frequency component subgraph of 3.2), keep only the three high-frequency component subgraphs, and expand them to the resolution of the original image by bilinear interpolation;
3.4) Concatenate the three resolution-expanded high-frequency component maps along the channel dimension to obtain the nonlinear multi-scale spatial feature map I_NL.
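A minimal sketch of steps 3.1-3.4 with PyWavelets and OpenCV; the wavelet basis ('haar') is an assumption, since the text names dwt2() but not a specific basis:

import cv2
import numpy as np
import pywt

def nonlinear_multiscale_feature_map(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Single-level 2-D DWT: discard the low-frequency sub-band, keep the
    # horizontal, vertical and diagonal high-frequency sub-bands (step 3.3).
    _ll, (lh, hl, hh) = pywt.dwt2(gray, 'haar')
    h, w = gray.shape
    ups = [cv2.resize(c, (w, h), interpolation=cv2.INTER_LINEAR)  # bilinear
           for c in (lh, hl, hh)]
    return np.stack(ups, axis=-1)  # 3-channel I_NL (step 3.4)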
And 4, constructing a fusion characteristic graph.
Weight and sum the original image I_p, the morphological feature map I_M, the linear multi-scale spatial feature map I_L and the nonlinear multi-scale spatial feature map I_NL to obtain the final fused image I:
I = 0.5 × I_p + α × I_M + β × I_L + I_NL
where α and β are two hyperparameters with different values satisfying α + β = 0.5. Because the pixel values in I_NL are very small, that term carries no weight coefficient and is added pixel by pixel directly.
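Combining the three feature extractors above, the fusion of step 4 reduces to a single weighted sum; the even split α = β = 0.25 below is an assumption, since the text only constrains α + β = 0.5 with α ≠ β left unspecified here for illustration:

def fuse_features(i_p, i_m, i_l, i_nl, alpha=0.25, beta=0.25):
    # I = 0.5*I_p + alpha*I_M + beta*I_L + I_NL, with alpha + beta = 0.5;
    # I_NL is added unweighted because its pixel values are very small.
    assert abs(alpha + beta - 0.5) < 1e-6
    return 0.5 * i_p.astype('float32') + alpha * i_m + beta * i_l + i_nl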
And 5, expanding the small target.
Small target detection has always been a difficulty in computer vision, and detection precision is generally low across existing algorithm frameworks. The method therefore expands small-target samples in the data preprocessing stage, implemented as follows:
5.1) For all samples in the data set, compute the areas of the ground-truth target boxes from the annotation information and find the maximum S_max and minimum S_min;
5.2) Set a threshold S derived from S_max and S_min (the formula is given as an image in the original) and divide all data into a training data set and a test data set at a ratio of 8:2;
5.3) For each picture in the training set, compute the annotation-box areas S_i ∈ {S_1, S_2, …, S_n} of all targets in the picture, traverse each S_i and compare it with the set threshold:
if S_i < S holds, copy the rectangular region containing the target, randomly select a new position in the image for pasting, and execute 5.4);
if S_i < S does not hold, perform no operation and move on to the next S_i;
5.4) Select a new location:
5.4.1) Randomly select a point (x, y) in the image and form the candidate annotation box [x, y, x + w_i, y + h_i] for the new location, where w_i and h_i are the width and height of the target's box;
5.4.2) Check whether the new position overlaps any existing annotation box in the image:
if it does not overlap, paste at the new position;
if it overlaps, return to 5.4.1) and record the number of retries; if 100 retries are reached, abandon the pasting operation;
5.5) Repeat 5.3) a total of 5 times so that small targets are sufficiently expanded.
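The copy-paste loop of steps 5.3-5.4 can be sketched as follows; the box representation [x1, y1, x2, y2] (integer pixel coordinates) and the overlap helper are illustrative choices, not taken from the patent:

import random

def expand_small_targets(img, boxes, s_threshold, max_tries=100):
    # boxes: list of [x1, y1, x2, y2] ground-truth annotation boxes.
    h, w = img.shape[:2]
    new_boxes = [list(b) for b in boxes]
    for x1, y1, x2, y2 in boxes:
        bw, bh = x2 - x1, y2 - y1
        if bw * bh >= s_threshold:          # only targets with S_i < S
            continue
        patch = img[y1:y2, x1:x2].copy()
        for _ in range(max_tries):          # abandon after 100 retries (5.4.2)
            nx = random.randint(0, w - bw)
            ny = random.randint(0, h - bh)
            cand = [nx, ny, nx + bw, ny + bh]
            if not any(overlaps(cand, b) for b in new_boxes):
                img[ny:ny + bh, nx:nx + bw] = patch   # paste the target
                new_boxes.append(cand)
                break
    return img, new_boxes

def overlaps(a, b):
    # Axis-aligned rectangle intersection test.
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])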
And 6, constructing a deep convolutional network for training and detection.
6.1 Data preprocessing
Apply the multi-feature fusion of steps 1-4 to all optical remote sensing images to obtain feature-fused images, divide all feature-fused images into a training data set and a test data set at a ratio of 8:2, and apply the small-target expansion of step 5 to all images in the training data set;
6.2 Construct an object detection network
The method adopts an existing single-stage full convolution target detection network as the detection framework. The network comprises a backbone network ResNet-50, a feature pyramid network FPN, a classification head Class_Head and a detection head Detection_Head; the FPN contains five feature layers P3, P4, P5, P6 and P7, and target boxes are predicted on each of the five layers.
In this example, the top layer P7 of the FPN is deleted and target-box prediction is performed only on the four feature layers P3, P4, P5 and P6, whose target-box prediction ranges are (0, 64], (64, 128], (128, 256] and (256, ∞) respectively.
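Under this modification, assigning a target to a pyramid level amounts to a range lookup; a minimal sketch (the range boundaries follow the text, the function itself is illustrative):

REGRESS_RANGES = [            # (level, lower, upper) after deleting P7
    ('P3', 0, 64),
    ('P4', 64, 128),
    ('P5', 128, 256),
    ('P6', 256, float('inf')),
]

def assign_level(target_scale):
    # Return the feature layer whose prediction range contains the target.
    for level, lo, hi in REGRESS_RANGES:
        if lo < target_scale <= hi:
            return level
    return 'P6'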
6.3 Network training
Feed the training data set after the small-target expansion of 6.1) into the single-stage full convolution target detection network constructed in 6.2) and train with a gradient descent algorithm until the network converges, obtaining the trained single-stage full convolution target detection network.
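A minimal PyTorch training loop matching 6.3; the optimizer hyperparameters (momentum, weight decay, epoch count) are assumptions, since the text only specifies gradient descent until convergence:

import torch

def train(model, loader, epochs=24, lr=0.01):
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=1e-4)
    for _ in range(epochs):
        for images, targets in loader:
            loss = model(images, targets)   # assumed: network returns total loss
            opt.zero_grad()
            loss.backward()                 # back-propagate the overall loss
            opt.step()                      # gradient descent update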
6.4 Network test and result evaluation
Feed the test data set of 6.1) into the network trained in 6.3) to obtain the network's detection results on all targets in the test data set.
The effect of the present invention is further explained below in combination with simulation experiments.
1. Simulation experiment conditions:
The hardware platform of the simulation experiment is: CPU Intel Xeon E5-2630 v4, 20 cores, 2.4 GHz main frequency, 64 GB memory; GPU NVIDIA GeForce GTX 1080Ti/PCIe/SSE2 with 20 GB of video memory.
The software platform of the simulation experiment is: Ubuntu 20.04 LTS operating system, CUDA 10.1, PyTorch 1.5.0 and OpenCV 4.4.0.
The data set used for the experiment was the public remote sensing image data set LEVIR.
2. Simulation experiment and results
Experiment 1: train and test the existing single-stage full convolution target detection network with the original data, and compute the mean average precision (mAP) and average recall from the test results.
Experiment 2: preprocess the original data with multi-feature fusion, train and test the existing single-stage full convolution target detection network with the preprocessed data, and compute mAP and average recall.
Experiment 3: preprocess the original data with small-target enhancement, train and test the same network with the preprocessed data, and compute mAP and average recall.
Experiment 4: preprocess the original data with both multi-feature fusion and small-target enhancement, train and test the same network with the preprocessed data, and compute mAP and average recall.
The results of the above experiments are shown in table 1.
TABLE 1. Comparison of simulation experiment results

Experimental setup    mAP     Recall
Experiment 1          90.3%   72.5%
Experiment 2          90.6%   72.9%
Experiment 3          91.1%   75.8%
Experiment 4          91.4%   76.1%
Comparing the results of Experiments 2 and 3 with Experiment 1 shows that preprocessing the data with multi-feature fusion or with small-target enhancement each effectively improves the detection performance of the existing single-stage full convolution target detection network.
Comparing the results of Experiment 4 with Experiments 2 and 3 shows that applying small-target enhancement and multi-feature fusion together yields the most pronounced improvement in the performance of the single-stage full convolution target detection network.

Claims (4)

1. A remote sensing target detection method based on single-stage full convolution network and multi-feature fusion is characterized by comprising the following steps:
(1) Respectively extract the mathematical morphology features, linear scale-space features and nonlinear scale-space features of the optical remote sensing image:
1a) Perform opening and closing operations on the original image to obtain 2n initial feature maps, then add all initial feature maps pixel by pixel and take the average to obtain the mathematical morphology feature map of the original image, where n is the number of opening (or closing) operations;
1b) Filter the original image with a Gaussian filter and with the Sobel edge extraction operator, obtaining a three-channel Gaussian blur feature map and four single-channel local edge feature maps; sum the four single-channel local edge feature maps pixel by pixel and average them to obtain an overall edge feature map; fuse each channel component of the three-channel Gaussian blur feature map with the overall edge feature map pixel by pixel to obtain a linear multi-scale spatial feature map;
1c) Convert the original optical remote sensing image into a single-channel grayscale image and apply a two-dimensional single-level wavelet transform to it, obtaining four single-channel subgraphs: a low-frequency component map and horizontal, vertical and diagonal high-frequency component maps; discard the low-frequency component subgraph and concatenate the remaining three high-frequency component subgraphs along the channel dimension to obtain a nonlinear multi-scale spatial feature map;
(2) Constructing a fusion feature map:
2a) Fuse the mathematical morphology feature map and the linear multi-scale spatial feature map pixel by pixel with weights α and β, where α + β = 0.5, to obtain an initial fused image;
2b) Multiply the original image by a coefficient of 0.5, sum it pixel by pixel with the initial fused image, then add the nonlinear multi-scale spatial feature map pixel by pixel to obtain the final feature-fused image;
(3) Data set partitioning and small target expansion:
3a) For all optical remote sensing images, compute from the annotation information the maximum and minimum target areas over all targets to be detected, denoted S_max and S_min, and set a threshold S derived from S_max and S_min (the threshold formula is given as an image in the original document);
3b) Randomly split all optical remote sensing images into a training data set and a test data set at a ratio of 8:2;
3c) For each original image in the training set, traverse all targets to be detected in the original image: if a target's area S_i is less than the threshold S, select a target-free position in the original image and copy the minimal square region containing the target to the selected position, obtaining a new training image; otherwise leave the original image unchanged; after traversal a new training data set is obtained;
the method for selecting a non-target position in an original image is realized as follows:
3c1) Randomly select a position (x, y) in the original image and form the candidate annotation box [x, y, x + w_i, y + h_i] for the new position, where w_i and h_i are the width and height of the new target box;
3c2) Check whether the new position overlaps any existing annotation box in the current image; if it does not overlap, select this position for the subsequent operation, otherwise return to 3c1);
3c3) End the position selection when a position is successfully selected or the random selection of 3c1) has been repeated 100 times;
(4) Training and detecting by using a deep learning-based target detection network:
4a) Respectively extracting and fusing the characteristics of the test data set and the new training data set according to the operations (1) and (2) to obtain a training data set and a test data set after the characteristics are fused;
4b) Training the existing single-stage full convolution target detection network by using a training data set after feature fusion through a gradient descent algorithm until the overall loss of the network is not changed any more, and obtaining a trained target detection network;
4c) And inputting the test data set into a trained target detection network to obtain a target detection result of the optical remote sensing image.
2. The method according to claim 1, wherein the opening and closing operations in 1a) are performed on the original optical remote sensing image with convolution kernels of sizes 3 × 3 and 5 × 5: erosion followed by dilation for the opening operation and dilation followed by erosion for the closing operation, obtaining two opening feature maps and two closing feature maps.
3. The method of claim 1, wherein the filtering of the original image with the Sobel edge extraction operator in 1b) convolves four convolution kernels of size 3 × 3 with the original image respectively to obtain four local edge feature maps, the directions of the four convolution kernels being 0°, 45°, 90° and 135° respectively.
4. The method of claim 1, wherein the training of the existing single-stage full convolution target detection network by the gradient descent algorithm in 4 b) is implemented as follows:
4b1) Delete the topmost feature layer P7 of the FPN in the single-stage full convolution target detection network, retaining the feature layers P3, P4, P5 and P6;
4b2) Feed the training data into the network for forward propagation and perform target-box regression on the P3, P4, P5 and P6 feature layers retained in 4b1) to obtain target prediction results, the predicted target size ranges on the four feature layers being (0, 64], (64, 128], (128, 256] and (256, ∞) respectively;
4b3) Compute the overall loss between the prediction results of 4b2) and the ground-truth labels, then back-propagate and update the network parameters;
4b4) Repeat 4b2)-4b3) until the network converges.
CN202110442872.3A 2021-04-23 2021-04-23 Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion Active CN113177456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110442872.3A CN113177456B (en) 2021-04-23 2021-04-23 Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110442872.3A CN113177456B (en) 2021-04-23 2021-04-23 Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion

Publications (2)

Publication Number Publication Date
CN113177456A (en) 2021-07-27
CN113177456B (en) 2023-04-07

Family

ID=76924464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110442872.3A Active CN113177456B (en) 2021-04-23 2021-04-23 Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion

Country Status (1)

Country Link
CN (1) CN113177456B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610838A (en) * 2021-08-25 2021-11-05 华北电力大学(保定) Bolt defect data set expansion method
CN114155208B (en) * 2021-11-15 2022-07-08 中国科学院深圳先进技术研究院 Atrial fibrillation assessment method and device based on deep learning
CN116168302B (en) * 2023-04-25 2023-07-14 耕宇牧星(北京)空间科技有限公司 Remote sensing image rock vein extraction method based on multi-scale residual error fusion network
CN116823838B (en) * 2023-08-31 2023-11-14 武汉理工大学三亚科教创新园 Ocean ship detection method and system with Gaussian prior label distribution and characteristic decoupling


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629378A (en) * 2012-03-01 2012-08-08 西安电子科技大学 Remote sensing image change detection method based on multi-feature fusion
CN107092871A (en) * 2017-04-06 2017-08-25 重庆市地理信息中心 Remote sensing image building detection method based on multiple dimensioned multiple features fusion
CN107292339A (en) * 2017-06-16 2017-10-24 重庆大学 The unmanned plane low altitude remote sensing image high score Geomorphological Classification method of feature based fusion
CN108154192A (en) * 2018-01-12 2018-06-12 西安电子科技大学 High Resolution SAR terrain classification method based on multiple dimensioned convolution and Fusion Features
CN108537238A (en) * 2018-04-13 2018-09-14 崔植源 A kind of classification of remote-sensing images and search method
CN109325395A (en) * 2018-04-28 2019-02-12 二十世纪空间技术应用股份有限公司 The recognition methods of image, convolutional neural networks model training method and device
CN109214439A (en) * 2018-08-22 2019-01-15 电子科技大学 A kind of infrared image icing River detection method based on multi-feature fusion
CN109271928A (en) * 2018-09-14 2019-01-25 武汉大学 A kind of road network automatic update method based on the fusion of vector road network with the verifying of high score remote sensing image
CN112132006A (en) * 2020-09-21 2020-12-25 西南交通大学 Intelligent forest land and building extraction method for cultivated land protection
CN112395958A (en) * 2020-10-29 2021-02-23 中国地质大学(武汉) Remote sensing image small target detection method based on four-scale depth and shallow layer feature fusion
CN112329677A (en) * 2020-11-12 2021-02-05 北京环境特性研究所 Remote sensing image river target detection method and device based on feature fusion
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion
CN112580439A (en) * 2020-12-01 2021-03-30 中国船舶重工集团公司第七0九研究所 Method and system for detecting large-format remote sensing image ship target under small sample condition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Fusion Framework Based on Sparse Gaussian–Wigner Prediction for Vehicle Localization Using GDOP of GPS Satellites; Vincent Havyarimana et al.; IEEE Transactions on Intelligent Transportation Systems; Feb. 2020; vol. 21, no. 2; pp. 680-689. *
Change Detection of High-Resolution Remote Sensing Images Based on Adaptive Fusion of Multiple Features; GuangHui Wang et al.; The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 2018; pp. 1689-1694. *
Multi-Feature Fusion for Weak Target Detection on Sea-Surface Based on FAR Controllable Deep Forest Model; Jiahuan Zhang et al.; Remote Sensing; Feb. 23, 2021; vol. 13, no. 812; pp. 1-33. *
Aircraft target detection in remote sensing images based on a multi-scale fusion feature convolutional neural network; Yao Qunli et al.; Acta Geodaetica et Cartographica Sinica; Oct. 2019; vol. 48, no. 10; pp. 1266-1274. *
River detection in remote sensing images based on multi-feature fusion and soft voting; Zhang Qingchun et al.; Acta Optica Sinica; Jun. 2018; vol. 38, no. 6; pp. 1-7. *

Also Published As

Publication number Publication date
CN113177456A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN113177456B (en) Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN108596055B (en) Airport target detection method of high-resolution remote sensing image under complex background
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN103049763B (en) Context-constraint-based target identification method
CN110599537A (en) Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system
CN112464911A (en) Improved YOLOv 3-tiny-based traffic sign detection and identification method
CN108564085B (en) Method for automatically reading of pointer type instrument
CN107808138B (en) Communication signal identification method based on FasterR-CNN
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
CN104866868A (en) Metal coin identification method based on deep neural network and apparatus thereof
CN110659601B (en) Depth full convolution network remote sensing image dense vehicle detection method based on central point
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN109345559B (en) Moving target tracking method based on sample expansion and depth classification network
CN112396619A (en) Small particle segmentation method based on semantic segmentation and internally complex composition
Gooda et al. Automatic detection of road cracks using EfficientNet with residual U-net-based segmentation and YOLOv5-based detection
CN117058069A (en) Automatic detection method for apparent diseases of pavement in panoramic image
CN111652287A (en) Hand-drawing cross pentagon classification method for AD (analog-to-digital) scale based on convolution depth neural network
CN114022787B (en) Machine library identification method based on large-scale remote sensing image
CN111046861B (en) Method for identifying infrared image, method for constructing identification model and application
CN113947723A (en) High-resolution remote sensing scene target detection method based on size balance FCOS
CN112465821A (en) Multi-scale pest image detection method based on boundary key point perception
CN113077484A (en) Image instance segmentation method
CN111914751A (en) Image crowd density identification and detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant