CN111582104A - Semantic segmentation method and device for remote sensing image - Google Patents
- Publication number
- CN111582104A (application number CN202010350688.1A)
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- layer
- sensing image
- segmented
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The invention relates to a method and a device for semantic segmentation of remote sensing images, comprising the following steps: acquiring a remote sensing image to be segmented; inputting the remote sensing image to be segmented into a pre-established self-attention multi-scale feature aggregation network and obtaining the initial prediction result output by that network; and up-sampling the initial prediction result to the size of the remote sensing image to be segmented to obtain the final prediction result. The technical scheme provided by the invention effectively strengthens the correlation between modal features and spatial information, improves the perception of context information for multi-scale targets, and yields more refined semantic annotation results.
Description
Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a method and a device for semantic segmentation of remote sensing images.
Background
In recent years, as deep learning research in image processing has deepened, deep-learning-based image processing methods, particularly fully convolutional neural networks, have developed rapidly in the remote sensing field. In remote sensing image processing, semantic segmentation obtains pixel-level class labels for targets and has broad application prospects in land planning, wartime reconnaissance, environmental monitoring and other fields. However, deep-learning-based semantic segmentation is data-driven and requires a large amount of accurately labeled data. Traditional manual labeling is costly and inefficient, so improving labeling efficiency and accuracy is particularly important.
Existing semantic annotation methods are sensitive to the noise introduced by complex backgrounds in remote sensing scenes and have poor semantic perception of multi-scale ground-feature elements. Atrous (dilated) convolution is commonly used to enlarge the receptive field of a convolutional neural network; however, existing multi-scale atrous structures offer only a limited range and variety of receptive-field sizes, cannot label the complex ground-feature elements of high-resolution remote sensing scenes, and struggle to capture semantic information when multi-scale elements produce large scale differences.
Another way to enhance semantic annotation in remote sensing scenes is to exploit the rich features of multi-modal data. Existing methods, however, simply concatenate or add multi-modal images or features, leaving feature learning entirely to the convolutional neural network. They ignore the inherent differences in data structure and feature complexity between modalities, easily introduce redundant features that degrade labeling performance, and inflate network size and parameter count.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a device for semantic segmentation of remote sensing images that effectively strengthen the correlation between modal features and spatial information, improve the perception of context information for multi-scale targets, and yield more refined semantic annotation results.
The purpose of the invention is realized by adopting the following technical scheme:
in a method of semantic segmentation of a remote sensing image, the improvement comprising:
acquiring a remote sensing image to be segmented;
inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
Preferably, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, carrying out artificial semantic annotation on remote sensing images in a remote sensing image data set, and dividing the remote sensing image data set into a training set, a verification set and a test set;
step 2, performing data enhancement on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 slices;
and 4, training the pre-established self-attention multi-scale feature aggregation initial network by utilizing the training set, the verification set and the test set.
Further, the pre-established self-attention multi-scale feature aggregation initial network comprises: the system comprises a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting the characteristics of the optical image in the remote sensing image;
the VGG neural network is used for extracting the characteristics of digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for performing feature fusion on the features of the optical image and the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map of the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Further, the deep convolutional neural network is an improved Xception network structure, and the improvement process includes: reducing the repeated middle cycle groups of the Xception network structure to 6 groups, removing the last fully connected layer of the Xception network structure, replacing all max pooling layers in the Xception network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final cycle group of the Xception network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the VGG neural network is an improved VGG16 network structure, and the improvement process thereof comprises:
replacing all convolutional layers of the VGG16 network structure with depthwise separable convolution layers, removing the last fully connected layer of the VGG16 network structure, replacing all max pooling layers of the VGG16 network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the self-attention modal calibration module comprises, connected in sequence: a first merge connection layer, a first global max pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer.
Further, the dense multi-scale context aggregation module comprises: a 1x1 convolutional layer, a first 3x3 convolutional layer, a second 3x3 convolutional layer, a third 3x3 convolutional layer, and a second merge-connection layer;
the output end of the 1x1 convolutional layer is connected to the input ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer respectively, and the output ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer are each connected to the input end of the second merge-connection layer.
Further, the self-attention space calibration module comprises: a third merge connection layer, a second global max pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
performing, in sequence, random horizontal and vertical flips on the training set each with probability 0.5, random image rotation by an angle of -20° to 20° with a 1° step, random rotation by the fixed angles 90°, 180° and 270°, and random scaling of the image size by a factor of 0.25 to 4.
Based on the same invention concept, the invention also provides a remote sensing image semantic segmentation device, and the improvement is that the device comprises:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
Compared with the closest prior art, the invention has the following beneficial effects:
the invention relates to a method and a device for semantic segmentation of remote sensing images, comprising the following steps: acquiring a remote sensing image to be segmented; inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network; up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented; the relevance of modal characteristics and spatial information can be effectively enhanced, the downward information perception capability of a multi-scale target is improved, and a more precise semantic annotation result is obtained;
the semantic features of the multi-modal data are extracted by using a two-way network in the pre-established self-attention multi-scale feature aggregation network, and the parameter efficiency is improved, the model complexity is reduced, and the generalization capability is improved by using an asymmetric network structure while the annotation precision is improved by using rich modal information.
The self-attention modal calibration module in the pre-established self-attention multi-scale feature aggregation network performs explicit global semantic modeling and association of the features of different modalities; the self-attention mechanism suppresses redundant features and highlights useful ones, and the calibrated modal fusion features improve the accuracy of the labeling result.
The dense multi-scale context aggregation module in the pre-established self-attention multi-scale feature aggregation network uses convolutions with several dilation rates to widen the network's perception range of contextual semantic information, while dense connections make the effective features of the multi-scale feature map denser, which helps refine the labeling result.
The self-attention space calibration module in the pre-established self-attention multi-scale feature aggregation network calibrates high-level features that have lost a large amount of spatial information by introducing spatially rich bottom-level features of the two modalities; after dynamic weighting by the self-attention mechanism, the edge information of large-scale ground-feature elements and some small-scale ground-feature elements can be recovered, improving fine-labeling accuracy.
Drawings
FIG. 1 is a flow chart of a method for semantic segmentation of remote sensing images provided by the present invention;
FIG. 2 is a schematic structural diagram of a self-attention mode calibration module according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a dense multi-scale context aggregation module in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a self-attention space calibration module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a semantic segmentation apparatus for remote sensing images provided by the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the problems of multi-modal data fusion, difficult multi-scale semantic extraction of a remote sensing scene and the like in the prior art, the invention provides a remote sensing image semantic segmentation method, as shown in figure 1, which comprises the following steps:
101, acquiring a remote sensing image to be segmented;
102, inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
103, up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
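The up-sampling in step 103 can be illustrated with a minimal sketch. The patent does not name the interpolation method, so bilinear interpolation (a common choice for restoring prediction maps to input size, assumed here) is used, operating on a plain 2-D list for self-containment:

```python
def bilinear_upsample(grid, out_h, out_w):
    """Bilinearly resize a 2-D list `grid` (H x W) to out_h x out_w.

    Uses align-corners-style coordinate mapping; bilinear interpolation
    is an assumption, as the patent only states the target size.
    """
    in_h, in_w = len(grid), len(grid[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        y = i * (in_h - 1) / max(out_h - 1, 1)   # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / max(out_w - 1, 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bot = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            out[i][j] = top * (1 - dy) + bot * dy
    return out
```

In practice the same interpolation is applied per class channel of the initial prediction before taking the per-pixel argmax.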
Specifically, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, carrying out artificial semantic annotation on remote sensing images in a remote sensing image data set, and dividing the remote sensing image data set into a training set, a verification set and a test set;
step 2, performing data enhancement on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 slices;
and 4, training the pre-established self-attention multi-scale feature aggregation initial network by utilizing the training set, the verification set and the test set.
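The 513×513 slicing in step 3 can be sketched as a tiling of each image. The patent states only the slice size, so the covering strategy below (shifting the last row/column of tiles back inside the image, producing overlap when the size is not a multiple of 513) is an assumption:

```python
def tile_coords(height, width, tile=513):
    """Return the top-left (y, x) corners of tile x tile slices
    that cover an image; edge tiles are shifted inward so every
    slice lies fully inside the image (an assumed strategy)."""
    def starts(size):
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, tile))
        s.append(size - tile)  # final tile flush with the border
        return s
    return [(y, x) for y in starts(height) for x in starts(width)]
```

Each coordinate pair then indexes one 513×513 crop of the image and its label map.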
Wherein the pre-established self-attention multi-scale feature aggregation initial network comprises: the system comprises a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting the characteristics of the optical image in the remote sensing image;
the VGG neural network is used for extracting the characteristics of digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for performing feature fusion on the features of the optical image and the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map of the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Wherein, the deep convolutional neural network is an improved Xception network structure, and the improvement process includes: reducing the repeated middle cycle groups of the Xception network structure to 6 groups, removing the last fully connected layer of the Xception network structure, replacing all max pooling layers in the Xception network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final cycle group of the Xception network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
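The effect of the dilation rates 1, 3 and 5 at the end of the backbone can be checked with standard convolution arithmetic: a stride-1 k×k convolution with dilation d adds d·(k-1) to the receptive field, so stacking the three 3×3 layers enlarges it substantially without adding parameters. A minimal sketch (the kernel size and rates come from the description above; the formula itself is textbook receptive-field arithmetic):

```python
def stacked_receptive_field(kernel=3, dilations=(1, 3, 5)):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each k x k conv with dilation d widens the receptive field
    by d * (k - 1); a single undilated 3x3 stack of the same depth
    would reach only 7.
    """
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)
    return rf
```

With rates (1, 3, 5) the stack sees a 19-pixel extent versus 7 for three plain 3×3 layers, which is the multi-scale context gain the improvement targets.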
The VGG neural network is an improved VGG16 network structure, and the improvement process comprises the following steps:
replacing all convolutional layers of the VGG16 network structure with depthwise separable convolution layers, removing the last fully connected layer of the VGG16 network structure, replacing all max pooling layers of the VGG16 network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
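The motivation for the depthwise separable replacement can be quantified: a standard k×k convolution with C_in inputs and C_out outputs needs k·k·C_in·C_out weights, while its depthwise separable counterpart (a per-channel k×k convolution followed by a 1×1 pointwise mix) needs only k·k·C_in + C_in·C_out. A small sketch; the channel counts used in the comparison are illustrative, not taken from the patent:

```python
def conv_params(k, c_in, c_out):
    # standard k x k convolution, bias omitted
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    # depthwise k x k per input channel + 1x1 pointwise mixing
    return k * k * c_in + c_in * c_out
```

For example, a 3×3 layer with 64 inputs and 128 outputs drops from 73,728 weights to 8,768, roughly an 8.4× reduction, which is why the asymmetric two-branch design keeps the DSM branch lightweight.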
In an embodiment of the present invention, as shown in fig. 2, the self-attention modal calibration module comprises, connected in sequence: a first merge connection layer, a first global max pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer.
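The chain above (concat → global max pool → FC → ReLU → FC → Sigmoid → channel reweighting) can be sketched in pure Python, in the spirit of squeeze-and-excitation channel attention. The toy weight matrices and feature values below are assumptions for illustration, not the trained parameters of the patent's network:

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def dense(v, w):
    # w is an (out x in) weight matrix; bias omitted for brevity
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]

def modal_calibration(feats_a, feats_b, w1, w2):
    """Channel-gated fusion of two modal feature sets.

    feats_*: list of channels, each channel a flat list of values.
    w1, w2: fully connected weights (toy shapes, assumed).
    """
    fused = feats_a + feats_b                 # merge connection (concat)
    pooled = [max(ch) for ch in fused]        # global max pooling
    gate = sigmoid(dense(relu(dense(pooled, w1)), w2))
    # channel-wise reweighting: redundant channels get small gates
    return [[g * x for x in ch] for g, ch in zip(gate, fused)]
```

The Sigmoid output acts as a per-channel gate on the concatenated optical and DSM features, which is how the module weakens redundant channels and highlights useful ones.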
As shown in fig. 3, the dense multi-scale context aggregation module includes: a 1x1 convolutional layer, a first 3x3 convolutional layer, a second 3x3 convolutional layer, a third 3x3 convolutional layer, and a second merge-connection layer;
the output end of the 1x1 convolutional layer is connected to the input ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer respectively, and the output ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer are each connected to the input end of the second merge-connection layer.
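The parallel-branch topology can be illustrated in one dimension: one shared input feeds several same-size kernels at different dilation rates, and the branch outputs are concatenated. The 3-tap all-ones kernel, zero "same" padding, and the rates (1, 3, 5) below are assumptions for illustration (the patent does not state this module's own rates; 1, 3, 5 mirror the backbone's atrous layers):

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1-D dilated convolution with zero padding."""
    n, k = len(signal), len(kernel)
    half = (k - 1) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation   # dilated tap position
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out

def dense_aggregation(signal, dilations=(1, 3, 5)):
    """One shared input, several dilation-rate branches, outputs
    kept separate here where the module would concatenate them."""
    kernel = [1.0, 1.0, 1.0]   # toy 3-tap kernel (assumption)
    return [dilated_conv1d(signal, kernel, d) for d in dilations]
```

A unit impulse makes the multi-scale behavior visible: each branch spreads the same feature over a different spatial extent before the merge connection fuses them.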
As shown in fig. 4, the self-attention space calibration module comprises: a third merge connection layer, a second global max pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
performing, in sequence, random horizontal and vertical flips on the training set each with probability 0.5, random image rotation by an angle of -20° to 20° with a 1° step, random rotation by the fixed angles 90°, 180° and 270°, and random scaling of the image size by a factor of 0.25 to 4.
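The augmentation scheme can be sketched as sampling one configuration per training image. Treating "no fixed rotation" as an extra choice of 0° is an assumption (the description lists only 90°, 180° and 270°), as are the dictionary key names:

```python
import random

def sample_augmentation(rng):
    """Sample one augmentation configuration per the scheme above.

    Flips each with probability 0.5, a free rotation in [-20, 20]
    degrees at a 1-degree step, a fixed rotation from
    {0, 90, 180, 270} (0 meaning 'none', an assumption), and a
    scale factor in [0.25, 4].
    """
    return {
        "hflip": rng.random() < 0.5,
        "vflip": rng.random() < 0.5,
        "rotate_deg": rng.randrange(-20, 21),        # 1-degree step pitch
        "fixed_rotate": rng.choice([0, 90, 180, 270]),
        "scale": rng.uniform(0.25, 4.0),
    }

# example draw with a seeded generator for reproducibility
cfg = sample_augmentation(random.Random(0))
```

The sampled configuration would then be applied to the image and its label map together so pixel-level annotations stay aligned.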
Based on the same inventive concept, the invention also provides a semantic segmentation device for remote sensing images, as shown in fig. 5, the device comprises:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
Preferably, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, carrying out artificial semantic annotation on remote sensing images in a remote sensing image data set, and dividing the remote sensing image data set into a training set, a verification set and a test set;
step 2, performing data enhancement on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 slices;
and 4, training the pre-established self-attention multi-scale feature aggregation initial network by utilizing the training set, the verification set and the test set.
Further, the pre-established self-attention multi-scale feature aggregation initial network comprises: the system comprises a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting the characteristics of the optical image in the remote sensing image;
the VGG neural network is used for extracting the characteristics of digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for performing feature fusion on the features of the optical image and the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map of the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Further, the deep convolutional neural network is an improved Xception network structure, and the improvement process includes: reducing the repeated middle cycle groups of the Xception network structure to 6 groups, removing the last fully connected layer of the Xception network structure, replacing all max pooling layers in the Xception network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final cycle group of the Xception network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the VGG neural network is an improved VGG16 network structure, and the improvement process thereof comprises:
replacing all convolutional layers of the VGG16 network structure with depthwise separable convolution layers, removing the last fully connected layer of the VGG16 network structure, replacing all max pooling layers of the VGG16 network structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 network structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the self-attention modal calibration module comprises, connected in sequence: a first merge connection layer, a first global max pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer.
Further, the dense multi-scale context aggregation module comprises: a 1x1 convolutional layer, a first 3x3 convolutional layer, a second 3x3 convolutional layer, a third 3x3 convolutional layer, and a second merge-connection layer;
the output end of the 1x1 convolutional layer is connected to the input ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer respectively, and the output ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer are each connected to the input end of the second merge-connection layer.
Further, the self-attention space calibration module comprises: a third merge connection layer, a second global max pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
performing, in sequence, random horizontal and vertical flips on the training set each with probability 0.5, random image rotation by an angle of -20° to 20° with a 1° step, random rotation by the fixed angles 90°, 180° and 270°, and random scaling of the image size by a factor of 0.25 to 4.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from its spirit and scope, which are defined by the claims.
Claims (10)
1. A semantic segmentation method for remote sensing images is characterized by comprising the following steps:
acquiring a remote sensing image to be segmented;
inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
2. The method of claim 1, wherein the pre-established self-attention multi-scale feature aggregation network establishment procedure comprises:
step 1, manually annotating the remote sensing images in a remote sensing image data set with semantic labels, and dividing the remote sensing image data set into a training set, a verification set and a test set;
step 2, performing data enhancement on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513x513 patches;
and 4, training the pre-established self-attention multi-scale feature aggregation initial network by utilizing the training set, the verification set and the test set.
3. The method of claim 2, wherein the pre-established self-attention multi-scale feature aggregation initial network comprises: the system comprises a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting the characteristics of the optical image in the remote sensing image;
the VGG neural network is used for extracting the characteristics of digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for performing feature fusion on the features of the optical image and the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map of the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the characteristics of the optical image, the characteristics of the digital surface model data and the multi-scale fusion characteristic diagram.
4. The method of claim 3, wherein the deep convolutional neural network is a modified Xception network structure, the modification comprising: reducing the repeated structure of the middle cycle group of the Xception network structure to 6 groups, removing the last full connection layer of the Xception network structure, replacing all the maximum pooling layers in the Xception network structure with depth separable convolutional layers with a stride of 2, and replacing the last three depth separable convolutional layers of the cycle group at the end of the Xception network structure with atrous (perforated) convolutional layers with hole rates of 1, 3 and 5 respectively.
5. The method of claim 3, wherein the VGG neural network is a modified VGG16 network structure, the modification comprising:
replacing all convolutional layers of the VGG16 network structure with depth separable convolutional layers, removing the last full connection layer of the VGG16 network structure, replacing all the maximum pooling layers of the VGG16 network structure with depth separable convolutional layers with a stride of 2, and replacing the last three depth separable convolutional layers of the VGG16 network structure with atrous (perforated) convolutional layers with hole rates of 1, 3 and 5 respectively.
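The two substitutions made in claims 4 and 5 have simple arithmetic behind them, sketched below: an atrous ("with holes") kernel enlarges the receptive field without adding parameters, and a depth separable convolution replaces one dense kernel with a depthwise plus a pointwise stage. The channel counts used in the demo call are illustrative, not taken from the patent.

```python
def dilated_receptive_field(kernel=3, rates=(1, 3, 5)):
    """Per-side receptive field of a 3x3 convolution at the hole rates
    named in claims 4 and 5: a dilated kernel spans rate*(k-1)+1 pixels."""
    return {r: r * (kernel - 1) + 1 for r in rates}

def separable_params(c_in, c_out, k=3):
    """Parameter count of a standard convolution versus the depth
    separable convolution substituted for it (depthwise + pointwise)."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable

print(dilated_receptive_field())   # {1: 3, 3: 7, 5: 11}
print(separable_params(256, 256))  # (589824, 67840)
```

Stacking the three rates 1, 3 and 5 thus covers context from 3 to 11 pixels per layer at roughly one ninth of the standard parameter cost.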
6. The method of claim 3, wherein the self-attention modal calibration module comprises: a first merge-connection layer, a first global maximum pooling layer, a first full-connection layer, a first Relu function layer, a second full-connection layer and a first Sigmoid function layer, which are connected in sequence.
7. The method of claim 3, wherein the dense multi-scale context aggregation module comprises: a 1x1 convolutional layer, a first 3x3 convolutional layer, a second 3x3 convolutional layer, a third 3x3 convolutional layer, and a second merge-connection layer;
the output end of the 1x1 convolutional layer is connected to the input ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer respectively, and the output ends of the first 3x3 convolutional layer, the second 3x3 convolutional layer and the third 3x3 convolutional layer are respectively connected to the input end of the second merge-connection layer.
8. The method of claim 3, wherein the self-attention space calibration module comprises: a third merge-connection layer, a second global maximum pooling layer, a third full-connection layer, a second Relu function layer, a fourth full-connection layer and a second Sigmoid function layer.
9. The method of claim 2, wherein step 2 comprises:
sequentially performing, on the training set: random horizontal and vertical flipping, each with a probability of 0.5; random image rotation by an angle of -20 to 20 degrees in steps of 1 degree; random fixed-angle rotation by 90, 180 or 270 degrees; and random scaling of the image size by a factor of 0.25 to 4.
10. A device for semantic segmentation of remote sensing images, the device comprising:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010350688.1A CN111582104B (en) | 2020-04-28 | 2020-04-28 | Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111582104A true CN111582104A (en) | 2020-08-25 |
CN111582104B CN111582104B (en) | 2021-08-06 |
Family
ID=72120069
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010350688.1A Active CN111582104B (en) | 2020-04-28 | 2020-04-28 | Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111582104B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method |
CN108537238A (en) * | 2018-04-13 | 2018-09-14 | 崔植源 | A kind of classification of remote-sensing images and search method |
CN109086668A (en) * | 2018-07-02 | 2018-12-25 | 电子科技大学 | Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network |
CN110866526A (en) * | 2018-08-28 | 2020-03-06 | 北京三星通信技术研究有限公司 | Image segmentation method, electronic device and computer-readable storage medium |
CN109255334A (en) * | 2018-09-27 | 2019-01-22 | 中国电子科技集团公司第五十四研究所 | Remote sensing image terrain classification method based on deep learning semantic segmentation network |
CN110210608A (en) * | 2019-06-05 | 2019-09-06 | 国家广播电视总局广播电视科学研究院 | The enhancement method of low-illumination image merged based on attention mechanism and multi-level features |
CN110197182A (en) * | 2019-06-11 | 2019-09-03 | 中国电子科技集团公司第五十四研究所 | Remote sensing image semantic segmentation method based on contextual information and attention mechanism |
CN110232696A (en) * | 2019-06-20 | 2019-09-13 | 腾讯科技(深圳)有限公司 | A kind of method of image region segmentation, the method and device of model training |
Non-Patent Citations (3)
Title |
---|
MAOKE YANG et al.: "DenseASPP for Semantic Segmentation in Street Scenes", IEEE *
ZHIYING CAO et al.: "End-to-End DSM Fusion Networks for Semantic Segmentation in High-Resolution Aerial Images", IEEE GEOSCIENCE AND REMOTE SENSING LETTERS *
YU Shuai et al.: "Remote sensing image segmentation method based on multi-level channel attention", Laser & Optoelectronics Progress *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016511A (en) * | 2020-09-08 | 2020-12-01 | 重庆市地理信息和遥感应用中心 | Remote sensing image blue top room detection method based on large-scale depth convolution neural network |
CN112529081A (en) * | 2020-12-11 | 2021-03-19 | 大连大学 | Real-time semantic segmentation method based on efficient attention calibration |
CN112529081B (en) * | 2020-12-11 | 2023-11-07 | 大连大学 | Real-time semantic segmentation method based on efficient attention calibration |
CN112598003A (en) * | 2020-12-18 | 2021-04-02 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN112598003B (en) * | 2020-12-18 | 2022-11-25 | 燕山大学 | Real-time semantic segmentation method based on data expansion and full-supervision preprocessing |
CN113269787A (en) * | 2021-05-20 | 2021-08-17 | 浙江科技学院 | Remote sensing image semantic segmentation method based on gating fusion |
CN113723411A (en) * | 2021-06-18 | 2021-11-30 | 湖北工业大学 | Feature extraction method and segmentation system for semantic segmentation of remote sensing image |
CN113723411B (en) * | 2021-06-18 | 2023-06-27 | 湖北工业大学 | Feature extraction method and segmentation system for semantic segmentation of remote sensing image |
CN114332636A (en) * | 2022-03-14 | 2022-04-12 | 北京化工大学 | Polarized SAR building region extraction method, equipment and medium |
CN115601605A (en) * | 2022-12-13 | 2023-01-13 | 齐鲁空天信息研究院(Cn) | Surface feature classification method, device, equipment, medium and computer program product |
CN117523410A (en) * | 2023-11-10 | 2024-02-06 | 中国科学院空天信息创新研究院 | Image processing and construction method based on multi-terminal collaborative perception distributed large model |
Also Published As
Publication number | Publication date |
---|---|
CN111582104B (en) | 2021-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111582104B (en) | Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network | |
CN108647585B (en) | Traffic identifier detection method based on multi-scale circulation attention network | |
CN111047551B (en) | Remote sensing image change detection method and system based on U-net improved algorithm | |
CN111127493A (en) | Remote sensing image semantic segmentation method based on attention multi-scale feature fusion | |
CN113011427A (en) | Remote sensing image semantic segmentation method based on self-supervision contrast learning | |
CN114117614B (en) | Automatic generation method and system for building elevation texture | |
CN111489396A (en) | Determining camera parameters using critical edge detection neural networks and geometric models | |
CN111242127A (en) | Vehicle detection method with granularity level multi-scale characteristics based on asymmetric convolution | |
CN111626176A (en) | Ground object target detection method and system of remote sensing image | |
CN113313094B (en) | Vehicle-mounted image target detection method and system based on convolutional neural network | |
CN112084923A (en) | Semantic segmentation method for remote sensing image, storage medium and computing device | |
CN115908772A (en) | Target detection method and system based on Transformer and fusion attention mechanism | |
CN110008900A (en) | A kind of visible remote sensing image candidate target extracting method by region to target | |
CN114972947B (en) | Depth scene text detection method and device based on fuzzy semantic modeling | |
CN114926734B (en) | Solid waste detection device and method based on feature aggregation and attention fusion | |
CN112686184A (en) | Remote sensing house change detection method based on neural network | |
CN113591614B (en) | Remote sensing image road extraction method based on close-proximity spatial feature learning | |
CN113963333A (en) | Traffic sign board detection method based on improved YOLOF model | |
CN116797830A (en) | Image risk classification method and device based on YOLOv7 | |
CN116416136A (en) | Data amplification method for ship target detection of visible light remote sensing image and electronic equipment | |
CN112488015B (en) | Intelligent building site-oriented target detection method and system | |
Zhang et al. | Feature enhanced centernet for object detection in remote sensing images | |
Ju et al. | Multiscale feature fusion network for automatic port segmentation from remote sensing images | |
CN114119971B (en) | Semantic segmentation method, system and electronic equipment | |
CN118379731A (en) | Shale microscopic substance intelligent detection method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||