CN111582104A - Semantic segmentation method and device for remote sensing image


Info

Publication number
CN111582104A
Authority
CN
China
Prior art keywords
remote sensing
layer
sensing image
segmented
attention
Prior art date
2020-04-28
Legal status
Granted
Application number
CN202010350688.1A
Other languages
Chinese (zh)
Other versions
CN111582104B (en)
Inventor
付琨
刁文辉
孙显
代贵杰
牛瑞刚
闫梦龙
卢宛萱
郭荣鑫
Current Assignee
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date
Filing date
2020-04-28
Publication date
2020-08-25
Application filed by Aerospace Information Research Institute of CAS
Priority to CN202010350688.1A
Publication of CN111582104A
Application granted
Publication of CN111582104B
Legal status: Active

Classifications

    • G06V20/13: Scenes; scene-specific elements; terrestrial scenes; satellite images
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06V10/267: Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds


Abstract

The invention relates to a method and a device for semantic segmentation of remote sensing images, comprising the following steps: acquiring a remote sensing image to be segmented; inputting the remote sensing image to be segmented into a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the network; and up-sampling the initial prediction result to the image size of the remote sensing image to be segmented to obtain the final prediction result. The technical scheme provided by the invention can effectively enhance the correlation between modal features and spatial information, improve the perception of context information for multi-scale targets, and obtain a more refined semantic annotation result.

Description

Semantic segmentation method and device for remote sensing image
Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a method and a device for semantic segmentation of remote sensing images.
Background
In recent years, as deep learning research in the field of image processing has deepened, image processing methods based on deep learning, in particular fully convolutional neural networks, have developed rapidly in the remote sensing field. In remote sensing image processing, semantic segmentation provides pixel-level category annotation of targets and has broad application prospects in land planning, wartime reconnaissance, environmental monitoring and other fields. However, semantic segmentation methods based on deep learning are data-driven and require a large amount of accurately labeled data. Traditional manual annotation is costly and inefficient, so improving the efficiency and accuracy of data annotation is particularly important.
Existing semantic annotation methods are sensitive to the noise introduced by complex backgrounds in remote sensing scenes and have poor semantic perception of multi-scale ground-feature elements. The receptive field of a convolutional neural network is usually enlarged by atrous (dilated) convolution; however, existing multi-scale atrous structures offer only a limited range of receptive-field sizes and types, cannot adequately annotate the complex ground-feature elements of high-resolution remote sensing scenes, and struggle to capture semantic information when multi-scale elements exhibit large scale differences.
Another approach to enhancing semantic annotation in remote sensing scenes is to exploit the rich features of multi-modal data. However, existing methods directly concatenate or add multi-modal images or features, leaving feature learning entirely to the convolutional neural network; this ignores the differences in inherent data structure and feature complexity between modalities, easily introduces redundant features, degrades annotation performance, and inflates the network scale and parameter count.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a method and a device for semantic segmentation of remote sensing images that can effectively enhance the correlation between modal features and spatial information, improve the perception of context information for multi-scale targets, and obtain a more refined semantic annotation result.
The purpose of the invention is achieved by the following technical solution:
In a method for semantic segmentation of a remote sensing image, the improvement comprises:
acquiring a remote sensing image to be segmented;
inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
Preferably, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, performing manual semantic annotation on the remote sensing images in a remote sensing image data set, and dividing the data set into a training set, a verification set and a test set;
step 2, performing data augmentation on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 pixel patches;
and step 4, training the pre-established self-attention multi-scale feature aggregation initial network by using the training set, the verification set and the test set.
Further, the pre-established self-attention multi-scale feature aggregation initial network comprises: a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting features of the optical image in the remote sensing image;
the VGG neural network is used for extracting features of the digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for fusing the features of the optical image with the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map from the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Further, the deep convolutional neural network is an improved Xception network structure, where the improvement comprises: reducing the repeated blocks of the middle flow of the Xception structure to 6 groups, removing the final fully connected layer of the Xception structure, replacing all max-pooling layers in the Xception structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final block of the Xception structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the VGG neural network is an improved VGG16 network structure, where the improvement comprises:
replacing all convolutional layers of the VGG16 structure with depthwise separable convolution layers, removing the final fully connected layer of the VGG16 structure, replacing all max-pooling layers of the VGG16 structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the self-attention modal calibration module comprises: a first merging connection layer, a first global max-pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer, connected in sequence.
Further, the dense multi-scale context aggregation module comprises: a 1×1 convolutional layer, a first 3×3 convolutional layer, a second 3×3 convolutional layer, a third 3×3 convolutional layer and a second merging connection layer;
the output of the 1×1 convolutional layer is connected to the inputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer, and the outputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer are each connected to the input of the second merging connection layer.
Further, the self-attention space calibration module comprises: a third merging connection layer, a second global max-pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
sequentially performing the following operations on the training set: random horizontal and vertical flipping, each with a probability of 0.5; random rotation by an angle between -20° and 20° with a step of 1°; random rotation by a fixed angle of 90°, 180° or 270°; and random scaling of the image size by a factor of 0.25 to 4.
Based on the same inventive concept, the invention also provides a semantic segmentation device for remote sensing images, the improvement being that the device comprises:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
Compared with the closest prior art, the invention has the following beneficial effects:
the invention relates to a method and a device for semantic segmentation of remote sensing images, comprising the following steps: acquiring a remote sensing image to be segmented; inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network; up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented; the relevance of modal characteristics and spatial information can be effectively enhanced, the downward information perception capability of a multi-scale target is improved, and a more precise semantic annotation result is obtained;
the semantic features of the multi-modal data are extracted by using a two-way network in the pre-established self-attention multi-scale feature aggregation network, and the parameter efficiency is improved, the model complexity is reduced, and the generalization capability is improved by using an asymmetric network structure while the annotation precision is improved by using rich modal information.
The self-attention modal calibration module in the pre-established self-attention multi-scale feature aggregation network performs explicit global semantic modeling and association of the features of different modalities; the self-attention mechanism weakens redundant features and highlights useful ones, and the calibrated modal fusion features enhance the accuracy of the annotation result.
The dense multi-scale context aggregation module in the pre-established self-attention multi-scale feature aggregation network enlarges the network's perception range of context semantic information by using convolutions with a variety of dilation rates, while the dense connections make the effective features of the multi-scale feature maps denser, which helps refine the annotation results.
The self-attention space calibration module in the pre-established self-attention multi-scale feature aggregation network calibrates the high-level features that have lost a large amount of spatial information by introducing spatially rich bottom-layer features from the two modalities; after dynamic weighting by the self-attention mechanism, the edge information of large-scale ground-feature elements and some small-scale ground-feature elements can be recovered, improving the precision of fine annotation.
Drawings
FIG. 1 is a flow chart of a method for semantic segmentation of remote sensing images provided by the present invention;
FIG. 2 is a schematic structural diagram of a self-attention modal calibration module according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a dense multi-scale context aggregation module in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a self-attention space calibration module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a semantic segmentation apparatus for remote sensing images provided by the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve problems in the prior art such as difficult multi-modal data fusion and difficult multi-scale semantic extraction in remote sensing scenes, the invention provides a remote sensing image semantic segmentation method which, as shown in FIG. 1, comprises the following steps:
step 101, acquiring a remote sensing image to be segmented;
step 102, inputting the remote sensing image to be segmented into the pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the network;
step 103, up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
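For orientation, the following PyTorch sketch wires steps 101 to 103 together; it is a minimal illustration rather than the patent's implementation. The two-input `net` stands in for the pre-established self-attention multi-scale feature aggregation network, and bilinear interpolation is assumed for the up-sampling of step 103.

```python
import torch
import torch.nn.functional as F

def segment_remote_sensing_image(net: torch.nn.Module,
                                 optical: torch.Tensor,
                                 dsm: torch.Tensor) -> torch.Tensor:
    """Steps 101-103 for one (1, 3, H, W) optical image and its
    (1, 1, H, W) digital surface model; net is a hypothetical stand-in
    for the self-attention multi-scale feature aggregation network."""
    net.eval()
    with torch.no_grad():
        # Step 102: the network outputs a low-resolution initial prediction.
        initial_pred = net(optical, dsm)                  # (1, C, h, w)
        # Step 103: up-sample the initial prediction to the input image size
        # (bilinear interpolation assumed).
        final_pred = F.interpolate(initial_pred,
                                   size=optical.shape[-2:],
                                   mode="bilinear",
                                   align_corners=False)
    # Per-pixel argmax yields the final semantic annotation map.
    return final_pred.argmax(dim=1)                       # (1, H, W)
```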
Specifically, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, performing manual semantic annotation on the remote sensing images in a remote sensing image data set, and dividing the data set into a training set, a verification set and a test set;
step 2, performing data augmentation on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 pixel patches (a tiling sketch is given after this list);
and step 4, training the pre-established self-attention multi-scale feature aggregation initial network by using the training set, the verification set and the test set.
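Step 3 can be realized with a simple sliding-window tiler, sketched below. Non-overlapping tiles and zero padding at the borders are assumptions made for illustration; the patent fixes only the 513×513 slice size.

```python
import numpy as np

def slice_into_tiles(image: np.ndarray, tile: int = 513) -> list:
    """Cut an (H, W, C) array into tile x tile patches, zero-padding the
    right and bottom borders up to a multiple of the tile size."""
    h, w = image.shape[:2]
    pad_h = (-h) % tile          # rows needed to reach a multiple of tile
    pad_w = (-w) % tile          # columns needed
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
    return tiles
```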
Wherein the pre-established self-attention multi-scale feature aggregation initial network comprises: a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting features of the optical image in the remote sensing image;
the VGG neural network is used for extracting features of the digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for fusing the features of the optical image with the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map from the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Wherein the deep convolutional neural network is an improved Xception network structure, and the improvement comprises: reducing the repeated blocks of the middle flow of the Xception structure to 6 groups, removing the final fully connected layer of the Xception structure, replacing all max-pooling layers in the Xception structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final block of the Xception structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
The VGG neural network is an improved VGG16 network structure, and the improvement comprises:
replacing all convolutional layers of the VGG16 structure with depthwise separable convolution layers, removing the final fully connected layer of the VGG16 structure, replacing all max-pooling layers of the VGG16 structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
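Both backbone modifications rely on the same two substitutions: a stride-2 depthwise separable convolution in place of each max-pooling layer, and atrous convolutions with dilation rates 1, 3 and 5 in the last three layers. The PyTorch sketch below shows these building blocks under stated assumptions (batch normalization and ReLU after each separable convolution, illustrative channel widths); the surrounding Xception and VGG16 plumbing follows the respective reference architectures and is omitted.

```python
import torch.nn as nn

class SeparableConv(nn.Module):
    """Depthwise separable 3x3 convolution. With stride=2 it stands in for a
    max-pooling layer; with dilation>1 it acts as an atrous convolution.
    The BatchNorm + ReLU tail is an assumption of this sketch."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, dilation: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# Replacement for a max-pooling layer (stride-2 downsampling):
downsample = SeparableConv(128, 128, stride=2)

# The last three layers of either backbone, with dilation rates 1, 3 and 5
# (channel width 512 is illustrative):
atrous_tail = nn.Sequential(
    SeparableConv(512, 512, dilation=1),
    SeparableConv(512, 512, dilation=3),
    SeparableConv(512, 512, dilation=5),
)
```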
In an embodiment of the present invention, as shown in FIG. 2, the self-attention modal calibration module comprises: a first merging connection layer, a first global max-pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer, connected in sequence.
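Read as a squeeze-and-excitation-style channel attention block, the listed layer sequence can be sketched as follows; the self-attention space calibration module of FIG. 4 shares the same sequence, so one class can serve both. The channel reduction ratio r and the final multiplicative re-weighting of the concatenated features are illustrative assumptions, since the patent fixes only the layer order.

```python
import torch
import torch.nn as nn

class SelfAttentionCalibration(nn.Module):
    """Concatenation -> global max pooling -> FC -> ReLU -> FC -> Sigmoid,
    the layer sequence listed for the modal (FIG. 2) and spatial (FIG. 4)
    calibration modules. The reduction ratio r and the channel-wise
    re-weighting at the end are illustrative assumptions."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(1)            # global max pooling
        self.fc1 = nn.Linear(channels, channels // r)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(channels // r, channels)
        self.sigmoid = nn.Sigmoid()

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        x = torch.cat([feat_a, feat_b], dim=1)         # merging connection layer
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                    # (B, C) channel descriptor
        w = self.sigmoid(self.fc2(self.relu(self.fc1(w))))
        return x * w.view(b, c, 1, 1)                  # calibrated fusion features

# Example: fusing 256-channel optical features with 256-channel DSM features.
# fused = SelfAttentionCalibration(channels=512)(opt_feat, dsm_feat)
```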
As shown in FIG. 3, the dense multi-scale context aggregation module comprises: a 1×1 convolutional layer, a first 3×3 convolutional layer, a second 3×3 convolutional layer, a third 3×3 convolutional layer and a second merging connection layer;
the output of the 1×1 convolutional layer is connected to the inputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer, and the outputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer are each connected to the input of the second merging connection layer.
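A minimal sketch of this wiring is given below. The patent fixes the layers and their connections but not the dilation rates of the three 3×3 branches; rates of 1, 3 and 5, mirroring the backbone tails, are assumed here for illustration.

```python
import torch
import torch.nn as nn

class DenseMultiScaleContextAggregation(nn.Module):
    """1x1 convolution feeding three parallel 3x3 convolutions whose outputs
    are concatenated, following FIG. 3. Dilation rates (1, 3, 5) and channel
    widths are assumptions of this sketch."""
    def __init__(self, in_ch: int, mid_ch: int = 256, rates=(1, 3, 5)):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        self.branches = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.reduce(x)                    # shared 1x1 convolutional layer
        # The second merging connection layer concatenates the branch outputs.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```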
As shown in FIG. 4, the self-attention space calibration module comprises: a third merging connection layer, a second global max-pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
sequentially performing the following operations on the training set: random horizontal and vertical flipping, each with a probability of 0.5; random rotation by an angle between -20° and 20° with a step of 1°; random rotation by a fixed angle of 90°, 180° or 270°; and random scaling of the image size by a factor of 0.25 to 4.
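A sketch of this augmentation chain using torchvision's functional transforms follows. Applying identical geometry to the image and its label mask, nearest-neighbour interpolation for the mask, and the exact sampling of the fixed-angle rotation are assumptions not stated in the patent.

```python
import random
import torchvision.transforms.functional as TF

def augment(image, mask):
    """Apply the augmentation chain of step 2 to an image/mask pair
    (PIL images or CHW tensors). Identical geometry for both is assumed."""
    # Random horizontal and vertical flips, each with probability 0.5.
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < 0.5:
        image, mask = TF.vflip(image), TF.vflip(mask)
    # Random rotation between -20 and 20 degrees in 1-degree steps
    # (TF.rotate defaults to nearest-neighbour interpolation).
    angle = random.randint(-20, 20)
    image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    # Fixed-angle random rotation of 90, 180 or 270 degrees.
    fixed = random.choice([90, 180, 270])
    image, mask = TF.rotate(image, fixed), TF.rotate(mask, fixed)
    # Random scaling of the image size by a factor of 0.25 to 4.
    scale = random.uniform(0.25, 4.0)
    w, h = TF.get_image_size(image)        # torchvision returns (width, height)
    new_size = [max(1, int(h * scale)), max(1, int(w * scale))]
    image = TF.resize(image, new_size)
    mask = TF.resize(mask, new_size,
                     interpolation=TF.InterpolationMode.NEAREST)
    return image, mask
```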
Based on the same inventive concept, the invention also provides a semantic segmentation device for remote sensing images; as shown in FIG. 5, the device comprises:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
Preferably, the process of establishing the pre-established self-attention multi-scale feature aggregation network includes:
step 1, performing manual semantic annotation on the remote sensing images in a remote sensing image data set, and dividing the data set into a training set, a verification set and a test set;
step 2, performing data augmentation on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 pixel patches;
and step 4, training the pre-established self-attention multi-scale feature aggregation initial network by using the training set, the verification set and the test set.
Further, the pre-established self-attention multi-scale feature aggregation initial network comprises: a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting features of the optical image in the remote sensing image;
the VGG neural network is used for extracting features of the digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for fusing the features of the optical image with the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map from the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
Further, the deep convolutional neural network is an improved Xception network structure, where the improvement comprises: reducing the repeated blocks of the middle flow of the Xception structure to 6 groups, removing the final fully connected layer of the Xception structure, replacing all max-pooling layers in the Xception structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final block of the Xception structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the VGG neural network is an improved VGG16 network structure, where the improvement comprises:
replacing all convolutional layers of the VGG16 structure with depthwise separable convolution layers, removing the final fully connected layer of the VGG16 structure, replacing all max-pooling layers of the VGG16 structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
Further, the self-attention modal calibration module comprises: a first merging connection layer, a first global max-pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer, connected in sequence.
Further, the dense multi-scale context aggregation module comprises: a 1×1 convolutional layer, a first 3×3 convolutional layer, a second 3×3 convolutional layer, a third 3×3 convolutional layer and a second merging connection layer;
the output of the 1×1 convolutional layer is connected to the inputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer, and the outputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer are each connected to the input of the second merging connection layer.
Further, the self-attention space calibration module comprises: a third merging connection layer, a second global max-pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
Further, the step 2 comprises:
sequentially performing the following operations on the training set: random horizontal and vertical flipping, each with a probability of 0.5; random rotation by an angle between -20° and 20° with a step of 1°; random rotation by a fixed angle of 90°, 180° or 270°; and random scaling of the image size by a factor of 0.25 to 4.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A semantic segmentation method for remote sensing images is characterized by comprising the following steps:
acquiring a remote sensing image to be segmented;
inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network, and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented, and obtaining the final prediction result of the remote sensing image to be segmented.
2. The method of claim 1, wherein the process of establishing the pre-established self-attention multi-scale feature aggregation network comprises:
step 1, performing manual semantic annotation on the remote sensing images in a remote sensing image data set, and dividing the data set into a training set, a verification set and a test set;
step 2, performing data augmentation on the training set;
step 3, slicing the data of the training set, the verification set and the test set into 513×513 pixel patches;
and step 4, training the pre-established self-attention multi-scale feature aggregation initial network by using the training set, the verification set and the test set.
3. The method of claim 2, wherein the pre-established self-attention multi-scale feature aggregation initial network comprises: a deep convolutional neural network, a VGG neural network, a self-attention modal calibration module, a dense multi-scale context aggregation module and a self-attention space calibration module;
the deep convolutional neural network is used for extracting features of the optical image in the remote sensing image;
the VGG neural network is used for extracting features of the digital surface model data in the remote sensing image;
the self-attention modal calibration module is used for fusing the features of the optical image with the features of the digital surface model data to obtain a multi-modal feature fusion map;
the dense multi-scale context aggregation module is used for extracting a multi-scale fusion feature map from the multi-modal feature fusion map;
the self-attention space calibration module is used for obtaining an initial prediction result based on the features of the optical image, the features of the digital surface model data and the multi-scale fusion feature map.
4. The method of claim 3, wherein the deep convolutional neural network is an improved Xception network structure, the modification comprising: reducing the repeated blocks of the middle flow of the Xception structure to 6 groups, removing the final fully connected layer of the Xception structure, replacing all max-pooling layers in the Xception structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the final block of the Xception structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
5. The method of claim 3, wherein the VGG neural network is a modified VGG16 network structure, the modification comprising:
replacing all convolutional layers of the VGG16 structure with depthwise separable convolution layers, removing the final fully connected layer of the VGG16 structure, replacing all max-pooling layers of the VGG16 structure with depthwise separable convolution layers with a stride of 2, and replacing the last three depthwise separable convolution layers of the VGG16 structure with atrous convolution layers with dilation rates of 1, 3 and 5, respectively.
6. The method of claim 3, wherein the self-attention modal calibration module comprises: a first merging connection layer, a first global max-pooling layer, a first fully connected layer, a first ReLU function layer, a second fully connected layer and a first Sigmoid function layer, connected in sequence.
7. The method of claim 3, wherein the dense multi-scale context aggregation module comprises: a 1×1 convolutional layer, a first 3×3 convolutional layer, a second 3×3 convolutional layer, a third 3×3 convolutional layer and a second merging connection layer;
the output of the 1×1 convolutional layer is connected to the inputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer, and the outputs of the first 3×3 convolutional layer, the second 3×3 convolutional layer and the third 3×3 convolutional layer are each connected to the input of the second merging connection layer.
8. The method of claim 3, wherein the self-attention space calibration module comprises: a third merging connection layer, a second global max-pooling layer, a third fully connected layer, a second ReLU function layer, a fourth fully connected layer and a second Sigmoid function layer.
9. The method of claim 2, wherein step 2 comprises:
sequentially performing the following operations on the training set: random horizontal and vertical flipping, each with a probability of 0.5; random rotation by an angle between -20° and 20° with a step of 1°; random rotation by a fixed angle of 90°, 180° or 270°; and random scaling of the image size by a factor of 0.25 to 4.
10. A device for semantic segmentation of remote sensing images, the device comprising:
the acquisition module is used for acquiring a remote sensing image to be segmented;
the segmentation module is used for inputting the remote sensing image to be segmented to a pre-established self-attention multi-scale feature aggregation network and obtaining an initial prediction result of the remote sensing image to be segmented output by the pre-established self-attention multi-scale feature aggregation network;
and the adjusting module is used for up-sampling the initial prediction result of the remote sensing image to be segmented to the image size of the remote sensing image to be segmented and obtaining the final prediction result of the remote sensing image to be segmented.
CN202010350688.1A (filed 2020-04-28): Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network. Status: Active. Granted as CN111582104B.

Priority Applications (1)

CN202010350688.1A (priority and filing date 2020-04-28, granted as CN111582104B): Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network

Applications Claiming Priority (1)

CN202010350688.1A (priority and filing date 2020-04-28, granted as CN111582104B): Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network

Publications (2)

Publication Number Publication Date
CN111582104A true CN111582104A (en) 2020-08-25
CN111582104B CN111582104B (en) 2021-08-06

Family

ID=72120069

Family Applications (1)

CN202010350688.1A (filed 2020-04-28, active, granted as CN111582104B): Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network

Country Status (1)

Country Link
CN (1) CN111582104B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016511A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image blue top room detection method based on large-scale depth convolution neural network
CN112529081A (en) * 2020-12-11 2021-03-19 大连大学 Real-time semantic segmentation method based on efficient attention calibration
CN112598003A (en) * 2020-12-18 2021-04-02 燕山大学 Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN113269787A (en) * 2021-05-20 2021-08-17 浙江科技学院 Remote sensing image semantic segmentation method based on gating fusion
CN113723411A (en) * 2021-06-18 2021-11-30 湖北工业大学 Feature extraction method and segmentation system for semantic segmentation of remote sensing image
CN114332636A (en) * 2022-03-14 2022-04-12 北京化工大学 Polarized SAR building region extraction method, equipment and medium
CN115601605A (en) * 2022-12-13 2023-01-13 齐鲁空天信息研究院(Cn) Surface feature classification method, device, equipment, medium and computer program product
CN117523410A (en) * 2023-11-10 2024-02-06 中国科学院空天信息创新研究院 Image processing and construction method based on multi-terminal collaborative perception distributed large model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537238A (en) * 2018-04-13 2018-09-14 崔植源 A kind of classification of remote-sensing images and search method
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN109255334A (en) * 2018-09-27 2019-01-22 中国电子科技集团公司第五十四研究所 Remote sensing image terrain classification method based on deep learning semantic segmentation network
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN110210608A (en) * 2019-06-05 2019-09-06 国家广播电视总局广播电视科学研究院 The enhancement method of low-illumination image merged based on attention mechanism and multi-level features
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN110232696A (en) * 2019-06-20 2019-09-13 腾讯科技(深圳)有限公司 A kind of method of image region segmentation, the method and device of model training
CN110866526A (en) * 2018-08-28 2020-03-06 北京三星通信技术研究有限公司 Image segmentation method, electronic device and computer-readable storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN108537238A (en) * 2018-04-13 2018-09-14 崔植源 A kind of classification of remote-sensing images and search method
CN109086668A (en) * 2018-07-02 2018-12-25 电子科技大学 Based on the multiple dimensioned unmanned aerial vehicle remote sensing images road information extracting method for generating confrontation network
CN110866526A (en) * 2018-08-28 2020-03-06 北京三星通信技术研究有限公司 Image segmentation method, electronic device and computer-readable storage medium
CN109255334A (en) * 2018-09-27 2019-01-22 中国电子科技集团公司第五十四研究所 Remote sensing image terrain classification method based on deep learning semantic segmentation network
CN110210608A (en) * 2019-06-05 2019-09-06 国家广播电视总局广播电视科学研究院 The enhancement method of low-illumination image merged based on attention mechanism and multi-level features
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN110232696A (en) * 2019-06-20 2019-09-13 腾讯科技(深圳)有限公司 A kind of method of image region segmentation, the method and device of model training

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MAOKE YANG et al.: "DenseASPP for Semantic Segmentation in Street Scenes", IEEE *
ZHIYING CAO et al.: "End-to-End DSM Fusion Networks for Semantic Segmentation in High-Resolution Aerial Images", IEEE Geoscience and Remote Sensing Letters *
YU Shuai et al.: "Remote sensing image segmentation method based on multi-level channel attention" (基于多级通道注意力的遥感图像分割方法), Laser & Optoelectronics Progress (《激光与光电子学进展》) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016511A (en) * 2020-09-08 2020-12-01 重庆市地理信息和遥感应用中心 Remote sensing image blue top room detection method based on large-scale depth convolution neural network
CN112529081A (en) * 2020-12-11 2021-03-19 大连大学 Real-time semantic segmentation method based on efficient attention calibration
CN112529081B (en) * 2020-12-11 2023-11-07 大连大学 Real-time semantic segmentation method based on efficient attention calibration
CN112598003A (en) * 2020-12-18 2021-04-02 燕山大学 Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN112598003B (en) * 2020-12-18 2022-11-25 燕山大学 Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN113269787A (en) * 2021-05-20 2021-08-17 浙江科技学院 Remote sensing image semantic segmentation method based on gating fusion
CN113723411A (en) * 2021-06-18 2021-11-30 湖北工业大学 Feature extraction method and segmentation system for semantic segmentation of remote sensing image
CN113723411B (en) * 2021-06-18 2023-06-27 湖北工业大学 Feature extraction method and segmentation system for semantic segmentation of remote sensing image
CN114332636A (en) * 2022-03-14 2022-04-12 北京化工大学 Polarized SAR building region extraction method, equipment and medium
CN115601605A (en) * 2022-12-13 2023-01-13 齐鲁空天信息研究院(Cn) Surface feature classification method, device, equipment, medium and computer program product
CN117523410A (en) * 2023-11-10 2024-02-06 中国科学院空天信息创新研究院 Image processing and construction method based on multi-terminal collaborative perception distributed large model

Also Published As

Publication number Publication date
CN111582104B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN111582104B (en) Remote sensing image semantic segmentation method and device based on self-attention feature aggregation network
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN111127493A (en) Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN113011427A (en) Remote sensing image semantic segmentation method based on self-supervision contrast learning
CN114117614B (en) Automatic generation method and system for building elevation texture
CN111489396A (en) Determining camera parameters using critical edge detection neural networks and geometric models
CN111242127A (en) Vehicle detection method with granularity level multi-scale characteristics based on asymmetric convolution
CN111626176A (en) Ground object target detection method and system of remote sensing image
CN113313094B (en) Vehicle-mounted image target detection method and system based on convolutional neural network
CN112084923A (en) Semantic segmentation method for remote sensing image, storage medium and computing device
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN110008900A (en) A kind of visible remote sensing image candidate target extracting method by region to target
CN114972947B (en) Depth scene text detection method and device based on fuzzy semantic modeling
CN114926734B (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN112686184A (en) Remote sensing house change detection method based on neural network
CN113591614B (en) Remote sensing image road extraction method based on close-proximity spatial feature learning
CN113963333A (en) Traffic sign board detection method based on improved YOLOF model
CN116797830A (en) Image risk classification method and device based on YOLOv7
CN116416136A (en) Data amplification method for ship target detection of visible light remote sensing image and electronic equipment
CN112488015B (en) Intelligent building site-oriented target detection method and system
Zhang et al. Feature enhanced centernet for object detection in remote sensing images
Ju et al. Multiscale feature fusion network for automatic port segmentation from remote sensing images
CN114119971B (en) Semantic segmentation method, system and electronic equipment
CN118379731A (en) Shale microscopic substance intelligent detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant