CN113283435A - Remote sensing image semantic segmentation method based on multi-scale attention fusion - Google Patents

Remote sensing image semantic segmentation method based on multi-scale attention fusion

Info

Publication number
CN113283435A
Authority
CN
China
Prior art keywords
fusion
remote sensing
scale
image
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110528206.1A
Other languages
Chinese (zh)
Other versions
CN113283435B (en)
Inventor
雷涛 (Lei Tao)
李林泽 (Li Linze)
加小红 (Jia Xiaohong)
薛丁华 (Xue Dinghua)
张月 (Zhang Yue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology filed Critical Shaanxi University of Science and Technology
Priority to CN202110528206.1A
Publication of CN113283435A
Application granted
Publication of CN113283435B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image semantic segmentation method based on multi-scale attention fusion. It mainly relates to image segmentation technology and aims at the semantic segmentation of high-resolution remote sensing images. The method fuses multi-modal data to address the difficulty of classifying targets in remote sensing images; an attention mechanism is introduced to redistribute resources in the feature extraction stage, so that redundant features are avoided; a multi-scale spatial context module is adopted to handle the large variation of target scales in remote sensing images; and a residual skip connection strategy is used to retain and optimize the encoder-side information, solving the problem of image features being lost during down-sampling. The invention not only realizes semantic segmentation of high-resolution remote sensing images but also achieves high classification accuracy, providing objective and accurate data for the understanding and analysis of high-resolution remote sensing images.

Description

Remote sensing image semantic segmentation method based on multi-scale attention fusion
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a remote sensing image semantic segmentation method based on multi-scale attention fusion.
Background
High-resolution remote sensing images provide abundant ground geometric information and details of ground targets, and are therefore widely used for classifying and identifying ground targets in complex scenes. Semantic segmentation of high-resolution remote sensing images requires assigning a specific semantic label to each pixel, such as: building, pavement, car, tree, low vegetation, and so on. Unlike single-object recognition, image semantic segmentation can identify multiple objects in an image simultaneously. It is therefore widely applied in fields such as military target detection, city planning, building identification and road extraction. However, existing semantic segmentation techniques for high-resolution remote sensing images face the following challenges: first, the imaging of high-resolution remote sensing images is uniquely complex, so manually distinguishing ground targets is difficult and inefficient; second, interference from shadows, clouds and illumination causes large classification errors. Compared with natural images, the semantic segmentation task for remote sensing images is more complex. On the one hand, high-resolution remote sensing images typically contain many more complex scenes. On the other hand, remote sensing images exhibit high intra-class variance and low inter-class variance, and different ground objects such as trees and low vegetation can show similar characteristics in the spectral image, which challenges the semantic segmentation of high-resolution remote sensing images. Fortunately, a Digital Surface Model (DSM) containing rich geographic information provides supplementary information for ground feature classification, and experiments show that fully utilizing DSM data can significantly improve segmentation accuracy. Therefore, research on semantic segmentation networks with strong robustness and high accuracy is of great significance for understanding the complex scenes of high-resolution remote sensing images.
Many conventional machine learning methods have been used for remote sensing image analysis. Although these methods can achieve object detection and recognition in remote sensing images, their accuracy is limited because reliable feature extraction is difficult. In recent years, with the rapid development of deep learning, convolutional neural networks have achieved great success in the field of image semantic segmentation. It is well known that convolutional neural networks provide hierarchical feature representations and learn deep semantic features by stacking convolutional layers, which is very important for improving model performance. In addition, convolutional neural networks can effectively suppress noise interference and thereby enhance robustness.
Unlike traditional semantic segmentation of high-resolution remote sensing images, adding multi-modal data further improves classification accuracy. To make reasonable use of data from two modalities, the related art discloses an end-to-end DSM Fusion Network (DSMFNet), which designs four interaction modes to fuse and process multi-modal data. Its most accurate model inherits the strong performance of DeepLab v3+ for extracting RGB image features, designs a lightweight depthwise separable convolution module to extract DSM image features separately, and fuses the information of the different modalities before upsampling. However, DSM images contain less information and a simple network model cannot extract their deeper features, and simply superimposing red, green, blue (RGB) spectral images and DSM images does not take full advantage of the relationship between multi-modal information, but instead introduces redundant features.
In the semantic segmentation task for high-resolution remote sensing images, the scales of the targets to be segmented differ greatly. Aiming at the multi-scale target problem, the related art also discloses a Multi-scale Adaptive Feature Fusion Network (MANet), which uses ResNet101 as the backbone network to extract image features and passes the high-level semantic features to a context extraction module, thereby addressing the problem that targets in remote sensing images vary greatly in size and are difficult to segment. Its adaptive fusion module fuses high-level and low-level semantic information and redistributes the fused resources, achieving adaptive combination weights while avoiding redundant information. However, this algorithm does not exploit the spatial relationship between channels when extracting multi-scale features, and for targets with similar semantic features it cannot emphasize the relevance of intra-class features, so the segmentation accuracy is low.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a remote sensing image semantic segmentation method based on multi-scale attention fusion, which realizes semantic segmentation of high-resolution remote sensing images, avoids redundant features, achieves high classification accuracy, and provides objective and accurate data for understanding and analyzing high-resolution remote sensing images.
In order to achieve the above purpose, the technical scheme adopted by the invention comprises the following steps:
1) cropping a data set;
2) inputting the cropped IRRG image and the cropped DSM image into a multi-modal fusion module to obtain the multi-modal fusion features F0, F1, F2 and F3 of each stage; a channel-attention-based module is introduced into the multi-modal fusion module to extract, reorganize and fuse the features and to assign weight resources;
3) using a multi-scale spatial context enhancement module to integrate and improve the multi-modal fusion feature F3, then performing one up-sampling;
4) using a residual skip connection strategy to optimize the encoder-side multi-modal fusion features F0, F1 and F2, fusing them with the decoder-side features of the corresponding scale, and continuing to up-sample until a segmentation map is output;
5) splicing the segmentation maps according to the size of the original image to complete the semantic segmentation of the remote sensing image.
Further, the IRRG image, the DSM image, and their corresponding label maps are cropped using a sliding window in step 1), and the cropped image size is 256 × 256.
Further, the multi-modal fusion module includes an optical branch, a depth branch and a coding fusion branch; each of the optical branch and the depth branch provides a set of feature maps at each module stage, and the coding fusion branch takes the fusion of the optical branch and the depth branch as input before down-sampling and processes the fused data.
Further, the multi-modal fusion module implementation includes:
1) the features I0 of the IRRG image and the features D0 of the DSM image are input to two ResNet50 networks pre-trained on ImageNet, and the fusion feature M0 of I0 and D0 is input to a third ResNet50 pre-trained on ImageNet; the initial-stage model MFM-0 is given by the corresponding formula (shown as an image in the original publication), where ⊕ denotes pixel-level addition and MCA(·) denotes the channel-attention-based module;
2) the feature maps output by the three branches in the first stage are used as the input of the second stage, and the fused output of stage MFM-1 follows the corresponding formula;
3) the feature maps output by the three branches in the second stage are used as the input of the third stage, and the fused output of stage MFM-2 follows the corresponding formula;
4) the feature maps output by the three branches in the third stage are used as the input of the fourth stage, and the fused output of stage MFM-3 follows the corresponding formula.
further, the module implementation based on channel attention includes:
1) inputting a characteristic diagram A ═ a1,a2,...,ac]Viewed as channel ai∈RH×WThe vector G is obtained after the whole local average pooling1×1×CAnd kthElement, the model is:
Figure BDA0003067100990000041
integrating global information into a vector G;
2) converting vector G into
Figure BDA0003067100990000042
Wherein, O1∈R1×1×C/2, O2∈R1×1×CDenotes two fully connected convolutional layers, in O1After that, an Activation function is added, and the Activation function is further added through a Sigmoid function sigma (·)
Figure BDA0003067100990000043
Activate, constrain it at [0, 1];
3) A is mixed with
Figure BDA0003067100990000044
Performing outer product to obtain
Figure BDA0003067100990000045
The model is
Figure BDA0003067100990000046
Further, the multi-scale spatial context enhancement module comprises an ASPP module and a non-local module; F represents the feature map processed by the multi-scale spatial context enhancement module, and the model is F = NL(ASPP(F3)).
Further, the multi-scale spatial context enhancement module implementation comprises:
1) the multi-modal fusion feature F3 of the last stage of the multi-modal fusion module is input to the multi-scale spatial context enhancement module to extract multi-scale information; 3 × 3 convolutions with dilation rates of 3, 6 and 9 are combined with a standard 1 × 1 convolution for multi-scale information extraction, and an image average pooling branch is added to integrate global context information;
2) after multi-scale information fusion with the ASPP module, the number of channels is reduced to 256 by a 1 × 1 convolution, and the result then enters the non-local module;
3) the non-local model is
Fi = (1/C(X)) Σ_{j=1..N} f(xi, xj) g(xj),
where the feature map X = [x1, x2, ..., xN] is the input, xi ∈ R^(1×1×C) and xj ∈ R^(1×1×C) are the feature vectors at positions i and j respectively, N = H × W is the number of pixels, H × W is the spatial dimension, F has the same number of channels as X, C(X) is the normalization operation, and g(xj) = Wv·xj is implemented as a 1 × 1 convolution in the network; f(xi, xj)/C(X) is the normalized correlation of the vectors xi and xj, used to compute spatial similarity, and is modeled as
f(xi, xj) = exp(m(xi)ᵀ n(xj)),
where m(xi) and n(xj) are linear transformation matrices, m(xi) = Wq·xi and n(xj) = Wk·xj, all implemented as 1 × 1 convolutions in the network;
4) F is up-sampled once by bilinear interpolation.
Further, the residual skip connection model is
f_{l+1} = DSC(Activation(Tconv(f_l)) ⊕ f_{l-1}),
where f_l is the feature of the l-th layer, Tconv is the transposed convolution, Activation is the ReLU activation function, DSC denotes the depthwise separable convolution, ⊕ is pixel-level addition, f_{l-1} is the feature of the (l-1)-th layer before down-sampling, and f_{l+1} is the result of the residual skip connection processing.
Further, the residual skip connection implementation includes:
1) the features of the l-th layer are restored, through a learned transposed convolution, to the same size as the features of the (l-1)-th layer;
2) the features of the (l-1)-th layer that have not been down-sampled are extracted separately and added to them;
3) the features are learned again using a depthwise separable convolution and transmitted to the (l+1)-th layer after its up-sampling.
Further, the features optimized by the residual skip connection strategy are gradually fused with the decoder features and continuously up-sampled by bilinear interpolation until the segmentation map is output.
Compared with the prior art, the invention addresses the facts that the scale of segmented targets differs greatly, the segmented scenes are highly complex, semantic segmentation of high-resolution remote sensing images is difficult, and the segmentation effect cannot be improved using only spectral data and a traditional multi-scale feature extraction module. Based on an encoding-decoding structure, the invention first uses the multi-modal fusion module to process IRRG and DSM data from different modalities separately and introduces the channel-attention-based module to redistribute resources between high-level and low-level semantic features, thereby improving the fusion result of the multi-modal data, solving the problem that targets in remote sensing images are difficult to classify, and avoiding redundant features. Extracting and fusing features from the IRRG image and the DSM image separately with the multi-modal fusion module not only solves the problem that DSM image information cannot be fully utilized due to an unbalanced network structure, but also avoids the feature redundancy that easily occurs in two modality-specific encoders, so better segmentation results can be obtained than with mainstream semantic segmentation algorithms for high-resolution remote sensing images. Secondly, the fused features are improved by the multi-scale spatial context enhancement module, which addresses the large difference of target scales in remote sensing images. Finally, the residual skip connection strategy retains and optimizes the multi-modal fusion information of the encoder side, solves the problem of image features being lost during down-sampling, optimizes the encoder output, and provides effective feature mappings for the decoder side, improving the accuracy of target contours in high-resolution remote sensing images while optimizing the multi-modal information fusion. Compared with mainstream remote sensing image semantic segmentation algorithms, the invention on the one hand improves pixel classification accuracy by using image depth data and the multi-scale spatial context enhancement module, and on the other hand uses the residual skip connection strategy to improve the contour accuracy of segmented targets while locating them accurately. The method realizes semantic segmentation of high-resolution remote sensing images with high classification accuracy, and has broad application prospects in scene understanding and analysis of high-resolution remote sensing images.
Drawings
FIG. 1 is a diagram of a network architecture of the present invention;
FIG. 2a is a block diagram of the MFM-0 stage of the multimodal fusion module of the present invention; FIG. 2b is a block diagram of the MFM-n (n ∈ [1,3]) stage of the multimodal fusion module of the present invention;
FIG. 3 is a block diagram of the channel-attention-based module of the present invention;
FIG. 4 is a block diagram of the multi-scale spatial context enhancement module of the present invention;
FIG. 5 is a diagram of the residual skip connection strategy architecture of the present invention;
FIG. 6 is a sample image of a Potsdam dataset and a Vaihingen dataset;
FIG. 7 is a graph comparing the segmentation results of the Potsdam dataset slices according to the present invention and the prior art method;
FIG. 8 is a graph comparing the segmentation results of Potsdam datasets for the present invention and prior art methods;
FIG. 9 is a graph comparing the segmentation results of slices of the Vaihingen data set according to the present invention and the prior art method;
FIG. 10 is a graph comparing the results of the present invention and prior art methods in the segmentation of the Vaihingen data set.
Detailed Description
The present invention will be further explained with reference to the drawings and specific examples in the specification, and it should be understood that the examples described are only a part of the examples of the present application, and not all examples. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
The invention provides a remote sensing image semantic segmentation method based on multi-scale attention fusion and designs a Multi-scale Attention Fusion Network (MAFNet) for high-resolution remote sensing image semantic segmentation. It involves technologies such as convolutional neural networks and multi-modal data fusion, can be applied to the semantic segmentation of high-resolution remote sensing images, and lays a foundation for scene understanding of high-resolution remote sensing images.
Referring to fig. 1, the Multi-scale Attention Fusion network of the present invention is based on an encoding-decoding structure, and first uses a Multi-Modal Fusion Module (MFM) to process IRRG and DSM data from different modalities respectively, and introduces a Channel Attention-based Module (MCA) to redistribute resources between high-level semantic features and low-level semantic features, thereby improving the Fusion result of Multi-modal data, solving the problem that objects in remote sensing images are difficult to classify, and avoiding redundant features. And secondly, a Multi-scale Spatial Context Enhancement Module (MSCEM) is used for improving the characteristics after fusion, so that the problem of large difference of target scales in the remote sensing image is solved. And finally, a Residual Skip Connection strategy (RSC) is utilized to reserve and optimize multi-mode fusion information of the encoding end, and the problem that image characteristics are lost during down-sampling is solved. Compared with a mainstream remote sensing image semantic segmentation algorithm, the method improves the pixel classification precision by utilizing the image depth data and the multi-scale spatial context enhancement module on one hand, and improves the outline precision of the segmented target while accurately positioning the target by utilizing a residual jump connection strategy on the other hand. The method can realize semantic segmentation of the high-resolution remote sensing image, has higher classification precision, and has wide application prospect in the field of scene understanding and analysis of the high-resolution remote sensing image.
The present invention takes into account that using near infrared, red, green (IRRG) data yields higher segmentation accuracy than RGB data, and therefore uses IRRG images and normalized digital surface model (DSM) images as data sources. A reasonably designed symmetrical encoder-decoder structure is adopted. First, in the encoding process, the proposed model uses the multi-modal fusion module to extract features from the IRRG image and the DSM image separately, and the multi-modal data are fused at each sampling stage. Secondly, a multi-scale spatial context enhancement module is introduced at the end of the encoding stage to integrate and improve the global spatial information of targets at different scales. Finally, the multi-modal features are learned and optimized with the residual skip connection strategy, fused with the decoding features, and up-sampled to output a segmentation map. The method comprises the following steps:
(1) cropping a data set;
(2) inputting the cropped IRRG image and the cropped DSM image into the multi-modal fusion module to obtain the multi-modal fusion features F0, F1, F2 and F3 of each stage; a channel-attention-based module is introduced into the multi-modal fusion module to extract, reorganize and fuse the features and to assign weight resources;
(3) using the multi-scale spatial context enhancement module to integrate and improve the multi-modal fusion feature F3, then performing one up-sampling;
(4) using the residual skip connection strategy to optimize the encoder-side multi-modal fusion features F0, F1 and F2, fusing them with the decoder-side features of the corresponding scale, and continuing to up-sample until a segmentation map is output;
(5) splicing and outputting the segmentation maps according to the size of the original image, thereby completing the semantic segmentation of the remote sensing image.
The invention is described in detail below, comprising the steps of:
(1) the IRRG image, the DSM image, and their corresponding label maps are cropped using a sliding window, and the image size after cropping is 256 × 256.
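As an illustration only, a minimal sliding-window cropping sketch is given below; the 128-pixel stride matches the overlap reported later in the experimental section, and all function and variable names are hypothetical, not part of the invention. Edge strips that do not align with the stride are omitted in this simplified version.

```python
import numpy as np

def sliding_window_crop(image, tile=256, stride=128):
    """Crop an H x W (x C) array into overlapping tile x tile patches."""
    h, w = image.shape[:2]
    patches, positions = [], []
    for top in range(0, max(h - tile, 0) + 1, stride):
        for left in range(0, max(w - tile, 0) + 1, stride):
            patches.append(image[top:top + tile, left:left + tile])
            positions.append((top, left))
    return np.stack(patches), positions

# Example: crop an IRRG tile and its DSM counterpart with the same grid
# irrg_patches, pos = sliding_window_crop(irrg_image)
# dsm_patches, _ = sliding_window_crop(dsm_image)
```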
(2) The cropped IRRG image and the cropped DSM image are input to the multi-modal fusion module, which includes an optical branch, a depth branch and a coding fusion branch. Image features are extracted in the optical branch and the depth branch, and each of them provides a set of feature maps at each module stage. On this basis, a third branch, the coding fusion branch, is introduced to process the fused data. Referring to fig. 2a, the coding fusion branch takes the fusion of the optical branch and the depth branch as input before down-sampling; residual learning is performed by a convolution block, and the residual uses the sum of the feature maps of the other two encoders until stage MFM-3. MFM-n (n ∈ [1,3]) is structured as shown in fig. 2b: three pre-trained ResNet50 networks extract features from the three branches, and the features are then fused in the same pattern as in the MFM-0 stage. Before the feature maps are added, the channel-attention-based module is applied, and the last down-sampling is discarded in the encoding stage. The specific implementation is as follows (an illustrative sketch of one fusion stage is given after step (d)):
(a) The features I0 of the IRRG image and the features D0 of the DSM image are input to two ResNet50 networks pre-trained on ImageNet, and the fusion feature M0 of I0 and D0 is input to a third ResNet50 pre-trained on ImageNet. The initial-stage model MFM-0 is given by the corresponding formula (shown as an image in the original publication), where ⊕ denotes pixel-level addition and MCA(·) denotes the channel-attention-based module.
(b) The feature maps output by the three branches in the first stage are used as the input of the second stage, and the fused output of stage MFM-1 follows the corresponding formula.
(c) The feature maps output by the three branches in the second stage are used as the input of the third stage, and the fused output of stage MFM-2 follows the corresponding formula.
(d) The feature maps output by the three branches in the third stage are used as the input of the fourth stage, and the fused output of stage MFM-3 follows the corresponding formula.
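A purely illustrative PyTorch sketch of how one such fusion stage could be wired is given below. It assumes the three branch encoders are the corresponding stages of three ResNet50 networks and that the attention modules follow the channel-attention description in the next subsection; all names are hypothetical, and the exact formulas of the invention are given only as images in the original.

```python
import torch
import torch.nn as nn

class FusionStage(nn.Module):
    """One MFM-style stage: an optical branch, a depth branch and a coding fusion branch,
    combined by pixel-level addition after channel attention (one reading of figs. 2a-2b)."""
    def __init__(self, opt_stage, dep_stage, fus_stage, mca_opt, mca_dep):
        super().__init__()
        self.opt_stage, self.dep_stage, self.fus_stage = opt_stage, dep_stage, fus_stage
        self.mca_opt, self.mca_dep = mca_opt, mca_dep  # channel-attention modules (see next subsection)

    def forward(self, irrg_feat, dsm_feat, fused_feat):
        i_out = self.opt_stage(irrg_feat)              # optical (IRRG) branch
        d_out = self.dep_stage(dsm_feat)               # depth (DSM) branch
        m_out = self.fus_stage(fused_feat)             # coding fusion branch
        # the fusion-branch output is summed, pixel-wise, with the attention-recalibrated branch outputs
        f_out = m_out + self.mca_opt(i_out) + self.mca_dep(d_out)
        return i_out, d_out, f_out

# Hypothetical wiring: opt_stage / dep_stage / fus_stage could be the matching layer groups
# of three torchvision ResNet50 models pre-trained on ImageNet.
```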
(3) MCA(·) denotes the channel-attention-based module, which is used to extract, reorganize and fuse features and to assign weight resources to the more meaningful feature maps. As shown in fig. 3, the specific implementation is as follows (an illustrative code sketch follows step (c)):
(a) The input feature map A = [a1, a2, ..., aC] is viewed as a combination of channels ak ∈ R^(H×W). First, Global Average Pooling (GAP) is applied to obtain a vector G ∈ R^(1×1×C), whose k-th element is modeled as
Gk = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} ak(i, j).
This operation integrates global information into the vector G.
(b) Next, the vector G is converted into G' = σ(O2(δ(O1(G)))), where O1 ∈ R^(1×1×C/2) and O2 ∈ R^(1×1×C) denote two fully connected convolutional layers; an activation function δ is added after O1, which creates channel dependencies in the feature extraction. The Sigmoid function σ(·) then activates G' and constrains it to [0, 1].
(c) Finally, A and G' are combined by an outer product to obtain the recalibrated feature map A', modeled as A'k = G'k · ak. The ReLU remaps the original channels to new channels and adds non-linearity in an adaptive fashion, so the network fits better. During network learning, this module suppresses redundant features and recalibrates the weights so that optimization concentrates on the more meaningful feature maps.
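A minimal PyTorch sketch of such a channel-attention block is shown below (global average pooling, two fully connected 1 × 1 convolutions with a ReLU in between, Sigmoid gating, then channel-wise rescaling). The C → C/2 → C layer sizes follow the description above; the exact layer configuration of the invention may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: GAP -> FC(C/2) -> ReLU -> FC(C) -> Sigmoid -> scale."""
    def __init__(self, channels):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                      # G in R^(1x1xC)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),  # O1
            nn.ReLU(inplace=True),                              # activation after O1
            nn.Conv2d(channels // 2, channels, kernel_size=1),  # O2
            nn.Sigmoid(),                                       # constrain weights to [0, 1]
        )

    def forward(self, x):
        weights = self.fc(self.gap(x))   # per-channel weights G'
        return x * weights               # recalibrated feature map A'
```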
(4) Referring to fig. 4, the multi-scale spatial context enhancement module is composed of an ASPP (Atrous Spatial Pyramid Pooling) module and a non-local module; F represents the feature map processed by the multi-scale spatial context enhancement module, and the model is F = NL(ASPP(F3)). The specific implementation is as follows:
(a) The fusion feature F3 of the last stage of the multi-modal fusion module is input to the multi-scale spatial context enhancement module to extract multi-scale information. Since the size of F3 is 16 × 16, and dilated convolutions with different dilation rates operate on the same feature map before their outputs are fused, the fusion should cover the whole feature map; therefore, 3 × 3 convolutions with dilation rates of 3, 6 and 9 are combined with a standard 1 × 1 convolution for multi-scale information extraction, and an image average pooling branch is added to integrate global context information. This yields better efficiency and performance without increasing the number of parameters.
(b) After multi-scale information fusion with the ASPP module, the number of channels is reduced to 256 by a 1 × 1 convolution, and the result then enters the non-local module.
(c) The non-local model is
Fi = (1/C(X)) Σ_{j=1..N} f(xi, xj) g(xj).
Let the feature map X = [x1, x2, ..., xN] be the input, where xi ∈ R^(1×1×C) and xj ∈ R^(1×1×C) are the feature vectors at positions i and j, respectively. N = H × W is the number of pixels and H × W is the spatial dimension. F has the same number of channels as X, C(X) is the normalization operation, and g(xj) = Wv·xj is implemented as a 1 × 1 convolution in the network. Second, f(xi, xj)/C(X) is the normalized correlation of the vectors xi and xj, used to compute spatial similarity, and is modeled as
f(xi, xj) = exp(m(xi)ᵀ n(xj)),
where m(xi) and n(xj) are linear transformation matrices, m(xi) = Wq·xi and n(xj) = Wk·xj, all implemented as 1 × 1 convolutions in the network. This module establishes a relationship between any two spatial positions and improves the semantic feature expression.
(d) F is up-sampled once by bilinear interpolation.
The application of a global-context long-range dependence strategy is important for semantic segmentation of multi-class high-resolution remote sensing images; to better utilize the spatial information of the multi-scale feature map, the non-local module is introduced after the multi-scale information is integrated. The DSM data also provide auxiliary physical properties for specific classes in the remote sensing image, and the spatial relationships can enhance the local properties of the feature map by aggregating dependencies on other pixel positions. For targets with similar semantic features, this relation-context strategy strengthens the relevance of intra-class features, and the module combines global and local information to make the semantic segmentation results of high-resolution remote sensing images more accurate.
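For illustration, a compact PyTorch sketch of such an ASPP-plus-non-local head is given below. The dilation rates 3/6/9, the reduction to 256 channels and the query/key/value projections follow the description above, but the exact layer widths, normalization and residual wiring of the invention may differ, and all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSCEMHead(nn.Module):
    """ASPP (rates 3, 6, 9 + 1x1 + image pooling) followed by a non-local attention block."""
    def __init__(self, in_ch, mid_ch=256):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, mid_ch, 1)] +
            [nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in (3, 6, 9)]
        )
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, mid_ch, 1))
        self.project = nn.Conv2d(5 * mid_ch, mid_ch, 1)   # fuse branches, reduce to 256 channels
        self.wq = nn.Conv2d(mid_ch, mid_ch // 2, 1)        # m(x) = Wq x
        self.wk = nn.Conv2d(mid_ch, mid_ch // 2, 1)        # n(x) = Wk x
        self.wv = nn.Conv2d(mid_ch, mid_ch, 1)             # g(x) = Wv x

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode='bilinear', align_corners=False))
        y = self.project(torch.cat(feats, dim=1))          # multi-scale fusion
        q = self.wq(y).flatten(2).transpose(1, 2)          # B x N x C'
        k = self.wk(y).flatten(2)                          # B x C' x N
        v = self.wv(y).flatten(2).transpose(1, 2)          # B x N x C
        attn = torch.softmax(q @ k, dim=-1)                # normalized spatial similarity f / C(X)
        out = (attn @ v).transpose(1, 2).reshape(y.shape)  # aggregate g(xj) over all positions
        return out
```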
(5) FIG. 5 shows the residual skip connection strategy: the features of the (l-1)-th layer are transferred to the (l+1)-th layer via the skip connection, while they are also passed to the l-th layer by down-sampling and further transmitted to the (l+1)-th layer by up-sampling; this process is repeated. Losing low-resolution information blurs the segmentation boundaries. A conventional skip connection transmits the high-resolution feature map directly to the decoder without any convolutional learning, and the effective information of the encoder side is continuously lost during down-sampling, so the finally learned network model cannot map the high-resolution information effectively. The residual skip connection model is
f_{l+1} = DSC(Activation(Tconv(f_l)) ⊕ f_{l-1}).
The specific implementation mode is as follows:
(a) The features of the l-th layer are restored, through a learned transposed convolution, to the same size as the features of the (l-1)-th layer.
(b) The features of the (l-1)-th layer that have not been down-sampled are extracted separately and added to them.
(c) The features are learned again using a depthwise separable convolution and transmitted to the (l+1)-th layer after its up-sampling.
The features F0, F1 and F2 of the first three stages obtained by the multi-modal fusion module are fed back to the decoder in their entirety using this strategy, which allows higher-level features to be exploited. Here f_l is the feature of the l-th layer, Tconv is the transposed convolution, Activation is the ReLU activation function, DSC denotes the depthwise separable convolution, ⊕ is pixel-level addition, f_{l-1} is the feature of the (l-1)-th layer before down-sampling, and f_{l+1} is the result of the residual skip connection processing.
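A minimal PyTorch reading of this residual skip connection is sketched below, with the depthwise separable convolution written as a depthwise 3 × 3 followed by a pointwise 1 × 1; layer settings and names are illustrative assumptions rather than the exact configuration of the invention.

```python
import torch
import torch.nn as nn

class ResidualSkipConnection(nn.Module):
    """f_{l+1} = DSC(ReLU(Tconv(f_l)) + f_{l-1}), one possible reading of the strategy."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.tconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)  # restore spatial size
        self.act = nn.ReLU(inplace=True)
        self.dsc = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch),  # depthwise
            nn.Conv2d(out_ch, out_ch, 1),                            # pointwise
        )

    def forward(self, f_l, f_l_minus_1):
        up = self.act(self.tconv(f_l))       # bring f_l back to the (l-1)-th layer resolution
        return self.dsc(up + f_l_minus_1)    # pixel-level addition, then depthwise separable conv
```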
(6) The features optimized by the residual skip connection strategy are gradually fused with the decoder features and continuously up-sampled by bilinear interpolation until the segmentation map is output.
(7) The segmentation maps are stitched and output according to the size of the original image.
In order to verify the effectiveness of the semantic segmentation of high-resolution remote sensing images, urban ground feature classification experiments are carried out on two public data sets, the Vaihingen data set and the Potsdam data set, and the performance of the model is verified using evaluation indexes.
The Potsdam dataset includes 38 images, each having three bands corresponding to near infrared (IR), red (R) and green (G), respectively. The data set also provides a digital surface model and a normalized digital surface model corresponding to each image slice. The image slices have a spatial resolution of 5 cm and are all 6000 × 6000 pixels in size. Six categories (road, building, low vegetation, trees, cars and clutter) are labeled pixel by pixel on 24 labeled images. The present invention uses both the IRRG and DSM data types. Image numbers 5_12, 6_7 and 7_9 are selected for validation, image numbers 5_10 and 6_8 for testing, and the remaining images for training.
The Vaihingen data set includes 33 images with a spatial resolution of 9 centimeters. The bands of each image are the same as in the Potsdam dataset, with an average size of 2494 × 2064 pixels. Only 16 images have ground truth labels, which contain the same six categories as the Potsdam dataset. Both IRRG and DSM data types are also used. Five images (numbers 11, 15, 28, 30 and 34) are used as a test set to evaluate the network model of the invention, three images (numbers 7, 23 and 37) as a validation set, and the remaining images for training. Fig. 6 shows sample images, digital surface models and corresponding labels from these two datasets.
Due to GPU memory limitations, the size of the images in the data sets needs to be changed to fit the network model of the invention. Each image is cropped to 256 × 256 pixels with an overlap of 128 pixels, and the prediction results are finally stitched. Data augmentation is used to reduce the risk of overfitting, including random flipping (vertical and horizontal) and random rotation (0°, 90°, 180°, 270°) of all training images. The augmented data effectively prevent the model from overfitting and improve its robustness. The model is built with the deep learning framework PyTorch, with ResNet50 pre-trained on ImageNet as the backbone network. The operating system is Windows 10, the processor is an Intel(R) Xeon(R) CPU E5-1620 v4, and the proposed MAFNet is trained on two NVIDIA GeForce GTX 1080 graphics processors, each with 8 GB of memory. The network is optimized with a cross-entropy loss and a stochastic gradient descent optimizer with momentum 0.9 and weight decay 0.004. The initial learning rate is 1e-3 and is multiplied by 0.98 at the end of each epoch. The total batch size is set to 16, and 250 epochs are used to train the network.
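The optimizer and schedule described above could be set up roughly as follows in PyTorch; this is a sketch under the stated hyperparameters only, and `MAFNet` and `train_loader` are placeholders for the network and data pipeline, which are not reproduced here.

```python
import torch
import torch.nn as nn

model = MAFNet()                                        # placeholder for the proposed network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=0.004)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)  # lr x 0.98 per epoch

for epoch in range(250):
    for irrg, dsm, label in train_loader:               # batches of cropped 256 x 256 patches
        optimizer.zero_grad()
        logits = model(irrg, dsm)
        loss = criterion(logits, label)
        loss.backward()
        optimizer.step()
    scheduler.step()
```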
In order to further show the superiority of the invention, it is compared with mainstream deep-learning-based semantic segmentation algorithms for remote sensing images, and the results are visualized: DeepLab v3+, APPD, MANet, DSMFNet and REMSNet.
In order to compare the performance of the different algorithms, the Overall Accuracy (OA) and the F1 Score are selected as evaluation indexes; the larger the values of OA and F1 Score, the better the segmentation result.
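As a reference, the two indexes can be computed from a confusion matrix as sketched below; this is a generic formulation rather than code from the invention, and the per-class F1 values are averaged to obtain the mean F1 reported in the tables.

```python
import numpy as np

def overall_accuracy(conf):
    """OA = correctly classified pixels / all pixels, from a C x C confusion matrix."""
    return np.trace(conf) / conf.sum()

def f1_scores(conf):
    """Per-class F1 = 2 * precision * recall / (precision + recall)."""
    tp = np.diag(conf).astype(float)
    precision = tp / np.maximum(conf.sum(axis=0), 1)
    recall = tp / np.maximum(conf.sum(axis=1), 1)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# mean_f1 = f1_scores(conf).mean()
```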
Comparative experiments for different methods were performed on the Potsdam dataset and the results are shown in table 1:
TABLE 1 results of comparative experiments on Potsdam data set for different methods
Method Imp.Surf. Building Low veg. Tree Car Mean F1 OA
DeepLab v3+ 89.88 93.78 83.23 81.66 93.50 88.41 87.72
APPD 90.80 94.56 84.37 85.14 94.42 89.86 88.42
MANet 91.33 95.91 85.88 87.01 91.46 90.32 89.19
DSMFNet 93.03 95.75 86.33 86.46 94.88 91.29 90.36
REMSNet 93.48 96.17 87.52 87.97 95.03 92.03 90.79
MAFNet 93.61 96.26 87.87 88.65 95.32 92.34 91.04
In the experiments on the Potsdam dataset, the F1 Score of each class, the mean F1 Score and the Overall Accuracy (OA) were calculated. As shown in Table 1, the mean F1 Score and the overall accuracy of the method reach 92.34% and 91.04%, respectively, which is superior to the other algorithms in all evaluation indexes. The Potsdam dataset scenes are relatively complex, and trees and low vegetation are difficult to classify; compared with DeepLab v3+, the classification of trees is improved by 7.0%, and the other categories are correspondingly improved, which shows that MAFNet can capture targets of different scales by using global context spatial information.
FIG. 7 visualizes the segmentation results of the invention and other methods on Potsdam dataset slices. From the unlabeled original images it can be seen that trees and low vegetation are very similar, and in some areas even the human eye can no longer classify them correctly. However, the comparison of the dashed boxes shows that the invention obtains better segmentation results for trees and low vegetation, which also verifies its superior performance. Secondly, the segmentation results of the proposed MAFNet are also more refined for small targets such as vehicles. The residual skip connection strategy provided by the invention solves the problem that small targets are easily misclassified because of the information lost during down-sampling, and for large targets it also enhances the semantic features of intra-class attributes and reduces misclassification.
Fig. 8 is a visualization of the overall classification result of the 5_10 region in the Potsdam dataset, which can clearly distinguish the regions and distribution rules of all categories, and has practical significance for city planning.
Comparative experiments for different methods were performed on the Vaihingen dataset, see table 2 for comparative results:
TABLE 2 results of comparative experiments on the Vaihingen data set for different methods
Method Imp.Surf. Building Low veg. Tree Car Mean F1 OA
DeepLab v3+ 87.67 93.95 79.17 86.26 80.34 85.48 87.22
APPD 88.78 93.38 80.43 86.76 80.88 86.05 87.71
MANet 90.12 94.08 81.01 87.21 81.16 86.72 88.17
DSMFNet 91.47 95.08 82.11 88.61 81.01 87.66 89.80
REMSNet 92.01 95.67 82.35 89.73 81.26 88.20 90.08
MAFNet 92.06 96.12 82.71 90.01 82.13 88.61 90.27
For Table 2, the F1 Score of each category, the mean F1 Score and the Overall Accuracy (OA) were calculated as evaluation results. As shown in Table 2, the mean F1 Score and the overall accuracy of the proposed MAFNet are 88.61% and 90.27%, respectively, which is superior to the other algorithms. Especially for the car category, the residual skip connection strategy provided by the invention effectively retains the information of small objects. DSM data are added to the network model input, and the classes assisted by physical spatial height information are also improved, which alleviates the difficulty of classifying targets in high-resolution remote sensing images and verifies that fusing data of different modalities helps the classification of ground features in remote sensing images. The results show that the method is strong in complex high-resolution remote sensing scenes; the multi-scale spatial context enhancement module handles the large scale differences of segmented targets and effectively extracts the features of targets of different scales, and correct segmentation can still be achieved even when a target occupies only a small proportion of a region and is highly similar to targets of other classes.
Figure 9 visualizes the ground feature classification results of different algorithms on the Vaihingen test set. Trees and low vegetation have high similarity and are therefore difficult to classify. The dashed bounding boxes show that the invention not only distinguishes regions with high similarity better but also retains all the information of small objects. The proposed MAFNet can also reduce interference from factors such as lighting and shadows to some extent; for example, in the fourth row, trees under shadow are also correctly classified.
FIG. 10 shows the complete regions after slice stitching; the second row compares the results of other algorithms with those of the invention. The experimental results show that the invention performs well in the scene analysis of complex high-resolution remote sensing images.
The proposed modules are decomposed and combined, and the effectiveness of the different modules is further verified with the F1 Score and the overall accuracy. The ablation experiments use the Vaihingen dataset. First, the baseline model uses two ResNet50 networks to extract features from the different modal data separately; the features are fused after the last residual block and the segmentation map is output through continuous up-sampling, with no interaction between the different data during feature extraction. Second, to verify the MFM, the multi-modal fusion module is added: the two ResNet50 networks use the attention mechanism to allocate feature resources reasonably during feature extraction and continuously fuse information, a third ResNet50 is introduced to process the fusion branch, and the feature maps fused by the three branches are finally up-sampled continuously to obtain the final segmentation map. In the third model, the ResNet50-fused feature maps are input into the multi-scale spatial context enhancement module in the encoding stage to obtain new feature maps, which are up-sampled continuously in the decoding stage until the final output. The fourth model combines ResNet50 with the residual skip connection strategy to verify the effectiveness of fusion at the decoder side while retaining the encoder-side information: using the residual learning strategy, the information from the first three down-sampling steps of feature extraction is matched in turn to the up-sampling stages, and the final prediction is output. Finally, all modules are integrated together; all results of the ablation experiments are shown in table 3:
TABLE 3 results of ablation experiments performed on the Vaihingen dataset
Models Imp.Surf. Building Low veg. Tree Car Mean F1 OA
Res50 86.94 89.67 75.83 84.42 77.40 82.85 84.98
Res50+MFM 88.15 93.84 76.49 86.48 78.02 84.60 86.66
Res50+MSCEM 88.79 93.09 79.79 85.55 80.38 85.52 87.35
Res50+RSC 90.11 92.97 80.24 86.04 81.14 86.10 87.82
MAFNet 92.06 96.12 82.71 90.01 82.13 88.61 90.27
The results in Table 3 show that the mean F1 Score of "Res50+MFM" is improved by 1.8% and the overall accuracy by 1.7% compared with ResNet50; the attention mechanism introduced before fusing data of different modalities solves the problem of weight distribution between feature maps, verifies the validity of the multi-modal data information, and shows that efficiently fusing the features improves segmentation accuracy. The mean F1 Score and the overall accuracy of "Res50+MSCEM" are improved by 2.7% and 2.4% compared with ResNet50; the multi-scale spatial context enhancement module improves the performance of the backbone network, effectively captures all the information in the image, strengthens the relevance between different categories, and addresses the difficulty of extracting multi-scale targets in remote sensing images. Compared with ResNet50, the mean F1 Score of "Res50+RSC" is improved by 3.3% and the overall accuracy by 2.8%; compared with an ordinary skip connection, the new residual skip connection strategy not only enhances the features output by the encoder side but also provides better feature fusion for the decoder side. In addition, with all modules integrated, the mean F1 Score and the overall accuracy of the proposed MAFNet are improved by 5.8% and 5.3%, respectively, compared with the initial network model, showing that the semantic segmentation of high-resolution remote sensing images can be remarkably improved.
In conclusion, the remote sensing image semantic segmentation method based on multi-scale attention fusion solves the problem that targets in remote sensing images are difficult to classify by fusing multi-modal data; an attention mechanism is introduced to redistribute resources in the feature extraction stage, so that redundant features are avoided; a multi-scale spatial context module is adopted to handle the large difference of target scales in remote sensing images; and a residual skip connection strategy is used to retain and optimize the encoder-side information, solving the problem of image features being lost during down-sampling. The invention not only realizes semantic segmentation of high-resolution remote sensing images but also achieves high classification accuracy, providing objective and accurate data for the understanding and analysis of high-resolution remote sensing images.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A remote sensing image semantic segmentation method based on multi-scale attention fusion is characterized by comprising the following steps:
1) cropping a data set;
2) inputting the cropped IRRG image and the cropped DSM image into a multi-modal fusion module to obtain the multi-modal fusion features F0, F1, F2 and F3 of each stage; a channel-attention-based module is introduced into the multi-modal fusion module to extract, reorganize and fuse the features and to assign weight resources;
3) using a multi-scale spatial context enhancement module to integrate and improve the multi-modal fusion feature F3, then performing one up-sampling;
4) using a residual skip connection strategy to optimize the encoder-side multi-modal fusion features F0, F1 and F2, fusing them with the decoder-side features of the corresponding scale, and continuing to up-sample until a segmentation map is output;
5) splicing the segmentation maps according to the size of the original image to complete the semantic segmentation of the remote sensing image.
2. The method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 1, wherein the IRRG image, the DSM image and the corresponding label map are cut by using a sliding window in the step 1), and the size of the cut image is 256 x 256.
3. The method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 1, wherein the multi-modal fusion module comprises an optical branch, a depth branch and a coding fusion branch, each of the optical branch and the depth branch provides a set of feature maps at each module stage, and the coding fusion branch takes the fusion of the optical branch and the depth branch as input before down-sampling and processes the fused data.
4. The method for semantic segmentation of remote sensing images based on multi-scale attention fusion as claimed in claim 3, characterized in that the implementation manner of the multi-modal fusion module comprises:
1) the features I0 of the IRRG image and the features D0 of the DSM image are input to two ResNet50 networks pre-trained on ImageNet, and the fusion feature M0 of I0 and D0 is input to a third ResNet50 pre-trained on ImageNet; the initial-stage model MFM-0 is given by the corresponding formula (shown as an image in the original publication), where ⊕ denotes pixel-level addition and MCA(·) denotes the channel-attention-based module;
2) the feature maps output by the three branches in the first stage are used as the input of the second stage, and the fused output of stage MFM-1 follows the corresponding formula;
3) the feature maps output by the three branches in the second stage are used as the input of the third stage, and the fused output of stage MFM-2 follows the corresponding formula;
4) the feature maps output by the three branches in the third stage are used as the input of the fourth stage, and the fused output of stage MFM-3 follows the corresponding formula.
5. the method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 4, wherein the module implementation manner based on the channel attention comprises:
1) the input feature map A = [a1, a2, ..., aC] is viewed as a combination of channels ak ∈ R^(H×W); global average pooling yields a vector G ∈ R^(1×1×C), whose k-th element is modeled as
Gk = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} ak(i, j),
which integrates global information into the vector G;
2) the vector G is converted into G' = σ(O2(δ(O1(G)))), where O1 ∈ R^(1×1×C/2) and O2 ∈ R^(1×1×C) denote two fully connected convolutional layers, an activation function δ is added after O1, and the Sigmoid function σ(·) activates G' and constrains it to [0, 1];
3) A and G' are combined by an outer product to obtain the recalibrated feature map A', modeled as A'k = G'k · ak.
6. The method for semantically segmenting the remote sensing image based on multi-scale attention fusion as claimed in claim 1, wherein the multi-scale spatial context enhancement module comprises an ASPP module and a non-local module, F represents the feature map processed by the multi-scale spatial context enhancement module, and the model is F = NL(ASPP(F3)).
7. The method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 6, wherein the implementation manner of the multi-scale spatial context enhancement module comprises:
1) the multi-modal fusion feature F3 of the last stage of the multi-modal fusion module is input to the multi-scale spatial context enhancement module to extract multi-scale information; 3 × 3 convolutions with dilation rates of 3, 6 and 9 are combined with a standard 1 × 1 convolution for multi-scale information extraction, and an image average pooling branch is added to integrate global context information;
2) after multi-scale information fusion with the ASPP module, the number of channels is reduced to 256 by a 1 × 1 convolution, and the result then enters the non-local module;
3) the non-local model is
Fi = (1/C(X)) Σ_{j=1..N} f(xi, xj) g(xj),
where the feature map X = [x1, x2, ..., xN] is the input, xi ∈ R^(1×1×C) and xj ∈ R^(1×1×C) are the feature vectors at positions i and j respectively, N = H × W is the number of pixels, H × W is the spatial dimension, F has the same number of channels as X, C(X) is the normalization operation, and g(xj) = Wv·xj is implemented as a 1 × 1 convolution in the network; f(xi, xj)/C(X) is the normalized correlation of the vectors xi and xj, used to compute spatial similarity, and is modeled as
f(xi, xj) = exp(m(xi)ᵀ n(xj)),
where m(xi) and n(xj) are linear transformation matrices, m(xi) = Wq·xi and n(xj) = Wk·xj, all implemented as 1 × 1 convolutions in the network;
4) F is up-sampled once by bilinear interpolation.
8. The method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 1, wherein the residual jump connection model is
f_{l+1} = DSC(Activation(Tconv(f_l)) ⊕ f_{l-1})
wherein f_l is the feature of the l-th layer, Tconv is the transposed convolution, Activation is the ReLU activation function, DSC denotes the depthwise separable convolution, ⊕ denotes pixel-level addition, f_{l-1} is the un-downsampled feature of the (l-1)-th layer, and f_{l+1} is the result of the residual jump connection.
9. The method for semantically segmenting the remote sensing image based on the multi-scale attention fusion as claimed in claim 8, wherein the implementation manner of residual jump connection comprises:
1) the features of the l-th layer are restored, via transposed convolution learning, to the same size as the features of the (l-1)-th layer;
2) the un-downsampled features of the (l-1)-th layer are extracted separately and added;
3) the features are learned again using a depthwise separable convolution and transmitted to the (l+1)-th layer after upsampling.
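One possible reading of the residual jump connection in claims 8 and 9 is sketched below in PyTorch: a transposed convolution restores the l-th layer features to the spatial size of the (l-1)-th layer, the un-downsampled (l-1)-th layer features are added pixel-wise, and a depthwise separable convolution learns the fused features. The kernel sizes, the stride-2 upsampling, and the ReLU placement after the transposed convolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 followed by pointwise 1x1 convolution (DSC in claim 8)."""
    def __init__(self, ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.pointwise = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ResidualSkipConnection(nn.Module):
    """Illustrative residual jump connection: Tconv -> ReLU -> add f_{l-1} -> DSC."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.tconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)  # restore spatial size
        self.act = nn.ReLU(inplace=True)
        self.dsc = DepthwiseSeparableConv(out_ch)

    def forward(self, f_l: torch.Tensor, f_lm1: torch.Tensor) -> torch.Tensor:
        up = self.act(self.tconv(f_l))   # features restored to the (l-1)-th layer size
        return self.dsc(up + f_lm1)      # pixel-level addition, then depthwise separable conv
```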
10. The remote sensing image semantic segmentation method based on multi-scale attention fusion of claim 9, characterized in that the features optimized by the residual jump connection strategy are gradually fused with the decoder features and continuously upsampled by bilinear interpolation until the segmentation map is output.
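Claim 10 only states that the residual-skip-optimized features are gradually fused with the decoder features while the decoder upsamples bilinearly until the segmentation map is produced. A minimal decoder-stage sketch, assuming concatenation as the fusion operation and a 3×3 convolution for refinement (neither is specified in the claim), might look as follows; stacking such stages and ending with a 1×1 convolution onto the class channels would yield the segmentation map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    """One illustrative decoder stage: bilinear upsample, fuse with skip features, refine."""
    def __init__(self, dec_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Conv2d(dec_ch + skip_ch, out_ch, 3, padding=1)  # fusion conv (assumed)

    def forward(self, dec: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # upsample decoder features to the skip-feature resolution, then fuse by concatenation
        dec = F.interpolate(dec, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return torch.relu(self.fuse(torch.cat([dec, skip], dim=1)))
```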
CN202110528206.1A 2021-05-14 2021-05-14 Remote sensing image semantic segmentation method based on multi-scale attention fusion Active CN113283435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110528206.1A CN113283435B (en) 2021-05-14 2021-05-14 Remote sensing image semantic segmentation method based on multi-scale attention fusion


Publications (2)

Publication Number Publication Date
CN113283435A true CN113283435A (en) 2021-08-20
CN113283435B CN113283435B (en) 2023-08-22

Family

ID=77279332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110528206.1A Active CN113283435B (en) 2021-05-14 2021-05-14 Remote sensing image semantic segmentation method based on multi-scale attention fusion

Country Status (1)

Country Link
CN (1) CN113283435B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200357143A1 (en) * 2019-05-09 2020-11-12 Sri International Semantically-aware image-based visual localization
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
CN111563418A (en) * 2020-04-14 2020-08-21 浙江科技学院 Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN112132006A (en) * 2020-09-21 2020-12-25 西南交通大学 Intelligent forest land and building extraction method for cultivated land protection

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850824B (en) * 2021-09-27 2024-03-29 太原理工大学 Remote sensing image road network extraction method based on multi-scale feature fusion
CN113850824A (en) * 2021-09-27 2021-12-28 太原理工大学 Remote sensing image road network extraction method based on multi-scale feature fusion
CN113887470A (en) * 2021-10-15 2022-01-04 浙江大学 High-resolution remote sensing image ground object extraction method based on multitask attention mechanism
CN114387439B (en) * 2022-01-13 2023-09-12 中国电子科技集团公司第五十四研究所 Semantic segmentation network based on optical and PolSAR feature fusion
CN114387439A (en) * 2022-01-13 2022-04-22 中国电子科技集团公司第五十四研究所 Semantic segmentation network based on fusion of optical and PolSAR (polar synthetic Aperture Radar) features
CN114677412A (en) * 2022-03-18 2022-06-28 苏州大学 Method, device and equipment for estimating optical flow
CN115546649A (en) * 2022-10-24 2022-12-30 中国矿业大学(北京) Single-view remote sensing image height estimation and semantic segmentation multi-task prediction method
CN115546649B (en) * 2022-10-24 2023-04-18 中国矿业大学(北京) Single-view remote sensing image height estimation and semantic segmentation multi-task prediction method
CN115424023A (en) * 2022-11-07 2022-12-02 北京精诊医疗科技有限公司 Self-attention mechanism module for enhancing small target segmentation performance
CN116452936A (en) * 2023-04-22 2023-07-18 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116452936B (en) * 2023-04-22 2023-09-29 安徽大学 Rotation target detection method integrating optics and SAR image multi-mode information
CN116188497A (en) * 2023-04-27 2023-05-30 成都国星宇航科技股份有限公司 Method, device, equipment and storage medium for optimizing generation of DSM (digital image model) of stereo remote sensing image pair
CN116188497B (en) * 2023-04-27 2023-07-07 成都国星宇航科技股份有限公司 Method, device, equipment and storage medium for optimizing generation of DSM (digital image model) of stereo remote sensing image pair
CN116307267B (en) * 2023-05-15 2023-07-25 成都信息工程大学 Rainfall prediction method based on convolution
CN116307267A (en) * 2023-05-15 2023-06-23 成都信息工程大学 Rainfall prediction method based on convolution
CN116363134B (en) * 2023-06-01 2023-09-05 深圳海清智元科技股份有限公司 Method and device for identifying and dividing coal and gangue and electronic equipment
CN116363134A (en) * 2023-06-01 2023-06-30 深圳海清智元科技股份有限公司 Method and device for identifying and dividing coal and gangue and electronic equipment
CN116740362B (en) * 2023-08-14 2023-11-21 南京信息工程大学 Attention-based lightweight asymmetric scene semantic segmentation method and system
CN116740362A (en) * 2023-08-14 2023-09-12 南京信息工程大学 Attention-based lightweight asymmetric scene semantic segmentation method and system
CN117274608A (en) * 2023-11-23 2023-12-22 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance
CN117274608B (en) * 2023-11-23 2024-02-06 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance
CN117635953A (en) * 2024-01-26 2024-03-01 泉州装备制造研究所 Multi-mode unmanned aerial vehicle aerial photography-based real-time semantic segmentation method for power system
CN117635953B (en) * 2024-01-26 2024-04-26 泉州装备制造研究所 Multi-mode unmanned aerial vehicle aerial photography-based real-time semantic segmentation method for power system

Also Published As

Publication number Publication date
CN113283435B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN113283435B (en) Remote sensing image semantic segmentation method based on multi-scale attention fusion
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN114187520B (en) Building extraction model construction and application method
CN114266794B (en) Pathological section image cancer region segmentation system based on full convolution neural network
CN114638994B (en) Multi-modal image classification system and method based on attention multi-interaction network
CN115565071A (en) Hyperspectral image transform network training and classifying method
CN111696136A (en) Target tracking method based on coding and decoding structure
Jiang et al. Forest-CD: Forest change detection network based on VHR images
CN114155371A (en) Semantic segmentation method based on channel attention and pyramid convolution fusion
Gao A method for face image inpainting based on generative adversarial networks
Li et al. Maskformer with improved encoder-decoder module for semantic segmentation of fine-resolution remote sensing images
CN117115641B (en) Building information extraction method and device, electronic equipment and storage medium
CN116894820B (en) Pigment skin disease classification detection method, device, equipment and storage medium
CN116543165B (en) Remote sensing image fruit tree segmentation method based on dual-channel composite depth network
Ma et al. MSFNET: multi-stage fusion network for semantic segmentation of fine-resolution remote sensing data
CN116977747A (en) Small sample hyperspectral classification method based on multipath multi-scale feature twin network
CN111274936A (en) Multispectral image ground object classification method, system, medium and terminal
CN113554655B (en) Optical remote sensing image segmentation method and device based on multi-feature enhancement
CN114331894A (en) Face image restoration method based on potential feature reconstruction and mask perception
CN110991617B (en) Construction method of kaleidoscope convolution network
CN112329647A (en) Land use type identification method based on U-Net neural network
CN114998363B (en) High-resolution remote sensing image progressive segmentation method
CN116778294B (en) Remote sensing change detection method for contexts in combined image and between images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant