CN117611817A - Remote sensing image semantic segmentation method and system based on stacked depth residual error network - Google Patents

Remote sensing image semantic segmentation method and system based on stacked depth residual error network

Info

Publication number
CN117611817A
CN117611817A
Authority
CN
China
Prior art keywords
network
remote sensing
sensing image
semantic segmentation
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311609856.4A
Other languages
Chinese (zh)
Inventor
陈一平
谢相依
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202311609856.4A priority Critical patent/CN117611817A/en
Publication of CN117611817A publication Critical patent/CN117611817A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/34: Smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 20/13: Satellite images
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/09: Supervised learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Databases & Information Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the fields of computer vision and remote sensing image processing, and in particular to a remote sensing image semantic segmentation method and system based on a stacked deep residual network. The method comprises the following steps: constructing a stacked deep residual network and extracting depth features of the remote sensing image; scaling the depth of the stacked deep residual network using residual learning; aggregating multi-scale context features using dilated residual blocks; performing supervised learning on the stacked deep residual network using intermediate losses; and performing semantic segmentation on remote sensing image data using the supervised-trained stacked deep residual network. The invention adopts a computationally efficient stacked deep residual network to improve the network model for the image land-cover classification problem, and extracts semantic features from different layers of the network backbone to improve semantic segmentation performance.

Description

Remote sensing image semantic segmentation method and system based on a stacked deep residual network
Technical Field
The invention relates to the fields of computer vision and remote sensing image processing, and in particular to a remote sensing image semantic segmentation method and system based on a stacked deep residual network.
Background
Land-cover maps of the Earth's surface generated by semantic segmentation of high-resolution remote sensing images provide key decision and technical support for urban planning, resource management, and the formulation of social development policies. However, high inter-class similarity and high intra-class variation are prevalent among land-cover types, which complicates the classification task. In addition, in high-resolution remote sensing images large objects often occlude smaller ones, and such occlusion poses a great challenge to accurate semantic segmentation, making objects of certain categories difficult to distinguish.
Traditional methods based on prior knowledge have difficulty distinguishing the complex features in high-resolution remote sensing images. Most shallow classifiers do not fully exploit the rich context information in high-resolution imagery and are inefficient, time-consuming, and dependent on prior knowledge when performing classification tasks. Some shallow classifiers achieve good results in certain situations but may fail to generalize to others.
Convolutional neural networks (CNNs) have complex, deep network structures and can accurately segment remote sensing image data that is information-rich and drawn from multiple heterogeneous sources. CNNs use alternately connected convolution layers, sub-sampling layers, and activation functions to learn complex image features hierarchically and automatically; they have strong learning capacity for identifying complex relations in high-resolution spatial data and can efficiently recognize and analyze fine-grained images. Furthermore, CNNs can leverage existing computing resources, using GPUs and distributed computing to accelerate computation in parallel or distributed fashion.
Deep convolutional neural networks (DCNNs) are deeper variants of CNNs, with more layers, intended to learn high-level semantic representations from large amounts of data. The superior performance of DCNNs in image-related tasks (such as object recognition, object detection, and semantic segmentation) and their ability to process complex remote sensing big data have made them far more popular than shallow models in remote sensing image analysis.
In most CNN architectures, the sub-sampling layers gradually reduce the spatial detail of the image through pooling operations, thereby capturing a larger field of view and reducing the number of learnable parameters. However, repeated downsampling in CNNs can greatly erode image feature detail, resulting in coarse feature maps. Deeper networks are effective at modeling the complex nonlinear relations in the data and retain more of the spatial detail information necessary for accurate semantic segmentation, so they are widely used to extract detailed structural features from high-resolution images. Furthermore, residual learning is typically used to solve the gradient vanishing problem in DCNNs. Researchers have improved the accuracy of semantic segmentation using fully convolutional networks, SegNet, and the like, but how to acquire sufficient context information and fully exploit spatial detail remains a difficulty in semantic segmentation of high-resolution images, so a robust semantic segmentation network model for remote sensing data is needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote sensing image semantic segmentation method and system based on a stacked deep residual network (SDRNet), a computationally efficient framework that improves the network model to address the image land-cover classification problem and extracts semantic features from different layers of the network backbone to improve semantic segmentation performance.
The segmentation method specifically adopts the following technical scheme. A remote sensing image semantic segmentation method based on a stacked deep residual network comprises the following steps:
s1, constructing a stacked depth residual error network, and extracting depth features of a remote sensing image;
s2, zooming the depth of the stacked depth residual error network by using residual error learning;
s3, utilizing the expansion residual block to aggregate multi-scale context characteristics;
s4, performing supervised learning on the stacked depth residual error network by using the intermediate loss;
s5, semantic segmentation is carried out on the remote sensing image data by using the stacking depth residual error network after supervised learning.
Preferably, the stacked deep residual network of step S1 comprises a backbone network and two hierarchically connected sub-networks (first and second), each sub-network comprising an encoder, an extraction unit, and a decoder, wherein the extraction unit comprises a dilated residual module and an attention module; the output of the decoder of the first sub-network is transmitted directly to the encoder of the second sub-network.
Further preferably, the encoder of the first sub-network is constructed from five structural blocks, and the decoder of the first sub-network enhances the high-level image semantics to perform the up-sampling task, up-sampling the feature maps back to the initial image size;
feature encoding is performed on the input image by the encoders of the first and second sub-networks;
the encoder and decoder of the second sub-network are constructed from four structural blocks, and the decoder of the second sub-network up-samples the feature maps back to the initial image size;
skip connections are constructed from the encoder of the first sub-network to the decoder of the second sub-network, concatenating feature maps from the encoder onto the output feature maps; after each skip connection, two additional convolution operations are performed;
a self-attention mechanism is constructed using the inputs of layers 3, 4, and 5 of the first sub-network's encoder and the inputs of layers 2, 3, and 4 of the second sub-network's encoder to form an attention module that extracts multi-level features.
Preferably, step S3 comprises:
integrating the multi-scale context information using the dilated residual module;
obtaining a global receptive field using progressive dilation rates in the different layers of the dilated residual module.
The invention also discloses a remote sensing image semantic segmentation system based on a stacked deep residual network, which specifically adopts the following technical scheme and comprises the following modules:
a feature extraction module, which constructs a stacked deep residual network and extracts depth features of the remote sensing image;
a depth scaling module, which scales the depth of the stacked deep residual network using residual learning;
a feature aggregation module, which aggregates multi-scale context features using dilated residual blocks;
a supervised learning module, which performs supervised learning on the stacked deep residual network using intermediate losses;
and a semantic segmentation module, which performs semantic segmentation on remote sensing image data using the supervised-trained stacked deep residual network.
After the technical scheme is adopted, compared with the prior art, the invention has the following advantages:
1. The invention provides a new stacked deep residual network to address the challenging task of semantic segmentation of high-resolution images. The architecture employs stacked encoder-decoder sub-networks to advance multi-level feature learning and an attention mechanism to refine the basic learnable features.
2. The dilated residual blocks in the stacked deep residual network enlarge the receptive field and further enrich context features through multi-scale reasoning, improving the accuracy of semantic segmentation.
3. Extensive experiments show that the lightweight residual network framework provided by the invention performs well in semantic segmentation tasks and outperforms the prior art.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of an SDRNet framework according to the present invention;
fig. 3 is a block diagram of a DRB according to the present invention;
fig. 4 is a block diagram of a core of the DRB proposed by the present invention;
FIG. 5 is a graph showing the segmentation results on a Vaihingen dataset according to the present invention; wherein (a) is an input image, (b) is ground truth, and (c) is a prediction result;
FIG. 6 is a graph of the segmentation results on a Potsdam dataset of the present invention, where (a) is the input image, (b) is the ground truth, and (c) is the predicted result.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The implementation flow of the remote sensing image semantic segmentation method based on the stacked depth residual error network in this embodiment can be seen in fig. 1, and specifically includes the following steps:
s1, constructing a stacked depth residual error network, and extracting depth features of the remote sensing image.
The remote sensing images are high-resolution remote sensing data; the ISPRS Vaihingen and Potsdam datasets are used to train, validate, and test the model. Both datasets are labeled with six common land-cover categories: impervious surface (white), building (blue), low vegetation (cyan), tree (green), car (yellow), and clutter/background (red). The Potsdam dataset contains 38 images of 6000 x 6000 pixels with a spatial resolution of 5 cm; the RGB images of 14 of them were selected as test images in the experiments. The Vaihingen dataset contains 33 images of 2494 x 2064 pixels with a spatial resolution of 9 cm; the experiments use its three bands of near-infrared, red, and green.
The specific steps of depth feature extraction are as follows:
s11, a stacked depth residual network adopted by the embodiment is also called a stacked encoder-decoder network, and comprises a main network and two sub-networks connected in a layered manner, wherein the main network uses a ResNet50 model. Each sub-network comprises an encoder, an extraction unit comprising an expansion residual module (dilated residual blocks, DRB) and an attention module, and a decoder.
In stacked depth residual networks, two encoders help generate robust features from an input image, and two decoders enable reconstruction of spatial detail. Furthermore, the output of the decoder 1 of the first sub-network is directly transmitted to the encoder 2 of the second sub-network, which reduces the feature loss. The SDRNet framework in this embodiment is shown in FIG. 2.
S12, a pre-trained ResNet-50 network is used in encoder 1 of the first sub-network, which enriches the basic feature-learning capacity of the network and reduces its demand for massive training labels.
Since the remote sensing scenes differ from the ordinary images in the original pre-training dataset, this embodiment redesigns encoder 2 of the second sub-network to activate the deeper layers and effectively learn higher-level features.
In the stacked deep residual network of this embodiment, symmetric encoders and decoders are stacked to form a spatial-reconstruction sub-network, and the abundant spatial details in the image are encoded to generate new feature maps. Encoder 1 of the first sub-network uses five structural blocks, gradually reducing the spatial dimensions of the image as the channel features increase; the symmetric decoder 1 of the first sub-network enhances the high-level image semantics to perform the up-sampling task, up-sampling the feature maps back to the original image size. A 2 x 2 bilinear up-sampling operation is performed in the decoder.
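The 2 x 2 bilinear up-sampling performed in the decoders can be illustrated with the following minimal NumPy sketch, which doubles the spatial size of a single-channel feature map (the function name and toy input are illustrative assumptions, not part of the disclosure):

```python
import numpy as np

def upsample_bilinear_2x(x):
    # Double the spatial size of an (H, W) map by bilinear interpolation
    # (align-corners style: the four corner values are preserved exactly).
    h, w = x.shape
    rows = np.linspace(0, h - 1, 2 * h)
    cols = np.linspace(0, w - 1, 2 * w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]   # fractional row offsets
    fc = (cols - c0)[None, :]   # fractional column offsets
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

feat = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
up = upsample_bilinear_2x(feat)
print(up.shape)  # (4, 4)
```

In the network each decoder block would apply such an operation channel by channel, so a stack of them restores the feature maps to the original image size.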
S13, feature encoding is performed on the input image by encoder 1 of the first sub-network and encoder 2 of the second sub-network. Each convolution block in each encoder performs a convolution with a 3 x 3 kernel followed by batch normalization, which reduces internal covariate shift, accelerates network training, and enhances the stability of the model.
S14, a ReLU activation function is applied to enhance the nonlinearity of the model, and a max-pooling operation is performed to reduce the size of the feature maps, reducing the number of parameters and the computational cost of the model while retaining the most important feature information.
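The per-block pipeline of steps S13-S14 (3 x 3 convolution, batch normalization, ReLU, 2 x 2 max pooling) can be sketched in NumPy as follows. This is a toy single-channel version; the function names and the inference-style normalization without learned scale/shift are assumptions for illustration:

```python
import numpy as np

def relu(x):
    # ReLU activation of step S14
    return np.maximum(x, 0.0)

def max_pool2d(x, k=2):
    # Non-overlapping k x k max pooling over an (H, W) feature map
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def batch_norm(x, eps=1e-5):
    # Per-map normalization (inference-style, no learned scale/shift)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def conv3x3(x, w):
    # 'Same'-padded 3 x 3 convolution over a single-channel map
    h, wd = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def encoder_block(x, w):
    # conv -> batch norm -> ReLU -> 2 x 2 max pool, halving spatial size
    return max_pool2d(relu(batch_norm(conv3x3(x, w))))

x = np.arange(16.0).reshape(4, 4)
w = np.zeros((3, 3)); w[1, 1] = 1.0   # identity kernel for illustration
y = encoder_block(x, w)
print(y.shape)  # (2, 2)
```

Each encoder block thus halves the spatial dimensions, which is why five (respectively four) stacked blocks progressively shrink the feature maps as described above.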
S15, encoder 2 and decoder 2 of the second sub-network are constructed from four structural blocks to reduce the number of learnable features and the network complexity. Since each block in the decoder performs a 2 x 2 bilinear up-sampling operation on its feature input, the spatial dimensions of the input feature map are doubled. Decoder 2 of the second sub-network up-samples the feature maps back to the original image size.
S16, skip connections are constructed from encoder 1 of the first sub-network to decoder 2 of the second sub-network to maintain high spatial resolution and improve the overall quality of the output feature maps. After each skip connection, two additional 3 x 3 convolution operations are performed, each followed by a batch normalization step and a ReLU activation; finally, the output is mapped through a softmax function.
Thus, both decoders up-sample the feature maps to the original image size before they are connected to the multi-class softmax function.
This embodiment concatenates the feature maps from the encoders with the output feature maps through skip connections. There are skip connections from encoder 1 of the first sub-network to decoder 1 of the first sub-network, and from encoder 1 of the first sub-network and encoder 2 of the second sub-network to decoder 2 of the second sub-network, in order to maintain higher spatial resolution and improve the overall quality of the output feature maps.
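A minimal sketch of the channel-wise concatenation performed at each skip connection and of the final multi-class softmax mapping (the shapes, the six-class example, and the function names are illustrative assumptions):

```python
import numpy as np

def skip_concat(decoder_feat, encoder_feat):
    # Concatenate the encoder feature map onto the decoder feature map
    # along the channel axis, as done at each skip connection.
    return np.concatenate([decoder_feat, encoder_feat], axis=0)

def softmax(logits, axis=0):
    # Numerically stable multi-class softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

dec = np.zeros((64, 8, 8))
enc = np.ones((64, 8, 8))
fused = skip_concat(dec, enc)     # channel count doubles to 128

logits = np.zeros((6, 8, 8))      # six land-cover classes per pixel
probs = softmax(logits, axis=0)   # uniform 1/6 everywhere for zero logits
print(fused.shape, probs[0, 0, 0])
```

In the full network the fused maps would pass through the two additional 3 x 3 convolutions (with batch normalization and ReLU) before the softmax mapping.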
S17, a self-attention mechanism is constructed using the inputs of layers 3, 4, and 5 of encoder 1 of the first sub-network and the inputs of layers 2, 3, and 4 of encoder 2 of the second sub-network, forming an attention module that extracts multi-level features.
Since high-resolution remote sensing images contain redundant and unwanted features that are unrelated to the output classes, the attention mechanism suppresses irrelevant areas of the input image to highlight the features of specific classes. Inspired by the human visual system, the self-attention mechanism lets the network focus on specific important areas, significantly reducing the cost of learning features from unrelated regions and redundant data. Furthermore, since it is practically impossible to activate all the learnable parameters in a deep convolutional neural network, the network is constrained to use only the features relevant to a particular class.
This embodiment avoids using layer 1 (i.e., the initial layer) of the encoders, because the initial layer learns simple, basic features that do not provide enough information to complete the complex segmentation task.
The present embodiment defines the self-attention mechanism as:
M_s(F) = sigma(F^{a x a}([AvgPool(F); MaxPool(F)]))
where sigma denotes the sigmoid function, F^{a x a} denotes a convolution operation with a filter of size a x a, AvgPool(F) denotes an average pooling operation, and MaxPool(F) denotes a max pooling operation.
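The formula above has the shape of a CBAM-style spatial attention map: the feature tensor is pooled channel-wise by average and max, the two pooled maps are stacked and convolved, and a sigmoid produces a mask in (0, 1). A toy NumPy rendering (the 3 x 3 filter size, shapes, and function names are assumptions, not the patent's exact configuration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(f, w):
    # f: (C, H, W) feature map; w: (2, a, a) filter over the two pooled maps.
    avg = f.mean(axis=0)           # AvgPool over channels -> (H, W)
    mx = f.max(axis=0)             # MaxPool over channels -> (H, W)
    stacked = np.stack([avg, mx])  # [AvgPool(F); MaxPool(F)] -> (2, H, W)
    a = w.shape[1]; p = a // 2
    xp = np.pad(stacked, ((0, 0), (p, p), (p, p)))
    h, wd = avg.shape
    out = np.zeros((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[:, i:i + a, j:j + a] * w)
    return sigmoid(out)            # attention mask M_s(F), values in (0, 1)

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 5, 5))
mask = spatial_attention(f, rng.standard_normal((2, 3, 3)) * 0.1)
print(mask.shape)  # (5, 5)
```

Multiplying a feature map element-wise by such a mask suppresses irrelevant regions, which is the role the attention module plays at the selected encoder layers.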
S2, the depth of the stacked deep residual network is scaled using residual learning.
The residual learning strategy is used to solve the gradient vanishing problem: stacked convolution blocks are augmented with identity mappings so that a deep network unaffected by vanishing gradients can be built. Through the shortcut connections, gradients can propagate without passing through a nonlinear activation function, thereby mitigating gradient explosion or vanishing. In addition, the shortcut connections improve gradient flow during back-propagation, i.e., they improve the backward gradient flow and accelerate the convergence of deep networks.
Accordingly, by decomposing the task into an encoding stage and a decoding stage, and gradually adding more detailed information in the decoding stage, the original image can be reconstructed better.
The residual function is defined as:
y = F(x, W_i) + x
where x denotes the input, y denotes the output feature map, F(x, W_i) denotes the residual mapping, and W_i denotes the weight parameters.
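The identity shortcut can be sketched with a toy two-layer residual branch. With zero-initialized weights the block reduces exactly to the identity mapping, which is what keeps gradients flowing through very deep stacks (the names and shapes are illustrative, not from the patent):

```python
import numpy as np

def residual_block(x, w1, w2):
    # y = F(x, W_i) + x: a two-layer residual branch with ReLU,
    # plus the identity shortcut.
    fx = np.maximum(x @ w1, 0.0) @ w2
    return fx + x

x = np.array([1.0, -2.0, 3.0, 0.5])
# Zero weights make F(x) = 0, so the block is exactly the identity;
# training then only needs to learn the residual correction on top of it.
y = residual_block(x, np.zeros((4, 4)), np.zeros((4, 4)))
print(y)  # same as x
```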
S3, multi-scale context features are aggregated using the dilated residual block (DRB). The specific steps are as follows:
S31, multi-scale context information is integrated using the dilated residual module.
To obtain more contextual feature information and achieve multi-scale context aggregation without sacrificing image resolution, dilated convolution uses larger, sparse kernels in place of the dense kernels of the pooling and convolution layers.
In this embodiment, the dilated residual module is built on the two-dimensional dilated convolution operation, defined as:
y(m, n) = sum_i sum_j x(m + r*i, n + r*j) w(i, j)
where y(m, n) is the output of the dilated convolution at position (m, n), x(m, n) is its input, w(i, j) is a filter of size i x j, and the parameter r denotes the dilation rate.
When the dilation rate equals 1, dilated convolution reduces to ordinary convolution; when the dilation rate is less than 1, it corresponds to a finer, fractional sampling of the feature map, which requires more training time; when the dilation rate is greater than 1, the receptive field grows without increasing the number of parameters or the computational cost. The receptive field can thus be adjusted through different dilation rates.
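The dilated convolution and the growth of its receptive field with the dilation rate r can be checked with a small NumPy sketch (valid-only output; the function names and toy inputs are illustrative):

```python
import numpy as np

def dilated_conv2d(x, w, r):
    # y(m, n) = sum_{i,j} x(m + r*i, n + r*j) * w(i, j), valid region only.
    kh, kw = w.shape
    h, wd = x.shape
    oh = h - r * (kh - 1)
    ow = wd - r * (kw - 1)
    out = np.zeros((oh, ow))
    for m in range(oh):
        for n in range(ow):
            # Sample the input with stride r inside the kernel window
            out[m, n] = np.sum(x[m:m + r * kh:r, n:n + r * kw:r] * w)
    return out

def receptive_field(k, r):
    # Effective extent of a k x k kernel with dilation rate r
    return r * (k - 1) + 1

x = np.ones((9, 9))
w = np.ones((3, 3))
print(dilated_conv2d(x, w, 1).shape)   # (7, 7)
print(dilated_conv2d(x, w, 2).shape)   # (5, 5)
print(receptive_field(3, 1), receptive_field(3, 2), receptive_field(3, 4))
```

Note the parameter count of the kernel is 3 x 3 = 9 regardless of r, while the effective extent grows from 3 to 5 to 9, which is exactly the trade-off the text describes.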
S32, a global receptive field is obtained using progressive dilation rates in the different layers of the dilated residual module.
Based on the concept of dilated convolution, dilated residual blocks are used as the dilated residual module, which applies progressive dilation rates in different layers to achieve a full receptive field without loss of resolution or coverage.
This embodiment adopts dilation kernels with progressive dilation rates together with skip connections to enhance the information flow between connected layers; by ensuring that the resulting dilated kernels receive responses from all regions of the image without increasing the number of kernel parameters, the gridding effect and the loss of information continuity are reduced.
The gridding problem is the loss of information continuity that arises, in a checkerboard-like pattern, because not all pixels receive a kernel response. Since a large dilation rate can cause the gridding effect and suppress the performance of the dilated kernels, the DRB (dilated residual block) uses dilation kernels with progressive dilation rates to complete the pixel-level dense prediction task. Successive layers are connected through skip connections, which strengthens the information flow between connected layers and transfers information efficiently. By ensuring that the generated dilated kernels obtain responses from all regions of the image without increasing the number of kernel parameters, a significant improvement in image-processing effect is achieved.
The present embodiment defines the constraint on the convolution kernels of the dilated residual block (DRB) as:
M_i = max[M_{i+1} - 2r_i, M_{i+1} - 2(M_{i+1} - r_i), r_i]
where r_i is the dilation rate of the i-th layer, M_i is the maximum spacing between two nonzero values in the i-th layer's effective kernel, and the total number of layers is n (with M_n = r_n). In this embodiment, the structure of the dilated residual block is shown in fig. 3, and its core is shown in fig. 4.
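The constraint can be evaluated for a candidate sequence of dilation rates with a short pure-Python sketch. The rate sequences below are illustrative examples, not the rates prescribed by the patent; the design goal is that M_2 stay within the kernel size k so every pixel receives a kernel response and gridding is avoided:

```python
def max_spacing(rates):
    # M_i = max[M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i],
    # computed backwards from M_n = r_n. Returns [M_1, ..., M_n].
    m = rates[-1]
    ms = [m]
    for r in reversed(rates[:-1]):
        m = max(m - 2 * r, 2 * r - m, r)
        ms.append(m)
    return list(reversed(ms))

# For a 3 x 3 kernel (k = 3): progressive rates such as [1, 2, 5]
# keep M_2 <= 3, while a geometric sequence like [2, 4, 8] gives
# M_2 = 4 > 3 and therefore suffers from the gridding effect.
print(max_spacing([1, 2, 5]))
print(max_spacing([2, 4, 8]))
```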
And S4, performing supervised learning on the stacked depth residual error network by using the intermediate loss.
In this embodiment, an intermediate loss is introduced at the end of the sub-network to overcome the gradient attenuation problem and improve effective learning from the deep layers back to the shallow layers of a deep network. By computing the performance gap between the model and the ground truth, the optimization of the intermediate layers is improved, which improves gradient flow and strengthens learning during back-propagation. The specific steps are as follows:
s41, respectively placing loss functions at the two sub-network ends, and defining the loss functions as:
where N is the number of categories, W j Representing the weight of category j, p j And p k Representing the predicted value and the true value of category j, respectively.
S42, the total loss function is defined as:
L_T = (alpha x MainLoss) + (beta x InterL1)
where alpha and beta are the respective weights in the network, InterL1 is the loss value of the output layer, and MainLoss is the loss value at the end of the first sub-network (the weights are scaled 1:1).
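The per-branch loss of S41 and the combination of S42 can be sketched numerically. The exact loss form is not spelled out in the text, so the weighted cross-entropy below is an assumption for illustration, as are the names and the toy two-pixel example:

```python
import numpy as np

def weighted_cross_entropy(probs, target, class_weights):
    # Weighted cross-entropy over P pixels: averages -W_j * log(p_j)
    # at each pixel's true category j.
    # probs: (P, N) softmax outputs; target: (P,) true class indices.
    p_true = probs[np.arange(len(target)), target]
    w = class_weights[target]
    return float(np.mean(-w * np.log(p_true + 1e-12)))

def total_loss(main_loss, inter_loss, alpha=1.0, beta=1.0):
    # L_T = (alpha x MainLoss) + (beta x InterL1), with 1:1 weights here.
    return alpha * main_loss + beta * inter_loss

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
target = np.array([0, 1])
weights = np.ones(3)
main = weighted_cross_entropy(probs, target, weights)
print(total_loss(main, main))  # twice the single-branch loss at 1:1 weights
```

During training both branch losses would be back-propagated together, which is what strengthens the gradient flow to the intermediate layers.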
S5, semantic segmentation is carried out on the remote sensing image data by using the stacking depth residual error network after supervised learning.
In this embodiment, the segmentation results on the Vaihingen dataset and the Potsdam dataset are shown in fig. 5 and fig. 6.
Example 2
This embodiment, based on the same inventive concept as embodiment 1, provides a remote sensing image semantic segmentation system based on a stacked deep residual network, comprising the following modules:
a feature extraction module, which constructs a stacked deep residual network and extracts depth features of the remote sensing image;
a depth scaling module, which scales the depth of the stacked deep residual network using residual learning;
a feature aggregation module, which aggregates multi-scale context features using dilated residual blocks;
a supervised learning module, which performs supervised learning on the stacked deep residual network using intermediate losses;
and a semantic segmentation module, which performs semantic segmentation on remote sensing image data using the supervised-trained stacked deep residual network. The modules of this embodiment implement the respective steps of embodiment 1; see embodiment 1 for the detailed implementation procedures.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (10)

1. A remote sensing image semantic segmentation method based on a stacked deep residual network, characterized by comprising the following steps:
S1, constructing a stacked deep residual network and extracting depth features of the remote sensing image;
S2, scaling the depth of the stacked deep residual network using residual learning;
S3, aggregating multi-scale context features using dilated residual blocks;
S4, performing supervised learning on the stacked deep residual network using intermediate losses;
S5, performing semantic segmentation on remote sensing image data using the supervised-trained stacked deep residual network.
2. The remote sensing image semantic segmentation method according to claim 1, wherein the stacked depth residual network of step S1 comprises a backbone network and two hierarchically connected sub-networks, a first sub-network and a second sub-network, each sub-network comprising an encoder, an extraction unit and a decoder, wherein the extraction unit comprises a dilated residual module and an attention module; the output of the decoder of the first sub-network is passed directly to the encoder of the second sub-network.
3. The remote sensing image semantic segmentation method according to claim 2, wherein the encoder of the first sub-network is constructed from five structural blocks, and the decoder of the first sub-network refines high-level image semantics to perform the up-sampling task, restoring the feature maps to the initial image size;
the encoder of the first sub-network and the encoder of the second sub-network perform feature encoding on the input image;
the encoder and decoder of the second sub-network are constructed from four structural blocks, and the decoder of the second sub-network up-samples the feature maps to the initial image size;
skip connections are constructed from the encoder of the first sub-network to the decoder of the second sub-network, connecting feature maps from the encoder to the output feature maps; after each skip connection, two additional convolution operations are performed;
a self-attention mechanism is constructed using the inputs of layers 3, 4 and 5 of the encoder of the first sub-network and the inputs of layers 2, 3 and 4 of the encoder of the second sub-network to form the attention module for extracting multi-level features.
4. The remote sensing image semantic segmentation method according to claim 3, characterized in that the self-attention mechanism is defined as:
M_s(F) = σ(F^{a×a}([AvrPool(F); MaxPool(F)]))
where σ denotes the sigmoid function, F^{a×a} denotes a convolution operation with a filter of size a×a, AvrPool(F) denotes the average pooling operation, and MaxPool(F) denotes the maximum pooling operation.
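As an illustration of this formula, the spatial-attention map can be sketched in NumPy. This is a hypothetical stand-alone function, not the patent's implementation: channel-wise average and max pooling are stacked, convolved with an a×a filter, and passed through a sigmoid.

```python
import numpy as np

def spatial_attention(F, kernel):
    """Sketch of M_s(F) = sigmoid(conv_{a*a}([AvrPool(F); MaxPool(F)])).

    F: feature map of shape (C, H, W); kernel: conv weights of shape (2, a, a),
    with odd a so zero padding preserves the spatial size.
    """
    avg = F.mean(axis=0)             # channel-wise average pooling -> (H, W)
    mx = F.max(axis=0)               # channel-wise max pooling     -> (H, W)
    stacked = np.stack([avg, mx])    # concatenate along channels   -> (2, H, W)

    a = kernel.shape[-1]
    p = a // 2                       # same-padding for an odd a*a filter
    padded = np.pad(stacked, ((0, 0), (p, p), (p, p)))

    H, W = avg.shape
    out = np.zeros((H, W))
    for i in range(H):               # naive sliding-window convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + a, j:j + a] * kernel)

    return 1.0 / (1.0 + np.exp(-out))  # sigmoid -> attention map in (0, 1)
```

With a zero feature map and zero kernel the convolution output is 0 everywhere, so the attention map is uniformly sigmoid(0) = 0.5.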
5. The method of claim 1, wherein in step S2 the stacked convolution blocks are replaced by identity mappings to construct a deep network unaffected by the vanishing-gradient problem.
6. The method of claim 1, wherein in step S2 the residual function is defined as:
y = F(x, W_i) + x
where x denotes the input, y denotes the output feature map, F(x, W_i) denotes the residual mapping function, and W_i denotes the weight parameters.
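A minimal sketch of this residual function, with a hypothetical two-layer mapping standing in for the residual branch F(x, W_i) (the patent does not fix the form of F):

```python
import numpy as np

def residual_block(x, weights, activation=np.tanh):
    """y = F(x, W_i) + x : residual branch plus identity shortcut.

    weights = (W1, W2) parameterize a hypothetical two-layer residual
    mapping F(x) = W2 @ activation(W1 @ x); the shortcut adds x unchanged,
    which is what lets gradients flow through very deep stacks.
    """
    W1, W2 = weights
    f = W2 @ activation(W1 @ x)  # residual mapping F(x, W_i)
    return f + x                 # identity shortcut
```

When the residual branch is zero (all-zero weights), the block reduces exactly to the identity mapping mentioned in claim 5.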
7. The method of claim 1, wherein step S3 comprises:
integrating multi-scale context information using the dilated residual module;
obtaining a global receptive field using progressive dilation rates in different layers of the dilated residual module.
8. The remote sensing image semantic segmentation method according to claim 7, wherein the dilated residual module is based on a two-dimensional dilated convolution operation, defined as:
y(m, n) = Σ_i Σ_j x(m + r·i, n + r·j) · w(i, j)
where y(m, n) is the output of the dilated convolution at position (m, n), x(m, n) is the corresponding input, w(i, j) is the filter of the dilated convolution indexed by i and j, and the parameter r denotes the dilation rate.
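The two-dimensional dilated convolution can be sketched directly from this definition. The naive NumPy loop below covers only the valid output region (no padding); real implementations would use optimized library kernels, and the function name is illustrative:

```python
import numpy as np

def dilated_conv2d(x, w, r):
    """y(m, n) = sum_i sum_j x(m + r*i, n + r*j) * w(i, j).

    x: 2-D input of shape (H, W); w: filter of shape (M, N);
    r: dilation rate. With r = 1 this is an ordinary convolution
    (cross-correlation form); larger r samples the input on a sparser
    grid, enlarging the receptive field without extra parameters.
    """
    M, N = w.shape
    H, W = x.shape
    out_h = H - r * (M - 1)   # valid output height
    out_w = W - r * (N - 1)   # valid output width
    y = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            for i in range(M):
                for j in range(N):
                    y[m, n] += x[m + r * i, n + r * j] * w[i, j]
    return y
```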
9. The remote sensing image semantic segmentation method according to claim 7, wherein a dilated residual block is adopted as the dilated residual module, using progressive dilation rates in different layers to realize a complete receptive field;
the convolution kernel of the dilated residual block DRB is defined as:
M_i = max[M_{i+1} - 2r_i, M_{i+1} - 2(M_{i+1} - r_i), r_i]
where r_i is the dilation rate of the i-th layer, M_i is the maximum distance between two nonzero values at the i-th layer, and the total number of layers is n.
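This recurrence is the same one used in hybrid dilated convolution design. Assuming the usual convention M_n = r_n (not stated explicitly here), it can be evaluated backwards to check whether a progressive schedule of dilation rates leaves no holes in the receptive field; the function names below are illustrative:

```python
def max_nonzero_distance(rates):
    """Evaluate M_i = max[M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i]
    backwards from M_n = r_n (assumed convention) for a list of per-layer
    dilation rates r_1..r_n. Returns [M_1, ..., M_n]."""
    n = len(rates)
    M = [0] * n
    M[-1] = rates[-1]
    for i in range(n - 2, -1, -1):
        r = rates[i]
        M[i] = max(M[i + 1] - 2 * r, M[i + 1] - 2 * (M[i + 1] - r), r)
    return M

def covers_receptive_field(rates, kernel_size):
    """A schedule avoids gridding (gaps in the receptive field) when the
    distance at the second layer satisfies M_2 <= kernel size - a common
    design criterion for progressive dilation rates."""
    return max_nonzero_distance(rates)[1] <= kernel_size
```

For a 3×3 kernel, the progressive schedule (1, 2, 5) satisfies the criterion, while (1, 2, 9) does not.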
10. A remote sensing image semantic segmentation system based on a stacked depth residual network, characterized by comprising the following modules:
a feature extraction module for constructing a stacked depth residual network and extracting depth features of a remote sensing image;
a depth scaling module for scaling the depth of the stacked depth residual network using residual learning;
a feature aggregation module for aggregating multi-scale context features using dilated residual blocks;
a supervised learning module for performing supervised learning on the stacked depth residual network using intermediate losses;
and a semantic segmentation module for performing semantic segmentation on remote sensing image data using the stacked depth residual network after supervised learning.
CN202311609856.4A 2023-11-29 2023-11-29 Remote sensing image semantic segmentation method and system based on stacked depth residual error network Pending CN117611817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311609856.4A CN117611817A (en) 2023-11-29 2023-11-29 Remote sensing image semantic segmentation method and system based on stacked depth residual error network

Publications (1)

Publication Number Publication Date
CN117611817A true CN117611817A (en) 2024-02-27

Family

ID=89945911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311609856.4A Pending CN117611817A (en) 2023-11-29 2023-11-29 Remote sensing image semantic segmentation method and system based on stacked depth residual error network

Country Status (1)

Country Link
CN (1) CN117611817A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200167930A1 (en) * 2017-06-16 2020-05-28 Ucl Business Ltd A System and Computer-Implemented Method for Segmenting an Image
CN111797703A (en) * 2020-06-11 2020-10-20 武汉大学 Multi-source remote sensing image classification method based on robust deep semantic segmentation network
WO2021041772A1 (en) * 2019-08-30 2021-03-04 The Research Foundation For The State University Of New York Dilated convolutional neural network system and method for positron emission tomography (pet) image denoising
CN115423828A (en) * 2022-06-04 2022-12-02 哈尔滨理工大学 Retina blood vessel image segmentation method based on MRNet
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN116630824A (en) * 2023-06-06 2023-08-22 北京星视域科技有限公司 Satellite remote sensing image boundary perception semantic segmentation model oriented to power inspection mechanism


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MD. RAYHAN AHMED et al.: "DoubleU-NetPlus: A Novel Attention and Context Guided Dual U-Net with Multi-Scale Residual Feature Fusion Network for Semantic Segmentation of Medical Images", Neural Computing and Applications, vol. 35, 31 December 2022, pages 14379-14401 *
TONG, Chang: "Research on Image Shadow Removal Based on Generative Adversarial Networks", China Masters' Theses Full-text Database, Information Science and Technology, vol. 2022, no. 3, 15 March 2022, pages 28-29 *
HOU, Xingxing: "Research on Plateau Lake Extraction from Medium-Resolution Remote Sensing Images Based on DoubleU-Net", China Masters' Theses Full-text Database, Basic Sciences, vol. 2022, no. 11, 15 November 2022, pages 51-57 *
LYU, Zongwang et al.: "Recognition and Detection of Wheat Grains with Different Bulk Densities Based on Improved U-Net", Journal of Henan Agricultural Sciences, 20 September 2023, page 6 *

Similar Documents

Publication Publication Date Title
CN112347859B (en) Method for detecting significance target of optical remote sensing image
CN111914907B (en) Hyperspectral image classification method based on deep learning space-spectrum combined network
CN108171701B (en) Significance detection method based on U network and counterstudy
CN110210539B (en) RGB-T image saliency target detection method based on multi-level depth feature fusion
CN113240580A (en) Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN115797931A (en) Remote sensing image semantic segmentation method based on double-branch feature fusion
CN111242238B (en) RGB-D image saliency target acquisition method
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN113076957A (en) RGB-D image saliency target detection method based on cross-modal feature fusion
CN110738663A (en) Double-domain adaptive module pyramid network and unsupervised domain adaptive image segmentation method
CN114359293A (en) Three-dimensional MRI brain tumor segmentation method based on deep learning
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN115601236A (en) Remote sensing image super-resolution reconstruction method based on characteristic information distillation network
CN117058160A (en) Three-dimensional medical image segmentation method and system based on self-adaptive feature fusion network
CN111222453A (en) Remote sensing image change detection method based on dense connection and geometric structure constraint
CN117809200A (en) Multi-scale remote sensing image target detection method based on enhanced small target feature extraction
CN117314808A (en) Infrared and visible light image fusion method combining transducer and CNN (carbon fiber network) double encoders
CN117191268A (en) Oil and gas pipeline leakage signal detection method and system based on multi-mode data
CN116778165A (en) Remote sensing image disaster detection method based on multi-scale self-adaptive semantic segmentation
CN117058367A (en) Semantic segmentation method and device for high-resolution remote sensing image building
CN116168418A (en) Multi-mode target perception and re-identification method for image
CN115797181A (en) Image super-resolution reconstruction method for mine fuzzy environment
CN116071645A (en) High-resolution remote sensing image building change detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination