CN116524189A - High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization


Info

Publication number
CN116524189A
CN116524189A CN202310496605.3A CN202310496605A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
decoding
coding
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310496605.3A
Other languages
Chinese (zh)
Inventor
于纯妍 (Yu Chunyan)
李东霖 (Li Donglin)
王玉磊 (Wang Yulei)
赵恩宇 (Zhao Enyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN202310496605.3A priority Critical patent/CN116524189A/en
Publication of CN116524189A publication Critical patent/CN116524189A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/10 Terrestrial scenes
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization, which comprises the following steps: acquiring and amplifying a remote sensing image set, normalizing the remote sensing images, constructing a remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization, training the segmentation model on a training set, obtaining a prediction label for each pixel in the remote sensing image, calculating a loss from the truth labels and the prediction labels, and judging whether the loss value meets a threshold: if it does not, the parameters of the segmentation model are updated; if it does, the trained segmentation model is obtained. A processed remote sensing image is then input into the trained segmentation model, which outputs a semantic segmentation result graph of the remote sensing image. The method improves the extraction and processing of ground object edge features, raises the recognition accuracy of small-size objects and complex boundary information in remote sensing images, and achieves accurate semantic segmentation of remote sensing ground object edges.

Description

High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a high-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization.
Background
The high-spatial-resolution remote sensing image (high-resolution remote sensing image) is an important component of modern remote sensing imagery; it is characterized by high spatial resolution, high definition, high timeliness and large information content, and can clearly and intuitively present rich ground feature detail and the relationships between adjacent ground features. At present, semantic segmentation of images is a research hotspot in computer vision; the essence of the task is category identification of image regions, namely assigning a category label to each pixel in the image. Semantic segmentation of high-resolution remote sensing images, an important branch of this direction, can automatically extract surface features from remote sensing images and assign semantic categories to ground object targets. It has wide application in fields such as disaster assessment and prediction, environmental protection, urban planning, traffic navigation and military security.
In recent years, deep learning, particularly deep convolutional neural network technology, has developed rapidly and found wide application. It shows striking feature extraction capability in tasks such as image classification, object detection and semantic segmentation, can adaptively extract shallow and deep features from images, and has especially good understanding of complex scenes. Applying deep learning to semantic segmentation of high-resolution remote sensing images therefore has important practical significance and brings new opportunities to remote sensing image processing. However, high-resolution remote sensing images are typically composed of large, complex scenes and heterogeneous objects, and the occlusion and shading caused by illumination conditions and imaging angles during acquisition leave existing deep remote sensing segmentation models with poor results at object edges. In addition, for small objects edge pixels account for a higher proportion of the object's total pixels, so if segmentation at the edge is not ideal, segmentation of the whole object suffers.
Disclosure of Invention
The invention provides a high-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization, which aims to overcome the technical problems.
A semantic segmentation method of high-resolution remote sensing images based on coding and decoding indexing edge characterization comprises the steps of,
step one, acquiring a remote sensing image set, amplifying the remote sensing image set, namely rotating the remote sensing image at any angle and storing the remote sensing image into the remote sensing image set, respectively carrying out normalization processing on the remote sensing image, dividing the remote sensing image set into a training set and a testing set,
step two, constructing a remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation, training the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to a training set, obtaining a prediction label of each pixel in the remote sensing image, calculating loss according to the truth label and the prediction label of each pixel, judging whether the value of the loss meets a threshold value, optimizing parameters of the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to the difference between the value of the loss and the threshold value if the value of the loss does not meet the threshold value, obtaining the trained remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation if the value of the loss meets the threshold value,
the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation comprises a multi-scale feature encoder, a separable pyramid unit, a coding and decoding indexing edge representation unit and an up-sampling decoder,
the multi-scale feature encoder is used for generating four initial feature matrixes according to the size h of the remote sensing image, the sizes of the four initial feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the separable pyramid unit is used for obtaining four context feature matrixes according to four initial feature matrixes of the remote sensing image, the sizes of the four context feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the coding and decoding indexing edge characterization unit is used for acquiring a first coding index and a first decoding index according to a context feature matrix with the size of h/2, acquiring a second coding index and a second decoding index according to a context feature matrix with the size of h/4, fusing the first coding index with the context feature matrix with the size of h/2, fusing the second coding index with the context feature matrix with the size of h/4, acquiring a fused context feature matrix with the size of h/2 and a fused context feature matrix with the size of h/4,
the up-sampling decoder is used for decoding and up-sampling the four context feature matrixes according to the order from small to large in size to obtain a semantic segmentation result graph of the remote sensing image, the semantic segmentation result graph comprises a prediction label of each pixel in the remote sensing image,
and thirdly, acquiring the processed remote sensing image, inputting the processed remote sensing image into a trained remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization, and outputting a semantic segmentation result graph of the remote sensing image.
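Step three can be sketched in PyTorch (the framework implied by the description's torch.cat and pixel_shuffle calls); the model interface below is hypothetical, and taking the argmax of the raw class scores is equivalent to taking it after the softmax activation:

```python
import torch

def predict(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Run a trained segmentation model on a processed remote sensing image
    and return per-pixel prediction labels (hypothetical model interface)."""
    model.eval()
    with torch.no_grad():
        scores = model(image)    # (batch, num_classes, h, w) class scores
    # the argmax over the class dimension is unchanged by softmax, so it
    # yields the prediction label of each pixel directly
    return scores.argmax(dim=1)  # (batch, h, w)
```

The returned tensor holds one integer class label per pixel, i.e. the prediction labels that make up the semantic segmentation result graph.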
Preferably, the up-sampling decoder is configured to decode and up-sample the context feature matrix of size h/16 to obtain an output feature matrix x_d1 of size h/8, and to splice x_d1 with the context feature matrix of size h/8 along dimension 1 using the torch.cat function to form a new feature matrix x_m1;
x_m1 is decoded and up-sampled to obtain an output feature matrix x_d2 of size h/4, x_d2 is spliced with the context feature matrix of size h/4 along dimension 1 using the torch.cat function to form a new feature matrix x_m2, and a matrix product of the second decoding index and the feature matrix x_m2 yields an output feature matrix x_n2;
x_n2 is decoded and up-sampled to obtain an output feature matrix of size h/2, which is spliced with the context feature matrix of size h/2 along dimension 1 using the torch.cat function to form a new feature matrix x_m3, and a matrix product of the first decoding index and the feature matrix x_m3 yields an output feature matrix x_n3;
x_n3 is decoded and up-sampled to obtain a feature matrix x_d4 of size h, and x_d4 is passed through one convolution and then into a softmax activation function to obtain the semantic segmentation result graph.
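One decode-and-upsample step followed by the dimension-1 splice can be sketched as follows; the channel counts, the bilinear upsampling mode and the 3x3 convolution are assumptions, since the text only fixes the x2 spatial growth and the torch.cat concatenation:

```python
import torch
import torch.nn as nn

class DecodeUpsample(nn.Module):
    """One decode-and-upsample step: double the spatial size, then project
    channels with a convolution (layer shape choices are assumptions)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.up(x))

# From the h/16 context feature matrix to x_d1 (size h/8), then the
# dimension-1 splice with the h/8 context feature matrix to form x_m1:
f16 = torch.randn(1, 64, 16, 16)     # context feature matrix, size h/16
f8 = torch.randn(1, 32, 32, 32)      # context feature matrix, size h/8
x_d1 = DecodeUpsample(64, 32)(f16)   # output feature matrix, size h/8
x_m1 = torch.cat([x_d1, f8], dim=1)  # new feature matrix x_m1
```

The remaining steps repeat the same pattern at sizes h/4, h/2 and h.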
Preferably, said calculating the loss from the truth label and the prediction label of each pixel comprises calculating the loss according to equation (1),
Loss_focal = -(1 - p_t)^γ · log(p_t)   (1)
where p_t is the predicted probability of the true-class label, obtained from the truth label and the prediction label, γ is a hyperparameter, and Loss_focal denotes the focal loss function.
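Equation (1) translates directly into code; this scalar sketch uses γ = 2 as an illustrative default (the patent leaves the hyperparameter open), and a batched implementation would vectorize it over all pixels:

```python
import math

def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """Eq. (1): Loss_focal = -(1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true-class label."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With γ = 0 this reduces to ordinary cross-entropy on the true class; as p_t approaches 1 the loss vanishes, which down-weights easy pixels and concentrates training on hard ones such as object edges.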
Preferably, the multi-scale context feature encoder comprises a spatial feature extraction branch for extracting local feature information of the remote sensing image, a self-attention feature extraction branch for extracting global feature information of the remote sensing image, and a fusion branch for fusing the local feature information and the global feature information according to equation (2),
x = concatenate(Conv2d(x_ci), Conv2d(x_si))
y = sigmoid(Conv2d(ReLU(Conv2d(AdaptiveAvgPool2d(x)))))
x_fi = x × reshape(y)   (2)
where x_si denotes the i-th stage feature matrix of the self-attention feature extraction branch, x_ci denotes the i-th stage feature matrix of the spatial feature extraction branch, x_fi denotes the fused feature, Conv2d(·) denotes a 2-D convolution, AdaptiveAvgPool2d(·) denotes an adaptive pooling function, sigmoid(·) denotes the sigmoid activation function, ReLU(·) denotes the ReLU activation function, concatenate(·) denotes splicing two matrices along dimension 1, and reshape(·) denotes a shape-change function.
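A sketch of the fusion branch of equation (2); the channel counts and the shapes of the two inner 1x1 convolutions are assumptions, since equation (2) fixes only the operator sequence:

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Equation (2): fuse spatial (x_ci) and self-attention (x_si) features;
    channel counts and the two inner 1x1 convolutions are assumptions."""
    def __init__(self, c_ci: int, c_si: int, c_out: int):
        super().__init__()
        self.proj_c = nn.Conv2d(c_ci, c_out, kernel_size=1)
        self.proj_s = nn.Conv2d(c_si, c_out, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)            # AdaptiveAvgPool2d(x)
        self.fc1 = nn.Conv2d(2 * c_out, c_out // 2, kernel_size=1)
        self.fc2 = nn.Conv2d(c_out // 2, 2 * c_out, kernel_size=1)

    def forward(self, x_ci: torch.Tensor, x_si: torch.Tensor) -> torch.Tensor:
        # x = concatenate(Conv2d(x_ci), Conv2d(x_si)) along dimension 1
        x = torch.cat([self.proj_c(x_ci), self.proj_s(x_si)], dim=1)
        # y = sigmoid(Conv2d(ReLU(Conv2d(AdaptiveAvgPool2d(x)))))
        y = torch.sigmoid(self.fc2(torch.relu(self.fc1(self.pool(x)))))
        # x_fi = x * reshape(y): broadcast channel weights over h x w
        return x * y.reshape(y.shape[0], -1, 1, 1)
```

The pooled-then-sigmoid path acts as a per-channel gate, so the fused output keeps the spatial size of the inputs while reweighting each channel globally.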
Preferably, the obtaining of the first encoding index and the first decoding index from the context feature matrix of size h/2 comprises:
S11, denoting the context feature matrix of size h/2 as x_i and obtaining the shape parameters of x_i, the shape parameters comprising the batch size value batchsize, the number of channels c, the height h and the width w;
S12, inputting x_i into a Conv2d function to obtain x_i1, inputting x_i1 into a BatchNorm2d function to obtain x_i2, inputting x_i2 into a BatchNorm2d function to obtain x_i3, and inputting x_i3 into a BatchNorm2d function to obtain x_i4;
S13, performing a maximum pooling operation on each of x_i1, x_i2, x_i3 and x_i4 to obtain four initial indexes x_1, x_2, x_3, x_4;
S14, splicing the initial indexes x_1, x_2, x_3, x_4 into a new matrix along dimension 1 with the torch.cat function, and passing the new matrix to a sigmoid activation function to obtain an initial decoding index y;
S15, passing the initial decoding index y through a softmax function to obtain an initial encoding index z, and adjusting the shape parameters of the initial decoding index y and the initial encoding index z with the view function to batchsize, c×4, h/2 and w/2 to obtain the adjusted initial decoding index y and initial encoding index z;
S16, reorganizing the adjusted initial decoding index y and initial encoding index z back to their size before adjustment with the pixel_shuffle function to obtain the first encoding index and the first decoding index.
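The index-generation steps can be sketched as a PyTorch module; the Conv2d kernel size and the 2x2 max-pooling window are assumptions (the text does not fix them), chosen so that the pixel_shuffle at the end restores the input size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IndexGenerator(nn.Module):
    """Index-generation steps as a module; kernel size and pooling window
    are assumed."""
    def __init__(self, c: int):
        super().__init__()
        self.conv = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.bn1, self.bn2, self.bn3 = (nn.BatchNorm2d(c) for _ in range(3))

    def forward(self, x_i: torch.Tensor):
        b, c, h, w = x_i.shape                         # shape parameters
        x_i1 = self.conv(x_i)                          # Conv2d ...
        x_i2 = self.bn1(x_i1)                          # ... then three BatchNorm2d
        x_i3 = self.bn2(x_i2)
        x_i4 = self.bn3(x_i3)
        # max pooling halves the spatial size of each intermediate
        x1, x2, x3, x4 = (F.max_pool2d(t, 2) for t in (x_i1, x_i2, x_i3, x_i4))
        y = torch.sigmoid(torch.cat([x1, x2, x3, x4], dim=1))  # decoding index
        z = torch.softmax(y, dim=1)                            # encoding index
        y = y.view(b, c * 4, h // 2, w // 2)                   # view adjustment
        z = z.view(b, c * 4, h // 2, w // 2)
        # pixel_shuffle(., 2) reorganizes back to (b, c, h, w)
        return F.pixel_shuffle(z, 2), F.pixel_shuffle(y, 2)
```

Because the four pooled tensors each have c channels at half resolution, pixel_shuffle with upscale factor 2 exactly undoes the c×4, h/2, w/2 shape adjustment.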
The invention provides a high-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization, which extracts multi-scale semantic features of images through a multi-scale feature encoder in a remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization, can capture spatial context information in parallel, strengthens the segmentation effect of remote sensing feature edge information through extracting coding and decoding indexes containing edge information, improves feature extraction and processing of the remote sensing feature edge information, improves recognition precision of small-size objects and complex boundary information in a remote sensing image, and realizes accurate semantic segmentation of the remote sensing feature edge.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it will be obvious that the drawings in the following description are some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic flow chart of a semantic segmentation model of a remote sensing image based on coding and decoding indexing edge characterization;
FIG. 3 is a schematic diagram of a multi-scale feature encoder of the present invention;
FIG. 4 is a schematic diagram of the structure of a separable pyramid unit of the present invention;
FIG. 5 is a schematic diagram of the structure of the generated index of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flowchart of the method of the present invention, as shown in FIG. 1, the method of the present embodiment may include:
step one, acquiring a remote sensing image set, amplifying the remote sensing image set, namely rotating the remote sensing image at any angle and storing the remote sensing image into the remote sensing image set, respectively carrying out normalization processing on the remote sensing image, dividing the remote sensing image set into a training set and a testing set,
step two, constructing a remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation, training the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to a training set, obtaining a prediction label of each pixel in the remote sensing image, calculating loss according to the truth label and the prediction label of each pixel, judging whether the value of the loss meets a threshold value, optimizing parameters of the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to the difference between the value of the loss and the threshold value if the value of the loss does not meet the threshold value, obtaining the trained remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation if the value of the loss meets the threshold value,
the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation comprises a multi-scale feature encoder, a separable pyramid unit, a coding and decoding indexing edge representation unit and an up-sampling decoder,
the multi-scale feature encoder is used for generating four initial feature matrixes according to the size h of the remote sensing image, the sizes of the four initial feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the separable pyramid unit is used for obtaining four context feature matrixes according to four initial feature matrixes of the remote sensing image, the sizes of the four context feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the coding and decoding indexing edge characterization unit is used for acquiring a first coding index and a first decoding index according to a context feature matrix with the size of h/2, acquiring a second coding index and a second decoding index according to a context feature matrix with the size of h/4, fusing the first coding index with the context feature matrix with the size of h/2, fusing the second coding index with the context feature matrix with the size of h/4, acquiring a fused context feature matrix with the size of h/2 and a fused context feature matrix with the size of h/4,
the up-sampling decoder is used for decoding and up-sampling the context feature matrix according to the order from small to large in size to obtain a semantic segmentation result graph of the remote sensing image, the semantic segmentation result graph comprises a prediction label of each pixel in the remote sensing image,
and thirdly, acquiring the processed remote sensing image, inputting the processed remote sensing image into a trained remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization, and outputting a semantic segmentation result graph of the remote sensing image.
Based on the scheme, the multi-scale semantic features of the remote sensing image are extracted through the multi-scale feature encoder in the remote sensing image semantic segmentation model based on the encoding and decoding indexing edge characterization, the separable pyramid units capture spatial context information in parallel, the segmentation effect of the remote sensing ground object edge information is enhanced through the encoding and decoding indexing which extracts the edge information, the feature extraction and processing of the remote sensing ground object edge information are improved, the recognition precision of small-size objects and complex boundary information in the remote sensing image is improved, and the accurate semantic segmentation of the remote sensing ground object edge is realized.
Step one, a remote sensing image set is acquired and amplified, namely the remote sensing images are rotated at arbitrary angles and stored back into the remote sensing image set. Data amplification is used to prevent the model from overfitting and to improve its robustness, and generally comprises operations such as vertical and horizontal flipping and rotation by 90 degrees. Each remote sensing image is then normalized; normalization is a data preprocessing operation that maps the original remote sensing image data into the range 0-1. Finally, the remote sensing image set is divided into a training set and a testing set.
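A minimal NumPy sketch of the amplification and normalization in step one; the particular set of flips and rotations is illustrative, since the text allows rotation by any angle:

```python
import numpy as np

def augment_and_normalize(img: np.ndarray) -> list:
    """Amplify one remote sensing image with flips and a 90-degree rotation,
    then min-max normalize every view into [0, 1]; the view list is
    illustrative, and a constant image (max == min) is not handled."""
    views = [img, np.flipud(img), np.fliplr(img), np.rot90(img)]
    lo, hi = float(img.min()), float(img.max())
    return [(v - lo) / (hi - lo) for v in views]
```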
step two, constructing a remote sensing image semantic segmentation model based on coding and decoding indexing edge representation, as shown in fig. 2, training the remote sensing image semantic segmentation model based on coding and decoding indexing edge representation according to a training set to obtain a prediction label of each pixel in the remote sensing image, calculating loss according to the truth label and the prediction label of each pixel, judging whether the value of the loss meets a threshold value, if not, updating parameters of the remote sensing image semantic segmentation model based on coding and decoding indexing edge representation according to the difference between the loss and the threshold value, if yes, obtaining the trained remote sensing image semantic segmentation model based on coding and decoding indexing edge representation, wherein calculating the loss according to the truth label and the prediction label of each pixel comprises calculating the loss according to a formula (1),
Loss_focal = -(1 - p_t)^γ · log(p_t)   (1)
where p_t is the predicted probability of the true-class label, obtained from the truth label and the prediction label, γ is a hyperparameter, and Loss_focal denotes the focal loss function.
The remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation comprises a multi-scale feature encoder, a separable pyramid unit, a coding and decoding indexing edge representation unit and an up-sampling decoder,
the multi-scale feature encoder is used for generating four initial feature matrixes according to the size h of the remote sensing image, the sizes of the four initial feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the multi-scale context feature encoder comprises a spatial feature extraction branch, a self-attention feature extraction branch and a fusion branch, wherein the spatial feature extraction branch is used for extracting local feature information of a remote sensing image, the self-attention feature extraction branch is used for extracting global feature information of the remote sensing image, the fusion branch is used for fusing the local feature information and the global feature information according to a formula (2),
x = concatenate(Conv2d(x_ci), Conv2d(x_si))
y = sigmoid(Conv2d(ReLU(Conv2d(AdaptiveAvgPool2d(x)))))
x_fi = x × reshape(y)   (2)
where x_si denotes the i-th stage feature matrix of the self-attention feature extraction branch, x_ci denotes the i-th stage feature matrix of the spatial feature extraction branch, x_fi denotes the fused feature, Conv2d(·) denotes a 2-D convolution, AdaptiveAvgPool2d(·) denotes an adaptive pooling function, sigmoid(·) denotes the sigmoid activation function, ReLU(·) denotes the ReLU activation function, concatenate(·) denotes splicing two matrices along dimension 1, and reshape(·) denotes a shape-change function.
Specifically, when the multi-scale context feature encoder is adopted to extract global and local feature information of the remote sensing image, the following modes are adopted:
the first five stages of the lightweight characteristic extraction model EfficientNet-B5 are used as spatial characteristic extraction branches to extract local characteristic information of remote sensing images, and EfficientNet-B5 is one of convolutional neural network structures and belongs to EfficientNet series models. The design goal of the EfficientNet series model is to provide better performance while keeping the computational cost low. EfficienientNet B-5 is the fifth model in the EfficienientNet series;
the Swin transducer is used as a self-attention feature extraction branch to extract global feature information of the remote sensing image, is a novel neural network architecture, and adopts a transducer-based method to solve the computer vision task. The innovation is that a method called "sliding window" (Shifted Windows) is used to process the image, which enables it to more efficiently perform calculations when processing large-scale image data;
the separable pyramid unit is used for obtaining four context feature matrices from the four initial feature matrices of the remote sensing image, the sizes of the four context feature matrices being h/2, h/4, h/8 and h/16 respectively; the structure of the separable pyramid unit is shown in FIG. 4, and its function is to capture context feature matrices of the same size using separable dilated convolutions with different dilation rates, the dilation rates being 0, 1, 6 and 12;
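One branch of the separable pyramid unit can be sketched as a depthwise-separable dilated convolution; treating dilation rate 0 as a plain 1x1 convolution is an assumption, since PyTorch requires a dilation of at least 1:

```python
import torch
import torch.nn as nn

class SeparableDilatedConv(nn.Module):
    """Depthwise-separable 3x3 dilated convolution for the separable pyramid
    unit; interpreting dilation rate 0 as a plain 1x1 convolution is an
    assumption (PyTorch requires dilation >= 1)."""
    def __init__(self, c: int, dilation: int):
        super().__init__()
        if dilation == 0:
            self.depthwise = nn.Conv2d(c, c, kernel_size=1, groups=c)
        else:
            # padding = dilation keeps the spatial size for a 3x3 kernel
            self.depthwise = nn.Conv2d(c, c, kernel_size=3, padding=dilation,
                                       dilation=dilation, groups=c)
        self.pointwise = nn.Conv2d(c, c, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# The four pyramid branches use dilation rates 0, 1, 6 and 12, all of which
# preserve the size of the context feature matrix:
branches = [SeparableDilatedConv(8, d) for d in (0, 1, 6, 12)]
```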
The coding and decoding indexing edge characterization unit is configured to obtain a first encoding index and a first decoding index from the context feature matrix of size h/2; the structure that generates the index is shown in FIG. 5. Specifically, obtaining the first encoding index and the first decoding index from the context feature matrix of size h/2 comprises:
S11, denoting the context feature matrix of size h/2 as x_i and obtaining the shape parameters of x_i, the shape parameters comprising the batch size value batchsize, the number of channels c, the height h and the width w;
S12, inputting x_i into a Conv2d function to obtain x_i1, inputting x_i1 into a BatchNorm2d function to obtain x_i2, inputting x_i2 into a BatchNorm2d function to obtain x_i3, and inputting x_i3 into a BatchNorm2d function to obtain x_i4;
S13, performing a maximum pooling operation on each of x_i1, x_i2, x_i3 and x_i4 to obtain four initial indexes x_1, x_2, x_3, x_4;
S14, splicing the initial indexes x_1, x_2, x_3, x_4 into a new matrix along dimension 1 with the torch.cat function, and passing the new matrix to a sigmoid activation function to obtain an initial decoding index y;
S15, passing the initial decoding index y through a softmax function to obtain an initial encoding index z, and adjusting the shape parameters of the initial decoding index y and the initial encoding index z with the view function to batchsize, c×4, h/2 and w/2 to obtain the adjusted initial decoding index y and initial encoding index z;
S16, reorganizing the adjusted initial decoding index y and initial encoding index z back to their size before adjustment with the pixel_shuffle function to obtain the first encoding index and the first decoding index.
The second coding index and second decoding index are obtained from the context feature matrix of size h/4 by processing it according to steps S11-S16.
The first coding index is fused with the context feature matrix of size h/2 and the second coding index with the context feature matrix of size h/4, yielding a fused context feature matrix of size h/2 and a fused context feature matrix of size h/4.
Specifically, the separable pyramid unit captures multi-scale spatial context information of the initial feature map in parallel as follows:
The separable pyramid unit replaces all 3×3 convolutions in the dilated pyramid module with depthwise separable convolutions, and one separable pyramid unit is built and applied for each of the four feature matrices of different sizes. The dilated pyramid module is a convolutional neural network module commonly used in deep learning to extract features at different scales from an image. Its structure is as follows:
input layer: accepting input data from a previous layer.
Convolution layer: and carrying out convolution operation on the input data by using convolution cores with different sizes, and extracting features with different scales.
Expansion layer: and expanding the characteristic diagram output by the convolution layer to obtain a larger receptive field.
Fusion layer: and fusing the feature graphs with different scales to obtain richer feature representations.
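A minimal sketch of the depthwise separable dilated convolution and its parallel-branch arrangement described above. The channel widths are illustrative assumptions; and since the patent lists dilation rates 0, 1, 6 and 12 while PyTorch requires dilation >= 1, this sketch substitutes a 1×1 branch plus rates 1, 6 and 12:

```python
import torch
import torch.nn as nn

class SeparableDilatedConv(nn.Module):
    """Depthwise separable 3x3 convolution with a given dilation rate,
    replacing the plain 3x3 convolutions of the dilated pyramid module."""
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch)
        # pointwise: 1x1 convolution mixes channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SeparablePyramid(nn.Module):
    """Parallel branches at several dilation rates, concatenated then fused
    (a 1x1 branch stands in for the listed rate 0 -- an assumption)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch0 = nn.Conv2d(in_ch, out_ch, 1)
        self.branches = nn.ModuleList(
            SeparableDilatedConv(in_ch, out_ch, d) for d in (1, 6, 12))
        self.fuse = nn.Conv2d(out_ch * 4, out_ch, 1)

    def forward(self, x):
        feats = [self.branch0(x)] + [b(x) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```

Because `padding` equals `dilation` for a 3×3 kernel, every branch preserves the spatial size, so the outputs can be concatenated along the channel dimension directly.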
The up-sampling decoder decodes and up-samples the context feature matrices in order from the smallest size to the largest to obtain the semantic segmentation result graph of the remote sensing image; the graph contains a prediction label for each pixel of the image.
Specifically, the up-sampling decoder decodes and up-samples the context feature matrix of size h/16 to obtain an output feature matrix x_d1 of size h/8, and splices x_d1 with the context feature matrix of size h/8 along dimension 1 using the torch.cat function to form a new feature matrix x_m1.
x_m1 is decoded and up-sampled to obtain an output feature matrix x_d2 of size h/4; x_d2 is spliced with the context feature matrix of size h/4 along dimension 1 using the torch.cat function to form a new feature matrix x_m2; after a matrix product operation between the second decoding index and the feature matrix x_m2, a feature matrix x_n2 is output.
x_n2 is decoded and up-sampled to obtain an output feature matrix of size h/2, which is spliced with the context feature matrix of size h/2 along dimension 1 using the torch.cat function to form a new feature matrix x_m3; after a matrix product operation between the first decoding index and the feature matrix x_m3, a feature matrix x_n3 is output.
x_n3 is decoded and up-sampled to obtain a feature matrix x_d4 of size h; after one convolution, x_d4 is input into a softmax activation function to obtain the semantic segmentation result graph.
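One decode-and-up-sample step with its skip connection and decoding-index fusion can be sketched as follows. The stride-2 transposed-convolution upsampler, the channel/spatial sizes, and the reading of the "matrix product" as an element-wise (Hadamard) product are all assumptions, since the text does not pin them down:

```python
import torch
import torch.nn as nn

class UpDecodeStep(nn.Module):
    """One decode-and-up-sample step; a stride-2 transposed convolution is
    assumed (the patent only names the operator for the final stage)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)

# Hypothetical sizes for the h/16 -> h/8 step:
step = UpDecodeStep(256, 128)
x16 = torch.randn(1, 256, 8, 8)       # context features at size h/16
c8 = torch.randn(1, 128, 16, 16)      # context features at size h/8
x_d1 = step(x16)                      # decode and up-sample to h/8
x_m1 = torch.cat([x_d1, c8], dim=1)   # skip connection along dimension 1
# Decoding-index fusion: interpreted here as an element-wise product with
# an index tensor of matching shape -- an interpretation, not stated fact.
index = torch.rand(1, 256, 16, 16)
x_fused = index * x_m1
```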
Based on the above technical scheme, the remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization uses an image-based deep learning classification framework in the segmentation model, and the remote sensing image is input into the model for training and prediction. First, features of the remote sensing image are extracted and fused by the multi-scale feature encoder. Second, multi-scale spatial context information of the initial feature map is captured in parallel by separable pyramid units with different dilation rates, implemented with dilated convolutions. Third, the two largest feature maps are input into the index-generation module to extract coding and decoding indexes containing edge feature information, and the coding indexes are integrated into the feature matrices by matrix product. Fourth, the smallest feature map is decoded and up-sampled four times; the output of each decode-and-up-sample step is merged with the feature map of the corresponding size through skip connections, the two decoding indexes are merged into the results of the second and third decode-and-up-sample steps by matrix product, and the result of the fourth step is finally classified with one transposed convolution and one softmax activation function to output the final semantic segmentation result graph. The model thereby improves the extraction and processing of ground-object edge information and raises the recognition accuracy of small objects and complex boundary information in remote sensing images.
In this embodiment, real remote sensing image data are used for the experiments: two public real remote sensing image datasets are used to test, analyze and evaluate the application effect of the remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization.
1. Data set and parameter settings
This embodiment uses two published high-resolution remote sensing image datasets from the ISPRS 2D Semantic Labeling Contest (the Potsdam dataset and the Vaihingen dataset) for experiments and analysis. The datasets provide digital surface models (DSMs) generated from high-resolution orthophotos with corresponding dense image matching techniques.
The ISPRS Vaihingen dataset comprises 33 images of varying sizes, with an average size of 2494×2064 pixels and a spatial resolution of 9 cm; the images contain three bands: near infrared (NIR), red (R) and green (G). The label categories are impervious surfaces, buildings, low vegetation, trees, cars and clutter, six categories in total.
The ISPRS Potsdam dataset contains 38 images, each 6000×6000 pixels, with a spatial resolution of 5 cm, using three bands: red (R), green (G) and blue (B). The label categories and their number are consistent with the Vaihingen dataset.
During model training the batch size is set to 16 and 300 epochs are trained each time; the learning rate is dynamically adjusted with cosine annealing, the initial learning rate is set to 1e-3, the learning-rate decay coefficient is 0.2 with a decay interval of 5, and an AdamW optimizer is used to optimize the parameters.
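The stated training configuration might be set up in PyTorch roughly as follows. The model is a placeholder layer, and only the stated batch size, epoch count, initial learning rate, cosine annealing and AdamW are reflected; how the decay coefficient 0.2 and interval 5 combine with the cosine schedule is not specified in the text, so they are omitted here:

```python
import torch
from torch import nn

# Stated hyper-parameters: batch size 16, 300 epochs, AdamW,
# initial lr 1e-3, cosine-annealing learning-rate schedule.
model = nn.Conv2d(3, 6, kernel_size=1)  # placeholder for the segmentation model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(3):  # 300 in the experiments; shortened for illustration
    # ... forward pass, focal-loss computation, backward pass go here ...
    optimizer.step()
    scheduler.step()    # lr decays along the cosine curve each epoch
```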
2. Experimental evaluation index
Overall accuracy (OA) is a performance index for evaluating classification models; in the image semantic segmentation task it is the proportion of correctly classified pixels to the total number of pixels. The calculation formula is OA = N_correct / N_total, where N_correct is the number of correctly classified pixels and N_total the total number of pixels.
The F1 score and the mean F1 score (mF1) are indexes measuring classification performance, commonly used to evaluate the accuracy of two-class or multi-class models. The intersection over union (IoU) and the mean intersection over union (mIoU) are commonly used indexes for measuring the performance of target detection and semantic segmentation models.
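A small sketch of how OA, mIoU and mF1 could be computed from predicted and ground-truth label maps; these are the standard confusion-matrix definitions, not code from the patent:

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    """OA, mIoU and mF1 from a confusion matrix.
    pred/target: integer label arrays of the same shape."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class i but wrong
    fn = cm.sum(axis=1) - tp          # class i missed
    oa = tp.sum() / cm.sum()          # correctly classified / total pixels
    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    return oa, iou.mean(), f1.mean()
```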
3. Analysis and evaluation of experimental results
The results of the experiments with the remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization on the two remote sensing image datasets are shown in Table 1 and Table 2.
Table 1 Comparative experimental results on the Vaihingen dataset
Table 2 Comparative experimental results on the Potsdam dataset
The experiments compare, on the same datasets, the DCNN-based models UNet and SegNet, the Transformer-based improved model TransUNet, and the CapsUNet model. From the classification results, the following conclusions can be drawn:
As the tables show, the proposed segmentation model achieves the best results. Because the amount of remote sensing image data is limited, the TransUNet experimental results are relatively poor. TransUNet uses a Transformer as the encoder to model long-range dependencies and adds low-level detail information to the feature maps in the decoder through skip connections. However, because Transformer models require large amounts of data to train, TransUNet does not perform as well in this experiment as the proposed remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization. Compared with the DCNN-based improved models (UNet, SegNet, CapsUNet) and the Transformer-based improved model TransUNet, the proposed model obtains better performance.
Overall beneficial effects:
The invention provides a high-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization. The method extracts multi-scale semantic features of the image through the multi-scale feature encoder of the segmentation model and captures spatial context information in parallel. By extracting coding and decoding indexes that contain edge information, it strengthens the segmentation of ground-object edges, improves the extraction and processing of edge information, raises the recognition accuracy of small objects and complex boundary information in remote sensing images, and achieves accurate semantic segmentation of ground-object edges.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (5)

1. A semantic segmentation method of a high-resolution remote sensing image based on coding and decoding indexing edge characterization is characterized by comprising the following steps of,
Step one, acquire a remote sensing image set and augment it, namely rotate the remote sensing images by arbitrary angles and store them in the remote sensing image set; normalize the remote sensing images respectively, and divide the remote sensing image set into a training set and a testing set;
step two, constructing a remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation, training the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to a training set, obtaining a prediction label of each pixel in the remote sensing image, calculating loss according to the truth label and the prediction label of each pixel, judging whether the value of the loss meets a threshold value, optimizing parameters of the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation according to the difference between the value of the loss and the threshold value if the value of the loss does not meet the threshold value, obtaining the trained remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation if the value of the loss meets the threshold value,
the remote sensing image semantic segmentation model based on the coding and decoding indexing edge representation comprises a multi-scale feature encoder, a separable pyramid unit, a coding and decoding indexing edge representation unit and an up-sampling decoder,
the multi-scale feature encoder is used for generating four initial feature matrixes according to the size h of the remote sensing image, the sizes of the four initial feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the separable pyramid unit is used for obtaining four context feature matrixes according to four initial feature matrixes of the remote sensing image, the sizes of the four context feature matrixes are h/2, h/4, h/8 and h/16 respectively,
the coding and decoding indexing edge characterization unit is used for acquiring a first coding index and a first decoding index according to a context feature matrix with the size of h/2, acquiring a second coding index and a second decoding index according to a context feature matrix with the size of h/4, fusing the first coding index with the context feature matrix with the size of h/2, fusing the second coding index with the context feature matrix with the size of h/4, acquiring a fused context feature matrix with the size of h/2 and a fused context feature matrix with the size of h/4,
the up-sampling decoder is used for decoding and up-sampling the four context feature matrixes according to the order from small to large in size to obtain a semantic segmentation result graph of the remote sensing image, the semantic segmentation result graph comprises a prediction label of each pixel in the remote sensing image,
Step three, acquire the processed remote sensing image, input it into the trained remote sensing image semantic segmentation model based on coding and decoding indexing edge characterization, and output the semantic segmentation result graph of the remote sensing image.
2. The method for semantic segmentation of high-resolution remote sensing images based on coding and decoding indexed edge representation according to claim 1, wherein the up-sampling decoder decodes and up-samples the context feature matrix of size h/16 to obtain an output feature matrix x_d1 of size h/8, and splices x_d1 with the context feature matrix of size h/8 along dimension 1 using the torch.cat function to form a new feature matrix x_m1;
x_m1 is decoded and up-sampled to obtain an output feature matrix x_d2 of size h/4; x_d2 is spliced with the context feature matrix of size h/4 along dimension 1 using the torch.cat function to form a new feature matrix x_m2; after a matrix product operation between the second decoding index and the feature matrix x_m2, a feature matrix x_n2 is output;
x_n2 is decoded and up-sampled to obtain an output feature matrix of size h/2, which is spliced with the context feature matrix of size h/2 along dimension 1 using the torch.cat function to form a new feature matrix x_m3; after a matrix product operation between the first decoding index and the feature matrix x_m3, a feature matrix x_n3 is output;
x_n3 is decoded and up-sampled to obtain a feature matrix x_d4 of size h; after one convolution, x_d4 is input into a softmax activation function to obtain the semantic segmentation result graph.
3. The method of claim 1, wherein calculating the loss from the truth label and the predictive label for each pixel comprises calculating the loss according to equation (1),
Loss_focal = -(1 - p_t)^γ · log(p_t)    (1)
wherein p_t is the predicted probability of the truth label, obtained from the truth label and the prediction label, γ is a hyper-parameter, and Loss_focal denotes the focal loss function.
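Equation (1) can be sketched in PyTorch as follows. The γ default of 2.0 and the numerical-stability clamp are assumptions not given in the claim:

```python
import torch

def focal_loss(probs, target, gamma=2.0, eps=1e-7):
    """Focal loss of equation (1): -(1 - p_t)^gamma * log(p_t), where p_t
    is the predicted probability of the true class.
    probs: (N, C) softmax probabilities; target: (N,) class indices."""
    # gather the probability assigned to each pixel's true class
    p_t = probs.gather(1, target.unsqueeze(1)).squeeze(1).clamp_min(eps)
    return (-(1 - p_t) ** gamma * torch.log(p_t)).mean()
```

For a confidently correct prediction (p_t near 1) the modulating factor (1 - p_t)^γ drives the loss toward zero, so training focuses on hard, poorly classified pixels such as object edges.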
4. The method of claim 1, wherein the multi-scale context feature encoder comprises a spatial feature extraction branch for extracting local feature information of the remote sensing image, a self-attention feature extraction branch for extracting global feature information of the remote sensing image, and a fusion branch for fusing the local feature information and the global feature information according to formula (2),
x = concatenate(Conv2d(x_ci), Conv2d(x_si))
y = sigmoid(Conv2d(ReLU(Conv2d(AdaptiveAvgPool2d(x)))))
x_fi = x × reshape(y)    (2)
wherein x_si denotes the i-th stage feature matrix of the self-attention feature extraction branch, x_ci denotes the i-th stage feature matrix of the spatial feature extraction branch, x_fi denotes the fused feature matrix, Conv2d(·) denotes a 2D convolution, AdaptiveAvgPool2d(·) denotes an adaptive pooling function, sigmoid(·) denotes the sigmoid activation function, ReLU(·) denotes the ReLU activation function, concatenate(·) denotes splicing two matrices along dimension 1, and reshape(·) denotes a shape change function.
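Formula (2) describes a squeeze-and-excitation-style gating of the concatenated branch features: pool globally, pass through two convolutions with a ReLU in between, gate with sigmoid, and reweight. A sketch under assumed kernel sizes and channel widths (the claim fixes the operators but not these parameters):

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Sketch of formula (2); kernel sizes and channel widths are assumptions."""
    def __init__(self, ch):
        super().__init__()
        self.proj_c = nn.Conv2d(ch, ch, 1)   # Conv2d(x_ci)
        self.proj_s = nn.Conv2d(ch, ch, 1)   # Conv2d(x_si)
        self.pool = nn.AdaptiveAvgPool2d(1)  # AdaptiveAvgPool2d(x)
        self.fc1 = nn.Conv2d(2 * ch, 2 * ch, 1)
        self.fc2 = nn.Conv2d(2 * ch, 2 * ch, 1)

    def forward(self, x_ci, x_si):
        # x = concatenate(Conv2d(x_ci), Conv2d(x_si)) along dimension 1
        x = torch.cat([self.proj_c(x_ci), self.proj_s(x_si)], dim=1)
        # y = sigmoid(Conv2d(ReLU(Conv2d(AdaptiveAvgPool2d(x)))))
        y = torch.sigmoid(self.fc2(torch.relu(self.fc1(self.pool(x)))))
        # x_fi = x * reshape(y): per-channel reweighting by broadcasting
        return x * y.reshape(y.shape[0], -1, 1, 1)
```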
5. The method of claim 1, wherein the obtaining the first encoding index and the first decoding index according to the context feature matrix with the size of h/2 comprises,
S11, denote the context feature matrix of size h/2 as x_i, and obtain the shape parameters of x_i: batch size, channel number c, height h and width w;
S12, input x_i into a Conv2d function to obtain x_i1, input x_i1 into a BatchNorm2d function to obtain x_i2, input x_i2 into a BatchNorm2d function to obtain x_i3, and input x_i3 into a BatchNorm2d function to obtain x_i4;
S13, apply a max pooling operation to each of x_i1, x_i2, x_i3 and x_i4 to obtain four initial indexes x_1, x_2, x_3, x_4;
S14, splice the initial indexes x_1, x_2, x_3, x_4 into a new matrix along dimension 1 with the torch.cat function, and pass the new matrix through a sigmoid activation function to obtain the initial decoding index y;
S15, pass the initial decoding index y through a softmax function to obtain the initial coding index z, then use the view function to adjust the shape parameters of y and z to (batch size, c×4, h/2, w/2), obtaining the adjusted initial decoding index y and initial coding index z;
S16, reorganize the adjusted initial decoding index y and initial coding index z back to their pre-adjustment size with the pixel_shuffle function to obtain the first coding index and the first decoding index.
CN202310496605.3A 2023-05-05 2023-05-05 High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization Pending CN116524189A (en)


Publications (1)

Publication Number Publication Date
CN116524189A true CN116524189A (en) 2023-08-01


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740364A (en) * 2023-08-16 2023-09-12 长春大学 Image semantic segmentation method based on reference mechanism
CN117237623A (en) * 2023-08-04 2023-12-15 山东大学 Semantic segmentation method and system for remote sensing image of unmanned aerial vehicle


