CN115035418A - Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network - Google Patents

Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network

Info

Publication number
CN115035418A
Authority
CN
China
Prior art keywords
data
model
training
semantic segmentation
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210677113.XA
Other languages
Chinese (zh)
Inventor
白根宝
徐欣
姚英彪
杨阿锋
刘晴
姜显扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210677113.XA
Publication of CN115035418A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/182Network patterns, e.g. roads or rivers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image semantic segmentation method and system based on an improved DeepLabV3+ network. The method comprises the following steps: S1, acquiring a remote sensing road data set and preprocessing it, the data in the data set being divided into training data, verification data and test data; S2, building an improved DeepLabV3+ semantic segmentation network model based on a PyTorch environment; S3, training the improved DeepLabV3+ semantic segmentation network model with the training data and verification data obtained in step S1; and S4, inputting the test data obtained in step S1 into the improved DeepLabV3+ semantic segmentation network model of step S3 to obtain the semantic segmentation result of the remote sensing road image. Compared with methods based on the traditional DeepLabV3+ network model, the method adopts the R-Drop regularization method, which regularizes, for each data sample during training, the outputs of the two sub-models randomly sampled by dropout.

Description

Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network
Technical Field
The invention belongs to the technical field of remote sensing image segmentation, relates to a remote sensing image segmentation method, and particularly relates to a remote sensing image semantic segmentation method and system based on an improved DeepLabV3+ network.
Background
Remote sensing image segmentation predicts a label for every pixel in a remote sensing image; it is a pixel-level classification algorithm that can be widely applied in scenarios such as land planning, environment monitoring and disaster assessment, and therefore has great application value. Traditional image segmentation methods mainly rely on manually designed classifiers built on low-level image features such as color and texture to segment the image, and then attach semantics to the segmented regions; examples include pixel-level clustering segmentation, pixel-level threshold segmentation and pixel-level decision-tree classification. These algorithms meet the requirements of image segmentation to a certain extent, but they demand carefully hand-crafted feature extractors, generalize poorly across data sets, and are difficult to apply at scale to general scenes with complex backgrounds.
The rapid development of computer hardware in recent years, especially the growth of GPU computing power, has greatly advanced artificial intelligence and provided strong momentum for computer vision. Semantic segmentation is a basic task in computer vision; relying on the computing power of GPUs, deep-learning-based image segmentation methods can rapidly segment remote sensing images and accurately extract useful information. Semantic segmentation architectures take different forms but can be understood overall as encoder-decoder networks. The encoder is usually a pre-trained classification network such as ResNet that extracts image features; the decoder maps the discriminative features from semantic space back to pixel space, producing the dense classification that semantic segmentation requires.
DeepLabV3+ is one of the better-performing network models for semantic segmentation. It allows the encoder to arbitrarily control the resolution of the extracted features (via atrous convolution), striking a balance between efficiency and accuracy; the MobileNetV2 network model is applied to the semantic segmentation task, and depthwise separable convolutions are used in the decoding module, which improves the execution efficiency of the encode-decode pipeline. The DeepLabV3+ network adopts the Dropout method to avoid overfitting during training: some neurons are randomly ignored during training, the contribution of an ignored neuron to downstream neurons temporarily disappears in the forward pass, and that neuron receives no weight update in the backward pass.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a remote sensing image semantic segmentation method and system based on an improved DeepLabV3+ network. The invention replaces the Dropout method used in the original DeepLabV3+ network with the R-Drop regularization method; R-Drop further regularizes the model space beyond what Dropout achieves and can further improve the generalization ability of the model, so that effective segmentation of remote sensing urban road images can be accomplished.
The technical scheme adopted by the invention is as follows:
a remote sensing image semantic segmentation method based on an improved DeepLabV3+ network comprises the following steps:
s1, acquiring a remote sensing road data set and preprocessing the remote sensing road data set;
s2, building an improved DeepLabV3+ semantic segmentation network based on a Pytrch environment;
s3, training the improved DeepLabV3+ semantic segmentation network model by using the training data and the verification data obtained in the step S1;
and S4, inputting the test data obtained in step S1 into the improved DeepLabV3+ semantic segmentation network model trained in step S3 to obtain the semantic segmentation result of the remote sensing road image.
Further, the step S1 specifically includes the following steps:
s11, downloading or self-making a remote sensing data set from an open source data set website;
s12, respectively placing the image file and the label file which are originally placed in one folder into different folders;
s13, randomly dividing data in the data set into training data, verification data and test data according to the ratio of 2:1:1, and storing the divided file name list files under the path where the project is located, wherein the divided file name list files are respectively train.txt, val.txt and test.txt.
Further, the step S2 specifically includes the following steps:
s21, improving a DeepLabV3+ semantic segmentation network model and dividing the improved DeepLabV3+ semantic segmentation network model into an encoder module and a decoder module;
s22, in an encoder module, extracting shallow features and deep features of the remote sensing image by using MobileNet V2 as a main network;
and S23, performing further feature extraction on the deep features obtained in step S22 by adopting a spatial pyramid pooling module (also called the ASPP module, ASPP being the abbreviation of Atrous Spatial Pyramid Pooling). The spatial pyramid pooling module consists of a 1 × 1 convolution, three dilated convolutions with dilation rates of 6, 12 and 18 respectively, and an Image Pooling (global average pooling) module; the three dilated convolutions capture receptive fields of different scales and thus feature information of different scales, while the global average pooling and the 1 × 1 convolution layer extract features;
s24, stacking the feature layers with different receptive fields obtained in the step S23 by using a continate feature fusion method, wherein the number of input channels is 5 times of the number of original input channels, and reducing the number of the channels to the original value by using a 1 multiplied by 1 convolution layer to obtain deep features;
s25, adjusting the number of channels of the shallow feature obtained in the step S22 by adopting 1 × 1 convolution in the decoder module, and then performing concatemate feature fusion with the result obtained in the step S24 after 4 times of upsampling on the deep feature layer;
s26, thinning the feature fusion result obtained in the step S25 by adopting two 3 x 3 convolutional layers, and then performing four-time upsampling to obtain a segmentation prediction graph.
Further, the step S3 specifically includes the following steps:
s31, setting initial parameters of the training model as follows:
initial learning rate: 0.014;
weight decay: 0.0005;
momentum: 0.9;
the batch size is determined according to the GPU memory size of the server actually used for training;
s32, in the training process, an R-Drop regularization method is adopted, namely: in each small training batch, each data sample undergoes two forward passes, each of which is implemented by a different submodel by randomly deleting some hidden units.
The specific process is as follows. Given the training data $\{(x_i, y_i)\}_{i=1}^{n}$, the goal of training is to learn a model $P_w(y_i \mid x_i)$, where $n$ is the number of training samples and $(x_i, y_i)$ is a labeled data pair, $x_i$ being the input data and $y_i$ its label. The loss of each sample is the cross entropy:

$$L_i = -\log P_w(y_i \mid x_i)$$

Under the R-Drop regularization method, each sample can be considered to have passed through two slightly different models, denoted respectively as $P_w^{(1)}(y_i \mid x_i)$ and $P_w^{(2)}(y_i \mid x_i)$. The final loss of the model is divided into two parts. One part is the conventional cross entropy:

$$L_i^{CE} = -\log P_w^{(1)}(y_i \mid x_i) - \log P_w^{(2)}(y_i \mid x_i)$$

The other part is the symmetric KL divergence between the two models, whose effect is to make the outputs of the two sub-models obtained through different Dropout masks as consistent as possible:

$$L_i^{KL} = \frac{1}{2}\Big[D_{\mathrm{KL}}\big(P_w^{(1)}(y_i \mid x_i)\,\big\|\,P_w^{(2)}(y_i \mid x_i)\big) + D_{\mathrm{KL}}\big(P_w^{(2)}(y_i \mid x_i)\,\big\|\,P_w^{(1)}(y_i \mid x_i)\big)\Big]$$

The final loss of the network model is the weighted sum of the two losses:

$$L_i^{final} = L_i^{CE} + \alpha L_i^{KL}$$
where α is the weight of the auxiliary loss and is set to 1, and the basic loss function is the cross entropy;
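A PyTorch sketch of this loss is given below, assuming a segmentation model whose Dropout layers are active in training mode; the `batchmean` KL reduction is an illustrative choice rather than a detail fixed by the text.

```python
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    """R-Drop loss of step S32: two stochastic forward passes through the
    same network sample two Dropout sub-models; their cross entropies are
    summed and their symmetric KL divergence is added with weight alpha."""
    logits1 = model(x)  # first forward pass (one random dropout mask)
    logits2 = model(x)  # second forward pass (a different dropout mask)
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    p1 = F.log_softmax(logits1, dim=1)
    p2 = F.log_softmax(logits2, dim=1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```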
s33, calculating a gradient according to the loss function obtained in the step S32, and updating a weight value and a bias value of the neural network by adopting a random gradient descent method as an optimizer;
s34, Pixel Accuracy (PA) and average Intersection over Union (MIoU) are introduced to evaluate the performance of the model, wherein the PA represents the proportion of the number of correct pixels of the prediction category to the total number of pixels, the MIoU represents the precision of the network model for segmenting the image, and the higher the MIoU value is, the better the image segmentation effect is. The calculation method comprises the following steps:
$$PA = \frac{TP + TN}{TP + TN + FP + FN}$$

$$MIoU = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$$
In the above formulas, TP (true positive) means the prediction is correct, i.e. both the predicted and the actual class are positive; FP (false positive) means the model predicts a positive class that is actually negative; FN (false negative) means the model predicts a negative class that is actually positive; TN (true negative) means the prediction is correct, i.e. both the predicted and the actual class are negative; N is the number of classes and the subscript i denotes the i-th class;
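Both metrics can be computed from a class confusion matrix, as in the following NumPy sketch (the helper names are illustrative, not part of the invention):

```python
import numpy as np

def confusion_matrix(pred, label, num_classes):
    """Accumulate an N x N confusion matrix from flattened class indices."""
    mask = (label >= 0) & (label < num_classes)
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def pa_miou(cm):
    """PA = correctly classified pixels over all pixels (matrix trace over
    its sum); IoU_i = TP_i / (TP_i + FP_i + FN_i), averaged over classes."""
    pa = np.diag(cm).sum() / cm.sum()
    iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))
    return pa, np.nanmean(iou)  # nanmean skips classes absent from the data
```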
s35, the training process of the steps S32-S24 is repeated, after each round of training is finished, the network model is evaluated by using a verification data set, the model is stored according to the MIoU optimal result, the training is stopped until the iteration number reaches a set value, and the trained model is stored.
Further, the step S4 specifically includes the following steps:
s41, loading the model trained in the step S3, and reading in the test picture and the label of the test data obtained in the step S1;
and S42, computing the metric scores and saving the test results.
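Steps S41-S42 then reduce to loading the best checkpoint and scoring the test set, as in this sketch (it reuses the metric helpers sketched above; the checkpoint name is an assumption):

```python
import numpy as np
import torch

@torch.no_grad()
def test(model, test_loader, num_classes, device="cuda"):
    """Test stage of step S4: load the saved model, predict, and score."""
    model.load_state_dict(torch.load("best_model.pth", map_location=device))
    model.to(device).eval()  # Dropout is disabled at inference time
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for x, y in test_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu().numpy().ravel()
        cm += confusion_matrix(pred, y.numpy().ravel(), num_classes)
    pa, miou = pa_miou(cm)
    print(f"PA: {100 * pa:.4f}%  MIoU: {100 * miou:.4f}%")
```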
The invention also discloses a remote sensing image semantic segmentation system based on the improved DeepLabV3+ network, which comprises the following modules:
a data classification module: acquiring a remote sensing road data set and preprocessing the data set, wherein the data in the data set is divided into training data, verification data and test data;
a model building module: constructing an improved DeepLabV3+ semantic segmentation network model based on a PyTorch environment;
a training module: training the improved DeepLabV3+ semantic segmentation network model by using training data and verification data obtained by the data classification module;
a segmentation result obtaining module: and inputting the test data obtained by the data classification module into an improved DeepLabV3+ semantic segmentation network model of the training module to obtain a semantic segmentation result of the remote sensing road image.
Compared with the prior art, the invention has the following beneficial effects:
compared with a traditional DeepLabV3+ network model-based method, the method for semantically segmenting the remote sensing image based on the improved DeepLabV3+ network can regularize the output of two submodels randomly extracted from dropouts by each data sample in training by adopting an R-Drop regularization method, can reduce the degree of freedom of network model parameters, can relieve inconsistency between the training and reasoning stages, and enhances generalization capability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the remote sensing image semantic segmentation method based on an improved DeepLabV3+ model according to embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of the R-Drop regularization method provided in embodiment 1 of the present invention.
Fig. 3 is a semantic segmentation result diagram of a remote sensing road image provided in embodiment 1 of the present invention.
Fig. 4 is a block diagram of the remote sensing image semantic segmentation system based on an improved DeepLabV3+ network according to embodiment 2 of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example 1
As shown in fig. 1, this embodiment provides a remote sensing image semantic segmentation method based on an improved DeepLabV3+ model, which specifically includes the following steps:
s1, acquiring a remote sensing road data set and preprocessing the remote sensing road data set. In the embodiment, a DeepGlobe Road Extraction Dataset downloaded from an open source Dataset website kaggle.com is used, and 2000 remote sensing Road RGB satellite images with the size of 1024 × 1024 are randomly selected from the Dataset and are randomly divided into training data, verification data and test data according to the ratio of 2:1: 1.
And S2, building the improved DeepLabV3+ semantic segmentation network based on the PyTorch environment. In this embodiment, MobileNetV2 is selected as the backbone of the DeepLabV3+ semantic segmentation network to extract shallow features and deep features. The deep features are fed into the ASPP module, whose five parallel branches (a 1 × 1 convolution, three atrous convolutions and global average pooling) produce multi-scale feature layers with different receptive fields; after concatenate stacking, a 1 × 1 convolution reduces the number of channels back to the original value to obtain the deep features, which are passed to the decoder module. In the decoder module of the network model, the number of channels of the shallow features from the encoder module is adjusted, the deep features are upsampled 4× and concatenated with them, and the stacked result then passes through two 3 × 3 depthwise separable convolutions and a further 4× upsampling to restore the original image size, yielding the predicted segmentation map of the remote sensing image.
And S3, training the improved DeepLabV3+ semantic segmentation network model by using the training data and the verification data obtained in step S1. To verify the feasibility of the designed network and its road recognition effect in complex environments, the network was implemented and trained; the specific experimental environment and configuration are shown in Table 1:
TABLE 1 Experimental Environment and configuration
[The contents of Table 1 appear only as an image in the original publication and are not reproduced here.]
The initial parameters of the training model are set as shown in Table 2:
table 2 initial parameter settings
Parameter               Value
initial learning rate   0.014
weight decay            0.0005
momentum                0.9
After the parameters are set, training can begin. In the training process, the R-Drop regularization method replaces the Dropout method used in the original DeepLabV3+ network: in each mini-batch, each data sample undergoes two forward passes, each realized by a different sub-model obtained by randomly dropping some hidden units. A schematic diagram of the R-Drop regularization method is shown in FIG. 2.
The specific process is as follows. Given the training data $\{(x_i, y_i)\}_{i=1}^{n}$, the goal of training is to learn a model $P_w(y_i \mid x_i)$, where $n$ is the number of training samples and $(x_i, y_i)$ is a labeled data pair, $x_i$ being the input data and $y_i$ its label. The loss of each sample is the cross entropy:

$$L_i = -\log P_w(y_i \mid x_i)$$

Under the R-Drop regularization method, each sample can be considered to have passed through two slightly different models, denoted respectively as $P_w^{(1)}(y_i \mid x_i)$ and $P_w^{(2)}(y_i \mid x_i)$. The final loss of the model is divided into two parts. One part is the conventional cross entropy:

$$L_i^{CE} = -\log P_w^{(1)}(y_i \mid x_i) - \log P_w^{(2)}(y_i \mid x_i)$$

The other part is the symmetric KL divergence between the two models, whose effect is to make the outputs of the two sub-models obtained through different Dropout masks as consistent as possible:

$$L_i^{KL} = \frac{1}{2}\Big[D_{\mathrm{KL}}\big(P_w^{(1)}(y_i \mid x_i)\,\big\|\,P_w^{(2)}(y_i \mid x_i)\big) + D_{\mathrm{KL}}\big(P_w^{(2)}(y_i \mid x_i)\,\big\|\,P_w^{(1)}(y_i \mid x_i)\big)\Big]$$

The final loss of the network model is the weighted sum of the two losses:

$$L_i^{final} = L_i^{CE} + \alpha L_i^{KL}$$
where α is the weight of the auxiliary loss and is set to 1, and the basic loss function is the cross entropy.
and (2) evaluating the performance of the model by introducing Pixel Accuracy (PA) and average Intersection over Union (MIoU), wherein the PA represents the proportion of the Pixel number with correct prediction category to the total Pixel number, the MIoU represents the precision of the network model for segmenting the image, and the higher the MIoU value is, the better the image segmentation effect is. The calculation method comprises the following steps:
$$PA = \frac{TP + TN}{TP + TN + FP + FN}$$

$$MIoU = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$$
In the formulas, TP (true positive) means the prediction is correct, i.e. both the predicted and the actual class are positive; FP (false positive) means the model predicts a positive class that is actually negative; FN (false negative) means the model predicts a negative class that is actually positive; TN (true negative) means the prediction is correct, i.e. both the predicted and the actual class are negative; N is the number of classes and the subscript i denotes the i-th class.
In the training stage, stochastic gradient descent (SGD) is used as the optimizer to compute the updated weights and biases of the convolutional neural network; after each round of training, the network model is evaluated on the verification data set and the checkpoint with the best MIoU is saved; training stops after 300 iterations, and the trained model is saved.
And S4, inputting the test data obtained in step S1 into the trained improved DeepLabV3+ semantic segmentation network model to obtain the semantic segmentation result of the remote sensing road image; a result graph is shown in FIG. 3.
In addition to the experiments on the improved DeepLabV3+ semantic segmentation network model, the original DeepLabV3+ algorithm was also trained into a corresponding model on the selected remote sensing road data set and compared with the algorithm of the invention. The performance of the two algorithms on the remote sensing road data set is shown in Table 3:
TABLE 3 comparison of the Performance of the two models on a set of remote sensing road data
Model           PA (%)    MIoU (%)
DeepLabV3+ 97.3721 73.8854
The invention 97.6744 76.8213
The remote sensing semantic segmentation method based on the improved DeepLabV3+ network improves the pixel accuracy and gains nearly 3 percentage points in mean intersection-over-union; its segmentation effect on images is clearly better than that of the original DeepLabV3+ algorithm.
Example 2
As shown in fig. 4, this embodiment discloses a remote sensing image semantic segmentation system based on an improved DeepLabV3+ network, which includes the following modules:
a data classification module: acquiring a remote sensing road data set and preprocessing the data set, wherein the data in the data set are divided into training data, verification data and test data;
a model building module: constructing an improved DeepLabV3+ semantic segmentation network model based on a PyTorch environment;
a training module: training the improved DeepLabV3+ semantic segmentation network model by using training data and verification data obtained by the data classification module;
a segmentation result obtaining module: and inputting the test data obtained by the data classification module into an improved DeepLabV3+ semantic segmentation network model of the training module to obtain a semantic segmentation result of the remote sensing road image.
Other contents of this embodiment can refer to embodiment 1.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. However, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (6)

1. A remote sensing image semantic segmentation method based on an improved DeepLabV3+ network is characterized by comprising the following steps:
s1, acquiring a remote sensing road data set and preprocessing the data set, wherein the data in the data set is divided into training data, verification data and test data;
s2, building an improved DeepLabV3+ semantic segmentation network model based on a Pytrch environment;
s3, training the improved DeepLabV3+ semantic segmentation network model by using the training data and the verification data obtained in the step S1;
and S4, inputting the test data obtained in the step S1 into the improved DeepLabV3+ semantic segmentation network model in the step S3 to obtain a semantic segmentation result of the remote sensing road image.
2. The remote sensing image semantic segmentation method based on the improved DeepLabV3+ network according to claim 1, wherein the step S1 specifically comprises the following steps:
s11, downloading or self-making a remote sensing image data set from an open source data set website;
s12, respectively placing the image file and the label file which are originally placed in one folder into different folders;
s13, randomly dividing data in the data set into training data, verification data and test data according to the ratio of 2:1:1, and storing the divided file name list files under the path of the project, wherein the divided file name list files are respectively train.txt, val.txt and test.txt.
3. The remote sensing image semantic segmentation method based on the improved DeepLabV3+ network as claimed in claim 2, wherein the step S2 specifically comprises the following steps:
s21, improving a DeepLabV3+ semantic segmentation network model to be divided into an encoder module and a decoder module;
s22, in the encoder module, extracting shallow features and deep features of the remote sensing image by using MobileNet V2 as a main network;
s23, further extracting the deep features obtained in the step S21 by adopting a spatial pyramid pooling module; the spatial pyramid pooling module consists of a 1 × 1 convolution, three expansion convolutions with expansion rates of 6, 12 and 18 respectively and an imageposing module, the three expansion convolutions are used for capturing the receptive field information of different scales and capturing the characteristic information of different scales, and the global average pooling and the 1 × 1 convolution layer are used for extracting characteristics;
s24, stacking the feature layers with different receptive fields obtained in the step S23 by using a concatenate feature fusion method, wherein the number of input channels is 5 times of the number of original input channels, and reducing the number of channels to the original value by using a 1 multiplied by 1 convolution layer to obtain deep features;
s25, adjusting the number of channels of the shallow feature obtained in the step S22 by adopting 1 × 1 convolution in the decoder module, and then performing concatemate feature fusion with the result obtained in the step S24 after 4 times of upsampling on the deep feature layer;
s26, thinning the feature fusion result obtained in the step S25 by adopting two 3 x 3 convolutional layers, and then performing four-time upsampling to obtain a segmentation prediction graph.
4. The remote sensing image semantic segmentation method based on the improved DeepLabV3+ network according to claim 3, wherein the step S3 specifically comprises the following steps:
s31, setting initial parameters of the training model as follows:
initial learning rate: 0.014;
weight decay: 0.0005;
momentum: 0.9;
s32, in the training process, an R-Drop regularization method is adopted, namely: in each small batch training, each data sample is subjected to two forward transmissions, and each transmission is processed by different submodels through random deletion of some hidden units; the method comprises the following specific steps: the training data is
Figure FDA0003695201610000011
The goal of the training is to learn a model P w (y i |x i ) Where n is the number of training samples, (x) i ,y i ) Is a marked data pair, x i Is input data, y i Is a label, and the loss of each sample is the cross entropy:
L i =-logP w (y i |x i )
in the case of the R-Drop regularization method, the samples are considered to pass through two slightly different models, denoted respectively as
Figure FDA0003695201610000021
And
Figure FDA0003695201610000022
the final loss of the model is divided into two parts, one part is the conventional cross entropy:
Figure FDA0003695201610000023
the other part is the symmetric KL divergence between the two models:
Figure FDA0003695201610000024
the final loss of the network model is the weighted sum of the two losses:
Figure FDA0003695201610000025
where α is the weight of the auxiliary loss and is set to 1, and the basic loss function is the cross entropy;
s33, calculating a gradient according to the loss function obtained in the step S32, and updating a weight value and a bias value of the neural network by adopting a random gradient descent method as an optimizer;
s34, introducing a pixel accuracy rate PA and an average cross-over ratio MIoU to evaluate the performance of the model, wherein the PA represents the proportion of the number of pixels with correct prediction categories to the total number of pixels, the MIoU represents the image segmentation precision of the network model, and the higher the MIoU value is, the better the image segmentation effect is; the calculation method comprises the following steps:
$$PA = \frac{TP + TN}{TP + TN + FP + FN}$$

$$MIoU = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i}$$
in the formulas, TP means the prediction is correct, i.e. both the predicted and the actual class are positive; FP means the model predicts a positive class that is actually negative; FN means the model predicts a negative class that is actually positive; TN means the prediction is correct, i.e. both the predicted and the actual class are negative; N is the number of classes and the subscript i denotes the i-th class;
s35, the training process of the steps S32-S34 is repeated, after each round of training is finished, the network model is evaluated by using verification data, the model is stored according to the MIoU optimal result, the training is stopped until the iteration number reaches a set value, and the trained model is stored.
5. The remote sensing image semantic segmentation method based on the improved DeepLabV3+ network according to claim 4, wherein the step S4 specifically comprises the following steps:
s41, loading the model trained in the step S3, and reading in the test picture and the label of the test data obtained in the step S1;
and S42, calculating index scores and storing test results.
6. A remote sensing image semantic segmentation system based on an improved DeepLabV3+ network is characterized by comprising the following modules:
a data classification module: acquiring a remote sensing road data set and preprocessing the data set, wherein the data in the data set is divided into training data, verification data and test data;
a model building module: constructing an improved DeepLabV3+ semantic segmentation network model based on a PyTorch environment;
a training module: training the improved DeepLabV3+ semantic segmentation network model by using training data and verification data obtained by the data classification module;
a segmentation result obtaining module: and inputting the test data obtained by the data classification module into an improved DeepLabV3+ semantic segmentation network model of the training module to obtain a semantic segmentation result of the remote sensing road image.
CN202210677113.XA 2022-06-15 2022-06-15 Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network Pending CN115035418A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210677113.XA CN115035418A (en) 2022-06-15 2022-06-15 Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210677113.XA CN115035418A (en) 2022-06-15 2022-06-15 Remote sensing image semantic segmentation method and system based on improved DeepLabV3+ network

Publications (1)

Publication Number Publication Date
CN115035418A true CN115035418A (en) 2022-09-09

Family

ID=83124046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210677113.XA Pending CN115035418A (en) 2022-06-15 2022-06-15 Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network

Country Status (1)

Country Link
CN (1) CN115035418A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546647A (en) * 2022-10-21 2022-12-30 河北省科学院地理科学研究所 Semantic segmentation model based on remote sensing image
CN115408498A (en) * 2022-11-02 2022-11-29 中孚安全技术有限公司 Data dynamic identification method based on natural language
CN116167991A (en) * 2023-02-15 2023-05-26 中科微至科技股份有限公司 DeepLabv3+ based belt edge line detection method
CN116167991B (en) * 2023-02-15 2023-09-08 中科微至科技股份有限公司 DeepLabv3+ based belt edge line detection method
CN116703834A (en) * 2023-05-22 2023-09-05 浙江大学 Method and device for judging and grading excessive sintering ignition intensity based on machine vision
CN116703834B (en) * 2023-05-22 2024-01-23 浙江大学 Method and device for judging and grading excessive sintering ignition intensity based on machine vision
CN117036982A (en) * 2023-10-07 2023-11-10 山东省国土空间数据和遥感技术研究院(山东省海域动态监视监测中心) Method and device for processing optical satellite image of mariculture area, equipment and medium
CN117036982B (en) * 2023-10-07 2024-01-09 山东省国土空间数据和遥感技术研究院(山东省海域动态监视监测中心) Method and device for processing optical satellite image of mariculture area, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination