CN117351372A - Remote sensing image road segmentation method based on improved DeepLabV3+ - Google Patents
Remote sensing image road segmentation method based on improved DeepLabV3+
- Publication number: CN117351372A (application CN202311418100.1A)
- Authority: CN (China)
- Prior art keywords: image, remote sensing, feature, module, convolution
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06V20/13—Satellite images
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/26—Segmentation of patterns in the image field
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/774—Generating sets of training patterns
- G06V10/806—Fusion of extracted features
- G06V10/82—Image or video recognition using neural networks
- G06V20/588—Recognition of the road, e.g. of lane markings
- Y02T10/40—Engine management systems
Abstract
The invention discloses a remote sensing image road segmentation method based on improved DeepLabV3+, comprising the following steps: S1, acquiring a remote sensing road image dataset, preprocessing the dataset, and dividing it into a training set and a test set according to a set proportion; S2, constructing a remote sensing image road segmentation network based on improved DeepLabV3+ to obtain an accurate remote sensing road segmentation map; S3, inputting the training set obtained in step S1 into the remote sensing road image semantic segmentation network for training, calculating a loss function, performing back propagation and updating the network parameters to obtain the optimal parameter model; S4, inputting the test set obtained in step S1 into the optimal parameter model trained in step S3, and outputting an accurate segmentation map of the remote sensing road image. The invention improves the segmentation of small target objects, alleviates the difficulty of distinguishing blurred boundaries and shadow occlusion in road segmentation, and ensures the accuracy and efficiency of road segmentation.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a remote sensing image road segmentation method based on improved DeepLabV3+.
Background
Semantic segmentation of remote sensing images divides an image into several homogeneous regions and assigns each region a corresponding label. The field has wide applications, including environmental pollution monitoring, natural disaster prediction, post-disaster emergency response, and humanistic and hydrologic monitoring. Meanwhile, accurate road extraction has an important influence on the detection of other objects, for example in urban scene segmentation, land cover classification, rice-lodging detection, photovoltaic-footprint extraction, and the like.
With the continuous development of deep learning, convolutional neural networks have achieved remarkable results in image segmentation and classification and are widely applied in the field of semantic segmentation. Du et al. combined DeepLabV3+ with object-based image analysis (OBIA) to segment very-high-resolution remote sensing images; Zeng et al. added a feature cross-attention module to DeepLabV3+ that extracts low-level spatial information and high-level contextual features through two branches to refine the segmentation results. However, remote sensing road images are of high resolution, and blurred boundaries and shadow occlusion are hard to distinguish in road segmentation, so the segmentation of detailed road information still needs improvement.
Moreover, the encoder-decoder structure loses information when segmenting small targets in remote sensing road images, and the network may fail to retain the detail information required for high segmentation precision, affecting the accurate identification and segmentation of small targets. How to better capture the rich features of the input data, attend more to local information and contextual correlation, and preserve the position and boundary information of targets has therefore become a key problem in remote sensing road image segmentation.
Disclosure of Invention
The invention aims to provide a remote sensing image road segmentation method based on improved DeepLabV3+ that attends more to local information and contextual correlation, retains the position and boundary information of targets, alleviates the difficulty of distinguishing blurred boundaries and shadow occlusion in road segmentation, and ensures the accuracy and efficiency of road segmentation.
The technical scheme is as follows: the remote sensing image road segmentation method comprises the following steps:
s1, acquiring a remote sensing road image dataset, wherein the dataset comprises photographed original remote sensing images and corresponding label images, preprocessing the dataset, and dividing it into a training set and a test set according to a set proportion;
s2, constructing a remote sensing image road segmentation network based on improved DeepLabV3+, wherein the network comprises an encoding module, a multi-level upsampling module and a decoding module, and the encoding module comprises a backbone network, an ECA module and a DS-ASPP module;
the backbone network realizes hierarchical feature extraction of the image, from low-level detail features to high-level semantic features, through convolution and pooling operations; the high-level feature map of the original input image output by the backbone network is input into the DS-ASPP module and the ECA module respectively to obtain a multi-scale feature map and a channel attention feature map, which are then fused to obtain the output feature map of the encoding module; the feature map output by the encoding module is input into the decoding module to obtain an accurate remote sensing road segmentation map;
s3, inputting the training set obtained in step S1 into the remote sensing road image semantic segmentation network for training, calculating the loss function, performing back propagation and updating the network parameters to obtain the optimal parameter model;
s4, inputting the test set obtained in step S1 into the optimal parameter model trained in step S3, and outputting an accurate segmentation map of the remote sensing road image.
Further, in step S1, preprocessing the dataset specifically comprises: sequentially cropping the photographed original remote sensing images in the Massachusetts Roads dataset and the corresponding label images into 512×512-pixel pictures, and applying image enhancement techniques to obtain 12452 original remote sensing road pictures and corresponding label images;
the 512×512 dataset is then randomly divided into training and test sets at a ratio of 9:1.
In step S2, the steps of building the remote sensing image road segmentation network based on improved DeepLabV3+ are as follows:
s21, inputting the JPG original image of the remote sensing road image into the backbone network to obtain the high-level feature map output by its last layer and the low-level feature maps of its two intermediate layers;
s22, inputting the high-level feature map of the original input image output by the backbone network into the DS-ASPP module and the ECA module respectively to obtain a multi-scale feature map and a channel attention feature map, and fusing them to obtain the output feature map of the encoding module;
s23, performing the multi-level upsampling operation on the output feature map of the encoding module together with the low-level feature maps of the backbone network to obtain the output feature map of the multi-level upsampling module;
s24, inputting the feature map output by the encoding module into the decoding module to obtain an accurate remote sensing road segmentation map.
Further, the DS-ASPP module consists of five blocks: the first block is a convolution layer, the second, third and fourth blocks are depth-separable dilated convolution layers, and the fifth block consists of a convolution layer and a pooling layer, each block outputting a feature map; the convolution layers perform convolution with kernel size 1×1 and stride 1; the depth-separable dilated convolution layers perform convolution with kernel size 3×3 and dilation coefficients 3, 6 and 9; the pooling layer performs adaptive mean pooling;
the ECA module consists of global average pooling, a convolution layer and a Sigmoid activation function; the global average pooling is followed by a fast one-dimensional convolution with kernel size k; the convolution layer performs convolution with kernel size 1×1 and stride 1; the Sigmoid activation function performs normalization;
the high-level feature map of the original input image output by the backbone network passes through the DS-ASPP module, where the five block feature maps are compressed and fused through a 1×1 convolution to obtain the global feature map; in parallel, the same high-level feature map undergoes global average pooling and a fast one-dimensional convolution with kernel size k, the channel attention information is obtained through a Sigmoid activation, and this channel information is dot-multiplied with the original input features to obtain the channel attention feature map; finally, the output global feature map and the output channel attention feature map are feature-fused through a 1×1 convolution layer to obtain the output feature map of the encoding module.
Further, the multi-level upsampling module takes the output feature map of the encoding module as input; it first performs one 2× upsampling operation and fuses the result with the low-level feature map at 1/8 resolution of the original input image output by the backbone network (after a 1×1 convolution) to obtain the first low-level feature; it then performs another 2× upsampling on the first low-level feature and fuses the result with the low-level feature map at 1/4 resolution of the original input image output by the backbone network (after a 1×1 convolution) to obtain the output feature map of the multi-level upsampling module, namely the second low-level feature.
Further, the decoding module performs the multi-level upsampling operation on the feature map output by the encoding module to obtain the second low-level feature, applies a 1×1 convolution and two 3×3 depth-separable convolutions to the second low-level feature to restore high-level spatial information, adjusts the number of channels with a 1×1 convolution, and performs a 4× bilinear-interpolation upsampling to output the accurate segmentation map of the remote sensing image.
In step S3, the parameters of the remote sensing image semantic segmentation network are randomly initialized, the training-set and verification-set data are input into the remote sensing image road segmentation network to generate a semantic segmentation probability map of the remote sensing image, and the cross entropy loss is calculated as

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log y_i' + (1 - y_i)\log(1 - y_i')\right]$$

where $y_i$ is the true label of pixel i, $y_i'$ is the predicted label of pixel i, and N is the number of pixels in the image.
Compared with the prior art, the invention has the following remarkable effects:
1. The invention introduces MobileNetV3 and the ECA attention mechanism into the backbone network. Compared with other networks, MobileNetV3 has fewer parameters, and the ECA attention mechanism extracts image details while reducing the influence of the background on target features. Adding the ECA attention mechanism to the MobileNetV3 backbone optimizes the time-consuming layers and reduces the parameter count and the amount of computation, meeting the lightweight requirement; at the same time, it better captures the correlated information between channels, makes the model attend to continuous road feature information and suppresses background interference, so the model is more comprehensive in high-level feature extraction and more robust, effectively alleviating the difficulty of distinguishing roads from the background under object occlusion or in similar scenes;
2. In the decoding process, multi-level upsampling is adopted: the high-level semantic features output by the DS-ASPP module undergo two 2× upsampling operations. In the feature fusion process, the summation operation keeps the number of channels unchanged while reducing the number of parameters. Moreover, since low-level features carry more position and boundary information, compensating the high-level semantic features of the decoder with them strengthens the connection between the encoder and the decoder, attends more to local information and contextual correlation, and retains more position and boundary information of the targets. This effectively improves the model's ability to segment small target objects in the image, alleviates the difficulty of distinguishing blurred boundaries and shadow occlusion in road segmentation, and ensures the accuracy and efficiency of road segmentation;
3. The depth-separable dilated convolution DS-ASPP module is adopted: the depth-separable dilated convolutions are tuned with different dilation coefficients to extract richer features at different scales and further enlarge the receptive field, so that the network better understands the contextual information of the input image and its detection and recognition of multi-scale targets are enhanced; this effectively relieves the information loss on small targets while reducing the model's parameter count and improving the segmentation speed, as quantified below.
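To make the receptive-field claim concrete, the standard effective-kernel-size relation for dilated convolution can be applied to the three DS-ASPP branches (a worked illustration, not text from the patent): a k×k kernel with dilation coefficient r spans as much of the input as an ordinary kernel of size

$$k_{\mathrm{eff}} = k + (k-1)(r-1), \qquad k=3:\quad r=3 \Rightarrow k_{\mathrm{eff}}=7,\quad r=6 \Rightarrow k_{\mathrm{eff}}=13,\quad r=9 \Rightarrow k_{\mathrm{eff}}=19,$$

so the three parallel branches see progressively wider context at the same parameter cost as a plain 3×3 convolution.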
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a diagram of a model structure of the present invention;
FIG. 3 is a block diagram of a backbone network of the present invention;
FIG. 4 is a block diagram of a coding fusion module of the present invention;
FIG. 5 is a block diagram of a multi-level upsampling module of the present invention;
FIG. 6 is a block diagram of the decoding module of the present invention;
FIG. 7 is a graph comparing the segmentation results of the present invention with PSPNet, SegNet, U-Net and DeepLabV3+.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
The remote sensing image road segmentation method based on improved DeepLabV3+ provided in this embodiment, as shown in FIGS. 1 and 2, comprises the following steps:
step one, acquiring Massachusetts Roads remote sensing road image data sets, wherein the data sets comprise photographed remote sensing image original pictures and corresponding label pictures, preprocessing the data sets and dividing the data sets into training sets and testing sets according to set proportions.
The acquisition of the data set is specifically: acquiring Massachusetts Roads remote sensing road data sets, wherein each remote sensing road data set comprises a photographed remote sensing image original image and a corresponding label image, and the size of each image is 1500 multiplied by 1500 pixels;
the preprocessing of the data set is specifically as follows: the original image of the photographed remote sensing image in the Massachusetts Roads dataset and the corresponding label image are cut into pictures with the size of 512 multiplied by 512 pixels at the same time in sequence, so that the overlapping of areas is avoided; and (3) adopting various image enhancement technologies, such as overturning, translation, scaling and shearing, and finally obtaining 12452 remote sensing road original pictures and corresponding label images.
The dividing data sets are specifically as follows: a 512 x 512 size data set was randomly written according to 9: the ratio of 1 is divided into a training set and a testing set.
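As a concrete illustration of this preprocessing, the following is a minimal Python sketch of the tiling and 9:1 split, assuming non-overlapping 512×512 crops (the remainder at the right and bottom edges of each 1500×1500 image is discarded) and hypothetical directory names; the augmentation step is omitted.

```python
import random
from pathlib import Path

from PIL import Image

TILE = 512

def tile_image(path: Path, out_dir: Path) -> list[Path]:
    """Crop one 1500x1500 source image into non-overlapping 512x512 tiles."""
    out_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(path)
    w, h = img.size
    tiles: list[Path] = []
    for top in range(0, h - TILE + 1, TILE):        # rows of the tile grid
        for left in range(0, w - TILE + 1, TILE):   # columns of the tile grid
            dst = out_dir / f"{path.stem}_{top}_{left}.png"
            img.crop((left, top, left + TILE, top + TILE)).save(dst)
            tiles.append(dst)
    return tiles

# Tile every source image (directory names are assumptions); label images would
# be tiled the same way so that image and label crops stay aligned.
tiles: list[Path] = []
for src in sorted(Path("massachusetts/images").glob("*.tiff")):
    tiles += tile_image(src, Path("tiles/images"))

# Randomly divide the tiles into training and test sets at a 9:1 ratio.
random.seed(0)
random.shuffle(tiles)
cut = int(len(tiles) * 0.9)
train_set, test_set = tiles[:cut], tiles[cut:]
```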
Step two, building the remote sensing image road segmentation network based on improved DeepLabV3+, which comprises a backbone network, an ECA module, a DS-ASPP module, a multi-level upsampling module and a decoding module; the backbone network, the ECA module and the DS-ASPP module form the encoding module of the remote sensing road image semantic segmentation network.
The steps of building the remote sensing image road segmentation network based on improved DeepLabV3+ are as follows:
and step 21, inputting the JPG original image of the remote sensing road image into a main network to respectively obtain a high-level characteristic image output by the last layer of the main network and a low-level characteristic image of the middle two layers.
The backbone network MobileNetV3 mainly comprises three layers, and realizes the layered feature extraction of the image from the low-level detail feature to the high-level semantic feature through convolution and pooling operation, specifically comprises the following steps: firstly, carrying out standard convolution operation on a first layer of a MobileNet V3 on a JPG original image, wherein the first layer is provided with 16 convolution filters, the second layer is stacked with 15 bnck layers, an ECA module and an H-swish (HS) activation function are introduced to improve model precision, a characteristic image with 1/2 resolution of an original input image is output, and finally, a layer carries out 1X 1 convolution operation on the characteristic image with 1/2 resolution, outputs the characteristic image with 1/4 resolution of the original input image, carries out soft pooling operation and 1X 1 convolution operation, outputs the characteristic image with 1/8 resolution of the original input image, carries out 1X 1 convolution operation, and outputs the characteristic image with 1/16 resolution of the original input image.
Step 22, inputting the high-level feature map at 1/16 resolution of the original input image output by the backbone network in step 21 into the DS-ASPP module and the ECA module respectively to obtain a multi-scale feature map and a channel attention feature map, which are then fused to obtain the output feature map of the encoding module.
The DS-ASPP module consists of five blocks implemented with standard convolution, depth-separable dilated convolution and pooling operations, and the ECA module extracts features using global average pooling and convolution. Specifically: in the DS-ASPP module, the 1/16-resolution high-level feature map undergoes a 1×1 convolution to output the first block feature map, a 3×3 depth-separable dilated convolution with dilation coefficient 3 to output the second block feature map, a 3×3 depth-separable dilated convolution with dilation coefficient 6 to output the third block feature map, a 3×3 depth-separable dilated convolution with dilation coefficient 9 to output the fourth block feature map, and a 1×1 convolution plus pooling to output the fifth block feature map; the five block feature maps are feature-fused through a 1×1 convolution to obtain the global feature map. In parallel, the 1/16-resolution high-level feature map is input into the ECA attention mechanism: global average pooling extracts the channel statistics, a fast one-dimensional convolution with kernel size k followed by a Sigmoid activation yields the channel attention information, and this channel information is dot-multiplied with the original input features to obtain the channel attention feature map. Finally, the global feature map and the channel attention feature map are feature-fused through a 1×1 convolution layer to obtain the output feature map of the encoding module.
Step 23, performing the multi-level upsampling operation on the output feature map of the encoding module obtained in step 22 together with the low-level feature maps of the backbone network to obtain the output feature map of the multi-level upsampling module.
The multi-level upsampling module performs two successive 2× upsampling operations on the encoder output and strengthens the connection between the encoder and the decoder. Specifically: the output feature map of the encoding module is first upsampled by 2× and fused with the 1/8-resolution low-level feature map of the original input image output by the backbone network (after a 1×1 convolution) to obtain the first low-level feature; the first low-level feature is then upsampled by 2× and fused with the 1/4-resolution low-level feature map of the original input image output by the backbone network (after a 1×1 convolution) to obtain the output feature map of the multi-level upsampling module.
Step 24, inputting the feature map output by the encoding module into the decoding module to obtain an accurate remote sensing road segmentation map.
The decoding module realizes accurate semantic segmentation of the remote sensing road image through the multi-level upsampling module of step 23 together with depth-separable convolution and upsampling operations. Specifically: the feature map output by the encoding module undergoes the multi-level upsampling operation to obtain the second low-level feature; a 1×1 convolution and two 3×3 depth-separable convolutions are applied to the second low-level feature to restore high-level spatial information; the number of channels is adjusted with a 1×1 convolution; and a 4× bilinear-interpolation upsampling yields the accurate segmentation map of the output remote sensing image.
The structure of the backbone network provided in this embodiment is shown in FIG. 3. The backbone is a modified MobileNetV3 composed of three layers; it has fewer parameters while avoiding, as far as possible, the loss of useful information, and a lightweight ECA attention module is newly added, so as to perform multi-scale, multi-level feature extraction on the remote sensing road image.
The first layer of the backbone network is a convolution layer; the second layer stacks 15 bneck layers; the third layer consists of a standard convolution layer, a soft pooling layer and two standard convolution layers without BN normalization. The first-layer convolution uses a kernel size of 3×3 and a stride of 2, with 16 convolution filters. A bneck layer comprises a convolution with kernel size 3×3 or 5×5 and stride 1 or 2, the ECA attention mechanism, and the H-swish (HS) activation function. The standard convolution layers use a kernel size of 1×1 and a stride of 1; the soft pooling layer performs soft pooling with a stride of 1.
In the first layer of the backbone network of this embodiment, the JPG original remote sensing road image is taken as input and preprocessed; feature extraction and dimensionality reduction are performed through a 3×3 convolution with stride 2 and the H-swish activation function, yielding the output feature map of the first layer.
In the second layer, 15 bneck layers are stacked, taking the feature map output by the first layer as input. The feature map first passes through three 3×3 bneck convolutions with ReLU activation and strides 1, 2 and 1, then through three 5×5 bneck convolutions with ReLU activation and strides 2, 1 and 1 that use the ECA attention mechanism; next through one 3×3 bneck convolution with H-swish activation and stride 2 followed by five 3×3 bneck convolutions with H-swish activation and stride 1, with the ECA attention mechanism added in two of these bneck layers; and finally through three 5×5 bneck convolutions with H-swish activation and strides 2, 1 and 1, with the ECA attention mechanism added, obtaining a feature map at 1/2 the resolution of the original input image.
In the third layer, the 1/2-resolution feature map is taken as input. It first passes through a 1×1 convolution with the H-swish activation function and stride 1 together with a BN normalization, outputting a feature map at 1/4 resolution of the original input image; the 1/4-resolution feature map then undergoes a soft pooling operation with stride 1 and a 1×1 convolution with the H-swish activation function and stride 1, outputting a feature map at 1/8 resolution; finally, the 1/8-resolution feature map undergoes a 1×1 convolution with the H-swish activation function and stride 1, outputting the feature map at 1/16 resolution of the original input image, i.e., the final high-level feature map of the backbone network.
The structure of the encoding fusion module provided in this embodiment is shown in FIG. 4. The encoding fusion module mainly comprises a DS-ASPP module and an ECA module, the DS-ASPP module consisting of five blocks.
The first block of the DS-ASPP module is a convolution layer; the second, third and fourth blocks are depth-separable dilated convolution layers; the fifth block consists of a convolution layer and a pooling layer. The convolution layers perform convolution with kernel size 1×1 and stride 1; the depth-separable dilated convolution layers perform convolution with kernel size 3×3 and dilation coefficients 3, 6 and 9; the pooling layer performs adaptive mean pooling. The first block applies a 1×1 convolution to the 1/16-resolution high-level feature map and outputs the first block feature map; the second block applies a 3×3 depth-separable dilated convolution with dilation coefficient 3 and outputs the second block feature map; the third block applies a 3×3 depth-separable dilated convolution with dilation coefficient 6 and outputs the third block feature map; the fourth block applies a 3×3 depth-separable dilated convolution with dilation coefficient 9 and outputs the fourth block feature map; the fifth block applies a 1×1 convolution and a pooling operation and outputs the fifth block feature map. The ECA module consists of global average pooling (GAP), a convolution layer and a Sigmoid activation function; the global average pooling is followed by a fast one-dimensional convolution with kernel size k; the convolution layer performs convolution with kernel size 1×1 and stride 1; the Sigmoid activation function performs normalization.
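The following is a minimal PyTorch sketch of a DS-ASPP module matching this description: a 1×1 convolution block, three 3×3 depth-separable dilated convolution blocks with dilation coefficients 3, 6 and 9, and a pooling-plus-convolution block, fused by a 1×1 convolution. The channel sizes (960 in, 256 out) and the BatchNorm/ReLU placement inside each branch are assumptions, not values stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSDilatedConv(nn.Module):
    """3x3 depth-separable dilated convolution: depthwise conv + 1x1 pointwise conv."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class DSASPP(nn.Module):
    def __init__(self, in_ch: int = 960, out_ch: int = 256):
        super().__init__()
        self.block1 = nn.Conv2d(in_ch, out_ch, 1)              # 1x1 convolution block
        self.block2 = DSDilatedConv(in_ch, out_ch, dilation=3)
        self.block3 = DSDilatedConv(in_ch, out_ch, dilation=6)
        self.block4 = DSDilatedConv(in_ch, out_ch, dilation=9)
        self.block5 = nn.Sequential(                           # adaptive mean pooling + conv block
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.fuse = nn.Conv2d(5 * out_ch, out_ch, 1)           # fuse the five block feature maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        pooled = F.interpolate(self.block5(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        feats = [self.block1(x), self.block2(x), self.block3(x),
                 self.block4(x), pooled]
        return self.fuse(torch.cat(feats, dim=1))
```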
In the encoding fusion module provided by this embodiment, the high-level feature map at 1/16 resolution of the original input image output by the backbone network first passes through the DS-ASPP module, where the five block feature maps are compressed and fused through a 1×1 convolution to obtain the global feature map; in parallel, the same high-level feature map undergoes global average pooling and a fast one-dimensional convolution with kernel size k, the channel attention information is obtained through a Sigmoid activation, and this channel information is dot-multiplied with the original input features to obtain the channel attention feature map; finally, the output global feature map and the output channel attention feature map are feature-fused through a 1×1 convolution layer to obtain the output feature map of the encoding module.
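A sketch of the ECA channel attention just described, following the published ECA-Net design (global average pooling, a fast one-dimensional convolution with kernel size k across channels, a Sigmoid, and a dot multiplication with the input features); the kernel size k=3 is an assumption.

```python
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient Channel Attention: GAP -> fast 1D conv (kernel k) -> Sigmoid -> rescale."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.gap(x).view(b, 1, c)         # (B, 1, C): one pooled value per channel
        y = self.conv(y)                      # fast 1D convolution across channels
        y = self.sigmoid(y).view(b, c, 1, 1)  # per-channel attention weights
        return x * y                          # dot-multiply with the input features
```

The output feature map of the encoding module would then be obtained by concatenating `DSASPP(x)` with `ECALayer()(x)` and applying a 1×1 convolution, mirroring the fusion described above.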
The structure of the multi-level upsampling module provided in this embodiment is shown in FIG. 5. The module takes the output feature map of the encoding module of FIG. 4 as input; it first performs one 2× upsampling and fuses the result with the 1/8-resolution low-level feature map of the backbone network (after a 1×1 convolution) to obtain the first low-level feature; it then performs another 2× upsampling on the first low-level feature and fuses the result with the 1/4-resolution low-level feature map of the backbone network (after a 1×1 convolution) to obtain the output feature map of the multi-level upsampling module.
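A sketch of the multi-level upsampling module under these assumptions: 2× bilinear upsampling, fusion by element-wise summation (which, as noted in the advantages, keeps the channel count unchanged), and hypothetical channel sizes for the encoder output and the two low-level backbone feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelUpsample(nn.Module):
    def __init__(self, enc_ch: int = 256, low8_ch: int = 40, low4_ch: int = 24):
        super().__init__()
        self.proj8 = nn.Conv2d(low8_ch, enc_ch, 1)  # 1x1 conv on the 1/8-resolution map
        self.proj4 = nn.Conv2d(low4_ch, enc_ch, 1)  # 1x1 conv on the 1/4-resolution map

    def forward(self, enc: torch.Tensor, low8: torch.Tensor,
                low4: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(enc, scale_factor=2, mode="bilinear", align_corners=False)
        x = x + self.proj8(low8)  # first low-level feature (1/16 -> 1/8 fusion)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = x + self.proj4(low4)  # second low-level feature (1/8 -> 1/4 fusion)
        return x
```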
The structure of the decoding module provided in this embodiment is shown in FIG. 6. The decoding module mainly comprises the multi-level upsampling module of FIG. 5, a convolution layer, a depth-separable convolution layer and an upsampling layer.
The convolution layer performs convolution with kernel size 1×1 and stride 1; the depth-separable convolution layer performs depthwise and pointwise convolution with kernel size 3×3; the upsampling layer performs a 4× bilinear-interpolation upsampling.
The decoding module provided in this embodiment performs the multi-level upsampling operation on the feature map output by the encoding module to obtain the second low-level feature, then applies a 1×1 convolution and two 3×3 depth-separable convolutions to the second low-level feature to restore high-level spatial information, and finally adjusts the number of channels with a 1×1 convolution and performs a 4× bilinear-interpolation upsampling to obtain the accurate segmentation map of the output remote sensing image.
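A sketch of this decoding path (a 1×1 convolution, two 3×3 depth-separable convolutions, a channel-adjusting 1×1 convolution, and a 4× bilinear upsampling); the working channel count and the single-channel road/background output are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ds_conv(ch: int) -> nn.Sequential:
    """3x3 depth-separable convolution block: depthwise conv + 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
    )

class Decoder(nn.Module):
    def __init__(self, in_ch: int = 256, num_classes: int = 1):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 1),  # 1x1 convolution on the second low-level feature
            ds_conv(in_ch),              # first 3x3 depth-separable convolution
            ds_conv(in_ch),              # second 3x3 depth-separable convolution
        )
        self.classify = nn.Conv2d(in_ch, num_classes, 1)  # adjust the channel number

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.classify(self.refine(x))
        # 4x bilinear-interpolation upsampling back to the input resolution (1/4 -> 1/1)
        return F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
```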
Step three, inputting the preprocessed training-set remote sensing road images from step one into the remote sensing road image semantic segmentation network of step two for training, calculating the loss function, performing back propagation and updating the network parameters to obtain the optimal parameter model. This specifically comprises the following steps:
Step 31, randomly initializing the parameters of the remote sensing image semantic segmentation network, inputting the preprocessed training-set and verification-set data from step one into the remote sensing image semantic segmentation network based on the encoding-decoding structure of step two, generating the semantic segmentation probability map of the remote sensing image, and calculating the cross entropy loss.
The loss function used in training the remote sensing image semantic segmentation network is the cross entropy loss, calculated as

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log y_i' + (1 - y_i)\log(1 - y_i')\right]$$

where $y_i$ is the true label of pixel i, $y_i'$ is the predicted label of pixel i, and N is the number of pixels in the image.
Step 32, back-propagating the loss and updating the network parameters, taking minimization of the loss function as the optimization target, then obtaining and saving the optimal parameter model.
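A minimal training-step sketch for steps 31 and 32, assuming a binary road mask, a network assembled from the modules sketched above and a standard PyTorch DataLoader; the optimizer choice and learning rate are assumptions, as the patent does not state them.

```python
import torch
import torch.nn as nn

def train_one_epoch(model: nn.Module,
                    loader: torch.utils.data.DataLoader,
                    optimizer: torch.optim.Optimizer) -> float:
    """One pass over the training set: forward, cross entropy, back propagation, update."""
    criterion = nn.BCEWithLogitsLoss()  # pixel-wise cross entropy over the road mask
    total = 0.0
    for images, labels in loader:
        logits = model(images)            # semantic segmentation map (logits)
        loss = criterion(logits, labels)  # averaged over the N pixels of each image
        optimizer.zero_grad()
        loss.backward()                   # back-propagate the loss
        optimizer.step()                  # update the network parameters
        total += loss.item()
    return total / max(len(loader), 1)
```

The optimal parameter model would then be kept by saving `model.state_dict()` whenever the epoch loss (or a validation metric) improves.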
Step four, inputting the preprocessed test set from step one into the optimal parameter model trained in step three, and outputting the accurate segmentation map of the remote sensing road image.
To demonstrate the effectiveness of the remote sensing image road segmentation method based on improved DeepLabV3+ provided by this embodiment, the model was trained and tested on the Massachusetts Roads remote sensing road dataset.
Table 1 compares the method of this embodiment with PSPNet, SegNet, U-Net and DeepLabV3+ on each index.
TABLE 1. Comparison of the results of each index of the invention with PSPNet, SegNet, U-Net and DeepLabV3+
Model | P/% | R/% | F1/% | IoU/% | Parameters/M
---|---|---|---|---|---
PSPNet | 82.95 | 76.59 | 81.69 | 76.14 | 5.80
SegNet | 89.26 | 82.46 | 85.73 | 78.26 | 46.24
U-Net | 91.74 | 84.27 | 87.85 | 80.87 | 60.70
DeepLabV3+ | 92.45 | 87.00 | 89.64 | 81.06 | 93.42
This embodiment | 93.71 | 87.49 | 90.49 | 83.71 | 55.57
As can be seen from Table 1, every evaluation index of this embodiment is higher than those of the existing segmentation networks. As shown in FIG. 7, comparing the segmentation results of the invention with PSPNet, SegNet, U-Net and DeepLabV3+, the segmentation results of the invention are closest to the original image.
Claims (7)
1. A remote sensing image road segmentation method based on improved DeepLabV3+, characterized by comprising the following steps:
s1, acquiring a remote sensing road image dataset, wherein the dataset comprises photographed original remote sensing images and corresponding label images, preprocessing the dataset, and dividing it into a training set and a test set according to a set proportion;
s2, constructing a remote sensing image road segmentation network based on improved DeepLabV3+, wherein the network comprises an encoding module, a multi-level upsampling module and a decoding module, and the encoding module comprises a backbone network, an ECA module and a DS-ASPP module;
the backbone network realizes hierarchical feature extraction of the image, from low-level detail features to high-level semantic features, through convolution and pooling operations; the high-level feature map of the original input image output by the backbone network is input into the DS-ASPP module and the ECA module respectively to obtain a multi-scale feature map and a channel attention feature map, which are then fused to obtain the output feature map of the encoding module; the feature map output by the encoding module is input into the decoding module to obtain an accurate remote sensing road segmentation map;
s3, inputting the training set obtained in step S1 into the remote sensing road image semantic segmentation network for training, calculating the loss function, performing back propagation and updating the network parameters to obtain the optimal parameter model;
s4, inputting the test set obtained in step S1 into the optimal parameter model trained in step S3, and outputting an accurate segmentation map of the remote sensing road image.
2. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that in step S1, the preprocessing of the dataset specifically comprises: sequentially cropping the photographed original remote sensing images in the Massachusetts Roads dataset and the corresponding label images into 512×512-pixel pictures, and applying image enhancement techniques to obtain 12452 original remote sensing road pictures and corresponding label images;
the 512×512 dataset is then randomly divided into training and test sets at a ratio of 9:1.
3. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that in step S2, the steps of building the remote sensing image road segmentation network based on improved DeepLabV3+ are as follows:
s21, inputting the JPG original image of the remote sensing road image into the backbone network to obtain the high-level feature map output by its last layer and the low-level feature maps of its two intermediate layers;
s22, inputting the high-level feature map of the original input image output by the backbone network into the DS-ASPP module and the ECA module respectively to obtain a multi-scale feature map and a channel attention feature map, and fusing them to obtain the output feature map of the encoding module;
s23, performing the multi-level upsampling operation on the output feature map of the encoding module together with the low-level feature maps of the backbone network to obtain the output feature map of the multi-level upsampling module;
s24, inputting the feature map output by the encoding module into the decoding module to obtain an accurate remote sensing road segmentation map.
4. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that the DS-ASPP module consists of five blocks: the first block is a convolution layer, the second, third and fourth blocks are depth-separable dilated convolution layers, and the fifth block consists of a convolution layer and a pooling layer, each block outputting a feature map; the convolution layers perform convolution with kernel size 1×1 and stride 1; the depth-separable dilated convolution layers perform convolution with kernel size 3×3 and dilation coefficients 3, 6 and 9; the pooling layer performs adaptive mean pooling;
the ECA module consists of global average pooling, a convolution layer and a Sigmoid activation function; the global average pooling is followed by a fast one-dimensional convolution with kernel size k; the convolution layer performs convolution with kernel size 1×1 and stride 1; the Sigmoid activation function performs normalization;
the high-level feature map of the original input image output by the backbone network passes through the DS-ASPP module, where the five block feature maps are compressed and fused through a 1×1 convolution to obtain the global feature map; in parallel, the same high-level feature map undergoes global average pooling and a fast one-dimensional convolution with kernel size k, the channel attention information is obtained through a Sigmoid activation, and this channel information is dot-multiplied with the original input features to obtain the channel attention feature map; finally, the output global feature map and the output channel attention feature map are feature-fused through a 1×1 convolution layer to obtain the output feature map of the encoding module.
5. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that the multi-level upsampling module takes the output feature map of the encoding module as input; it first performs one 2× upsampling operation and fuses the result with the low-level feature map at 1/8 resolution of the original input image output by the backbone network (after a 1×1 convolution) to obtain the first low-level feature; it then performs another 2× upsampling on the first low-level feature and fuses the result with the low-level feature map at 1/4 resolution of the original input image output by the backbone network (after a 1×1 convolution) to obtain the output feature map of the multi-level upsampling module, namely the second low-level feature.
6. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that the decoding module performs the multi-level upsampling operation on the feature map output by the encoding module to obtain the second low-level feature, applies a 1×1 convolution and two 3×3 depth-separable convolutions to the second low-level feature to restore high-level spatial information, adjusts the number of channels with a 1×1 convolution, and performs a 4× bilinear-interpolation upsampling to output the accurate segmentation map of the remote sensing image.
7. The remote sensing image road segmentation method based on improved DeepLabV3+ according to claim 1, characterized in that in step S3, the parameters of the remote sensing image semantic segmentation network are randomly initialized, the training-set and verification-set data are input into the remote sensing image road segmentation network to generate a semantic segmentation probability map of the remote sensing image, and the cross entropy loss is calculated as

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log y_i' + (1 - y_i)\log(1 - y_i')\right]$$

where $y_i$ is the true label of pixel i, $y_i'$ is the predicted label of pixel i, and N is the number of pixels in the image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311418100.1A | 2023-10-30 | 2023-10-30 | Remote sensing image road segmentation method based on improved DeepLabV3+
Publications (1)
Publication Number | Publication Date
---|---
CN117351372A | 2024-01-05
Family
- ID=89369097
- Family Applications (1): CN202311418100.1A (pending), published as CN117351372A, filed 2023-10-30
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN118196435A | 2024-03-21 | 2024-06-14 | Anhui University (安徽大学) | On-site bare footprint extraction and analysis system based on semantic segmentation
CN117994279A | 2024-04-07 | 2024-05-07 | Qilu University of Technology (齐鲁工业大学, Shandong Academy of Sciences) | Method for extracting closed contour of comprehensive feature fusion
CN117994279B | 2024-04-07 | 2024-06-07 | Qilu University of Technology (齐鲁工业大学, Shandong Academy of Sciences) | Method for extracting closed contour of comprehensive feature fusion
CN118429808A | 2024-05-10 | 2024-08-02 | Beijing Information Science and Technology University (北京信息科技大学) | Remote sensing image road extraction method and system based on lightweight network structure
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination