CN117152435A - Remote sensing semantic segmentation method based on U-Net3+ - Google Patents
- Publication number: CN117152435A (application number CN202311135160.2A)
- Authority: CN
- Country: China
- Prior art keywords: net3, segmentation, remote sensing, network model, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
A remote sensing semantic segmentation method based on U-Net3+ relates to the field of remote sensing image processing and comprises the following steps: data acquisition and preprocessing; constructing a U-Net3+ segmentation network model and adding a multi-scale feature extraction module and an attention mechanism to the constructed model; constructing an improved mixed loss function and applying it to the constructed U-Net3+ segmentation network model; passing the preprocessed data to the constructed U-Net3+ segmentation network model for model training; performing semantic segmentation on remote sensing images with the trained U-Net3+ segmentation network model and verifying its segmentation effect; and post-processing the images. The invention improves the accuracy of remote sensing semantic segmentation, reduces the complexity of the network model, and lowers the computational cost, which facilitates subsequent deployment.
Description
Technical Field
The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing semantic segmentation method based on U-Net3+.
Background
Remote sensing is the science and technology of acquiring information about the earth's surface and atmosphere from satellites, aircraft, or other remote sensing equipment. These devices collect electromagnetic radiation data in different wavebands (e.g., visible, infrared, and microwave) and convert it into digital images or other data formats for studying, monitoring, measuring, and managing the natural and artificial features of the earth's surface. Remote sensing is widely used in fields such as industry, agriculture, forestry, and the military. In urban planning, it helps characterize land use, traffic flow, and other conditions, providing important references for planning. In resource exploration, it can discover new mineral resources, oil and gas fields, and the like, promoting effective development and use of resources. In land monitoring, it can be used to monitor and evaluate crop growth, forest coverage, and so on. In the military field, it is often used for intelligence acquisition, target positioning and identification, terrain analysis and mapping, and mission planning.
In recent years remote sensing technology has developed rapidly, but remote sensing image processing has lagged behind. With ever-growing data volumes and diverse image categories, intelligently and efficiently extracting valuable information is one of the key problems the remote sensing field urgently needs to solve. Accurate segmentation of remote sensing images enables basic mapping functions such as land-cover information extraction and environmental change detection. Traditional remote sensing image segmentation methods are affected by many factors, including image quality, illumination, and occlusion, and their segmentation accuracy is poor. With the rapid development of deep learning, in particular convolutional neural networks, semantic segmentation has made significant progress in remote sensing image processing.
The U-Net network achieves remarkable results in image segmentation tasks through its distinctive encoder-decoder structure: the encoder extracts features, the decoder performs pixel-level reconstruction, and skip connections combine the high-level features from the encoder with the low-level features in the decoder, so the network can exploit multi-level feature information and achieve more accurate segmentation. U-Net++ uses nested, dense long connections to capture and superimpose features from different levels, reducing the semantic gap between encoder and decoder. U-Net3+ uses full-scale skip connections to fuse large-scale, same-scale, and small-scale features of the encoder and decoder, obtaining rich low-level and high-level semantic features. However, because U-Net, U-Net++, and U-Net3+ all splice and fuse low-level and high-level semantic features, a large amount of redundant information is generated and the network cannot focus well on the segmentation target; at the same time, multi-level feature fusion occupies substantial computing resources, which hinders deployment of the algorithm.
Existing remote sensing semantic segmentation has the following shortcomings:
1) Low segmentation accuracy: remote sensing images have complex content and a large imaging range; textures are intricate, ground objects take variable geometric forms, different ground objects are distributed in a tangled manner, and their boundaries are easily confused. In addition, remote sensing images are rich in ground-object content with variable scales, large scale spans, and differences in color tendency, all of which make segmentation difficult.
2) Class imbalance: the pixel counts of different ground-object classes in a remote sensing image often differ greatly, so minority classes cannot be trained sufficiently, which harms recognition accuracy.
3) Heavy computation and high resource usage: when the U-Net3+ segmentation network is applied to remote sensing semantic segmentation, its computational load is large and it occupies considerable resources; meanwhile, although splicing feature maps of different scales makes full use of feature information, simple splicing also stacks useless information from every encoder level, causing information redundancy and preventing the network from focusing on the segmentation target.
Disclosure of Invention
To solve the problems of low segmentation accuracy, class imbalance, heavy computation, and high resource usage in existing remote sensing semantic segmentation, the invention provides a remote sensing semantic segmentation method based on U-Net3+.
The technical scheme adopted by the invention to solve these problems is as follows:
the invention discloses a remote sensing semantic segmentation method based on U-Net3+, which comprises the following steps:
step one, data acquisition and preprocessing;
step two, constructing a U-Net3+ segmentation network model, and adding a multi-scale feature extraction module and an attention mechanism to the constructed U-Net3+ segmentation network model;
step three, constructing an improved mixed loss function, and applying it to the constructed U-Net3+ segmentation network model;
step four, the preprocessed data is transmitted to the constructed U-Net3+ segmentation network model to carry out model training;
fifthly, performing semantic segmentation on the remote sensing image by using the trained U-Net3+ segmentation network model, and verifying the segmentation effect of the U-Net3+ segmentation network model;
and step six, image post-processing.
Further, in step one, the acquired data come from the remote sensing semantic segmentation dataset GID-5. The dataset comprises a number of pictures, which are divided into a training set, a verification set, and a test set in the proportion 25:1:4.
In step one, the data are preprocessed: each picture is first sliced into tiles, under-sized tiles are padded with background, and the experimental label images are converted to gray-scale images.
Further, the specific operation flow of step two is as follows:
S2.1, building the U-Net3+ segmentation network, with the network depth reduced from 5 levels to 4;
S2.2, constructing the multi-scale feature extraction module; it contains a multi-scale convolution attention module divided into three parts: the first part is a 5×5 depthwise convolution for capturing local feature information; the second part consists of multiple branches of different hybrid dilated convolutions for extracting multi-scale feature information; the third part is a 1×1 convolution responsible for mixing channels, after which the input features are multiplied element-wise with the convolved weights to obtain the required output;
S2.3, combining a residual module with the CBAM attention mechanism, and adding the resulting residual CBAM attention module to the feature-fusion stage of each network level, so that the network focuses on important information.
Further, in step three, the mixed loss function combines Log Cosh Dice Loss, a variant of the Dice loss, with the Focal loss.
The Dice loss is defined as:
L_Dice = 1 − 2|X∩Y| / (|X| + |Y|)
where X denotes the model's prediction for the target image and Y denotes the true label of the target image.
The variant Log Cosh Dice Loss is defined as:
L_log-cosh-Dice = log(cosh(L_Dice))
where cosh is defined as:
cosh(x) = (e^x + e^(−x)) / 2
The Focal loss is defined as:
L_Focal = −α_t (1 − p_t)^γ log(p_t)
where α_t is a balance factor used to balance the importance of positive and negative samples, and γ is set to 2;
p_t = p if y = 1, and p_t = 1 − p otherwise,
where p is the predicted probability that the sample belongs to class 1 (in the range 0 to 1) and y is the label.
The final mixed loss function is defined as the sum of the two terms:
L_mix = L_log-cosh-Dice + L_Focal
further, the specific operation flow of the fourth step is as follows: inputting the preprocessed picture in the first step into a constructed U-Net3+ segmentation network, updating the network parameter weight, and verifying by using the picture of a verification set to obtain a network segmentation effect, and continuously storing a better network model to obtain the optimal network model.
In the fifth step, the test set picture is input into the optimal network model to obtain a segmentation effect picture, and the segmentation effect picture is compared with the experimental label picture to obtain segmentation accuracy.
In the sixth step, the segmentation effect map obtained in the fifth step is a gray level map, the segmentation effect map obtained in the fifth step is spliced according to the position of the first slice, the background filled in the first step is removed, and the segmentation effect map is mapped into the original picture according to the positions of the gray level maps to obtain the final segmentation map.
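As a minimal sketch of the post-processing in step six (the 512-pixel tile size, zero-valued background padding, and row-major slice ordering are assumptions, since the claim does not fix them), the predicted tiles could be stitched back and the padding cropped like this:

```python
import numpy as np

def stitch_tiles(patches, grid, out_shape, tile=512):
    """Reassemble per-tile segmentation maps according to their slice
    position, then crop away the background padding added during
    preprocessing so the result matches the original picture size."""
    rows, cols = grid
    canvas = np.zeros((rows * tile, cols * tile), dtype=patches[0].dtype)
    for i, patch in enumerate(patches):
        r, c = divmod(i, cols)  # row-major slice order assumed
        canvas[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = patch
    return canvas[:out_shape[0], :out_shape[1]]
```

Mapping the cropped gray-scale result back onto the original picture then only requires the slice origin recorded during preprocessing.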
The beneficial effects of the invention are as follows:
the remote sensing semantic segmentation method based on U-Net3+ solves the problems that a plurality of current remote sensing semantic segmentation networks are too complex, the model calculation amount is large, the remote sensing image segmentation precision is low, the data category is unbalanced and the occupied resources are high. The invention mainly aims to allocate each pixel in the remote sensing image to different semantic categories so as to realize fine classification and segmentation of the surface features. The method aims at converting the complex and changeable remote sensing image into pixel-level semantic information so as to further understand and analyze various objects, landforms and environments on the ground surface. According to the invention, through the improvement of a segmentation network, a multi-scale feature extraction module is introduced to extract image features from multi-scale directions; simultaneously introducing a residual CBAM attention module to make the network focus on the target area; in addition, the invention introduces a new loss function to balance the category in the sample, overcomes the problem of unbalanced category and improves the segmentation precision of remote sensing semantic segmentation. According to the invention, the network model is subjected to light weight treatment, the network parameter calculation amount and the network complexity are relatively low, the occupied calculation resources are less, and the subsequent arrangement and implementation are facilitated.
Drawings
FIG. 1 is a flow chart of a remote sensing semantic segmentation method based on U-Net3+.
Fig. 2 is a schematic structural diagram of the constructed U-Net3+ segmentation network.
Fig. 3 is a schematic structural diagram of the multi-scale feature extraction module.
Fig. 4 is a diagram of the full-scale skip connections used in the network.
Fig. 5 is a schematic diagram of the structure of the residual CBAM attention module.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly and completely described below in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the remote sensing semantic segmentation method based on U-Net3+ of the present invention accurately segments each ground-object class contained in a given remote sensing image according to its position, and mainly comprises the following steps:
step one, data acquisition and preprocessing;
step two, constructing a U-Net3+ segmentation network model, and adding a multi-scale feature extraction module and an attention mechanism to the constructed U-Net3+ segmentation network model;
step three, constructing an improved mixed loss function, and applying it to the constructed U-Net3+ segmentation network model;
step four, the preprocessed data is transmitted to the constructed U-Net3+ segmentation network model to carry out model training;
fifthly, performing semantic segmentation on the remote sensing image by using the trained U-Net3+ segmentation network model, and verifying the segmentation effect of the U-Net3+ segmentation network model;
and step six, image post-processing.
The specific operation flow of the remote sensing semantic segmentation method based on U-Net3+ of the invention is as follows:
step one, data acquisition and preprocessing;
s1.1: the invention uses the remote sensing semantic segmentation dataset GID-5 for algorithm training, the dataset GID-5 comprises 150 pictures, the sizes of the pictures are 6800 multiplied by 7200 (pixels), 125 pictures are selected manually as a training set, 5 pictures are taken as a verification set, and 20 pictures are taken as a test set.
S1.2: because of the limitation of the computer video memory, the complete picture cannot be input into a network, the picture is firstly sliced to be changed into a picture with the size of 512 multiplied by 512 (pixels), then the insufficient position in the picture is filled with the background, and meanwhile, the experimental label picture is converted into a gray scale picture.
Step two, constructing a U-Net3+ segmented network model, and adding a multi-scale feature extraction module and an attention mechanism into the constructed U-Net3+ segmented network model;
s2.1, a U-Net3+ split network is built, the network level is reduced from 5 layers to 4 layers, and the built U-Net3+ split network structure is shown in fig. 2.
As shown in fig. 2, 1 denotes the primary feature map, 2 the secondary feature map, 3 the tertiary feature map, 4 the quaternary feature map, 5, 7, and 9 intermediate feature maps, 6 the new tertiary feature map, 8 the new secondary feature map, and 10 the new primary feature map; 11, 12, 13, and 14 are the results of convolution operations on feature maps of different stages under deep supervision, 15 is the network's prediction result, and 16 is the input image.
The U-Net3+ segmentation network is divided into an encoding stage and a decoding stage and mainly comprises a multi-scale feature extraction module and a residual CBAM attention module; the latter is formed by connecting a residual module with the CBAM attention mechanism through a skip connection.
In the encoding (feature extraction) stage, a residual module first processes the input image 16 to obtain the primary feature map 1; the multi-scale feature extraction module then processes map 1 to obtain the secondary feature map 2; a residual module processes map 2 to obtain the tertiary feature map 3; and finally a residual module processes map 3 to obtain the quaternary feature map 4. Compared with the adjacent higher-level map, each lower-level feature map is halved in size and doubled in channel count.
In the decoding stage, starting from the tertiary feature map 3, the feature maps of the other levels are up-sampled or pooled to a common size, their channel information is fused, and the result is processed by a residual CBAM module to obtain a new feature map; repeating these operations yields, in turn, the new secondary feature map 8 and the new primary feature map 10.
Prediction results of different sizes are obtained from the feature maps of different levels by convolution. In the training stage these predictions are deeply supervised: each is up-sampled to the size of the input image before the loss is computed and the gradients are updated. In the test stage, the result produced from the primary feature map is used as the final prediction 15.
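The deep-supervision step brings each side prediction to the input resolution before computing the loss; a nearest-neighbour version can be sketched as below (the patent does not state the interpolation mode, so nearest-neighbour is an assumption):

```python
import numpy as np

def upsample_nearest(pred, factor):
    """Nearest-neighbour up-sampling of an (H, W) prediction map by an
    integer factor, bringing a deep-supervision head to input resolution."""
    return np.repeat(np.repeat(pred, factor, axis=0), factor, axis=1)
```

A head operating at 1/4 resolution, for example, would be up-sampled with factor 4 before its loss term is added.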
S2.2, the multi-scale feature extraction module is constructed. It contains a multi-scale convolution attention module divided into three parts, as shown in fig. 3: the first part is a 5×5 depthwise convolution for capturing local feature information; the second part consists of multiple branches of different hybrid dilated convolutions for extracting multi-scale feature information; the third part is a 1×1 convolution responsible for mixing channels, after which the input features are multiplied element-wise with the convolved weights to obtain the required output. Specifically, in the multi-scale feature extraction module the input features are first processed by a 1×1 convolution to reduce the channel count, then fed to the multi-scale convolution attention module to obtain the convolution attention; BN normalization and GELU activation are applied to the result, features are fully extracted by a 3×3 depthwise convolution, the channel count is expanded again, and finally the expanded feature matrix is added to the channel-expanded input features. The invention adopts this module in the encoding stage of the network, using several hybrid dilated convolutions to extract multi-scale feature information from the input image and make full use of context.
As shown in fig. 3, LFMSCA denotes the multi-scale feature extraction module and LMSCA the multi-scale convolution attention module, where (128, Din) denotes an input feature map with 128 channels; (128, 1×1, 32) a convolution with kernel size 1×1, 128 input channels, and 32 output channels; (d, 5×5) a depthwise convolution with kernel size 5×5; (32, 1×1, 256) a convolution with kernel size 1×1, 32 input channels, and 256 output channels; (128, 1×1, 256) a convolution with kernel size 1×1, 128 input channels, and 256 output channels; (256, Dout) an output feature map with 256 channels; and (3×3, r=1), (3×3, r=2), (3×3, r=3) dilated convolutions with kernel size 3×3 and dilation rates 1, 2, and 3.
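One reason for the three dilation rates in fig. 3 is that each branch sees a different context size. The effective kernel of a single 3×3 dilated convolution grows with the rate, which can be checked with a small helper (the formula k + (k−1)(r−1) is the standard receptive-field rule for dilated convolutions, not taken from the patent itself):

```python
def dilated_receptive_field(kernel, rate):
    """Effective kernel size of one dilated convolution:
    k + (k - 1) * (r - 1)."""
    return kernel + (kernel - 1) * (rate - 1)

# the three branches of the multi-scale convolution attention module
branch_fields = [dilated_receptive_field(3, r) for r in (1, 2, 3)]
```

So the three parallel branches cover 3×3, 5×5, and 7×7 contexts at the cost of a single 3×3 kernel each, which is what gives the module its multi-scale behaviour.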
S2.3, a residual module is combined with the CBAM attention mechanism, and the resulting residual CBAM attention module is added to the feature-fusion stage of each network level, so that the network focuses on important information, redundant information is effectively suppressed, and the expressive power of the network is improved. Adding the CBAM attention mechanism in the multi-scale feature-fusion stage makes the network concentrate on the regions to be segmented, improving segmentation accuracy. In addition, the invention reduces network complexity and parameter computation by reducing the number of network levels and the transition channels.
Specifically, as shown in fig. 4, the feature-fusion stage of each network level adopts full-scale skip connections, where X_En^1 denotes the first-layer encoder, X_En^2 the second-layer encoder, X_En^3 the third-layer encoder, and X_En^4 the fourth-layer encoder; Maxpooling(2), Maxpooling(4), and Maxpooling(8) denote max-pooling operations with down-sampling rates of 2, 4, and 8 respectively; and Conv denotes a convolution operation.
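Assuming channel-first (C, H, W) feature maps and power-of-two sizes, the full-scale resizing of fig. 4 (max-pool larger maps, up-sample smaller ones, then concatenate channels) might look like the sketch below; the Conv block of fig. 4 that unifies channel counts after concatenation is omitted:

```python
import numpy as np

def maxpool(f, s):
    """Max-pooling of a (C, H, W) map with down-sampling rate s."""
    c, h, w = f.shape
    return f.reshape(c, h // s, s, w // s, s).max(axis=(2, 4))

def upsample(f, s):
    """Nearest-neighbour up-sampling of a (C, H, W) map by factor s."""
    return np.repeat(np.repeat(f, s, axis=1), s, axis=2)

def full_scale_fuse(feats, target):
    """Bring every level's feature map to the spatial size of level
    `target`, then concatenate along the channel axis."""
    th = feats[target].shape[1]
    resized = []
    for f in feats:
        h = f.shape[1]
        if h > th:
            f = maxpool(f, h // th)
        elif h < th:
            f = upsample(f, th // h)
        resized.append(f)
    return np.concatenate(resized, axis=0)
```

With four levels whose sizes halve and channels double per level, fusing at level 3 yields a map holding the summed channel count of all levels at that level's resolution.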
As shown in fig. 5, the residual CBAM attention module is formed by connecting a residual module with the CBAM attention mechanism through a skip connection. First, channel-attention feature-map information is computed for the input feature map F and multiplied with the input feature map for adaptive feature correction; then spatial-attention feature-map information is computed and a further feature correction is performed to obtain F_S; finally, the channel information of the input feature map and the corrected feature map is fused through the skip connection to obtain the final output feature map F_out.
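The data flow of fig. 5 can be sketched as below. This is a deliberately simplified stand-in: full CBAM derives channel attention from a shared MLP over average- and max-pooled descriptors and spatial attention from a 7×7 convolution, both of which are replaced here by direct sigmoids, and fusing the skip connection by addition is likewise an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f):
    """Channel weights from pooled descriptors of a (C, H, W) map
    (the shared MLP of full CBAM is omitted)."""
    return sigmoid(f.mean(axis=(1, 2)) + f.max(axis=(1, 2)))[:, None, None]

def spatial_attention(f):
    """Spatial weights from channel-pooled maps (the 7x7 convolution
    of full CBAM is omitted)."""
    return sigmoid(f.mean(axis=0) + f.max(axis=0))[None, :, :]

def residual_cbam(f):
    """Residual CBAM flow: adaptive channel correction, then spatial
    correction giving F_S, then a skip connection fusing the input
    back in to give F_out."""
    f_c = f * channel_attention(f)       # channel-corrected features
    f_s = f_c * spatial_attention(f_c)   # corrected feature map F_S
    return f + f_s                       # output feature map F_out
```

Because both attention maps lie in (0, 1), the module rescales rather than replaces the input, and the skip connection preserves the original signal.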
Step three, constructing an improved mixed loss function, and applying it to the constructed U-Net3+ segmentation network model;
specifically, the training uses a mixed Loss function, which is formed by combining a variation Log Cosh Dice Loss of the conventional Dice Loss function Dice of image semantic segmentation and the Focal Loss function Focal Loss.
The Dice Loss function Dice Loss is defined as:

L_Dice = 1 - 2|X ∩ Y| / (|X| + |Y|)
wherein X represents the prediction result of the model on the target image, and Y represents the real label of the target image.
Said variant Log Cosh Dice Loss is defined as:
L_log-cosh-Dice = log(cosh(L_Dice))
wherein the cosh function is defined as:

cosh(x) = (e^x + e^(-x)) / 2
the Focal Loss function Focal Loss is defined as:
L_Focal = -α_t (1 - p_t)^γ log(p_t)
wherein α_t is a balancing factor, generally in the range [0, 1], used to balance the importance of positive and negative samples; γ is set to 2;
wherein p denotes the predicted probability that a sample belongs to class 1 (in the range 0-1) and y denotes the label; p_t equals p when y = 1 and 1 - p otherwise;
the final mixed loss function is defined as:

L = L_log-cosh-Dice + L_Focal
the invention designs the mixing loss function, and solves the problem of low segmentation precision caused by unbalanced category in the remote sensing image. The mixed loss function fuses the Dice loss function and the focus loss function, improves the unbalance problem of image samples, and improves the precision of remote sensing semantic segmentation.
Step four, the preprocessed data is transmitted to the constructed U-Net3+ segmentation network model to carry out model training;
specifically, the preprocessed training pictures from step one are input into the constructed U-Net3+ segmentation network and the network parameter weights are updated; the verification-set pictures are used for verification to obtain the network segmentation effect, and the better network model is continuously saved so as to obtain the optimal network model.
Step five, performing semantic segmentation on the remote sensing image by using the trained U-Net3+ segmentation network model, and verifying the segmentation effect of the U-Net3+ segmentation network model;
specifically, the test-set pictures divided in step one are input into the optimal network model saved in step four to obtain segmentation effect maps, and these are compared with the experimental label maps to obtain the segmentation accuracy.
Step six: post-processing of the image;
The segmentation effect maps obtained in step five are gray-scale maps; they are stitched according to the slice positions from step one, the background filled in step one is removed so that the image is restored to 6800 × 7200 pixels, and the final segmentation map is obtained by mapping the position of each class in the gray-scale map back into the original image.
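The slice-and-stitch post-processing can be sketched as follows, under stated assumptions: 256 × 256 tiles, zero-valued background fill, and row-major tile order (the patent does not specify the tile size or ordering).

```python
import numpy as np

def slice_with_pad(img, tile=256):
    """Pad img (H, W) with background zeros up to a multiple of `tile`,
    then cut it into row-major tiles."""
    h, w = img.shape
    H = -(-h // tile) * tile  # ceil to a multiple of the tile size
    W = -(-w // tile) * tile
    padded = np.zeros((H, W), dtype=img.dtype)
    padded[:h, :w] = img
    tiles = [padded[r:r + tile, c:c + tile]
             for r in range(0, H, tile) for c in range(0, W, tile)]
    return tiles, (H, W)

def stitch(tiles, padded_shape, orig_shape, tile=256):
    """Reassemble tiles by their slice positions, then crop away the fill."""
    H, W = padded_shape
    canvas = np.zeros(padded_shape, dtype=tiles[0].dtype)
    i = 0
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            canvas[r:r + tile, c:c + tile] = tiles[i]
            i += 1
    h, w = orig_shape
    return canvas[:h, :w]

# Gray-scale class map at the image size stated in the patent (6800 x 7200).
gray = np.random.randint(0, 5, (6800, 7200), dtype=np.uint8)
tiles, pshape = slice_with_pad(gray)
restored = stitch(tiles, pshape, gray.shape)
```

Round-tripping through the tiler and stitcher reproduces the original class map exactly; in the patent's pipeline the tiles would instead be the per-slice network outputs.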
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also be regarded as falling within the scope of protection of the present invention.
Claims (8)
1. The remote sensing semantic segmentation method based on U-Net3+ is characterized by comprising the following steps of:
step one, data acquisition and preprocessing;
step two, constructing a U-Net3+ segmented network model, and adding a multi-scale feature extraction module and an attention mechanism into the constructed U-Net3+ segmented network model;
step three, constructing an improved mixed loss function, and applying the improved mixed loss function to the constructed U-Net3+ split network model;
step four, the preprocessed data is transmitted to the constructed U-Net3+ segmentation network model to carry out model training;
step five, performing semantic segmentation on the remote sensing image by using the trained U-Net3+ segmentation network model, and verifying the segmentation effect of the U-Net3+ segmentation network model;
and step six, image post-processing.
2. The U-Net3+ based remote sensing semantic segmentation method according to claim 1, wherein in the first step, the acquired data is derived from the remote sensing semantic segmentation dataset GID-5, which comprises a plurality of pictures divided into a training set, a verification set and a test set in a number ratio of 25:1:4.
3. The remote sensing semantic segmentation method based on U-Net3+ according to claim 2, wherein in the first step, the data is preprocessed: the pictures are sliced, positions where a picture is insufficient are filled with background, and the experimental label maps are converted into gray-scale maps.
4. The remote sensing semantic segmentation method based on U-Net3+ according to claim 1, wherein the specific operation flow of the second step is as follows:
s2.1, a U-Net3+ split network is built, and the network level is reduced from 5 layers to 4 layers;
s2.2, constructing a multi-scale feature extraction module; the multi-scale feature extraction module comprises a multi-scale convolution attention module which is divided into three parts, wherein the first part is a 5×5 depth convolution for obtaining local feature information; the second part is different mixed cavity convolutions of multiple branches and is used for extracting multi-scale characteristic information; the third part is 1 multiplied by 1 convolution, which is responsible for mixing channels, and finally multiplying the input characteristics with the convolved weight element by element to obtain the required output;
s2.3, combining the residual error module with a CBAM attention mechanism, and adding the residual error CBAM attention module in the characteristic fusion stage of each layer of the network, so that the network focuses on important information.
5. The remote sensing semantic segmentation method based on U-Net3+ according to claim 1, wherein in the third step, the mixed loss function is formed by combining Log Cosh Dice Loss, a variant of the Dice Loss function, with the Focal Loss function;
the Dice Loss function Dice Loss is defined as:
wherein X represents the prediction result of the model on the target image, and Y represents the real label of the target image;
said variant Log Cosh Dice Loss is defined as:
L_log-cosh-Dice = log(cosh(L_Dice))
wherein the cosh function is defined as:

cosh(x) = (e^x + e^(-x)) / 2
the Focal Loss function Focal Loss is defined as:
L_Focal = -α_t (1 - p_t)^γ log(p_t)
wherein α_t is a balancing factor for balancing the importance of positive and negative samples; γ is set to 2;
wherein p denotes the predicted probability that a sample belongs to class 1 (in the range 0-1) and y denotes the label; p_t equals p when y = 1 and 1 - p otherwise;
the final mixed loss function is defined as:

L = L_log-cosh-Dice + L_Focal
6. The remote sensing semantic segmentation method based on U-Net3+ according to claim 2, wherein the specific operation flow of the fourth step is as follows: the pictures preprocessed in the first step are input into the constructed U-Net3+ segmentation network and the network parameter weights are updated; the verification-set pictures are used for verification to obtain the network segmentation effect, and the better network model is continuously saved so as to obtain the optimal network model.
7. The remote sensing semantic segmentation method based on U-Net3+ according to claim 6, wherein in the fifth step, the test set picture is input into an optimal network model to obtain a segmentation effect picture, and the segmentation effect picture is compared with an experimental label picture to obtain segmentation accuracy.
8. The remote sensing semantic segmentation method based on U-Net3+ according to claim 7, wherein in the sixth step, the segmentation effect maps obtained in the fifth step are gray-scale maps, which are stitched according to the slice positions from the first step, the background filled in the first step is removed, and the final segmentation map is obtained by mapping the position of each class in the gray-scale map back into the original picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311135160.2A CN117152435A (en) | 2023-09-05 | 2023-09-05 | Remote sensing semantic segmentation method based on U-Net3+ |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117152435A true CN117152435A (en) | 2023-12-01 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117893934A (en) * | 2024-03-15 | 2024-04-16 | 中国地震局地质研究所 | Improved UNet3+ network unmanned aerial vehicle image railway track line detection method and device |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |