CN116486075A - HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of a remote sensing image
- Publication number: CN116486075A (application CN202310337060.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an HRNet-based multi-scale strong-fusion semantic segmentation method for extracting ground features from remote sensing images. Based on a ground-feature extraction sample data set, a multi-scale strong-fusion semantic segmentation network, MT-HRNet, incorporating a triple attention mechanism is constructed; the MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary feature-element extraction result. A segmentation loss is computed from this preliminary result and the real labels of the remote sensing image; the loss guides the MT-HRNet feature extraction network to extract sufficient features and improves segmentation precision. Training continues until the MT-HRNet model converges. By applying the HRNet network to semantic segmentation of remote sensing images, the method improves the feature extraction capacity for remote sensing imagery and yields more accurate extraction results.
Description
Technical Field
The invention relates to remote sensing imagery, ground feature element extraction, deep learning, attention mechanisms, multi-scale strong fusion, and semantic segmentation technology, and in particular to an HRNet-based multi-scale strong-fusion semantic segmentation method for extracting ground features from remote sensing images.
Background
With the development of aerospace technology, a large amount of remote sensing data can be acquired every day, and the intrinsic information extracted from these data can effectively support daily life and production. Ground feature element extraction is one of the basic tasks of remote sensing image analysis; its results provide important data support for town construction and planning, land resource management, feature-class proportion statistics, land mapping, and other applications, and therefore it is widely used. However, most existing semantic segmentation models were developed for natural images and perform poorly when applied directly to remote sensing images, and the models used by existing remote-sensing segmentation algorithms are dated and rarely receive targeted optimization or improvement. Once trained, a semantic segmentation model automatically extracts features from the input remote sensing image, forming an end-to-end segmentation network whose output is more accurate. Extracting ground feature elements from satellite remote sensing data greatly reduces labor cost, helps promote the rational planning and use of land resources, and gives a more accurate picture of their condition.
Therefore, this patent proposes an HRNet-based multi-scale strong-fusion semantic segmentation method for extracting ground features from remote sensing images. The method addresses the insufficient accuracy of target-class segmentation in the ground-feature extraction task. Applying the HRNet network to semantic segmentation of remote sensing images improves the feature extraction capacity and yields more accurate extraction results. A global adaptive upsampling module and a multi-scale strong-fusion strategy are incorporated, solving the insufficient fusion of feature maps at different resolutions and retaining more spatial position information and high-level semantic information. A triple attention mechanism (TAM) is added, addressing the discontinuous or unpredictable segmentation of elongated categories such as "water body" and "road". The remote sensing image information is introduced into the network training process, where a segmentation loss computed against the label ground truth effectively guides the training of the semantic segmentation network and improves the extraction of fine texture features during classification. The invention can extract ground feature information from remote sensing images of different resolutions, segment feature categories more accurately across different colors, textures, and scales, and improve the extraction of elongated categories.
Disclosure of Invention
The technical problem addressed by the invention is as follows: an HRNet-based multi-scale strong-fusion semantic segmentation method for extracting ground features from remote sensing images is provided, realizing the ground-feature extraction task for remote sensing imagery while ensuring precision and connectivity.
The technical scheme of the invention is as follows: in the HRNet-based multi-scale strong-fusion semantic segmentation method for extracting ground features of a remote sensing image, remote sensing data are first acquired and divided into data sets to form a ground-feature extraction sample data set. Based on this data set, a multi-scale strong-fusion semantic segmentation network MT-HRNet incorporating a triple attention mechanism is constructed; the MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary feature-element extraction result. A segmentation loss is computed from this preliminary result and the real labels of the remote sensing image; the loss guides the MT-HRNet feature extraction network to extract sufficient features and improves segmentation precision. Training continues until the MT-HRNet model converges. The converged MT-HRNet semantic segmentation model is then used to predict the test set and obtain the feature-element extraction result. The specific steps are as follows:
(1) Remote sensing data preprocessing and data set partitioning.
In step (1), the remote sensing data are preprocessed into VOC-format training files, generating a remote sensing image sample data set.
Further, the remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPG images and the labels are PNG images. The value of each label pixel is the class to which that pixel belongs, for example: background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
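As a minimal illustration of the label convention just described, the sketch below converts a color-coded mask into a VOC-style class-index label. The color-to-class map is an assumption for illustration only: the patent fixes the index values 0, 1, 2, 3 but does not specify any colors, and the function name is likewise hypothetical.

```python
import numpy as np

# Hypothetical color map -- the patent only fixes the index values
# (background = 0, target classes = 1, 2, 3, ...), not the colors.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (255, 0, 0): 1,      # target class 1 (assumed color)
    (0, 255, 0): 2,      # target class 2 (assumed color)
    (0, 0, 255): 3,      # target class 3 (assumed color)
}

def rgb_mask_to_voc_label(rgb_mask: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 color-coded mask into an H x W VOC-style
    label in which each pixel stores its class index directly."""
    label = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for color, cls in COLOR_TO_CLASS.items():
        label[np.all(rgb_mask == color, axis=-1)] = cls
    return label
```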
(2) Performing feature element extraction on the remote sensing image sample data set obtained in the step (1) to construct a semantic segmentation model MT-HRNet, wherein the method specifically comprises the following steps of:
(a) Construct the remote sensing image ground-feature extraction network using parallel sub-networks at different resolutions. A 256×256 three-channel picture is input, and the input is first downsampled twice using convolution layers (with BN and ReLU) with kernel size 3×3, stride 2, and padding 1; at this point the input has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built by repeatedly stacking Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature maps pass through a Transition structure, a new branch at a different resolution is added to the existing branches; the new branch's feature map H and W are halved and its channel count C is doubled. Layer1 is processed by the Transition1 module — convolution layers with kernel size 3×3, strides 1 and 2 respectively, and padding 1, each followed by BN and ReLU — to produce two branches at different resolutions, downsampled four times and eight times relative to the original image. Transition2 adds a 16-times-downsampled branch to the original two, and Transition3 likewise adds a 32-times-downsampled branch. Information at different scales is fused by the Stage structures in the network. Each branch within a Stage stacks 4 BasicBlock modules; BasicBlock is the residual block used in the shallower ResNet variants.
The high-resolution representation is maintained in the whole process, and the feature extraction capability of the model on the remote sensing image can be improved by repeatedly fusing information with different scales, so that the feature extraction result is more accurate.
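The resolution and channel arithmetic of the parallel branches described above (the stem downsamples the 256×256 input by 4×; each new Transition branch halves H and W and doubles C) can be sketched as follows. The base channel width of 32 is an assumption (corresponding to HRNet-W32); the patent does not state it.

```python
def hrnet_branch_shapes(input_hw=256, base_channels=32, num_branches=4):
    """Return (C, H, W) for each parallel branch after all Transitions.

    The stem's two stride-2 convolutions have already downsampled the
    input by 4x; every Transition then adds one branch with H and W
    halved and C doubled. The base width of 32 is an assumption
    (HRNet-W32); the patent does not state it.
    """
    shapes = []
    hw = input_hw // 4          # after the two stride-2 stem convolutions
    c = base_channels
    for _ in range(num_branches):
        shapes.append((c, hw, hw))
        hw //= 2                # H and W halve for each new branch
        c *= 2                  # channel count C doubles for each new branch
    return shapes
```

For a 256×256 input this yields branches at 1/4, 1/8, 1/16, and 1/32 of the original resolution, matching the four downsampling scales in the text.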
(b) Construct a global adaptive upsampling module, GSAU, to unify the different scale resolutions. The GSAU module applies a 3×3 convolution to the low-resolution features, reducing the number of channels of the feature map. Global context information obtained from the high-resolution feature map passes through a 1×1 convolution, batch normalization (BN), and a ReLU activation, and is then multiplied with the low-resolution feature map, completing the mapping of high-resolution spatial information onto low-resolution class localization. Finally, the high-resolution feature map is added to the weighted low-resolution feature map to obtain a preliminarily fused feature map. The high-resolution feature map of the upper branch is downsampled and then added to the feature map just processed, completing the global adaptive upsampling operation. Notably, the GSAU module acts only in feature fusions that contain an upsampling operation.
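A minimal numerical sketch of the GSAU data flow follows, with the 3×3/1×1 convolutions and BN collapsed to identity so that only the gating-and-add structure described above remains visible; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gsau_fuse(high: np.ndarray, low: np.ndarray) -> np.ndarray:
    """Data-flow sketch of GSAU on C x H x W arrays.

    The module's 3x3 / 1x1 convolutions and BN are collapsed to the
    identity here; what remains is the described structure: global
    context gates the upsampled low-resolution features, which are
    then added to the high-resolution map.
    """
    scale = high.shape[1] // low.shape[1]
    # nearest-neighbour upsampling of the low-resolution branch
    low_up = low.repeat(scale, axis=1).repeat(scale, axis=2)
    # global context taken from the high-resolution feature map
    context = high.mean(axis=(1, 2), keepdims=True)
    weighted = sigmoid(context) * low_up   # gate the low-res features
    return high + weighted                 # add back the high-res map
```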
(c) Construct a multi-scale strong-fusion module, MSSFM, to solve the insufficient fusion of features at different resolutions. The GSAU output feature map serves as the input feature map of the MSSFM module. The feature map's spatial information is compressed by global average pooling and global max pooling to output suitable one-dimensional channel weight parameters; these pass through a 1×1 convolution, a BN layer, and a nonlinear ReLU activation, then through another 1×1 convolution and a Sigmoid to produce the final channel attention weights. These weights are multiplied with the processed high-resolution feature maps, i.e., the channel attention parameters guide a better fusion of the feature maps at different resolutions, making the features more reliable. Finally, the processed high-resolution feature map is added back to the output; this residual connection effectively alleviates vanishing gradients, exploding gradients, and degradation. The feature map processed by the MSSFM retains more spatial information and high-level semantic information, ensuring that the predicted image better matches the layout of the ground feature elements in the label image. In Stage2, the output of the eight-times-downsampled branch is processed by the GSAU module and then passed, together with the four-times-downsampled branch, through the MSSFM module. Stage3 is handled similarly.
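The channel re-weighting at the heart of the MSSFM can be sketched numerically as below; the 1×1 convolutions and BN are omitted (treated as identity), keeping only the pooling, Sigmoid gating, and residual add described in the text. Names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mssfm(feat: np.ndarray) -> np.ndarray:
    """Channel re-weighting sketch of MSSFM on a C x H x W array.

    The 1x1 convolutions and BN are treated as identity; what remains
    are the described steps: global average + max pooling compress the
    spatial information, a Sigmoid yields channel attention weights,
    and a residual add preserves the original features.
    """
    avg = feat.mean(axis=(1, 2))      # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))        # global max pooling     -> (C,)
    w = sigmoid(avg + mx)             # channel attention weights
    gated = w[:, None, None] * feat   # re-weight each channel
    return gated + feat               # residual connection
```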
(d) Every feature map produced by a stack of four BasicBlock modules in Stage4 is processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are finally aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with both average pooling and max pooling; this preserves the rich features of the actual tensor to the greatest extent while reducing network depth, keeping the module lightweight. After TAM processing, the output has the same shape as the TAM input but is a refined tensor with more detailed features, improving the segmentation accuracy of elongated categories such as "road" and "water body". The output of each TAM module is again passed through the MSSFM module for strong multi-scale feature fusion.
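The Z-Pool operation described above — reducing a chosen dimension to two by stacking its max- and average-pooled slices — can be sketched as:

```python
import numpy as np

def z_pool(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Reduce the chosen dimension of a tensor to size 2 by stacking
    its max pooling and average pooling, as the TAM's Z-Pool layer
    does. Each TAM branch applies this to a different permutation of
    the C x H x W tensor before computing its attention weights."""
    return np.stack([x.max(axis=axis), x.mean(axis=axis)], axis=axis)
```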
(3) Based on the constructed MT-HRNet semantic segmentation network, training is carried out on a training set, and model parameters are optimized until the MT-HRNet semantic segmentation network converges.
(a) Ground-feature model training. The initial model learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed by cosine annealing. The batch size of model training is set to 16 and the number of epochs to 200, and the training metrics MPA and MIoU are observed.
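A sketch of the cosine learning-rate decay with the stated hyper-parameters (initial 0.004, minimum 0.00004, 200 epochs); the exact schedule formula is an assumption, since the text only names cosine decay.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_max=0.004, lr_min=0.00004):
    """Cosine-annealed learning rate using the stated hyper-parameters
    (initial 0.004, minimum 0.00004, 200 epochs). The standard cosine
    schedule is assumed here; the patent only names cosine decay."""
    t = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```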
(b) Ground-feature model verification. The optimal model from the training rounds of process (a) is retained and used to run a first round of verification on the verification set, generating a prediction set, and the prediction results are checked.
(c) Repeat processes (a) and (b) until the model precision reaches a usable level, then retain the optimal model, completing the construction of the remote sensing image ground-feature extraction model.
(4) Based on the converged MT-HRNet semantic segmentation model, the test set is predicted to obtain the visualized feature-element extraction result. Specifically: the MT-HRNet model extracts the ground feature elements from the test set, and the extraction metrics MPA and MIoU, together with the visualized segmentation results for the whole remote sensing image data set, are saved.
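The MPA and MIoU metrics reported for the extraction results can be computed from a confusion matrix as sketched below; these are the standard definitions, assumed here since the patent does not give formulas.

```python
import numpy as np

def mpa_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Mean pixel accuracy (MPA) and mean IoU (MIoU) of a predicted
    label map against the ground truth, via a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1                  # rows: ground truth, cols: prediction
    per_class_acc = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    union = cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm)
    per_class_iou = np.diag(cm) / np.maximum(union, 1)
    return per_class_acc.mean(), per_class_iou.mean()
```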
Compared with the prior art, the invention has the advantages that:
1. and the high-resolution network HRNet is used for extracting ground feature elements of the remote sensing image, and the characterization of the high-resolution feature map is kept consistent in the whole training process, and different resolution information is fused repeatedly. The feature extraction capability of the network is improved, and the feature element extraction result is more accurate.
2. And the global self-adaptive up-sampling is fused, the unification of different scale resolutions is carried out, and the feature images with the same size as the high resolution are output, so that the robustness of the input space change is effectively maintained, and the unique characteristics in the feature images are learned, thereby obtaining finer space information. In addition, a multi-scale information strong fusion strategy is adopted, the problem of insufficient fusion of feature images with different resolutions is solved, more spatial position information and advanced semantic information are reserved, and the model segmentation precision is improved.
3. And adding a triple attention mechanism in the last Stage of the network, fully utilizing cross-channel interaction information, effectively helping channel information and space detail information to effectively propagate in the network, and acquiring the fine tensor with more detail characteristics in the same shape. The problem of losing the texture information of the fine strip-shaped target is solved, and the extraction effect of the ground feature elements is improved.
Drawings
Fig. 1 is an overall flow chart of the present invention.
FIG. 2 is the basic structure of the Stage and Transition modules.
Fig. 3 is a network configuration diagram of GSAU.
Fig. 4 is a network configuration diagram of the MSSFM.
Fig. 5 is a network configuration diagram of the triple attention mechanism module TAM.
FIG. 6 is a visualization of the segmentation results of HRNet and MT-HRNet.
Detailed Description
In order to make the solution of the embodiment of the present invention better understood by those skilled in the art, the embodiment of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
The invention comprises the following steps:
1. Remote sensing data preprocessing and data set partitioning. The remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPG images and the labels are PNG images. The value of each label pixel is the class to which that pixel belongs, for example: background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
2. Ground feature elements are extracted from the remote sensing image sample data set to construct the semantic segmentation model, specifically: constructing the ground-feature learning model, training it, testing it, and saving the optimal model;
Constructing the ground-feature learning model MT-HRNet: a 256×256 three-channel picture is input, and the input is first downsampled twice, each time through a convolution layer with kernel size 3×3, stride 2, and padding 1, followed by BN and ReLU; at this point the input has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built by repeatedly stacking Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature maps pass through a Transition structure, a new branch at a different resolution is added; the new branch's H and W are halved and its channel count C is doubled. Layer1 is processed by the Transition1 module — convolution layers with kernel size 3×3, strides 1 and 2 respectively, and padding 1, each followed by BN and ReLU — producing two branches downsampled four times and eight times relative to the original image. Transition2 adds a 16-times-downsampled branch to the original two, and Transition3 likewise adds a 32-times-downsampled branch. The main work of the Stage structure is to fuse information at different scales. Stage2 first applies four BasicBlock modules per branch — BasicBlock being the residual block used in the shallower ResNet variants — and the output of each branch is the fusion of all branches.
The output of the four-times-downsampled branch of Stage2 is obtained by processing the eight-times-downsampled branch through the GSAU module and then passing it, together with the four-times-downsampled branch, through the MSSFM module; this retains more spatial information and high-level semantic information from the remote sensing image and ensures the predicted image better matches the layout of ground feature elements in the label image. The Stage3 branches are handled similarly. Every feature map produced by a stack of four BasicBlock modules in Stage4 is processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with both average pooling and max pooling, preserving the rich features of the actual tensor to the greatest extent while reducing network depth and keeping the module lightweight. After TAM processing, the output has the same shape as the TAM input but is a refined tensor with more detailed features, improving the segmentation accuracy of elongated categories such as "road" and "water body". The output of each TAM module is again passed through the MSSFM module for strong multi-scale feature fusion.
Ground-feature model training: the initial learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with momentum 0.9, and the learning rate is decayed by cosine annealing. The batch size is set to 16 and the number of epochs to 200, and the training metrics MPA and MIoU are observed.
Ground-feature model testing: the optimal model from the training rounds is retained and used to run a first round of testing on the test set, generating a prediction set, and the prediction results are checked.
Saving the optimal model: the model training and testing processes are repeated until the model precision reaches a usable level, and the optimal model is retained, completing the construction of the remote sensing image ground-feature extraction model.
3. Ground feature elements are extracted from the remote sensing image data based on the MT-HRNet semantic segmentation model, specifically: extracting the ground feature elements and saving the segmentation results;
Extracting the ground feature elements: all remote sensing image data are placed in the test set, the MT-HRNet semantic segmentation model extracts the ground feature elements, and the extraction results — the MPA and MIoU metrics and the extracted feature-element data — are visualized.
Saving the segmentation results: the extracted ground-feature data are stored either on a centralized high-performance computing facility or in a distributed computing and storage environment.
Table 1 experimental results
It should be noted that the method of the embodiment of the invention is suitable for extracting the ground feature elements of the remote sensing images with different resolutions.
The embodiments of the invention have been described in detail above with reference to the drawings, solely to aid understanding of the invention; since those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the invention, this description should not be construed as limiting the invention.
Claims (5)
1. An HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of a remote sensing image, characterized in that: remote sensing data is first acquired and divided into data sets to form a ground feature element extraction sample data set; based on the ground feature element extraction sample data set, a multi-scale strong fusion semantic segmentation network MT-HRNet fusing a triple attention mechanism is constructed; based on the constructed MT-HRNet semantic segmentation model, training is performed on the training set and the model parameters are optimized to obtain a preliminary ground feature element extraction result; the segmentation loss is calculated based on the preliminary ground feature element extraction result and the real labels of the remote sensing images; the segmentation loss guides the MT-HRNet feature extraction network to perform feature extraction until the MT-HRNet model converges; and the test set is predicted based on the converged MT-HRNet semantic segmentation model to obtain the ground feature element extraction result.
2. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method according to claim 1, wherein the semantic segmentation method comprises the following specific steps:
(1) Preprocessing remote sensing data and dividing a data set; preprocessing remote sensing data, processing the remote sensing data into a training file in a VOC format, and generating a remote sensing image sample data set;
(2) Performing feature element extraction on the remote sensing image sample data set obtained in the step (1) to construct a semantic segmentation model MT-HRNet, wherein the method specifically comprises the following steps of:
(a) Constructing a remote sensing image ground feature extraction network adopting a structure of parallel sub-networks with different resolutions; a three-channel 256×256 picture is input; first, the input is downsampled twice using convolution layers with kernel size 3×3, stride 2 and padding 1, for a total downsampling factor of four; the channel number is then adjusted by a Layer1 module without changing the feature map size; Layer1 is formed by repeatedly stacking Bottleneck modules, the Bottleneck module being the residual block used in the deeper ResNet networks; the network then proceeds through Transition and Stage structures; each time the feature maps pass through a Transition structure, a new branch with a different resolution is added alongside the existing branches, with the new branch's feature map height H and width W halved and its channel number C doubled; Layer1 is processed by the Transition1 module, i.e., convolution layers with kernel size 3×3, strides of 1 and 2 respectively, and padding 1, each followed by BN and ReLU, to generate two branches of different resolutions, namely four-times and eight-times downsampled scales relative to the original image; Transition2 adds a 16-times downsampled scale to the existing two branches; Transition3 adds a 32-times downsampled scale; information of different scales is fused in the network through the Stage structures; each single branch within a Stage stacks 4 BasicBlock modules, the BasicBlock module being the residual block used in the shallower ResNet networks; by maintaining the high-resolution representation throughout and repeatedly fusing information of different scales, the feature extraction capability on remote sensing images is improved, making the ground feature element extraction results more accurate;
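As an illustrative sketch only (not part of the claimed method), the branch geometry described in step (a) — each Transition halving H and W and doubling C — can be computed as follows; the function name and its default values are assumptions for illustration:

```python
def hrnet_branch_shapes(h=64, w=64, c=32, num_branches=4):
    """Shapes (C, H, W) of the parallel HRNet branches: each new branch
    halves the spatial size H, W and doubles the channel count C.
    Defaults assume a 256x256 input reduced 4x by the two stride-2 stem
    convolutions, and a 32-channel first branch (an assumption)."""
    return [(c * 2**i, h // 2**i, w // 2**i) for i in range(num_branches)]

# The four parallel branches of Stage4 then have these shapes:
print(hrnet_branch_shapes())
# [(32, 64, 64), (64, 32, 32), (128, 16, 16), (256, 8, 8)]
```

The list makes the 4×, 8×, 16× and 32× downsampled scales of Transition1 through Transition3 explicit.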
(b) Constructing a global self-adaptive up-sampling module GSAU to unify the different scale resolutions; the GSAU module performs a 3×3 convolution on the low-resolution feature map; the global context information obtained from the high-resolution feature map passes through a 1×1 convolution, a batch normalization operation BN and a ReLU activation, and is then multiplied with the low-resolution feature map to map the high-resolution spatial information onto the low-resolution category localization; the high-resolution feature map is then added to the weighted low-resolution feature map to obtain the preliminarily fused feature map; finally, the high-resolution feature map of the upper branch is downsampled and added to the feature map just processed, completing the global self-adaptive up-sampling operation; the GSAU module acts only in the feature fusion process of up-sampling operations;
(c) Constructing a multi-scale strong fusion module MSSFM, taking the output feature map of the GSAU module as the input feature map of the MSSFM module; the spatial information of the feature map is compressed by global average pooling and global max pooling to output one-dimensional channel weight parameters; the weight parameters are processed by a 1×1 convolution and a BN layer, and the final channel attention weights are obtained through another 1×1 convolution and a Sigmoid; the channel weights are then multiplied with the processed high-resolution feature map, and the processed high-resolution feature map is added back to produce the output;
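The channel-attention arithmetic of step (c) can be sketched as follows; the 1×1 convolutions and BN layer of the real module are omitted, and combining the two pooled vectors by a plain sum is an assumption made for illustration:

```python
import numpy as np

def mssfm_channel_weights(x):
    """MSSFM sketch: compress spatial information of a (C, H, W) map by
    global average and global max pooling into one-dimensional channel
    weights, squashed to (0, 1) by a Sigmoid."""
    avg = x.mean(axis=(1, 2))                 # (C,) global average pooling
    mx = x.max(axis=(1, 2))                   # (C,) global max pooling
    return 1.0 / (1.0 + np.exp(-(avg + mx)))  # Sigmoid -> channel weights

def mssfm_apply(x):
    """Multiply the channel weights back onto the feature map and add the
    original map (the residual accumulation described in the claim)."""
    w = mssfm_channel_weights(x)
    return x * w[:, None, None] + x
```

The output keeps the input shape, so MSSFM can be inserted after any GSAU fusion without altering branch geometry.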
(d) In Stage4, the feature map produced by each stack of four BasicBlock modules is processed by a triple attention mechanism TAM module; the three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W) and (H, W) dimensions of the input tensor respectively; the top branch computes attention weights across the channel dimension C and the spatial dimension W, the middle branch computes attention weights across the channel dimension C and the spatial dimension H, the bottom branch establishes the dependency between H and W, and the weights of the three branches are averaged; the Z-Pool layer reduces the channel dimension of the tensor to two channels by processing the feature map with average pooling and max pooling; the output of the TAM module has the same shape as its input, and the output of each TAM module must again undergo multi-scale strong feature fusion through the MSSFM module;
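The Z-Pool layer named in step (d) admits a direct sketch: reducing the channel dimension of a (C, H, W) tensor to two channels by stacking its max-pooled and average-pooled maps.

```python
import numpy as np

def z_pool(x):
    """Z-Pool of the TAM module: reduce the channel dimension of a
    (C, H, W) tensor to 2 by stacking max pooling and average pooling
    across channels, giving a (2, H, W) tensor."""
    return np.stack([x.max(axis=0), x.mean(axis=0)], axis=0)
```

Each TAM branch applies Z-Pool along a different dimension after permuting the tensor; the sketch above shows only the channel-dimension case.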
(3) Training on a training set based on the constructed MT-HRNet semantic segmentation network, and optimizing model parameters until the MT-HRNet network converges;
(4) Based on the converged MT-HRNet semantic segmentation model, the test set is predicted to obtain visualized ground feature element extraction results; specifically: ground feature elements are extracted from the test set using the MT-HRNet model, and the extraction results MPA and MIoU, together with the segmentation result visualization for the whole remote sensing image dataset, are stored.
3. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method of claim 2, wherein, in the remote sensing data preprocessing and data set partitioning, the remote sensing data labels are adjusted to VOC format and processed by a Python script; the input pictures are JPG images and the labels are PNG images; the value of each label pixel is the class to which that pixel belongs.
4. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method of claim 2, wherein the feature map processed by the multi-scale strong fusion module MSSFM retains more spatial information and high-level semantic information, ensuring that the layout of ground feature elements in the predicted image is similar to that in the label image; the output of the eight-times downsampled branch in Stage2 is processed by the GSAU module and then, together with the four-times downsampled branch of Stage2, passes through the MSSFM module.
5. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method of claim 2, wherein training and optimizing the model parameters on the training set specifically comprises: (a) ground feature element model training; the initial model learning rate is set to 0.004 and the minimum learning rate to 0.00004; the optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed following a cosine schedule; the batch size batch_size of model training is set to 16, the number of iterations epoch is set to 200, and the model training results MPA and MIoU are observed;
(b) Verifying the ground feature element model; the optimal model from the training round in step (a) is retained and used to perform a first round of verification on the verification set, generating a prediction set; the prediction results are then inspected;
(c) Steps (a) and (b) are repeated until the model accuracy reaches a usable level, and the optimal model is retained, completing the construction of the remote sensing image ground feature element extraction model.
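The cosine learning-rate decay cited in claim 5(a) can be sketched with the hyperparameters given there (initial rate 0.004, minimum 0.00004, 200 epochs); the exact schedule formula used by the patent is not stated, so the standard cosine-annealing form below is an assumption, and the claimed SGD optimizer with momentum 0.9 is not reproduced:

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_max=0.004, lr_min=0.00004):
    """Cosine learning-rate decay from lr_max at epoch 0 down to lr_min
    at the final epoch (standard cosine annealing, assumed form)."""
    cos = (1.0 + math.cos(math.pi * epoch / (total_epochs - 1))) / 2.0
    return lr_min + (lr_max - lr_min) * cos
```

At epoch 0 this returns the initial rate 0.004, and at epoch 199 the minimum rate 0.00004, matching the claimed settings.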
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310337060.1A CN116486075A (en) | 2023-03-31 | 2023-03-31 | HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116486075A true CN116486075A (en) | 2023-07-25 |
Family
ID=87214720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310337060.1A Pending CN116486075A (en) | 2023-03-31 | 2023-03-31 | HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116486075A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117350171A (en) * | 2023-12-04 | 2024-01-05 | 山东省计算中心(国家超级计算济南中心) | Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model |
CN117350171B (en) * | 2023-12-04 | 2024-03-12 | 山东省计算中心(国家超级计算济南中心) | Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111191736B (en) | Hyperspectral image classification method based on depth feature cross fusion | |
CN113850825A (en) | Remote sensing image road segmentation method based on context information and multi-scale feature fusion | |
CN114092832B (en) | High-resolution remote sensing image classification method based on parallel hybrid convolutional network | |
CN113160234B (en) | Unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation | |
CN111738111A (en) | Road extraction method of high-resolution remote sensing image based on multi-branch cascade void space pyramid | |
CN109886330B (en) | Text detection method and device, computer readable storage medium and computer equipment | |
CN113901900A (en) | Unsupervised change detection method and system for homologous or heterologous remote sensing image | |
CN113066037B (en) | Multispectral and full-color image fusion method and system based on graph attention machine system | |
CN111353544A (en) | Improved Mixed Pooling-Yolov 3-based target detection method | |
CN112734789A (en) | Image segmentation method and system based on semi-supervised learning and point rendering | |
CN116486075A (en) | HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image | |
CN116740527A (en) | Remote sensing image change detection method combining U-shaped network and self-attention mechanism | |
CN112766409A (en) | Feature fusion method for remote sensing image target detection | |
CN116258976A (en) | Hierarchical transducer high-resolution remote sensing image semantic segmentation method and system | |
CN112950780A (en) | Intelligent network map generation method and system based on remote sensing image | |
CN113435254A (en) | Sentinel second image-based farmland deep learning extraction method | |
CN113903022A (en) | Text detection method and system based on feature pyramid and attention fusion | |
CN116090517A (en) | Model training method, object detection device, and readable storage medium | |
CN117237808A (en) | Remote sensing image target detection method and system based on ODC-YOLO network | |
CN113313180B (en) | Remote sensing image semantic segmentation method based on deep confrontation learning | |
CN114519819A (en) | Remote sensing image target detection method based on global context awareness | |
CN114332473A (en) | Object detection method, object detection device, computer equipment, storage medium and program product | |
CN113111740A (en) | Characteristic weaving method for remote sensing image target detection | |
CN112686184A (en) | Remote sensing house change detection method based on neural network | |
CN116012709B (en) | High-resolution remote sensing image building extraction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||