CN116486075A - HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image - Google Patents

HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Info

Publication number
CN116486075A
Authority
CN
China
Prior art keywords
feature
remote sensing
hrnet
model
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310337060.1A
Other languages
Chinese (zh)
Inventor
宋永端
龙鸿
吴将娱
姚栋
胡芳
张景
刘伯威
王玉娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202310337060.1A
Publication of CN116486075A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images. Based on a ground feature element extraction sample data set, a multi-scale strong fusion semantic segmentation network, MT-HRNet, which incorporates a triple attention mechanism, is constructed. The MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary ground feature element extraction result. A segmentation loss is then calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; this loss guides the MT-HRNet feature extraction network to extract features sufficiently, improving segmentation accuracy, and training continues until the MT-HRNet model converges. By applying the HRNet network to semantic segmentation of remote sensing images, the method improves the network's feature extraction capability and yields more accurate extraction results.

Description

HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image
Technical Field
The invention relates to remote sensing imagery, ground feature element extraction, deep learning, attention mechanisms, multi-scale strong fusion, and semantic segmentation, and in particular to an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images.
Background
With the development of aerospace technology, large volumes of remote sensing data are acquired every day, and extracting the intrinsic information they contain can effectively support people's lives and production. Ground feature element extraction is one of the basic tasks of remote sensing image analysis; its results provide important data support for urban construction and planning, land resource management, land cover statistics, land mapping, and other applications, so it is widely used. However, most existing semantic segmentation models were developed for natural images and segment remote sensing images poorly when applied directly; the segmentation models underlying existing semantic segmentation algorithms for remote sensing images are dated and have seldom been optimized for this task. Once trained, a semantic segmentation model can automatically extract the features of an input remote sensing image, forming an end-to-end segmentation network whose output segmentation results are highly accurate. Extracting ground feature elements from satellite remote sensing data greatly reduces labor costs, helps promote rational planning and use of land resources, and provides a more accurate picture of land resource conditions.
Therefore, this patent proposes an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images. The method addresses the insufficient accuracy of target-category segmentation in the remote sensing ground feature extraction task. By applying the HRNet network to semantic segmentation of remote sensing images, it improves feature extraction capability and yields more accurate extraction results. A global adaptive upsampling module and a strong multi-scale information fusion strategy are incorporated, overcoming the insufficient fusion of feature maps at different resolutions and retaining more spatial position information and high-level semantic information. A triple attention mechanism (TAM) is added, resolving the discontinuous or erroneous segmentation of slender categories such as 'water body' and 'road'. Remote sensing image information is introduced into the network training process, and a segmentation loss computed against the ground-truth labels effectively guides the training of the semantic segmentation network, improving the extraction of fine texture features while classifying ground feature elements. The invention can extract ground feature information from remote sensing images of different resolutions, segment ground feature categories more accurately, extract categories of different colors, textures, and scales, and improve the extraction of slender categories.
Disclosure of Invention
The technical solution of the invention is as follows: an HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images is provided; it accomplishes the ground feature element extraction task for remote sensing images while ensuring accuracy and connectivity.
The technical scheme of the invention is as follows: the method first acquires remote sensing data and divides them into data sets, forming a ground feature element extraction sample data set. Based on this data set, a multi-scale strong fusion semantic segmentation network, MT-HRNet, incorporating a triple attention mechanism is constructed; the MT-HRNet model is trained on the training set and its parameters are optimized to obtain a preliminary ground feature element extraction result. A segmentation loss is calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; the loss guides the MT-HRNet feature extraction network to extract features sufficiently, improving segmentation accuracy, until the MT-HRNet model converges. Finally, the converged MT-HRNet semantic segmentation model predicts on the test set to obtain the ground feature element extraction result. The specific steps are as follows:
(1) Remote sensing data preprocessing and data set partitioning.
In step (1), the remote sensing data are preprocessed into VOC-format training files, generating a remote sensing image sample data set.
Further, the remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPGs and the labels are PNGs. The value of each label pixel is the class to which that pixel belongs, e.g. background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
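For concreteness, a minimal Python sketch of the label convention implied by this step is shown below; the file path, class count, and helper name are illustrative assumptions, not part of the patent.

```python
# Hedged sketch of the VOC-style label convention described above.
# Assumes JPG input images and single-channel PNG labels whose pixel
# values are class indices: 0 = background, 1 = class 1, 2 = class 2, ...
import numpy as np
from PIL import Image

def check_label(label_path: str, num_classes: int) -> None:
    label = np.array(Image.open(label_path))  # H x W array of class ids
    assert label.ndim == 2, "labels must be single-channel PNGs"
    assert int(label.max()) < num_classes, "pixel value outside class range"

# Hypothetical dataset layout; adjust the path to the actual VOC-format files.
check_label("VOCdevkit/SegmentationClass/tile_0001.png", num_classes=4)
```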
(2) Performing ground feature element extraction on the remote sensing image sample data set obtained in step (1) to construct the semantic segmentation model MT-HRNet, which specifically comprises the following steps:
(a) Construct the remote sensing image ground feature extraction network from parallel sub-networks at different resolutions. A 256×256 three-channel picture is input and first downsampled twice using convolution layers (with BN and ReLU) of kernel size 3×3, stride 2, and padding 1; at this point the feature map has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and its channel count C doubled. Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2 respectively, and padding 1, generating two branches at different resolutions, namely at four times and eight times downsampling relative to the original image. Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 likewise adds a 32-times-downsampled scale. Information at different scales is fused through the Stage structures in the network. Each branch within a Stage stacks four BasicBlock modules; BasicBlock is the residual block used in the shallower ResNet variants. High-resolution representations are maintained throughout, and repeatedly fusing information at different scales improves the model's feature extraction capability on remote sensing images, making the extraction results more accurate.
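To illustrate the stem and Transition mechanics described above, the following hedged PyTorch sketch reproduces the two stride-2 convolutions and a Transition1-style branch split; the channel widths (64 and 32) are illustrative assumptions, not the patent's exact configuration.

```python
# Hedged sketch: HRNet-style stem (4x total downsampling) and Transition1.
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride):
    # 3x3 convolution with the given stride and padding 1, then BN and ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Stem: two stride-2 convolutions -> overall 4x downsampling of a 256x256 input.
stem = nn.Sequential(conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2))

# Transition1: one branch keeps the 4x resolution; the new branch halves H and W
# and doubles the channel count relative to it (the 8x branch), as described.
transition1 = nn.ModuleList([
    conv_bn_relu(64, 32, stride=1),   # 4x branch
    conv_bn_relu(64, 64, stride=2),   # 8x branch: H, W halved, C doubled vs. 32
])

x = torch.randn(1, 3, 256, 256)
y = stem(x)                              # -> (1, 64, 64, 64)
branches = [t(y) for t in transition1]   # -> (1, 32, 64, 64) and (1, 64, 32, 32)
```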
(b) Construct the global adaptive upsampling module GSAU to unify the different scale resolutions. The GSAU module applies a 3×3 convolution to the low-resolution features, reducing the feature map's channel count. Global context information obtained from the high-resolution feature map passes through a 1×1 convolution, a batch normalization (BN) operation, and a ReLU activation, and is then multiplied with the low-resolution feature map, completing the mapping of high-resolution spatial information onto the low-resolution category localization. Finally, the high-resolution feature map is added to the weighted low-resolution feature map to obtain a preliminarily fused feature map. The high-resolution feature map of the upper branch is downsampled and then added to the feature map just produced, completing the global adaptive upsampling operation. Notably, the GSAU module acts only in feature fusion steps that involve an upsampling operation.
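A hedged PyTorch sketch of the GSAU computation as described above follows; the exact layer ordering and channel sizes in the patent may differ, and the module name, arguments, and use of global average pooling for the context are assumptions.

```python
# Hedged sketch of GSAU: low-res features are channel-reduced by a 3x3 conv,
# re-weighted by global context from the high-res map (1x1 conv + BN + ReLU),
# upsampled, and added to the high-resolution feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GSAU(nn.Module):
    def __init__(self, c_low: int, c_high: int):
        super().__init__()
        self.reduce = nn.Sequential(               # 3x3 conv on the low-res branch
            nn.Conv2d(c_low, c_high, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_high),
        )
        self.context = nn.Sequential(              # 1x1 conv + BN + ReLU
            nn.Conv2d(c_high, c_high, 1, bias=False),
            nn.BatchNorm2d(c_high),
            nn.ReLU(inplace=True),
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        low = self.reduce(low)
        # Global context of the high-res map, assumed via global average pooling.
        ctx = self.context(F.adaptive_avg_pool2d(high, 1))
        weighted = F.interpolate(low * ctx, size=high.shape[2:],
                                 mode="bilinear", align_corners=False)
        return high + weighted                     # preliminary fusion

fused = GSAU(c_low=64, c_high=32)(torch.randn(2, 32, 64, 64),
                                  torch.randn(2, 64, 32, 32))
```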
(c) Construct the multi-scale strong fusion module MSSFM to address the insufficient fusion of features at different resolutions. The GSAU output feature map is taken as the MSSFM input feature map. The feature map's spatial information is compressed by global average pooling and global max pooling to output suitable one-dimensional channel weight parameters; these pass through a 1×1 convolution, a BN layer, and a nonlinear ReLU activation, and the final channel attention weights are then obtained through another 1×1 convolution and a Sigmoid. Finally, the weights are multiplied with the processed high-resolution feature maps, i.e. the channel attention weights guide a better fusion of feature maps at different resolutions, making them more reliable. The processed high-resolution feature map is then added back to the output; this residual connection effectively mitigates vanishing gradients, exploding gradients, and degradation. Feature maps processed by the MSSFM retain more spatial information and high-level semantic information, ensuring the predicted image better matches the layout of ground feature elements in the label image. In Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module. Stage3 is handled similarly.
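The channel-attention path of the MSSFM can be sketched as below; the reduction ratio and the decision to share the 1×1 convolutions between the two pooling paths are assumptions made for brevity, not the patent's stated design.

```python
# Hedged sketch of the MSSFM channel attention: global avg + max pooling,
# 1x1 conv + BN + ReLU, then 1x1 conv + Sigmoid, applied with a residual add.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSSFM(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = F.adaptive_avg_pool2d(x, 1)                     # spatial squeeze
        mx = F.adaptive_max_pool2d(x, 1)
        weight = torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # channel weights
        return x + x * weight                                 # weighted + residual

out = MSSFM(32)(torch.randn(2, 32, 64, 64))                   # shape preserved
```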
(d) The feature maps produced by each stack of four BasicBlock modules in Stage4 are all processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are finally aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with average pooling and max pooling; this preserves the rich features of the actual tensor as far as possible while reducing the depth of the network, keeping the module lightweight. TAM processing yields a refined tensor with the same shape as the TAM input but with more detailed features, improving the segmentation accuracy of slender categories such as 'road' and 'water body'. The output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module.
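Since the three-branch rotation scheme is easiest to see in code, a hedged sketch of the TAM with its Z-Pool layer follows, in the spirit of the published triple attention ("Rotate to Attend") design; the 7×7 gate kernel size is an assumption.

```python
# Hedged sketch of triple attention: Z-Pool reduces channels to 2 (max + mean),
# each branch rotates the tensor so a different dimension pair interacts, and
# the three branch outputs are averaged.
import torch
import torch.nn as nn

class ZPool(nn.Module):
    def forward(self, x):
        # Channel dimension reduced to 2: max-pool and average-pool along dim 1.
        return torch.cat([x.max(1, keepdim=True).values,
                          x.mean(1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    def __init__(self, k: int = 7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, k, padding=k // 2, bias=False), nn.BatchNorm2d(1))

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.cw, self.ch, self.hw = AttentionGate(), AttentionGate(), AttentionGate()

    def forward(self, x):                                           # x: (N, C, H, W)
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)   # (C, W) pair
        x_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)   # (C, H) pair
        x_hw = self.hw(x)                                           # (H, W) pair
        return (x_cw + x_ch + x_hw) / 3.0          # simple average of the branches

y = TAM()(torch.randn(2, 32, 64, 64))              # same shape as the input
```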
(3) Based on the constructed MT-HRNet semantic segmentation network, training is carried out on a training set, and model parameters are optimized until the MT-HRNet semantic segmentation network converges.
(a) Ground feature element model training. The initial model learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule. The training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed.
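A hedged sketch of this training configuration in PyTorch is given below; the stand-in model and the single synthetic batch are placeholders, not the MT-HRNet implementation, which would iterate over the VOC-format training set instead.

```python
# Hedged sketch of the stated schedule: SGD, momentum 0.9, initial lr 0.004,
# cosine decay to 0.00004, batch size 16, 200 epochs.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 4, 1)                    # stand-in for MT-HRNet (4 classes)
images = torch.randn(16, 3, 256, 256)         # one synthetic batch of 16
labels = torch.randint(0, 4, (16, 256, 256))

optimizer = torch.optim.SGD(model.parameters(), lr=0.004, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200, eta_min=0.00004)    # cosine decay to the stated floor
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)   # segmentation loss vs. labels
    loss.backward()
    optimizer.step()
    scheduler.step()                          # per-epoch learning-rate decay
```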
(b) Model verification. The best model from the training rounds in step (a) is retained and used to run a first round of verification on the validation set, generating a prediction set whose results are inspected.
(c) Steps (a) and (b) are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
(4) Based on the converged MT-HRNet semantic segmentation model, predict on the test set to obtain the visualized ground feature element extraction result. Specifically: the MT-HRNet model extracts ground feature elements from the test set, and the extraction metrics MPA and MIoU are saved together with a visualization of the segmentation results for the whole remote sensing image data set.
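The two reported metrics can be computed from a pixel-level confusion matrix; the following is a minimal sketch, assuming MPA denotes mean per-class pixel accuracy and MIoU mean intersection over union, with hypothetical prediction and label arrays.

```python
# Hedged sketch: MPA and MIoU computed from a pixel-level confusion matrix.
import numpy as np

def confusion_matrix(pred, label, num_classes):
    mask = (label >= 0) & (label < num_classes)
    return np.bincount(num_classes * label[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mpa_miou(cm):
    diag = np.diag(cm).astype(float)
    mpa = (diag / np.maximum(cm.sum(axis=1), 1)).mean()   # mean per-class accuracy
    miou = (diag / np.maximum(cm.sum(axis=1) + cm.sum(axis=0) - diag, 1)).mean()
    return mpa, miou

pred = np.random.randint(0, 4, (256, 256))   # hypothetical prediction map
label = np.random.randint(0, 4, (256, 256))  # hypothetical ground truth
print(mpa_miou(confusion_matrix(pred, label, 4)))
```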
Compared with the prior art, the invention has the advantages that:
1. and the high-resolution network HRNet is used for extracting ground feature elements of the remote sensing image, and the characterization of the high-resolution feature map is kept consistent in the whole training process, and different resolution information is fused repeatedly. The feature extraction capability of the network is improved, and the feature element extraction result is more accurate.
2. Global adaptive upsampling is incorporated to unify the different scale resolutions and output feature maps at the same size as the high-resolution branch, effectively maintaining robustness to spatial changes in the input and learning the distinctive characteristics within the feature maps, yielding finer spatial information. In addition, a strong multi-scale information fusion strategy overcomes the insufficient fusion of feature maps at different resolutions, retains more spatial position information and high-level semantic information, and improves the model's segmentation accuracy.
3. A triple attention mechanism added in the last Stage of the network makes full use of cross-channel interaction information, helps channel information and spatial detail information propagate effectively through the network, and obtains a refined tensor of the same shape with more detailed features. This avoids losing the texture information of fine strip-shaped targets and improves the ground feature element extraction results.
Drawings
Fig. 1 is an overall flow chart of the present invention.
FIG. 2 is the basic structure of the Stage and Transition modules.
Fig. 3 is a network configuration diagram of GSAU.
Fig. 4 is a network configuration diagram of the MSSFM.
Fig. 5 is a network configuration diagram of the triple attention mechanism module TAM.
FIG. 6 is a visualization of the segmentation results of HRNet and MT-HRNet.
Detailed Description
To help those skilled in the art better understand the solution of the embodiments of the present invention, the embodiments are described in further detail below with reference to the accompanying drawings.
The invention comprises the following steps:
1. Remote sensing data preprocessing and data set partitioning. The remote sensing data labels are adjusted to the VOC format; this can be done with a Python script. The input pictures are JPGs and the labels are PNGs. The value of each label pixel is the class to which that pixel belongs, e.g. background pixels have value 0, target class 1 pixels have value 1, target class 2 pixels have value 2, target class 3 pixels have value 3, and so on.
2. Ground feature elements are extracted from the remote sensing image sample data set to construct the semantic segmentation model, specifically: build the ground feature element learning model, train it, test it, and save the best model.
Building the ground feature element learning model MT-HRNet: a 256×256 three-channel picture is input and first downsampled twice, each time through a convolution layer (with BN and ReLU) of kernel size 3×3, stride 2, and padding 1; at this point the feature map has been downsampled by a factor of four in total. The channel count is then adjusted by a Layer1 module without changing the feature map size; Layer1 is built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants. The network then passes through a series of Transition and Stage structures. Each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and its channel count C doubled. Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2 respectively, and padding 1, generating two branches at different resolutions, namely at four and eight times downsampling relative to the original image. Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 likewise adds a 32-times-downsampled scale. The main work of the Stage structures in the network is to fuse information at different scales. Stage2 first applies four BasicBlock modules (BasicBlock is the residual block used in the shallower ResNet variants), and the output of each branch is the fusion of all branches. In Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module, retaining more remote sensing image spatial information and high-level semantic information and ensuring the predicted image better matches the layout of ground feature elements in the label image. The Stage3 branches are handled similarly. The feature maps produced by each stack of four BasicBlock modules in Stage4 are all processed by the triple attention mechanism (TAM) module. The three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively: the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, and the bottom branch establishes the dependency between H and W. The three branch weights are finally aggregated by simple averaging. The Z-Pool layer reduces the channel dimension of a tensor to two by processing the feature map with average pooling and max pooling; this preserves the rich features of the actual tensor as far as possible while reducing the depth of the network, keeping the module lightweight.
TAM processing yields a refined tensor with the same shape as the TAM input but with more detailed features, improving the segmentation accuracy of slender categories such as 'road' and 'water body'. The output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module.
Ground feature element model training: the initial model learning rate is set to 0.004, with a minimum learning rate of 0.00004. The optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule. The training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed.
Ground feature element model testing: the best model from the training rounds is retained and used to run a first round of testing on the test set, generating a prediction set whose results are inspected.
Saving the best model: the training and testing processes are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
3. Ground feature elements are extracted from the remote sensing image data based on the MT-HRNet semantic segmentation model, specifically: extract the ground feature elements and save the segmentation results.
Ground feature element extraction: all remote sensing image data sets are placed into the test set, ground feature elements are extracted with the MT-HRNet semantic segmentation model, and the extraction results, i.e. MPA, MIoU, and the visualized ground feature element data, are produced.
Saving the segmentation results: the extracted ground feature element data are physically stored using a centralized high-performance computing facility or a distributed computing environment and storage architecture.
Table 1: Experimental results
It should be noted that the method of the embodiments of the invention is suitable for extracting ground feature elements from remote sensing images of different resolutions.
The embodiments of the invention have been described in detail above with reference to the accompanying drawings, solely to aid understanding of the invention; since those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the invention, this description should not be construed as limiting the invention.

Claims (5)

1. An HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images, characterized in that remote sensing data are first acquired and divided into data sets to form a ground feature element extraction sample data set; based on the sample data set, a multi-scale strong fusion semantic segmentation network MT-HRNet incorporating a triple attention mechanism is constructed, the MT-HRNet semantic segmentation model is trained on the training set, and its parameters are optimized to obtain a preliminary ground feature element extraction result; a segmentation loss is calculated from the preliminary extraction result and the ground-truth labels of the remote sensing images; the segmentation loss guides the MT-HRNet feature extraction network to extract features until the MT-HRNet model converges; and the converged MT-HRNet semantic segmentation model predicts on the test set to obtain the ground feature element extraction result.
2. The HRNet-based remote sensing image ground feature extraction multi-scale strong fusion semantic segmentation method according to claim 1, wherein the semantic segmentation method comprises the following specific steps:
(1) Preprocessing remote sensing data and dividing a data set; preprocessing remote sensing data, processing the remote sensing data into a training file in a VOC format, and generating a remote sensing image sample data set;
(2) Performing feature element extraction on the remote sensing image sample data set obtained in the step (1) to construct a semantic segmentation model MT-HRNet, wherein the method specifically comprises the following steps of:
(a) Constructing the remote sensing image ground feature extraction network from parallel sub-networks at different resolutions; a 256×256 three-channel picture is input and first downsampled twice using convolution layers of kernel size 3×3, stride 2, and padding 1, for a total downsampling factor of four; the channel count is adjusted by a Layer1 module without changing the feature map size, Layer1 being built from repeatedly stacked Bottleneck blocks, the residual block used in the deeper ResNet variants, followed by the Transition and Stage structures; each time the feature map passes through a Transition structure, a new branch at a different resolution is added alongside the existing branches, with the branch feature map's H and W halved and the channel count C doubled; Layer1 is processed by the Transition1 module, i.e. convolution layers + BN + ReLU with kernel size 3×3, strides 1 and 2, and padding 1, generating two branches at different resolutions, namely at four and eight times downsampling relative to the original image; Transition2 adds a 16-times-downsampled scale to the existing two branches; Transition3 adds a 32-times-downsampled scale; information at different scales is fused through the Stage structures in the network; each branch within a Stage stacks four BasicBlock modules, BasicBlock being the residual block used in the shallower ResNet variants; high-resolution representations are maintained throughout, and repeatedly fusing information at different scales improves the feature extraction capability on remote sensing images, making the ground feature extraction results more accurate;
(b) Constructing the global adaptive upsampling module GSAU to unify the different scale resolutions; the GSAU module applies a 3×3 convolution to the low-resolution features; global context information obtained from the high-resolution feature map passes through a 1×1 convolution, a batch normalization (BN) operation, and a ReLU activation, and is then multiplied with the low-resolution feature map, completing the mapping of high-resolution spatial information onto the low-resolution category localization; finally, the high-resolution feature map is added to the weighted low-resolution feature map to obtain a preliminarily fused feature map; the high-resolution feature map of the upper branch is downsampled and then added to the feature map just produced, completing the global adaptive upsampling operation; the GSAU module acts only in feature fusion steps that involve an upsampling operation;
(c) Constructing the multi-scale strong fusion module MSSFM, taking the GSAU output feature map as the MSSFM input feature map; the feature map's spatial information is compressed by global average pooling and global max pooling to output suitable one-dimensional channel weight parameters, which are processed through a 1×1 convolution and a BN layer, and the final channel attention weights are obtained through another 1×1 convolution and a Sigmoid; the channel weights are finally multiplied with the processed high-resolution feature map, which is then added back to the output;
(d) The feature maps produced by each stack of four BasicBlock modules in Stage4 are processed by the triple attention mechanism TAM module; the three parallel branches of the TAM module capture the dependencies between the (C, H), (C, W), and (H, W) dimensions of the input tensor, respectively; the top branch computes attention weights across channel dimension C and spatial dimension W, the middle branch computes attention weights across channel dimension C and spatial dimension H, the bottom branch establishes the dependency between H and W, and the three branch weights are aggregated by averaging; the Z-Pool layer reduces the channel dimension of a tensor to two, processing the feature map with average pooling and max pooling; the TAM module's output has the same shape as its input, and the output of each TAM module again undergoes strong multi-scale feature fusion through the MSSFM module;
(3) Training on a training set based on the constructed MT-HRNet semantic segmentation network, and optimizing model parameters until the MT-HRNet network converges;
(4) Based on the converged MT-HRNet semantic segmentation model, predicting on the test set to obtain the visualized ground feature element extraction result; specifically: the MT-HRNet model extracts ground feature elements from the test set, and the extraction metrics MPA and MIoU are saved together with a visualization of the segmentation results for the whole remote sensing image data set.
3. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that in the remote sensing data preprocessing and data set partitioning, the remote sensing data labels are adjusted to the VOC format via a Python script; the input pictures are JPGs and the labels are PNGs; the value of each label pixel is the class to which that pixel belongs.
4. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that the feature maps processed by the multi-scale strong fusion module MSSFM retain more spatial information and high-level semantic information, ensuring the predicted image matches the layout of ground feature elements in the label image; in Stage2, the output of the four-times-downsampled branch is obtained by processing the eight-times-downsampled branch through the GSAU module and then, together with the four-times-downsampled branch, through the MSSFM module.
5. The HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground feature elements from remote sensing images according to claim 2, characterized in that training on the training set to optimize the model parameters specifically comprises (a) ground feature element model training: the initial model learning rate is set to 0.004 with a minimum learning rate of 0.00004; the optimizer is SGD with its momentum parameter set to 0.9, and the learning rate is decayed with a cosine schedule; the training batch size (batch_size) is set to 16 and the number of iterations (epoch) to 200, and the training metrics MPA and MIoU are observed;
(b) Model verification: the best model from the training rounds in step (a) is retained and used to run a first round of verification on the validation set, generating a prediction set whose results are inspected;
(c) Steps (a) and (b) are repeated until the model accuracy reaches a usable level, and the best model is retained, completing the construction of the remote sensing image ground feature element extraction model.
CN202310337060.1A 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image Pending CN116486075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310337060.1A CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310337060.1A CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Publications (1)

Publication Number Publication Date
CN116486075A 2023-07-25

Family

ID=87214720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310337060.1A Pending CN116486075A (en) 2023-03-31 2023-03-31 HRNet-based multi-scale strong fusion semantic segmentation method for extracting ground features of remote sensing image

Country Status (1)

Country Link
CN (1) CN116486075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350171A (en) * 2023-12-04 2024-01-05 山东省计算中心(国家超级计算济南中心) Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model
CN117350171B (en) * 2023-12-04 2024-03-12 山东省计算中心(国家超级计算济南中心) Mesoscale vortex three-dimensional subsurface structure inversion method and system based on double-flow model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination