CN113505792B - Multi-scale semantic segmentation method and model for unbalanced remote sensing image - Google Patents

Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Info

Publication number
CN113505792B
Authority
CN
China
Prior art keywords
layer
segmentation
network
semantic segmentation
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110739174.XA
Other languages
Chinese (zh)
Other versions
CN113505792A (en
Inventor
聂婕
王成龙
魏志强
时津津
叶敏
陈昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202110739174.XA
Publication of CN113505792A
Application granted
Publication of CN113505792B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale semantic segmentation method and a multi-scale semantic segmentation model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that can learn fine-grained local features and retain small-class information while also learning global context semantic features and retaining large-scale information. The overall network architecture is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The maps are then combined by image post-fusion at the same level using a Bayesian fusion method, which fuses the multi-scale segmentation information and lets each map complement information missing from the others. The method adopts an optimization algorithm that separates pixels of different categories and aggregates pixels of the same category, so that the semantic segmentation network model achieves balanced segmentation on class-imbalanced data.

Description

Multi-scale semantic segmentation method and model for unbalanced remote sensing image
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a multi-scale semantic segmentation method and model for unbalanced remote sensing images.
Background
With the development of earth observation and image acquisition technology, remote sensing images provide massive research data for earth observation and discovery. Content analysis of remote sensing images by image processing and artificial intelligence is an effective way to fully exploit remote sensing data. The main approaches include scene classification, object recognition and semantic segmentation. Semantic segmentation is one of the key technologies for content analysis of remote sensing images: it segments the targets and regions contained in an image by inferring the semantic category of each pixel.
Current image semantic segmentation methods are mainly based on deep learning; classical deep learning segmentation networks include fully convolutional networks (FCN), SegNet and U-Net. FCN was the first to build the network from convolution layers throughout, realizing an end-to-end segmentation network that accepts input images of any size and outputs segmentation maps of the same size. However, it does not fully combine context information and the correlation between pixels, so its segmentation accuracy is insufficient. U-Net concatenates features along the channel dimension, which makes it suitable for segmenting multi-scale and large images, but it demands relatively high computing power and runs relatively slowly. SegNet reuses in the decoding stage the max-pooling indices stored in the encoding stage, which improves memory utilization and segmentation efficiency; however, when low-resolution feature maps are unpooled, the information of adjacent pixels is ignored and precision is lost. The pyramid scene parsing network (PSPNet) learns multi-level features with a pyramid pooling module and can thereby make good use of global context information, but it does not fully utilize the information of the entire scene.
Because remote sensing images differ significantly from ordinary images in resolution, spatial structure and semantics, traditional methods and general-purpose neural networks struggle to segment them efficiently. Semantic segmentation of remote sensing images still faces the following challenges:
First, existing natural-image segmentation methods deal with relatively balanced semantic category distributions and do not consider that, in a remote sensing image, a single category may occupy a large proportion of the image. Because of how real physical entities are distributed on the earth's surface, the foreground and background of remote sensing images are unbalanced, and objects of different categories differ greatly in scale. Second, deep learning segmentation models designed for natural images are insensitive to changes in object scale, and pixel-level precision is lost when they are applied directly to remote sensing images. Compared with categories such as land and lakes, categories such as vehicles occupy a negligible area in a remote sensing image, so object scale varies widely. Previous semantic segmentation methods are therefore not suited to direct application to remote sensing images, and a segmentation algorithm must be designed around their characteristics.
Because remote sensing images are acquired in diverse ways and the data differ from natural images, their semantic segmentation cannot be solved by any single existing approach. Objects in remote sensing images vary in scale: large-scale objects dominate the segmentation and suppress the learning of small-scale objects, which makes small-class objects difficult to identify. In addition, because of the high resolution of remote sensing images, the information contained in an image is usually dense, which leads to an unbalanced distribution of image categories.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a multi-scale semantic segmentation method and model for unbalanced remote sensing images, which solve two technical problems: (1) the large scale differences between objects in remote sensing images, and (2) the unbalanced category distribution of remote sensing images. For the first problem, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images: it designs a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level, so that missing information is complemented and global context information is fully used; while retaining the local detail of the image, it overcomes the mutual influence of multi-scale objects and improves the robustness and accuracy of remote sensing image segmentation. For the second problem, the invention designs the algorithm from two aspects: 1) constructing an inter-class loss function that maximizes the class spacing between samples of different classes; and 2) constructing a class-weight-balanced loss function that addresses the imbalance between positive and negative samples of all classes.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
First, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images, which adopts a multi-level semantic segmentation network capable of learning fine-grained local features and retaining small-class information while also learning global context semantic features and retaining large-scale information. The overall architecture of the multi-level semantic segmentation network is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution. The maps are combined by image post-fusion at the same level using a Bayesian fusion method, which fuses the multi-scale segmentation information and complements missing information.
Further, the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;
the backbone network adopted by the multi-level semantic segmentation network model is the SegNet semantic segmentation network; the left side of the network is the encoder, which consists of five convolution-and-pooling stages, where each of the first two stages contains two convolution layers and each of the last three stages contains three convolution layers;
the right side of the network is the decoder, which consists of five upsampling-and-convolution stages: the first and fourth stages each contain one upsampling layer and two convolution layers, the second and third stages each contain one upsampling layer and three convolution layers, the fifth stage contains one upsampling layer and two convolution layers, and a Softmax layer is added at the end.
Further, the stages of the encoder correspond one-to-one to the stages of the decoder; each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling, so that the classification features of the encoding stage are reproduced and dense feature maps are generated; finally the feature maps are restored to the size of the original image and classified by the Softmax layer to produce the final segmentation map.
Further, the three levels of the multi-level semantic segmentation network output the segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are combined by image post-fusion, the segmentation map output by any one level is selected as the prior $O_i$, $i \in \{1,2,3\}$, and any other segmentation map $O_j$, $j \neq i$, $j \in \{1,2,3\}$, is used to calculate the likelihood, so the posterior probability is calculated as:
where $n$ represents the class of the current pixel and $m$ represents the number of classes; $F_{ni}$ and $B_{ni}$ respectively represent the foreground region and the background region for class $n$; $O_{ni}$, $i \in \{1,2,3\}$, serves as the prior and $O_{nj}$, $j \neq i$, $j \in \{1,2,3\}$, is used to calculate the likelihood; in each region, the likelihood is calculated by comparing $O_{ni}$ and $O_{nj}$ over the foreground and background of each class.
Preferably, when the segmentation maps output by the three levels are combined by image post-fusion, the segmentation maps output by the first two levels are fused first; the result is then fused with the segmentation map output by the third level, specifically:
first, the segmentation map output by the first level is used as the prior, the likelihood is calculated from the segmentation map output by the second level, and the information of the two maps is merged according to the Bayesian formula;
then the two roles are exchanged: the segmentation map output by the second level is used as the prior, the likelihood is calculated from the segmentation map output by the first level, and the two are integrated according to the Bayesian formula;
finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same way to obtain the final integrated segmentation map.
The invention further provides a multi-scale semantic segmentation method for unbalanced remote sensing images, comprising the following steps:
1. Improving the semantic segmentation network architecture: constructing a multi-level semantic segmentation network model in which each level outputs a segmentation map at a different scale, and fusing the multi-scale segmentation information using a Bayesian fusion method;
2. Equalizing the loss function: adopting an optimization algorithm that separates pixels of different categories and aggregates pixels of the same category, specifically:
1) constructing a class-weight-balanced loss function based on the focal loss (Focal loss), to address the imbalance between positive and negative samples of all classes;
2) introducing the hinge loss (Hinge loss) to construct an inter-class loss function that maximizes the class spacing between samples of different classes;
3) equalizing the loss function: constructing an overall loss function.
Further, the class-weight-balanced loss function is
where $p_t$ is the probability that a sample of class $t$ is positive, $M$ is the number of sample classes, $t$ denotes a class, $\gamma$ is a hyperparameter, and $-\log(p_t)$ is the original cross-entropy loss; $\lambda$ is an adjustable hyperparameter with $0<\lambda<1$, whose purpose is to make the classification accuracy of different samples more adjustable: it reduces the penalty weight of complex samples and increases the penalty contribution of easy samples.
Further, the inter-class loss function is
$\mathrm{Hinge} = \max(0,\ 1 + \max w_{wrong} - w_{correct})$ (11)
where $w_{wrong}$ is the number of misclassified samples and $w_{correct}$ is the number of correctly classified samples; taking the maximum of $w_{wrong}$ means that the class with the most misclassified samples is selected.
Further, the overall loss function is as follows
where $\beta$ is a hyperparameter controlling the contribution of the Hinge loss penalty term, $\beta > 0$.
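As a rough illustration, formula (11) and the role of $\beta$ can be sketched in a few lines of Python. The function names are hypothetical, and the additive combination in `overall_loss` is an assumption made for this sketch: the text states only that $\beta > 0$ controls the contribution of the Hinge penalty term.

```python
def multiclass_hinge(w_correct, w_wrong):
    """Formula (11): max(0, 1 + max(w_wrong) - w_correct)."""
    return max(0.0, 1.0 + max(w_wrong) - w_correct)


def overall_loss(balanced_loss, hinge_loss, beta=0.5):
    """Assumed combination: class-balanced term plus beta times the hinge term."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return balanced_loss + beta * hinge_loss


# The value for the correct class beats the largest wrong-class value by less than 1.
print(multiclass_hinge(3.0, [1.0, 2.5]))  # 0.5
# The gap is larger than 1, so the penalty vanishes.
print(multiclass_hinge(5.0, [1.0, 2.5]))  # 0.0
```

The hinge term is zero once the value for the correct class exceeds every wrong-class value by at least 1, so it only penalizes pixels that sit close to a class boundary.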
Compared with the prior art, the invention has the advantages that:
(1) The invention provides a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level, so that missing information is complemented. The network can learn fine-grained local features and retain small-class information, and can also learn the semantic features of the global context, making full use of global context information and retaining large-scale information. While retaining the local detail of the image, it overcomes the mutual influence of multi-scale objects and improves the robustness and accuracy of remote sensing image segmentation.
(2) The invention also provides a Bayesian multi-scale post-fusion semantic segmentation method. To address the scale dependency of remote sensing images, the results at different scales are modeled as prior and likelihood respectively, and the Bayesian principle is used to make the optimal decision; verification shows that the method improves segmentation accuracy.
The Bayesian fusion method identifies the semantic information of objects better: the outline and category distribution of each object are clearer and the boundaries between objects of different categories are more distinct, so the method also improves the quality of the segmentation maps output by the network.
(3) The invention designs a balanced loss function for semantic segmentation of unbalanced remote sensing images. To address the unbalanced semantic distribution of remote sensing images, in particular the foreground-background imbalance caused by spatial differences, the loss weights of hard training samples are balanced on the basis of the focal loss idea: the weight of easy-to-learn categories is reduced and the weight of hard-to-learn categories is increased, which improves training stability. In addition, the Hinge loss is introduced to enlarge the distance between classes and make the boundaries between samples of different classes more distinct, which improves the accuracy of the segmentation result and the clarity of local classification.
With the balanced loss algorithm, the semantic segmentation network model achieves balanced segmentation on class-imbalanced data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a multi-level semantic segmentation network architecture of the present invention;
FIG. 2 is the encoder-decoder structure of the first-level segmentation network of the present invention;
FIG. 3 is the encoder-decoder structure of the second-level segmentation network of the present invention;
FIG. 4 is the encoder-decoder structure of the third-level segmentation network of the present invention;
FIG. 5 is a schematic diagram of the Bayesian image fusion algorithm of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Example 1
This embodiment designs a multi-level deep neural network model; specifically, it provides a multi-scale semantic segmentation model for unbalanced remote sensing images that adopts a multi-level semantic segmentation network capable of learning fine-grained local features and retaining small-class information while also learning global context semantic features and retaining large-scale information. The network architecture is shown in FIG. 1. The overall architecture is divided into three levels: the first level corresponds to Level 1 in FIG. 1 and uses data at the original resolution; the second level corresponds to Level 2 and uses data downsampled by a factor of 2; the third level corresponds to Level 3 and uses data downsampled by a factor of 4. Downsampling the remote sensing image twice yields more local and global information.
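The three input resolutions can be illustrated with a small pure-Python sketch. The text does not say how the downsampling is performed, so 2x2 block averaging is an assumption here, and the function names and the toy 8x8 image are hypothetical.

```python
def downsample2x(image):
    """Halve height and width by averaging each non-overlapping 2x2 block."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def build_levels(image):
    """Inputs for Level 1, Level 2 and Level 3 (full, 1/2 and 1/4 resolution)."""
    level1 = image
    level2 = downsample2x(level1)
    level3 = downsample2x(level2)
    return level1, level2, level3


# Toy 8x8 single-band image with arbitrary values.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
l1, l2, l3 = build_levels(image)
print(len(l1), len(l2), len(l3))  # 8 4 2
```

Each level then feeds its own encoder-decoder, so the coarser levels see a wider ground area per pixel while Level 1 keeps the fine detail.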
To handle multi-scale information, each level uses a different network structure to extract features at a different scale, retaining as much visual information as possible during feature extraction, and outputs a segmentation map at a different resolution. The maps are then combined by image post-fusion at the same level using a Bayesian fusion method, which fuses the multi-scale segmentation information, complements missing information, and enables accurate segmentation of remote sensing images.
Compared with other classical deep neural network segmentation models, the multi-level network model preserves local detail well and preserves global semantic information better.
1. The multi-level network architecture of the present embodiment is described in detail below:
The backbone network adopted in this embodiment is the SegNet semantic segmentation network, as shown in FIG. 2, FIG. 3 and FIG. 4, which correspond to Level 1, Level 2 and Level 3 in FIG. 1, respectively. The left side of the network is the encoder, which consists of five convolution-and-pooling stages; each of the first two stages contains two convolution layers and each of the last three stages contains three convolution layers. The convolution layers extract image features, and the pooling layer then reduces the size of the feature map and enlarges the receptive field. The pooling layer uses max pooling to achieve invariance to small spatial shifts, at the cost of positioning accuracy and spatial detail.
The right side of the network is the decoder, which consists of five upsampling-and-convolution stages: the first and fourth stages each contain one upsampling layer and two convolution layers, the second and third stages each contain one upsampling layer and three convolution layers, the fifth stage contains one upsampling layer and two convolution layers, and a Softmax layer is added at the end.
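The layer counts above can be written down as a small configuration, together with the feature-map sizes they imply at each level. This is only a sketch: the 2x2 stride-2 pooling, the size-preserving convolutions and the 512-pixel tile size are assumptions, and all names are hypothetical.

```python
# Convolution layers per stage, as listed in the text.
ENCODER_CONVS = [2, 2, 3, 3, 3]  # each stage ends with one max-pooling layer
DECODER_CONVS = [2, 3, 3, 2, 2]  # each stage starts with one upsampling layer


def encoder_sizes(input_size, stages=len(ENCODER_CONVS)):
    """Spatial size after each pooling stage (assumes 2x2, stride-2 pooling)."""
    sizes = []
    size = input_size
    for _ in range(stages):
        size //= 2
        sizes.append(size)
    return sizes


def level_plan(full_size):
    """Input size and encoder feature-map sizes for Levels 1 to 3."""
    plan = {}
    for level, factor in ((1, 1), (2, 2), (3, 4)):
        size = full_size // factor
        plan[level] = {"input": size, "encoder": encoder_sizes(size)}
    return plan


plan = level_plan(512)  # 512 is a hypothetical tile size
for level, info in plan.items():
    print(level, info["input"], info["encoder"])
```

Under these assumptions the deepest Level 3 feature map of a 512-pixel tile is only 4 pixels wide, which shows how quickly spatial detail shrinks in the coarse branch.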
The stages of the encoder correspond one-to-one to the stages of the decoder, similar to the U-shaped structure of U-Net. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level and upsamples the feature map using the indices retained during max pooling. The upsampling layers reproduce the classification features of the encoding stage and generate dense feature maps; finally the feature maps are restored to the size of the original image and classified by the Softmax layer to generate the final segmentation map. SegNet has few trainable parameters and a small memory footprint while maintaining segmentation accuracy, which makes it suitable for semantic segmentation of high-resolution remote sensing images.
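The index-based upsampling described above can be sketched on a single 4x4 feature map. This is a minimal pure-Python illustration with hypothetical function names, not the network's actual implementation; a 2x2, stride-2 pooling window is assumed.

```python
def max_pool_with_indices(fmap):
    """2x2, stride-2 max pooling that also records where each maximum came from."""
    h, w = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, h - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, w - 1, 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position; other cells stay zero."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, indices = max_pool_with_indices(fmap)
sparse = max_unpool(pooled, indices, 4, 4)
print(pooled)  # [[4.0, 5.0], [6.0, 7.0]]
print(sparse)
```

The unpooled map is sparse; in the decoder the convolution layers that follow each upsampling layer are what turn it into a dense feature map.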
2. The following describes the image post-fusion method of the present embodiment in detail:
The multi-scale network outputs segmentation maps at different resolutions, and the segmentation quality differs with resolution. This patent provides a multi-scale post-fusion semantic segmentation method based on the Bayesian principle. In the saliency detection task, the posterior probability is calculated by integrating saliency maps:
where $S_1$ and $S_2$ are saliency maps, one of which serves as the prior probability $S_i$ ($i \in \{1,2\}$) while the other, $S_j$ ($j \neq i$, $j \in \{1,2\}$), is used to calculate the likelihood; $F_i$ and $B_i$ represent the foreground and background regions, respectively. The likelihood in each region is calculated by the following formula:
where $N_{F_i}$ represents the number of pixels in the foreground and $N_{b_{F_i}(S_j(z))}$ is the number of pixels whose color features fall into the foreground bin $b_{F_i}(S_j(z))$ containing the feature $S_j(z)$; $N_{B_i}$ represents the number of pixels in the background and $N_{b_{B_i}(S_j(z))}$ is the number of pixels whose color features fall into the background bin $b_{B_i}(S_j(z))$ containing the feature $S_j(z)$.
The three levels of the multi-level semantic segmentation network output the segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are combined by image post-fusion, the segmentation map output by any one level is selected as the prior $O_i$ ($i \in \{1,2,3\}$), and any other segmentation map $O_j$ ($j \neq i$, $j \in \{1,2,3\}$) is used to calculate the likelihood, so the posterior probability is calculated as:
where $n$ represents the class of the current pixel and $m$ represents the number of classes; $F_{ni}$ and $B_{ni}$ respectively represent the foreground region and the background region for class $n$; $O_{ni}$ ($i \in \{1,2,3\}$) serves as the prior and $O_{nj}$ ($j \neq i$, $j \in \{1,2,3\}$) is used to calculate the likelihood; in each region, the likelihood is calculated by comparing $O_{ni}$ and $O_{nj}$ over the foreground and background of each class:
where $N_{F_{ni}}$ represents the number of foreground pixels of the $n$-th class and $N_{b_{F_{ni}}(O_{nj}(z))}$ is the number of pixels in the foreground region whose color features fall into the bin containing the feature $O_{nj}(z)$. Using $O_{nj}$ as the prior, the posterior probability is calculated.
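For orientation, Bayesian integration of two maps is commonly written in the saliency-detection literature in the form below. This is a sketch under that assumption, not a reproduction of the patent's own formulas. The last line simply substitutes $O_{ni}$, $O_{nj}$, $F_{ni}$ and $B_{ni}$ for the saliency symbols, which is likewise an assumption; any dependence on the number of classes $m$ is not modeled here.

```latex
% Assumed standard form: S_i is the prior, S_j supplies the likelihood, z is a pixel.
\begin{align}
p(F_i \mid S_j(z)) &= \frac{S_i(z)\, p(S_j(z) \mid F_i)}
  {S_i(z)\, p(S_j(z) \mid F_i) + \bigl(1 - S_i(z)\bigr)\, p(S_j(z) \mid B_i)} \\
p(S_j(z) \mid F_i) &= \frac{N_{b_{F_i}(S_j(z))}}{N_{F_i}}, \qquad
p(S_j(z) \mid B_i) = \frac{N_{b_{B_i}(S_j(z))}}{N_{B_i}} \\
% Assumed class-wise analogue for class n (symbol substitution only).
p(F_{ni} \mid O_{nj}(z)) &= \frac{O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni})}
  {O_{ni}(z)\, p(O_{nj}(z) \mid F_{ni}) + \bigl(1 - O_{ni}(z)\bigr)\, p(O_{nj}(z) \mid B_{ni})}
\end{align}
```

In this form the prior map weights the foreground likelihood and its complement weights the background likelihood, so a pixel is kept as foreground only when both maps support it.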
As a preferred embodiment, when the segmentation maps output by the three levels are combined by image post-fusion, the segmentation maps output by the first two levels are fused first; the result is then fused with the segmentation map output by the third level, as shown in formulas (6) and (7), respectively.
$O_{n4}(z) = O_B(O_{n1}(z), O_{n2}(z)) = p(F_{n1} \mid O_{n2}(z)) + p(F_{n2} \mid O_{n1}(z))$ (6)
$O(z) = O_B(O_{n3}(z), O_{n4}(z)) = p(F_{n3} \mid O_{n4}(z)) + p(F_{n4} \mid O_{n3}(z))$ (7)
Referring to FIG. 5, the procedure is as follows:
first, the segmentation map output by the first level is used as the prior, the likelihood is calculated from the segmentation map output by the second level, and the information of the two maps is merged according to the Bayesian formula;
then the two roles are exchanged: the segmentation map output by the second level is used as the prior, the likelihood is calculated from the segmentation map output by the first level, and the two are integrated according to the Bayesian formula;
finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same way to obtain the final integrated segmentation map.
With Bayesian fusion, each output segmentation map is used in turn as the prior, so the useful information in segmentation maps of different resolutions is fused and segmentation accuracy improves.
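The two-step fusion of formulas (6) and (7) can be sketched for a single class as follows. Several points are assumptions made for this sketch and are not taken from the patent: the maps are flattened per-class probability lists already resampled to the same size, the foreground is where the prior is at least 0.5, the likelihood is a 4-bin histogram over the other map's values (not over color features), and the fused map is halved before it is reused as a prior. All names are hypothetical.

```python
def bayes_posterior(prior, other, bins=4, threshold=0.5):
    """Per-pixel posterior p(F_i | O_j(z)); `prior` plays O_i, `other` plays O_j."""
    def bin_of(value):
        return min(int(value * bins), bins - 1)

    # Histograms of the other map's values inside the prior's foreground/background.
    fg_hist, bg_hist = [0] * bins, [0] * bins
    for p, o in zip(prior, other):
        (fg_hist if p >= threshold else bg_hist)[bin_of(o)] += 1
    n_fg, n_bg = sum(fg_hist), sum(bg_hist)

    posterior = []
    for p, o in zip(prior, other):
        b = bin_of(o)
        like_fg = fg_hist[b] / n_fg if n_fg else 0.0
        like_bg = bg_hist[b] / n_bg if n_bg else 0.0
        denom = p * like_fg + (1.0 - p) * like_bg
        posterior.append(p * like_fg / denom if denom > 0 else p)
    return posterior


def bayes_fuse(map_a, map_b):
    """Symmetric fusion as in formula (6): each map is the prior once."""
    ab = bayes_posterior(map_a, map_b)
    ba = bayes_posterior(map_b, map_a)
    return [x + y for x, y in zip(ab, ba)]


# Toy per-class probabilities for four pixels from Levels 1, 2 and 3.
o1 = [0.9, 0.8, 0.6, 0.1]
o2 = [0.7, 0.9, 0.2, 0.2]
o3 = [0.8, 0.6, 0.3, 0.1]

o4 = [v / 2.0 for v in bayes_fuse(o1, o2)]  # rescaling to [0, 1] is an assumption
final = bayes_fuse(o3, o4)                  # formula (7)
print(final)
```

In this toy input the third pixel is foreground according to Level 1 but background according to Level 2, and the fusion pulls it towards background, while pixels on which the levels agree keep a high score.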
Example 2
The embodiment provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:
1. Improved semantic segmentation network architecture
A multi-level semantic segmentation network model is constructed in which each level outputs a segmentation map at a different scale, and the multi-scale segmentation information is fused using a Bayesian fusion method.
The semantic segmentation network may adopt a classical semantic segmentation network, and as a preferred embodiment, the multi-level semantic segmentation network model may directly adopt the model described in embodiment 1, and specific reference may be made to the description of embodiment 1, which is not repeated herein.
2. Equalizing loss function
An optimization algorithm that separates pixels of different categories and aggregates pixels of the same category is adopted, specifically:
1) A class-weight-balanced loss function is constructed on the basis of the focal loss (Focal loss), to address the imbalance between positive and negative samples of all classes.
2) The hinge loss (Hinge loss) is introduced to construct an inter-class loss function that maximizes the class spacing between samples of different classes.
3) Equalizing the loss function: an overall loss function is constructed.
Each is described below:
1. class weight equalization distribution
In multi-class problems, the sample classes of a dataset are often unevenly distributed: negative samples far outnumber positive ones and most samples are easy to distinguish, which often makes training ineffective. This problem is particularly pronounced in the segmentation of remote sensing images. Focal loss was designed mainly to address the extreme imbalance between background and foreground in object detection. It adds a modulation factor to the cross-entropy loss (for which the prior art can be consulted directly), and the specific formula is as follows:
In this formula, p_t is the probability that a sample of class t is positive, M is the number of sample classes, t denotes a particular class, and -log(p_t) is the original cross-entropy loss; γ ≥ 0 is an adjustable hyperparameter, and (1 - p_t)^γ is the modulation factor. For easy-to-learn samples, p_t is close to 1 and the modulation factor tends toward zero. For difficult and misclassified samples, however, p_t is small and the modulation factor is correspondingly larger, which offsets the training inefficiency caused by the sample imbalance. This effectively addresses the training failure caused by class imbalance in remote sensing images. Therefore, for the multi-class problem of remote sensing image segmentation, the loss function is adjusted as follows:
where λ is an adjustable hyperparameter with 0 < λ < 1. To increase the adjustability of classification accuracy across samples, the penalty weight of complex samples in training is reduced further relative to the original formula (8), and the penalty weight of simple samples is increased, so that samples are balanced consistently across classes. For example, when p_t is large, confidence in that class of samples is high, and with λ = 1 the corresponding penalty weight is small, so its contribution to training decreases. Likewise, the smaller p_t is, the harder the sample is to classify and the more it counts as a complex sample; with λ = 1 its training weight is larger and its contribution to training is higher. Setting 0 < λ < 1 reduces the penalty weight of complex samples and increases the penalty contribution of easy samples, which further improves the classification accuracy of easy samples. A well-chosen λ therefore strikes a good balance between complex and easy samples and improves the classification accuracy over all samples.
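The behaviour of the modulation factor described above can be illustrated with a minimal Python sketch of the standard focal loss term, -(1 - p_t)^γ · log(p_t), for a single sample. The function name `focal_term`, the default `gamma=2.0`, and the sample probabilities are illustrative assumptions. The sketch covers only the unadjusted focal term and leaves out the λ adjustment.

```python
import math

def focal_term(p_t, gamma=2.0):
    """Standard focal loss term for one sample: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default, not a value taken from the patent.
    """
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must be in (0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be >= 0")
    modulation = (1.0 - p_t) ** gamma   # tends to 0 as p_t approaches 1
    cross_entropy = -math.log(p_t)      # the unmodulated cross-entropy term
    return modulation * cross_entropy

# Compare an easy sample (p_t near 1) with harder ones (smaller p_t).
for p_t in (0.95, 0.5, 0.1):
    ce = -math.log(p_t)
    print(f"p_t={p_t:.2f}  cross-entropy={ce:.4f}  focal={focal_term(p_t):.4f}")
```

With γ = 0 the term reduces to plain cross entropy; larger γ shrinks the contribution of easy samples more strongly while leaving hard samples almost untouched.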
2. Inter-class balancing
Because adjacent samples in remote sensing images differ only slightly, enlarging the difference between samples is also a problem to be solved. This embodiment therefore introduces the hinge loss (HL). HL is commonly used in the maximum-margin classification task of support vector machines (SVM): it reduces the intra-class distance and increases the inter-class distance to achieve the maximum margin. For binary classification, the formula is as follows:
Hinge = max(0, 1 - y · y_pre)    (10)
In the above formula, y is the label of the real sample and can only take the value -1 or 1; y_pre is the predicted value. When the predicted value has the correct sign and an absolute value of 1 or more, the sample lies at a distance of at least 1 from the boundary and incurs no penalty, because the sample is very likely to be classified correctly. Its multi-class form is as follows:
Hinge = max(0, 1 + max w_wrong - w_correct)    (11)
where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples. Taking the maximum over w_wrong selects the class with the most misclassified samples. When many samples are misclassified, formula (11) yields a larger penalty term, which drives training to continue; only when few samples are misclassified and many are classified correctly does the penalty term decrease automatically, so that training finishes sooner. This pulls the samples within a class together, enlarges the margin between classes, and improves classification accuracy.
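Formulas (10) and (11) can be sketched in a few lines of Python. The function names are hypothetical. The text describes w_wrong and w_correct as sample counts; the sketch simply takes one numeric value per class, assumed here to be a class score as in the conventional multi-class hinge loss, and applies formula (11) to it unchanged.

```python
def binary_hinge(y, y_pre):
    """Formula (10): max(0, 1 - y * y_pre), with y in {-1, +1}."""
    if y not in (-1, 1):
        raise ValueError("y must be -1 or 1")
    return max(0.0, 1.0 - y * y_pre)

def multiclass_hinge(w, correct):
    """Formula (11): max(0, 1 + max(w_wrong) - w_correct).

    w       -- one value per class (assumed here to be a class score)
    correct -- index of the true class
    """
    w_wrong = [v for k, v in enumerate(w) if k != correct]
    if not w_wrong:
        raise ValueError("need at least two classes")
    return max(0.0, 1.0 + max(w_wrong) - w[correct])

print(binary_hinge(1, 2.5))    # correct side, beyond the margin
print(binary_hinge(1, 0.3))    # correct side, but inside the margin
print(binary_hinge(-1, 0.3))   # wrong side of the boundary
print(multiclass_hinge([0.2, 3.0, 0.5], correct=1))  # true class wins by more than 1
print(multiclass_hinge([2.0, 1.5, 0.5], correct=1))  # a wrong class has the largest value
```

The loss is zero only when the true class beats every other class by a margin of at least 1, which is what pushes the classes apart.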
3. Balancing algorithm
Finally, to address class imbalance, increase the margin between samples of different classes, and improve segmentation accuracy, the overall loss function of the invention is as follows:
where β is a hyperparameter controlling the contribution of the hinge loss penalty term, with β > 0. Finally, the model parameters are optimized with the classical gradient descent method to obtain the optimal parameters, after which the samples are tested.
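A minimal Python sketch of how the two terms might be combined and then minimized. It assumes the overall loss is the class-weight balanced loss plus β times the hinge term, which fits the description of β but is an assumption rather than the patent's stated formula. The names `overall_loss` and `gradient_descent`, the default `beta=0.5`, the learning rate, and the toy quadratic objective are all illustrative.

```python
def overall_loss(class_weight_loss, hinge_loss, beta=0.5):
    """Assumed form: class-weight loss + beta * hinge loss, with beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be > 0")
    return class_weight_loss + beta * hinge_loss

def gradient_descent(grad, theta, lr=0.1, steps=100):
    """Plain gradient descent: theta <- theta - lr * grad(theta)."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# beta scales how much the hinge penalty contributes to the total.
print(overall_loss(0.8, 1.5, beta=0.5))

# Toy stand-in objective f(theta) = (theta - 3)^2, gradient 2 * (theta - 3).
theta = gradient_descent(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(round(theta, 6))
```

In a real network the gradient would come from backpropagation through the segmentation model rather than from a closed-form expression.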
In summary, the invention improves the semantic segmentation network structure and designs a loss function equalization method. A spatial multi-scale framework with parallel branches and post-fusion captures scale differences, preserving local detail while better retaining global semantic information. The loss function is designed around the unbalanced pixel-level sample distribution and, following the idea of focal loss, balances the loss weight of hard-to-train samples, which improves training stability. Hinge loss is also introduced to enlarge the distance between classes and improve the accuracy of semantic segmentation. Combining the equalized loss function with a classical semantic segmentation network improves the accuracy of the segmentation results and the clarity of local classification.
It should be understood that the above description does not limit the invention to the particular embodiments disclosed, and that those skilled in the art can make various changes, modifications, additions, and substitutions without departing from the spirit and scope of the invention.

Claims (5)

1. A method for constructing a multi-scale semantic segmentation model for unbalanced remote sensing images, characterized in that a multi-level semantic segmentation network is adopted which can learn fine-grained local features and retain small-class information, and can also learn global context semantic features and retain large-scale information; the overall architecture of the multi-level semantic segmentation network is divided into three layers, each layer uses a different network structure to extract features at a different scale and outputs a segmentation map at a different resolution, and a Bayesian fusion method is used to perform post-image fusion of the features at the same level, thereby fusing the multi-scale segmentation map information and complementing missing information; the three layers of the multi-level semantic segmentation network output segmentation maps O_1, O_2 and O_3; when the segmentation maps output by the three layers are fused after the images are produced, the segmentation map output by any one layer is selected as the prior O_i, i = {1, 2, 3}, and any other segmentation map O_j, j ≠ i, j = {1, 2, 3}, is used to calculate the likelihood, so the posterior probability is calculated as:
where n denotes the class of the current pixel and m denotes the number of classes; F_ni and B_ni denote the foreground region and the background region, respectively, when the class is n; O_ni serves as the prior, i = {1, 2, 3}, and O_nj is used to compute the likelihood, j ≠ i, j = {1, 2, 3}; in each region, the likelihood at the foreground and background of each class is calculated by comparing O_ni and O_nj;
when the segmentation maps output by the three layers are fused after the images are produced, the segmentation maps output by the first two layers are fused first; the segmentation map obtained from that fusion is then fused with the segmentation map output by the third layer, specifically:
first, the segmentation map output by the first-layer network is used as the prior, the segmentation map output by the second-layer network is used to calculate the likelihood, and the information of the two segmentation maps is then fused based on the Bayesian formula;
the two are then exchanged: the segmentation map output by the second layer is used as the prior, the segmentation map output by the first layer is used to calculate the likelihood, and the two are integrated based on the Bayesian formula;
finally, the fused segmentation map of the first two network layers and the segmentation map of the third network layer are fused in the same way to obtain the final integrated segmentation map.
2. The method for constructing the multi-scale semantic segmentation model for the unbalanced remote sensing image according to claim 1, wherein a first Level 1 of the multi-Level semantic segmentation network model adopts data with original resolution, a second Level 2 adopts data after downsampling by 2 times, and a third Level 3 adopts data after downsampling by 4 times;
the backbone network adopted by the multi-level semantic segmentation network model is a SegNet semantic segmentation network; the left side of the network is an encoder composed of 5 convolution-pooling stages, of which each of the first two contains two convolution layers and each of the last three contains three convolution layers;
the right side of the network is a decoder composed of 5 up-sampling and convolution stages; the first and fourth stages on the right each consist of an up-sampling layer and two convolution layers, the second and third stages each consist of an up-sampling layer and three convolution layers, the fifth stage consists of an up-sampling layer and two convolution layers, and a Softmax layer is added at the end.
3. The method for constructing the multi-scale semantic segmentation model for the unbalanced remote sensing image according to claim 2, wherein each layer of the encoding stage corresponds to each layer of the decoding stage one by one; each up-sampling layer of the decoder corresponds to the maximum pooling layer of the same-level encoder, the up-sampling layer of the decoder uses the index reserved in the maximum pooling process to up-sample the feature map, so that the features of the image classification in the encoding stage are reproduced, dense feature maps are generated, and finally the feature maps are restored to the same size as the original image, and classified by the softmax layer, namely the final segmentation map is generated.
4. The multi-scale semantic segmentation method for the unbalanced remote sensing image is characterized by comprising the following steps of:
1. improving semantic segmentation network architecture: constructing a multi-level semantic segmentation network model, outputting different scale segmentation graphs by each layer of network, and fusing multi-scale segmentation image information by adopting a Bayesian fusion method;
2. equalizing the loss function: the optimization algorithm which can separate pixels of different categories and aggregate pixels of the same category is adopted, and the optimization algorithm is specifically as follows:
1) Constructing class weight balanced distribution loss functions based on Focal loss functions, and solving the problem of unbalance of positive and negative samples of all classes; the class weight balanced distribution loss function is as follows:
where p_t is the probability that a sample of class t is positive, M is the number of sample classes, t denotes a particular class, γ is a hyperparameter, and -log(p_t) is the original cross-entropy loss; λ is an adjustable hyperparameter with 0 < λ < 1, which, to increase the adjustability of classification accuracy across samples, reduces the penalty weight of complex samples and increases the penalty contribution of easy samples;
2) A Hinge loss function Hinge loss is introduced to construct an inter-class loss function, so that class spacing maximization of samples of different classes is realized; the inter-class loss function is:
Hinge = max(0, 1 + max w_wrong - w_correct)    (11)
where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum over w_wrong selects the class with the most misclassified samples;
3) Equalizing the loss function: and constructing an overall loss function.
5. The unbalanced remote sensing image oriented multi-scale semantic segmentation method of claim 4, wherein the overall loss function is as follows:
where β is a hyperparameter controlling the contribution of the hinge loss penalty term, with β > 0.
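The Bayesian fusion step of claim 1 can be illustrated with a short Python sketch for a single class n. The posterior is ordinary Bayes' rule: prior × foreground likelihood, divided by that same quantity plus (1 - prior) × background likelihood. How the likelihoods are estimated is an assumption here: the prior map is split into foreground and background at its mean value, and the other map's values are histogrammed inside each region. The function name `bayes_fuse`, the bin count, and the sample values are hypothetical, and all inputs are assumed to lie in [0, 1].

```python
def bayes_fuse(prior, other, bins=4):
    """Fuse two per-class probability maps given as flat lists in [0, 1].

    prior -- map O_ni, used as the prior and to split foreground/background
    other -- map O_nj, used to estimate the likelihoods
    """
    if len(prior) != len(other) or not prior:
        raise ValueError("maps must be non-empty and the same size")
    thresh = sum(prior) / len(prior)          # assumed split: mean of the prior
    fg = [p > thresh for p in prior]          # foreground region F_ni
    n_fg = sum(fg)
    n_bg = len(prior) - n_fg

    def bin_of(v):
        return min(int(v * bins), bins - 1)

    hist_fg = [0] * bins
    hist_bg = [0] * bins
    for is_fg, v in zip(fg, other):
        (hist_fg if is_fg else hist_bg)[bin_of(v)] += 1

    eps = 1e-6                                # avoids division by zero
    fused = []
    for p, v in zip(prior, other):
        b = bin_of(v)
        like_fg = hist_fg[b] / max(n_fg, 1)   # estimate of p(O_nj | F_ni)
        like_bg = hist_bg[b] / max(n_bg, 1)   # estimate of p(O_nj | B_ni)
        num = p * like_fg
        fused.append(num / (num + (1.0 - p) * like_bg + eps))
    return fused

o1 = [0.9, 0.8, 0.2, 0.1]    # toy map from the first layer
o2 = [0.7, 0.9, 0.3, 0.2]    # toy map from the second layer
print(bayes_fuse(o1, o2))    # first layer as prior
print(bayes_fuse(o2, o1))    # roles exchanged
```

The sketch stops at the two directional posteriors and does not combine them into a single map.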

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739174.XA CN113505792B (en) 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Publications (2)

Publication Number Publication Date
CN113505792A CN113505792A (en) 2021-10-15
CN113505792B true CN113505792B (en) 2023-10-27

Family

ID=78009460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739174.XA Active CN113505792B (en) 2021-06-30 2021-06-30 Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Country Status (1)

Country Link
CN (1) CN113505792B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241339A (en) * 2022-02-28 2022-03-25 山东力聚机器人科技股份有限公司 Remote sensing image recognition model, method and system, server and medium
CN114322793B (en) * 2022-03-16 2022-07-15 科大天工智能装备技术(天津)有限公司 Workpiece size measuring method and device based on global segmentation network and storage medium
CN115131307A (en) * 2022-06-23 2022-09-30 腾讯科技(深圳)有限公司 Article defect detection method and related device
CN115374859A (en) * 2022-08-24 2022-11-22 东北大学 Method for classifying unbalanced and multi-class complex industrial data
CN115272681B (en) * 2022-09-22 2022-12-20 中国海洋大学 Ocean remote sensing image semantic segmentation method and system based on high-order feature class decoupling
CN115953582B (en) * 2023-03-08 2023-05-26 中国海洋大学 Image semantic segmentation method and system
CN115984281B (en) * 2023-03-21 2023-06-20 中国海洋大学 Multi-task complement method of time sequence sea temperature image based on local specificity deepening
CN116434037B (en) * 2023-04-21 2023-09-22 大连理工大学 Multi-mode remote sensing target robust recognition method based on double-layer optimization learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020233129A1 (en) * 2019-05-17 2020-11-26 深圳先进技术研究院 Image super-resolution and coloring method and system, and electronic device
CN111127493A (en) * 2019-11-12 2020-05-08 中国矿业大学 Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN111797779A (en) * 2020-07-08 2020-10-20 兰州交通大学 Remote sensing image semantic segmentation method based on regional attention multi-scale feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于纹元森林和显著性先验的弱监督图像语义分割方法;韩铮;肖志涛;;电子与信息学报(03);第106-113页 *

Also Published As

Publication number Publication date
CN113505792A (en) 2021-10-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant