CN113505792B

CN113505792B - Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Info

Publication number: CN113505792B
Application number: CN202110739174.XA
Authority: CN
Inventors: 聂婕; 王成龙; 魏志强; 时津津; 叶敏; 陈昊
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2023-10-27
Anticipated expiration: 2041-06-30
Also published as: CN113505792A

Abstract

The invention discloses a multi-scale semantic segmentation method and model for unbalanced remote sensing images. The model adopts a multi-level semantic segmentation network that learns fine-grained local features and retains small-class information, and that also learns global context semantic features and retains large-scale information. The overall network architecture is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmented image at a different resolution. The features are post-fused at the same level by a Bayesian fusion method, so that multi-scale segmented image information is fused and missing information is complemented. The method adopts an optimization algorithm that separates pixels of different categories and aggregates pixels of the same category, so that the semantic segmentation network model can segment class-imbalanced data evenly.

Description

Multi-scale semantic segmentation method and model for unbalanced remote sensing image

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a multi-scale semantic segmentation method and model for unbalanced remote sensing images.

Background

With the development of earth observation technology and the progress of image acquisition technology, remote sensing images provide massive research data for earth observation and discovery. Analyzing the content of remote sensing images with image processing and artificial intelligence techniques is an effective way to fully mine remote sensing data. The main approaches include scene classification, object recognition and semantic segmentation. Semantic segmentation is one of the important techniques for content analysis of remote sensing images: it segments the targets and regions contained in an image by inferring the semantic category of each individual pixel.

Current image semantic segmentation methods are based on deep learning; classical deep learning semantic segmentation networks include Fully Convolutional Networks (FCN), SegNet, U-Net and others. FCN was the first to build the network entirely from convolutional layers, realizing an end-to-end segmentation network that accepts input images of any size and outputs segmented images of the same size. However, it does not fully combine context information with the correlation between pixels, so its segmentation accuracy is insufficient. U-Net concatenates features along the channel dimension and is therefore suitable for segmenting multi-scale and large-size images. However, U-Net demands relatively high computing power from the device and is relatively slow. SegNet improves memory utilization and segmentation efficiency by reusing, in the decoding stage, the max-pooling indices stored in the encoding stage. However, when low-resolution feature maps are pooled, the information of adjacent pixels is ignored and precision is lost. The pyramid scene parsing network (PSPNet) learns multi-level features with a pyramid pooling module and can thereby exploit global context information. However, it does not fully utilize the information of the entire scene.

Because remote sensing images differ significantly from ordinary images in resolution, spatial structure and semantics, traditional methods and ordinary neural network methods struggle to segment them efficiently according to their characteristics. Semantic segmentation of remote sensing images still faces the following challenges:

First, existing natural image segmentation methods deal with relatively balanced semantic category distributions and do not consider that, in a remote sensing image, a single category may occupy a large proportion of the image. Because real physical entities are distributed unevenly over the earth's surface, the foreground and background of remote sensing images are unbalanced, and objects of different categories differ widely in scale. Second, deep learning segmentation models designed for natural images are insensitive to changes in object scale, and pixel precision is lost when they are applied directly to semantic segmentation of remote sensing images. Compared with categories such as land and lakes, categories such as vehicles occupy a negligible area in a remote sensing image, and scale varies greatly between objects. Previous semantic segmentation methods are therefore not suitable for direct application to remote sensing images, and segmentation algorithms need to be designed around the characteristics of remote sensing images.

Because remote sensing images are acquired in diverse ways and the data themselves differ from natural images, no single existing approach has solved their semantic segmentation. Objects in remote sensing images vary across multiple scales: large-scale objects dominate the segmentation and suppress the learning of small-scale objects, so that small-class objects are difficult to identify. In addition, because remote sensing images have high resolution, the information they contain is usually dense, which leads to an unbalanced distribution of image categories.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a multi-scale semantic segmentation method and model for unbalanced remote sensing images, which solve two technical problems: (1) the large scale difference between objects in remote sensing images; and (2) the unbalanced category distribution of remote sensing images. For the first problem, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images and designs a multi-level semantic segmentation network that extracts features at different scales and fuses them at the same level, complementing missing information and making full use of global context information; while retaining the local detail information of the image, it overcomes the mutual influence of multi-scale objects and improves the robustness and accuracy of remote sensing image segmentation. For the second problem, the invention designs the algorithm from two aspects: 1) constructing an inter-class loss function that maximizes the class spacing between samples of different classes; and 2) constructing a class weight balanced distribution loss function that addresses the imbalance between positive and negative samples of all classes.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

First, the invention provides a multi-scale semantic segmentation model for unbalanced remote sensing images, which adopts a multi-level semantic segmentation network that can learn fine-grained local features and retain small-class information, and can also learn global context semantic features and retain large-scale information. The overall architecture of the multi-level semantic segmentation network is divided into three levels; each level uses a different network structure to extract features at a different scale and outputs a segmented image at a different resolution. The features are post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmented image information and complements missing information.

Further, the first Level 1 of the multi-Level semantic segmentation network model adopts data with original resolution, the second Level 2 adopts data after downsampling by 2 times, and the third Level 3 adopts data after downsampling by 4 times;

the backbone network adopted by the multi-level semantic segmentation network model is a SegNet semantic segmentation network; the left side of the network is an encoder consisting of 5 convolution-pooling stages, where each of the first two stages contains two convolution layers and each of the last three stages contains three convolution layers;

the right side of the network is a decoder consisting of 5 upsampling-convolution stages: the first and fourth stages on the right each consist of an upsampling layer and two convolution layers, the second and third stages each consist of an upsampling layer and three convolution layers, the fifth stage consists of an upsampling layer and two convolution layers, and a Softmax layer is added at the end.

Further, each layer of the encoding stage corresponds to each layer of the decoding stage one by one; each up-sampling layer of the decoder corresponds to the maximum pooling layer of the same-level encoder, the up-sampling layer of the decoder uses the index reserved in the maximum pooling process to up-sample the feature map, so that the features of the image classification in the encoding stage are reproduced, dense feature maps are generated, and finally the feature maps are restored to the same size as the original image, and classified by the softmax layer, namely the final segmentation map is generated.

Further, the three levels of the multi-level semantic segmentation network output segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior $O_i$, $i \in \{1, 2, 3\}$, and any one of the other segmentation maps $O_j$, $j \neq i$, $j \in \{1, 2, 3\}$, is used to calculate the likelihood, so the posterior probability is calculated as:

where $n$ denotes the class of the current pixel and $m$ denotes the number of classes; $F_{ni}$ and $B_{ni}$ denote the foreground region and the background region, respectively, for class $n$; $O_{ni}$, $i \in \{1, 2, 3\}$, serves as the prior and $O_{nj}$, $j \neq i$, $j \in \{1, 2, 3\}$, is used to calculate the likelihood; in each region, the likelihood is calculated by comparing $O_{ni}$ and $O_{nj}$ on the foreground and background of each class.

Preferably, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first; the resulting fused map is then fused with the segmentation map output by the third level, specifically:

first, the segmentation map output by the first level is used as the prior, the likelihood is calculated from the segmentation map output by the second level, and the information of the two segmentation maps is merged based on the Bayesian formula;

then the two are exchanged: the segmentation map output by the second level is used as the prior, the likelihood is calculated from the segmentation map output by the first level, and the two are integrated based on the Bayesian formula;

finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same way to obtain the final integrated segmentation map.

Then, the invention also provides a multi-scale semantic segmentation method for the unbalanced remote sensing image, which comprises the following steps:

1. improving semantic segmentation network architecture: constructing a multi-level semantic segmentation network model, outputting different scale segmentation graphs by each layer of network, and fusing multi-scale segmentation image information by adopting a Bayesian fusion method;

2. equalizing the loss function: an optimization algorithm that separates pixels of different categories and aggregates pixels of the same category is adopted, specifically:

1) constructing a class weight balanced distribution loss function based on the Focal loss function, to address the imbalance between positive and negative samples of all classes;

2) introducing the Hinge loss function to construct an inter-class loss function, to maximize the class spacing between samples of different classes;

3) equalizing the loss function: constructing an overall loss function.

Further, the class weight balanced distribution loss function is

where $p_t$ is the probability that a sample of class $t$ is positive, $M$ is the number of sample classes, $t$ denotes a class, $\gamma$ is a hyperparameter, and $-\log(p_t)$ is the original cross-entropy loss function; $\lambda$ is an adjustable hyperparameter with $0 < \lambda < 1$, introduced to make the classification accuracy of different samples more adjustable: it reduces the penalty weight of complex samples and increases the penalty contribution of good samples.

Further, the inter-class loss function is

$$\text{Hinge} = \max\left(0,\ 1 + \max w_{\text{wrong}} - w_{\text{correct}}\right) \qquad (11)$$

where $w_{\text{wrong}}$ is the number of misclassified samples and $w_{\text{correct}}$ is the number of correctly classified samples; taking the maximum over $w_{\text{wrong}}$ means that the class with the most misclassified samples is selected.

Further, the overall loss function is as follows

where $\beta$ is a hyperparameter controlling the contribution rate of the Hinge loss penalty term, with $\beta > 0$.

Compared with the prior art, the invention has the advantages that:

(1) The invention provides a multi-level semantic segmentation network, extracts features with different scales, fuses the features at the same level, realizes the complementation of missing information, can learn local features with fine granularity, retain small-class information, can learn semantic features of the whole global context, fully utilizes global context information and retains large-scale information; on the premise of retaining the local detail information of the image, the mutual influence of the multi-scale objects is overcome, and the robustness and accuracy of remote sensing image segmentation are improved.

(2) The invention also provides a Bayesian multi-scale post-fusion semantic segmentation method. Addressing the scale dependency of remote sensing images, the multi-scale post-fusion method models the results at different scales as the prior and the likelihood, respectively, and uses the Bayesian principle to make the optimal decision; verification shows that the method improves segmentation accuracy.

With the Bayesian fusion method, the semantic information of objects is identified better, the outline and category distribution of whole objects are clearer, and the boundaries between objects of different categories are more distinct; the method also improves the quality of the segmentation maps output by the network.

(3) The invention designs an equalization loss function for semantic segmentation of unbalanced remote sensing images. Addressing the unbalanced semantic distribution of remote sensing images, in particular the foreground-background imbalance caused by spatial differences, it balances the loss weights of hard training samples based on the focal loss idea, reducing the weight of easy-to-learn categories and increasing the weight of hard-to-learn categories, which improves training stability. Hinge loss is also introduced to enlarge the distance between classes and make the boundaries between samples of different classes more distinct, improving the accuracy of the segmentation structure and the clarity of local information classification.

Through the equalization loss algorithm, the invention enables the semantic segmentation network model to segment class-imbalanced data evenly.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of a multi-level semantic segmentation network architecture of the present invention;

FIG. 2 shows the encoder-decoder structure of the first-level segmentation network of the present invention;

FIG. 3 shows the encoder-decoder structure of the second-level segmentation network of the present invention;

FIG. 4 shows the encoder-decoder structure of the third-level segmentation network of the present invention;

FIG. 5 is a schematic diagram of the Bayesian image fusion algorithm of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings and specific examples.

Example 1

This embodiment designs a multi-level deep neural network model. Specifically, it provides a multi-scale semantic segmentation model for unbalanced remote sensing images that adopts a multi-level semantic segmentation network capable of learning fine-grained local features and retaining small-class information, while also learning global context semantic features and retaining large-scale information. The network architecture is shown in FIG. 1. The overall architecture is divided into three levels: the first level corresponds to Level 1 in FIG. 1 and uses data at the original resolution; the second level corresponds to Level 2 in FIG. 1 and uses data downsampled by a factor of 2; the third level corresponds to Level 3 in FIG. 1 and uses data downsampled by a factor of 4. Downsampling the remote sensing image twice yields more local and global information.

To handle multi-scale information, each level uses a different network structure to extract features at a different scale, retains as much visual information as possible during feature extraction, and outputs a segmented image at a different resolution. The features are then post-fused at the same level by a Bayesian fusion method, which fuses the multi-scale segmented image information, complements missing information and achieves accurate segmentation of the remote sensing image.

Compared with other classical deep neural network segmentation models, the multi-level segmentation network model retains good local detail information while better preserving global semantic information.
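As an illustration of the three input resolutions described above, the following minimal sketch builds the inputs for Level 1, Level 2 and Level 3 from a single-channel image. The function names `downsample` and `build_levels` are hypothetical, and average pooling is an assumption: the text only states the downsampling factors (2 and 4), not the filter used.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def downsample(image: Image, factor: int) -> Image:
    """Average-pool over non-overlapping factor x factor windows.

    Average pooling is an assumption; the source does not name the filter.
    """
    out_h, out_w = len(image) // factor, len(image[0]) // factor
    result: Image = []
    for r in range(out_h):
        row: List[float] = []
        for c in range(out_w):
            window = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(window) / len(window))
        result.append(row)
    return result


def build_levels(image: Image) -> List[Image]:
    """Inputs for Level 1 (original), Level 2 (2x) and Level 3 (4x)."""
    return [image, downsample(image, 2), downsample(image, 4)]


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
levels = build_levels(image)
for number, level in enumerate(levels, start=1):
    print(f"Level {number}: {len(level)} x {len(level[0])}")
```

Each level would then be passed to its own segmentation network; the coarser inputs trade local detail for a wider context per pixel.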

1. The multi-level network architecture of the present embodiment is described in detail below:

The backbone network adopted in this embodiment is a SegNet semantic segmentation network, as shown in FIGS. 2, 3 and 4, which correspond to Level 1, Level 2 and Level 3 in FIG. 1, respectively. The left side of the network is an encoder consisting of 5 convolution-pooling stages; each of the first two stages contains two convolution layers, and each of the last three stages contains three convolution layers. The convolution layers extract image features, and the pooling layers then reduce the size of the feature map and enlarge the receptive field. The pooling layers use max pooling, which provides invariance to small spatial shifts but causes loss of localization accuracy and spatial detail.

The stages of the encoder and the stages of the decoder correspond one to one, similar to the U-shaped structure of U-Net. Each upsampling layer of the decoder corresponds to the max-pooling layer of the encoder at the same level, and upsamples the feature map using the indices retained during max pooling. Through the upsampling layers, the features used for image classification in the encoding stage are reproduced as dense feature maps, which are finally restored to the size of the original image and classified by the softmax layer to produce the final segmentation map. SegNet has few training parameters and a small memory footprint while maintaining segmentation accuracy, which makes it suitable for semantic segmentation of high-resolution remote sensing images.
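The index-based upsampling described above can be sketched on a single feature map. This is a minimal illustration, not the network itself: the helper names `max_pool_with_indices` and `max_unpool` are hypothetical, the window is assumed to be 2 x 2 with stride 2, and the convolution layers that follow each upsampling layer are omitted.

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_with_indices(fmap: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max pooling with stride 2; also keep where each maximum came from."""
    pooled: Grid = []
    indices: List[List[Index]] = []
    for r in range(0, len(fmap) - 1, 2):
        pooled_row: List[float] = []
        index_row: List[Index] = []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = [(r + dr, c + dc) for dr in (0, 1) for dc in (0, 1)]
            best = max(window, key=lambda pos: fmap[pos[0]][pos[1]])
            pooled_row.append(fmap[best[0]][best[1]])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled: Grid, indices: List[List[Index]],
               height: int, width: int) -> Grid:
    """Put each pooled value back at its stored position; zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 0.0, 3.0, 1.0],
]
pooled, indices = max_pool_with_indices(fmap)
restored = max_unpool(pooled, indices, 4, 4)
for row in restored:
    print(row)
```

The restored map is sparse: only the positions of the original maxima are filled, which is why the decoder follows each upsampling layer with convolution layers to produce a dense feature map.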

2. The following describes the image post-fusion method of the present embodiment in detail:

The multi-scale network outputs segmented images at different resolutions, and the segmentation quality differs with resolution. This patent provides a multi-scale post-fusion semantic segmentation method based on the Bayesian principle. In saliency detection tasks, the posterior probability is calculated by integrating saliency maps:

$S_1$ and $S_2$ are saliency maps; one of them serves as the prior probability $S_i$ ($i \in \{1, 2\}$) and the other, $S_j$ ($j \neq i$, $j \in \{1, 2\}$), is used to calculate the likelihood. $F_i$ and $B_i$ denote the foreground and background regions, respectively, and the likelihood in each region is calculated by the following formula:

where $N_{F_i}$ denotes the number of pixels in the foreground, and $N_{b_{F_i}(S_j(z))}$ is the number of pixels whose color features fall into the foreground bin $b_{F_i}(S_j(z))$ containing feature $S_j(z)$; $N_{B_i}$ denotes the number of pixels in the background, and $N_{b_{B_i}(S_j(z))}$ is the number of pixels whose color features fall into the background bin $b_{B_i}(S_j(z))$ containing feature $S_j(z)$.
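The formulas referred to in this passage are not reproduced in this text. As a hedged sketch, the definitions match the Bayesian integration commonly used to fuse two saliency maps. Under that assumption, writing $N_{F_i}$ and $N_{B_i}$ for the foreground and background pixel counts and $N_{b_{F_i}(S_j(z))}$, $N_{b_{B_i}(S_j(z))}$ for the counts in the bins containing feature $S_j(z)$ (symbol names chosen here for illustration), the posterior and the two likelihoods would take the following form:

```latex
p\bigl(F_i \mid S_j(z)\bigr)
  = \frac{S_i(z)\, p\bigl(S_j(z) \mid F_i\bigr)}
         {S_i(z)\, p\bigl(S_j(z) \mid F_i\bigr)
          + \bigl(1 - S_i(z)\bigr)\, p\bigl(S_j(z) \mid B_i\bigr)}

p\bigl(S_j(z) \mid F_i\bigr) = \frac{N_{b_{F_i}(S_j(z))}}{N_{F_i}},
\qquad
p\bigl(S_j(z) \mid B_i\bigr) = \frac{N_{b_{B_i}(S_j(z))}}{N_{B_i}}
```

Here the prior map value $S_i(z)$ plays the role of the prior foreground probability at pixel $z$, and the bin counts estimate how typical the other map's value is inside the foreground and the background. Swapping the roles of the two maps gives $p(F_j \mid S_i(z))$.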

The three levels of the multi-level semantic segmentation network output segmentation maps $O_1$, $O_2$ and $O_3$. When the segmentation maps output by the three levels are post-fused, the segmentation map output by any one level is selected as the prior $O_i$ ($i \in \{1, 2, 3\}$), and any one of the other segmentation maps $O_j$ ($j \neq i$, $j \in \{1, 2, 3\}$) is used to calculate the likelihood, so the posterior probability is calculated as:

where $n$ denotes the class of the current pixel and $m$ denotes the number of classes; $F_{ni}$ and $B_{ni}$ denote the foreground region and the background region, respectively, for class $n$; $O_{ni}$ ($i \in \{1, 2, 3\}$) serves as the prior and $O_{nj}$ ($j \neq i$, $j \in \{1, 2, 3\}$) is used to calculate the likelihood; in each region, the likelihood is calculated by comparing $O_{ni}$ and $O_{nj}$ on the foreground and background of each class:

where $N_{F_{ni}}$ denotes the number of foreground pixels of the $n$-th class, and $N_{b_{F_{ni}}(O_{nj}(z))}$ is the number of pixels in the foreground region whose color features fall into the bin containing feature $O_{nj}(z)$. The posterior probability with $O_{nj}$ as the prior is then calculated.

As a preferred embodiment, when the segmentation maps output by the three levels are post-fused, the segmentation maps output by the first two levels are fused first, and the resulting fused map is then fused with the segmentation map output by the third level, as shown in formulas (6) and (7), respectively.

$$O_{n4}(z) = O_B\bigl(O_{n1}(z), O_{n2}(z)\bigr) = p\bigl(F_{n1} \mid O_{n2}(z)\bigr) + p\bigl(F_{n2} \mid O_{n1}(z)\bigr) \qquad (6)$$

$$O(z) = O_B\bigl(O_{n3}(z), O_{n4}(z)\bigr) = p\bigl(F_{n3} \mid O_{n4}(z)\bigr) + p\bigl(F_{n4} \mid O_{n3}(z)\bigr) \qquad (7)$$

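Referring to FIG. 5, the procedure is as follows: first, the segmentation map output by the first level is used as the prior, the likelihood is calculated from the segmentation map output by the second level, and the information of the two segmentation maps is merged based on the Bayesian formula; then the two are exchanged, so that the segmentation map output by the second level is used as the prior and the likelihood is calculated from the segmentation map output by the first level; finally, the fused segmentation map of the first two levels and the segmentation map of the third level are fused in the same way to obtain the final integrated segmentation map.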

Based on Bayesian fusion, each output segmentation map is used in turn as the prior, so that the effective information of segmentation maps at different resolutions is fused and the accuracy of image segmentation improves.
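The fusion order of formulas (6) and (7) can be sketched for a single class as follows. This is a hedged illustration under several assumptions that the text does not fix: the three maps are assumed to be already resampled to a common size and flattened, the foreground of a prior map is taken to be the pixels at or above the map mean, likelihoods are estimated from a small histogram of the other map's values, and the intermediate fused map is halved so that it can serve as a prior in $[0, 1]$. The names `posterior`, `bayes_fuse` and `fuse_three` are hypothetical.

```python
from typing import List

Map = List[float]  # flattened per-class probability map, values in [0, 1]


def posterior(prior: Map, other: Map, bins: int = 4) -> Map:
    """Per-pixel p(F_i | O_j(z)), with `prior` as O_i and `other` as O_j."""
    # Assumption: foreground F_i = pixels whose prior is at least the map mean.
    threshold = sum(prior) / len(prior)
    is_fg = [p >= threshold for p in prior]
    n_fg = sum(is_fg)
    n_bg = len(prior) - n_fg

    def bin_of(value: float) -> int:
        return min(int(value * bins), bins - 1)

    # Histograms of the other map's values inside F_i and B_i.
    fg_hist = [0] * bins
    bg_hist = [0] * bins
    for fg, value in zip(is_fg, other):
        (fg_hist if fg else bg_hist)[bin_of(value)] += 1

    result: Map = []
    for p, value in zip(prior, other):
        b = bin_of(value)
        like_fg = fg_hist[b] / n_fg if n_fg else 0.0
        like_bg = bg_hist[b] / n_bg if n_bg else 0.0
        denom = p * like_fg + (1.0 - p) * like_bg
        result.append(p * like_fg / denom if denom > 0 else 0.0)
    return result


def bayes_fuse(a: Map, b: Map) -> Map:
    """O_B(a, b) = p(F_a | b(z)) + p(F_b | a(z)), the shape of formula (6)."""
    return [x + y for x, y in zip(posterior(a, b), posterior(b, a))]


def fuse_three(o1: Map, o2: Map, o3: Map) -> Map:
    """Fuse Levels 1 and 2 first, then fuse the result with Level 3."""
    # Assumption: halve the sum so the intermediate map stays in [0, 1].
    o4 = [v / 2.0 for v in bayes_fuse(o1, o2)]
    return bayes_fuse(o3, o4)


o1 = [0.9, 0.8, 0.2, 0.1]
o2 = [0.7, 0.9, 0.3, 0.4]
o3 = [0.8, 0.6, 0.1, 0.2]
fused = fuse_three(o1, o2, o3)
print(fused)
```

Pixels on which the maps agree end up with a high fused score, while pixels that all maps place in the background stay near zero.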

Example 2

The embodiment provides a multi-scale semantic segmentation method for unbalanced remote sensing images, which comprises the following steps:

1. Improving the semantic segmentation network architecture

A multi-level semantic segmentation network model is constructed; each level of the network outputs a segmentation map at a different scale, and the multi-scale segmented image information is fused by a Bayesian fusion method.

The semantic segmentation network may be a classical semantic segmentation network. As a preferred embodiment, the multi-level semantic segmentation network model may directly adopt the model described in Example 1; reference may be made to the description of Example 1, which is not repeated here.

2. Equalizing loss function

An optimization algorithm that separates pixels of different categories and aggregates pixels of the same category is adopted, specifically:

1) A class weight balanced distribution loss function is constructed based on the Focal loss function, to address the imbalance between positive and negative samples of all classes.

2) The Hinge loss function is introduced to construct an inter-class loss function, to maximize the class spacing between samples of different classes.

3) Equalizing the loss function: an overall loss function is constructed.

Each is described below:

1. Class weight balanced distribution

In multi-class problems, the sample categories of the dataset are unevenly distributed: negative samples are far too numerous and most samples are easy to classify, which often makes learning ineffective during training. This problem is particularly pronounced in remote sensing image segmentation. The Focal loss function was designed mainly to address the extreme imbalance between background and foreground in object detection. It improves on the cross-entropy loss (for which the prior art can be consulted directly) by adding a modulation factor; the specific formula is as follows:

In this formula, $p_t$ is the probability that a sample of class $t$ is positive, $M$ is the number of sample classes, $t$ denotes a class, and $-\log(p_t)$ is the original cross-entropy loss function; $\gamma \geq 0$ is an adjustable hyperparameter and $(1 - p_t)^{\gamma}$ is the modulation factor. For easy-to-learn samples, the modulation factor tends towards zero as $p_t$ approaches 1. For hard and misclassified samples, however, $p_t$ is small and the modulation factor is correspondingly larger, which compensates for the training inefficiency caused by sample imbalance. This effectively addresses the training failures caused by unbalanced categories in remote sensing images. Therefore, for the multi-class problem of remote sensing image segmentation, formula (8) is adjusted as follows:

where $\lambda$ is an adjustable hyperparameter with $0 < \lambda < 1$. To make the classification accuracy of different samples more adjustable, it further reduces, relative to the original formula (8), the weight of the penalty term for complex samples during training and increases the weight of the penalty term for simple samples, balancing the samples across classes. For example, a larger $p_t$ indicates higher confidence in classifying the sample; if $\lambda = 1$, its penalty weight is smaller and its contribution to training decreases. Likewise, a smaller $p_t$ indicates that the sample is harder to classify and belongs to the complex samples; when $\lambda = 1$, its training weight is larger and its contribution to training is higher. Setting $0 < \lambda < 1$ reduces the penalty weight of complex samples and increases the penalty contribution of well-classified samples, further improving the classification accuracy of well-classified samples. An effective setting of $\lambda$ therefore strikes a good balance between complex and easy samples and improves the classification accuracy over all samples.
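The effect of the modulation factor $(1 - p_t)^{\gamma}$ can be checked numerically. The sketch below implements only the standard per-sample focal term $-(1 - p_t)^{\gamma} \log(p_t)$; the $\lambda$-adjusted formula (9) is not reproduced in this text, so it is deliberately left out rather than guessed. The default $\gamma = 2$ is an assumption chosen for illustration.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Original cross-entropy term: -log(p_t)."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """Standard focal term: -(1 - p_t)^gamma * log(p_t)."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


for p_t in (0.9, 0.5, 0.1):
    ce = cross_entropy(p_t)
    fl = focal_loss(p_t)
    # The ratio fl / ce is exactly the modulation factor (1 - p_t)^gamma.
    print(f"p_t={p_t:.1f}  cross-entropy={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.2f}")
```

With $\gamma = 0$ the focal term reduces to plain cross-entropy; as $\gamma$ grows, easy samples ($p_t$ near 1) are down-weighted far more strongly than hard ones.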

2. Inter-class balancing

Because adjacent samples in remote sensing images differ only slightly, enlarging the difference between samples is also a problem to be solved. This embodiment introduces the Hinge loss (HL) function. HL is commonly used in the maximum-margin classification task of the support vector machine (SVM); it reduces the intra-class distance and increases the inter-class distance to achieve the maximum margin. For binary classification, the formula is:

$$\text{Hinge} = \max(0,\ 1 - y \cdot y_{\text{pre}}) \qquad (10)$$

In the above formula, $y$ is the label of the real sample and can only take the value $-1$ or $1$; $y_{\text{pre}}$ is the predicted value. When the absolute value of the predicted value is 1 or more, the sample lies at a distance of at least 1 from the boundary and no reward is given, because such a sample is very likely to be classified correctly. The multi-class form is:

$$\text{Hinge} = \max\left(0,\ 1 + \max w_{\text{wrong}} - w_{\text{correct}}\right) \qquad (11)$$

where $w_{\text{wrong}}$ is the number of misclassified samples and $w_{\text{correct}}$ is the number of correctly classified samples. Taking the maximum over $w_{\text{wrong}}$ means that the class with the most misclassified samples is selected. When many samples are misclassified, formula (11) imposes a larger penalty term on training, which drives training to continue; only when few samples are misclassified and many are classified correctly does the penalty term shrink automatically, allowing training to finish sooner. This encourages samples within a class to be consistent, enlarges the spacing between samples of different classes and improves classification accuracy.
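Formulas (10) and (11) can be written out directly. In the sketch below, `binary_hinge` follows formula (10) as stated; `multiclass_hinge` follows the shape of formula (11) but treats the $w$ values as one numeric value per class, with the correct class selected by index. That per-class reading is an assumption made for illustration, and both function names are hypothetical.

```python
from typing import Sequence


def binary_hinge(y: int, y_pre: float) -> float:
    """Formula (10): max(0, 1 - y * y_pre), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * y_pre)


def multiclass_hinge(w: Sequence[float], correct: int) -> float:
    """Shape of formula (11): max(0, 1 + max(w_wrong) - w_correct)."""
    w_correct = w[correct]
    w_wrong = max(value for k, value in enumerate(w) if k != correct)
    return max(0.0, 1.0 + w_wrong - w_correct)


# Confident and correct: no penalty.
print(binary_hinge(1, 2.0))
# Correct side of the boundary but inside the margin: small penalty.
print(binary_hinge(1, 0.5))
# Wrong side of the boundary: larger penalty.
print(binary_hinge(-1, 0.5))

# The correct class leads the strongest wrong class by more than the margin.
print(multiclass_hinge([3.0, 1.0, 0.5], correct=0))
# A wrong class beats the correct class, so the penalty is large.
print(multiclass_hinge([1.0, 2.0, 0.5], correct=0))
```

The penalty vanishes only once the correct class beats the strongest competing class by at least the margin of 1, which is what pushes the classes apart.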

3. Balancing algorithm

Finally, to address class imbalance, enlarge the spacing between samples of different classes and improve segmentation accuracy, the overall loss function of the invention is as follows:

where $\beta$ is a hyperparameter controlling the contribution rate of the Hinge loss penalty term, with $\beta > 0$. Finally, the model parameters are optimized by the classical gradient descent method to obtain the optimal parameters, after which the training samples are tested.
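The overall loss formula is not reproduced in this text. Since $\beta$ is described as weighting the Hinge penalty term, the sketch below assumes an additive combination, focal term plus $\beta$ times the Hinge term, and applies it to the logits of a single pixel. The standard focal term stands in for the $\lambda$-adjusted formula (9), the Hinge term is computed on logits, and the gradient is estimated by finite differences; all of these are assumptions for illustration, as are the function names.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_loss(logits: List[float], correct: int,
               gamma: float = 2.0, beta: float = 0.5) -> float:
    """Assumed combination: focal term + beta * hinge term."""
    p_t = softmax(logits)[correct]
    focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
    wrong = max(v for k, v in enumerate(logits) if k != correct)
    hinge = max(0.0, 1.0 + wrong - logits[correct])
    return focal + beta * hinge


def gradient_step(logits: List[float], correct: int,
                  lr: float = 0.1, eps: float = 1e-6) -> List[float]:
    """One gradient-descent step using central finite differences."""
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[k] += eps
        down[k] -= eps
        grads.append((total_loss(up, correct) - total_loss(down, correct)) / (2 * eps))
    return [v - lr * g for v, g in zip(logits, grads)]


logits = [0.2, 0.5, 0.1]  # pixel whose true class is index 0
before = total_loss(logits, correct=0)
after = total_loss(gradient_step(logits, correct=0), correct=0)
print(f"loss before: {before:.4f}, after one step: {after:.4f}")
```

In a real network the gradient would come from backpropagation through the segmentation model rather than from finite differences on the logits.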

In summary, the invention improves the semantic segmentation network structure and designs a loss function equalization method. It adopts a spatial multi-scale parallel post-fusion framework to characterize scale differences, retaining good local detail information while better preserving global semantic information. The loss function is designed for unbalanced pixel-level sample distributions: based on the focal loss idea, the loss weights of hard training samples are balanced, improving training stability; Hinge loss is introduced to enlarge the distance between classes, improving the accuracy of semantic segmentation. Combining the equalization loss function with a classical semantic segmentation network improves the accuracy of the segmentation structure and the clarity of local information classification.

It should be understood that the above description does not limit the invention, and the invention is not limited to the particular embodiments disclosed; various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims

1. A method for constructing a multi-scale semantic segmentation model for unbalanced remote sensing images, characterized by adopting a multi-level semantic segmentation network that can learn fine-grained local features and retain small-class information, as well as learn global contextual semantic features and retain large-scale information; the overall network architecture of the multi-level semantic segmentation network is divided into three layers, each layer adopts a different network structure to extract features of a different scale and outputs a segmented image of a different resolution, and image post-fusion is performed on the features at the same level using a Bayesian fusion method, thereby fusing the multi-scale segmented image information and complementing missing information; the three layers of the multi-level semantic segmentation network output segmentation maps O₁, O₂ and O₃; when image post-fusion is performed on the segmentation maps output by the three layers, the segmentation map output by any one layer is selected as the prior O_i, i = 1, 2, 3, and any other segmentation map O_j, j ≠ i, j ∈ {1, 2, 3}, is used to calculate the likelihood, so that the posterior probability is calculated as:

where n represents the class of the current pixel and m represents the number of classes; F_ni and B_ni respectively represent the foreground region and the background region when the class is n; O_ni serves as the prior, i = 1, 2, 3, and O_nj is used to calculate the likelihood, j ≠ i, j ∈ {1, 2, 3}; in each region, the likelihood at the foreground and background of each class is calculated by comparing O_ni and O_nj;

when image post-fusion is performed on the segmentation maps output by the three layers, the segmentation maps output by the first two layers are fused first; the segmentation map obtained from that fusion is then fused with the segmentation map output by the third layer, specifically:

2. The method for constructing the multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 1, wherein the first level (Level 1) of the multi-level semantic segmentation network model uses data at the original resolution, the second level (Level 2) uses data downsampled by a factor of 2, and the third level (Level 3) uses data downsampled by a factor of 4;

3. The method for constructing the multi-scale semantic segmentation model for unbalanced remote sensing images according to claim 2, wherein each layer of the encoding stage corresponds one-to-one to a layer of the decoding stage; each upsampling layer of the decoder corresponds to the max pooling layer of the encoder at the same level, and the upsampling layer of the decoder uses the indices retained during max pooling to upsample the feature map, so that the image classification features of the encoding stage are reproduced and dense feature maps are generated; finally, the feature maps are restored to the same size as the original image and classified by the softmax layer to generate the final segmentation map.

4. A multi-scale semantic segmentation method for unbalanced remote sensing images, characterized by comprising the following steps:

1) Constructing a class-weight balanced distribution loss function based on the Focal loss function, to solve the imbalance between positive and negative samples of each class; the class-weight balanced distribution loss function is as follows:

where p_t is the probability that a sample of class t is positive, M is the number of sample classes, t denotes a class, γ is a hyperparameter, and −log(p_t) is the initial cross-entropy loss function; λ is an adjustment hyperparameter, set to 0 < λ < 1, intended to increase the adjustability of the classification accuracy of different samples, reducing the penalty weight of complex samples and increasing the penalty contribution of good samples;

2) Introducing the hinge loss function to construct an inter-class loss function, so as to maximize the class margin between samples of different classes; the inter-class loss function is:

$$\mathrm{Hinge} = \max\left(0,\; 1 + \max w_{\mathrm{wrong}} - w_{\mathrm{correct}}\right) \tag{11}$$

where w_wrong is the number of misclassified samples and w_correct is the number of correctly classified samples; taking the maximum of w_wrong selects the class with the most misclassified samples;

3) Balancing the loss functions: constructing an overall loss function.

5. The unbalanced remote sensing image oriented multi-scale semantic segmentation method of claim 4, wherein the overall loss function is as follows:

where β is a hyperparameter that controls the contribution of the hinge loss penalty term, with β > 0.