CN113298815A - Semi-supervised remote sensing image semantic segmentation method and device and computer equipment

Info

Publication number
CN113298815A
Authority
CN
China
Prior art keywords
remote sensing image
semantic segmentation
attention
semi-supervised
Prior art date
2021-06-21
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110686544.8A
Other languages
Chinese (zh)
Inventor
刘明明
刘兵
李爽
王伟男
胡光喆
仇文宁
付红
戚海永
张海燕
马衍颂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jianzhu Institute
Original Assignee
Jiangsu Jianzhu Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Jianzhu Institute filed Critical Jiangsu Jianzhu Institute
Priority to CN202110686544.8A priority Critical patent/CN113298815A/en
Publication of CN113298815A publication Critical patent/CN113298815A/en
Withdrawn legal-status Critical Current

Classifications

    • G06T 7/10: Image analysis; Segmentation; Edge detection
    • G06N 3/045: Neural networks; Architecture; Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 3/40: Geometric image transformation in the plane of the image; Scaling the whole image or part thereof
    • G06T 2207/10032: Image acquisition modality; Satellite or aerial image; Remote sensing
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T 2207/20221: Special algorithmic details; Image fusion; Image merging
    (All within G: Physics; G06: Computing, calculating or counting; G06T: Image data processing or generation, in general; G06N: Computing arrangements based on specific computational models)

Abstract

The invention discloses a semi-supervised remote sensing image semantic segmentation method, device and computer equipment. The method comprises the following steps: acquiring an original remote sensing image; scaling the original remote sensing image into 3 scaled images of different sizes; inputting the 3 scaled images into 3 criss-cross attention modules respectively to obtain 3 attention feature maps; fusing the 3 attention feature maps to obtain a multi-scale attention feature map; and inputting the multi-scale attention feature map into a deep semantic segmentation network to obtain a semantic segmentation prediction map. The semi-supervised semantic segmentation model based on multi-scale attention can train the whole model with unlabeled data and fully exploit the global context between feature maps, thereby effectively improving the edge segmentation precision between targets in remote sensing images and raising the overall accuracy.

Description

Semi-supervised remote sensing image semantic segmentation method and device and computer equipment
Technical Field
The invention relates to the field of semantic segmentation of remote sensing images, in particular to a semi-supervised remote sensing image semantic segmentation method and device based on multi-scale attention and computer equipment.
Background
In remote sensing research, semantic segmentation classifies every pixel in a remote sensing image and has long been an important research direction. Traditional approaches to remote sensing image semantic segmentation often rely on machine learning algorithms, but their classification accuracy leaves room for improvement. In recent years, with the development of deep learning, convolutional neural networks (CNNs), with their excellent feature extraction capability, have been widely applied across image processing tasks such as scene classification. Long et al. proposed the fully convolutional network (FCN), which replaces the fully connected layers of a CNN with convolutional layers; unlike conventional image classification networks, an FCN can segment images of arbitrary size. SegNet introduced a deconvolution-based encoder-decoder structure that exploits intermediate-layer features through skip connections. Gangfu et al. proposed a multi-scale network structure in which dilated convolutions replace traditional convolutions, enlarging the receptive field without reducing spatial resolution. The atrous spatial pyramid pooling (ASPP) structure provides multiple dilated convolution branches with different dilation rates to extract multi-scale features, markedly improving the segmentation precision of targets in an image. The DeepLabv3 network has been refined repeatedly and is currently among the most successful models in deep-learning semantic segmentation; its latest version, DeepLabv3+, achieves the highest accuracy on multiple public datasets. Multi-scale integration effectively addresses target segmentation: a single neural network with receptive fields of several sizes can accommodate targets of several sizes. Because fully convolutional networks outperform traditional machine learning, many researchers have applied CNNs to the semantic segmentation of remote sensing images, and deep convolutional networks play an increasingly important role in many remote sensing applications. Another proposal uses two independent fully convolutional branches, taking segmented images and height information from optical remote sensing as the inputs of the two branches; after a series of convolution operations, the predicted segmentation results of the two branches are fused. These methods achieve good results when labeled data are plentiful.
Remote sensing image semantic segmentation supports geographic surveying and plays an important role in obtaining landform information. In recent years, as remote sensing images have become easier to acquire and their quality has improved, research on them has grown. Semantic segmentation must classify every pixel of the feature map, so a labeled image likewise requires per-pixel annotation. As the resolution of acquired remote sensing images increases, annotating them for semantic segmentation becomes more difficult, and target edges are hard to segment accurately. Most mainstream research on remote sensing image semantic segmentation is currently based on deep convolutional neural networks. Li Yu proposed an image semantic segmentation method based on deep convolution fused with a conditional random field: shallow detail information and high-level semantic information are merged into the network model, conditional random field inference is embedded into the network framework as iterative layers, and the rich detail and context information of the remote sensing image is exploited during forward and backward propagation of model training, achieving end-to-end remote sensing image semantic segmentation. Another method based on connecting encoder and decoder features improves the DeconvNet model: during encoding, recording the pooling indices and reusing them during unpooling effectively preserves spatial structure; during decoding, connecting the corresponding encoder and decoder feature layers strengthens feature extraction. A further method improves DeepLabv3 for remote sensing image semantic segmentation, replacing the single upsampling layer with multi-layer upsampling using residuals from the backbone network to preserve the semantic integrity of the image at full resolution. However, existing remote sensing semantic segmentation methods cannot exploit unlabeled data well, so the segmentation effect deteriorates when labeled data are scarce; how to improve semantic segmentation of targets when annotation is insufficient remains an open problem. Current semi-supervised segmentation methods ignore long-range correlations, which leads to inaccurate segmentation of target edges in remote sensing images.
Disclosure of Invention
Based on the above, it is necessary to provide a semi-supervised remote sensing image semantic segmentation method, device and computer equipment for solving the above technical problems.
The embodiment of the invention provides a semi-supervised remote sensing image semantic segmentation method, which comprises the following steps:
acquiring an original remote sensing image;
scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; performing fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map;
and inputting the multi-scale attention feature map into a deep semantic segmentation network to obtain a semantic segmentation prediction map.
In one embodiment, the obtaining of the multi-scale attention feature map specifically includes:
inputting the original remote sensing image into a deep convolutional neural network to obtain feature maps X_1, X_2 and X_3 of different sizes;
inputting feature maps X_1, X_2 and X_3 into 3 criss-cross attention modules respectively to obtain attention feature maps C_1, C_2 and C_3;
sequentially upsampling and fusing attention feature maps C_1, C_2 and C_3 to obtain a multi-scale attention feature map.
In one embodiment, the obtaining of the attention feature map specifically includes:
for the feature map M ∈ R^{C×W×H} of the original remote sensing image, generating two feature maps, named Q and K respectively, by applying two 1×1 convolutional layers, with (Q, K) ∈ R^{C′×W×H};
sequentially applying an Affinity operation, a SoftMax operation and an Aggregation operation to the feature maps Q and K to obtain an attention map A ∈ R^{(H+W-1)×W×H};
wherein C′ is the number of channels of the feature maps Q and K, C is the number of channels of the original remote sensing image, and C′ is smaller than C; H and W are the height and width of the original remote sensing image respectively.
In one embodiment, the Affinity operation, SoftMax operation, and Aggregation operation specifically include:
for each position u of the feature map Q, a vector Q_u ∈ R^{C′} is obtained; at the same time, the set Ω_u is obtained by extracting from K the feature vectors in the same row or column as position u, with the Affinity operation:

d_{i,u} = Q_u \Omega_{i,u}^{\top}

wherein Ω_u ∈ R^{(H+W-1)×C′} and Ω_{i,u} ∈ R^{C′} is the i-th element of Ω_u; d_{i,u} ∈ D denotes the degree of correlation between Q_u and Ω_{i,u}, i = [1, ..., |Ω_u|], D ∈ R^{(H+W-1)×W×H};
a SoftMax operation is applied on D along the channel dimension, and a convolutional layer with a 1×1 filter is applied on M to generate V ∈ R^{C×W×H} for feature adaptation; for each position u in the spatial dimension of V, a vector V_u ∈ R^C and a set Φ_u ∈ R^{(H+W-1)×C} are obtained, wherein Φ_u denotes the set of feature vectors of V in the same column or row as position u, and A_{i,u} is the scalar value of A at channel i and position u;
the non-local information of the image is acquired through the Aggregation operation:

M'_u = \sum_{i=0}^{H+W-1} A_{i,u} \Phi_{i,u} + M_u

wherein M'_u denotes the feature vector at position u in the output feature map M' ∈ R^{C×W×H}.
In one embodiment, a semi-supervised remote sensing image semantic segmentation method further includes:
inputting the one-hot encoded vectors of the semantic segmentation prediction map and of the annotated image into a discriminator network to obtain a semantic segmentation confidence map; wherein the original remote sensing images include annotated images.
In one embodiment, the discriminator network comprises:
5 convolutional layers with 4×4 kernels, channel numbers [64, 128, 256, 512, 1] and stride 2; the ReLU after each convolutional layer is replaced with Leaky-ReLU; and an upsampling layer is added after the last layer.
In one embodiment, a semi-supervised remote sensing image semantic segmentation method further includes:
training the deep semantic segmentation network and the discriminator network based on the spatial multi-class cross-entropy loss L_ce, the adversarial loss L_A and the semi-supervised loss L_S.
In one embodiment, training the deep semantic segmentation network and the discriminator network specifically includes:
when labeled data are used, obtaining the multi-class cross-entropy loss L_ce as:

L_{ce} = -\sum_{h,w} \sum_{c \in C} Y_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

training the discriminator network through L_D:

L_D = -\sum_{h,w} (1 - y_n) \log\left(1 - D(G(X_n))^{(h,w)}\right) + y_n \log D(Y_n)^{(h,w)}

wherein y_n = 0 indicates that the sample is generated by the generator and y_n = 1 that the sample comes from the annotated image; D(G(X_n))^{(h,w)} is the confidence at position (h, w) for the prediction on X_n, and D(Y_n)^{(h,w)} is the confidence at position (h, w) for the annotated image Y_n; if pixel X_n^{(h,w)} belongs to class c, then Y_n^{(h,w,c)} is 1, otherwise 0;
training the generator in the adversarial learning process through the adversarial loss L_A:

L_A = -\sum_{h,w} \log D(G(X_n))^{(h,w)}

when training with unlabeled data, only L_A applies; for unlabeled data, confidence maps D(G(X_n))^{(h,w)} are generated by the trained discriminator network;
in place of the one-hot encoding Y_n of an annotated image, setting the pseudo-label Ŷ_n element by element: if c* = argmax_c G(X_n)^{(h,w,c)}, then Ŷ_n^{(h,w,c*)} = 1, otherwise 0;
setting a threshold T_{semi} to highlight the confident regions of the confidence map; L_S is defined as:

L_S = -\sum_{h,w} \sum_{c \in C} I\left(D(G(X_n))^{(h,w)} > T_{semi}\right) \hat{Y}_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

wherein I(·) is an indicator function; the sensitivity is controlled by setting the value of T_{semi}, thereby adjusting the training process of the network.
A semi-supervised remote sensing image semantic segmentation device comprises:
the image acquisition module is used for acquiring an original remote sensing image;
the multi-scale attention feature map determining module is used for scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; performing fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map;
and the semantic segmentation prediction map determining module is used for inputting the multi-scale attention feature map into the deep semantic segmentation network to obtain a semantic segmentation prediction map.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an original remote sensing image;
scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; performing fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map;
and inputting the multi-scale attention feature map into a deep semantic segmentation network to obtain a semantic segmentation prediction map.
Compared with the prior art, the semi-supervised remote sensing image semantic segmentation method, the semi-supervised remote sensing image semantic segmentation device and the computer equipment provided by the embodiment of the invention have the following beneficial effects:
The invention provides a semi-supervised remote sensing image semantic segmentation model based on multi-scale attention, which can train the whole model with unlabeled data and fully exploit the global context between feature maps, thereby effectively improving the edge segmentation precision between targets in remote sensing images and raising the overall accuracy. Specifically, to fully utilize the global context, long-range correlations between pixels are exploited to improve the segmentation precision of target edges, and a criss-cross attention network is introduced. Through multi-path inputs, image features of different sizes are extracted and receptive fields of different sizes are obtained, so the features of the training data seen from different views are fully utilized. Meanwhile, to address the difficulty of semantic annotation, a semi-supervised semantic segmentation method is applied to remote sensing images: the segmentation network serves as the generator, and with the auxiliary training of a discriminator its output is driven as close as possible to the annotated image. Because the FCN has been highly successful in semantic segmentation of natural-scene images, many researchers have applied it to remote sensing image segmentation; here a fully convolutional discriminator distinguishes annotated images from predicted images, and the semi-supervised framework can exploit both labeled and unlabeled data, making full use of the data when the amount of annotation is small and improving the segmentation effect.
Drawings
FIG. 1 is a schematic illustration of an attention mechanism provided in one embodiment;
FIG. 2 is a multi-scale attention diagram provided in one embodiment;
FIG. 3 is a schematic diagram of a network of discriminators provided in one embodiment;
FIG. 4 is a diagram of a semi-supervised semantic segmentation based on multi-scale attention as provided in one embodiment;
FIG. 5 is a segmentation result visualization of a CCF2015 data set based on multi-scale attention provided in an embodiment;
fig. 6 is a visualization of segmentation results on the US2D dataset based on multi-scale attention, provided in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, the provided semi-supervised remote sensing image semantic segmentation method specifically comprises the following steps:
attention mechanism
For a neural network to focus on the features it cares about, an attention mechanism must be introduced; attention is also very important for studying the context between feature maps in semantic segmentation. Attention does not depend on the shape of the input data, so its range of application is wide. With fixed computing resources, the attention mechanism is an effective means of addressing information overload: it removes redundant information, allocates computing resources to where they are most needed, and ignores most unimportant information, thereby achieving an effective allocation of computing resources, as shown in fig. 1.
The attention mechanism computes the degree of correlation between features. It maps a Query and Key-Value pairs {(K_i, V_i) | i = 1, 2, ..., m} to an output, where query, key and value are all vectors; the output is a weighted sum over all values in V, and the weights are obtained by comparing the query against the keys. The attention mechanism is calculated as follows:
First, the similarity between Q and each K_i is calculated, denoted by f:

f(Q, K_i), i = 1, 2, ..., m

The similarity between Q and K can be calculated in four ways: dot-product, weighted (bilinear), concatenation-weighted, and perceptron, shown respectively below:

f(Q, K_i) = Q^{\top} K_i
f(Q, K_i) = Q^{\top} W K_i
f(Q, K_i) = W [Q; K_i]
f(Q, K_i) = V^{\top} \tanh(W Q + U K_i)

The scores are then normalized with SoftMax, converting them into a probability distribution whose element weights sum to one:

\alpha_i = \mathrm{SoftMax}(f(Q, K_i)) = \frac{\exp(f(Q, K_i))}{\sum_{j=1}^{m} \exp(f(Q, K_j))}

Finally, all values in V are weighted by α_i and summed to obtain the attention vector H:

H = \sum_{i=1}^{m} \alpha_i V_i
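As a concrete illustration of these steps, the following is a minimal PyTorch sketch of the dot-product variant; the function name and tensor shapes are illustrative, not part of the invention.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    """Dot-product attention: f(Q, K_i) = Q^T K_i, alpha = SoftMax(f), H = sum_i alpha_i V_i.

    query:  tensor of shape (d,), the query vector Q
    keys:   tensor of shape (m, d), the key vectors K_i
    values: tensor of shape (m, d_v), the value vectors V_i
    """
    scores = keys @ query               # f(Q, K_i) for i = 1, ..., m
    alpha = F.softmax(scores, dim=0)    # weights summing to one
    return alpha @ values               # attention vector H
```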
multi-scale attention semi-supervised semantic segmentation model
Because the generator ignores long-range correlations, a criss-cross attention module is introduced; long-range dependencies capture useful contextual information, which aids visual understanding. In the present invention, criss-cross attention modules collect long-range contextual information in both the horizontal and vertical directions to enhance the per-pixel representation. Through multi-scale fusion, each pixel combines contextual information from three views, so its classification is more accurate.
Generator
As shown in fig. 2, the input image is scaled to 3 different sizes, each scaled image is fed to an attention module, and the resulting feature maps are merged into a multi-scale attention feature map. The input image first passes through a deep convolutional neural network based on ResNet-101, producing feature maps X_1, X_2 and X_3; the spatial size of a feature map X is H × W. X_1, X_2 and X_3 pass through the attention modules to yield C_1, C_2 and C_3 respectively, in which every pixel is contextually related to all pixels along its row and column. After upsampling, all feature maps are restored to the original size and fused to obtain the final multi-scale attention feature map.
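The following PyTorch sketch illustrates this generator structure. The scale factors (1.0, 0.75, 0.5), the 1×1 fusion convolution, the output stride of 8 and all module names are assumptions for illustration; the text itself fixes only the three scaled inputs, the ResNet-101 based feature extractor, the three criss-cross attention modules and the upsample-then-fuse step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionGenerator(nn.Module):
    """Sketch of the generator: three scaled copies of the input pass through a
    shared backbone and per-scale criss-cross attention modules; the attention
    feature maps are upsampled to a common size and fused."""

    def __init__(self, backbone, attention_modules, num_classes, channels=2048):
        super().__init__()
        self.backbone = backbone                      # e.g. a ResNet-101 feature extractor
        self.attn = nn.ModuleList(attention_modules)  # three criss-cross attention modules
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = []
        for scale, attn in zip((1.0, 0.75, 0.5), self.attn):
            xi = F.interpolate(x, scale_factor=scale, mode='bilinear',
                               align_corners=False)
            ci = attn(self.backbone(xi))              # attention feature map C_i
            # Restore every branch to a common spatial size before fusion
            # (output stride 8 is assumed for the backbone).
            feats.append(F.interpolate(ci, size=(h // 8, w // 8),
                                       mode='bilinear', align_corners=False))
        fused = self.fuse(torch.cat(feats, dim=1))    # multi-scale attention feature map
        logits = self.classifier(fused)
        # Semantic segmentation prediction map at the original resolution.
        return F.interpolate(logits, size=(h, w), mode='bilinear', align_corners=False)
```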
Discriminator
The discriminator network has 5 convolutional layers with 4×4 kernels, channel numbers [64, 128, 256, 512, 1] and stride 2; the ReLU after each convolutional layer is replaced with Leaky-ReLU. To restore the output to the size of the input image, an upsampling layer is added after the last layer. Because training a generative adversarial network requires a large amount of memory, no more complex discriminator structure is adopted.
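A PyTorch sketch of this discriminator follows; a padding of 1 and a Leaky-ReLU slope of 0.2 are assumptions, while the five 4×4 convolutions, channels [64, 128, 256, 512, 1], stride 2 and final upsampling follow the description.

```python
import torch.nn as nn
import torch.nn.functional as F

class FCDiscriminator(nn.Module):
    """Fully convolutional discriminator as described: five 4x4 convolutions with
    stride 2 and channels [64, 128, 256, 512, 1], Leaky-ReLU after each
    intermediate layer, and a final upsampling back to the input size."""

    def __init__(self, num_classes, ndf=64):
        super().__init__()
        self.conv1 = nn.Conv2d(num_classes, ndf, 4, stride=2, padding=1)
        self.conv2 = nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1)
        self.conv3 = nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1)
        self.conv4 = nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1)
        self.classifier = nn.Conv2d(ndf * 8, 1, 4, stride=2, padding=1)
        self.leaky_relu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):                  # x: C-channel probability/one-hot map
        h, w = x.shape[2:]
        for conv in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = self.leaky_relu(conv(x))
        x = self.classifier(x)             # 1-channel map at 1/32 resolution
        return F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False)
```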
Semi-supervised remote sensing image semantic segmentation algorithm based on multi-scale attention
As shown in fig. 2, for a given feature map M ∈ R^{C×W×H}, two 1×1 convolutional layers are first applied to M, yielding two feature maps named Q and K, with (Q, K) ∈ R^{C′×W×H}. C′ is the number of channels of these feature maps; it is smaller than the channel number C of the image features, which reduces the dimensionality. After Q and K are obtained, the Affinity operation produces d_{i,u}, from which the attention map A ∈ R^{(H+W-1)×W×H} is obtained.
The Affinity operation works as follows: for each position u of the feature map Q, a vector Q_u ∈ R^{C′} is obtained; meanwhile, the set Ω_u is obtained by extracting from K the feature vectors in the same row or column as position u, and

d_{i,u} = Q_u \Omega_{i,u}^{\top}

where Ω_u ∈ R^{(H+W-1)×C′} and Ω_{i,u} ∈ R^{C′} is the i-th element of Ω_u. d_{i,u} ∈ D denotes the degree of correlation between Q_u and Ω_{i,u}, with i = [1, ..., |Ω_u|] and D ∈ R^{(H+W-1)×W×H}. Then, a SoftMax operation is applied on D along the channel dimension to compute the attention map A. Finally, a convolutional layer with a 1×1 filter is applied to M to generate V ∈ R^{C×W×H} for feature adaptation. For each position u in the spatial dimension of V, a vector V_u ∈ R^C and a set Φ_u ∈ R^{(H+W-1)×C} can be obtained, where Φ_u is the set of feature vectors of V in the same column or row as position u. The non-local information of the image is obtained through the Aggregation operation

M'_u = \sum_{i=0}^{H+W-1} A_{i,u} \Phi_{i,u} + M_u

where M'_u denotes the feature vector at position u in the output feature map M' ∈ R^{C×W×H} and A_{i,u} is the scalar value of A at channel i and position u. Because contextual information is added to the local feature M, the local features receive better attention and the pixel-level representation of the features improves. Since the attention map captures long-range correlations, the feature map has a relatively broad contextual view, so contextual information can be selectively aggregated according to the attention map. As shown in fig. 4, the attention feature maps obtained from input images of different sizes through the criss-cross attention modules are upsampled back to the size of the original input image and then fused to obtain the final multi-scale attention feature map.
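A minimal PyTorch sketch of one criss-cross attention pass follows. The channel reduction factor of 8 and the masking of the duplicated position u are implementation assumptions; the module covers the H + W - 1 row-and-column neighbours of each position, as described above.

```python
import torch
import torch.nn as nn

class CrissCrossAttention(nn.Module):
    """Sketch of criss-cross attention: for each position u, affinities are
    computed only against the H + W - 1 positions in the same row and column,
    SoftMax-normalised, and used to aggregate V back onto M (Aggregation)."""

    def __init__(self, in_channels, reduction=8):
        super().__init__()
        c_prime = in_channels // reduction                   # C' < C reduces dimensionality
        self.query = nn.Conv2d(in_channels, c_prime, 1)      # Q
        self.key = nn.Conv2d(in_channels, c_prime, 1)        # K
        self.value = nn.Conv2d(in_channels, in_channels, 1)  # V

    def forward(self, m):                             # m: (B, C, H, W)
        b, c, h, w = m.shape
        q, k, v = self.query(m), self.key(m), self.value(m)

        # Affinity d_{i,u} = Q_u . Omega_{i,u} along the row ...
        row = torch.einsum('bchw,bchv->bhwv', q, k)   # (B, H, W, W)
        # ... and along the column.
        col = torch.einsum('bchw,bcvw->bhwv', q, k)   # (B, H, W, H)
        # Mask the duplicated position u in the column branch so each of the
        # H + W - 1 neighbours is counted once (an implementation assumption).
        mask = torch.eye(h, dtype=torch.bool, device=m.device).view(1, h, 1, h)
        col = col.masked_fill(mask, float('-inf'))

        # SoftMax over the H + W - 1 affinities of each position.
        attn = torch.softmax(torch.cat([row, col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]

        # Aggregation: M'_u = sum_i A_{i,u} Phi_{i,u} + M_u
        out = torch.einsum('bhwv,bchv->bchw', a_row, v) \
            + torch.einsum('bhwv,bcvw->bchw', a_col, v)
        return out + m
```

Each pixel here attends to one row and one column; full-image context is obtained when the module is applied recurrently or combined with the multi-scale fusion above.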
The generator's prediction map G(X_n)^{(h,w)} and the vector Y_n obtained by one-hot encoding the annotated image are input to the discriminator, which after training outputs a confidence map of size H × W × 1. The loss function L combines three terms:

L = L_{ce} + \lambda_A L_A + \lambda_S L_S

L_ce, L_A and L_S are the multi-class cross-entropy loss, the adversarial loss and the semi-supervised loss respectively, and λ_A and λ_S are two weights for minimizing the loss function L. When labeled data are used, the multi-class cross-entropy loss L_ce is obtained as:

L_{ce} = -\sum_{h,w} \sum_{c \in C} Y_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

The discriminator network is trained through L_D:

L_D = -\sum_{h,w} (1 - y_n) \log\left(1 - D(G(X_n))^{(h,w)}\right) + y_n \log D(Y_n)^{(h,w)}

Here y_n = 0 indicates that the sample was produced by the generator, and y_n = 1 that it comes from the annotated image. D(G(X_n))^{(h,w)} is the confidence for pixel X_n at position (h, w); D(Y_n)^{(h,w)} is defined similarly. To convert the discrete label map into a C-channel probability map, the annotated image is transformed by one-hot encoding: if pixel X_n^{(h,w)} belongs to class c, then Y_n^{(h,w,c)} is 1, otherwise 0. The adversarial learning process trains the generator through the loss L_A:

L_A = -\sum_{h,w} \log D(G(X_n))^{(h,w)}

The generator is trained to fool the discriminator by increasing the probability that its predictions are taken as coming from the true distribution of the annotated images. When training with unlabeled data, only L_A applies, since it requires only the discriminator network; without annotated images, the multi-class cross-entropy loss cannot be used. In addition, for unlabeled data the trained discriminator generates confidence maps D(G(X_n))^{(h,w)} that indicate which regions are sufficiently close to the true distribution of the annotated images; such regions can be treated as if they carried label information. Playing the role of the one-hot encoded annotation Y_n, a pseudo-label Ŷ_n is set element by element: if c* = argmax_c G(X_n)^{(h,w,c)}, then Ŷ_n^{(h,w,c*)} = 1, otherwise 0. L_S has the same form as L_ce. A threshold T_{semi} is then set to highlight the confident regions of the confidence map, and L_S is defined as:

L_S = -\sum_{h,w} \sum_{c \in C} I\left(D(G(X_n))^{(h,w)} > T_{semi}\right) \hat{Y}_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

where I(·) is an indicator function whose sensitivity is controlled by setting T_{semi}, thereby adjusting the training process of the network.
Example analysis
All experiments are carried out on the Ubuntu 18.04 operating system, and the multi-scale attention based semi-supervised remote sensing semantic segmentation model is trained on an RTX 2080 Ti. The code for all experiments is based on PyTorch 0.4.0 and CUDA 9; image features are extracted with a network model pre-trained with ResNet-101 on the PASCAL VOC 2012 dataset, with generative adversarial training as an aid. For the generator, i.e. the segmentation network, the optimizer used during model training is Adam, and the initial learning rate and the coefficient of the L2 regularization term are both set to 0.0001. For the discriminator network, Adam is used with an initial learning rate of 10^{-4} and an L2 regularization coefficient of 0.9. When annotation data are used, λ_A is set to 0.01; when they are not, λ_A is set to 0.001. λ_S is taken as 0.1, the threshold T_{semi} is set to 0.2, and I to 0.1. The training period is set to 20000, and since the memory required by the generative adversarial network is large, the batch size is set to 2. In the experiments, the mean Intersection over Union (MIoU) is used as the evaluation criterion for the quality of the generated segmentation images; the higher it is, the closer the generated segmentation is to the real annotation, i.e. the higher its quality. For each class, the ratio of the intersection to the union of the ground-truth and prediction sets is computed; this ratio can be written as TP (the intersection) over the sum of TP, FP and FN (the union), i.e. IoU = TP / (FP + FN + TP). Averaging over classes gives:

\mathrm{MIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}

where p_ij denotes the number of pixels whose true class is i and whose predicted class is j, and k + 1 is the number of classes (including the empty class); p_ii counts true positives, while p_ij and p_ji denote false positives and false negatives respectively.
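As a concrete check of this definition, a short NumPy sketch follows; the function name and the explicit loop over classes are illustrative.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """MIoU from the confusion matrix: p[i, j] counts pixels of true class i
    predicted as class j; per-class IoU = TP / (TP + FP + FN)."""
    p = np.zeros((num_classes, num_classes), dtype=np.int64)
    for i in range(num_classes):
        for j in range(num_classes):
            p[i, j] = np.sum((gt == i) & (pred == j))
    tp = np.diag(p)
    fp = p.sum(axis=0) - tp     # predicted as class i but actually another class
    fn = p.sum(axis=1) - tp     # class i pixels predicted as another class
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou.mean()
```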
Example analysis on CCF2015 dataset
The CCF2015 dataset contains five classes: five large images in which four kinds of objects are annotated (vegetation, buildings, water and roads) in addition to background. The original images range in size from 3000 × 3000 to 6000 × 6000 pixels; because this resolution is too large, the images are processed to obtain 13000 images of size 256 × 256. The specific processing, shown in the sketch after this paragraph, is as follows: both the original image and the label image are rotated by 90, 180 and 270 degrees, and patches are cropped to 256 × 256 thumbnails at randomly generated x and y coordinates. A standard validation set of 1000 images is used for model evaluation.
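A sketch of this preprocessing with PIL is shown below; applying a single randomly chosen rotation per call and the function name are illustrative choices, while the rotation angles and the random 256 × 256 crop follow the description.

```python
import random
from PIL import Image

def rotate_and_crop(image, label, crop=256, angles=(90, 180, 270)):
    """Rotate image and label together by one of 90/180/270 degrees, then crop
    a crop x crop patch at randomly generated x, y coordinates."""
    angle = random.choice(angles)
    image = image.rotate(angle, expand=True)
    label = label.rotate(angle, expand=True)   # nearest-neighbour by default
    x = random.randint(0, image.width - crop)
    y = random.randint(0, image.height - crop)
    box = (x, y, x + crop, y + crop)
    return image.crop(box), label.crop(box)

# Usage (paths illustrative):
# img, lab = rotate_and_crop(Image.open('scene.png'), Image.open('scene_label.png'))
```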
The experiments on CCF2015 with multi-scale attention based semi-supervision are shown in Tables 1 and 2. To show that the method performs better than existing approaches, the multi-scale attention based semi-supervised method is further compared with the fully supervised DeepLab method and the semi-supervised method of Hung et al. As shown in Tables 1 to 4, the semi-supervised method based on multi-scale attention obtains a clear gain in MIoU over the earlier network without long-range correlation, which shows that introducing multi-scale attention and strengthening the contextual correlation between pixels is significant; the experiments on the CCF2015 and US2D datasets fully confirm this. On CCF2015 the method was tested with 1/8 and 1/2 of the data labeled, respectively. To further verify its feasibility, as shown in fig. 5, the multi-scale attention method improves both the segmentation of roads (blue) and vegetation (red) and the accuracy of building segmentation (green), and the edge information of the targets is closer to the original labels. This demonstrates the effectiveness of combining semi-supervised segmentation with attention: attending to the contextual relation between each pixel and the other pixels along its row and column captures long-range correlations better, further improving the feature extraction capability of the generator and hence the performance of the whole network.
Table 1: Experimental results of multi-scale attention on the CCF2015 dataset with 1/8 labeled data [table data not reproduced]
Table 2: Experimental results of multi-scale attention on the CCF2015 dataset with 1/2 labeled data [table data not reproduced]
Example analysis on US2D data set
The urban semantic two-dimensional (US2D) dataset from the IGARSS 2019 contest is a large public dataset containing RGB images and semantic labels, covering Jacksonville, Florida and Omaha, Nebraska. The International Geoscience and Remote Sensing Symposium (IGARSS) is an influential conference in the remote sensing field. For data processing, the remote sensing images are cropped to 512 × 512 resolution together with the semantic labels of the corresponding original images (see fig. 6); the ground sampling distance (GSD) is about 30 cm. For the experiments here, the cropping yields 13732 training images and 1720 test images.
The results on the US2D dataset are shown in Tables 3 and 4. Experiments were run with 1/8 and 1/2 of the data labeled, respectively, the remainder serving as unlabeled data. Compared with the existing fully supervised method the improvement is large, and there is also a clear gain over an existing strong semi-supervised method. The visualization on US2D is shown in fig. 6: by combining the attention mechanism with the semi-supervised method, the obtained semantic segmentation images reflect the semantic features well.
Table 3: Experimental results of multi-scale attention on the US2D dataset with 1/8 labeled data [table data not reproduced]
Table 4: Experimental results of multi-scale attention on the US2D dataset with 1/2 labeled data [table data not reproduced]
In summary, deep convolutional networks have been applied to remote sensing images ever more widely in recent years. Addressing the fact that existing semi-supervised remote sensing semantic segmentation methods ignore the long-range correlation between pixels and therefore cannot exploit the global context effectively, a criss-cross attention mechanism is introduced, a multi-scale attention module is designed and combined with a generative adversarial network, and the whole network is trained under a semi-supervised framework; the unlabeled data in the dataset then improve training and raise the semantic segmentation precision for remote sensing images. The effectiveness of the proposed method is verified by experiments on two public remote sensing datasets.
In addition, semantic segmentation is a very important research area of computer vision. Unlike image classification, which assigns one label per picture, semantic segmentation decides the category of every pixel in an image so as to segment it precisely; it therefore requires far more annotation than image classification, which makes studying semantic segmentation with small amounts of labeled data through semi-supervised learning worthwhile. Because image semantic segmentation operates at the pixel level, it can represent the exact contour of an object and indicate the target to which each specific pixel belongs, achieving accurate segmentation, which is also very helpful for remote sensing research. Remote sensing imagery currently has great research value in many fields, and realizing its semantic segmentation has far-reaching significance for making better use of it to acquire the spatial geographic information people need. Addressing the problems that training models on remote sensing images demands large amounts of labeled data, that deep convolutional models ignore long-range correlations, and that remote sensing images are hard to annotate so the amount of labeled data is small, the invention provides a semi-supervised remote sensing image semantic segmentation method based on a deep convolutional network and adversarial learning. By using input images of different scales, multi-view characteristics of the images are collected, further improving the semantic segmentation of remote sensing images. For the remote sensing semantic segmentation task, where data can be hard to acquire, labeled data are few and the manpower and material cost of annotation should be reduced as far as possible, the invention combines a deep convolutional neural network with a generative adversarial network and, targeting the weak contextual correlation of the features extracted by current methods, proposes multi-scale attention based semi-supervised remote sensing image semantic segmentation, achieving the current best performance.
In one embodiment, a semi-supervised remote sensing image semantic segmentation device is provided, which comprises:
and the image acquisition module is used for acquiring the original remote sensing image.
The multi-scale attention feature map determining module is used for scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; and carrying out fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map.
And the semantic segmentation prediction map determining module is used for inputting the multi-scale attention feature map into the deep semantic segmentation network to obtain a semantic segmentation prediction map.
The semantic segmentation confidence image determining module is used for inputting the one-hot coding vectors of the semantic segmentation prediction image and the annotation image into a discriminator network to obtain a semantic segmentation confidence image; wherein the original remote sensing image comprises: and (5) labeling the image.
A network training module for training the deep semantic segmentation network and the discriminator network based on the spatial multi-class cross-entropy loss L_ce, the adversarial loss L_A and the semi-supervised loss L_S.
The specific definition of the semi-supervised remote sensing image semantic segmentation device can refer to the definition of the semi-supervised remote sensing image semantic segmentation method in the above, and is not described herein again. All modules in the semi-supervised remote sensing image semantic segmentation device can be completely or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
and acquiring an original remote sensing image.
Scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; and carrying out fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map.
And inputting the multi-scale attention feature map into a deep semantic segmentation network to obtain a semantic segmentation prediction map.
Inputting the one-hot encoded vectors of the semantic segmentation prediction map and of the annotated image into a discriminator network to obtain a semantic segmentation confidence map; wherein the original remote sensing images include annotated images.
Training the deep semantic segmentation network and the discriminator network based on the spatial multi-class cross-entropy loss L_ce, the adversarial loss L_A and the semi-supervised loss L_S.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features. Furthermore, the above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A semi-supervised remote sensing image semantic segmentation method is characterized by comprising the following steps:
acquiring an original remote sensing image;
scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; performing fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map;
and inputting the multi-scale attention feature map into a deep semantic segmentation network to obtain a semantic segmentation prediction map.
2. The semi-supervised remote sensing image semantic segmentation method according to claim 1, wherein the obtaining of the multi-scale attention feature map specifically comprises:
inputting the original remote sensing image into a deep convolutional neural network to obtain feature maps X_1, X_2 and X_3 of different sizes;
inputting feature maps X_1, X_2 and X_3 into 3 criss-cross attention modules respectively to obtain attention feature maps C_1, C_2 and C_3;
sequentially upsampling and fusing attention feature maps C_1, C_2 and C_3 to obtain a multi-scale attention feature map.
3. The semi-supervised remote sensing image semantic segmentation method according to claim 1, wherein the obtaining of the attention feature map specifically comprises:
for the feature map M ∈ R^{C×W×H} of the original remote sensing image, generating two feature maps, named Q and K respectively, by applying two 1×1 convolutional layers, with (Q, K) ∈ R^{C′×W×H};
sequentially applying an Affinity operation, a SoftMax operation and an Aggregation operation to the feature maps Q and K to obtain an attention map A ∈ R^{(H+W-1)×W×H};
wherein C′ is the number of channels of the feature maps Q and K, C is the number of channels of the original remote sensing image, and C′ is smaller than C; H and W are the height and width of the original remote sensing image respectively.
4. The semi-supervised remote sensing image semantic segmentation method according to claim 3, wherein the Affinity operation, SoftMax operation and Aggregation operation specifically comprise:
for each position u of the feature map Q, a vector Q_u ∈ R^{C′} is obtained; at the same time, the set Ω_u is obtained by extracting from K the feature vectors in the same row or column as position u, with the Affinity operation:

d_{i,u} = Q_u \Omega_{i,u}^{\top}

wherein Ω_u ∈ R^{(H+W-1)×C′} and Ω_{i,u} ∈ R^{C′} is the i-th element of Ω_u; d_{i,u} ∈ D denotes the degree of correlation between Q_u and Ω_{i,u}, i = [1, ..., |Ω_u|], D ∈ R^{(H+W-1)×W×H};
a SoftMax operation is applied on D along the channel dimension, and a convolutional layer with a 1×1 filter is applied on M to generate V ∈ R^{C×W×H} for feature adaptation; for each position u in the spatial dimension of V, a vector V_u ∈ R^C and a set Φ_u ∈ R^{(H+W-1)×C} are obtained, wherein Φ_u denotes the set of feature vectors of V in the same column or row as position u, and A_{i,u} is the scalar value of A at channel i and position u;
the non-local information of the image is acquired through the Aggregation operation:

M'_u = \sum_{i=0}^{H+W-1} A_{i,u} \Phi_{i,u} + M_u

wherein M'_u denotes the feature vector at position u in the output feature map M' ∈ R^{C×W×H}.
5. The semi-supervised remote sensing image semantic segmentation method of claim 1, further comprising:
inputting the one-hot encoded vectors of the semantic segmentation prediction map and of the annotated image into a discriminator network to obtain a semantic segmentation confidence map; wherein the original remote sensing images include annotated images.
6. The semi-supervised remote sensing image semantic segmentation method of claim 5, wherein the discriminator network comprises:
5 convolutional layers with 4×4 kernels, channel numbers [64, 128, 256, 512, 1] and stride 2; the ReLU after each convolutional layer is replaced with Leaky-ReLU; and an upsampling layer is added after the last layer.
7. The semi-supervised remote sensing image semantic segmentation method of claim 1, further comprising:
training the deep semantic segmentation network and the discriminator network based on the spatial multi-class cross-entropy loss L_ce, the adversarial loss L_A and the semi-supervised loss L_S.
8. The semi-supervised remote sensing image semantic segmentation method according to claim 7, wherein the training of the deep semantic segmentation network and the discriminator network specifically comprises:
when labeled data are used, the multi-class cross-entropy loss L_ce is obtained as:

L_{ce} = -\sum_{h,w} \sum_{c \in C} Y_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

the discriminator network is trained through L_D:

L_D = -\sum_{h,w} (1 - y_n) \log\left(1 - D(G(X_n))^{(h,w)}\right) + y_n \log D(Y_n)^{(h,w)}

wherein y_n = 0 indicates that the sample is generated by the generator and y_n = 1 that the sample comes from the annotated image; D(G(X_n))^{(h,w)} is the confidence at position (h, w) for the prediction on X_n, and D(Y_n)^{(h,w)} is the confidence at position (h, w) for the annotated image Y_n; if pixel X_n^{(h,w)} belongs to class c, then Y_n^{(h,w,c)} is 1, otherwise 0;
the adversarial learning process trains the generator through the adversarial loss L_A:

L_A = -\sum_{h,w} \log D(G(X_n))^{(h,w)}

when training with unlabeled data, only L_A applies, and for unlabeled data confidence maps D(G(X_n))^{(h,w)} are generated by the trained discriminator network;
in place of the one-hot encoding Y_n of an annotated image, the pseudo-label Ŷ_n is set element by element: if c* = argmax_c G(X_n)^{(h,w,c)}, then Ŷ_n^{(h,w,c*)} = 1, otherwise 0;
a threshold T_{semi} is set to highlight the confident regions of the confidence map; L_S is defined as:

L_S = -\sum_{h,w} \sum_{c \in C} I\left(D(G(X_n))^{(h,w)} > T_{semi}\right) \hat{Y}_n^{(h,w,c)} \log G(X_n)^{(h,w,c)}

wherein I(·) is an indicator function; the sensitivity is controlled by setting the value of T_{semi}, thereby adjusting the training process of the network.
9. A semi-supervised remote sensing image semantic segmentation device is characterized by comprising:
the image acquisition module is used for acquiring an original remote sensing image;
the multi-scale attention feature map determining module is used for scaling the original remote sensing image into 3 scaled images with different sizes; respectively inputting the 3 scaled images into 3 criss-cross attention modules to obtain 3 attention feature maps; performing fusion processing on the 3 attention feature maps to obtain a multi-scale attention feature map;
and the semantic segmentation prediction map determining module is used for inputting the multi-scale attention feature map into the deep semantic segmentation network to obtain a semantic segmentation prediction map.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method of any of claims 1-8.
CN202110686544.8A 2021-06-21 2021-06-21 Semi-supervised remote sensing image semantic segmentation method and device and computer equipment Withdrawn CN113298815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110686544.8A CN113298815A (en) 2021-06-21 2021-06-21 Semi-supervised remote sensing image semantic segmentation method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN113298815A true CN113298815A (en) 2021-08-24

Family

ID=77329003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110686544.8A Withdrawn CN113298815A (en) 2021-06-21 2021-06-21 Semi-supervised remote sensing image semantic segmentation method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN113298815A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780296A (en) * 2021-09-13 2021-12-10 山东大学 Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113780296B (en) * 2021-09-13 2024-02-02 山东大学 Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113989585A (en) * 2021-10-13 2022-01-28 北京科技大学 Medium-thickness plate surface defect detection method based on multi-feature fusion semantic segmentation
CN113989585B (en) * 2021-10-13 2022-08-26 北京科技大学 Medium-thickness plate surface defect detection method based on multi-feature fusion semantic segmentation
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114022762A (en) * 2021-10-26 2022-02-08 三峡大学 Unsupervised domain self-adaption method for extracting area of crop planting area
CN114972293A (en) * 2022-06-14 2022-08-30 深圳市大数据研究院 Video polyp segmentation method and device based on semi-supervised spatio-temporal attention network
CN115222629A (en) * 2022-08-08 2022-10-21 西南交通大学 Single remote sensing image cloud removing method based on cloud thickness estimation and deep learning
CN115496732A (en) * 2022-09-26 2022-12-20 电子科技大学 Semi-supervised heart semantic segmentation algorithm
CN115496732B (en) * 2022-09-26 2024-03-15 电子科技大学 Semi-supervised heart semantic segmentation algorithm
CN115375677A (en) * 2022-10-24 2022-11-22 山东省计算中心(国家超级计算济南中心) Wine bottle defect detection method and system based on multi-path and multi-scale feature fusion
CN116129117A (en) * 2023-02-03 2023-05-16 中国人民解放军海军工程大学 Sonar small target semi-supervised semantic segmentation method and system based on multi-head attention

Similar Documents

Publication Publication Date Title
CN113298815A (en) Semi-supervised remote sensing image semantic segmentation method and device and computer equipment
US11200424B2 (en) Space-time memory network for locating target object in video content
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111612008B (en) Image segmentation method based on convolution network
CN111860235B (en) Method and system for generating high-low-level feature fused attention remote sensing image description
CN112597941B (en) Face recognition method and device and electronic equipment
CN113780149B (en) Remote sensing image building target efficient extraction method based on attention mechanism
CN111369581A (en) Image processing method, device, equipment and storage medium
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
Chen et al. Corse-to-fine road extraction based on local Dirichlet mixture models and multiscale-high-order deep learning
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN111723660A (en) Detection method for long ground target detection network
CN113505670A (en) Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
Malav et al. DHSGAN: An end to end dehazing network for fog and smoke
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN115035599A (en) Armed personnel identification method and armed personnel identification system integrating equipment and behavior characteristics
CN114550014A (en) Road segmentation method and computer device
CN117079276B (en) Semantic segmentation method, system, equipment and medium based on knowledge distillation
CN112465847A (en) Edge detection method, device and equipment based on clear boundary prediction
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115063352A (en) Salient object detection device and method based on multi-graph neural network collaborative learning architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210824