CN113011427B - Remote sensing image semantic segmentation method based on self-supervision contrast learning - Google Patents

Remote sensing image semantic segmentation method based on self-supervision contrast learning

Info

Publication number
CN113011427B
CN113011427B (application CN202110285256.1A; publication of application CN113011427A)
Authority
CN
China
Prior art keywords
local
semantic segmentation
training
matching
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110285256.1A
Other languages
Chinese (zh)
Other versions
CN113011427A (en)
Inventor
李海峰
李益
李朋龙
丁忆
马泽忠
张泽烈
胡艳
肖禾
陶超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Geographic Information And Remote Sensing Application Center
Central South University
Original Assignee
Chongqing Geographic Information And Remote Sensing Application Center
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Geographic Information And Remote Sensing Application Center, Central South University filed Critical Chongqing Geographic Information And Remote Sensing Application Center
Priority to CN202110285256.1A priority Critical patent/CN113011427B/en
Publication of CN113011427A publication Critical patent/CN113011427A/en
Priority to AU2021103625A priority patent/AU2021103625A4/en
Application granted granted Critical
Publication of CN113011427B publication Critical patent/CN113011427B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image semantic segmentation method based on self-supervised contrastive learning, which comprises the following steps: constructing a semantic segmentation network model (such as Deeplab v3+); pre-training the encoder of the network model with unlabeled data; after pre-training is finished, performing supervised semantic segmentation training of the network model on labeled samples; and performing semantic segmentation of remote sensing images with the network model obtained from the supervised semantic segmentation training. During pre-training, contrastive learning is carried out by combining global style contrast with local matching contrast. The invention applies contrastive self-supervised learning to remote sensing semantic segmentation data sets and provides a global style and local matching contrastive learning framework, so that the resulting semantic segmentation method has a wider application range and a better segmentation effect.

Description

Remote sensing image semantic segmentation method based on self-supervision contrast learning
Technical Field
The invention relates to the technical field of remote sensing image semantic segmentation, and in particular to a remote sensing image semantic segmentation method based on self-supervised contrastive learning.
Background
With the development of remote sensing technology, high-resolution remote sensing images have become easier to obtain, and such images are increasingly used in urban planning, disaster monitoring, environmental protection, traffic and tourism, among other areas. Extracting and identifying the information in remote sensing images is generally the basis of all of these applications, and semantic segmentation, the technology of identifying and classifying every pixel of an image, is therefore an important and challenging research direction in the remote sensing field.
In recent years, with the development of deep learning, remote sensing image semantic segmentation has achieved impressive results and is increasingly applied to global land cover mapping, urban built-up area identification and similar tasks. However, the success of existing deep learning techniques depends heavily on large numbers of high-quality labeled samples. Because of the high labeling cost of the semantic segmentation task and the huge spatial and temporal heterogeneity of remote sensing imagery, the existing labeled data cover only a small slice of remote sensing imagery and cannot meet the requirements of sample diversity and richness.
For the problem of insufficient labeled samples, a common approach is data augmentation to generate more samples; this improves the robustness of the model to a certain extent and can be used in the general training process, but its effect is limited. Other studies attempt to exploit other labeled data through pre-training or transfer learning: model parameters trained on larger data sets, or on data sets more relevant to the current task, are transferred into the task at hand instead of random initialization (see the short illustration below), which can greatly reduce training time and to some extent make up for the shortage of data.
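For illustration only, and assuming torchvision is available (the invention itself does not prescribe a particular library), loading ImageNet-pretrained parameters instead of random initialization might look like this:

```python
import torchvision

# Random initialization (no pre-training).
random_backbone = torchvision.models.resnet50(weights=None)

# Reuse parameters learned on a larger data set (ImageNet) as the starting point.
pretrained_backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1
)
```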
In fact, although large numbers of labels are not available, image data of extremely high diversity and richness are available all over the world, so fully and effectively exploiting these data is critical. Semi-supervised learning is one approach, training on a large amount of unlabeled data together with a small amount of labeled data. Self-supervised learning provides another paradigm: it does not depend on any labeled data, but designs a supervisory signal directly from the image data to guide learning. It thus avoids the problems of the supervised paradigm and can be expected to learn potentially more general knowledge, which is then transferred to specific downstream tasks.
Depending on the design of the self-supervision signal, current self-supervised learning can be broadly divided into three categories: context based, temporal based and contrast based. Recent work has shown that methods based on contrastive learning can achieve superior performance (Chen T., Kornblith S., Norouzi M., et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020). Contrastive learning constructs a representation by learning the similarity or dissimilarity of two things; its core idea is that the feature expressions of positive samples should be similar, while the feature expressions of negative samples should be dissimilar. The intuition behind the good performance of contrastive methods is that the features of different transformations of the same image should be similar to each other and dissimilar to the features of other images, and a suitable network is trained accordingly.
However, most existing contrastive learning is instance-level contrast: a single global feature is extracted from the whole image and then discriminated. This shows good performance on natural image classification data sets, where a single image is relatively pure in category or has one prominent category. In contrast, the ground object distribution within a single cropped remote sensing image may be rich, so extracting only a global feature from the whole image and discriminating it, as in the original instance-level contrastive learning methods, loses much information. Moreover, semantic segmentation differs from classification: classification only requires image-level discrimination, whereas semantic segmentation is pixel-level classification and different parts within the same image must be distinguished.
Disclosure of Invention
In view of the above, the present invention aims to learn features directly from unlabeled images to help a downstream semantic segmentation task that has only a small number of labels, while also addressing the problems that the categories within a single image are not pure and that the semantic segmentation task requires local discrimination.
In order to achieve the purpose, the invention adopts the following technical scheme:
the remote sensing image semantic segmentation method based on the self-supervision contrast learning comprises the following steps:
step 1, constructing a Deeplab v3+ network model;
step 2, pre-training the encoder of the network model with unlabeled data;
step 3, after pre-training is finished, performing supervised semantic segmentation training of the network model on labeled samples;
step 4, performing semantic segmentation of remote sensing images with the network model obtained from the supervised semantic segmentation training.
During pre-training, contrastive learning is carried out by combining global style contrast with local matching contrast, comprising the following steps:
step 201, performing random data transformation on the unlabeled data: for a given sample x_i, apply two random data transformations t'(x_i) and t''(x_i), producing two related instances x'_i and x''_i that are taken as a positive sample pair, where t' denotes random cropping and scaling, and t'' denotes, in sequence, random cropping and scaling, random flipping, random rotation, random color distortion and random Gaussian blur;
step 202, using the encoder e(·) of the Deeplab v3+ network model to extract global style features from the transformed sample instances: stylef'_i = stylef(x'_i) = cat(μ(e(x'_i)), σ(e(x'_i))), where stylef'_i denotes the global style feature, μ denotes taking the mean of each channel of the feature map, i.e. global average pooling, σ denotes taking the variance of each channel, and cat denotes channel concatenation;
step 203, processing the global style feature with a projection head g(·), where the projection head is a multi-layer perceptron with one hidden layer:
z'_i = g(stylef'_i) = W^(2) r(W^(1) stylef'_i),
wherein W^(2) denotes the second fully connected layer, W^(1) denotes the first fully connected layer, r denotes the ReLU activation function, and z'_i denotes the global style feature after processing by the projection head;
step 204, using the encoder e(·) and decoder d(·) of the Deeplab v3+ network model, extracting the feature maps d(e(x'_i)) and d(e(x''_i)) from the transformed sample instances x'_i and x''_i, and obtaining from d(e(x'_i)) and d(e(x''_i)) the features corresponding to a plurality of matched local regions, where p'_j and p''_j are the feature maps corresponding to a matched local pair; the feature maps are then globally average pooled to obtain local feature vectors, namely:
f_L(p'_j) = μ(p'_j)
wherein f_L(p'_j) is a local feature vector;
step 205, processing the local feature vectors with a projection head g_L(·), where the projection head is a multi-layer perceptron with one hidden layer:
u'_j = g_L(f_L(p'_j)) = W^(4) r(W^(3) f_L(p'_j)),
wherein W^(4) denotes the fourth fully connected layer, W^(3) denotes the third fully connected layer, and u'_j denotes the local matching feature after processing by the projection head;
step 206, training the encoder using an overall loss function, the overall loss function consisting of global style contrast loss and local matching contrast loss:
L = (1 - λ)·l_G + λ·l_L
wherein λ is an adjustable weight parameter, l_G denotes the global style contrast loss, and l_L denotes the local matching contrast loss.
Preferably, for N samples from the same batch, the global style contrast loss is defined as follows:
l_G = (1 / 2N) · Σ_{i=1}^{N} [ l(x'_i, x''_i) + l(x''_i, x'_i) ]
wherein:
l(x'_i, x''_i) = −log { exp( sim(z'_i, z''_i) / τ ) / [ exp( sim(z'_i, z''_i) / τ ) + Σ_{x ∈ Λ^−} exp( sim(z'_i, g(style(x))) / τ ) ] }
where sim () represents the similarity between the computed feature vectors, Λ-2(N-1) negative samples except for the positive sample pair are represented, and tau represents a temperature parameter; style () represents extracting a global style feature vector from the encoder-extracted features by computing the mean and variance,
stylef(x'_i) = stylef'_i = cat(μ(e(x'_i)), σ(e(x'_i)))
for N samples from the same batch, the local match contrast loss is defined as follows:
l_L = (1 / 2N_L) · Σ_{j=1}^{N_L} [ l(p'_j, p''_j) + l(p''_j, p'_j) ]
wherein:
l(p'_j, p''_j) = −log { exp( sim(u'_j, u''_j) / τ ) / [ exp( sim(u'_j, u''_j) / τ ) + Σ_{p ∈ Λ^−} exp( sim(u'_j, g_L(f_L(p))) / τ ) ] }
where N_L denotes the number of all local regions selected from the N samples of the same batch, and Λ^− is the set of feature maps corresponding to all local regions other than the matched one.
In the process of calculating the local matching contrast loss, the local regions are first selected and matched, then the local features of the corresponding local regions are extracted, and finally the local matching contrast loss is calculated;
the selection and matching of the local regions comprises: for a given sample x_i, after the random data transformations t'(x_i) and t''(x_i), two transformed versions are generated, from which a plurality of local regions are randomly selected and matched; an index tag is introduced to record pixel positions, ensuring that the center positions of matched local regions correspond to each other in the original image; a local region is randomly selected from x'_i and the index value of its center position is obtained, then the position of the matched local region in x''_i is determined from this index value; excessive overlap between local regions is avoided by excluding each local region after it has been selected, ensuring that the centers of subsequently selected local regions do not fall inside already selected ones; this step is repeated multiple times to obtain a plurality of matched local regions;
the local feature extraction comprises: using the encoder and decoder parts of the Deeplab v3+ network model, the feature maps d(e(x'_i)) and d(e(x''_i)) are extracted from the transformed sample instances x'_i and x''_i; following the idea used in the selection and matching of local regions, the features corresponding to a plurality of matched local regions are obtained from d(e(x'_i)) and d(e(x''_i)): let p'_j be selected from d(e(x'_i)) and p''_j from d(e(x''_i)), so that p'_j and p''_j are the feature maps of a matched local pair; the feature maps are then globally average pooled to obtain local feature vectors;
the local matching contrast loss updates the Deeplab v3+ network model by making the feature representations of matched local regions similar while the feature expressions of non-matched local regions in the same batch are dissimilar.
Preferably, in the pre-training stage, Adam is used as the optimizer, the weight decay is set to 1e-5, the initial learning rate is 0.01, a cosine decay strategy is used, and the model with the lowest loss is selected for the downstream task; in the fine-tuning stage, Adam is used as the optimizer, the number of epochs is 150, the batch_size is 16, and the initial learning rate is 0.001.
The invention has the beneficial effects that:
(1) the invention applies contrastive self-supervised learning to remote sensing semantic segmentation data sets, and can learn features directly from unlabeled images to help a downstream semantic segmentation task that has only a small number of labels;
(2) aiming at the problems that the categories within a single image are not pure and that the semantic segmentation task requires local discrimination, a global style and local matching contrastive learning framework is provided, so that the semantic segmentation effect is better.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a block diagram of the combined global style contrast and local matching contrast framework proposed by the present invention;
FIG. 3 is a schematic diagram of local region selection and matching according to the present invention;
FIG. 4 is a graph comparing the results of examples of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
As shown in FIG. 1, the remote sensing image semantic segmentation method based on self-supervised contrastive learning comprises the following steps:
step 1, constructing a Deeplab v3+ network model;
step 2, pre-training the encoder of the network model with unlabeled data;
step 3, after pre-training is finished, performing supervised semantic segmentation training of the network model on labeled samples;
step 4, performing semantic segmentation of remote sensing images with the network model obtained from the supervised semantic segmentation training.
Contrastive learning constructs a representation by learning the similarity or dissimilarity of two things; its core idea is that the feature expressions of positive samples should be similar while the feature expressions of negative samples should be dissimilar. Considering that the ground object distribution in a single cropped remote sensing image may be rich, and that extracting only an image-level representation for discrimination inevitably loses much detail, the invention proposes global style and local matching contrastive learning for the remote sensing semantic segmentation task. The overall framework is shown in FIG. 2 and mainly consists of two modules: 1) the global style contrast module addresses the problem that the global average pooled features used by existing contrastive learning to characterize a sample cannot adequately represent the overall characteristics of an image; it introduces style features that can represent these overall characteristics, helping the model learn better image-level features;
2) the local feature matching contrast module considers that the ground object categories within a single image of a semantic segmentation data set are rich, so extracting only global features loses much detail information; moreover, an image-level representation may be suboptimal for a semantic segmentation task that requires pixel-level discrimination, which calls for local (pixel) level discrimination.
During pre-training, contrastive learning is carried out by combining global style contrast with local matching contrast, comprising the following steps:
step 201, performing random data transformation on the unlabeled data: for a given sample x_i, apply two random data transformations t'(x_i) and t''(x_i), producing two related instances x'_i and x''_i that are taken as a positive sample pair, where t' denotes random cropping and scaling, and t'' denotes, in sequence, random cropping and scaling, random flipping, random rotation, random color distortion and random Gaussian blur. In order to prompt the model to learn general spatio-temporal invariance, this embodiment learns spatial invariance through spatial transformations such as random cropping and scaling, flipping and rotation, and expects to learn temporal invariance by simulating changes across time phases with color distortion, Gaussian blur, random noise and the like, as sketched below.
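A minimal sketch of the two transformation pipelines, assuming torchvision; the crop size, jitter strengths, rotation range and blur kernel below are illustrative choices, not values fixed by the invention:

```python
import torchvision.transforms as T

# t': random cropping and scaling only.
t_prime = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.ToTensor(),
])

# t'': random cropping and scaling, flipping, rotation, color distortion
# and Gaussian blur, applied in sequence.
t_double_prime = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=90),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    T.ToTensor(),
])

# x1, x2 = t_prime(img), t_double_prime(img)   # a positive sample pair
```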
Step 202, extracting global style features from the transformed sample instance by using an encoder e (-) in the Deeplab v3+ network model: styrene ef'i=stylef(x′i)=cat(μ(e(x′i)),σ(e(x′i) Of these, styref'iRepresenting global style features, μ represents the averaging of each channel in the feature map, i.e. global mean pooling, and σ represents the average of each channelCalculating variance, and cat represents channel splicing;
step 203, processing the global style feature with a projection head g(·), where the projection head is a multi-layer perceptron with one hidden layer:
z'_i = g(stylef'_i) = W^(2) r(W^(1) stylef'_i),
wherein W^(2) denotes the second fully connected layer, W^(1) denotes the first fully connected layer, and r denotes the ReLU activation function;
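A sketch of such a projection head; the hidden and output widths are assumed values, and the bias terms kept by nn.Linear are an implementation convenience not written in the formula:

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    """g(.): a multi-layer perceptron with one hidden layer, z = W2 * ReLU(W1 * stylef)."""
    def __init__(self, in_dim: int, hidden_dim: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # W1
            nn.ReLU(inplace=True),           # r
            nn.Linear(hidden_dim, out_dim),  # W2
        )

    def forward(self, stylef):
        return self.net(stylef)
```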
step 204, using the encoder and decoder of the Deeplab v3+ network model, extracting the feature maps d(e(x'_i)) and d(e(x''_i)) from the transformed sample instances x'_i and x''_i, and obtaining from d(e(x'_i)) and d(e(x''_i)) the features corresponding to a plurality of matched local regions, where p'_j and p''_j are the feature maps corresponding to a matched local pair; the feature maps are then globally average pooled to obtain local feature vectors, namely:
f_L(p'_j) = μ(p'_j)
step 205, processing the local features with a projection head g_L(·), where the projection head is a multi-layer perceptron with one hidden layer:
u'_j = g_L(f_L(p'_j)) = W^(4) r(W^(3) f_L(p'_j)),
wherein W^(4) denotes the fourth fully connected layer and W^(3) denotes the third fully connected layer;
step 206, training the encoder using an overall loss function, the overall loss function consisting of global style contrast loss and local matching contrast loss:
L = (1 - λ)·l_G + λ·l_L
where λ is an adjustable weight parameter, l_G denotes the global style contrast loss, and l_L denotes the local matching contrast loss.
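Putting steps 201 to 206 together, one pre-training iteration can be sketched as below. The function reuses style_feature from the sketch above; extract_matched_locals, the two projection heads and the two loss functions are placeholders for the components described in this section, and the default weight of 0.5 for λ is only an example of the adjustable parameter:

```python
def pretrain_step(x_batch, t_prime, t_double_prime, encoder, decoder,
                  g_style, g_local, global_style_loss, local_match_loss,
                  extract_matched_locals, lam=0.5):
    """One self-supervised iteration combining global style and local matching contrast."""
    x1, x2 = t_prime(x_batch), t_double_prime(x_batch)               # step 201: positive pair

    f1, f2 = encoder(x1), encoder(x2)                                # step 202: encoder features
    z1, z2 = g_style(style_feature(f1)), g_style(style_feature(f2))  # step 203: projected style features
    l_g = global_style_loss(z1, z2)

    d1, d2 = decoder(f1), decoder(f2)                                # step 204: decoded feature maps
    p1, p2 = extract_matched_locals(d1, d2)                          # matched local feature maps p'_j, p''_j
    u1 = g_local(p1.mean(dim=(-2, -1)))                              # step 205: pooled + projected locals
    u2 = g_local(p2.mean(dim=(-2, -1)))
    l_l = local_match_loss(u1, u2)

    return (1.0 - lam) * l_g + lam * l_l                             # step 206: L = (1-λ)l_G + λl_L
```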
It is known in the prior art that the per-channel mean and variance extracted in a convolutional neural network can represent the style of a picture; therefore, in the invention, global style feature vectors are extracted from the encoder features by computing the mean and variance, as shown in the formula:
stylef'_i = stylef(x'_i) = cat(μ(e(x'_i)), σ(e(x'_i)))
therefore, for N samples from the same batch, the global style contrast loss is defined as follows:
l_G = (1 / 2N) · Σ_{i=1}^{N} [ l(x'_i, x''_i) + l(x''_i, x'_i) ]
l(x'_i, x''_i) = −log { exp( sim(z'_i, z''_i) / τ ) / [ exp( sim(z'_i, z''_i) / τ ) + Σ_{x ∈ Λ^−} exp( sim(z'_i, g(style(x))) / τ ) ] }
where sim () denotes computing the similarity between the feature vectors, Λ-2(N-1) negative samples except for the positive sample pair are represented, and tau represents a temperature parameter; style () represents the global style feature vector extracted from the encoder by computing the mean and variance of the features: styryl ef (x'i)=stylef′i=cat(μ(e(x′i)),σ(e(x′i))。
In an actually cropped image, the categories of a single image are not necessarily pure and may be fairly rich; if only a whole-image representation is extracted for measurement and discrimination, much information is inevitably lost. Moreover, unlike the image classification task, which only needs to distinguish whole images, the semantic segmentation task is pixel-level classification, and different parts within a single image also need to be distinguished. The local matching contrast therefore mainly comprises the following parts:
(1) local region selection and matching
As shown in fig. 3, for a given sample x_i, two transformed versions are generated by the random data transformations t'(x_i) and t''(x_i), and a plurality of local regions are randomly selected and matched from them. Because operations such as random scaling and rotation are carried out during the data transformation, pixel positions no longer correspond directly, so an index tag is introduced to record pixel positions, ensuring that the center positions of matched local regions correspond to each other in the original image. Specifically, a local region is first randomly selected from x'_i and the index value of its center position is obtained; the position of the matched local region in x''_i is then determined from this index value. To avoid excessive overlap between local regions, each local region is excluded after it has been selected, so that the centers of subsequently selected local regions do not fall inside already selected ones. This step is repeated multiple times to obtain a plurality of matched local regions.
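One way to realize the index-tag bookkeeping described above is to transform a coordinate grid together with the image, so that every pixel of a transformed view remembers its position in the original image, and then to match local centers by looking up shared original-image indices. The sketch below assumes such per-view index maps are available; the number of locals, the half window size and the overlap rule are illustrative:

```python
import random
import torch

def select_matching_locals(idx1, idx2, num_locals=4, half_size=8):
    """Pick matching local-region centers in two transformed views.

    idx1, idx2: (H, W) long tensors; idx*[r, c] is the flattened original-image
    index of pixel (r, c) in each view (obtained by transforming a coordinate
    grid together with the image). Returns a list of ((r1, c1), (r2, c2)) pairs.
    """
    pairs, used = [], []
    h, w = idx1.shape
    candidates = [(r, c) for r in range(half_size, h - half_size)
                  for c in range(half_size, w - half_size)]
    random.shuffle(candidates)
    for r1, c1 in candidates:
        if len(pairs) == num_locals:
            break
        # avoid excessive overlap: skip centers falling inside an already selected local
        if any(abs(r1 - r) <= half_size and abs(c1 - c) <= half_size for r, c in used):
            continue
        hits = (idx2 == idx1[r1, c1]).nonzero(as_tuple=False)   # same original pixel in view 2
        if len(hits) == 0:
            continue
        r2, c2 = hits[0].tolist()
        if not (half_size <= r2 < h - half_size and half_size <= c2 < w - half_size):
            continue
        pairs.append(((r1, c1), (r2, c2)))
        used.append((r1, c1))
    return pairs
```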
(2) Local feature extraction
Using a complete encoder-decoder network, the feature maps d(e(x'_i)) and d(e(x''_i)) are extracted from the transformed sample instances x'_i and x''_i; e and d can be the encoding and decoding parts of any semantic segmentation network, and in this embodiment e and d correspond to the encoding and decoding parts of Deeplab v3+, respectively. From d(e(x'_i)) and d(e(x''_i)), the features corresponding to a plurality of matched local regions are obtained; for example, p'_j is selected from d(e(x'_i)) and p''_j from d(e(x''_i)), so that p'_j and p''_j are the feature maps of a matched local pair. Global average pooling is then applied to obtain local feature vectors, namely:
f_L(p'_j) = μ(p'_j)
(3) Local matching contrast loss
The local contrast loss updates the complete semantic segmentation network by making the feature representations of matched local regions similar while the feature expressions of non-matched local regions in the same batch are dissimilar. For N samples from the same batch, the local matching contrast loss is defined as follows:
l_L = (1 / 2N_L) · Σ_{j=1}^{N_L} [ l(p'_j, p''_j) + l(p''_j, p'_j) ]
l(p'_j, p''_j) = −log { exp( sim(u'_j, u''_j) / τ ) / [ exp( sim(u'_j, u''_j) / τ ) + Σ_{p ∈ Λ^−} exp( sim(u'_j, g_L(f_L(p))) / τ ) ] }
where N_L denotes the number of all local regions selected from the N samples of the same batch, Λ^− is the set of feature maps corresponding to all local regions other than the matched one, and g_L(·), like g(·), is a projection head.
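Under these definitions the local matching contrast loss has the same form as the global style contrast loss, only applied to the projected features of the N_L matched local regions drawn from the batch; a minimal sketch that simply reuses the function from the global case:

```python
def local_match_contrast_loss(u1, u2, tau=0.5):
    """u1, u2: (N_L, D) projected features of the matched local regions p'_j and p''_j.
    Row j of u1 and row j of u2 are a positive pair; every other local region in the
    same batch serves as a negative, mirroring the global style contrast loss."""
    return global_style_contrast_loss(u1, u2, tau)
```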
To illustrate the effectiveness of the method, experiments were carried out on four data sets, as shown in Table 1. The ISPRS Potsdam Dataset and the DeepGlobe Land Cover Classification Dataset are public data sets whose labeling quality is relatively high; the Hubei Dataset and the Xiangtan Dataset are real land cover classification data sets whose image resolution and classification systems are basically consistent, which is convenient for subsequently studying the influence of domain differences on self-supervision. The labeling quality of the Hubei data set is uneven and the image acquisition time does not match the labeling time, whereas the labels of the Xiangtan data set have been roughly corrected manually, so their quality is relatively high.
Table 1 data set description
[Table 1 not reproduced: provided as an image in the original document]
The ISPRS Potsdam data set consists of 38 high-resolution aerial remote sensing images collected over the city of Potsdam, Germany. The images are 6000 × 6000 pixels with a spatial resolution of 5 cm and four bands (NIR, R, G, B); the labels are manually annotated pixel-level labels of high quality, with 6 categories: impervious surfaces, buildings, low vegetation, trees, cars and others. To train and evaluate the network, 24 images were selected as the training set and the remaining 14 as the test set. The data set was slide-cropped into 256 × 256 patches, resulting in 13824 images for self-supervised training; 138 of them were randomly selected as labeled training data for the downstream task, and the downstream test set contains 8064 images.
The DeepGlobe Land Cover Classification Challenge provides high-resolution (sub-meter) satellite images of 2448 × 2448 pixels, mainly covering rural areas, with a total of 8 categories: towns, rural areas, agricultural areas, pastures, forests, water bodies, wastelands and unknown land (clouds and others). In the experiments, 730 images were selected as the training set and 73 as the test set. The data set was slide-cropped into 512 × 512 image blocks, finally yielding 18248 images for self-supervised training, with a default downstream training set of 182 images and a test set of 1825 images.
The Hubei data set comes from a real project. The image data are from the Gaofen-2 satellite with a resolution of 2 m, and the labels come from existing data that were manually merged into 10 classes: background, arable land, towns, rural areas, water, forest land, grassland, other structures, traffic facilities and others. The labeling time does not necessarily correspond to the image acquisition time, the quality of the labeled data is not uniform, and the class definitions used during merging are questionable, so the data quality is not high. The whole of Hubei Province was split into panels of 13889 × 9259 pixels; due to limited resources, only 34 of them were randomly selected for training and 5 for testing. The data set was slide-cropped into 256 × 256 image blocks, finally producing 66471 images for self-supervision, with a default downstream training set of 664 images and a test set of 9211 images.
The Xiangtan data set also comes from a real project. The image data are from the Gaofen-2 satellite with a resolution of 2 m, covering Xiangtan City in Hunan Province, China, and the labels were manually merged into 8 classes: background, arable land, towns, rural areas, water, forest land, grassland and traffic facilities. Because the labels of this region were roughly corrected manually against images of the same year during production, the quality of the Xiangtan data is higher than that of Hubei. The whole of Xiangtan City was divided into panels of 4096 × 4096 pixels, 85 of which were randomly selected for training and 21 for testing. The data set was slide-cropped into 256 × 256 image blocks, yielding 16051 images for self-supervision, with a default downstream training set of 160 images and a test set of 3815 images.
The methods used for comparison are as follows. Random baseline: no pre-trained model is loaded in the fine-tuning stage and the network is randomly initialized. ImageNet pre-training: the backbone of the fine-tuning stage model is initialized with a model pre-trained on ImageNet. Jigsaw, Inpainting, MoCo v2 and SimCLR are not described in detail here.
The merits of the proposed self-supervision method need to be evaluated on a specific downstream semantic segmentation task. Specifically, the overall accuracy on the test set of the downstream labeled data is measured with OA and Kappa, where OA denotes the overall accuracy over all pixels, defined as follows:
OA = TP / N
wherein TP represents the total number of pixels predicted correctly, and N represents the total number of pixels.
Although OA directly reflects the proportion of correctly classified pixels overall, when the samples are unbalanced some classes may still be classified very poorly even when OA is high. Kappa, as an index of consistency, reflects this situation well and is defined as follows:
Kappa = (p_o − p_e) / (1 − p_e)
wherein,
p_o = OA,  p_e = ( Σ_c a_c·b_c ) / N²
where a_c denotes the number of ground-truth pixels of class c, b_c denotes the number of pixels predicted as class c, and N is the total number of pixels.
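The two metrics can be computed from per-class pixel counts, for example as follows; the variable names tp, a and b (the correctly predicted pixels, the ground-truth counts a_c and the predicted counts b_c) are assumptions about how the confusion statistics are stored:

```python
import numpy as np

def overall_accuracy(tp: int, n: int) -> float:
    """OA = TP / N."""
    return tp / n

def kappa(tp: int, a: np.ndarray, b: np.ndarray) -> float:
    """Kappa from class-wise ground-truth counts a_c and predicted counts b_c."""
    n = a.sum()
    p_o = tp / n                    # observed agreement, equal to OA
    p_e = (a * b).sum() / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)
```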
During pre-training the model adopts Deeplab v3+; the baseline methods such as SimCLR are designed to train only the encoder part. Adam is used as the optimizer, the weight decay is set to 1e-5, the initial learning rate is 0.01, and a cosine decay strategy is used to train 400 epochs; the checkpoint with the lowest loss is selected for the downstream task. The batch_size is 64. The input image size is 256 × 256, and images are randomly cropped and resized to 224 × 224.
although our method can train the decoding part of the network at the same time during the self-supervised training, since the simCLR method for comparison is only designed for training the encoder, as in the following experimental results, as not specifically stated, only the encoder part of the pre-training model is loaded during the fine tuning, then the supervised semantic segmentation training is performed on a small number of labeled samples, Adam is used as an optimizer, the epoch number is 150, the batch _ size is 16, the initial learning rate of 0.001, and each epoch decays to 0.98.
The fine-tuning effect of the proposed method on a small number of labeled samples was explored on several data sets, with the amount of labeled samples used in fine-tuning set to 1% of the self-supervision data amount. The results are shown in Table 2. It can be seen that different self-supervision schemes have a great influence on the results, and the method of the invention achieves the best effect. In addition, the method mostly exceeds the results obtained by loading ImageNet pre-trained parameters. The ImageNet parameters were obtained through supervised training on millions of images, whereas the amount of self-supervision data in these experiments is mostly only about 20,000; this shows that although loading an ImageNet pre-trained model brings a large improvement, it is not the optimal choice because of the huge difference between natural images and remote sensing images, and it is more reasonable to train a strong model directly from unlabeled remote sensing images. Furthermore, it should be noted that in the experiments the images used for self-supervision are similar to the images of the downstream tasks, both coming from the same data set; such an arrangement is realistic in practice, since a large number of images of the same origin can easily be obtained by satellite technology.
Table 2 Comparison of methods on the four data sets
[Table 2 not reproduced: provided as an image in the original document]
Since the self-supervision task needs no labels and a large amount of rich image data is available, the experiments also explore whether increasing the amount of self-supervision data brings gains. The experiments were carried out on the Potsdam and Xiangtan data sets, randomly sampling 20%, 50% and 100% of the self-supervision data respectively. The results are shown in FIG. 4, where None indicates that no self-supervised pre-training parameters are loaded and the network parameters are directly randomly initialized. On both data sets the results show an overall rising trend as the amount of self-supervision data increases, and the improvement of our method is relatively more obvious than that of the SimCLR baseline; self-supervised training with an even larger data set is therefore expected to be even more meaningful.
TABLE 3 Effect of self-supervised training of different domain datasets on results
[Table 3 not reproduced: provided as an image in the original document]
In addition, we compared the performance of models pre-trained with data sets from different domains; the results are shown in Table 3. It can be seen that a pre-trained model performs better when the data set used for pre-training is similar to the data set of the downstream task. At the same time, our method surpasses supervised learning in most cases, except when the domain difference is extremely small (HuBei → XiangTan, XiangTan → HuBei), mainly because the two domains not only share the same image resolution but are also physically close, and above all their classification systems are completely consistent, so it is difficult at present to exceed the accuracy of supervised learning there. Although model performance improves further as the amount of self-supervised training data increases, the experiments also show that mixing in images dissimilar to the downstream data set may fail to improve, or may even harm, model performance; however, since the self-supervision task needs no labels, a large amount of image data similar to the target data set can be obtained.
The above description covers only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent replacement or change made by a person skilled in the art according to the technical solutions and inventive concepts of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. The remote sensing image semantic segmentation method based on the self-supervision contrast learning is characterized by comprising the following steps of:
step 1, constructing a Deeplab v3+ network model;
step 2, pre-training the encoder of the network model with unlabeled data;
step 3, after the pre-training is finished, performing supervised semantic segmentation training on the network model on a labeled sample;
step 4, performing semantic segmentation of remote sensing images with the network model obtained from the supervised semantic segmentation training;
in the pre-training process, contrastive learning is carried out by combining global style contrast with local matching contrast, comprising the following steps:
step 201, performing random data transformation on the unlabeled data: for a given sample x_i, apply two random data transformations t'(x_i) and t''(x_i), producing two related instances x'_i and x''_i that are taken as a positive sample pair, where t' denotes random cropping and scaling, and t'' denotes, in sequence, random cropping and scaling, random flipping, random rotation, random color distortion and random Gaussian blur;
step 202, using the encoder e(·) of the Deeplab v3+ network model, extracting global style features from the transformed sample instances: stylef'_i = stylef(x'_i) = cat(μ(e(x'_i)), σ(e(x'_i))), where stylef'_i is the global style feature, μ denotes taking the mean of each channel of the feature map, i.e. global average pooling, σ denotes taking the variance of each channel, and cat denotes channel concatenation;
step 203, processing the global style feature with a projection head, where the projection head g(·) is a multi-layer perceptron with one hidden layer:
z'_i = g(stylef'_i) = W^(2) r(W^(1) stylef'_i)
wherein W^(2) denotes the second fully connected layer, W^(1) denotes the first fully connected layer, r denotes the ReLU activation function, and z'_i denotes the global style feature processed by the projection head g(·);
step 204, using the encoder e(·) and decoder d(·) of the Deeplab v3+ network model, extracting the feature maps d(e(x'_i)) and d(e(x''_i)) from the transformed sample instances x'_i and x''_i, and obtaining from d(e(x'_i)) and d(e(x''_i)) the features corresponding to a plurality of matched local regions, where p'_j and p''_j are the feature maps corresponding to a matched local pair; the feature maps are then globally average pooled to obtain local feature vectors, namely:
f_L(p'_j) = μ(p'_j)
wherein f_L(p'_j) is a local feature vector;
step 205, processing the local feature vectors with a projection head g_L(·), where the projection head g_L(·) is a multi-layer perceptron with one hidden layer:
u'_j = g_L(f_L(p'_j)) = W^(4) r(W^(3) f_L(p'_j))
wherein W^(4) denotes the fourth fully connected layer, W^(3) denotes the third fully connected layer, and u'_j denotes the local matching feature processed by the projection head g_L(·);
step 206, training the encoder using an overall loss function, the overall loss function consisting of global style contrast loss and local matching contrast loss:
L = (1 - λ)·l_G + λ·l_L
wherein λ is an adjustable weight parameter, l_G denotes the global style contrast loss, and l_L denotes the local matching contrast loss.
2. The method for semantic segmentation of remote sensing images based on self-supervised contrast learning according to claim 1, wherein for N samples from the same batch, the global style contrast loss is defined as follows:
l_G = (1 / 2N) · Σ_{i=1}^{N} [ l(x'_i, x''_i) + l(x''_i, x'_i) ]
wherein:
l(x'_i, x''_i) = −log { exp( sim(z'_i, z''_i) / τ ) / [ exp( sim(z'_i, z''_i) / τ ) + Σ_{x ∈ Λ^−} exp( sim(z'_i, g(style(x))) / τ ) ] }
where sim () denotes computing the similarity between the feature vectors, Λ-2(N-1) negative samples except for the positive sample pair are represented, and tau represents a temperature parameter; style () represents extracting a global style feature vector from the encoder-extracted features by computing the mean and variance,
stylef(x'_i) = stylef'_i = cat(μ(e(x'_i)), σ(e(x'_i)))
for N samples from the same batch, the local match contrast loss is defined as follows:
l_L = (1 / 2N_L) · Σ_{j=1}^{N_L} [ l(p'_j, p''_j) + l(p''_j, p'_j) ]
wherein:
l(p'_j, p''_j) = −log { exp( sim(u'_j, u''_j) / τ ) / [ exp( sim(u'_j, u''_j) / τ ) + Σ_{p ∈ Λ^−} exp( sim(u'_j, g_L(f_L(p))) / τ ) ] }
where N_L denotes the number of all local regions selected from the N samples of the same batch, and Λ^− is the set of feature maps corresponding to all local regions other than the matched one.
3. The remote sensing image semantic segmentation method based on the self-supervision contrast learning according to claim 2, characterized in that in the process of calculating the local matching contrast loss, the local regions are selected and matched first, then the local features of the corresponding local regions are extracted, and finally the local matching contrast loss is calculated;
the selection and matching of the local regions comprises: for a given sample x_i, after the random data transformations t'(x_i) and t''(x_i), two transformed versions are generated, from which a plurality of local regions are randomly selected and matched; an index tag is introduced to record pixel positions, ensuring that the center positions of matched local regions correspond to each other in the original image; a local region is randomly selected from x'_i and the index value of its center position is obtained, then the position of the matched local region in x''_i is determined from this index value; excessive overlap between local regions is avoided by excluding each local region after it has been selected, ensuring that the centers of subsequently selected local regions do not fall inside already selected ones; this step is repeated multiple times to obtain a plurality of matched local regions;
the local feature extraction includes: using the encoder and decoder parts of the Deeplab v3+ network model, the feature maps d(e(x'_i)) and d(e(x''_i)) are extracted from the transformed sample instances x'_i and x''_i; following the idea used in the selection and matching of local regions, the features corresponding to a plurality of matched local regions are obtained from d(e(x'_i)) and d(e(x''_i)): let p'_j be selected from d(e(x'_i)) and p''_j from d(e(x''_i)), so that p'_j and p''_j are the feature maps of a matched local pair; the feature maps are then globally average pooled to obtain local feature vectors;
the local matching contrast loss updates the Deeplab v3+ network model by making the feature representations of matched local regions similar while the feature expressions of non-matched local regions in the same batch are dissimilar.
4. The remote sensing image semantic segmentation method based on self-supervised contrastive learning according to claim 1, characterized in that in the pre-training stage Adam is adopted as the optimizer, the weight decay is set to 1e-5, the initial learning rate is 0.01, and a cosine decay strategy is used, with the model of lowest loss selected for the downstream task; in the fine-tuning stage, Adam is used as the optimizer, the number of epochs is 150, the batch_size is 16, the initial learning rate is 0.001, and the learning rate decays by 0.98 per epoch.
CN202110285256.1A 2021-03-17 2021-03-17 Remote sensing image semantic segmentation method based on self-supervision contrast learning Active CN113011427B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110285256.1A CN113011427B (en) 2021-03-17 2021-03-17 Remote sensing image semantic segmentation method based on self-supervision contrast learning
AU2021103625A AU2021103625A4 (en) 2021-03-17 2021-06-25 Remote sensing image semantic segmentation method based on contrastive self-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110285256.1A CN113011427B (en) 2021-03-17 2021-03-17 Remote sensing image semantic segmentation method based on self-supervision contrast learning

Publications (2)

Publication Number Publication Date
CN113011427A CN113011427A (en) 2021-06-22
CN113011427B true CN113011427B (en) 2022-06-21

Family

ID=76409098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110285256.1A Active CN113011427B (en) 2021-03-17 2021-03-17 Remote sensing image semantic segmentation method based on self-supervision contrast learning

Country Status (2)

Country Link
CN (1) CN113011427B (en)
AU (1) AU2021103625A4 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554656B (en) * 2021-07-13 2022-02-11 中国科学院空间应用工程与技术中心 Optical remote sensing image example segmentation method and device based on graph neural network
CN113989582B (en) * 2021-08-26 2024-08-02 中国科学院信息工程研究所 Self-supervision visual model pre-training method based on dense semantic comparison
CN113947196A (en) * 2021-10-25 2022-01-18 中兴通讯股份有限公司 Network model training method and device and computer readable storage medium
CN114330312B (en) * 2021-11-03 2024-06-14 腾讯科技(深圳)有限公司 Title text processing method, title text processing device, title text processing program, and recording medium
CN114240966B (en) * 2021-12-13 2024-03-15 西北工业大学 Self-supervision learning method for 3D medical image segmentation training feature extractor
CN114240958B (en) * 2021-12-23 2024-04-05 西安交通大学 Contrast learning method applied to pathological tissue segmentation
CN114266952B (en) * 2021-12-24 2024-06-14 福州大学 Real-time semantic segmentation method based on deep supervision
CN114463549A (en) * 2021-12-29 2022-05-10 广州极飞科技股份有限公司 Training method of feature extraction network model, image processing method and device thereof
CN114399731B (en) * 2021-12-31 2022-12-20 中国科学院大学 Target positioning method under supervision of single coarse point
CN114861865B (en) * 2022-03-10 2023-07-21 长江三峡技术经济发展有限公司 Self-supervision learning method, system, medium and electronic equipment of hyperspectral image classification model
CN114881917A (en) * 2022-03-17 2022-08-09 深圳大学 Thrombolytic curative effect prediction method based on self-supervision and semantic segmentation and related device
CN114926835A (en) * 2022-05-20 2022-08-19 京东科技控股股份有限公司 Text generation method and device, and model training method and device
CN115019123B (en) * 2022-05-20 2023-04-18 中南大学 Self-distillation contrast learning method for remote sensing image scene classification
CN114970716A (en) * 2022-05-26 2022-08-30 支付宝(杭州)信息技术有限公司 Method and device for training representation model, readable storage medium and computing equipment
CN114972313B (en) * 2022-06-22 2024-04-19 北京航空航天大学 Image segmentation network pre-training method and device
CN115100390B (en) * 2022-08-24 2022-11-18 华东交通大学 Image emotion prediction method combining contrast learning and self-supervision region positioning
CN115131361A (en) * 2022-09-02 2022-09-30 北方健康医疗大数据科技有限公司 Training of target segmentation model, focus segmentation method and device
CN115909045B (en) * 2022-09-23 2024-04-30 中国自然资源航空物探遥感中心 Two-stage landslide map feature intelligent recognition method based on contrast learning
CN115661460B (en) * 2022-11-03 2023-07-14 广东工业大学 Medical image segmentation method of similarity perception frame with comparison mechanism
CN115797632B (en) * 2022-12-01 2024-02-09 北京科技大学 Image segmentation method based on multi-task learning
CN115690592B (en) * 2023-01-05 2023-04-25 阿里巴巴(中国)有限公司 Image processing method and model training method
CN116109823B (en) * 2023-01-13 2024-07-30 腾讯科技(深圳)有限公司 Data processing method, apparatus, electronic device, storage medium, and program product
CN115861663B (en) * 2023-03-01 2023-05-23 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Document image content comparison method based on self-supervision learning model
CN116431831B (en) * 2023-04-18 2023-09-22 延边大学 Supervised relation extraction method based on label contrast learning
CN116188918B (en) * 2023-04-27 2023-07-25 上海齐感电子信息科技有限公司 Image denoising method, training method of network model, device, medium and equipment
CN116935242B (en) * 2023-07-24 2024-08-06 哈尔滨工业大学 Remote sensing image semantic segmentation method and system based on space and semantic consistency contrast learning
CN117036756B (en) * 2023-08-08 2024-04-05 重庆市地理信息和遥感应用中心(重庆市测绘产品质量检验测试中心) Remote sensing image matching method and system based on variation automatic encoder

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680109A (en) * 2017-09-15 2018-02-09 盐城禅图智能科技有限公司 It is a kind of to quote inverse notice and the image, semantic dividing method of pixel similarity study
CN108549895A (en) * 2018-04-17 2018-09-18 深圳市唯特视科技有限公司 A kind of semi-supervised semantic segmentation method based on confrontation network
CN110148129A (en) * 2018-05-24 2019-08-20 深圳科亚医疗科技有限公司 Training method, dividing method, segmenting device and the medium of the segmentation learning network of 3D rendering
CN110175613A (en) * 2019-06-03 2019-08-27 常熟理工学院 Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN110443805A (en) * 2019-07-09 2019-11-12 浙江大学 A kind of semantic segmentation method spent closely based on pixel
CN110827505A (en) * 2019-10-29 2020-02-21 天津大学 Smoke segmentation method based on deep learning
CN111047565A (en) * 2019-11-29 2020-04-21 南京恩博科技有限公司 Method, storage medium and equipment for forest cloud image segmentation
CN111476781A (en) * 2020-04-08 2020-07-31 浙江大学 Concrete crack identification method and device based on video semantic segmentation technology
CN111860514A (en) * 2020-05-21 2020-10-30 江苏大学 Orchard scene multi-class real-time segmentation method based on improved deep Lab
CN112329680A (en) * 2020-11-13 2021-02-05 重庆邮电大学 Semi-supervised remote sensing image target detection and segmentation method based on class activation graph
CN112419333A (en) * 2020-11-17 2021-02-26 武汉大学 Remote sensing image self-adaptive feature selection segmentation method and system
CN112115951B (en) * 2020-11-19 2021-03-09 之江实验室 RGB-D image semantic segmentation method based on spatial relationship
CN112508977A (en) * 2020-12-29 2021-03-16 天津科技大学 Deep learning-based semantic segmentation method for automatic driving scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222690B (en) * 2019-04-29 2021-08-10 浙江大学 Unsupervised domain adaptive semantic segmentation method based on maximum quadratic loss
CN111080645B (en) * 2019-11-12 2023-08-15 中国矿业大学 Remote sensing image semi-supervised semantic segmentation method based on generation type countermeasure network

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680109A (en) * 2017-09-15 2018-02-09 盐城禅图智能科技有限公司 It is a kind of to quote inverse notice and the image, semantic dividing method of pixel similarity study
CN108549895A (en) * 2018-04-17 2018-09-18 深圳市唯特视科技有限公司 A kind of semi-supervised semantic segmentation method based on confrontation network
CN110148129A (en) * 2018-05-24 2019-08-20 深圳科亚医疗科技有限公司 Training method, dividing method, segmenting device and the medium of the segmentation learning network of 3D rendering
CN110175613A (en) * 2019-06-03 2019-08-27 常熟理工学院 Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN110443805A (en) * 2019-07-09 2019-11-12 浙江大学 A kind of semantic segmentation method spent closely based on pixel
CN110827505A (en) * 2019-10-29 2020-02-21 天津大学 Smoke segmentation method based on deep learning
CN111047565A (en) * 2019-11-29 2020-04-21 南京恩博科技有限公司 Method, storage medium and equipment for forest cloud image segmentation
CN111476781A (en) * 2020-04-08 2020-07-31 浙江大学 Concrete crack identification method and device based on video semantic segmentation technology
CN111860514A (en) * 2020-05-21 2020-10-30 江苏大学 Orchard scene multi-class real-time segmentation method based on improved deep Lab
CN112329680A (en) * 2020-11-13 2021-02-05 重庆邮电大学 Semi-supervised remote sensing image target detection and segmentation method based on class activation graph
CN112419333A (en) * 2020-11-17 2021-02-26 武汉大学 Remote sensing image self-adaptive feature selection segmentation method and system
CN112115951B (en) * 2020-11-19 2021-03-09 之江实验室 RGB-D image semantic segmentation method based on spatial relationship
CN112508977A (en) * 2020-12-29 2021-03-16 天津科技大学 Deep learning-based semantic segmentation method for automatic driving scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kaiming He et al., "Momentum Contrast for Unsupervised Visual Representation Learning", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020-12-31, pp. 9726-9735 *
Hongxing Peng et al., "Semantic Segmentation of Litchi Branches Using DeepLabV3+ Model", IEEE Access, vol. 8, 2020-09-04, pp. 164546-164555 *

Also Published As

Publication number Publication date
CN113011427A (en) 2021-06-22
AU2021103625A4 (en) 2021-08-19

Similar Documents

Publication Publication Date Title
CN113011427B (en) Remote sensing image semantic segmentation method based on self-supervision contrast learning
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN111047551B (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN111311563A (en) Image tampering detection method based on multi-domain feature fusion
CN109934154B (en) Remote sensing image change detection method and detection device
CN110599502B (en) Skin lesion segmentation method based on deep learning
CN113223042B (en) Intelligent acquisition method and equipment for remote sensing image deep learning sample
CN110675421B (en) Depth image collaborative segmentation method based on few labeling frames
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN112561876A (en) Image-based pond and reservoir water quality detection method and system
CN112508973A (en) MRI image segmentation method based on deep learning
CN111860233A (en) SAR image complex building extraction method and system based on attention network selection
CN113269224A (en) Scene image classification method, system and storage medium
CN110853053A (en) Salient object detection method taking multiple candidate objects as semantic knowledge
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
Wang et al. PACCDU: Pyramid attention cross-convolutional dual UNet for infrared and visible image fusion
CN113591614B (en) Remote sensing image road extraction method based on close-proximity spatial feature learning
Thati et al. A systematic extraction of glacial lakes for satellite imagery using deep learning based technique
Li et al. AAFormer: Attention-Attended Transformer for Semantic Segmentation of Remote Sensing Images
CN117315473A (en) Strawberry maturity detection method and system based on improved YOLOv8
CN117197462A (en) Lightweight foundation cloud segmentation method and system based on multi-scale feature fusion and alignment
CN115830322A (en) Building semantic segmentation label expansion method based on weak supervision network
CN116310628A (en) Token mask mechanism-based large-scale village-in-city extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant