CN117253044A - Farmland remote sensing image segmentation method based on semi-supervised interactive learning - Google Patents
- Publication number
- CN117253044A; CN202311334268.4A
- Authority
- CN
- China
- Prior art keywords
- image
- loss
- cnn
- supervised
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/188—Vegetation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
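The invention belongs to the technical field of agricultural image analysis and provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, a CNN and a Transformer cooperate through interactive learning, exchanging the local and global features of pixels through self-supervised training on unlabeled data, which reduces the need for labeled data and avoids the possible drawbacks of either method used alone. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features with the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.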
Description
Technical Field
The invention belongs to the technical field of agricultural image analysis, and particularly relates to a farmland remote sensing image segmentation method based on semi-supervised interactive learning.
Background
Farmland remote sensing image segmentation is an important task whose aim is to classify farmland remote sensing images at the pixel level, so as to improve the efficiency of agricultural land production and management.
Conventional deep-learning-based farmland remote sensing image segmentation methods generally need a large amount of labeled data for training, but labeled data are expensive to acquire, and this requirement is often difficult to meet in practical applications. Semi-supervised learning is therefore one of the effective ways to solve this problem.
Currently, semi-supervised learning methods train on a small amount of labeled data together with a large amount of unlabeled data to improve model performance. In addition, because such models have a large number of parameters, they overfit easily: the model performs well on the training set but poorly on the test set. The generalization capability of a farmland remote sensing image segmentation model is therefore an important issue when the model is applied to real scenes.
Existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main types: methods based on a convolutional neural network (CNN) and methods based on a Transformer. The former extract features in image space through convolution operations, and their drawback is as follows: a CNN processes images with local receptive fields and gradually reduces the image resolution from lower to higher layers through convolution and pooling operations; this limitation may cause loss of detail information and global context information in the image, particularly in fine-grained segmentation tasks over large-scale farmland areas. The latter model global relationships in sequence space through self-attention mechanisms, and their drawback is as follows: a Transformer aims to model the dependency between each pixel and all other pixels through global context information, and it is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features at different scales, which reduces the accuracy and robustness of the segmentation result.
Disclosure of Invention
The embodiments of the invention aim to provide a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate through interactive learning, exchanging the local and global features of pixels through self-supervised training on unlabeled data, which reduces the need for labeled data while effectively avoiding the possible drawbacks of the two existing methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features with the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.
In view of the above, the invention provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning, which comprises the following steps:
Step S10: divide the input images into M labeled images and N unlabeled images;
Step S20: train the CNN and the Transformer separately using the labeled image data;
Step S30: apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images, and randomly crop two new images with an overlapping region from the same image; meanwhile, project the pixels of the unlabeled images between the encoder and the decoder of the CNN and introduce a directional contrast loss function to ensure that features with the same identity remain consistent across images of different scenes, using the Transformer prediction as a pseudo-label to calculate the context-aware consistency loss; and use the CNN prediction as a pseudo-label for the Transformer prediction to calculate the consistency regularization loss;
Step S40: use the trained CNN model as the backbone network to segment the test-set images, and evaluate the accuracy of the results.
As a further limitation of the technical solution of the invention, the step of applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image comprises:
Step S31: apply Gaussian filtering to reduce noise and detail in the unlabeled image by taking a weighted average of the neighborhood around each pixel; for each pixel $(x, y)$, filter with a Gaussian kernel of size $k$, so that the filtered pixel value is:
$I'(x, y) = \sum_{i=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{j=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} w(i, j)\, I(x+i, y+j)$ (1)
where $I(x+i, y+j)$ is a pixel value in the neighborhood and $w(i, j)$ is the corresponding weight of the Gaussian kernel;
finally, adjust the brightness of the image;
Step S32: for a given weakly augmented unlabeled image, randomly select the size and position of a cropping window;
move the cropping window upwards and leftwards by a certain distance to obtain two new images with an overlapping region, which are used to train the model;
Step S33: scale all images to a uniform size using the bicubic interpolation algorithm, and scale the corresponding labels to the same size using the bilinear interpolation algorithm, so that the input images meet the input specification of the DeepLab v3+ network.
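The weak augmentation of Step S31 can be sketched in plain Python as follows. This is a minimal illustration rather than the patent's implementation: the function names, the standard deviation `sigma`, the edge-clamping border rule, the 0 to 255 value range, and the brightness factor are all assumptions introduced here.

```python
import math


def gaussian_kernel(k, sigma):
    """Return a k x k Gaussian kernel whose weights sum to 1 (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    r = k // 2
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-r, r + 1)]
           for i in range(-r, r + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def gaussian_filter(image, k=3, sigma=1.0):
    """Replace each pixel by the weighted average of its k x k neighborhood.

    `image` is a list of rows of grayscale values. Coordinates outside the
    image are clamped to the nearest edge pixel (an assumed border rule).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    kernel = gaussian_kernel(k, sigma)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    yy = min(max(y + i, 0), h - 1)
                    xx = min(max(x + j, 0), w - 1)
                    acc += kernel[i + r][j + r] * image[yy][xx]
            out[y][x] = acc
    return out


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clip to the assumed 0..255 range."""
    return [[min(255.0, max(0.0, v * factor)) for v in row] for row in image]


def weak_augment(image, k=3, sigma=1.0, brightness=1.1):
    """Gaussian filtering followed by a brightness adjustment."""
    return adjust_brightness(gaussian_filter(image, k, sigma), brightness)
```

Because the kernel weights are normalized to sum to 1, a uniform image is left unchanged by the filter, and only the brightness factor alters it.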
As a further limitation of the technical solution of the invention, the training process on the labeled image data comprises:
using the encoding matrices of the labeled images and their labels to train the two backbone network models, the CNN and the Transformer, separately, and calculating the loss function with respect to the real labels.
As a further limitation of the technical solution of the invention, calculating the loss function with respect to the real labels comprises:
inputting the labeled data into the CNN to obtain the predicted probability for each pixel, inputting them into the Transformer to obtain its predicted probability for each pixel, and calculating the loss function between each prediction and the corresponding real value;
the loss function between the predictions and the corresponding real values is calculated as follows:
compared against the real labels, the loss function of the CNN part is given by formula (2):
(2)
the loss function of the Transformer part is given by formula (3):
(3)
where the activation function is ReLU and $\mathrm{FL}$ denotes the Focal Loss, whose expression is given by formula (4):
$\mathrm{FL}(p_t) = -\alpha\,(1 - p_t)^{\gamma}\,\log(p_t)$ (4)
where $p_t$ is the predicted probability of the true class, and $\alpha$ and $\gamma$ are hyperparameters, set here to $\alpha = 0.25$ and $\gamma = 2$;
The total loss calculation of the supervised learning model is shown in the formula (5):
(5)
where the label indicator equals 1 for the real label and 0 otherwise, and the predicted probability is a real number between 0 and 1 indicating the probability that the image belongs to the category noted in the label.
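As a minimal sketch of the Focal Loss with the stated settings of 0.25 and 2 for its two hyperparameters, the Python below evaluates the widely used binary form of the loss for a single prediction and averages it over a batch. The function names, the use of one `alpha` for both classes, and the clamping constant `eps` are assumptions for illustration, not the patent's implementation.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class, in [0, 1].
    y: label indicator, 1 for the positive class and 0 otherwise.
    """
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Clamp to avoid log(0); eps is an assumed numerical safeguard.
    p_t = min(max(p_t, eps), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over paired probabilities and labels."""
    losses = [focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

The factor raised to the power `gamma` shrinks the contribution of confidently correct pixels, so training concentrates on hard pixels, which is the usual motivation for choosing this loss on imbalanced segmentation data.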
As a further limitation of the technical solution of the invention, the training process on the unlabeled image data comprises:
inputting two groups of randomly cropped, weakly augmented unlabeled images, obtaining a prediction through the CNN network framework, and using it as the pseudo-label for the Transformer model's prediction to calculate the consistency regularization loss;
obtaining a prediction through the Transformer network framework and using it as the pseudo-label for the intermediate projection to calculate the context-aware consistency loss, which ensures that local image information and global context information are exchanged during training, so that the model fully learns consistency regularization.
As a further limitation of the technical solution of the invention, during training on the unlabeled images, the two groups of weakly augmented unlabeled input images $x_{u1}$ and $x_{u2}$ pass through the CNN to generate two groups of predictions $p^{c}_{1}$ and $p^{c}_{2}$; similarly, the Transformer model framework generates two groups of predictions $p^{t}_{1}$ and $p^{t}_{2}$:
$p^{c}_{i} = f_{c}(x_{ui}), \quad p^{t}_{i} = f_{t}(x_{ui}), \quad i \in \{1, 2\}$ (6)
where $f_{c}$ denotes the CNN network model and $f_{t}$ denotes the Transformer network model;
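the pseudo-labels $\hat{y}^{c}_{1}$, $\hat{y}^{c}_{2}$ (from the CNN predictions $p^{c}_{1}$, $p^{c}_{2}$) and $\hat{y}^{t}_{1}$, $\hat{y}^{t}_{2}$ (from the Transformer predictions $p^{t}_{1}$, $p^{t}_{2}$) are calculated as shown in formula (7):
$\hat{y}^{c}_{i} = \arg\max(p^{c}_{i}), \quad \hat{y}^{t}_{i} = \arg\max(p^{t}_{i}), \quad i \in \{1, 2\}$ (7)
where $\arg\max$ returns the label for which the predicted probability is largest;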
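The pseudo-label rule, taking for each pixel the label whose predicted probability is largest, can be illustrated with a short Python sketch. The function name and the nested-list layout `prob_map[row][col][class]` are assumptions chosen for readability.

```python
def pseudo_labels(prob_map):
    """Return, for every pixel, the index of the class with the highest probability.

    prob_map[row][col] is a list of per-class probabilities for that pixel.
    Ties are resolved in favor of the lowest class index.
    """
    return [[max(range(len(p)), key=p.__getitem__) for p in row]
            for row in prob_map]


# Example: a 1 x 2 image with two classes (background = 0, farmland = 1).
example = [[[0.2, 0.8], [0.9, 0.1]]]
labels = pseudo_labels(example)
```

In the interactive scheme described in the text, such hard labels from one network serve as training targets for the other.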
the CNN prediction is used as the pseudo-label for the Transformer, and the consistency regularization loss is calculated as shown in formula (8):
(8)
where the two operators in formula (8) are the ReLU activation function and the Dice loss function;
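To make the role of the Dice loss concrete, the sketch below computes a soft Dice loss between the Transformer's foreground probabilities and a hard pseudo-label mask derived from the CNN's probabilities. It is a simplified binary (foreground versus background) illustration under stated assumptions: the function names, the 0.5 cut-off used to form the pseudo-label, the smoothing term `eps`, and the omission of the ReLU step are choices made here, and the exact form of formula (8) is not reproduced.

```python
def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary mask.

    probs and target are flat lists of equal length; eps is an assumed
    smoothing term that keeps the ratio defined for empty masks.
    """
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def consistency_loss(transformer_probs, cnn_probs, cutoff=0.5):
    """Dice loss of the Transformer prediction against the CNN pseudo-label."""
    # Hard pseudo-label from the CNN (assumed binary cut-off).
    pseudo = [1 if p >= cutoff else 0 for p in cnn_probs]
    return dice_loss(transformer_probs, pseudo)
```

The loss is close to 0 when the two networks agree on the foreground region and approaches 1 when their predictions do not overlap.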
the two cropped images are passed through the encoder of DeepLab v3+ to obtain two feature maps, which a non-linear projector then maps to two projected features; the directional contrast loss function encourages the features of the overlapping region, seen under different backgrounds, to align toward the higher-confidence one, so that they finally remain consistent;
for each unlabeled image, the directional contrast loss is calculated as follows:
(9)
(10)
(11)
where N denotes the number of spatial positions in the overlapping feature region, and the remaining symbols denote, in order, the function that computes feature similarity, the two-dimensional spatial position, the set of negative image samples, a negative sample of the projected feature, and the classifier;
using the Transformer prediction as the pseudo-label, the consistency loss function under the context-aware constraint is calculated as follows:
(12)
where the formula uses the Dice loss function and the classifier.
As a further limitation of the technical solution of the invention, the overall loss is minimized: during training, the parameters of the classification network are first initialized; forward propagation and back propagation are then performed on the training data to calculate the gradients of the loss function, and the network parameters are updated with an optimization algorithm such as gradient descent until the overall loss function meets a preset convergence condition;
the overall loss is composed of the total loss of the supervised learning model, the consistency regularization loss, the directional contrast loss, and the context-aware consistency loss, and is calculated as follows:
(13)
where the two weight factors control the proportions of the directional contrast loss and the consistency regularization loss in the overall loss function.
As a further limitation of the technical solution of the invention, step S40 specifically comprises model testing: for a given test-set farmland remote sensing image, the CNN is used as the backbone network model to extract features and perform segmentation, the probability of the category to which each pixel of the target image belongs is output, and a threshold is set to mark each pixel as segmentation target or background.
As a further limitation of the invention, the step of setting a threshold to mark each pixel as segmentation target or background comprises: obtaining the maximum gray value $Z_{max}$ and the minimum gray value $Z_{min}$ of the image output by the segmentation model, and setting the initial threshold to $T_0 = (Z_{max} + Z_{min})/2$; dividing the image into foreground and background according to the threshold $T_k$, and computing their average gray values $Z_O$ and $Z_B$; computing the new threshold $T_{k+1} = (Z_O + Z_B)/2$; and iterating until $T_{k+1} = T_k$, at which point the value obtained is the threshold. Pixels whose predicted probability is greater than the threshold are marked as foreground and those below it as background, yielding the final segmentation mask.
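The iterative threshold selection described above can be sketched in Python as follows. The function names, the convergence tolerance `tol`, the iteration cap, and the handling of an empty foreground or background are assumptions added so the example runs on any input; the midpoint initialization and the averaging of the two class means follow the classic iterative threshold method that the text describes.

```python
def iterative_threshold(prob_map, tol=1e-6, max_iter=100):
    """Find a threshold by repeatedly averaging the foreground and background means."""
    values = [v for row in prob_map for v in row]
    # Initial threshold: midpoint between the largest and smallest value.
    t = (max(values) + min(values)) / 2.0
    for _ in range(max_iter):
        foreground = [v for v in values if v > t]
        background = [v for v in values if v <= t]
        if not foreground or not background:
            break  # degenerate split (e.g. uniform image): keep current threshold
        new_t = (sum(foreground) / len(foreground)
                 + sum(background) / len(background)) / 2.0
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


def segmentation_mask(prob_map, threshold):
    """Mark pixels above the threshold as foreground (1), others as background (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in prob_map]
```

For a probability map whose values cluster near 0 and near 1, the procedure typically converges within a few iterations to a value between the two clusters.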
Compared with the prior art, the farmland remote sensing image segmentation method based on semi-supervised interactive learning has the following beneficial effects:
First, a direction-aware consistency constraint module is inserted into the CNN and Transformer interactive learning network. The CNN excels at image processing: it effectively extracts spatial features from farmland remote sensing images and captures local and global features through convolution and pooling operations. The Transformer excels in natural language processing and handles sequence data and long-range dependencies well. Combining the CNN's strength in spatial feature extraction with the Transformer's strength in long-range dependency modeling allows the features in farmland remote sensing images to be extracted and modeled better, improving segmentation accuracy;
Second, the Transformer uses a self-attention mechanism within the encoder-decoder framework and can effectively capture context information in the image, which is very important for accurate farmland remote sensing image segmentation; because crops and backgrounds in such images often exhibit wide spatial correlation, the Transformer can better model the relationships between pixels, improving segmentation accuracy. In addition, farmland remote sensing images usually have high resolution, so detail information must be recovered accurately during segmentation, and a conventional CNN decoder has high computation and memory requirements on high-resolution images; by introducing a Transformer layer, the invention can gradually restore the image resolution between the encoder and the decoder while reducing computation and memory consumption, and thus processes high-resolution images more effectively.
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below; obviously, the drawings described below show only some embodiments of the invention.
FIG. 1 is a system architecture diagram of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 2 is a flow chart of an implementation of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 3 is a sub-flow chart of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 4 is a block diagram of a farmland remote sensing image segmentation system provided by the invention;
FIG. 5 is a block diagram of a computer device provided by the invention.
Detailed Description
The present application is further described below with reference to the drawings and the detailed description. It should be understood that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that, in the embodiments of the invention, the expressions "first" and "second" are used to distinguish two different entities or parameters that share the same name; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the invention. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, or article that comprises a list of steps or units may also include other steps or units.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative: they need not include all of the elements and operations/steps, nor must the operations/steps be performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so the actual order of execution may change according to the actual situation.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
At present, existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main categories: methods based on a convolutional neural network (CNN) and methods based on a Transformer. The former extract features in image space through convolution operations, and their drawback is as follows: a CNN processes images with local receptive fields and gradually reduces the image resolution from lower to higher layers through convolution and pooling operations; this limitation may cause loss of detail information and global context information in the image, particularly in fine-grained segmentation tasks over large-scale farmland areas. The latter model global relationships in sequence space through self-attention mechanisms, and their drawback is as follows: a Transformer aims to model the dependency between each pixel and all other pixels through global context information, and it is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features at different scales, which reduces the accuracy and robustness of the segmentation result.
To solve these problems, the invention provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate through interactive learning, exchanging the local and global features of pixels through self-supervised training on unlabeled data, which reduces the need for labeled data while effectively avoiding the possible drawbacks of the two existing methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features with the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
Example 1
FIG. 1 illustrates an exemplary system architecture for implementing a semi-supervised interactive learning based farmland remote sensing image segmentation method.
FIG. 2 shows the implementation flow of the farmland remote sensing image segmentation method based on semi-supervised interactive learning.
As shown in FIG. 1 and FIG. 2, in an embodiment of the invention, a farmland remote sensing image segmentation method based on semi-supervised interactive learning includes the following steps:
Step S10: divide the input images into M labeled images and N unlabeled images;
Step S20: train the CNN and the Transformer separately using the labeled image data;
Step S30: apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images, and randomly crop two new images with an overlapping region from the same image; that is, each of the N unlabeled images is randomly cropped to obtain two groups of new images with overlapping regions, and all images are scaled to a uniform size; meanwhile, project the pixels of the unlabeled images between the encoder and the decoder of the CNN and introduce a directional contrast loss function to ensure that features with the same identity remain consistent across images of different scenes, using the Transformer prediction as a pseudo-label to calculate the context-aware consistency loss; and use the CNN prediction as a pseudo-label for the Transformer prediction to calculate the consistency regularization loss.
Step S40: use the trained CNN model as the backbone network to segment the test-set images, and evaluate the accuracy of the results.
Further, as shown in FIG. 3, in step S30, the step of applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image comprises:
Step S31: apply Gaussian filtering to reduce noise and detail in the unlabeled image by taking a weighted average of the neighborhood around each pixel; for each pixel $(x, y)$, filter with a Gaussian kernel of size $k$, so that the filtered pixel value is:
$I'(x, y) = \sum_{i=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{j=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} w(i, j)\, I(x+i, y+j)$ (1)
where $I(x+i, y+j)$ is a pixel value in the neighborhood and $w(i, j)$ is the corresponding weight of the Gaussian kernel;
finally, adjust the brightness of the image;
Step S32: for a given weakly augmented unlabeled image, randomly select the size and position of a cropping window;
move the cropping window upwards and leftwards by a certain distance to obtain two new images with an overlapping region, which are used to train the model;
Step S33: scale all images to 513 px × 513 px using the bicubic interpolation algorithm, and scale the corresponding labels to the same size, so that the input images meet the input specification of the DeepLab v3+ network.
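The overlapping-crop procedure of Step S32 can be sketched in Python as follows. The function names, the minimum window size, and the rule that the shift is smaller than the window (which guarantees a non-empty overlap) are assumptions introduced for this illustration.

```python
import random


def overlapping_crops(image, min_size=4, rng=None):
    """Return two equally sized crops of `image` that share an overlapping region.

    A window of random size and position is chosen, then shifted up and to the
    left by a random offset smaller than the window itself.
    Returns (crop1, crop2, box1, box2) with boxes as (top, left, height, width).
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    if min_size < 2 or h <= min_size or w <= min_size:
        raise ValueError("image too small for the requested window size")
    ch = rng.randint(min_size, h - 1)        # window height
    cw = rng.randint(min_size, w - 1)        # window width
    top = rng.randint(1, h - ch)             # leave room to shift upwards
    left = rng.randint(1, w - cw)            # leave room to shift leftwards
    dy = rng.randint(1, min(top, ch - 1))    # upward shift, smaller than window
    dx = rng.randint(1, min(left, cw - 1))   # leftward shift, smaller than window
    box1 = (top, left, ch, cw)
    box2 = (top - dy, left - dx, ch, cw)

    def crop(box):
        t, l, bh, bw = box
        return [row[l:l + bw] for row in image[t:t + bh]]

    return crop(box1), crop(box2), box1, box2


def overlap_box(box1, box2):
    """Intersection of two boxes in image coordinates, as (top, left, height, width)."""
    top = max(box1[0], box2[0])
    left = max(box1[1], box2[1])
    bottom = min(box1[0] + box1[2], box2[0] + box2[2])
    right = min(box1[1] + box1[3], box2[1] + box2[3])
    return top, left, bottom - top, right - left
```

The overlap box identifies which pixels of the two crops show the same ground location under different surrounding context, which is the region the directional contrast loss operates on.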
Further, in an embodiment of the invention, the training process on the labeled image data comprises:
using the encoding matrices of the labeled images and their labels to train the two backbone network models, the CNN and the Transformer, separately, and calculating the loss function with respect to the real labels.
As a further limitation of the technical solution of the invention, calculating the loss function with respect to the real labels comprises:
inputting the labeled data into the CNN to obtain the predicted probability for each pixel, inputting them into the Transformer to obtain its predicted probability for each pixel, and calculating the loss function between each prediction and the corresponding real value;
the loss function between the predictions and the corresponding real values is calculated as follows:
compared against the real labels, the loss function of the CNN part is given by formula (2):
(2)
loss function of a transducer sectionAs shown in formula (3):
(3)
where the activation function is ReLU and the loss is the Focal Loss, whose expression is given by formula (4):
$$\mathrm{FL}(p_t) = -\alpha\,(1 - p_t)^{\gamma}\,\log(p_t) \qquad (4)$$
where $p_t$ is the predicted probability assigned to the ground-truth class, and $\alpha$ and $\gamma$ are hyperparameters, here set to $\alpha = 0.25$ and $\gamma = 2$;
The total loss of the supervised learning model is computed as shown in formula (5):
(5)
where $y = 1$ when the class is the ground-truth label and $y = 0$ otherwise, and $p$ is a real number between 0 and 1 giving the predicted probability that the pixel belongs to the class annotated in the label.
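As an illustration of the Focal Loss used in the supervised branch, the following minimal sketch evaluates the standard binary form with the hyperparameters stated above (0.25 and 2). The function names, the clipping constant `eps`, the use of a single weight `alpha` for both classes, and the averaging over pixels are assumptions of this sketch rather than details confirmed by the text.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one pixel.

    p: predicted probability of the positive class, in [0, 1].
    y: ground-truth label, 1 or 0.
    """
    p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability given to the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average the per-pixel focal loss over flat lists of pixels."""
    losses = [focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of pixels that are already classified confidently, so training concentrates on hard pixels such as field boundaries.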
Further, in an embodiment of the present invention, the training process on the unlabeled image data comprises:
inputting the two groups of randomly cropped, weakly augmented unlabeled images, taking the prediction obtained through the CNN network framework as the pseudo-label for the prediction of the Transformer model, and computing the consistency regularization loss;
taking the prediction obtained through the Transformer network framework as the pseudo-label for the intermediate projection and computing the context-aware consistency loss, which ensures that local image information and global context information are exchanged during training, so that the model fully learns consistency regularization.
Further, in an embodiment of the present invention, during training on the unlabeled images, the two backbone network models focus on learning local features and global features respectively, and exchange information to transfer feature knowledge so that each compensates for the weaknesses of the other; in addition, to give the CNN module better robustness and generalization when only a small amount of data is available, a direction-aware consistency constraint is introduced; specifically:
for the two groups of weakly augmented unlabeled input images, two groups of predictions are generated through the CNN model framework; similarly, two groups of predictions are generated through the Transformer model framework:
(6)
where the two mapping functions in formula (6) denote the CNN network model and the Transformer network model, respectively;
the CNN is good at capturing local features and spatial correlations in image processing, extracting the local structure of an image through convolution over local receptive fields;
the Transformer is better suited to modeling global dependencies and long-range relationships, establishing global information interaction over the entire input sequence through the self-attention mechanism;
consequently, these predictions have inherently different properties at the output level; the four pseudo-labels, one for each prediction, are computed as shown in formula (7):
(7)
where $\arg\max$ returns the label for which the predicted probability is largest;
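The pseudo-label rule described above (take the label with the largest predicted probability at each pixel) can be sketched as follows. The nested-list layout `probs[row][col][class]` and the function name are assumptions made for illustration.

```python
def pseudo_labels(probs):
    """Convert per-pixel class probabilities into a hard label map.

    probs[row][col] is a list of class probabilities for that pixel;
    the result holds, for each pixel, the index of the largest one.
    Ties are resolved in favor of the lowest class index.
    """
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in probs]


# Two pixels, two classes (0 = background, 1 = farmland, as an example).
example = [[[0.2, 0.8], [0.9, 0.1]]]
labels = pseudo_labels(example)
```

The same function would be applied to each of the four prediction maps (two crops for each of the two networks).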
the CNN predictions are used as pseudo-labels for the Transformer, and the consistency regularization loss is computed as shown in formula (8):
(8)
where the activation function is ReLU and the loss is the Dice loss function;
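The cross-supervision described above, where the CNN output serves as a pseudo-label for the Transformer output under a Dice loss, can be sketched for the binary (farmland versus background) case. The soft Dice form, the smoothing constant `eps`, and the 0.5 cut-off (equivalent to the arg-max for two classes) are assumptions of this sketch; the ReLU mentioned in the text is omitted because the inputs here are probabilities and therefore already non-negative.

```python
def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss between foreground probabilities and a binary map.

    probs and target are flat lists of equal length.
    """
    inter = sum(p * t for p, t in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def consistency_loss(cnn_fg_probs, transformer_fg_probs):
    """Dice loss of the Transformer prediction against CNN pseudo-labels."""
    # Hard pseudo-label from the CNN branch (arg-max for two classes).
    pseudo = [1.0 if p >= 0.5 else 0.0 for p in cnn_fg_probs]
    return dice_loss(transformer_fg_probs, pseudo)
```

The loss approaches 0 when the Transformer agrees with the CNN pseudo-labels and approaches 1 when the two predictions do not overlap at all.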
the two cropped images are passed through the encoder of DeepLab v3+ to obtain two feature maps, which a nonlinear projector then projects into two projected feature maps; the directional contrastive loss function encourages the features of the overlapping region, seen under different contexts, to align toward the higher-confidence counterpart, so that they ultimately remain consistent;
for a given unlabeled image, the directional contrastive loss is computed as follows:
(9)
(10)
(11)
where $N$ is the number of spatial positions in the overlapping feature region; the similarity function measures the similarity between features; $h, w$ index the two-dimensional spatial position; the negative set contains the negative samples of the image, each element of which is a negative sample of the feature being compared; and the classifier provides the prediction used to judge confidence;
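Since formulas (9) to (11) are not reproduced in the text, the following sketch only illustrates one plausible reading of the description: an InfoNCE-style contrastive term over the N overlapping positions, an exponential cosine similarity, and a mask that applies the loss only where the other view is more confident. The temperature `tau`, the exact similarity function, and all function names are assumptions and should not be taken as the claimed formulas.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def similarity(a, b, tau=0.1):
    """Exponential of temperature-scaled cosine similarity (assumed form)."""
    return math.exp(cosine(a, b) / tau)


def directional_contrast(phi1, phi2, conf1, conf2, negatives, tau=0.1):
    """One direction of the loss: align phi1 toward phi2.

    phi1, phi2: projected features at the N overlapping positions.
    conf1, conf2: classifier confidence (max probability) per position.
    negatives: feature vectors used as negative samples.
    A position contributes only where view 2 is more confident than view 1.
    """
    n = max(len(phi1), 1)
    total = 0.0
    for a, b, c1, c2 in zip(phi1, phi2, conf1, conf2):
        if c1 < c2:  # directional mask
            pos = similarity(a, b, tau)
            neg = sum(similarity(a, q, tau) for q in negatives)
            total += -math.log(pos / (pos + neg))
    return total / n


def symmetric_directional_contrast(phi1, phi2, conf1, conf2, negatives, tau=0.1):
    """Sum of both directions, so each position is pulled toward the
    more confident of the two views."""
    return (directional_contrast(phi1, phi2, conf1, conf2, negatives, tau)
            + directional_contrast(phi2, phi1, conf2, conf1, negatives, tau))
```

The directional mask is the design point worth noting: a low-confidence feature is moved toward its high-confidence counterpart, but not the reverse, which avoids degrading the better of the two representations.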
using the Transformer predictions as pseudo-labels, the consistency loss function under the context-aware constraint is computed as follows:
(12)
where the loss is the Dice loss function and the predictions compared against the pseudo-labels are produced by the classifier.
Further, in an embodiment of the present invention, the overall loss is minimized: during training, the parameters of the classification network are first initialized; forward propagation and backpropagation are then performed on the training data, the gradients of the loss function are computed, and the network parameters are updated with an optimization algorithm such as gradient descent until the overall loss function meets a preset convergence condition;
by iteratively updating the network parameters, the classification network is expected to learn a suitable feature representation that minimizes the difference between the predictions and the ground-truth labels;
the overall loss comprises the total loss of the supervised learning model, the consistency regularization loss, the directional contrastive loss, and the context-aware consistency loss, and is computed as:
(13)
where the two weight factors control the proportions of the directional contrastive loss and the consistency regularization loss in the overall loss function.
As a further limitation of the technical solution of the present invention, step S40 specifically comprises model testing: for a given test set farmland remote sensing image, the CNN is used as the backbone network model to extract features, perform segmentation, and output the class probability of each pixel of the target image, and a threshold is set to mark each pixel as segmentation target or background.
As a further limitation of the present invention, marking the pixels as segmentation target or background with the set threshold comprises: obtaining, from the output of the segmentation model, the maximum gray value $Z_{\max}$ and the minimum gray value $Z_{\min}$ of the image, and setting the initial threshold to $T_0 = (Z_{\max} + Z_{\min})/2$; dividing the image into foreground and background according to the current threshold $T_k$ and computing their average gray values $Z_O$ and $Z_B$; computing the new threshold $T_{k+1} = (Z_O + Z_B)/2$; and iterating until $T_{k+1} = T_k$, at which point the value obtained is the threshold. Pixels whose predicted probability is greater than the threshold are marked as foreground and those below it as background, yielding the final segmentation mask.
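The iterative threshold selection described above can be sketched in a few lines. The function names, the floating-point tolerance `tol` (used in place of exact equality of successive thresholds), and the iteration cap `max_iter` are assumptions added so that the sketch terminates reliably on real-valued probabilities.

```python
def iterative_threshold(values, tol=1e-6, max_iter=100):
    """Iteratively select a threshold for a flat list of pixel values.

    Start from the midpoint of the minimum and maximum value, split the
    pixels into foreground (> t) and background (<= t), and move the
    threshold to the midpoint of the two class means until it stops changing.
    """
    t = (min(values) + max(values)) / 2.0
    for _ in range(max_iter):
        fg = [v for v in values if v > t]
        bg = [v for v in values if v <= t]
        if not fg or not bg:          # one class is empty; keep current t
            break
        t_new = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2.0
        converged = abs(t_new - t) <= tol
        t = t_new
        if converged:
            break
    return t


def to_mask(values, t):
    """Mark values above the threshold as foreground (1), others as background (0)."""
    return [1 if v > t else 0 for v in values]


probs = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
threshold = iterative_threshold(probs)
mask = to_mask(probs, threshold)
```

For this example the low and high groups are symmetric around 0.5, so the threshold settles at approximately 0.5 after the first update and the three high-probability pixels are marked as foreground.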
In summary, the invention inserts a direction-aware consistency constraint module into the interactive learning network of the CNN and the Transformer. The CNN performs well in image processing: it effectively extracts spatial features from farmland remote sensing images and captures local and global features through convolution and pooling operations. The Transformer performs well in natural language processing and is good at handling sequence data and long-range dependencies. Combining the strength of the CNN in spatial feature extraction with the strength of the Transformer in long-range dependency modeling allows the features in farmland remote sensing images to be extracted and modeled more effectively, improving segmentation accuracy.
In addition, the Transformer uses a self-attention mechanism within the encoder-decoder framework and can effectively capture context information in the image, which is very important for accurate farmland remote sensing image segmentation; because crops and background in such images often exhibit wide-ranging spatial correlation, the Transformer can better model the relationships between pixels in the image, thereby improving segmentation accuracy;
on the other hand, farmland remote sensing images usually have higher resolution, detail information needs to be accurately recovered in image segmentation, and the calculation and memory requirements of the traditional CNN decoder on the high-resolution images are higher; the invention can gradually restore the image resolution between the encoder and the decoder by introducing the Transformer layer, and reduce the consumption of calculation and memory resources, thereby more effectively processing high-resolution images.
Example 2
As shown in fig. 4, in an exemplary embodiment provided by the present disclosure, the present invention further provides a farmland remote sensing image segmentation system, the farmland remote sensing image segmentation system 50 includes:
a preprocessing module 51, configured to divide the input images into M labeled images and N unlabeled images;
a first training module 52, configured to train the CNN and the Transformer separately using the labeled image data;
a second training module 53, configured to apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly crop two new images with an overlapping region from the same image, that is, to randomly crop each of the N unlabeled images to obtain two groups of new images with overlapping regions, and to unify the sizes of all images; meanwhile, the pixels of the unlabeled images are projected between the encoder and the decoder of the CNN, and a directional contrastive loss function is introduced to ensure that features of the same identity in an image remain consistent under different scenes; the Transformer predictions are used as pseudo-labels to compute the context-aware consistency loss; and the CNN predictions are used as pseudo-labels for the Transformer predictions to compute the consistency regularization loss;
a model testing module 54, configured to segment the test set images using the trained CNN model as the backbone network and to evaluate the accuracy of the results.
Example 3
As shown in fig. 5, in an embodiment of the present invention, the present invention further provides a computer device.
The computer device 60 comprises a memory 61, a processor 62, and computer-readable instructions stored in the memory 61 and executable on the processor 62; when executing the computer-readable instructions, the processor 62 implements the farmland remote sensing image segmentation method based on semi-supervised interactive learning provided in Embodiment 1.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning comprises the following steps:
Step S10: Divide the input images into M labeled images and N unlabeled images;
Step S20: Train the CNN and the Transformer separately using the labeled image data;
Step S30: Apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly crop two new images with an overlapping region from the same image, that is, randomly crop each of the N unlabeled images to obtain two groups of new images with overlapping regions, and unify the sizes of all images; meanwhile, project the pixels of the unlabeled images between the encoder and the decoder of the CNN and introduce a directional contrastive loss function to ensure that features of the same identity in an image remain consistent under different scenes; use the Transformer predictions as pseudo-labels to compute the context-aware consistency loss; and use the CNN predictions as pseudo-labels for the Transformer predictions to compute the consistency regularization loss;
Step S40: Segment the test set images using the trained CNN model as the backbone network, and evaluate the accuracy of the results.
In addition, the device 60 according to the embodiment of the present invention may further have a communication interface 63 for receiving a control command.
Example 4
In an exemplary embodiment provided by the present disclosure, a computer-readable storage medium is also provided.
Specifically, in an exemplary embodiment of the present disclosure, the storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the farmland remote sensing image segmentation method based on semi-supervised interactive learning provided in Embodiment 1.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning comprises the following steps:
Step S10: Divide the input images into M labeled images and N unlabeled images;
Step S20: Train the CNN and the Transformer separately using the labeled image data;
Step S30: Apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly crop two new images with an overlapping region from the same image, that is, randomly crop each of the N unlabeled images to obtain two groups of new images with overlapping regions, and unify the sizes of all images; meanwhile, project the pixels of the unlabeled images between the encoder and the decoder of the CNN and introduce a directional contrastive loss function to ensure that features of the same identity in an image remain consistent under different scenes; use the Transformer predictions as pseudo-labels to compute the context-aware consistency loss; and use the CNN predictions as pseudo-labels for the Transformer predictions to compute the consistency regularization loss;
Step S40: Segment the test set images using the trained CNN model as the backbone network, and evaluate the accuracy of the results.
In the various embodiments of the present invention, it should be understood that the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present invention.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular may be a processor in a computer device) to execute some or all of the steps of the methods according to the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that some or all of the steps of the various methods of the described embodiments may be implemented by a program instructing associated hardware, and that the program may be stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other medium that can be used to carry or store data.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning disclosed in the embodiments of the present invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present invention.
Claims (9)
1. A farmland remote sensing image segmentation method based on semi-supervised interactive learning, characterized by comprising the following steps:
Step S10: dividing the input images into M labeled images and N unlabeled images;
Step S20: training the CNN and the Transformer separately using the labeled image data;
Step S30: applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image; meanwhile, projecting the pixels of the unlabeled images between the encoder and the decoder of the CNN and introducing a directional contrastive loss function to ensure that features of the same identity in an image remain consistent under different scenes; using the Transformer predictions as pseudo-labels to compute the context-aware consistency loss; and using the CNN predictions as pseudo-labels for the Transformer predictions to compute the consistency regularization loss;
Step S40: segmenting the test set images using the trained CNN model as the backbone network, and evaluating the accuracy of the results.
2. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 1, wherein the step of applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image comprises:
Step S31: applying Gaussian filtering to reduce noise and fine detail in the unlabeled image by taking a weighted average of the neighborhood around each pixel; for each pixel $(x, y)$, filtering is performed with a Gaussian kernel of size $k$, and the filtered pixel value is:
$$I'(x, y) = \sum_{i}\sum_{j} w(i, j)\, I(x + i,\, y + j) \qquad (1)$$
where $I(x + i, y + j)$ is a pixel value in the $k \times k$ neighborhood and $w(i, j)$ is the corresponding weight of the Gaussian kernel;
finally, adjusting the brightness of the image;
Step S32: for a given weakly augmented unlabeled image, randomly selecting the size and position of a cropping window;
then shifting the cropping window upward and to the left by a certain distance to obtain two new images with an overlapping region, which are used to train the model;
Step S33: using a bicubic interpolation algorithm to scale all images to 513 px × 513 px, and using a bilinear interpolation algorithm to scale the corresponding labels to the same size, so that the input images meet the input specification of the DeepLab v3+ network.
3. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 2, wherein the training process on the labeled image data comprises:
using the encoded matrices of the labeled images and their labels to train the two backbone network models, the CNN and the Transformer, separately, and computing the loss function with respect to the ground-truth labels.
4. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 3, wherein computing the loss function with respect to the ground-truth labels comprises:
inputting the labeled data into the CNN to obtain the predicted probability for each pixel, inputting it into the Transformer to obtain the corresponding predicted probability, and computing the loss function between each prediction and the corresponding ground truth;
the loss function between the predictions and the corresponding ground truth is computed as follows:
compared against the ground-truth labels, the loss function of the CNN branch is given by formula (2):
(2)
the loss function of the Transformer branch is given by formula (3):
(3)
where the activation function is ReLU and the loss is the Focal Loss, whose expression is given by formula (4):
$$\mathrm{FL}(p_t) = -\alpha\,(1 - p_t)^{\gamma}\,\log(p_t) \qquad (4)$$
where $p_t$ is the predicted probability assigned to the ground-truth class, and $\alpha$ and $\gamma$ are hyperparameters, here set to $\alpha = 0.25$ and $\gamma = 2$;
the total loss of the supervised learning model is computed as shown in formula (5):
(5)
where $y = 1$ when the class is the ground-truth label and $y = 0$ otherwise, and $p$ is a real number between 0 and 1 giving the predicted probability that the pixel belongs to the class annotated in the label.
5. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 4, wherein the training process on the unlabeled image data comprises:
inputting the two groups of randomly cropped, weakly augmented unlabeled images, taking the prediction obtained through the CNN network framework as the pseudo-label for the prediction of the Transformer model, and computing the consistency regularization loss;
taking the prediction obtained through the Transformer network framework as the pseudo-label for the intermediate projection and computing the context-aware consistency loss.
6. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 5, characterized in that, during training on the unlabeled images, two groups of predictions are generated through the CNN model framework for the two groups of weakly augmented unlabeled input images; similarly, two groups of predictions are generated through the Transformer model framework:
(6)
where the two mapping functions in formula (6) denote the CNN network model and the Transformer network model, respectively;
the four pseudo-labels, one for each prediction, are computed as shown in formula (7):
(7)
where $\arg\max$ returns the label for which the predicted probability is largest;
the CNN predictions are used as pseudo-labels for the Transformer, and the consistency regularization loss is computed as shown in formula (8):
(8)
where the activation function is ReLU and the loss is the Dice loss function;
the two cropped images are passed through the encoder of DeepLab v3+ to obtain two feature maps, which a nonlinear projector then projects into two projected feature maps; the directional contrastive loss function encourages the features of the overlapping region, seen under different contexts, to align toward the higher-confidence counterpart, so that they ultimately remain consistent;
for a given unlabeled image, the directional contrastive loss is computed as follows:
(9)
(10)
(11)
where $N$ is the number of spatial positions in the overlapping feature region; the similarity function measures the similarity between features; $h, w$ index the two-dimensional spatial position; the negative set contains the negative samples of the image, each element of which is a negative sample of the feature being compared; and the classifier provides the prediction used to judge confidence;
using the Transformer predictions as pseudo-labels, the consistency loss function under the context-aware constraint is computed as follows:
(12)
where the loss is the Dice loss function and the predictions compared against the pseudo-labels are produced by the classifier.
7. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 6, further comprising minimizing the overall loss: during training, the parameters of the classification network are first initialized; forward propagation and backpropagation are then performed on the training data, the gradients of the loss function are computed, and the network parameters are updated with an optimization algorithm such as gradient descent until the overall loss function meets a preset convergence condition;
the overall loss comprises the total loss of the supervised learning model, the consistency regularization loss, the directional contrastive loss, and the context-aware consistency loss, and is computed as:
(13)
where the two weight factors control the proportions of the directional contrastive loss and the consistency regularization loss in the overall loss function.
8. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 7, wherein step S40 specifically comprises model testing: for a given test set farmland remote sensing image, the CNN is used as the backbone network model to extract features, perform segmentation, and output the class probability of each pixel of the target image, and a threshold is set to mark each pixel as segmentation target or background.
9. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 8, wherein marking the pixels as segmentation target or background with the set threshold comprises: obtaining, from the output of the segmentation model, the maximum gray value $Z_{\max}$ and the minimum gray value $Z_{\min}$ of the image, and setting the initial threshold to $T_0 = (Z_{\max} + Z_{\min})/2$; dividing the image into foreground and background according to the current threshold $T_k$ and computing their average gray values $Z_O$ and $Z_B$; computing the new threshold $T_{k+1} = (Z_O + Z_B)/2$; and iterating until $T_{k+1} = T_k$, at which point the value obtained is the threshold; pixels whose predicted probability is greater than the threshold are marked as foreground and those below it as background, so as to obtain the final segmentation mask.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311334268.4A CN117253044B (en) | 2023-10-16 | 2023-10-16 | Farmland remote sensing image segmentation method based on semi-supervised interactive learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117253044A true CN117253044A (en) | 2023-12-19 |
CN117253044B CN117253044B (en) | 2024-05-24 |
Family
ID=89134963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311334268.4A Active CN117253044B (en) | 2023-10-16 | 2023-10-16 | Farmland remote sensing image segmentation method based on semi-supervised interactive learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117253044B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||