CN117253044B - Farmland remote sensing image segmentation method based on semi-supervised interactive learning - Google Patents

Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Info

Publication number
CN117253044B
CN117253044B (application number CN202311334268.4A; published as CN117253044A)
Authority
CN
China
Prior art keywords
image
loss
cnn
loss function
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311334268.4A
Other languages
Chinese (zh)
Other versions
CN117253044A (en)
Inventor
文思鉴
王永梅
王芃力
张友华
吴雷
吴海涛
轩亚恒
郑雪瑞
张世豪
潘海瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Agricultural University AHAU filed Critical Anhui Agricultural University AHAU
Priority to CN202311334268.4A
Publication of CN117253044A
Application granted
Publication of CN117253044B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/188 Vegetation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention is applicable to the technical field of agricultural image analysis and provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, a CNN and a Transformer cooperate with each other through interactive learning: self-supervised training on unlabeled data lets the two networks transfer local and global pixel features to each other, reducing the amount of labeled data required while avoiding the potential shortcomings of the two existing classes of methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to keep the same identity features consistent across pictures of different scenes, thereby improving the generalization capability and robustness of the model.

Description

Farmland remote sensing image segmentation method based on semi-supervised interactive learning
Technical Field
The invention belongs to the technical field of agricultural image analysis, and particularly relates to a farmland remote sensing image segmentation method based on semi-supervised interactive learning.
Background
Farmland remote sensing image segmentation is an important task whose aim is to classify farmland remote sensing images at the pixel level so as to improve the efficiency of agricultural land production and management.
Conventional deep-learning-based farmland remote sensing image segmentation methods generally need a large amount of annotated data for training, but such data are costly to acquire, and the requirement is often difficult to meet in practical applications. Semi-supervised learning is therefore one of the effective ways to address this problem.
Currently, semi-supervised learning methods train with a small amount of labeled data and a large amount of unlabeled data to improve model performance. In addition, because these models have a large number of parameters, overfitting occurs easily: the model performs well on the training set but poorly on the test set. The generalization capability of a farmland remote sensing image segmentation model is therefore an important problem to consider when applying it to real scenes.
The existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main categories: agricultural image segmentation methods based on a Convolutional Neural Network (CNN) and agricultural image segmentation methods based on a Transformer. The former, the CNN, extracts features in image space by convolution operations; its disadvantage is that CNNs use local receptive fields when processing images and gradually reduce the image resolution from lower to higher layers through convolution and pooling operations, and this limitation of local receptive fields may cause the loss of detail information and global context information in the image, particularly for fine-grained segmentation tasks over large-scale farmland areas. The latter, the Transformer, models global relationships in sequence space through a self-attention mechanism; its disadvantage is that the Transformer aims to model the dependency between each pixel and all other pixels through global context information and is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features at different scales, which reduces the accuracy and robustness of the segmentation result.
Disclosure of Invention
The embodiment of the invention aims to provide a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate with each other through interactive learning: self-supervised training on unlabeled data lets the two networks transfer local and global pixel features to each other, reducing the amount of labeled data required while effectively avoiding the potential shortcomings of the two existing classes of methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to keep the same identity features consistent across pictures of different scenes, thereby improving the generalization capability and robustness of the model.
In view of the above, the invention provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning, which comprises the following steps:
Step S10: m input images divided with labels And N images without labels
Step S20: training CNN and transducer using the tagged image data, respectively;
step S30: weak enhancement processing for Gaussian filtering and brightness adjustment of unlabeled image, and randomly cutting out two new images with overlapping area in the same image ,/>Meanwhile, the pixels of the unlabeled image are projected between an encoder and a decoder of the CNN, a directional contrast loss function is introduced, consistency of the same identity characteristic in the image under different scenes is guaranteed, and a transform prediction result is used as a pseudo tag to calculate context perception consistency loss; calculating consistency regularization loss by using the CNN prediction result as a pseudo tag of the transform prediction result;
Step S40: and (3) taking the trained CNN model as a backbone network, segmenting the test set image, and evaluating the accuracy of the result.
As a further limitation of the technical solution of the present invention, the step of performing the weak enhancement processing of gaussian filtering and brightness adjustment on the unlabeled image, and randomly cropping two new images with overlapping areas in the same image includes:
Step S31: applying Gaussian filtering to reduce noise and detail in the unlabeled image by taking a weighted average over the neighborhood around each pixel; for each pixel (x, y), filtering is performed with a Gaussian kernel of size k, and the filtered pixel value is:
I(x, y) = Σ(G(x′, y′) * I(x′, y′))    (1)
wherein I(x′, y′) is a pixel value in the neighborhood and G(x′, y′) is the weight of the Gaussian kernel;
Finally, adjusting the brightness of the image;
step S32: for a given weakly enhanced unlabeled image, randomly selecting the size and position of a cropping window;
moving the cropping window upwards and leftwards by a certain distance to obtain two new images x_u1 and x_u2 with an overlapping area for training the model;
Step S33: scaling all images χ to a size of 513px × 513px using a bicubic interpolation algorithm, and scaling the corresponding labels y to the same size using a bilinear interpolation algorithm, so that the input images meet the input specification of the DeepLab v3+ network.
As a further limitation of the technical solution of the present invention, the training process of the tagged image data includes:
training the two backbone network models, CNN and Transformer, using the labeled image coding matrix x_L = {χ_1, χ_2, ..., χ_M} and the labels y_L = {y_1, y_2, ..., y_M}, and calculating the loss function with respect to the real labels;
As a further limitation of the technical scheme of the invention, the calculation of the loss function with respect to the real labels comprises the following steps:
inputting the labeled data χ_L into the CNN to obtain the prediction probability corresponding to each pixel point, inputting it into the Transformer to obtain its prediction probability, and calculating the loss function between each prediction and the corresponding real value;
the process of calculating the loss function between the predictions and the corresponding real values is as follows:
compared with the real label y_l, the loss function of the CNN part is shown in formula (2):
(2)
the loss function of the Transformer part is shown in formula (3):
(3)
wherein σ represents the ReLU activation function and l_FL represents the Focal Loss, whose expression is shown in formula (4):
(4)
wherein α_l and γ are hyper-parameters, here set to α_l = 0.25 and γ = 2;
the total loss of the supervised learning model is calculated as shown in formula (5):
(5)
wherein, when l is a real label, y_l = 1, and otherwise y_l = 0; the prediction is a real number ranging from 0 to 1, indicating the probability that the image belongs to the category noted in the label.
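Because formulas (2) to (5) are given only as images, the sketch below is a hedged reading of the supervised branch: each backbone is penalised against the real labels with the standard pixel-wise Focal Loss (α_l = 0.25, γ = 2), and the two terms are summed. Any additional weighting or ReLU gating present in the original formulas is not reproduced.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Standard pixel-wise Focal Loss (the l_FL of formula (4), with alpha_l = 0.25, gamma = 2).
    logits: (B, C, H, W) raw scores; target: (B, H, W) integer class labels."""
    log_prob = F.log_softmax(logits, dim=1)
    prob = log_prob.exp()
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    pt = (prob * onehot).sum(dim=1)          # probability assigned to the true class
    log_pt = (log_prob * onehot).sum(dim=1)
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

def supervised_loss(cnn_logits, transformer_logits, labels):
    """Assumed combination of formulas (2), (3) and (5): sum of the two branch losses."""
    return focal_loss(cnn_logits, labels) + focal_loss(transformer_logits, labels)
```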
As a further limitation of the technical solution of the present invention, the training process of the label-free image data includes:
the two groups of randomly cropped, weakly enhanced unlabeled images are input; the prediction result obtained through the CNN network framework is used as the pseudo label for the Transformer model prediction, and the consistency regularization loss is calculated;
the prediction result obtained through the Transformer network framework is used as the pseudo label for the intermediate projection to calculate the context-aware consistency loss, ensuring the interactive transfer of local image information and global context information during training so that the model fully learns the consistency regularization capability.
As a further limitation of the technical scheme of the invention, in the unlabeled image training process, the two groups of weakly enhanced unlabeled input images x_u1 and x_u2 generate two groups of predicted values through the CNN model framework; similarly, two groups of predicted values are generated through the Transformer model framework, as shown in formula (6):
(6)
wherein the two mappings in formula (6) represent the CNN network model and the Transformer network model respectively;
the corresponding pseudo labels are calculated as shown in formula (7):
(7)
wherein argmax(p) denotes the label that maximizes the predicted probability value p;
the CNN prediction result is used as the pseudo label for the Transformer, and the consistency regularization loss is calculated as shown in formula (8):
(8)
wherein σ represents the ReLU activation function and l_dice represents the Dice loss function;
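A minimal sketch of the cross pseudo-labelling of formulas (7) and (8): the CNN prediction is hardened with argmax and used to supervise the Transformer branch through a soft Dice loss. The Dice formulation and the omission of the ReLU term σ are simplifying assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def dice_loss(prob, target_onehot, eps=1e-6):
    """Soft Dice loss l_dice; prob and target_onehot are both (B, C, H, W)."""
    inter = (prob * target_onehot).sum(dim=(2, 3))
    union = prob.sum(dim=(2, 3)) + target_onehot.sum(dim=(2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()

def consistency_regularization_loss(cnn_logits, transformer_logits):
    """Formulas (7)-(8), hedged: the argmax pseudo label from the CNN supervises the Transformer."""
    num_classes = cnn_logits.shape[1]
    pseudo = cnn_logits.argmax(dim=1)                                   # pseudo label, formula (7)
    pseudo_onehot = F.one_hot(pseudo, num_classes).permute(0, 3, 1, 2).float()
    prob_t = F.softmax(transformer_logits, dim=1)
    return dice_loss(prob_t, pseudo_onehot)                             # L_cr, formula (8)
```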
x_u1 and x_u2 are passed through the DeepLab v3+ encoder to obtain the feature maps M_u1 and M_u2, which are then projected as M_o1 and M_o2 by a nonlinear projector; using a directional contrast loss function, the features of the overlapping region x_o are encouraged to align with the high-confidence contrast features under the different backgrounds and ultimately remain consistent;
for the i-th unlabeled image, the directional contrast loss is calculated as follows:
(9)
(10)
(11)
wherein N represents the number of spatial positions in the overlapping feature region; r computes the feature similarity; h and w denote a two-dimensional spatial position; M_u denotes the negative image set, m denotes a negative sample in M_u, and C denotes the classifier;
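Formulas (9) to (11) are not reproduced in the text, so the snippet below shows only one common InfoNCE-style realisation of a directional contrast loss consistent with the description: for each overlapping position, the feature of the lower-confidence crop is pulled towards the (detached) feature of the higher-confidence crop and pushed away from a negative set. The temperature tau, the cosine similarity used for r, and the one-directional mask are assumptions.

```python
import torch
import torch.nn.functional as F

def directional_contrast_loss(feat1, feat2, conf1, conf2, neg_feats, tau=0.1):
    """feat1, feat2: (N, D) projected features of the N overlapping positions from the two
    crops; conf1, conf2: (N,) classifier confidences C at those positions; neg_feats: (M, D)
    negative features drawn from the negative set M_u."""
    f1 = F.normalize(feat1, dim=1)
    f2 = F.normalize(feat2.detach(), dim=1)        # high-confidence target is not updated
    neg = F.normalize(neg_feats.detach(), dim=1)

    pos_sim = (f1 * f2).sum(dim=1, keepdim=True) / tau     # r(f1, f2): positive similarity
    neg_sim = f1 @ neg.t() / tau                            # similarity to the negative set
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    target = torch.zeros(f1.shape[0], dtype=torch.long, device=f1.device)
    per_pos = F.cross_entropy(logits, target, reduction="none")

    # Directional part: only positions where the other view is more confident contribute;
    # in practice the loss would also be applied with the roles of the two crops swapped.
    mask = (conf2 > conf1).float()
    return (mask * per_pos).sum() / mask.sum().clamp(min=1.0)
```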
the prediction result of the Transformer is used as the pseudo label to calculate the consistency loss function under the context-aware constraint, as shown in formula (12):
(12)
wherein l_dice represents the Dice loss function and C(x) represents the classifier.
As a further limitation of the technical scheme of the invention, the method further comprises minimizing the overall loss: during training, the parameters of the classification network are first initialized, forward propagation and backward propagation are then performed using the training data, the gradients of the loss function are calculated, and the network parameters are updated with a gradient-descent-type optimization algorithm until the overall loss function reaches a preset convergence condition;
the total loss is the sum of the supervised learning model total loss, the consistency regularization loss, the directional contrast loss, and the context-aware consistency loss, and is calculated as shown in formula (13):
(13)
wherein λ and λ_w are weight factors that control the proportion of the directional contrast loss and the consistency regularization loss in the total loss function.
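A hedged sketch of one optimisation step for formula (13) follows. The default weights, and the unit weight on the context-aware consistency term (whose factor is not spelled out in the text), are illustrative assumptions only.

```python
import torch

def training_step(optimizer, loss_sup, loss_dc, loss_cr, loss_cc, lam=0.1, lam_w=1.0):
    """One parameter update on the total loss: supervised + directional contrast (weight
    lambda) + consistency regularization (weight lambda_w) + context-aware consistency."""
    loss_total = loss_sup + lam * loss_dc + lam_w * loss_cr + loss_cc
    optimizer.zero_grad()
    loss_total.backward()    # back-propagate gradients of the overall loss
    optimizer.step()         # gradient-descent-style update of the network parameters
    return loss_total.detach()
```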
As a further limitation of the technical scheme of the present invention, the step S40 specifically includes model testing: for a given test-set farmland remote sensing image, the CNN is used as the backbone network model to extract features, the probability of the category to which each pixel of the target image belongs is output by the segmentation, and a threshold is set to mark each pixel as the segmentation target or the background.
As a further limitation of the invention, the step of marking a pixel as the segmentation target or the background by the set threshold includes: the segmentation model obtains the maximum gray value P_max and the minimum gray value P_min of the image; the initial threshold is T_0 = (P_max + P_min)/2; the image is divided into foreground and background according to T(k), k = 0, 1, 2, ..., and their average gray values H_1 and H_2 are obtained; a new threshold T(k+1) = (H_1 + H_2)/2 is computed, and the iteration continues until T(k) = T(k+1); the resulting value is the threshold, pixels whose prediction probability is greater than the threshold are marked as foreground and those below it as background, giving the final segmentation mask.
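The iterative threshold selection described above is the classic mean-of-means scheme; a sketch under that reading follows. The convergence tolerance and iteration cap are assumptions added for numerical robustness.

```python
import numpy as np

def iterative_threshold_mask(prob_map, tol=1e-4, max_iter=100):
    """Threshold the per-pixel foreground probabilities: T_0 = (P_max + P_min)/2, split into
    foreground/background, average the two mean values, and repeat until T(k) = T(k+1)."""
    t = (prob_map.max() + prob_map.min()) / 2.0
    for _ in range(max_iter):
        fg = prob_map[prob_map > t]
        bg = prob_map[prob_map <= t]
        if fg.size == 0 or bg.size == 0:
            break
        t_new = (fg.mean() + bg.mean()) / 2.0      # T(k+1) = (H_1 + H_2) / 2
        if abs(t_new - t) < tol:                   # converged: T(k) == T(k+1)
            t = t_new
            break
        t = t_new
    return (prob_map > t).astype(np.uint8)         # 1 = foreground, 0 = background
```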
Compared with the prior art, the farmland remote sensing image segmentation method based on semi-supervised interactive learning has the beneficial effects that:
Firstly, a direction-aware consistency constraint module is inserted into the CNN-Transformer interactive learning network. The CNN excels at image processing and can effectively extract the spatial features in farmland remote sensing images, capturing local and global features through convolution and pooling operations; the Transformer excels in the field of natural language processing and is good at handling sequence data and long-distance dependencies. Combining the advantages of the CNN in spatial feature extraction with those of the Transformer in long-range dependency modeling allows the features in farmland remote sensing images to be better extracted and modeled, improving segmentation accuracy;
secondly, the Transformer uses a self-attention mechanism in the encoder-decoder framework and can effectively capture the context information in an image, which is very important for accurate farmland remote sensing image segmentation; because crops and background in such images often have wide spatial correlation, the relationship between pixels in the image can be modeled better, thereby improving segmentation accuracy. On the other hand, farmland remote sensing images usually have high resolution, and detail information must be accurately recovered during segmentation, while a conventional CNN decoder has high computation and memory demands on high-resolution images; by introducing Transformer layers between the encoder and the decoder, the invention can progressively restore the image resolution and reduce the consumption of computation and memory resources, thereby handling high-resolution images more efficiently.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 is a system architecture diagram of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 2 is a flow chart of an implementation of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 3 is a sub-flow of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;
FIG. 4 is a block diagram of a farmland remote sensing image segmentation system provided by the invention;
Fig. 5 is a block diagram of a computer device according to the present invention.
Detailed Description
The present application will be further described with reference to the accompanying drawings and detailed description, wherein it is to be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.
In order to make the objects, technical solutions and advantages of the present application more apparent, the following embodiments of the present application will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two entities or parameters with the same name that are not identical; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprise" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, or article that comprises a list of steps or units is not necessarily limited to those steps or units expressly listed.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
At present, the existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main categories: agricultural image segmentation methods based on a Convolutional Neural Network (CNN) and agricultural image segmentation methods based on a Transformer. The former, the CNN, extracts features in image space by convolution operations; its disadvantage is that CNNs use local receptive fields when processing images and gradually reduce the image resolution from lower to higher layers through convolution and pooling operations, and this limitation of local receptive fields may cause the loss of detail information and global context information in the image, particularly for fine-grained segmentation tasks over large-scale farmland areas. The latter, the Transformer, models global relationships in sequence space through a self-attention mechanism; its disadvantage is that the Transformer aims to model the dependency between each pixel and all other pixels through global context information and is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features at different scales, which reduces the accuracy and robustness of the segmentation result.
In order to solve the above problems, the invention provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate with each other through interactive learning: self-supervised training on unlabeled data lets the two networks transfer local and global pixel features to each other, reducing the amount of labeled data required while effectively avoiding the potential shortcomings of the two existing classes of methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to keep the same identity features consistent across pictures of different scenes, thereby improving the generalization capability and robustness of the model.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
Example 1
FIG. 1 illustrates an exemplary system architecture for implementing a semi-supervised interactive learning based farmland remote sensing image segmentation method.
FIG. 2 shows the implementation flow of the farmland remote sensing image segmentation method based on semi-supervised interactive learning;
As shown in fig. 1 and fig. 2, in an embodiment of the present invention, a farmland remote sensing image segmentation method based on semi-supervised interactive learning includes the following steps:
Step S10: m input images divided with labels And N images without labels
Step S20: training CNN and transducer using the tagged image data, respectively;
Step S30: weak enhancement processing of Gaussian filtering and brightness adjustment on unlabeled images, randomly cropping two new images with overlapping regions in the same image, i.e. for each of the N unlabeled images Random cropping to obtain two groups of new images with overlapping areas/>,/>The sizes of all images are unified, meanwhile, pixels of the unlabeled images are projected between an encoder and a decoder of the CNN, a directional contrast loss function is introduced, consistency of the same identity features in the images under different scenes is guaranteed, and a transform prediction result is used as a pseudo tag to calculate context perception consistency loss; and calculating consistency regularization loss by using the CNN predicted result as a pseudo tag of the converter predicted result.
Step S40: and (3) taking the trained CNN model as a backbone network, segmenting the test set image, and evaluating the accuracy of the result.
Further, as shown in fig. 3, in the step S30, the step of performing weak enhancement processing of gaussian filtering and brightness adjustment on the unlabeled image, and randomly cropping two new images with overlapping areas in the same image includes:
Step S31: applying Gaussian filtering to reduce noise and detail in the unlabeled image by taking a weighted average over the neighborhood around each pixel; for each pixel (x, y), filtering is performed with a Gaussian kernel of size k, and the filtered pixel value is:
I(x, y) = Σ(G(x′, y′) * I(x′, y′))    (1)
wherein I(x′, y′) is a pixel value in the neighborhood and G(x′, y′) is the weight of the Gaussian kernel;
Finally, adjusting the brightness of the image;
step S32: for a given weakly enhanced unlabeled image, randomly selecting the size and position of a cropping window;
moving the cropping window upwards and leftwards by a certain distance to obtain two new images x_u1 and x_u2 with an overlapping area for training the model;
Step S33: scaling all images χ to a size of 513px × 513px using a bicubic interpolation algorithm, and scaling the corresponding labels y to the same size using a bilinear interpolation algorithm, so that the input images meet the input specification of the DeepLab v3+ network.
Further, in an embodiment of the present invention, the training process of the tagged image data includes:
training the two backbone network models, CNN and Transformer, using the labeled image coding matrix x_L = {χ_1, χ_2, ..., χ_M} and the labels y_L = {y_1, y_2, ..., y_M}, and calculating the loss function with respect to the real labels;
As a further limitation of the technical scheme of the invention, the calculation of the loss function with respect to the real labels comprises the following steps:
inputting the labeled data χ_L into the CNN to obtain the prediction probability corresponding to each pixel point, inputting it into the Transformer to obtain its prediction probability, and calculating the loss function between each prediction and the corresponding real value;
the process of calculating the loss function between the predictions and the corresponding real values is as follows:
compared with the real label y_l, the loss function of the CNN part is shown in formula (2):
(2)
the loss function of the Transformer part is shown in formula (3):
(3)
wherein σ represents the ReLU activation function and l_FL represents the Focal Loss, whose expression is shown in formula (4):
(4)
wherein α_l and γ are hyper-parameters, here set to α_l = 0.25 and γ = 2;
the total loss of the supervised learning model is calculated as shown in formula (5):
(5)
wherein, when l is a real label, y_l = 1, and otherwise y_l = 0; the prediction is a real number ranging from 0 to 1, indicating the probability that the image belongs to the category noted in the label.
Further, in an embodiment of the present invention, the training process of the label-free image data includes:
the two groups of randomly cropped, weakly enhanced unlabeled images are input; the prediction result obtained through the CNN network framework is used as the pseudo label for the Transformer model prediction, and the consistency regularization loss is calculated;
the prediction result obtained through the Transformer network framework is used as the pseudo label for the intermediate projection to calculate the context-aware consistency loss, ensuring the interactive transfer of local image information and global context information during training so that the model fully learns the consistency regularization capability.
Further, in the embodiment of the invention, during unlabeled image training the two backbone network models focus respectively on learning local features and global features, and information interaction is used to transfer feature knowledge so that each network compensates for the other's weaknesses; meanwhile, to ensure that the CNN module has better robustness and generalization capability with only a small amount of data, a direction-aware consistency constraint is introduced; specifically:
for the two groups of weakly enhanced unlabeled input images x_u1 and x_u2, two groups of predicted values are generated through the CNN model framework; similarly, two groups of predicted values are generated through the Transformer model framework, as shown in formula (6):
(6)
wherein the two mappings in formula (6) represent the CNN network model and the Transformer network model respectively;
the CNN is good at capturing local features and spatial correlations in image processing, extracting the local structure of the image through convolution operations over local receptive fields;
the Transformer is better suited to modeling global dependencies and long-range relations, and can establish global information interaction over the whole input sequence through its self-attention mechanism;
thus, these predictions have essentially different properties at the output level; the corresponding pseudo labels are calculated as shown in formula (7):
(7)
wherein argmax(p) denotes the label that maximizes the predicted probability value p;
the CNN prediction result is used as the pseudo label for the Transformer, and the consistency regularization loss is calculated as shown in formula (8):
(8)
wherein σ represents the ReLU activation function and l_dice represents the Dice loss function;
x_u1 and x_u2 are passed through the DeepLab v3+ encoder to obtain the feature maps M_u1 and M_u2, which are then projected as M_o1 and M_o2 by a nonlinear projector; using a directional contrast loss function, the features of the overlapping region x_o are encouraged to align with the high-confidence contrast features under the different backgrounds and ultimately remain consistent;
for the i-th unlabeled image, the directional contrast loss is calculated as follows:
(9)
(10)
(11)
wherein N represents the number of spatial positions in the overlapping feature region; r computes the feature similarity; h and w denote a two-dimensional spatial position; M_u denotes the negative image set, m denotes a negative sample in M_u, and C denotes the classifier;
the prediction result of the Transformer is used as the pseudo label to calculate the consistency loss function under the context-aware constraint, as shown in formula (12):
(12)
wherein l_dice represents the Dice loss function and C(x) represents the classifier.
Further, in the embodiment of the present invention, the overall loss is minimized: during training, the parameters of the classification network are first initialized, forward propagation and backward propagation are then performed using the training data, the gradients of the loss function are calculated, and the network parameters are updated with a gradient-descent-type optimization algorithm until the overall loss function reaches a preset convergence condition;
By iteratively updating network parameters, we want the classification network to learn the proper feature representation so that the difference between the predicted result and the real label is minimized;
the total loss is the sum of the supervised learning model total loss, the consistency regularization loss, the directional contrast loss, and the context-aware consistency loss, and is calculated as shown in formula (13):
(13)
wherein λ and λ_w are weight factors that control the proportion of the directional contrast loss and the consistency regularization loss in the total loss function.
As a further limitation of the technical scheme of the present invention, the step S40 specifically includes model testing: for a given test-set farmland remote sensing image, the CNN is used as the backbone network model to extract features, the probability of the category to which each pixel of the target image belongs is output by the segmentation, and a threshold is set to mark each pixel as the segmentation target or the background.
As a further limitation of the invention, the step of marking a pixel as the segmentation target or the background by the set threshold includes: the segmentation model obtains the maximum gray value P_max and the minimum gray value P_min of the image; the initial threshold is T_0 = (P_max + P_min)/2; the image is divided into foreground and background according to T(k), k = 0, 1, 2, ..., and their average gray values H_1 and H_2 are obtained; a new threshold T(k+1) = (H_1 + H_2)/2 is computed, and the iteration continues until T(k) = T(k+1); the resulting value is the threshold, pixels whose prediction probability is greater than the threshold are marked as foreground and those below it as background, giving the final segmentation mask.
In summary, the invention inserts a direction-aware consistency constraint module into the CNN-Transformer interactive learning network. The CNN excels at image processing and can effectively extract the spatial features in farmland remote sensing images, capturing local and global features through convolution and pooling operations; the Transformer excels in the field of natural language processing and is good at handling sequence data and long-distance dependencies. Combining the advantages of the CNN in spatial feature extraction with those of the Transformer in long-range dependency modeling allows the features in farmland remote sensing images to be better extracted and modeled, improving segmentation accuracy.
In addition, the Transformer uses a self-attention mechanism in the encoder-decoder framework and can effectively capture the context information in an image, which is very important for accurate farmland remote sensing image segmentation; because crops and background in such images often have wide spatial correlation, the relationship between pixels in the image can be modeled better, thereby improving segmentation accuracy;
On the other hand, farmland remote sensing images usually have higher resolution, detail information needs to be accurately recovered in image segmentation, and the calculation and memory requirements of the traditional CNN decoder on the high-resolution images are higher; the invention can gradually restore the image resolution between the encoder and the decoder by introducing the Transformer layer, and reduce the consumption of calculation and memory resources, thereby more effectively processing high-resolution images.
Example 2
As shown in fig. 4, in an exemplary embodiment provided by the present disclosure, the present invention further provides a farmland remote sensing image segmentation system, the farmland remote sensing image segmentation system 50 includes:
A preprocessing module 51, the preprocessing module 51 being used for dividing the input images into M labeled images x_L = {x_1, x_2, ..., x_M} and N unlabeled images x_U = {x_1, x_2, ..., x_N};
a first training module 52, the first training module 52 being used for training the CNN and the Transformer separately using the labeled image data;
a second training module 53, the second training module 53 being used for performing the weak enhancement processing of Gaussian filtering and brightness adjustment on the unlabeled images and randomly cropping two new images with an overlapping area from the same image, i.e. each of the N unlabeled images is randomly cropped into two groups of new images x_u1 and x_u2 with an overlapping area, and the sizes of all images are unified; meanwhile, the pixels of the unlabeled images are projected between the encoder and the decoder of the CNN, a directional contrast loss function is introduced to keep the same identity features consistent across different scenes, the Transformer prediction result is used as a pseudo label to calculate the context-aware consistency loss, and the CNN prediction result is used as a pseudo label for the Transformer prediction result to calculate the consistency regularization loss.
The model test module 54 is used for dividing the test set image by taking the trained CNN model as a backbone network and evaluating the accuracy of the result.
Example 3
As shown in fig. 5, in an embodiment of the present invention, the present invention further provides a computer device.
The computer device 60 comprises a memory 61, a processor 62 and computer readable instructions stored in the memory 61 and executable on the processor 62; when executing the computer readable instructions, the processor 62 implements the farmland remote sensing image segmentation method based on semi-supervised interactive learning provided by embodiment 1.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning comprises the following steps:
Step S10: m input images divided with labels And N images without labels
Step S20: training CNN and transducer using the tagged image data, respectively;
Step S30: weak enhancement processing of Gaussian filtering and brightness adjustment on unlabeled images, randomly cropping two new images with overlapping regions in the same image, i.e. for each of the N unlabeled images Random cropping to obtain two groups of new images with overlapping areas/>The sizes of all images are unified, meanwhile, pixels of the unlabeled images are projected between an encoder and a decoder of the CNN, a directional contrast loss function is introduced, consistency of the same identity features in the images under different scenes is guaranteed, and a transform prediction result is used as a pseudo tag to calculate context perception consistency loss; and calculating consistency regularization loss by using the CNN predicted result as a pseudo tag of the converter predicted result.
Step S40: and (3) taking the trained CNN model as a backbone network, segmenting the test set image, and evaluating the accuracy of the result.
In addition, the device 60 according to the embodiment of the present invention may further have a communication interface 63 for receiving a control command.
Example 4
In an exemplary embodiment provided by the present disclosure, a computer-readable storage medium is also provided.
Specifically, in an exemplary embodiment of the present disclosure, the storage medium stores computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the farmland remote sensing image segmentation method based on semi-supervised interactive learning provided by embodiment 1.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning comprises the following steps:
Step S10: m input images divided with labels And N images without labels
Step S20: training CNN and transducer using the tagged image data, respectively;
Step S30: weak enhancement processing of Gaussian filtering and brightness adjustment on unlabeled images, randomly cropping two new images with overlapping regions in the same image, i.e. for each of the N unlabeled images Random cropping to obtain two groups of new images with overlapping areas/>The sizes of all images are unified, meanwhile, pixels of the unlabeled images are projected between an encoder and a decoder of the CNN, a directional contrast loss function is introduced, consistency of the same identity features in the images under different scenes is guaranteed, and a transform prediction result is used as a pseudo tag to calculate context perception consistency loss; and calculating consistency regularization loss by using the CNN predicted result as a pseudo tag of the converter predicted result.
Step S40: and (3) taking the trained CNN model as a backbone network, segmenting the test set image, and evaluating the accuracy of the result.
In various embodiments of the present invention, it should be understood that the size of the sequence numbers of the processes does not mean that the execution sequence of the processes is necessarily sequential, and the execution sequence of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or a part contributing to the prior art or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several requests for a computer device (which may be a personal computer, a server or a network device, etc., in particular may be a processor in a computer device) to execute some or all of the steps of the method according to the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that some or all of the steps of the various methods of the described embodiments may be implemented by hardware associated with a program that may be stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, tape memory, or any other medium capable of being used to carry or store data.
The farmland remote sensing image segmentation method based on semi-supervised interactive learning disclosed by the embodiment of the invention is described in detail, and specific examples are applied to explain the principle and the implementation mode of the invention, and the description of the above examples is only used for helping to understand the method and the core idea of the invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims (3)

1. A farmland remote sensing image segmentation method based on semi-supervised interactive learning is characterized by comprising the following steps:
Step S10: dividing the input images into M labeled images x_L = {x_1, x_2, ..., x_M} and N unlabeled images x_U = {x_1, x_2, ..., x_N};
Step S20: training the CNN and the Transformer separately using the labeled image data;
Step S30: performing Gaussian-filtering and brightness-adjustment weak enhancement processing on the unlabeled images, and randomly cropping two new images x_U1 = {x_11, x_21, ..., x_N1} and x_U2 = {x_12, x_22, ..., x_N2} with an overlapping area from the same image; meanwhile, projecting the pixels of the unlabeled images between the encoder and the decoder of the CNN and introducing a directional contrast loss function to keep the same identity features consistent across different scenes; using the Transformer prediction result as a pseudo label to calculate the context-aware consistency loss, and using the CNN prediction result as a pseudo label for the Transformer prediction result to calculate the consistency regularization loss;
Step S40: the trained CNN model is used as a backbone network to divide the test set image and evaluate the accuracy of the result;
The method for performing weak enhancement processing of Gaussian filtering and brightness adjustment on the unlabeled image comprises the following steps of randomly cutting out two new images with overlapping areas from the same image:
step S31: applying gaussian filtering to reduce noise and detail in the unlabeled image, weighted averaging the neighborhood around each pixel; for each pixel (x, y), filtering is performed using a gaussian kernel of size k, the filtered pixel values being:
I(x,y)=∑(G(x′,y′)*I(x′,y′)) (1)
Wherein I (x ', y') is the pixel value in the neighborhood and G (x ', y') is the weight value of the gaussian kernel;
Finally, adjusting the brightness of the image;
step S32: for a given weakly enhanced unlabeled image, randomly selecting the size and position of a cropping window;
moving the cutting window upwards and leftwards by a certain distance respectively to obtain two new images x u1、xu2 with overlapping areas to train the model;
Step S33: scaling all images χ to an image of size 513px x 513px using a bicubic interpolation algorithm, and scaling the corresponding labels y to the same size using a bilinear interpolation algorithm so that the input image meets the input specification of DeepLab v < 3+ > network;
The training process of the tagged image data comprises the following steps:
Training the two backbone network models, CNN and Transformer, using the labeled image coding matrix x_L = {χ_1, χ_2, ..., χ_M} and the labels y_L = {y_1, y_2, ..., y_M}, and calculating the loss function with respect to the real labels;
calculating the loss function with respect to the real labels comprises the following steps:
inputting the labeled data χ_L into the CNN to obtain the prediction probability corresponding to each pixel point, inputting it into the Transformer to obtain its prediction probability, and calculating the loss function between each prediction and the corresponding real value;
the process of calculating the loss function between the predictions and the corresponding real values is as follows:
compared with the real label y_l, the loss function of the CNN part is shown in formula (2):
the loss function of the Transformer part is shown in formula (3):
wherein σ represents the ReLU activation function and l_FL represents the Focal Loss, whose expression is shown in formula (4):
wherein α_l and γ are hyper-parameters, here set to α_l = 0.25 and γ = 2;
the total loss calculation of the supervised learning model is shown in the formula (5):
wherein, when l is a real label, y_l = 1, and otherwise y_l = 0; the prediction is a real number ranging from 0 to 1, representing the probability that the image belongs to the category marked in the label;
The training process of the label-free image data comprises the following steps:
inputting the two groups of randomly cropped, weakly enhanced unlabeled images, obtaining a prediction result through the CNN network framework, using it as the pseudo label for the Transformer model prediction, and calculating the consistency regularization loss;
obtaining a prediction result through the Transformer network framework and using it as the pseudo label for the intermediate projection to calculate the context-aware consistency loss;
In the unlabeled image training process, the two groups of weakly enhanced unlabeled input images x_u1 and x_u2 generate two groups of predicted values through the CNN model framework; similarly, two groups of predicted values are generated through the Transformer model framework;
wherein the two mappings denote the CNN network model and the Transformer network model respectively;
the corresponding pseudo labels are calculated as shown in formula (7):
wherein argmax(p) denotes the label that maximizes the predicted probability value p;
the CNN prediction result is used as the pseudo label for the Transformer, and the consistency regularization loss is calculated as shown in formula (8):
wherein σ represents the ReLU activation function and l_dice represents the Dice loss function;
x_u1 and x_u2 are passed through the DeepLab v3+ encoder to obtain the feature maps M_u1 and M_u2, which are then projected as M_o1 and M_o2 by a nonlinear projector; using a directional contrast loss function, the features of the overlapping region x_o are encouraged to align with the high-confidence contrast features under different contexts and ultimately remain consistent;
for the i-th unlabeled image, the directional contrast loss is calculated as follows:
wherein N represents the number of spatial positions in the overlapping feature region; r computes the feature similarity; h and w denote a two-dimensional spatial position; M_u denotes the negative image set, m denotes a negative sample in M_u, and C denotes the classifier;
the prediction result of the Transformer is used as the pseudo label to calculate the consistency loss function under the context-aware constraint, as shown in formula (12):
wherein l_dice represents the Dice loss function and C(x) represents the classifier;
the method also includes minimizing the overall loss: during training, the parameters of the classification network are initialized, forward propagation and backward propagation are then performed using the training data, the gradients of the loss function are calculated, and the network parameters are updated with a gradient-descent optimization algorithm until the overall loss function reaches a preset convergence condition;
the total loss is the sum of the supervised learning model total loss, the consistency regularization loss, the directional contrast loss, and the context-aware consistency loss, and is calculated as shown in formula (13):
wherein λ and λ_w are weight factors that control the proportion of the directional contrast loss and the consistency regularization loss in the total loss function.
2. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 1, wherein the step S40 specifically comprises model testing, for a given test set farmland remote sensing image, extracting features by using CNN as a backbone network model, segmenting and outputting class probability of each pixel point of the target image, and setting a threshold value to mark the pixel point as a segmentation target or background.
3. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 2, wherein the step of marking the pixel point as a segmentation target or background by the set threshold comprises: the segmentation model obtains the maximum gray value P_max and the minimum gray value P_min of the image; the initial threshold is T_0 = (P_max + P_min)/2; the image is segmented into foreground and background according to T(k), k = 0, 1, 2, ..., and their average gray values H_1 and H_2 are obtained; a new threshold T(k+1) = (H_1 + H_2)/2 is computed, and the iteration continues until T(k) = T(k+1); the obtained value is the threshold, pixels whose prediction probability is greater than the threshold are marked as foreground and those below it as background, so as to obtain the final segmentation mask.
CN202311334268.4A 2023-10-16 2023-10-16 Farmland remote sensing image segmentation method based on semi-supervised interactive learning Active CN117253044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311334268.4A CN117253044B (en) 2023-10-16 2023-10-16 Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311334268.4A CN117253044B (en) 2023-10-16 2023-10-16 Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Publications (2)

Publication Number Publication Date
CN117253044A CN117253044A (en) 2023-12-19
CN117253044B true CN117253044B (en) 2024-05-24

Family

ID=89134963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311334268.4A Active CN117253044B (en) 2023-10-16 2023-10-16 Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Country Status (1)

Country Link
CN (1) CN117253044B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117437426B (en) * 2023-12-21 2024-09-10 苏州元瞰科技有限公司 Semi-supervised semantic segmentation method for high-density representative prototype guidance
CN118155284A (en) * 2024-03-20 2024-06-07 飞虎互动科技(北京)有限公司 Signature action detection method, signature action detection device, electronic equipment and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507343A (en) * 2019-01-30 2020-08-07 广州市百果园信息技术有限公司 Training of semantic segmentation network and image processing method and device thereof
CN113469283A (en) * 2021-07-23 2021-10-01 山东力聚机器人科技股份有限公司 Image classification method, and training method and device of image classification model
CN114943831A (en) * 2022-07-25 2022-08-26 安徽农业大学 Knowledge distillation-based mobile terminal pest target detection method and mobile terminal equipment
WO2023024920A1 (en) * 2021-08-24 2023-03-02 华为云计算技术有限公司 Model training method and system, cluster, and medium
CN116051574A (en) * 2022-12-28 2023-05-02 河南大学 Semi-supervised segmentation model construction and image analysis method, device and system
CN116258695A (en) * 2023-02-03 2023-06-13 浙江大学 Semi-supervised medical image segmentation method based on interaction of Transformer and CNN
CN116258730A (en) * 2023-05-16 2023-06-13 先进计算与关键软件(信创)海河实验室 Semi-supervised medical image segmentation method based on consistency loss function
CN116402838A (en) * 2023-06-08 2023-07-07 吉林大学 Semi-supervised image segmentation method and system for intracranial hemorrhage

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507343A (en) * 2019-01-30 2020-08-07 广州市百果园信息技术有限公司 Training of semantic segmentation network and image processing method and device thereof
CN113469283A (en) * 2021-07-23 2021-10-01 山东力聚机器人科技股份有限公司 Image classification method, and training method and device of image classification model
WO2023024920A1 (en) * 2021-08-24 2023-03-02 华为云计算技术有限公司 Model training method and system, cluster, and medium
CN114943831A (en) * 2022-07-25 2022-08-26 安徽农业大学 Knowledge distillation-based mobile terminal pest target detection method and mobile terminal equipment
CN116051574A (en) * 2022-12-28 2023-05-02 河南大学 Semi-supervised segmentation model construction and image analysis method, device and system
CN116258695A (en) * 2023-02-03 2023-06-13 浙江大学 Semi-supervised medical image segmentation method based on interaction of Transformer and CNN
CN116258730A (en) * 2023-05-16 2023-06-13 先进计算与关键软件(信创)海河实验室 Semi-supervised medical image segmentation method based on consistency loss function
CN116402838A (en) * 2023-06-08 2023-07-07 吉林大学 Semi-supervised image segmentation method and system for intracranial hemorrhage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S-RPN: Sampling-balanced region proposal network for small crop pest detection;Rujing Wang 等;Computers and Electronics in Agriculture;20210831;第187卷;1-11 *
Semantic segmentation of complex scenes fusing ASPP-Attention and context; Yang Xin; Yu Chongchong; Wang Xin; Chen Xiuxin; Computer Simulation; 2020-09-15 (No. 09); 209-213 *

Also Published As

Publication number Publication date
CN117253044A (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN111902825B (en) Polygonal object labeling system and method for training object labeling system
Minaee et al. Image segmentation using deep learning: A survey
US11176381B2 (en) Video object segmentation by reference-guided mask propagation
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109886121B (en) Human face key point positioning method for shielding robustness
CN117253044B (en) Farmland remote sensing image segmentation method based on semi-supervised interactive learning
US11704817B2 (en) Method, apparatus, terminal, and storage medium for training model
CN113609896B (en) Object-level remote sensing change detection method and system based on dual-related attention
Khalel et al. Automatic pixelwise object labeling for aerial imagery using stacked u-nets
Huang et al. Efficient inference in occlusion-aware generative models of images
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
CN114155481A (en) Method and device for recognizing unstructured field road scene based on semantic segmentation
CN112258436B (en) Training method and device for image processing model, image processing method and model
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN113705371B (en) Water visual scene segmentation method and device
CN115761222B (en) Image segmentation method, remote sensing image segmentation method and device
Veeravasarapu et al. Adversarially tuned scene generation
CN116453121B (en) Training method and device for lane line recognition model
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114444565B (en) Image tampering detection method, terminal equipment and storage medium
CN113378897A (en) Neural network-based remote sensing image classification method, computing device and storage medium
CN116363357A (en) Semi-supervised semantic segmentation method and device based on MIM and contrast learning
Zhang et al. Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention
CN115577768A (en) Semi-supervised model training method and device
Sun et al. Which target to focus on: Class-perception for semantic segmentation of remote sensing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant