CN117253044A - Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Info

Publication number: CN117253044A
Application number: CN202311334268.4A
Authority: CN
Inventors: 文思鉴; 王永梅; 王芃力; 张友华; 吴雷; 吴海涛; 轩亚恒; 郑雪瑞; 张世豪; 潘海瑞
Original assignee: Anhui Agricultural University AHAU
Current assignee: Anhui Agricultural University AHAU
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2023-12-19
Anticipated expiration: 2043-10-16
Also published as: CN117253044B

Abstract

The invention belongs to the technical field of agricultural image analysis and provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning. In the method, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features of the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.

Description

Farmland remote sensing image segmentation method based on semi-supervised interactive learning

Technical Field

The invention belongs to the technical field of agricultural image analysis, and particularly relates to a farmland remote sensing image segmentation method based on semi-supervised interactive learning.

Background

Farmland remote sensing image segmentation is an important task that aims to classify farmland remote sensing images at the pixel level, so as to improve the efficiency of agricultural land production and management.

Conventional deep-learning-based farmland remote sensing image segmentation methods generally need a large amount of labeled data for training, but labeled data is costly to acquire, and this requirement is often difficult to meet in practical applications. Semi-supervised learning is therefore one of the effective ways to address this problem.

Currently, semi-supervised learning methods train on a small amount of labeled data together with a large amount of unlabeled data to improve model performance. In addition, because such models have a large number of parameters, they overfit easily: the model performs well on the training set but poorly on the test set. The generalization capability of a farmland remote sensing image segmentation model is therefore an important consideration when the model is applied to real scenes.

Existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main types: agricultural image segmentation methods based on a Convolutional Neural Network (CNN) and agricultural image segmentation methods based on a Transformer. The former, the CNN, extracts features in image space by convolution operations and has the following disadvantage: a CNN uses local receptive fields when processing images and gradually reduces the image resolution from lower to higher layers through convolution and pooling operations; this local receptive field limitation may lead to the loss of detail information and global context information in the image, particularly in fine-grained segmentation tasks over large-scale farmland areas. The latter, the Transformer, models global relationships in sequence space by self-attention mechanisms and has the following disadvantage: the goal of the Transformer is to model the dependency between each pixel and all other pixels through global context information, so it is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features of different scales, which reduces the accuracy and robustness of the segmentation result.

Disclosure of Invention

The embodiments of the invention aim to provide a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate through interactive learning and pass the local features and global features of pixels to each other through self-supervised training on unlabeled data, which reduces the need for labeled data while effectively avoiding the possible defects of the two existing methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features of the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.

In view of the above, the invention provides a farmland remote sensing image segmentation method based on semi-supervised interactive learning, which comprises the following steps:

Step S10: dividing the input images into M labeled images and N unlabeled images;

Step S20: training the CNN and the Transformer separately on the labeled image data;

Step S30: applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images, and randomly cropping two new images with an overlapping region from the same image; meanwhile, projecting the pixel features of the unlabeled images between the encoder and the decoder of the CNN and introducing a directional contrast loss function to ensure that features of the same identity remain consistent across images of different scenes; using the Transformer prediction as a pseudo label to calculate the context-aware consistency loss; and using the CNN prediction as a pseudo label for the Transformer prediction to calculate the consistency regularization loss;

Step S40: using the trained CNN model as the backbone network, segmenting the test-set images and evaluating the accuracy of the result.

As a further limitation of the technical solution of the present invention, the step of applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image comprises:

Step S31: applying Gaussian filtering to reduce noise and detail in the unlabeled images by taking a weighted average over the neighborhood of each pixel; for each pixel (x, y), filtering is performed with a Gaussian kernel of size k, and the filtered pixel value is:

$$I'(x, y) = \sum_{i}\sum_{j} w(i, j)\, I(x+i,\, y+j) \tag{1}$$

where I(x+i, y+j) is a pixel value in the neighborhood, w(i, j) is the corresponding weight of the Gaussian kernel, and the sums run over the k × k kernel window;

finally, adjusting the brightness of the image;

step S32: for a given weakly enhanced unlabeled image, randomly selecting the size and position of a cropping window;

moving the cropping window upward and leftward by a certain distance to obtain two new images with an overlapping region, which are used to train the model;

Step S33: scaling all images to a uniform size using the bicubic interpolation algorithm, and scaling the corresponding labels to the same size using the bilinear interpolation algorithm, so that the input images meet the input specification of the DeepLab v3+ network.
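Steps S31 and S32 can be illustrated with a minimal pure-Python sketch. It is only an illustration under stated assumptions, not the implementation of the invention: the function names, the `sigma` and `factor` parameters, the border clamping and the `max_shift` bound are all hypothetical choices that the text does not specify.

```python
import math
import random

def gaussian_kernel(k, sigma=1.0):
    """Normalized k x k Gaussian kernel (k odd), as a list of rows."""
    r = k // 2
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-r, r + 1)] for i in range(-r, r + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def gaussian_filter(image, k=3, sigma=1.0):
    """Replace each pixel by the Gaussian-weighted average of its k x k
    neighborhood; coordinates outside the image are clamped to the border
    (an assumption, the text does not say how borders are handled)."""
    h, w = len(image), len(image[0])
    r = k // 2
    kernel = gaussian_kernel(k, sigma)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    yy = min(max(y + i, 0), h - 1)
                    xx = min(max(x + j, 0), w - 1)
                    acc += kernel[i + r][j + r] * image[yy][xx]
            out[y][x] = acc
    return out

def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clip to the 8-bit range [0, 255]."""
    return [[min(255.0, max(0.0, v * factor)) for v in row] for row in image]

def overlapping_crops(height, width, crop_h, crop_w, max_shift, rng=random):
    """Return two crop boxes (top, left, bottom, right) that share a region.

    The second window is the first one moved up and to the left by a random
    offset smaller than the crop size, so the two boxes always overlap.
    """
    if crop_h > height or crop_w > width:
        raise ValueError("crop window larger than image")
    dy = rng.randint(0, min(max_shift, crop_h - 1, height - crop_h))
    dx = rng.randint(0, min(max_shift, crop_w - 1, width - crop_w))
    # Place the first window so that the shifted window stays inside the image.
    top1 = rng.randint(dy, height - crop_h)
    left1 = rng.randint(dx, width - crop_w)
    box1 = (top1, left1, top1 + crop_h, left1 + crop_w)
    box2 = (top1 - dy, left1 - dx, top1 - dy + crop_h, left1 - dx + crop_w)
    return box1, box2

def crop(image, box):
    """Cut the region described by `box` out of a 2-D list image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

In use, an unlabeled image would first go through `gaussian_filter` and `adjust_brightness`, and both boxes returned by `overlapping_crops` would then be passed to `crop` to produce the two overlapping views.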

As a further limitation of the technical solution of the present invention, the training process on the labeled image data comprises:

using the encoded matrices of the labeled images and their labels, training the two backbone network models, the CNN and the Transformer, separately, and calculating the loss function with respect to the real labels.

As a further limitation of the technical solution of the invention, the step of calculating the loss function with respect to the real labels comprises:

inputting the labeled data into the CNN to obtain the predicted probability for each pixel, inputting the labeled data into the Transformer to obtain the corresponding predicted probability, and calculating the loss function between each prediction and the corresponding ground-truth value;

the loss function between the predictions and the corresponding ground-truth values is calculated as follows:

compared against the real labels, the loss function of the CNN part is shown in formula (2):

（2）

the loss function of the Transformer part is shown in formula (3):

（3）

wherein the method comprises the steps ofRepresenting the ReLU activation function, ">Representing Focal Loss, the expression is shown in formula (4):

（4）

wherein the method comprises the steps ofAnd->Is a super parameter, here set as +.>=0.25,=2；

The total loss calculation of the supervised learning model is shown in the formula (5):

（5）

where the label indicator equals 1 for the real label and 0 otherwise, and the predicted probability is a real number between 0 and 1 that indicates the probability that the image belongs to the class given in the label.

As a further limitation of the technical solution of the present invention, the training process on the unlabeled image data comprises:

inputting the two groups of randomly cropped, weakly augmented unlabeled images, obtaining a prediction through the CNN network framework, using it as the pseudo label for the prediction of the Transformer model, and calculating the consistency regularization loss;

obtaining a prediction through the Transformer network framework and using it as the pseudo label for the intermediate projection to calculate the context-aware consistency loss, which ensures the interactive transfer of local image information and global context information during training, so that the model fully learns the consistency regularization capability.

As a further limitation of the technical solution of the invention, in the training process on the unlabeled images, the two groups of weakly augmented unlabeled input images are passed through the CNN model framework to generate two groups of predicted values; similarly, two groups of predicted values are generated through the Transformer model framework:

（6）

where the two mapping functions in formula (6) represent the CNN network model and the Transformer network model, respectively;

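the pseudo labels corresponding to the four groups of predicted values are calculated as shown in formula (7):

$$\hat{y} = \arg\max_{c}\; p(c) \tag{7}$$

where p(c) is the predicted probability of class c at a pixel, and the arg max selects the label with the largest predicted probability; the same rule is applied to each of the four groups of predicted values;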
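The pseudo-label rule, taking the label with the largest predicted probability at each pixel, can be sketched as below. The nested-list layout `prob_map[row][col][class]` and the function name are assumptions made for the illustration.

```python
def pseudo_labels(prob_map):
    """Return the per-pixel label with the largest predicted probability.

    prob_map[row][col] is a list of class probabilities for that pixel;
    the result has the same height and width and holds class indices.
    """
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in prob_map]
```

For example, a pixel with probabilities `[0.1, 0.9]` receives pseudo label 1, and a pixel with `[0.7, 0.3]` receives pseudo label 0.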

the CNN prediction is used as the pseudo label for the Transformer, and the consistency regularization loss is calculated as shown in formula (8):

（8）

where the activation function is the ReLU activation function and the loss term is the Dice loss function;
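The idea of supervising the Transformer prediction with CNN pseudo labels through a Dice loss can be sketched as follows. This is not formula (8) itself, which is not reproduced in this text: the sketch assumes a binary foreground/background setting, a soft Dice loss with a smoothing constant `eps`, and a 0.5 cut-off for the pseudo labels, and it omits the ReLU activation mentioned above. All names are hypothetical.

```python
def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss between flat foreground probabilities and binary targets."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def consistency_regularization(transformer_probs, cnn_probs, threshold=0.5):
    """Dice loss of the Transformer prediction against CNN pseudo labels.

    The CNN probabilities are hardened into 0/1 pseudo labels (for two
    classes this matches the arg-max rule) and treated as fixed targets.
    """
    pseudo = [1.0 if p > threshold else 0.0 for p in cnn_probs]
    return dice_loss(transformer_probs, pseudo)
```

The loss is close to 0 when the Transformer agrees with the CNN pseudo labels and approaches 1 when the two predictions do not overlap at all.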

the two cropped images are passed through the encoder of DeepLab v3+ to obtain their feature maps, which are then projected by a nonlinear projector to obtain the projected features; using the directional contrast loss function, the features of the overlapping region are encouraged to align toward the higher-confidence counterpart under different backgrounds, so that they ultimately remain consistent;

for a given unlabeled image, the directional contrast loss is calculated as follows:

（9）

（10）

（11）

where N represents the number of spatial positions in the overlapping feature region; the similarity function computes the feature similarity; h and w represent a two-dimensional spatial position; and the remaining symbols denote the set of negative samples, an individual negative sample of the feature at that position, and the classifier;
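A simplified sketch of a directional contrastive loss over the overlapping region is given below. Formulas (9) to (11) are not reproduced in this text, so every detail here is an assumption: the cosine similarity with temperature `tau`, the InfoNCE-style ratio, the confidence-based direction mask and all function names are illustrative choices in the spirit of the description, not the formulas of the invention.

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def sim(a, b, tau=0.1):
    """Exponentiated, temperature-scaled cosine similarity."""
    return math.exp(cosine(a, b) / tau)

def directional_term(feats_a, feats_b, conf_a, conf_b, negatives, tau=0.1):
    """Pull feats_a toward feats_b only where view b is more confident.

    feats_a[i] and feats_b[i] are projected features at the same position
    of the overlapping region; conf_a[i] and conf_b[i] are the classifier
    confidences there; `negatives` is a list of negative feature vectors.
    """
    total = 0.0
    for fa, fb, ca, cb in zip(feats_a, feats_b, conf_a, conf_b):
        if ca < cb:  # directional mask: align toward the confident view
            pos = sim(fa, fb, tau)
            neg = sum(sim(fa, fn, tau) for fn in negatives)
            total += -math.log(pos / (pos + neg))
    return total / len(feats_a)

def directional_contrast_loss(f1, f2, c1, c2, negatives, tau=0.1):
    """Sum of both directions, so each position is aligned at most once."""
    return (directional_term(f1, f2, c1, c2, negatives, tau)
            + directional_term(f2, f1, c2, c1, negatives, tau))
```

The mask is what makes the loss directional: a low-confidence feature is moved toward its high-confidence counterpart, never the other way around.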

using the Transformer prediction as the pseudo label, the consistency loss function under the context-aware constraint is calculated as follows:

（12）

where the two symbols in formula (12) represent the Dice loss function and the classifier, respectively.

As a further limitation of the technical solution of the invention, the overall loss is minimized: in the training process, the parameters of the classification network are first initialized; forward propagation and backward propagation are then performed with the training data, the gradients of the loss function are calculated, and the network parameters are updated with an optimization algorithm such as gradient descent until the overall loss function reaches a preset convergence condition;

the overall loss is composed of the total loss of the supervised learning model, the consistency regularization loss, the directional contrast loss and the context-aware consistency loss, and is calculated as follows:

（13）

where the two weight factors control the proportions of the directional contrast loss and the consistency regularization loss in the overall loss function.
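How the four loss terms and the two weight factors might be combined, together with a simple convergence check, can be sketched as follows. Formula (13) is not reproduced in this text, so the exact combination is an assumption: the sketch weights only the directional contrast loss and the consistency regularization loss, as the description of the weight factors suggests. The default weights, the tolerance and the `patience` parameter are hypothetical.

```python
def overall_loss(l_sup, l_cr, l_dc, l_ca, lambda_dc=0.5, lambda_cr=0.5):
    """Assumed combination of the four loss terms.

    l_sup: supervised loss, l_cr: consistency regularization loss,
    l_dc: directional contrast loss, l_ca: context-aware consistency loss.
    """
    return l_sup + lambda_dc * l_dc + lambda_cr * l_cr + l_ca

def has_converged(loss_history, tol=1e-4, patience=3):
    """True when the last `patience` changes of the loss are all below `tol`."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))
```

A training loop would append the value of `overall_loss` after each update step and stop once `has_converged` returns true, which is one possible reading of the preset convergence condition.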

As a further limitation of the technical solution of the present invention, step S40 specifically comprises model testing: for a given test-set farmland remote sensing image, the CNN is used as the backbone network model to extract features, the image is segmented, the probability of the class to which each pixel of the target image belongs is output, and a threshold is set to mark each pixel as segmentation target or background.

As a further limitation of the present invention, the step of setting a threshold to mark each pixel as segmentation target or background comprises: obtaining the maximum and minimum gray values of the image output by the segmentation model and setting the initial threshold to their average; dividing the image into foreground and background according to the current threshold and computing the average gray value of each; taking the average of these two values as the new threshold; and iterating until the threshold no longer changes. The resulting value is the final threshold: pixels whose predicted probability is greater than the threshold are marked as foreground and pixels whose predicted probability is smaller are marked as background, which gives the final segmentation mask.
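The iterative threshold selection can be sketched as follows, assuming the common form of the method in which the initial threshold is the midpoint of the minimum and maximum values and each new threshold is the midpoint of the foreground and background means. The tolerance `tol`, the iteration cap and the function names are assumptions.

```python
def iterative_threshold(values, tol=1e-6, max_iter=100):
    """Iteratively refine a threshold over a flat list of pixel values."""
    t = (max(values) + min(values)) / 2.0
    for _ in range(max_iter):
        fg = [v for v in values if v > t]
        bg = [v for v in values if v <= t]
        if not fg or not bg:  # all values fall on one side, nothing to refine
            break
        new_t = (sum(fg) / len(fg) + sum(bg) / len(bg)) / 2.0
        if abs(new_t - t) < tol:
            t = new_t
            break
        t = new_t
    return t

def segment(prob_map, t):
    """Mark pixels above the threshold as foreground (1), others as background (0)."""
    return [[1 if p > t else 0 for p in row] for row in prob_map]
```

In use, the per-pixel foreground probabilities of a test image would be flattened, passed to `iterative_threshold`, and the returned value handed to `segment` to obtain the final mask.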

Compared with the prior art, the farmland remote sensing image segmentation method based on semi-supervised interactive learning has the beneficial effects that:

First, a direction-aware consistency constraint module is inserted into the interactive learning network of the CNN and the Transformer. The CNN excels at image processing: it can effectively extract the spatial features in farmland remote sensing images and captures local and global features through convolution and pooling operations. The Transformer excels in the field of natural language processing and is good at handling sequence data and long-distance dependencies. Combining the strength of the CNN in spatial feature extraction with the strength of the Transformer in long-distance dependency modeling allows the features in farmland remote sensing images to be better extracted and modeled, which improves segmentation accuracy;

Second, the Transformer uses a self-attention mechanism in the encoder-decoder framework and can effectively capture the context information in the image, which is very important for accurate farmland remote sensing image segmentation: because crops and background in the image often have wide spatial correlation, the Transformer can better model the relationships between pixels, thereby improving segmentation accuracy. On the other hand, farmland remote sensing images usually have high resolution, detail information needs to be accurately recovered during segmentation, and a traditional CNN decoder has high computation and memory requirements on high-resolution images; by introducing a Transformer layer between the encoder and the decoder, the invention can gradually restore the image resolution and reduce the consumption of computation and memory resources, thereby processing high-resolution images more effectively.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention.

FIG. 1 is a system architecture diagram of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;

FIG. 2 is a flow chart of an implementation of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;

FIG. 3 is a sub-flow of a farmland remote sensing image segmentation method based on semi-supervised interactive learning;

FIG. 4 is a block diagram of a farmland remote sensing image segmentation system provided by the invention;

FIG. 5 is a block diagram of a computer device according to the present invention.

Detailed Description

The present application will be further described with reference to the drawings and detailed description, which should be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

It should be noted that, in the embodiments of the present invention, all the expressions "first" and "second" are used to distinguish two non-identical entities with the same name or non-identical parameters, and it is noted that the "first" and "second" are only used for convenience of expression, and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such as a process, method, system, article, or other step or unit that comprises a list of steps or units.

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.

Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

At present, existing frameworks for improving the generalization capability and robustness of semi-supervised agricultural image segmentation algorithms fall into two main types: agricultural image segmentation methods based on a Convolutional Neural Network (CNN) and agricultural image segmentation methods based on a Transformer. The former, the CNN, extracts features in image space by convolution operations and has the following disadvantage: a CNN uses local receptive fields when processing images and gradually reduces the image resolution from lower to higher layers through convolution and pooling operations; this local receptive field limitation may lead to the loss of detail information and global context information in the image, particularly in fine-grained segmentation tasks over large-scale farmland areas. The latter, the Transformer, models global relationships in sequence space by self-attention mechanisms and has the following disadvantage: the goal of the Transformer is to model the dependency between each pixel and all other pixels through global context information, so it is limited in processing local features. In a farmland remote sensing image, different crops or land types may appear at different scales, and some fine feature details require finer perception; a Transformer may fail to capture these details accurately when processing features of different scales, which reduces the accuracy and robustness of the segmentation result.

To solve these problems, the invention designs a farmland remote sensing image segmentation method based on semi-supervised interactive learning. First, the CNN and the Transformer cooperate through interactive learning and pass the local features and global features of pixels to each other through self-supervised training on unlabeled data, which reduces the need for labeled data while effectively avoiding the possible defects of the two existing methods. Second, a directional contrast loss function is introduced into the CNN, and fully supervised training is performed on the labeled data to ensure that features of the same identity remain consistent across images of different scenes, thereby improving the generalization capability and robustness of the model.

Specific implementations of the invention are described in detail below in connection with specific embodiments.

Example 1

FIG. 1 illustrates an exemplary system architecture for implementing a semi-supervised interactive learning based farmland remote sensing image segmentation method.

FIG. 2 shows the implementation flow of the farmland remote sensing image segmentation method based on semi-supervised interactive learning;

as shown in fig. 1 and fig. 2, in an embodiment of the present invention, a farmland remote sensing image segmentation method based on semi-supervised interactive learning includes the following steps:

Step S10: dividing the input images into M labeled images and N unlabeled images;

Step S30: applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image, that is, randomly cropping each of the N unlabeled images to obtain two groups of new images with overlapping regions, and unifying the sizes of all images; meanwhile, projecting the pixel features of the unlabeled images between the encoder and the decoder of the CNN and introducing a directional contrast loss function to ensure that features of the same identity remain consistent across images of different scenes; using the Transformer prediction as a pseudo label to calculate the context-aware consistency loss; and using the CNN prediction as a pseudo label for the Transformer prediction to calculate the consistency regularization loss.

Further, as shown in FIG. 3, in step S30, the step of applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image comprises:

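Step S31: applying Gaussian filtering to reduce noise and detail in the unlabeled image by taking a weighted average over the neighborhood of each pixel; for each pixel (x, y), filtering is performed with a Gaussian kernel of size k, and the filtered pixel value is:

$$I'(x, y) = \sum_{i}\sum_{j} w(i, j)\, I(x+i,\, y+j) \tag{1}$$

where I(x+i, y+j) is a pixel value in the neighborhood, w(i, j) is the corresponding weight of the Gaussian kernel, and the sums run over the k × k kernel window;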

finally, adjusting the brightness of the image;

Step S33: scaling all images to a size of 513 px × 513 px using the bicubic interpolation algorithm, and scaling the corresponding labels to the same size, so that the input images meet the input specification of the DeepLab v3+ network.

Further, in an embodiment of the present invention, the training process on the labeled image data comprises:

inputting the labeled data into the CNN to obtain the predicted probability for each pixel, inputting the labeled data into the Transformer to obtain the corresponding predicted probability, and calculating the loss function between each prediction and the corresponding ground-truth value;

（2）

the loss function of the Transformer part is shown in formula (3):

（3）

（4）

（5）

Further, in an embodiment of the present invention, the training process on the unlabeled image data comprises:

Further, in the embodiment of the invention, during training on the unlabeled images, the two backbone network models focus on learning local features and global features, respectively, and information interaction is used to transfer feature knowledge so that each compensates for the other's weaknesses; meanwhile, to ensure that the CNN module has good robustness and generalization capability when only a small amount of data is available, a direction-aware consistency constraint is introduced; specifically:

the two groups of weakly augmented unlabeled input images are passed through the CNN model framework to generate two groups of predicted values; similarly, two groups of predicted values are generated through the Transformer model framework:

（6）

The CNN is good at capturing local features and spatial correlation in image processing and extracts the local structure of the image through convolution operations over local receptive fields;

the Transformer is better suited to modeling global dependencies and long-range relationships and can establish global information interaction across the whole input sequence through the self-attention mechanism;

thus, these predictions have essentially different properties at the output level; the pseudo labels corresponding to the four groups of predicted values are calculated as shown in formula (7):

$$\hat{y} = \arg\max_{c}\; p(c) \tag{7}$$

where p(c) is the predicted probability of class c at a pixel, and the arg max selects the label with the largest predicted probability;

（8）

for a given unlabeled image, the directional contrast loss is calculated as follows:

（9）

（10）

（11）

where N represents the number of spatial positions in the overlapping feature region; the similarity function computes the feature similarity; h and w represent a two-dimensional spatial position; and the remaining symbols denote the set of negative samples, an individual negative sample of the feature at that position, and the classifier;

（12）

where the two symbols in formula (12) represent the Dice loss function and the classifier, respectively.

Further, in the embodiment of the present invention, the overall loss is minimized: in the training process, the parameters of the classification network are first initialized; forward propagation and backward propagation are then performed with the training data, the gradients of the loss function are calculated, and the network parameters are updated with an optimization algorithm such as gradient descent until the overall loss function reaches a preset convergence condition;

by iteratively updating the network parameters, the classification network is expected to learn a suitable feature representation that minimizes the difference between the predicted result and the real label;

（13）

As a further limitation of the present invention, the step of setting a threshold to mark each pixel as segmentation target or background comprises: obtaining the maximum and minimum gray values of the image output by the segmentation model and setting the initial threshold to their average; dividing the image into foreground and background according to the current threshold and computing the average gray value of each; taking the average of these two values as the new threshold; and iterating until the threshold no longer changes. The resulting value is the final threshold: pixels whose predicted probability is greater than the threshold are marked as foreground and pixels whose predicted probability is smaller are marked as background, which gives the final segmentation mask.

In summary, the invention inserts a direction-aware consistency constraint module into the interactive learning network of the CNN and the Transformer. The CNN excels at image processing: it can effectively extract the spatial features in farmland remote sensing images and captures local and global features through convolution and pooling operations. The Transformer excels in the field of natural language processing and is good at handling sequence data and long-distance dependencies. Combining the strength of the CNN in spatial feature extraction with the strength of the Transformer in long-distance dependency modeling allows the features in farmland remote sensing images to be better extracted and modeled, which improves segmentation accuracy.

In addition, the Transformer uses a self-attention mechanism in the encoder-decoder framework and can effectively capture the context information in the image, which is very important for accurate farmland remote sensing image segmentation: because crops and background in the image often have wide spatial correlation, the Transformer can better model the relationships between pixels, thereby improving segmentation accuracy;

on the other hand, farmland remote sensing images usually have high resolution, detail information needs to be accurately recovered during segmentation, and a traditional CNN decoder has high computation and memory requirements on high-resolution images; by introducing a Transformer layer between the encoder and the decoder, the invention can gradually restore the image resolution and reduce the consumption of computation and memory resources, thereby processing high-resolution images more effectively.

Example 2

As shown in fig. 4, in an exemplary embodiment provided by the present disclosure, the present invention further provides a farmland remote sensing image segmentation system, the farmland remote sensing image segmentation system 50 includes:

a preprocessing module 51, configured to divide the input images into M labeled images and N unlabeled images;

a first training module 52, configured to train the CNN and the Transformer separately on the labeled image data;

a second training module 53, configured to apply weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly crop two new images with an overlapping region from the same image, that is, to randomly crop each of the N unlabeled images to obtain two groups of new images with overlapping regions, and to unify the sizes of all images; meanwhile, the pixel features of the unlabeled images are projected between the encoder and the decoder of the CNN and a directional contrast loss function is introduced to ensure that features of the same identity remain consistent across images of different scenes; the Transformer prediction is used as a pseudo label to calculate the context-aware consistency loss; and the CNN prediction is used as a pseudo label for the Transformer prediction to calculate the consistency regularization loss;

a model test module 54, configured to segment the test-set images using the trained CNN model as the backbone network and to evaluate the accuracy of the result.

Example 3

As shown in fig. 5, in an embodiment of the present invention, the present invention further provides a computer device.

The computer device 60 comprises a memory 61, a processor 62 and computer-readable instructions stored in the memory 61 and executable on the processor 62; when executing the computer-readable instructions, the processor 62 implements the farmland remote sensing image segmentation method based on semi-supervised interactive learning provided by Embodiment 1.

The farmland remote sensing image segmentation method based on semi-supervised interactive learning comprises the following steps:

Step S10: dividing the input images into M labeled images and N unlabeled images;

Step S30: applying weak augmentation (Gaussian filtering and brightness adjustment) to the unlabeled images and randomly cropping two new images with an overlapping region from the same image, that is, randomly cropping each of the N unlabeled images to obtain two groups of new images with overlapping regions, and unifying the sizes of all images; meanwhile, projecting the pixel features of the unlabeled images between the encoder and the decoder of the CNN and introducing a directional contrast loss function to ensure that features of the same identity remain consistent across images of different scenes; using the Transformer prediction as a pseudo label to calculate the context-aware consistency loss; and using the CNN prediction as a pseudo label for the Transformer prediction to calculate the consistency regularization loss.

In addition, the device 60 according to the embodiment of the present invention may further have a communication interface 63 for receiving a control command.

Example 4

In an exemplary embodiment provided by the present disclosure, a computer-readable storage medium is also provided.

Specifically, in an exemplary embodiment of the present disclosure, the storage medium stores computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the farmland remote sensing image segmentation method based on semi-supervised interactive learning as provided by embodiment 1:

step S10: dividing the input images into M labeled images and N unlabeled images;

step S30: performing weak enhancement processing of Gaussian filtering and brightness adjustment on the unlabeled images, and randomly cropping two new images with an overlapping area from the same image, i.e. each of the N unlabeled images is randomly cropped to obtain two groups of new images with overlapping areas, and the sizes of all images are unified; meanwhile, the pixels of the unlabeled images are projected between the encoder and the decoder of the CNN, and a directional contrast loss function is introduced to ensure the consistency of the same identity features in images under different scenes; the Transformer prediction result is used as a pseudo label to calculate the context-aware consistency loss; and the CNN prediction result is used as a pseudo label for the Transformer prediction result to calculate the consistency regularization loss.
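The weak enhancement and overlapping-crop part of step S30 can be illustrated with a minimal Python sketch. It is not the patented implementation: the kernel size, sigma, brightness factor, crop size and all function names (`gaussian_filter`, `adjust_brightness`, `overlapping_crops`) are assumptions chosen for illustration, and the image is a plain grayscale list of rows with values in [0, 1].

```python
import math
import random

def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def gaussian_filter(image, size=3, sigma=1.0):
    """Blur a 2-D grayscale image, replicating edge pixels at the border."""
    h, w = len(image), len(image[0])
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), h - 1)
                    cc = min(max(c + dx, 0), w - 1)
                    acc += kernel[dy + half][dx + half] * image[rr][cc]
            out[r][c] = acc
    return out

def adjust_brightness(image, factor):
    """Scale pixel values by factor and clip the result to [0, 1]."""
    return [[min(max(v * factor, 0.0), 1.0) for v in row] for row in image]

def overlapping_crops(image, crop, rng):
    """Cut two crop x crop windows whose origins differ by less than crop,
    so the two windows always share at least one pixel."""
    h, w = len(image), len(image[0])
    top1 = rng.randint(0, h - crop)
    left1 = rng.randint(0, w - crop)
    top2 = rng.randint(max(0, top1 - crop + 1), min(h - crop, top1 + crop - 1))
    left2 = rng.randint(max(0, left1 - crop + 1), min(w - crop, left1 + crop - 1))

    def cut(top, left):
        return [row[left:left + crop] for row in image[top:top + crop]]

    return (cut(top1, left1), (top1, left1)), (cut(top2, left2), (top2, left2))

# Usage: weakly enhance one synthetic 16 x 16 image, then take two 8 x 8 crops.
rng = random.Random(0)
image = [[rng.random() for _ in range(16)] for _ in range(16)]
weak = adjust_brightness(gaussian_filter(image), factor=1.1)
(crop_a, origin_a), (crop_b, origin_b) = overlapping_crops(weak, crop=8, rng=rng)
```

Because both crops have the same size, no further resizing is needed in this sketch; the source only states that the sizes of all images are unified.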

In various embodiments of the present invention, it should be understood that the size of the sequence numbers of the processes does not mean that the execution sequence of the processes is necessarily sequential, and the execution sequence of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc., and in particular may be a processor in a computer device) to execute some or all of the steps of the method according to the embodiments of the present invention.

Those of ordinary skill in the art will appreciate that some or all of the steps of the various methods of the described embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, tape memory, or any other medium capable of being used to carry or store data.

The farmland remote sensing image segmentation method based on semi-supervised interactive learning disclosed in the embodiments of the present invention has been described in detail above. Specific examples are used to explain the principle and implementation of the invention, and the description of the above examples is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims

1. A farmland remote sensing image segmentation method based on semi-supervised interactive learning is characterized by comprising the following steps:

step S10: dividing the input images into M labeled images and N unlabeled images;

2. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 1, wherein the step of performing weak enhancement processing of Gaussian filtering and brightness adjustment on the unlabeled images, and randomly cropping two new images with an overlapping area from the same image, comprises:

（1）

finally, adjusting the brightness of the image;

3. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 2, wherein the training process of the tagged image data comprises:

4. The farmland remote sensing image segmentation method based on semi-supervised interactive learning as set forth in claim 3, wherein the loss function with respect to the real labels is calculated as follows:

compared with the real labels, the loss function of the CNN part is shown in formula (2):

（2）

the loss function of the Transformer part is shown in formula (3):

（3）

（4）

（5）

wherein the label indicator equals 1 for the real label and 0 otherwise; the predicted value is a real number ranging from 0 to 1, indicating the probability that the image belongs to the category noted in the label.
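Formulas (2) to (5) are not recoverable from this text, but the definitions above (an indicator that is 1 for the real label and 0 otherwise, and a predicted probability between 0 and 1) match the usual cross-entropy form. The following Python sketch shows that standard loss under this assumption; the function name `cross_entropy`, the per-pixel data layout and the `eps` guard are illustrative choices, not taken from the patent.

```python
import math

def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean cross-entropy over pixels.

    probabilities[i][k] is the predicted probability of class k at pixel i;
    labels[i] is the index of the real class at pixel i. Because the label
    indicator is 1 only for the real class, the sum over classes reduces to
    the log-probability of the real class.
    """
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(max(probs[label], eps))  # eps avoids log(0)
    return total / len(labels)

# Usage: two pixels, two classes (background = 0, farmland = 1).
predictions = [[0.9, 0.1], [0.2, 0.8]]
real_labels = [0, 1]
loss = cross_entropy(predictions, real_labels)
```

The same function could be applied to the CNN output and to the Transformer output separately, which is one plausible reading of having one supervised loss per branch.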

5. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 4, wherein the training process of the unlabeled image data comprises:

obtaining a prediction result through the Transformer network framework and using it as the pseudo label of the intermediate projection to calculate the context-aware consistency loss.

6. The farmland remote sensing image segmentation method based on semi-supervised interactive learning according to claim 5, characterized in that, in the unlabeled image training process, the two groups of weakly enhanced unlabeled images are input into the CNN to generate two groups of predicted values; similarly, two groups of predicted values are generated via the Transformer model framework:

（6）

the four pseudo labels are calculated as shown in formula (7):

（7）

（8）

（9）

（10）

（11）

（12）

wherein one symbol in the above formulas represents the Dice loss function and another represents the classifier.
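Since formulas (6) to (12) are missing from this text, the following Python sketch only illustrates the general idea named in the claim: hard pseudo labels are taken from one network's prediction and used with a Dice loss to supervise the other network. The soft Dice formulation, the argmax pseudo-labeling and the names `argmax_labels`, `dice_loss` and `pseudo_label_loss` are assumptions, not the patent's exact definitions.

```python
def argmax_labels(probabilities):
    """Turn per-pixel class probabilities into hard pseudo labels."""
    return [max(range(len(p)), key=p.__getitem__) for p in probabilities]

def dice_loss(probabilities, labels, num_classes, eps=1e-6):
    """Soft Dice loss averaged over classes (0 means perfect overlap)."""
    loss = 0.0
    for k in range(num_classes):
        intersection = sum(p[k] for p, y in zip(probabilities, labels) if y == k)
        pred_sum = sum(p[k] for p in probabilities)
        true_sum = sum(1 for y in labels if y == k)
        loss += 1.0 - (2.0 * intersection + eps) / (pred_sum + true_sum + eps)
    return loss / num_classes

def pseudo_label_loss(student_probs, teacher_probs, num_classes):
    """Dice loss of the student prediction against the teacher's pseudo labels."""
    return dice_loss(student_probs, argmax_labels(teacher_probs), num_classes)

# Usage: the CNN prediction acts as pseudo label for the Transformer prediction.
cnn_probs = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
transformer_probs = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
consistency = pseudo_label_loss(transformer_probs, cnn_probs, num_classes=2)
```

Swapping the two arguments gives the opposite direction, in which the Transformer prediction supplies the pseudo label.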

7. The method of claim 6, further comprising minimizing the overall loss: initializing the parameters of the classification network during training, then performing forward and backward propagation using the training data, calculating the gradients of the loss function, and updating the network parameters using an optimization algorithm such as gradient descent until the overall loss function reaches a preset convergence condition;

（13）

wherein the two weight factors control the proportions of the directional contrast loss and the consistency regularization loss in the total loss function.
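Formula (13) is not recoverable from this text. As a hedged illustration of claim 7, the Python sketch below assumes the total loss is a supervised term plus the two weighted unsupervised terms, and minimizes a toy version of it by gradient descent until a preset convergence tolerance is met. The quadratic stand-in losses, the weights of 0.5, the learning rate and the names `total_loss` and `gradient_descent` are all hypothetical.

```python
def total_loss(supervised, directional_contrast, consistency, w_dc, w_cr):
    """Weighted sum of the loss terms (assumed composition of formula (13))."""
    return supervised + w_dc * directional_contrast + w_cr * consistency

def gradient_descent(loss_fn, grad_fn, param, lr=0.1, tol=1e-8, max_steps=1000):
    """Update a scalar parameter until the loss change falls below tol."""
    previous = loss_fn(param)
    for _ in range(max_steps):
        param -= lr * grad_fn(param)
        current = loss_fn(param)
        if abs(previous - current) < tol:  # preset convergence condition
            break
        previous = current
    return param

# Toy stand-ins: each loss term is a quadratic in a single parameter theta.
W_DC, W_CR = 0.5, 0.5

def toy_loss(theta):
    return total_loss((theta - 1.0) ** 2, (theta - 2.0) ** 2,
                      (theta - 3.0) ** 2, W_DC, W_CR)

def toy_grad(theta):
    # Derivative of toy_loss with respect to theta.
    return (2.0 * (theta - 1.0) + W_DC * 2.0 * (theta - 2.0)
            + W_CR * 2.0 * (theta - 3.0))

theta_star = gradient_descent(toy_loss, toy_grad, param=0.0)
```

In a real network the scalar parameter would be replaced by the model weights and the gradient would come from backpropagation; the stopping rule is the part this sketch is meant to show.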

8. The method for segmenting the farmland remote sensing image based on semi-supervised interactive learning according to claim 7, wherein the step S40 specifically comprises model testing, extracting features of the farmland remote sensing image of a given test set by using CNN as a backbone network model, segmenting and outputting class probability of each pixel point of the target image, and setting a threshold value to mark the pixel point as a segmentation target or background.

9. The method for segmenting the farmland remote sensing image based on semi-supervised interactive learning according to claim 8, wherein the step of setting a threshold value to mark the pixel point as a segmentation target or background comprises the following steps: obtaining the maximum gray value and the minimum gray value of the image output by the segmentation model, and setting an initial threshold from them; dividing the image into a foreground and a background according to the current threshold, and obtaining the average gray values of the foreground and the background respectively; finding a new threshold from the two average gray values; iterating until the stopping condition is met, the value then obtained being the threshold; pixels whose prediction probability is larger than the threshold are marked as foreground and pixels whose prediction probability is smaller than the threshold are marked as background, so as to obtain the final segmentation mask.
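The thresholding procedure of claim 9 resembles the classical iterative mean-based threshold selection. The Python sketch below follows that classical scheme as an assumption, because the exact formulas are missing from this text: the initial threshold is taken as the midpoint of the maximum and minimum values, the new threshold as the midpoint of the foreground and background means, and iteration stops when the threshold changes by less than a tolerance. The names `iterative_threshold` and `to_mask` are illustrative.

```python
def iterative_threshold(values, tol=1e-6, max_iter=100):
    """Select a threshold by iterating on foreground/background means."""
    threshold = (max(values) + min(values)) / 2.0  # assumed initial threshold
    for _ in range(max_iter):
        foreground = [v for v in values if v > threshold]
        background = [v for v in values if v <= threshold]
        if not foreground or not background:
            break  # all values fall on one side; keep the current threshold
        mean_fg = sum(foreground) / len(foreground)
        mean_bg = sum(background) / len(background)
        new_threshold = (mean_fg + mean_bg) / 2.0  # assumed update rule
        converged = abs(new_threshold - threshold) < tol
        threshold = new_threshold
        if converged:
            break
    return threshold

def to_mask(prob_map, threshold):
    """Mark pixels above the threshold as foreground (1), others as background (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]

# Usage: a 2 x 2 probability map from a hypothetical segmentation model.
prob_map = [[0.1, 0.9], [0.8, 0.2]]
flat = [p for row in prob_map for p in row]
threshold = iterative_threshold(flat)
mask = to_mask(prob_map, threshold)
```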