CN117011699A - GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof - Google Patents

GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Info

Publication number
CN117011699A
CN117011699A
Authority
CN
China
Prior art keywords
network
model
remote sensing
semi
resolution remote
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310760237.9A
Other languages
Chinese (zh)
Inventor
李虎
陈冬花
邹陈
汪左
张乃明
刘赛赛
叶李灶
常竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Normal University
Chuzhou University
Original Assignee
Anhui Normal University
Chuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Normal University, Chuzhou University filed Critical Anhui Normal University
Priority to CN202310760237.9A
Publication of CN117011699A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof, applied to the technical field of remote sensing image recognition. The improved DeepLab V3+ model constructed herein achieves high recognition accuracy for wheat and rape in high-resolution remote sensing imagery, but requires a large number of pixel-by-pixel annotated samples. A discrimination network is therefore constructed from the first three modules and the last three modules of the SegNet network, while the Leaky ReLU activation function of the original discrimination network is retained. Through the adversarial process between the segmentation network and the discrimination network of a semi-supervised semantic segmentation model based on a generative adversarial network, the need for samples is reduced. The model thereby improves crop recognition accuracy while reducing the requirement for labeled samples.

Description

GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof
Technical Field
The invention relates to the technical field of remote sensing image recognition, and in particular to a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof.
Background
Timely and accurate acquisition of crop type distribution information has important influence on agricultural production. The remote sensing data of the high-resolution land satellite can be utilized to describe the spatial distribution information of crops in more detail, so that data support is provided for adjusting planting structures, guaranteeing farmland safety, formulating reasonable grain policies and the like.
Semantic segmentation models based on deep learning require a large number of pixel-level label samples, and annotating so many pixel-level labels consumes enormous manpower and material resources. When a semantic segmentation model is applied to the field of crop identification, a large number of pixel-by-pixel annotated label samples is needed; because crops are planted in scattered, fragmented parcels, constructing a large set of crop label samples is especially costly.
Therefore, applying a semi-supervised semantic segmentation model based on a generative adversarial network to crop recognition, so as to reduce the need for label samples while maintaining recognition accuracy comparable to a fully supervised semantic segmentation model, is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a high-resolution remote sensing crop identification model and an identification method based on a GAN model, so as to solve the problems in the background art.
In order to achieve the above object, the present invention provides the following technical solutions:
in one aspect, a GAN model-based crop identification model for high resolution remote sensing images is disclosed, comprising:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
Preferably, in the above GAN model-based crop identification model for high-resolution remote sensing images, the discrimination network includes an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
In another aspect, the invention discloses a GAN model-based method for identifying crops in high-resolution remote sensing images, which adopts the above high-resolution remote sensing crop identification model and comprises the following specific steps:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
Preferably, in the above method for identifying a crop using a high-resolution remote sensing image based on a GAN model, the calculation formula of the discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
Preferably, in the above method for identifying crops in high-resolution remote sensing images based on a GAN model, the segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
Preferably, in the above method for identifying a crop in a high-resolution remote sensing image based on a GAN model, the calculation formula of the segmentation loss function is as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
Compared with the prior art, the invention discloses a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof. The constructed discrimination network effectively mitigates the errors caused by the loss of edge information due to aggressive upsampling in a full convolution network, and the Leaky ReLU activation function effectively retains negative gradients, thereby better promoting the forward constraint of the discrimination network on the gradient descent direction during training of the segmentation network. Meanwhile, the constructed MyGAN network achieves higher crop identification accuracy under few-sample conditions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a model structure of the present invention;
FIG. 2 is a block diagram of a split network according to the present invention;
FIG. 3 is a diagram of a discrimination network according to the present invention;
FIG. 4 is a schematic diagram of the maximum pooling of the present invention;
FIG. 5 is a graph of MIoU on the validation set for different discrimination network models in accordance with the present invention;
FIG. 6 is a graph showing the comparison of the recognition accuracy of different discrimination network models in a test set according to the present invention;
FIG. 7 (a) is a chart showing the comparison of Baseline and GAN recognition accuracy at a 1/4 sample scale according to the present invention;
FIG. 7 (b) is a chart showing the comparison of Baseline and GAN recognition accuracy at a 1/8 sample scale according to the present invention;
FIG. 8 is a graph of comparison of GAN recognition accuracy at different sample ratios in accordance with the present invention;
FIG. 9 (a) is a graph showing the wheat recognition result at the full sample scale of the present invention;
FIG. 9 (b) is a graph showing the wheat discrimination result at a sample ratio of 1/8 according to the present invention;
FIG. 10 (a) is a graph showing the range of rape planting at the full sample scale of the present invention;
FIG. 10 (b) is a graph showing the range of rape planting at a 1/8 sample scale according to the present invention;
FIG. 11 (a) is a diagram showing the image classification result at full sample scale according to the present invention;
FIG. 11 (b) is a graph showing the image classification result at a 1/8 sample scale according to the present invention;
FIG. 12 is a graph showing the comparison of wheat and rape identification areas at different sample ratios according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a GAN model-based crop identification model for high-resolution remote sensing images which, as shown in FIG. 1, comprises:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
In order to reduce the requirement for label samples while improving crop identification accuracy, and thereby improve the efficiency of automatic remote sensing crop identification, this embodiment applies a semi-supervised semantic segmentation model based on a generative adversarial network to remote sensing crop identification. The constructed MyGAN model is shown in FIG. 1. The segmentation network is the improved DeepLab V3+ network, which experiments have shown to give the best results for identifying wheat and rape in remote sensing images; it is therefore used as the segmentation network of the MyGAN model. In this embodiment, a full convolution network with an encoding-decoding structure is used as the discrimination network of the MyGAN model; its input data comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations. The output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
Further, as shown in FIG. 2, the segmentation network first adds a vegetation index at the input layer to increase the distinction between ground objects. To reduce the amount of computation, reduce memory occupation and improve computation speed, the backbone Xception network is replaced with the lighter-weight MobileNet V2 network. Secondly, to reduce the loss and blurring of wheat and rape edge information caused by the model's repeated downsampling, this embodiment reduces the 16× downsampling to 8× downsampling while adding an upsampling layer. To increase the sensitivity of the model to wheat and rape regions, a dual attention mechanism (Convolutional Block Attention Module, CBAM) is added to the ASPP module and the upsampling layers. Finally, a weighted cross-entropy loss function is introduced to address the sample imbalance among wheat, rape and other ground objects.
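For illustration, a minimal PyTorch sketch of a CBAM block of the kind added to the ASPP module and upsampling layers is given below. It is a sketch only: the reduction ratio, spatial kernel size and activation choices are assumptions, since the embodiment does not specify them.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (a hedged sketch)."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: a shared MLP over global average- and max-pooled maps.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: a convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                  # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))        # spatial attention
```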
Specifically, the segmentation network includes an input layer, an encoding layer, a decoding layer and an output layer. The input layer takes a normalized remote sensing image together with a vegetation index; the encoding layer extracts a multi-scale deep semantic feature map, a first shallow semantic feature map and a second shallow feature map from the input image; the decoding layer combines the deep semantic features with the first shallow semantic features, performs upsampling, and fuses the second shallow feature map to obtain a fused feature map; the fused feature map is resized and output through the output layer.
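As an illustration of the input layer, the sketch below appends a vegetation index band to the normalized image tensor. The choice of NDVI and the band indices are assumptions; the embodiment states only that a vegetation index is added.

```python
import torch

def stack_vegetation_index(image: torch.Tensor, nir: int = 3, red: int = 0,
                           eps: float = 1e-6) -> torch.Tensor:
    """Append an NDVI band to a normalized (B, C, H, W) image tensor.
    `nir` and `red` are assumed band positions, not values from the patent."""
    ndvi = (image[:, nir] - image[:, red]) / (image[:, nir] + image[:, red] + eps)
    return torch.cat([image, ndvi.unsqueeze(1)], dim=1)
```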
Coding layer: the Mobilene V2 network is a depth separable network, and aims to extract deep semantic features of an input image through a multi-layer convolution layer and a pooling layer, and reduce the operation amount through channel-by-channel convolution and point-by-point convolution operations. The output of the mobilet V2 is divided into three, wherein two of the outputs are used for preserving the shallow layer characteristics of the ground object during up-sampling, and the output sizes are respectively 16 x 64 and 24 x 32; the other is used as input of the ASPP module, deep semantic features of multiple scales are further extracted, and the size of the output is 320 x 16.
Decoding layer: the output 256 x 16 of the ASPP module is restored to 256 x 32 through linear upper layer sample, and is combined with the 24 x 32 output by the mobilet V2 through a concentration mechanism and a shallow layer characteristic diagram obtained through a 1*1 convolution layer, and a combined result is convolved through a 3*3 convolution to obtain a fusion characteristic diagram with the output size of 256 x 32.
The resulting 256×32×32 feature map is linearly upsampled to 256×64×64 and combined with the shallow feature map obtained by passing the 16×64×64 MobileNet V2 output through the attention mechanism and a 1×1 convolution layer; the result passes through a 3×3 convolution layer and an upsampling layer to obtain the output feature map of size 256×64×64.
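A minimal sketch of one such decoder fusion step follows; the 48-channel projection width, BatchNorm placement and bilinear interpolation mode are assumptions (the text specifies only attention, a 1×1 projection, concatenation and a 3×3 convolution). It reuses the CBAM sketch given earlier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStage(nn.Module):
    """One decoder step: upsample the deep map, refine and project the shallow
    map, concatenate, and fuse with a 3x3 convolution (a hedged sketch)."""
    def __init__(self, deep_ch: int, shallow_ch: int,
                 proj_ch: int = 48, out_ch: int = 256):
        super().__init__()
        self.attn = CBAM(shallow_ch)                      # CBAM sketch from above
        self.proj = nn.Conv2d(shallow_ch, proj_ch, 1, bias=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(deep_ch + proj_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                             align_corners=False)
        shallow = self.proj(self.attn(shallow))
        return self.fuse(torch.cat([deep, shallow], dim=1))

# Usage matching the sizes quoted above:
# stage1 = FusionStage(256, 24)  # 256x16x16 deep + 24x32x32 shallow -> 256x32x32
# stage2 = FusionStage(256, 16)  # 256x32x32 deep + 16x64x64 shallow -> 256x64x64
```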
In order to further optimize the above technical solution, as shown in FIG. 3, the discrimination network includes an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
The input to the discrimination network falls into two classes: a. true labels encoded as one-hot; b. pseudo labels generated by the segmentation network. The input size is 3×128×128.
Coding layer: the coding layer is divided into 3 modules. The first module and the second module each contain two convolution layers with a convolution kernel 3*3 and padding 1, and one max-pooling layer that retains the max-pooling index. Output sizes are respectively is 64 x 64 128 x 32. The third module corresponds to three convolution layers with a convolution kernel 3*3 and padding 1, and a max-pooling layer that retains the max-pooling index, with an output size of 256 x 16.
Decoding layer: the decoding layer and the encoding layer are in symmetrical structures. The decoding layer also has three modules corresponding to the encoding layer. The first module performs inverse maximum pooling by the maximum pooling index reserved by the coding layer to recover the feature map size, and then outputs 128×32 by the three convolution layers of 3*3 and padding 1. The second module and the third module firstly carry out inverse maximum pooling through the maximum pooling index reserved by the coding layer, restore the size of the characteristic diagram, and then output the size by two convolution layers with convolution kernel 3*3 and padding 1 into 64×64 and 1×128×128.
Output layer: and (3) keeping the output value range of the obtained feature map to be (0, 1) through a sigmoid function, namely indicating whether the input data is derived from a real label or a pseudo label obtained through a segmentation network, wherein the closer to 1 is the greater the probability that the input data is derived from the real label.
The discrimination network of this embodiment is formed by making some modifications to the SegNet network. SegNet upsamples using the max-pooling indices, which effectively reduces the severe loss of edge information caused by aggressive pooling. The modifications made in this embodiment on the basis of the SegNet model are as follows: first, owing to the hardware limitations of the laboratory computer, only the first three and last three modules of SegNet are retained; second, to avoid dead neurons appearing during model training, the Leaky ReLU activation function of the original discrimination network is retained.
The discrimination network pools by max pooling in the encoding layer, as shown in FIG. 4, while retaining the index information of each pooling operation; the decoding layer then completes the nonlinear upsampling of the feature map using the corresponding pooling indices. Since this upsampling only requires the recorded pooling indices, the edge information of the image is well preserved, computer memory is saved, and the running speed of the network improves because no training or learning is needed in the upsampling stage.
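The index-based unpooling can be demonstrated with PyTorch's built-in operators; a minimal illustration, not the patented implementation:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.rand(1, 1, 4, 4)
pooled, idx = pool(x)           # idx records the position of each maximum
restored = unpool(pooled, idx)  # maxima return to their positions, zeros elsewhere
print(restored.shape)           # torch.Size([1, 1, 4, 4])
```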
In another aspect, the embodiment of the invention discloses a GAN model-based crop identification method using the above high-resolution remote sensing image crop identification model, comprising the following specific steps:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
Preferably, in the above method for identifying a crop using a high-resolution remote sensing image based on a GAN model, the calculation formula of the discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
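The discrimination loss can be written as a per-pixel binary cross-entropy that pushes D(Y_n) toward 1 and D(S(X_n)) toward 0; a hedged PyTorch sketch, assuming the standard adversarial-segmentation formulation:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """d_real = D(Y_n), d_fake = D(S(X_n)): per-pixel confidence maps in (0, 1)."""
    loss_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return loss_real + loss_fake
```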
In order to further optimize the above solution, the segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
In order to further optimize the above technical solution, the calculation formula of the segmentation loss function is as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
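A hedged PyTorch sketch of this combined loss follows. The ignore-index convention for unlabeled pixels and the mean reduction are assumptions; the hyperparameter defaults are the values selected later in this embodiment.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, labels, d_fake,
                      lambda_adv=0.01, lambda_semi=0.2, t_semi=0.3):
    """logits: (B, C, H, W) outputs S(X_n); labels: (B, H, W) true annotations
    with 255 marking unlabeled pixels (assumed); d_fake: (B, 1, H, W) = D(S(X_n))."""
    l_ce = F.cross_entropy(logits, labels, ignore_index=255)  # supervised term
    # Adversarial term: reward pixels the discriminator judges as "real".
    l_adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    # Semi-supervised term: self-train on pixels whose confidence exceeds T_semi.
    probs = torch.softmax(logits, dim=1)
    pseudo = probs.argmax(dim=1, keepdim=True)                # hard pseudo labels
    mask = (d_fake > t_semi).float().squeeze(1)               # I(D(S(X_n)) > T_semi)
    l_semi = -(mask * torch.log(probs.gather(1, pseudo).squeeze(1) + 1e-10)).mean()
    return l_ce + lambda_adv * l_adv + lambda_semi * l_semi
```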
Setting model parameters: for the discrimination network, an Adam optimizer is used as the parameter optimizer during training. In order to reduce the oscillation of the discrimination network while accelerating its convergence, this embodiment uses a dynamically decayed learning rate via the StepLR function provided in PyTorch, with the initial learning rate set to 0.01, step_size set to 10 and gamma set to 0.1, meaning that the learning rate is multiplied by 0.1 every 10 iterations. In training the adversarial network, the hyperparameters λ_adv, λ_semi and T_semi are set, after repeated trials, to 0.01, 0.2 and 0.3 respectively.
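These settings map directly onto PyTorch's optimizer and scheduler APIs; a sketch follows, reusing the Discriminator sketch above (the epoch count is an assumption):

```python
import torch
from torch.optim.lr_scheduler import StepLR

discriminator = Discriminator()  # the SegNet-style sketch given earlier
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=0.01)
scheduler_d = StepLR(optimizer_d, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs

for epoch in range(30):          # 30 epochs is an assumed placeholder
    # ... one training epoch of the discriminator ...
    scheduler_d.step()
```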
During semi-supervised training, the labeled samples are drawn as a randomly selected proportion of the sample set constructed in this embodiment, and the remaining samples are used as unlabeled samples to train the model. The advantage of random selection is that it makes the sample data more random and can increase the robustness of the model. In the experiment, the model is first iteratively trained 20 times using only the labeled samples, and the unlabeled samples are then added to the training. The advantage of this is that, when the model performs semi-supervised training (adding the semi-supervised loss), the pseudo labels generated by the segmentation network can already compete with the true labels, so that the parameters of the segmentation network and the discrimination network are updated effectively.
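This two-phase schedule can be summarized in a short sketch; every name below is a hypothetical placeholder rather than the patented implementation:

```python
WARMUP_EPOCHS = 20        # from the text: 20 labeled-only training iterations
TOTAL_EPOCHS = 100        # an assumption; the total is not stated

def supervised_step(batch):        # placeholder: updates with L_ce (+ L_adv)
    pass

def semi_supervised_step(batch):   # placeholder: additionally applies L_semi
    pass

labeled_loader, unlabeled_loader = [], []  # placeholders for the two data streams

for epoch in range(TOTAL_EPOCHS):
    for batch in labeled_loader:
        supervised_step(batch)
    if epoch >= WARMUP_EPOCHS:     # unlabeled data joins after the warm-up
        for batch in unlabeled_loader:
            semi_supervised_step(batch)
```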
Compared with a general semantic segmentation model, the semi-supervised semantic segmentation model based on a generative adversarial network has three hyperparameters during training: λ_adv, λ_semi and T_semi. The choice of appropriate hyperparameters has a significant impact on model performance. Therefore, using the control variable method at the one-eighth labeled sample ratio, this embodiment sets up three groups of experiments to determine appropriate values of λ_adv, λ_semi and T_semi respectively:
(1) Fix λ_semi at 0.2 and T_semi at 0.3, and vary λ_adv;
(2) Fix λ_adv at 0.01 and T_semi at 0.3, and vary λ_semi;
(3) Fix λ_adv at 0.01 and λ_semi at 0.2, and vary T_semi.
The recognition accuracy obtained by the segmentation network on the test set under different hyperparameter settings is shown in Table 1. From the data in the table, the segmentation effect is best when λ_adv is 0.01, λ_semi is 0.2 and T_semi is 0.3. Furthermore, when λ_semi and T_semi are fixed, the MIoU of the model and the F1_Score of rape improve from the lowest values of 83.68% and 81.57% to 85.23% and 83.99% as the value of λ_adv varies, indicating that the constraint that λ_adv places on the gradient update direction of the segmentation network is very important. Comparing the influence of varying T_semi and λ_semi on experimental accuracy shows that λ_adv has the largest influence, followed by the confidence threshold T_semi, while λ_semi has a smaller influence than λ_adv and T_semi. Therefore, this embodiment sets the hyperparameters λ_adv, λ_semi and T_semi of the generative adversarial network to 0.01, 0.2 and 0.3 respectively.
TABLE 1
Different discrimination networks exert different constraint effects on the gradient descent direction of the segmentation network during training. In this embodiment, under the 1/8 sample condition, the improved DeepLab V3+ network is used as the segmentation network and the following three groups of discrimination networks are designed for comparative analysis.
Scheme one: the split network uses an improved deep Lab V3+ network, and the discrimination network uses the discrimination network constructed by the embodiment;
scheme II: the split network uses an improved DeepLab v3+ network, and the discrimination network uses a discrimination network having the same network structure as the discrimination network constructed in the present embodiment, but with an activation function of ReLU;
scheme III: the split network uses a modified deep lab v3+ network, a discrimination network proposed below. The discrimination network comprises 5 convolution layers, wherein the first four layers are downsampling layers, the channel numbers are 64, 128, 256 and 512 respectively, the convolution kernel sizes are 4*4, and the step sizes and the padding are 2; the fifth layer is an upsampling layer, which restores the image size to the original image size through upsampling.
FIG. 5 shows the accuracy curves of the segmentation network on the validation set during training when the three discrimination networks are used under the 1/8 sample condition. As the graph shows, the convergence speed and final convergence accuracy of schemes one and two are higher than those of scheme three, indicating that a discrimination network with a SegNet-like structure constrains the segmentation network better during training. Meanwhile, the convergence speed and final convergence accuracy of scheme one are higher than those of scheme two, indicating that the Leaky ReLU activation function promotes the training of the segmentation network better than ReLU.
The recognition accuracy of the segmentation network on the test set under the three discrimination network schemes is shown in FIG. 6. The overall accuracy MIoU and PA of the model corresponding to scheme one are 85.23% and 95.11% respectively, with IoU and F1_Score of 93.58% and 96.69% for wheat and 72.40% and 83.99% for rape. For scheme two, MIoU and PA are 84.50% and 94.87%, with IoU and F1_Score of 93.31% and 96.54% for wheat and 70.92% and 82.99% for rape. For scheme three, MIoU and PA are 84.21% and 94.48%, with IoU and F1_Score of 92.66% and 96.19% for wheat and 71.54% and 83.41% for rape.
From the graph it can be concluded that the discrimination network of this embodiment gives the highest overall accuracy on the test set and the highest recognition accuracy for rape and wheat. Scheme three uses four downsampling layers and restores the feature map size through a single upsampling layer, a process prone to losing edge information. This embodiment instead uses an encoding-decoding structure in the discrimination network and realizes nonlinear upsampling of the feature map through the max-pooling indices, which effectively mitigates the errors caused by the loss of edge information due to the aggressive upsampling of the network used in scheme three. Meanwhile, the Leaky ReLU activation function effectively retains negative gradients, better promoting the forward constraint of the discrimination network on the gradient descent direction during training of the segmentation network. In summary, the discrimination network designed in this embodiment can effectively improve the crop identification performance of the segmentation network.
In order to verify the feasibility of crop identification under few-sample conditions with the MyGAN network constructed in this embodiment, the constructed MyGAN model is compared on the test set, under the 1/8 labeled sample condition, with DeepLab V2, the constructed MyDeepLab V3+, and the existing AdvSemiSeg. The segmentation network of AdvSemiSeg uses DeepLab V2, and its discrimination network is a full convolution network comprising four downsampling convolution layers and one upsampling convolution layer.
From Table 2 it can be concluded that, among the models trained under the 1/8 labeled sample condition, the MyGAN constructed in this example achieves the highest overall accuracy on the test set and the highest recognition accuracy for rape and wheat. Comparing the segmentation performance of the two adversarial networks' base segmentation networks under few-sample conditions shows that the MyDeepLab V3+ model constructed in this example clearly outperforms DeepLab V2, further demonstrating the superiority of the constructed segmentation network in crop recognition. Comparing MyGAN and AdvSemiSeg under few-sample conditions shows that the recognition effect of the constructed MyGAN is significantly better than that of AdvSemiSeg: MIoU on the test set improves by 2.85%, and the IoU of wheat and rape improve by 1.35% and 5.36% respectively. In summary, the MyGAN network constructed in this embodiment achieves higher accuracy for high-resolution remote sensing crop identification under few-sample conditions.
TABLE 2
To verify the superiority of the proposed model under few-sample conditions, comparative experiments were performed at the one-quarter and one-eighth labeled sample ratios, comparing the recognition accuracy of the semantic segmentation model using only the original network (Baseline) with that of the semi-supervised semantic segmentation model using MyGAN.
As shown in FIG. 7, the accuracy curves of the segmentation network on the validation set are given for training with Baseline and MyGAN under the 1/4 and 1/8 labeled sample conditions. At the one-quarter labeled sample ratio, after MyGAN adds the adversarial and semi-supervised losses to constrain the gradient descent direction of the segmentation network (after 20 epochs), the recognition accuracy of the model first drops and shows large oscillations; but as the number of iterations increases, the accuracy gradually rises, and at convergence the recognition accuracy of the MyGAN model is clearly better than that of the model using only the Baseline network.
At the one-eighth labeled sample ratio, when MyGAN uses only the adversarial loss to constrain the gradient descent direction of the segmentation network, the oscillation of the model is milder than when using only the Baseline network. When the adversarial and semi-supervised losses are added simultaneously to constrain the gradient descent direction, the model begins to oscillate strongly, but it gradually converges as the number of iterations increases, and at convergence the recognition accuracy of the MyGAN model is clearly better than that of the model using only the Baseline network.
As shown in FIG. 8, the accuracy curves of the segmentation network on the validation set are given for training with only the Baseline network under fully supervised (full-sample) conditions and with MyGAN under the one-quarter and one-eighth labeled sample conditions. When the models reach convergence, the MIoU of the MyGAN model using only one quarter of the labeled samples is highest, while the MIoU curves of the Baseline network under full-sample conditions and the MyGAN model using only one eighth of the labeled samples tend to agree. This shows that, on the validation set, the MyGAN model constructed in this embodiment achieves the highest crop identification accuracy under the 1/4 sample condition, and under the 1/8 sample condition achieves an identification effect similar to that of the fully supervised semantic segmentation model under the full-sample condition.
As shown in Table 3, the identification accuracy on the test set using only Baseline under the full labeled sample condition is compared with that using Baseline and MyGAN at the one-quarter and one-eighth labeled sample ratios. From the data in the table, the accuracy of the model using only the Baseline network gradually decreases as the number of samples decreases. With one-quarter labeled samples, the MIoU of the model drops by 1.54% and the IoU and F1_Score of rape drop by 3.72% and 2.68% respectively, while the PA of the model and the IoU and F1_Score of wheat drop by less than 1%. With one-eighth labeled samples, the MIoU drops by 3.26% and the IoU and F1_Score of rape drop by 6.74% and 4.92% respectively, while the PA of the model and the IoU and F1_Score of wheat drop by about 1%.
Comparing the recognition accuracy on the test set of the semantic segmentation model using only Baseline at small sample ratios with the semi-supervised model using MyGAN: at the one-quarter labeled sample ratio, the MyGAN model increases MIoU by 2.16% and the IoU of wheat and rape by 1.16% and 5.30% respectively, and its MIoU and rape IoU exceed those obtained using only the Baseline network under the full-sample condition. At the one-eighth labeled sample ratio, the MyGAN model increases MIoU by 2.86% and the IoU of wheat and rape by 1.36% and 4.90% respectively; compared with using the Baseline network alone under full-sample conditions, MIoU and PA differ by only 0.4% and 0.19%, the IoU and F1_Score of rape differ by only 1.84% and 1.52%, and the IoU and F1_Score of wheat differ by only 0.18% and 0.09%. It can therefore be concluded that applying the MyGAN-based semi-supervised semantic segmentation model to crop recognition improves accuracy under few-sample conditions and, using only one eighth of the samples, achieves recognition accuracy similar to that of a fully supervised model trained with all labeled samples.
TABLE 3
To further verify the feasibility of the semi-supervised method of this embodiment, the whole scene image is classified using both the segmentation network parameters trained by the fully supervised model and those trained by the semi-supervised model under the one-eighth sample condition, and the areas of the identified wheat and rape are calculated.
FIGS. 9(a)-(b) compare the wheat planting ranges identified from the whole scene image by the models trained under the full-sample and one-eighth sample conditions; FIGS. 10(a)-(b) compare the rape planting ranges identified by the two models; FIGS. 11(a)-(b) compare the classification results of the two models for the whole scene image. As can be seen from FIGS. 9(a)-(b) and 11(a)-(b), the distribution of identified wheat is similar at the two sample ratios, mainly concentrated in the northwest, southwest and central parts of the image. From FIGS. 10(a)-(b) and 11(a)-(b), the rape identification results are also similar at the two sample ratios, exhibiting a small and scattered distribution.
The areas of wheat and rape identified by the segmentation networks obtained using only the Baseline network with the full sample set and using MyGAN with one eighth of the labeled samples are compared in FIG. 12. From the bar graph, the identified wheat area and total crop area under the one-eighth sample condition are slightly smaller than those under the full-sample condition, while the identified rape areas are similar at the two sample ratios. The identified wheat area under the full-sample condition is 185.84 square kilometers and the rape area 13.27 square kilometers; under the one-eighth sample condition they are 183.07 and 13.42 square kilometers respectively. The identified wheat area differs by only 1.49% between the two conditions, and the rape area by only 1.13%. This further demonstrates that the semi-supervised method used in this example achieves results under few-sample conditions similar to those under full-sample conditions.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts between the embodiments, reference may be made to one another. Since the device disclosed in an embodiment corresponds to the method disclosed therein, its description is relatively simple; for relevant points, refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A GAN model-based high resolution remote sensing image crop identification model, comprising:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
2. The GAN model-based high resolution remote sensing image crop identification model of claim 1, wherein said discrimination network comprises an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; and a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
3. A method for identifying crops by using a high-resolution remote sensing image based on a GAN model, which is characterized by comprising the following specific steps of:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
4. The method for recognizing crop in high resolution remote sensing image based on GAN model as set forth in claim 3, wherein the calculation formula of said discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
5. The GAN model-based high resolution remote sensing image crop identification method of claim 4, wherein said segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
6. The method for recognizing crops in high-resolution remote sensing images based on a GAN model as claimed in claim 5, wherein the segmentation loss function is calculated as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
CN202310760237.9A 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof Pending CN117011699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310760237.9A CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310760237.9A CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Publications (1)

Publication Number Publication Date
CN117011699A true CN117011699A (en) 2023-11-07

Family

ID=88573590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310760237.9A Pending CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Country Status (1)

Country Link
CN (1) CN117011699A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542105A (en) * 2024-01-09 2024-02-09 江西师范大学 Facial super-resolution and expression recognition method for low-resolution images under classroom teaching


Similar Documents

Publication Publication Date Title
CN111259905B (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
CN109859190B (en) Target area detection method based on deep learning
US20210326656A1 (en) Panoptic segmentation
CN107526785B (en) Text classification method and device
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN112287983B (en) Remote sensing image target extraction system and method based on deep learning
CN112464717A (en) Remote sensing image target detection method, system, electronic equipment and storage medium
CN117011699A (en) GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
CN113762396A (en) Two-dimensional image semantic segmentation method
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
CN116645592A (en) Crack detection method based on image processing and storage medium
CN115346071A (en) Image classification method and system for high-confidence local feature and global feature learning
CN115311508A (en) Single-frame image infrared dim target detection method based on depth U-type network
CN113538359B (en) System and method for finger vein image segmentation
Zhou et al. ECA-mobilenetv3 (large)+ SegNet model for binary sugarcane classification of remotely sensed images
CN113963272A (en) Unmanned aerial vehicle image target detection method based on improved yolov3
CN117671271A (en) Model training method, image segmentation method, device, equipment and medium
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN112149518A (en) Pine cone detection method based on BEGAN and YOLOV3 models
CN116631190A (en) Intelligent traffic monitoring system and method thereof
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN116543250A (en) Model compression method based on class attention transmission
CN115205624A (en) Cross-dimension attention-convergence cloud and snow identification method and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination