CN117011699A - GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof - Google Patents

GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Info

Publication number
CN117011699A
CN117011699A
Authority
CN
China
Prior art keywords
network
model
remote sensing
semi
resolution remote
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310760237.9A
Other languages
Chinese (zh)
Inventor
李虎
陈冬花
邹陈
汪左
张乃明
刘赛赛
叶李灶
常竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Normal University
Chuzhou University
Original Assignee
Anhui Normal University
Chuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Normal University, Chuzhou University filed Critical Anhui Normal University
Priority to CN202310760237.9A
Publication of CN117011699A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof, applied to the technical field of remote sensing image recognition. The improved DeepLab V3+ model constructed herein achieves high recognition accuracy for wheat and rape in high-resolution remote sensing imagery, but requires a large number of pixel-by-pixel annotated samples. A discrimination network is therefore constructed from the first three modules and the last three modules of the SegNet network, while the Leaky ReLU activation function of the original discrimination network is retained. Through the adversarial process between the segmentation network and the discrimination network of a semi-supervised semantic segmentation model based on a generative adversarial network, the need for samples is reduced. The model thereby improves crop recognition accuracy while reducing the requirement for labeled samples.

Description

GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof
Technical Field
The invention relates to the technical field of remote sensing image recognition, and in particular to a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof.
Background
Timely and accurate acquisition of crop type distribution information has important influence on agricultural production. The remote sensing data of the high-resolution land satellite can be utilized to describe the spatial distribution information of crops in more detail, so that data support is provided for adjusting planting structures, guaranteeing farmland safety, formulating reasonable grain policies and the like.
Semantic segmentation models based on deep learning require a large number of pixel-level label samples, and annotating so many pixel-level labels consumes enormous manpower and material resources. When a semantic segmentation model is applied to the field of crop identification, a large number of pixel-by-pixel annotated label samples is needed; because crops are planted in scattered, fragmented parcels, constructing a large set of crop label samples is especially costly.
Therefore, applying a semi-supervised semantic segmentation model based on a generative adversarial network to crop recognition, so as to reduce the need for label samples while maintaining recognition accuracy comparable to a fully supervised semantic segmentation model, is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a high-resolution remote sensing crop identification model and an identification method based on a GAN model, so as to solve the problems in the background art.
In order to achieve the above object, the present invention provides the following technical solutions:
in one aspect, a GAN model-based crop identification model for high resolution remote sensing images is disclosed, comprising:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
Preferably, in the above GAN model-based crop identification model for high-resolution remote sensing images, the discrimination network includes an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
In another aspect, the invention discloses a GAN model-based method for identifying crops in high-resolution remote sensing images, which adopts the above high-resolution remote sensing crop identification model and comprises the following specific steps:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
Preferably, in the above method for identifying a crop using a high-resolution remote sensing image based on a GAN model, the calculation formula of the discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
Preferably, in the above method for identifying crops in high-resolution remote sensing images based on a GAN model, the segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
Preferably, in the above method for identifying a crop in a high-resolution remote sensing image based on a GAN model, the calculation formula of the segmentation loss function is as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
Compared with the prior art, the invention discloses a GAN model-based crop recognition model for high-resolution remote sensing images and a recognition method thereof. The constructed discrimination network effectively mitigates the errors caused by the loss of edge information due to aggressive upsampling in a full convolution network, and the Leaky ReLU activation function effectively retains negative gradients, thereby better promoting the forward constraint of the discrimination network on the gradient descent direction during training of the segmentation network. Meanwhile, the constructed MyGAN network achieves higher crop identification accuracy under few-sample conditions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a model structure of the present invention;
FIG. 2 is a block diagram of a split network according to the present invention;
FIG. 3 is a diagram of a discrimination network according to the present invention;
FIG. 4 is a schematic diagram of the maximum pooling of the present invention;
FIG. 5 is a graph of MIoU on the validation set for different discrimination network models in accordance with the present invention;
FIG. 6 is a graph showing the comparison of the recognition accuracy of different discrimination network models in a test set according to the present invention;
FIG. 7 (a) is a chart showing the comparison of Baseline and GAN recognition accuracy at a 1/4 sample scale according to the present invention;
FIG. 7 (b) is a chart showing the comparison of Baseline and GAN recognition accuracy at a 1/8 sample scale according to the present invention;
FIG. 8 is a graph of comparison of GAN recognition accuracy at different sample ratios in accordance with the present invention;
FIG. 9 (a) is a graph showing the wheat recognition result at the full sample scale of the present invention;
FIG. 9 (b) is a graph showing the wheat discrimination result at a sample ratio of 1/8 according to the present invention;
FIG. 10 (a) is a graph showing the range of rape planting at the full sample scale of the present invention;
FIG. 10 (b) is a graph showing the range of rape planting at a 1/8 sample scale according to the present invention;
FIG. 11 (a) is a diagram showing the image classification result at full sample scale according to the present invention;
FIG. 11 (b) is a graph showing the image classification result at a 1/8 sample scale according to the present invention;
FIG. 12 is a graph showing the comparison of wheat and rape identification areas at different sample ratios according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a GAN model-based crop identification model for high-resolution remote sensing images which, as shown in FIG. 1, comprises:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
In order to reduce the requirement for label samples while improving crop identification accuracy, and thereby improve the efficiency of automatic remote sensing crop identification, this embodiment applies a semi-supervised semantic segmentation model based on a generative adversarial network to remote sensing crop identification. The constructed MyGAN model is shown in FIG. 1. The segmentation network is the improved DeepLab V3+ network, which experiments have shown to give the best results for identifying wheat and rape in remote sensing images; it is therefore used as the segmentation network of the MyGAN model. In this embodiment, a full convolution network with an encoding-decoding structure is used as the discrimination network of the MyGAN model; its input data comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations. The output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
Further, as shown in FIG. 2, the segmentation network first adds a vegetation index at the input layer to increase the distinction between ground objects. To reduce the amount of computation, reduce memory occupation and improve computation speed, the backbone Xception network is replaced with the lighter-weight MobileNet V2 network. Secondly, to reduce the loss and blurring of wheat and rape edge information caused by the model's repeated downsampling, this embodiment reduces the 16× downsampling to 8× downsampling while adding an upsampling layer. To increase the sensitivity of the model to wheat and rape regions, a dual attention mechanism (Convolutional Block Attention Module, CBAM) is added to the ASPP module and the upsampling layers. Finally, a weighted cross-entropy loss function is introduced to address the sample imbalance among wheat, rape and other ground objects.
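For illustration, a minimal PyTorch sketch of a CBAM block of the kind added to the ASPP module and upsampling layers is given below. It is a sketch only: the reduction ratio, spatial kernel size and activation choices are assumptions, since the embodiment does not specify them.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (a hedged sketch)."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: a shared MLP over global average- and max-pooled maps.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: a convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                  # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))        # spatial attention
```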
Specifically, the segmentation network includes an input layer, an encoding layer, a decoding layer and an output layer. The input layer takes a normalized remote sensing image together with a vegetation index; the encoding layer extracts a multi-scale deep semantic feature map, a first shallow semantic feature map and a second shallow feature map from the input image; the decoding layer combines the deep semantic features with the first shallow semantic features, performs upsampling, and fuses the second shallow feature map to obtain a fused feature map; the fused feature map is resized and output through the output layer.
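As an illustration of the input layer, the sketch below appends a vegetation index band to the normalized image tensor. The choice of NDVI and the band indices are assumptions; the embodiment states only that a vegetation index is added.

```python
import torch

def stack_vegetation_index(image: torch.Tensor, nir: int = 3, red: int = 0,
                           eps: float = 1e-6) -> torch.Tensor:
    """Append an NDVI band to a normalized (B, C, H, W) image tensor.
    `nir` and `red` are assumed band positions, not values from the patent."""
    ndvi = (image[:, nir] - image[:, red]) / (image[:, nir] + image[:, red] + eps)
    return torch.cat([image, ndvi.unsqueeze(1)], dim=1)
```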
Coding layer: the Mobilene V2 network is a depth separable network, and aims to extract deep semantic features of an input image through a multi-layer convolution layer and a pooling layer, and reduce the operation amount through channel-by-channel convolution and point-by-point convolution operations. The output of the mobilet V2 is divided into three, wherein two of the outputs are used for preserving the shallow layer characteristics of the ground object during up-sampling, and the output sizes are respectively 16 x 64 and 24 x 32; the other is used as input of the ASPP module, deep semantic features of multiple scales are further extracted, and the size of the output is 320 x 16.
Decoding layer: the output 256 x 16 of the ASPP module is restored to 256 x 32 through linear upper layer sample, and is combined with the 24 x 32 output by the mobilet V2 through a concentration mechanism and a shallow layer characteristic diagram obtained through a 1*1 convolution layer, and a combined result is convolved through a 3*3 convolution to obtain a fusion characteristic diagram with the output size of 256 x 32.
The resulting 256×32×32 feature map is linearly upsampled to 256×64×64 and combined with the shallow feature map obtained by passing the 16×64×64 MobileNet V2 output through the attention mechanism and a 1×1 convolution layer; the result passes through a 3×3 convolution layer and an upsampling layer to obtain the output feature map of size 256×64×64.
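A minimal sketch of one such decoder fusion step follows; the 48-channel projection width, BatchNorm placement and bilinear interpolation mode are assumptions (the text specifies only attention, a 1×1 projection, concatenation and a 3×3 convolution). It reuses the CBAM sketch given earlier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStage(nn.Module):
    """One decoder step: upsample the deep map, refine and project the shallow
    map, concatenate, and fuse with a 3x3 convolution (a hedged sketch)."""
    def __init__(self, deep_ch: int, shallow_ch: int,
                 proj_ch: int = 48, out_ch: int = 256):
        super().__init__()
        self.attn = CBAM(shallow_ch)                      # CBAM sketch from above
        self.proj = nn.Conv2d(shallow_ch, proj_ch, 1, bias=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(deep_ch + proj_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        deep = F.interpolate(deep, size=shallow.shape[2:], mode="bilinear",
                             align_corners=False)
        shallow = self.proj(self.attn(shallow))
        return self.fuse(torch.cat([deep, shallow], dim=1))

# Usage matching the sizes quoted above:
# stage1 = FusionStage(256, 24)  # 256x16x16 deep + 24x32x32 shallow -> 256x32x32
# stage2 = FusionStage(256, 16)  # 256x32x32 deep + 16x64x64 shallow -> 256x64x64
```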
In order to further optimize the above technical solution, as shown in FIG. 3, the discrimination network includes an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
The input to the discrimination network falls into two classes: a. true labels encoded as one-hot; b. pseudo labels generated by the segmentation network. The input size is 3×128×128.
Coding layer: the coding layer is divided into 3 modules. The first module and the second module each contain two convolution layers with a convolution kernel 3*3 and padding 1, and one max-pooling layer that retains the max-pooling index. Output sizes are respectively is 64 x 64 128 x 32. The third module corresponds to three convolution layers with a convolution kernel 3*3 and padding 1, and a max-pooling layer that retains the max-pooling index, with an output size of 256 x 16.
Decoding layer: the decoding layer and the encoding layer are in symmetrical structures. The decoding layer also has three modules corresponding to the encoding layer. The first module performs inverse maximum pooling by the maximum pooling index reserved by the coding layer to recover the feature map size, and then outputs 128×32 by the three convolution layers of 3*3 and padding 1. The second module and the third module firstly carry out inverse maximum pooling through the maximum pooling index reserved by the coding layer, restore the size of the characteristic diagram, and then output the size by two convolution layers with convolution kernel 3*3 and padding 1 into 64×64 and 1×128×128.
Output layer: and (3) keeping the output value range of the obtained feature map to be (0, 1) through a sigmoid function, namely indicating whether the input data is derived from a real label or a pseudo label obtained through a segmentation network, wherein the closer to 1 is the greater the probability that the input data is derived from the real label.
The discrimination network of this embodiment is formed by making some modifications to the SegNet network. SegNet upsamples using the max-pooling indices, which effectively reduces the severe loss of edge information caused by aggressive pooling. The modifications made in this embodiment on the basis of the SegNet model are as follows: first, owing to the hardware limitations of the laboratory computer, only the first three and last three modules of SegNet are retained; second, to avoid dead neurons appearing during model training, the Leaky ReLU activation function of the original discrimination network is retained.
The discrimination network pools by max pooling in the encoding layer, as shown in FIG. 4, while retaining the index information of each pooling operation; the decoding layer then completes the nonlinear upsampling of the feature map using the corresponding pooling indices. Since this upsampling only requires the recorded pooling indices, the edge information of the image is well preserved, computer memory is saved, and the running speed of the network improves because no training or learning is needed in the upsampling stage.
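The index-based unpooling can be demonstrated with PyTorch's built-in operators; a minimal illustration, not the patented implementation:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.rand(1, 1, 4, 4)
pooled, idx = pool(x)           # idx records the position of each maximum
restored = unpool(pooled, idx)  # maxima return to their positions, zeros elsewhere
print(restored.shape)           # torch.Size([1, 1, 4, 4])
```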
In another aspect, the embodiment of the invention discloses a GAN model-based crop identification method using the above high-resolution remote sensing image crop identification model, comprising the following specific steps:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
Preferably, in the above method for identifying a crop using a high-resolution remote sensing image based on a GAN model, the calculation formula of the discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
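The discrimination loss can be written as a per-pixel binary cross-entropy that pushes D(Y_n) toward 1 and D(S(X_n)) toward 0; a hedged PyTorch sketch, assuming the standard adversarial-segmentation formulation:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """d_real = D(Y_n), d_fake = D(S(X_n)): per-pixel confidence maps in (0, 1)."""
    loss_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return loss_real + loss_fake
```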
In order to further optimize the above solution, the segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
In order to further optimize the above technical solution, the calculation formula of the segmentation loss function is as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
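A hedged PyTorch sketch of this combined loss follows. The ignore-index convention for unlabeled pixels and the mean reduction are assumptions; the hyperparameter defaults are the values selected later in this embodiment.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, labels, d_fake,
                      lambda_adv=0.01, lambda_semi=0.2, t_semi=0.3):
    """logits: (B, C, H, W) outputs S(X_n); labels: (B, H, W) true annotations
    with 255 marking unlabeled pixels (assumed); d_fake: (B, 1, H, W) = D(S(X_n))."""
    l_ce = F.cross_entropy(logits, labels, ignore_index=255)  # supervised term
    # Adversarial term: reward pixels the discriminator judges as "real".
    l_adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    # Semi-supervised term: self-train on pixels whose confidence exceeds T_semi.
    probs = torch.softmax(logits, dim=1)
    pseudo = probs.argmax(dim=1, keepdim=True)                # hard pseudo labels
    mask = (d_fake > t_semi).float().squeeze(1)               # I(D(S(X_n)) > T_semi)
    l_semi = -(mask * torch.log(probs.gather(1, pseudo).squeeze(1) + 1e-10)).mean()
    return l_ce + lambda_adv * l_adv + lambda_semi * l_semi
```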
Setting model parameters: for the discrimination network, an Adam optimizer is used as the parameter optimizer during training. In order to reduce the oscillation of the discrimination network while accelerating its convergence, this embodiment uses a dynamically decayed learning rate via the StepLR function provided in PyTorch, with the initial learning rate set to 0.01, step_size set to 10 and gamma set to 0.1, meaning that the learning rate is multiplied by 0.1 every 10 iterations. In training the adversarial network, the hyperparameters λ_adv, λ_semi and T_semi are set, after repeated trials, to 0.01, 0.2 and 0.3 respectively.
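These settings map directly onto PyTorch's optimizer and scheduler APIs; a sketch follows, reusing the Discriminator sketch above (the epoch count is an assumption):

```python
import torch
from torch.optim.lr_scheduler import StepLR

discriminator = Discriminator()  # the SegNet-style sketch given earlier
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=0.01)
scheduler_d = StepLR(optimizer_d, step_size=10, gamma=0.1)  # lr x0.1 every 10 epochs

for epoch in range(30):          # 30 epochs is an assumed placeholder
    # ... one training epoch of the discriminator ...
    scheduler_d.step()
```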
During semi-supervised training, the labeled samples are drawn as a randomly selected proportion of the sample set constructed in this embodiment, and the remaining samples are used as unlabeled samples to train the model. The advantage of random selection is that it makes the sample data more random and can increase the robustness of the model. In the experiment, the model is first iteratively trained 20 times using only the labeled samples, and the unlabeled samples are then added to the training. The advantage of this is that, when the model performs semi-supervised training (adding the semi-supervised loss), the pseudo labels generated by the segmentation network can already compete with the true labels, so that the parameters of the segmentation network and the discrimination network are updated effectively.
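This two-phase schedule can be summarized in a short sketch; every name below is a hypothetical placeholder rather than the patented implementation:

```python
WARMUP_EPOCHS = 20        # from the text: 20 labeled-only training iterations
TOTAL_EPOCHS = 100        # an assumption; the total is not stated

def supervised_step(batch):        # placeholder: updates with L_ce (+ L_adv)
    pass

def semi_supervised_step(batch):   # placeholder: additionally applies L_semi
    pass

labeled_loader, unlabeled_loader = [], []  # placeholders for the two data streams

for epoch in range(TOTAL_EPOCHS):
    for batch in labeled_loader:
        supervised_step(batch)
    if epoch >= WARMUP_EPOCHS:     # unlabeled data joins after the warm-up
        for batch in unlabeled_loader:
            semi_supervised_step(batch)
```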
Compared with a general semantic segmentation model, the semi-supervised semantic segmentation model based on a generative adversarial network has three hyperparameters during training: λ_adv, λ_semi and T_semi. The choice of appropriate hyperparameters has a significant impact on model performance. Therefore, using the control variable method at the one-eighth labeled sample ratio, this embodiment sets up three groups of experiments to determine appropriate values of λ_adv, λ_semi and T_semi respectively:
(1) Fix λ_semi at 0.2 and T_semi at 0.3, and vary λ_adv;
(2) Fix λ_adv at 0.01 and T_semi at 0.3, and vary λ_semi;
(3) Fix λ_adv at 0.01 and λ_semi at 0.2, and vary T_semi.
The recognition accuracy obtained by the segmentation network on the test set under different hyperparameter settings is shown in Table 1. From the data in the table, the segmentation effect is best when λ_adv is 0.01, λ_semi is 0.2 and T_semi is 0.3. Furthermore, when λ_semi and T_semi are fixed, the MIoU of the model and the F1_Score of rape improve from the lowest values of 83.68% and 81.57% to 85.23% and 83.99% as the value of λ_adv varies, indicating that the constraint that λ_adv places on the gradient update direction of the segmentation network is very important. Comparing the influence of varying T_semi and λ_semi on experimental accuracy shows that λ_adv has the largest influence, followed by the confidence threshold T_semi, while λ_semi has a smaller influence than λ_adv and T_semi. Therefore, this embodiment sets the hyperparameters λ_adv, λ_semi and T_semi of the generative adversarial network to 0.01, 0.2 and 0.3 respectively.
TABLE 1
Different discrimination networks exert different constraint effects on the gradient descent direction of the segmentation network during training. In this embodiment, under the 1/8 sample condition, the improved DeepLab V3+ network is used as the segmentation network and the following three groups of discrimination networks are designed for comparative analysis.
Scheme one: the split network uses an improved deep Lab V3+ network, and the discrimination network uses the discrimination network constructed by the embodiment;
scheme II: the split network uses an improved DeepLab v3+ network, and the discrimination network uses a discrimination network having the same network structure as the discrimination network constructed in the present embodiment, but with an activation function of ReLU;
scheme III: the split network uses a modified deep lab v3+ network, a discrimination network proposed below. The discrimination network comprises 5 convolution layers, wherein the first four layers are downsampling layers, the channel numbers are 64, 128, 256 and 512 respectively, the convolution kernel sizes are 4*4, and the step sizes and the padding are 2; the fifth layer is an upsampling layer, which restores the image size to the original image size through upsampling.
FIG. 5 shows the accuracy curves of the segmentation network on the validation set during training when the three discrimination networks are used under the 1/8 sample condition. As the graph shows, the convergence speed and final convergence accuracy of schemes one and two are higher than those of scheme three, indicating that a discrimination network with a SegNet-like structure constrains the segmentation network better during training. Meanwhile, the convergence speed and final convergence accuracy of scheme one are higher than those of scheme two, indicating that the Leaky ReLU activation function promotes the training of the segmentation network better than ReLU.
The recognition accuracy of the segmentation network on the test set under the three discrimination network schemes is shown in FIG. 6. The overall accuracy MIoU and PA of the model corresponding to scheme one are 85.23% and 95.11% respectively, with IoU and F1_Score of 93.58% and 96.69% for wheat and 72.40% and 83.99% for rape. For scheme two, MIoU and PA are 84.50% and 94.87%, with IoU and F1_Score of 93.31% and 96.54% for wheat and 70.92% and 82.99% for rape. For scheme three, MIoU and PA are 84.21% and 94.48%, with IoU and F1_Score of 92.66% and 96.19% for wheat and 71.54% and 83.41% for rape.
From the graph it can be concluded that the discrimination network of this embodiment gives the highest overall accuracy on the test set and the highest recognition accuracy for rape and wheat. Scheme three uses four downsampling layers and restores the feature map size through a single upsampling layer, a process prone to losing edge information. This embodiment instead uses an encoding-decoding structure in the discrimination network and realizes nonlinear upsampling of the feature map through the max-pooling indices, which effectively mitigates the errors caused by the loss of edge information due to the aggressive upsampling of the network used in scheme three. Meanwhile, the Leaky ReLU activation function effectively retains negative gradients, better promoting the forward constraint of the discrimination network on the gradient descent direction during training of the segmentation network. In summary, the discrimination network designed in this embodiment can effectively improve the crop identification performance of the segmentation network.
In order to verify the feasibility of crop identification under few-sample conditions with the MyGAN network constructed in this embodiment, the constructed MyGAN model is compared on the test set, under the 1/8 labeled sample condition, with DeepLab V2, the constructed MyDeepLab V3+, and the existing AdvSemiSeg. The segmentation network of AdvSemiSeg uses DeepLab V2, and its discrimination network is a full convolution network comprising four downsampling convolution layers and one upsampling convolution layer.
From Table 2 it can be concluded that, among the models trained under the 1/8 labeled sample condition, the MyGAN constructed in this example achieves the highest overall accuracy on the test set and the highest recognition accuracy for rape and wheat. Comparing the segmentation performance of the two adversarial networks' base segmentation networks under few-sample conditions shows that the MyDeepLab V3+ model constructed in this example clearly outperforms DeepLab V2, further demonstrating the superiority of the constructed segmentation network in crop recognition. Comparing MyGAN and AdvSemiSeg under few-sample conditions shows that the recognition effect of the constructed MyGAN is significantly better than that of AdvSemiSeg: MIoU on the test set improves by 2.85%, and the IoU of wheat and rape improve by 1.35% and 5.36% respectively. In summary, the MyGAN network constructed in this embodiment achieves higher accuracy for high-resolution remote sensing crop identification under few-sample conditions.
TABLE 2
To verify the superiority of the proposed model under few-sample conditions, comparative experiments were performed at the one-quarter and one-eighth labeled sample ratios, comparing the recognition accuracy of the semantic segmentation model using only the original network (Baseline) with that of the semi-supervised semantic segmentation model using MyGAN.
As shown in FIG. 7, the accuracy curves of the segmentation network on the validation set are given for training with Baseline and MyGAN under the 1/4 and 1/8 labeled sample conditions. At the one-quarter labeled sample ratio, after MyGAN adds the adversarial and semi-supervised losses to constrain the gradient descent direction of the segmentation network (after 20 epochs), the recognition accuracy of the model first drops and shows large oscillations; but as the number of iterations increases, the accuracy gradually rises, and at convergence the recognition accuracy of the MyGAN model is clearly better than that of the model using only the Baseline network.
At the one-eighth labeled sample ratio, when MyGAN uses only the adversarial loss to constrain the gradient descent direction of the segmentation network, the oscillation of the model is milder than when using only the Baseline network. When the adversarial and semi-supervised losses are added simultaneously to constrain the gradient descent direction, the model begins to oscillate strongly, but it gradually converges as the number of iterations increases, and at convergence the recognition accuracy of the MyGAN model is clearly better than that of the model using only the Baseline network.
As shown in FIG. 8, the accuracy curves of the segmentation network on the validation set are given for training with only the Baseline network under fully supervised (full-sample) conditions and with MyGAN under the one-quarter and one-eighth labeled sample conditions. When the models reach convergence, the MIoU of the MyGAN model using only one quarter of the labeled samples is highest, while the MIoU curves of the Baseline network under full-sample conditions and the MyGAN model using only one eighth of the labeled samples tend to agree. This shows that, on the validation set, the MyGAN model constructed in this embodiment achieves the highest crop identification accuracy under the 1/4 sample condition, and under the 1/8 sample condition achieves an identification effect similar to that of the fully supervised semantic segmentation model under the full-sample condition.
As shown in Table 3, the identification accuracy on the test set using only Baseline under the full labeled sample condition is compared with that using Baseline and MyGAN at the one-quarter and one-eighth labeled sample ratios. From the data in the table, the accuracy of the model using only the Baseline network gradually decreases as the number of samples decreases. With one-quarter labeled samples, the MIoU of the model drops by 1.54% and the IoU and F1_Score of rape drop by 3.72% and 2.68% respectively, while the PA of the model and the IoU and F1_Score of wheat drop by less than 1%. With one-eighth labeled samples, the MIoU drops by 3.26% and the IoU and F1_Score of rape drop by 6.74% and 4.92% respectively, while the PA of the model and the IoU and F1_Score of wheat drop by about 1%.
Comparing the recognition accuracy on the test set of the semantic segmentation model using only Baseline at small sample ratios with the semi-supervised model using MyGAN: at the one-quarter labeled sample ratio, the MyGAN model increases MIoU by 2.16% and the IoU of wheat and rape by 1.16% and 5.30% respectively, and its MIoU and rape IoU exceed those obtained using only the Baseline network under the full-sample condition. At the one-eighth labeled sample ratio, the MyGAN model increases MIoU by 2.86% and the IoU of wheat and rape by 1.36% and 4.90% respectively; compared with using the Baseline network alone under full-sample conditions, MIoU and PA differ by only 0.4% and 0.19%, the IoU and F1_Score of rape differ by only 1.84% and 1.52%, and the IoU and F1_Score of wheat differ by only 0.18% and 0.09%. It can therefore be concluded that applying the MyGAN-based semi-supervised semantic segmentation model to crop recognition improves accuracy under few-sample conditions and, using only one eighth of the samples, achieves recognition accuracy similar to that of a fully supervised model trained with all labeled samples.
TABLE 3
To further verify the feasibility of the semi-supervised method of this embodiment, the whole scene image is classified using both the segmentation network parameters trained by the fully supervised model and those trained by the semi-supervised model under the one-eighth sample condition, and the areas of the identified wheat and rape are calculated.
FIGS. 9(a)-(b) compare the wheat planting ranges identified from the whole scene image by the models trained under the full-sample and one-eighth sample conditions; FIGS. 10(a)-(b) compare the rape planting ranges identified by the two models; FIGS. 11(a)-(b) compare the classification results of the two models for the whole scene image. As can be seen from FIGS. 9(a)-(b) and 11(a)-(b), the distribution of identified wheat is similar at the two sample ratios, mainly concentrated in the northwest, southwest and central parts of the image. From FIGS. 10(a)-(b) and 11(a)-(b), the rape identification results are also similar at the two sample ratios, exhibiting a small and scattered distribution.
The areas of wheat and rape identified by the segmentation networks obtained using only the Baseline network with the full sample set and using MyGAN with one eighth of the labeled samples are compared in FIG. 12. From the bar graph, the identified wheat area and total crop area under the one-eighth sample condition are slightly smaller than those under the full-sample condition, while the identified rape areas are similar at the two sample ratios. The identified wheat area under the full-sample condition is 185.84 square kilometers and the rape area 13.27 square kilometers; under the one-eighth sample condition they are 183.07 and 13.42 square kilometers respectively. The identified wheat area differs by only 1.49% between the two conditions, and the rape area by only 1.13%. This further demonstrates that the semi-supervised method used in this example achieves results under few-sample conditions similar to those under full-sample conditions.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts between the embodiments, reference may be made to one another. Since the device disclosed in an embodiment corresponds to the method disclosed therein, its description is relatively simple; for relevant points, refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A GAN model-based high resolution remote sensing image crop identification model, comprising:
a segmentation network, improved on the basis of the traditional DeepLab V3+ model, the improvements comprising adding vegetation index features to the input layer, replacing the Xception backbone network with MobileNet, adding CBAM modules to the upsampling layers and the ASPP module, changing the original 16× downsampling to 8× downsampling, and adding an upsampling layer;
a discrimination network, constructed from the first three modules and the last three modules of the SegNet network, while retaining the Leaky ReLU activation function of the original discrimination network;
the remote sensing image is input into the segmentation network, which outputs pseudo labels; the input data of the discrimination network comprise the pseudo annotations generated by the segmentation network and the one-hot encoded true annotations; the output of the discrimination network is a confidence map representing, for each pixel, the probability that it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
2. The GAN model-based high resolution remote sensing image crop identification model of claim 1, wherein said discrimination network comprises an encoding layer and a decoding layer arranged symmetrically; the encoding layer extracts features and stores the pooling indices; the decoding layer is a deconvolution and upsampling process that restores the extracted feature maps to the original image size; and a Sigmoid converts the output value of each pixel into a probability indicating whether it comes from a true annotation or from a pseudo annotation generated by the segmentation network.
3. A method for identifying crops by using a high-resolution remote sensing image based on a GAN model, which is characterized by comprising the following specific steps of:
acquiring a high-resolution remote sensing image as a data set;
training the high-resolution remote sensing crop identification model on the data sample set using a segmentation loss function and a discrimination loss function to obtain an optimal high-resolution remote sensing crop identification model; inputting the data to be detected into the model and outputting the identification result.
4. The method for recognizing crop in high resolution remote sensing image based on GAN model as set forth in claim 3, wherein the calculation formula of said discriminant loss function is as follows:
L_D = -Σ_(i,j) [ log D(Y_n)^(i,j) + log(1 - D(S(X_n))^(i,j)) ]

wherein Y_n is the one-hot encoding of the true annotation, X_n is an unlabeled original image, S(X_n) denotes the pseudo annotation generated from the original image by the segmentation network, D(Y_n) denotes the output of the discrimination network for the true label, and D(S(X_n)) denotes the output of the discrimination network for the pseudo label generated by the segmentation network.
5. The GAN model-based high resolution remote sensing image crop identification method of claim 4, wherein said segmentation loss function comprises a weighted sum of a cross-entropy loss function and an adversarial loss function, plus an additional adaptive semi-supervised loss function.
6. The method for recognizing crops in high-resolution remote sensing images based on a GAN model as claimed in claim 5, wherein the segmentation loss function is calculated as follows:
L_s = L_ce + λ_adv·L_adv + λ_semi·L_semi

wherein L_ce is the loss function for supervised training of the segmentation network; L_adv is the adversarial loss function, through which the discriminator constrains the direction of the gradient updates of the segmentation network; L_semi is the semi-supervised loss function,

L_semi = -Σ_(i,j) Σ_c I(D(S(X_n))^(i,j) > T_semi) · Ŷ_n^(i,j,c) · log S(X_n)^(i,j,c)

in which Ŷ_n denotes the one-hot encoding of the pseudo labels generated by the segmentation network, i.e. Ŷ_n^(i,j,c*) = 1 if c* = argmax_c S(X_n)^(i,j,c); T_semi is the confidence threshold: when D(S(X_n))^(i,j) > T_semi, the pixel at (i,j) is regarded as able to deceive the discriminator; I(·) is an indicator function that selects all pixels in the image able to escape the discriminator, and the region formed by these pixels is used to further optimize the segmentation network; λ_adv and λ_semi weight the adversarial loss and the semi-supervised loss respectively.
CN202310760237.9A 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof Pending CN117011699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310760237.9A CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310760237.9A CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Publications (1)

Publication Number Publication Date
CN117011699A true CN117011699A (en) 2023-11-07

Family

ID=88573590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310760237.9A Pending CN117011699A (en) 2023-06-25 2023-06-25 GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof

Country Status (1)

Country Link
CN (1) CN117011699A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117542105A (en) * 2024-01-09 2024-02-09 江西师范大学 Facial super-resolution and expression recognition method for low-resolution images under classroom teaching


Similar Documents

Publication Publication Date Title
CN111259905B (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
CN109859190B (en) Target area detection method based on deep learning
US20210326656A1 (en) Panoptic segmentation
CN107526785B (en) Text classification method and device
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN112287983B (en) Remote sensing image target extraction system and method based on deep learning
CN112464717A (en) Remote sensing image target detection method, system, electronic equipment and storage medium
CN117011699A (en) GAN model-based crop identification model of high-resolution remote sensing image and identification method thereof
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
CN113762396A (en) Two-dimensional image semantic segmentation method
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
CN116645592A (en) Crack detection method based on image processing and storage medium
CN115346071A (en) Image classification method and system for high-confidence local feature and global feature learning
CN115311508A (en) Single-frame image infrared dim target detection method based on depth U-type network
CN113538359B (en) System and method for finger vein image segmentation
Zhou et al. ECA-mobilenetv3 (large)+ SegNet model for binary sugarcane classification of remotely sensed images
CN113963272A (en) Unmanned aerial vehicle image target detection method based on improved yolov3
CN117671271A (en) Model training method, image segmentation method, device, equipment and medium
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
CN112149518A (en) Pine cone detection method based on BEGAN and YOLOV3 models
CN116631190A (en) Intelligent traffic monitoring system and method thereof
CN116778318A (en) Convolutional neural network remote sensing image road extraction model and method
CN116543250A (en) Model compression method based on class attention transmission
CN115205624A (en) Cross-dimension attention-convergence cloud and snow identification method and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination