CN113469122A - Deep learning based crop space-time generalization classification method and system - Google Patents

Deep learning based crop space-time generalization classification method and system

Info

Publication number
CN113469122A
CN113469122A (application CN202110826057.7A)
Authority
CN
China
Prior art keywords
remote sensing
model
image
training
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110826057.7A
Other languages
Chinese (zh)
Inventor
张锦水
许晴
潘耀忠
段雅鸣
陈津乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University
Priority to CN202110826057.7A
Publication of CN113469122A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a crop space-time generalization classification method and system based on deep learning. The method comprises the following steps: collecting a plurality of remote sensing images of a set place within a set time period; obtaining a real image corresponding to each remote sensing image by manual visual interpretation, the real image being an image in which crop areas and non-crop areas are marked; constructing a deep convolutional neural network comprising an input layer, a residual error module, a pyramid average pooling module and an output layer connected in sequence; training the residual error module with a natural image dataset and taking the trained residual error module as the residual error module of the deep convolutional neural network to obtain a pre-training model; training the pre-training model with the remote sensing images as input and the corresponding real images as output to obtain a remote sensing image classification model; and carrying out image classification with the remote sensing image classification model. The invention improves the space-time generalization of the remote sensing image classification model.

Description

Deep learning based crop space-time generalization classification method and system
Technical Field
The invention relates to the technical field of image classification, in particular to a crop space-time generalization classification method and system based on deep learning.
Background
An important prerequisite for automatic large-scale crop mapping is that the model has space-time generalization capability. In actual production, sample labels covering a large area are difficult to collect, and a model with strong generalization capability can greatly reduce the time and human resources this process consumes. Space-time generalization refers to training a model with images and category labels from one place and period and then applying it to other places and periods that lack training data in order to extract the target categories. The essence of generalization is to construct a model that can extract features from a limited training set such that the feature representation is both flexible and stable. At present, the space-time generalization capability of deep learning models for crop image classification still needs to be strengthened.
Disclosure of Invention
The invention aims to provide a crop space-time generalization classification method and system based on deep learning that improve the space-time generalization of the recognition model.
In order to achieve the purpose, the invention provides the following scheme:
a crop space-time generalization classification method based on deep learning comprises the following steps:
collecting a plurality of remote sensing images of a set place in a set time period;
obtaining a real image corresponding to each remote sensing image through manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked;
constructing a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are connected in sequence;
training a residual error module by adopting a natural image data set, and taking the trained residual error module as a residual error module in the deep convolutional neural network to obtain a pre-training model;
training the pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model;
and classifying the crop area and the non-crop area of the remote sensing image to be classified by using the remote sensing image classification model.
Optionally, the training of the pre-training model by using the remote sensing images as input and using the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model specifically includes:
preprocessing the remote sensing images to obtain surface reflectivity images, wherein the preprocessing comprises radiometric calibration, atmospheric correction, geometric correction, band fusion, clipping and mosaicking;
and training the pre-training model by taking the earth surface reflectivity images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model.
Optionally, the real images corresponding to the remote sensing images are obtained through manual visual interpretation, the real image being an image in which crop areas and non-crop areas are marked, which specifically comprises:
and manually drawing a crop area on the ground surface reflectivity image by utilizing Arcmap software to obtain a real image corresponding to the ground surface reflectivity image.
Optionally, the acquiring a plurality of remote sensing images of a set place in a set time period specifically includes:
and collecting remote sensing images of set sites, set years, set crop seeding periods and remote sensing images of set growth periods.
Optionally, the residual error module comprises 10 sequentially connected residual error structure blocks, each residual error structure block being used for feature extraction; a dilated convolutional layer is connected after each of the last 3 of the 10 sequentially connected residual error structure blocks.
Optionally, the pyramid average pooling module comprises a first convolution layer, an up-sampling layer, a second convolution layer and a plurality of average pooling layers of different scales; the average pooling layers of different scales are connected with the input of the first convolution layer, the output of the first convolution layer is connected with the input of the up-sampling layer, and the output of the up-sampling layer is connected with the second convolution layer.
Optionally, the output layer comprises a Softmax classifier by which the input feature image is classified.
Optionally, the natural image dataset comprises an IMAGENET dataset.
Optionally, the training the pre-training model by using the surface reflectance image as an input and using the real image corresponding to each remote sensing image as an output to obtain a remote sensing image classification model specifically includes:
adopting a random stratified grouping method, and extracting 1/7 of the data from the training set as verification data; the training set comprises a plurality of remote sensing images and the real images corresponding to the remote sensing images;
and taking the surface reflectivity images as input and the real image corresponding to each remote sensing image as output, updating the weights in the pre-training model by a stochastic gradient descent algorithm, repeatedly performing forward-backward propagation calculation with a preset fixed learning rate and a preset batch size, iteratively optimizing the pre-training model, and determining the optimal weights through the minimum cross-entropy loss function value.
The invention discloses a crop space-time generalization classification system based on deep learning, which comprises:
the remote sensing image acquisition module is used for acquiring a plurality of remote sensing images of a set place within a set time period;
the sample marking module is used for obtaining real images corresponding to the remote sensing images through manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked;
the network model building module is used for building a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are connected in sequence;
the network model pre-training module is used for training the residual error module by adopting a natural image data set, and the trained residual error module is used as the residual error module in the deep convolutional neural network to obtain a pre-training model;
the network model training module is used for training the pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model;
and the remote sensing image classification module is used for classifying the crop area and the non-crop area of the remote sensing image to be classified by utilizing the remote sensing image classification model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
according to the method, a natural image data set is adopted to train a residual error module in the deep convolutional neural network, the trained residual error module is used as the residual error module in the deep convolutional neural network to obtain a pre-training model, then a remote sensing image sample is sampled to train the pre-training model to obtain a remote sensing image classification model, and therefore the space-time generalization of the remote sensing image classification model is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow chart of a deep learning-based crop spatio-temporal generalization classification method according to the present invention;
FIG. 2 shows the GF1 PMS surface reflectance images and the winter wheat sample plot distribution in the experimental areas for 2018-2020 in the example of the present invention;
FIG. 3 is a schematic diagram of a deep convolutional neural network model framework according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating classification results of different test sets according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the spatial distribution of the classification results of the spatial generalization subset according to the present invention;
FIG. 6 is a diagram illustrating the spatial distribution of the classification results of the temporal generalization subset according to the present invention;
FIG. 7 is a diagram illustrating the spatial distribution of the classification results of the spatio-temporal generalized subsets according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating comparison of classification results according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the OA results of DCNN and RF for each sample plot on different test sets in accordance with an embodiment of the present invention;
FIG. 10 is a schematic diagram of the PA results of DCNN and RF for each sample plot on different test sets according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the UA results of DCNN and RF for each sample plot on different test sets according to an embodiment of the present invention;
FIG. 12 is a graph showing the F1 score results of DCNN and RF for each sample plot on different test sets in accordance with an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a deep learning-based crop spatio-temporal generalization classification system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic flow chart of a deep learning-based crop spatio-temporal generalization classification method of the present invention, and as shown in fig. 1, a deep learning-based crop spatio-temporal generalization classification method includes:
step 101: and collecting a plurality of remote sensing images of the set place in the set time period.
Wherein, step 101 specifically includes:
and collecting remote sensing images of set sites, set years, set crop seeding periods and remote sensing images of set growth periods.
Step 102: obtaining real images corresponding to the remote sensing images by manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked.
The real image is a vector map.
(1) After the remote sensing images are preprocessed, the labeled sample vector (the real image) is converted into raster data with the same spatial resolution and then band-combined, in sequence, with the sowing-stage image and the vigorous-growth-stage image to obtain a training set containing 8 bands; (2) the images in the training set are sliced from left to right and from top to bottom by a Python program, with an overlap rate of 50% and a slice size of 256 × 256 pixels. 968 slices are finally obtained. During model training, the image slices are the input.
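As a minimal illustration of this slicing step, the following Python sketch cuts a stacked multi-band raster into 256 × 256 tiles with 50% overlap; the array contents and dimensions are placeholders, and the slice_image helper is hypothetical, not the patent's actual program:

```python
import numpy as np

def slice_image(image, tile=256, overlap=0.5):
    """Cut a (bands, H, W) array into tile x tile slices, moving
    left to right and top to bottom with the given overlap rate."""
    step = int(tile * (1 - overlap))  # 128-pixel stride for 50% overlap
    _, h, w = image.shape
    slices = []
    for top in range(0, h - tile + 1, step):
        for left in range(0, w - tile + 1, step):
            slices.append(image[:, top:top + tile, left:left + tile])
    return slices

# Placeholder for the band-combined training raster (8 bands here)
stacked = np.zeros((8, 2048, 2048), dtype=np.uint16)
tiles = slice_image(stacked)
print(len(tiles), tiles[0].shape)  # 225 tiles of shape (8, 256, 256)
```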
Wherein, step 102 specifically comprises:
and manually drawing the crop area on the ground surface reflectivity image by utilizing Arcmap software to obtain a real image corresponding to the ground surface reflectivity image.
Step 103: constructing a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are sequentially connected.
The residual error module comprises 10 sequentially connected residual error structure blocks, each used for feature extraction; a dilated convolutional layer is connected after each of the last 3 of the 10 sequentially connected residual error structure blocks.
The pyramid average pooling module comprises a first convolution layer, an up-sampling layer, a second convolution layer and a plurality of average pooling layers of different scales; the average pooling layers of different scales are connected with the input of the first convolution layer, the output of the first convolution layer is connected with the input of the up-sampling layer, and the output of the up-sampling layer is connected with the second convolution layer.
The output layer comprises a Softmax classifier, and the input feature images are classified through the Softmax classifier.
Step 104: and training the residual error module by adopting a natural image data set, and taking the trained residual error module as a residual error module in the deep convolutional neural network to obtain a pre-training model.
The natural image dataset comprises an IMAGENET dataset.
Step 105: and training a pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model.
Wherein, step 105 specifically comprises:
preprocessing the remote sensing images to obtain surface reflectivity images, wherein the preprocessing comprises radiometric calibration, atmospheric correction, geometric correction, band fusion, clipping and mosaicking;
and training the pre-training model with the surface reflectivity images as input and the real image corresponding to each remote sensing image as output to obtain the remote sensing image classification model.
Training the pre-training model with the surface reflectivity images as input and the real image corresponding to each remote sensing image as output to obtain the remote sensing image classification model specifically comprises the following steps:
A random stratified grouping method is adopted, and 1/7 of the data are extracted from the training set as verification data; the training set comprises a plurality of remote sensing images and the real images corresponding to the remote sensing images.
and (3) taking the earth surface reflectivity image as input, taking the real image corresponding to each remote sensing image as output, updating the weight in the pre-training model by adopting a random gradient algorithm, repeatedly carrying out forward-backward propagation calculation according to a preset fixed learning rate and a preset capacity of each batch, iteratively optimizing the pre-training model, determining an optimal weight through a minimum cross entropy loss function value, and carrying out prediction classification on the test set.
The test set comprises a space generalization set, a time generalization set and a space-time generalization set.
The space generalization set refers to data acquired at the same time as the training set but in a different area. The time generalization set refers to data acquired in the same area as the training set but at a different time. The space-time generalization set refers to data that differ from the training set in both space and time.
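A minimal sketch of the 1/7 validation hold-out described above; the choice of stratification key (here the dominant class of each slice) is an assumption, since the patent only states that a random stratified grouping method is used:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stratification key: the dominant class of each of the
# 968 training slices (placeholder values)
slice_keys = np.random.randint(0, 2, size=968)

# Seven stratified folds; holding out one fold yields the 1/7 validation set
skf = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)
train_idx, val_idx = next(skf.split(np.zeros(len(slice_keys)), slice_keys))
print(len(train_idx), len(val_idx))  # about 6/7 training, 1/7 validation
```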
Step 106: classifying the crop areas and non-crop areas of the remote sensing image to be classified by using the remote sensing image classification model, i.e., identifying the crop areas in the remote sensing image.
The following describes a crop space-time generalization classification method based on deep learning according to a specific embodiment of the present invention.
The study area of this embodiment comprises Nanhe District and Julu County in the south-central part of Hebei Province, spanning 114°36′-115°12′E and 36°55′-37°25′N with a total area of 1036 km². Both areas lie on the flat, open North China Plain, with elevations ranging from 1 m at the lowest to 83 m at the highest and an average of about 27 m. The region has a temperate continental monsoon climate with four distinct seasons in which rain and heat arrive together, and winter wheat is the main food crop.
According to the phenological stages of winter wheat (Table 1), winter wheat is sown in October each year (stage I) and grows vigorously from March to May of the following year (stage II). This study selects GF1 PMS remote sensing images covering the experimental area from October 2018 to May 2020 (Table 2); the acquired images are of good quality with little cloud cover, which facilitates the space-time generalization experiment. Images from October 2018 to March 2019 are used to identify 2019 winter wheat, and data from October 2019 to April 2020 are used to identify 2020 winter wheat.
TABLE 1 Phenological calendar of winter wheat in the experimental area
TABLE 2 Acquisition dates of imagery over the experimental area
GF1 PMS imagery is preprocessed using ENVI (The Environment for Visualizing Images, a complete remote sensing image processing platform), including radiometric calibration, atmospheric correction, geometric correction, fusion of the 2 m panchromatic band with the 8 m multispectral bands using a pansharpening algorithm, and clipping and mosaicking along administrative boundaries. This yields surface reflectance images with a spatial resolution of 2 m, comprising 4 bands, blue (0.45-0.52 µm), green (0.52-0.59 µm), red (0.63-0.69 µm) and near-infrared (0.77-0.89 µm), with a 16-bit unsigned integer data type.
The ground-truth data of this embodiment are obtained by manual visual interpretation of the high-resolution imagery. Three square sample plots of 5 km × 5 km were randomly selected in each experimental area; as shown in FIG. 2, S1, S2 and S3 are located in Nanhe District and T1, T2 and T3 in Julu County. By visually examining the stage-I and stage-II GF1 PMS 2 m surface reflectance images, the winter wheat cropland of each year is manually delineated with ArcMap software. Land cover within the plots other than wheat cropland is treated as the "other" class. FIG. 2 shows that the wheat distribution in S1, S2, T1 and T2 is rather scattered, whereas the wheat cropland in S3 and T3 is concentrated and almost fills the entire plot. Such a land cover dataset, imbalanced between areas, is therefore useful for evaluating the spatial generalization performance of the model. Crop rotation and conversion between cropland and non-cropland change the land cover pattern from year to year, and this also occurs in the study area: as shown in Table 3, the winter wheat planting area in the T2 plot decreased markedly, by 12.34%, between 2019 and 2020, while the other plots changed little. The image data of 2018-2019 and 2019-2020 are used to identify and monitor the inter-annual change of winter wheat planting so as to verify the temporal generalization capability of the model.
TABLE 3 Annual crop type coverage statistics for each sample plot
The stage-I and stage-II images and corresponding ground-truth data of 2020 winter wheat in the 3 Nanhe District sample plots are selected as training data and labeled samples, and the remaining 3 groups of image data serve as the test datasets: the space generalization test set identifies 2020 winter wheat in Julu County, the time generalization test set identifies 2019 winter wheat in Nanhe District, and the space-time generalization test set identifies 2019 winter wheat in Julu County. Before entering the DCNN model, the data need further processing: (1) the labeled sample vector is converted into raster data with the same spatial resolution and then band-combined, in sequence, with the sowing-stage image and the vigorous-growth-stage image, giving a training set containing 9 bands (the 8 image bands plus the label raster); (2) the training set is sliced from left to right and top to bottom using a Python program, with an overlap rate of 50% and a slice size of 256 × 256 pixels. 968 slices are finally obtained.
The deep convolutional neural network (DCNN) model is constructed on the basis of the pyramid scene parsing network (PSPNet) and adapted to remote sensing recognition of crops. The PSPNet model can fuse multi-level features, retain the local spatial information of ground objects and realize pixel-level semantic segmentation, which gives it strong representation capability and improves classification accuracy. The residual error structure blocks serve as the feature extractor of the model, increasing model depth while reducing computation, so that multi-level feature extraction is realized, the accuracy degradation that other networks suffer as they deepen is avoided, and the model is optimized; the pyramid pooling module can seamlessly splice features at different scales to match context information. The DCNN model improves on the original PSPNet model in three respects: (1) dilated convolutional layers are added to the residual error module; 10 residual error structure blocks mine the deep abstract features of the image, and dilated convolutional layers are added to the last 3 of the 10 blocks, improving the model's perception of the semantic information around each pixel and preventing the deep features from losing the spatial relationships between pixels; the output of each of these residual error structure blocks is connected to a dilated convolutional layer, the outputs of the first two dilated convolutional layers are connected to the next residual error structure block, and the output of the dilated convolutional layer attached to the last residual error structure block is connected to the pyramid average pooling module; (2) a pyramid average pooling module is adopted, in which 4 average pooling layers of different scales combine the context information of the deep features and improve the localization accuracy of target pixels; (3) the input is adjusted to eight channels: since this embodiment uses remote sensing images with 8 bands over 2 periods, the input module of the model is adjusted accordingly, changing the traditional three channels to eight.
The DCNN model framework is shown in FIG. 3 and mainly comprises 4 parts: (1) data input (input layer): training set slices of 256 × 256 pixels; (2) the residual error module, consisting of 10 residual error structure blocks, for feature extraction; (3) the pyramid average pooling module, consisting of 4 average pooling layers of different scales, for combining context information; (4) output (output layer): a Softmax classifier is trained with the features extracted by the DCNN model, and the images to be classified are input into the trained Softmax classifier for classification. The model structure parameters are listed in Table 4 below. The DCNN model of this embodiment runs on the Caffe framework (Convolutional Architecture for Fast Feature Embedding).
TABLE 4 DCNN model structure parameters
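The patent's implementation runs on Caffe and its exact layer parameters are given only in Table 4; purely for illustration, the following PyTorch sketch mirrors the structure just described: an 8-channel input, 10 residual blocks whose last 3 use dilated convolutions, a 4-scale pyramid average pooling module and a per-pixel Softmax output. The channel widths, dilation rates and pooling scales here are assumptions, not the patent's values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block; dilation > 1 enlarges the receptive field
    without losing spatial resolution (used in the last 3 blocks)."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.skip(x))

class PyramidAveragePooling(nn.Module):
    """4 average-pooling scales -> 1x1 conv -> upsample -> concat -> fuse."""
    def __init__(self, in_ch, scales=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s), nn.Conv2d(in_ch, in_ch // 4, 1))
            for s in scales)
        self.fuse = nn.Conv2d(in_ch + (in_ch // 4) * len(scales), in_ch, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                     align_corners=False) for stage in self.stages]
        return self.fuse(torch.cat(feats, dim=1))

class DCNN(nn.Module):
    def __init__(self, in_ch=8, n_classes=2):
        super().__init__()
        # 10 residual blocks; the last 3 use dilated convolutions
        widths = [64] * 4 + [128] * 3 + [256] * 3   # assumed channel widths
        dilations = [1] * 7 + [2, 2, 4]             # assumed dilation rates
        blocks, prev = [], in_ch
        for w_, d_ in zip(widths, dilations):
            blocks.append(ResidualBlock(prev, w_, dilation=d_))
            prev = w_
        self.backbone = nn.Sequential(*blocks)
        self.ppm = PyramidAveragePooling(prev)
        self.head = nn.Conv2d(prev, n_classes, 1)   # per-pixel class scores

    def forward(self, x):
        x = self.head(self.ppm(self.backbone(x)))
        return F.softmax(x, dim=1)                  # per-pixel class probabilities

model = DCNN()
probs = model(torch.randn(1, 8, 256, 256))
print(probs.shape)  # torch.Size([1, 2, 256, 256])
```

The (1, 2, 3, 6) pooling scales follow the common PSPNet convention; the patent states only that 4 scales of average pooling are used.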
For DCNN model training, a strategy of fine-tuning a pre-trained model is adopted for network learning. Fine-tuning a pre-trained network model means using a remote sensing image dataset to further optimize a model that has already been trained on a natural image dataset. Compared with natural image datasets (such as the IMAGENET dataset) containing more than 10 million samples, the existing public remote sensing satellite image sets (such as the RSSCN7, UC Merced and WHU-RS19 datasets) are quite small, with only thousands of samples, which makes it difficult to train a good deep learning model. Fine-tuning a network model pre-trained on natural images gives a better training result than training from scratch directly on remote sensing images, because the fine-tuned model can better learn the internal characteristics of satellite imagery.
A ResNet-50 model pre-trained on the IMAGENET training set is migrated to serve as the residual error module of the DCNN model, the training set is input to train the DCNN model, and the hyper-parameters are set by reference to Krizhevsky's learning strategy in the ILSVRC-2012 competition. To prevent over-training, a random stratified grouping method is adopted to extract 1/7 of the data from the training set as validation data for the DCNN model. The weights are updated by stochastic gradient descent (SGD) with a fixed learning-rate decay policy and a fixed learning rate of 1 × 10⁻¹⁰; forward-backward propagation is performed repeatedly to iteratively optimize the DCNN model, with a batch size of 16. In addition, the momentum and weight decay constants are set to 0.9 and 5 × 10⁻⁴ respectively, which smooths the weight update of each iteration and enhances the stability of network learning. To avoid overfitting caused by the small amount of training data, the model parameters pre-trained on the ImageNet classification dataset are used as the initialization weights of the DCNN. Finally, the optimal weights are found through the minimum cross-entropy loss function value Loss, and prediction classification is performed on the test sets.
$$\mathrm{Loss} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{C} y_{n,i}\,\log a_{n,i}$$
where N is the batch size (equal to 16); C is the total number of categories (C = 2 in this embodiment); a_{n,i} is the predicted probability that input image n belongs to class i; and y_{n,i} is the corresponding ground-truth label indicator.
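A minimal PyTorch training-loop sketch with the stated hyper-parameters (SGD, fixed learning rate 1 × 10⁻¹⁰, batch size 16, momentum 0.9, weight decay 5 × 10⁻⁴); the model and data are placeholder stand-ins, and nn.CrossEntropyLoss combines the Softmax with the cross-entropy Loss above:

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the pre-trained DCNN and the slice dataset
model = nn.Sequential(nn.Conv2d(8, 2, 3, padding=1))  # toy per-pixel classifier
images = torch.randn(16, 8, 256, 256)                 # one batch of 16 slices
labels = torch.randint(0, 2, (16, 256, 256))          # per-pixel class labels

optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-10,                 # fixed learning rate
                            momentum=0.9,             # smooths weight updates
                            weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()                     # Softmax + cross entropy

for step in range(3):                                 # a few illustrative iterations
    optimizer.zero_grad()
    logits = model(images)                            # forward propagation
    loss = criterion(logits, labels)                  # cross-entropy Loss
    loss.backward()                                   # backward propagation
    optimizer.step()                                  # SGD weight update
    print(step, loss.item())
```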
To verify the space-time generalization performance of the DCNN model (the trained model, i.e., the remote sensing image classification model), the random forest algorithm is selected as a comparison model. Random forest (RF) is a classifier that builds a bagging ensemble using decision trees as base learners. RF introduces random attribute selection into decision tree training and increases the diversity between individual learners through sample perturbation and attribute perturbation, thereby improving the generalization of the model. RF can process large amounts of data, avoids overfitting and has low computational overhead, and is therefore widely used. As one of the most successful ensemble learning methods in the remote sensing land cover classification task, RF is often used as a baseline model in research.
In this embodiment, the RF model, as the comparison model of the DCNN model, uses the same training set and test data as the DCNN model. Before entering the model, the training set needs further processing: (1) the labeled samples are converted into point data, and the point locations are used to extract the spectral values of the image bands; (2) the label within the sample plot and the corresponding spectral values form a one-dimensional feature vector. In total 16800 point samples are thus obtained, of which 9600 are winter wheat and 7200 other types. When training the RF model, the settings of 2 hyper-parameters must be considered: the number of features F at the best split point and the number of decision trees T. F is generally set to the square root of the number of features. For T, the study of RODRIGUEZ-GALIANO et al. found that a very low convergence value of the generalization error is reached when T reaches 100, so T is set to 100 in this embodiment. The other parameters take default values. The RF model calls the scikit-learn library (sklearn) and runs on the Python platform.
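The comparison RF model maps directly onto scikit-learn; a sketch with the hyper-parameters given above (T = 100 trees, F = the square root of the number of features, other parameters default). The feature array is a placeholder for the 16800 point samples:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder for the 16800 point samples: 8 spectral values each
# (two 4-band dates); label 1 = winter wheat, 0 = other
X = np.random.rand(16800, 8)
y = np.random.randint(0, 2, size=16800)

rf = RandomForestClassifier(n_estimators=100,     # T = 100 decision trees
                            max_features="sqrt")  # F = sqrt(number of features)
rf.fit(X, y)
print(rf.predict(X[:5]))
```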
The performance of the models is evaluated by comparing the 3 groups of test data predicted by each model with the corresponding real surface data. The overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA) and F1 score are used as evaluation indices:
$$OA = \frac{\sum_{i=1}^{n} P_{ii}}{N}$$

$$UA = \frac{P_{ii}}{N_r}$$

$$PA = \frac{P_{ii}}{N_t}$$

$$F1 = \frac{2 \times UA \times PA}{UA + PA}$$
where i denotes the category, n denotes the total number of categories, P_{ii} denotes the number of correctly classified pixels of class i, N denotes the total number of reference data pixels, N_r denotes the number of pixels classified as class i in the result, and N_t denotes the number of class-i pixels in the reference data.
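These indices follow directly from the confusion matrix; a short numpy sketch (the matrix values are made up for illustration):

```python
import numpy as np

# Rows = reference classes, columns = predicted classes (2 x 2 example)
cm = np.array([[900, 50],   # made-up pixel counts for illustration
               [30, 820]])

N = cm.sum()                         # total number of reference pixels
oa = np.trace(cm) / N                # overall accuracy
pa = np.diag(cm) / cm.sum(axis=1)    # producer's accuracy per class
ua = np.diag(cm) / cm.sum(axis=0)    # user's accuracy per class
f1 = 2 * ua * pa / (ua + pa)         # F1 score per class
print(oa, pa, ua, f1)
```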
The trained DCNN and RF models are used to extract the spatial distribution of winter wheat on the 3 test sets. FIG. 4 shows the overall classification results: both models can identify the general extent of winter wheat, but visually the DCNN model clearly predicts more winter wheat than the RF model on all 3 test sets. Table 5 gives the confusion matrix of each model for each test set classification result; the value in each column represents the proportion of the model's classified category relative to the real data, and the value in each row the proportion of the real data category relative to the classification result. Table 5 shows that the DCNN model has the better space-time generalization performance: the average proportion of winter wheat misclassified as other categories is 0.037, which is 0.264 lower than that of the RF model, a marked improvement.
TABLE 5 Confusion matrices of the classification results on the different test sets
The accuracy evaluation results in Table 6 are calculated from the confusion matrices. The DCNN model gives more stable, high-precision results: its OA and F1 score both exceed 0.90, whereas those of the RF model fall below 0.90. For OA, the DCNN model outperforms the RF model, averaging 0.947 over the 3 different space-time test sets with a standard deviation of only 0.023; the corresponding values for the RF model are 0.800 and 0.062. For winter wheat classification, the mean F1 score of the DCNN model is 0.937, 0.200 higher than the RF model, with a standard deviation of 0.032, 0.078 lower than the RF model. The DCNN model maintains high precision on all 3 space-time generalization sets and is clearly better than the RF model overall, while the RF model's classification accuracy drops sharply on the space-time generalization set, so its classification stability is poorer than that of the DCNN model.
TABLE 6 Classification accuracy evaluation indices for the different test sets
FIG. 5 shows the spatial distribution of the classification results on the space generalization subset; (a) and (b) in FIG. 5 are the classification result of the DCNN model and the corresponding error spatial distribution with reference to the real data, and (c) and (d) are the classification result of the RF model and the corresponding error spatial distribution.
In FIG. 5 (a) and (c), dark gray is winter wheat and light gray is other areas; in (b) and (d), dark gray marks correctly classified regions and light gray misclassified regions.
FIG. 6 shows the spatial distribution of the classification results on the time generalization subset; (a) and (b) in FIG. 6 are the classification result of the DCNN model and the corresponding error spatial distribution with reference to the real data, and (c) and (d) are the classification result of the RF model and the corresponding error spatial distribution.
In FIG. 6 (a) and (c), dark gray is winter wheat and light gray is other areas; in (b) and (d), dark gray marks correctly classified regions and light gray misclassified regions.
FIG. 7 shows the spatial distribution of the classification results on the space-time generalization subset; (a) and (b) in FIG. 7 are the classification result of the DCNN model and the corresponding error spatial distribution with reference to the real data, and (c) and (d) are the classification result of the RF model and the corresponding error spatial distribution.
In FIG. 7 (a) and (c), dark gray is winter wheat and light gray is other areas; in (b) and (d), dark gray marks correctly classified regions and light gray misclassified regions.
Visual comparison of each sub-area shows that the classification performance of the DCNN model is superior to that of the RF model and that it avoids the salt-and-pepper noise of shallow pixel-based classification models. Visually comparing the test plot results of each generalization set predicted by the 2 models with the corresponding real data gives FIG. 5, FIG. 6 and FIG. 7. Comparing the error spatial distribution maps of the DCNN and RF models shows that the errors of the DCNN classification results on the 3 generalization sets are mainly distributed at parcel edges, and the more concentrated the discretely fragmented parcels, the more errors there are, as in the T2 plots of the space generalization set and the space-time generalization set. The error distribution of the RF predictions follows the same rule while also exhibiting severe salt-and-pepper noise, with concentrated, conspicuous misclassified pixels appearing in all plots of every generalization set (FIGS. 5, 6 and 7). Owing to spectral confusion between multiple land cover types in remote sensing imagery and the spectral diversity within a single type, this problem is difficult for a simple pixel-by-pixel classification method to avoid. The DCNN model of this embodiment rarely shows this behavior: using convolution operations that take the spatial relationship between each pixel and its surroundings into account, it can predict complete winter wheat parcels. In space-time generalization classification the DCNN model is also more stable than the RF model: it extracts the winter wheat distribution completely on all 3 generalization sets, whereas the RF model identifies winter wheat accurately on the space generalization set and the time generalization set but suffers serious omission errors on the space-time generalization set, where most winter wheat pixels are classified as other categories, showing that the RF model lacks stability for classification across time and space.
FIG. 8 compares the classification results of 2 km × 2 km sub-regions. The black-framed regions in the first two columns of images mark sub-regions where the land cover type changes extensively; the black-framed region in the third column marks the real winter wheat distribution, that in the fourth column the classification result of the DCNN model, and that in the fifth column the locations where the RF model shows salt-and-pepper noise.
To further verify that the classification performance of the DCNN model is stronger than that of the RF model, the OA, winter wheat PA, winter wheat UA and winter wheat F1 score of each sample plot's classification result are compared one by one for the two models, giving FIG. 9, FIG. 10, FIG. 11 and FIG. 12. In every plot, the OA and PA of the DCNN model are higher than those of the RF model, with the largest gap between the two models appearing on the space-time generalization set. For UA, in contrast, the RF model is slightly higher than the DCNN model, but the difference is not as marked as for the previous 2 indices, so for the F1 score the DCNN model still maintains higher accuracy than the RF model. The DCNN model keeps stable high precision across all plots: its maximum accuracy variance between the different space-time datasets is only 0.047, on the PA index, whereas the corresponding value for the RF model is as high as 0.124. This demonstrates that the DCNN model has higher accuracy and stability than the RF model in space-time generalization classification.
To describe the stability and accuracy of the DCNN model in space-time generalization classification, the classification process of the DCNN model is examined by visualizing the feature representations of convolutional layers at different stages on the different space-time generalization subsets. In terms of classification results, with the same training set the DCNN model generalizes across space and time better than the traditional machine learning method. For the classification process, how the DCNN model gradually learns effective features on the different space-time datasets can be explained by visualizing the feature maps of convolutional layers at different stages. The sub-regions in FIG. 8 above whose land cover type changes in time or space are selected as inputs to show how the learned features change at each layer; the input images and output feature maps are 256 × 256 pixels. The land cover types of these sub-regions differ from the training set in time or space, so the space-time generalization capability of the model can be better evaluated. The feature maps of the designated convolutional layers corresponding to the sub-regions are visualized with a deconvolution operation; this process only visualizes the features extracted by the trained DCNN model and does not retrain the model.
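The patent visualizes the trained model's features with a deconvolution operation in Caffe; as a simpler illustration of the same idea of inspecting intermediate activations without retraining, a PyTorch forward hook can capture a chosen convolution layer's feature maps for one 256 × 256 input slice (the network here is a placeholder):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, padding=1))  # stand-in network

captured = {}
def hook(module, inputs, output):
    captured["features"] = output.detach()  # save the layer's feature maps

model[2].register_forward_hook(hook)        # attach to the chosen conv layer
_ = model(torch.randn(1, 8, 256, 256))      # one input slice, no re-training
print(captured["features"].shape)           # torch.Size([1, 32, 256, 256])
```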
Based on the research aim of studying the space-time generalization of deep learning models, the test set is first divided into a space generalization set, a time generalization set and a space-time generalization set, giving 3 datasets of different space-time scales; second, the construction of the convolutional neural network model DCNN is introduced, the mechanism of fine-tuning and pre-training is explained, and the parameter settings of model training are described; the prediction model is then selected according to the loss function and applied to the 3 generalization sets to obtain the spatial distribution of winter wheat; finally, the classification results are compared with those of the traditional RF classification method, with accuracy evaluation and result analysis. Overall, the proportion of misclassified winter wheat for the DCNN model is low, and its OA and winter wheat F1 score both exceed 0.90, so its result accuracy is higher than that of the RF model; moreover, the standard deviation of DCNN accuracy across the 3 generalization sets of different space-time scales is lower than that of the RF model, showing stronger stability. Visual comparison of each generalization set region shows that the classification performance of the DCNN model is superior to that of the RF model, avoiding the salt-and-pepper noise of shallow pixel-based models and generalizing well. To describe the stability and accuracy of the DCNN model in space-time generalization classification, its classification process is understood by visualizing the feature representations of convolutional layers at different stages on the different space-time generalization subsets, demonstrating that the DCNN model can capture semantic and detailed features that represent both the category attribution of winter wheat and details such as shape, position and boundary, so that the model maintains stable, strong generalization capability on different space-time test sets.
Fig. 13 is a schematic structural diagram of a deep learning-based crop spatio-temporal generalization classification system of the present invention, and as shown in fig. 13, a deep learning-based crop spatio-temporal generalization classification system includes:
the remote sensing image acquisition module 201 is used for acquiring a plurality of remote sensing images of a set place within a set time period;
the sample marking module 202 is used for obtaining real images corresponding to the remote sensing images through manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked;
the network model construction module 203 is used for constructing a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are connected in sequence;
a network model pre-training module 204, configured to train a residual module with a natural image data set, and use the trained residual module as a residual module in a deep convolutional neural network to obtain a pre-training model;
the network model training module 205 is used for training a pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model;
the remote sensing image classification module 206 is configured to classify the remote sensing image to be classified into an crop region and a non-crop region by using the remote sensing image classification model.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A crop space-time generalization classification method based on deep learning is characterized by comprising the following steps:
collecting a plurality of remote sensing images of a set place in a set time period;
obtaining real images corresponding to the remote sensing images through manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked;
constructing a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are connected in sequence;
training a residual error module by adopting a natural image data set, and taking the trained residual error module as a residual error module in the deep convolutional neural network to obtain a pre-training model;
training the pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model;
and classifying the crop area and the non-crop area of the remote sensing image to be classified by using the remote sensing image classification model.
2. The crop space-time generalization classification method based on deep learning of claim 1, wherein the pre-training model is trained by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model, specifically comprising:
preprocessing the remote sensing images to obtain surface reflectivity images, wherein the preprocessing comprises radiometric calibration, atmospheric correction, geometric correction, band fusion, clipping and mosaicking;
and training the pre-training model by taking the earth surface reflectivity images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model.
3. The crop space-time generalization classification method based on deep learning of claim 2, wherein the real images corresponding to the remote sensing images are obtained by manual visual interpretation, the real image being an image in which crop areas and non-crop areas are marked, specifically comprising the following steps:
and manually drawing a crop area on the ground surface reflectivity image by utilizing Arcmap software to obtain a real image corresponding to the ground surface reflectivity image.
4. The deep learning-based crop space-time generalization classification method according to claim 1, wherein the acquiring of the plurality of remote sensing images of the set place in the set time period specifically comprises:
and collecting remote sensing images of set sites, set years, set crop seeding periods and remote sensing images of set growth periods.
5. The deep learning-based crop space-time generalized classification method according to claim 1, wherein the residual error module comprises 10 sequentially connected residual error structure blocks, each residual error structure block being used for feature extraction; and a dilated convolutional layer is connected after each of the last 3 of the 10 sequentially connected residual error structure blocks.
6. The deep learning based crop spatio-temporal generalization classification method of claim 1, wherein the pyramid average pooling module comprises a first convolution layer, an up-sampling layer, a second convolution layer and a plurality of average pooling layers of different scales; the average pooling layers of different scales are connected with the input of the first convolution layer, the output of the first convolution layer is connected with the input of the up-sampling layer, and the output of the up-sampling layer is connected with the second convolution layer.
7. The deep learning-based crop spatio-temporal generalization classification method of claim 1, wherein the output layer comprises a Softmax classifier by which the input feature images are classified.
8. The deep learning based crop spatio-temporal generalization classification method of claim 1, wherein the natural image dataset comprises an IMAGENET dataset.
9. The deep learning-based crop space-time generalization classification method according to claim 1, wherein the pre-training model is trained by taking the surface reflectance images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model, specifically comprising:
adopting a random stratified grouping method, and extracting 1/7 of the data from the training set as verification data; the training set comprises a plurality of remote sensing images and the real images corresponding to the remote sensing images;
and taking the surface reflectivity images as input and the real image corresponding to each remote sensing image as output, updating the weights in the pre-training model by a stochastic gradient descent algorithm, repeatedly performing forward-backward propagation calculation with a preset fixed learning rate and a preset batch size, iteratively optimizing the pre-training model, and determining the optimal weights through the minimum cross-entropy loss function value.
10. A crop space-time generalization classification system based on deep learning is characterized by comprising:
the remote sensing image acquisition module is used for acquiring a plurality of remote sensing images of a set place within a set time period;
the sample marking module is used for obtaining real images corresponding to the remote sensing images through manual visual interpretation; the real image is an image in which crop areas and non-crop areas are marked;
the network model building module is used for building a deep convolutional neural network; the deep convolutional neural network comprises an input layer, a residual error module, a pyramid average pooling module and an output layer which are connected in sequence;
the network model pre-training module is used for training the residual error module by adopting a natural image data set, and the trained residual error module is used as the residual error module in the deep convolutional neural network to obtain a pre-training model;
the network model training module is used for training the pre-training model by taking the remote sensing images as input and the real images corresponding to the remote sensing images as output to obtain a remote sensing image classification model;
and the remote sensing image classification module is used for classifying the crop area and the non-crop area of the remote sensing image to be classified by utilizing the remote sensing image classification model.
CN202110826057.7A 2021-07-21 2021-07-21 Deep learning based crop space-time generalization classification method and system Pending CN113469122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110826057.7A CN113469122A (en) 2021-07-21 2021-07-21 Deep learning based crop space-time generalization classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110826057.7A CN113469122A (en) 2021-07-21 2021-07-21 Deep learning based crop space-time generalization classification method and system

Publications (1)

Publication Number Publication Date
CN113469122A true CN113469122A (en) 2021-10-01

Family

ID=77881591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110826057.7A Pending CN113469122A (en) 2021-07-21 2021-07-21 Deep learning based crop space-time generalization classification method and system

Country Status (1)

Country Link
CN (1) CN113469122A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581425A (en) * 2022-03-10 2022-06-03 四川大学 Myocardial segment defect image processing method based on deep neural network
CN117094430A (en) * 2023-07-19 2023-11-21 青海师范大学 Crop distribution prediction method, system, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906537A (en) * 2021-02-08 2021-06-04 北京艾尔思时代科技有限公司 Crop identification method and system based on convolutional neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906537A (en) * 2021-02-08 2021-06-04 北京艾尔思时代科技有限公司 Crop identification method and system based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DUJUAN ZHANG et al.: "A generalized approach based on convolutional neural networks for large area cropland mapping at very high resolution", Remote Sensing of Environment, vol. 247, no. 111912, pages 1-23 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581425A (en) * 2022-03-10 2022-06-03 四川大学 Myocardial segment defect image processing method based on deep neural network
CN114581425B (en) * 2022-03-10 2022-11-01 四川大学 Myocardial segment defect image processing method based on deep neural network
CN117094430A (en) * 2023-07-19 2023-11-21 青海师范大学 Crop distribution prediction method, system, equipment and medium
CN117094430B (en) * 2023-07-19 2024-04-26 青海师范大学 Crop distribution prediction method, system, equipment and medium

Similar Documents

Publication Publication Date Title
Chen et al. Spatially and temporally weighted regression: A novel method to produce continuous cloud-free Landsat imagery
CN106780091B (en) Agricultural disaster information remote sensing extraction method based on vegetation index time-space statistical characteristics
CN109409261B (en) Crop classification method and system
CN106951836B (en) crop coverage extraction method based on prior threshold optimization convolutional neural network
CN111709379A (en) Remote sensing image-based hilly area citrus planting land plot monitoring method and system
CN109478232A (en) The identification of weeds in natural environment
CN107392130A (en) Classification of Multispectral Images method based on threshold adaptive and convolutional neural networks
CN110020635A (en) Growing area crops sophisticated category method and system based on unmanned plane image and satellite image
CN113609889B (en) High-resolution remote sensing image vegetation extraction method based on sensitive characteristic focusing perception
CN109740483A (en) A kind of rice growing season detection method based on deep-neural-network
CN113392775A (en) Sugarcane seedling automatic identification and counting method based on deep neural network
CN110728197B (en) Single-tree-level tree species identification method based on deep learning
Ringland et al. Characterization of food cultivation along roadside transects with Google Street View imagery and deep learning
Su et al. Machine learning-assisted region merging for remote sensing image segmentation
CN113469122A (en) Deep learning based crop space-time generalization classification method and system
CN112766155A (en) Deep learning-based mariculture area extraction method
CN113657326A (en) Weed detection method based on multi-scale fusion module and feature enhancement
CN109063660B (en) Crop identification method based on multispectral satellite image
CN113205014B (en) Time sequence data farmland extraction method based on image sharpening
CN116543316B (en) Method for identifying turf in paddy field by utilizing multi-time-phase high-resolution satellite image
CN115170979A (en) Mining area fine land classification method based on multi-source data fusion
CN115641412A (en) Hyperspectral data-based three-dimensional semantic map generation method
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
CN112906537A (en) Crop identification method and system based on convolutional neural network
CN117197668A (en) Crop lodging level prediction method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination