CN115984635A - Multi-source remote sensing data classification model training method, classification method and electronic equipment - Google Patents


Info

Publication number
CN115984635A
Authority
CN
China
Prior art keywords
training
generation network
common
discriminator
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310273286.XA
Other languages
Chinese (zh)
Other versions
CN115984635B (en)
Inventor
王建步
高云浩
朱文博
马元庆
胡亚斌
秦华伟
宋秀凯
李伟
宋莎莎
隋傅
王玮云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
First Institute of Oceanography MNR
Shandong Marine Resource and Environment Research Institute
Original Assignee
Beijing Institute of Technology BIT
First Institute of Oceanography MNR
Shandong Marine Resource and Environment Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT, First Institute of Oceanography MNR, Shandong Marine Resource and Environment Research Institute
Priority claimed from CN202310273286.XA
Publication of CN115984635A
Application granted
Publication of CN115984635B
Legal status: Active
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method and a classification method for a multi-source remote sensing data classification model, and electronic equipment, belonging to the technical field of image processing. The classification model comprises a feature generation network and a ground-object classifier, wherein the feature generation network is used for extracting common features and specific features of multi-source wetland remote sensing data. The training method comprises the following steps: performing at least one round of alternate training on a discriminator and the feature generation network; in the discriminator training stage, minimizing the discriminator loss function so that the discriminator can perform modal classification on the common features; and in the feature generation network training stage, maximizing the discriminator loss function so that the common features and the specific features become linearly independent. Through adversarial learning, the common features cannot be distinguished by the modal classifier, while the specific features are linearly independent of, and complementary in advantage to, the common features, so that redundancy of the multi-source features is avoided to a certain degree.

Description

Multi-source remote sensing data classification model training method, classification method and electronic equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-source remote sensing data classification model training method, a classification method and electronic equipment.
Background
With the launch of high-resolution series satellites, remote sensing data holdings have become increasingly abundant, providing favorable conditions for coastal wetland remote sensing monitoring. Remote sensing data often comprises a variety of heterogeneous images, such as spectral images, lidar images, and synthetic aperture radar images. Taking spectral images as an example, they may include a Hyperspectral Image (HSI) and a Multispectral Image (MSI), where the hyperspectral image can provide spectral information from the visible spectrum through to the infrared. The hyperspectral image can therefore reflect the material characteristics of the observed object, facilitating fine identification of ground objects. However, to meet the signal-to-noise ratio requirement, the spatial resolution of a hyperspectral imaging system is relatively low. In contrast, the spectral resolution of a multispectral image is relatively low while its spatial texture information is richer, so the multispectral image can complement the hyperspectral image. Consequently, when massive remote sensing data is put into application, there is an urgent need to classify multi-source remote sensing data automatically, rapidly, and over large areas in a collaborative manner.
In recent years, much research on multi-source remote sensing data collaborative classification has emerged. These methods are mainly divided into data fusion methods and feature fusion methods; considering the limitations of data fusion methods on heterogeneous data, feature fusion has become the most active area at present. In particular, feature fusion methods based on deep learning can obtain the dominant features of multi-source data through automatic feature learning, and finally perform feature fusion through feature stacking, addition, or attention mechanism algorithms. However, multi-source remote sensing data is highly redundant, which causes negative gain and makes accurate classification of wetland ground objects difficult.
Therefore, how to improve the precision of wetland ground-object classification has become an urgent technical problem to be solved.
Disclosure of Invention
In order to solve the technical problem set forth in the background art, namely how to improve the precision of wetland ground-object classification in the prior art, the application provides a multi-source remote sensing data classification model training method, a classification method, and electronic equipment.
According to a first aspect, an embodiment of the present application provides a training method for a multi-source remote sensing data classification model, where the classification model includes a feature generation network and a ground-object classifier, and the feature generation network is used to extract common features and specific features of multi-source wetland remote sensing data. The training method includes: performing at least one round of alternate training on a discriminator and the feature generation network to obtain a trained feature generation network, wherein each round of alternate training comprises at least one discriminator training stage and at least one feature generation network training stage; in the discriminator training stage, the discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; and in the feature generation network training stage, the discriminator loss function is maximized until the common features generated by the feature generation network cannot be modally classified by the discriminator and the common features and the specific features are linearly independent.
Optionally, the training samples of the discriminators in the round of alternating training are the common features and the specific features extracted by the feature generation network after the previous round of alternating training is completed.
Optionally, the process of training the discriminator for the first time includes: extracting initial common characteristics and initial specific characteristics of the multi-source wetland remote sensing data by using an initial characteristic generation network; and training the discriminator by utilizing the initial common characteristic and the initial specific characteristic, so that the trained discriminator can carry out modal classification on the initial common characteristic, and the distance between the common characteristic and the specific characteristic is smaller than a preset distance.
Optionally, the performing at least one round of alternating training on the discriminator and the feature generation network includes: minimizing a loss function of a modal classifier at a training stage of a discriminator so that the modal classifier can carry out modal classification on common features, and simultaneously minimizing a mean square error loss function so that the distance between the common features and the specific features is smaller than a preset distance; and maximizing the loss function of the modal classifier in a feature generation network training stage until the common feature generated by the feature generation network cannot be subjected to modal classification by the modal classifier trained in the current round, and simultaneously maximizing the mean square error loss function until the common feature and the specific feature are linearly independent.
Optionally, the training method of the multi-source remote sensing data classification model further includes: constructing a characteristic reconstruction network, wherein the characteristic reconstruction network is used for reconstructing the common characteristic and the specific characteristic to the original characteristic of the multi-source wetland remote sensing data to obtain a reconstructed common characteristic and a reconstructed specific characteristic; and when the discriminator and the feature generation network are alternately trained for at least one round, the reconstruction network is utilized for constraint, so that the common features and the specific features extracted by the feature generation network can reconstruct the original features of the multi-source wetland remote sensing data.
Optionally, after obtaining the trained feature generation network, the method further includes: training the ground-object classifier by using the common features and the specific features extracted by the trained feature generation network, to obtain the trained ground-object classifier.
According to a second aspect, an embodiment of the present application provides a multi-source remote sensing data classification method, including: acquiring multi-source remote sensing data to be classified; and classifying the multi-source remote sensing data to be classified by using the multi-source remote sensing data classification model trained by the multi-source remote sensing data classification model training method in any one of the first aspects to obtain a classification result.
According to a third aspect, an embodiment of the present application provides a training apparatus for a multi-source remote sensing data classification model, including: the network generation module is used for constructing a feature generation network, and the feature generation network is used for generating common features and specific features of the multi-source wetland remote sensing data; the training module is used for carrying out at least one round of alternate training on the discriminator and the feature generation network to obtain the trained feature generation network, wherein each round of alternate training comprises at least one stage of discriminator training and at least one stage of feature generation network training, and a discriminator loss function is minimized in the stage of discriminator training, so that the discriminator can carry out modal classification on common features, and the distance between the common features and the specific features is smaller than a preset distance; and maximizing the loss function of the discriminator in the training stage of the feature generation network until the common characteristic generated by the feature generation network can not be subjected to modal classification by the discriminator, and the common characteristic and the specific characteristic are linearly independent.
According to a fourth aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus, and the memory is used for storing a computer program; the processor is configured to execute the training method for the multi-source remote sensing data classification model according to any one of the first aspect and/or the multi-source remote sensing data classification method according to the second aspect by operating the computer program stored in the memory.
According to a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is configured to execute the multi-source remote sensing data classification model training method according to any one of the first aspect and/or the multi-source remote sensing data classification method according to the second aspect when running.
The method comprises: constructing a feature generation network for extracting specific features and common features; performing at least one round of alternate training on a discriminator and the feature generation network; minimizing the discriminator loss function in the discriminator training stage, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; and maximizing the discriminator loss function in the feature generation network training stage until the common features generated by the feature generation network cannot be modally classified by the discriminator. Considering the data characteristics of the hyperspectral image and the multispectral image, a specific and common feature generation network is constructed in a targeted manner. Through a max-min adversarial learning strategy, it is ensured that the common features cannot be distinguished by the modal classifier, and at the same time that the specific features and the common features are linearly independent; that is, the common features contain no modality-related information and complement the advantages of the specific features, so that redundancy of the multi-source features is avoided to a certain extent.
Further, a reconstruction network is designed to reconstruct the common features and the specific features, which preserves the integrity of the common and specific features and prevents shortcut optimization of the network model.
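To make the reconstruction constraint concrete, here is a minimal NumPy sketch that treats the reconstruction network as a single linear layer over the concatenated common and specific features and uses the mean squared reconstruction error as the training constraint. The layer shape, toy dimensions, and all variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_loss(common, specific, original, W, b):
    """Reconstruct the original feature from the concatenated common and
    specific features with one linear layer, and return the mean squared
    reconstruction error used to constrain the feature generation network."""
    z = np.concatenate([common, specific], axis=-1)   # (n, 2d)
    recon = z @ W + b                                 # (n, d_orig)
    return float(np.mean((recon - original) ** 2))

# toy shapes: 8 samples, 16-dim common/specific, 32-dim original feature
common   = rng.normal(size=(8, 16))
specific = rng.normal(size=(8, 16))
original = rng.normal(size=(8, 32))
W = rng.normal(size=(32, 32)) * 0.1
b = np.zeros(32)

loss = reconstruction_loss(common, specific, original, W, b)
print(loss)
```

Minimizing this loss alongside the adversarial objectives forces the extracted pair (common, specific) to retain enough information to rebuild the original feature, which is what rules out the degenerate "shortcut" solutions mentioned above.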
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment of a multi-source remote sensing data classification model training method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a multi-source remote sensing data classification model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a spatio-spectral feature generation network architecture in an embodiment of the present application;
FIG. 4 is a schematic diagram of a modal classifier/reconstruction network architecture in an embodiment of the present application;
FIG. 5 is a schematic flow chart of another multi-source remote sensing data classification model training method according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a multi-source remote sensing data classification model training method with a reconstruction network added, according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training apparatus for a multi-source remote sensing data classification model according to an embodiment of the present application;
fig. 8 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings, in which the same reference numerals indicate the same or structurally similar but functionally identical elements.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein and, therefore, the scope of the present invention is not limited by the specific embodiments disclosed below.
In the following description, various aspects of the invention will be described, but it will be apparent to those skilled in the art that the invention may be practiced with only some or all of the structures or processes of the invention. Specific numbers, configurations and sequences are set forth in order to provide clarity of explanation, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features have not been set forth in detail in order not to obscure the invention.
Based on this, the application provides a multi-source remote sensing data classification model training method, which can be applied to a hardware environment formed by a terminal 102 and a server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal 102 through a network. The server may provide services for the terminal or for a client installed on the terminal; a database may be provided on the server or independently from it to provide data storage services for the server 104, and the server may also handle cloud services. The network includes but is not limited to a wide area network, a metropolitan area network, or a local area network, and the terminal 102 is not limited to a PC, a mobile phone, a tablet computer, etc. The adversarial-learning-based multi-source remote sensing data classification model training method of the embodiment of the application can be executed by the server 104, by the terminal 102, or by both the server 104 and the terminal 102 together; when executed by the terminal 102, it may also be executed by a client installed on it.
Taking the terminal 102 and/or the server 104 to execute the multi-source remote sensing data classification model training method in this embodiment, the classification model training method is a multi-source remote sensing data classification model training method based on counterstudy as an example, fig. 2 is a schematic flow chart of an optional multi-source remote sensing data classification model training method according to an embodiment of the present application, and as shown in fig. 2, the flow of the method may include the following steps:
S202, constructing a feature generation network, wherein the feature generation network is used for generating common features and specific features of the multi-source wetland remote sensing data. As an exemplary embodiment, the multi-source wetland remote sensing data can comprise a plurality of heterogeneous image data, for example spectral images, lidar images, and synthetic aperture radar images. In this embodiment, the classification model includes a feature generation network and a ground-object classifier, and it is the feature generation network that is trained here.
In this embodiment, a spatio-spectral feature generation network may be used for extraction. For example, the spatio-spectral feature generation network includes 2N groups of extraction networks, where N is the number of sources of the wetland remote sensing data: N groups of extraction networks are used to extract the specific features, and the other N groups are used to extract the common features, with the common-feature generation networks sharing weights.
As an exemplary embodiment, the spatio-spectral feature generation network structure shown in fig. 3 may include a plurality of residual blocks, and the structure of each residual block may be input layer → first convolution layer → batch normalization layer → second convolution layer → batch normalization layer → output layer.
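The residual block described above can be sketched as follows. For brevity the convolutions are pointwise (1×1, implemented as a matrix product over the channel dimension) and batch normalization takes its plain normalize-and-affine form; the identity skip connection, the layer sizes, and the absence of an activation are assumptions beyond the conv/BN ordering stated in the text.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # normalize each channel over the batch and spatial axes, then scale/shift
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def residual_block(x, W1, W2):
    """input -> conv1 -> BN -> conv2 -> BN, plus the identity skip.
    x: (batch, h, w, c); W1, W2: (c, c) pointwise convolution kernels."""
    y = batch_norm(x @ W1)          # first convolution + batch normalization
    y = batch_norm(y @ W2)          # second convolution + batch normalization
    return x + y                    # residual (skip) connection

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 5, 5, 8))   # two 5x5 patches with 8 channels
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
out = residual_block(x, W1, W2)
print(out.shape)
```

Because the skip path adds the input back unchanged, the block preserves the feature map shape, which is what lets several such blocks be stacked in each branch of the generation network.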
After the hyperspectral and multispectral images are obtained, the k × k neighborhood around the pixel point at the corresponding spatial position in the input hyperspectral and multispectral images is taken to represent the spatio-spectral information of the central pixel, and is used as the input stacked image block of the spatio-spectral feature generation network, where k is an odd number not less than 5. Image blocks of the hyperspectral and multispectral data are thereby obtained, with X_H and X_M respectively denoting the input stacked image blocks.
And sequentially performing convolution and batch normalization operations for M times on the stacked image blocks to obtain the specific features and the common features. Illustratively, the common features may be initial common features extracted through the feature generation network.
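The neighborhood extraction step (an odd k × k window, k not less than 5, around each pixel) can be sketched as follows; the reflect-padding at image borders and the array layout are assumptions made so the example is self-contained.

```python
import numpy as np

def extract_patch(image, row, col, k=5):
    """Take the k x k neighborhood around pixel (row, col) as the
    spatio-spectral representation of that central pixel.
    image: (H, W, bands); k must be odd and >= 5, as in the text.
    Borders are reflect-padded (an assumption) so edge pixels
    still receive a full patch."""
    assert k >= 5 and k % 2 == 1, "k must be an odd number not less than 5"
    r = k // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + k, col:col + k, :]

rng = np.random.default_rng(2)
hsi = rng.normal(size=(32, 32, 64))   # toy hyperspectral cube, 64 bands
patch = extract_patch(hsi, 0, 0, k=5)
print(patch.shape)
```

The center of the returned patch is the pixel being classified; the surrounding rows, columns, and all bands supply its spatio-spectral context.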
As an exemplary embodiment, the multi-source wetland remote sensing data can be illustrated by taking a hyperspectral image and a multispectral image as examples. In this embodiment, a feature generation network may first be constructed to extract the common features and the specific features of the hyperspectral image and the multispectral image. Illustratively, the k × k neighborhood pixels around the pixel points at the corresponding spatial positions in the input hyperspectral and multispectral images are taken to represent the spatio-spectral information of the central pixel, thereby obtaining image blocks of the hyperspectral and multispectral data, where X_H and X_M denote the input stacked image blocks of the hyperspectral image and the multispectral image, respectively.
A convolution operation is performed on the stacked image blocks to match the channel dimensions; the specific calculation process is as follows:

F_H = X_H * W_1 + b_1
F_M = X_M * W_2 + b_2

where F_H and F_M are the output features; W_1 is the weight matrix of the convolution layer used to convolve the input stacked image blocks of the hyperspectral image, and W_2 is the weight matrix of the convolution layer used to convolve the input stacked image blocks of the multispectral image; b_1 is the bias of the convolution layer for the hyperspectral image blocks, and b_2 is the bias of the convolution layer for the multispectral image blocks.
The features of the hyperspectral image and the multispectral image extracted by the feature generation network may be:

S_H = G(F_H; θ_s^H),  S_M = G(F_M; θ_s^M)
C_H = G(F_H; θ_c^H),  C_M = G(F_M; θ_c^M)

where θ_s^H and θ_s^M respectively denote the parameters of the specific-feature generation networks; S_H and S_M respectively denote the specific features of the hyperspectral and multispectral modalities; θ_c^H and θ_c^M respectively denote the parameters of the common-feature generation networks, whose weights are shared; and C_H and C_M respectively denote the common features of the hyperspectral and multispectral modalities.
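The split into per-modality specific generators and a weight-shared common generator can be sketched as follows. A single linear-plus-ReLU layer stands in for each stack of residual blocks, and all dimensions are toy assumptions; only the parameter-sharing pattern reflects the text.

```python
import numpy as np

rng = np.random.default_rng(4)

def generator(f, W):
    # one linear + ReLU layer standing in for a stack of residual blocks
    return np.maximum(f @ W, 0.0)

d = 32
# separate parameters per modality for the specific features ...
W_spec_h = rng.normal(size=(d, d)) * 0.1
W_spec_m = rng.normal(size=(d, d)) * 0.1
# ... but ONE shared parameter set for the common features
W_common = rng.normal(size=(d, d)) * 0.1

Fh = rng.normal(size=(8, d))          # hyperspectral branch features F_H
Fm = rng.normal(size=(8, d))          # multispectral branch features F_M

S_h, S_m = generator(Fh, W_spec_h), generator(Fm, W_spec_m)
C_h, C_m = generator(Fh, W_common), generator(Fm, W_common)  # weights shared
print(S_h.shape, C_h.shape)
```

Sharing W_common across the two branches biases C_H and C_M toward modality-invariant structure even before the adversarial training below begins.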
And S204, performing at least one round of alternate training on the discriminator and the feature generation network to obtain the trained feature generation network.
Each round of alternate training comprises at least one discriminant training stage and at least one feature generation network training stage, wherein a discriminant loss function is minimized in the discriminant training stage, so that the discriminant can carry out modal classification on common features, and the distance between the common features and the specific features is smaller than a preset distance; and maximizing the loss function of the discriminator in the training stage of the feature generation network until the common characteristic generated by the feature generation network can not be subjected to modal classification by the discriminator, and the common characteristic and the specific characteristic are linearly independent.
As an exemplary embodiment, the input of the initial or pre-trained feature generation network is multi-source remote sensing data, so the extracted common features tend to carry modal information; for example, common features extracted from the hyperspectral image and the multispectral image tend to carry the hyperspectral and multispectral modalities respectively, and the correlation between the common features and the specific features tends to be high. It is therefore difficult to extract true common features, and specific features that are linearly independent of the common features. In this embodiment, the training process of the feature generation network is thus constrained by the discriminator: the trained discriminator can classify the modal information of the common features, and the distance between the common features and the specific features can be smaller than a preset distance, i.e., the common features and the specific features are close, meaning their correlation is high.
Thus, training of the feature generation network may be aided by the discriminator. At least one round of alternate training is performed on the discriminator and the feature generation network, where each round of alternate training comprises at least one discriminator training stage and at least one feature generation network training stage. For example, the training process may be: a discriminator training stage, then a feature generation network training stage, and so on, as one round of alternate training, repeated multiple times, so that the common features extracted by the finally obtained feature generation network are modality-irrelevant, i.e., cannot be classified by the preceding discriminator, and the obtained common features and specific features are linearly independent.
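The alternating minimize/maximize structure can be sketched as follows. A logistic modal classifier stands in for the discriminator and a single linear map for the feature generation network; the learning rates, dimensions, and number of rounds are toy assumptions illustrating only that the same loss is minimized in the discriminator stage and maximized (by gradient ascent) in the generator stage.

```python
import numpy as np

rng = np.random.default_rng(5)

def discriminator_loss(common, labels, w):
    """Cross-entropy of a logistic modal classifier on common features;
    label 0 = hyperspectral, 1 = multispectral (toy convention)."""
    p = 1.0 / (1.0 + np.exp(-(common @ w)))
    return float(-np.mean(labels * np.log(p + 1e-9)
                          + (1 - labels) * np.log(1 - p + 1e-9)))

def grad_w(common, labels, w):
    # gradient of the loss with respect to the discriminator weights
    p = 1.0 / (1.0 + np.exp(-(common @ w)))
    return common.T @ (p - labels) / len(labels)

d, lr = 16, 0.5
w = np.zeros(d)                                  # discriminator parameters
theta = rng.normal(size=(d, d)) * 0.1            # feature generator parameters
x = rng.normal(size=(64, d))                     # toy input features
labels = (rng.random(64) > 0.5).astype(float)    # modality labels

for _ in range(50):
    common = x @ theta
    # discriminator stage: MINIMIZE the discriminator loss (gradient descent)
    w -= lr * grad_w(common, labels, w)
    # generator stage: MAXIMIZE the same loss (gradient ascent via chain rule)
    p = 1.0 / (1.0 + np.exp(-(common @ w)))
    g_theta = x.T @ np.outer(p - labels, w) / len(labels)
    theta += lr * 0.1 * g_theta

final = discriminator_loss(x @ theta, labels, w)
print(final)
```

At equilibrium the discriminator can do no better than chance on the common features, which is exactly the "cannot be modally classified" stopping condition described in the text.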
A feature generation network for extracting specific features and common features is constructed, and at least one round of alternate training is performed on the discriminator and the feature generation network: in the discriminator training stage the discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; in the feature generation network training stage the discriminator loss function is maximized until the common features generated by the feature generation network cannot be modally classified by the discriminator. Considering the data characteristics of the hyperspectral image and the multispectral image, a specific and common feature generation network is constructed in a targeted manner. Through the max-min adversarial learning strategy, the common features cannot be distinguished by the modal classifier, while the specific features and the common features remain linearly independent; that is, the common features contain no modality-related information and complement the advantages of the specific features, so that redundancy of the multi-source features is avoided to a certain extent.
The specific training method will be described in detail below, referring to the training process of the classification model shown in fig. 4:
In this embodiment, the discriminator may include a modal classifier for classifying the modalities of the common features, and a mean square error loss function for constraining the distance between the common features and the specific features. In the discriminator training stage, the loss function of the modal classifier is minimized until the modal classifier can distinguish the modal labels of the common features; in the feature generation network training stage, the loss function of the modal classifier is maximized until the common features extracted through the feature generation network can no longer be classified by the modal classifier.
Similarly, for the mean square error loss function: in the discriminator training stage, the mean square error loss function is minimized until the common features and the specific features are close, i.e., highly correlated; in the feature generation network training stage, the mean square error loss function is maximized until the common features and the specific features extracted by the feature generation network are linearly independent. In this embodiment, "the common features and the specific features are linearly independent" can be understood as the similarity of the common and specific features being smaller than a preset value, or their distance being larger than a preset value. The preset value can be determined based on the task or set manually; no limitation is made in this embodiment.
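The two quantities involved in this constraint can be sketched as follows: the mean square error that is minimized/maximized during training, and a cosine-similarity check that makes the "similarity smaller than a preset value" reading of linear independence concrete. The cosine measure and the toy dimensions are assumptions; the patent does not name a specific similarity metric.

```python
import numpy as np

rng = np.random.default_rng(6)

def mse_loss(common, specific):
    # minimized in the discriminator stage, maximized in the generator stage
    return float(np.mean((common - specific) ** 2))

def feature_similarity(common, specific):
    """Mean absolute cosine similarity between paired common and specific
    feature vectors: one concrete way to test the 'linearly independent'
    condition (similarity below a preset threshold)."""
    c = common / np.linalg.norm(common, axis=1, keepdims=True)
    s = specific / np.linalg.norm(specific, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(c * s, axis=1))))

common   = rng.normal(size=(8, 16))
specific = rng.normal(size=(8, 16))
identical = common.copy()

# identical features: zero MSE, maximal similarity (high redundancy)
print(mse_loss(common, identical), feature_similarity(common, identical))
# unrelated random features: larger MSE, small similarity
print(mse_loss(common, specific), feature_similarity(common, specific))
```

Driving the MSE up while the similarity falls below the preset threshold is what pushes the common and specific features toward the complementary, low-redundancy state the method targets.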
Illustratively, the training samples of the discriminator in the current round of alternating training are the common features and the specific features extracted by the feature generation network after the previous round of alternating training is completed. That is, the training goal of one round of alternate training is that the common features extracted by the feature generation network cannot be classified by that round's discriminator (modal classifier), and that the common features and the specific features are linearly independent.
After entering the current training round, the discriminator is first trained with the common and specific features extracted by the feature generation network trained in the previous round, so that it can again distinguish them: the modality-independent common features from the previous round can be classified by modality, and the previously linearly independent common and specific features are pulled close in distance. The discriminator trained in the current round is then used as the constraint for training the feature generation network at least once, so that the common features output by the updated feature generation network can no longer be classified by the current discriminator, and the common and specific features are linearly independent again. This cycle repeats until the preset number of alternating training rounds is completed or the preset training effect is achieved.
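The round structure described above can be sketched as a minimal control-flow skeleton. All class and method names here are hypothetical; a real implementation would perform parameter updates with a deep-learning framework:

```python
# Minimal control-flow sketch of the alternating training described above.
# All names are illustrative assumptions, not the patent's implementation.

class ToyTrainer:
    def __init__(self, rounds, disc_steps=1, gen_steps=1):
        self.rounds = rounds
        self.disc_steps = disc_steps   # discriminator training stages per round
        self.gen_steps = gen_steps     # feature-generation training stages per round
        self.log = []

    def train_discriminator(self):
        # Minimize the discriminator loss: the modal classifier learns to
        # separate the common features of the two modalities, and the MSE
        # term pulls the common and specific features close together.
        self.log.append("D")

    def train_generator(self):
        # Maximize the discriminator loss: the feature generation network
        # learns common features the modal classifier cannot separate and
        # pushes common/specific features toward linear independence.
        self.log.append("G")

    def run(self):
        for _ in range(self.rounds):
            # Features extracted after the PREVIOUS round serve as the
            # discriminator's training samples in the current round.
            for _ in range(self.disc_steps):
                self.train_discriminator()
            for _ in range(self.gen_steps):
                self.train_generator()
        return self.log

print(ToyTrainer(rounds=2).run())  # -> ['D', 'G', 'D', 'G']
```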
As an illustrative example, after an initial feature generation network is established, it is used to extract initial common and specific features of the multi-source wetland remote sensing data; the discriminator is then trained with these initial features, so that the trained discriminator can classify the modality of the initial common features, and the distance between the common and specific features is smaller than a preset distance.
As an exemplary embodiment, a modality classifier as shown in fig. 5 may be constructed for the modal classification of the common features, with the structure: input layer → first fully connected layer → second fully connected layer → output layer. Taking hyperspectral and multispectral images as an example, the output dimension of the modality classifier is 2.
The common feature of the hyperspectral modality is labeled 0 and the common feature of the multispectral modality is labeled 1, so the loss function of the modality classifier $C$ is computed as:

$$\mathcal{L}_{mc} = -\big( \mathbf{m}_{hs}^{\top} \log \mathbf{p}_{hs} + \mathbf{m}_{ms}^{\top} \log \mathbf{p}_{ms} \big)$$

where $\mathbf{m}$ denotes the one-hot encoding of the modal label for the modality classifier $C$, $\mathbf{p}_{hs}$ is the prediction probability of the common feature of the hyperspectral modality obtained through $C$, $\mathbf{p}_{ms}$ is the prediction probability of the common feature of the multispectral modality obtained through $C$, and $\log$ denotes the base-10 logarithm.
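As a numeric sketch of this loss, assuming each modality contributes one common-feature prediction per step, and following the text's base-10 logarithm (function name and shapes are illustrative):

```python
import numpy as np

def modal_classifier_loss(p_hs, p_ms):
    """Cross-entropy loss of the modality classifier (illustrative sketch).

    p_hs : predicted modality probabilities for a hyperspectral common
           feature, shape (2,); true label 0 -> one-hot [1, 0]
    p_ms : predicted modality probabilities for a multispectral common
           feature, shape (2,); true label 1 -> one-hot [0, 1]
    The patent text specifies a base-10 logarithm, so log10 is used.
    """
    m_hs = np.array([1.0, 0.0])   # one-hot modal label for label 0
    m_ms = np.array([0.0, 1.0])   # one-hot modal label for label 1
    loss_hs = -np.sum(m_hs * np.log10(p_hs))
    loss_ms = -np.sum(m_ms * np.log10(p_ms))
    return loss_hs + loss_ms
```

The discriminator stage minimizes this quantity; the feature-generation stage maximizes it, so that the common features gradually carry no modal information.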
As an exemplary embodiment, after the modality classifier is determined, the feature generation network is further trained based on the parameters of the modality classifier. In this process, the loss function of the modality classifier is used as a constraint and maximized, so that the common features generated by the feature generation network gradually become modality-independent.
As an exemplary embodiment, the modality classifier serves as the discriminator in the adversarial network. Since the trained modality classifier has learned the modal information of the hyperspectral and multispectral images, and the common features generated by the feature generation network are modality-independent, common and specific features can be separated by classifying modality: the modality-independent features are the common features, and the modality-dependent features are the specific features. Therefore, the loss function of the discriminator can be determined based on the loss function of the modality classifier, and the discriminator can accurately separate the common and specific features of different images.
The generation loss function and the discriminant loss function are optimized alternately: the discriminant loss is minimized in the discriminator training stage, and the generation loss is minimized in the feature-generation training stage (which maximizes the modal-classifier and mean-square-error terms). The model parameters are optimized with a stochastic gradient descent algorithm.
The common and specific features are extracted with the optimized feature generation network and used to train the ground object classifier, yielding a trained ground object classification model. The test data set is then used to test the optimized network model, and the ground object classification result is obtained through the ground object classifier: the test data set is input into the optimized network model to generate the common and specific features of the multi-source data; after the stacking operation, category prediction is performed with the ground object classifier, whose output dimension equals the number of ground object categories. The calculation is:

$$F_{con} = \mathrm{Concat}\big( F^{s}_{hs}, F^{s}_{ms}, \bar{F}_{com} \big), \qquad \mathbf{p} = \delta\big( W \cdot F_{con} \big)$$

where $\mathbf{p}$ is the predicted probability vector over ground object classes, $\mathrm{Concat}(\cdot)$ stacks features along the channel dimension, $F_{con}$ is the stacked feature, $\delta(\cdot)$ is the Softmax function, $W$ is the weight matrix of the fully connected layer in the ground object classifier, and $\bar{F}_{com}$ denotes the integrated common feature; the position with the largest value in $\mathbf{p}$ is the classification result for the current data.
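A minimal NumPy sketch of this prediction step, with illustrative shapes and names (the real classifier uses a trained fully connected layer):

```python
import numpy as np

def softmax(z):
    # numerically stable Softmax over a 1-D logit vector
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_land_cover(f_spec_hs, f_spec_ms, f_common, W):
    """Stack the specific features and the integrated common feature
    along the channel dimension, apply a fully connected layer and
    Softmax, then take the arg-max as the classification result.
    W has shape (num_classes, stacked_feature_dim)."""
    f_con = np.concatenate([f_spec_hs, f_spec_ms, f_common])  # Concat(.)
    p = softmax(W @ f_con)        # probability vector over classes
    return p, int(np.argmax(p))   # position with the largest value
```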
And combining the label information of the training data set and the test data set to obtain a final ground object type image.
As an optional embodiment, in order to further distinguish the common and specific features and ensure that they are linearly independent, the common and specific features need to be effectively separated when determining the discriminant loss function.
In order to ensure that the specific and common features are linearly independent, in this embodiment a similarity measure is used to impose an independence constraint on the common and specific features; for example, the Euclidean distance, cosine distance, or Pearson correlation coefficient may be used. In this embodiment, the mean square error may also be used as the measure of independence.
In this embodiment, the constraint uses the mean square error loss, computed as:

$$\mathcal{L}_{mse} = \big\| F^{s}_{hs} - F^{c}_{hs} \big\|_{2}^{2} + \big\| F^{s}_{ms} - F^{c}_{ms} \big\|_{2}^{2}$$

where $F^{s}_{hs}$ and $F^{s}_{ms}$ denote the specific features of the hyperspectral and multispectral modalities, $F^{c}_{hs}$ and $F^{c}_{ms}$ denote the common features of the two modalities, and $\|\cdot\|_{2}$ denotes the $\ell_2$ norm, computed for example as:

$$\| x - y \|_{2}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

where $n$ is the total number of pixels in the vectors $x$, $y$.
By means of the constraint of the loss of the mean square error, the complementarity of the common characteristic and the specific characteristic can be ensured.
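A small NumPy sketch of the mean-square-error independence constraint described above (function names and argument order are illustrative):

```python
import numpy as np

def mse(x, y):
    # example l2-based distance from the text: mean squared difference
    # over the n pixels of vectors x and y
    return np.sum((x - y) ** 2) / x.size

def independence_loss(f_spec_hs, f_spec_ms, f_com_hs, f_com_ms):
    """MSE constraint between specific and common features per modality.
    The discriminator stage minimizes this (pulling the features close);
    the feature-generation stage maximizes it (pushing them apart, i.e.
    toward the document's notion of linear independence)."""
    return mse(f_spec_hs, f_com_hs) + mse(f_spec_ms, f_com_ms)
```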
As an exemplary embodiment, in order to ensure the integrity of the data, a feature reconstruction network is constructed to reconstruct the common and specific information back into the original features; maintaining feature completeness prevents the common and specific features from drifting arbitrarily. For example, see fig. 6:
and S602, constructing a feature reconstruction network. As an exemplary embodiment, a feature reconstruction network is constructed for reconstructing common information and specific information to original features, and the integrity of the features is ensured; the feature reconstruction network is composed of two fully connected layers, as shown in fig. 3, the structure of the reconstruction network may be: the input layer → the first full connection layer → the second full connection layer → the output layer, and the output dimension of the reconstruction network is the characteristic dimension of the image block output after the convolution operation. Reconstructing the common characteristics and the specific characteristics, wherein the calculation process of characteristic reconstruction is as follows:
$$\hat{F}_{hs} = W_{d2}\,\sigma\big( W_{d1}\,\mathrm{Concat}( F^{s}_{hs}, \bar{F}_{com} ) \big), \qquad \hat{F}_{ms} = W_{d2}\,\sigma\big( W_{d1}\,\mathrm{Concat}( F^{s}_{ms}, \bar{F}_{com} ) \big)$$

where $\hat{F}_{hs}$ and $\hat{F}_{ms}$ denote the reconstructed features of the hyperspectral and multispectral modalities, $\mathrm{Concat}(\cdot)$ stacks features along the channel dimension, $F^{s}_{hs}$ and $F^{s}_{ms}$ denote the specific features of the two modalities, $\bar{F}_{com}$ denotes the integrated common feature, $\sigma(\cdot)$ denotes the ReLU activation function, and $W_{d1}$, $W_{d2}$ are the weight matrices of the first and second fully connected layers in the reconstruction network.
S604: a reconstruction loss function of the reconstruction network is determined based on the completeness of the reconstructed features. The reconstruction loss function is designed to maintain feature completeness and is calculated as follows:
$$\mathcal{L}_{rec} = \big\| \hat{F}_{hs} - F_{hs} \big\|_{2}^{2} + \big\| \hat{F}_{ms} - F_{ms} \big\|_{2}^{2}$$

where $\|\cdot\|_{2}$ denotes the $\ell_2$ norm, computed for example as:

$$\| x - y \|_{2}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

where $n$ is the total number of pixels in the vectors $x$, $y$.
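The two-layer reconstruction and its loss can be sketched in NumPy as follows (weights, shapes, and the mean-squared form of the norm are illustrative; the real network is trained):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def reconstruct(f_spec, f_common, W_d1, W_d2):
    """Two fully connected layers rebuild the original feature from the
    stacked specific + integrated common features (input -> FC -> ReLU
    -> FC). Weight placement of the activation is an assumption."""
    x = np.concatenate([f_spec, f_common])   # Concat along channels
    return W_d2 @ relu(W_d1 @ x)

def reconstruction_loss(f_hat_hs, f_hs, f_hat_ms, f_ms):
    """l2 reconstruction loss keeping feature completeness."""
    def mse(x, y):
        return np.sum((x - y) ** 2) / x.size
    return mse(f_hat_hs, f_hs) + mse(f_hat_ms, f_ms)
```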
S606: during each round of alternating training of the discriminator and the feature generation network, the loss function of the reconstruction network is used as a constraint, so that the common and specific features extracted by the feature generation network can reconstruct the original features of the multi-source wetland remote sensing data.
As an exemplary embodiment, to prevent shortcut optimization of the network model during subsequent training, in this embodiment the generation loss function of the feature generation network also needs to be constrained by the reconstruction loss function.
So that the features generated by the feature generation network can satisfy the classification task, the network needs to be adjusted with reference to the downstream classification task of the ground object classification model, and the generation loss function is constrained by the classification loss function of that model. Exemplarily:
the common characteristics and the specific characteristics are classified through a ground feature classifier, the wetland ground feature classification task is optimized through a classification loss function, and the calculation process of the classification loss function is as follows:
$$\mathcal{L}_{cls} = - \mathbf{a}^{\top} \log \mathbf{p}$$

where $\mathbf{a}$ denotes the one-hot encoding of the class label, and $\mathbf{p}$ is the prediction probability, obtained through the ground object classifier, of the features of the multi-source wetland remote sensing data extracted by the feature generation network.
As an exemplary embodiment, the generation loss function and the discriminant loss function are designed for the counterlearning, and are calculated as follows:
$$\mathcal{L}_{G} = \mathcal{L}_{cls} + \mathcal{L}_{rec} - \big( \mathcal{L}_{mc} + \mathcal{L}_{mse} \big), \qquad \mathcal{L}_{D} = \mathcal{L}_{mc} + \mathcal{L}_{mse}$$

where $\mathcal{L}_{G}$ and $\mathcal{L}_{D}$ denote the generation loss function and the discriminant loss function respectively, and $\mathcal{L}_{cls}$, $\mathcal{L}_{rec}$, $\mathcal{L}_{mc}$ and $\mathcal{L}_{mse}$ are the classification, reconstruction, modal-classifier and mean square error losses defined above.
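Assuming the component losses defined in this section, the adversarial combination can be sketched as below; the exact weighting in the patent's original formula image is not reproduced in the text, so the unweighted sums here are an assumption:

```python
def discriminator_loss(l_mc, l_mse):
    """Discriminator stage: minimize the modal-classifier loss plus the
    MSE distance term (illustrative, unweighted combination)."""
    return l_mc + l_mse

def generator_loss(l_cls, l_rec, l_mc, l_mse):
    """Generator stage: minimize classification + reconstruction losses
    while maximizing the discriminator terms (hence the minus sign)."""
    return l_cls + l_rec - (l_mc + l_mse)
```

Minimizing `generator_loss` drives the modal-classifier and MSE terms up, which is the maximization described in the alternating-training scheme above.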
And using the test data set for testing the optimized network model, and obtaining a ground object classification result through a ground object classifier.
The test data set is input into the optimized network model to generate the common and specific features of the multi-source data; after the stacking operation, category prediction is performed with the ground object classifier, whose output dimension equals the number of ground object categories. The calculation is:

$$F_{con} = \mathrm{Concat}\big( F^{s}_{hs}, F^{s}_{ms}, \bar{F}_{com} \big), \qquad \mathbf{p} = \delta\big( W \cdot F_{con} \big)$$

where $\mathbf{p}$ is the predicted probability vector over ground object classes, $\mathrm{Concat}(\cdot)$ stacks features along the channel dimension, $F_{con}$ is the stacked feature, $\delta(\cdot)$ is the Softmax function, $W$ is the weight matrix of the fully connected layer in the ground object classifier, and $\bar{F}_{com}$ denotes the integrated common feature; the position with the largest value in $\mathbf{p}$ is the classification result for the current data.
And combining the label information of the training data set and the test data set to obtain a final ground object type image.
The following describes the method for constructing the classification model in detail with reference to the method for constructing the classification model shown in fig. 5:
step 1: and constructing a space-spectrum characteristic generation network to extract common information and specific information of the multi-source remote sensing data, and constructing a modal classifier for modal classification of the multi-source data.
Step 1.1: around the pixel point corresponding to each spatial position in the input hyperspectral and multispectral images, a $w \times w$ neighborhood is taken to represent the spatial-spectral information of the central pixel, where $w$ is an odd number not less than 5. This yields image blocks of the hyperspectral and multispectral data, with $X_{hs}$ and $X_{ms}$ respectively denoting the input stacked image blocks.
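A NumPy sketch of the patch extraction; the border handling (reflect padding) is an assumption not specified in the text:

```python
import numpy as np

def extract_patch(image, row, col, w=5):
    """Take the w x w neighborhood around a pixel to represent its
    spatial-spectral information; w must be an odd number >= 5.
    image: (H, W, C) array; returns a (w, w, C) image block.
    Reflect padding at the borders is an illustrative choice."""
    if w < 5 or w % 2 == 0:
        raise ValueError("w must be an odd number not less than 5")
    r = w // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    return padded[row:row + w, col:col + w, :]
```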
Step 1.2: performing convolution operation on the image blocks to realize the matching of channel dimensions, wherein the specific calculation process is as follows:
$$F_{hs} = W_1 * X_{hs} + b_1, \qquad F_{ms} = W_2 * X_{ms} + b_2$$

where $F_{hs}$ and $F_{ms}$ are the output features, $W_1$ is the weight matrix of the convolution layer applied to the input stacked image blocks of the hyperspectral image, $W_2$ is the weight matrix of the convolution layer applied to the input stacked image blocks of the multispectral image, and $b_1$ and $b_2$ are the offsets of the corresponding convolution layers.
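Channel matching by convolution can be illustrated with a 1×1 convolution in NumPy (the kernel size is an assumption; the patent does not state it):

```python
import numpy as np

def conv1x1(patch, W, b):
    """Channel matching via a 1x1 convolution: every pixel's channel
    vector is mapped through the same weight matrix W plus offset b.
    patch: (h, w, C_in); W: (C_out, C_in); b: (C_out,).
    Returns an (h, w, C_out) feature block."""
    return np.einsum("hwc,oc->hwo", patch, W) + b
```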
Step 1.3: four space-spectrum feature generation networks with the same structure are constructed; each consists of three residual blocks with the structure: input layer → first convolution layer → batch normalization layer → second convolution layer → batch normalization layer → output layer. Two of the feature generation networks extract the specific features of the multi-source data, and the other two extract the common features, with the weights of the common-feature generation networks shared. The feature extraction is computed as:
$$F^{s}_{hs} = G^{s}_{hs}\big( F_{hs}; \theta^{s}_{hs} \big), \quad F^{s}_{ms} = G^{s}_{ms}\big( F_{ms}; \theta^{s}_{ms} \big), \quad F^{c}_{hs} = G^{c}_{hs}\big( F_{hs}; \theta^{c}_{hs} \big), \quad F^{c}_{ms} = G^{c}_{ms}\big( F_{ms}; \theta^{c}_{ms} \big)$$

where $\theta^{s}_{hs}$ and $\theta^{s}_{ms}$ denote the parameters of the specific-feature generation networks, $F^{s}_{hs}$ and $F^{s}_{ms}$ denote the specific features of the hyperspectral and multispectral modalities, $\theta^{c}_{hs}$ and $\theta^{c}_{ms}$ denote the parameters of the common-feature generation networks (whose weights are shared), and $F^{c}_{hs}$ and $F^{c}_{ms}$ denote the common features of the hyperspectral and multispectral modalities.
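A simplified NumPy sketch of one residual block from the structure above. The 1×1 convolutions and the identity skip connection are simplifying assumptions, and the batch statistics are computed from the input itself for illustration only:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # simplified batch normalization: per-channel statistics computed
    # from the input block itself (illustration only, no learned scale)
    mu = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, W1, W2):
    """input -> conv -> BN -> conv -> BN, plus an identity skip.
    x: (h, w, C); W1, W2: (C, C) acting as 1x1 convolution kernels."""
    h = batch_norm(np.einsum("hwc,oc->hwo", x, W1))
    h = batch_norm(np.einsum("hwc,oc->hwo", h, W2))
    return x + h   # residual (skip) connection
```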
Step 1.4: constructing a modal classifier for classifying the modes of the common characteristics, wherein the structure of the modal classifier is as follows: input layer → first fully connected layer → second fully connected layer → output layer, the output dimension of the modality classifier is 2 (number of data modalities).
And 2, step: and constructing a feature reconstruction network for reconstructing the common information and the specific information to the original features, and ensuring the integrity of the features.
Step 2.1: a feature reconstruction network is constructed, consisting of two fully connected layers, with the structure: input layer → first fully connected layer → second fully connected layer → output layer; the output dimension of the reconstruction network equals the feature dimension output in step 1.2.
Step 2.2: reconstructing the common characteristics and the specific characteristics, wherein the calculation process of characteristic reconstruction is as follows:
$$\hat{F}_{hs} = W_{d2}\,\sigma\big( W_{d1}\,\mathrm{Concat}( F^{s}_{hs}, \bar{F}_{com} ) \big), \qquad \hat{F}_{ms} = W_{d2}\,\sigma\big( W_{d1}\,\mathrm{Concat}( F^{s}_{ms}, \bar{F}_{com} ) \big)$$

where $\hat{F}_{hs}$ and $\hat{F}_{ms}$ denote the reconstructed features of the hyperspectral and multispectral modalities, $\mathrm{Concat}(\cdot)$ stacks features along the channel dimension, $F^{s}_{hs}$ and $F^{s}_{ms}$ denote the specific features of the two modalities, $\bar{F}_{com}$ denotes the integrated common feature, $\sigma(\cdot)$ denotes the ReLU activation function, and $W_{d1}$, $W_{d2}$ are the weight matrices of the first and second fully connected layers in the reconstruction network.
Step 2.3: designing a reconstruction loss function to keep the feature integrity, wherein the reconstruction loss function is calculated as follows:
$$\mathcal{L}_{rec} = \big\| \hat{F}_{hs} - F_{hs} \big\|_{2}^{2} + \big\| \hat{F}_{ms} - F_{ms} \big\|_{2}^{2}$$

where $\|\cdot\|_{2}$ denotes the $\ell_2$ norm, computed for example as:

$$\| x - y \|_{2}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

where $n$ is the total number of pixels in the vectors $x$, $y$.
And step 3: and designing a generating loss function and a discriminating loss function to optimize parameters of the network model.
Step 3.1: performing modal classification on the common features of the hyperspectral modality and the multispectral modality in the step 1.3 by using a modal classifier, wherein the common feature of the hyperspectral modality is marked as a label 0, and the common feature of the multispectral modality is marked as a label 1, so that the calculation process of the loss function of the modal classifier is as follows:
$$\mathcal{L}_{mc} = -\big( \mathbf{m}_{hs}^{\top} \log \mathbf{p}_{hs} + \mathbf{m}_{ms}^{\top} \log \mathbf{p}_{ms} \big)$$

where $\mathbf{m}$ denotes the one-hot encoding of the modal label for the modality classifier $C$, $\mathbf{p}_{hs}$ is the prediction probability of the common feature of the hyperspectral modality obtained through $C$, $\mathbf{p}_{ms}$ is the prediction probability of the common feature of the multispectral modality obtained through $C$, and $\log$ denotes the base-10 logarithm.
Step 3.2: in order to ensure the linear independence of the common characteristic and the specific characteristic, the mean square error loss is used for constraint, and the calculation process of the mean square error loss function is as follows:
$$\mathcal{L}_{mse} = \big\| F^{s}_{hs} - F^{c}_{hs} \big\|_{2}^{2} + \big\| F^{s}_{ms} - F^{c}_{ms} \big\|_{2}^{2}$$

where $F^{s}_{hs}$ and $F^{s}_{ms}$ denote the specific features of the hyperspectral and multispectral modalities, $F^{c}_{hs}$ and $F^{c}_{ms}$ denote the common features of the two modalities, and $\|\cdot\|_{2}$ denotes the $\ell_2$ norm, computed for example as:

$$\| x - y \|_{2}^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

where $n$ is the total number of pixels in the vectors $x$, $y$.
By means of the constraint of the loss of the mean square error, the complementarity of the common characteristic and the specific characteristic can be ensured.
Step 3.3: the common characteristic and the specific characteristic are classified through a ground feature classifier, the wetland ground feature classification task is optimized through a classification loss function, and the calculation process of the classification loss function is as follows:
$$\mathcal{L}_{cls} = - \mathbf{a}^{\top} \log \mathbf{p}$$

where $\mathbf{a}$ denotes the one-hot encoding of the class label, and $\mathbf{p}$ is the prediction probability, obtained through the ground object classifier, of the features of the multi-source wetland remote sensing data extracted by the feature generation network.
Step 3.4: designing a generating loss function and a discriminating loss function for counterstudy, wherein the generating loss function and the discriminating loss function are calculated as follows:
$$\mathcal{L}_{G} = \mathcal{L}_{cls} + \mathcal{L}_{rec} - \big( \mathcal{L}_{mc} + \mathcal{L}_{mse} \big), \qquad \mathcal{L}_{D} = \mathcal{L}_{mc} + \mathcal{L}_{mse}$$

where $\mathcal{L}_{G}$ and $\mathcal{L}_{D}$ denote the generation loss function and the discriminant loss function respectively, and $\mathcal{L}_{cls}$, $\mathcal{L}_{rec}$, $\mathcal{L}_{mc}$ and $\mathcal{L}_{mse}$ are the classification, reconstruction, modal-classifier and mean square error losses defined above.
Step 3.5: the generation loss function and the discriminant loss function are optimized alternately — the discriminant loss is minimized in the discriminator stage, and the generation loss is minimized in the feature-generation stage (which maximizes the modal-classifier and mean-square-error terms) — and the model parameters are optimized using a stochastic gradient descent algorithm.
And 4, step 4: and using the test data set for testing the optimized network model, and obtaining a ground object classification result through a ground object classifier.
Step 4.1: the test data set is input into the optimized network model to generate the common and specific features of the multi-source data; after the stacking operation, category prediction is performed with the ground object classifier, whose output dimension equals the number of ground object categories. The calculation is:

$$F_{con} = \mathrm{Concat}\big( F^{s}_{hs}, F^{s}_{ms}, \bar{F}_{com} \big), \qquad \mathbf{p} = \delta\big( W \cdot F_{con} \big)$$

where $\mathbf{p}$ is the predicted probability vector over ground object classes, $\mathrm{Concat}(\cdot)$ stacks features along the channel dimension, $F_{con}$ is the stacked feature, $\delta(\cdot)$ is the Softmax function, $W$ is the weight matrix of the fully connected layer in the ground object classifier, and $\bar{F}_{com}$ denotes the integrated common feature; the position with the largest value in $\mathbf{p}$ is the classification result for the current data.
Step 4.2: and combining the label information of the training data set and the test data set to obtain a final ground object type image.
The embodiment of the application further provides a method for classifying multi-source remote sensing data, which comprises the following steps:
and acquiring multi-source remote sensing data to be classified.
And classifying the multi-source remote sensing data to be classified by using the multi-source remote sensing data classification model trained by the multi-source remote sensing data classification model training method in any one of the embodiments to obtain a classification result.
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
According to another aspect of the embodiment of the application, a multi-source remote sensing data classification model training device for implementing the multi-source remote sensing data classification model training method is further provided. FIG. 7 is a schematic diagram of an alternative training apparatus for a classification model of multi-source remote sensing data according to an embodiment of the present application, and as shown in FIG. 7, the apparatus may include:
the network generation module 702 is used for constructing a feature generation network, and the feature generation network is used for generating common features and specific features of the multi-source wetland remote sensing data;
a training module 704, configured to perform at least one round of alternating training on a discriminator and the feature generation network to obtain a trained feature generation network, where each round of alternating training includes at least one discriminator training stage and at least one feature generation network training stage; in the discriminator training stage, the discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; in the feature generation network training stage, the loss function of the discriminator is maximized, until the common features generated by the feature generation network cannot be modally classified by the discriminator and the common and specific features are linearly independent.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiment of the application, there is also provided an electronic device for implementing the above multi-source remote sensing data classification model training method, where the electronic device may be a server, a terminal, or a combination thereof.
Fig. 8 is a block diagram of an alternative electronic device according to an embodiment of the present application, as shown in fig. 8, including a processor 802, a communication interface 804, a memory 806, and a communication bus 808, where the processor 802, the communication interface 804, and the memory 806 are configured to communicate with each other through the communication bus 808, and the memory 806 is configured to store a computer program; the processor 802, when executing the computer program stored in the memory 806, performs the following steps:
performing at least one round of alternate training on a discriminator and the feature generation network to obtain a trained feature generation network, wherein each round of alternate training comprises at least one stage of discriminator training and at least one stage of feature generation network training, and a discriminator loss function is minimized in the stage of discriminator training, so that the discriminator can perform modal classification on common features, and the distance between the common features and the specific features is smaller than a preset distance; and maximizing the loss function of the discriminator in a training stage of the feature generation network until the common feature generated by the feature generation network cannot be subjected to modal classification by the discriminator, and the common feature and the specific feature are linearly independent.
Optionally, the electronic device may be an engine control system or an on-board computer.
Alternatively, in this embodiment, the communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include RAM, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, and may include but is not limited to: a CPU (Central Processing Unit), an NP (Network Processor), and the like; but also a DSP (Digital Signal Processing), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 8 is only an illustration, and the device for implementing the multi-source remote sensing data classification model training method may be a terminal device, and the terminal device may be a terminal device such as a smart phone (e.g., an Android Mobile phone, an iOS Mobile phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 8 is a diagram illustrating a structure of the electronic device. For example, the terminal device may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 8, or have a different configuration than shown in FIG. 8.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware of the terminal device. The program may be stored in a computer-readable storage medium, which may include a flash disk, a ROM, a RAM, a magnetic disk, an optical disk, and the like.
According to still another aspect of the embodiments of the present application, a storage medium is further provided. Optionally, in this embodiment, the storage medium may be used to store program code for executing the multi-source remote sensing data classification model training method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
performing at least one round of alternating training on a discriminator and the feature generation network to obtain a trained feature generation network, wherein each round of alternating training comprises at least one discriminator training stage and at least one feature generation network training stage; in the discriminator training stage, a discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; and in the feature generation network training stage, the discriminator loss function is maximized, until the common features generated by the feature generation network cannot be modally classified by the discriminator and the common features and the specific features are linearly independent.
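As an illustrative sketch of the adversarial objective described in the program steps above: the two stages minimize and maximize the same discriminator loss, which combines a modality-classification term on the common features with a mean-square-error term between the common and specific features. Everything below (the function names, the `lam` weighting, and the plain-NumPy formulation) is an assumption made for illustration, not the patented implementation:

```python
import numpy as np

def modal_cross_entropy(logits, modality_labels):
    # Modality-classification loss on the common features: low when the
    # discriminator can tell which data source (modality) a feature came from.
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    picked = probs[np.arange(len(modality_labels)), modality_labels]
    return -np.mean(np.log(picked + 1e-12))

def feature_distance(common, specific):
    # Mean-squared distance between the common and specific features.
    return np.mean((common - specific) ** 2)

def discriminator_objective(logits, modality_labels, common, specific, lam=1.0):
    # Discriminator stage: MINIMIZE this, so the discriminator classifies the
    # modality of the common features while the common/specific distance
    # stays below the preset threshold.
    return modal_cross_entropy(logits, modality_labels) \
        + lam * feature_distance(common, specific)

def generator_objective(logits, modality_labels, common, specific, lam=1.0):
    # Feature-generation stage: MAXIMIZE the discriminator loss,
    # i.e. minimize its negation.
    return -discriminator_objective(logits, modality_labels, common, specific, lam)
```

In an actual training loop, the discriminator parameters would be updated against `discriminator_objective` and the feature generation network against `generator_objective`, alternating per round as the method describes.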
Optionally, the specific example in this embodiment may refer to the example described in the above embodiment, which is not described again in this embodiment.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a U disk, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division of logical functions, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also fall within the protection scope of the present application.

Claims (10)

1. A multi-source remote sensing data classification model training method, wherein the classification model comprises a feature generation network and a ground feature classifier, the feature generation network is used for extracting common features and specific features of multi-source wetland remote sensing data, and the training method comprises:
performing at least one round of alternating training on a discriminator and the feature generation network to obtain a trained feature generation network, wherein each round of alternating training comprises at least one discriminator training stage and at least one feature generation network training stage; in the discriminator training stage, a discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; and in the feature generation network training stage, the discriminator loss function is maximized, until the common features generated by the feature generation network cannot be modally classified by the discriminator and the common features and the specific features are linearly independent.
2. The multi-source remote sensing data classification model training method according to claim 1, wherein the training samples of the discriminator in a current round of alternating training are the common features and specific features extracted by the feature generation network after the previous round of alternating training is completed.
3. The multi-source remote sensing data classification model training method according to claim 2, wherein the process of training the discriminator for the first time comprises:
extracting initial common features and initial specific features of the multi-source wetland remote sensing data by using an initial feature generation network; and
training the discriminator by using the initial common features and the initial specific features, so that the trained discriminator can perform modal classification on the initial common features and the distance between the common features and the specific features is smaller than the preset distance.
4. The multi-source remote sensing data classification model training method according to claim 1, wherein the performing at least one round of alternating training on a discriminator and the feature generation network comprises:
in the discriminator training stage, minimizing a loss function of a modal classifier so that the modal classifier can perform modal classification on the common features, and simultaneously minimizing a mean square error loss function so that the distance between the common features and the specific features is smaller than the preset distance; and
in the feature generation network training stage, maximizing the loss function of the modal classifier until the common features generated by the feature generation network cannot be modally classified by the modal classifier trained in the current round, and simultaneously maximizing the mean square error loss function until the common features and the specific features are linearly independent.
5. The multi-source remote sensing data classification model training method according to claim 1, further comprising:
constructing a feature reconstruction network, wherein the feature reconstruction network is used for reconstructing the original features of the multi-source wetland remote sensing data from the common features and the specific features, to obtain reconstructed common features and reconstructed specific features; and
when performing the at least one round of alternating training on the discriminator and the feature generation network, using the feature reconstruction network as a constraint, so that the common features and specific features extracted by the feature generation network can reconstruct the original features of the multi-source wetland remote sensing data.
6. The multi-source remote sensing data classification model training method according to claim 1, wherein after obtaining the trained feature generation network, the method further comprises:
training the ground feature classifier by using the common features and specific features extracted by the trained feature generation network, to obtain a trained ground feature classifier.
7. A multi-source remote sensing data classification method is characterized by comprising the following steps:
obtaining multi-source remote sensing data to be classified;
classifying the multi-source remote sensing data to be classified by using a multi-source remote sensing data classification model trained by the multi-source remote sensing data classification model training method according to any one of claims 1 to 6, to obtain a classification result.
8. A multi-source remote sensing data classification model training apparatus, wherein the classification model comprises a feature generation network and a ground feature classifier, the feature generation network is used for extracting common features and specific features of multi-source wetland remote sensing data, and the training apparatus comprises:
a network generation module, configured to construct a feature generation network, wherein the feature generation network is used for generating common features and specific features of the multi-source wetland remote sensing data; and
a training module, configured to perform at least one round of alternating training on a discriminator and the feature generation network to obtain a trained feature generation network, wherein each round of alternating training comprises at least one discriminator training stage and at least one feature generation network training stage; in the discriminator training stage, a discriminator loss function is minimized, so that the discriminator can perform modal classification on the common features and the distance between the common features and the specific features is smaller than a preset distance; and in the feature generation network training stage, the discriminator loss function is maximized, until the common features generated by the feature generation network cannot be modally classified by the discriminator and the common features and the specific features are linearly independent.
9. An electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein said processor, said communication interface and said memory communicate with each other via said communication bus,
the memory for storing a computer program;
the processor is configured to execute the multi-source remote sensing data classification model training method of any one of claims 1 to 6 and/or the multi-source remote sensing data classification method of claim 7 by running the computer program stored in the memory.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the multi-source remote sensing data classification model training method of any one of claims 1 to 6 and/or the multi-source remote sensing data classification method of claim 7 when running.
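The reconstruction constraint of claim 5 can be sketched as an auxiliary loss: the extracted common and specific features must allow the original multi-source features to be rebuilt. The linear reconstruction map `W`, the feature shapes, and the function names below are illustrative assumptions, not part of the claims:

```python
import numpy as np

def reconstruct(common, specific, W):
    # A minimal linear stand-in for the claimed feature reconstruction
    # network: maps the concatenated common and specific features back
    # to the original feature space.
    return np.concatenate([common, specific], axis=1) @ W

def reconstruction_loss(original, common, specific, W):
    # Penalizes the gap between the reconstructed and original features,
    # constraining the feature generation network during the alternating
    # training so that no information needed for reconstruction is lost.
    return np.mean((reconstruct(common, specific, W) - original) ** 2)
```

In a full implementation this term would be added to the feature generation network's objective in each alternating round, alongside the adversarial loss.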
CN202310273286.XA 2023-03-21 2023-03-21 Multi-source remote sensing data classification model training method, classification method and electronic equipment Active CN115984635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310273286.XA CN115984635B (en) 2023-03-21 2023-03-21 Multi-source remote sensing data classification model training method, classification method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310273286.XA CN115984635B (en) 2023-03-21 2023-03-21 Multi-source remote sensing data classification model training method, classification method and electronic equipment

Publications (2)

Publication Number Publication Date
CN115984635A true CN115984635A (en) 2023-04-18
CN115984635B CN115984635B (en) 2023-07-07

Family

ID=85974499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310273286.XA Active CN115984635B (en) 2023-03-21 2023-03-21 Multi-source remote sensing data classification model training method, classification method and electronic equipment

Country Status (1)

Country Link
CN (1) CN115984635B (en)

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A face image super-resolution reconstruction method based on an attribute-description generative adversarial network
CN110097094A (en) * 2019-04-15 2019-08-06 天津大学 A few-shot classification method with multi-semantic fusion for person interaction
CN110309861A (en) * 2019-06-10 2019-10-08 浙江大学 A multi-modal human activity recognition method based on generative adversarial networks
CN110689086A (en) * 2019-10-08 2020-01-14 郑州轻工业学院 Semi-supervised high-resolution remote sensing image scene classification method based on a generative adversarial network
CN111047525A (en) * 2019-11-18 2020-04-21 宁波大学 Method for translating SAR remote sensing image into optical remote sensing image
CN111709318A (en) * 2020-05-28 2020-09-25 西安理工大学 High-resolution remote sensing image classification method based on a generative adversarial network
CN111832404A (en) * 2020-06-04 2020-10-27 中国科学院空天信息创新研究院 Small sample remote sensing ground feature classification method and system based on feature generation network
CN111898423A (en) * 2020-06-19 2020-11-06 北京理工大学 Morphology-based multisource remote sensing image ground object fine classification method
CN111914728A (en) * 2020-07-28 2020-11-10 河海大学 Hyperspectral remote sensing image semi-supervised classification method and device and storage medium
WO2020233207A1 (en) * 2019-05-20 2020-11-26 广东省智能制造研究所 Hyperspectral data analysis method based on semi-supervised learning strategy
CN112560967A (en) * 2020-12-18 2021-03-26 西安电子科技大学 Multi-source remote sensing image classification method, storage medium and computing device
CN113657472A (en) * 2021-08-02 2021-11-16 中国空间技术研究院 Multi-source remote sensing data fusion method based on subspace learning
CN113705526A (en) * 2021-09-07 2021-11-26 安徽大学 Hyperspectral remote sensing image classification method
CN113887645A (en) * 2021-10-13 2022-01-04 西北工业大学 Remote sensing image fusion classification method based on joint attention twin network
US20220076074A1 (en) * 2020-09-09 2022-03-10 Beijing Didi Infinity Technology And Development Co., Ltd. Multi-source domain adaptation with mutual learning
CN114529769A (en) * 2022-02-21 2022-05-24 哈尔滨工业大学 Separable multi-mode joint representation method for large-scene remote sensing image classification
CN115240080A (en) * 2022-08-23 2022-10-25 北京理工大学 Intelligent interpretation and classification method for multi-source remote sensing satellite data
CN115331079A (en) * 2022-08-22 2022-11-11 西安理工大学 Adversarial attack method for multi-modal remote sensing image classification networks
CN115375951A (en) * 2022-09-20 2022-11-22 中国矿业大学 Small sample hyperspectral image classification method based on primitive migration network
CN115408543A (en) * 2022-07-26 2022-11-29 国网江苏省电力有限公司宿迁供电分公司 Multi-source geological data processing method and system based on three-dimensional geological model
CN115620160A (en) * 2022-10-21 2023-01-17 中国地质大学(武汉) Remote sensing image classification method based on multi-classifier adversarial active transfer learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RENLONG HANG et al.: "Classification of Hyperspectral Images via Multitask Generative Adversarial Networks", IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 2, pages 1424-1436, XP011831985, DOI: 10.1109/TGRS.2020.3003341 *
SHUNPING JI et al.: "Generative Adversarial Network-Based Full-Space Domain Adaptation for Land Cover Classification From Multiple-Source Remote Sensing Images", IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 5, pages 3816-3828, XP011850242, DOI: 10.1109/TGRS.2020.3020804 *
WANG Shihao: "Research on Multi-task Techniques Based on Bayesian Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, vol. 2023, no. 2, pages 140-815 *
ZHAO Jiliang: "Multi-resolution Remote Sensing Image Fusion Classification Based on a Dual-branch Neural Network", China Master's Theses Full-text Database, Engineering Science and Technology II, vol. 2022, no. 6, pages 028-102 *

Also Published As

Publication number Publication date
CN115984635B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN109522942B (en) Image classification method and device, terminal equipment and storage medium
Xing et al. Stacked denoise autoencoder based feature extraction and classification for hyperspectral images
CN112258526B (en) CT kidney region cascade segmentation method based on dual attention mechanism
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
Kadam et al. [Retracted] Efficient Approach towards Detection and Identification of Copy Move and Image Splicing Forgeries Using Mask R‐CNN with MobileNet V1
CN113435253B (en) Multi-source image combined urban area ground surface coverage classification method
CN115937655B (en) Multi-order feature interaction target detection model, construction method, device and application thereof
Selvaraj et al. Digital image steganalysis: A survey on paradigm shift from machine learning to deep learning based techniques
CN114821164A (en) Hyperspectral image classification method based on twin network
CN113284088A (en) CSM image segmentation method, device, terminal equipment and storage medium
CN112948508B (en) Information prediction method, device, equipment and medium based on multi-layer associated knowledge graph
Mena et al. Sinkhorn networks: Using optimal transport techniques to learn permutations
Ahmad et al. 3D capsule networks for object classification from 3D model data
CN112819073A (en) Classification network training method, image classification device and electronic equipment
Khurshid et al. A residual-dyad encoder discriminator network for remote sensing image matching
He et al. Classification of metro facilities with deep neural networks
CN114051625A (en) Point cloud data processing method, device, equipment and storage medium
CN113569933A (en) Trademark pattern matching method and corresponding device, equipment and medium
CN115984635A (en) Multi-source remote sensing data classification model training method, classification method and electronic equipment
Shambulinga et al. Hyperspectral image classification using convolutional neural networks
CN113887470A (en) High-resolution remote sensing image ground object extraction method based on multitask attention mechanism
CN113496228A (en) Human body semantic segmentation method based on Res2Net, TransUNet and cooperative attention
Khanzadi et al. Robust fuzzy rough set based dimensionality reduction for big multimedia data hashing and unsupervised generative learning
CN117576518B (en) Image distillation method, apparatus, electronic device, and computer-readable storage medium
CN117710301A (en) Image processing method, apparatus, device, storage medium, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant