CN116434037A - Multi-mode remote sensing target robust recognition method based on double-layer optimization learning

Info

Publication number: CN116434037A
Application number: CN202310434641.7A
Authority: CN
Inventors: 赵文达; 贾蝶蝶; 王海鹏; 何友; 卢湖川; 夏学知; 刘颢; 杨向广
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2023-04-21
Filing date: 2023-04-21
Publication date: 2023-07-14
Anticipated expiration: 2043-04-21
Also published as: CN116434037B

Abstract

The invention belongs to the technical field of computer vision and image processing and provides a robust multi-modal remote sensing target recognition method based on double-layer optimization learning. The method designs a training scheme based on double-layer optimization that guides the model to adaptively learn the knowledge it needs, and uses a modulation network to selectively activate an identification network, helping the identification network learn to extract better feature representations of the input image. The method makes full use of existing data to exploit the model's potential, helps the model overcome multi-modal interference, addresses the drop in accuracy on multi-modal target data, and gives the model higher robustness under limited learning resources.

Description

Multi-mode remote sensing target robust recognition method based on double-layer optimization learning

Technical Field

The invention relates to the technical field of computer vision image processing, in particular to a multi-mode remote sensing target robust identification method based on double-layer optimization learning.

Background

Remote sensing image target recognition, that is, remote sensing image classification, is the process of identifying and classifying ground object targets according to the spectral, spatial, temporal and other characteristics of the ground objects in a remote sensing image. It can provide auxiliary information for other applications such as target detection, and can also serve as a final result that supplies basic geographic information to fields such as map making, rescue and disaster relief, and military reconnaissance. Early remote sensing image recognition was accomplished mainly through visual interpretation and depended largely on the subjective judgment of the researcher. In recent years, recognition methods have mainly relied on machine learning: features are extracted with a convolutional neural network, and a large number of samples are fed to the network during training so that it discovers their internal regularities and an optimal model is obtained. Common classification networks include LeNet, AlexNet, VGG and GoogLeNet. In the 2015 ImageNet Large Scale Visual Recognition Challenge, He et al., in "Deep Residual Learning for Image Recognition", proposed the ResNet network, which uses data preprocessing and BN layers to address vanishing and exploding gradients and introduces a residual structure to address the degradation problem in deep networks, allowing network structures to become much deeper. Subsequently, Zhang et al., in "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices", proposed ShuffleNet, which uses depthwise separable convolution to increase the computational efficiency of the model.
In addition to traditional classification networks, Ian Goodfellow et al., in the paper "Generative Adversarial Nets", proposed the GAN model, in which a generator and a discriminator work on the same task and network training is driven by an adversarial game. Many other networks, such as DCGAN, StyleGAN, BigGAN, StackGAN and CycleGAN, were later derived from it. As research deepened, Ding et al., in "RepVGG: Making VGG-style ConvNets Great Again", equivalently converted the trained model into a simpler model before testing, using structural re-parameterization to improve the practicality of the model. Liu et al., in "DARTS: Differentiable Architecture Search", used gradient descent to search architectures efficiently based on a continuous relaxation of the architecture representation, addressing the scalability problem of architecture search. However, most research efforts, including the methods above, aim to improve the classification accuracy or the running speed of the network and pay no attention to the distribution difference between training data and test data, so robustness is low when the model is applied to remote sensing images containing multi-modal information.

The robustness problem in multi-modal remote sensing target recognition concerns the drop in accuracy that occurs when a model is transferred between different modalities of remote sensing images. The strong performance of most existing algorithms in the remote sensing field relies on an idealized assumption, namely that the source data and the target data are independent and identically distributed, which ignores the real distribution differences between modalities. As a result, when a trained model is transferred in practice to a new data set of a different modality, test accuracy often drops sharply. The cause is the large difference in image style between modalities of remote sensing images, and such modality differences are ubiquitous in remote sensing image processing. How to ensure that a model trained on a source data set retains high performance when transferred to a target data set of another modality is the question that research on the robustness of remote sensing tasks needs to answer.

Disclosure of Invention

To address the poor robustness of remote sensing target recognition on multi-modal data, the invention designs a double-layer optimization training method that uses a modulation network to adjust the remote sensing target identification network, guiding the model to exploit its own potential and learn robust feature representations. In the method, a training scheme based on double-layer optimization guides the model to adaptively learn the knowledge it needs, while a modulation network selectively activates the identification network, helping the identification network learn to extract better feature representations of the input image. The method makes full use of existing data to exploit the model's potential, helps the model overcome multi-modal interference, addresses the drop in accuracy on multi-modal target data, and gives the model higher robustness under limited learning resources.

The technical scheme of the invention is as follows: a robust multi-modal remote sensing target recognition method based on double-layer optimization learning is implemented with two parallel networks, namely a modulation network and an identification network; the modulation network serves as a gating function during training of the identification network and controls the selective activation of the identification network; the outputs of the layers of the modulation network are connected to different positions of the identification network and guide the parameter updates of the identification network toward learning image content features; through double-layer optimization of the parallel networks, the modulation network and the identification network are updated in different stages of the optimization, so that the two networks are trained separately;

during training, two data sets of different modalities are used, namely a known data set D₁ and an adaptation data set D₂; the identification network is first pre-trained once, separately, on the known data set D₁ to obtain its optimal parameters θ_P with respect to D₁, and these parameters are recorded for later use; before double-layer optimization starts, the identification network and the modulation network are randomly initialized; the formal training process adopts a double-layer optimization method and is divided into an inner layer adaptation stage and a modulation optimization stage; each time double-layer optimization starts, that is, before the inner layer adaptation stage begins, the identification network updated in the previous round of double-layer optimization is reinitialized with the optimal parameters θ_P, so that the identification network parameters satisfy θ₀ = θ_P, i.e., θ_P serves as the initial parameters of the identification network in the current round of double-layer optimization;

after the inner layer adaptation stage begins, the adaptation data set D₂ is used to train the adaptation of the identification network to data sets of different modalities; the identification network is updated, and the modulation network is not updated during the entire inner layer adaptation stage;

where θ denotes the parameters of the identification network, [symbol not reproduced] denotes the parameters of the modulation network, L_c is the network loss of the inner layer adaptation stage, α is the learning rate of the inner layer adaptation stage, the subscript n indexes the n-th modulation optimization stage cycle, and the subscript i indexes the i-th inner layer adaptation stage cycle;

after the inner layer adaptation stage ends, the optimal parameters θ_* learned by the identification network on the adaptation data set D₂ under the current modulation scheme are obtained; the modulation optimization stage then begins, in which the known data set D₁ is used to judge how well the identification network learns under the current modulation scheme; the known data set D₁ is input to the two-network model to obtain the loss L_m of the modulation optimization stage, whose magnitude reflects the effect of the current modulation scheme; the loss L_m is used to update the modulation network, and thereby the modulation scheme, while the identification network is kept unchanged;

where L_m is the network loss of the modulation optimization stage and β is the learning rate of the modulation optimization stage;

when one round of double-layer optimization is completed, the modulation network enters the next round with the updated modulation scheme, which is used to better screen out the parameters in the identification network that are to be updated preferentially, where the preferentially updated parameters are neurons.

The modulation network comprises a plurality of convolution blocks connected in sequence, with connection modules connected in parallel between the convolution blocks other than the initial convolution block; the identification network comprises a plurality of convolution blocks connected in sequence, followed by a classification head; except for the initial convolution block, a branch taken in front of each remaining convolution block and in front of the classification head is passed to the corresponding connection module in the modulation network; the output of each connection module is passed in turn to a cross-multiplication operation located between the remaining convolution blocks and in front of the classification head.
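The wiring described above can be illustrated with a minimal sketch. It is not the patented implementation: dense layers stand in for convolution blocks, the multiplication is taken to be element-wise, the modulation network is assumed to receive the same input as the identification network, and `connection_module`, `dense_relu`, `dense` and `gated_forward` are hypothetical names introduced only for this illustration.

```python
import math

def dense_relu(weights):
    """Stand-in for a convolution block: dense layer followed by ReLU."""
    def apply(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return apply

def dense(weights):
    """Stand-in for the classification head: a plain linear layer."""
    def apply(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply

def connection_module(mod_feat, ident_feat):
    """Hypothetical connection module: one sigmoid gate per feature (assumption)."""
    return [1.0 / (1.0 + math.exp(-(m + f))) for m, f in zip(mod_feat, ident_feat)]

def gated_forward(x, ident_blocks, mod_blocks, head):
    # The initial blocks of both networks run without any gating.
    feat = ident_blocks[0](x)
    mod = mod_blocks[0](x)
    gates = []
    # In front of every remaining block, branch the identification features
    # to a connection module and multiply them by the returned gate.
    for ident_block, mod_block in zip(ident_blocks[1:], mod_blocks[1:]):
        gate = connection_module(mod, feat)
        gates.append(gate)
        feat = ident_block([g * f for g, f in zip(gate, feat)])
        mod = mod_block(mod)
    # One more gate sits in front of the classification head.
    gate = connection_module(mod, feat)
    gates.append(gate)
    logits = head([g * f for g, f in zip(gate, feat)])
    return logits, gates
```

With two blocks per network, `gated_forward` returns two gates: one applied in front of the second identification block and one in front of the classification head.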

Within a network, different neurons differ in importance for learning different kinds of knowledge. The parameters learned by some neurons mainly affect knowledge related to image content, while the parameters learned by other neurons are more closely related to image style. The invention aims to screen out the neurons that are less important for learning image content features and to use them preferentially for parameter updates when new knowledge must be learned, while keeping neurons that are more important for learning image content as unchanged as possible in subsequent training. In this way, the network is guided to learn new image style information without destroying its ability to learn the original content features. Through continued training, when the network encounters sample data of a different modality, the modulation network directs the identification network to learn the new modality knowledge preferentially with neurons that are less important for content features, so that the previously acquired ability to learn image content features is preserved while new modality information is learned, which improves the robustness of the model.
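The source does not state how gating translates into preferential updates. One plausible reading, sketched below under that assumption, is that a multiplicative gate also scales the gradient reaching a neuron's weight, so a weight behind a gate near zero barely moves. The scalar model `gate * w * x` and the function names are hypothetical.

```python
def gated_weight_grad(w, gate, x, target):
    """Gradient with respect to w of the squared error (gate * w * x - target) ** 2."""
    err = gate * w * x - target
    # The chain rule introduces the gate as a multiplicative factor.
    return 2.0 * err * gate * x

def sgd_step(w, gate, x, target, lr=0.1):
    """One gradient descent step on the gated weight."""
    return w - lr * gated_weight_grad(w, gate, x, target)

# A closed gate leaves the weight untouched; an open gate lets it move.
w_closed = sgd_step(1.0, gate=0.0, x=1.0, target=2.0)
w_open = sgd_step(1.0, gate=1.0, x=1.0, target=2.0)
```

In this toy setting the gate acts as a per-neuron switch on the update, which is one way a modulation scheme could protect some neurons while letting others adapt.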

Specifically, the invention uses two parallel networks as the modulation network and the identification network, respectively. The two networks are independent of each other yet connected: the output of each layer of the modulation network is connected to a different position of the identification network, which realizes selective activation of the identification network and guides its parameter updates in a direction that favors learning image content features.

The invention has the beneficial effects that:

The invention designs two parallel networks that serve as a modulation network and an identification network, respectively, and provides a network training method based on double-layer optimization. The modulation network is trained to selectively activate the identification network, guiding the identification network, when it faces sample data of a different modality, to update its parameters preferentially with neurons that are less important for learning content features. As a result, the network achieves higher accuracy when tested on multi-modal data, and the influence of multi-modal differences on the results of remote sensing image target recognition is reduced.

Drawings

Fig. 1 is a schematic diagram of the architecture of the two parallel networks.

Fig. 2 shows the overall training flow of the multi-modal remote sensing target robust recognition method based on double-layer optimization learning.

Detailed Description

The following describes the embodiments of the present invention further with reference to the drawings and technical schemes.

The overall training flow adopted by the invention is shown in Fig. 2. It consists mainly of an inner layer optimization stage, which controls the adaptation of the identification network to data of different modalities, and an outer layer optimization stage, which controls the updating of the modulation network. For convenience, these two optimization stages are referred to below as the inner layer adaptation stage and the modulation optimization stage, respectively. Training requires two data sets of different modalities, a known data set D₁ and an adaptation data set D₂; the adaptation data set D₂ simulates a data set belonging to a different modality from the known data set. First, before training begins, the identification network is pre-trained once on the known data set D₁, whose sample labels are known, to obtain its optimal parameters θ_P with respect to D₁. In subsequent training, θ_P is used to reinitialize the identification network before each inner layer adaptation stage begins, i.e., the identification network parameters are set to θ₀ = θ_P. The modulation network is randomly initialized before the whole training starts.

On formally entering the double-layer optimization stage, the adaptation data set D₂ is first used to simulate the adaptation of the identification network to data sets of different modalities; that is, the identification network is updated with data from D₂, while the modulation network is never updated during the inner layer adaptation stage, which ensures that the identification network adapts to D₂ under a single, fixed modulation scheme. This is expressed by the formula:

where θ denotes the parameters of the identification network, [symbol not reproduced] denotes the parameters of the modulation network, and L_c and α are the network loss and the learning rate of the inner layer adaptation stage, respectively.
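The inner layer update formula itself is not reproduced in this text. A plausible form, assuming plain gradient descent and writing φ (an assumed symbol) for the modulation network parameters, with n indexing the modulation optimization cycle and i the inner layer adaptation step, would be:

```latex
\theta_{n,0} = \theta_P, \qquad
\theta_{n,i+1} = \theta_{n,i} - \alpha \, \nabla_{\theta} L_c\!\left(\theta_{n,i}, \varphi_n ; D_2\right)
```

Here φ_n is held fixed for every inner step i, matching the statement that the modulation network is not updated during the inner layer adaptation stage.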

After the inner layer adaptation stage is completed, the optimal parameters θ_* learned by the identification network on the data set D₂ under the current modulation scheme are obtained. The modulation optimization stage then begins; its main function is to use the known data set D₁ to judge how well the network learns under the current modulation scheme. The known data set D₁ is input to the whole model to obtain the loss L_m of the modulation optimization stage. The magnitude of this loss reflects the quality of the current modulation scheme, and the loss is used to update the modulation network while the identification network is kept unchanged. The purpose of this update is to let the modulation network learn a better modulation scheme and better screen out the neurons in the identification network that should be updated preferentially (those that are less important for learning image content features). The corresponding formula is:

where L_m is the network loss of the modulation optimization stage and β is the learning rate of the modulation optimization stage.
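The modulation update formula is likewise not reproduced in this text. Assuming plain gradient descent, and writing φ (an assumed symbol) for the modulation network parameters and n for the modulation optimization cycle, a plausible form would be:

```latex
\varphi_{n+1} = \varphi_n - \beta \, \nabla_{\varphi} L_m\!\left(\theta_*, \varphi_n ; D_1\right)
```

In this form θ_* is treated as a constant, in line with the statement that the identification network is kept unchanged during the modulation optimization stage. Whether the original formula also differentiates through the inner adaptation steps cannot be determined from the text.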

After the double-layer optimization process is finished, the modulation network corresponds to a better modulation scheme when entering the next optimization, so that the identification network can be better controlled to screen out neurons which are updated preferentially.

The optimization is repeated cyclically according to this flow, alternately updating the identification network and the modulation network, so that the network learns to classify and recognize multi-modal data.
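The alternating schedule can be sketched on a toy problem. This is only an illustration under stated assumptions: the identification network is reduced to a weight vector `theta`, the modulation network to a vector `phi` whose sigmoid gates scale each weight, both losses are mean squared errors, and the modulation update treats the adapted `theta` as fixed. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, phi, x):
    # Each weight theta[k] is scaled by a gate in (0, 1) derived from phi[k].
    return sum(sigmoid(p) * t * xk for t, p, xk in zip(theta, phi, x))

def loss(theta, phi, data):
    return sum((predict(theta, phi, x) - y) ** 2 for x, y in data) / len(data)

def grad_theta(theta, phi, data):
    g = [0.0] * len(theta)
    for x, y in data:
        err = predict(theta, phi, x) - y
        for k in range(len(theta)):
            g[k] += 2.0 * err * sigmoid(phi[k]) * x[k] / len(data)
    return g

def grad_phi(theta, phi, data):
    g = [0.0] * len(phi)
    for x, y in data:
        err = predict(theta, phi, x) - y
        for k in range(len(phi)):
            s = sigmoid(phi[k])
            g[k] += 2.0 * err * s * (1.0 - s) * theta[k] * x[k] / len(data)
    return g

def bilevel_train(theta_p, phi, d1, d2, alpha=0.1, beta=0.1,
                  outer_steps=20, inner_steps=5):
    history = []
    theta = list(theta_p)
    for _ in range(outer_steps):
        # Reset the identification parameters to the pre-trained theta_P.
        theta = list(theta_p)
        # Inner layer adaptation: update theta on D2 while phi stays fixed.
        for _ in range(inner_steps):
            g = grad_theta(theta, phi, d2)
            theta = [t - alpha * gk for t, gk in zip(theta, g)]
        # Modulation optimization: score the adapted theta on D1,
        # then update phi while theta stays fixed.
        history.append(loss(theta, phi, d1))
        g = grad_phi(theta, phi, d1)
        phi = [p - beta * gk for p, gk in zip(phi, g)]
    return theta, phi, history
```

Calling `bilevel_train(theta_p, phi0, d1, d2)` returns the last adapted `theta`, the trained `phi`, and the per-round loss on the known data set, which can be inspected to see how the modulation scheme evolves. A real implementation would replace the hand-written gradients with automatic differentiation over the two convolutional networks.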

After many rounds of optimization with this training scheme, the final model is better able to learn image content features and achieves higher accuracy when tested on multi-modal data sets, which achieves the goal of improving model robustness.

In summary, the invention provides a robust multi-modal remote sensing target recognition method based on double-layer optimization learning, which effectively reduces the interference of modality differences with model accuracy and improves the robustness of the model.

Claims

1. A robust multi-modal remote sensing target recognition method based on double-layer optimization learning, characterized in that the method is implemented with two parallel networks, namely a modulation network and an identification network; the modulation network serves as a gating function during training of the identification network and controls the selective activation of the identification network; the outputs of the layers of the modulation network are connected to different positions of the identification network and guide the parameter updates of the identification network toward learning image content features; through double-layer optimization of the parallel networks, the modulation network and the identification network are updated in different stages of the optimization, so that the two networks are trained separately;

during training, two data sets of different modalities are used, namely a known data set D₁ and an adaptation data set D₂; the identification network is first pre-trained once, separately, on the known data set D₁ to obtain its optimal parameters θ_P with respect to D₁; before double-layer optimization starts, the identification network and the modulation network are randomly initialized; the formal training process adopts a double-layer optimization method and is divided into an inner layer adaptation stage and a modulation optimization stage; each time double-layer optimization starts, that is, before the inner layer adaptation stage begins, the identification network updated in the previous round of double-layer optimization is reinitialized with the optimal parameters θ_P, so that the identification network parameters satisfy θ₀ = θ_P, with θ_P serving as the initial parameters of the identification network in the current round of double-layer optimization;

after the inner layer adaptation stage begins, the adaptation data set D₂ is used to train the adaptation of the identification network to data sets of different modalities; the identification network is updated, and the modulation network is not updated during the entire inner layer adaptation stage;

where θ represents a parameter identifying the network,

when one round of double-layer optimization is completed, the modulation network enters the next round with the updated modulation scheme, which is used to screen out the parameters in the identification network that are to be updated.

2. The robust multi-modal remote sensing target recognition method based on double-layer optimization learning according to claim 1, wherein the modulation network comprises a plurality of convolution blocks connected in sequence, with connection modules connected in parallel between the convolution blocks other than the initial convolution block; the identification network comprises a plurality of convolution blocks connected in sequence, followed by a classification head; except for the initial convolution block, a branch taken in front of each remaining convolution block and in front of the classification head is passed to the corresponding connection module in the modulation network; the output of each connection module is passed in turn to a cross-multiplication operation located between the remaining convolution blocks and in front of the classification head.