CN114612685B - Self-supervision information extraction method combining depth features and contrast learning


Info

Publication number
CN114612685B
Authority
CN
China
Prior art keywords
training
image
task
feature
network structure
Prior art date
2022-03-22
Legal status
Active
Application number
CN202210282505.6A
Other languages
Chinese (zh)
Other versions
CN114612685A (en)
Inventor
陈德跃
彭玲
Current Assignee
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202210282505.6A priority Critical patent/CN114612685B/en
Publication of CN114612685A publication Critical patent/CN114612685A/en
Application granted granted Critical
Publication of CN114612685B publication Critical patent/CN114612685B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a self-supervised information extraction method combining depth features and contrastive learning, which comprises the following steps. Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent (pretext) task and the final target task, respectively, and trained separately. Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained. Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole data set is trained. Step 4: the above training is iterated continuously, and when the features extracted by the two feature extraction structures are integrated, the effect of feature migration is realized.

Description

Self-supervision information extraction method combining depth features and contrast learning
Technical Field
The invention relates to the field of machine learning, and in particular to a self-supervised information extraction method combining depth features with contrastive learning.
Background
In recent years, deep learning has developed rapidly and related methods have quickly been applied across many industries, yet considerable problems remain. The two most significant problems in deep learning applications are poor model generalization and dependence on large-scale training samples. Remote sensing applications are, in most cases, information extraction experiments over a macroscopic range; if the sample set must be re-annotated every time the study area changes, enormous annotation cost is incurred. The difficulty of annotating data sets is almost a consensus in academia, and this predicament makes deep learning applications expensive, so remote sensing interpretation based on deep learning struggles to meet practical application requirements. A deep learning interpretation technique that depends less on manual annotation is therefore of great importance.
Meanwhile, remote sensing satellites operate continuously and massive data are available at all times, but only the small labeled fraction of these data is exploited, while the large amount of unlabeled data remains unused.
Disclosure of Invention
To solve this technical problem, the invention provides a self-supervised information extraction method combining depth features and contrastive learning. During training, based on the idea of reinforcement learning, a channel is constructed between unsupervised learning and supervised learning so that the two learn from each other, promoting the network's extraction performance.
The technical scheme of the invention is as follows: a self-supervised information extraction method combining depth features and contrastive learning, comprising the following steps:
Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent (pretext) task and the final target task, respectively, and trained separately;
Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once;
Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once;
Step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction part are fixed and the other network structure is trained, iterating in a loop; when the features extracted by the two feature extraction structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration, as sketched in the training-loop example below.
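The patent specifies only the freezing and alternation scheme, not a concrete training loop. The following is a minimal PyTorch-style sketch of steps 2 to 4 under that reading; the encoder attributes, data loaders, and loss callables (agent_loss_fn, seg_loss_fn) are hypothetical placeholders, not names from the patent.

```python
import torch

def set_requires_grad(module: torch.nn.Module, flag: bool) -> None:
    """Freeze or unfreeze a sub-network's parameters."""
    for p in module.parameters():
        p.requires_grad = flag

def alternate_training(agent_net, seg_net, unlabeled_loader, labeled_loader,
                       agent_loss_fn, seg_loss_fn, cycles: int = 10, lr: float = 1e-4):
    """Alternating scheme: each cycle trains the agent task once over the
    unlabeled set with the segmentation encoder frozen, then trains the
    segmentation task once over the labeled set with the agent encoder frozen."""
    for _ in range(cycles):
        # Step 2: agent (pretext) task pass; segmentation encoder is fixed.
        set_requires_grad(seg_net.encoder, False)
        set_requires_grad(agent_net, True)
        opt = torch.optim.Adam(agent_net.parameters(), lr=lr)
        for x_prime in unlabeled_loader:          # unlabeled images x'
            loss = agent_loss_fn(agent_net, seg_net.encoder, x_prime)
            opt.zero_grad(); loss.backward(); opt.step()

        # Step 3: segmentation task pass; agent encoder is fixed.
        set_requires_grad(agent_net.encoder, False)
        set_requires_grad(seg_net, True)
        opt = torch.optim.Adam(seg_net.parameters(), lr=lr)
        for x, y in labeled_loader:               # labeled pairs (image, mask)
            loss = seg_loss_fn(seg_net, agent_net.encoder, x, y)
            opt.zero_grad(); loss.backward(); opt.step()
```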
Advantageous effects:
compared with the traditional method, the technical scheme of the invention has the following advantages:
1. Aiming at the problem of insufficient samples in traditional supervised learning, the method applies self-supervised learning to make use of unlabeled samples that would otherwise sit idle, and can achieve a better training effect when samples are limited.
2. Based on the idea of reinforcement learning, the invention establishes a channel between the agent task and the target task of self-supervised learning, so that the two promote and learn from each other.
Drawings
FIG. 1: schematic diagram of the self-supervised learning process;
FIG. 2: training process diagram of the contrastive learning network structure;
FIG. 3: image repair process diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them, and all other embodiments obtained without creative effort by a person skilled in the art on the basis of these embodiments belong to the protection scope of the present invention.
According to the embodiment of the invention, the self-supervised information extraction method combining depth features and contrastive learning comprises three parts: agent task network training, feature embedding combination, and downstream learning task migration. The main structural block diagram is shown in fig. 1 and comprises an agent task network structure and a target task network structure; the target task generally comprises classification, semantic segmentation, target detection, and instance segmentation tasks. For convenience of description, the target task below is introduced mainly as a segmentation task. Fig. 1 contains three modules in total: the upper half is the segmentation task structure, the lower half is the agent task structure, and the middle part is the attention connection module; the detailed structure of the agent task is described in fig. 2 and fig. 3. The main idea is to train the network structure by designing, as the agent task, a task for which the computer can obtain samples easily, using a 'semi-automatic' process to obtain 'labels' from the data itself. An attention structure is adopted in the feature embedding structure to integrate the features of the agent task and of the main segmentation-layer task, finally realizing feature migration for the unlabeled data. The method comprises the following steps:
Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent task and the final target task, respectively, and trained separately.
Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once.
Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once.
Step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction part are fixed and the other network structure is trained, iterating in a loop. When the features extracted by the two feature extraction structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration. One possible form of such a fusion module is sketched below.
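The patent names a channel attention mechanism but does not disclose its internal form. The sketch below assumes a squeeze-and-excitation style gate over the concatenated feature maps of the two encoders; the module name, the reduction ratio, and the shapes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Hypothetical attention connection module: weights the feature maps
    produced by the agent-task encoder and the segmentation-task encoder
    with learned channel attention, then fuses them by weighted sum."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: B x 2C x 1 x 1
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                    # per-channel weights in (0, 1)
        )

    def forward(self, feat_agent: torch.Tensor, feat_seg: torch.Tensor) -> torch.Tensor:
        stacked = torch.cat([feat_agent, feat_seg], dim=1)   # B x 2C x H x W
        w = self.gate(stacked)                                # B x 2C x 1 x 1
        w_agent, w_seg = w.chunk(2, dim=1)
        return w_agent * feat_agent + w_seg * feat_seg        # weighted integration

# Usage: fuse two 256-channel feature maps of the same spatial size.
fusion = ChannelAttentionFusion(channels=256)
fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```

In this reading, the gate produces one weight per channel of each source, so the fused map keeps the channel count the segmentation decoder expects while letting training decide how much of each encoder to trust.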
The agent task must be selected according to the situation at hand. Taking building information extraction as an example, a multi-task learning structure is selected, and the two modes of image repair and contrastive learning are added to the agent task for simultaneous training. Contrastive learning is considered first: it extracts features mainly through the principle of invariance of features under transformation, constructing the loss function by transforming images and then comparing the similarity existing between the transformed image results. In theory, the similarity between results obtained from transformations of the same image is relatively high, while the similarity between results obtained from transformations of different images is relatively low. The contrastive learning agent task is implemented as follows:
the first step is as follows: and (3) constructing a feature library, traversing the complete data set, passing through a feature extraction network, and storing the obtained feature vector result, thereby constructing and obtaining the feature library.
The second step: and carrying out enhancement transformation on the input image, and respectively carrying out feature coding on the transformed result to obtain a coding feature result.
The third step: calculating the similarity of the positive samples of the two obtained coding results, calculating the loss of the negative samples of all the characteristics in the characteristic library and the coding results, and finishing the training of comparing and learning part of network structures through loss iteration. The main process is shown in fig. 2, and the loss function between the positive and negative samples is calculated as follows:
$$\mathcal{L} = -\log \frac{\exp\left(q \cdot k_{+} / \gamma\right)}{\exp\left(q \cdot k_{+} / \gamma\right) + \sum_{i}\exp\left(q \cdot k_{i} / \gamma\right)}$$
where γ is a temperature hyper-parameter, q represents the vector output by the encoding structure for the enhanced image x1, k+ represents the vector output by the encoding structure for the enhanced image x2, and k_i represents the vectors stored in the feature library; each time the loss function is calculated, the stored k_i serve as negative examples for the newly input image. The loss is then back-propagated according to the error back-propagation theorem in each round of training, and the network structure is updated to complete network training.
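The loss is rendered in the original only as an image; the equation above and the sketch below assume the standard InfoNCE form that the surrounding text describes (query q, positive key k+, feature-library negatives k_i, temperature γ). The L2 normalization step and the function name are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q: torch.Tensor, k_pos: torch.Tensor,
                  feature_bank: torch.Tensor, gamma: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss.
    q            - B x D encodings of augmentation x1 (queries)
    k_pos        - B x D encodings of augmentation x2 (positive keys)
    feature_bank - N x D stored encodings acting as negatives k_i
    gamma        - temperature hyper-parameter"""
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    bank = F.normalize(feature_bank, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True) / gamma   # B x 1 positive logits
    l_neg = (q @ bank.t()) / gamma                          # B x N negative logits
    logits = torch.cat([l_pos, l_neg], dim=1)
    # The positive key sits at index 0 of every row of logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

# Usage: with a feature library built by encoding the whole data set once,
# q and k_pos come from two augmentations of the same batch of images.
```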
The image repair agent task restores missing parts of an image: the missing part is reconstructed from the information that remains in the image. Because part of the image's information is artificially erased, the network must supplement that content from the surrounding image when repairing it, so that the feature extraction part of the network structure learns the texture and spectral features of the image. The specific flow is shown in fig. 3. The detailed steps are as follows:
First, artificial occlusion is applied during training to the original image to be repaired; the size of the occlusion square is designed roughly according to the area occupied by a building.
Second, the occluded image is input into the configured symmetric network structure, and the repaired image result is obtained after training.
Third, the loss between the generated image and the original image is calculated; after the loss is calculated, the parameters are updated through the neural network loss back-propagation principle, and the network structure is trained continuously until it is stable. The loss function is shown below.
$$L_{\mathrm{SSIM}} = 1 - \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where μ_x and μ_y represent the means of the input image and the output image, σ_xy represents their covariance and σ_x, σ_y their standard deviations, and C_1 and C_2 are stabilizing constants. The SSIM loss evaluates image quality well: the similarity between the predicted image and the original image is calculated, the loss is updated according to the error back-propagation theorem, and the training of the network is thereby completed.
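As a minimal sketch of the three steps above, assuming the SSIM-based loss reconstructed in the equation and a square occlusion sized roughly like a building footprint; the window size, the constants C1 and C2, and the repair network are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def occlude(img: torch.Tensor, size: int = 64) -> torch.Tensor:
    """Step 1: erase a random square block from a B x C x H x W image batch
    (assumes H and W are at least `size`)."""
    out = img.clone()
    _, _, h, w = out.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    out[:, :, top:top + size, left:left + size] = 0.0
    return out

def ssim_loss(pred: torch.Tensor, target: torch.Tensor, window: int = 11,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Step 3: 1 - SSIM between the repaired image and the original,
    using local statistics from an average-pooling window; inputs in [0, 1]."""
    mu_x = F.avg_pool2d(pred, window, stride=1)
    mu_y = F.avg_pool2d(target, window, stride=1)
    var_x = F.avg_pool2d(pred * pred, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(target * target, window, stride=1) - mu_y ** 2
    cov_xy = F.avg_pool2d(pred * target, window, stride=1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()

# Step 2 (usage): feed the occluded image through a symmetric
# encoder-decoder `repair_net` (hypothetical) and optimize the SSIM loss:
# loss = ssim_loss(repair_net(occlude(x)), x)
```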
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of those embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to fall within the protection scope of the present invention, as defined and limited by the appended claims, provided they do not depart from its spirit and scope.

Claims (5)

1. A self-supervised information extraction method combining depth features and contrastive learning, characterized by comprising the following steps:
step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent task and the final target task, respectively, and trained separately; the target task is a segmentation task;
step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once;
step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once;
step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction network part are fixed and the other network structure is trained, iterating over step 2 and step 3 in a loop; when the features extracted by the two feature extraction network structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two feature extraction network structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration.
2. The self-supervised information extraction method combining depth features and contrastive learning according to claim 1, characterized in that:
a multi-task learning structure is selected for the agent task, and the two modes of image restoration and contrastive learning are added to the agent task for simultaneous training; contrastive learning is performed first, extracting features through the principle of invariance of features under transformation: the images are transformed, and the similarity existing between the transformed image results is compared to construct the loss function.
3. The self-supervised information extraction method combining depth features and contrastive learning according to claim 2, characterized in that the contrastive learning agent task is as follows:
step 1: construct the feature library: traverse the complete data set, pass each image through the feature extraction network, and store the resulting feature vectors, thereby constructing the feature library;
step 2: apply enhancement transforms to the input image, and feature-encode each transformed result to obtain the encoded feature results;
step 3: calculate the positive-sample similarity of the two encoded results, calculate the negative-sample loss between the encoded results and all features in the feature library, and complete the training of the contrastive learning part of the network structure through loss iteration.
4. The self-supervised information extraction method combining depth features and contrastive learning according to claim 1, characterized in that the image restoration agent task restores the missing part of an image based on the information remaining in the image: a part of the original image, which has undergone color transformation and geometric change, is hidden, and the data before occlusion is used as the output, thereby constructing a similarity loss function for iterative training.
5. The self-supervised information extraction method combining depth features and contrastive learning according to claim 4, characterized in that the iterative training process is specifically as follows:
first, artificial occlusion is applied during training to the original image to be repaired, the size of the occlusion square being designed according to the area occupied by a building;
second, the occluded image is input into the configured symmetric network structure, and the repaired image result is obtained after training;
third, the loss between the generated image and the original image is calculated; after the loss is calculated, the parameters are updated through the neural network loss back-propagation principle, and the network structure is trained continuously until it is stable, the loss function being:
$$L_{\mathrm{SSIM}} = 1 - \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (1)$$
where μ_x and μ_y represent the means of the input image and the output image, σ_xy represents their covariance and σ_x, σ_y their standard deviations, and C_1 and C_2 are stabilizing constants; the SSIM loss evaluates image quality well: the similarity between the predicted image and the original image is calculated, and the loss is updated according to the error back-propagation theorem, thereby completing the training of the network.
CN202210282505.6A 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning Active CN114612685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210282505.6A CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210282505.6A CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Publications (2)

Publication Number Publication Date
CN114612685A CN114612685A (en) 2022-06-10
CN114612685B 2022-12-23

Family

ID=81864736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210282505.6A Active CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Country Status (1)

Country Link
CN (1) CN114612685B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657455B (en) * 2021-07-23 2024-02-09 西北工业大学 Semi-supervised learning method based on triple play network and labeling consistency regularization
CN113724206B (en) * 2021-08-12 2023-08-18 武汉大学 Fundus image blood vessel segmentation method and system based on self-supervision learning
CN114038517A (en) * 2021-08-25 2022-02-11 暨南大学 Self-supervision graph neural network pre-training method based on contrast learning
CN113989582A (en) * 2021-08-26 2022-01-28 中国科学院信息工程研究所 Self-supervision visual model pre-training method based on dense semantic comparison

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668492A (en) * 2020-12-30 2021-04-16 中山大学 Behavior identification method for self-supervised learning and skeletal information
CN112989927A (en) * 2021-02-03 2021-06-18 杭州电子科技大学 Scene graph generation method based on self-supervision pre-training
US11200497B1 (en) * 2021-03-16 2021-12-14 Moffett Technologies Co., Limited System and method for knowledge-preserving neural network pruning
CN113158949A (en) * 2021-04-30 2021-07-23 湖北工业大学 Motor imagery electroencephalogram signal classification method based on self-supervision learning
CN113314205A (en) * 2021-05-28 2021-08-27 北京航空航天大学 Efficient medical image labeling and learning system
CN114037055A (en) * 2021-11-05 2022-02-11 北京市商汤科技开发有限公司 Data processing system, method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Improving out-of-distribution generalization via multi-task self-supervised pretraining";Isabela Albuquerque et al.;《arXiv》;20200330;全文 *
"Self-Sipervised Learning via multi-Transformation Classification for Action Recognition";Duc-Quang et al.;《arXiv》;20210220;全文 *
"基于自监督学习的图像特征表示方法研究";彭玉娇;《中国优秀硕士学位论文全文数据库 信息科技辑》;20220315(第03期);全文 *

Also Published As

Publication number Publication date
CN114612685A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN110059698B (en) Semantic segmentation method and system based on edge dense reconstruction for street view understanding
CN109087258B (en) Deep learning-based image rain removing method and device
CN110070091B (en) Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding
CN110059769B (en) Semantic segmentation method and system based on pixel rearrangement reconstruction and used for street view understanding
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN111340047B (en) Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN116051686B (en) Method, system, equipment and storage medium for erasing characters on graph
CN112990331A (en) Image processing method, electronic device, and storage medium
CN115311555A (en) Remote sensing image building extraction model generalization method based on batch style mixing
CN114677536B (en) Pre-training method and device based on Transformer structure
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
CN113947538A (en) Multi-scale efficient convolution self-attention single image rain removing method
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN114612685B (en) Self-supervision information extraction method combining depth features and contrast learning
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network
CN116958325A (en) Training method and device for image processing model, electronic equipment and storage medium
CN115082778B (en) Multi-branch learning-based homestead identification method and system
CN115660979A (en) Attention mechanism-based double-discriminator image restoration method
AU2021104479A4 (en) Text recognition method and system based on decoupled attention mechanism
CN114331894A (en) Face image restoration method based on potential feature reconstruction and mask perception
CN115187775A (en) Semantic segmentation method and device for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant