CN114612685B - Self-supervision information extraction method combining depth features and contrast learning


Info

Publication number
CN114612685B
Authority
CN
China
Prior art keywords
training
image
task
feature
network structure
Prior art date
2022-03-22
Legal status
Active
Application number
CN202210282505.6A
Other languages
Chinese (zh)
Other versions
CN114612685A (en)
Inventor
陈德跃
彭玲
Current Assignee
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202210282505.6A priority Critical patent/CN114612685B/en
Publication of CN114612685A publication Critical patent/CN114612685A/en
Application granted granted Critical
Publication of CN114612685B publication Critical patent/CN114612685B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a self-supervised information extraction method combining depth features and contrastive learning, which comprises the following steps. Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent (pretext) task and the final target task, respectively, and trained separately. Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained. Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole data set is trained. Step 4: the above training is iterated continuously, and when the features extracted by the two feature extraction structures are integrated, the effect of feature migration is realized.

Description

Self-supervision information extraction method combining depth features and contrast learning
Technical Field
The invention relates to the field of machine learning, and in particular to a self-supervised information extraction method combining depth features with contrastive learning.
Background
In recent years, deep learning has developed rapidly and related methods have quickly been applied across many industries, yet considerable problems remain. The two most significant problems in deep learning applications are poor model generalization and dependence on large-scale training samples. Remote sensing applications are, in most cases, information extraction experiments over a macroscopic range; if the sample set must be re-annotated every time the study area changes, enormous annotation cost is incurred. The difficulty of annotating data sets is almost a consensus in academia, and this predicament makes deep learning applications expensive, so remote sensing interpretation based on deep learning struggles to meet practical application requirements. A deep learning interpretation technique that depends less on manual annotation is therefore of great importance.
Meanwhile, remote sensing satellites operate continuously and massive data are available at all times, but only the small labeled fraction of these data is exploited, while the large amount of unlabeled data remains unused.
Disclosure of Invention
To solve this technical problem, the invention provides a self-supervised information extraction method combining depth features and contrastive learning. During training, based on the idea of reinforcement learning, a channel is constructed between unsupervised learning and supervised learning so that the two learn from each other, promoting the network's extraction performance.
The technical scheme of the invention is as follows: a self-supervised information extraction method combining depth features and contrastive learning, comprising the following steps:
Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent (pretext) task and the final target task, respectively, and trained separately;
Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once;
Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once;
Step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction part are fixed and the other network structure is trained, iterating in a loop; when the features extracted by the two feature extraction structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration, as sketched in the training-loop example below.
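The patent specifies only the freezing and alternation scheme, not a concrete training loop. The following is a minimal PyTorch-style sketch of steps 2 to 4 under that reading; the encoder attributes, data loaders, and loss callables (agent_loss_fn, seg_loss_fn) are hypothetical placeholders, not names from the patent.

```python
import torch

def set_requires_grad(module: torch.nn.Module, flag: bool) -> None:
    """Freeze or unfreeze a sub-network's parameters."""
    for p in module.parameters():
        p.requires_grad = flag

def alternate_training(agent_net, seg_net, unlabeled_loader, labeled_loader,
                       agent_loss_fn, seg_loss_fn, cycles: int = 10, lr: float = 1e-4):
    """Alternating scheme: each cycle trains the agent task once over the
    unlabeled set with the segmentation encoder frozen, then trains the
    segmentation task once over the labeled set with the agent encoder frozen."""
    for _ in range(cycles):
        # Step 2: agent (pretext) task pass; segmentation encoder is fixed.
        set_requires_grad(seg_net.encoder, False)
        set_requires_grad(agent_net, True)
        opt = torch.optim.Adam(agent_net.parameters(), lr=lr)
        for x_prime in unlabeled_loader:          # unlabeled images x'
            loss = agent_loss_fn(agent_net, seg_net.encoder, x_prime)
            opt.zero_grad(); loss.backward(); opt.step()

        # Step 3: segmentation task pass; agent encoder is fixed.
        set_requires_grad(agent_net.encoder, False)
        set_requires_grad(seg_net, True)
        opt = torch.optim.Adam(seg_net.parameters(), lr=lr)
        for x, y in labeled_loader:               # labeled pairs (image, mask)
            loss = seg_loss_fn(seg_net, agent_net.encoder, x, y)
            opt.zero_grad(); loss.backward(); opt.step()
```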
Advantageous effects:
compared with the traditional method, the technical scheme of the invention has the following advantages:
1. Aiming at the problem of insufficient samples in traditional supervised learning, the method applies self-supervised learning to make use of unlabeled samples that would otherwise sit idle, and can achieve a better training effect when samples are limited.
2. Based on the idea of reinforcement learning, the invention establishes a channel between the agent task and the target task of self-supervised learning, so that the two promote and learn from each other.
Drawings
FIG. 1: schematic diagram of the self-supervised learning process;
FIG. 2: training process diagram of the contrastive learning network structure;
FIG. 3: image repair process diagram.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them, and all other embodiments obtained without creative effort by a person skilled in the art on the basis of these embodiments belong to the protection scope of the present invention.
According to the embodiment of the invention, the self-supervised information extraction method combining depth features and contrastive learning comprises three parts: agent task network training, feature embedding combination, and downstream learning task migration. The main structural block diagram is shown in fig. 1 and comprises an agent task network structure and a target task network structure; the target task generally comprises classification, semantic segmentation, target detection, and instance segmentation tasks. For convenience of description, the target task below is introduced mainly as a segmentation task. Fig. 1 contains three modules in total: the upper half is the segmentation task structure, the lower half is the agent task structure, and the middle part is the attention connection module; the detailed structure of the agent task is described in fig. 2 and fig. 3. The main idea is to train the network structure by designing, as the agent task, a task for which the computer can obtain samples easily, using a 'semi-automatic' process to obtain 'labels' from the data itself. An attention structure is adopted in the feature embedding structure to integrate the features of the agent task and of the main segmentation-layer task, finally realizing feature migration for the unlabeled data. The method comprises the following steps:
Step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent task and the final target task, respectively, and trained separately.
Step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once.
Step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once.
Step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction part are fixed and the other network structure is trained, iterating in a loop. When the features extracted by the two feature extraction structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration. One possible form of such a fusion module is sketched below.
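The patent names a channel attention mechanism but does not disclose its internal form. The sketch below assumes a squeeze-and-excitation style gate over the concatenated feature maps of the two encoders; the module name, the reduction ratio, and the shapes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Hypothetical attention connection module: weights the feature maps
    produced by the agent-task encoder and the segmentation-task encoder
    with learned channel attention, then fuses them by weighted sum."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: B x 2C x 1 x 1
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                    # per-channel weights in (0, 1)
        )

    def forward(self, feat_agent: torch.Tensor, feat_seg: torch.Tensor) -> torch.Tensor:
        stacked = torch.cat([feat_agent, feat_seg], dim=1)   # B x 2C x H x W
        w = self.gate(stacked)                                # B x 2C x 1 x 1
        w_agent, w_seg = w.chunk(2, dim=1)
        return w_agent * feat_agent + w_seg * feat_seg        # weighted integration

# Usage: fuse two 256-channel feature maps of the same spatial size.
fusion = ChannelAttentionFusion(channels=256)
fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```

In this reading, the gate produces one weight per channel of each source, so the fused map keeps the channel count the segmentation decoder expects while letting training decide how much of each encoder to trust.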
The agent task must be selected according to the situation at hand. Taking building information extraction as an example, a multi-task learning structure is selected, and the two modes of image repair and contrastive learning are added to the agent task for simultaneous training. Contrastive learning is considered first: it extracts features mainly through the principle of invariance of features under transformation, constructing the loss function by transforming images and then comparing the similarity existing between the transformed image results. In theory, the similarity between results obtained from transformations of the same image is relatively high, while the similarity between results obtained from transformations of different images is relatively low. The contrastive learning agent task is implemented as follows:
the first step is as follows: and (3) constructing a feature library, traversing the complete data set, passing through a feature extraction network, and storing the obtained feature vector result, thereby constructing and obtaining the feature library.
The second step: and carrying out enhancement transformation on the input image, and respectively carrying out feature coding on the transformed result to obtain a coding feature result.
The third step: calculating the similarity of the positive samples of the two obtained coding results, calculating the loss of the negative samples of all the characteristics in the characteristic library and the coding results, and finishing the training of comparing and learning part of network structures through loss iteration. The main process is shown in fig. 2, and the loss function between the positive and negative samples is calculated as follows:
$$\mathcal{L} = -\log \frac{\exp\left(q \cdot k_{+} / \gamma\right)}{\exp\left(q \cdot k_{+} / \gamma\right) + \sum_{i}\exp\left(q \cdot k_{i} / \gamma\right)}$$
where γ is a temperature hyper-parameter, q represents the vector output by the encoding structure for the enhanced image x1, k+ represents the vector output by the encoding structure for the enhanced image x2, and k_i represents the vectors stored in the feature library; each time the loss function is calculated, the stored k_i serve as negative examples for the newly input image. The loss is then back-propagated according to the error back-propagation theorem in each round of training, and the network structure is updated to complete network training.
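The loss is rendered in the original only as an image; the equation above and the sketch below assume the standard InfoNCE form that the surrounding text describes (query q, positive key k+, feature-library negatives k_i, temperature γ). The L2 normalization step and the function name are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q: torch.Tensor, k_pos: torch.Tensor,
                  feature_bank: torch.Tensor, gamma: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss.
    q            - B x D encodings of augmentation x1 (queries)
    k_pos        - B x D encodings of augmentation x2 (positive keys)
    feature_bank - N x D stored encodings acting as negatives k_i
    gamma        - temperature hyper-parameter"""
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    bank = F.normalize(feature_bank, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True) / gamma   # B x 1 positive logits
    l_neg = (q @ bank.t()) / gamma                          # B x N negative logits
    logits = torch.cat([l_pos, l_neg], dim=1)
    # The positive key sits at index 0 of every row of logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)

# Usage: with a feature library built by encoding the whole data set once,
# q and k_pos come from two augmentations of the same batch of images.
```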
The image repair agent task restores missing parts of an image: the missing part is reconstructed from the information that remains in the image. Because part of the image's information is artificially erased, the network must supplement that content from the surrounding image when repairing it, so that the feature extraction part of the network structure learns the texture and spectral features of the image. The specific flow is shown in fig. 3. The detailed steps are as follows:
First, artificial occlusion is applied during training to the original image to be repaired; the size of the occlusion square is designed roughly according to the area occupied by a building.
Second, the occluded image is input into the configured symmetric network structure, and the repaired image result is obtained after training.
Third, the loss between the generated image and the original image is calculated; after the loss is calculated, the parameters are updated through the neural network loss back-propagation principle, and the network structure is trained continuously until it is stable. The loss function is shown below.
$$L_{\mathrm{SSIM}} = 1 - \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where μ_x and μ_y represent the means of the input image and the output image, σ_xy represents their covariance and σ_x, σ_y their standard deviations, and C_1 and C_2 are stabilizing constants. The SSIM loss evaluates image quality well: the similarity between the predicted image and the original image is calculated, the loss is updated according to the error back-propagation theorem, and the training of the network is thereby completed.
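As a minimal sketch of the three steps above, assuming the SSIM-based loss reconstructed in the equation and a square occlusion sized roughly like a building footprint; the window size, the constants C1 and C2, and the repair network are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def occlude(img: torch.Tensor, size: int = 64) -> torch.Tensor:
    """Step 1: erase a random square block from a B x C x H x W image batch
    (assumes H and W are at least `size`)."""
    out = img.clone()
    _, _, h, w = out.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    out[:, :, top:top + size, left:left + size] = 0.0
    return out

def ssim_loss(pred: torch.Tensor, target: torch.Tensor, window: int = 11,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Step 3: 1 - SSIM between the repaired image and the original,
    using local statistics from an average-pooling window; inputs in [0, 1]."""
    mu_x = F.avg_pool2d(pred, window, stride=1)
    mu_y = F.avg_pool2d(target, window, stride=1)
    var_x = F.avg_pool2d(pred * pred, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(target * target, window, stride=1) - mu_y ** 2
    cov_xy = F.avg_pool2d(pred * target, window, stride=1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()

# Step 2 (usage): feed the occluded image through a symmetric
# encoder-decoder `repair_net` (hypothetical) and optimize the SSIM loss:
# loss = ssim_loss(repair_net(occlude(x)), x)
```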
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of those embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to fall within the protection scope of the present invention, as defined and limited by the appended claims, provided they do not depart from its spirit and scope.

Claims (5)

1. A self-supervised information extraction method combining depth features and contrastive learning, characterized by comprising the following steps:
step 1: after feature coding, the unlabeled image x' and the labeled image x are input into the agent task and the final target task, respectively, and trained separately; the target task is a segmentation task;
step 2: during agent task training, the network structure parameters of the segmentation task are fixed; the features obtained by encoding x' are combined with the features of x' encoded by the segmentation task's feature extraction layer; the neural network structure simulates unsupervised extraction on unlabeled data, and the unlabeled data set is trained once;
step 3: during segmentation task training, the network structure parameters of the agent task are fixed; the features obtained by encoding x are combined with the features encoded by the agent task's feature extraction layer, and the whole labeled data set is trained once;
step 4: the above training is iterated continuously; after one network structure is trained, the parameters of its feature extraction network part are fixed and the other network structure is trained, iterating over step 2 and step 3 in a loop; when the features extracted by the two feature extraction network structures are integrated, a channel attention mechanism assigns a weight to the feature vectors output by the two feature extraction network structures, so that the output feature vectors are integrated by weighting, and this alternating form realizes the effect of feature migration.
2. The self-supervised information extraction method combining depth features and contrastive learning according to claim 1, characterized in that:
a multi-task learning structure is selected for the agent task, and the two modes of image restoration and contrastive learning are added to the agent task for simultaneous training; contrastive learning is performed first, extracting features through the principle of invariance of features under transformation: the images are transformed, and the similarity existing between the transformed image results is compared to construct the loss function.
3. The self-supervised information extraction method combining depth features and contrastive learning according to claim 2, characterized in that the contrastive learning agent task is as follows:
step 1: construct the feature library: traverse the complete data set, pass each image through the feature extraction network, and store the resulting feature vectors, thereby constructing the feature library;
step 2: apply enhancement transforms to the input image, and feature-encode each transformed result to obtain the encoded feature results;
step 3: calculate the positive-sample similarity of the two encoded results, calculate the negative-sample loss between the encoded results and all features in the feature library, and complete the training of the contrastive learning part of the network structure through loss iteration.
4. The self-supervised information extraction method combining depth features and contrastive learning according to claim 1, characterized in that the image restoration agent task restores the missing part of an image based on the information remaining in the image: a part of the original image, which has undergone color transformation and geometric change, is hidden, and the data before occlusion is used as the output, thereby constructing a similarity loss function for iterative training.
5. The self-supervised information extraction method combining depth features and contrastive learning according to claim 4, characterized in that the iterative training process is specifically as follows:
first, artificial occlusion is applied during training to the original image to be repaired, the size of the occlusion square being designed according to the area occupied by a building;
second, the occluded image is input into the configured symmetric network structure, and the repaired image result is obtained after training;
third, the loss between the generated image and the original image is calculated; after the loss is calculated, the parameters are updated through the neural network loss back-propagation principle, and the network structure is trained continuously until it is stable, the loss function being:
$$L_{\mathrm{SSIM}} = 1 - \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (1)$$
where μ_x and μ_y represent the means of the input image and the output image, σ_xy represents their covariance and σ_x, σ_y their standard deviations, and C_1 and C_2 are stabilizing constants; the SSIM loss evaluates image quality well: the similarity between the predicted image and the original image is calculated, and the loss is updated according to the error back-propagation theorem, thereby completing the training of the network.
CN202210282505.6A 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning Active CN114612685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210282505.6A CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210282505.6A CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Publications (2)

Publication Number Publication Date
CN114612685A CN114612685A (en) 2022-06-10
CN114612685B 2022-12-23

Family

ID=81864736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210282505.6A Active CN114612685B (en) 2022-03-22 2022-03-22 Self-supervision information extraction method combining depth features and contrast learning

Country Status (1)

Country Link
CN (1) CN114612685B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657455B (en) * 2021-07-23 2024-02-09 西北工业大学 Semi-supervised learning method based on triple play network and labeling consistency regularization
CN113724206B (en) * 2021-08-12 2023-08-18 武汉大学 Fundus image blood vessel segmentation method and system based on self-supervision learning
CN114038517A (en) * 2021-08-25 2022-02-11 暨南大学 Self-supervision graph neural network pre-training method based on contrast learning
CN113989582A (en) * 2021-08-26 2022-01-28 中国科学院信息工程研究所 Self-supervision visual model pre-training method based on dense semantic comparison

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668492A (en) * 2020-12-30 2021-04-16 中山大学 Behavior identification method for self-supervised learning and skeletal information
CN112989927A (en) * 2021-02-03 2021-06-18 杭州电子科技大学 Scene graph generation method based on self-supervision pre-training
US11200497B1 (en) * 2021-03-16 2021-12-14 Moffett Technologies Co., Limited System and method for knowledge-preserving neural network pruning
CN113158949A (en) * 2021-04-30 2021-07-23 湖北工业大学 Motor imagery electroencephalogram signal classification method based on self-supervision learning
CN113314205A (en) * 2021-05-28 2021-08-27 北京航空航天大学 Efficient medical image labeling and learning system
CN114037055A (en) * 2021-11-05 2022-02-11 北京市商汤科技开发有限公司 Data processing system, method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Improving out-of-distribution generalization via multi-task self-supervised pretraining";Isabela Albuquerque et al.;《arXiv》;20200330;全文 *
"Self-Sipervised Learning via multi-Transformation Classification for Action Recognition";Duc-Quang et al.;《arXiv》;20210220;全文 *
"基于自监督学习的图像特征表示方法研究";彭玉娇;《中国优秀硕士学位论文全文数据库 信息科技辑》;20220315(第03期);全文 *

Also Published As

Publication number Publication date
CN114612685A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN110059698B (en) Semantic segmentation method and system based on edge dense reconstruction for street view understanding
CN109087258B (en) Deep learning-based image rain removing method and device
CN110070091B (en) Semantic segmentation method and system based on dynamic interpolation reconstruction and used for street view understanding
CN110059769B (en) Semantic segmentation method and system based on pixel rearrangement reconstruction and used for street view understanding
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN111340047B (en) Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN116051686B (en) Method, system, equipment and storage medium for erasing characters on graph
CN112990331A (en) Image processing method, electronic device, and storage medium
CN115311555A (en) Remote sensing image building extraction model generalization method based on batch style mixing
CN114677536B (en) Pre-training method and device based on Transformer structure
CN114463340A (en) Edge information guided agile remote sensing image semantic segmentation method
CN113947538A (en) Multi-scale efficient convolution self-attention single image rain removing method
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN114612685B (en) Self-supervision information extraction method combining depth features and contrast learning
CN117152438A (en) Lightweight street view image semantic segmentation method based on improved deep LabV3+ network
CN116958325A (en) Training method and device for image processing model, electronic equipment and storage medium
CN115082778B (en) Multi-branch learning-based homestead identification method and system
CN115660979A (en) Attention mechanism-based double-discriminator image restoration method
AU2021104479A4 (en) Text recognition method and system based on decoupled attention mechanism
CN114331894A (en) Face image restoration method based on potential feature reconstruction and mask perception
CN115187775A (en) Semantic segmentation method and device for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant