CN109919209B - Domain self-adaptive deep learning method and readable storage medium - Google Patents

Domain self-adaptive deep learning method and readable storage medium

Info

Publication number
CN109919209B
CN109919209B (application CN201910139916.8A)
Authority
CN
China
Prior art keywords
learning
self
image
domain
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910139916.8A
Other languages
Chinese (zh)
Other versions
CN109919209A (en)
Inventor
许娇龙
聂一鸣
肖良
朱琪
商尔科
戴斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN201910139916.8A priority Critical patent/CN109919209B/en
Publication of CN109919209A publication Critical patent/CN109919209A/en
Application granted granted Critical
Publication of CN109919209B publication Critical patent/CN109919209B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a domain adaptive deep learning method, which applies rotation transformations to target-domain images to obtain a self-supervised learning training sample set, then jointly trains on this transformed self-supervised sample set and the source-domain training sample set to obtain a domain adaptive deep learning model for the visual task on the target domain. The method requires no labeling of target-domain samples, effectively learns a feature representation of the target domain, and improves the performance of computer vision tasks on the target domain. The application also discloses a readable storage medium for domain adaptive deep learning, which shares the same benefits.

Description

Domain self-adaptive deep learning method and readable storage medium
Technical Field
The invention relates to the field of domain adaptive deep learning, and in particular to a domain adaptive deep learning method and a readable storage medium for computer vision tasks.
Background
Models for computer vision tasks such as image classification, semantic segmentation, object recognition, and object detection are usually obtained through supervised training. Supervised learning, particularly with deep neural networks, typically requires a large number of labeled training samples. Labeling these samples consumes substantial manpower and material resources; image segmentation, for example, requires pixel-by-pixel semantic annotation, which is both difficult and costly. After a model is trained on annotated data, it is applied to test data. Supervised learning is very effective when the test data and the training data share the same distribution. In practice, however, the test distribution often differs from the training distribution, degrading the performance of a model learned on the training data.
Domain adaptation is a class of techniques for addressing this performance degradation caused by the mismatch between the training and test distributions. The training data set is referred to as the source domain and the test data set as the target domain. Source-domain data carry annotation information, while target-domain data typically carry none. Domain adaptation aims to transfer the supervision information of the source domain to the target domain and thereby improve task performance on the target domain.
Domain adaptive learning based on deep neural networks generally improves target-domain performance by learning a feature representation that is invariant across domains, i.e., one with domain commonality. The current mainstream approach obtains such a representation through domain adversarial training. Because adversarial training must simultaneously optimize a pair of mutually opposed objective functions, it converges with more difficulty than non-adversarial training, and the resulting model is often suboptimal.
Disclosure of Invention
The technical problem solved by the invention is to provide a domain adaptive deep learning method oriented to computer vision tasks: a non-adversarial domain adaptation method that improves task performance on the target domain. The application also provides a readable storage medium for domain adaptive deep learning, which addresses the same technical problem.
The application provides a domain adaptive deep learning method, comprising the following steps:
Step 1: rotate each target-domain image by a set of angles, the images formed by each rotation corresponding to distinct category labels; scale and crop the rotated images to the same size, then randomly shuffle the order of all images while keeping each image's class label unchanged, forming a self-supervised learning training sample set;
Step 2: through a multi-task learning deep neural network, construct the target-domain visual task (T) for the source-domain training sample set and an image classification task for the self-supervised learning training sample set, then jointly train on the source-domain training sample set and the self-supervised learning training sample set;
Step 3: apply the deep learning model obtained by the joint training to the visual task (T) on the target domain.
Optionally, each target-domain image in step 1 is rotated by 0°, 90°, and 180°, forming three new images.
Optionally, the rotated images in step 1 are scaled and cropped to 224 pixels in both height and width.
Optionally, the rotated images in step 1 undergo data enhancement before scaling, including random brightness or saturation adjustment.
Optionally, the multitask learning deep neural network comprises an encoder backbone network (F), an image classifier network branch (C) and a visual task network branch (S);
the encoder backbone network (F) and the image classifier network branch (C) construct the image classification task for the self-supervised learning training sample set, and the encoder backbone network (F) and the visual task network branch (S) construct the visual task (T) for the source-domain training sample set.
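As a rough illustration of this branch structure, the sketch below wires a shared encoder backbone F into the two heads C and S. The toy `Linear` module, the feature width of 64, and the use of plain matrix multiplies in place of deep networks are illustrative assumptions, not the patent's architecture; the head sizes (3 rotation classes, 13 task classes) follow the embodiments described later.

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Toy stand-in for a network module (the patent's F, C, S are deep nets)."""
    def __init__(self, n_in, n_out):
        self.W = rng.standard_normal((n_in, n_out)) * 0.01

    def __call__(self, x):
        return x @ self.W

F = Linear(224 * 224, 64)   # shared encoder backbone network F
C = Linear(64, 3)           # image classifier branch C: 3 rotation classes
S = Linear(64, 13)          # visual-task branch S, e.g. 13 segmentation classes

x = rng.standard_normal((1, 224 * 224))   # one flattened 224x224 image
rot_logits = C(F(x))   # self-supervised branch: F -> C
task_out = S(F(x))     # supervised branch: F -> S (F is shared by both)
```

Because both branches call the same `F`, any parameter update to `F` from either loss affects the other branch — the property the joint training below relies on.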
The invention provides a readable storage medium having stored thereon a program which, when executed by a processor, performs the steps of the domain adaptive deep learning method.
The domain adaptive deep learning method provided by the invention learns a feature representation of the target domain by combining supervised learning on the source domain with a self-supervised learning task on the target domain, thereby achieving domain adaptation. The method fully exploits the efficiency of supervised learning for deep neural network training while constructing a target-domain self-supervised training set that requires no manual labeling. Through joint training on source-domain and target-domain samples, the commonality between the two domains is exploited to establish a feature representation adapted to the target-domain task, improving task performance on the target domain.
The domain adaptive deep learning readable storage medium provided by the invention shares these benefits.
Drawings
Fig. 1 is a schematic flow chart of the self-supervised domain adaptive deep learning training process according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of applying the trained model to target-domain testing.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
Fig. 1 shows a schematic diagram of the training process of the domain adaptive deep learning method according to an embodiment of the present invention. The method comprises the following steps:
Step 1: apply rotation transformations to the target-domain images to obtain a self-supervised learning training sample set;
Step 2: jointly train on the transformed self-supervised learning training sample set and the source-domain training samples to obtain a deep learning model;
Step 3: apply the model obtained by the joint training to the visual task T on the target domain.
Step 1 applies rotation transformations to the target-domain images to obtain the self-supervised learning training sample set. Each target-domain image is first rotated by 0°, 90°, and 180°, the three rotation angles corresponding to category labels 0, 1, and 2 respectively. No image needs to be labeled manually: labeled samples for self-supervised learning are generated automatically. The rotated images then undergo data-enhancement preprocessing, including random contrast and brightness adjustment, and are scaled and cropped to a uniform size of 224 × 224 pixels. Finally, the processed images are randomly shuffled, with each image's label unchanged. This process takes the target-domain images as input and outputs the transformed target-domain images X_t and their corresponding rotation-type labels Y_t.
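Under simplifying assumptions (square image arrays, rotations restricted to 90° multiples, with the scaling/cropping and photometric jitter steps omitted for brevity), the sample-set construction above could be sketched as:

```python
import random
import numpy as np

def build_rotation_samples(images, angles=(0, 90, 180), seed=0):
    """Generate a self-supervised training set: each image is rotated by the
    given angles, and the index of the rotation serves as its class label
    (0, 90, 180 degrees -> labels 0, 1, 2). No manual annotation is needed.
    The scale-and-crop to 224x224 and the brightness/contrast jitter described
    in the patent are omitted here."""
    samples = [(np.rot90(img, k=angle // 90), label)
               for img in images
               for label, angle in enumerate(angles)]
    random.Random(seed).shuffle(samples)  # shuffle order; labels stay attached
    return samples
```

The function name and signature are illustrative; the key point is that the label is derived from the transformation itself, so the labeled set is generated automatically.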
Step 2 jointly trains on the transformed self-supervised learning training sample set and the source-domain training samples to obtain the deep learning model. A multi-task learning deep neural network is first constructed, comprising a feature-extraction encoder backbone network F, an image classifier network branch C, and a network branch S corresponding to the visual task T used on the target domain. The encoder backbone F together with the visual task branch S performs supervised learning of task T on the source-domain samples; the encoder backbone F together with the image classifier branch C performs the image classification task on the transformed target-domain self-supervised samples.
In the figure, solid black arrows denote forward propagation of data through the neural network, and dashed lines denote backward propagation of gradients. For the image classification task, the transformed target-domain image X_t is fed into the encoder backbone F, whose output features serve as input to the image classifier branch C; the output Y_t* is the predicted rotation category. The classifier loss function computes the classifier error from the label class Y_t and the predicted class Y_t*, and this error updates the parameters of the image classifier branch C and the feature encoder F by back-propagation. For the visual task T, the source-domain image X_s is fed into the encoder backbone F, whose output features serve as input to the visual-task branch S; the output Y_s* is the prediction for task T. The loss function of task T computes the error from the label Y_s and the prediction Y_s*, and updates the parameters of the visual-task branch S and the encoder backbone F by back-propagation. Because samples from both the source and target domains affect the parameter updates of the encoder backbone F, the trained encoder learns a cross-domain feature representation, achieving adaptation to the target domain.
Fig. 2 shows a schematic flow chart of applying the model obtained by domain adaptive learning to the target-domain visual task T.
As shown in Fig. 2, the feature-extraction encoder backbone network F extracts features from an input target-domain image; the features are then fed into the task T network branch S, and a forward-propagation computation yields the task T prediction.
The readable storage medium provided by an embodiment of the present application is described below; it corresponds to the domain adaptive deep learning method described above.
A readable storage medium is disclosed having a program stored thereon which, when executed by a processor, performs the steps of the domain adaptive deep learning method.
As is clear to those skilled in the art, for brevity the flow of the program stored on the readable storage medium follows the corresponding processes in the foregoing method embodiments and is not repeated here.
To better illustrate the technical effects of the present invention, the inventors performed the following experiments, taking image semantic segmentation as the example task:
Experiment 1: domain adaptive learning for image semantic segmentation from the SYNTHIA dataset to the Cityscapes dataset
This experiment performs domain adaptive learning between the SYNTHIA and Cityscapes datasets. SYNTHIA is a virtual-scene dataset generated entirely by three-dimensional simulation software; it contains 9,400 images with corresponding semantic segmentation labels. Cityscapes is a real-world dataset whose training split contains 2,975 images and whose validation split contains 500 images, with 19 semantic classes in total. This experiment uses the 13 semantic classes common to SYNTHIA and Cityscapes, with SYNTHIA as the source-domain dataset and Cityscapes as the target-domain dataset. The Cityscapes validation split is used to evaluate the proposed method. The evaluation metric is the mean intersection over union (mIoU), which measures the overlap between the predicted semantic segmentation and the ground truth. The test results are shown below:
table 1:
[Table 1 appears as an image (Figure BDA0001978170560000041) in the original patent; the per-class figures are not reproduced here.]
In Table 1, each column from the third to the last represents one semantic class, and the second column gives the mean of the per-class segmentation accuracies. Table 1 compares three methods: training on source-domain samples only (SRC), adversarial training (FCN-W), and the method proposed by the present invention (RotDA). Training on source-domain samples alone performs worst on the target domain, since no domain adaptation is performed. The adversarial training method achieves a better adaptation effect by increasing the domain confusion of the feature representation. The proposed method, through self-supervised learning, obtains a feature representation better adapted to the target domain and thus achieves a marked performance improvement there.
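For reference, the mIoU metric reported in these experiments can be computed from a confusion matrix. The function below is an illustrative sketch (it ignores the void/ignore labels that a full Cityscapes evaluation handles, and skips classes that never occur):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union: per class c,
    IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over observed classes."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)    # rows: ground truth, cols: prediction
    tp = np.diag(conf).astype(float)
    denom = conf.sum(axis=1) + conf.sum(axis=0) - tp  # (TP+FN) + (TP+FP) - TP
    ious = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(ious))                    # average over classes present
```

A perfect prediction yields an mIoU of 1.0; a prediction that collapses to a single class scores only as well as that class's overlap allows.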
Experiment 2: domain adaptive learning for image semantic segmentation from the GTA dataset to the Cityscapes dataset
This experiment performs domain adaptive learning between the GTA and Cityscapes datasets. The GTA dataset is derived from a Los Angeles city scene in a three-dimensional video game and contains 24,966 images with corresponding semantic segmentation labels. The annotations use 19 semantic classes consistent with the Cityscapes label definitions, so this experiment evaluates on all 19 classes. GTA serves as the source-domain dataset and Cityscapes as the target-domain dataset. The test results are shown below:
table 2:
[Table 2 appears as an image (Figure BDA0001978170560000051) in the original patent; the per-class figures are not reproduced here.]
Table 2 supports the same conclusion as Table 1: the domain adaptive deep learning method outperforms both the source-only model and adversarial training. The method owes its strong performance to the feature representation adapted to the target domain, obtained through joint training on source-domain and target-domain samples.
Although the present invention has been described in terms of preferred embodiments, it is not limited to the embodiments described herein and encompasses various changes and modifications that do not depart from its scope.

Claims (6)

1. A domain adaptive deep learning method is characterized by comprising the following steps:
s1: rotating each target domain image according to a set angle, wherein the images formed after rotation correspond to different category labels respectively; zooming and cutting the images formed after rotation to the same size, then randomly disordering all the images in sequence, keeping the class label corresponding to each image unchanged, and forming a target domain self-supervision learning training sample set;
s2: performing joint training on the target domain self-supervision learning training sample set and the source domain training samples to obtain a deep learning model;
s2.1, constructing a multitask learning neural network comprising a visual task network branch S and an image classifier network branch C;
s2.2 Source Domain samples { Xs,YsAnd the target domain self-supervised learning training sample set { X ] obtained in the S1t,YtInputting the multitask learning deep neural network for joint training;
wherein X in the source domain samplesObtaining an output Y through a visual task network branch Ss *Visual task T loss function based on sample tag value YsAnd the predicted value Ys *Calculating the error of the visual task T; x in target domain self-supervised learning training sample settObtaining an output Y through an image classifier network branch Ct *The classifier penalty function is based on the sample label value YtAnd the predicted value Yt *Calculating the error of the self-supervised learning task;
s3: and applying the deep learning model obtained by the joint training to the visual task T on the target domain.
2. The method of claim 1, wherein: each target domain image in step S1 is rotated by 0°, 90°, and 180° to form three new pictures.
3. The method of claim 2, wherein: the image formed after the rotation in step S1 is scaled and cropped to 224 pixels in both length and width.
4. The method of claim 1, wherein: the image formed after rotation in step S1 is subjected to data enhancement including random brightness or saturation adjustment prior to scaling.
5. The method of claim 1, wherein: the multi-task learning deep neural network comprises an encoder backbone network F, an image classifier network branch C and a visual task network branch S;
the encoder backbone network F and the image classifier network branch C construct an image classification task for the self-supervision learning training sample set, and the encoder backbone network F and the visual task network branch S construct a visual task T for the source field training sample set.
6. A readable storage medium, characterized in that the readable storage medium has stored thereon a program which, when executed by a processor, implements the domain adaptive deep learning method according to any one of claims 1 to 5.
CN201910139916.8A 2019-02-26 2019-02-26 Domain self-adaptive deep learning method and readable storage medium Active CN109919209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910139916.8A CN109919209B (en) 2019-02-26 2019-02-26 Domain self-adaptive deep learning method and readable storage medium


Publications (2)

Publication Number Publication Date
CN109919209A CN109919209A (en) 2019-06-21
CN109919209B true CN109919209B (en) 2020-06-19

Family

ID=66962296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910139916.8A Active CN109919209B (en) 2019-02-26 2019-02-26 Domain self-adaptive deep learning method and readable storage medium

Country Status (1)

Country Link
CN (1) CN109919209B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI840640B (en) 2020-12-25 2024-05-01 台達電子工業股份有限公司 Semi-supervised learning system and semi-supervised learning method

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113829B2 (en) * 2019-08-20 2021-09-07 GM Global Technology Operations LLC Domain adaptation for analysis of images
CN111093123B (en) * 2019-12-09 2020-12-18 华中科技大学 Flexible optical network time domain equalization method and system based on composite neural network
CN111160553B (en) * 2019-12-23 2022-10-25 中国人民解放军军事科学院国防科技创新研究院 Novel field self-adaptive learning method
CN111144565B (en) * 2019-12-27 2020-10-27 中国人民解放军军事科学院国防科技创新研究院 Self-supervision field self-adaptive deep learning method based on consistency training
CN111898696B (en) * 2020-08-10 2023-10-27 腾讯云计算(长沙)有限责任公司 Pseudo tag and tag prediction model generation method, device, medium and equipment
CN112308149B (en) * 2020-11-02 2023-10-24 平安科技(深圳)有限公司 Optimization method and device for image information identification based on machine learning
CN112949583A (en) * 2021-03-30 2021-06-11 京科互联科技(山东)有限公司 Target detection method, system, equipment and storage medium for complex city scene
CN114283290B (en) * 2021-09-27 2024-05-03 腾讯科技(深圳)有限公司 Training of image processing model, image processing method, device, equipment and medium
CN114549904B (en) * 2022-02-25 2023-07-07 北京百度网讯科技有限公司 Visual processing and model training method, device, storage medium and program product
CN115701868B (en) * 2022-08-22 2024-02-06 中山大学中山眼科中心 Domain self-adaptive enhancement method applicable to various visual tasks

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909101A (en) * 2017-11-10 2018-04-13 清华大学 Semi-supervised transfer learning character identifying method and system based on convolutional neural networks

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354199B2 (en) * 2015-12-07 2019-07-16 Xerox Corporation Transductive adaptation of classifiers without source data
CN107273927B (en) * 2017-06-13 2020-09-22 西北工业大学 Unsupervised field adaptive classification method based on inter-class matching
CN107316061B (en) * 2017-06-22 2020-09-22 华南理工大学 Deep migration learning unbalanced classification integration method
CN108009633A (en) * 2017-12-15 2018-05-08 清华大学 A kind of Multi net voting towards cross-cutting intellectual analysis resists learning method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909101A (en) * 2017-11-10 2018-04-13 清华大学 Semi-supervised transfer learning character identifying method and system based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiaolong Xu et al., "Domain Adaptation of Deformable Part-Based Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 12, pp. 2367–2380, Dec. 2014. *


Also Published As

Publication number Publication date
CN109919209A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN109919209B (en) Domain self-adaptive deep learning method and readable storage medium
Li et al. A closed-form solution to photorealistic image stylization
Liu et al. Auto-painter: Cartoon image generation from sketch by using conditional Wasserstein generative adversarial networks
Xu et al. Learning deep structured multi-scale features using attention-gated crfs for contour prediction
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN109754015B (en) Neural networks for drawing multi-label recognition and related methods, media and devices
Son et al. Urie: Universal image enhancement for visual recognition in the wild
CN110399518B (en) Visual question-answer enhancement method based on graph convolution
CN111754596A (en) Editing model generation method, editing model generation device, editing method, editing device, editing equipment and editing medium
CN110458084B (en) Face age estimation method based on inverted residual error network
CN113408537B (en) Remote sensing image domain adaptive semantic segmentation method
Liao et al. A deep ordinal distortion estimation approach for distortion rectification
CN111742345A (en) Visual tracking by coloring
CN107886491A (en) A kind of image combining method based on pixel arest neighbors
CN114548279A (en) Semi-supervised image classification method based on distillation network
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
Liu et al. Attentive semantic and perceptual faces completion using self-attention generative adversarial networks
Zhang et al. DuGAN: An effective framework for underwater image enhancement
Heo et al. Automatic sketch colorization using DCGAN
CN107729885B (en) Face enhancement method based on multiple residual error learning
Song et al. HDR-Net-Fusion: Real-time 3D dynamic scene reconstruction with a hierarchical deep reinforcement network
CN113781324A (en) Old photo repairing method
Wei et al. Facial image inpainting with deep generative model and patch search using region weight
Wang et al. A multi-scale attentive recurrent network for image dehazing
CN116777929A (en) Night scene image semantic segmentation method, device and computer medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant