CN109919209A

CN109919209A - Domain-adaptive deep learning method and readable storage medium

Info

Publication number: CN109919209A
Application number: CN201910139916.8A
Authority: CN
Inventors: 许娇龙; 聂一鸣; 肖良; 朱琪; 商尔科; 戴斌
Original assignee: National Defense Technology Innovation Institute PLA Academy of Military Science
Current assignee: National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-06-21
Anticipated expiration: 2039-02-26
Also published as: CN109919209B

Abstract

The invention discloses a domain-adaptive deep learning method. The method applies rotation transformations to target-domain images to obtain a self-supervised learning training sample set; the transformed self-supervised training sample set and the source-domain training sample set are then trained jointly to obtain a domain-adaptive deep learning model for the visual task on the target domain. The method can effectively learn target-domain feature representations without annotating target-domain samples, improving the performance of computer vision tasks on the target domain. The application also discloses a readable storage medium for domain-adaptive deep learning, which has the same beneficial effects.

Description

Domain-adaptive deep learning method and readable storage medium

Technical field

The present invention relates to the field of domain-adaptive deep learning, and in particular to a domain-adaptive deep learning method for computer vision tasks and a readable storage medium.

Background

Models for computer vision tasks such as image classification, semantic image segmentation, object recognition, and object detection are usually obtained by supervised learning. Supervised learning, especially supervised learning based on deep neural networks, usually requires a large number of annotated training samples. Annotating these samples consumes considerable manpower and resources; for example, image segmentation requires pixel-by-pixel semantic annotation, which is difficult and costly. After a model has been trained on annotated data, it is applied to test data. When the test data and the training data share the same distribution, supervised learning is very effective. In practice, however, the test data are often distributed differently from the training data, which degrades the performance of the model learned on the training data.

Domain adaptation is a technique for addressing this performance degradation caused by the difference between the training and test data distributions. The training dataset is usually called the source domain and the test dataset the target domain. Source-domain data carry annotations, whereas target-domain data usually do not. Domain adaptation aims to transfer the supervision information of the source domain to the target domain and thereby improve task performance on the target domain.

Domain-adaptive learning based on deep neural networks usually improves task performance on the target domain by learning cross-domain invariant feature representations, i.e., feature representations that are invariant to the domain. To obtain such representations, the current mainstream approach is domain adversarial training. Because domain adversarial training must simultaneously optimize a pair of mutually opposing objective functions, its training process converges with more difficulty than non-adversarial training, and the resulting model is often not optimal.

Summary of the invention

The technical problem to be solved by the present invention is to provide a domain-adaptive deep learning method that offers a non-adversarial domain adaptation approach for computer vision tasks, so as to improve task performance on the target domain. The application also provides a readable storage medium for domain-adaptive deep learning that solves the same technical problem.

The application provides a domain-adaptive deep learning method, comprising:

Step 1: rotate each target-domain image by set angles, the images formed by the different rotations corresponding to different class labels; scale and crop the rotated images to the same size, then randomly shuffle the order of all images while keeping the class label of each image unchanged, forming a self-supervised learning training sample set;

Step 2: using a multi-task learning deep neural network, construct the visual task (T) of the target domain on the source-domain training sample set and, at the same time, construct an image classification task on the self-supervised learning training sample set; then jointly train on the source-domain training sample set and the self-supervised learning training sample set;

Step 3: apply the deep learning model obtained by the joint training to the visual task (T) on the target domain.

Optionally, in step 1 each target-domain image is rotated by 0°, 90°, and 180°, forming three new images.

Optionally, in step 1 the rotated images are scaled and cropped to 224 pixels in both length and width.

Optionally, in step 1 data augmentation is performed on the rotated images before scaling, the data augmentation including random adjustment of brightness or saturation.

Optionally, the multi-task learning deep neural network comprises an encoder backbone network (F), an image classifier network branch (C), and a visual task network branch (S);

wherein the encoder backbone network (F) and the image classifier network branch (C) construct the image classification task on the self-supervised learning training sample set, and the encoder backbone network (F) and the visual task network branch (S) construct the learning task (T) on the source-domain training sample set.

The readable storage medium provided by the invention stores a program which, when executed by a processor, implements the steps of the domain-adaptive deep learning method.

The domain-adaptive deep learning method provided by the invention learns target-domain feature representations by combining a source-domain supervised learning task with a target-domain self-supervised learning task, thereby achieving domain adaptation. The method fully exploits the efficiency of supervised learning in training deep neural networks, and it builds the target-domain self-supervised training set without relying on manual annotation. Joint training on source-domain and target-domain samples makes effective use of what the two domains have in common, establishing a feature representation suited to the target-domain task and thereby improving task performance on the target domain.

The readable storage medium for domain-adaptive deep learning provided by the invention has the same beneficial effects.

Brief description of the drawings

Fig. 1 is a flow diagram of the self-supervised domain-adaptive deep learning training process of an embodiment of the present invention.

Fig. 2 is a flow diagram of applying the trained model to target-domain testing.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows the domain-adaptive deep learning training flow of an embodiment of the present invention. The method mainly comprises the following steps:

Step 1: apply rotation transformations to the target-domain images to obtain a self-supervised learning training sample set;

Step 2: jointly train on the transformed self-supervised learning training sample set and the source-domain training samples to obtain a deep learning model.

Step 3: apply the model obtained by the joint training to the visual task T on the target domain.

In step 1, the target-domain images undergo rotation transformation to obtain the self-supervised learning training sample set. Each target-domain image is first rotated by 0°, 90°, and 180°; the three rotation angles correspond to class labels 0, 1, and 2, respectively. This process requires no manual annotation of individual images, so the labeled samples for self-supervised learning are generated automatically. The rotated images then undergo data-augmentation preprocessing, including random adjustment of contrast and brightness, and are scaled and cropped to a uniform size; in this embodiment, images are scaled and cropped to 224 pixels in both length and width. The order of the processed images is then randomly shuffled, with the label of each image unchanged. The input of this process is the target-domain images, and the output is the transformed target-domain images X_t and their corresponding rotation class labels Y_t.
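The sample-set construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: images are represented as nested lists, the rotation direction (clockwise) is an assumption because the text does not state it, the function names and the `seed` parameter are hypothetical, and the augmentation and 224-pixel scale-and-crop steps are omitted.

```python
import random

# Rotation angle in degrees -> class label, as given in the embodiment.
ANGLE_TO_LABEL = {0: 0, 90: 1, 180: 2}


def rotate90(img):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise (assumed direction)."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img, quarter_turns):
    """Apply `quarter_turns` successive 90-degree rotations to a copy of `img`."""
    out = [list(row) for row in img]
    for _ in range(quarter_turns):
        out = rotate90(out)
    return out


def build_rotation_samples(target_images, seed=0):
    """Return shuffled (rotated_image, rotation_label) pairs for unlabeled images."""
    samples = []
    for img in target_images:
        for angle, label in ANGLE_TO_LABEL.items():
            samples.append((rotate(img, angle // 90), label))
    # Shuffling the pairs keeps each image attached to its own label.
    random.Random(seed).shuffle(samples)
    return samples
```

Each input image yields three labeled samples, so a target set of N unlabeled images produces 3N training pairs (X_t, Y_t) without any manual annotation.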

In step 2, the transformed self-supervised learning training sample set and the source-domain training samples are trained jointly to obtain the deep learning model. A multi-task learning deep neural network is first constructed, comprising a feature-extraction encoder backbone network F, an image classifier network branch C, and a network branch S for the visual task T on the target domain. In this network, the encoder backbone network F and the visual task network branch S serve the supervised learning of task T on source-domain samples, while the encoder backbone network F and the image classifier network branch C serve the image classification task on the transformed target-domain self-supervised samples.

In the figure, solid black arrows represent the forward propagation of data through the neural network, and dashed lines represent the backpropagation of gradients. For the image classification task, the transformed target-domain images X_t are fed into the encoder backbone network F, and the output features serve as the input of the image classifier network branch C, whose output Y_t^* is the predicted image rotation class. The classifier loss function computes the classifier error from the labeled class Y_t and the predicted class Y_t^*, and this error is backpropagated to update the parameters of the image classifier network branch C and the encoder backbone network F. For the visual task T, the source-domain images X_s are fed into the encoder backbone network F, and the output features serve as the input of the network branch S of visual task T, whose output Y_s^* is the prediction for visual task T. The loss function of visual task T computes the task error from the annotation Y_s and the prediction Y_s^*, and this error is backpropagated to update the parameters of the visual task network branch S and the encoder backbone network F. Because samples from both the source domain and the target domain influence the parameter updates of the encoder backbone network F, the trained encoder backbone network F can learn cross-domain feature representations, thereby achieving adaptation to the target domain.
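One way to write the joint training described above as a single objective is given below. The text does not state how the two losses are combined, so the weighted sum and the weight \lambda are assumptions made for illustration; \theta_F, \theta_S, \theta_C denote the parameters of F, S and C, and \mathcal{L}_T and \mathcal{L}_{rot} denote the visual-task loss and the rotation-classifier loss.

```latex
\min_{\theta_F,\theta_S,\theta_C}\;
  \mathcal{L}
  = \mathcal{L}_T\bigl(S(F(X_s)),\, Y_s\bigr)
  + \lambda\, \mathcal{L}_{rot}\bigl(C(F(X_t)),\, Y_t\bigr)

\frac{\partial \mathcal{L}}{\partial \theta_S} = \frac{\partial \mathcal{L}_T}{\partial \theta_S},
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_C} = \lambda\,\frac{\partial \mathcal{L}_{rot}}{\partial \theta_C},
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_F}
  = \frac{\partial \mathcal{L}_T}{\partial \theta_F}
  + \lambda\,\frac{\partial \mathcal{L}_{rot}}{\partial \theta_F}
```

Under this reading, each branch receives gradients only from its own loss, while the encoder F receives gradients from both the source-domain and the target-domain samples, which matches the parameter-update paths described in the text.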

Fig. 2 shows the flow of applying the model obtained by domain-adaptive learning to the target-domain visual task T.

As shown in Fig. 2, the feature-extraction encoder backbone network F extracts features from the input target-domain image; the extracted features are then fed to the network branch S of task T, and the prediction for task T is computed by forward propagation.
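The data paths described for training and for testing can be summarized in a small structural sketch. It shows only the routing between F, S and C; the class name, method names and toy stand-in functions are hypothetical, and a real implementation would use neural network modules from a deep learning framework in their place.

```python
class MultiTaskNet:
    """Shared encoder F with two heads: S (visual task T) and C (rotation classifier)."""

    def __init__(self, encoder, task_head, rotation_head):
        self.F = encoder
        self.S = task_head
        self.C = rotation_head

    def forward_source(self, x_s):
        # Training path for source-domain images: F -> S gives Y_s^*.
        return self.S(self.F(x_s))

    def forward_rotation(self, x_t):
        # Training path for rotated target-domain images: F -> C gives Y_t^*.
        return self.C(self.F(x_t))

    def predict(self, x):
        # Test-time path on target-domain images: F -> S only; C is not used.
        return self.S(self.F(x))


# Toy stand-ins (hypothetical) that record which component was called.
calls = []


def toy_encoder(x):
    calls.append("F")
    return ("feat", x)


def toy_task_head(features):
    calls.append("S")
    return ("task", features)


def toy_rotation_head(features):
    calls.append("C")
    return ("rot", features)


net = MultiTaskNet(toy_encoder, toy_task_head, toy_rotation_head)
```

The rotation classifier C exists only to shape the encoder during training; at test time the prediction depends on F and S alone.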

The readable storage medium provided by the embodiments of the application is introduced below; the readable storage medium described below and the domain-adaptive deep learning method described above may be referred to in correspondence with each other.

The application discloses a readable storage medium on which a program is stored; when executed by a processor, the program implements the steps of the domain-adaptive deep learning method.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the processes of the program in the readable storage medium described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.

To better illustrate the technical effect of the invention, the inventors also carried out the following experiments, taking the semantic image segmentation task as an example:

Experiment 1: domain-adaptive learning for semantic image segmentation from the SYNTHIA dataset to the Cityscapes dataset

This experiment performs domain-adaptive learning between the SYNTHIA dataset and the Cityscapes dataset. SYNTHIA is a virtual-scene dataset whose data are all produced by three-dimensional simulation software; it contains 9400 images in total with their corresponding semantic segmentation annotations. Cityscapes is a real-world dataset whose training set contains 2975 images and whose validation set contains 500 images. Cityscapes has 19 semantic labels in total. This experiment uses the 13 semantic classes that SYNTHIA and Cityscapes have in common. SYNTHIA serves as the source-domain dataset and Cityscapes as the target-domain dataset. The Cityscapes validation set is used to evaluate the performance of the proposed method. The evaluation metric is mean intersection over union (mIoU), which measures how well the predicted semantic segmentation overlaps the ground truth. The test results are as follows:
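For reference, the mIoU metric can be computed as sketched below. This is a generic illustration rather than the evaluation script used in the experiments: the function name is hypothetical, predictions and ground truth are assumed to be flat sequences of per-pixel class ids (for example, all validation pixels concatenated), and classes that appear in neither sequence are skipped, which is also an assumption.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection over union across classes present in pred or truth."""
    inter = [0] * num_classes  # pixels where prediction and truth agree on class c
    union = [0] * num_classes  # pixels where prediction or truth is class c
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious)
```

For example, with `pred = [0, 0, 1, 1]` and `truth = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.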

Table 1:

In Table 1, the columns from the third to the last each represent one semantic class, and the second column, mean accuracy, is the average of the segmentation accuracy over all classes. Table 1 compares three methods: training on source-domain samples only (SRC), a method based on adversarial training (FCN-W), and the method proposed by the present invention (RotDA). As Table 1 shows, the method trained only on source-domain samples performs worst on the target domain because it makes no domain-adaptive adjustment. The adversarial-training method achieves a better adaptation effect because it increases the domain confusion of the feature representation. The present invention, through self-supervised learning, obtains a feature representation better suited to the target domain and therefore achieves a significant performance improvement on the target domain.

Experiment 2: domain-adaptive learning for semantic image segmentation from the GTA dataset to the Cityscapes dataset

This experiment performs domain-adaptive learning between the GTA dataset and the Cityscapes dataset. The GTA dataset comes from Los Angeles city scenes in a 3D video game and contains 24966 images with their corresponding semantic segmentation annotations. The annotations comprise 19 semantic classes, consistent with the semantic label definitions of Cityscapes, so this experiment is evaluated on the 19 semantic classes. GTA serves as the source-domain dataset and Cityscapes as the target-domain dataset. The test results are as follows:

Table 2:

Table 2 leads to a conclusion similar to that of Table 1: the domain-adaptive deep learning method outperforms the source-only model and also outperforms adversarial training. Thanks to joint training on source-domain and target-domain samples, the invention obtains a feature representation adapted to the target domain and achieves excellent performance.

Although the invention has been described by way of preferred embodiments, it is not limited to the embodiments described herein and also includes various changes and variations made without departing from the scope of the invention.

Claims

1. A domain-adaptive deep learning method, characterized in that it comprises the following steps:

S1: rotating each target-domain image by set angles, the images formed by the different rotations corresponding to different class labels; scaling and cropping the rotated images to the same size, then randomly shuffling the order of all images while keeping the class label of each image unchanged, to form a self-supervised learning training sample set;

S2: using a multi-task learning deep neural network, constructing the visual task (T) of the target domain on the source-domain training sample set and, at the same time, constructing an image classification task on the self-supervised learning training sample set, then jointly training on the source-domain training sample set and the self-supervised learning training sample set;

S3: applying the deep learning model obtained by the joint training to the visual task (T) on the target domain.

2. The method according to claim 1, characterized in that in step S1 each target-domain image is rotated by 0°, 90°, and 180°, forming three new images.

3. The method according to claim 2, characterized in that in step S1 the rotated images are scaled and cropped to 224 pixels in both length and width.

4. The method according to claim 1, characterized in that in step S1 data augmentation is performed on the rotated images before scaling, the data augmentation including random adjustment of brightness or saturation.

5. The method according to claim 1, characterized in that the multi-task learning deep neural network comprises an encoder backbone network (F), an image classifier network branch (C), and a visual task network branch (S);

wherein the encoder backbone network (F) and the image classifier network branch (C) construct the image classification task on the self-supervised learning training sample set, and the encoder backbone network (F) and the visual task network branch (S) construct the learning task (T) on the source-domain training sample set.

6. A readable storage medium, characterized in that a program is stored on the readable storage medium, and the program, when executed by a processor, implements the domain-adaptive deep learning method according to any one of claims 1 to 5.