CN112016594B - Collaborative training method based on domain adaptation - Google Patents

Collaborative training method based on domain adaptation

Info

Publication number
CN112016594B
CN112016594B
Authority
CN
China
Prior art keywords
rpn
loss value
training
domain
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010778786.5A
Other languages
Chinese (zh)
Other versions
CN112016594A (en)
Inventor
李冠彬
赵赣龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010778786.5A
Publication of CN112016594A
Application granted
Publication of CN112016594B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a collaborative training method based on domain adaptation, in which each of two modules is trained using the high-confidence output of the other, and candidate regions for which both have low confidence are processed using a maximizing classifier discrepancy method; in addition, for the feature alignment of the backbone network, the output of the RPN is used to calculate the foreground probability of each point on the feature map, and regions with larger foreground probability are given larger weights when the features are aligned; the method improves the target detection capability of the model in the unlabeled domain, reduces the requirement of the target detection model for labeled data, and reduces the dependence on human resources.

Description

Collaborative training method based on domain adaptation
Technical Field
The invention relates to the field of model training, and in particular to a collaborative training method based on domain adaptation.
Background
The development of deep learning has driven great progress in computer vision; however, the vast amount of labeled data required to train deep learning models restricts their wide application. The problem is particularly acute for target detection, which requires finer-grained annotation. Domain adaptation attempts to migrate a model from a domain rich in annotation data (the source domain) to a domain lacking annotation data (the target domain), thereby alleviating this problem.
Currently, domain adaptation in target detection is mainly divided into two kinds of schemes.
(1) Domain adaptation at the feature level mainly aims at aligning features across the two domains through adversarial learning. Because the images in target detection are relatively complex, additional processing is required. Faster RCNN is a deep learning model widely applied to target detection and is a representative two-stage detection model. DA-Faster builds on the Faster RCNN model, performing feature alignment on the global image features extracted by the backbone network and on the ROI features obtained by ROI pooling. SWDA realizes domain adaptation on the target detection task through strong alignment of local image features and weak alignment of global image features. SCDA applies the model trained on the source domain to the target domain, constructs new image blocks from the resulting candidate regions, and then aligns the features of these blocks.
(2) Domain adaptation at the image level mainly uses a generative adversarial network to translate source-domain images into target-domain-style images and trains the model with the original labels. DD&MRL uses CycleGAN to generate several different renderings of the same source-domain picture for training, so that the target detection model acquires better domain generalization capability.
However, the models and pipelines of image-level domain adaptation are more complex and require more computation. Existing feature-level domain adaptation methods focus on how to make the backbone network extract features that are consistent across the two domains. Yet the target detection network has a more complex structure: a two-stage detector consists of a backbone network, a region candidate network and a candidate region classifier. Neglecting the other two parts, especially the region candidate network, can prevent the model from generating good-quality candidate regions and further limit detection performance.
As shown in fig. 1, the specific flow of target detection with the Faster RCNN model is as follows:
1. For any image, the backbone network F takes it as input and extracts its features. 2. The features extracted by the backbone network are fed into the region candidate network (RPN), which predicts, for each region, the probability that it belongs to the foreground or the background, and takes the regions with higher foreground probability as candidate regions. 3. These candidate regions, together with the image features extracted by the backbone network, are fed into the candidate region classifier (RPC). For each candidate region, the RPC obtains the features of that region from the corresponding location on the image feature map (i.e., the ROI pooling operation). 4. From the features of each candidate region, the RPC produces an output indicating the probability that the candidate region belongs to each specific category, where the "specific categories" comprise several object categories and one background category. These probabilities are then filtered to obtain the target detection boxes and their predicted categories.
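To make this flow concrete, the following is a minimal, conceptual sketch in PyTorch-style Python. The module names backbone, rpn and rpc, the anchors argument and the foreground threshold are placeholders assumed for illustration and do not come from the actual Faster RCNN implementation; only torchvision.ops.roi_align is a real library call.

# Conceptual sketch of the two-stage flow (assumed interfaces, not the real
# Faster RCNN code): backbone = F, rpn = region candidate network,
# rpc = candidate region classifier.
import torch
import torchvision.ops as ops

def detect(image, backbone, rpn, rpc, anchors, fg_threshold=0.7):
    # 1. The backbone network F extracts a feature map from the image.
    feat = backbone(image)                      # [1, C, H, W]
    # 2. The RPN scores every anchor as foreground/background; anchors with
    #    a high foreground probability become candidate regions.
    fg_prob, boxes = rpn(feat, anchors)         # [N], [N, 4]
    candidates = boxes[fg_prob > fg_threshold]
    # 3. ROI pooling reads each candidate's features from the feature map.
    roi_feat = ops.roi_align(feat, [candidates], output_size=(7, 7))
    # 4. The RPC predicts a probability for every object class plus one
    #    background class (assumed here to be the last column); filtering
    #    these probabilities gives the final boxes and labels.
    class_prob = rpc(roi_feat)                  # [M, num_classes + 1]
    scores, labels = class_prob[:, :-1].max(dim=1)
    return candidates, scores, labels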
When Faster RCNN is migrated to an unlabeled target domain, its performance degrades because of the domain gap. For this problem we improve on the architecture of Faster RCNN. We observe that the RPN and the RPC can be regarded approximately as two parallel structures following the backbone network, with the information used to determine candidate regions flowing unidirectionally from the RPN into the RPC. For each candidate region, the RPN produces its foreground and background probabilities, while the RPC produces the probability that it belongs to the background or to any object category. Ideally the two should be consistent: for a candidate region with a lower RPN foreground probability, the background probability in the RPC output should be higher; conversely, for a candidate region with a higher RPN foreground probability, the sum of all object-category probabilities in the RPC output should be higher. On the other hand, we find that the RPC is more robust to inter-domain differences, so the high-confidence candidate regions in the RPC output can be fully utilized to train the RPN in reverse.
Based on this idea, a collaborative training scheme for the RPN and the RPC is designed, which improves the target detection capability of the model in the unlabeled domain, reduces the requirement of the target detection model for labeled data, and reduces the dependence on human resources.
Disclosure of Invention
The invention provides a collaborative training method based on domain adaptation, which is used for improving the target detection capability of a model in the unlabeled domain, reducing the requirement of the target detection model for labeled data and reducing the dependence on human resources.
In order to solve the above technical problem, an embodiment of the present invention provides a collaborative training method based on domain adaptation, including:
in each training iteration of the Faster RCNN model, acquiring a labeled source-domain picture and an unlabeled target-domain picture;
inputting the source-domain picture into the Faster RCNN model for target detection and obtaining a first source-domain feature output by the backbone network; meanwhile, transmitting the first source-domain feature to a domain classifier through a gradient reversal layer to perform loss calculation;
inputting the target-domain picture into the Faster RCNN model for target detection, and obtaining the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network;
calculating a total loss value of the Faster RCNN model according to the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network;
and optimizing the parameters of the Faster RCNN model, and stopping the training iteration of the Faster RCNN model when the number of training iterations reaches a preset threshold or the total loss value does not exceed a preset loss value.
As a preferred solution, the step of calculating the total loss value of the Faster RCNN model according to the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network specifically includes:
training the candidate regions with high confidence in the RPN according to the foreground and background probabilities output by the RPN and the category probabilities output by the RPC, self-training the RPC according to the RPN output, and calculating a first loss value of the Faster RCNN model;
training the candidate regions with low confidence in the RPN through a maximum classifier discrepancy algorithm according to the foreground and background probabilities output by the RPN and the category probabilities output by the RPC, and calculating a second loss value of the Faster RCNN model;
performing weighted feature alignment training on the backbone network according to the foreground and background probabilities output by the RPN and the first target-domain feature output by the backbone network, and calculating a third loss value of the Faster RCNN model;
and calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value.
As a preferred solution, the algorithm for training the candidate region with high confidence in the RPN is as follows:
[The loss formula and the weight formula appear as images in the original publication and are not reproduced here.] In this formula, the left-hand side denotes the self-training of the RPN in the target domain, and L_rpn denotes the manner in which the Faster RCNN model conventionally calculates the RPN loss; the candidate regions involved are those derived from the RPC; λ controls the weight decay rate; and the background probability predicted by the RPC for each candidate region enters the weight f_w(s_cls), which is calculated from that background probability.
As a preferred scheme, the RPC is self-trained by an entropy minimization method.
As a preferred scheme, the maximum classifier discrepancy algorithm specifically comprises:
[The formulas for the foreground probabilities and the discrepancy term appear as images in the original publication and are not reproduced here.] The combined loss is

L_MCD = f_w(s_cls, s_rpn) · L_discrepancy(s_cls, s_rpn);

wherein s_cls and s_rpn respectively denote the predictions given by the RPC and the RPN for a candidate region, and the corresponding foreground probabilities of those predictions enter the discrepancy term; L_discrepancy measures the degree of difference between the RPC and the RPN; and f_w(s_cls, s_rpn) calculates the corresponding weight according to the confidence of the RPC and the RPN.
As a preferred scheme, the calculation formula for performing weighted feature alignment training on the backbone network specifically includes:
[The two loss formulas appear as images in the original publication and are not reproduced here.] One loss is the domain classification loss for the source domain and the other is the domain classification loss for the target domain; W and H respectively denote the number of regions in the horizontal and vertical directions; F and D respectively denote the backbone network and the domain classifier; x_s and x_t respectively denote an image of the source domain and an image of the target domain; and f_i denotes the foreground probability of the i-th anchor point in the output of the RPN.
Preferably, the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value specifically includes:
summing the first loss value, the second loss value and the third loss value, and taking the sum as the total loss value of the Faster RCNN model.
Preferably, the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value specifically includes:
setting corresponding proportion parameters for the first loss value, the second loss value and the third loss value respectively according to a preset proportion rule, summing the weighted values, and taking the resulting sum as the total loss value of the Faster RCNN model.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
according to the technical scheme, collaborative training is performed through the corresponding relation between the output of the candidate region network and the candidate region classifier, so that the target detection capability of the model in the label-free field is improved, the requirement of the target detection model on labeling data is reduced, and the dependence on human resources is reduced.
Drawings
Fig. 1: a specific flow schematic diagram for carrying out target detection on a Faster RCNN model in the prior art;
fig. 2: the method is a step flow chart of a collaborative training method based on field adaptation.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
Referring to fig. 2, a step flowchart of a collaborative training method based on domain adaptation provided by an embodiment of the present invention includes steps 101 to 105, where each step is specifically as follows:
step 101, in each training iteration of the fast RCNN model, a source domain picture containing labels and a target domain picture without labels are obtained.
Step 102, the source-domain picture is input into the Faster RCNN model for target detection and the first source-domain feature output by the backbone network is obtained; at the same time, the first source-domain feature is transmitted to the domain classifier through a gradient reversal layer to perform loss calculation.
Specifically, the source-domain pictures and their labels are input into the Faster RCNN, and loss calculation and training are performed in the manner of the original Faster RCNN. Meanwhile, the features extracted from the source-domain pictures by the backbone network are sent through a gradient reversal layer (GRL) to the domain classifier D. The loss is calculated in the weighted feature alignment manner: the domain classifier attempts to correctly determine that the features come from the source domain, while the backbone network attempts to make it misjudge.
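A gradient reversal layer of this kind can be written in a few lines of PyTorch. The sketch below is a generic GRL rather than code from the patent; the scaling factor alpha is an assumed detail.

import torch
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer (GRL): identity in the forward pass, gradient
    multiplied by -alpha in the backward pass, so that minimizing the domain
    loss trains the domain classifier D normally while pushing the backbone
    F in the opposite direction (to confuse D)."""
    @staticmethod
    def forward(ctx, x, alpha=1.0):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

def grl(x, alpha=1.0):
    return GradReverse.apply(x, alpha)

# Usage: domain_logits = D(grl(F(x_s)))  # source-domain features through the GRL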
Step 103, the target-domain picture is input into the Faster RCNN model for target detection, and the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network are obtained.
Specifically, an unlabeled target-domain picture is input into the Faster RCNN, prediction is carried out according to the Faster RCNN flow described above, and the features extracted by the backbone network, the foreground and background probabilities predicted by the RPN for each region, and the category probabilities predicted by the RPC for each candidate region are obtained.
Step 104, the total loss value of the Faster RCNN model is calculated according to the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network.
In this embodiment, the step 104 specifically includes steps 1041 to 1044, and each step specifically includes the following steps:
step 1041, training the candidate region with high confidence in the RPN according to the foreground and background probabilities output by the RPN and the class probability output by the RPC, and self-training the RPC according to the RPN training output result, so as to calculate the first loss value of the fast RCNN model.
Specifically, areas with higher confidence in the RPN and RPC outputs are mainly utilized. For each candidate region, the foreground probability and the background probability of the candidate region can be obtained in the output of the RPN; the background probabilities thereof, and the specific probabilities of each foreground object class, are available in the output of the RPC. Note that summing all foreground object class probabilities of the RPC can be regarded approximately as foreground probabilities, so that both can be co-self-trained.
For the RPN, whether the region is a positive example (foreground) or a negative example (background) can be judged according to the RPC score of each final output candidate region, and the region is reversely input into the RPN to guide the training of the RPN. Considering that candidate regions of high confidence have greater training value, the training penalty for each candidate region is multiplied by a weight calculated from the confidence. The higher the confidence, the greater the weight, whereas the lower the confidence, the less the weight. The specific calculation method comprises the following steps:
[The loss formula and the weight formula appear as images in the original publication and are not reproduced here.] In them, the self-training loss of the RPN in the target domain is built on L_rpn, the manner in which Faster RCNN would otherwise calculate the RPN loss; the candidate regions involved are those derived from the RPC; λ controls the rate of weight decay; and f_w(s_cls) is the weight calculated from the background probability predicted by the RPC for the candidate region, so that candidate regions with greater confidence receive higher weights.
On the other hand, the self-training of the RPC depends on the output of the RPN. We self-train the RPC using the entropy minimization method: the RPC outputs a classification probability distribution for each candidate region, from which an information entropy is calculated, and the network is trained to minimize this entropy. Samples for which the RPN output has higher confidence receive higher weights in the loss calculation. Entropy minimization was originally used to solve domain adaptation on classification problems: it calculates the information entropy of the probability distribution that the network outputs for target-domain samples and trains the network to minimize that entropy.
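A minimal sketch of this entropy-minimization loss follows, assuming the RPN confidence enters as a simple per-region weight (the patent does not spell out the weighting function):

import torch

def rpc_entropy_loss(rpc_prob, rpn_fg_prob):
    """Entropy-minimization self-training of the RPC (sketch).
    rpc_prob: [N, num_classes + 1] classification distribution from the RPC;
    rpn_fg_prob: [N] foreground probability from the RPN, used as an assumed
    confidence weight so that confident regions contribute more."""
    entropy = -(rpc_prob * torch.log(rpc_prob.clamp_min(1e-8))).sum(dim=1)
    weight = torch.abs(2.0 * rpn_fg_prob - 1.0)   # assumed weighting
    return (weight * entropy).mean()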
Step 1042, the candidate regions with low confidence in the RPN are trained through a maximum classifier discrepancy algorithm according to the foreground and background probabilities output by the RPN and the category probabilities output by the RPC, and the second loss value of the Faster RCNN model is calculated.
Specifically, maximizing classifier discrepancy (Maximize Classifier Discrepancy, MCD) is a method originally proposed for domain adaptation on classification problems. For classification, it separates the network into a feature extractor and two classifiers of similar structure, and during training it tries to maximize the prediction discrepancy between the two classifiers while making the feature extractor minimize that discrepancy. This step mainly uses the regions for which the RPN and RPC outputs have lower confidence; to such candidate regions we apply the maximizing classifier discrepancy method. The specific calculation is as follows:
[The formulas for the foreground probabilities and the discrepancy term appear as images in the original publication and are not reproduced here.] The combined loss is

L_MCD = f_w(s_cls, s_rpn) · L_discrepancy(s_cls, s_rpn).

Here s_cls and s_rpn denote the predictions given by the RPC and the RPN, respectively, for a certain candidate region, and the corresponding foreground (rather than background) probabilities of those predictions enter the discrepancy term. L_discrepancy measures the degree of difference between the two, and f_w(s_cls, s_rpn) calculates the corresponding weight according to the confidence of the two; candidate regions with lower confidence receive higher weights in the loss calculation. The RPN and the RPC attempt to maximize L_MCD, while the backbone network attempts to minimize L_MCD. In general, this part of the flow may be summarized as follows:
1. For each candidate box, its foreground probability and background probability are obtained from the output of the RPN, and its classification probability distribution is obtained from the output of the RPC; all object-category probabilities in the RPC output are summed to serve as its foreground probability, and its background probability is kept unchanged. 2. L_MCD is computed from the foreground/background probabilities of the two modules. 3. The RPN and the RPC are trained to maximize L_MCD, and the backbone network is trained to minimize L_MCD.
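The sketch below follows these three steps. Because the discrepancy and weight formulas are only given as images in this text, the absolute difference of the two foreground probabilities and the low-confidence weight used here are assumed forms.

import torch

def mcd_loss(rpc_prob, rpn_fg_prob, lam=2.0):
    """Maximizing classifier discrepancy between RPC and RPN (sketch).
    rpc_prob: [N, num_classes + 1] RPC distribution, background assumed to be
    the last column; rpn_fg_prob: [N] RPN foreground probability."""
    # Step 1: sum the object-class probabilities of the RPC as its
    # foreground probability.
    rpc_fg_prob = rpc_prob[:, :-1].sum(dim=1)
    # Step 2: discrepancy between the two foreground probabilities (assumed
    # here to be their absolute difference), weighted so that low-confidence
    # regions (probabilities near 0.5) count more.
    discrepancy = torch.abs(rpc_fg_prob - rpn_fg_prob)
    confidence = torch.max(torch.abs(2 * rpc_fg_prob - 1),
                           torch.abs(2 * rpn_fg_prob - 1))
    weight = (1.0 - confidence) ** lam
    loss_mcd = (weight * discrepancy).mean()
    # Step 3: RPN and RPC maximize loss_mcd; the backbone minimizes it
    # (e.g. via a gradient reversal layer or a negated optimizer step).
    return loss_mcd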
Step 1043, weighted feature alignment training is performed on the backbone network according to the foreground and background probabilities output by the RPN and the first target-domain feature output by the backbone network, and the third loss value of the Faster RCNN model is calculated.
Specifically, weighted feature alignment is used to require that the features generated by the backbone network be independent of the domain they come from. From the foreground probability map generated by the RPN, the probability that the area represented by each point of the feature map extracted by the backbone network belongs to the foreground can be obtained. By setting the weight of the corresponding position in the loss calculation, the domain classifier is made to focus on the regions of the feature map with larger foreground probability. The domain classifier accepts the features extracted by the backbone network and attempts to determine which domain each feature comes from; the backbone network, on the other hand, attempts to optimize in the opposite direction so that the feature is classified wrongly. The two oppose each other, and finally domain-independent features are extracted for target detection. The calculation is as follows:
[The two loss formulas appear as images in the original publication and are not reproduced here.] One is the domain classification loss for the source domain and the other is the domain classification loss for the target domain. W and H denote the number of regions in the horizontal and vertical directions, and F and D denote the backbone network and the domain classifier, respectively. x_s and x_t denote an image of the source domain and an image of the target domain. f_i denotes the foreground probability of the i-th anchor point in the output of the RPN; there are 9 classes in the specific implementation. The domain classifier tries to minimize both losses, while the backbone network tries to maximize them. A gradient reversal layer (GRL) is used to realize this: it reverses the gradient during back-propagation, so that the domain classifier and the backbone network are optimized in opposite directions within the same pass.
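A sketch of this weighted feature alignment follows, assuming a pixel-wise domain classifier D and a foreground-weighted binary cross-entropy (the exact loss formulas appear only as images in this text); grl is the gradient reversal layer sketched earlier.

import torch
import torch.nn.functional as F

def weighted_alignment_loss(feat_s, feat_t, fg_map_s, fg_map_t, D):
    """Weighted feature alignment on backbone features (sketch).
    feat_s, feat_t: [1, C, H, W] backbone features of a source and a target
    image; fg_map_s, fg_map_t: [1, 1, H, W] per-location foreground
    probabilities derived from the RPN output; D: hypothetical pixel-wise
    domain classifier returning one domain logit per location."""
    logit_s = D(grl(feat_s))
    logit_t = D(grl(feat_t))
    # Source locations labeled 1, target locations labeled 0; each location
    # weighted by its foreground probability, so foreground regions dominate.
    loss_s = F.binary_cross_entropy_with_logits(
        logit_s, torch.ones_like(logit_s), weight=fg_map_s)
    loss_t = F.binary_cross_entropy_with_logits(
        logit_t, torch.zeros_like(logit_t), weight=fg_map_t)
    return loss_s + loss_t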
Step 1044, the total loss value of the Faster RCNN model is calculated according to the first loss value, the second loss value and the third loss value.
In one embodiment, the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value is specifically:
summing the first loss value, the second loss value and the third loss value, and taking the sum as the total loss value of the Faster RCNN model.
In another embodiment, the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value is specifically:
setting corresponding proportion parameters for the first loss value, the second loss value and the third loss value respectively according to a preset proportion rule, summing the weighted values, and taking the resulting sum as the total loss value of the Faster RCNN model.
Step 105, the parameters of the Faster RCNN model are optimized, and the training iteration of the Faster RCNN model is stopped when the number of training iterations reaches a preset threshold or the total loss value does not exceed a preset loss value.
Specifically, training is stopped after a certain number of iterations is reached or after the loss becomes less than a certain value. The trained model is then applied to the unlabeled target domain for prediction, and its performance is greatly improved compared with a model trained only with source-domain images.
The invention is implemented on the python and PyTorch deep learning frameworks. Network training uses a stochastic gradient descent optimizer with momentum 0.9 and an initial learning rate of 0.001. The model is first trained for 10000 iterations with the domain classifier, and then for 6000 iterations with all modules.
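This training setup can be expressed roughly as follows; model, compute_domain_classifier_loss and compute_all_losses are hypothetical helpers standing in for the detector plus domain classifier and for the losses of steps 1041 to 1043.

import torch

# SGD with momentum 0.9 and initial learning rate 0.001, as stated above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for it in range(10000 + 6000):
    if it < 10000:
        # First 10000 iterations: train with the domain classifier
        # (feature alignment) only.
        loss = compute_domain_classifier_loss()
    else:
        # Next 6000 iterations: train with all modules (RPN/RPC
        # co-self-training, MCD and weighted feature alignment).
        loss = compute_all_losses()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()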
The technical scheme of the invention designs a collaborative training scheme for the RPN and the RPC, in which each is trained with the high-confidence output of the other, and candidate regions on which both have low-confidence outputs are handled with the maximizing classifier discrepancy (Maximize Classifier Discrepancy, MCD) method. In addition, for the feature alignment of the backbone network, the output of the RPN is used to compute the foreground probability of each point on the feature map, and regions with larger foreground probability are given larger weights during feature alignment. Compared with prior schemes, this scheme fully considers the transferability of the region candidate network, designs an effective collaborative self-training scheme by treating separately the responses of the region candidate network and the candidate region classifier to the candidate regions, and handles candidate regions of different confidence differently. On the other hand, the scheme improves the existing feature alignment approach by utilizing the information provided by the region candidate network. Excellent detection performance is obtained on target detection migration tasks such as Sim10k to Cityscapes and Cityscapes to FoggyCityscapes.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims (6)

1. A collaborative training method based on domain adaptation, characterized by comprising the following steps:
in each training iteration of the Faster RCNN model, acquiring a labeled source-domain picture and an unlabeled target-domain picture;
inputting the source-domain picture into the Faster RCNN model for target detection and obtaining a first source-domain feature output by a backbone network; meanwhile, transmitting the first source-domain feature to a domain classifier through a gradient reversal layer to perform loss calculation;
inputting the target-domain picture into the Faster RCNN model for target detection, and obtaining foreground and background probabilities output by an RPN, category probabilities output by an RPC and a first target-domain feature output by the backbone network;
calculating a total loss value of the Faster RCNN model according to the foreground and background probabilities output by the RPN, the category probabilities output by the RPC and the first target-domain feature output by the backbone network, specifically:
training candidate regions with high confidence in the RPN according to the foreground and background probabilities output by the RPN and the category probabilities output by the RPC, self-training the RPC according to the RPN output, and calculating a first loss value of the Faster RCNN model;
training candidate regions with low confidence in the RPN through a maximum classifier discrepancy algorithm according to the foreground and background probabilities output by the RPN and the category probabilities output by the RPC, and calculating a second loss value of the Faster RCNN model;
performing weighted feature alignment training on the backbone network according to the foreground and background probabilities output by the RPN and the first target-domain feature output by the backbone network, and calculating a third loss value of the Faster RCNN model;
calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value;
the maximum classifier difference algorithm specifically comprises the following steps:
[The formulas for the foreground probabilities and the discrepancy term appear as images in the original publication and are not reproduced here.] The combined loss is

L_MCD = f_w(s_cls, s_rpn) · L_discrepancy(s_cls, s_rpn);

wherein s_cls and s_rpn respectively denote the predictions given by the RPC and the RPN for the candidate region, and the corresponding foreground probabilities of those predictions enter the discrepancy term; L_discrepancy measures the degree of difference between the RPC and the RPN, and f_w(s_cls, s_rpn) calculates the corresponding weight according to the confidence of the RPC and the RPN; wherein λ controls the weight decay rate;
and optimizing the parameters of the Faster RCNN model, and stopping the training iteration of the Faster RCNN model when the number of training iterations reaches a preset threshold or the total loss value does not exceed a preset loss value.
2. The collaborative training method based on domain adaptation of claim 1, wherein the algorithm for training the candidate regions with high confidence in the RPN is:
[The loss formula and the weight formula appear as images in the original publication and are not reproduced here.] In them, the self-training loss of the RPN in the target domain is built on L_rpn, which denotes the manner in which the Faster RCNN model conventionally calculates the RPN loss; the candidate regions involved are those derived from the RPC; λ controls the weight decay rate; and the background probability predicted by the RPC for the candidate region enters the weight f_w(s_cls), which is calculated from that background probability.
3. The collaborative training method based on domain adaptation of claim 1, wherein the RPC is self-trained by an entropy minimization method.
4. The collaborative training method based on domain adaptation according to claim 1, wherein the weighted feature alignment training of the backbone network is calculated specifically as follows:
[The two loss formulas appear as images in the original publication and are not reproduced here.] One is the domain classification loss for the source domain and the other is the domain classification loss for the target domain; W and H respectively denote the number of regions in the horizontal and vertical directions; F and D respectively denote the backbone network and the domain classifier; x_s and x_t respectively denote an image of the source domain and an image of the target domain; and f_i denotes the foreground probability of the i-th anchor point in the output of the RPN.
5. The collaborative training method based on domain adaptation according to claim 1, wherein the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value comprises:
summing the first loss value, the second loss value and the third loss value, and taking the sum as the total loss value of the Faster RCNN model.
6. The collaborative training method based on domain adaptation according to claim 1, wherein the step of calculating the total loss value of the Faster RCNN model according to the first loss value, the second loss value and the third loss value comprises:
setting corresponding proportion parameters for the first loss value, the second loss value and the third loss value respectively according to a preset proportion rule, summing the weighted values, and taking the resulting sum as the total loss value of the Faster RCNN model.
CN202010778786.5A 2020-08-05 2020-08-05 Collaborative training method based on domain adaptation Active CN112016594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010778786.5A CN112016594B (en) 2020-08-05 2020-08-05 Collaborative training method based on domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010778786.5A CN112016594B (en) 2020-08-05 2020-08-05 Collaborative training method based on domain adaptation

Publications (2)

Publication Number Publication Date
CN112016594A CN112016594A (en) 2020-12-01
CN112016594B true CN112016594B (en) 2023-06-09

Family

ID=73499141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010778786.5A Active CN112016594B (en) 2020-08-05 2020-08-05 Collaborative training method based on domain adaptation

Country Status (1)

Country Link
CN (1) CN112016594B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343989B (en) * 2021-07-09 2022-09-27 中山大学 Target detection method and system based on self-adaption of foreground selection domain
CN114821152B (en) * 2022-03-23 2023-05-02 湖南大学 Domain self-adaptive target detection method and system based on foreground-class perception alignment
CN114693983B (en) * 2022-05-30 2022-09-30 中国科学技术大学 Training method and cross-domain target detection method based on image-instance alignment network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156744A (en) * 2016-07-11 2016-11-23 西安电子科技大学 SAR target detection method based on CFAR detection and deep learning
CN106226050A (en) * 2016-07-15 2016-12-14 北京航空航天大学 A kind of TFDS fault automatic identifying method
CN106504233A (en) * 2016-10-18 2017-03-15 国网山东省电力公司电力科学研究院 Method and system for recognizing electric power components in UAV inspection images based on Faster R-CNN
CN111060601A (en) * 2019-12-27 2020-04-24 武汉武船计量试验有限公司 Weld ultrasonic phased array detection data intelligent analysis method based on deep learning
CN111144234A (en) * 2019-12-10 2020-05-12 南京航空航天大学 Video SAR target detection method based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156744A (en) * 2016-07-11 2016-11-23 西安电子科技大学 SAR target detection method based on CFAR detection and deep learning
CN106226050A (en) * 2016-07-15 2016-12-14 北京航空航天大学 A kind of TFDS fault automatic identifying method
CN106504233A (en) * 2016-10-18 2017-03-15 国网山东省电力公司电力科学研究院 Method and system for recognizing electric power components in UAV inspection images based on Faster R-CNN
CN111144234A (en) * 2019-12-10 2020-05-12 南京航空航天大学 Video SAR target detection method based on deep learning
CN111060601A (en) * 2019-12-27 2020-04-24 武汉武船计量试验有限公司 Weld ultrasonic phased array detection data intelligent analysis method based on deep learning

Also Published As

Publication number Publication date
CN112016594A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN112016594B (en) Collaborative training method based on field self-adaption
CN110335290B (en) Twin candidate region generation network target tracking method based on attention mechanism
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
US20230186056A1 (en) Grabbing detection method based on rp-resnet
CN113807420B (en) Domain self-adaptive target detection method and system considering category semantic matching
CN111259940A (en) Target detection method based on space attention map
CN113723295B (en) Face counterfeiting detection method based on image domain frequency domain double-flow network
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN113569882A (en) Knowledge distillation-based rapid pedestrian detection method
CN107945210A (en) Target tracking algorism based on deep learning and environment self-adaption
CN111259941A (en) Cross-domain image classification method and system based on fine-grained domain self-adaption
CN112927266A (en) Weak supervision time domain action positioning method and system based on uncertainty guide training
CN114663665A (en) Gradient-based confrontation sample generation method and system
CN117152606A (en) Confidence dynamic learning-based remote sensing image cross-domain small sample classification method
CN111191549A (en) Two-stage face anti-counterfeiting detection method
CN116129215A (en) Long-tail target detection method based on deep learning
CN112686233B (en) Lane line identification method and device based on lightweight edge calculation
Cai et al. Adaptive hardness indicator softmax for deep face recognition
CN114821699B (en) Facial expression recognition method based on suppression ambiguity self-training
CN117593215B (en) Large-scale vision pre-training method and system for generating model enhancement
Xu Stereo matching and depth map collection algorithm based on deep learning
CN117152538B (en) Image classification method and device based on class prototype cleaning and denoising
CN116343104B (en) Map scene recognition method and system for visual feature and vector semantic space coupling
CN112419362B (en) Moving target tracking method based on priori information feature learning
CN117218354A (en) Target tracking method combining ABD-Net and enhancing RPN receptive field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant