CN113537365A - Multitask learning self-adaptive balancing method based on information entropy dynamic weighting - Google Patents

Multitask learning self-adaptive balancing method based on information entropy dynamic weighting

Info

Publication number
CN113537365A
Authority
CN
China
Prior art keywords
task
depth
multitask
learning model
map
Prior art date
Legal status
Granted
Application number
CN202110820646.4A
Other languages
Chinese (zh)
Other versions
CN113537365B (en)
Inventor
王玉峰
丁文锐
肖京
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202110820646.4A
Publication of CN113537365A
Application granted
Publication of CN113537365B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-task learning method based on information entropy dynamic weighting, belonging to the technical field of machine learning. First, an initial multi-task learning model M is built; model inference is performed on an input image to obtain several task output maps, each of which is normalized to yield a corresponding normalized probability map. Then a fixed-weight multi-task loss function is computed from the normalized probability maps and used to preliminarily train the multi-task learning model M. Finally, on the basis of the preliminarily trained model M, a final adaptive multi-task loss function is constructed through an information entropy dynamic weighting algorithm, and the preliminarily trained model is iteratively optimized until it converges, at which point training terminates and the optimized multi-task learning model M1 is obtained. The invention can effectively handle different types of tasks, adaptively balances the relative importance of each task, and is broadly applicable, simple and efficient.

Description

Multitask learning self-adaptive balancing method based on information entropy dynamic weighting
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a multitask learning self-adaptive balancing method based on information entropy dynamic weighting.
Background
Machine learning is one of the core technologies of artificial intelligence: it improves the performance of computer algorithms through empirical knowledge, realizing intelligent and autonomous learning. However, machine learning techniques generally require a large number of training samples; in particular, the currently popular deep learning models typically need large quantities of labeled samples to train the network. In many applications, some task labels of the training samples are difficult to collect, or manual labeling is time-consuming and laborious. In such cases, multi-task learning can be used to make the most of the limited training samples available for each task.
Multi-task learning aims to jointly learn several related tasks so as to improve the generalization performance of each, and is widely applied in natural language processing, computer vision and other fields. Each task may be a general learning task, such as a supervised task (e.g., classification or regression), an unsupervised task (e.g., clustering), a reinforcement learning task, or a multi-view learning task.
In recent years deep learning has greatly improved the performance of various computer vision tasks, and its combination with multi-task learning, i.e. deep multi-task learning, has made substantial progress by jointly learning several tasks in a single model to obtain better generalization performance and a lower memory footprint. However, current deep multi-task learning still faces the following problems: (1) information exchange among the different subtasks is insufficient, so the advantages of multi-task learning are hard to exploit fully; (2) in most existing multi-task learning (MTL) studies, the loss function is a linear weighting of the subtask losses, which depends on human experience and lacks adaptability.
Current deep multitask learning research is mainly focused on the design of network structure and optimization strategies:
In network structure research, there are two main ways to realize a multi-task learning mechanism in a deep neural network: hard parameter sharing and soft parameter sharing. Hard parameter sharing typically shares hidden layers among all tasks while retaining several task-specific output layers. The more tasks are learned simultaneously, the more the model must find a representation that suits all of them, so hard parameter sharing greatly reduces the risk of overfitting. In soft parameter sharing, by contrast, each task has its own model and parameters, and the distances between model parameters are regularized to encourage the parameters to stay similar.
In optimization strategy research, most multi-task learning work simply fixes the weight of each task at a constant proportion, but this depends heavily on human experience, and in some cases improper weights prevent some subtasks from working properly. Therefore, apart from the design of shared model structures, another line of research focuses on balancing the influence of the different tasks on the network, including uncertainty weighting, gradient normalization algorithms, and dynamic weight averaging strategies.
In summary, since a multi-task model contains several learning tasks, how to adaptively balance the importance of the different tasks is of significant research interest.
Disclosure of Invention
To improve the generalization of multi-task learning models, and based on an analysis of the characteristics of different tasks and the requirements of multi-task model applications, the invention designs, at the level of the model optimization strategy, a multi-task learning adaptive balancing method based on information entropy dynamic weighting: the relative weight of each task's loss function is dynamically adjusted during model training, realizing adaptive training and accurate prediction of the multi-task learning model.
The multitask learning self-adaptive balancing method based on the information entropy dynamic weighting comprises the following specific steps:
Step one, a multi-task learning model M is built; model inference and normalization are performed on an input image with the current model M to obtain normalized probability maps of different types;
the initial multi-task learning model M comprises one shared encoder and three task-specific decoders.
After the multi-task learning model M performs inference on the input image, three pixel-level task outputs are produced: the semantic segmentation output map P_s, the depth estimation output map P_d and the edge detection output map P_b. Each task output map is normalized to obtain a normalized probability map of the corresponding type, specifically:
1) The semantic segmentation output map P_s is processed with a softmax function to obtain the normalized semantic segmentation probability map:
P'_{s,i} = exp(P_{s,i}) / Σ_{j=1}^{S} exp(P_{s,j})
where S is the total number of semantic segmentation categories, i denotes the i-th semantic category in the prediction map, P_{s,i} is the i-th layer of the model output map P_s, and P'_{s,i} denotes the i-th layer of the normalized semantic segmentation probability map P'_s.
2) The edge detection output map P_b is processed with a sigmoid function to obtain the normalized edge detection probability map P'_b:
P'_b = 1 / (1 + exp(-P_b))
3) The depth estimation output map P_d: the depth regression task is converted into a classification task with a logarithmic-space discretization strategy, and the normalized depth classification probability map is obtained with a softmax function.
First, the logarithmic-space discretization strategy divides the depth values of the continuous space into K sub-intervals corresponding to K categories.
Specifically, the depth value interval [D_1, D_2] is mapped to [D_1+1, D_2+1], denoted [D'_1, D'_2], and divided according to the discretized depth thresholds d_k into K sub-intervals {[d_0, d_1], [d_1, d_2], ..., [d_{K-1}, d_K]}.
The discretized depth threshold d_k is defined as:
d_k = exp( ln(D'_1) + k · ln(D'_2 / D'_1) / K ), k = 0, 1, ..., K
Then the depth estimation ground truth is discretized into a depth classification ground truth according to this strategy: when a depth truth value lies in [d_{k-1}, d_k], it is assigned class k, and the depth task branch is trained with the depth classification ground truth.
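For illustration only, this discretization of the depth ground truth can be sketched as follows (PyTorch-style Python; the function name, the depth range [0, 80] and the 1-indexed class convention are assumptions, not prescribed by the invention):

    import math
    import torch

    def depth_to_class(depth: torch.Tensor, D1: float = 0.0, D2: float = 80.0,
                       K: int = 80) -> torch.Tensor:
        """Discretize continuous depth ground truth into K classes (log-space bins)."""
        lo, hi = D1 + 1.0, D2 + 1.0                  # map [D1, D2] to [D'1, D'2]
        shifted = (depth + 1.0).clamp(lo, hi)
        # fractional bin position in [0, K] under d_k = exp(ln D'1 + k*ln(D'2/D'1)/K)
        pos = K * (shifted.log() - math.log(lo)) / (math.log(hi) - math.log(lo))
        # a truth value falling in (d_{k-1}, d_k] is assigned class k (1-indexed)
        return pos.ceil().clamp(min=1, max=K).long()

Under these assumed bounds the bins grow geometrically, so near depths are resolved much more finely than far ones, which is the usual motivation for log-space rather than uniform discretization.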
Finally, the depth classification prediction map obtained in the training stage is processed with a softmax function to obtain the normalized depth classification probability map P'_{d,k}:
P'_{d,k} = exp(P_{d,k}) / Σ_{j=1}^{K} exp(P_{d,j})
where K is the total number of depth classes, k denotes the k-th depth class, P_{d,k} is the k-th layer of the depth classification prediction map, and P'_{d,k} is the normalized k-th layer depth classification probability map.
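Taken together, the three normalizations amount to a channel-wise softmax for the two multi-class outputs and a sigmoid for the binary edge output. A minimal sketch (tensor shapes and class counts are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    S, K = 21, 80                          # assumed class counts, for illustration
    p_s = torch.randn(2, S, 64, 64)        # semantic segmentation logits P_s
    p_d = torch.randn(2, K, 64, 64)        # depth classification logits P_d
    p_b = torch.randn(2, 1, 64, 64)        # edge detection logits P_b

    p_s_norm = F.softmax(p_s, dim=1)       # per-pixel softmax over the S classes
    p_d_norm = F.softmax(p_d, dim=1)       # per-pixel softmax over the K depth bins
    p_b_norm = torch.sigmoid(p_b)          # per-pixel sigmoid (binary softmax)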
Step two, a multi-task loss function is calculated from the normalized probability maps, and the current multi-task learning model M is preliminarily trained;
the method specifically comprises the following steps:
First, the loss corresponding to each type of normalized probability map is calculated with a cross-entropy function.
The cross-entropy loss function L_t is:
L_t = - Σ_{i=1}^{C} y_{t,i} · log(P'_{t,i})
where y_t is the one-hot supervision label corresponding to each task; t = s, d or b, i.e. P'_t is the normalized probability map of the semantic segmentation, depth estimation or edge detection task; C is the total number of categories for the task, and i denotes the i-th category in the prediction map.
Then the equal-weight-sum multi-task loss function L_mtl is constructed from the fixed task weights:
L_mtl = Σ_t L_t = L_s + L_d + L_b
Finally, the multi-task loss function L_mtl is used for gradient backpropagation and parameter updating of the network model; iterative training yields the preliminarily trained multi-task learning model.
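A minimal sketch of one preliminary training iteration under these definitions (function and dictionary-key names are assumptions; the targets are presumed already discretized as described in step one):

    import torch
    import torch.nn.functional as F

    def equal_weight_loss(logits: dict, targets: dict) -> torch.Tensor:
        """L_mtl = L_s + L_d + L_b with equal fixed weights."""
        l_s = F.cross_entropy(logits["seg"], targets["seg"])       # class-index map
        l_d = F.cross_entropy(logits["depth"], targets["depth"])   # discretized depth
        l_b = F.binary_cross_entropy_with_logits(logits["edge"], targets["edge"])
        return l_s + l_d + l_b

    # one preliminary iteration (model, optimizer, images, targets assumed defined):
    # loss = equal_weight_loss(model(images), targets)
    # optimizer.zero_grad(); loss.backward(); optimizer.step()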
Step three, on the basis of the preliminarily trained multi-task learning model M, the final adaptive multi-task loss function L'_mtl is constructed with the information entropy dynamic weighting algorithm;
The method specifically comprises the following steps:
First, the information entropy E_t of each task is calculated from its multi-layer category probability map:
E_t = - (1 / (W·H)) Σ_{w=1}^{W} Σ_{h=1}^{H} Σ_{c=1}^{C} P'_{t,c}(w, h) · log P'_{t,c}(w, h)
where w and h are the row and column coordinates of the probability map, W and H are the maximum row and column lengths, c is the channel index of the probability map, and C is the total number of categories for the task.
Then, relative weight w of each task is distributed by using information entropy valuet
Relative weight wtComprises the following steps:
Figure BDA0003171869960000035
The worse a task's prediction, the higher the uncertainty of its output probability map and the larger the corresponding information entropy. Tasks with poor prediction performance are therefore assigned larger weights, so that the model concentrates its training on them.
Finally, according to the relative weight of each task and the cross entropy loss function LtAnd constructing a final self-adaptive multitask loss function in a weighted summation mode.
Final adaptive multitask loss function L'mtlComprises the following steps:
Figure BDA0003171869960000041
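A sketch of the entropy-weighted loss under the formulas above; the per-pixel averaging inside E_t, the treatment of the binary edge map, and the detaching of the weights from the gradient graph are implementation assumptions:

    import torch

    def entropy(prob: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        """Mean per-pixel entropy of a normalized (C, H, W) probability map."""
        return -(prob * (prob + eps).log()).sum(dim=0).mean()

    def adaptive_loss(prob_maps, task_losses):
        """L'_mtl = sum_t w_t * L_t with w_t = E_t / sum_t' E_t'."""
        # for the binary edge task, stack [p, 1 - p] into a 2-channel map first
        e = torch.stack([entropy(p) for p in prob_maps])
        w = (e / e.sum()).detach()       # treat weights as constants during backprop
        return (w * torch.stack(task_losses)).sum()

Detaching the weights keeps the dynamic weighting from feeding gradients back into the entropy computation, so only the weighted task losses drive the parameter update.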
Step four, the final adaptive multi-task loss function L'_mtl is backpropagated to obtain the parameter gradients of the current multi-task learning model M; the parameters are updated with a gradient descent algorithm, completing one training iteration;
Step five, after the iterative training a new multi-task learning model M1 is obtained; return to step three for the next iteration until the model M1 converges, then terminate the training.
The invention has the advantages that:
(1) The multi-task learning adaptive balancing method based on information entropy dynamic weighting adopts a discretization strategy to convert regression tasks into classification tasks; it can effectively handle tasks of different types and has broad applicability;
(2) the method computes the information entropy from the prediction maps output by the tasks, without changing the model structure or the parameter update procedure; it is simple, efficient and plug-and-play;
(3) the method dynamically adjusts the weights of the task loss functions based on the information entropy values and adaptively balances the relative importance of the tasks, improving overall performance;
(4) the method effectively extracts the model's general shared features and task-specific features, and completes the training of the multi-task learning model quickly and uniformly.
Drawings
FIG. 1 is an overall flow chart of the multi-task learning adaptive balancing method based on information entropy dynamic weighting according to the present invention;
FIG. 2 is a schematic diagram of a multitasking learning model in the present invention;
FIG. 3 is a schematic diagram of the discretization of the regression task in the present invention.
Detailed Description
The following describes a specific implementation of the invention in detail with reference to the accompanying drawings, taking as an example a multi-task learning network that jointly performs semantic segmentation, depth estimation and edge detection in computer vision.
The invention provides a multi-task learning adaptive balancing method based on information entropy dynamic weighting. During model training, the information entropy algorithm effectively evaluates the prediction result of each task, and a dynamic weighting strategy adjusts the relative task weights so that the multi-task prediction model pays more attention to the tasks with relatively poor performance, realizing adaptive balanced learning across the different tasks.
The invention relates to a multitask learning self-adaptive balancing method based on information entropy dynamic weighting, which comprises the following steps as shown in figure 1:
Step one, initialize the network parameters and train to obtain an initial multi-task learning model.
A multi-task learning network model based on a single encoder and multiple decoders is constructed, as shown in fig. 2, specifically:
the encoder contains the network parameters shared by all tasks and is initialized with a backbone network (e.g., ResNet) pre-trained on ImageNet. Each decoder contains task-specific network parameters; every task has its own decoder, initialized with random parameters. In this embodiment three tasks are to be solved: semantic segmentation, depth estimation and edge detection, so the multi-task learning model comprises one shared encoder and three task-specific decoders.
After the three tasks pass through their respective decoders, three cross-entropy losses L_1, L_2 and L_3 are obtained; with the corresponding relative task weights w_1, w_2 and w_3, the multi-task loss function L_mtl is obtained by weighted summation:
L_mtl = w_1·L_1 + w_2·L_2 + w_3·L_3
Step two, perform model inference and normalization on the input image with the multi-task learning model to obtain normalized probability maps of different types.
After the multi-task learning model performs inference on the input image, three pixel-level task outputs are produced: the semantic segmentation output map P_s, the depth estimation output map P_d and the edge detection output map P_b. Each task output map is normalized to obtain a normalized probability map of the corresponding type, specifically:
1) The semantic segmentation output map P_s is processed with a softmax function to obtain the normalized multi-class semantic segmentation probability map:
P'_{s,i} = exp(P_{s,i}) / Σ_{j=1}^{S} exp(P_{s,j})
where S is the total number of semantic segmentation categories, i denotes the i-th semantic category in the prediction map, P_{s,i} is the i-th layer of the model output map P_s, and P'_{s,i} denotes the normalized i-th layer semantic segmentation probability map.
2) The edge detection output map P_b is processed with a sigmoid function (equivalent to a binary softmax) to obtain the normalized edge detection probability map P'_b:
P'_b = 1 / (1 + exp(-P_b))
3) The depth estimation output map P_d: the depth regression task is converted into a classification task with a logarithmic-space discretization strategy, and the normalized depth classification probability map is obtained with a softmax function.
First, as shown in fig. 3, the logarithmic-space discretization strategy divides the depth values of the continuous space into K sub-intervals corresponding to K categories, specifically:
the depth value interval [D_1, D_2] is mapped to [D_1+1, D_2+1], denoted [D'_1, D'_2], and divided according to the discretized depth thresholds d_k into K sub-intervals {[d_0, d_1], [d_1, d_2], ..., [d_{K-1}, d_K]}.
The discretized depth threshold d_k is defined as:
d_k = exp( ln(D'_1) + k · ln(D'_2 / D'_1) / K ), k = 0, 1, ..., K
Then the depth estimation ground truth is discretized into a depth classification ground truth according to this strategy: when a depth truth value lies in [d_{k-1}, d_k], it is assigned class k, and the depth task branch is trained with the depth classification ground truth.
Finally, the depth classification prediction map obtained in the training stage is processed with a softmax function to obtain the normalized depth classification probability map P'_{d,k}:
P'_{d,k} = exp(P_{d,k}) / Σ_{j=1}^{K} exp(P_{d,j})
where K is the total number of depth classes, k denotes the k-th depth class, P_{d,k} is the k-th layer of the depth classification prediction map, and P'_{d,k} is the normalized k-th layer depth classification probability map.
In this embodiment, K = 80 is taken for the discretization of the depth estimation. The supervision truth of the depth branch is then in classification form, so the depth estimation task is trained directly as depth classification.
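As a quick numeric check of the threshold formula with K = 80 (the depth range [0, 80], giving [D'_1, D'_2] = [1, 81], is an assumed example, not taken from the patent):

    import math

    D1, D2, K = 0.0, 80.0, 80
    lo, hi = D1 + 1.0, D2 + 1.0
    d = [math.exp(math.log(lo) + k * math.log(hi / lo) / K) for k in range(K + 1)]
    print(round(d[1], 3), round(d[40], 3), round(d[80], 3))  # 1.056, 9.0, 81.0

Under these assumed bounds the first bin spans only about 0.06 while the last spans about 4.3, confirming that near depths receive much finer resolution than far ones.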
Step three, preliminarily train the multi-task learning model.
Because the task results predicted by the freshly initialized model have large and unstable errors, the multi-task network model must first be preliminarily trained, specifically:
First, the loss corresponding to each type of normalized probability map is calculated with a cross-entropy function:
L_t = - Σ_{i=1}^{C} y_{t,i} · log(P'_{t,i})
where y_t is the one-hot supervision label corresponding to each task; t corresponds to each task in step two and can be s, d or b, i.e. P'_t is the normalized probability map of the semantic segmentation, depth estimation or edge detection task; C is the total number of categories for the task, and i denotes the i-th category in the prediction map.
Next, the equal-weight-sum multi-task loss function L_mtl is constructed:
L_mtl = Σ_t L_t = L_s + L_d + L_b
during the preliminary training process, the loss function of each task is given equal fixed weight.
Then the multi-task loss function L_mtl is used for gradient backpropagation and parameter updating of the network model; after a certain number of iterations, the resulting multi-task learning model can make preliminary task predictions.
Step four, on the basis of the multi-task learning model obtained by the preliminary training, construct the adaptive multi-task loss function with the information entropy dynamic weighting algorithm and further optimize the training of the model.
The method specifically comprises the following steps:
First, the information entropy E_t of each task is calculated from its multi-layer category probability map:
E_t = - (1 / (W·H)) Σ_{w=1}^{W} Σ_{h=1}^{H} Σ_{c=1}^{C} P'_{t,c}(w, h) · log P'_{t,c}(w, h)
where w and h are the row and column coordinates of the probability map, W and H are the maximum row and column lengths, c is the channel index of the probability map, and C is the total number of categories for the task;
Then the relative weight w_t of each task is assigned according to its information entropy value.
The information entropy reflects the uncertainty of a prediction probability map, so the entropies of the task output probability maps can be used to assign the relative weights:
w_t = E_t / Σ_{t'} E_{t'}
The worse a task's prediction, the higher the uncertainty of its output probability map and the larger the corresponding information entropy. Tasks with poor prediction performance are therefore assigned larger weights, so that the model concentrates its training on them.
Finally, the overall adaptive multi-task loss function is constructed as the weighted sum of the cross-entropy losses L_t with the relative task weights.
The overall adaptive multi-task loss function L'_mtl is:
L'_mtl = Σ_t w_t · L_t
Step five, the overall adaptive multi-task loss function L'_mtl is backpropagated to obtain the model parameter gradients; the model parameters are then updated with a gradient descent algorithm, completing one training iteration;
Step six, after the model parameters are updated a new multi-task learning model is obtained; return to step four for the next iteration until the multi-task learning model converges, then terminate the training.
After each parameter update the prediction performance of every task changes, so the corresponding relative weights also change dynamically, realizing adaptive adjustment of the loss function during network model training.
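Putting the embodiment together, a hypothetical end-to-end loop (reusing the illustrative helpers sketched earlier; per_task_losses and normalize are assumed helper names, and the warm-up epoch count and optimizer choice are likewise assumptions):

    import torch

    def train(model, loader, n_warmup=5, n_epochs=50, lr=1e-3):
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for epoch in range(n_epochs):
            for images, targets in loader:
                logits = model(images)
                losses = per_task_losses(logits, targets)   # [L_s, L_d, L_b]
                if epoch < n_warmup:                        # preliminary phase
                    loss = torch.stack(losses).sum()        # equal fixed weights
                else:                                       # adaptive phase
                    probs = normalize(logits)               # softmax / sigmoid maps
                    loss = adaptive_loss(probs, losses)     # entropy-weighted sum
                opt.zero_grad(); loss.backward(); opt.step()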
The above embodiment describes only the three specific tasks of semantic segmentation, depth estimation and edge detection, but the method of the invention is not limited to them: it can be applied to other tasks and to cases with three or more tasks, with the multi-task learning model adjusted to the actual situation. Cases involving other tasks, or three or more tasks, fall within the technical problem solved by the invention.

Claims (5)

1. A multitask learning self-adaptive balancing method based on information entropy dynamic weighting is characterized by comprising the following steps:
firstly, an initial multi-task learning model M is built; inference is performed on an input image by the multi-task learning model M to obtain outputs of different types for the different tasks, and each output is normalized to obtain the normalized probability map corresponding to its task;
then, calculating a multitask loss function by using each normalized probability graph, and performing primary training on the multitask learning model M through the multitask loss function;
finally, on the basis of the initially trained multi-task learning model M, a final self-adaptive multi-task loss function is constructed through an information entropy dynamic weighting algorithm, the parameter gradient of the current multi-task learning model M is obtained through a back propagation algorithm, parameter updating is carried out, and one-time iterative training is completed;
after the iterative training, a new multi-task learning model M1 is obtained; inference and normalization are performed on the input image again, and the next iteration is carried out with the adaptive multi-task loss function until the multi-task learning model M1 converges, whereupon the training is terminated.
2. The adaptive balancing method for multitask learning based on information entropy dynamic weighting as claimed in claim 1, wherein the multitask learning model comprises a shared encoder and a decoder corresponding to each specific task.
3. The multitask learning self-adaptive balancing method based on information entropy dynamic weighting as claimed in claim 1, wherein the three task output maps are: the semantic segmentation output map P_s, the depth estimation output map P_d and the edge detection output map P_b; the corresponding normalized probability maps are:
1) the semantic segmentation output map P_s is processed with a softmax function to obtain the normalized semantic segmentation probability map:
P'_{s,i} = exp(P_{s,i}) / Σ_{j=1}^{S} exp(P_{s,j})
where S is the total number of semantic segmentation categories, i denotes the i-th semantic category in the prediction map, P_{s,i} is the i-th layer of the model output map P_s, and P'_{s,i} denotes the i-th layer of the normalized semantic segmentation probability map P'_s;
2) the edge detection output map P_b is processed with a sigmoid function to obtain the normalized edge detection probability map P'_b:
P'_b = 1 / (1 + exp(-P_b));
3) the depth estimation output map P_d: the depth regression task is converted into a classification task with a logarithmic-space discretization strategy, and the normalized depth classification probability map is obtained with a softmax function;
first, the logarithmic-space discretization strategy divides the depth values of the continuous space into K sub-intervals corresponding to K categories, specifically:
the depth value interval [D_1, D_2] is mapped to [D_1+1, D_2+1], denoted [D'_1, D'_2], and divided according to the discretized depth thresholds d_k into K sub-intervals {[d_0, d_1], [d_1, d_2], ..., [d_{K-1}, d_K]};
the discretized depth threshold d_k is defined as:
d_k = exp( ln(D'_1) + k · ln(D'_2 / D'_1) / K ), k = 0, 1, ..., K;
then the depth estimation ground truth is discretized into a depth classification ground truth according to this strategy: when a depth truth value lies in [d_{k-1}, d_k], its class is k, and the depth task branch is trained with the depth classification ground truth;
finally, the depth classification prediction map obtained in the training stage is processed with a softmax function to obtain the normalized depth classification probability map P'_{d,k}:
P'_{d,k} = exp(P_{d,k}) / Σ_{j=1}^{K} exp(P_{d,j})
where K is the total number of depth classes, k denotes the k-th depth class, P_{d,k} is the k-th layer of the depth classification prediction map, and P'_{d,k} is the normalized k-th layer depth classification probability map.
4. The information entropy dynamic weighting-based multitask learning self-adaptive balancing method according to claim 1, wherein the specific process of calculating the multitask loss function and initially training the multitask learning model is as follows:
first, the loss corresponding to each type of normalized probability map is calculated with a cross-entropy function;
the cross-entropy loss function L_t is:
L_t = - Σ_{i=1}^{C} y_{t,i} · log(P'_{t,i})
where y_t is the one-hot supervision label corresponding to each task; t = s, d or b, i.e. P'_t is the normalized probability map of the semantic segmentation, edge detection or depth estimation task; C is the total number of categories for the task, and i denotes the i-th layer category in the prediction map;
then the equal-weight-sum multi-task loss function L_mtl is constructed from the fixed task weights:
L_mtl = Σ_t L_t = L_s + L_d + L_b;
finally, a multi-tasking penalty function L is utilizedmtlAnd performing gradient back transmission and parameter updating of the network model, and performing iterative training to obtain a multi-task learning model after preliminary training.
5. The information entropy dynamic weighting-based multitask learning adaptive balancing method according to claim 1, wherein the specific process for constructing the final adaptive multitask loss function is as follows:
step 501, the information entropy E_t of each task is calculated from its multi-layer category probability map:
E_t = - (1 / (W·H)) Σ_{w=1}^{W} Σ_{h=1}^{H} Σ_{c=1}^{C} P'_{t,c}(w, h) · log P'_{t,c}(w, h)
where w and h are the row and column coordinates of the probability map, W and H are the maximum row and column lengths, c is the channel index of the probability map, and C is the total number of categories for the task;
step 502, the relative weight w_t of each task is assigned according to its information entropy value;
the relative weight w_t is:
w_t = E_t / Σ_{t'} E_{t'};
step 503, the final adaptive multi-task loss function is constructed as the weighted sum of the cross-entropy losses L_t with the relative task weights;
the final adaptive multi-task loss function L'_mtl is:
L'_mtl = Σ_t w_t · L_t.
CN202110820646.4A 2021-07-20 2021-07-20 Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method Active CN113537365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110820646.4A CN113537365B (en) 2021-07-20 2021-07-20 Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110820646.4A CN113537365B (en) 2021-07-20 2021-07-20 Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method

Publications (2)

Publication Number Publication Date
CN113537365A true CN113537365A (en) 2021-10-22
CN113537365B CN113537365B (en) 2024-02-06

Family

ID=78100520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110820646.4A Active CN113537365B (en) 2021-07-20 2021-07-20 Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method

Country Status (1)

Country Link
CN (1) CN113537365B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114714146A (en) * 2022-04-08 2022-07-08 北京理工大学 Method for simultaneously predicting surface roughness and cutter abrasion
WO2023097616A1 (en) * 2021-12-02 2023-06-08 Intel Corporation Apparatus, method, device and medium for loss balancing in multi-task learning
CN117273068A (en) * 2023-09-28 2023-12-22 东南大学 Model initialization method based on linearly expandable learning genes

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451620A (en) * 2017-08-11 2017-12-08 深圳市唯特视科技有限公司 A kind of scene understanding method based on multi-task learning
CN110837836A (en) * 2019-11-05 2020-02-25 中国科学技术大学 Semi-supervised semantic segmentation method based on maximized confidence

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451620A (en) * 2017-08-11 2017-12-08 深圳市唯特视科技有限公司 A kind of scene understanding method based on multi-task learning
CN110837836A (en) * 2019-11-05 2020-02-25 中国科学技术大学 Semi-supervised semantic segmentation method based on maximized confidence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Y. Wang, et al., "Boundary-aware multitask learning for remote sensing imagery", IEEE *
Zhang Lei; Cao Yueyun; Li Bin; Cui Jialin, "Research on the operational effectiveness evaluation of ship power systems based on a combined weighting method", Ship Science and Technology, no. 03 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023097616A1 (en) * 2021-12-02 2023-06-08 Intel Corporation Apparatus, method, device and medium for loss balancing in multi-task learning
CN114714146A (en) * 2022-04-08 2022-07-08 北京理工大学 Method for simultaneously predicting surface roughness and cutter abrasion
CN117273068A (en) * 2023-09-28 2023-12-22 东南大学 Model initialization method based on linearly expandable learning genes
CN117273068B (en) * 2023-09-28 2024-04-16 东南大学 Model initialization method based on linearly expandable learning genes

Also Published As

Publication number Publication date
CN113537365B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN113537365B (en) Information entropy dynamic weighting-based multi-task learning self-adaptive balancing method
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN106600059B (en) Intelligent power grid short-term load prediction method based on improved RBF neural network
Kim et al. SplitNet: Learning to semantically split deep networks for parameter reduction and model parallelization
CN108985515B (en) New energy output prediction method and system based on independent cyclic neural network
CN114022693B (en) Single-cell RNA-seq data clustering method based on double self-supervision
Kamruzzaman et al. Medical diagnosis using neural network
CN113554156B (en) Multitask image processing method based on attention mechanism and deformable convolution
CN106897744A (en) A kind of self adaptation sets the method and system of depth confidence network parameter
CN113722980A (en) Ocean wave height prediction method, system, computer equipment, storage medium and terminal
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN115204035A (en) Generator set operation parameter prediction method and device based on multi-scale time sequence data fusion model and storage medium
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
Moriya et al. Evolution-strategy-based automation of system development for high-performance speech recognition
CN114202021A (en) Knowledge distillation-based efficient image classification method and system
CN111753995A (en) Local interpretable method based on gradient lifting tree
CN116451859A (en) Bayesian optimization-based stock prediction method for generating countermeasure network
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
CN113408610B (en) Image identification method based on adaptive matrix iteration extreme learning machine
CN115906959A (en) Parameter training method of neural network model based on DE-BP algorithm
CN113807005A (en) Bearing residual life prediction method based on improved FPA-DBN
CN113408602A (en) Tree process neural network initialization method
KR20210157826A (en) Method for sturcture learning and model compression for deep neural netwrok
CN113033495B (en) Weak supervision behavior identification method based on k-means algorithm
US20220343162A1 (en) Method for structure learning and model compression for deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant