CN112966811A - Method and network for solving task conflict in MTL convolutional neural network - Google Patents

Method and network for solving task conflict in MTL convolutional neural network

Info

Publication number
CN112966811A
Authority
CN
China
Prior art keywords
task
mtl
layer
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110155686.1A
Other languages
Chinese (zh)
Other versions
CN112966811B (en)
Inventor
周傲
丁春涛
白乐金
马骁
徐梦炜
孙其博
王尚广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110155686.1A priority Critical patent/CN112966811B/en
Publication of CN112966811A publication Critical patent/CN112966811A/en
Application granted granted Critical
Publication of CN112966811B publication Critical patent/CN112966811B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Digital Transmission Methods That Use Modulated Carrier Waves (AREA)

Abstract

The embodiment of the invention discloses a method and a network for resolving task conflicts in an MTL convolutional neural network. The shared shallow layer of the MTL convolutional neural network includes a modulation module obtained by training. The modulation module determines a corresponding subnet structure in the shared shallow layer for each task; task information is input into that subnet structure for convolutional processing and, after modulation by the modulation module, is output to the task specific layer of the task for processing, which outputs the task result. The processing result is back-propagated using the gradient of the MTL convolutional neural network loss function, and the parameters of the MTL convolutional neural network are adjusted; the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by a multi-task parallel learning method. These steps are repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network. Conflicts in multi-task parallel learning are thereby avoided, and the trained MTL convolutional neural network performs better when processing different tasks.

Description

Method and network for solving task conflict in MTL convolutional neural network
Technical Field
The invention relates to deep neural network technology, and in particular to a method and a network for resolving Task conflicts in a Multi-Task Learning (MTL) convolutional neural network.
Background
The MTL convolutional neural network is obtained by training multiple tasks simultaneously. This training mode enhances the representation and generalization capability of the trained model: the aim is to use the useful information contained in a plurality of related tasks to help each task learn within the MTL convolutional neural network, yielding a more accurate network.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an MTL convolutional neural network provided in the prior art. It includes a shared shallow layer and task specific layers. The shared shallow layer is shared by a plurality of tasks, consists of a plurality of convolutional layers connected in series, and performs convolutional processing on input task information. Each task specific layer is set up for one task; task information that has passed through the convolutional processing of the shared shallow layer is input into the corresponding task specific layer for processing, yielding the task processing result.
When the MTL convolutional neural network is trained, the parameter training of the shared shallow layer is determined by multi-task simultaneous learning, specifically, shared task information representations of a plurality of tasks are embedded into the same feature space and input into the shared shallow layer in the MTL convolutional neural network for learning. The task specific layers in the MTL convolutional neural network are obtained by learning based on the specific task characteristic representation of each task. Therefore, during the training of the MTL convolutional neural network, the shared shallow layer multitask information sharing is beneficial to reducing the calculation amount, meanwhile, the shared shallow layer can enable the tasks with the commonality to better utilize the correlation information, the task specific layer can train for each task independently, and the unification of the shared task information and the specific task characteristic information in the same MTL convolutional neural network is achieved.
The shared shallow layer training process of the MTL convolutional neural network is as follows: a plurality of related tasks are learned simultaneously in parallel, back propagation is performed according to the gradient of the loss function of the MTL convolutional neural network, the shared shallow layer parameters are adjusted in the process, and the MTL convolutional neural network is trained step by step until convergence. Here, the gradient is the direction along which the directional derivative of the loss function at the current point is largest; back propagation computes the partial derivatives of the loss function with respect to the weight coefficients of the MTL convolutional neural network and then minimizes the loss function with a gradient descent algorithm, gradually finding the optimal shared shallow layer parameters.
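To make this training loop concrete, below is a minimal sketch in PyTorch of the prior-art hard-parameter-sharing setup: a shared shallow convolutional trunk plus one task-specific head per task, trained by back-propagating the gradient of a joint loss. It is an illustration only; PyTorch, the class names SharedShallow and MTLNet, and all sizes are assumptions of this sketch, not details from the patent.

import torch
import torch.nn as nn

class SharedShallow(nn.Module):
    """Shared shallow layer: a stack of convolutional layers used by all tasks."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
    def forward(self, x):
        return self.body(x)

class MTLNet(nn.Module):
    def __init__(self, num_tasks=2, channels=32, num_classes=10):
        super().__init__()
        self.shared = SharedShallow(channels)
        # One task specific head per task.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(channels, num_classes))
            for _ in range(num_tasks))
    def forward(self, x, task_id):
        return self.heads[task_id](self.shared(x))

net = MTLNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(8, 3, 32, 32)
y0 = torch.randint(0, 10, (8,))
y1 = torch.randint(0, 10, (8,))
# The joint loss back-propagates both tasks' gradients into the shared
# trunk: exactly where weakly related tasks can pull in conflicting directions.
loss = loss_fn(net(x, 0), y0) + loss_fn(net(x, 1), y1)
opt.zero_grad()
loss.backward()
opt.step()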
However, in the shared shallow layer training of the MTL convolutional neural network, the task information of multiple tasks may assist, but also interfere with, each other. When two tasks are weakly correlated or conflict, that is, when the dependency between them is weak, sharing the same parameters in the shared shallow layer trains poorly: the tasks may compete with each other during shared shallow layer training, making the gradient directions of the MTL convolutional neural network loss function inconsistent and the training difficult, so that the trained MTL convolutional neural network performs poorly when processing different tasks.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method for solving task conflicts in an MTL convolutional neural network, which can alleviate conflict of multi-task parallel learning with weak correlation in a training process of the MTL convolutional neural network, and improve an effect of the MTL convolutional neural network obtained by training when processing different tasks.
The embodiment of the invention also provides a network for solving the task conflict in the MTL convolutional neural network, which can relieve the conflict of the multi-task parallel learning with weak correlation in the training process of the MTL convolutional neural network and improve the effect of the MTL convolutional neural network obtained by training when processing different tasks.
The invention is realized by the following steps:
a method for resolving task conflicts in a multi-task learning MTL convolutional neural network, in which the shared shallow layer of the MTL convolutional neural network includes a modulation module obtained by training, and the training of the modulation module and the training of the shared shallow layer are carried out simultaneously by a multi-task parallel learning method; the method further includes:
the modulation module determines corresponding subnet structures in a shared shallow layer aiming at different tasks, inputs task information of the tasks into the determined corresponding subnet structures for convolution processing, and outputs the task information to a task specific layer of the tasks for processing after modulation of the modulation module to obtain processing results; carrying out back propagation on the processing result by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and circularly executing the processes for multiple times until the parameters of the MTL convolutional neural network are converged to obtain the trained MTL convolutional neural network.
Preferably, including the trained modulation module in the shared shallow layer of the MTL convolutional neural network comprises:
embedding the trained modulation modules in the convolutional layers of each layer in a shared shallow layer of the MTL convolutional neural network. After the task information of the task enters the convolution layer of each layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information of the task, and then the task information enters the convolution layer of the next layer for processing.
Preferably, the determining, by the modulation module, corresponding subnet structures in the shared shallow layer for different tasks includes:
in each convolutional layer in the shared shallow layer, randomly selecting a plurality of convolutional layer channels from the multi-convolutional layer channels to be distributed to a task, and taking the selected plurality of convolutional layer channels as a sub-network structure aiming at the task;
each convolutional layer channel among the multiple convolutional layer channels in the shared shallow layer is marked with a binary mask BM; the BMs of all the convolutional layer channels are combined to form a subnet selector B. With C representing the number of convolutional layer channels, the subnet selector for task t is expressed as B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
Preferably, the modulation by the modulation module includes:
modulating the task by adopting a scale vector, wherein the scale vector represents the contribution degree of each convolutional layer channel to the task;
the dimension of the scale vector is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1; the scale vector M_t for task t is denoted M_t = {M_c}, c ∈ {1, 2, …, C_new}.
Preferably, the training process of the modulation module and the training of the shared shallow layer adopt a multi-task parallel learning method to perform simultaneously, including:
the scale vector M_t used by the modulation module to modulate a task is trained, through back propagation, together with the parameters of the MTL convolutional neural network, using the gradient back propagation of the MTL convolutional neural network loss function; the parameters of the MTL convolutional neural network and the task-specific scale vector M_t are adjusted gradually until the parameters of the MTL convolutional neural network converge;
the parameters of the MTL convolutional neural network F are denoted by θ, and their update depends on the gradient ∇θ, given by

∇θ = ∂L(f)/∂θ

where L is the loss function of the MTL convolutional neural network F, I is the input of F, and f = F(I | θ, M_t) is the output of F.
Preferably, training the modulation module in the gradient back propagation manner of the MTL convolutional neural network loss function further includes:
if task t conflicts with task t', the modulation module modulates the gradient directions coming from the different tasks; after the modulation module is introduced, the gradient becomes

∇θ = M_t ⊙ g_t + M_t' ⊙ g_t'

where g_t denotes the back-propagated gradient of task t and g_t' denotes the back-propagated gradient of task t'.
Preferably, the method further comprises:
processing received task information of a task using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer in the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing, then modulates the task information of the task with the set scale vector, and inputs the modulated task information into the next convolutional layer for the same processing until the shared shallow layer has been traversed.
A network for resolving task conflicts in a multi-task learning MTL convolutional neural network, comprising: a shared shallow layer and task specific layers, the shared shallow layer having multiple convolutional layers, with a modulation module embedded in each convolutional layer for selecting, for an incoming task, the convolutional layer channels processed in that layer and for modulating the task information of the task, wherein,
the shared shallow layer is used for the modulation module to determine a corresponding sub-network structure in the convolution layer of each layer aiming at different tasks; inputting task information of a task into the determined corresponding subnet structure for convolution processing, modulating the task information by a modulation module, and outputting the task information to a task specific layer of the task for processing, wherein the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and circularly executing the previous process for multiple times until the parameters of the MTL convolutional neural network are converged to obtain the well-trained MTL convolutional neural network.
Preferably, the method further comprises the following steps:
and processing the received task information of a task in the shared shallow layer and the task specific layer using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer in the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing of the task information, then modulates the task information with the set scale vector, and inputs it into the next convolutional layer for the same processing until the task information has passed through the shared shallow layer.
As can be seen from the above, in the embodiment of the present invention, the shared shallow layer of the MTL convolutional neural network includes a modulation module obtained through training. The modulation module determines a corresponding subnet structure in the shared shallow layer for each task; the task information is input into that subnet structure for convolutional processing and, after modulation by the modulation module, is output to the task specific layer of the task for processing, which outputs the task result. The processing result is back-propagated using the gradient of the MTL convolutional neural network loss function to adjust the network parameters, with the training of the modulation module and the training of the shared shallow layer performed simultaneously by multi-task parallel learning; the above process is repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network. Thus, during MTL convolutional neural network training, the shared shallow layer performs convolutional processing and modulation with different subnet structures for different tasks, alleviating the conflict of weakly correlated multi-task parallel learning and improving the performance of the trained MTL convolutional neural network when processing different tasks.
Drawings
Fig. 1 is a schematic structural diagram of an MTL convolutional neural network provided in the prior art;
fig. 2 is a flowchart of a method for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a shared shallow layer of an MTL convolutional neural network with a modulation module according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a network for resolving task conflicts in the MTL convolutional neural network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
As can be seen from the background art, multiple tasks conflict during MTL convolutional neural network training because the shared task information representations of different tasks are embedded into the same feature space and input to the shared shallow layer in the MTL convolutional neural network for learning; since the shared shallow layer applies the same network architecture to the task information of every task, the task information of multiple tasks both assists and interferes with each other, making the gradient directions of the MTL convolutional neural network loss function inconsistent and the training difficult.
Therefore, to overcome these problems, the embodiment of the present invention places a trained modulation module in the shared shallow layer of the MTL convolutional neural network. The modulation module determines a corresponding subnet structure in the shared shallow layer for each task; the task information is input into that subnet structure for convolutional processing and, after modulation by the modulation module, is output to the task specific layer of the task, which outputs the task result. The processing result is back-propagated using the gradient of the MTL convolutional neural network loss function, adjusting the network parameters, with the training of the modulation module and the training of the shared shallow layer performed simultaneously by multi-task parallel learning; this process is repeated until the parameters of the MTL convolutional neural network converge, yielding the trained MTL convolutional neural network.
Thus, during MTL convolutional neural network training, the shared shallow layer performs convolutional processing and modulation with different subnet structures for different tasks, alleviating the conflict of weakly correlated multi-task parallel learning and improving the performance of the trained MTL convolutional neural network when processing different tasks.
Fig. 2 is a flowchart of a method for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention, which includes the following specific steps:
step 201, a modulation module obtained by training is included in a shared shallow layer of an MTL convolutional neural network, and the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
step 202, the modulation module determines corresponding subnet structures in a shared shallow layer according to different tasks;
in this step, the corresponding subnet structures are randomly determined in the shared shallow layer for different tasks, which improves the generalization capability of the MTL convolutional neural network;
step 203, inputting task information of the task into the determined corresponding subnet structure for convolution processing, modulating the task information by the modulation module, and outputting the task information to a task specific layer of the task for processing to obtain a processing result;
step 204, performing back propagation on the processing result by adopting a gradient back propagation mode of the MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and step 205, circularly executing the processes in the steps 202 to 204 for multiple times until the parameters of the MTL convolutional neural network are converged, so as to obtain the trained MTL convolutional neural network.
In the method, the step of including a trained modulation module in a shared shallow layer of the MTL convolutional neural network comprises the following steps:
embedding the trained modulation module in each convolutional layer of the shared shallow layer of the MTL convolutional neural network. After the task information of a task enters each convolutional layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information, which then enters the next convolutional layer for processing. Fig. 3 is a schematic structural diagram of an MTL convolutional neural network with a modulation module according to an embodiment of the present invention. As shown in fig. 3, the shared shallow layer of the MTL convolutional neural network contains multiple convolutional layers, and a modulation module is embedded in each convolutional layer to select the convolutional layer channels processed in that layer for an incoming task and to modulate the task information of the task. Of course, depending on the network architecture, a pooling layer that reduces the feature expression matrix of the task information may also sit between the convolutional layers; the modulated task information is then input into the pooling layer for processing before entering the next convolutional layer. In this way, the modulated task information carries representation information related to the task, which weakens the coupling of conflicting tasks in the shared shallow layer.
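The modulated convolutional layer described above might be sketched in PyTorch as follows. This is a hedged illustration, not the patent's implementation: the class name ModulatedConvLayer, the keep_ratio parameter, and the use of a sigmoid to keep the scale values between 0 and 1 are choices made for this sketch.

import torch
import torch.nn as nn

class ModulatedConvLayer(nn.Module):
    """One shared convolutional layer with an embedded modulation module."""
    def __init__(self, in_ch, out_ch, num_tasks, keep_ratio=0.75):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        # Subnet selector B_t: one random binary mask per task over the
        # out_ch channels, drawn once at instantiation and never trained
        # (a buffer, not a parameter), so the channel-to-task assignment
        # is persistent.
        masks = (torch.rand(num_tasks, out_ch) < keep_ratio).float()
        self.register_buffer("subnet_masks", masks)
        # Scale vector M_t: one learnable vector per task, trained by the
        # same back propagation as the network weights.
        self.scales = nn.Parameter(torch.zeros(num_tasks, out_ch))
    def forward(self, x, task_id):
        h = torch.relu(self.conv(x))
        bm = self.subnet_masks[task_id].view(1, -1, 1, 1)           # B_t
        m = torch.sigmoid(self.scales[task_id]).view(1, -1, 1, 1)   # M_t in (0, 1)
        # Select the task's subnet, then modulate its channels.
        return h * bm * m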
In the method, the determining, by the modulation module, corresponding subnet structures in a shared shallow layer for different tasks includes:
in each convolutional layer in the shared shallow layer, a plurality of convolutional layer channels are randomly selected from the plurality of convolutional layer channels to be distributed to one task, and the selected plurality of convolutional layer channels serve as subnet structures for the task.
Specifically, each convolutional layer channel among the multiple convolutional layer channels in the shared shallow layer is identified by an identifier tied to the corresponding task, where the identifier may be a Binary Mask (BM). The BM is generated randomly when the MTL convolutional neural network is instantiated. Once set, the BM is not modified and does not participate in the training of the MTL convolutional neural network; therefore, the assignment of tasks to convolutional layer channels in the shared shallow layer is also persistent.
The multiple convolutional layer channels in the shared shallow layer are distributed to different tasks, setting a corresponding subnet structure for each task, so that the flow of each task's information can be adjusted. The BMs of the convolutional layer channels are combined to form a subnet selector B; with C representing the number of convolutional layer channels, the subnet selector for task t is represented as follows:
B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
in the method, the modulating by the modulating module includes:
and modulating the task with a corresponding scale vector, where the scale vector represents the contribution of each convolutional layer channel to the task. Specifically, after the subnet structure corresponding to the task is formed, the corresponding scale vector is used for modulation; that is, the modulated input is the nonzero convolutional layer channel data multiplied by the BM. The scale vector is task-specific: its dimension is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1. Based on the subnet structure corresponding to the task, the vector thus represents the contribution of the feature representation of each convolutional layer channel to the task. The scale vector M_t for task t is represented as follows:
M_t = {M_c}, c ∈ {1, 2, …, C_new}.
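As a worked example (the specific numbers here are illustrative assumptions, not values from the patent): suppose a convolutional layer has C = 8 channels and the randomly generated subnet selector for task t is B_t = {1, 0, 1, 1, 0, 1, 0, 1}. Then C_new = 5 channels are assigned to the task, and the scale vector M_t is a 5-dimensional vector, for instance (0.9, 0.2, 0.7, 0.5, 0.8), each element weighting the contribution of one selected channel's feature representation to task t.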
In the method, carrying out the training of the modulation module and the training of the shared shallow layer simultaneously by the multi-task parallel learning method includes the following:
The scale vector M_t used by the modulation module to modulate a task is trained, through back propagation, together with the parameters of the MTL convolutional neural network, using the gradient back propagation of the MTL convolutional neural network loss function; the parameters of the MTL convolutional neural network and the task-specific scale vector M_t are adjusted gradually until the parameters of the MTL convolutional neural network converge. The parameters of the MTL convolutional neural network F are denoted by θ, and their update depends on the gradient ∇θ, given by

∇θ = ∂L(f)/∂θ

where L is the loss function of the MTL convolutional neural network F, I is the input of F, and f = F(I | θ, M_t) is the output of F.
In this way, the scale vector M_t used by the modulation module to modulate each task is introduced into the parameter set and the loss function of the MTL convolutional neural network; with gradient back propagation of the loss function, all tasks are learned in parallel continuously until the parameters of the MTL convolutional neural network converge.
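A minimal training-loop sketch, under the same assumptions and invented names as the ModulatedConvLayer sketch above, illustrates the key point: each scale vector M_t is registered as a learnable parameter, so it is updated by the same loss-gradient back propagation as the shared weights, while the binary masks B_t stay fixed.

import torch
import torch.nn as nn

# Assumes the ModulatedConvLayer class from the earlier sketch is in scope.
class ModulatedMTLNet(nn.Module):
    def __init__(self, num_tasks=2, ch=32, num_classes=10):
        super().__init__()
        self.layer1 = ModulatedConvLayer(3, ch, num_tasks)
        self.layer2 = ModulatedConvLayer(ch, ch, num_tasks)
        self.heads = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(ch, num_classes))
            for _ in range(num_tasks))
    def forward(self, x, t):
        h = self.layer2(self.layer1(x, t), t)  # modulated shared shallow layer
        return self.heads[t](h)

net = ModulatedMTLNet()
# net.parameters() contains the conv weights, the heads, and every scale
# vector M_t; the fixed binary masks are buffers and are never updated.
opt = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for step in range(100):  # in practice: loop until the parameters converge
    x = torch.randn(8, 3, 32, 32)
    targets = [torch.randint(0, 10, (8,)) for _ in range(2)]
    # Multi-task parallel learning: all tasks contribute to one update.
    loss = sum(loss_fn(net(x, t), targets[t]) for t in range(2))
    opt.zero_grad()
    loss.backward()
    opt.step()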
Training the modulation module in the gradient back propagation manner of the MTL convolutional neural network loss function further includes:
assuming that task t conflicts with task t', the modulation module modulates the gradient directions coming from the different tasks; after the modulation module is introduced, the gradient update formula is

∇θ = M_t ⊙ g_t + M_t' ⊙ g_t'

where g_t denotes the back-propagated gradient of task t and g_t' denotes the back-propagated gradient of task t'.
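A toy numerical illustration of this modulated gradient follows. It assumes elementwise masking and scaling of the per-task back-propagated gradients, consistent with the reconstruction above; the masks, scales, and gradient values are made up for the demonstration.

import torch

torch.manual_seed(0)
g_t = torch.randn(8)   # back-propagated gradient of task t  (8 channels)
g_u = torch.randn(8)   # back-propagated gradient of task t'
B_t = torch.tensor([1., 1., 1., 1., 1., 0., 0., 0.])  # subnet mask of t
B_u = torch.tensor([0., 0., 0., 1., 1., 1., 1., 1.])  # subnet mask of t'
M_t = 0.8 * B_t        # scale vector of t (nonzero only on its subnet)
M_u = 0.6 * B_u        # scale vector of t'

raw = g_t + g_u              # unmodulated sum of the two task gradients
mod = M_t * g_t + M_u * g_u  # gradient after the modulation module
# Channels outside a task's subnet receive no gradient from that task, so
# interference is confined to the overlap of the two subnets (channels 3-4).
print(raw)
print(mod)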
Therefore, after the modulation module selects a corresponding subnet structure for each of the multiple tasks in the MTL convolutional neural network, different tasks are convolved within their own subnet structures, and the convolved task information is modulated by the scale vector, so that the modulated task information carries information specific to its task. This resolves the task conflict problem, allowing multi-task parallel learning to achieve the desired effect.
In the method, after the MTL convolutional neural network is obtained by training using the process shown in fig. 2, the method further includes:
processing received task information of a task using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer in the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing, then modulates the task information of the task with the set scale vector, and inputs the modulated task information into the next convolutional layer for the same processing until the shared shallow layer has been traversed.
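Inference with the trained network, under the same assumed classes as the sketches above, might look as follows: each incoming task is routed through its own subnet in every shared convolutional layer and modulated by the learned scale vector before reaching the task specific layer.

import torch

# Assumes the trained ModulatedMTLNet from the sketches above.
net = ModulatedMTLNet()  # in practice, load the trained weights here
net.eval()
image = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out_t0 = net(image, 0)  # routed through subnet B_0, scaled by M_0
    out_t1 = net(image, 1)  # routed through subnet B_1, scaled by M_1
print(out_t0.argmax(dim=1), out_t1.argmax(dim=1))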
Fig. 4 is a schematic diagram of a network for resolving task conflicts in an MTL convolutional neural network according to an embodiment of the present invention. The network includes a shared shallow layer and task specific layers; the shared shallow layer has multiple convolutional layers, and a modulation module is embedded in each convolutional layer to select, for an incoming task, the convolutional layer channels to be processed in that layer and to modulate the task information of the task, wherein
the shared shallow layer is used for the modulation module to determine a corresponding sub-network structure in the convolution layer of each layer aiming at different tasks; inputting task information of a task into the determined corresponding subnet structure for convolution processing, modulating the task information by a modulation module, and outputting the task information to a task specific layer of the task for processing, wherein the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and circularly executing the previous process for multiple times until the parameters of the MTL convolutional neural network are converged to obtain the well-trained MTL convolutional neural network.
In this structure, the network further: processes the received task information of a task in the shared shallow layer and the task specific layer using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing of the task information, then modulates the task information with the set scale vector, and inputs it into the next convolutional layer for the same processing until the task information has passed through the shared shallow layer.
Therefore, a modulation module is introduced into each convolutional layer of the shared shallow layer in the MTL convolutional neural network, so that convolutional processing is performed in the subnet structure corresponding to each task within the convolutional layer and the result is modulated. The conflict between weakly related tasks during parallel learning is thereby avoided, the learning effect is improved, and the finally trained MTL convolutional neural network processes tasks better.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method for solving task conflict in a multi-task learning MTL convolutional neural network is characterized in that a shared shallow layer of the MTL convolutional neural network comprises a modulation module obtained by training, the training process of the modulation module and the training of the shared shallow layer are carried out simultaneously by adopting a multi-task parallel learning method, and the method also comprises the following steps:
the modulation module determines corresponding subnet structures in a shared shallow layer aiming at different tasks, inputs task information of the tasks into the determined corresponding subnet structures for convolution processing, and outputs the task information to a task specific layer of the tasks for processing after modulation of the modulation module to obtain processing results; carrying out back propagation on the processing result by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and circularly executing the processes for multiple times until the parameters of the MTL convolutional neural network are converged to obtain the trained MTL convolutional neural network.
2. The method of claim 1, wherein including a trained modulation module in a shared shallow layer of an MTL convolutional neural network comprises:
embedding the trained modulation modules in the convolutional layers of each layer in a shared shallow layer of the MTL convolutional neural network. After the task information of the task enters the convolution layer of each layer, the modulation module selects the subnet structure corresponding to the task and modulates the task information of the task, and then the task information enters the convolution layer of the next layer for processing.
3. The method of claim 1, wherein the modulation module determining corresponding subnet structures in a shared shallow for different tasks comprises:
in each convolutional layer in the shared shallow layer, randomly selecting a plurality of convolutional layer channels from the multi-convolutional layer channels to be distributed to a task, and taking the selected plurality of convolutional layer channels as a sub-network structure aiming at the task;
each convolutional layer channel among the multiple convolutional layer channels in the shared shallow layer is identified by a binary mask BM; the BMs of all the convolutional layer channels are combined to form a subnet selector B; with C representing the number of convolutional layer channels, the subnet selector for task t is expressed as B_t = {BM_c}, BM_c = 0 or 1, c ∈ {1, 2, …, C}.
4. The method of claim 3, wherein the modulation by the modulation module comprises:
modulating the task by adopting a scale vector, wherein the scale vector represents the contribution degree of each convolutional layer channel to the task;
the dimension of the scale vector is the number C_new of convolutional layer channels in the subnet structure corresponding to the task, and the value of each element lies between 0 and 1; the scale vector M_t for task t is denoted M_t = {M_c}, c ∈ {1, 2, …, C_new}.
5. The method of claim 4, wherein the training process of the modulation module and the training of the shared shallow layer simultaneously adopt a multi-task parallel learning method, and comprises the following steps:
the scale vector M_t used by the modulation module to modulate a task is trained, through back propagation, together with the parameters of the MTL convolutional neural network, using the gradient back propagation of the MTL convolutional neural network loss function; the parameters of the MTL convolutional neural network and the task-specific scale vector M_t are adjusted gradually until the parameters of the MTL convolutional neural network converge;
the parameters of the MTL convolutional neural network F are denoted by θ, and their update depends on the gradient ∇θ, given by

∇θ = ∂L(f)/∂θ

where L is the loss function of the MTL convolutional neural network F, I is the input of F, and f = F(I | θ, M_t) is the output of F.
6. The method of claim 5, wherein training the modulation module in the gradient back propagation manner of the MTL convolutional neural network loss function further comprises:
if task t conflicts with task t', the modulation module modulates the gradient directions coming from the different tasks; after the modulation module is introduced, the gradient formula is

∇θ = M_t ⊙ g_t + M_t' ⊙ g_t'

where g_t denotes the back-propagated gradient of task t and g_t' denotes the back-propagated gradient of task t'.
7. The method of any of claims 1 to 6, further comprising:
processing received task information of a task using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer in the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing, then modulates the task information of the task with the set scale vector, and inputs the modulated task information into the next convolutional layer for the same processing until the shared shallow layer has been traversed.
8. A network for resolving task conflicts in a multi-task learning MTL convolutional neural network, comprising: a shared shallow layer and task specific layers, the shared shallow layer having multiple convolutional layers, with a modulation module embedded in each convolutional layer for selecting, for an incoming task, the convolutional layer channels processed in that layer and for modulating the task information of the task, wherein,
the shared shallow layer is used for the modulation module to determine a corresponding sub-network structure in the convolution layer of each layer aiming at different tasks; inputting task information of a task into the determined corresponding subnet structure for convolution processing, modulating the task information by a modulation module, and outputting the task information to a task specific layer of the task for processing, wherein the training process of the modulation module and the training of the shared shallow layer are simultaneously carried out by adopting a multi-task parallel learning method;
the task specific layer is used for receiving the task information modulated by the modulation module to obtain a processing result;
performing back propagation on the processing result in the task specific layer and the shared shallow layer by adopting a gradient back propagation mode of an MTL convolutional neural network loss function, and adjusting parameters of the MTL neural network;
and circularly executing the previous process for multiple times until the parameters of the MTL convolutional neural network are converged to obtain the well-trained MTL convolutional neural network.
9. The network of claim 8, further comprising:
and processing the received task information of a task in the shared shallow layer and the task specific layer using the trained MTL convolutional neural network, wherein, during processing, the modulation module embedded in each convolutional layer of the shared shallow layer in the MTL convolutional neural network selects the subnet structure corresponding to the task in the current convolutional layer for convolutional processing of the task information, then modulates the task information with the set scale vector, and inputs it into the next convolutional layer for the same processing until the task information has passed through the shared shallow layer.
CN202110155686.1A 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network Active CN112966811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110155686.1A CN112966811B (en) 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110155686.1A CN112966811B (en) 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network

Publications (2)

Publication Number Publication Date
CN112966811A true CN112966811A (en) 2021-06-15
CN112966811B CN112966811B (en) 2023-04-14

Family

ID=76273897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110155686.1A Active CN112966811B (en) 2021-02-04 2021-02-04 Method and network for solving task conflict in MTL convolutional neural network

Country Status (1)

Country Link
CN (1) CN112966811B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108701210A (en) * 2016-02-02 2018-10-23 Beijing SenseTime Technology Development Co., Ltd. Method and system for CNN Network adaptations and object online tracing
CN110930356A (en) * 2019-10-12 2020-03-27 Shanghai Jiao Tong University Industrial two-dimensional code reference-free quality evaluation system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JING WAN et al.: "Sparse Bayesian Multi-Task Learning for Predicting Cognitive Outcomes from Neuroimaging Measures in Alzheimer's Disease", Computer Society Conference on Computer Vision and Pattern Recognition *
张钰 et al.: "Multi-task learning", Chinese Journal of Computers (计算机学报) *

Also Published As

Publication number Publication date
CN112966811B (en) 2023-04-14

Similar Documents

Publication Publication Date Title
CN111462137B (en) Point cloud scene segmentation method based on knowledge distillation and semantic fusion
Wang et al. Transfer learning for semi-supervised automatic modulation classification in ZF-MIMO systems
Wu et al. Adaptive antisynchronization of multilayer reaction–diffusion neural networks
CN106709565A (en) Neural network optimization method and device
CN111224905B (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN111368909A (en) Vehicle logo identification method based on convolutional neural network depth features
TW202105258A (en) Depth-first convolution in deep neural networks
CN115767562B (en) Service function chain deployment method based on reinforcement learning joint coordinated multi-point transmission
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN113537365A (en) Multitask learning self-adaptive balancing method based on information entropy dynamic weighting
CN114204971B (en) Iterative aggregate beam forming design and user equipment selection method
CN112966811B (en) Method and network for solving task conflict in MTL convolutional neural network
KR102333730B1 (en) Apparatus And Method For Generating Learning Model
CN114743273A (en) Human skeleton behavior identification method and system based on multi-scale residual error map convolutional network
CN113726894A (en) Multi-vehicle application calculation unloading method and terminal based on deep reinforcement learning
CN117095217A (en) Multi-stage comparative knowledge distillation process
CN116339942A (en) Self-adaptive scheduling method of distributed training task based on reinforcement learning
CN113379593B (en) Image generation method, system and related equipment
CN112765892B (en) Intelligent switching judgment method in heterogeneous Internet of vehicles
CN108304924A (en) A kind of pipeline system pre-training method of depth confidence net
CN115220477A (en) Heterogeneous unmanned aerial vehicle alliance forming method based on quantum genetic algorithm
CN113592079A (en) Cooperative multi-agent communication method oriented to large-scale task space
CN114889644B (en) Unmanned automobile decision system and decision method in complex scene
CN110139208B (en) DAI-based method for predicting MA (maximum Address indication) position of management intelligent body in wireless sensor network cooperative communication
TW202030647A (en) System and method for reducing computational complexity of artificial neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant