CN111027428B - Training method and device for multitasking model and electronic equipment - Google Patents


Info

Publication number
CN111027428B
Authority
CN
China
Prior art keywords
task
loss value
type
model
loss
Prior art date
Legal status: Active
Application number
CN201911205175.5A
Other languages
Chinese (zh)
Other versions
CN111027428A
Inventor
刘思阳
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911205175.5A
Publication of CN111027428A
Application granted
Publication of CN111027428B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The embodiment of the invention provides a training method and apparatus for a multi-task model, and an electronic device. The method comprises the following steps: inputting sample data into a multi-task model being trained to obtain the predicted content of the sample data for each type of task; calculating a loss value of each type of task by using the predicted content of the sample data for each type of task; substituting the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model; ending training when the obtained loss value is a preset expected loss value; otherwise, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function with the adjusted weight parameters, and returning to the step of inputting the sample data into the multi-task model being trained. Through this scheme, the waste of computing resources can be reduced while the accuracy of the multi-task model is guaranteed.

Description

Training method and device for multitasking model and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular to a training method and apparatus for a multi-task model, and an electronic device.
Background
A multi-task model is a neural network model capable of simultaneously outputting processing results for multiple types of tasks. For example, in the field of image analysis, a single model may simultaneously output first-type feature information and second-type feature information of an image; such a model is a multi-task model, and the multiple tasks it targets comprise a recognition task for the first-type feature information and a recognition task for the second-type feature information. Moreover, since a multi-task model targets multiple types of tasks, each type of task has its own loss value when the model is trained.
In the prior art, when the loss value of a multi-task model is calculated during training, the loss values of the various types of tasks are weighted and summed using a set of preset weight parameters to obtain the loss value of the multi-task model; when this loss value indicates that the multi-task model has not converged, the network parameters of the model are adjusted and training continues. In addition, because the weight parameters used in training are set manually, multiple sets of weight parameters must be set by hand to ensure that the trained model is sufficiently accurate: the model is trained once with each set of weight parameters, and the model with the best convergence effect is selected from the resulting models as the final multi-task model.
It can be seen that this undoubtedly wastes computational resources, since a complete model training process is performed for each set of weight parameters.
Disclosure of Invention
An embodiment of the invention aims to provide a training method and apparatus for a multi-task model, and an electronic device, so as to reduce the waste of computing resources while guaranteeing the accuracy of the multi-task model.
The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a training method for a multitasking model, where the method includes:
inputting sample data into a multitasking model in training to obtain the predicted content of the sample data for each type of task;
calculating a loss value of each type of task by using the predicted content of the sample data for each type of task;
substituting the loss value of each type of task into a current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss calculation function is used for: summing, over all types of tasks, the loss value of each type of task weighted by its corresponding weight parameter together with the correction value of the weight parameter corresponding to that type of task, wherein the correction value of each weight parameter is negatively correlated with that weight parameter;
ending training when the obtained loss value is a preset expected loss value; otherwise, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and returning to the step of inputting the sample data into the multi-task model in training.
Optionally, the total loss calculation function includes:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
Optionally, $f(\alpha_i)$ comprises $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base.
Optionally, before substituting the loss value of each task into the current total loss calculation function to obtain the loss value of the multi-task model, the method further includes:
carrying out normalization processing on the loss value of each type of task;
the substituting of the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model comprises:
substituting the loss value of each type of task after normalization processing into a current total loss calculation function to obtain the loss value of the multi-task model.
Optionally, the normalizing of the loss value of each type of task includes:
for each type of task, calculating the maximum loss value that can be generated when that type of task generates a loss;
and, for each type of task, dividing the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
In a second aspect, an embodiment of the present invention further provides a training apparatus for a multitasking model, where the apparatus includes:
the prediction content determining module is used for inputting sample data into the multi-task model in training to obtain the prediction content of the sample data for each type of task;
the first loss value calculation module is used for calculating the loss value of each type of task by using the predicted content of the sample data for each type of task;
the second loss value calculation module is used for substituting the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss calculation function is used for: summing, over all types of tasks, the loss value of each type of task weighted by its corresponding weight parameter together with the correction value of the weight parameter corresponding to that type of task, wherein the correction value of each weight parameter is negatively correlated with that weight parameter;
the loss value analysis module is used for ending training when the obtained loss value is a preset expected loss value; otherwise, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and triggering the prediction content determination module.
Optionally, the total loss calculation function includes:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
Optionally, $f(\alpha_i)$ comprises $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base.
Optionally, the apparatus further comprises:
the normalization module is used for carrying out normalization processing on the loss value of each type of task before the second loss value calculation module substitutes the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model;
the second loss value calculation module is specifically configured to:
substituting the loss value of each type of task after normalization processing into a current total loss calculation function to obtain the loss value of the multi-task model.
Optionally, the normalization module is specifically configured to:
for each type of task, calculate the maximum loss value that can be generated when that type of task generates a loss, and divide the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the training method of the multi-task model provided by the embodiment of the invention when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the training method of the multi-task model provided by the embodiment of the present invention.
In the scheme provided by the embodiment of the invention, when the multi-task model is trained, the weight parameters of the various types of tasks and the network parameters of the multi-task model are both treated as automatically learnable parameters, so that the optimal weight parameters can be obtained by automatically adjusting the weight parameters within a single complete model training process; meanwhile, the total loss calculation function takes into account the correction value of the weight parameter corresponding to each type of task, so that the calculation of the loss value of the multi-task model becomes a dynamic optimization process, which provides a basis for the loss value of the multi-task model to converge to the expected loss value. Therefore, this scheme can reduce the waste of computing resources while guaranteeing the accuracy of the multi-task model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of a training method of a multi-task model according to an embodiment of the present invention;
FIG. 2 is another flow chart of a training method of a multi-task model according to an embodiment of the present invention;
FIG. 3 is another flowchart of a training method of a multi-task model according to an embodiment of the present invention;
FIG. 4 (a) is a schematic diagram of a training process of a multi-task model in an embodiment of the present invention when two types of tasks are targeted by the multi-task model;
FIG. 4 (b) is a schematic diagram of a training process of the multi-task model in the embodiment of the present invention when the tasks targeted by the multi-task model are a heat map task and a displacement map task;
FIG. 5 is a schematic structural diagram of a training device for a multi-task model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
In order to achieve the purpose of reducing the waste of computing resources while guaranteeing the accuracy of a multi-task model, the embodiment of the invention provides a training method and device of the multi-task model and electronic equipment.
The execution subject of the training method of the multi-task model provided by the embodiment of the invention can be a training device of the multi-task model. The training device of the multi-task model can be applied to electronic equipment, and the electronic equipment can be a server or terminal equipment.
In addition, any neural network model capable of simultaneously outputting the processing results of multiple types of tasks can serve as the multi-task model of the embodiment of the invention and be trained with the training method provided by the embodiment of the invention, thereby reducing the waste of computing resources while guaranteeing the accuracy of the multi-task model.
Specifically, for the field of image analysis, a neural network model capable of simultaneously outputting multiple types of characteristic information of images can be used as a multi-task model according to the embodiment of the invention; in the text analysis field, a neural network model capable of simultaneously outputting multiple types of characteristic information of a text can also be used as a multi-task model according to the embodiment of the invention. By way of example, the multi-class feature information of the image may include: human body key point information and portrait mask; or the first type of characteristic information of the human body key points in the image and the second type of characteristic information of the human body key points in the image, and the like. By way of example, the multi-class feature information of the text may include: emotion category information of text and keywords of text, and so on.
The following describes a training method of a multi-task model according to an embodiment of the present invention with reference to the accompanying drawings.
As shown in fig. 1, the training method of the multitasking model provided by the embodiment of the invention may include the following steps:
s101, inputting sample data into a multitasking model in training to obtain the predicted content of the sample data aiming at each type of task;
s102, calculating a loss value of each type of task by using the predicted content of the sample data aiming at each type of task;
s103, substituting the loss value of each task into the current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss value calculation function is for: summing the weighted loss of the loss value of each type of task and the correction value of the weight parameter corresponding to each type of task, wherein the correction value of each weight parameter and the weight parameter are in a negative correlation;
s104, judging whether the obtained loss value is a preset expected loss value, if so, ending training to obtain a model after training is completed; otherwise, S105 is performed;
s105, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and returning to S101.
The sample data is the data used to train the multi-task model, and it differs according to the tasks targeted by the multi-task model. For example, when the tasks targeted by the multi-task model include recognition tasks for multiple types of feature information of an image, the sample data is a sample image; when the tasks targeted by the multi-task model include recognition tasks for multiple types of feature information of a text, the sample data is a sample text.
In addition, the predicted content of the sample data for each type of task is the processing result of the sample data for each type of task, which is obtained by the processing of the multitask model after the sample data is input into the multitask model. The specific model structure of the multitasking model may be set according to the actual situation, and the embodiment of the present invention is not limited to this.
It can be understood that the idea of calculating the loss value of each type of task using the predicted content of the sample data may specifically be: calculating the loss value of each type of task according to the difference between the predicted content of the sample data for that type of task and the annotated content of the sample data for that type of task. Any specific implementation capable of calculating the loss value of each type of task is applicable to the embodiment of the invention.
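For illustration only, difference-based per-task losses might take forms such as the following; the specific forms (mean-square error, cross-entropy) are assumptions, since the embodiment does not prescribe them:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-task losses built from the difference between the predicted
# content and the annotated content; the embodiment does not prescribe these forms.
def regression_task_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    return F.mse_loss(pred, label)          # e.g., a feature-map regression task

def classification_task_loss(logits: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    return F.cross_entropy(logits, label)   # e.g., a category recognition task
```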
In addition, in this embodiment, to avoid the waste of computing resources caused by manually setting the weight parameters multiple times, the weight parameter corresponding to each type of task is treated as a self-learned parameter during model training; that is, within a single complete training process, the weight parameters corresponding to the various types of tasks are no longer fixed values but updatable values. To ensure that the multi-task model converges effectively at high accuracy while the weight parameters serve as self-learned parameters, this embodiment sets a new total loss calculation function. The new total loss calculation function considers not only the weighted sum of the loss values of the various types of tasks but also the correction value of the weight parameter corresponding to each type of task, where each correction value is negatively correlated with its weight parameter; the calculation of the loss value of the multi-task model thus becomes a dynamic optimization process, which provides a basis for the loss value of the multi-task model to converge to the expected loss value.
The negative correlation between a correction value and its weight parameter specifically means that when the weight parameter increases, the correction value of the weight parameter decreases, and when the weight parameter decreases, the correction value of the weight parameter increases. In other words, when this relation is characterized as a function, the function that yields the correction value of a weight parameter takes the weight parameter as its independent variable.
In the total loss calculation function, the weight parameter corresponding to each type of task is manually set to an initial value, so that each weight parameter in the total loss calculation function is an initial value the first time the loss value of the multi-task model is calculated, and a self-learned value each subsequent time.
In addition, the training device of the multi-task model can adjust the network parameters of the multi-task model and the weight parameters corresponding to each type of task in various ways, for example by gradient descent or back-propagation, although the adjustment is of course not limited to these. Any method capable of adjusting network parameters during model training can be used to adjust the network parameters of the multi-task model and the weight parameters corresponding to each type of task in the embodiment of the invention.
The network parameters of the multi-task model are the parameters of the model itself that need to be learned through the training process; for example, they may include convolution kernel weights, fully-connected layer weights, and so on.
In the scheme provided by the embodiment of the invention, when the multi-task model is trained, the weight parameters of the various types of tasks and the network parameters of the multi-task model are both treated as automatically learnable parameters, so that the optimal weight parameters can be obtained by automatically adjusting the weight parameters within a single complete model training process; meanwhile, the total loss calculation function takes into account the correction value of the weight parameter corresponding to each type of task, so that the calculation of the loss value of the multi-task model becomes a dynamic optimization process, which provides a basis for the loss value of the multi-task model to converge to the expected loss value. Therefore, this scheme can reduce the waste of computing resources while guaranteeing the accuracy of the multi-task model.
Optionally, in one implementation, the total loss calculation function includes:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
It is emphasized that, on the premise that $f(\alpha_i)$ is a function whose value is negatively correlated with $\alpha_i$, the embodiment of the invention does not limit the specific functional form of $f(\alpha_i)$. Illustratively, $f(\alpha_i)$ may comprise $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base; at this time,
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+\log_r\frac{1}{\alpha_i}\right)$$
the specific value of r can be set according to practical situations, for example: r=10, alternatively r=5, and so on.
Optionally, to reduce the calculation amount, before S103 in the foregoing embodiment, that is, before substituting the loss value of each task into the current total loss calculation function to obtain the loss value of the multi-task model, as shown in fig. 2, the method further includes:
s1030, carrying out normalization processing on the loss value of each type of task;
accordingly, the step of S103 may include:
s1031, substituting the loss value of each type of task after normalization processing into the current total loss calculation function to obtain the loss value of the multi-task model.
For the convenience of understanding the solution, fig. 4 (a) shows a schematic diagram of the training process of the multitasking model in the embodiment of the present invention when the tasks targeted by the multitasking model are two types of tasks.
The normalization denominator used for normalizing the loss value of each type of task may be a value set empirically in advance, or may be a value determined from an actual loss.
When the normalization denominator is determined from the actual loss, the normalization processing of the loss value of each type of task may include:
for each type of task, calculating the maximum loss value that can be generated when that type of task generates a loss, and dividing the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
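A minimal sketch of this normalization, assuming the per-task maximum loss values have already been computed:

```python
def normalize_losses(losses, max_losses):
    """Divide each task's loss by the maximum loss that task could produce."""
    return [loss / max_loss for loss, max_loss in zip(losses, max_losses)]

# Example: a heat map loss of 120.0 (maximum 400.0) and a displacement loss of
# 6.0 (maximum 20.0) both normalize to 0.3, making them directly comparable.
print(normalize_losses([120.0, 6.0], [400.0, 20.0]))  # -> [0.3, 0.3]
```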
Also, in the scheme where there is a normalization process, the total loss calculation function may be, by way of example:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i \hat{L}_i+\log_r\frac{1}{\alpha_i}\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $\hat{L}_i$ is the normalized loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, and r is a preset base.
The following describes the training method of the multi-task model provided by the embodiment of the invention by taking as an example a multi-task model whose tasks include a task of identifying heat maps of human body key points in an image and a task of identifying displacement maps of the human body key points in the image. For convenience of description, the task of identifying the heat maps of the human body key points is referred to simply as the heat map task, and the task of identifying the displacement maps of the human body key points is referred to simply as the displacement map task.
The heat map of a human body key point is a probability distribution map of the possible positions of that key point. The displacement maps of a human body key point comprise a displacement map in the x-axis direction and a displacement map in the y-axis direction: each point in the x-axis displacement map represents the offset distance, in the x-axis direction, of that point's position relative to the position of the target point, and each point in the y-axis displacement map represents the offset distance, in the y-axis direction, of that point's position relative to the position of the target point, wherein the target point is the mapping point of the human body key point in the displacement map.
The position coordinates of the human body key points can be determined from the heat maps and displacement maps of the human body key points using a predetermined calculation mode. Since determining the position coordinates of the human body key points from the heat maps and displacement maps is not the inventive point of the present invention, the embodiment of the invention does not limit the predetermined calculation mode.
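One possible predetermined calculation mode, given here purely as an assumption for illustration, is to take the most probable position from the heat map and refine it with the offsets stored in the displacement maps:

```python
import numpy as np

def decode_keypoint(heat, disp_x, disp_y):
    """Locate one key point from its heat map plus x/y displacement maps
    (all three are 2-D arrays of the same shape)."""
    y, x = np.unravel_index(np.argmax(heat), heat.shape)  # coarse position
    return x + disp_x[y, x], y + disp_y[y, x]             # refined (x, y)
```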
As shown in fig. 3, the training method of the multitasking model provided by the embodiment of the invention may include the following steps:
s301, inputting a sample image into a multitask model in training to obtain a predicted heat map and a predicted displacement map of each human body key point in the sample image;
each human body key point in the sample image corresponds to a predicted heat map, and simultaneously corresponds to a predicted displacement map in the x-axis direction and a predicted displacement map in the y-axis direction. Moreover, the specific structure of the multitasking model is not limited in the embodiment of the present invention.
S302, calculating a loss value of a heat map task and a loss value of a displacement map task by using the obtained predicted heat maps and the predicted displacement maps;
s303, carrying out normalization processing on the loss value of the heat map task to obtain a normalized loss value of the heat map task, and carrying out normalization processing on the loss value of the displacement map task to obtain a normalized loss value of the displacement map task;
s304, substituting the normalized loss value of the heat map task and the normalized loss value of the displacement map task into a current total loss calculation function to obtain the loss value of the multi-task model;
s305, judging whether the obtained loss value is a preset expected loss value, if so, ending training to obtain a trained multi-task model; otherwise, executing S306;
s306, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and returning to S301.
For step S302, the loss value of the heat map task and the loss value of the displacement map task are calculated using the obtained predicted heat maps and predicted displacement maps, specifically:
for each predicted heat map, calculating a loss value of the predicted heat map based on the difference between the predicted heat map and the true-value heat map of the same human body key point;
for each predicted displacement map in the x-axis direction, calculating a loss value of that predicted displacement map based on the difference between it and the true-value displacement map in the x-axis direction of the same human body key point;
for each predicted displacement map in the y-axis direction, calculating a loss value of that predicted displacement map based on the difference between it and the true-value displacement map in the y-axis direction of the same human body key point;
summing the loss values of each predicted heat map to obtain the loss value of the heat map task;
and summing the loss value of the predicted displacement map in each x-axis direction and the loss value of the predicted displacement map in each y-axis direction to obtain the loss value of the displacement map task.
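A hedged sketch of this computation in PyTorch follows; a per-pixel squared error summed over each map is an assumed elementwise loss, since the embodiment does not name one:

```python
import torch.nn.functional as F

def heatmap_and_displacement_losses(pred_heats, true_heats, pred_disps, true_disps):
    """pred_heats/true_heats: (n, h, w) tensors; pred_disps/true_disps: (2n, h, w)
    tensors holding the x- and y-direction displacement maps. Summing a squared
    error over every map equals summing the per-map losses described above."""
    heat_loss = F.mse_loss(pred_heats, true_heats, reduction="sum")
    disp_loss = F.mse_loss(pred_disps, true_disps, reduction="sum")
    return heat_loss, disp_loss
```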
For step S303, by way of example, normalizing the loss value of the heat map task to obtain the normalized loss value of the heat map task may include:
calculating the maximum loss value which can be generated when the heat map task generates loss;
and dividing the loss value of the heat map task by the maximum loss value of the heat map task to obtain the loss value of the heat map task after normalization processing.
The method for calculating the maximum loss value which can be generated when the heat map task generates loss can comprise the following steps:
calculating a maximum loss value which can be generated when the heat map task generates loss by using a first calculation formula;
the first calculation formula is as follows:
the maximum loss value which can be generated when the heat map task generates loss is represented by n, the number of the predicted heat maps and w h ×h h To predict the size of the heat map, +.>Maximum value of central point in n predicted heat maps, +.>Is the minimum of the edge points in the n predicted heat maps.
The central point of any predicted heat map is the pixel point with the maximum value in that heat map; the edge points of any predicted heat map are the pixel points other than the pixel point with the maximum value.
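A hedged sketch of the maximum heat map loss follows; it combines the quantities named above (n, the map size, the central-point maximum, and the edge-point minimum), but the worst-case squared-error combination is an assumption, not the verbatim first calculation formula:

```python
import torch

def max_heatmap_loss(pred_heats: torch.Tensor) -> float:
    """pred_heats: (n, h, w) stack of predicted heat maps."""
    n, h, w = pred_heats.shape
    p_c_max = pred_heats.amax(dim=(1, 2)).max().item()  # largest central-point value
    p_e_min = pred_heats.amin(dim=(1, 2)).min().item()  # smallest edge-point value
    # Assume every one of the n*h*w pixels can be wrong by (p_c_max - p_e_min).
    return n * h * w * (p_c_max - p_e_min) ** 2
```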
For step S303, by way of example, normalizing the loss value of the displacement map task to obtain the normalized loss value of the displacement map task may include:
calculating the maximum loss value which can be generated when the displacement map task generates loss;
and dividing the loss value of the displacement map task by the maximum loss value of the displacement map task to obtain the loss value of the displacement map task after normalization processing.
The method for calculating the maximum loss value which can be generated when the displacement map task generates loss can include:
for the maximum loss value that the displacement map task can produce when producing losses, +.>Maximum loss value in x-axis direction which can be generated when loss is generated for displacement map task,/>The maximum loss value in the y-axis direction which can be generated when the loss is generated for the displacement map task; n is the number of predicted displacement graphs in each direction, w h ×h h To predict the size of the displacement map.
For step S304, the total loss calculation function may be, for example:
$$L_{total}=\alpha_1 L_h+\alpha_2 L_o+\log_r\frac{1}{\alpha_1}+\log_r\frac{1}{\alpha_2}$$
wherein $L_{total}$ is the loss value of the multi-task model, $\alpha_1$ is the weight parameter corresponding to the heat map task, $L_h$ is the normalized loss value of the heat map task, $\alpha_2$ is the weight parameter corresponding to the displacement map task, and $L_o$ is the normalized loss value of the displacement map task.
For steps S305 and S306, when the multi-task model has not converged, not only the network parameters of the multi-task model but also the weight parameter corresponding to the heat map task and the weight parameter corresponding to the displacement map task are adjusted, and the total loss calculation function is reconstructed with the adjusted weight parameters. In the total loss calculation function, the weight parameter corresponding to the heat map task and the weight parameter corresponding to the displacement map task are manually set to initial values, so that each weight parameter in the total loss calculation function is an initial value the first time the loss value of the multi-task model is calculated, and a self-learned value each subsequent time.
For the convenience of understanding the solution, fig. 4 (b) shows a schematic diagram of the training process of the multitask model in the embodiment of the present invention when the tasks targeted by the multitask model are a heat map task and a displacement map task.
In the scheme provided by the embodiment of the invention, when the multi-task model is trained, the weight parameters of the heat map task and the displacement map task and the network parameters of the multi-task model are treated as automatically learnable parameters, so that the optimal weight parameters can be obtained by automatically adjusting the weight parameters within a single complete model training process; meanwhile, the total loss calculation function takes into account the correction value of the weight parameter corresponding to each type of task, so that the calculation of the loss value of the multi-task model becomes a dynamic optimization process, which provides a basis for the loss value of the multi-task model to converge to the expected loss value. Therefore, this scheme can reduce the waste of computing resources while guaranteeing the accuracy of the multi-task model.
Corresponding to the embodiment of the method, the embodiment of the invention also provides a training device of the multi-task model. As shown in fig. 5, the apparatus may include:
the predicted content determining module 510 is configured to input sample data into the multitasking model in training, and obtain predicted content of the sample data for each task;
a first loss value calculation module 520, configured to calculate a loss value of each type of task using the predicted content of the sample data for each type of task;
a second loss value calculation module 530, configured to substitute the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss calculation function is used for: summing, over all types of tasks, the loss value of each type of task weighted by its corresponding weight parameter together with the correction value of the weight parameter corresponding to that type of task, wherein the correction value of each weight parameter is negatively correlated with that weight parameter;
the loss value analysis module 540 is configured to end training when the obtained loss value is a preset expected loss value; otherwise, the network parameters of the multi-task model and the weight parameters corresponding to each type of task are adjusted, the total loss calculation function is reconstructed by using the adjusted weight parameters, and the prediction content determination module 510 is triggered.
In the scheme provided by the embodiment of the invention, when the multi-task model is trained, the weight parameters of the various types of tasks and the network parameters of the multi-task model are both treated as automatically learnable parameters, so that the optimal weight parameters can be obtained by automatically adjusting the weight parameters within a single complete model training process; meanwhile, the total loss calculation function takes into account the correction value of the weight parameter corresponding to each type of task, so that the calculation of the loss value of the multi-task model becomes a dynamic optimization process, which provides a basis for the loss value of the multi-task model to converge to the expected loss value. Therefore, this scheme can reduce the waste of computing resources while guaranteeing the accuracy of the multi-task model.
Optionally, the total loss calculation function includes:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
Optionally, $f(\alpha_i)$ comprises $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base.
Optionally, the apparatus further comprises:
the normalization module is used for carrying out normalization processing on the loss value of each type of task before the second loss value calculation module substitutes the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model;
the second loss value calculation module is specifically configured to:
substituting the loss value of each type of task after normalization processing into a current total loss calculation function to obtain the loss value of the multi-task model.
Optionally, the normalization module is specifically configured to:
for each type of task, calculate the maximum loss value that can be generated when that type of task generates a loss, and divide the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, comprising a processor 601, a communication interface 602, a memory 603, and a communication bus 604, wherein the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604.
The memory 603 is used for storing a computer program.
The processor 601 is used for implementing the steps of the training method of the multi-task model provided by the embodiment of the invention when executing the program stored in the memory 603.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processing, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the method for training a multi-tasking model according to any of the above embodiments.
In yet another embodiment of the present invention, a computer program product comprising instructions, which when run on a computer, causes the computer to perform the method of training a multi-tasking model as described in any of the above embodiments is also provided.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, device, storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and references to the parts of the description of the method embodiments are only required.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (12)

1. A training method of a multitasking model, characterized in that the method is applied to an electronic device; the method comprises the following steps:
inputting sample data into a multitasking model in training to obtain the predicted content of the sample data for each type of task; when the task aimed by the multi-task model is an identification task of multi-class feature information of the image, the sample data is a sample image; or when the task aimed by the multi-task model is the recognition task of the multi-class characteristic information of the text, the sample data is the sample text;
calculating a loss value of each type of task by using the predicted content of the sample data for each type of task; wherein calculating a loss value for each type of task using the predicted content of the sample data for each type of task comprises: calculating a loss value of each type of task according to the difference between the predicted content of the sample data for each type of task and the labeling content of the sample data for each type of task; when the task aimed by the multi-task model is the recognition task of the multi-class feature information of the image, the labeling content aimed at each class of task of the sample data is the labeling content aimed at each class of task of the sample image; or when the task aimed by the multi-task model is an identification task of multi-class feature information of a text, the labeling content aimed at each class of task of the sample data is the labeling content aimed at each class of task of the sample text;
substituting the loss value of each type of task into a current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss calculation function is used to: sum, over all types of tasks, the loss value of each type of task weighted by its corresponding weight parameter together with the correction value of the weight parameter corresponding to that type of task, wherein the correction value of each weight parameter is negatively correlated with that weight parameter;
ending training when the obtained loss value is a preset expected loss value; otherwise, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and returning to the step of inputting the sample data into the multi-task model in training.
2. The method of claim 1, wherein the total loss calculation function comprises:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
3. The method according to claim 2, wherein $f(\alpha_i)$ comprises $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base.
4. A method according to any one of claims 1-3, wherein before substituting the loss value for each class of task into the current total loss calculation function to obtain the loss value for the multi-task model, the method further comprises:
carrying out normalization processing on the loss value of each type of task;
and wherein the substituting of the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model comprises:
substituting the loss value of each type of task after normalization processing into a current total loss calculation function to obtain the loss value of the multi-task model.
5. The method of claim 4, wherein the normalizing of the loss value of each type of task comprises:
for each type of task, calculating the maximum loss value that can be generated when that type of task generates a loss;
and, for each type of task, dividing the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
6. A training device of a multitasking model, characterized in that the device is applied to an electronic device; the device comprises:
the prediction content determining module is used for inputting sample data into the multi-task model in training to obtain the prediction content of the sample data for each type of task; when the task aimed by the multi-task model is an identification task of multi-class feature information of the image, the sample data is a sample image; or when the task aimed by the multi-task model is the recognition task of the multi-class characteristic information of the text, the sample data is the sample text;
the first loss value calculation module is used for calculating the loss value of each type of task by using the predicted content of the sample data for each type of task; the first loss value calculation module is specifically configured to: calculating a loss value of each type of task according to the difference between the predicted content of the sample data for each type of task and the labeling content of the sample data for each type of task; when the task aimed by the multi-task model is the recognition task of the multi-class feature information of the image, the labeling content aimed at each class of task of the sample data is the labeling content aimed at each class of task of the sample image; or when the task aimed by the multi-task model is an identification task of multi-class feature information of a text, the labeling content aimed at each class of task of the sample data is the labeling content aimed at each class of task of the sample text;
the second loss value calculation module is used for substituting the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model; wherein the total loss calculation function is used to: sum, over all types of tasks, the loss value of each type of task weighted by its corresponding weight parameter together with the correction value of the weight parameter corresponding to that type of task, wherein the correction value of each weight parameter is negatively correlated with that weight parameter;
the loss value analysis module is used for ending training when the obtained loss value is a preset expected loss value; otherwise, adjusting the network parameters of the multi-task model and the weight parameters corresponding to each type of task, reconstructing the total loss calculation function by using the adjusted weight parameters, and triggering the prediction content determination module.
7. The apparatus of claim 6, wherein the total loss calculation function comprises:
$$L_{total}=\sum_{i=1}^{n}\left(\alpha_i L_i+f(\alpha_i)\right)$$
wherein $L_{total}$ is the loss value of the multi-task model, $L_i$ is the loss value of task i for which the multi-task model is intended, $\alpha_i$ is the weight parameter corresponding to the task i, n is the total number of tasks for which the multi-task model is intended, $f(\alpha_i)$ is a function for obtaining the correction value of $\alpha_i$, and the function value of $f(\alpha_i)$ is negatively correlated with $\alpha_i$.
8. The apparatus of claim 7, wherein $f(\alpha_i)$ comprises $f(\alpha_i)=\log_r\left(\frac{1}{\alpha_i}\right)$, wherein r is a preset base.
9. The apparatus according to any one of claims 6-8, further comprising:
the normalization module is used for carrying out normalization processing on the loss value of each type of task before the second loss value calculation module substitutes the loss value of each type of task into the current total loss calculation function to obtain the loss value of the multi-task model;
the second loss value calculation module is specifically configured to:
substituting the loss value of each type of task after normalization processing into a current total loss calculation function to obtain the loss value of the multi-task model.
10. The apparatus according to claim 9, wherein the normalization module is specifically configured to:
for each type of task, calculate the maximum loss value that can be generated when that type of task generates a loss, and divide the loss value of that type of task by the maximum loss value of that type of task to obtain the normalized loss value of that type of task.
11. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-5 when executing a program stored on a memory.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
CN201911205175.5A 2019-11-29 2019-11-29 Training method and device for multitasking model and electronic equipment Active CN111027428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911205175.5A CN111027428B (en) 2019-11-29 2019-11-29 Training method and device for multitasking model and electronic equipment


Publications (2)

Publication Number Publication Date
CN111027428A 2020-04-17
CN111027428B 2024-03-08

Family

ID=70207377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911205175.5A Active CN111027428B (en) 2019-11-29 2019-11-29 Training method and device for multitasking model and electronic equipment

Country Status (1)

Country Link
CN (1) CN111027428B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541124B (en) * 2020-12-24 2024-01-12 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for generating a multitasking model
CN113516239A (en) * 2021-04-16 2021-10-19 Oppo广东移动通信有限公司 Model training method and device, storage medium and electronic equipment
CN113435528B (en) * 2021-07-06 2024-02-02 北京有竹居网络技术有限公司 Method, device, readable medium and electronic equipment for classifying objects
CN114913371A (en) * 2022-05-10 2022-08-16 平安科技(深圳)有限公司 Multitask learning model training method and device, electronic equipment and storage medium
CN114882464B (en) * 2022-05-31 2023-07-18 小米汽车科技有限公司 Multi-task model training method, multi-task processing method, device and vehicle
CN115081630A (en) * 2022-08-24 2022-09-20 北京百度网讯科技有限公司 Training method of multi-task model, information recommendation method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503669A * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 Multi-task deep learning network-based training and recognition method and system
CN109086660A * 2018-06-14 2018-12-25 深圳市博威创盛科技有限公司 Training method, device, and storage medium for a multi-task learning deep network


Also Published As

Publication number Publication date
CN111027428A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111027428B (en) Training method and device for multitasking model and electronic equipment
CN110837410B (en) Task scheduling method and device, electronic equipment and computer readable storage medium
WO2021238262A1 (en) Vehicle recognition method and apparatus, device, and storage medium
CN109598414B (en) Risk assessment model training, risk assessment method and device and electronic equipment
WO2022027913A1 (en) Target detection model generating method and apparatus, device and storage medium
CN111027412B (en) Human body key point identification method and device and electronic equipment
US20220129740A1 (en) Convolutional neural networks with soft kernel selection
US20220383036A1 (en) Clustering data using neural networks based on normalized cuts
CN111967608A (en) Data processing method, device, equipment and storage medium
CN110751175A (en) Method and device for optimizing loss function, computer equipment and storage medium
US20220004849A1 (en) Image processing neural networks with dynamic filter activation
CN116822651A (en) Large model parameter fine adjustment method, device, equipment and medium based on incremental learning
CN110851987A (en) Method, apparatus and storage medium for predicting calculated duration based on acceleration ratio
CN112434717B (en) Model training method and device
CN111340245A (en) Model training method and system
CN112085152A (en) System for preventing countermeasure samples against ML and AI models
CN117114075B (en) Neural network model quantization method, device, equipment and medium
CN117057443A (en) Prompt learning method of visual language model and electronic equipment
CN117014507A (en) Training method of task unloading model, task unloading method and device
CN114170481B (en) Method, apparatus, storage medium, and program product for image processing
CN110443746B (en) Picture processing method and device based on generation countermeasure network and electronic equipment
CN111814051B (en) Resource type determining method and device
CN113015193B (en) Method and device for generating base station configuration file, server and storage medium
CN111582456B (en) Method, apparatus, device and medium for generating network model information
CN112926608A (en) Image classification method and device, electronic equipment and storage medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant