CN111461156A - Multitask training method and device - Google Patents

Multitask training method and device

Info

Publication number
CN111461156A
Authority
CN
China
Prior art keywords
trained
task
label
model
obtaining
Prior art date
Legal status
Pending
Application number
CN201910108378.6A
Other languages
Chinese (zh)
Inventor
夏雄尉
谢泽华
周泽南
苏雪峰
许静芳
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Sogou Hangzhou Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd, Sogou Hangzhou Intelligent Technology Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201910108378.6A
Publication of CN111461156A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. The method addresses the difficulty of ensuring convergence consistency among multiple tasks when training a multi-task model, and improves the convergence of the model to be trained.

Description

Multitask training method and device
Technical Field
The invention relates to the technical field of computer application and intelligent recognition, in particular to a multitask training method and a multitask training device.
Background
In multi-task joint training, the training set is usually required to include label information corresponding to each task. In practice, however, the training set is assembled from different data sets, and the label information contained in different data sets often differs. A problem often encountered is therefore that a data set contains labels corresponding to only one or some of the tasks; in this case, data containing different label information is usually trained in separate stages.
However, stage-by-stage training on different data sets makes it difficult to ensure convergence consistency between tasks, and the convergence of the model during training cannot be guaranteed.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a multitask training method and device that solve the prior-art problem that convergence consistency between multiple tasks is difficult to ensure when training a multi-task model, and that improve the convergence of the model to be trained.
In a first aspect, the present application provides the following technical solutions through an embodiment:
A method of multitask training comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Preferably, the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples includes:
assigning a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generating the label mask of each task to be trained according to the task marks.
Preferably, the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples includes:
for the same task to be trained, applying a first mark to the training samples in the data set that contain a task label of the current task to be trained; applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained; and obtaining the label mask corresponding to the task to be trained according to the first mark and the second mark.
Preferably, the step of obtaining the gradient value of each task to be trained according to the data set and the label mask corresponding to each task to be trained includes: for the same task to be trained, screening target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtaining, from the target samples, the loss corresponding to the current task to be trained generated in the model training process; and obtaining, from that loss, the gradient value of the current task to be trained generated in the model training process.
Preferably, the representation of the label mask is in the form of a binary vector.
Preferably, the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically includes:
accumulating the gradient values of the N tasks to be trained to obtain a target gradient value; and obtaining the update parameters of the model to be trained according to the target gradient value.
Preferably, the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically includes:
assigning a preset weight to the gradient value of each of the N tasks to be trained; performing weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtaining the update parameters of the model to be trained according to the target gradient value.
In a second aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment:
A multitask training device comprising:
a data set acquisition module, used for acquiring a data set of a model to be trained, wherein the model to be trained comprises N tasks to be trained, the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; a label mask acquisition module, used for obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples containing a task label of that task; a gradient value acquisition module, used for obtaining the gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and an update parameter acquisition module, used for obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Preferably, the label mask acquisition module is further configured to:
assign a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generate the label mask of each task to be trained according to the task marks.
Preferably, the label mask acquisition module is further configured to:
for the same task to be trained, apply a first mark to the training samples in the data set that contain a task label of the current task to be trained; apply a second mark to the training samples that do not contain a task label of the current task to be trained; and obtain the label mask corresponding to the task to be trained according to the first mark and the second mark.
Preferably, the gradient value acquisition module is further configured to:
for the same task to be trained, screen target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtain, from the target samples, the loss corresponding to the current task generated in the model training process; and obtain, from that loss, the gradient value of the current task generated in the model training process.
Preferably, the representation of the label mask is in the form of a binary vector.
Preferably, the update parameter acquisition module is further configured to:
accumulate the gradient values of the N tasks to be trained to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
Preferably, the update parameter acquisition module is further configured to:
assign a preset weight to the gradient value of each of the N tasks to be trained; perform weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
In a third aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment of the present application:
A multitask training device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In a fourth aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment of the present application:
a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
The invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. With this method, the training samples in the data set do not need to be screened out and discarded. The update parameters obtained during training integrate the gradient values of multiple tasks to be trained, so that the multiple tasks are optimized within the same iteration; the information of the task labels corresponding to the multiple tasks is combined, and no single task obviously dominates, so that the multiple tasks can all be optimized well at the same time, the consistency of multi-task training is ensured, and the convergence of the model to be trained is improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a multitask training method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of step S20 in FIG. 1;
FIG. 3 is a flowchart of another embodiment of step S20 in FIG. 1;
FIG. 4 is a functional diagram of a label mask of the multitask training method according to the first embodiment of the present invention;
FIG. 5 is a functional block diagram of a multitasking training device according to a second embodiment of the present invention;
FIG. 6 is a block diagram of an exemplary multitask training device according to a third embodiment of the present invention;
fig. 7 is a block diagram of an exemplary server according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
For purposes of explanation and understanding, the following detailed description describes embodiments of the invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numbers indicate like or similar elements or elements having like or similar functionality throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
First embodiment
Referring to fig. 1, this embodiment provides a multitask training method. Fig. 1 shows the flow of the method, which is described in detail below with reference to the drawings. The multitask training method comprises the following steps:
step S10: acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N, M is a positive integer greater than or equal to 2.
Step S20: and obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training sample containing the task label of the task to be trained.
Step S30: and obtaining a gradient value generated by each task to be trained in the model training process according to the label mask corresponding to the data set and each task to be trained.
Step S40: and obtaining the updating parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In step S10, the model to be trained may include a plurality of tasks to be trained; in this embodiment, N tasks to be trained, where N is a positive integer greater than or equal to 2. For example, a model to be trained for predicting the color and style of clothes may include a color recognition task and a style recognition task, i.e., two tasks to be trained.
The data set may be generated by mixing a plurality of different data sets. The training samples in the data set all carry task labels; the data set contains M training samples, where M is a positive integer greater than or equal to 2. A training sample containing a given task label can be used to train the task to be trained corresponding to that task label.
In step S20, a label mask corresponding to each task to be trained is obtained according to the task to be trained and the task labels of the training samples, where the label mask of each task to be trained identifies the training samples containing a task label of that task.
Specifically, each training sample in the data set may be marked separately for each task to be trained; the marking records whether the training sample contains a task label corresponding to the current task to be trained, and the vector or combination formed by the marks of all training samples for the current task to be trained may then be used as its label mask.
In this embodiment, two specific ways of obtaining the label mask are provided, as follows:
first, please refer to fig. 2:
step S21 a: and according to task labels carried by the training samples, performing task labeling on each training sample, wherein each class of task label of the training samples corresponds to a unique task label.
In step S21a, task labels that are not included in the training sample (tasks included in the N tasks, but corresponding task labels are not included in the training sample) can be uniformly labeled for uniform recognition. For example:
TABLE 1

                      Color label   Style label   Pattern label
  Training sample 1       01            10             11
  Training sample 2       00            10             00
  Training sample 3       01            00             11
  Training sample 4       01            00             11
As shown in Table 1, task labels that are not contained in a training sample are uniformly marked with "00". Other recognizable marks may also be used to indicate that a training sample does not contain the corresponding task label. The color label, the style label and the pattern label are given different task marks, namely "01", "10" and "11" respectively. The task marks used in this embodiment are not limited; for example, they could be replaced with "1", "2", "3", and so on.
After the training samples have been task-marked in step S21a, they can be screened conveniently. For example, in a data set that has been task-marked there are a large number of training samples, and the training samples containing a style label need to be selected; it then suffices to check directly whether a sample carries the task mark "10", so training sample 1 and training sample 2 can be screened out.
With the task marking of step S21a, the type of each task to be trained does not need to be determined; the task marks of different tasks to be trained only need to be distinct from one another, so the marking is convenient and effective. Step S21a can also be performed before step S10.
Step S21b: generating the label mask of each task to be trained according to the task marks.
In step S21b, taking Table 1 as an example, if the training samples used in the current training pass of the model to be trained are the samples in Table 1, the label mask of the color task obtained from the task marks is [01,00,01,01]; similarly, the label mask of the style task is [10,10,00,00], and the label mask of the pattern task is [11,00,11,11].
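To make steps S21a and S21b concrete, the following is a minimal sketch, assuming the per-sample task marks of Table 1 are stored as plain strings; the variable and function names are illustrative, not from the patent:

```python
# Task marks per training sample, taken from Table 1 ("00" = label absent).
samples = [
    {"color": "01", "style": "10", "pattern": "11"},  # training sample 1
    {"color": "00", "style": "10", "pattern": "00"},  # training sample 2
    {"color": "01", "style": "00", "pattern": "11"},  # training sample 3
    {"color": "01", "style": "00", "pattern": "11"},  # training sample 4
]

def label_mask(task):
    """Step S21b: collect the task marks of all samples for one task."""
    return [sample[task] for sample in samples]

print(label_mask("color"))    # ['01', '00', '01', '01']
print(label_mask("style"))    # ['10', '10', '00', '00']
print(label_mask("pattern"))  # ['11', '00', '11', '11']
```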
Secondly, please refer to fig. 3:
step S22 a: and for the same task to be trained, marking a first mark on a training sample containing a task label of the current task to be trained in the data set.
In step S22a, the training samples in the data set need to be labeled to identify which training samples contain the task labels corresponding to the task to be trained currently. And marking a first mark on a training sample containing a task label of the current task to be trained. In this embodiment, the first flag may be marked with "1" to form a binary vector with a second flag, which will be described later, that is, the first flag uses binary 1(True) to represent the task label containing the current task to be trained, and binary 0(False) to represent the task label (second flag) not containing the current task to be trained, so that the binary vector can be computed by using bit operations, and the computation is fast and efficient.
Step S22b: applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained.
In step S22b, the second mark should be distinguishable from the first mark; that is, the second mark may be any character different from the first mark, for example "0". It should be noted that in some embodiments step S22b may be omitted, i.e., no second mark is applied: for the current task to be trained, a training sample that does not carry the first mark can be judged not to contain a task label of the current task.
For example, in a specific embodiment, with a first mark "1" and a second mark "0", applying this marking to the content of Table 1 gives:
TABLE 2

                      Color label   Style label   Pattern label
  Training sample 1        1             1              1
  Training sample 2        0             1              0
  Training sample 3        1             0              1
  Training sample 4        1             0              1
As can be seen from Table 2, whether a training sample contains the task label corresponding to the current task to be trained can be determined from the marks "1" and "0".
Step S22c: obtaining the label mask corresponding to the task to be trained according to the first mark and/or the second mark.
In step S22c, taking Table 2 as an example, the training samples used by the model to be trained in the current pass are the samples in Table 2. The label mask obtained for the color task is [1,0,1,1], the label mask of the style task is [1,1,0,0], and the label mask of the pattern task is [1,0,1,1]. These label masks take the form of binary vectors, which effectively represent whether each training sample contains the task label and are convenient to recognize.
Role and use of the label mask: during the training process of the model to be trained, the label mask can be used to select only those training samples that contain the label of the current task to be trained. For the color task, training sample 1, training sample 3 and training sample 4 are selected through the label mask, as shown in fig. 4; for the style task, training sample 1 and training sample 2 are selected. A sketch of this selection follows below.
In this embodiment, selecting the training samples through the label mask makes full use of the training samples, avoids discarding training samples, and avoids re-annotating task labels after several data sets are merged, which improves the utilization rate and efficiency of the training samples.
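As a minimal sketch of this selection under the binary-vector representation (variable names are illustrative), boolean indexing in NumPy reproduces the screening of fig. 4:

```python
import numpy as np

# Binary label masks from Table 2: 1 = the sample carries the task's label.
masks = {
    "color":   np.array([1, 0, 1, 1], dtype=bool),
    "style":   np.array([1, 1, 0, 0], dtype=bool),
    "pattern": np.array([1, 0, 1, 1], dtype=bool),
}
sample_ids = np.arange(1, 5)  # training samples 1..4

for task, mask in masks.items():
    # Boolean indexing keeps exactly the samples the mask marks with 1.
    print(task, "->", sample_ids[mask].tolist())
# color -> [1, 3, 4]; style -> [1, 2]; pattern -> [1, 3, 4]
```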
Step S30: obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained.
In step S30, a specific implementation may include the following steps:
1. For the same task to be trained, target samples containing a task label of the current task to be trained are screened out of the data set according to the label mask corresponding to the current task; the screening process is as described for the role and use of the label mask in step S22c, i.e., the training samples containing a task label of the current task serve as the target samples.
2. The loss corresponding to the current task to be trained generated in the model training process is obtained from the target samples. The training process of the model to be trained is a process of continuous iteration, and each iteration comprises the following steps: first, a batch of data is input for forward computation, the last step of which is computing the loss through a loss function; after the forward computation, back-propagation is performed, which is the process of computing the gradients of the model parameters; once the gradients have been obtained by back-propagation, the model parameters are updated according to the computed gradients, which completes one iteration. The intermediate computations of the model to be trained and the parameter-update step are well-known techniques and are not described in detail in this embodiment. The input data used in this embodiment are the target samples screened out of the data set.
3. The gradient value of the current task to be trained generated in the model training process is obtained from the loss corresponding to the current task. Specifically, within the same iteration, the gradient values corresponding to each task to be trained need to be retained, so that the parameters of the model can be updated after the gradient values of the tasks to be trained are fused.
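The sketch below illustrates step S30 under stated assumptions: a shared PyTorch model that returns a dict with one output per task head, a dict of per-task labels, and boolean label masks over the batch. All names are illustrative, not from the patent:

```python
import torch

def per_task_gradients(model, batch_x, batch_y, masks, loss_fns):
    """For each task, compute the loss only on the samples selected by its
    label mask, back-propagate, and retain a copy of the gradients."""
    task_grads = {}
    for task, mask in masks.items():       # mask: bool tensor of shape [B]
        model.zero_grad()
        outputs = model(batch_x[mask])     # forward pass on the target samples
        loss = loss_fns[task](outputs[task], batch_y[task][mask])
        loss.backward()                    # back-propagation for this task only
        task_grads[task] = [               # retain this task's gradient values
            None if p.grad is None else p.grad.detach().clone()
            for p in model.parameters()
        ]
    return task_grads
```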
Step S40: obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In step S40, the gradient values of the N tasks to be trained need to be fused. Specifically, this embodiment provides the following two fusion methods.
1. Direct cumulative fusion of the gradients.
Step 1: the gradient values of the N tasks to be trained are accumulated to obtain a target gradient value. Accumulation here means direct addition, and the gradient value obtained after the addition is the target gradient value.
Step 2: the update parameters of the model to be trained are obtained according to the target gradient value. The update parameters are the amounts by which the corresponding parameters of the model to be trained need to be adjusted; the specific computation is well known to those skilled in the relevant field and is not described in detail. The corresponding parameters of the model to be trained are updated according to the update parameters, which completes the iteration. It should be noted that when several model parameters need to be updated, different update parameters can be obtained from the fused target gradient values and updated iteratively.
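Continuing the sketch above (same illustrative names), the fusion and update can be written as one helper; with `weights=None` it performs the direct cumulative fusion of this first method:

```python
def fuse_and_update(model, task_grads, optimizer, weights=None):
    """Fuse the retained per-task gradients (summed, or weighted when
    `weights` is given) into .grad, then apply one parameter update."""
    optimizer.zero_grad()
    for task, grads in task_grads.items():
        w = 1.0 if weights is None else weights[task]
        for p, g in zip(model.parameters(), grads):
            if g is None:
                continue
            p.grad = w * g if p.grad is None else p.grad + w * g
    optimizer.step()  # one iteration optimizes all tasks jointly
```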
2. Weighted fusion of the gradient values of the different tasks to be trained.
Step 1: a preset weight is assigned to the gradient value of each of the N tasks to be trained.
Step 2: the weighted gradient values of the tasks to be trained are accumulated to obtain a target gradient value.
Step 3: the update parameters of the model to be trained are obtained according to the target gradient value.
For example, suppose the number N of tasks to be trained is 3 and the samples used are those shown in Table 2. If the color task is the more important task for the model to be trained, the weights are assigned accordingly: for example, the gradient value corresponding to the color task is assigned a weight of 0.4, the gradient value corresponding to the style task a weight of 0.3, and the gradient value corresponding to the pattern task a weight of 0.3. The three gradient values are then weighted and summed, i.e., fused. The update process is as described for the first fusion method.
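Under the same assumptions, this weighted fusion is a one-line use of the helper sketched above (the weights are those of the example):

```python
# Hypothetical per-task weights: the color task is the more important one.
fuse_and_update(model, task_grads, optimizer,
                weights={"color": 0.4, "style": 0.3, "pattern": 0.3})
```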
Through the combination of label masks and multi-task gradient fusion, multiple tasks to be trained are optimized within the same iteration; the information of the corresponding task labels is combined, and no single task to be trained obviously dominates, so that the multiple tasks can all be optimized well at the same time, and faster and better convergence during multi-task training can be ensured.
It should be noted that, in this embodiment, if the amounts of data carrying different task labels in a batch differ greatly, the variance of the training-sample gradient is dominated by the task to be trained with little data, which makes training unstable; the data volume of each task should therefore be kept as balanced as possible during training. In the example given above, a batch contains 3 samples for the color task and 2 samples for the style task, a difference of 1, which is not significant. In actual training, however, the batch size is generally much larger than 4, possibly several hundred, and if the data volumes of two tasks differ significantly, the effect is amplified. Therefore, in a preferred implementation of this embodiment, the difference between the numbers of samples corresponding to the tasks to be trained (i.e., the number of samples containing each task's label) should be no more than 40%, more preferably no more than 30%, and in some cases may be 10%-20%.
For example: in the same model K to be trained, there are a task A and a task B. Data set X contains 100 training samples corresponding to task A's labels and 110 training samples corresponding to task B's labels. The difference between the training samples of task A and task B is 10%, which is not significant, so training with data set X allows model K to converge stably.
In summary, the present invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. With this method, the training samples in the data set do not need to be screened out and discarded. The update parameters obtained during training integrate the gradient values of multiple tasks to be trained, so that the multiple tasks are optimized within the same iteration; the information of the task labels corresponding to the multiple tasks is combined, and no single task obviously dominates, so that the multiple tasks can all be optimized well at the same time, the consistency of multi-task training is ensured, and the convergence of the model to be trained is improved.
Second embodiment
Referring to fig. 5, based on the same inventive concept, the present invention further provides a multitask training device 600, wherein the device 600 comprises:
the data set obtaining module 601 is configured to obtain a data set of a model to be trained, where the model to be trained includes N tasks to be trained, where the data set includes M training samples carrying task labels, and N, M is a positive integer greater than or equal to 2.
A label mask acquisition module 602 is configured to obtain a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, where the label mask of each task to be trained identifies the training samples containing a task label of that task.
A gradient value acquisition module 603 is configured to obtain, according to the data set and the label mask corresponding to each task to be trained, the gradient value generated by each task to be trained in the model training process.
An update parameter acquisition module 604 is configured to obtain the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
As an optional implementation, the label mask acquisition module 602 is further configured to:
assign a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generate the label mask of each task to be trained according to the task marks.
As an optional implementation, the label mask acquisition module 602 is further configured to:
for the same task to be trained, apply a first mark to the training samples in the data set that contain a task label of the current task to be trained; apply a second mark to the training samples that do not contain a task label of the current task to be trained; and obtain the label mask corresponding to the task to be trained according to the first mark and the second mark.
As an optional implementation, the gradient value acquisition module 603 is further configured to:
for the same task to be trained, screen target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtain, from the target samples, the loss corresponding to the current task generated in the model training process; and obtain, from that loss, the gradient value of the current task generated in the model training process.
As an alternative embodiment, the representation of the label mask is in the form of a binary vector.
As an optional implementation, the update parameter acquisition module 604 is further configured to:
accumulate the gradient values of the N tasks to be trained to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
As an optional implementation, the update parameter acquisition module 604 is further configured to:
assign a preset weight to the gradient value of each of the N tasks to be trained; perform weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Third embodiment
FIG. 6 is a block diagram illustrating a multitask training device 800 according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, exercise equipment, a personal digital assistant, and the like.
Referring to fig. 6, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power to the various components of device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods of the above-described method embodiments.
Fourth embodiment
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A computer readable storage medium, in particular a non-transitory computer readable storage medium, has instructions which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a multitask training method, the method comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Fig. 7 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of multitask training comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2;
obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task;
obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained;
and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
2. The method according to claim 1, wherein the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples comprises:
assigning a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark;
and generating the label mask of each task to be trained according to the task marks.
3. The method according to claim 1, wherein the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples comprises:
for the same task to be trained, applying a first mark to the training samples in the data set that contain a task label of the current task to be trained; and
applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained;
and obtaining the label mask corresponding to the task to be trained according to the first mark and the second mark.
4. The method according to claim 2 or 3, wherein the step of obtaining the gradient value of each task to be trained according to the data set and the label mask corresponding to each task to be trained comprises:
for the same task to be trained, screening target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task;
obtaining, from the target samples, the loss corresponding to the current task to be trained generated in the model training process;
and obtaining, from that loss, the gradient value of the current task to be trained generated in the model training process.
5. The method of claim 2, wherein the representation of the label mask is in the form of a binary vector.
6. The method according to claim 1, wherein the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically comprises:
accumulating the gradient values of the N tasks to be trained to obtain a target gradient value;
and obtaining the update parameters of the model to be trained according to the target gradient value.
7. The method according to claim 1, wherein the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically comprises:
assigning a preset weight to the gradient value of each of the N tasks to be trained;
performing weighted accumulation of the weighted gradient values to obtain a target gradient value;
and obtaining the update parameters of the model to be trained according to the target gradient value.
8. A multitask training device, comprising:
a data set acquisition module, configured to acquire a data set of a model to be trained, wherein the model to be trained comprises N tasks to be trained, the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2;
a label mask acquisition module, configured to obtain a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples containing a task label of that task;
a gradient value acquisition module, configured to obtain, according to the data set and the label mask corresponding to each task to be trained, the gradient value generated by each task to be trained in the model training process;
and an update parameter acquisition module, configured to obtain the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
9. A multitask training device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training samples containing the task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of:
acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training samples containing the task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
CN201910108378.6A 2019-01-18 2019-01-18 Multitask training method and device Pending CN111461156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910108378.6A CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910108378.6A CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Publications (1)

Publication Number Publication Date
CN111461156A true CN111461156A (en) 2020-07-28

Family

ID=71683118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910108378.6A Pending CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Country Status (1)

Country Link
CN (1) CN111461156A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813869A (en) * 2020-08-21 2020-10-23 支付宝(杭州)信息技术有限公司 Distributed data-based multi-task model training method and system
CN111813869B (en) * 2020-08-21 2021-03-23 支付宝(杭州)信息技术有限公司 Distributed data-based multi-task model training method and system
CN112561077A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Training method and device of multi-task model and electronic equipment
CN112559007A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and electronic equipment
JP2022028871A (en) * 2020-12-14 2022-02-16 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Method, device, and electronic apparatus for updating parameters of multi-task model
JP7194252B2 (en) 2020-12-14 2022-12-21 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Multitasking model parameter update method, device and electronic device
CN114359904A (en) * 2022-01-06 2022-04-15 北京百度网讯科技有限公司 Image recognition method, image recognition device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3136658B1 (en) Method, device, terminal device, computer program and recording medium for changing emoticon in chat interface
CN108256555B (en) Image content identification method and device and terminal
CN111461156A (en) Multitask training method and device
EP3089436A1 (en) Methods and devices for calling based on cloud card
EP3151182A1 (en) Method, apparatus and device for changing display background
CN111160448B (en) Training method and device for image classification model
CN105528403B (en) Target data identification method and device
CN105335198B (en) Font adding method and device
CN106155703B (en) Emotional state display method and device
CN110717399A (en) Face recognition method and electronic terminal equipment
CN111343481B (en) Data sharing method and device, electronic equipment and storage medium
CN109358788B (en) Interface display method and device and terminal
CN108629814B (en) Camera adjusting method and device
CN105488074B (en) Photo clustering method and device
CN110019897B (en) Method and device for displaying picture
CN109657097A (en) Video screening technique, device, electronic equipment and storage medium
CN106447747B (en) Image processing method and device
CN112685641A (en) Information processing method and device
CN112416486A (en) Information guiding method, device, terminal and storage medium
CN112784151A (en) Method and related device for determining recommendation information
CN111461151A (en) Multi-group sample construction method and device
CN110659726B (en) Image processing method and device, electronic equipment and storage medium
CN113870195A (en) Target map detection model training and map detection method and device
CN113946228A (en) Statement recommendation method and device, electronic equipment and readable storage medium
CN110765943A (en) Network training and recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220812

Address after: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310016 room 1501, building 17, No.57, kejiyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant before: SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.
