CN111461156A - Multitask training method and device - Google Patents

Multitask training method and device

Info

Publication number
CN111461156A
Authority
CN
China
Prior art keywords
trained
task
label
model
obtaining
Prior art date
Legal status
Pending
Application number
CN201910108378.6A
Other languages
Chinese (zh)
Inventor
夏雄尉
谢泽华
周泽南
苏雪峰
许静芳
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Sogou Hangzhou Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd, Sogou Hangzhou Intelligent Technology Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201910108378.6A
Publication of CN111461156A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. The method addresses the difficulty of ensuring convergence consistency among multiple tasks when training a multi-task model, and improves the convergence of the model to be trained.

Description

Multitask training method and device
Technical Field
The invention relates to the technical field of computer application and intelligent recognition, in particular to a multitask training method and a multitask training device.
Background
In multi-task joint training, the training set is usually required to include label information corresponding to each task. In practice, however, the training set is assembled from different data sets, and the label information contained in different data sets often differs. A problem often encountered is therefore that a data set contains labels corresponding to only one or some of the tasks; in this case, data containing different label information is usually trained in separate stages.
However, stage-by-stage training on different data sets makes it difficult to ensure convergence consistency between tasks, and the convergence of the model during training cannot be guaranteed.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a multitask training method and device that solve the prior-art problem that convergence consistency between multiple tasks is difficult to ensure when training a multi-task model, and that improve the convergence of the model to be trained.
In a first aspect, the present application provides the following technical solutions through an embodiment:
A method of multitask training comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Preferably, the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples includes:
assigning a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generating the label mask of each task to be trained according to the task marks.
Preferably, the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples includes:
for the same task to be trained, applying a first mark to the training samples in the data set that contain a task label of the current task to be trained; applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained; and obtaining the label mask corresponding to the task to be trained according to the first mark and the second mark.
Preferably, the step of obtaining the gradient value of each task to be trained according to the data set and the label mask corresponding to each task to be trained includes: for the same task to be trained, screening target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtaining, from the target samples, the loss corresponding to the current task to be trained generated in the model training process; and obtaining, from that loss, the gradient value of the current task to be trained generated in the model training process.
Preferably, the representation of the label mask is in the form of a binary vector.
Preferably, the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically includes:
accumulating the gradient values of the N tasks to be trained to obtain a target gradient value; and obtaining the update parameters of the model to be trained according to the target gradient value.
Preferably, the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically includes:
assigning a preset weight to the gradient value of each of the N tasks to be trained; performing weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtaining the update parameters of the model to be trained according to the target gradient value.
In a second aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment:
A multitask training device comprising:
a data set acquisition module, used for acquiring a data set of a model to be trained, wherein the model to be trained comprises N tasks to be trained, the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; a label mask acquisition module, used for obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples containing a task label of that task; a gradient value acquisition module, used for obtaining the gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and an update parameter acquisition module, used for obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Preferably, the label mask acquisition module is further configured to:
assign a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generate the label mask of each task to be trained according to the task marks.
Preferably, the label mask acquisition module is further configured to:
for the same task to be trained, apply a first mark to the training samples in the data set that contain a task label of the current task to be trained; apply a second mark to the training samples that do not contain a task label of the current task to be trained; and obtain the label mask corresponding to the task to be trained according to the first mark and the second mark.
Preferably, the gradient value acquisition module is further configured to:
for the same task to be trained, screen target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtain, from the target samples, the loss corresponding to the current task generated in the model training process; and obtain, from that loss, the gradient value of the current task generated in the model training process.
Preferably, the representation of the label mask is in the form of a binary vector.
Preferably, the update parameter acquisition module is further configured to:
accumulate the gradient values of the N tasks to be trained to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
Preferably, the update parameter acquisition module is further configured to:
assign a preset weight to the gradient value of each of the N tasks to be trained; perform weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
In a third aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment of the present application:
A multitask training device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In a fourth aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment of the present application:
a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
The invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. With this method, the training samples in the data set do not need to be screened out and discarded. The update parameters obtained during training integrate the gradient values of multiple tasks to be trained, so that the multiple tasks are optimized within the same iteration; the information of the task labels corresponding to the multiple tasks is combined, and no single task obviously dominates, so that the multiple tasks can all be optimized well at the same time, the consistency of multi-task training is ensured, and the convergence of the model to be trained is improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a multitask training method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of one embodiment of step S20 in FIG. 1;
FIG. 3 is a flowchart of another embodiment of step S20 in FIG. 1;
FIG. 4 is a functional diagram of a label mask of the multitask training method according to the first embodiment of the present invention;
FIG. 5 is a functional block diagram of a multitasking training device according to a second embodiment of the present invention;
FIG. 6 is a block diagram of an exemplary multitask training device according to a third embodiment of the present invention;
fig. 7 is a block diagram of an exemplary server according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
For purposes of explanation and understanding, the following detailed description describes embodiments of the invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numbers indicate like or similar elements or elements having like or similar functionality throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
First embodiment
Referring to fig. 1, this embodiment provides a multitask training method. Fig. 1 shows the flow of the method, which is described in detail below with reference to the drawings. The multitask training method comprises the following steps:
step S10: acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N, M is a positive integer greater than or equal to 2.
Step S20: and obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training sample containing the task label of the task to be trained.
Step S30: and obtaining a gradient value generated by each task to be trained in the model training process according to the label mask corresponding to the data set and each task to be trained.
Step S40: and obtaining the updating parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In step S10, the model to be trained may include a plurality of tasks to be trained; in this embodiment, N tasks to be trained, where N is a positive integer greater than or equal to 2. For example, a model to be trained for predicting the color and style of clothes may include a color recognition task and a style recognition task, i.e., two tasks to be trained.
The data set may be generated by mixing a plurality of different data sets. The training samples in the data set all carry task labels; the data set contains M training samples, where M is a positive integer greater than or equal to 2. A training sample containing a given task label can be used to train the task to be trained corresponding to that task label.
In step S20, a label mask corresponding to each task to be trained is obtained according to the task to be trained and the task labels of the training samples, where the label mask of each task to be trained identifies the training samples containing a task label of that task.
Specifically, each training sample in the data set may be marked separately for each task to be trained; the marking records whether the training sample contains a task label corresponding to the current task to be trained, and the vector or combination formed by the marks of all training samples for the current task to be trained may then be used as its label mask.
In this embodiment, two specific ways of obtaining the label mask are provided, as follows:
first, please refer to fig. 2:
step S21 a: and according to task labels carried by the training samples, performing task labeling on each training sample, wherein each class of task label of the training samples corresponds to a unique task label.
In step S21a, task labels that are not included in the training sample (tasks included in the N tasks, but corresponding task labels are not included in the training sample) can be uniformly labeled for uniform recognition. For example:
TABLE 1

                      Color label   Style label   Pattern label
  Training sample 1       01            10             11
  Training sample 2       00            10             00
  Training sample 3       01            00             11
  Training sample 4       01            00             11
As shown in Table 1, task labels that are not contained in a training sample are uniformly marked with "00". Other recognizable marks may also be used to indicate that a training sample does not contain the corresponding task label. The color label, the style label and the pattern label are given different task marks, namely "01", "10" and "11" respectively. The task marks used in this embodiment are not limited; for example, they could be replaced with "1", "2", "3", and so on.
After the training samples have been task-marked in step S21a, they can be screened conveniently. For example, in a data set that has been task-marked there are a large number of training samples, and the training samples containing a style label need to be selected; it then suffices to check directly whether a sample carries the task mark "10", so training sample 1 and training sample 2 can be screened out.
With the task marking of step S21a, the type of each task to be trained does not need to be determined; the task marks of different tasks to be trained only need to be distinct from one another, so the marking is convenient and effective. Step S21a can also be performed before step S10.
Step S21b: generating the label mask of each task to be trained according to the task marks.
In step S21b, taking Table 1 as an example, if the training samples used in the current training pass of the model to be trained are the samples in Table 1, the label mask of the color task obtained from the task marks is [01,00,01,01]; similarly, the label mask of the style task is [10,10,00,00], and the label mask of the pattern task is [11,00,11,11].
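To make steps S21a and S21b concrete, the following is a minimal sketch, assuming the per-sample task marks of Table 1 are stored as plain strings; the variable and function names are illustrative, not from the patent:

```python
# Task marks per training sample, taken from Table 1 ("00" = label absent).
samples = [
    {"color": "01", "style": "10", "pattern": "11"},  # training sample 1
    {"color": "00", "style": "10", "pattern": "00"},  # training sample 2
    {"color": "01", "style": "00", "pattern": "11"},  # training sample 3
    {"color": "01", "style": "00", "pattern": "11"},  # training sample 4
]

def label_mask(task):
    """Step S21b: collect the task marks of all samples for one task."""
    return [sample[task] for sample in samples]

print(label_mask("color"))    # ['01', '00', '01', '01']
print(label_mask("style"))    # ['10', '10', '00', '00']
print(label_mask("pattern"))  # ['11', '00', '11', '11']
```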
Secondly, please refer to fig. 3:
step S22 a: and for the same task to be trained, marking a first mark on a training sample containing a task label of the current task to be trained in the data set.
In step S22a, the training samples in the data set need to be labeled to identify which training samples contain the task labels corresponding to the task to be trained currently. And marking a first mark on a training sample containing a task label of the current task to be trained. In this embodiment, the first flag may be marked with "1" to form a binary vector with a second flag, which will be described later, that is, the first flag uses binary 1(True) to represent the task label containing the current task to be trained, and binary 0(False) to represent the task label (second flag) not containing the current task to be trained, so that the binary vector can be computed by using bit operations, and the computation is fast and efficient.
Step S22b: applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained.
In step S22b, the second mark should be distinguishable from the first mark; that is, the second mark may be any character different from the first mark, for example "0". It should be noted that in some embodiments step S22b may be omitted, i.e., no second mark is applied: for the current task to be trained, a training sample that does not carry the first mark can be judged not to contain a task label of the current task.
For example, in a specific embodiment, with a first mark "1" and a second mark "0", applying this marking to the content of Table 1 gives:
TABLE 2

                      Color label   Style label   Pattern label
  Training sample 1        1             1              1
  Training sample 2        0             1              0
  Training sample 3        1             0              1
  Training sample 4        1             0              1
As can be seen from Table 2, whether a training sample contains the task label corresponding to the current task to be trained can be determined from the marks "1" and "0".
Step S22c: obtaining the label mask corresponding to the task to be trained according to the first mark and/or the second mark.
In step S22c, taking Table 2 as an example, the training samples used by the model to be trained in the current pass are the samples in Table 2. The label mask obtained for the color task is [1,0,1,1], the label mask of the style task is [1,1,0,0], and the label mask of the pattern task is [1,0,1,1]. These label masks take the form of binary vectors, which effectively represent whether each training sample contains the task label and are convenient to recognize.
Role and use of the label mask: during the training process of the model to be trained, the label mask can be used to select only those training samples that contain the label of the current task to be trained. For the color task, training sample 1, training sample 3 and training sample 4 are selected through the label mask, as shown in fig. 4; for the style task, training sample 1 and training sample 2 are selected. A sketch of this selection follows below.
In this embodiment, selecting the training samples through the label mask makes full use of the training samples, avoids discarding training samples, and avoids re-annotating task labels after several data sets are merged, which improves the utilization rate and efficiency of the training samples.
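As a minimal sketch of this selection under the binary-vector representation (variable names are illustrative), boolean indexing in NumPy reproduces the screening of fig. 4:

```python
import numpy as np

# Binary label masks from Table 2: 1 = the sample carries the task's label.
masks = {
    "color":   np.array([1, 0, 1, 1], dtype=bool),
    "style":   np.array([1, 1, 0, 0], dtype=bool),
    "pattern": np.array([1, 0, 1, 1], dtype=bool),
}
sample_ids = np.arange(1, 5)  # training samples 1..4

for task, mask in masks.items():
    # Boolean indexing keeps exactly the samples the mask marks with 1.
    print(task, "->", sample_ids[mask].tolist())
# color -> [1, 3, 4]; style -> [1, 2]; pattern -> [1, 3, 4]
```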
Step S30: obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained.
In step S30, a specific implementation may include the following steps:
1. For the same task to be trained, target samples containing a task label of the current task to be trained are screened out of the data set according to the label mask corresponding to the current task; the screening process is as described for the role and use of the label mask in step S22c, i.e., the training samples containing a task label of the current task serve as the target samples.
2. The loss corresponding to the current task to be trained generated in the model training process is obtained from the target samples. The training process of the model to be trained is a process of continuous iteration, and each iteration comprises the following steps: first, a batch of data is input for forward computation, the last step of which is computing the loss through a loss function; after the forward computation, back-propagation is performed, which is the process of computing the gradients of the model parameters; once the gradients have been obtained by back-propagation, the model parameters are updated according to the computed gradients, which completes one iteration. The intermediate computations of the model to be trained and the parameter-update step are well-known techniques and are not described in detail in this embodiment. The input data used in this embodiment are the target samples screened out of the data set.
3. The gradient value of the current task to be trained generated in the model training process is obtained from the loss corresponding to the current task. Specifically, within the same iteration, the gradient values corresponding to each task to be trained need to be retained, so that the parameters of the model can be updated after the gradient values of the tasks to be trained are fused.
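The sketch below illustrates step S30 under stated assumptions: a shared PyTorch model that returns a dict with one output per task head, a dict of per-task labels, and boolean label masks over the batch. All names are illustrative, not from the patent:

```python
import torch

def per_task_gradients(model, batch_x, batch_y, masks, loss_fns):
    """For each task, compute the loss only on the samples selected by its
    label mask, back-propagate, and retain a copy of the gradients."""
    task_grads = {}
    for task, mask in masks.items():       # mask: bool tensor of shape [B]
        model.zero_grad()
        outputs = model(batch_x[mask])     # forward pass on the target samples
        loss = loss_fns[task](outputs[task], batch_y[task][mask])
        loss.backward()                    # back-propagation for this task only
        task_grads[task] = [               # retain this task's gradient values
            None if p.grad is None else p.grad.detach().clone()
            for p in model.parameters()
        ]
    return task_grads
```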
Step S40: obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
In step S40, the gradient values of the N tasks to be trained need to be fused. Specifically, this embodiment provides the following two fusion methods.
1. Direct cumulative fusion of the gradients.
Step 1: the gradient values of the N tasks to be trained are accumulated to obtain a target gradient value. Accumulation here means direct addition, and the gradient value obtained after the addition is the target gradient value.
Step 2: the update parameters of the model to be trained are obtained according to the target gradient value. The update parameters are the amounts by which the corresponding parameters of the model to be trained need to be adjusted; the specific computation is well known to those skilled in the relevant field and is not described in detail. The corresponding parameters of the model to be trained are updated according to the update parameters, which completes the iteration. It should be noted that when several model parameters need to be updated, different update parameters can be obtained from the fused target gradient values and updated iteratively.
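Continuing the sketch above (same illustrative names), the fusion and update can be written as one helper; with `weights=None` it performs the direct cumulative fusion of this first method:

```python
def fuse_and_update(model, task_grads, optimizer, weights=None):
    """Fuse the retained per-task gradients (summed, or weighted when
    `weights` is given) into .grad, then apply one parameter update."""
    optimizer.zero_grad()
    for task, grads in task_grads.items():
        w = 1.0 if weights is None else weights[task]
        for p, g in zip(model.parameters(), grads):
            if g is None:
                continue
            p.grad = w * g if p.grad is None else p.grad + w * g
    optimizer.step()  # one iteration optimizes all tasks jointly
```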
2. Weighted fusion of the gradient values of the different tasks to be trained.
Step 1: a preset weight is assigned to the gradient value of each of the N tasks to be trained.
Step 2: the weighted gradient values of the tasks to be trained are accumulated to obtain a target gradient value.
Step 3: the update parameters of the model to be trained are obtained according to the target gradient value.
For example, suppose the number N of tasks to be trained is 3 and the samples used are those shown in Table 2. If the color task is the more important task for the model to be trained, the weights are assigned accordingly: for example, the gradient value corresponding to the color task is assigned a weight of 0.4, the gradient value corresponding to the style task a weight of 0.3, and the gradient value corresponding to the pattern task a weight of 0.3. The three gradient values are then weighted and summed, i.e., fused. The update process is as described for the first fusion method.
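Under the same assumptions, this weighted fusion is a one-line use of the helper sketched above (the weights are those of the example):

```python
# Hypothetical per-task weights: the color task is the more important one.
fuse_and_update(model, task_grads, optimizer,
                weights={"color": 0.4, "style": 0.3, "pattern": 0.3})
```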
Through the combination of label masks and multi-task gradient fusion, multiple tasks to be trained are optimized within the same iteration; the information of the corresponding task labels is combined, and no single task to be trained obviously dominates, so that the multiple tasks can all be optimized well at the same time, and faster and better convergence during multi-task training can be ensured.
It should be noted that, in this embodiment, if the amounts of data carrying different task labels in a batch differ greatly, the variance of the training-sample gradient is dominated by the task to be trained with little data, which makes training unstable; the data volume of each task should therefore be kept as balanced as possible during training. In the example given above, a batch contains 3 samples for the color task and 2 samples for the style task, a difference of 1, which is not significant. In actual training, however, the batch size is generally much larger than 4, possibly several hundred, and if the data volumes of two tasks differ significantly, the effect is amplified. Therefore, in a preferred implementation of this embodiment, the difference between the numbers of samples corresponding to the tasks to be trained (i.e., the number of samples containing each task's label) should be no more than 40%, more preferably no more than 30%, and in some cases may be 10%-20%.
For example: in the same model K to be trained, there are a task A and a task B. Data set X contains 100 training samples corresponding to task A's labels and 110 training samples corresponding to task B's labels. The difference between the training samples of task A and task B is 10%, which is not significant, so training with data set X allows model K to converge stably.
In summary, the present invention provides a multitask training method and device. The method comprises the following steps: acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process. With this method, the training samples in the data set do not need to be screened out and discarded. The update parameters obtained during training integrate the gradient values of multiple tasks to be trained, so that the multiple tasks are optimized within the same iteration; the information of the task labels corresponding to the multiple tasks is combined, and no single task obviously dominates, so that the multiple tasks can all be optimized well at the same time, the consistency of multi-task training is ensured, and the convergence of the model to be trained is improved.
Second embodiment
Referring to fig. 5, based on the same inventive concept, the present invention further provides a multitask training device 600, wherein the device 600 comprises:
the data set obtaining module 601 is configured to obtain a data set of a model to be trained, where the model to be trained includes N tasks to be trained, where the data set includes M training samples carrying task labels, and N, M is a positive integer greater than or equal to 2.
A label mask acquisition module 602 is configured to obtain a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, where the label mask of each task to be trained identifies the training samples containing a task label of that task.
A gradient value acquisition module 603 is configured to obtain, according to the data set and the label mask corresponding to each task to be trained, the gradient value generated by each task to be trained in the model training process.
An update parameter acquisition module 604 is configured to obtain the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
As an optional implementation, the label mask acquisition module 602 is further configured to:
assign a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark; and generate the label mask of each task to be trained according to the task marks.
As an optional implementation, the label mask acquisition module 602 is further configured to:
for the same task to be trained, apply a first mark to the training samples in the data set that contain a task label of the current task to be trained; apply a second mark to the training samples that do not contain a task label of the current task to be trained; and obtain the label mask corresponding to the task to be trained according to the first mark and the second mark.
As an optional implementation, the gradient value acquisition module 603 is further configured to:
for the same task to be trained, screen target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task; obtain, from the target samples, the loss corresponding to the current task generated in the model training process; and obtain, from that loss, the gradient value of the current task generated in the model training process.
As an alternative embodiment, the representation of the label mask is in the form of a binary vector.
As an optional implementation, the update parameter acquisition module 604 is further configured to:
accumulate the gradient values of the N tasks to be trained to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
As an optional implementation, the update parameter acquisition module 604 is further configured to:
assign a preset weight to the gradient value of each of the N tasks to be trained; perform weighted accumulation of the weighted gradient values to obtain a target gradient value; and obtain the update parameters of the model to be trained according to the target gradient value.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Third embodiment
FIG. 6 is a block diagram illustrating a multitask training device 800 according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, exercise equipment, a personal digital assistant, and the like.
Referring to fig. 6, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power to the various components of device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the methods of the above-described method embodiments.
Fourth embodiment
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A computer readable storage medium, in particular a non-transitory computer readable storage medium, has instructions which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a multitask training method, the method comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
Fig. 7 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of multitask training comprising:
acquiring a data set of a model to be trained that contains N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2;
obtaining a label mask corresponding to each task to be trained according to the task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples that contain a task label of that task;
obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained;
and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
2. The method according to claim 1, wherein the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples comprises:
assigning a task mark to each training sample according to the task labels carried by the training samples, wherein each class of task label corresponds to a unique task mark;
and generating the label mask of each task to be trained according to the task marks.
3. The method according to claim 1, wherein the step of obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples comprises:
for the same task to be trained, applying a first mark to the training samples in the data set that contain a task label of the current task to be trained; and
applying a second mark to the training samples in the data set that do not contain a task label of the current task to be trained;
and obtaining the label mask corresponding to the task to be trained according to the first mark and the second mark.
4. The method according to claim 2 or 3, wherein the step of obtaining the gradient value of each task to be trained according to the data set and the label mask corresponding to each task to be trained comprises:
for the same task to be trained, screening target samples containing a task label of the current task to be trained out of the data set according to the label mask corresponding to the current task;
obtaining, from the target samples, the loss corresponding to the current task to be trained generated in the model training process;
and obtaining, from that loss, the gradient value of the current task to be trained generated in the model training process.
5. The method of claim 2, wherein the representation of the label mask is in the form of a binary vector.
6. The method according to claim 1, wherein the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically comprises:
accumulating the gradient values of the N tasks to be trained to obtain a target gradient value;
and obtaining the update parameters of the model to be trained according to the target gradient value.
7. The method according to claim 1, wherein the step of obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process specifically comprises:
assigning a preset weight to the gradient value of each of the N tasks to be trained;
performing weighted accumulation of the weighted gradient values to obtain a target gradient value;
and obtaining the update parameters of the model to be trained according to the target gradient value.
8. A multitask training device, comprising:
a data set acquisition module, configured to acquire a data set of a model to be trained, wherein the model to be trained comprises N tasks to be trained, the data set comprises M training samples carrying task labels, and N and M are each positive integers greater than or equal to 2;
a label mask acquisition module, configured to obtain a label mask corresponding to each task to be trained according to each task to be trained and the task labels of the training samples, wherein the label mask of each task to be trained identifies the training samples containing a task label of that task;
a gradient value acquisition module, configured to obtain, according to the data set and the label mask corresponding to each task to be trained, the gradient value generated by each task to be trained in the model training process;
and an update parameter acquisition module, configured to obtain the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
9. A multitask training device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training samples containing the task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of:
acquiring a data set of a model to be trained containing N tasks to be trained, wherein the data set comprises M training samples carrying task labels, and N and M are positive integers greater than or equal to 2; obtaining a label mask corresponding to each task to be trained according to each task to be trained and the task label of each training sample, wherein the label mask of each task to be trained is used for identifying the training samples containing the task label of that task; obtaining a gradient value generated by each task to be trained in the model training process according to the data set and the label mask corresponding to each task to be trained; and obtaining the update parameters of the model to be trained according to the gradient values generated by the N tasks to be trained in the model training process.
CN201910108378.6A 2019-01-18 2019-01-18 Multitask training method and device Pending CN111461156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910108378.6A CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910108378.6A CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Publications (1)

Publication Number Publication Date
CN111461156A true CN111461156A (en) 2020-07-28

Family

ID=71683118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910108378.6A Pending CN111461156A (en) 2019-01-18 2019-01-18 Multitask training method and device

Country Status (1)

Country Link
CN (1) CN111461156A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813869A (en) * 2020-08-21 2020-10-23 支付宝(杭州)信息技术有限公司 Distributed data-based multi-task model training method and system
CN111813869B (en) * 2020-08-21 2021-03-23 支付宝(杭州)信息技术有限公司 Distributed data-based multi-task model training method and system
CN112561077A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Training method and device of multi-task model and electronic equipment
CN112559007A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and electronic equipment
JP2022028871A (en) * 2020-12-14 2022-02-16 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Method, device, and electronic apparatus for updating parameters of multi-task model
JP7194252B2 (en) 2020-12-14 2022-12-21 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Multitasking model parameter update method, device and electronic device
CN114359904A (en) * 2022-01-06 2022-04-15 北京百度网讯科技有限公司 Image recognition method, image recognition device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3136658B1 (en) Method, device, terminal device, computer program and recording medium for changing emoticon in chat interface
CN108256555B (en) Image content identification method and device and terminal
CN111461156A (en) Multitask training method and device
EP3089436A1 (en) Methods and devices for calling based on cloud card
EP3151182A1 (en) Method, apparatus and device for changing display background
CN111160448B (en) Training method and device for image classification model
CN105528403B (en) Target data identification method and device
CN105335198B (en) Font adding method and device
CN106155703B (en) Emotional state display method and device
CN110717399A (en) Face recognition method and electronic terminal equipment
CN111343481B (en) Data sharing method and device, electronic equipment and storage medium
CN109358788B (en) Interface display method and device and terminal
CN108629814B (en) Camera adjusting method and device
CN105488074B (en) Photo clustering method and device
CN110019897B (en) Method and device for displaying picture
CN109657097A (en) Video screening technique, device, electronic equipment and storage medium
CN106447747B (en) Image processing method and device
CN112685641A (en) Information processing method and device
CN112416486A (en) Information guiding method, device, terminal and storage medium
CN112784151A (en) Method and related device for determining recommendation information
CN111461151A (en) Multi-group sample construction method and device
CN110659726B (en) Image processing method and device, electronic equipment and storage medium
CN113870195A (en) Target map detection model training and map detection method and device
CN113946228A (en) Statement recommendation method and device, electronic equipment and readable storage medium
CN110765943A (en) Network training and recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220812

Address after: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 310016 room 1501, building 17, No.57, kejiyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Applicant before: SOGOU (HANGZHOU) INTELLIGENT TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.
