WO2024000566A9 - Method and apparatus for auxiliary learning with joint task and data scheduling - Google Patents

Method and apparatus for auxiliary learning with joint task and data scheduling

Info

Publication number
WO2024000566A9
Authority: WIPO (PCT)
Prior art keywords: task, auxiliary, learning model, data samples, computer
Application number: PCT/CN2022/103308
Other languages: English (en)
Other versions: WO2024000566A1 (fr)
Inventor
Xin Wang
Wenwu Zhu
Yijian QIN
Ze CHENG
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Robert Bosch Gmbh, Tsinghua University filed Critical Robert Bosch Gmbh
Priority to PCT/CN2022/103308 priority Critical patent/WO2024000566A1/fr
Publication of WO2024000566A1 publication Critical patent/WO2024000566A1/fr
Publication of WO2024000566A9 publication Critical patent/WO2024000566A9/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method and an apparatus for auxiliary learning with joint task and data scheduling.
  • Utilizing auxiliary tasks to help improve learning of the target task is called auxiliary learning.
  • This learning paradigm aims to improve model performance on the target task by utilizing the useful information carried in the related tasks.
  • Auxiliary learning has been widely adopted in different areas including but not limited to image classification, vision-language navigation and reinforcement learning, etc., wherein image classification is popular in a variety of scenarios, such as object identification (traffic signs/road surfaces/pedestrians/vehicles) in automated driving, or defective product identification in smart manufacturing and the like.
  • a typical and most widely adopted way in auxiliary learning is to calculate the average or total loss of all training samples for each task, linearly combine them into a single loss, and then use the aggregated loss to optimize the task learning model.
  • the linear weights for task combinations are typically tuned with methods such as grid search or Hyper-parameter Optimization (HPO) tools.
  • the gradient similarity between the target task and the auxiliary tasks may be utilized to automatically assign weights to different auxiliary tasks in an adaptive way. Furthermore, by using a small set of the target task as guidance, it is proposed to learn a nonlinear combination of the given tasks to better utilize the auxiliary information.
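  • As an illustration of the gradient-similarity weighting described above, the following Python sketch assigns an auxiliary task a non-negative weight from the cosine similarity between its loss gradient and the target task's gradient. The function names and the zero-clamping rule are illustrative assumptions, not the exact formula of any particular prior approach.

```python
# Illustrative sketch: weight an auxiliary task by how well its loss
# gradient aligns with the target task's gradient (cosine similarity),
# clamping conflicting directions to zero.
import math

def cosine_similarity(g1, g2):
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return dot / (n1 * n2)

def auxiliary_weight(target_grad, aux_grad):
    # Clamp at zero so a conflicting auxiliary gradient is ignored.
    return max(0.0, cosine_similarity(target_grad, aux_grad))

# An aligned auxiliary gradient gets a weight close to 1,
# a directly conflicting one gets weight 0.
w_aligned = auxiliary_weight([1.0, 2.0], [2.0, 4.0])
w_conflict = auxiliary_weight([1.0, 2.0], [-1.0, -2.0])
```

Because the similarity is recomputed as training progresses, the weights adapt over time, which is the "adaptive" behavior the description refers to.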
  • a method and an apparatus for learning a joint task and data schedule for auxiliary learning is disclosed, which allow the importance of different data samples in each auxiliary task to the target task to be captured.
  • JTDS: joint task and data scheduling
  • the overall learning process is formulated as a bi-level optimization problem.
  • in the lower level optimization, the task learning model parameters are optimized under the schedule generated by the task-data scheduler; in the upper level optimization, the scheduler parameters are optimized using the gradient from a small developing dataset, which may be an implicit gradient in order to overcome intractable computation issues.
  • the proposed model could be adapted to different scenarios, such as supervised learning, semi-supervised learning or a more severe setting where the training labels for the target task are corrupted.
  • the JTDS model is capable of exploiting the auxiliary information and is robust to noisy samples in the data.
  • a computer-implemented method for training a task learning model for a target task utilizing one or more auxiliary tasks comprises: inputting data samples from a training set to the task learning model, wherein each data sample is at least equipped with labels on each of the auxiliary tasks; encoding the data samples to extract features of each of the data samples; calculating losses for the target task and the one or more auxiliary tasks respectively for each data sample, based at least in part on outputs of the task learning model and associated labels; evaluating a contribution of each of the data samples within each of the one or more auxiliary tasks to the target task; and calculating a training loss for the task learning model, based at least in part on a sum of the respective losses for the target task and the one or more auxiliary tasks weighted by the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task.
  • the target task is an image classification problem.
  • the data samples input from the training set are images; the images are encoded to obtain a feature vector for each of the images; the loss for the image classification problem of an image is calculated based on a probability output by the task learning model for the image classification and an associated label of the image for the image classification problem; the loss for an auxiliary task of the image is calculated based on a probability output by the task learning model for the auxiliary task and an associated label of the image for the auxiliary task; the contribution of each of the images within each of the one or more auxiliary tasks to the image classification problem is evaluated; and the training loss for the task learning model is calculated based on a sum of the respective losses for the image classification problem and the one or more auxiliary tasks weighted by the contribution of each of the images within each of the one or more auxiliary tasks to the image classification.
  • each of the data samples is evaluated based on an influence of each of the data samples and its label on a corresponding auxiliary task and a relation between the corresponding auxiliary task and the target task.
  • the informativeness of the features of the data sample for the auxiliary task is evaluated by an inner product between the features of the data sample and a feature prototype of the corresponding auxiliary task that has the same dimension as the features of the data sample.
  • the method further comprises filtering out the data sample and its label if the label is evaluated as not correct.
  • the method further comprises updating, in an iteration, parameters of the task learning model to reduce the calculated training loss at a first learning rate, with parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task fixed; and updating, in the iteration, the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks at a second learning rate.
  • the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated with the parameters of the task learning model which have been updated in the iteration fixed.
  • the method further comprises updating, in a next iteration, the parameters of the task learning model to reduce the calculated training loss at the first learning rate, with the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task fixed at the values updated in the last iteration.
  • the parameters of the task learning model are updated a certain number of times in one iteration instead of waiting for the parameters of the task learning model to converge.
  • the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated on a validation set or a subset of the validation set, and wherein the validation set only contains data samples and labels on the target task.
  • the parameters of the task learning model are updated using the weighted gradient sum of the data samples within the target task and each of the one or more auxiliary tasks; and wherein the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated using implicit gradient.
  • the method further comprises: after every certain number of update iterations, testing the task learning model on a validation set, and selecting the parameters of the task learning model with the lowest error as the final learned parameters of the task learning model.
  • a computer-implemented method for image classification is disclosed, wherein the classification of an image is determined by a task learning model trained utilizing one or more auxiliary tasks with the operations of the method described above, comprising: inputting an image to the task learning model; and outputting, by the task learning model, probabilities of the image belonging to each of the classifications.
  • the computer system comprises one or more processors; and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method described above.
  • One or more computer readable storage media storing computer-executable instructions is disclosed. The instructions, when executed, cause one or more processors to perform the operations of the method described above.
  • a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method described above is disclosed.
  • Fig. 1 illustrates an exemplary model design for a joint task and data scheduling (JTDS) model, in accordance with various aspects of the present disclosure.
  • Fig. 2 illustrates an exemplary overall framework for the JTDS model, in accordance with various aspects of the present disclosure.
  • Fig. 3 illustrates an exemplary flow chart of training of a task learning model for a target task, in accordance with various aspects of the present disclosure.
  • Fig. 4 illustrates another exemplary flow chart of training of a task learning model for a target task, in accordance with various aspects of the present disclosure.
  • Fig. 5 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
  • Existing auxiliary learning approaches only consider the relationships between the target task and the auxiliary tasks, ignoring the fact that data samples within an auxiliary task could contribute differently to the target task, which results in inefficient auxiliary information usage and non-robustness to data noise.
  • for example, the target task is “correctly classifying a bird”, for which related attribute classification tasks may serve as auxiliary tasks.
  • the training set is comprised of m data samples that can be used for n+1 different tasks.
  • in the training set, x i is the i th data sample and y ik is its label for the k th task
  • the data sample may be images, and if a task is “detecting the head color of a bird” , then the label may be “red” , “yellow” , etc.
  • D v is the validation set that is used to evaluate the task learning model.
  • the general auxiliary learning process can be formulated by Eq. (1), where the task learning model f k is parametrized by θ and f k outputs the predictions for task k
  • the training loss function is L t
  • the loss optimization process P () generates a set of candidates ⁇ p ⁇ .
  • the θ p that minimizes the validation error E v on D v is chosen as the parameters for the target task.
  • in this way, useful information in the auxiliary tasks can be transferred to the target task.
  • the goal of auxiliary learning is to design an appropriate L t to combine the information of the given tasks, so as to obtain the best task learning model (parameterized by ⁇ ) for the target task.
  • One simple but widely adopted method to utilize the auxiliary tasks is to combine the average training loss for each task in a linear way as formulated in Eq. (2), where l k (·, ·) is the loss function for task k, and then tune the weight w k for each task with grid search or other Hyper-parameter Optimization (HPO) methods.
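  • The linear combination just described can be sketched in Python as follows: the average loss of each task is computed over all samples and the per-task averages are combined with fixed weights w k (which would in practice be tuned by grid search or an HPO tool). The function and variable names are illustrative, not part of the disclosure.

```python
# Minimal sketch of an Eq. (2)-style linear loss combination: one fixed
# weight per task applied to that task's average per-sample loss.

def linear_combined_loss(per_sample_losses, task_weights):
    """per_sample_losses[k][i] is the loss of sample i on task k."""
    total = 0.0
    for k, losses in enumerate(per_sample_losses):
        avg_k = sum(losses) / len(losses)   # average loss for task k
        total += task_weights[k] * avg_k    # linear combination
    return total

# Target task (k=0) plus one auxiliary task (k=1), three samples each.
losses = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
L = linear_combined_loss(losses, task_weights=[1.0, 0.2])
```

Note that every sample inside one task receives the same weight here, which is exactly the limitation the JTDS scheduler is designed to remove.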
  • Recent approaches propose to assign weights for different tasks based on the gradient similarities between the target task and the auxiliary tasks. Since the gradient similarity usually changes during the training process, this line of methods assigns an adaptive weight to each task. However, these methods are limited in exploiting useful information in the auxiliaries because they ignore data-sample-level information.
  • the task learning model is configured with a schedule about the importance of each data sample within each task to the target task.
  • the auxiliary problem is formulated.
  • our training optimization objective L t is expressed in Eq. (3), where w ik gives the importance of sample i within task k
  • the key challenge is how to decide the values of all m (n+1) parameters {w ik}.
  • the parameter-efficient task-data scheduler will be described in detail below. Given all the training data samples and tasks, the scheduler outputs {w ik}, the schedule that captures the importance of each data sample within each task to the target task.
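  • The scheduled objective of Eq. (3) can be sketched as follows: every (sample, task) pair gets its own weight w ik from the scheduler, so the same auxiliary task can emphasize some samples and suppress others. The normalization by the number of samples m is an assumption of this sketch, not a quote of Eq. (3).

```python
# Sketch of a per-pair weighted training loss: schedule[i][k] weights the
# loss of sample i on task k, replacing the single per-task weight of the
# linear combination.

def scheduled_loss(sample_losses, schedule):
    """sample_losses[i][k] and schedule[i][k] index sample i, task k."""
    m = len(sample_losses)
    total = 0.0
    for l_i, w_i in zip(sample_losses, schedule):
        total += sum(w * l for w, l in zip(w_i, l_i))
    return total / m

# Two samples, two tasks; the scheduler down-weights one noisy pair.
losses = [[1.0, 2.0], [1.0, 4.0]]
w = [[1.0, 0.5], [1.0, 0.0]]  # w[1][1] = 0 filters the noisy pair out
L = scheduled_loss(losses, w)
```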
  • Fig. 1 illustrates an exemplary model design for the JTDS model, in accordance with various aspects of the present disclosure.
  • the JTDS model is illustrated as comprising two main components, i.e., the Parameter-efficient Task-data Scheduler 101 and the task learning model (TLM) 102 for general auxiliary learning. Further, the interactions between the scheduler 101 and the TLM 102 are also shown in Fig. 1. The scheduler 101 is expected to provide an appropriate joint schedule for training the TLM 102, so that a task learning model that has excellent performance on the target task may be obtained.
  • TLM task learning model
  • the most widely used hard parameter sharing architecture (one common backbone encoder and several task-specific heads, one for each task) is used as an example and not a limitation; other architectures, such as the soft parameter sharing architecture, are also possible.
  • the outputs of the scheduler ⁇ w ik ⁇ are transferred to the task learning model, to calculate the training loss L t as in Eq. (3) .
  • the first hypothesis focuses on the relationships between the target task and each data sample within each auxiliary task.
  • the first hypothesis indicates that the task and the data sample make a multiplicative contribution to the target task: if the task has low relevance to the target task, or the data sample is less informative for the task, then w ik will carry little importance. This hypothesis enables the same data sample within different auxiliary tasks to express different levels of importance in terms of relevance to the target task. Moreover, different data samples in the same auxiliary task can also demonstrate different levels of importance based on how informative each data sample is to that task.
  • the second hypothesis focuses on evaluating how informative the training pair is to the corresponding task.
  • the correctness of a label can be determined by its corresponding loss value.
  • the first hypothesis decomposes w ik into two parts, i.e., (1) the relevance between the task and the target task, and (2) the importance of the data sample to the task.
  • the second hypothesis further decomposes part (2) into the importance of the sample feature and the rationality of the label.
  • the parameter-efficient task-data scheduler would be further discussed based on the above two hypotheses.
  • a learnable task relation vector λ = [λ 1 , ..., λ n , λ G ] is introduced in the proposed Parameter-efficient Task-data Scheduler 101 to describe the relationships between each task and the target task, where each λ k represents the importance of task k to the target task. Additionally, the importance of a data sample to a task relies on the sample feature and the sample loss.
  • the data sample will be sent to the task learning model f k to obtain its feature c ik and its loss l ik , where f k, enc (·) is the encoder of the task learning model f k , as shown in Fig. 1.
  • hard parameter sharing architecture: one common backbone encoder and several task-specific heads, one for each task
  • other sharing architectures for multiple tasks may be used, just to name a few, soft parameter sharing architecture, hierarchical sharing architecture, sparse sharing architecture, etc.
  • Each P k is a vector that has the same dimension as c ik , representing the feature prototype of task k
  • the scheduler 101 assigns each task with an adaptive pass filter to filter out the noisy pairs. These filters are flexible to different scenarios by introducing learnable parameters ⁇ a k ⁇ k ⁇ U and ⁇ b k ⁇ k ⁇ U .
  • the scheduler 101 generates the joint schedule w ik as shown in Eq. (4) , where ⁇ () is the activation function to ensure non-negative values.
  • the second term utilizes the inner product between the task prototype and the sample feature to evaluate the informativeness of the sample feature for the task
  • the third term is a linear classifier that evaluates whether the sample-label pair is a noisy pair for the task, taking the loss after normalization as input. Considering that w ik relies on the loss, the feature and the introduced learnable parameters, which are continually updated during the training process, w ik is naturally adaptive.
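  • The exact form of Eq. (4) is not reproduced in this text, so the Python sketch below is a hypothetical reconstruction of the three terms described above: a task relation term λ k, the inner product between the task prototype P k and the sample feature c ik, and a sigmoid "pass filter" on the normalized loss with learnable a k, b k. The final ReLU stands in for the non-negativity activation σ(·); all parameter values are illustrative.

```python
# Hypothetical sketch of a three-factor schedule weight:
# relevance(task -> target) * informativeness(feature) * label filter.
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def schedule_weight(lambda_k, prototype, feature, norm_loss, a_k, b_k):
    relevance = lambda_k                                   # task-to-target relation
    informativeness = sum(p * c for p, c in zip(prototype, feature))
    label_filter = sigmoid(a_k * norm_loss + b_k)          # near 0 => likely noisy pair
    return relu(relevance * informativeness * label_filter)

# A sample with a plausible (low-loss) label keeps a sizeable weight...
w_good = schedule_weight(0.8, [1.0, 0.5], [1.0, 1.0], norm_loss=0.0, a_k=-2.0, b_k=1.0)
# ...while a high-loss (likely mislabeled) sample is filtered toward zero.
w_noisy = schedule_weight(0.8, [1.0, 0.5], [1.0, 1.0], norm_loss=5.0, a_k=-2.0, b_k=1.0)
```

With a negative slope a k, the sigmoid acts as the adaptive pass filter described above: the higher the normalized loss, the closer the weight is pushed to zero.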
  • the disclosed scheduler 101 comprehensively considers task, feature and label information, while only introducing an additional learnable parameter set whose number of parameters is O (dn) , where d is the dimension of the features, assuming that the features of all tasks have the same dimension d.
  • the feature dimension d typically takes a value in {32, 64, 128, 256}
  • the number of parameters to be optimized is therefore largely reduced.
  • Fig. 2 illustrates an exemplary overall framework for the JTDS model, in accordance with various aspects of the present disclosure.
  • a bi-level optimization process for the TLM 102 and the scheduler 101 is illustrated in Fig. 2.
  • the TLM 102 updates its parameters θ with the scheduled gradient a certain number of times on D t .
  • the training loss of the TLM 102 is calculated to obtain a gradient, which is used to update ⁇ in 202, the parameters of the scheduler 101.
  • the task learning model parameters ⁇ are updated with the goal of minimizing the training loss in Eq. (3) .
  • the loss on D v can be utilized to update the parameters ⁇ .
  • a small developing dataset is introduced, which only contains information about task i.e., the target task, D dev is a small subset sampled from the validation set D v .
  • the problem is formulated as a bi-level optimization problem shown in Eq. (5) .
  • L t ( ⁇ , ⁇ ) is the scheduled training loss in Eq. (3) .
  • the parameters θ of the TLM 102 are updated, with φ, the parameters of the scheduler 101, fixed.
  • θ is updated by using the scheduled gradient; as an example, θ is updated with the weighted gradient sum of the data samples within different tasks, as shown in Eq. (6).
  • the gradient of L dev ( ⁇ * ( ⁇ ) ) with respect to ⁇ needs to be derived.
  • L dev ( ⁇ * ( ⁇ ) ) directly relies on ⁇ instead of ⁇
  • an implicit gradient may be obtained by utilizing implicit differentiation.
  • the gradient of L dev ( ⁇ * ( ⁇ ) ) with respect to ⁇ may be obtained using the chain rule as shown in Eq. (7) .
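  • The chain-rule relationship just described can be illustrated on a toy problem: since L dev depends on φ only through the optimized θ*(φ), the hypergradient is dL dev/dθ times dθ*/dφ, where dθ*/dφ comes from implicit differentiation of the lower-level optimality condition. The one-dimensional quadratic losses below are purely illustrative stand-ins for L t and L dev, chosen so the implicit gradient can be checked against a finite difference.

```python
# Toy implicit-gradient illustration: lower level L_t(theta, phi) = (theta - phi)^2
# has closed-form solution theta*(phi) = phi; upper level L_dev(theta) = (theta - 1)^2.

def implicit_grad(phi):
    theta_star = phi                                # closed-form lower-level solution
    d2_dtheta2 = 2.0                                # Hessian of L_t w.r.t. theta
    d2_dtheta_dphi = -2.0                           # mixed second derivative
    dtheta_dphi = -d2_dtheta_dphi / d2_dtheta2      # implicit function theorem
    dLdev_dtheta = 2.0 * (theta_star - 1.0)         # gradient of L_dev at theta*
    return dLdev_dtheta * dtheta_dphi               # chain rule (Eq. (7) analogue)

def numeric_grad(phi, eps=1e-6):
    # Finite-difference check of the same hypergradient, using theta* = phi.
    L = lambda p: (p - 1.0) ** 2
    return (L(phi + eps) - L(phi - eps)) / (2 * eps)

g_implicit = implicit_grad(3.0)
g_numeric = numeric_grad(3.0)
```

In the actual method the Hessian term cannot be formed exactly, which is why a truncated approximation of the implicit gradient is used, as discussed below.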
  • Upon obtaining the gradients of θ and φ, Algorithm 1 presents the complete algorithm that simultaneously learns the TLM 102 and the task-data scheduler 101. Algorithm 1 is merely shown as an example, and other embodiments are possible.
  • a training set D t , a validation set D v , an iteration threshold T, a first learning rate η 1 and a second learning rate η 2 may be input to the JTDS model.
  • a small developing dataset D dev may also be input to the JTDS model for training the parameters of the scheduler ⁇ instead of D v .
  • a truncated threshold K may also be input to the JTDS model for calculating an implicit gradient.
  • a count threshold M may also be input to the JTDS model for lower level optimization.
  • a training set D t , a validation set D v , an iteration threshold T, a first learning rate η 1 , a second learning rate η 2 , a small developing dataset D dev , a truncated threshold K and a count threshold M are input to JTDS.
  • the parameters θ of the TLM 102 are updated, with φ, the parameters of the scheduler 101, fixed.
  • the scheduler Scheduler (l ik , c ik ; ⁇ ) gives the importance w ik to each sample within each task as in Eq. (4) .
  • the parameters θ are updated with the gradient in Eq. (6) at the first learning rate η 1 . Instead of waiting for θ to converge, a more efficient M-step optimization is shown in Algorithm 1, i.e., after θ has been updated M times, the process switches to the upper level optimization to optimize φ.
  • the parameters of the scheduler 101 are updated.
  • φ may be updated with the implicit gradient, which may be K-truncated to reduce the amount of computation.
  • the loss on D v can be utilized to update the parameters ⁇ .
  • a small developing dataset D dev may be used instead of D v . As shown in Algorithm 1, L dev and L t are first calculated, then φ is updated using the implicit gradient at the second learning rate η 2 according to Eq. (9).
  • the update of ⁇ is simplified to a one-step optimization.
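  • The alternating structure of Algorithm 1 can be sketched as follows: in each of T iterations, θ is updated M times on the training objective (lower level), then φ takes a single step (upper level). The one-dimensional toy losses below stand in for the scheduled loss of Eq. (6) and the implicit hypergradient of Eq. (9); for this toy problem the hypergradient has a simple closed form.

```python
# Sketch of M-step lower / one-step upper alternating optimization.
# Toy losses: L_t(theta, phi) = (theta - phi)^2, L_dev(theta*(phi)) = (phi - 1)^2.

def train(T=50, M=5, eta1=0.1, eta2=0.1):
    theta, phi = 0.0, 0.0
    for _ in range(T):
        for _ in range(M):                       # M-step lower-level update of theta
            grad_theta = 2.0 * (theta - phi)     # dL_t/dtheta
            theta -= eta1 * grad_theta
        # One-step upper-level update of phi; since theta*(phi) = phi here,
        # the hypergradient of L_dev is simply 2*(phi - 1).
        grad_phi = 2.0 * (phi - 1.0)
        phi -= eta2 * grad_phi
    return theta, phi

theta, phi = train()
```

Both parameters drift toward the upper-level optimum (here 1.0) without ever requiring the inner problem to be solved to convergence, which mirrors the efficiency argument made above.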
  • the task learning model is tested on the validation set D v , and the θ with the lowest error may be recorded as the final learned model for the target task as in Eq. (1).
  • Fig. 3 illustrates an exemplary flow chart of training of a task learning model for a target task, in accordance with various aspects of the present disclosure.
  • some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments.
  • the method may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
  • the process may be carried out by scheduler 101 and/or TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the training of a task learning model for a target task utilizing one or more auxiliary tasks is discussed in the context of tasks including, but not limited to, image classification, audio classification, recommendation, vision-language navigation, reinforcement learning, etc.
  • for example, if the target task is a bird classification problem, then attribute classification may be utilized as auxiliary tasks to help the bird species classification.
  • as another example, a rotation degree prediction problem could serve as a self-supervised auxiliary task.
  • the process is for training a task learning model for a target task utilizing one or more auxiliary tasks, and may begin at block 301, with inputting data samples from a training set to the task learning model, wherein each data sample is at least equipped with labels on each of the auxiliary tasks.
  • the data samples may be images, audio, text, videos, vision data, etc. for different types of tasks.
  • the operation of block 301 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the fully supervised setting is configured for the target task, wherein all training samples are equipped with labels on the target task.
  • the semi-supervised setting is configured for the target task, wherein only a part of the training samples is equipped with labels on the target task. In either scenario, the training labels may be corrupted.
  • the process proceeds to block 302, with encoding the data samples to extract features of each of the data samples.
  • the operation of block 302 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • a hard parameter sharing architecture may be used, with one common backbone encoder and several task-specific heads for each task.
  • a soft parameter sharing architecture may be used, with several task-specific encoders and heads, respectively.
  • other sharing architectures for multiple tasks may be used, such as hierarchical sharing architecture, sparse sharing architecture, etc.
  • features of each of the data samples may be represented as c ik = f k, enc (x i ) , wherein x i is the data sample, and f k, enc (·) is the encoder of the task learning model f k .
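  • The hard parameter sharing arrangement described above (one common backbone, per-task heads) can be sketched as follows; the toy linear encoder, the head weights and the task names are illustrative placeholders rather than any actual network of the disclosure.

```python
# Sketch of hard parameter sharing: one shared encoder computes the
# feature once, and each task applies its own lightweight head to it.

def shared_encoder(x):
    # Toy "backbone": a fixed linear feature map shared by all tasks.
    return [x[0] + x[1], x[0] - x[1]]

def make_head(weights):
    # Each task-specific head is its own small linear layer.
    def head(feature):
        return sum(w * f for w, f in zip(weights, feature))
    return head

heads = {
    "target": make_head([1.0, 0.0]),   # target-task head
    "aux1": make_head([0.0, 1.0]),     # auxiliary-task head
}

x = [2.0, 1.0]
feature = shared_encoder(x)            # computed once, reused by all heads
outputs = {k: head(feature) for k, head in heads.items()}
```

Because the backbone is shared, gradients from the auxiliary heads flow into the same encoder that serves the target task, which is how auxiliary supervision helps the target.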
  • the process proceeds to block 303-1, with calculating losses for the target task and the one or more auxiliary tasks respectively for each data sample, based at least in part on outputs of the task learning model and associated labels.
  • the operation of block 303-1 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the losses for the target task and the one or more auxiliary tasks are calculated as l ik = l k (f k (x i ; θ) , y ik ) , wherein f k (x i ; θ) represents the output of the task learning model f k for data sample x i , y ik represents the associated label for data sample x i , and θ represents the learnable parameters of the task learning model.
  • the process proceeds to block 303-2, with evaluating a contribution of each of the data samples within each of the one or more auxiliary tasks to the target task.
  • the operation of block 303-2 may be performed by scheduler 101 described above with reference to Fig. 1 and Fig. 2.
  • the contribution of each of the data samples is evaluated based on an influence of each of the data samples and its label on a corresponding auxiliary task and a relation between the corresponding auxiliary task and the target task, which indicates that the task and the data sample make a multiplicative contribution to the target task.
  • the influence of each of the data samples and its label on the corresponding auxiliary task is evaluated based on whether the features of the data sample are informative for the corresponding auxiliary task and whether its label is correct.
  • the informativeness of the features of the data sample for the auxiliary task is evaluated by an inner product between the features of the data sample and a feature prototype of the corresponding auxiliary task that has the same dimension as the features of the data sample.
  • the correctness of the label is evaluated by the calculated loss for the corresponding auxiliary task.
  • w ik represents the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task, and may be calculated according to Eq. (4) above, wherein λ k refers to the relation between the corresponding auxiliary task and the target task, the inner product between the feature prototype P k and the sample feature c ik is used for evaluating the informativeness of the features of the data sample for the auxiliary task, and a linear classifier on the normalized loss represents a filter which evaluates the correctness of the label by the calculated loss for the corresponding auxiliary task; the noisy pairs may be filtered out with parameters {a k } k∈U and {b k } k∈U .
  • the process proceeds to block 304, with calculating a training loss for the task learning model, based at least in part on a sum of the respective losses for the target task and the one or more auxiliary tasks weighted by the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task.
  • the operation of block 304 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the training loss may be calculated according to Eq. (3) above with w ik , by which the task learning model is provided with a schedule about the importance of each data sample within each task to the target task.
  • Fig. 4 illustrates another exemplary flow chart of training of a task learning model for a target task, in accordance with various aspects of the present disclosure.
  • some or all illustrated features may be omitted in a particular implementation within the scope of the present disclosure, and some illustrated features may not be required for implementation of all embodiments.
  • the process may be carried out by any suitable apparatus or means for carrying out the functions or algorithm described below.
  • the method may be carried out by scheduler 101 and/or TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the process is for training a task learning model for a target task utilizing one or more auxiliary tasks, and may begin at block 401 and proceed through block 402, block 403 and block 404, which may perform the same operations as described with blocks 301-304 of Fig. 3.
  • the operations of blocks 401-404 may be performed by scheduler 101 and/or TLM 102 described above with reference to Fig. 1 and Fig. 2.
  • the process proceeds to block 405, with updating, in an iteration, parameters of the task learning model to reduce the calculated training loss at a first learning rate, with parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task fixed.
  • the updated parameters of the task learning model would be transferred to block 403-1 for optimization in the next iteration, and to block 406 for upper level optimization in the current iteration.
  • the operations of block 405 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2, or any suitable apparatus or means for carrying out the functions or algorithm.
  • the parameters of the task learning model may be updated in an iteration according to Eq. (6), wherein θ represents the parameters of the task learning model and η 1 is the first learning rate, with the parameter set that represents the contribution w ik fixed.
  • the parameters of the task learning model are updated a certain number of times in one iteration instead of waiting for the parameters of the task learning model to converge.
  • the parameters of the task learning model are updated using the weighted gradient sum of the data samples within the target task and each of the one or more auxiliary tasks.
  • the process proceeds to block 406, with updating, in the iteration, the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks at a second learning rate.
  • the updated parameters that represent the contribution would be transferred to block 403-2 for optimization in the next iteration.
  • the operations of block 406 may be performed by scheduler 101 described above with reference to Fig. 1 and Fig. 2, or any suitable apparatus or means for carrying out the functions or algorithm.
  • the parameters that represent the contribution w ik are updated in the iteration at the second learning rate η 2 .
  • the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated with the parameters of the task learning model which have been updated in the iteration fixed.
  • the updated parameters θ of the task learning model are transferred to block 406 for upper-layer optimization in the current iteration.
  • the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated on a validation set or a subset of the validation set, and wherein the validation set only contains data samples and labels on the target task.
  • the parameters that represent the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task are updated using implicit gradient.
  • the implicit gradient may be calculated according to Eq. (9); by using the implicit gradient, the optimization is turned into a one-step optimization.
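The upper-layer step can be sketched with a common one-step implicit-gradient approximation. Note this approximation (the inner product of the validation gradient with each per-sample auxiliary gradient, scaled by the inner learning rate) is an assumption standing in for the patent's Eq. (9), which is not reproduced here:

```python
import numpy as np

def outer_update(w, grad_val, grads_aux, lr1, lr2):
    """One upper-layer step: adjust the contribution weights w_ik so that
    the model update they induce reduces the target-task validation loss.

    Uses a one-step implicit-gradient approximation:
        dL_val/dw_ik ≈ -lr1 * <g_val, g_ik>

    w:         (K, n_a)    contribution weights
    grad_val:  (d,)        gradient of the validation loss at theta
    grads_aux: (K, n_a, d) per-sample auxiliary gradients at theta
    lr1, lr2:  first and second learning rates
    """
    implicit = -lr1 * np.einsum('d,knd->kn', grad_val, grads_aux)
    return w - lr2 * implicit
```

A sample whose gradient aligns with the validation gradient thus has its weight increased, and a conflicting sample has its weight decreased.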
  • the process proceeds to block 407 to determine whether the iteration count satisfies an iteration threshold T, and proceeds to block 408 after every T iterations of updating the parameters of the task learning model, wherein the task learning model may be tested on a validation set, and the parameters of the task learning model with the lowest error are selected as the final learned parameters of the task learning model.
  • the operations of block 407 may be performed by TLM 102 described above with reference to Fig. 1 and Fig. 2, or any suitable apparatus or means for carrying out the functions or algorithm.
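The periodic validation and model selection of blocks 407-408 can be sketched as a simple checkpointing loop. The function names and the scalar-parameter setup are illustrative assumptions, not part of the patent:

```python
def train_with_selection(theta0, step_fn, val_error_fn, total_iters, T):
    """Run the alternating updates and, after every T iterations, test the
    model on the validation set; keep the parameters with the lowest
    validation error as the final learned parameters.

    theta0:       initial parameters
    step_fn:      performs one joint lower/upper iteration on theta
    val_error_fn: returns the validation error of given parameters
    total_iters:  total number of iterations to run
    T:            iteration threshold between validation checks
    """
    theta, best_theta, best_err = theta0, theta0, float('inf')
    for t in range(1, total_iters + 1):
        theta = step_fn(theta)        # one iteration of blocks 403-406
        if t % T == 0:                # block 407: threshold reached
            err = val_error_fn(theta) # block 408: test on validation set
            if err < best_err:
                best_err, best_theta = err, theta
    return best_theta, best_err
```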
  • Fig. 5 illustrates an exemplary computing system, in accordance with various aspects of the present disclosure.
  • the computing system may comprise at least one processor 510.
  • the computing system may further comprise at least one storage device 520. It should be appreciated that the storage device 520 may store computer-executable instructions that, when executed, cause the processor 510 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-4.
  • the embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform a method for training a task learning model for a target task utilizing one or more auxiliary tasks, comprising: inputting, data samples from a training set to the task learning model, wherein each data sample is at least equipped with labels on each of the auxiliary tasks; encoding, the data samples, to extract features of each of the data samples; calculating, losses for the target task and the one or more auxiliary tasks respectively for each data sample, based at least in part on outputs of the task learning model and associated labels; evaluating, a contribution of each of the data samples within each of the one or more auxiliary tasks to the target task; and calculating, a training loss for the task learning model, based at least in part on a sum of the respective losses for the target task and the one or more auxiliary tasks weighted by the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task.
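The training-loss computation recited above can be sketched as follows; the per-sample losses are assumed to be precomputed, and the function name is illustrative:

```python
import numpy as np

def training_loss(loss_target, losses_aux, w):
    """Training loss as recited: the sum of the target-task losses plus the
    auxiliary-task losses, each auxiliary sample weighted by its
    contribution w_ik to the target task.

    loss_target: (n_t,)    per-sample target-task losses
    losses_aux:  (K, n_a)  per-sample losses for K auxiliary tasks
    w:           (K, n_a)  contribution weights
    """
    return loss_target.sum() + (w * losses_aux).sum()
```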
  • the non-transitory computer-readable medium may further comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-4.
  • the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-4.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

Abstract

A computer-implemented method for training a task learning model for a target task utilizing one or more auxiliary tasks is provided. The method comprises: inputting data samples from a training set to the task learning model, each data sample being at least equipped with labels on each of the auxiliary tasks; encoding the data samples to extract features of each of the data samples; calculating losses for the target task and the one or more auxiliary tasks respectively for each data sample, based at least in part on outputs of the task learning model and associated labels; evaluating a contribution of each of the data samples within each of the one or more auxiliary tasks to the target task; and calculating a training loss for the task learning model, based at least in part on a sum of the respective losses for the target task and the one or more auxiliary tasks weighted by the contribution of each of the data samples within each of the one or more auxiliary tasks to the target task.
PCT/CN2022/103308 2022-07-01 2022-07-01 Procédé et appareil d'apprentissage auxiliaire avec tâche conjointe et planification de données WO2024000566A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103308 WO2024000566A1 (fr) 2022-07-01 2022-07-01 Procédé et appareil d'apprentissage auxiliaire avec tâche conjointe et planification de données

Publications (2)

Publication Number Publication Date
WO2024000566A1 WO2024000566A1 (fr) 2024-01-04
WO2024000566A9 true WO2024000566A9 (fr) 2024-03-14

Family

ID=89383906

Country Status (1)

Country Link
WO (1) WO2024000566A1 (fr)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11954881B2 (en) * 2018-08-28 2024-04-09 Apple Inc. Semi-supervised learning using clustering as an additional constraint
US10691980B1 (en) * 2019-04-18 2020-06-23 Siemens Healthcare Gmbh Multi-task learning for chest X-ray abnormality classification
JP7304235B2 (ja) * 2019-08-16 2023-07-06 セコム株式会社 Trained model, learning device, learning method, and learning program
CN111382709A (zh) * 2020-03-11 2020-07-07 国网浙江省电力有限公司电力科学研究院 Insulator image detection method based on unmanned aerial vehicle inspection
CN114299340A (zh) * 2021-12-30 2022-04-08 携程旅游信息技术(上海)有限公司 Model training method, image classification method, system, device and medium

Similar Documents

Publication Publication Date Title
Dou et al. Domain generalization via model-agnostic learning of semantic features
CN109934293B (zh) 图像识别方法、装置、介质及混淆感知卷积神经网络
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
US10719780B2 (en) Efficient machine learning method
CN108027899B (zh) 用于提高经训练的机器学习模型的性能的方法
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
Sukhbaatar et al. Learning from noisy labels with deep neural networks
EP3767536A1 (fr) Code latent pour adaptation de domaines non supervisée
US20200285938A1 (en) Neural network learning method and device for recognizing class
KR102570706B1 (ko) 분류를 위한 강제된 희소성
US11610097B2 (en) Apparatus and method for generating sampling model for uncertainty prediction, and apparatus for predicting uncertainty
US11551076B2 (en) Event-driven temporal convolution for asynchronous pulse-modulated sampled signals
CN113469186B (zh) 一种基于少量点标注的跨域迁移图像分割方法
Gutmann et al. TildeCRF: Conditional random fields for logical sequences
CN110188827B (zh) 一种基于卷积神经网络和递归自动编码器模型的场景识别方法
EP4232957A1 (fr) Élagage de réseau neuronal personnalisé
Nguyen et al. Robust learning under label noise with iterative noise-filtering
Jamal et al. A lazy approach to long-horizon gradient-based meta-learning
US20210319269A1 (en) Apparatus for determining a classifier for identifying objects in an image, an apparatus for identifying objects in an image and corresponding methods
US20230076290A1 (en) Rounding mechanisms for post-training quantization
CN117557886A (zh) 融合偏标签和消极学习的含噪声标签图像识别方法及系统
WO2024000566A9 (fr) Procédé et appareil d'apprentissage auxiliaire avec tâche conjointe et planification de données
US20230419170A1 (en) System and method for efficient machine learning
CN115240843A (zh) 基于结构因果模型的公平性预测系统
Yang et al. A two-stage training framework with feature-label matching mechanism for learning from label proportions