CN114186684A - Multitask model training method, multitask model training system, multitask model training medium and electronic terminal - Google Patents

Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Info

Publication number
CN114186684A
Authority
CN
China
Prior art keywords: layer, classification layer, training, parameter, newly added
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111522799.7A
Other languages
Chinese (zh)
Inventor
蒋宏达
陈家豪
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202111522799.7A
Publication of CN114186684A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a multitask model training method, system, medium and electronic terminal. The method comprises the following steps: adding a newly added classification layer corresponding to a newly added task into an original multi-task model to obtain an intermediate model, wherein the intermediate model comprises a parameter layer, an original classification layer and the newly added classification layer; freezing the parameter layer and the original classification layer; inputting a newly added training sentence into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the real classification result; unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; and performing joint training on all layers of the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model. The model training efficiency is thereby improved.

Description

Multitask model training method, multitask model training system, multitask model training medium and electronic terminal
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a multitask model training method, a multitask model training system, a multitask model training medium and an electronic terminal.
Background
With the development of natural language processing technology, multitask models are increasingly widely applied. However, as time goes by, tasks in new domains constantly emerge. At present, when a new task appears, the original old model usually has to be discarded, the new knowledge and the old knowledge are integrated, and a multi-task model is retrained. As a result, model training efficiency is low, faster model iteration cannot be supported, and new knowledge cannot be continuously learned while the old knowledge is retained.
For example, in the life insurance quality inspection task, the identification of non-compliant speech is often carried out manually or by a machine, i.e., determining whether the agent uses non-compliant speech when communicating with a customer. Such non-compliant speech may involve multiple domains, and new types of non-compliant speech appear over time and need to be identified. The prior art has to integrate the new knowledge with the old knowledge and retrain the multi-task model to meet the identification requirement for the new non-compliant speech, so the iteration efficiency of the model is low.
Disclosure of Invention
The invention provides a multitask model training method, system, medium and electronic terminal, and aims to solve the problem that, in the prior art, when a new task appears, new knowledge cannot be continuously learned while the old knowledge is retained; instead, the new knowledge and the old knowledge have to be integrated and the multitask model retrained, so the model training efficiency and the iteration efficiency are low.
The invention provides a multi-task model training method, which comprises the following steps:
acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
freezing the parameter layer and the original classification layer;
inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer;
performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer;
and performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model.
Optionally, the step of freezing the parameter layer and the original classification layer includes:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
Optionally, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
Optionally, the step of adding a parameter filter to the optimizer of the intermediate model is followed by:
when a newly added classification layer is trained for one time, a parameter filter is controlled to filter trainable variables in the parameter layer and an original classification layer according to a preset filtering rule, and further the newly added classification layer is trained for one time;
the filtering rule includes: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable so that it remains unchanged.
Optionally, the step of unfreezing the parameter layer and the original classification layer includes:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are the unfreezing attribute, and obtaining a second judgment result, thereby completing the unfreezing of the parameter layer and the original classification layer.
Optionally, the step of performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result includes:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
according to the first loss and the second loss, performing combined training on a parameter layer, a newly-added classification layer after secondary training and an original classification layer after primary training in the intermediate model;
the mathematical expression of the second loss function is:
Figure BDA0003408392360000031
wherein the content of the first and second substances,
Figure BDA0003408392360000032
is a second loss function, y'oA second prediction result output for the original classification layer frozen in the one-time prediction process,
Figure BDA0003408392360000033
and l is a third prediction result output by the unfrozen original classification layer in the secondary prediction process, and the prediction times of the unfrozen original classification layer.
Optionally, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
according to the first loss and the second loss, performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model by using a preset third loss function, wherein the third loss function is

$\bigl(\theta_s^{*}, \theta_0^{*}, \theta_n^{*}\bigr) = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\Bigl(\lambda_0\,\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) + \lambda_1\,\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr)\Bigr)$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the expression, $\lambda_0$ is a preset first weight, $\lambda_1$ is a preset second weight, $\mathcal{L}_{old}$ is the second loss, i.e. the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, i.e. the loss of the newly added classification layer in the secondary prediction process.
The invention also provides a multi-task model training system, comprising:
the newly added task module, which is used for acquiring a newly added task, adding a newly added classification layer into the original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
the newly added task module, the first training module and the second training module are connected.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method as defined in any one of the above.
The present invention also provides an electronic terminal, comprising: a processor and a memory;
the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the terminal to perform the method as defined in any one of the above.
The invention has the following beneficial effects. In the multi-task model training method, system, medium and electronic terminal, a newly added task is acquired and a newly added classification layer corresponding to the newly added task is added into the original multi-task model to obtain an intermediate model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer is obtained, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus continuously learned and the original multi-task model is iterated while the old knowledge is kept, so the training efficiency and iteration efficiency of the model are high and the accuracy of the model is high. It is to be understood that new knowledge refers to the newly added task and old knowledge refers to the original tasks of the original multitask model.
Drawings
FIG. 1 is a flowchart illustrating a method for training a multitask model according to an embodiment of the present invention.
FIG. 2 is a schematic flow chart illustrating freezing of a parameter layer and an original classification layer in the multi-task model training method according to the embodiment of the present invention.
Fig. 3 is a schematic flow chart of performing one training on a newly added classification layer in the multi-task model training method in the embodiment of the present invention.
Fig. 4 is a schematic flow chart illustrating the process of unfreezing the parameter layer and the original classification layer in the multi-task model training method according to the embodiment of the present invention.
Fig. 5 is a schematic flow chart of joint training of all layers of the intermediate model in the multi-task model training method in the embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a multitask model training system according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of an electronic terminal for multitask model training in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
As shown in fig. 1, the multi-task model training method in this embodiment includes:
S11: acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises a parameter layer, an original classification layer and the newly added classification layer. The newly added task refers to a newly appeared task and comprises newly added training sentences and the real classification results corresponding to the newly added training sentences. The original multi-task model is an original model used for predicting a plurality of tasks; it comprises a parameter layer and an original classification layer, wherein the original classification layer is the classification layer already present in the original multi-task model, and the parameter layer consists of the layers other than the original classification layer in the original multi-task model. The newly added task corresponds to the newly added classification layer, and after the newly added classification layer is added into the original multi-task model, an intermediate model is formed, which comprises the parameter layer, the original classification layer and the newly added classification layer. By acquiring the newly added task and adding the corresponding newly added classification layer into the original multi-task model, new knowledge, i.e. new tasks, can subsequently be learned continuously on the basis of the original multi-task model, which improves the iteration and update efficiency of the multi-task model. For example, suppose the original tasks in the original multi-task model include "confusing financial products" and the like, and the newly added task is "misleading the customer that insurance costs nothing". A classification layer corresponding to this newly added task is added to the original multi-task model to obtain an intermediate model comprising the parameter layer, the original classification layers corresponding to the original tasks and the newly added classification layer corresponding to the newly added task; training and iteration of the multi-task model are then performed on this basis, improving the iteration efficiency of the multi-task model.
It should be understood that a multitask model refers to a model that learns a plurality of tasks at the same time, as opposed to a single-task model. Its structure generally includes a parameter layer with a Transformer structure and N classification layers corresponding to the tasks, where the Transformer is an attention-based Encoder-Decoder structure. As time goes on, new tasks keep appearing, and when a task in a new field appears, the prior art generally needs to integrate the old knowledge and the new knowledge and retrain the multi-task model, which entails a huge training cost and a long training time.
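The later references to a requires_grad attribute and a parameter filter suggest a PyTorch-style implementation; the sketch below illustrates what such an intermediate model could look like, assuming a shared Transformer encoder as the parameter layer and one linear classification head per task. The class name, the Hugging Face AutoModel backbone and all dimensions are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # assumed backbone; any sentence encoder works

class MultiTaskModel(nn.Module):
    """Shared parameter layer plus one classification head per task (illustrative)."""
    def __init__(self, encoder_name="bert-base-chinese", task_num_labels=None):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # parameter layer
        hidden = self.encoder.config.hidden_size
        # original classification layers, one per existing task
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden, n) for task, n in (task_num_labels or {}).items()
        })

    def add_task(self, task_name, num_labels):
        """Add a newly added classification layer for a newly added task (step S11)."""
        hidden = self.encoder.config.hidden_size
        self.heads[task_name] = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # [CLS] representation from the shared parameter layer
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        # every head predicts, so both the original and the new layers produce outputs
        return {task: head(h) for task, head in self.heads.items()}
```

Calling add_task on a trained multi-task model yields the intermediate model of step S11, and a single forward pass returns the outputs of every head, i.e. the prediction results of both the original and the newly added classification layers.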
S12: freezing the parameter layer and the original classification layer; specifically, freezing the parameter layer and the original classification layer in the intermediate model means that in a subsequent training process, trainable variables of the parameter layer and the original classification layer in the intermediate model are not updated, that is, only participate in forward loss calculation, and do not participate in backward propagation. By freezing the parameter layer and the original classification layer in the intermediate model, the newly added task layer can be conveniently and independently trained in the subsequent one-time training process, and the accuracy of the newly added classification layer is improved.
S13: inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer. That is, the newly added training sentence in the newly added task is input into the intermediate model, and primary prediction is performed on the newly added training sentence by the newly added classification layer and the original classification layer respectively, to obtain the first prediction result output by the newly added classification layer and the second prediction result output by the original classification layer. Obtaining the first prediction result output by the newly added classification layer makes it convenient to subsequently train the newly added classification layer independently according to the first prediction result, improving its classification accuracy. Because the original classification layer has already been trained, the second prediction result it outputs can be regarded as a good prediction, so that knowledge distillation can be performed on the original classification layer after it is subsequently unfrozen by using the second prediction result, which improves the accuracy of the original classification layer.
S14: performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task; the newly added classification layer is iteratively trained by acquiring the difference between the first prediction result and the corresponding real classification result in the newly added task, so that the independent training of the newly added classification layer is realized, and the accuracy of the newly added classification layer is improved.
S15: unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; after one-time training is finished, unfreezing the parameter layer and the original classification layer in the intermediate model, enabling trainable variables in the parameter layer and the original classification layer to participate in updating and back propagation in the subsequent training process, inputting a newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer.
S16: performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain the final multi-task model. By obtaining the difference between the second prediction result and the third prediction result, knowledge distillation is carried out on the unfrozen original classification layer, which improves the accuracy of the original classification layer and allows new knowledge to be learned while the recognition capability for the old knowledge is retained. All layers in the intermediate model, namely the parameter layer, the original classification layer and the newly added classification layer, are jointly trained according to the difference between the second prediction result and the third prediction result, and a better final multi-task model is obtained. New knowledge is thus learned on the basis of keeping the old knowledge, the training efficiency and iteration efficiency of the multi-task model are effectively improved, the model accuracy is higher, and the model training cost is greatly reduced, because it is no longer necessary to integrate the new knowledge with the old knowledge and retrain a multi-task model from the integrated knowledge whenever a new task occurs.
In some embodiments, the number of the newly added tasks may be one or more, when there are a plurality of newly added tasks, the newly added tasks are sorted and labeled, and the steps S11, S12, S13, S14, S15, and S16 are repeated according to the sequence of the newly added tasks, so that iteration and update of the multi-task model are completed, the iteration efficiency and the training efficiency of the multi-task model are effectively improved, the model training cost is greatly saved, the model accuracy is high, and the implementability is strong.
In some embodiments, when a newly added task occurs again, the steps S11, S12, S13, S14, S15 and S16 are repeated to complete iteration and updating of the multi-task model, so that the model is high in accuracy, convenient to implement and low in cost.
As shown in fig. 2, in order to better implement freezing of the parameter layer and the original classification layer and ensure that the parameter layer and the original classification layer do not participate in parameter updating and iteration any more in a subsequent training process, the inventor proposes that the step of freezing the parameter layer and the original classification layer includes:
S121: updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute; that is, the parameter attributes of the trainable variables of the parameter layer and the original classification layer are changed to the preset freezing attribute, for example, the parameter attribute (requires_grad) of the trainable variables of the parameter layer and the original classification layer is set to False.
S122: adding a parameter filter (filter) to an optimizer of the intermediate model;
S123: traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
S124: completing the freezing of the parameter layer and the original classification layer according to the first judgment result. That is, in order to further ensure that all trainable variables of the parameter layer and the original classification layer are frozen, it is judged whether the parameter attributes of these trainable variables are the freezing attribute False. If they all are, the freezing of the trainable variables of the parameter layer and the original classification layer is completed; if the parameter attribute of any trainable variable in the parameter layer and/or the original classification layer is not the freezing attribute False, the parameter attribute of the corresponding trainable variable is updated to the freezing attribute, and the freezing is completed once all parameter attributes are the freezing attribute. This ensures that, during the one-time training of the newly added classification layer, i.e. its independent training, the trainable variables of the parameter layer and the original classification layer no longer participate in back propagation and updating, which improves the accuracy of model training.
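A minimal PyTorch-style sketch of steps S121 to S124 is given below, building on the MultiTaskModel sketch above; the helper name freeze_old_layers is an illustrative assumption rather than terminology from the patent.

```python
def freeze_old_layers(model, new_task):
    """S121: set the freezing attribute (requires_grad = False) on the trainable
    variables of the parameter layer and the original classification layers;
    S123/S124: traverse them again and confirm they are all frozen."""
    frozen = []
    for name, param in model.named_parameters():
        if not name.startswith(f"heads.{new_task}"):   # keep the newly added head trainable
            param.requires_grad = False                # freezing attribute
            frozen.append(param)
    # first judgment result: every targeted trainable variable carries the freezing attribute
    assert all(not p.requires_grad for p in frozen), "some variables were not frozen"
    return frozen
```

After this call, only the newly added classification layer still has trainable parameters, matching the intent of step S12.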
As shown in fig. 3, in some embodiments, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
S141: training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer. In other words, during the one-time training, the newly added classification layer is iteratively trained with the preset first loss function, which effectively improves the prediction accuracy of the newly added classification layer.
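The first loss function appears only as an equation image in the original publication. A standard choice that is consistent with the surrounding description, stated here as an assumption rather than as the patent's exact formula, is the cross-entropy between the real classification result and the new head's prediction:

```latex
\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr) \;=\; -\sum_{i} y_n^{(i)} \,\log \hat{y}_n^{(i)}
```

where $y_n^{(i)}$ is the one-hot true label and $\hat{y}_n^{(i)}$ the predicted probability for class $i$ of the newly added task.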
In some embodiments, when the newly added classification layer is trained once, the parameter filter is controlled to filter the trainable variables in the parameter layer and the original classification layer according to a preset filtering rule, and further, the newly added classification layer is trained once;
the filtering rule includes: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable so that it remains unchanged. For example, it is judged whether the parameter attribute of a trainable variable of the intermediate model is the freezing attribute False; if so, the trainable variable is filtered out and its value is kept unchanged. In this way, the trainable variables of the parameter layer and the original classification layer are frozen during the one-time training and do not participate in parameter updating and iteration, so that the newly added classification layer is trained independently, new knowledge is learned on the basis of the original knowledge, i.e. the old knowledge, and the efficiency of model training and iteration is improved.
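Under the same PyTorch assumptions, the parameter filter of step S122 and the filtering rule above can be realized by handing the optimizer only the variables whose parameter attribute is not the freezing attribute; the task name and hyperparameters below are illustrative.

```python
import torch

# S122 + filtering rule: variables whose requires_grad is False (the freezing
# attribute) are filtered out, so they keep their values unchanged while the
# newly added classification layer is trained once.
freeze_old_layers(model, new_task="mislead_insurance_costs_nothing")
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),  # parameter filter
    lr=2e-5,
)
```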
As shown in fig. 4, in order to improve the thawing efficiency of the parameter layer and the original classification layer, and at the same time, in order to avoid errors in the thawing process, the inventors propose that the step of thawing the parameter layer and the original classification layer comprises:
S151: secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer according to a preset unfreezing rule; that is, the parameter attributes of the trainable variables in the parameter layer and the original classification layer are changed to the preset unfreezing attribute, for example, the parameter attribute (requires_grad) of the trainable variables of the parameter layer and the original classification layer is changed to True, which completes the secondary updating of the trainable variables of the parameter layer and the original classification layer.
S152: traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are the unfreezing attribute, and obtaining a second judgment result, thereby completing the unfreezing of the parameter layer and the original classification layer. In order to further ensure that the trainable variables of the parameter layer and the original classification layer are unfrozen, it is judged whether their parameter attributes are the unfreezing attribute True. If they all are, the unfreezing of the trainable variables of the parameter layer and the original classification layer is completed; if the parameter attribute of any trainable variable in the parameter layer and/or the original classification layer is not the unfreezing attribute True, the parameter attribute of the corresponding trainable variable is updated to the unfreezing attribute, and the unfreezing is completed once all parameter attributes are the unfreezing attribute. This ensures that the trainable variables in the parameter layer and the original classification layer participate in back propagation and parameter updating in the subsequent training, so that new knowledge is continuously learned on the basis of the old knowledge.
Preferably, in the secondary training process, the parameter filter in the intermediate model can be deleted or removed, so that the model operation load is reduced, and the model operation efficiency is improved.
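A corresponding sketch of steps S151 and S152, again under the assumptions above, re-enables gradients for every layer and rebuilds the optimizer without the parameter filter for the secondary (joint) training; the helper name unfreeze_all_layers is illustrative.

```python
def unfreeze_all_layers(model):
    """S151: set the unfreezing attribute (requires_grad = True) on all trainable
    variables; S152: traverse them and confirm every variable is unfrozen."""
    for param in model.parameters():
        param.requires_grad = True                     # unfreezing attribute
    # second judgment result: every trainable variable carries the unfreezing attribute
    assert all(p.requires_grad for p in model.parameters()), "some variables stayed frozen"

unfreeze_all_layers(model)
# The parameter filter is no longer needed, so the optimizer is rebuilt over all parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```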
In some embodiments, the step of jointly training all layers in the intermediate model according to the second prediction result and the third prediction result comprises:
and performing joint training on all layers of the intermediate model by obtaining the difference between the second prediction result and the third prediction result to obtain a final multi-task model.
In order to further improve the accuracy of the final multitask model, the inventor proposes that, in the joint training process, in addition to obtaining the difference between the second prediction result and the third prediction result, the loss of the newly added classification layer is also obtained, and all layers of the intermediate model are then jointly trained by using the loss of the newly added classification layer and the distillation loss of the original classification layer. As shown in fig. 5, the step of performing joint training on all layers of the intermediate model by using the loss of the newly added classification layer and the distillation loss of the original classification layer includes:
s161: acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training; namely, a difference between the fourth prediction result and the corresponding real classification result is obtained by using a preset first loss function, a first loss is obtained, secondary training is performed on the newly added classification layer according to the first loss, the prediction accuracy of the newly added classification layer is improved, the degree of the secondary training performed on the newly added classification layer can be set according to the actual situation, and details are not repeated here.
S162: distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer; namely, the second loss function is utilized to obtain the difference between the second prediction result and the third prediction result, so as to obtain the second loss, the original classification layer is distilled according to the second loss, so as to realize one-time training of the original classification layer, and the degree of one-time training of the original classification layer can be set according to the actual situation, which is not repeated here.
S163: according to the first loss and the second loss, performing combined training on a parameter layer, a newly-added classification layer after secondary training and an original classification layer after primary training in the intermediate model; and performing joint training on all layers of the intermediate model by combining the first loss and the second loss to obtain a final multi-task model with higher accuracy.
The second loss function is

$\mathcal{L}_{old}\bigl(y'_o, \hat{y}'_o\bigr)$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the frozen original classification layer in the primary prediction process, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer in the secondary prediction process, and $l$ is the number of predictions of the unfrozen original classification layer. By distilling the original classification layer with the above second loss function, the prediction accuracy of the original classification layer can be effectively improved, and the distillation efficiency is good.
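The second loss function is likewise given only as an equation image in the original publication. A form consistent with the description, offered here as an assumption, is a knowledge-distillation cross-entropy that keeps the unfrozen original classification layer's outputs close to the responses it produced while frozen:

```latex
\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) \;=\; -\sum_{i=1}^{l} y_o'^{(i)} \,\log \hat{y}_o'^{(i)}
```

where $y_o'^{(i)}$ and $\hat{y}_o'^{(i)}$ are the $i$-th components of the second and third prediction results; applying temperature scaling to both distributions, as is common in knowledge distillation, would be a further assumption.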
In some embodiments, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
S1631: according to the first loss and the second loss, performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model by using a preset third loss function, wherein the third loss function is

$\bigl(\theta_s^{*}, \theta_0^{*}, \theta_n^{*}\bigr) = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\Bigl(\lambda_0\,\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) + \lambda_1\,\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr)\Bigr)$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the expression, $\lambda_0$ is a preset first weight, $\lambda_1$ is a preset second weight, $\mathcal{L}_{old}$ is the second loss, i.e. the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, i.e. the loss of the newly added classification layer in the secondary prediction process. By adopting the third loss function, all layers of the intermediate model are jointly trained and iteratively updated, the accuracy and precision of the obtained final multi-task model are further improved, the training efficiency of the multi-task model is greatly improved, a new model does not need to be retrained, and new knowledge is learned on the basis of the old knowledge.
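For illustration, a single secondary-training step can be sketched as follows under the same PyTorch assumptions, with a cross-entropy term standing in for the first loss and a KL-divergence distillation term standing in for the second loss (both assumptions, as noted above); the weights λ0 and λ1 are chosen arbitrarily.

```python
import torch.nn.functional as F

lambda0, lambda1 = 1.0, 1.0   # preset first and second weights (illustrative values)

def joint_training_step(model, optimizer, batch, new_task, frozen_old_logits):
    """One step of the joint training: fit the newly added head to the real labels
    and distill the original heads toward the predictions recorded while frozen."""
    outputs = model(batch["input_ids"], batch["attention_mask"])      # all heads predict
    # first loss: newly added classification layer vs. real classification result
    loss_new = F.cross_entropy(outputs[new_task], batch["labels"])
    # second loss: original classification layers vs. their frozen (second) predictions
    loss_old = sum(
        F.kl_div(F.log_softmax(outputs[task], dim=-1),
                 F.softmax(frozen_old_logits[task], dim=-1),
                 reduction="batchmean")
        for task in frozen_old_logits
    )
    loss = lambda0 * loss_old + lambda1 * loss_new                    # joint (third) objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```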
The first embodiment is as follows:
In the application scenario of life insurance quality inspection, it is usually necessary to determine whether non-compliant speech is involved in the process of communicating with the client. As time goes on, new non-compliant speech usually appears, and if the original multi-task model is used for the judgment, the corresponding non-compliant speech often cannot be accurately identified. In the prior art, the new knowledge corresponding to the new task (newly added training sentences and the corresponding real classification results) and the old knowledge corresponding to the old tasks (the training data adopted by the original multi-task model, including the original training samples and the corresponding real prediction results) are usually integrated, and a new multi-task model is retrained; the training difficulty is high, the operation is complex, the training time is long, the cost is high, and the model training efficiency is low. Therefore, in this embodiment, an intermediate model is obtained by adding a newly added classification layer corresponding to the new task to the original multitask model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer in the intermediate model are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. After the primary training is finished, the parameter layer and the original classification layer are unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, and a third prediction result output by the original classification layer and a fourth prediction result output by the newly added classification layer are obtained. All layers in the intermediate model are jointly trained according to the second prediction result, the third prediction result, the fourth prediction result and the real classification result corresponding to the fourth prediction result to obtain the final multi-task model, so that iterative training and updating of the original multi-task model are realized without integrating the new knowledge and the old knowledge and retraining a multi-task model, which saves model training time.
For example, when the newly added task is judging the non-compliant speech "misleading the customer that insurance costs nothing", a corresponding newly added classification layer θ_1 is added to the original multi-task model according to the newly added task, and an intermediate model is obtained, the intermediate model comprising the newly added classification layer θ_1, the parameter layer θ_s and the original classification layer θ_0. The parameter layer θ_s and the original classification layer θ_0 are frozen so that they do not participate in back propagation and iteration during the subsequent one-time training of the newly added classification layer. The newly added training sentences corresponding to the newly added task are input into the intermediate model for primary prediction, and the first prediction result output by the newly added classification layer θ_1 and the second prediction result output by the original classification layer θ_0 are obtained. The newly added classification layer is trained once according to the first prediction result and the corresponding real classification result; because the parameter layer θ_s and the original classification layer θ_0 are frozen in this process, they do not participate in parameter updating. After the independent training of the newly added classification layer θ_1 is finished, the parameter layer θ_s and the original classification layer θ_0 are unfrozen, and the newly added training sentences of the newly added task are input into the intermediate model for secondary prediction to obtain the third prediction result output by the original classification layer θ_0 and the fourth prediction result output by the newly added classification layer θ_1. All layers in the intermediate model are then jointly trained according to the difference between the second prediction result and the third prediction result and the difference between the fourth prediction result and the corresponding real classification result, and a better final multi-task model is obtained. Training of a new task is thus realized on the basis of the original multi-task model, new knowledge is learned on the basis of the old knowledge, the training efficiency and accuracy are higher, and the model prediction effect is better.
The second embodiment is as follows:
In the application scenario of intelligent telephone answering or public opinion judgment, when a task of judging new non-compliant or target speech is needed, an intermediate model is obtained by adding a newly added classification layer corresponding to the newly added task into the original multi-task model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer in the intermediate model are frozen, a newly added training sentence of the newly added task is input into the intermediate model for primary prediction, and a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained. The newly added classification layer is trained once using the difference between the first prediction result and the corresponding real classification result. After the independent training of the newly added classification layer is finished, the parameter layer and the original classification layer are unfrozen so that they participate in parameter updating and iteration in the subsequent training. The newly added training sentence is then input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer and a fourth prediction result output by the newly added classification layer are obtained, and all layers in the intermediate model are jointly trained according to the second prediction result, the third prediction result, the fourth prediction result and the real classification result corresponding to the fourth prediction result, obtaining a final multi-task model with higher accuracy. In this way, when a new task appears, the new knowledge corresponding to the newly added task can be trained on the basis of keeping the old knowledge of the original multi-task model, which effectively improves the training efficiency of the multi-task model.
As shown in fig. 6, this embodiment further provides a multitask model training system, which includes:
the newly added task module, which is used for acquiring a newly added task, adding a newly added classification layer into the original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
The newly added task module, the first training module and the second training module are connected. In the multi-task model training system of this embodiment, a newly added task is acquired and a newly added classification layer corresponding to the newly added task is added into the original multi-task model to obtain an intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer is obtained, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus continuously learned while the old knowledge is kept and iteration of the original multi-task model is completed, so the training efficiency and iteration efficiency of the model are high, the model accuracy is high, the cost is low, and the practicability is strong.
In some embodiments, freezing the parameter layer and the original classification layer comprises:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
In some embodiments, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
In some embodiments, the step of adding a parametric filter in the optimizer of the intermediate model is followed by:
when a newly added classification layer is trained for one time, a parameter filter is controlled to filter trainable variables in the parameter layer and an original classification layer according to a preset filtering rule, and further the newly added classification layer is trained for one time;
the filtering rule comprises: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable and keeping it unchanged.
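A sketch of the parameter filter on the optimizer is shown below, under the same PyTorch-style assumptions as earlier; the filter simply drops every trainable variable whose attribute marks it as frozen, so an optimizer step leaves the parameter layer and the original classification layer unchanged.

```python
import torch
import torch.nn as nn

param_layer, old_head, new_head = nn.Linear(32, 128), nn.Linear(128, 5), nn.Linear(128, 3)
for p in list(param_layer.parameters()) + list(old_head.parameters()):
    p.requires_grad = False                               # frozen attribute

def filter_params(modules):
    """Filtering rule: keep a trainable variable only if it is not marked frozen."""
    return [p for m in modules for p in m.parameters() if p.requires_grad]

optimizer = torch.optim.Adam(filter_params([param_layer, old_head, new_head]), lr=1e-3)
# optimizer.step() now updates only the newly added classification layer.
```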
In some embodiments, the step of unfreezing the parameter layer and the original classification layer comprises:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
and traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are all the thawing attribute, and acquiring a second judgment result so as to complete the thawing of the parameter layer and the original classification layer.
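The unfreezing step can be sketched in the same assumed style: the parameter attribute is updated a second time back to trainable and the second judgment verifies the change.

```python
import torch.nn as nn

param_layer, old_head = nn.Linear(32, 128), nn.Linear(128, 5)
for m in (param_layer, old_head):          # state left by the earlier freezing step
    for p in m.parameters():
        p.requires_grad = False

for m in (param_layer, old_head):          # secondary update according to the thaw rule
    for p in m.parameters():
        p.requires_grad = True

# Second judgment result: thawing of both layers is complete.
second_judgment = all(p.requires_grad for m in (param_layer, old_head) for p in m.parameters())
assert second_judgment
```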
In some embodiments, the step of jointly training all layers in the intermediate model according to the second prediction result and the third prediction result comprises:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
and performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss;
the mathematical expression of the second loss function is:
$$\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) = -\sum_{i=1}^{l} y_o^{\prime(i)} \log \hat{y}_o^{\prime(i)}$$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the original classification layer while frozen during the primary prediction, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer during the secondary prediction, and $l$ is the number of predictions made by the unfrozen original classification layer.
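A hedged sketch of this distillation of the original classification layer follows, under the common knowledge-distillation assumption: the frozen original layer's primary-prediction output serves as a soft target for the unfrozen original layer's secondary-prediction output. Names and the soft-target construction are illustrative.

```python
import torch
import torch.nn.functional as F

second_pred = torch.randn(4, 5)                        # y'_o: frozen original layer, primary prediction
third_pred = torch.randn(4, 5, requires_grad=True)     # third prediction: unfrozen original layer

soft_targets = F.softmax(second_pred, dim=-1)          # assumed soft targets from the frozen layer
second_loss = -(soft_targets * F.log_softmax(third_pred, dim=-1)).sum(dim=-1).mean()
second_loss.backward()                                 # trains the original classification layer
```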
In some embodiments, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss, the second loss and a preset third loss function, wherein the mathematical expression of the third loss function is as follows:
$$\theta_s^{*},\, \theta_0^{*},\, \theta_n^{*} = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\left(\lambda_0\,\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) + \lambda_1\,\mathcal{L}_{new}\left(y_n, \hat{y}_n\right)\right)$$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the bracketed expression, $\lambda_0$ is the preset first weight, $\lambda_1$ is the preset second weight, $\mathcal{L}_{old}$ is the second loss, namely the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, namely the loss of the newly added classification layer in the secondary prediction process.
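The joint objective can be sketched as a weighted sum of the two losses; the weight values below are placeholder assumptions, and the scalar placeholders stand in for losses that would normally be computed from the model.

```python
import torch

lambda_0, lambda_1 = 1.0, 1.0                           # preset first and second weights (assumed values)
second_loss = torch.tensor(0.8, requires_grad=True)     # placeholder for the distillation (second) loss
first_loss = torch.tensor(1.2, requires_grad=True)      # placeholder for the new-layer (first) loss

third_loss = lambda_0 * second_loss + lambda_1 * first_loss   # assumed form of the third loss
third_loss.backward()   # in the full model, gradients reach the parameter, original and new layers
```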
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.
The present embodiment further provides an electronic terminal, including: a processor and a memory;
the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.
Fig. 7 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention. This embodiment provides an electronic terminal, including: a processor 71, a memory 72, a communicator 73, a communication interface 74, and a system bus 75; the memory 72 and the communication interface 74 are connected with the processor 71 and the communicator 73 through the system bus 75 for mutual communication, the memory 72 is used for storing a computer program, the communication interface 74 is used for communicating with other devices, and the processor 71 and the communicator 73 are used for running the computer program so that the electronic terminal can execute the steps of the multi-task model training method described above.
The system bus 75 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library).
In this embodiment, the Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The computer-readable storage medium in this embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device, for example through the Internet using an Internet service provider.
The embodiments of the application may acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain the best results. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
In summary, in the multitask model training method, system, medium and electronic terminal of this embodiment, a newly added task is obtained, and a newly added classification layer corresponding to the newly added task is added to the original multi-task model to obtain an intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, the newly added training sentences of the newly added task are input into the intermediate model for primary prediction, and a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained; the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentences are input into the intermediate model for secondary prediction to obtain a third prediction result output by the original classification layer, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus learned continuously while old knowledge is retained, the iteration of the original multi-task model is completed, the training efficiency and iteration efficiency of the model are high, the model accuracy is high, and the iteration cost of the multi-task model is effectively reduced.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical concept disclosed in the present invention shall still be covered by the claims of the present invention.

Claims (10)

1. A multitask model training method is characterized by comprising the following steps:
acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and further acquiring an intermediate model, wherein the intermediate model comprises: the device comprises a parameter layer, an original classification layer and a newly added classification layer;
freezing the parameter layer and the original classification layer;
inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer;
performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer;
and performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model.
2. The multitask model training method of claim 1, wherein the step of freezing the parameter layer and the original classification layer comprises:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and acquiring a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
3. The multi-task model training method of claim 1, wherein the step of performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task comprises:
training the newly added classification layer according to the first prediction result, a corresponding real classification result in the newly added task and a preset first loss function, wherein the mathematical expression of the first loss function is as follows:
$$\mathcal{L}_{new}\left(y_n, \hat{y}_n\right) = -\, y_n \cdot \log \hat{y}_n$$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
4. The multitask model training method according to claim 2, wherein the step of adding a parameter filter to the optimizer of the intermediate model is followed by:
when the newly added classification layer undergoes primary training, controlling the parameter filter to filter the trainable variables in the parameter layer and the original classification layer according to a preset filtering rule, so that the primary training updates only the newly added classification layer;
the filtering rule comprises: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable and keeping it unchanged.
5. The multitask model training method according to claim 1, wherein the step of unfreezing the parameter layer and the original classification layer comprises:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
and traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are all the thawing attribute, and acquiring a second judgment result so as to complete the thawing of the parameter layer and the original classification layer.
6. The method for training a multitask model according to claim 1, wherein the step of jointly training all layers in said intermediate model according to said second predicted result and said third predicted result includes:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
and performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss;
the mathematical expression of the second loss function is:
$$\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) = -\sum_{i=1}^{l} y_o^{\prime(i)} \log \hat{y}_o^{\prime(i)}$$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the original classification layer while frozen during the primary prediction, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer during the secondary prediction, and $l$ is the number of predictions made by the unfrozen original classification layer.
7. The method for training the multitask model according to claim 6, wherein the step of performing the joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss comprises:
performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss, the second loss and a preset third loss function, wherein the mathematical expression of the third loss function is as follows:
$$\theta_s^{*},\, \theta_0^{*},\, \theta_n^{*} = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\left(\lambda_0\,\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) + \lambda_1\,\mathcal{L}_{new}\left(y_n, \hat{y}_n\right)\right)$$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the bracketed expression, $\lambda_0$ is the preset first weight, $\lambda_1$ is the preset second weight, $\mathcal{L}_{old}$ is the second loss, namely the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, namely the loss of the newly added classification layer in the secondary prediction process.
8. A multitask model training system, comprising:
and the newly-added task module is used for acquiring a newly-added task, adding a newly-added classification layer into the original multi-task model according to the newly-added task, and further acquiring an intermediate model, wherein the intermediate model comprises: the device comprises a parameter layer, an original classification layer and a newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
the newly added task module, the first training module and the second training module are connected.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is for storing a computer program and the processor is for executing the computer program stored by the memory to cause the terminal to perform the method of any of claims 1 to 7.
CN202111522799.7A 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal Pending CN114186684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111522799.7A CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111522799.7A CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Publications (1)

Publication Number Publication Date
CN114186684A true CN114186684A (en) 2022-03-15

Family

ID=80604860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111522799.7A Pending CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Country Status (1)

Country Link
CN (1) CN114186684A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817295A (en) * 2022-04-20 2022-07-29 平安科技(深圳)有限公司 Multi-table Text2sql model training method, system, device and medium
CN114817295B (en) * 2022-04-20 2024-04-05 平安科技(深圳)有限公司 Multi-table Text2sql model training method, system, device and medium

Similar Documents

Publication Publication Date Title
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
US20210303970A1 (en) Processing data using multiple neural networks
CN111523640B (en) Training method and device for neural network model
US11922281B2 (en) Training machine learning models using teacher annealing
JP2022508091A (en) Dynamic reconstruction training computer architecture
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
Uteuliyeva et al. Fourier neural networks: A comparative study
CN113554175B (en) Knowledge graph construction method and device, readable storage medium and terminal equipment
Demidov et al. Application model of modern artificial neural network methods for the analysis of information systems security
US20190228297A1 (en) Artificial Intelligence Modelling Engine
KR20200029351A (en) Sample processing method and device, related apparatus and storage medium
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN112529071A (en) Text classification method, system, computer equipment and storage medium
CN114186684A (en) Multitask model training method, multitask model training system, multitask model training medium and electronic terminal
Teijema et al. Active learning-based Systematic reviewing using switching classification models: the case of the onset, maintenance, and relapse of depressive disorders
CN114328277A (en) Software defect prediction and quality analysis method, device, equipment and medium
Dekhovich et al. Neural network relief: a pruning algorithm based on neural activity
CN116737939B (en) Meta learning method, text classification device, electronic equipment and storage medium
CN116680401A (en) Document processing method, document processing device, apparatus and storage medium
US20230097940A1 (en) System and method for extracting and using groups of features for interpretability analysis
US20230012316A1 (en) Automation of leave request process
CN113706347A (en) Multitask model distillation method, multitask model distillation system, multitask model distillation medium and electronic terminal
CN114723989A (en) Multitask learning method and device and electronic equipment
CN116805384A (en) Automatic searching method, automatic searching performance prediction model training method and device
CN114238798A (en) Search ranking method, system, device and storage medium based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination