CN114186684A - Multitask model training method, multitask model training system, multitask model training medium and electronic terminal - Google Patents

Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Info

Publication number
CN114186684A
Authority
CN
China
Prior art keywords: layer, classification layer, training, parameter, newly added
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111522799.7A
Other languages
Chinese (zh)
Inventor
蒋宏达
陈家豪
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202111522799.7A
Publication of CN114186684A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a multitask model training method, system, medium and electronic terminal. The method comprises the following steps: adding a newly added classification layer corresponding to a newly added task into an original multi-task model to obtain an intermediate model, wherein the intermediate model comprises a parameter layer, an original classification layer and the newly added classification layer; freezing the parameter layer and the original classification layer; inputting a newly added training sentence into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the real classification result; unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; and performing joint training on all layers of the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model. The model training efficiency is thereby improved.

Description

Multitask model training method, multitask model training system, multitask model training medium and electronic terminal
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a multitask model training method, a multitask model training system, a multitask model training medium and an electronic terminal.
Background
With the development of natural language processing technology, multitask models are increasingly widely applied. However, as time goes by, tasks in new domains constantly emerge. At present, when a new task appears, the original old model usually has to be discarded, the new knowledge and the old knowledge are integrated, and a multi-task model is retrained. As a result, model training efficiency is low, faster model iteration cannot be supported, and new knowledge cannot be continuously learned while the old knowledge is retained.
For example, in the life insurance quality inspection task, the identification of non-compliant speech is often carried out manually or by a machine, i.e., determining whether the agent uses non-compliant speech when communicating with a customer. Such non-compliant speech may involve multiple domains, and new types of non-compliant speech appear over time and need to be identified. The prior art has to integrate the new knowledge with the old knowledge and retrain the multi-task model to meet the identification requirement for the new non-compliant speech, so the iteration efficiency of the model is low.
Disclosure of Invention
The invention provides a multitask model training method, system, medium and electronic terminal, and aims to solve the problem that, in the prior art, when a new task appears, new knowledge cannot be continuously learned while the old knowledge is retained; instead, the new knowledge and the old knowledge have to be integrated and the multitask model retrained, so the model training efficiency and the iteration efficiency are low.
The invention provides a multi-task model training method, which comprises the following steps:
acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
freezing the parameter layer and the original classification layer;
inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer;
performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer;
and performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model.
Optionally, the step of freezing the parameter layer and the original classification layer includes:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
Optionally, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
Optionally, the step of adding a parameter filter to the optimizer of the intermediate model is followed by:
when a newly added classification layer is trained for one time, a parameter filter is controlled to filter trainable variables in the parameter layer and an original classification layer according to a preset filtering rule, and further the newly added classification layer is trained for one time;
the filtering rule includes: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable so that it remains unchanged.
Optionally, the step of unfreezing the parameter layer and the original classification layer includes:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are the unfreezing attribute, and obtaining a second judgment result, thereby completing the unfreezing of the parameter layer and the original classification layer.
Optionally, the step of performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result includes:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
according to the first loss and the second loss, performing combined training on a parameter layer, a newly-added classification layer after secondary training and an original classification layer after primary training in the intermediate model;
the mathematical expression of the second loss function is:
Figure BDA0003408392360000031
wherein the content of the first and second substances,
Figure BDA0003408392360000032
is a second loss function, y'oA second prediction result output for the original classification layer frozen in the one-time prediction process,
Figure BDA0003408392360000033
and l is a third prediction result output by the unfrozen original classification layer in the secondary prediction process, and the prediction times of the unfrozen original classification layer.
Optionally, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
according to the first loss and the second loss, performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model by using a preset third loss function, wherein the third loss function is

$\bigl(\theta_s^{*}, \theta_0^{*}, \theta_n^{*}\bigr) = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\Bigl(\lambda_0\,\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) + \lambda_1\,\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr)\Bigr)$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the expression, $\lambda_0$ is a preset first weight, $\lambda_1$ is a preset second weight, $\mathcal{L}_{old}$ is the second loss, i.e. the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, i.e. the loss of the newly added classification layer in the secondary prediction process.
The invention also provides a multi-task model training system, comprising:
the newly added task module, which is used for acquiring a newly added task, adding a newly added classification layer into the original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
the newly added task module, the first training module and the second training module are connected.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method as defined in any one of the above.
The present invention also provides an electronic terminal, comprising: a processor and a memory;
the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the terminal to perform the method as defined in any one of the above.
The invention has the following beneficial effects. In the multi-task model training method, system, medium and electronic terminal, a newly added task is acquired and a newly added classification layer corresponding to the newly added task is added into the original multi-task model to obtain an intermediate model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer is obtained, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus continuously learned and the original multi-task model is iterated while the old knowledge is kept, so the training efficiency and iteration efficiency of the model are high and the accuracy of the model is high. It is to be understood that new knowledge refers to the newly added task and old knowledge refers to the original tasks of the original multitask model.
Drawings
FIG. 1 is a flowchart illustrating a method for training a multitask model according to an embodiment of the present invention.
FIG. 2 is a schematic flow chart illustrating freezing of a parameter layer and an original classification layer in the multi-task model training method according to the embodiment of the present invention.
Fig. 3 is a schematic flow chart of performing one training on a newly added classification layer in the multi-task model training method in the embodiment of the present invention.
Fig. 4 is a schematic flow chart illustrating the process of unfreezing the parameter layer and the original classification layer in the multi-task model training method according to the embodiment of the present invention.
Fig. 5 is a schematic flow chart of joint training of all layers of the intermediate model in the multi-task model training method in the embodiment of the present invention.
FIG. 6 is a schematic structural diagram of a multitask model training system according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of an electronic terminal for multitask model training in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
As shown in fig. 1, the multi-task model training method in this embodiment includes:
S11: acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises a parameter layer, an original classification layer and the newly added classification layer. The newly added task refers to a newly appeared task and comprises newly added training sentences and the real classification results corresponding to the newly added training sentences. The original multi-task model is an original model used for predicting a plurality of tasks; it comprises a parameter layer and an original classification layer, wherein the original classification layer is the classification layer already present in the original multi-task model, and the parameter layer consists of the layers other than the original classification layer in the original multi-task model. The newly added task corresponds to the newly added classification layer, and after the newly added classification layer is added into the original multi-task model, an intermediate model is formed, which comprises the parameter layer, the original classification layer and the newly added classification layer. By acquiring the newly added task and adding the corresponding newly added classification layer into the original multi-task model, new knowledge, i.e. new tasks, can subsequently be learned continuously on the basis of the original multi-task model, which improves the iteration and update efficiency of the multi-task model. For example, suppose the original tasks in the original multi-task model include "confusing financial products" and the like, and the newly added task is "misleading the customer that insurance costs nothing". A classification layer corresponding to this newly added task is added to the original multi-task model to obtain an intermediate model comprising the parameter layer, the original classification layers corresponding to the original tasks and the newly added classification layer corresponding to the newly added task; training and iteration of the multi-task model are then performed on this basis, improving the iteration efficiency of the multi-task model.
It should be understood that a multitask model refers to a model that learns a plurality of tasks at the same time, as opposed to a single-task model. Its structure generally includes a parameter layer with a Transformer structure and N classification layers corresponding to the tasks, where the Transformer is an attention-based Encoder-Decoder structure. As time goes on, new tasks keep appearing, and when a task in a new field appears, the prior art generally needs to integrate the old knowledge and the new knowledge and retrain the multi-task model, which entails a huge training cost and a long training time.
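The later references to a requires_grad attribute and a parameter filter suggest a PyTorch-style implementation; the sketch below illustrates what such an intermediate model could look like, assuming a shared Transformer encoder as the parameter layer and one linear classification head per task. The class name, the Hugging Face AutoModel backbone and all dimensions are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # assumed backbone; any sentence encoder works

class MultiTaskModel(nn.Module):
    """Shared parameter layer plus one classification head per task (illustrative)."""
    def __init__(self, encoder_name="bert-base-chinese", task_num_labels=None):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # parameter layer
        hidden = self.encoder.config.hidden_size
        # original classification layers, one per existing task
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden, n) for task, n in (task_num_labels or {}).items()
        })

    def add_task(self, task_name, num_labels):
        """Add a newly added classification layer for a newly added task (step S11)."""
        hidden = self.encoder.config.hidden_size
        self.heads[task_name] = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # [CLS] representation from the shared parameter layer
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        # every head predicts, so both the original and the new layers produce outputs
        return {task: head(h) for task, head in self.heads.items()}
```

Calling add_task on a trained multi-task model yields the intermediate model of step S11, and a single forward pass returns the outputs of every head, i.e. the prediction results of both the original and the newly added classification layers.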
S12: freezing the parameter layer and the original classification layer; specifically, freezing the parameter layer and the original classification layer in the intermediate model means that in a subsequent training process, trainable variables of the parameter layer and the original classification layer in the intermediate model are not updated, that is, only participate in forward loss calculation, and do not participate in backward propagation. By freezing the parameter layer and the original classification layer in the intermediate model, the newly added task layer can be conveniently and independently trained in the subsequent one-time training process, and the accuracy of the newly added classification layer is improved.
S13: inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer. That is, the newly added training sentence in the newly added task is input into the intermediate model, and primary prediction is performed on the newly added training sentence by the newly added classification layer and the original classification layer respectively, to obtain the first prediction result output by the newly added classification layer and the second prediction result output by the original classification layer. Obtaining the first prediction result output by the newly added classification layer makes it convenient to subsequently train the newly added classification layer independently according to the first prediction result, improving its classification accuracy. Because the original classification layer has already been trained, the second prediction result it outputs can be regarded as a good prediction, so that knowledge distillation can be performed on the original classification layer after it is subsequently unfrozen by using the second prediction result, which improves the accuracy of the original classification layer.
S14: performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task; the newly added classification layer is iteratively trained by acquiring the difference between the first prediction result and the corresponding real classification result in the newly added task, so that the independent training of the newly added classification layer is realized, and the accuracy of the newly added classification layer is improved.
S15: unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; after one-time training is finished, unfreezing the parameter layer and the original classification layer in the intermediate model, enabling trainable variables in the parameter layer and the original classification layer to participate in updating and back propagation in the subsequent training process, inputting a newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer.
S16: performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain the final multi-task model. By obtaining the difference between the second prediction result and the third prediction result, knowledge distillation is carried out on the unfrozen original classification layer, which improves the accuracy of the original classification layer and allows new knowledge to be learned while the recognition capability for the old knowledge is retained. All layers in the intermediate model, namely the parameter layer, the original classification layer and the newly added classification layer, are jointly trained according to the difference between the second prediction result and the third prediction result, and a better final multi-task model is obtained. New knowledge is thus learned on the basis of keeping the old knowledge, the training efficiency and iteration efficiency of the multi-task model are effectively improved, the model accuracy is higher, and the model training cost is greatly reduced, because it is no longer necessary to integrate the new knowledge with the old knowledge and retrain a multi-task model from the integrated knowledge whenever a new task occurs.
In some embodiments, the number of the newly added tasks may be one or more, when there are a plurality of newly added tasks, the newly added tasks are sorted and labeled, and the steps S11, S12, S13, S14, S15, and S16 are repeated according to the sequence of the newly added tasks, so that iteration and update of the multi-task model are completed, the iteration efficiency and the training efficiency of the multi-task model are effectively improved, the model training cost is greatly saved, the model accuracy is high, and the implementability is strong.
In some embodiments, when a newly added task occurs again, the steps S11, S12, S13, S14, S15 and S16 are repeated to complete iteration and updating of the multi-task model, so that the model is high in accuracy, convenient to implement and low in cost.
As shown in fig. 2, in order to better implement freezing of the parameter layer and the original classification layer and ensure that the parameter layer and the original classification layer do not participate in parameter updating and iteration any more in a subsequent training process, the inventor proposes that the step of freezing the parameter layer and the original classification layer includes:
S121: updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute; that is, the parameter attributes of the trainable variables of the parameter layer and the original classification layer are changed to the preset freezing attribute, for example, the parameter attribute (requires_grad) of the trainable variables of the parameter layer and the original classification layer is set to False.
S122: adding a parameter filter (filter) to an optimizer of the intermediate model;
S123: traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
S124: completing the freezing of the parameter layer and the original classification layer according to the first judgment result. That is, in order to further ensure that all trainable variables of the parameter layer and the original classification layer are frozen, it is judged whether the parameter attributes of these trainable variables are the freezing attribute False. If they all are, the freezing of the trainable variables of the parameter layer and the original classification layer is completed; if the parameter attribute of any trainable variable in the parameter layer and/or the original classification layer is not the freezing attribute False, the parameter attribute of the corresponding trainable variable is updated to the freezing attribute, and the freezing is completed once all parameter attributes are the freezing attribute. This ensures that, during the one-time training of the newly added classification layer, i.e. its independent training, the trainable variables of the parameter layer and the original classification layer no longer participate in back propagation and updating, which improves the accuracy of model training.
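A minimal PyTorch-style sketch of steps S121 to S124 is given below, building on the MultiTaskModel sketch above; the helper name freeze_old_layers is an illustrative assumption rather than terminology from the patent.

```python
def freeze_old_layers(model, new_task):
    """S121: set the freezing attribute (requires_grad = False) on the trainable
    variables of the parameter layer and the original classification layers;
    S123/S124: traverse them again and confirm they are all frozen."""
    frozen = []
    for name, param in model.named_parameters():
        if not name.startswith(f"heads.{new_task}"):   # keep the newly added head trainable
            param.requires_grad = False                # freezing attribute
            frozen.append(param)
    # first judgment result: every targeted trainable variable carries the freezing attribute
    assert all(not p.requires_grad for p in frozen), "some variables were not frozen"
    return frozen
```

After this call, only the newly added classification layer still has trainable parameters, matching the intent of step S12.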
As shown in fig. 3, in some embodiments, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
S141: training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer. In other words, during the one-time training, the newly added classification layer is iteratively trained with the preset first loss function, which effectively improves the prediction accuracy of the newly added classification layer.
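The first loss function appears only as an equation image in the original publication. A standard choice that is consistent with the surrounding description, stated here as an assumption rather than as the patent's exact formula, is the cross-entropy between the real classification result and the new head's prediction:

```latex
\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr) \;=\; -\sum_{i} y_n^{(i)} \,\log \hat{y}_n^{(i)}
```

where $y_n^{(i)}$ is the one-hot true label and $\hat{y}_n^{(i)}$ the predicted probability for class $i$ of the newly added task.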
In some embodiments, when the newly added classification layer is trained once, the parameter filter is controlled to filter the trainable variables in the parameter layer and the original classification layer according to a preset filtering rule, and further, the newly added classification layer is trained once;
the filtering rule includes: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable so that it remains unchanged. For example, it is judged whether the parameter attribute of a trainable variable of the intermediate model is the freezing attribute False; if so, the trainable variable is filtered out and its value is kept unchanged. In this way, the trainable variables of the parameter layer and the original classification layer are frozen during the one-time training and do not participate in parameter updating and iteration, so that the newly added classification layer is trained independently, new knowledge is learned on the basis of the original knowledge, i.e. the old knowledge, and the efficiency of model training and iteration is improved.
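Under the same PyTorch assumptions, the parameter filter of step S122 and the filtering rule above can be realized by handing the optimizer only the variables whose parameter attribute is not the freezing attribute; the task name and hyperparameters below are illustrative.

```python
import torch

# S122 + filtering rule: variables whose requires_grad is False (the freezing
# attribute) are filtered out, so they keep their values unchanged while the
# newly added classification layer is trained once.
freeze_old_layers(model, new_task="mislead_insurance_costs_nothing")
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),  # parameter filter
    lr=2e-5,
)
```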
As shown in fig. 4, in order to improve the thawing efficiency of the parameter layer and the original classification layer, and at the same time, in order to avoid errors in the thawing process, the inventors propose that the step of thawing the parameter layer and the original classification layer comprises:
S151: secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer according to a preset unfreezing rule; that is, the parameter attributes of the trainable variables in the parameter layer and the original classification layer are changed to the preset unfreezing attribute, for example, the parameter attribute (requires_grad) of the trainable variables of the parameter layer and the original classification layer is changed to True, which completes the secondary updating of the trainable variables of the parameter layer and the original classification layer.
S152: traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are the unfreezing attribute, and obtaining a second judgment result, thereby completing the unfreezing of the parameter layer and the original classification layer. In order to further ensure that the trainable variables of the parameter layer and the original classification layer are unfrozen, it is judged whether their parameter attributes are the unfreezing attribute True. If they all are, the unfreezing of the trainable variables of the parameter layer and the original classification layer is completed; if the parameter attribute of any trainable variable in the parameter layer and/or the original classification layer is not the unfreezing attribute True, the parameter attribute of the corresponding trainable variable is updated to the unfreezing attribute, and the unfreezing is completed once all parameter attributes are the unfreezing attribute. This ensures that the trainable variables in the parameter layer and the original classification layer participate in back propagation and parameter updating in the subsequent training, so that new knowledge is continuously learned on the basis of the old knowledge.
Preferably, in the secondary training process, the parameter filter in the intermediate model can be deleted or removed, so that the model operation load is reduced, and the model operation efficiency is improved.
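A corresponding sketch of steps S151 and S152, again under the assumptions above, re-enables gradients for every layer and rebuilds the optimizer without the parameter filter for the secondary (joint) training; the helper name unfreeze_all_layers is illustrative.

```python
def unfreeze_all_layers(model):
    """S151: set the unfreezing attribute (requires_grad = True) on all trainable
    variables; S152: traverse them and confirm every variable is unfrozen."""
    for param in model.parameters():
        param.requires_grad = True                     # unfreezing attribute
    # second judgment result: every trainable variable carries the unfreezing attribute
    assert all(p.requires_grad for p in model.parameters()), "some variables stayed frozen"

unfreeze_all_layers(model)
# The parameter filter is no longer needed, so the optimizer is rebuilt over all parameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```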
In some embodiments, the step of jointly training all layers in the intermediate model according to the second prediction result and the third prediction result comprises:
and performing joint training on all layers of the intermediate model by obtaining the difference between the second prediction result and the third prediction result to obtain a final multi-task model.
In order to further improve the accuracy of the final multitask model, the inventor proposes that, in the joint training process, in addition to obtaining the difference between the second prediction result and the third prediction result, the loss of the newly added classification layer is also obtained, and all layers of the intermediate model are then jointly trained by using the loss of the newly added classification layer and the distillation loss of the original classification layer. As shown in fig. 5, the step of performing joint training on all layers of the intermediate model by using the loss of the newly added classification layer and the distillation loss of the original classification layer includes:
s161: acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training; namely, a difference between the fourth prediction result and the corresponding real classification result is obtained by using a preset first loss function, a first loss is obtained, secondary training is performed on the newly added classification layer according to the first loss, the prediction accuracy of the newly added classification layer is improved, the degree of the secondary training performed on the newly added classification layer can be set according to the actual situation, and details are not repeated here.
S162: distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer; namely, the second loss function is utilized to obtain the difference between the second prediction result and the third prediction result, so as to obtain the second loss, the original classification layer is distilled according to the second loss, so as to realize one-time training of the original classification layer, and the degree of one-time training of the original classification layer can be set according to the actual situation, which is not repeated here.
S163: according to the first loss and the second loss, performing combined training on a parameter layer, a newly-added classification layer after secondary training and an original classification layer after primary training in the intermediate model; and performing joint training on all layers of the intermediate model by combining the first loss and the second loss to obtain a final multi-task model with higher accuracy.
The second loss function is

$\mathcal{L}_{old}\bigl(y'_o, \hat{y}'_o\bigr)$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the frozen original classification layer in the primary prediction process, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer in the secondary prediction process, and $l$ is the number of predictions of the unfrozen original classification layer. By distilling the original classification layer with the above second loss function, the prediction accuracy of the original classification layer can be effectively improved, and the distillation efficiency is good.
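The second loss function is likewise given only as an equation image in the original publication. A form consistent with the description, offered here as an assumption, is a knowledge-distillation cross-entropy that keeps the unfrozen original classification layer's outputs close to the responses it produced while frozen:

```latex
\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) \;=\; -\sum_{i=1}^{l} y_o'^{(i)} \,\log \hat{y}_o'^{(i)}
```

where $y_o'^{(i)}$ and $\hat{y}_o'^{(i)}$ are the $i$-th components of the second and third prediction results; applying temperature scaling to both distributions, as is common in knowledge distillation, would be a further assumption.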
In some embodiments, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
S1631: according to the first loss and the second loss, performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model by using a preset third loss function, wherein the third loss function is

$\bigl(\theta_s^{*}, \theta_0^{*}, \theta_n^{*}\bigr) = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\Bigl(\lambda_0\,\mathcal{L}_{old}\bigl(y'_o,\hat{y}'_o\bigr) + \lambda_1\,\mathcal{L}_{new}\bigl(y_n,\hat{y}_n\bigr)\Bigr)$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the expression, $\lambda_0$ is a preset first weight, $\lambda_1$ is a preset second weight, $\mathcal{L}_{old}$ is the second loss, i.e. the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, i.e. the loss of the newly added classification layer in the secondary prediction process. By adopting the third loss function, all layers of the intermediate model are jointly trained and iteratively updated, the accuracy and precision of the obtained final multi-task model are further improved, the training efficiency of the multi-task model is greatly improved, a new model does not need to be retrained, and new knowledge is learned on the basis of the old knowledge.
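For illustration, a single secondary-training step can be sketched as follows under the same PyTorch assumptions, with a cross-entropy term standing in for the first loss and a KL-divergence distillation term standing in for the second loss (both assumptions, as noted above); the weights λ0 and λ1 are chosen arbitrarily.

```python
import torch.nn.functional as F

lambda0, lambda1 = 1.0, 1.0   # preset first and second weights (illustrative values)

def joint_training_step(model, optimizer, batch, new_task, frozen_old_logits):
    """One step of the joint training: fit the newly added head to the real labels
    and distill the original heads toward the predictions recorded while frozen."""
    outputs = model(batch["input_ids"], batch["attention_mask"])      # all heads predict
    # first loss: newly added classification layer vs. real classification result
    loss_new = F.cross_entropy(outputs[new_task], batch["labels"])
    # second loss: original classification layers vs. their frozen (second) predictions
    loss_old = sum(
        F.kl_div(F.log_softmax(outputs[task], dim=-1),
                 F.softmax(frozen_old_logits[task], dim=-1),
                 reduction="batchmean")
        for task in frozen_old_logits
    )
    loss = lambda0 * loss_old + lambda1 * loss_new                    # joint (third) objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```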
The first embodiment is as follows:
In the application scenario of life insurance quality inspection, it is usually necessary to determine whether non-compliant speech is involved in the process of communicating with the client. As time goes on, new non-compliant speech usually appears, and if the original multi-task model is used for the judgment, the corresponding non-compliant speech often cannot be accurately identified. In the prior art, the new knowledge corresponding to the new task (newly added training sentences and the corresponding real classification results) and the old knowledge corresponding to the old tasks (the training data adopted by the original multi-task model, including the original training samples and the corresponding real prediction results) are usually integrated, and a new multi-task model is retrained; the training difficulty is high, the operation is complex, the training time is long, the cost is high, and the model training efficiency is low. Therefore, in this embodiment, an intermediate model is obtained by adding a newly added classification layer corresponding to the new task to the original multitask model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer in the intermediate model are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. After the primary training is finished, the parameter layer and the original classification layer are unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, and a third prediction result output by the original classification layer and a fourth prediction result output by the newly added classification layer are obtained. All layers in the intermediate model are jointly trained according to the second prediction result, the third prediction result, the fourth prediction result and the real classification result corresponding to the fourth prediction result to obtain the final multi-task model, so that iterative training and updating of the original multi-task model are realized without integrating the new knowledge and the old knowledge and retraining a multi-task model, which saves model training time.
For example, when the newly added task is judging the non-compliant speech "misleading the customer that insurance costs nothing", a corresponding newly added classification layer θ_1 is added to the original multi-task model according to the newly added task, and an intermediate model is obtained, the intermediate model comprising the newly added classification layer θ_1, the parameter layer θ_s and the original classification layer θ_0. The parameter layer θ_s and the original classification layer θ_0 are frozen so that they do not participate in back propagation and iteration during the subsequent one-time training of the newly added classification layer. The newly added training sentences corresponding to the newly added task are input into the intermediate model for primary prediction, and the first prediction result output by the newly added classification layer θ_1 and the second prediction result output by the original classification layer θ_0 are obtained. The newly added classification layer is trained once according to the first prediction result and the corresponding real classification result; because the parameter layer θ_s and the original classification layer θ_0 are frozen in this process, they do not participate in parameter updating. After the independent training of the newly added classification layer θ_1 is finished, the parameter layer θ_s and the original classification layer θ_0 are unfrozen, and the newly added training sentences of the newly added task are input into the intermediate model for secondary prediction to obtain the third prediction result output by the original classification layer θ_0 and the fourth prediction result output by the newly added classification layer θ_1. All layers in the intermediate model are then jointly trained according to the difference between the second prediction result and the third prediction result and the difference between the fourth prediction result and the corresponding real classification result, and a better final multi-task model is obtained. Training of a new task is thus realized on the basis of the original multi-task model, new knowledge is learned on the basis of the old knowledge, the training efficiency and accuracy are higher, and the model prediction effect is better.
The second embodiment is as follows:
In the application scenario of intelligent telephone answering or public opinion judgment, when a task of judging new non-compliant or target speech is needed, an intermediate model is obtained by adding a newly added classification layer corresponding to the newly added task into the original multi-task model, the intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer in the intermediate model are frozen, a newly added training sentence of the newly added task is input into the intermediate model for primary prediction, and a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained. The newly added classification layer is trained once using the difference between the first prediction result and the corresponding real classification result. After the independent training of the newly added classification layer is finished, the parameter layer and the original classification layer are unfrozen so that they participate in parameter updating and iteration in the subsequent training. The newly added training sentence is then input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer and a fourth prediction result output by the newly added classification layer are obtained, and all layers in the intermediate model are jointly trained according to the second prediction result, the third prediction result, the fourth prediction result and the real classification result corresponding to the fourth prediction result, obtaining a final multi-task model with higher accuracy. In this way, when a new task appears, the new knowledge corresponding to the newly added task can be trained on the basis of keeping the old knowledge of the original multi-task model, which effectively improves the training efficiency of the multi-task model.
As shown in fig. 6, this embodiment further provides a multitask model training system, which includes:
the newly added task module, which is used for acquiring a newly added task, adding a newly added classification layer into the original multi-task model according to the newly added task, and thereby obtaining an intermediate model, wherein the intermediate model comprises: a parameter layer, an original classification layer and the newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
The newly added task module, the first training module and the second training module are connected. In the multi-task model training system of this embodiment, a newly added task is acquired and a newly added classification layer corresponding to the newly added task is added into the original multi-task model to obtain an intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, a newly added training sentence in the newly added task is input into the intermediate model for primary prediction, a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained, and the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentence is input into the intermediate model for secondary prediction, a third prediction result output by the original classification layer is obtained, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus continuously learned while the old knowledge is kept and iteration of the original multi-task model is completed, so the training efficiency and iteration efficiency of the model are high, the model accuracy is high, the cost is low, and the practicability is strong.
In some embodiments, freezing the parameter layer and the original classification layer comprises:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and obtaining a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
In some embodiments, the step of performing a training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task includes:
training the newly added classification layer according to the first prediction result, the corresponding real classification result in the newly added task and a preset first loss function, wherein the first loss function is

$\mathcal{L}_{new}\bigl(y_n, \hat{y}_n\bigr)$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
In some embodiments, the step of adding a parametric filter in the optimizer of the intermediate model is followed by:
when a newly added classification layer is trained for one time, a parameter filter is controlled to filter trainable variables in the parameter layer and an original classification layer according to a preset filtering rule, and further the newly added classification layer is trained for one time;
the filtering rule comprises: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable and keeping it unchanged.
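A sketch of the parameter filter on the optimizer is shown below, under the same PyTorch-style assumptions as earlier; the filter simply drops every trainable variable whose attribute marks it as frozen, so an optimizer step leaves the parameter layer and the original classification layer unchanged.

```python
import torch
import torch.nn as nn

param_layer, old_head, new_head = nn.Linear(32, 128), nn.Linear(128, 5), nn.Linear(128, 3)
for p in list(param_layer.parameters()) + list(old_head.parameters()):
    p.requires_grad = False                               # frozen attribute

def filter_params(modules):
    """Filtering rule: keep a trainable variable only if it is not marked frozen."""
    return [p for m in modules for p in m.parameters() if p.requires_grad]

optimizer = torch.optim.Adam(filter_params([param_layer, old_head, new_head]), lr=1e-3)
# optimizer.step() now updates only the newly added classification layer.
```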
In some embodiments, the step of unfreezing the parameter layer and the original classification layer comprises:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
and traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are all the thawing attribute, and acquiring a second judgment result so as to complete the thawing of the parameter layer and the original classification layer.
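The unfreezing step can be sketched in the same assumed style: the parameter attribute is updated a second time back to trainable and the second judgment verifies the change.

```python
import torch.nn as nn

param_layer, old_head = nn.Linear(32, 128), nn.Linear(128, 5)
for m in (param_layer, old_head):          # state left by the earlier freezing step
    for p in m.parameters():
        p.requires_grad = False

for m in (param_layer, old_head):          # secondary update according to the thaw rule
    for p in m.parameters():
        p.requires_grad = True

# Second judgment result: thawing of both layers is complete.
second_judgment = all(p.requires_grad for m in (param_layer, old_head) for p in m.parameters())
assert second_judgment
```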
In some embodiments, the step of jointly training all layers in the intermediate model according to the second prediction result and the third prediction result comprises:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
and performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss;
the mathematical expression of the second loss function is:
$$\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) = -\sum_{i=1}^{l} y_o^{\prime(i)} \log \hat{y}_o^{\prime(i)}$$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the original classification layer while frozen during the primary prediction, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer during the secondary prediction, and $l$ is the number of predictions made by the unfrozen original classification layer.
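A hedged sketch of this distillation of the original classification layer follows, under the common knowledge-distillation assumption: the frozen original layer's primary-prediction output serves as a soft target for the unfrozen original layer's secondary-prediction output. Names and the soft-target construction are illustrative.

```python
import torch
import torch.nn.functional as F

second_pred = torch.randn(4, 5)                        # y'_o: frozen original layer, primary prediction
third_pred = torch.randn(4, 5, requires_grad=True)     # third prediction: unfrozen original layer

soft_targets = F.softmax(second_pred, dim=-1)          # assumed soft targets from the frozen layer
second_loss = -(soft_targets * F.log_softmax(third_pred, dim=-1)).sum(dim=-1).mean()
second_loss.backward()                                 # trains the original classification layer
```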
In some embodiments, the step of performing joint training on the parameter layer, the newly added classification layer after the secondary training, and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss includes:
performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss, the second loss and a preset third loss function, wherein the mathematical expression of the third loss function is as follows:
$$\theta_s^{*},\, \theta_0^{*},\, \theta_n^{*} = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\left(\lambda_0\,\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) + \lambda_1\,\mathcal{L}_{new}\left(y_n, \hat{y}_n\right)\right)$$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the bracketed expression, $\lambda_0$ is the preset first weight, $\lambda_1$ is the preset second weight, $\mathcal{L}_{old}$ is the second loss, namely the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, namely the loss of the newly added classification layer in the secondary prediction process.
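The joint objective can be sketched as a weighted sum of the two losses; the weight values below are placeholder assumptions, and the scalar placeholders stand in for losses that would normally be computed from the model.

```python
import torch

lambda_0, lambda_1 = 1.0, 1.0                           # preset first and second weights (assumed values)
second_loss = torch.tensor(0.8, requires_grad=True)     # placeholder for the distillation (second) loss
first_loss = torch.tensor(1.2, requires_grad=True)      # placeholder for the new-layer (first) loss

third_loss = lambda_0 * second_loss + lambda_1 * first_loss   # assumed form of the third loss
third_loss.backward()   # in the full model, gradients reach the parameter, original and new layers
```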
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.
The present embodiment further provides an electronic terminal, including: a processor and a memory;
the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.
Fig. 7 is a schematic structural diagram of an electronic terminal according to an embodiment of the invention. This embodiment provides an electronic terminal, including: a processor 71, a memory 72, a communicator 73, a communication interface 74, and a system bus 75; the memory 72 and the communication interface 74 are connected with the processor 71 and the communicator 73 through the system bus 75 for mutual communication, the memory 72 is used for storing a computer program, the communication interface 74 is used for communicating with other devices, and the processor 71 and the communicator 73 are used for running the computer program so that the electronic terminal can execute the steps of the multi-task model training method described above.
The system bus 75 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library).
In this embodiment, the Memory may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The Processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The computer-readable storage medium in this embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device, for example through the Internet using an Internet service provider.
The embodiments of the application may acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain the best results. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
In summary, in the multitask model training method, system, medium and electronic terminal of this embodiment, a newly added task is obtained, and a newly added classification layer corresponding to the newly added task is added to the original multi-task model to obtain an intermediate model comprising a parameter layer, an original classification layer and the newly added classification layer. The parameter layer and the original classification layer are frozen, the newly added training sentences of the newly added task are input into the intermediate model for primary prediction, and a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer are obtained; the newly added classification layer is trained once according to the first prediction result and the corresponding real classification result in the newly added task. The parameter layer and the original classification layer are then unfrozen, the newly added training sentences are input into the intermediate model for secondary prediction to obtain a third prediction result output by the original classification layer, and all layers in the intermediate model are jointly trained according to the second prediction result and the third prediction result to obtain the final multi-task model. New knowledge is thus learned continuously while old knowledge is retained, the iteration of the original multi-task model is completed, the training efficiency and iteration efficiency of the model are high, the model accuracy is high, and the iteration cost of the multi-task model is effectively reduced.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical concept disclosed in the present invention shall still be covered by the claims of the present invention.

Claims (10)

1. A multitask model training method is characterized by comprising the following steps:
acquiring a newly added task, adding a newly added classification layer into an original multi-task model according to the newly added task, and further acquiring an intermediate model, wherein the intermediate model comprises: the device comprises a parameter layer, an original classification layer and a newly added classification layer;
freezing the parameter layer and the original classification layer;
inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer;
performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer;
and performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model.
2. The multitask model training method of claim 1, wherein the step of freezing the parameter layer and the original classification layer comprises:
updating the parameter attributes of the trainable variables of the parameter layer and the original classification layer once according to the preset freezing attribute;
adding a parameter filter to an optimizer of the intermediate model;
traversing the once-updated parameter attributes of the trainable variables of the parameter layer and the original classification layer, judging whether these parameter attributes are all the freezing attribute, and acquiring a first judgment result;
and completing the freezing of the parameter layer and the original classification layer according to the first judgment result.
3. The multi-task model training method of claim 1, wherein the step of performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task comprises:
training the newly added classification layer according to the first prediction result, a corresponding real classification result in the newly added task and a preset first loss function, wherein the mathematical expression of the first loss function is as follows:
$$\mathcal{L}_{new}\left(y_n, \hat{y}_n\right) = -\, y_n \cdot \log \hat{y}_n$$

where $\mathcal{L}_{new}$ is the first loss function, $\hat{y}_n$ is the prediction result output by the newly added classification layer, and $y_n$ is the real classification result corresponding to the prediction result output by the newly added classification layer.
4. The multitask model training method according to claim 2, wherein the step of adding a parameter filter to the optimizer of the intermediate model is followed by:
when the newly added classification layer undergoes primary training, controlling the parameter filter to filter the trainable variables in the parameter layer and the original classification layer according to a preset filtering rule, so that the primary training updates only the newly added classification layer;
the filtering rule comprises: judging whether the parameter attribute of a trainable variable is the freezing attribute, and if so, filtering out the corresponding trainable variable and keeping it unchanged.
5. The multitask model training method according to claim 1, wherein the step of unfreezing the parameter layer and the original classification layer comprises:
according to a preset unfreezing rule, secondarily updating the parameter attributes of the trainable variables in the parameter layer and the original classification layer;
and traversing the secondarily updated parameter attributes of the trainable variables in the parameter layer and the original classification layer, judging whether these parameter attributes are all the thawing attribute, and acquiring a second judgment result so as to complete the thawing of the parameter layer and the original classification layer.
6. The method for training a multitask model according to claim 1, wherein the step of jointly training all layers in said intermediate model according to said second predicted result and said third predicted result includes:
acquiring a fourth prediction result output by a newly added classification layer in a secondary prediction process, acquiring a first loss according to the fourth prediction result, a corresponding real classification result and a preset first loss function, and performing secondary training on the newly added classification layer to acquire the newly added classification layer after the secondary training;
distilling the original classification layer according to the second prediction result, the third prediction result and a preset second loss function to obtain a second loss, and performing primary training on the original classification layer to obtain a trained original classification layer;
and performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss;
the mathematical expression of the second loss function is:
$$\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) = -\sum_{i=1}^{l} y_o^{\prime(i)} \log \hat{y}_o^{\prime(i)}$$

where $\mathcal{L}_{old}$ is the second loss function, $y'_o$ is the second prediction result output by the original classification layer while frozen during the primary prediction, $\hat{y}'_o$ is the third prediction result output by the unfrozen original classification layer during the secondary prediction, and $l$ is the number of predictions made by the unfrozen original classification layer.
7. The method for training the multitask model according to claim 6, wherein the step of performing the joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss and the second loss comprises:
performing joint training on the parameter layer, the newly added classification layer after the secondary training and the original classification layer after the primary training in the intermediate model according to the first loss, the second loss and a preset third loss function, wherein the mathematical expression of the third loss function is as follows:
$$\theta_s^{*},\, \theta_0^{*},\, \theta_n^{*} = \operatorname*{argmin}_{\theta_s,\,\theta_0,\,\theta_n}\left(\lambda_0\,\mathcal{L}_{old}\left(y'_o, \hat{y}'_o\right) + \lambda_1\,\mathcal{L}_{new}\left(y_n, \hat{y}_n\right)\right)$$

where $\theta_s$ is the parameter layer, $\theta_0$ is the original classification layer, $\theta_n$ is the newly added classification layer, argmin denotes the values of the variables that minimize the bracketed expression, $\lambda_0$ is the preset first weight, $\lambda_1$ is the preset second weight, $\mathcal{L}_{old}$ is the second loss, namely the loss of the original classification layer during distillation, and $\mathcal{L}_{new}$ is the first loss, namely the loss of the newly added classification layer in the secondary prediction process.
8. A multitask model training system, comprising:
and the newly-added task module is used for acquiring a newly-added task, adding a newly-added classification layer into the original multi-task model according to the newly-added task, and further acquiring an intermediate model, wherein the intermediate model comprises: the device comprises a parameter layer, an original classification layer and a newly added classification layer;
the first training module is used for freezing the parameter layer and the original classification layer; inputting a newly added training sentence in the newly added task into the intermediate model for primary prediction to obtain a first prediction result output by the newly added classification layer and a second prediction result output by the original classification layer; performing primary training on the newly added classification layer according to the first prediction result and the corresponding real classification result in the newly added task;
the second training module is used for unfreezing the parameter layer and the original classification layer, inputting the newly added training sentence into the intermediate model for secondary prediction, and obtaining a third prediction result output by the original classification layer; performing joint training on all layers in the intermediate model according to the second prediction result and the third prediction result to obtain a final multi-task model;
the newly added task module, the first training module and the second training module are connected.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is for storing a computer program and the processor is for executing the computer program stored by the memory to cause the terminal to perform the method of any of claims 1 to 7.
CN202111522799.7A 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal Pending CN114186684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111522799.7A CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111522799.7A CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Publications (1)

Publication Number Publication Date
CN114186684A true CN114186684A (en) 2022-03-15

Family

ID=80604860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111522799.7A Pending CN114186684A (en) 2021-12-13 2021-12-13 Multitask model training method, multitask model training system, multitask model training medium and electronic terminal

Country Status (1)

Country Link
CN (1) CN114186684A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817295A (en) * 2022-04-20 2022-07-29 平安科技(深圳)有限公司 Multi-table Text2sql model training method, system, device and medium
CN114817295B (en) * 2022-04-20 2024-04-05 平安科技(深圳)有限公司 Multi-table Text2sql model training method, system, device and medium

Similar Documents

Publication Publication Date Title
CN109766557B (en) Emotion analysis method and device, storage medium and terminal equipment
US20210303970A1 (en) Processing data using multiple neural networks
CN111523640B (en) Training method and device for neural network model
US11922281B2 (en) Training machine learning models using teacher annealing
JP2022508091A (en) Dynamic reconstruction training computer architecture
CN111259647A (en) Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
Uteuliyeva et al. Fourier neural networks: A comparative study
CN113554175B (en) Knowledge graph construction method and device, readable storage medium and terminal equipment
Demidov et al. Application model of modern artificial neural network methods for the analysis of information systems security
US20190228297A1 (en) Artificial Intelligence Modelling Engine
KR20200029351A (en) Sample processing method and device, related apparatus and storage medium
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN112529071A (en) Text classification method, system, computer equipment and storage medium
CN114186684A (en) Multitask model training method, multitask model training system, multitask model training medium and electronic terminal
Teijema et al. Active learning-based Systematic reviewing using switching classification models: the case of the onset, maintenance, and relapse of depressive disorders
CN114328277A (en) Software defect prediction and quality analysis method, device, equipment and medium
Dekhovich et al. Neural network relief: a pruning algorithm based on neural activity
CN116737939B (en) Meta learning method, text classification device, electronic equipment and storage medium
CN116680401A (en) Document processing method, document processing device, apparatus and storage medium
US20230097940A1 (en) System and method for extracting and using groups of features for interpretability analysis
US20230012316A1 (en) Automation of leave request process
CN113706347A (en) Multitask model distillation method, multitask model distillation system, multitask model distillation medium and electronic terminal
CN114723989A (en) Multitask learning method and device and electronic equipment
CN116805384A (en) Automatic searching method, automatic searching performance prediction model training method and device
CN114238798A (en) Search ranking method, system, device and storage medium based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination