CN115049108A - Multitask model training method, multitask prediction method, related device and medium - Google Patents

Info

Publication number
CN115049108A
Authority
CN
China
Prior art keywords
model
sub
module
task
sample
Prior art date
Legal status
Pending
Application number
CN202210552195.5A
Other languages
Chinese (zh)
Inventor
宿嘉颖
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210552195.5A
Publication of CN115049108A
Status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)
  • Game Theory and Decision Science (AREA)

Abstract

Embodiments of this specification disclose a multitask model training method, a multitask prediction method, a related device, and a medium. The training method comprises: determining the parameter weights of sample features in the attention module of the m-th sub-model; determining the parameters in the attention module of the (m+1)-th sub-model according to those parameter weights; and training the multitask model based on the parameter weights in the attention module of the m-th sub-model and the parameters in the attention module of the (m+1)-th sub-model. The multitask model assigns each task in the multitask to one sub-model and transfers the associated information of the previous task to the next task through the attention modules between adjacent tasks, so that the next task can combine the previous task's associated information to obtain a more accurate and more relevant prediction result.

Description

Multitask model training method, multitask prediction method, related device and medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a multitask model training method, a multitask prediction method, a related device, and a medium.
Background
Generally, a task is predicted by inputting the data corresponding to that task into a trained model, where the model is trained on task data. Faced with a complex task, the task is typically decomposed first, the resulting tasks are trained separately, and the results corresponding to the individual tasks are combined to obtain the final prediction result.
Because there may be correlations between the tasks, this conventional prediction approach can lose the correlation information in the prediction result, so a technical solution that yields more accurate prediction results is needed.
Disclosure of Invention
The embodiment of the specification provides a multitask model training method, a multitask prediction method, a related device and a medium, and the technical scheme is as follows:
in a first aspect, an embodiment of the present specification provides a method for training a multitask model, where the multitask model includes M submodels, each submodel corresponds to a task, and each submodel includes an attention module, and the method includes:
determining the parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;
determining parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the sample features comprise sample user features, sample product features, and the user's sample result for the task corresponding to the (m+1)-th sub-model;
and training the multitask model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.
In a second aspect, an embodiment of the present specification provides a multitask prediction method, where the method is applied to a multitask model, the multitask model includes M submodels, each submodel corresponds to a task, and each submodel includes an attention module, and the method includes:
determining the parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;
determining parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the target features comprise target user features and target product features;
and obtaining a prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.
In a third aspect, an embodiment of the present specification provides a multitask model training device, where the multitask model includes M submodels, each submodel corresponds to a task, each submodel includes an attention module, and the multitask model training device includes:
a first processing module, configured to determine the parameter weights of sample features in the attention module of the m-th sub-model, where m is a positive integer less than M;
a second processing module, configured to determine parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the sample features comprise sample user features, sample product features, and the user's sample result for the task corresponding to the (m+1)-th sub-model;
and a training module, configured to train the multitask model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.
In a fourth aspect, an embodiment of the present specification further provides a multitask model training device, which may include a processor and a memory, where the memory stores a computer program adapted to be loaded by the processor to perform the steps of the multitask model training method described above.
In a fifth aspect, embodiments of the present specification provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the multitask model training method described above.
In a sixth aspect, an embodiment of the present specification provides a multitask predicting apparatus, where the apparatus is applied to a multitask model, the multitask model includes M submodels, each submodel corresponds to a task, each submodel includes an attention module, and the apparatus includes:
a third processing module, configured to determine the parameter weights of target features in the attention module of the m-th sub-model, where m is a positive integer less than M;
a fourth processing module, configured to determine parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model, where the target features comprise target user features and target product features;
and a prediction module, configured to obtain a prediction result of the task corresponding to the (m+1)-th sub-model based on the target features and the parameters in the attention module of the (m+1)-th sub-model.
In a seventh aspect, an embodiment of the present specification further provides a multitask prediction device, which may include a processor and a memory, where the memory stores a computer program adapted to be loaded by the processor to perform the steps of the multitask prediction method described above.
In an eighth aspect, the embodiments of the present specification further provide a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor and to perform the above-mentioned steps of the multitask prediction method.
The technical solutions provided by some embodiments of this specification bring at least the following beneficial effects:
In one or more embodiments of this specification, when training the multitask model, the parameter weights of the sample features in the attention module of the m-th sub-model may be determined, the parameters in the attention module of the (m+1)-th sub-model may be determined according to those parameter weights, and the multitask model may be trained based on the parameter weights in the attention module of the m-th sub-model and the parameters in the attention module of the (m+1)-th sub-model. The multitask model assigns each task in the multitask to one sub-model and migrates the associated information of the previous task to the next task through the attention modules between adjacent tasks, so that the next task can combine the previous task's associated information to obtain a more accurate and more relevant prediction result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating a prediction process of a conventional multitasking model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a prediction flow of a multitask model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a multitask model training method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a sub-model provided in an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a multitasking model provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a multitasking prediction method according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a multitask model training device provided by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a multitask predicting device provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of another multitask model training device provided in an embodiment of the present specification;
fig. 10 is a schematic structural diagram of another multitask predicting device provided in an embodiment of the present specification.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present specification.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
When a user uses a typical everyday application, recommendation services that better meet the user's needs can be provided according to the actions the user performs. Taking a third-party application capable of executing a payment function as an example, a user may browse the recommendation interface of the application, click on related information of interest, and then pay for and purchase it. This whole process can be divided into a plurality of tasks: browsing the recommendation interface corresponds to an exposure task for the related information of interest, clicking on that information corresponds to a click task, and paying for and purchasing it corresponds to a purchase task. It can be understood that there is a sequential dependency among the exposure task, the click task, and the purchase task: the click task can only occur after the exposure task occurs, the purchase task can only occur after the click task occurs, and adjacent tasks are associated with each other.
Of course, the number of tasks corresponding to the actions a user performs is not limited to the three above. For example, in a shopping third-party application, when placing an order a user may browse the application, click on an item that meets their needs, collect a discount coupon from the store corresponding to that item, apply the collected coupon to the item, and finally submit the order. Here, browsing the application corresponds to an exposure task for the item meeting the user's needs, clicking the item corresponds to a click task, collecting the discount coupon from the corresponding store corresponds to a coupon-collection task, applying the collected coupon to the item corresponds to a verification task, and submitting the order afterwards corresponds to an ordering task.
Based on this, to obtain a prediction of the user's requirement, model learning is generally adopted: the plurality of tasks corresponding to the requirement are trained, and the prediction results of the individual tasks are combined into the final prediction result.
Specifically, a common way to predict a multitask is to split it into its constituent tasks, learn each task separately, and combine the per-task prediction results into the final multitask prediction. For example, the sample feature vector corresponding to each task may be input into that task's model for training, and then the feature vector corresponding to each task is input into the trained task model to obtain the corresponding prediction result. It can be understood that a separate task model may be set for each task to ensure the accuracy of each model's prediction. Taking a multitask divided into tasks A, B, and C as an example: inputting task A into its trained task model yields prediction result a, task B yields prediction result b, and task C yields prediction result c; the final multitask prediction result may then be obtained by combining them, for example (but not limited to) as a × b × c.
Reference is also made to FIG. 1, a schematic diagram of the prediction flow of an existing multitask model provided by an embodiment of this specification. As shown in FIG. 1, suppose the multitask model includes two sub-models and the multitask corresponding to an event includes a first task and a second task: the prediction result of the first task may represent the predicted probability that the event is clicked, the prediction result of the second task may represent the predicted probability that the event is converted given that it is clicked, and the prediction result of the multitask may represent the predicted probability that the event is clicked and then converted. Specifically, when predicting the occurrence probability of the multitask corresponding to the event, the feature vector corresponding to the multitask may be input into the first model to obtain a first prediction result corresponding to the first task. The first model may be the sub-model of the multitask model serving as the task model of the first task, trained on sample features with known prediction results. The feature vector corresponding to the multitask may then be input into a second model to obtain a second prediction result corresponding to the second task; the second model is likewise a sub-model trained on sample features with known prediction results. Further, the target prediction result of the multitask may be obtained by combining the two, for example by multiplying the first prediction result by the second prediction result. It can be understood that the two predictions are not order-dependent: the second prediction result may be obtained before the first, or both may be obtained simultaneously.
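For contrast with what follows, a minimal sketch of this conventional combination step (function and variable names are illustrative assumptions; the models are assumed to output probabilities):

```python
# Minimal sketch of the conventional approach: independently trained
# task models combined by scalar multiplication.
# All names are illustrative, not from the patent.
def combined_prediction(model_a, model_b, model_c, features):
    p_a = model_a(features)  # e.g. P(exposure)
    p_b = model_b(features)  # e.g. P(click | exposure)
    p_c = model_c(features)  # e.g. P(purchase | click)
    return p_a * p_b * p_c   # final multitask prediction
```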
It can be seen that this way of predicting a multitask does not take the strong correlation between tasks into account: the final prediction result is obtained by simple scalar multiplication, the related information is lost, and the accuracy of the final prediction result suffers.
Next, to better solve the above technical problems, one or more embodiments of the present specification will be explained.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the prediction flow of a multitask model provided by an embodiment of this specification. As shown in FIG. 2, the multitask model may include two sub-models (denoted the first model and the second model), and the corresponding event may include a first task and a second task: the first model predicts the occurrence probability of the first task, whose prediction result may represent the predicted probability that the event is clicked, and the second model predicts the occurrence probability of the second task, whose prediction result may represent the predicted probability that the event is converted after being clicked.
When predicting the occurrence probability of the multitask corresponding to an event, the feature vector corresponding to the multitask may be input into the first model to obtain the parameter weights in the attention module of the first model. The first model may include a first module and an attention module; specifically, the feature vector corresponding to the multitask may be input into the first module to obtain first conversion information, and the first conversion information is then input into the attention module to obtain the parameter weights in the attention module of the first model. It can be understood that these parameter weights characterize the parameters corresponding to the association information between the first task and the second task, and may correspond to a subset of all the parameters in the attention module of the first model. Taking all the parameters in the attention module of the first model as A, B, C, D, and E as an example, the parameter weights may be, for example (but not limited to), (0, 1, 1, 0, 0); that is, the parameters corresponding to the association information between the first task and the second task are B and C.
The feature vector corresponding to the multitask mentioned here may include, but is not limited to, user features and task product features. User features are the feature information of the target user who will execute the event, for example the target user's identity information, which may include at least one of a user name, a user category, or a user address. Taking an executed event in a shopping third-party application as an example, the target user's feature information may be determined by querying the user information the target user has filled in; besides the user name, user category, and user address mentioned above, it may also include the target user's browsing or purchase records within a preset period, as well as redemption tickets or product coupons the user has collected, and this embodiment is not limited in this respect.
Task product features may be understood as feature information characterizing the product corresponding to the event, for example at least one of the product name, product production information, or product definition information of the event, where product definition information is information characterizing the product's function. Again taking an executed event in a shopping third-party application as an example, once a product to be purchased is determined, the store where it is located may be queried for the corresponding feature information, such as the product name, product production time, and product function mentioned above; the discount information corresponding to the product may also be recorded, and this embodiment is not limited in this respect.
Further, after the parameter weights in the attention module of the first model are obtained, they may be passed to the attention module of the second model, and the parameters in the attention module of the second model are determined according to them. The attention module of the second model adjusts its parameters according to the received parameter weights, so that the adjusted parameters carry the association information between the first task and the second task. For example, if the parameters in the attention module of the second model are B, D, E, and F, and the parameters corresponding to the parameter weights of the first model are B and C, the adjusted parameters in the attention module of the second model are B, C, D, E, and F. It can be understood that the attention module of the second model may have the same structure as that of the first model, with different parameters.
It is also understood that the transfer of the parameter weights from the attention module of the first model to the attention module of the second model may be, but is not limited to being, performed by a connection module, for example a fully connected layer, so as to preserve the integrity of the parameter weights during the transfer.
Further, after the parameters in the attention module of the second model are determined, the feature vector corresponding to the multitask may be input into the second model, and the occurrence probability of the multitask corresponding to the event is predicted by the attention module of the second model. Specifically, the feature vector is input into the second module to obtain second conversion information, and the second conversion information is then input into the parameter-adjusted attention module to directly obtain the occurrence probability of the multitask. It can be understood that the prediction result obtained from the second model incorporates the association information between the first task and the second task. Compared with obtaining each sub-model's per-task prediction separately and then combining them, only one final prediction result needs to be computed, which reduces data processing time, and the association information between adjacent tasks is retained in the final prediction result, making it more accurate.
Of course, this embodiment is not limited to a multitask model with two sub-models. Taking the example above in which the multitask includes an exposure task, a click task, a coupon-collection task, a verification task, and an ordering task, the multitask model may be configured with five sub-models (denoted the first model through the fifth model), where the first model predicts the occurrence probability of the exposure task, the second model that of the click task, the third model that of the coupon-collection task, the fourth model that of the verification task, and the fifth model that of the ordering task, and each model may include an attention module. Specifically, the feature vector corresponding to the multitask is input into the first model to obtain the parameter weights in its attention module, and these are transmitted to the attention module of the second model to adjust its parameters. The feature vector is then re-input into the second model to obtain the parameter weights in its attention module, which are transmitted to the attention module of the third model to adjust its parameters. The same procedure passes the third model's parameter weights to the fourth model's attention module and the fourth model's parameter weights to the fifth model's attention module. Finally, the feature vector is input into the fifth model, and the final prediction result is obtained by the attention module of the fifth model.
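The chained flow above can be summarized in a short Python sketch. The `attention.adjust` method and the `transfer_layers` (e.g. fully connected layers) are assumed interfaces for illustration; the patent does not prescribe them:

```python
# Hypothetical sketch of the chained prediction flow: each sub-model
# returns its output and the parameter weights from its attention
# module; a transfer layer carries those weights into the next
# sub-model's attention module before that sub-model runs.
def predict_multitask(sub_models, transfer_layers, features):
    prev_weights = None
    output = None
    for i, model in enumerate(sub_models):
        if prev_weights is not None:
            # adjust this sub-model's attention parameters with the
            # previous sub-model's parameter weights
            model.attention.adjust(transfer_layers[i - 1](prev_weights))
        output, prev_weights = model(features)
    return output  # prediction of the final task, carrying upstream info
```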
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a multitask model training method provided in the embodiment of the present specification.
As shown in fig. 3, the multi-tasking model training method may include at least the following steps:
Step 302: determine the parameter weights of the sample features in the attention module of the m-th sub-model.
When the occurrence probability of a multitask needs to be predicted, the target features corresponding to the multitask can be input into the trained multitask model mentioned in this embodiment to obtain a more accurate prediction result. The process of training that multitask model is therefore particularly important.
Specifically, before the multitask model is trained, the number of tasks in the multitask corresponding to the sample features, and the sample result corresponding to each task, may be determined. The sample features may include sample user features, sample product features, and the sample result of each task. Sample user features are the sample feature information of the user who will execute the event, for example the user's sample identity information, which may include at least one of a user name, a user category, or a user address. Sample product features are sample feature information characterizing the product corresponding to the event, for example at least one of the sample product name, sample product production information, or sample product definition information (information characterizing the product's function). It can be understood that a sample product requirement may correspond to a plurality of tasks, and each task may correspond to a known sample result, which may be, but is not limited to, the character 0 when the corresponding task is determined not to have occurred and the character 1 when it is determined to have occurred.
After the number of tasks in the multitask corresponding to the sample features and the per-task sample results are determined, the parameter weights in the attention module of the model corresponding to the m-th task can be obtained from the sample user features and sample product features. The multitask model of this embodiment may include M sub-models, each including an attention module; each task in the multitask corresponding to the sample features corresponds to one sub-model, and adjacent tasks correspond to adjacent sub-models. For example, the first task corresponds to the first sub-model of the multitask model, the second task to the second sub-model, and the m-th task to the m-th sub-model; that is, the parameter weights in the attention module of the model corresponding to the m-th task are the parameter weights in the attention module of the m-th sub-model of the multitask model. Here, M may be a positive integer greater than m; for example, when m is 2, M may be any positive integer greater than 2.
It is understood that, based on the above sample features, the m-th task corresponds to the m-th sub-model of the multitask model, and the parameter weights in the attention module of the m-th sub-model may be used to characterize the parameters corresponding to the association information between the m-th task and the (m+1)-th task, which may correspond to a subset of all the parameters in that attention module. Taking all the parameters in the attention module of the sub-model corresponding to the m-th task as A, B, C, D, and E as an example, the parameter weights may be, for example (but not limited to), (0, 1, 1, 0, 0); that is, the parameters corresponding to the association information between the m-th task and the (m+1)-th task are B and C.
It is further understood that the task corresponding to the m-th sub-model in this embodiment may be any one of the first task through the second-to-last task of the multitask corresponding to the sample features. For example, when the multitask has 5 tasks, m may take the value 1, 2, 3, or 4; when the multitask has 2 tasks, m can only take the value 1.
Step 304: determine the parameters in the attention module of the (m+1)-th sub-model according to the parameter weights in the attention module of the m-th sub-model.
Specifically, after the parameter weights in the attention module of the m-th sub-model are determined, they may be passed to the attention module of the adjacent (m+1)-th sub-model to adjust the parameters in that module. For example, if the parameters in the attention module of the (m+1)-th model are B, D, E, and F, and the parameters corresponding to the parameter weights of the m-th model are B and C, the adjusted parameters in the attention module of the (m+1)-th model are B, C, D, E, and F.
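The adjustment in step 304 can be illustrated with plain Python dictionaries, reusing the parameter names A through F from the example above (the numeric values are made up for illustration):

```python
# Illustrative sketch of step 304: the m-th attention module's parameter
# weights select which of its parameters carry cross-task information;
# those parameters are merged into the (m+1)-th module's parameter set.
params_m = {"A": 0.3, "B": 0.7, "C": 0.5, "D": 0.1, "E": 0.9}
weights_m = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 0}  # e.g. (0, 1, 1, 0, 0)

selected = {k: v for k, v in params_m.items() if weights_m[k] == 1}  # B and C
params_m_plus_1 = {"B": 0.2, "D": 0.4, "E": 0.6, "F": 0.8}
params_m_plus_1.update(selected)  # adjusted set now covers B, C, D, E, F
```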
It can be understood that the attention modules of all sub-models in the multitask model of this embodiment may have the same structure; the difference lies in their parameters.
Step 306: train the multitask model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model.
Specifically, after the parameters in the attention module of the (m+1)-th sub-model are determined according to the parameter weights in the attention module of the m-th sub-model, the sample user features and sample product features may be input into the (m+1)-th sub-model, and the occurrence probability of the (m+1)-th task is obtained through its attention module. It can be understood that the occurrence probability obtained in this way incorporates the association information of the preceding m tasks, and is therefore more accurate than the occurrence probability of the (m+1)-th task obtained in the prior art.
Further, after the occurrence probability of the (m+1)-th task is obtained, parameter optimization training can be performed on the m-th and (m+1)-th sub-models of the multitask model in combination with the sample result of the (m+1)-th task in the sample features, so that the predicted occurrence probability approaches that sample result. It can be understood that, during training, as the parameters in the m-th sub-model change, the parameter weights in its attention module change as well, and in turn the parameters in the attention module of the (m+1)-th sub-model change, bringing the occurrence probability of the (m+1)-th task closer to its sample result.
It should be noted that, when m is any positive integer greater than 1, the parameter weights in the attention modules of the corresponding sub-models can be obtained sequentially in task order, the parameters in the attention module of the (m+1)-th sub-model can then be determined from the parameter weights of the m-th sub-model, and all m+1 sub-models can be trained jointly in the training process, improving training efficiency. For example, when m is 3, the parameter weights in the attention modules of the first, second, and third sub-models may be determined in turn; the parameters in the attention module of the fourth sub-model are determined from the third sub-model's parameter weights to obtain the occurrence probability of the fourth task, and the four sub-models are then optimized according to that probability and the sample result corresponding to the fourth task.
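To make the joint optimization in steps 302 to 306 concrete, a rough PyTorch-style training step is sketched below. It assumes each sub-model returns its predicted probability together with its attention parameter weights, and that a hypothetical `attention.adjust` method and per-pair `transfer_layers` exist; none of these interfaces are prescribed by this specification. Binary cross-entropy matches the loss introduced later in formula (1).

```python
import torch

# A rough sketch of one training step: run the chain of sub-models,
# passing attention parameter weights forward, then optimize on the
# last task's prediction versus its sample result.
def train_step(sub_models, transfer_layers, optimizer, x, y_sample):
    prob, prev_w = sub_models[0](x)
    for i in range(1, len(sub_models)):
        # pass the previous attention module's parameter weights forward
        sub_models[i].attention.adjust(transfer_layers[i - 1](prev_w))
        prob, prev_w = sub_models[i](x)
    # compare the last task's predicted probability with its sample result
    loss = torch.nn.functional.binary_cross_entropy(prob, y_sample)
    optimizer.zero_grad()
    loss.backward()  # gradients reach all chained sub-models
    optimizer.step()
    return loss.item()
```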
In an embodiment of this specification, the multitask model may assign each task in the multitask to one sub-model and migrate the associated information of the previous task to the next task through the attention modules between adjacent tasks, so that the next task can combine the previous task's associated information to obtain a more accurate and more relevant prediction result.
As an optional implementation of this embodiment, the m-th sub-model further includes an embedding module, a combining module, and a first conversion module;
and determining the parameter weights of the sample features in the attention module of the m-th sub-model comprises:
inputting the sample user characteristics and the sample product characteristics into an embedding module to obtain characteristic vectors corresponding to the sample user characteristics and the sample product characteristics respectively;
inputting the feature vectors respectively corresponding to the sample user features and the sample product features into a combination module to obtain a combination feature vector;
inputting the combined feature vector to a first conversion module to obtain first conversion information;
and inputting the first conversion information into an attention module of the mth sub-model to obtain the parameter weight of the sample characteristics in the mth sub-model.
Specifically, reference may be made to FIG. 4, a schematic structural diagram of a sub-model provided by an embodiment of this specification. As shown in FIG. 4, the diagram may represent the structure of the m-th sub-model in this embodiment, which may include, in connection order, an embedding module, a combining module, a first conversion module, and an attention module. The embedding module takes the sample user features and sample product features as input and outputs the feature vectors corresponding to each. It can be understood that the sample user features may be expressed as the identity features of the sample user, such as (but not limited to) the user name, user category, and user address, and the sample product features may be expressed as sample feature information characterizing the product corresponding to the event, such as (but not limited to) the sample product name, sample product production information, and sample product definition information (information characterizing the product's function). The embedding module in this embodiment may be, but is not limited to, an embedding layer as used in language processing: a corpus containing a dictionary and symbols may be preset in the embedding module, with each symbol corresponding to one Chinese character in the dictionary; for example, the Chinese character for "you" may correspond to symbol 1, and the Chinese character for "good" to symbol 3. When the sample user features and sample product features are input into the embedding module, each Chinese character in them can be converted into the corresponding symbol according to the corpus, and the resulting feature vectors are output to the combining module in matrix form. It can also be understood that the order of inputting the two kinds of features is not limited: the sample user features may be input first and the sample product features second, or vice versa, or both may be input simultaneously to obtain both feature vectors at once.
Further, after the feature vectors corresponding to the sample user features and sample product features are input into the combining module, the combining module may output the combined feature vector, i.e. the concatenation of the two: if the feature vector corresponding to the sample user features is [A, B] and that corresponding to the sample product features is [C, D, E], the combined feature vector obtained by the combining module is [A, B, C, D, E].
Further, after the combined feature vector is obtained from the combining module, it may be input into the first conversion module to obtain the first conversion information. The first conversion module may be understood as the neural network corresponding to the m-th task, used to obtain, from the combined feature vector, vector information that can represent the features corresponding to the m-th task. It can be understood that the first conversion module may be, but is not limited to, any one of a Deep Neural Network (DNN), a Deep Interest Network (DIN), or a DeepFM (a factorization-machine based neural network).
Further, after the first conversion information is obtained from the first conversion module, it may be input into the attention module, which learns the parameter weights of the sample features in the m-th sub-model. The attention module can learn, within the first conversion information, the vector information of the features corresponding to the association information between the m-th task and the (m+1)-th task, and obtain the corresponding parameter weights. It can be understood that the attention module here may, but is not limited to, apply the attention mechanism of the common art, and this embodiment is not limited in this respect.
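Putting the four modules together, a minimal PyTorch sketch of the m-th sub-model's structure in FIG. 4 might look as follows. The dimensions, the simple feed-forward network standing in for DNN/DIN/DeepFM, and the simplified attention scoring are all illustrative assumptions, not definitions from the patent:

```python
import torch
import torch.nn as nn

# Minimal sketch of the m-th sub-model: embedding -> combination ->
# conversion -> attention, as in FIG. 4.
class SubModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=16, hidden=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)  # embedding module
        self.conversion = nn.Sequential(                    # first conversion module
            nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.attn_score = nn.Linear(hidden, hidden)         # attention module (simplified)

    def forward(self, user_ids, product_ids):
        user_vec = self.embedding(user_ids)                 # sample user features
        prod_vec = self.embedding(product_ids)              # sample product features
        combined = torch.cat([user_vec, prod_vec], dim=1)   # combining module, [A,B]+[C,D,E]
        conv = self.conversion(combined.mean(dim=1))        # first conversion information
        weights = torch.softmax(self.attn_score(conv), dim=-1)  # parameter weights
        return conv, weights
```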
As another optional implementation of this embodiment, the (m+1)-th sub-model further includes an embedding module, a combining module, and a second conversion module;
and training the multitask model based on the sample features, the parameter weights in the attention module of the m-th sub-model, and the parameters in the attention module of the (m+1)-th sub-model comprises:
inputting the sample user characteristics and the sample product characteristics into an embedding module to obtain characteristic vectors corresponding to the sample user characteristics and the sample product characteristics respectively;
inputting the feature vectors respectively corresponding to the sample user features and the sample product features into a combination module to obtain a combination feature vector;
inputting the combined feature vector to a second conversion module to obtain second conversion information;
inputting the second conversion information into the attention module of the (m+1)-th sub-model, and obtaining a prediction result of the task corresponding to the (m+1)-th sub-model according to the second conversion information and the parameters in that attention module;
and training the multitask model according to the prediction result of the task corresponding to the (m+1)-th sub-model and the user's sample result for that task.
Specifically, reference may be made to FIG. 5, a schematic structural diagram of a multitask model provided by an embodiment of this specification. As shown in FIG. 5, the multitask model may include the m-th and (m+1)-th sub-models, corresponding respectively to the m-th and (m+1)-th tasks of the multitask corresponding to the sample features. The m-th sub-model may include, in connection order, an embedding module, a combining module, a first conversion module, and an attention module; the (m+1)-th sub-model may include, in connection order, an embedding module, a combining module, a second conversion module, and an attention module. It can be understood that the embedding modules of the two sub-models may be the same embedding module (i.e. shared), which effectively retains the association information between the m-th and (m+1)-th tasks. Similarly, the combining modules of the two sub-models may be the same combining module, ensuring the reliability of that association information.
In the (m+1)-th sub-model, the embedding module and the combining module process the sample user features and sample product features exactly as described above for the m-th sub-model: the embedding module converts the features into the corresponding feature vectors (with no restriction on input order), and the combining module concatenates them into the combined feature vector, e.g. [A, B] and [C, D, E] into [A, B, C, D, E].
Further, after the combined feature vector is obtained from the combining module, it may be input into the second conversion module to obtain the second conversion information. The second conversion module may be understood as the neural network corresponding to the (m+1)-th task, used to obtain, from the combined feature vector, vector information that can represent the features corresponding to the (m+1)-th task. Like the first conversion module, it may be, but is not limited to, any one of a DNN, a DIN, or a DeepFM.
Further, after the second conversion information is obtained from the second conversion module, it may be input into the attention module, and the prediction result of the (m+1)-th task is obtained according to the adjusted parameters in the attention module and its sigmoid activation function. It can be understood that the parameters of the attention module in the (m+1)-th sub-model include the association information with the m-th task, so the occurrence probability of the (m+1)-th task obtained by this attention module carries that association, making the prediction result more reliable and accurate than in the conventional art.
It is further understood that a fully connected module may be arranged between the attention module of the (m+1)-th sub-model and the attention module of the m-th sub-model. After the parameter weights in the attention module of the m-th sub-model are determined, they may be input into the fully connected module to obtain connection information containing those parameter weights. It should be noted that the fully connected module in this embodiment may be configured to preserve and transmit the parameter weights of the m-th sub-model's attention module, effectively ensuring their integrity. After obtaining the connection information, the fully connected module may input it into the attention module of the (m+1)-th sub-model, so that the parameters in that module are adjusted in combination with the m-th sub-model's parameter weights.
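A minimal sketch of such a fully connected transfer module (dimensions and naming are illustrative assumptions):

```python
import torch.nn as nn

# Sketch of the fully connected module between adjacent attention
# modules: it carries the m-th module's parameter weights to the
# (m+1)-th module as connection information.
class WeightTransfer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.fc = nn.Linear(dim, dim)  # the full-connection layer

    def forward(self, attn_weights_m):
        # connection information containing the m-th module's weights,
        # fed to the (m+1)-th attention module to adjust its parameters
        return self.fc(attn_weights_m)
```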
Further, after the prediction result of the task corresponding to the (m + 1) th sub-model is obtained, the parameters in the multi-task model can be optimized by combining that prediction result with the sample result of the user for the task corresponding to the (m + 1) th sub-model, until the prediction result approaches the sample result. It can be understood that, in this embodiment, the multi-task model may be trained through the prediction result and sample result of the (m + 1) th sub-model; it may likewise be trained, for example, through the prediction result and sample result of the m-th sub-model, but is not limited thereto.
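A minimal training-loop sketch of this optimization might look as follows; the model, data loader, epoch count, and learning rate are placeholders, and the plain binary cross entropy between the (m + 1) th task's prediction and its sample result stands in for the full objective discussed below.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs: int = 10, lr: float = 1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for sample_features, sample_result in loader:
            pred_m1 = model(sample_features)  # prediction for task m+1
            loss = F.binary_cross_entropy(pred_m1, sample_result)
            opt.zero_grad()
            loss.backward()
            opt.step()  # update until predictions approach sample results
```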
It should be noted that the above-mentioned multi-task model is not limited to the m-th sub-model and the (m + 1) th sub-model, and may further include, for example, a first sub-model, a second sub-model, ..., an (m + 2) th sub-model, an (m + 3) th sub-model, and the like. Each sub-model may correspond to one task of the multiple tasks corresponding to the sample features, and each sub-model may sequentially include, in connection order, an embedding module, a combining module, the conversion module corresponding to that sub-model, and an attention module. It can be appreciated that a full-connection module may be provided between the attention modules of any two adjacent sub-models, so as to transfer the parameter weights in the attention module of the previous sub-model to the attention module of the next sub-model.
As another option of this embodiment, after the prediction result of the task corresponding to the (m + 1) th sub-model is obtained according to the second conversion information and the parameters of the (m + 1) th sub-model, the method further includes:
obtaining a first loss function according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
the (m + 1) th sub-model is optimized based on the first loss function.
Specifically, in the process of training the multi-task model, after the prediction result of the task corresponding to the (m + 1) th sub-model is obtained, the (m + 1) th sub-model may be optimized by calculating a cross entropy loss function (i.e., corresponding to the first loss function mentioned above). The cross entropy loss function may be calculated as shown in the following formula (1):

L_ce(θ) = -(1/N) Σ_{(x, y_t) ∈ D} [ y_t · log ŷ_t + (1 − y_t) · log(1 − ŷ_t) ]    (1)

In formula (1), L_ce can be expressed as the cross entropy loss function of the (m + 1) th sub-model, θ is a parameter of the (m + 1) th sub-model, N is the number of sample features, D is the sample set, and (x, y_t) is a group of samples in the sample set D, where x can be expressed as the sample user features and sample product features corresponding to the (m + 1) th task in the sample, y_t can be expressed as the sample result of the user for the (m + 1) th task, and ŷ_t can be expressed as the prediction result of the task corresponding to the (m + 1) th sub-model. Here the value of y_t may be, but is not limited to, 1 or 0, where 0 may indicate that the (m + 1) th task is not completed, and 1 may indicate that the (m + 1) th task is completed.
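As a sketch, formula (1) as reconstructed above can be transcribed directly in PyTorch; the small epsilon is a numerical guard added for the example and is not part of the formula.

```python
import torch

def cross_entropy_loss(y_hat: torch.Tensor, y_t: torch.Tensor) -> torch.Tensor:
    """Formula (1): mean binary cross entropy over the N samples of D."""
    eps = 1e-8  # numerical guard, not part of formula (1)
    return -(y_t * torch.log(y_hat + eps)
             + (1 - y_t) * torch.log(1 - y_hat + eps)).mean()
```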
As another option of this embodiment, after obtaining the first loss function according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample features, and before optimizing the (m + 1) th sub-model based on the first loss function, the method further includes:
judging whether the prediction result of the task corresponding to the (m + 1) th sub-model meets a preset condition or not;
when the prediction result of the task corresponding to the (m + 1) th sub-model is determined not to meet the preset condition, the prediction result of the task corresponding to the m-th sub-model is obtained according to the first conversion information and the parameters in the attention module of the m-th sub-model;
obtaining a second loss function according to the prediction result of the task corresponding to the mth sub-model, the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
optimizing the (m + 1) th sub-model based on the first loss function, comprising:
and optimizing the (m + 1) th sub-model based on the first loss function and the second loss function.
According to the practical situation, the (m + 1) th task can be completed only after the m-th task is completed; that is, the prediction result corresponding to the m-th task should be higher than the prediction result corresponding to the (m + 1) th task. Based on this, whether the prediction result corresponding to the (m + 1) th task meets this requirement can be judged, and parameter optimization can then be performed on the (m + 1) th sub-model according to the judgment result.
Specifically, after the cross entropy loss function of the (m + 1) th sub-model is calculated, the prediction result of the task corresponding to the (m + 1) th sub-model may be compared with a preset result. When it is determined that the prediction result of the task corresponding to the (m + 1) th sub-model is higher than the preset result, the prediction result of the m-th task is obtained according to the parameters in the attention module of the m-th sub-model and the sigmoid activation function in that attention module, and the probability calibration loss function (i.e., the aforementioned second loss function) is then obtained by combining the prediction result of the m-th task, the prediction result of the task corresponding to the (m + 1) th sub-model, and the sample features. Before the prediction result of the m-th task is obtained, the parameters in the attention module of the m-th sub-model are determined according to the parameter weights in the attention module of the (m-1) th sub-model; the sample user features and sample product features are then input into the m-th sub-model, and the prediction result of the m-th task is obtained through the sigmoid activation function of the attention module after passing sequentially through the embedding module, the combining module, the first conversion module, and the attention module.
It is to be understood that the above-mentioned preset result may be determined according to a sample result corresponding to the user at the mth task, for example, but not limited to, the preset result may be set to be smaller than the sample result corresponding to the user at the mth task.
Here, the probability calibration loss function can be obtained by the following formula (2):

L_le(θ) = (1/N) Σ_{(x, y_t) ∈ D} max( ŷ_t^(m+1) − ŷ_t^(m), 0 )    (2)

In formula (2), L_le can be expressed as the probability calibration loss function of the (m + 1) th sub-model, ŷ_t^(m) can be expressed as the prediction result of the task corresponding to the m-th sub-model, and ŷ_t^(m+1) can be expressed as the prediction result of the task corresponding to the (m + 1) th sub-model. When ŷ_t^(m+1) is greater than ŷ_t^(m), the prediction does not accord with the actual situation, and the probability calibration loss function needs to be introduced to calibrate the multi-task model.
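A sketch of formula (2) follows: the hinge term is zero whenever the prediction for task m + 1 stays below that for task m, so only the order-violating samples described above contribute to the loss.

```python
import torch

def calibration_loss(y_hat_m: torch.Tensor,
                     y_hat_m1: torch.Tensor) -> torch.Tensor:
    """Formula (2): penalize samples where task m+1's probability
    exceeds task m's, which contradicts the sequential task order."""
    return torch.clamp(y_hat_m1 - y_hat_m, min=0).mean()
```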
For convenience of understanding, when the (m + 1) th sub-model is optimized based on the first loss function and the second loss function, a target loss function of the (m + 1) th sub-model may be determined according to the first loss function and the second loss function, and the (m + 1) th sub-model is then optimized through the target loss function. The target loss function of the (m + 1) th sub-model can be calculated by the following formula (3):
L(θ) = L_ce(θ) + α·L_le(θ)    (3)

In formula (3), L(θ) can be expressed as the target loss function of the (m + 1) th sub-model, L_ce(θ) as the cross entropy loss function of the (m + 1) th sub-model, and L_le(θ) as the probability calibration loss function of the (m + 1) th sub-model; α can be expressed as a calibration parameter, where a larger α indicates a higher weight of the probability calibration loss function of the (m + 1) th sub-model.
It can be understood that the above formula (3) may also be used to perform parameter optimization on any other sub-model in the multi-task model, and this embodiment is not limited thereto.
It should be noted that, in order to further improve the parameter optimization effect on the (m + 1) th sub-model of the multi-task model, a mean square error module may be further added between the m-th sub-model and the (m + 1) th sub-model. When the sample user features and the sample product features pass through the embedding module, they may be abstracted into mapping functions; a mean square error loss function of the (m + 1) th sub-model is constructed based on the mapping function of the m-th sub-model and the mapping function of the (m + 1) th sub-model, and parameter optimization is performed on the (m + 1) th sub-model by combining the aforementioned cross entropy loss function, probability calibration loss function, and the mean square error loss function.
The mean square error loss function of the (m + 1) th sub-model can be calculated by the following formula (4):

L_mse = (1/N) Σ_{i=1}^{N} ( f_t(x_i) − f_{t−1}(x_i) )²    (4)

In formula (4), L_mse can be expressed as the mean square error loss function of the (m + 1) th sub-model, f_t(x_i) can be expressed as the mapping function of the (m + 1) th sub-model, and f_{t−1}(x_i) can be expressed as the mapping function of the m-th sub-model.
Here, the above formula (3) and the mean square error loss function can be combined to obtain the new target loss function of the (m + 1) th sub-model by the following formula (5):

L(θ) = L_ce(θ) + α·L_le(θ) + γ·L_mse    (5)

In formula (5), L(θ) can be expressed as the target loss function of the (m + 1) th sub-model, L_ce(θ) as the cross entropy loss function of the (m + 1) th sub-model, L_le(θ) as the probability calibration loss function of the (m + 1) th sub-model, and L_mse as the mean square error loss function of the (m + 1) th sub-model; α can be expressed as a calibration parameter, γ can be expressed as an error parameter, and γ may be any natural number.
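Formulas (4) and (5) might be transcribed as below; the alpha and gamma values are arbitrary example settings rather than values prescribed by this embodiment.

```python
import torch

def mse_loss(f_t: torch.Tensor, f_t_prev: torch.Tensor) -> torch.Tensor:
    """Formula (4): mean squared difference between the mapping functions
    of the (m+1)-th and m-th sub-models."""
    return ((f_t - f_t_prev) ** 2).mean()

def target_loss(l_ce: torch.Tensor, l_le: torch.Tensor, l_mse: torch.Tensor,
                alpha: float = 0.5, gamma: float = 1.0) -> torch.Tensor:
    """Formula (5): L(theta) = L_ce + alpha * L_le + gamma * L_mse."""
    return l_ce + alpha * l_le + gamma * l_mse
```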
As another option of this embodiment, after the prediction result of the task corresponding to the m-th sub-model is obtained according to the first conversion information and the parameter weight of the m-th sub-model, the method further includes:
obtaining a third loss function according to the prediction result of the task corresponding to the mth submodel and the sample characteristics;
the mth submodel is optimized based on the third loss function.
In the process of training the multi-task model, parameter optimization may be performed not only on the (m + 1) th sub-model but also on any other sub-model in the multi-task model, so as to further ensure the accuracy of the prediction result of the multi-task model.
Specifically, after the prediction result of the task corresponding to the m-th sub-model is obtained, the m-th sub-model may be optimized by calculating a cross entropy loss function (i.e., corresponding to the third loss function mentioned above); the calculation of this cross entropy loss function may refer to the above formula (1) and is not repeated here.
Referring to fig. 6, fig. 6 is a flowchart illustrating a multitask prediction method according to an embodiment of the present disclosure.
As shown in fig. 6, the multitask prediction method may include at least the following steps:
step 602, determining the parameter weight of the target feature in the attention module of the mth sub-model.
Specifically, the target feature may include a target user feature and a target product feature. The target user feature may be understood as target feature information of the user who is to execute an event, for example, target identity information of the user, which may specifically include at least one of a user name, a user category, or a user address. The target product feature may be understood as target feature information characterizing the product corresponding to the event, for example, at least one of the target product name of the event, target product production information, or target product definition information, where the definition information can be understood as information characterizing the product function.
After the target user features and the target product features are determined, the parameter weight in the attention module of the sub-model corresponding to the m-th task among the multiple tasks can be obtained according to the target user features and the target product features. The multi-task model of this embodiment may include M sub-models, each sub-model may include an attention module, each task of the multiple tasks corresponding to the target feature may correspond to one sub-model, and adjacent tasks may correspond to adjacent sub-models. For example, a first task of the multiple tasks corresponding to the target feature may correspond to a first sub-model of the multi-task model, a second task may correspond to a second sub-model, and an m-th task may correspond to an m-th sub-model. Here, M may be a positive integer greater than m; for example, when m is 2, M may be a positive integer greater than 2.
It can be understood that, based on the above-mentioned target feature, the m-th task may correspond to the m-th sub-model in the multi-task model, and the parameter weight in the attention module of the m-th sub-model may be used to characterize the parameters corresponding to the association information between the m-th task and the (m + 1) th task; this may correspond to a subset of all the parameters in the attention module of the sub-model corresponding to the m-th task. Taking the case where all the parameters in the attention module of the sub-model corresponding to the m-th task can be represented as A, B, C, D, and E, the parameter weight in that attention module may be represented as, but is not limited to, (0, 1, 1, 0, 0), that is, the parameters corresponding to the association information between the m-th task and the (m + 1) th task are B and C.
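The worked example above can be sketched as a 0/1 mask over the attention parameters; the concrete values are illustrative only.

```python
import torch

params_m = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])  # stands for A, B, C, D, E
weight_m = torch.tensor([0.0, 1.0, 1.0, 0.0, 0.0])  # parameter weight
associated = params_m[weight_m.bool()]  # selects B and C, the association info
```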
It can be further understood that the task corresponding to the m-th sub-model in this embodiment may be any one of the first task to the second-to-last task among the multiple tasks corresponding to the target feature. Possibly, when the number of the multiple tasks corresponding to the target feature is 5, the value of m may be 1, 2, 3, or 4. Possibly, when the number of the multiple tasks corresponding to the target feature is 2, the value of m may only be 1.
Step 604, determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the m-th sub-model.
Specifically, after the parameter weight in the attention module of the m-th sub-model is determined, it may be passed to the attention module of the adjacent (m + 1) th sub-model to adjust the parameters in the attention module of the (m + 1) th sub-model. Here, for example, the parameters in the attention module of the (m + 1) th sub-model may be represented as B, D, E, and F, the parameters corresponding to the parameter weight in the attention module of the m-th sub-model may be represented as B and C, and the adjusted parameters in the attention module of the (m + 1) th sub-model may be represented as B, C, D, E, and F.
It is understood that the attention module included in each sub-model in the multitasking model of the present embodiment may have the same structure, and the difference is that the parameters of the attention module included in each sub-model are different.
Step 606, obtaining a prediction result of the task corresponding to the (m + 1) th sub-model based on the target features and the parameters in the attention module of the (m + 1) th sub-model.
In this embodiment, the m-th sub-model may sequentially include, in connection order, the embedding module, the combining module, the first conversion module, and the attention module, and the (m + 1) th sub-model may sequentially include, in connection order, the embedding module, the combining module, the second conversion module, and the attention module. It can be understood that the embedding module in the m-th sub-model and the embedding module in the (m + 1) th sub-model may be the same embedding module (which may also be understood as sharing the same embedding module); in this way, the association information between the m-th task and the (m + 1) th task may be effectively retained. Similarly, the combining module in the m-th sub-model and the combining module in the (m + 1) th sub-model may be the same combining module, so as to ensure the reliability of the association information between the m-th task and the (m + 1) th task.
Specifically, in the (m + 1) th sub-model, the embedding module may be configured to receive the target user features and the target product features among the target features, and to obtain feature vectors corresponding to the target user features and the target product features, respectively. It can be understood that the target user features may represent identity features of the target user, such as but not limited to a user name, a user category, and a user address, and the target product features may represent target feature information characterizing the product corresponding to the event, such as but not limited to the target product name of the event, target product production information, and target product definition information, where the definition information can be understood as information characterizing the product function. The embedding module in this embodiment may be, but is not limited to, an embedding layer (also called an Embedding layer) as used in natural language processing. A corpus including a dictionary and characters may be preset in the embedding module, and each character may correspond to one Chinese character in the dictionary; for example, the Chinese character for "you" may correspond to character 1, and the Chinese character for "good" may correspond to character 3. When the target user features and the target product features are input into the embedding module, each Chinese character in them can be converted into its corresponding character according to the corpus, and the feature vectors corresponding to the target user features and the target product features are output to the combining module in matrix form. It can also be understood that the order of inputting the target user features and the target product features is not limited: the target user features may be input into the embedding module first to obtain their feature vector, followed by the target product features; the target product features may be input first, followed by the target user features; or the target user features and the target product features may be input simultaneously to obtain both feature vectors at the same time.
Further, after the feature vectors respectively corresponding to the target user features and the target product features are input to the combining module, the combined feature vector may be output by the combining module. The combined feature vector can be understood as a combination of the feature vector corresponding to the target user features and the feature vector corresponding to the target product features: where the feature vector corresponding to the target user features may be represented as [A, B] and the feature vector corresponding to the target product features as [C, D, E], the combined feature vector obtained by the combining module may be represented as [A, B, C, D, E].
Further, after the combined feature vector is obtained from the combining module, the combined feature vector may be input to the second conversion module to obtain second conversion information. The second conversion module can be understood as a neural network corresponding to the (m + 1) th task, and is used for obtaining, from the combined feature vector, vector information capable of representing the features corresponding to the (m + 1) th task. It can be understood that the second conversion module may be, but is not limited to, any one of a Deep Neural Network (DNN), a Deep Interest Network (DIN), or a DeepFM network.
Further, after the second conversion information is obtained from the second conversion module, the second conversion information may be input to the attention module, and the prediction result of the (m + 1) th task may be obtained according to the adjusted parameters in the attention module and the sigmoid activation function in the attention module. It can be understood that the parameters of the attention module in the (m + 1) th sub-model may include association information with the m-th task, so the probability of occurrence of the (m + 1) th task obtained by this attention module is correlated with the m-th task, which gives the prediction result higher reliability and accuracy compared with conventional techniques.
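Chaining the pieces sketched earlier, steps 602 to 606 might read end to end as below; the small MLP standing in for the second conversion module and the additive parameter adjustment are, again, assumptions for illustration.

```python
import torch
import torch.nn as nn

dim = 16
conversion = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # second conversion module
fc_transfer = nn.Linear(dim, dim)                           # full-connection module
score = nn.Linear(dim, 1)                                   # sigmoid output head

def predict_task_m1(combined: torch.Tensor,
                    weights_m: torch.Tensor) -> torch.Tensor:
    # combined: (seq_len, dim) combined feature vector of the target features
    info = conversion(combined)               # second conversion information
    adjusted = info + fc_transfer(weights_m)  # step 604: adjust with m-th weights
    # step 606: prediction result of the task corresponding to sub-model m+1
    return torch.sigmoid(score(adjusted.mean(dim=0)))
```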
Referring to fig. 7, fig. 7 is a schematic structural diagram illustrating a multitask model training device provided in the embodiment of the present specification.
The multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module. As shown in fig. 7, the multitask model training device 700 may include at least a first processing module 701, a second processing module 702, and a training module 703, wherein:
a first processing module 701 for determining the parameter weight of the sample feature in the attention module of the mth sub-model; wherein M is a positive integer less than M;
a second processing module 702, configured to determine parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the mth sub-model; the sample characteristics comprise sample user characteristics, sample product characteristics and sample results of tasks corresponding to the (m + 1) th sub-model by the user;
a training module 703, configured to train the multitask model based on the sample features, the parameter weight in the attention module of the mth sub-model, and the parameter in the attention module of the m +1 th sub-model.
In some possible embodiments, the mth sub-model further comprises an embedding module, a combining module, and a first transformation module;
the first processing module 701 includes:
the first embedding unit is used for inputting the sample user characteristics and the sample product characteristics into the embedding module to obtain characteristic vectors respectively corresponding to the sample user characteristics and the sample product characteristics;
the first combination unit is used for inputting the feature vectors respectively corresponding to the sample user features and the sample product features into the combination module to obtain a combined feature vector;
the first conversion unit is used for inputting the combined feature vector to the first conversion module to obtain first conversion information;
and the first generation unit is used for inputting the first conversion information to the attention module of the mth sub-model to obtain the parameter weight of the sample characteristics in the mth sub-model.
In some possible embodiments, a full connection module is arranged between the attention module of the mth sub-model and the attention module of the (m + 1) th sub-model;
the second processing module 702 includes:
the connecting unit is used for inputting the parameter weight in the attention module of the mth sub-model into the full-connection module to obtain connection information containing the parameter weight in the attention module of the mth sub-model;
and the second generation unit is used for inputting the connection information into the attention module of the (m + 1) th sub-model to obtain parameters in the attention module of the (m + 1) th sub-model.
In some possible embodiments, the (m + 1) th sub-model further comprises an embedding module, a combining module, and a second conversion module;
the training module 703 includes:
the second embedding unit is used for inputting the sample user characteristics and the sample product characteristics into the embedding module to obtain characteristic vectors respectively corresponding to the sample user characteristics and the sample product characteristics;
the second combination unit is used for inputting the feature vectors respectively corresponding to the sample user features and the sample product features into the combination module to obtain a combined feature vector;
the second conversion unit is used for inputting the combined feature vector to the second conversion module to obtain second conversion information;
the third generation unit is used for inputting the second conversion information into the attention module of the (m + 1) th sub-model and obtaining a prediction result of a task corresponding to the (m + 1) th sub-model according to the second conversion information and parameters in the attention module of the (m + 1) th sub-model;
and the training unit is used for training the multi-task model according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample result of the task corresponding to the (m + 1) th sub-model by the user.
In some possible embodiments, the training module 703 further comprises:
the first calculation unit is used for obtaining a first loss function according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics after obtaining the prediction result of the task corresponding to the (m + 1) th sub-model according to the second conversion information and the parameters of the (m + 1) th sub-model;
and the first optimization unit is used for optimizing the (m + 1) th sub-model based on the first loss function.
In some possible embodiments, the training module 703 further includes:
the judging unit is used for judging whether the prediction result of the task corresponding to the (m + 1) th sub-model meets a preset condition or not before the (m + 1) th sub-model is optimized based on the first loss function after the first loss function is obtained according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
the second calculating unit is used for obtaining the prediction result of the task corresponding to the mth sub-model according to the first conversion information and the parameters in the attention module of the mth sub-model when the prediction result of the task corresponding to the (m + 1) th sub-model is determined not to meet the preset condition;
the third calculation unit is used for obtaining a second loss function according to the prediction result of the task corresponding to the mth sub-model, the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
the first optimization unit is specifically configured to:
and optimizing the (m + 1) th sub-model based on the first loss function and the second loss function.
In some possible embodiments, the training module 703 further includes:
the fourth calculating unit is used for obtaining a third loss function according to the prediction result of the task corresponding to the mth sub-model and the sample characteristics after the prediction result of the task corresponding to the mth sub-model is obtained according to the first conversion information and the parameter weight of the mth sub-model;
a second optimization unit for optimizing the mth sub-model based on a third loss function.
Referring to fig. 8, fig. 8 is a schematic structural diagram illustrating a multitask predicting device according to an embodiment of the present disclosure.
As shown in fig. 8, the multitask predicting apparatus 800 is applied to a multitask model, the multitask model includes M submodels, each submodel corresponds to a task, each submodel includes an attention module, and the multitask predicting apparatus 800 may include at least a third processing module 801, a fourth processing module 802, and a predicting module 803, where:
a third processing module 801 for determining the parameter weight of the target feature in the attention module of the mth sub-model; wherein M is a positive integer less than M;
a fourth processing module 802, configured to determine parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the mth sub-model; the target characteristics comprise target user characteristics and target product characteristics;
the prediction module 803 is configured to obtain a prediction result of the task corresponding to the (m + 1) th sub-model based on the target feature and the parameters in the attention module of the (m + 1) th sub-model.
Referring to fig. 9, fig. 9 is a schematic structural diagram illustrating another multitask model training device provided in the embodiment of the present specification.
The multi-task model comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module. As shown in fig. 9, the multitask model training device 900 may include: at least one processor 901, at least one network interface 904, a user interface 903, a memory 905, and at least one communication bus 902.
The communication bus 902 can be used to realize connection and communication among the above components.
The user interface 903 may include keys; optionally, the user interface may also include a standard wired interface or a wireless interface.
The network interface 904 may include, but is not limited to, a bluetooth module, an NFC module, a Wi-Fi module, and the like.
Processor 901 may include one or more processing cores. The processor 901 uses various interfaces and lines to connect various parts throughout the multitask model training device 900, and performs various functions of the multitask model training device 900 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 905 and by invoking data stored in the memory 905. Optionally, the processor 901 may be implemented in at least one hardware form of DSP, FPGA, and PLA. The processor 901 may integrate one or a combination of several of a CPU, a GPU, a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing the content that the display screen needs to display; and the modem is used to handle wireless communications. It can be understood that the modem may also not be integrated into the processor 901, but be implemented by a single chip.
The memory 905 may include a RAM or a ROM. Optionally, the memory 905 includes a non-transitory computer readable medium. The memory 905 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 905 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. The memory 905 may optionally be at least one memory device located remotely from the processor 901. As shown in FIG. 9, memory 905, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a multitasking model training application.
In particular, processor 901 may be configured to invoke a multitask model training application stored in memory 905 and specifically perform the following operations:
determining the parameter weight of the sample feature in the attention module of the mth sub-model; wherein M is a positive integer less than M;
determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the (m) th sub-model; the sample characteristics comprise sample user characteristics, sample product characteristics and sample results of tasks corresponding to the (m + 1) th sub-model by the user;
the multitask model is trained based on the sample features, the parameter weights in the attention module of the mth sub-model, and the parameters in the attention module of the m +1 th sub-model.
In some possible embodiments, the mth sub-model further comprises an embedding module, a combining module, and a first transformation module;
when the processor 901 determines the parameter weight of the sample feature in the attention module of the mth sub-model, it specifically performs:
inputting the sample user characteristics and the sample product characteristics into an embedding module to obtain characteristic vectors corresponding to the sample user characteristics and the sample product characteristics respectively;
inputting the feature vectors corresponding to the sample user features and the sample product features to a combination module to obtain a combination feature vector;
inputting the combined feature vector to a first conversion module to obtain first conversion information;
and inputting the first conversion information into an attention module of the mth sub-model to obtain the parameter weight of the sample characteristics in the mth sub-model.
In some possible embodiments, a full connection module is arranged between the attention module of the mth sub-model and the attention module of the (m + 1) th sub-model;
when the processor 901 determines the parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the mth sub-model, it specifically executes:
inputting the parameter weight in the attention module of the mth sub-model to the full-connection module to obtain connection information containing the parameter weight in the attention module of the mth sub-model;
and inputting the connection information into the attention module of the (m + 1) th sub-model to obtain parameters in the attention module of the (m + 1) th sub-model.
In some possible embodiments, the (m + 1) th sub-model further comprises an embedding module, a combining module, and a second conversion module;
when the processor 901 trains the multitask model based on the sample characteristics, the parameter weight in the attention module of the mth sub-model and the parameter in the attention module of the m +1 th sub-model, specifically:
inputting the sample user characteristics and the sample product characteristics into an embedding module to obtain characteristic vectors corresponding to the sample user characteristics and the sample product characteristics respectively;
inputting the feature vectors respectively corresponding to the sample user features and the sample product features into a combination module to obtain a combination feature vector;
inputting the combined feature vector to a second conversion module to obtain second conversion information;
inputting the second conversion information into the attention module of the (m + 1) th sub-model, and obtaining a prediction result of a task corresponding to the (m + 1) th sub-model according to the second conversion information and parameters in the attention module of the (m + 1) th sub-model;
and training the multi-task model according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample result of the task corresponding to the (m + 1) th sub-model by the user.
In some possible embodiments, after obtaining the predicted result of the task corresponding to the m +1 th sub-model according to the second conversion information and the parameter of the m +1 th sub-model, the processor 901 is further configured to perform:
obtaining a first loss function according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
the (m + 1) th sub-model is optimized based on the first loss function.
In some possible embodiments, after obtaining the first loss function according to the predicted result and the sample feature of the task corresponding to the (m + 1) th sub-model, before optimizing the (m + 1) th sub-model based on the first loss function, the processor 901 is further configured to:
judging whether the prediction result of the task corresponding to the (m + 1) th sub-model meets a preset condition or not;
when the prediction result of the task corresponding to the (m + 1) th sub-model does not meet the preset condition, obtaining the prediction result of the task corresponding to the mth sub-model according to the first conversion information and the parameters in the attention module of the mth sub-model;
obtaining a second loss function according to the prediction result of the task corresponding to the mth sub-model, the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
optimizing the (m + 1) th sub-model based on a first loss function, comprising:
and optimizing the (m + 1) th sub-model based on the first loss function and the second loss function.
In some possible embodiments, after obtaining the predicted result of the task corresponding to the mth sub-model according to the first conversion information and the weight parameter of the mth sub-model, the processor 901 is further configured to perform:
obtaining a third loss function according to the prediction result of the task corresponding to the mth submodel and the sample characteristics;
the mth submodel is optimized based on the third loss function.
Referring to fig. 10, fig. 10 is a schematic structural diagram illustrating another multitask predicting device according to an embodiment of the present disclosure.
The multitask predicting device 1000 is applied to a multi-task model which comprises M sub-models, each sub-model corresponds to one task, and each sub-model comprises an attention module. As shown in fig. 10, the multitask predicting device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 can be used to realize connection and communication among the above components.
The user interface 1003 may include keys; optionally, the user interface may also include a standard wired interface or a wireless interface.
The network interface 1004 may include, but is not limited to, a bluetooth module, an NFC module, a Wi-Fi module, and the like.
Processor 1001 may include one or more processing cores. The processor 1001 uses various interfaces and lines to connect various parts throughout the multitask predicting device 1000, and executes various functions of the multitask predicting device 1000 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 1005 and calling data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of DSP, FPGA, or PLA. The processor 1001 may integrate one or a combination of several of a CPU, a GPU, a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing the content that the display screen needs to display; and the modem is used to handle wireless communications. It can be understood that the modem may also not be integrated into the processor 1001, but be implemented by a single chip.
The memory 1005 may include a RAM or a ROM. Optionally, the memory 1005 includes a non-transitory computer readable medium. The memory 1005 may be used to store an instruction, a program, code, a set of codes, or a set of instructions. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 10, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a multitasking prediction application program.
In particular, the processor 1001 may be configured to invoke a multitasking prediction application program stored in the memory 1005 and specifically perform the following operations:
determining the parameter weight of the target feature in the attention module of the mth submodel; wherein M is a positive integer less than M;
determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the (m) th sub-model; the target characteristics comprise target user characteristics and target product characteristics;
and obtaining a prediction result of the task corresponding to the m +1 th sub-model based on the target characteristics and the parameters in the attention module of the m +1 th sub-model.
Embodiments of the present specification also provide a computer-readable storage medium having stored therein instructions, which when executed on a computer or processor, cause the computer or processor to perform one or more of the steps in the embodiments of fig. 3 or 6 described above. The above-mentioned respective constituent modules of the electronic device may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the present specification are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and can include the processes of the embodiments of the methods described above when the program is executed. And the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks. The technical features in the present examples and embodiments may be arbitrarily combined without conflict.
The above-mentioned embodiments are only described as preferred embodiments of the present disclosure, and do not limit the scope of the present disclosure, and various modifications and improvements of the technical solution of the present disclosure made by those skilled in the art without departing from the design spirit of the present disclosure should fall within the protection scope defined by the claims of the present disclosure.

Claims (13)

1. A multi-task model training method, wherein the multi-task model comprises M sub-models, each sub-model corresponds to a task, each sub-model comprises an attention module, and the method comprises the following steps:
determining the parameter weight of the sample feature in the attention module of the mth sub-model; wherein M is a positive integer less than M;
determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the (m) th sub-model; the sample characteristics comprise sample user characteristics, sample product characteristics and sample results of tasks corresponding to the (m + 1) th sub-model by the user;
training the multitask model based on the sample features, the parameter weights in the attention module of the mth sub-model, and the parameters in the attention module of the m +1 th sub-model.
2. The method of claim 1, the mth submodel further comprising an embedding module, a combining module, and a first transformation module;
the determining the parameter weight of the sample feature in the attention module of the mth sub-model comprises:
inputting the sample user characteristics and the sample product characteristics to the embedding module to obtain characteristic vectors respectively corresponding to the sample user characteristics and the sample product characteristics;
inputting the feature vectors respectively corresponding to the sample user features and the sample product features into the combination module to obtain a combined feature vector;
inputting the combined feature vector to the first conversion module to obtain first conversion information;
and inputting the first conversion information into an attention module of the mth sub-model to obtain the parameter weight of the sample feature in the mth sub-model.
3. The method of claim 1, wherein a full connection module is disposed between the attention module of the mth sub-model and the attention module of the m +1 th sub-model;
the determining the parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the (m) th sub-model comprises:
inputting the parameter weight in the attention module of the mth sub-model to the full-connection module to obtain connection information containing the parameter weight in the attention module of the mth sub-model;
and inputting the connection information into the attention module of the (m + 1) th sub-model to obtain parameters in the attention module of the (m + 1) th sub-model.
4. The method of claim 2, the m +1 th sub-model further comprising the embedding module, the combining module, and a second transformation module;
the training the multitask model based on the sample features, the parameter weight in the attention module of the mth sub-model and the parameter in the attention module of the m +1 th sub-model comprises:
inputting the sample user characteristics and the sample product characteristics to the embedding module to obtain characteristic vectors respectively corresponding to the sample user characteristics and the sample product characteristics;
inputting the feature vectors respectively corresponding to the sample user features and the sample product features into the combination module to obtain the combination feature vector;
inputting the combined feature vector to the second conversion module to obtain second conversion information;
inputting the second conversion information into an attention module of the (m + 1) th sub-model, and obtaining a prediction result of a task corresponding to the (m + 1) th sub-model according to the second conversion information and parameters in the attention module of the (m + 1) th sub-model;
and training the multi-task model according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample result of the task corresponding to the (m + 1) th sub-model by the user.
5. The method according to claim 4, after obtaining the predicted result of the task corresponding to the (m + 1) th sub-model according to the second conversion information and the parameter of the (m + 1) th sub-model, the method further comprises:
obtaining a first loss function according to the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
optimizing the m +1 th sub-model based on the first loss function.
6. The method according to claim 5, wherein after obtaining the first loss function according to the predicted result of the task corresponding to the (m + 1) th sub-model and the sample feature, and before optimizing the (m + 1) th sub-model based on the first loss function, the method further comprises:
judging whether the prediction result of the task corresponding to the (m + 1) th sub-model meets a preset condition or not;
when the prediction result of the task corresponding to the (m + 1) th sub-model is determined not to meet the preset condition, obtaining the prediction result of the task corresponding to the m-th sub-model according to the first conversion information and the parameters in the attention module of the m-th sub-model;
obtaining a second loss function according to the prediction result of the task corresponding to the mth sub-model, the prediction result of the task corresponding to the (m + 1) th sub-model and the sample characteristics;
the optimizing the m +1 th sub-model based on the first loss function includes:
optimizing the m +1 th sub-model based on the first loss function and the second loss function.
7. The method according to claim 6, after obtaining the predicted result of the task corresponding to the mth sub-model according to the first conversion information and the weight parameter of the mth sub-model, the method further comprising:
obtaining a third loss function according to the prediction result of the task corresponding to the mth sub-model and the sample characteristics;
optimizing the mth sub-model based on the third loss function.
8. A multitask prediction method is applied to a multitask model, the multitask model comprises M submodels, each submodel corresponds to a task, each submodel comprises an attention module, and the method comprises the following steps:
determining the parameter weight of the target feature in the attention module of the mth submodel; wherein M is a positive integer less than M;
determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the (m) th sub-model; wherein the target characteristics comprise target user characteristics and target product characteristics;
and obtaining a prediction result of the task corresponding to the (m + 1) th sub-model based on the target feature and the parameters in the attention module of the (m + 1) th sub-model.
9. A multitask model training device, the multitask model includes M submodels, each submodel corresponds to a task, each submodel includes an attention module, the multitask model training device includes:
a first processing module for determining a parametric weight of the sample feature in the attention module of the mth sub-model; wherein M is a positive integer less than M;
the second processing module is used for determining parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the mth sub-model; the sample characteristics comprise sample user characteristics, sample product characteristics and sample results of tasks corresponding to the (m + 1) th sub-model by a user;
a training module for training the multi-tasking model based on the sample features, the parameter weights in the attention module of the mth sub-model, and the parameters in the attention module of the m +1 th sub-model.
10. A multitask predicting apparatus, said apparatus being applied to a multitask model, said multitask model including M submodels, each of said submodels corresponding to a respective task, each of said submodels including a respective attention module, said apparatus comprising:
a third processing module for determining a parameter weight of the target feature in the attention module of the mth submodel; wherein M is a positive integer less than M;
a fourth processing module, configured to determine parameters in the attention module of the (m + 1) th sub-model according to the parameter weights in the attention module of the mth sub-model; wherein the target characteristics comprise target user characteristics and target product characteristics;
and the prediction module is used for obtaining a prediction result of the task corresponding to the (m + 1) th sub-model based on the target characteristics and the parameters in the attention module of the (m + 1) th sub-model.
11. A multitask model training device comprises a processor and a memory;
the processor is connected with the memory;
the memory for storing executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for performing the method of any one of claims 1-7.
12. A multitasking predicting device comprising a processor and a memory;
the processor is connected with the memory;
the memory for storing executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for performing the method of claim 8.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202210552195.5A 2022-05-20 2022-05-20 Multitask model training method, multitask prediction method, related device and medium Pending CN115049108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210552195.5A CN115049108A (en) 2022-05-20 2022-05-20 Multitask model training method, multitask prediction method, related device and medium

Publications (1)

Publication Number Publication Date
CN115049108A true CN115049108A (en) 2022-09-13

Family

ID=83159442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210552195.5A Pending CN115049108A (en) 2022-05-20 2022-05-20 Multitask model training method, multitask prediction method, related device and medium

Country Status (1)

Country Link
CN (1) CN115049108A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130275A1 (en) * 2017-10-26 2019-05-02 Magic Leap, Inc. Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks
WO2019085793A1 (en) * 2017-11-01 2019-05-09 腾讯科技(深圳)有限公司 Image classification method, computer device and computer readable storage medium
US20200356724A1 (en) * 2019-05-06 2020-11-12 University Of Electronic Science And Technology Of China Multi-hop attention and depth model, method, storage medium and terminal for classification of target sentiments
CN110533180A (en) * 2019-07-15 2019-12-03 北京地平线机器人技术研发有限公司 Network structure search method and device, readable storage medium, and electronic device
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN112837106A (en) * 2019-11-22 2021-05-25 上海哔哩哔哩科技有限公司 Commodity recommendation method and device and computer equipment
CN111339415A (en) * 2020-02-25 2020-06-26 中国科学技术大学 Click rate prediction method and device based on multi-interactive attention network
CN112508265A (en) * 2020-12-02 2021-03-16 中国极地研究中心 Time and activity multi-task prediction method and system for business process management
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
CN112949842A (en) * 2021-05-13 2021-06-11 北京市商汤科技开发有限公司 Neural network structure searching method, apparatus, computer device and storage medium
CN113569732A (en) * 2021-07-27 2021-10-29 厦门理工学院 Face attribute recognition method and system based on parallel sharing multitask network
CN113706347A (en) * 2021-08-31 2021-11-26 深圳壹账通智能科技有限公司 Multitask model distillation method, multitask model distillation system, multitask model distillation medium and electronic terminal

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910373A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium
CN116910373B (en) * 2023-09-12 2023-12-12 深圳须弥云图空间科技有限公司 House source recommendation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108475505A (en) Using partial condition target sequence is generated from list entries
CN110647696B (en) Business object sorting method and device
CN112270547A (en) Financial risk assessment method and device based on feature construction and electronic equipment
CN108431832A (en) Neural network is expanded using external memory
CN112270546A (en) Risk prediction method and device based on stacking algorithm and electronic equipment
CN114117216A (en) Recommendation probability prediction method and device, computer storage medium and electronic equipment
CN112541124A (en) Method, apparatus, device, medium and program product for generating a multitask model
CN111401940A (en) Feature prediction method, feature prediction device, electronic device, and storage medium
CN107437111A (en) Data processing method, medium, device and computing device based on neutral net
CN109993638A (en) Method, apparatus, medium and the electronic equipment of Products Show
CN114417174B (en) Content recommendation method, device, equipment and computer storage medium
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
CN115049108A (en) Multitask model training method, multitask prediction method, related device and medium
CN115965463A (en) Model training method and device, computer equipment and storage medium
CN115130894A (en) Production planning method and device based on artificial intelligence, computer equipment and medium
CN110009154B (en) Refund prediction method and device, terminal equipment and storage medium
CN113344647B (en) Information recommendation method and device
CN113869596A (en) Task prediction processing method, device, product and medium
CN113947439A (en) Demand prediction model training method and device and demand prediction method and device
CN114511152A (en) Training method and device of prediction model
CN112785390B (en) Recommendation processing method, device, terminal equipment and storage medium
CN111325614B (en) Recommendation method and device of electronic object and electronic equipment
CN111026973B (en) Commodity interest degree prediction method and device and electronic equipment
CN111368195A (en) Model training method, device, equipment and storage medium
CN114742645B (en) User security level identification method and device based on multi-stage time sequence multitask

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination