CN115081630A - Training method of multi-task model, information recommendation method, device and equipment - Google Patents


Info

Publication number
CN115081630A
Authority
CN
China
Prior art keywords: sub, loss, model, value, values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211015938.1A
Other languages
Chinese (zh)
Inventor
王震
张文慧
吴志华
于佃海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211015938.1A priority Critical patent/CN115081630A/en
Publication of CN115081630A publication Critical patent/CN115081630A/en
Priority to PCT/CN2023/074122 priority patent/WO2024040869A1/en
Pending legal-status Critical Current

Classifications

    • G06N3/08 Learning methods (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks)
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06F16/9535 Search customisation based on user profiles and personalisation (G06F ELECTRIC DIGITAL DATA PROCESSING › G06F16/00 Information retrieval › G06F16/95 Retrieval from the web › G06F16/953 Querying, e.g. by the use of web search engines)

Abstract

The disclosure provides a multi-task model training method, relates to the technical field of artificial intelligence, and particularly relates to the technical fields of deep learning, cloud computing, multi-task parallel processing and data searching. The specific implementation scheme is as follows: inputting sample behavior data of a sample object into a sharing sub-model to obtain behavior characteristic information of the sample object, wherein the behavior characteristic information comprises a plurality of behavior characteristic sub-information; obtaining a sub-loss value of the task processing sub-model according to the behavior characteristic sub-information related to the task processing sub-model; determining a target gradient value corresponding to the sub-loss value according to the sub-loss value of the task processing sub-model; determining a weight value corresponding to the sub-loss value according to the target gradient values; and training the multitask model according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values. The disclosure also provides an information recommendation method, an information recommendation device, electronic equipment and a storage medium.

Description

Training method of multi-task model, information recommendation method, device and equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to the field of deep learning, cloud computing, multitask parallel processing, data search, and the like. More specifically, the present disclosure provides a training method of a multitask model, an information recommendation method, an apparatus, an electronic device and a storage medium.
Background
With the development of artificial intelligence techniques, deep learning models can be used to handle multiple different tasks simultaneously. In the training process, in order to enable the parameters learned by the deep learning model to have certain applicability to each task, the learning speed of each task can be adjusted.
Disclosure of Invention
The disclosure provides a training method of a multitask model, an information recommendation method, a device, equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for training a multitask model, the multitask model including a plurality of task processing submodels and a sharing submodel, the method including: inputting sample behavior data of a sample object into a sharing sub-model to obtain behavior characteristic information of the sample object, wherein the behavior characteristic information comprises a plurality of behavior characteristic sub-information; obtaining a sub-loss value of the task processing sub-model according to the behavior characteristic sub-information related to the task processing sub-model; determining a target gradient value corresponding to the sub-loss value according to the sub-loss value of the task processing sub-model; determining a weight value corresponding to the sub-loss value according to the target gradient values; and training the multitask model according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values.
According to another aspect of the present disclosure, there is provided an information recommendation method including: inputting target behavior data of a target object into a multi-task model to obtain a plurality of output results; and recommending target information to the target object according to the plurality of output results, wherein the multitask model is trained according to the method provided by the disclosure.
According to another aspect of the present disclosure, there is provided a training apparatus of a multitask model, the multitask model including a plurality of task processing sub-models and a sharing sub-model, the apparatus including: a first obtaining module, configured to input sample behavior data of a sample object into the sharing sub-model to obtain behavior feature information of the sample object, wherein the behavior feature information comprises a plurality of behavior feature sub-information; a second obtaining module, configured to obtain a sub-loss value of a task processing sub-model according to the behavior feature sub-information related to the task processing sub-model; a first determining module, configured to determine a target gradient value corresponding to the sub-loss value according to the sub-loss value of the task processing sub-model; a second determining module, configured to determine a weight value corresponding to the sub-loss value according to a plurality of target gradient values; and a training module, configured to train the multitask model according to a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values.
According to another aspect of the present disclosure, there is provided an information recommendation apparatus including: the third obtaining module is used for inputting the target behavior data of the target object into the multitask model to obtain a plurality of output results; and the recommending module is used for recommending target information to the target object according to the output results, wherein the multitask model is trained according to the device provided by the disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of a method of training a multitasking model according to one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of obtaining sub-loss values for a task processing sub-model according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of determining target gradient values corresponding to sub-loss values, according to one embodiment of the present disclosure;
FIG. 4 is a flow diagram of a method of training a multitask model according to one embodiment of the present disclosure;
FIG. 5 is a flow diagram of an information recommendation method according to another embodiment of the present disclosure;
FIG. 6 is a block diagram of a training apparatus for a multitask model according to one embodiment of the present disclosure;
FIG. 7 is a block diagram of an information recommendation device according to another embodiment of the present disclosure; and
FIG. 8 is a block diagram of an electronic device to which a training method and/or an information recommendation method of a multitask model may be applied, according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
A deep learning framework may provide developers with relevant model interfaces. It may also provide auxiliary tools for analyzing scene problems from the developer's perspective, so as to improve development efficiency and the development experience. Deep learning frameworks include, for example, PaddlePaddle and TensorFlow. Among these frameworks, the PaddlePaddle framework, for example, may support both deep learning models and machine learning models.
For deep learning models, the learning form may include supervised learning and unsupervised learning, among others. Supervised learning may include, for example, multi-task learning. Learning that optimizes more than one objective function may be referred to as multi-task learning; its counterpart is single-task learning, which optimizes a single objective function. In order to implement multi-task learning, it is necessary to process the data of multiple tasks and to balance the learning speed among the tasks.
The multi-task learning has wide application range and can be applied to the technical fields of natural language processing, computer vision, information recommendation and the like. In some embodiments, the behavioral data of an object may relate to multiple tasks, exemplified by the application of multi-task learning in the field of information recommendation. For example, for short videos, determining a video playing completion rate, determining a click rate, determining an attention rate, determining a like rate, and the like may be respectively taken as one task.
In some embodiments, multi-task learning may be implemented by a multi-task model. In the process of training the multi-task model, a certain balance can be kept among different tasks, so that the parameters learned by the model have certain applicability to each task. For example, to balance the learning speed of different tasks, a weight may be set for each task. However, during training, the gradient changes of the different sub-models performing the tasks are difficult to represent with a constant. Even for those sub-models whose gradient changes can be represented by constants, multiple rounds of experiments are required to tune a single sub-model. The time cost of training the multi-task model is therefore high.
FIG. 1 is a flow diagram of a method of training a multitask model according to one embodiment of the present disclosure.
As shown in fig. 1, the method 100 may include operations S110 to S150.
In the disclosed embodiments, the multitasking model includes a plurality of task processing submodels and a sharing submodel. For example, each task processing sub-model is used to process one task. As another example, the shared submodel may be used to extract features of the sample data.
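As an illustrative sketch only (not the patent's implementation), the shared-sub-model-plus-task-heads structure can be expressed in NumPy. All layer sizes, the tanh activation, and the function names here are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: behavior data of width 8, shared features of width 4,
# and n = 3 task processing sub-models, each producing one scalar per sample.
W_shared = rng.standard_normal((8, 4))
W_heads = [rng.standard_normal((4, 1)) for _ in range(3)]

def shared_forward(x):
    # shared sub-model: extracts behavior feature information from sample data
    return np.tanh(x @ W_shared)

def task_forward(h, i):
    # i-th task processing sub-model: handles one task using shared features
    return h @ W_heads[i]

x = rng.standard_normal((5, 8))          # batch of 5 sample behavior vectors
features = shared_forward(x)
outputs = [task_forward(features, i) for i in range(3)]
```

Each task head reads the same shared features, which is what lets the shared parameters be pulled toward all tasks at once.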
In operation S110, sample behavior data of the sample object is input into the shared sub-model, and behavior feature information of the sample object is obtained.
In the disclosed embodiments, the sample behavior data may be behavior data of the sample object within one sample period. For example, during a sample period, historical behavior performed by a sample object on one or more pieces of historical information may be collected. In one example, the historical information may be, for example, a short video that the sample object has viewed, and the historical behaviors may include, for example, clicking, focusing on, like, commenting, and the like.
In the embodiment of the present disclosure, the behavior feature information includes a plurality of behavior feature sub information.
For example, one behavior feature sub-information may correspond to one behavior.
In operation S120, a sub loss value of the task processing sub-model is obtained according to the behavior feature sub-information related to the task processing sub-model.
In the embodiment of the present disclosure, the behavior characteristic sub-information may be processed by using the task processing sub-model to obtain an output result. And determining a sub-loss value of the task processing sub-model according to the output result.
For example, the output result may indicate a probability that the sample object performed an action on the sample information. In one example, the sample information may be a short video that has not been viewed by the sample object.
For example, the sub-loss value of the task processing sub-model may be determined by using various loss functions according to the output result based on the supervised learning, the semi-supervised learning, or the unsupervised learning.
In operation S130, a target gradient value corresponding to the sub-loss value is determined according to the sub-loss value of the task processing sub-model.
In the disclosed embodiment, the target gradient value may be one gradient value of the shared submodel.
For example, the task processing sub-model may include at least one task processing layer. As another example, the shared sub-model may include at least one shared layer.
For example, at least one gradient value of the task processing submodel may be determined according to the sub-penalty value of the task processing submodel, and at least one gradient value of the sharing submodel may also be determined. A target gradient value corresponding to the sub-loss value may be determined according to at least one gradient value of the shared sub-model.
For example, any one of at least one gradient value of the shared submodel may be taken as the target gradient value.
In operation S140, a weight value corresponding to the sub-loss value is determined according to the plurality of target gradient values.
In the embodiments of the present disclosure, from a plurality of target gradient values and target gradient values corresponding to sub-loss values, a weight value corresponding to a sub-loss value may be determined.
For example, various calculations may be performed based on a plurality of target gradient values to obtain a gradient value calculation result. The various operations may include, for example, summing, multiplying, and the like. For example, a weight value corresponding to a sub-loss value can be obtained by performing various calculations based on the calculation result of the gradient value and the target gradient value corresponding to the sub-loss value.
In operation S150, a multitask model is trained according to a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values.
In the embodiment of the present disclosure, the loss value may be obtained according to a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values. Based on the loss values, a multitask model may be trained.
For example, the loss value may be obtained by performing various calculations based on a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values. In one example, the various operations may include weighted summation, weighted averaging, and the like.
For example, based on a gradient descent algorithm or a back propagation algorithm, the multi-tasking model may be trained based on the loss values.
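The weighted combination in operation S150 can be checked numerically. The sub-loss and weight values below are made up, and weighted summation and weighted averaging are just the two variants mentioned above:

```python
import numpy as np

sub_losses = np.array([0.8, 0.5, 1.2])   # hypothetical per-task sub-loss values
weights = np.array([0.5, 1.0, 0.3])      # hypothetical corresponding weight values

# weighted summation -> final loss value
total_loss = float(np.sum(weights * sub_losses))

# weighted average as an alternative combination
avg_loss = float(np.sum(weights * sub_losses) / np.sum(weights))
```

The final loss value is then what a gradient descent or back propagation step would differentiate through.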
According to the embodiment of the disclosure, because the final loss value is determined according to the target gradient values, the difference in learning speed among tasks is taken into account, the learning speeds of different tasks can be effectively balanced, and the parameters learned by the multi-task model have certain applicability to different tasks. A multi-task model obtained in this way has stronger multi-task parallel processing capacity, improves data processing efficiency under preset or limited hardware resources, and can recommend more accurate information according to an object's behavior data.
It is to be understood that the training method of the multitask model of the present disclosure has been described in detail above. The manner of obtaining the sub-loss value of the task processing sub-model will be described in detail below with reference to the related embodiment and FIG. 2.

FIG. 2 is a schematic diagram of obtaining a sub-loss value for a task processing sub-model according to one embodiment of the present disclosure.
As shown in FIG. 2, the multitask model may include a sharing sub-model 210 and n task processing sub-models, where n is an integer greater than 1. The n task processing sub-models may include, for example, a 1st task processing sub-model 221, a 2nd task processing sub-model 222, …, and an nth task processing sub-model 223. In one example, n may be 3.
In the embodiment of the present disclosure, the sample behavior data of the sample object may be input into the shared sub-model to obtain the behavior feature information of the sample object. The behavior feature information may include n pieces of behavior feature sub information.
For example, the shared sub-model 210 may include multiple shared layers: a 1st shared layer, a 2nd shared layer, …, and an mth shared layer, where m is an integer greater than 1. The 1st shared layer may process the sample behavior data to obtain the 1st initial behavior feature information. The 2nd shared layer may process the 1st initial behavior feature information to obtain the 2nd initial behavior feature information. Continuing in this way, the mth shared layer may process the (m-1)th initial behavior feature information to obtain the mth initial behavior feature information, which may be taken as the behavior feature information of the sample object.
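The chain of m shared layers can be sketched as follows. The layer widths and the tanh nonlinearity are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3                                     # number of shared layers (hypothetical)
widths = [8, 6, 6, 4]                     # made-up layer widths
shared_layers = [rng.standard_normal((widths[k], widths[k + 1]))
                 for k in range(m)]

def shared_forward(x):
    # layer k processes the (k-1)th initial behavior feature information;
    # the mth layer's output serves as the behavior feature information
    h = x
    for W in shared_layers:
        h = np.tanh(h @ W)
    return h

features = shared_forward(rng.standard_normal((5, 8)))
```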
For example, the n pieces of behavior feature sub-information may include the 1st behavior feature sub-information, the 2nd behavior feature sub-information, …, and the nth behavior feature sub-information.
In the embodiment of the present disclosure, the sub-loss value of the task processing sub-model may be obtained according to the behavior feature sub-information related to the task processing sub-model.
In the embodiment of the present disclosure, the behavior characteristic sub-information related to the task processing sub-model may be input into the task processing sub-model to obtain an output result of the task processing sub-model. According to the output result, the sub-loss value of the task processing sub-model can be obtained.
For example, the 1st behavior feature sub-information may be input into the 1st task processing sub-model 221, resulting in the 1st output result 2211. The 1st output result 2211 may indicate a predicted probability that the sample object performs the 1st behavior. In one example, from the 1st output result 2211 and the 1st sub-label of the sample behavior data, the 1st sub-loss value 2212 may be determined using various loss functions. The 1st sub-label may indicate the true probability that the sample object performs the 1st behavior. The various loss functions may include, for example, a Cross Entropy (CE) loss function, an L1 loss function, and so forth.
For example, the 2nd behavior feature sub-information may be input into the 2nd task processing sub-model 222, resulting in the 2nd output result 2221. The 2nd output result 2221 may indicate the predicted probability that the sample object performs the 2nd behavior. In one example, from the 2nd output result 2221 and the 2nd sub-label of the sample behavior data, the 2nd sub-loss value 2222 may be determined using various loss functions. The 2nd sub-label may indicate the true probability that the sample object performs the 2nd behavior.
For example, the nth behavior feature sub-information may be input into the nth task processing sub-model 223 to obtain the nth output result 2231. The nth output result 2231 may indicate a predicted probability that the sample object performs the nth behavior. In one example, the nth sub-loss value 2232 is determined using various loss functions based on the nth output result 2231 and the nth sub-label of the sample behavior data. The nth sub-label may indicate the true probability that the sample object performs the nth behavior.
In the embodiment of the present disclosure, for one piece of sample information and the sample behavior data 201, manual labeling may be performed to obtain a label for the sample information. The label may include n sub-labels: for example, the 1st sub-label, the 2nd sub-label, …, and the nth sub-label described above.
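As a worked example of the per-task sub-loss computation just described, here is a binary cross-entropy loss applied to synthetic predicted and true probabilities. None of these numbers come from the patent; they are stand-ins for n = 3 task heads:

```python
import numpy as np

def cross_entropy(p, y, eps=1e-7):
    # binary cross-entropy between predicted probability p and true probability y
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# synthetic output results of 3 task heads and the matching sub-labels
preds = [np.array([0.9, 0.2]), np.array([0.6, 0.4]), np.array([0.1, 0.8])]
labels = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

sub_losses = [cross_entropy(p, y) for p, y in zip(preds, labels)]
```

An L1 loss could be substituted for `cross_entropy` without changing the rest of the pipeline.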
It will be appreciated that some embodiments of obtaining sub-loss values for a task processing sub-model are described in detail above. A manner of determining the target gradient value corresponding to the sub-loss value will be described in detail below with reference to the related embodiment and FIG. 3.
Fig. 3 is a schematic diagram of determining target gradient values corresponding to sub-loss values according to one embodiment of the present disclosure.
As shown in FIG. 3, the multitask model may include a sharing sub-model 310 and n task processing sub-models. The n task processing sub-models may include, for example, a 1st task processing sub-model 321, a 2nd task processing sub-model 322, …, and an nth task processing sub-model 323. In one example, n may be 3.
The sharing sub-model 310 may also include multiple shared layers: a 1st shared layer, a 2nd shared layer, …, and an mth shared layer, where m is an integer greater than 1. As described above, the 1st shared layer may process the sample behavior data to obtain the 1st initial behavior feature information, the 2nd shared layer may process the 1st initial behavior feature information to obtain the 2nd initial behavior feature information, and so on until the mth shared layer processes the (m-1)th initial behavior feature information to obtain the mth initial behavior feature information, which may be taken as the behavior feature information of the sample object.
In the disclosed embodiment, after determining the sub-loss value of a task processing sub-model, a target gradient value corresponding to that sub-loss value may be determined by various means (e.g., back propagation). It can be understood that the above detailed descriptions of the 1st sub-loss value 2212, the 2nd sub-loss value 2222, …, and the nth sub-loss value 2232 also apply to the present embodiment and are not repeated here.
For example, based on the back propagation algorithm, a gradient value of the 1st task processing sub-model 321 may be determined from the 1st sub-loss value 3212 and the parameters of the 1st task processing sub-model 321. Next, according to the parameters of the shared sub-model 310, m first gradient values of the shared sub-model 310 may be determined. Of the m first gradient values, the first gradient value of the 1st shared layer may be taken as the 1st target gradient value 3213. It is to be understood that, in determining the first gradient value of the 1st shared layer, the relevant parameters of the 1st shared layer may be used.
For example, based on the back propagation algorithm, a gradient value of the 2nd task processing sub-model 322 may be determined from the 2nd sub-loss value 3222 and the parameters of the 2nd task processing sub-model 322. Next, according to the parameters of the shared sub-model 310, m second gradient values of the shared sub-model 310 may be determined. Of the m second gradient values, the second gradient value of the 1st shared layer may be taken as the 2nd target gradient value 3223. It is to be understood that, in determining the second gradient value of the 1st shared layer, the relevant parameters of the 1st shared layer may be used.
For example, based on the back propagation algorithm, a gradient value of the nth task processing sub-model 323 may be determined from the nth sub-loss value 3232 and the parameters of the nth task processing sub-model 323. Next, according to the parameters of the shared sub-model 310, m nth gradient values of the shared sub-model 310 may be determined. Of the m nth gradient values, the nth gradient value of the 1st shared layer may be taken as the nth target gradient value 3233. It is to be understood that, in determining the nth gradient value of the 1st shared layer, the relevant parameters of the 1st shared layer may be used.
It is understood that, during forward propagation, the 1st shared layer processes the sample behavior data; in the back propagation process, the 1st shared layer may be the last shared layer of the plurality of shared layers.
In other embodiments, any one of the m first gradient values may also be used as the 1st target gradient value, any one of the m second gradient values may also be used as the 2nd target gradient value, …, and any one of the m nth gradient values may also be used as the nth target gradient value.
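To make the target-gradient step concrete, here is a toy model with a single scalar shared parameter, where each sub-loss's gradient with respect to that shared parameter can be written analytically. The model, the squared-error sub-loss, and the absolute-value reduction to a scalar are all assumptions made for the illustration:

```python
import numpy as np

w_shared = 0.5                            # single shared parameter (toy)
w_heads = np.array([1.0, 2.0, 3.0])       # one scalar parameter per task head
x = 1.0                                   # one sample behavior value
y = np.array([0.2, 0.4, 0.6])             # sub-labels

def target_grad(i):
    # d(sub_loss_i)/d(w_shared) for sub_loss_i = (w_heads[i]*w_shared*x - y[i])^2,
    # reduced to a scalar target gradient value via the absolute value
    pred = w_heads[i] * w_shared * x
    return abs(2.0 * (pred - y[i]) * w_heads[i] * x)

target_grads = [target_grad(i) for i in range(3)]
```

In a real framework this derivative would come from back propagation rather than a hand-written formula; the point is that each sub-loss yields its own gradient value at the same shared parameter.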
It will be appreciated that some embodiments of determining target gradient values corresponding to sub-loss values are described in detail above. The determination of the weight values corresponding to the sub-loss values will be described in detail below with reference to the related embodiment and FIG. 4.
FIG. 4 is a flow diagram of a method of training a multitasking model according to one embodiment of the present disclosure.
It is understood that the operations S110 and S120 described above may also be applied to the present embodiment.
In the embodiment of the present disclosure, the sample behavior data of the sample object may be input into the shared sub-model, so as to obtain the behavior feature information of the sample object. For example, the behavior feature information includes a plurality of behavior feature sub information.
In the embodiment of the disclosure, the sub-loss value of each task processing sub-model is obtained according to the behavior feature sub-information related to that sub-model. For example, the 1st sub-loss value loss_1 may be obtained; it may be the sub-loss value of the 1st task processing sub-model. Similarly, the 2nd sub-loss value loss_2 may be the sub-loss value of the 2nd task processing sub-model, and the nth sub-loss value loss_n may be the sub-loss value of the nth task processing sub-model. It can be understood that the above-mentioned manner of obtaining the 1st sub-loss value 2212, the 2nd sub-loss value 2222, …, and the nth sub-loss value 2232 also applies to the present embodiment and is not repeated here.
Next, operation S430, operation S440, and operation S451 may be performed.
In operation S430, a target gradient value corresponding to the sub-loss value is determined.
For example, the 1 st target gradient value grad _1 corresponding to the 1 st sub-loss value loss _1 may be determined according to the 1 st sub-loss value loss _ 1. For another example, the 2 nd target gradient value grad _2 corresponding to the 2 nd sub loss value loss _2 may be determined according to the 2 nd sub loss value loss _ 2.… …, for example, an nth target gradient value grad _ n corresponding to the nth sub-loss value loss _ n may be determined according to the nth sub-loss value loss _ n.
It is to be understood that the related parameter para _ last _ shared _ layer of the 1 st shared layer of the shared sub-model may be used in determining the 1 st target gradient value grad _1, the 2 nd target gradient value grad _2, … …, and the nth target gradient value grad _ n.
It is understood that the above-mentioned manners of determining the 1st target gradient value 3213, the 2nd target gradient value 3223, ..., and the nth target gradient value 3233 may also be applied to this embodiment, and the details are not repeated herein.
In operation S440, a weight value corresponding to the sub-loss value is determined.
In the embodiment of the present disclosure, the weight value corresponding to the sub-loss value may be determined according to a plurality of target gradient values.
For example, a processing parameter value may be determined based on the plurality of target gradient values.
For example, the processing parameter value may be determined from the 1st target gradient value grad_1, the 2nd target gradient value grad_2, ..., and the nth target gradient value grad_n.
For example, the weight value corresponding to the sub-loss value may then be determined from the processing parameter value and the target gradient value corresponding to the sub-loss value.
For example, the target gradient value corresponding to the sub-loss value is processed to obtain a processed target gradient value. According to the processing parameter value, normalization processing may be performed on the processed target gradient value to obtain a normalized gradient value. The weight value corresponding to the sub-loss value may be determined from the reciprocal of the normalized gradient value. During training, a task with a large gradient thus receives a small weight value, so its proportion in the final loss value is reduced, which effectively balances the learning speed between different tasks.
For example, the ith weight value w_i corresponding to the ith sub-loss value loss_i may be determined by the following formula:
w_i = ( Σ_{j=1}^{n} exp(grad_j) ) / exp(grad_i)
(formula one)
Here i may be an integer greater than or equal to 1 and less than or equal to n. Σ_{j=1}^{n} exp(grad_j) may be the processing parameter value, and exp(grad_i) may be the ith processed target gradient value.
It is understood that according to formula one, the 1st weight value w_1, the 2nd weight value w_2, ..., and the nth weight value w_n may be determined.
In one example, the 1st target gradient value grad_1 is processed to obtain the 1st processed target gradient value exp(grad_1). According to the processing parameter value Σ_{j=1}^{n} exp(grad_j), the 1st processed target gradient value exp(grad_1) is normalized to obtain the 1st normalized gradient value exp(grad_1) / Σ_{j=1}^{n} exp(grad_j). The reciprocal of the 1st normalized gradient value may be taken as the 1st weight value w_1.
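For illustration, the weight computation described above (exponentiate each target gradient value, normalize by the processing parameter value, take the reciprocal) can be sketched in Python. The function name is illustrative, and grads stands for the per-task target gradient values:

```python
import math

# Sketch of formula one: w_i = (sum_j exp(grad_j)) / exp(grad_i).
def task_weights(grads):
    processed = [math.exp(g) for g in grads]   # processed target gradient values
    z = sum(processed)                         # processing parameter value
    normalized = [p / z for p in processed]    # normalized gradient values
    return [1.0 / x for x in normalized]       # weight = reciprocal of normalized value
```

A task with a larger gradient receives a smaller weight, consistent with the balancing behavior described above.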
In operation S451, a loss value is obtained.
In the embodiment of the present disclosure, the loss value may be obtained according to a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values.
For example, the loss value may be obtained by performing weighted summation based on a plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values.
For example, the Loss value Loss may be obtained by performing weighted summation according to the 1st sub-loss value loss_1, the 2nd sub-loss value loss_2, ..., the nth sub-loss value loss_n and the 1st weight value w_1, the 2nd weight value w_2, ..., the nth weight value w_n.
In one example, the Loss value Loss may be obtained by the following equation:
Loss = Σ_{i=1}^{n} w_i × loss_i
(formula two)
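The weighted summation of formula two can be sketched as follows (the function name is illustrative):

```python
# Sketch of formula two: Loss = sum_i w_i * loss_i, the weighted summation of
# the sub-loss values with their corresponding weight values.
def weighted_loss(sub_losses, weights):
    return sum(w * l for w, l in zip(weights, sub_losses))
```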
In embodiments of the present disclosure, the multitask model may be trained based on the loss value.
For example, parameters of the plurality of task processing sub-models and the shared sub-model may be adjusted according to the loss value, respectively, to train the multitask model. In one example, based on a back propagation algorithm, the parameters of the plurality of task processing sub-models and the shared sub-model may be adjusted according to the loss value.
Fig. 5 is a flowchart of an information recommendation method according to another embodiment of the present disclosure.
As shown in fig. 5, the method 500 may include operations S510 to S520.
In operation S510, target behavior data of the target object is input into the multitask model, and a plurality of output results are obtained.
In embodiments of the present disclosure, a multitask model may be trained in accordance with the methods provided by the present disclosure. For example, a multitask model may be trained according to method 100.
In embodiments of the present disclosure, each output result may indicate a probability that the target object performs an action on one piece of candidate information.
For example, the candidate information may be information such as short video, image, or text. In one example, one output result may indicate a probability that the target object performed a "click" action on the candidate information. In one example, another output may indicate a probability that the target object performed a "comment" action on the candidate information.
In operation S520, target information is recommended to the target object according to the plurality of output results.
In the embodiment of the present disclosure, the recommendation parameter of each piece of candidate information is determined according to the plurality of output results. Target information is determined from the candidate information according to the recommendation parameters of the candidate information, and the target information is recommended to the target object.
For example, the output results may be normalized to a value greater than 0 and less than 1. According to a plurality of normalized output results, various operations can be performed to obtain a recommendation parameter of the candidate information. In one example, the various operations may include, for example, averaging, summing, weighted summing, and so forth.
For example, the candidate information having the largest recommendation parameter may be recommended to the target object as the target information.
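The aggregation of output results into a recommendation parameter described above can be sketched as follows. Sigmoid normalization and simple averaging are illustrative assumptions: the disclosure only requires normalized values greater than 0 and less than 1, and permits averaging, summing, or weighted summing.

```python
import math

# Hypothetical aggregation of a candidate's output results into one
# recommendation parameter: sigmoid squashes each raw output into (0, 1),
# then the normalized results are averaged.
def recommendation_parameter(outputs):
    normalized = [1.0 / (1.0 + math.exp(-x)) for x in outputs]
    return sum(normalized) / len(normalized)
```

The candidate information with the largest resulting parameter would then be recommended as the target information.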
Through the embodiment of the disclosure, information can be accurately recommended to the target object, the information pushing efficiency is improved, and the user experience is improved.
It can be understood that in the related fields of information recommendation and the like, a large amount of candidate information can be recalled for a target object, and then target information can be determined from the recalled candidate information. Some embodiments of determining the target information from the candidate information are described in detail above, and some embodiments of recalling the candidate information are described below.
In some embodiments, candidate information may be determined from the plurality of initial information based on target behavior data for the target object.
For example, the target behavior data may be converted into a target vector. And calculating the similarity between the target vector and the feature vector of the initial information. In the case where the similarity is greater than the preset similarity threshold, the initial information corresponding to the similarity may be used as one candidate information.
For example, a plurality of candidate information may be determined.
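The recall step above can be sketched as follows. Cosine similarity and the 0.5 threshold are illustrative assumptions, since the disclosure does not fix the similarity measure or the preset similarity threshold:

```python
import math

# Similarity between the target vector and an initial item's feature vector.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Initial information whose similarity to the target vector exceeds the
# threshold becomes candidate information; initial_items is a list of
# (item_id, feature_vector) pairs (names are illustrative).
def recall_candidates(target_vec, initial_items, threshold=0.5):
    return [item_id for item_id, vec in initial_items
            if cosine(target_vec, vec) > threshold]
```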
It is to be understood that the application of the multitask model provided by the present disclosure in the field of information recommendation has been described in detail above, but the present disclosure is not limited thereto. The multitasking model provided by the present disclosure may also be applied to other fields (e.g., image processing, text processing, audio processing, etc.).
FIG. 6 is a block diagram of a training apparatus for a multitask model according to one embodiment of the present disclosure.
In the disclosed embodiments, the multitasking model may include a plurality of task processing submodels and a sharing submodel.
As shown in fig. 6, the apparatus 600 may include a first obtaining module 610, a second obtaining module 620, a first determining module 630, a second determining module 640, and a training module 650.
A first obtaining module 610, configured to input the sample behavior data of the sample object into the shared sub-model, so as to obtain behavior feature information of the sample object. For example, the behavior feature information includes a plurality of behavior feature sub information.
A second obtaining module 620, configured to obtain a sub-loss value of the task processing sub-model according to the behavior feature sub-information related to the task processing sub-model.
The first determining module 630 is configured to determine a target gradient value corresponding to a sub-loss value according to the sub-loss value of the task processing sub-model.
The second determining module 640 is configured to determine a weight value corresponding to the sub-loss value according to the plurality of target gradient values.
The training module 650 is configured to train the multitask model according to the plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values.
In some embodiments, the second determining module comprises: the first determining submodule is used for determining a processing parameter value according to the target gradient values; and a second determining submodule, configured to determine a weight value corresponding to the sub-loss value according to the processing parameter value and the target gradient value corresponding to the sub-loss value.
In some embodiments, the second determination submodule comprises: the processing unit is used for processing the target gradient value corresponding to the sub-loss value to obtain a processed target gradient value; the normalization processing unit is used for performing normalization processing on the processed target gradient value according to the processing parameter value to obtain a normalized gradient value; and a determining unit for determining a weight value corresponding to the sub-loss value according to an inverse of the normalized gradient value.
In some embodiments, the second obtaining module comprises: the first obtaining submodule is used for inputting the behavior characteristic sub-information related to the task processing submodel into the task processing submodel to obtain an output result of the task processing submodel; and the second obtaining submodule is used for obtaining the sub-loss value of the task processing submodel according to the output result.
In some embodiments, the training module comprises: a third obtaining submodule, configured to obtain a loss value according to the multiple sub-loss values and multiple weight values respectively corresponding to the multiple sub-loss values; and the training submodule is used for training the multi-task model according to the loss value.
In some embodiments, the third obtaining sub-module comprises: and the weighted summation unit is used for carrying out weighted summation according to the plurality of sub-loss values and a plurality of weight values respectively corresponding to the plurality of sub-loss values to obtain the loss value.
In some embodiments, the training submodule comprises: and the adjusting unit is used for respectively adjusting the parameters of the plurality of task processing submodels and the shared submodel according to the loss value so as to train the multitask model.
Fig. 7 is a block diagram of an information recommendation apparatus according to another embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 may include a third obtaining module 710 and a recommending module 720.
The third obtaining module 710 is configured to input the target behavior data of the target object into the multitask model to obtain a plurality of output results.
A recommending module 720, configured to recommend the target information to the target object according to the plurality of output results.
For example, a multitasking model is trained according to the apparatus provided by the present disclosure.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
In an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor. For example, the memory stores instructions executable by the at least one processor to enable the at least one processor to perform methods provided in accordance with the present disclosure.
In the disclosed embodiments, a readable storage medium stores computer instructions, which may be a non-transitory computer readable storage medium. For example, the computer instructions may cause a computer to perform a method provided in accordance with the present disclosure.
In an embodiment of the present disclosure, the computer program product comprises a computer program which, when executed by a processor, implements the method provided according to the present disclosure. This will be described in detail below with reference to fig. 8.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 performs the respective methods and processes described above, such as a training method of a multitask model and/or an information recommendation method. For example, in some embodiments, the training method and/or the information recommendation method of the multitasking model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When loaded into RAM 803 and executed by the computing unit 801, a computer program may perform one or more steps of the training method and/or the information recommendation method of the multi-tasking model described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method and/or the information recommendation method of the multitask model by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) display or an LCD (liquid crystal display)) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (18)

1. A method of training a multitask model, the multitask model comprising a plurality of task processing sub-models and a shared sub-model, the method comprising:
inputting sample behavior data of a sample object into the sharing sub-model to obtain behavior characteristic information of the sample object, wherein the behavior characteristic information comprises a plurality of behavior characteristic sub-information;
obtaining a sub-loss value of the task processing sub-model according to the behavior characteristic sub-information related to the task processing sub-model;
determining a target gradient value corresponding to the sub-loss value according to the sub-loss value of the task processing sub-model;
determining a weight value corresponding to the sub-loss value according to a plurality of target gradient values; and
training the multitask model according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values.
2. The method of claim 1, wherein said determining, from the plurality of target gradient values, a weight value corresponding to the sub-loss value comprises:
determining a processing parameter value according to a plurality of target gradient values; and
determining the weight value corresponding to the sub-loss value according to the processing parameter value and the target gradient value corresponding to the sub-loss value.
3. The method of claim 2, wherein said determining the weight value corresponding to the sub-loss value as a function of the processing parameter value and a target gradient value corresponding to the sub-loss value comprises:
processing the target gradient value corresponding to the sub-loss value to obtain a processed target gradient value;
according to the processing parameter value, normalization processing is carried out on the processed target gradient value to obtain a normalized gradient value; and
determining the weight value corresponding to the sub-loss value according to the reciprocal of the normalized gradient value.
4. The method of claim 1, wherein the deriving a sub-loss value of the task processing submodel from the behavior feature sub-information associated with the task processing submodel comprises:
inputting the behavior characteristic sub-information related to the task processing sub-model into the task processing sub-model to obtain an output result of the task processing sub-model; and
obtaining the sub-loss value of the task processing sub-model according to the output result.
5. The method of claim 1, wherein said training the multitask model according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values comprises:
obtaining a loss value according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values; and
training the multitask model according to the loss value.
6. The method of claim 5, wherein the deriving a loss value according to a plurality of the sub-loss values and a plurality of the weight values respectively corresponding to the plurality of sub-loss values comprises:
carrying out weighted summation according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values to obtain the loss value.
7. The method of claim 5, wherein the training the multitask model according to the loss value comprises:
respectively adjusting parameters of the task processing sub-models and the shared sub-model according to the loss value, so as to train the multi-task model.
8. An information recommendation method, comprising:
inputting target behavior data of a target object into a multi-task model to obtain a plurality of output results;
recommending target information to the target object according to the output results,
wherein the multitasking model is trained according to the method of any one of claims 1 to 7.
9. An apparatus for training a multitask model, the multitask model comprising a plurality of task processing submodels and a shared submodel, the apparatus comprising:
the first obtaining module is used for inputting sample behavior data of a sample object into the sharing sub-model to obtain behavior characteristic information of the sample object, wherein the behavior characteristic information comprises a plurality of behavior characteristic sub-information;
the second obtaining module is used for obtaining a sub-loss value of the task processing sub-model according to the behavior characteristic sub-information related to the task processing sub-model;
the first determining module is used for determining a target gradient value corresponding to the sub-loss value according to the sub-loss value of the task processing sub-model;
a second determining module, configured to determine, according to the multiple target gradient values, a weight value corresponding to the sub-loss value; and
the training module is used for training the multitask model according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values.
10. The apparatus of claim 9, wherein the second determining means comprises:
the first determining submodule is used for determining a processing parameter value according to the target gradient values; and
a second determining submodule, configured to determine the weight value corresponding to the sub-loss value according to the processing parameter value and the target gradient value corresponding to the sub-loss value.
11. The apparatus of claim 10, wherein the second determination submodule comprises:
the processing unit is used for processing the target gradient value corresponding to the sub-loss value to obtain a processed target gradient value;
the normalization processing unit is used for performing normalization processing on the processed target gradient value according to the processing parameter value to obtain a normalized gradient value; and
a determining unit, configured to determine the weight value corresponding to the sub-loss value according to a reciprocal of the normalized gradient value.
12. The apparatus of claim 9, wherein the second obtaining means comprises:
the first obtaining submodule is used for inputting the behavior characteristic sub-information related to the task processing submodel into the task processing submodel to obtain an output result of the task processing submodel; and
the second obtaining submodule is used for obtaining the sub-loss value of the task processing submodel according to the output result.
13. The apparatus of claim 9, wherein the training module comprises:
a third obtaining submodule, configured to obtain a loss value according to the multiple sub-loss values and the multiple weight values respectively corresponding to the multiple sub-loss values; and
the training sub-module is used for training the multi-task model according to the loss value.
14. The apparatus of claim 13, wherein the third obtaining submodule comprises:
the weighted summation unit is used for carrying out weighted summation according to the plurality of sub-loss values and the plurality of weight values respectively corresponding to the plurality of sub-loss values to obtain the loss value.
15. The apparatus of claim 13, wherein the training submodule comprises:
the adjusting unit is used for respectively adjusting the parameters of the task processing submodels and the sharing submodel according to the loss value so as to train the multi-task model.
16. An information recommendation apparatus comprising:
the third obtaining module is used for inputting the target behavior data of the target object into the multitask model to obtain a plurality of output results;
a recommending module for recommending target information to the target object according to the output results,
wherein the multitasking model is trained according to the apparatus of any one of claims 9 to 15.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 8.
CN202211015938.1A 2022-08-24 2022-08-24 Training method of multi-task model, information recommendation method, device and equipment Pending CN115081630A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211015938.1A CN115081630A (en) 2022-08-24 2022-08-24 Training method of multi-task model, information recommendation method, device and equipment
PCT/CN2023/074122 WO2024040869A1 (en) 2022-08-24 2023-02-01 Multi-task model training method, information recommendation method, apparatus, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211015938.1A CN115081630A (en) 2022-08-24 2022-08-24 Training method of multi-task model, information recommendation method, device and equipment

Publications (1)

Publication Number Publication Date
CN115081630A true CN115081630A (en) 2022-09-20

Family

ID=83245010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211015938.1A Pending CN115081630A (en) 2022-08-24 2022-08-24 Training method of multi-task model, information recommendation method, device and equipment

Country Status (2)

Country Link
CN (1) CN115081630A (en)
WO (1) WO2024040869A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024040869A1 (en) * 2022-08-24 2024-02-29 北京百度网讯科技有限公司 Multi-task model training method, information recommendation method, apparatus, and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110209817A (en) * 2019-05-31 2019-09-06 Anhui Taiyue Xiangsheng Software Co., Ltd. Training method and device for a text processing model, and text processing method
CN111027428A (en) * 2019-11-29 2020-04-17 北京奇艺世纪科技有限公司 Training method and device of multi-task model and electronic equipment
US20210374542A1 (en) * 2020-12-14 2021-12-02 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for updating parameter of multi-task model, and storage medium
CN114913371A (en) * 2022-05-10 2022-08-16 平安科技(深圳)有限公司 Multitask learning model training method and device, electronic equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
EP4165557A1 (en) * 2020-07-23 2023-04-19 Google LLC Systems and methods for generation of machine-learned multitask models
CN112561077B (en) * 2020-12-14 2022-06-21 北京百度网讯科技有限公司 Training method and device of multi-task model and electronic equipment
CN112541124B (en) * 2020-12-24 2024-01-12 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for generating a multitasking model
CN115081630A (en) * 2022-08-24 2022-09-20 北京百度网讯科技有限公司 Training method of multi-task model, information recommendation method, device and equipment

Also Published As

Publication number Publication date
WO2024040869A1 (en) 2024-02-29

Similar Documents

Publication Publication Date Title
CN113870334B (en) Depth detection method, device, equipment and storage medium
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN115147680B (en) Pre-training method, device and equipment for target detection model
CN112966744A (en) Model training method, image processing method, device and electronic equipment
CN114972877A (en) Image classification model training method and device and electronic equipment
CN115081630A (en) Training method of multi-task model, information recommendation method, device and equipment
CN114360027A (en) Training method and device for feature extraction network and electronic equipment
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN114037059A (en) Pre-training model, model generation method, data processing method and data processing device
CN114495101A (en) Text detection method, and training method and device of text detection network
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN116468112B (en) Training method and device of target detection model, electronic equipment and storage medium
CN112949818A (en) Model distillation method, device, equipment and storage medium
CN114860411B (en) Multi-task learning method, device, electronic equipment and storage medium
CN113361621B (en) Method and device for training model
CN113657468A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113961765A (en) Searching method, device, equipment and medium based on neural network model
CN112560987A (en) Image sample processing method, device, equipment, storage medium and program product
CN115131709B (en) Video category prediction method, training method and device for video category prediction model
CN114693950B (en) Training method and device of image feature extraction network and electronic equipment
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN114663150A (en) Model training method, model training device, information generating method, information generating device, information generating equipment and storage medium
CN115878783A (en) Text processing method, deep learning model training method and sample generation method
CN114186097A (en) Method and apparatus for training a model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination