CN111242752A - Method and system for determining recommended object based on multi-task prediction - Google Patents


Publication number
CN111242752A
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN202010329692.XA
Other languages
Chinese (zh)
Other versions
CN111242752B (en)
Inventor
钱浩
周俊
崔卿
李龙飞
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010329692.XA
Publication of CN111242752A
Application granted
Publication of CN111242752B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation


Abstract

The embodiment of the specification discloses a method and a system for determining a recommended object based on multi-task prediction, wherein the method comprises the following steps: acquiring a user characteristic of a target user and an object characteristic of at least one candidate object; using a recommendation model to perform the following processing on each of the at least one candidate object to obtain at least one decision value: processing the user characteristics and the object characteristics through a recommendation model, and determining two or more predicted values corresponding to the candidate object; the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to a target task; determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task; and determining a target object recommended to the target user from the at least one candidate object based on the at least one decision value.

Description

Method and system for determining recommended object based on multi-task prediction
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a method and a system for determining a recommended object based on multi-task prediction.
Background
Currently, more and more network platforms use recommendation systems to recommend objects (e.g., commodities, short videos, etc.) that match users' interests. In practical applications, a recommendation system may need to optimize multiple target tasks simultaneously, such as the user's click-through rate and the platform's transaction volume. Therefore, it is desirable to provide a method and system for determining a recommended object based on multi-task prediction, so as to improve the overall completion of multiple tasks.
Disclosure of Invention
An aspect of an embodiment of the present specification provides a method of determining a recommended object based on multi-task prediction, including: acquiring a user characteristic of a target user and an object characteristic of at least one candidate object; using a recommendation model to perform the following processing on each of the at least one candidate object to obtain at least one decision value: processing the user characteristics and the object characteristics through a recommendation model, and determining two or more predicted values corresponding to the candidate object; the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to a target task; determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task; and determining a target object recommended to the target user from the at least one candidate object based on the at least one decision value.
An aspect of an embodiment of the present specification provides a system for determining a recommended object based on multi-task prediction, including: the characteristic acquisition module is used for acquiring the user characteristic of a target user and the object characteristic of at least one candidate object; a decision value determination module, configured to perform the following processing on each of the at least one candidate object by using the recommendation model to obtain at least one decision value: processing the user characteristics and the object characteristics through a recommendation model, and determining two or more predicted values corresponding to the candidate object; the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task; a target object determination module for determining a target object recommended to the target user from the at least one candidate object based on the at least one decision value.
An aspect of the embodiments of the present specification provides an apparatus for determining a recommended object based on multi-task prediction, including a processor for executing a method for determining a recommended object based on multi-task prediction as described above.
One aspect of the embodiments of the present specification provides a method for training a recommendation model based on multi-task prediction, including: obtaining a plurality of training samples carrying labels; the training sample comprises a sample user characteristic and a sample object characteristic, and the label is used for representing the completion degree of a target task; iteratively updating parameters of the initial recommendation model based on a plurality of training samples to reduce loss function values corresponding to the training samples to obtain a trained recommendation model; wherein, the loss function value corresponding to each training sample is determined by the following process: processing the sample user characteristics and the sample object characteristics through a recommendation model, obtaining two or more predicted values and determining a decision value based on the two or more predicted values; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; and determining a loss function value corresponding to the training sample at least based on the difference between the decision value and the label corresponding to the training sample.
One aspect of embodiments of the present specification provides a system for training a recommendation model based on multi-tasking prediction, including: the training sample acquisition module is used for acquiring a plurality of training samples carrying labels; the training sample comprises a sample user characteristic and a sample object characteristic, and the label is used for representing the completion degree of a target task; the parameter adjusting module is used for iteratively updating the parameters of the initial recommendation model based on a plurality of training samples so as to reduce the loss function values corresponding to the training samples and obtain a trained recommendation model; wherein, the loss function value corresponding to each training sample is determined by the following process: processing the sample user characteristics and the sample object characteristics through a recommendation model, obtaining two or more predicted values and determining a decision value based on the two or more predicted values; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; and determining a loss function value corresponding to the training sample at least based on the difference between the decision value and the label corresponding to the training sample.
An aspect of the embodiments of the present specification provides a training apparatus for a recommendation model based on multi-task prediction, including a processor, configured to execute a training method for a recommendation model based on multi-task prediction as described above.
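The training method above determines each sample's loss from the difference between the decision value and the label. The minimal sketch below illustrates one per-sample loss computation; the squared-error form and the function name are illustrative assumptions, since the specification only requires the loss to depend at least on that difference:

```python
def sample_loss(decision_value: float, label: float) -> float:
    # Per-sample loss based at least on the difference between the
    # decision value and the label (squared error assumed here).
    diff = decision_value - label
    return diff * diff
```

During training, such per-sample losses would be summed or averaged over a batch, and the model parameters updated iteratively to reduce them.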
Drawings
The present description will be further illustrated by way of exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not intended to be limiting; in these embodiments, like numerals indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of a system for determining recommended objects based on multi-tasking according to some embodiments of the present description;
FIG. 2 is a flow diagram of a method of determining recommended objects based on multi-tasking according to some embodiments of the present description;
FIG. 3 is a schematic diagram of a structure of a recommendation model shown in accordance with some embodiments of the present description;
FIG. 4 is a flow diagram of a method of training a recommendation model based on multi-tasking prediction in accordance with some embodiments of the present description;
FIG. 5 is a schematic diagram of a process for generating a plurality of training samples via successive click events, according to some embodiments of the present description;
FIG. 6 is a block diagram of a system for determining recommended objects based on multi-tasking in accordance with some embodiments of the present description;
FIG. 7 is a block diagram of a training system for a multi-tasking prediction based recommendation model in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used in this specification is a method for distinguishing different components, elements, parts or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flowcharts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed exactly in the order shown. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or one or more steps may be removed from them.
FIG. 1 is a schematic diagram of an application scenario of a system for determining recommended objects based on multi-tasking according to some embodiments of the present description.
As shown in fig. 1, a system 100 for determining a recommendation object based on multi-tasking may include a processing device 110, a network 120, and a user terminal 130.
The processing device 110 may be used to process information and/or data associated with determining a recommended target object to perform one or more of the functions disclosed in this specification. In some embodiments, the processing device 110 may be configured to obtain a user characteristic of the target user and an object characteristic of the at least one candidate object. In some embodiments, the processing device 110 may process each of the at least one candidate object using the recommendation model to obtain at least one decision value. In some embodiments, the processing device 110 may determine a target object recommended to the target user from the at least one candidate object based on the at least one decision value. In some embodiments, the processing device 110 may include one or more processing engines (e.g., single-core processing engines or multi-core processors). By way of example only, the processing device 110 may include one or a combination of a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction set computer (RISC), a microprocessor, and the like.
Network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the system 100 for determining recommended objects based on multi-task prediction (e.g., processing device 110, user terminal 130) may communicate information to other components of the system via the network 120. For example, processing device 110 may obtain user characteristics generated by user terminal 130 via network 120. For another example, the user terminal 130 may obtain the target object recommended by the processing device 110 through the network 120. In some embodiments, the network 120 may be any form of wired or wireless network, or any combination thereof. By way of example only, network 120 may be one or a combination of a wireline network, a fiber optic network, a telecommunications network, an intranet, the internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, and so forth.
User terminal 130 may be a device with data acquisition, storage, and/or transmission capabilities. In some embodiments, the user of the user terminal 130 may be a user of an online service of the application platform. In some embodiments, the user terminal 130 may include, but is not limited to, a mobile device 130-1, a tablet 130-2, a laptop 130-3, a desktop 130-4, and the like, or any combination thereof. Exemplary mobile devices 130-1 may include, but are not limited to, smartphones, personal digital assistants (PDAs), handheld game consoles, smart watches, wearable devices, virtual reality devices, augmented reality devices, and the like, or any combination thereof. In some embodiments, the user terminal 130 may send the acquired data to one or more devices in the system 100 for determining recommended objects based on multi-task prediction. For example, the user terminal 130 may transmit the acquired data to the processing device 110. In some embodiments, the data acquired by the user terminal 130 may be user characteristic data generated by the user at the user terminal 130.
The technical solutions disclosed in the embodiments of this specification can be applied to object recommendation scenarios. Based on the different services of different application platforms, different objects can be recommended to the user. By way of example only, in some scenarios the application platform may issue certain user interests (including but not limited to cash red packets, coupons, and subsidies) to the user, thereby driving consumption. For example, an application platform with financial services may use a recommendation model to recommend a cash red packet to a user, so that the user can purchase a financial product with the cash red packet obtained by clicking. For another example, an application platform with mobile payment services may use a recommendation model to recommend a subsidy to a user, so that the user can make online or offline purchases with the clicked purchase subsidy. Issuing (or recommending) appropriate user interests to a particular user can contribute more to the overall revenue of the platform. How to determine which user interests to recommend to a user is therefore a pressing problem.
In some embodiments, when the application platform uses the recommendation model to perform object recommendation, multiple tasks may be preset, for example, to improve click-through rate, conversion rate and/or platform volume, in which case, the recommendation model is a multi-task model. Of course, application platforms may wish to improve the overall completion of these tasks by recommending appropriate objects to users.
For example only, suppose the preset tasks of the multi-task model are a main task 1 and an auxiliary task 2, where main task 1 may be a click-through-rate optimization task and auxiliary task 2 may be a GMV (platform transaction volume) optimization task. The multi-task model may be trained as follows: construct a loss function for each task, train a two-tower model using the idea of supervised learning, and jointly adjust the shared-layer parameters of the model (e.g., embedding-layer parameters shared by the two towers). However, this training mode may have the following drawback: the training samples for auxiliary task 2 are insufficient. For example, platform transaction amounts are widely dispersed, so the samples corresponding to each transaction amount are insufficient; as a result, the shared-layer parameters degrade, which affects the prediction accuracy of the model.
Therefore, the embodiments of this specification disclose a method for determining a recommended object based on multi-task prediction. Using the idea of reinforcement learning, the recommendation model serves as the executive network that makes predictions for the different preset tasks, and a decision value is determined based on the prediction results. The decision value reflects the completion degree of the target task as determined from the predictions of the different preset tasks, and the GMV is treated as the reward corresponding to that completion degree. This avoids interference from the GMV prediction network on the click-through-rate prediction network, preserves the effect of click-through-rate optimization, and optimizes the GMV as well. The technical solutions disclosed in this specification are described in detail below with reference to the accompanying drawings, taking user interest recommendation as an example. It should be understood that the object recommendation method and recommendation model disclosed in this specification can be used in other scenarios where objects are recommended to users based on multiple tasks, such as recommending a driving route to the user in a navigation system.
FIG. 2 is a flow diagram illustrating a method for determining recommended objects based on multi-tasking according to some embodiments of the present description.
Step 202, obtaining the user characteristics of the target user and the object characteristics of at least one candidate object.
In some embodiments, this step may be performed by feature acquisition module 610.
In some embodiments, the target user may be a user of an online service using the application platform. For example, a user of an online transaction service using a mobile payment platform. As another example, a user of an online shopping service using an e-commerce platform.
In some embodiments, the candidate object may be a user interest. In some embodiments, a user interest may refer to an offer that the application platform provides to the user, e.g., a cash red packet, coupon, or subsidy. In practical applications, the application platform may provide different user interests according to the services it includes. For example, if the application platform includes a telephone fee recharging service, it can provide a telephone fee interest, i.e., a cash red packet or coupon related to recharging the telephone fee. As another example, if the application platform includes financial services, it can provide a financial interest, i.e., a cash red packet or coupon for the purchase of different financial products, such as a 10-yuan insurance cash red packet, a 2-yuan fund red packet, or the like. It can be understood that user interests can, to a certain extent, guide the user to consume on the application platform and thus contribute to the overall revenue of the platform.
In some embodiments, the user characteristics reflect at least the personal attributes and historical consumption behavior of the target user. In some embodiments, user characteristics that reflect personal attributes may include the target user's income, age, and occupation. In some embodiments, historical consumption behavior may refer to the target user's historical consumption on one or more application platforms. In some embodiments, the historical consumption behavior may include the user interests historically clicked by the target user and/or the goods historically purchased by the target user, such as historical clicks on cash red packets, and/or historical purchases of financial products, physical goods, and the like.
In some embodiments, the object characteristics reflect at least cost information of the candidate object. In some embodiments, the cost information may include a constraint cost, i.e., the cost required by the application platform to provide the user interest to the user. In some embodiments, the object features may also reflect classification attributes of user interests. In some embodiments, the classification attribute may include id information of user interests, category information, and the like. The category information may include the type of interest, such as cash red envelope, discount coupons, etc.
In some embodiments, the feature acquisition module 610 may acquire user features reflecting the personal attributes of the target user from the user terminal 130 or a storage device. In some embodiments, the feature acquisition module 610 may extract user features reflecting the historical consumption behavior of the target user from a shopping log or a storage device. In some embodiments, the feature acquisition module 610 may extract the object features of the candidate object from a storage device.
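As an illustration of the inputs gathered in step 202, the sketch below assembles hypothetical user-feature and object-feature records; all field names are illustrative assumptions, not names used by the embodiments:

```python
def build_features(user_record: dict, interest_record: dict):
    """Assemble user features (personal attributes + historical consumption)
    and object features (cost and classification attributes) for one
    candidate user interest. Field names are illustrative."""
    user_features = {
        "income": user_record.get("income", 0.0),
        "age": user_record.get("age", 0),
        "occupation": user_record.get("occupation", "unknown"),
        # historical consumption behavior
        "clicked_interest_ids": user_record.get("clicked_interest_ids", []),
        "purchased_item_ids": user_record.get("purchased_item_ids", []),
    }
    object_features = {
        "interest_id": interest_record["interest_id"],
        "category": interest_record.get("category", "cash_red_packet"),
        # constraint cost: what it costs the platform to provide the interest
        "constraint_cost": interest_record.get("constraint_cost", 0.0),
    }
    return user_features, object_features
```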
Step 204, processing the user characteristics and the object characteristics through a recommendation model, and determining two or more predicted values corresponding to the candidate objects; the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task.
In some embodiments, this step may be performed by the decision value determination module 620.
In some embodiments, the recommendation model may be a pre-trained machine learning model. The trained recommendation model may process each of the at least one candidate object to obtain two or more predicted values corresponding to the at least one candidate object. The training process of the recommendation model can be referred to fig. 4 and its related description, and will not be described herein.
In some embodiments, the two or more predicted values are associated with two or more preset tasks, respectively. The preset tasks here may be set based on the target task. In some embodiments, the target task may be to increase the overall revenue of the platform; thus, the two or more preset tasks may include a click-through-rate prediction task and a GMV (i.e., platform transaction volume) prediction task. It will be appreciated that both the click-through rate and the platform transaction volume affect the achievement of the final goal. As described herein, the two or more predicted values may include a first predicted value and a second predicted value. In some embodiments, the first predicted value may reflect the probability that the target user clicks on the candidate object, where "clicking" may be understood as the user accepting the candidate object. For example, if the candidate object is a "10-yuan insurance cash red packet" provided by a payment platform, the first predicted value may be a numerical value associated with the probability that the target user clicks on that cash red packet. In some embodiments, the second predicted value may reflect the amount the target user consumes through the candidate object. It will be appreciated that a user interest takes effect only when the user actually consumes, i.e., only the amount the user spends by using the user interest is charged. However, the platform cannot know in advance how much consumption the user will generate through the user interest; therefore, another preset task of the recommendation model is to predict the amount of consumption the target user generates through the candidate object. Still taking the "10-yuan insurance cash red packet" as an example, if the recommendation model predicts that the target user will use the cash red packet to purchase 200 yuan of insurance on the payment platform, the second predicted value is 200.
In other embodiments, the preset tasks may also include other tasks related to the final goal, and the number and types of the preset tasks are not limited in the present specification.
And step 206, determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task.
In some embodiments, this step may be performed by the decision value determination module 620.
In some embodiments, the decision value corresponding to each candidate may be determined based on two or more predicted values for that candidate. The decision value may reflect the completion of the target task. For example, still taking the two or more preset tasks as the click-through rate prediction task and the GMV prediction task as examples, the decision value may reflect the completion degree of the final objective of increasing the platform overall profit by recommending the candidate object, which is determined based on the click-through rate prediction result and the GMV prediction result.
In some embodiments, the decision value is positively correlated with the first predicted value and/or the second predicted value. It is to be understood that the larger the first predicted value and/or the second predicted value, the larger the decision value. In some embodiments, the decision value may be determined by equation (1):

r = p1 · log(p2)    (1)

where r is the decision value, p2 is the second predicted value, and p1 is the first predicted value. The range of the second predicted value may be large because the amount the target user consumes through the candidate object may be large. Taking the logarithm of the second predicted value compresses its range, which facilitates the training of the recommendation model.
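The decision value operation of formula (1) can be sketched as follows. The exact functional form is an assumption inferred from the surrounding description: the first predicted value (click probability) scaled by the logarithm of the second predicted value (predicted consumption amount):

```python
import math

def decision_value(p_click: float, predicted_amount: float) -> float:
    # Taking the logarithm compresses the wide range of predicted
    # consumption amounts, easing training of the recommendation model.
    return p_click * math.log(predicted_amount)
```

For instance, a 0.5 click probability on an interest expected to drive 200 yuan of consumption would yield a decision value of 0.5 · log(200).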
Step 208, determining a target object recommended to the target user from the at least one candidate object based on at least one decision value.
In particular, this step may be performed by the target object determination module 630.
In some embodiments, the target object determination module 630 may rank the decision values and select, from the at least one candidate object, the candidate objects whose decision values rank within a preset position as the target objects. The preset rank can be set according to actual requirements, e.g., the top 5, 8, or 10. In some embodiments, the target object determination module 630 may determine the candidate object with the highest decision value as the target object.
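The selection step above can be sketched as a ranking over the candidates' decision values (function and parameter names are illustrative):

```python
def select_targets(candidates, decision_values, preset_rank=1):
    """Return the candidate objects whose decision values rank within
    the preset position, highest decision value first."""
    ranked = sorted(zip(candidates, decision_values),
                    key=lambda pair: pair[1], reverse=True)
    return [obj for obj, _ in ranked[:preset_rank]]
```

With preset_rank=1 this reduces to recommending the single candidate with the highest decision value.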
The embodiments of this specification use the idea of reinforcement learning: the recommendation model serves as the executive network, a decision value (i.e., a reward value) is determined based on the first predicted value and/or the second predicted value, and the optimal candidate object is selected based on the decision value and recommended to the target user. The click-through rate and the GMV can thus be optimized simultaneously, improving both user satisfaction and the transaction volume of the application platform.
FIG. 3 is a block diagram of a recommendation model in accordance with some embodiments of the present description.
In some embodiments, a recommendation model may be constructed based on the preset tasks and the relationship between the preset tasks and the decision value. In some embodiments, the recommendation model may be constructed based on a neural network. As shown in FIG. 3, the recommendation model may include an embedding layer, a feature crossing layer, a first multi-layer perceptron, a second multi-layer perceptron, and an output layer.
In some embodiments, the embedding layer may be used to convert the user features and the object features into respective vector representations. In some embodiments, the user features and the object features obtained by the feature obtaining module 610 may be high-dimensional sparse features, and the embedding layer may map them to low-dimensional dense features; that is, the converted vector representations are dimensionality-reduced. In some embodiments, the embedding layer may include TF-IDF, a Word2Vec network, or a BERT network, among others.
In some embodiments, the feature crossing layer is configured to perform feature fusion on the vector representation of the user features and the vector representation of the object features to obtain a fused vector representation. In some embodiments, the feature crossing layer may include a Wide & Deep network, a DeepFM network, or a Deep & Cross network.
In some embodiments, the first multi-layered perceptron is configured to process the fused vector representation to obtain the first predictor. In some embodiments, the second multi-layered perceptron is configured to process the fused vector representation and the first predictor to obtain a second predictor.
In some embodiments, the output layer is configured to perform the decision value operation on the first predicted value and the second predicted value and output the decision value. For the decision value operation, reference may be made to the detailed description of formula (1) in step 206, which is not repeated here.
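The forward pass described above can be sketched in NumPy as follows. The layer sizes, the small random weight initialization, the concatenation used as a stand-in for the feature crossing layer, and the softplus used to keep the predicted amount positive are all illustrative assumptions; a real implementation would use a trained Wide & Deep, DeepFM, or Deep & Cross network for the crossing layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU hidden activation."""
    return np.maximum(x @ w1, 0.0) @ w2

# Illustrative dimensions: 8-dim user/object embeddings, 16-dim hidden layer.
d = 8
user_emb = rng.normal(size=d)   # embedding layer output (user features)
obj_emb = rng.normal(size=d)    # embedding layer output (object features)

# Feature crossing layer, simplified here to plain concatenation.
fused = np.concatenate([user_emb, obj_emb])

# First multi-layer perceptron: click probability (sigmoid output).
w1a = rng.normal(size=(2 * d, 16)) * 0.1   # small random weights
w2a = rng.normal(size=(16, 1)) * 0.1
p_click = 1.0 / (1.0 + np.exp(-mlp(fused, w1a, w2a)[0]))

# Second multi-layer perceptron: consumption amount, conditioned on the
# fused representation and the first predicted value.
w1b = rng.normal(size=(2 * d + 1, 16)) * 0.1
w2b = rng.normal(size=(16, 1)) * 0.1
amount = np.log1p(np.exp(mlp(np.append(fused, p_click), w1b, w2b)[0]))  # softplus > 0

# Output layer: decision value combining the two heads.
decision = p_click * np.log(amount + 1e-8)
```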
FIG. 4 is a flow diagram of a method of training a recommendation model based on multi-tasking prediction in accordance with some embodiments of the present description.
Step 402, obtaining a plurality of training samples carrying labels; the training sample comprises a sample user characteristic and a sample object characteristic, and the label is used for representing the completion degree of the target task.
In some embodiments, this step may be performed by training sample acquisition module 710.
In some embodiments, the training samples may be the data input into the initial recommendation model for training the recommendation model. In some embodiments, a training sample includes sample user features and sample object features. In some embodiments, the sample user features reflect at least the personal attributes and historical consumption behavior of the sample user. In some embodiments, the personal attributes include at least one of: income, age, and occupation. In some embodiments, the historical consumption behavior includes user interests historically clicked by the sample user and/or goods historically purchased by the sample user. In some embodiments, the sample object may be a user interest. In some embodiments, the sample object features reflect at least cost information of the sample object. For details of the training samples, reference may be made to step 202 and its related description, which are not repeated here.
In some embodiments, the labels may be used to characterize certain ground-truth information of the training samples. In some embodiments, labels may be used to characterize the completion degree of the target task. As can be seen from step 204 and its related description, the two or more preset tasks may be a click-through rate prediction task and a GMV (platform transaction volume) prediction task. For these preset tasks, the completion degree of the target task can be understood as the contribution that the event of the user clicking on the candidate object and consuming through the candidate object makes to the target task (increasing the overall profit of the platform). Therefore, the completion degree characterized by the label can refer to the actual consumption amount generated by the user through the candidate object, which can be obtained from the user's historical consumption behavior. It will be appreciated that the label may also serve, to some extent, as a reward for the event of the user clicking on the candidate object and consuming through it. It can be understood that if the user clicks on the candidate object but does not complete a purchase, the actual consumption amount generated by the user through the candidate object is 0, i.e., the label is 0.
In some embodiments, the training samples may be labeled by manual labeling. In some embodiments, training sample acquisition module 710 may acquire training samples from a storage device of processing device 110.
In some embodiments, the training samples may come from click events of different sample users on different sample candidate objects historically recommended by the platform. In some embodiments, one training sample may come from a single click event. As an example, the platform recommends sample candidate objects a, b, and c to a sample user; the sample user clicks on sample candidate object a and generates a consumption amount of 500 dollars through it. Based on this event, a training sample may be constructed, including the sample user features, the features of sample candidate object a, and the label (or reward value) of 500.
In some embodiments, multiple training samples may come from multiple click events, also referred to as continuous click events, of the same sample user on the same sample candidate object, and these training samples may be considered to have some causal association. For example only, taking the sample object "10 dollar insurance cash red envelope" as an example, a continuous click event may mean that the user clicks the "10 dollar insurance cash red envelope" multiple times to make multiple insurance purchases.
Referring to FIG. 5, it can be seen that multiple training samples may be generated by successive click events. After the labels are respectively labeled on the training samples, a plurality of training samples with labels can be obtained. These training samples correspond to multiple click events of the same sample user on the same sample object, respectively. Illustratively, the sample user 1 clicks on the sample object 1 for the first time to generate the training sample 1, the sample user 1 clicks on the sample object 1 for the second time to generate the training sample 2, the sample user 1 clicks on the sample object 1 for the third time to generate the training sample 3, and so on, and the sample user 1 clicks on the sample object 1 for the nth time to generate the training sample n.
And 404, iteratively updating parameters of the initial recommendation model based on a plurality of training samples to reduce loss function values corresponding to the training samples, so as to obtain the trained recommendation model.
In some embodiments, this step may be performed by parameter adjustment module 720.
During model training, the parameter adjustment module 720 may continuously update the parameters of the initial recommendation model based on the plurality of training samples. Specifically, the parameter adjustment module 720 may continuously adjust the parameters of the initial recommendation model to reduce the loss function values corresponding to the training samples, so that the loss function values satisfy a preset condition, for example, that the loss function value converges, or that the loss function value is less than a preset value. When the loss function value meets the preset condition, model training is complete and a trained recommendation model is obtained. The trained recommendation model can obtain at least one decision value corresponding to at least one candidate object based on the user features of a target user and the object features of the at least one candidate object, and recommend a target object to the target user.
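The iterative update described above can be sketched with a toy, single-parameter model trained by plain gradient descent on a squared loss. The model form (a single weight w), the learning rate, and the data are all illustrative; the actual recommendation model would be trained with a deep-learning framework.

```python
def train(samples, lr=0.01, steps=200):
    """Fit a single weight w so that w * x approximates the label y,
    iteratively reducing the squared loss over all training samples."""
    w = 0.0
    for _ in range(steps):
        for x, y in samples:
            pred = w * x                 # stand-in for the model's decision value
            grad = 2.0 * (pred - y) * x  # gradient of (pred - y)**2 w.r.t. w
            w -= lr * grad               # iterative parameter update
    return w

# Labels here are exactly 3 * feature, so w should approach 3.
w = train([(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)])
```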
In some embodiments, the loss function value corresponding to each training sample may be determined by: processing the sample user characteristics and the sample object characteristics through a recommendation model, acquiring two or more predicted values and determining a decision value based on the two or more predicted values; wherein, two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; and determining a loss function value corresponding to the training sample at least based on the difference between the decision value and the label corresponding to the training sample.
In some embodiments, the two or more prediction values include a first prediction value reflecting a probability that the sample user clicked on the sample object and a second prediction value reflecting an amount of consumption by the sample user by the sample object. For details of the decision value, the first prediction value, and the second prediction value, reference may be made to step 206 and the related description thereof, which are not repeated herein.
In some embodiments, the training samples may be samples generated by a single click event. For a training sample generated by a single click event, the loss function value corresponding to the training sample can be determined from the difference between the decision value and the label corresponding to the training sample. Specifically, the loss function value corresponding to the training sample can be determined by formula (2):

L = (Q − y)²    (2)

wherein L is the loss function value, Q is the decision value calculated by the model for the training sample, and y is the label value of the training sample.
In some embodiments, the training samples may be samples generated by successive click events.
For each of at least one training sample generated by a continuous click event, determining the loss function value corresponding to the training sample based at least on the difference between the decision value and the label corresponding to the training sample includes: determining the loss function value based on the difference between the decision value and the sum of the label corresponding to the training sample and a discounted portion of the labels corresponding to at least one other training sample. In some embodiments, the click events corresponding to the at least one other training sample are later than the click event corresponding to this training sample. Still taking FIG. 5 as an example, if the training sample is training sample 1, then the at least one other training sample is training samples 2 and 3.
Specifically, the loss function value corresponding to the training sample can be determined by formula (3):

L = (Q_i − Σ_{k=i}^{n} γ^(k−i) × y_k)²    (3)

wherein L is the loss function value, Q_i is the decision value of the training sample, i is the click number corresponding to the training sample, n is the total number of clicks in the continuous click event, γ is the discount coefficient, generally taken from the range (0, 1), and y_k is the label value (or reward value) of the training sample corresponding to the k-th click.
Illustratively, continuing the above example, if the training sample is training sample 1, then i = 1 and n = 3, and the loss function value of training sample 1 is

L = (Q_1 − (y_1 + γ × y_2 + γ² × y_3))²

wherein Q_1 is the decision value of training sample 1, y_1 is the label value of training sample 1, γ is the discount coefficient, y_2 is the label value of training sample 2, and y_3 is the label value of training sample 3. It will be appreciated that for training samples from continuous click events, the reward value is related not only to the current consumption amount but also to the consumption amounts generated by future click events. Constructing the loss function based on the above formula captures the characteristics of continuous click events and improves the prediction accuracy of the model.
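The discounted-return target used in formula (3) can be sketched as follows. The squared-error form, the example labels, and γ = 0.5 are illustrative assumptions; clicks are 1-indexed to match the i and n in the text.

```python
def discounted_target(labels, i, gamma):
    """Discounted sum of label (reward) values from click i to the last
    click n: y_i + gamma * y_{i+1} + gamma**2 * y_{i+2} + ...
    labels holds y_1 .. y_n; i is the 1-indexed click number."""
    return sum(gamma ** (k - i) * y
               for k, y in enumerate(labels[i - 1:], start=i))

def loss(decision_value, labels, i, gamma=0.5):
    """Squared difference between the model's decision value for click i
    and the discounted return target, as in formula (3)."""
    return (decision_value - discounted_target(labels, i, gamma)) ** 2

# Three clicks with rewards y1=10, y2=0, y3=5; for i=1 and gamma=0.5 the
# target is 10 + 0.5*0 + 0.25*5.
target = discounted_target([10.0, 0.0, 5.0], i=1, gamma=0.5)
# target == 11.25
```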
FIG. 6 is a block diagram of a system for determining recommended objects based on multi-tasking according to some embodiments of the present description.
As shown in fig. 6, the system 600 for determining a recommended object based on multi-task prediction may include a feature obtaining module 610, a decision value determining module 620, and a target object determining module 630.
The feature obtaining module 610 may be configured to obtain a user feature of the target user and an object feature of the at least one candidate object. In some embodiments, the candidate object is a user interest; the user characteristics reflect at least personal attributes and historical consumption behaviors of the target user; the personal attributes include at least one of: income, age, and occupation; the object features reflect at least cost information of the candidate objects. In some embodiments, the historical consumption behavior includes user interests of the target user's historical clicks and/or goods historically purchased by the target user.
In some embodiments, the decision value determination module 620 may be configured to utilize the recommendation model to perform the following for each of the at least one candidate object to obtain at least one decision value: processing the user characteristics and the object characteristics through a recommendation model, and determining two or more predicted values corresponding to the candidate object; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; and determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task.
In some embodiments, the two or more predicted values include a first predicted value and a second predicted value; the recommendation model includes: an embedding layer, a feature crossing layer, a first multi-layer perceptron, and a second multi-layer perceptron; the embedding layer is used for converting the user features and the object features into respective vector representations; the feature crossing layer is used for performing feature fusion on the vector representation of the user features and the vector representation of the object features to obtain a fused vector representation; the first multi-layer perceptron is used for processing the fused vector representation to obtain the first predicted value; and the second multi-layer perceptron is used for processing the fused vector representation and the first predicted value to obtain the second predicted value. In some embodiments, the feature crossing layer includes: a Wide & Deep network, a DeepFM network, or a Deep & Cross network. In some embodiments, the decision value is positively correlated with the first predicted value and/or the second predicted value. In some embodiments, the two or more predicted values include a first predicted value reflecting the probability of the target user clicking on the candidate object and a second predicted value reflecting the consumption amount of the target user through the candidate object.
The target object determination module 630 may be configured to determine a target object recommended to the target user from the at least one candidate object based on the at least one decision value.
FIG. 7 is a block diagram of a training system for a multi-tasking prediction based recommendation model in accordance with some embodiments of the present description.
As shown in fig. 7, a training system 700 for a recommendation model based on multi-tasking prediction may include: a training sample acquisition module 710 and a parameter adjustment module 720.
The training sample obtaining module 710 may be configured to obtain a plurality of training samples carrying labels; a training sample includes sample user features and sample object features, and the label is used to characterize the completion degree of the target task. In some embodiments, the sample object is a user interest; the sample user features reflect at least the personal attributes and historical consumption behavior of the sample user; the personal attributes include at least one of: income, age, and occupation; and the sample object features reflect at least cost information of the sample object. In some embodiments, the historical consumption behavior includes user interests historically clicked by the sample user and/or goods historically purchased by the sample user.
The parameter adjusting module 720 may be configured to iteratively update parameters of the initial recommendation model based on a plurality of training samples to reduce loss function values corresponding to the training samples, so as to obtain a trained recommendation model; wherein, the loss function value corresponding to each training sample is determined by the following process: processing the sample user characteristics and the sample object characteristics through a recommendation model, obtaining two or more predicted values and determining a decision value based on the two or more predicted values; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task; and determining a loss function value corresponding to the training sample at least based on the difference between the decision value and the label corresponding to the training sample. In some embodiments, the two or more prediction values include a first prediction value reflecting a probability that the sample user clicked on the sample object and a second prediction value reflecting an amount of consumption by the sample user by the sample object.
In some embodiments, at least one training sample and at least one other training sample among the plurality of training samples carrying labels respectively correspond to multiple click events of the same sample user on the same sample object. For each of the at least one training sample, determining the loss function value corresponding to the training sample based at least on the difference between the decision value and the label corresponding to the training sample includes: determining the loss function value based on the difference between the decision value and the sum of the label corresponding to the training sample and a discounted portion of the labels corresponding to the at least one other training sample, wherein the click events corresponding to the at least one other training sample are later than the click event corresponding to this training sample.
It should be understood that the system and its modules shown in fig. 6 or 7 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above descriptions of the system 600 for determining recommended objects based on multi-task prediction and its modules, and of the training system 700 for a recommendation model based on multi-task prediction and its modules, are only for convenience of description and should not limit the present disclosure to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the teachings of the present system, any combination of modules or connection to other modules may be made without departing from those teachings. For example, the feature obtaining module 610, the decision value determining module 620, and the target object determining module 630 disclosed in FIG. 6 may be separate modules in a system, or two or more of them may be combined into a single module that implements their functions. As another example, in the system 600 for determining recommended objects based on multi-task prediction, the modules may share one storage module, or each module may have its own storage module. Such variations are within the scope of the present disclosure.
The embodiment of the present specification further provides an apparatus for determining a recommended object based on multi-task prediction, which includes a processor, and the processor is configured to execute the foregoing method for determining a recommended object based on multi-task prediction.
The embodiment of the specification further provides a training device for the recommendation model based on the multi-task prediction, which comprises a processor, wherein the processor is used for executing the training method for the recommendation model based on the multi-task prediction.
The beneficial effects that may be brought by the embodiments of this specification include, but are not limited to: (1) using the idea of reinforcement learning, the recommendation model serves as the actor network, the optimized portion of the GMV (platform transaction volume) serves as the reward, and the candidate object corresponding to the optimal reward is selected and recommended to the target user, so that the click-through rate and the GMV are optimized simultaneously and the completion degree of the target task is improved; (2) treating the GMV as the reward corresponding to the completion degree of the target task avoids interference of the GMV prediction network with the click-through rate prediction network, improving the final model effect. It should be noted that different embodiments may produce different advantages, and in different embodiments, any one or a combination of the above advantages, or any other advantage, may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as "data block," module, "" engine, "" unit, "" component, "or" system. Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing processing device or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features than are expressly recited in a claim. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification. Except where the application history document does not conform to or conflict with the contents of the present specification, it is to be understood that the application history document, as used herein in the present specification or appended claims, is intended to define the broadest scope of the present specification (whether presently or later in the specification) rather than the broadest scope of the present specification. It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (16)

1. A method for determining recommended objects based on multi-tasking, comprising:
acquiring a user characteristic of a target user and an object characteristic of at least one candidate object;
using a recommendation model to perform the following processing on each of the at least one candidate object to obtain at least one decision value:
processing the user characteristics and the object characteristics through the recommendation model, and determining two or more predicted values corresponding to the candidate object; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to a target task;
determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task;
and determining a target object recommended to the target user from the at least one candidate object based on the at least one decision value.
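The inference flow of claim 1 can be sketched as follows. The multiplicative combination of the two predicted values into a decision value is only one illustrative choice consistent with claims 4 and 7 (click probability times consumption amount); the function names and the toy predictor are hypothetical, not taken from the patent.

```python
import math

# Sketch of the claim-1 inference flow: score each candidate with a
# multi-task model, combine the per-task predicted values into a decision
# value, and recommend the highest-scoring candidate. The combination
# rule (click probability x consumption amount) is illustrative only.

def decide(user_features, candidates, predict):
    """predict(user, obj) -> (p_click, amount): the two per-task predicted values."""
    decision_values = []
    for obj_features in candidates:
        p_click, amount = predict(user_features, obj_features)
        # Decision value positively correlated with both predicted values
        # (cf. claim 4): here, expected consumption for the target task.
        decision_values.append(p_click * amount)
    best = max(range(len(candidates)), key=lambda i: decision_values[i])
    return candidates[best], decision_values

# Toy stand-in for the trained recommendation model (hypothetical).
def toy_predict(user, obj):
    p_click = 1.0 / (1.0 + math.exp(-(user[0] - obj[0])))
    return p_click, obj[1]

target, values = decide([1.0], [[0.0, 5.0], [2.0, 20.0]], toy_predict)
```

Here the second candidate wins despite a lower click probability, because its expected consumption dominates the product.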
2. The method of claim 1, wherein the two or more predicted values include a first predicted value and a second predicted value;
the recommendation model includes: an embedding layer, a feature crossing layer, a first multi-layer perceptron, and a second multi-layer perceptron;
the embedding layer is used for converting the user characteristics and the object characteristics into respective vector representations;
the feature crossing layer is used for performing feature fusion processing on the vector representation of the user characteristics and the vector representation of the object characteristics to obtain a fusion vector representation;
the first multi-layer perceptron is used for processing the fusion vector representation to obtain the first predicted value;
and the second multi-layer perceptron is used for processing the fusion vector representation and the first predicted value to obtain the second predicted value.
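A minimal sketch of the claim-2 architecture, assuming plain concatenation as a stand-in for the claimed feature crossing layer (Wide & Deep, DeepFM, or Deep & Cross per claim 3), with randomly initialized, untrained weights; all dimensions, ids, and names are illustrative.

```python
import numpy as np

# Minimal sketch of the claim-2 model: an embedding layer, a feature
# crossing layer (plain concatenation stands in here for the claim-3
# networks), and two multi-layer perceptrons, the second of which also
# consumes the first's output. All weights are illustrative and untrained.

rng = np.random.default_rng(0)

EMB_DIM = 4
user_emb_table = rng.normal(size=(10, EMB_DIM))  # embeddings for 10 user-feature ids
obj_emb_table = rng.normal(size=(10, EMB_DIM))   # embeddings for 10 object-feature ids

def mlp(x, w1, w2):
    h = np.maximum(w1 @ x, 0.0)                 # one ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h)[0]))   # scalar squashed to (0, 1)

w1a, w2a = rng.normal(size=(8, 2 * EMB_DIM)), rng.normal(size=(1, 8))
w1b, w2b = rng.normal(size=(8, 2 * EMB_DIM + 1)), rng.normal(size=(1, 8))

def forward(user_id, obj_id):
    # Embedding layer + "feature crossing" by concatenation (a stand-in).
    fused = np.concatenate([user_emb_table[user_id], obj_emb_table[obj_id]])
    first = mlp(fused, w1a, w2a)                              # e.g. click probability
    second = mlp(np.concatenate([fused, [first]]), w1b, w2b)  # sees the first value
    return first, second

p1, p2 = forward(user_id=3, obj_id=7)
```

In the patent the second predicted value reflects a consumption amount rather than a probability; here both heads emit a bounded score purely for brevity of the sketch.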
3. The method of claim 2, wherein the feature crossing layer comprises a Wide & Deep network, a DeepFM network, or a Deep & Cross network.
4. The method of claim 2, wherein the decision value is positively correlated with the first predicted value and/or the second predicted value.
5. The method of claim 1, the candidate object being a user interest;
the user characteristics reflect at least personal attributes and historical consumption behaviors of the target user; the personal attributes include at least one of: income, age, and occupation;
the object features reflect at least cost information of the candidate objects.
6. The method of claim 5, wherein the historical consumption behavior comprises user interests historically clicked by the target user and/or goods historically purchased by the target user.
7. The method of claim 5, wherein the two or more predicted values include a first predicted value and a second predicted value, the first predicted value reflecting a probability of the target user clicking on the candidate object, and the second predicted value reflecting an amount consumed by the target user through the candidate object.
8. A system for determining a recommended object based on multi-task prediction, comprising:
a feature acquisition module, configured to acquire user characteristics of a target user and object characteristics of at least one candidate object;
a decision value determination module, configured to perform the following processing on each of the at least one candidate object by using a recommendation model to obtain at least one decision value: processing the user characteristics and the object characteristics through the recommendation model, and determining two or more predicted values corresponding to the candidate object; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to a target task; and determining a decision value corresponding to the candidate object based on the two or more predicted values, wherein the decision value reflects the completion degree of the target task;
a target object determination module, configured to determine a target object recommended to the target user from the at least one candidate object based on the at least one decision value.
9. An apparatus for determining a recommended object based on multi-task prediction, comprising a processor configured to perform the method of any of claims 1-7.
10. A training method for a recommendation model based on multi-task prediction, comprising:
obtaining a plurality of training samples carrying labels; the training sample comprises a sample user characteristic and a sample object characteristic, and the label is used for representing the completion degree of a target task;
iteratively updating parameters of an initial recommendation model based on the plurality of training samples to reduce the loss function values corresponding to the training samples, so as to obtain a trained recommendation model;
wherein, the loss function value corresponding to each training sample is determined by the following process:
processing the sample user characteristics and the sample object characteristics through the recommendation model to obtain two or more predicted values, and determining a decision value based on the two or more predicted values; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task;
and determining the value of the loss function at least based on the difference between the decision value and the label corresponding to the training sample.
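The training procedure of claim 10 can be sketched on a toy model as follows: two linear heads stand in for the preset-task predictions, the decision value is their product, and the loss is the squared difference between the decision value and the label. Finite-difference gradients replace backpropagation for brevity; all data and hyperparameters are synthetic assumptions, not values from the patent.

```python
# Toy sketch of the claim-10 training loop: iteratively update model
# parameters to reduce the loss between each sample's decision value
# (product of two per-task predictions) and its label.

def decision_value(params, x):
    a, b = params
    first = a * x    # stand-in for the first preset task's prediction
    second = b * x   # stand-in for the second preset task's prediction
    return first * second

def loss(params, sample):
    x, label = sample
    return (decision_value(params, x) - label) ** 2

def train(samples, steps=500, lr=0.005, eps=1e-6):
    params = [0.5, 0.5]
    for _ in range(steps):
        for sample in samples:
            for i in range(len(params)):
                # Finite-difference estimate of d(loss)/d(params[i]).
                shifted = list(params)
                shifted[i] += eps
                grad = (loss(shifted, sample) - loss(params, sample)) / eps
                params[i] -= lr * grad
    return params

# Labels consistent with decision value x * x (i.e. a = b = 1).
data = [(1.0, 1.0), (2.0, 4.0)]
params = train(data)
final_loss = sum(loss(params, s) for s in data)
```

After training, the loss is far below its initial value, as the claim's iterative update requires.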
11. The method of claim 10, the sample object being a user interest;
the sample user characteristics reflect at least personal attributes and historical consumption behavior of the sample user; the personal attributes include at least one of: income, age, and occupation;
the sample object characteristics reflect at least cost information of the sample object.
12. The method of claim 11, wherein the historical consumption behavior comprises user interests historically clicked by the sample user and/or goods historically purchased by the sample user.
13. The method of claim 11, wherein the two or more predicted values include a first predicted value and a second predicted value, the first predicted value reflecting a probability of the sample user clicking on the sample object, and the second predicted value reflecting an amount consumed by the sample user through the sample object.
14. The method of claim 11, wherein at least one training sample and at least one other training sample in the plurality of labeled training samples respectively correspond to multiple click events of a same sample user on a same sample object;
for each of the at least one training sample:
determining the loss function value corresponding to the training sample based at least on the difference between the decision value and the label corresponding to the training sample comprises: determining the loss function value based on the difference between the decision value and the sum of the label corresponding to the training sample and a part of the labels corresponding to the at least one other training sample, wherein the click events corresponding to the at least one other training sample are later than the click event corresponding to the training sample.
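The claim-14 label construction can be sketched as follows: when the same sample user clicks the same sample object several times, the target compared with a sample's decision value is that sample's own label plus labels of the later click events. This sketch sums all later labels, whereas the claim only requires "a part" of them; the squared-error loss and all data are illustrative assumptions.

```python
# Sketch of the claim-14 cumulative-label loss for repeated clicks by
# one sample user on one sample object, ordered by click time.

def cumulative_losses(labels, decision_values):
    """labels: per-click labels ordered by click time;
    decision_values: the model's decision value for each sample."""
    losses = []
    for i, value in enumerate(decision_values):
        target = sum(labels[i:])  # own label + labels of all later clicks
        losses.append((value - target) ** 2)
    return losses

# Three clicks on the same object with labels 5, 0 and 3 (e.g. amounts
# consumed); decision values chosen to match the cumulative targets.
losses = cumulative_losses([5.0, 0.0, 3.0], decision_values=[8.0, 3.0, 3.0])
```

Earlier samples thus carry the "remaining value" of the whole click sequence, so the model learns to predict long-horizon outcomes at the first click.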
15. A system for training a recommendation model based on multi-task prediction, comprising:
a training sample acquisition module, configured to acquire a plurality of training samples carrying labels; the training sample comprises a sample user characteristic and a sample object characteristic, and the label is used for representing the completion degree of a target task;
a parameter adjustment module, configured to iteratively update parameters of an initial recommendation model based on the plurality of training samples to reduce the loss function values corresponding to the training samples, so as to obtain a trained recommendation model;
wherein, the loss function value corresponding to each training sample is determined by the following process:
processing the sample user characteristics and the sample object characteristics through the recommendation model to obtain two or more predicted values, and determining a decision value based on the two or more predicted values; wherein the two or more predicted values are respectively related to two or more preset tasks, and the two or more preset tasks are related to the target task;
and determining a loss function value corresponding to the training sample at least based on the difference between the decision value and the label corresponding to the training sample.
16. A training apparatus for a recommendation model based on multi-task prediction, comprising a processor configured to perform the method of any one of claims 10-14.
CN202010329692.XA 2020-04-24 2020-04-24 Method and system for determining recommended object based on multi-task prediction Active CN111242752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010329692.XA CN111242752B (en) 2020-04-24 2020-04-24 Method and system for determining recommended object based on multi-task prediction

Publications (2)

Publication Number Publication Date
CN111242752A true CN111242752A (en) 2020-06-05
CN111242752B CN111242752B (en) 2020-08-14

Family

ID=70871985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010329692.XA Active CN111242752B (en) 2020-04-24 2020-04-24 Method and system for determining recommended object based on multi-task prediction

Country Status (1)

Country Link
CN (1) CN111242752B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232510A (en) * 2020-12-14 2021-01-15 蚂蚁智信(杭州)信息技术有限公司 Training and information recommendation method and device for multi-target recommendation model
CN112258260A (en) * 2020-08-14 2021-01-22 北京沃东天骏信息技术有限公司 Page display method, device, medium and electronic equipment based on user characteristics
CN112819507A (en) * 2020-12-31 2021-05-18 北京嘀嘀无限科技发展有限公司 Service pushing method and device, electronic equipment and readable storage medium
CN112905897A (en) * 2021-03-30 2021-06-04 杭州网易云音乐科技有限公司 Similar user determination method, vector conversion model, device, medium and equipment
CN113344647A (en) * 2021-07-14 2021-09-03 杭州网易云音乐科技有限公司 Information recommendation method and device
CN113469752A (en) * 2021-07-22 2021-10-01 北京沃东天骏信息技术有限公司 Content recommendation method and device, storage medium and electronic equipment
WO2023060578A1 (en) * 2021-10-15 2023-04-20 Baidu.Com Times Technology (Beijing) Co., Ltd. Systems and methods for multi-task and multi-scene unified ranking
CN116055074A (en) * 2021-10-27 2023-05-02 北京字节跳动网络技术有限公司 Method and device for managing recommendation strategy
CN116252306A (en) * 2023-05-10 2023-06-13 中国空气动力研究与发展中心设备设计与测试技术研究所 Object ordering method, device and storage medium based on hierarchical reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294618A (en) * 2016-08-01 2017-01-04 北京百度网讯科技有限公司 Searching method and device
CN107679945A (en) * 2017-09-27 2018-02-09 北京小度信息科技有限公司 Method for establishing a consumer object recommendation model, and associated method and device
CN108304512A (en) * 2018-01-19 2018-07-20 北京奇艺世纪科技有限公司 Coarse ranking method and device for a video search engine, and electronic equipment
CN110008399A (en) * 2019-01-30 2019-07-12 阿里巴巴集团控股有限公司 Training method and device for a recommendation model, and recommendation method and device
CN110019163A (en) * 2017-12-05 2019-07-16 北京京东尚科信息技术有限公司 Method, system, device, and storage medium for object feature prediction and recommendation
CN110414690A (en) * 2018-04-28 2019-11-05 第四范式(北京)技术有限公司 Method and device for performing prediction using a machine learning model




Similar Documents

Publication Publication Date Title
CN111242752B (en) Method and system for determining recommended object based on multi-task prediction
CN111586162A (en) Information pushing method and system
US11880781B2 (en) Autonomous sourcing and category management
US20230064400A1 (en) Multi-modal routing engine and processing architecture for orchestration of variable lending repayment terms using cryptocurrency collateral
CN111311384A (en) Method and system for training recommendation model
CN112381303A (en) Task index data prediction method and system
CN111340244A (en) Prediction method, training method, device, server and medium
CN115147144A (en) Data processing method and electronic equipment
KR102311107B1 (en) Customer exit prevention method that provides a solution to prevent customer from leaving the deep learning solution platform that automatically creates a deep learning model, perfomred by a system to avoid leaving the customer
CN114330837A (en) Object processing method and device, computer equipment and storage medium
CA3037134A1 (en) Systems and methods of generating a pooled investment vehicle using shared data
US20240078555A1 (en) Using transaction data to present search results
CN116611499A (en) Method and apparatus for training reinforcement learning system for automatic bidding
KR102485355B1 (en) Method, device and system for providing online and offline sales event mediation platform service based on artificial intelligence
CN116091242A (en) Recommended product combination generation method and device, electronic equipment and storage medium
Lin et al. A deep reinforcement learning framework for optimal trade execution
KR102284440B1 (en) Method to broker deep learning model transactions perfomed by deep learning model transaction brokerage servers
Corona-Bermudez et al. On the computation of optimized trading policies using deep reinforcement learning
US20230401417A1 (en) Leveraging multiple disparate machine learning model data outputs to generate recommendations for the next best action
US11880765B2 (en) State-augmented reinforcement learning
US20210256509A1 (en) Planning currency exchange transactions based on international travel budgeting information
KR102453673B1 (en) System for sharing or selling machine learning model and operating method thereof
US20230401416A1 (en) Leveraging multiple disparate machine learning model data outputs to generate recommendations for the next best action
Cruz et al. An Algorithmic Trading Strategy for the Colombian US Dollar Inter-bank Bulk Market SET-FX Based on an Evolutionary TPOT AutoML Predictive Model
Gaegauf Essays in finance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant