CN111292122A - Method and apparatus for facilitating user to perform target behavior for target object - Google Patents

Method and apparatus for facilitating user to perform target behavior for target object

Info

Publication number
CN111292122A
CN111292122A (application CN202010047694.XA)
Authority
CN
China
Prior art keywords
recommendation
user
candidate
interest
equity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010047694.XA
Other languages
Chinese (zh)
Inventor
Fu Dapeng (付大鹏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010047694.XA priority Critical patent/CN111292122A/en
Publication of CN111292122A publication Critical patent/CN111292122A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of this specification provide a method for prompting a user to perform a target behavior with respect to a target object during a target object recommendation task, comprising: determining a recommendation equity for the user from a set of candidate equities based on the recommendation coefficient of each candidate equity, and providing the recommendation equity on the user's terminal device to perform an equity recommendation operation, wherein the recommendation equity prompts the user to perform at least one target behavior with respect to the target object; acquiring, from the terminal device, the user's response behavior data for the recommendation equity; determining a benefit of the equity recommendation operation based on the response behavior data; and updating the recommendation coefficients of the candidate equities based on the benefit.

Description

Method and apparatus for facilitating user to perform target behavior for target object
Technical Field
Embodiments of this specification relate to the field of computer technology, and in particular to a method and apparatus for prompting a user to perform a target behavior with respect to a target object.
Background
In practice, an enterprise or organization often needs a user to perform one or more specified target behaviors in order to complete a business process. Providing certain equities (incentives) to users can stimulate them to perform the target behaviors, thereby meeting the corresponding business requirements. For example, bonus points can be offered for viewing a web page, so that the information on that page reaches the user. Users with different preferences respond differently to the same incentive, so providing the same equity to all users indiscriminately reduces the incentive effect. There is therefore a need in the art for a technique that can effectively prompt a user to perform a target behavior.
Disclosure of Invention
In view of the foregoing, the present specification provides a method and apparatus for facilitating a user to perform a target behavior with respect to a target object during a target object recommendation task.
According to an aspect of embodiments of the present specification, there is provided a method for prompting a user to perform a target behavior with respect to a target object during a target object recommendation task, comprising: determining a recommendation equity for the user from a set of candidate equities based on the recommendation coefficient of each candidate equity, and providing the recommendation equity on the user's terminal device to perform an equity recommendation operation, wherein the recommendation equity prompts the user to perform at least one target behavior with respect to the target object; acquiring, from the terminal device, the user's response behavior data for the recommendation equity; determining a benefit of the equity recommendation operation based on the response behavior data; and updating the recommendation coefficients of the candidate equities based on the benefit.
Optionally, in one example, the user may comprise a user group, and each candidate equity has a recommendation coefficient for each user. Determining the recommendation equity from the candidate equity set and providing it on the user's terminal device may include: for each user in the user group, determining the recommendation equity for that user from the candidate equity set based on each candidate equity's recommendation coefficient for that user, and providing the recommendation equity on that user's terminal device. Updating the recommendation coefficients based on the benefit may include: updating each candidate equity's recommendation coefficient for each user based on the benefit.
Optionally, in one example, the recommendation coefficient may be a recommendation probability, the target object recommendation task may have a task duration, and each candidate equity's recommendation probability for each user may obey a predetermined probability distribution having a probability distribution parameter. When the task duration is below a predetermined threshold, determining the recommendation equity for the user may include: for each candidate equity, sampling a random probability for the user from the probability distribution corresponding to that candidate equity's recommendation probability for the user; and determining the recommendation equity for the user from the candidate equities based on the sampled random probabilities. Updating the recommendation coefficients based on the benefit may include: updating, based on the benefit, the probability distribution parameters of the candidate equities corresponding to the recommendation equities of the respective users.
Optionally, in one example, the user may comprise a user group, the target object recommendation task may have a task duration, and each candidate equity may have a single recommendation coefficient shared by all users in the user group. When the task duration is not less than a predetermined threshold, determining and providing the recommendation equity may include: determining the recommendation equity for all users from the candidate equities based on each candidate equity's recommendation coefficient, and providing it on the terminal devices of all users. Updating the recommendation coefficients based on the benefit may include: updating the recommendation coefficient of the candidate equity corresponding to the recommendation equity based on the benefit.
Optionally, in one example, the candidate recommendation equities may be configured based on a recommendation equity push opportunity, a recommendation equity push channel, and a recommendation equity type.
Optionally, in one example, the target behavior may include at least one of click, conversion, and reach, and the benefit may include at least one of click rate, conversion rate, and reach rate.
Optionally, in one example, the benefit may also include an equity push cost.
According to another aspect of embodiments of the present specification, there is also provided an apparatus for prompting a user to perform a target behavior with respect to a target object during a target object recommendation task, comprising: a recommendation operation execution unit that determines a recommendation equity for the user from the candidate equity set based on each candidate equity's recommendation coefficient and provides the recommendation equity on the user's terminal device to perform an equity recommendation operation, wherein the recommendation equity prompts the user to perform at least one target behavior with respect to the target object; a response behavior data acquisition unit that acquires, from the terminal device, the user's response behavior data for the recommendation equity; a benefit determination unit that determines a benefit of the equity recommendation operation based on the response behavior data; and a recommendation coefficient updating unit that updates each candidate equity's recommendation coefficient based on the benefit.
Optionally, in one example, the user may comprise a population of users, each of the candidate interests in the set of candidate interests having a recommendation coefficient for each user. The recommendation operation execution unit may determine, for each user in the user group, a recommendation interest for the user from the candidate interest set based on the recommendation coefficient for the user for each candidate interest, and provide the recommendation interest for the user on the terminal device of the user. The recommendation coefficient updating unit may update the recommendation coefficient of the respective candidate equity for the respective user based on the profit.
Optionally, in one example, the recommendation coefficient may be a recommendation probability, the target object recommendation task may have a task duration, and each candidate equity's recommendation probability for each user may obey a predetermined probability distribution having a probability distribution parameter. The recommendation operation execution unit may include: a random probability determining module that, when the task duration is below a predetermined threshold, samples a random probability for the user from the probability distribution corresponding to each candidate equity's recommendation probability for the user; a recommendation equity determining module that determines the recommendation equity for the user from the candidate equities based on the sampled random probabilities; and a recommendation operation execution module that provides each user's recommendation equity on that user's terminal device. The recommendation coefficient updating unit may update, based on the benefit, the probability distribution parameters of the candidate equities corresponding to the recommendation equities of the respective users.
Optionally, in one example, the user may comprise a user group, the target object recommendation task may have a task duration, and each candidate equity may have a single equity recommendation coefficient shared by all users in the user group. When the task duration is not less than a predetermined threshold, the recommendation operation execution unit may determine the recommendation equity for all users from the candidate equities based on each candidate equity's recommendation coefficient, and provide it to the terminal devices of all users. The recommendation coefficient updating unit may update the recommendation coefficient of the candidate equity corresponding to the recommendation equity based on the benefit.
According to another aspect of embodiments of the present specification, there is also provided a computing device including: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method as described above.
According to another aspect of embodiments herein, there is also provided a non-transitory machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method as described above.
With the method and apparatus of the embodiments of this specification, the recommendation equity for the user is determined based on the recommendation coefficients of the candidate equities, and those coefficients are then updated based on the user's response behavior data for the equity recommendation operation. The recommendation coefficients thus move closer to the user's preferences, improving the benefit obtained by subsequent equity recommendation operations performed with them.
Drawings
A further understanding of the nature and advantages of the contents of the embodiments of the present specification may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals. The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the detailed description serve to explain the embodiments of the invention. In the drawings:
FIG. 1 is a flow diagram of a method for facilitating a user to perform a target behavior with respect to a target object in accordance with one embodiment of the present description;
FIG. 2 is a schematic diagram for illustrating a method of facilitating a user to perform a target behavior with respect to a target object in accordance with an embodiment of the present description;
FIG. 3 is a flow diagram of a method for facilitating a user to perform a target behavior with respect to a target object in accordance with another embodiment of the present description;
FIG. 4 is a flow diagram of one example of a recommendation interest determination process in a method for facilitating a user to perform a target behavior with respect to a target object, according to another embodiment of the present description;
FIG. 5 is an exemplary diagram for explaining a recommendation interest determination process in a method for facilitating a user to perform a target behavior with respect to a target object according to another embodiment of the present specification;
FIG. 6 is a flow diagram of a method for facilitating a user to perform a target behavior with respect to a target object in accordance with another embodiment of the present description;
FIG. 7 is an exemplary diagram illustrating a method of facilitating a user to perform a target behavior with respect to a target object in accordance with another embodiment of the present description;
FIG. 8 is a block diagram of an apparatus for facilitating a user in performing a target behavior with respect to a target object, according to one embodiment of the present description;
FIG. 9 is a block diagram of an example of a recommended task execution unit in the apparatus for facilitating a user to perform a target behavior with respect to a target object shown in FIG. 8; and
FIG. 10 is a block diagram of a computing device that is configured to facilitate a method of a user performing a target behavior with respect to a target object, according to one embodiment of the specification.
Detailed Description
The subject matter described herein will be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the embodiments of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
The method and apparatus for facilitating a user to perform a target behavior with respect to a target object of embodiments of the present specification are now described with reference to the drawings.
FIG. 1 is a flow diagram of a method for facilitating a user in performing a target behavior with respect to a target object during a target object recommendation task, according to one embodiment of the present description.
As shown in FIG. 1, at block 120, a recommendation equity for the user is determined from the set of candidate equities based on the recommendation coefficients of the respective candidate equities, and the recommendation equity is recommended to the user. The recommendation coefficient indicates how likely a candidate equity is to be recommended to the user: a candidate equity with a larger coefficient is more likely to be recommended, and one with a smaller coefficient less likely. Conversely, the recommendation coefficient can be read as indicating the user's preference for each candidate equity. The recommendation coefficient may be, for example, a recommendation probability.
An equity is an incentive provided to the user to prompt the user to perform a target behavior with respect to a target object. A candidate equity in the candidate equity set may be, for example, a coupon, bonus credit, a discount, sweepstakes eligibility, or a combination of these. The target object may be a commodity, a web page, or textual information (e.g., a questionnaire or questions to be answered). The target behavior may be, for example, a reach operation, a click operation, or a conversion operation. A reach operation means the target object is exposed to the user after the user performs some action; for example, after logging in to an application or website, the user sees an advertisement or commodity presented there. A conversion operation generally refers to the user action the enterprise desires; for example, purchasing the good or service corresponding to an advertisement after clicking on it. As another example, the target object may be a questionnaire on some matter, and the target behavior may be replying to the questionnaire; in that case the conversion behavior may be answering and submitting the questions after clicking into the questionnaire page.
The initial values of the recommendation coefficients of the candidate equities may be determined based on the user's characteristic data, for example using a recommendation coefficient prediction model. In another example, the recommendation coefficients may be randomly initialized, or all set equal; in the latter case, at least one candidate equity may be chosen at random from the candidate equity set as the recommendation equity in the first round of equity recommendation operations. After the initial values are determined, the recommendation coefficients may be updated after each round (or several rounds) of equity recommendation operations, and each pre-update value may be recorded as a historical value. As an example, the one or more candidate equities with the highest recommendation coefficients may be recommended to the user as recommendation equities. The recommendation equity may also be determined from both the current and historical coefficient values; for example, the current and historical values may be averaged, and the at least one candidate equity with the largest average determined as the recommendation equity for the user.
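The initialization and selection rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the averaging of current and historical coefficient values is stated in the text, but the function names and data shapes are assumptions.

```python
import random

def choose_recommendation(coeffs, history, top_k=1):
    """Pick recommendation equities from candidate coefficients.

    coeffs:  {equity_id: current recommendation coefficient}
    history: {equity_id: [historical coefficient values]}
    Score each candidate by the average of its current and
    historical values, then take the top_k highest.
    """
    def score(eid):
        values = history.get(eid, []) + [coeffs[eid]]
        return sum(values) / len(values)

    ranked = sorted(coeffs, key=score, reverse=True)
    return ranked[:top_k]

# Equal/random initialization for a first round, as the text allows:
candidates = ["coupon", "credit", "discount", "sweepstakes"]
coeffs = {c: random.random() for c in candidates}
first_round = random.choice(candidates)  # random pick when no signal yet
```

In later rounds, `choose_recommendation` would be called with the updated coefficients and the recorded historical values.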
The recommendation equity may be recommended to the user upon detecting that the user opens or logs in to a specified application, or browses a specified page. In one example, an application on the user's device can be monitored for an activation event, such as the user opening or logging in to the application. If a login request message for the application, or a page data request message for a specified page, is received from the user's terminal device, it can be determined that the user has opened or logged in to the application, or is browsing the specified page. When the activation event is detected, the recommendation equity for the user can be provided on the user's terminal device, thereby recommending it to the user. Alternatively, when an activation event is detected, the recommendation equity may be determined based on the obtained user information and an equity recommendation policy: if at least one recommendation equity for the user can be decided from the user information and the policy, the decided equity is recommended to the user; if the user information cannot satisfy the policy's decision requirements and no recommendation equity can be decided, equity recommendation can instead be carried out with the method of the embodiments of this specification. As an example, the equity recommendation policy may be generated from the user's age, gender, location, and consumption history within a recent predetermined period, e.g., "a coupon may be recommended to a user who satisfies the following conditions: female, aged 20-30, purchased a bag in the last month."
When the user's consumption history cannot be obtained, the recommendation equity cannot be decided from the equity recommendation policy, so the recommendation equity can instead be determined based on the recommendation coefficients.
After the recommendation equity is determined, it may be provided on the user's terminal device for a predetermined period, until the recommendation equity for the next round of equity recommendation operations is determined.
After the recommendation equity is provided on the user's terminal device, the user can see the recommendation equity displayed there. If interested, the user performs the target behavior corresponding to the recommendation equity; if not, the user may make no response.
Then, at block 140, the user's response behavior data for the recommendation equity is obtained from the user's terminal device. The response behavior data may indicate whether the user performed a target behavior such as a reach, click, conversion, or reply within a predetermined period after the recommendation equity was provided on the terminal device.
After the response behavior data is obtained, at block 160, the benefit of the equity recommendation operation is determined based on the user's response behavior data for the recommendation equity. The benefit represents the positive response effect of the user to the equity recommendation operation. When the target behavior is a reach, click, or conversion, the benefit may be, for example, the reach rate, click rate, or conversion rate; when the target behavior is a reply to a predetermined question, the benefit may be the response rate. As an example, user behavior data such as clicks and conversions may be collected at a predetermined time interval, and the click rate, conversion rate, etc. determined from it as the benefit. The benefit may also be defined by the target object recommendation task. For example, if the task is to place an advertisement within an application and the task expects the users' click rate on the advertisement to be no lower than a predetermined value, the click rate of the user group may be taken as the benefit. Different ranges of click rate or conversion rate may also map to different benefit values; for example, the benefit may be negative or zero when the click rate falls short of the expected value, and positive when it reaches the expected value.
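The benefit computation described above can be sketched as a small reward-shaping function. The thresholding and the exact shaping below are illustrative assumptions; the text only says the benefit may be negative or zero below the expected click rate, positive at or above it, and that an equity push cost may also be taken into account.

```python
def compute_benefit(clicks, impressions, expected_ctr=0.05, push_cost=0.0):
    """Map an observed click-through rate to a benefit (reward) value.

    Positive when the CTR meets the expected value, negative when it
    falls short; a push cost (assumed here as a simple deduction)
    lowers the benefit.
    """
    ctr = clicks / impressions if impressions else 0.0
    benefit = ctr - expected_ctr  # > 0 above expectation, < 0 below
    return benefit - push_cost
```

A conversion rate or response rate could be plugged in the same way when the target behavior is a conversion or a reply.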
After the benefit of the equity recommendation operation is determined, the recommendation coefficients of the candidate equities are updated based on the benefit at block 180. For example, if a positive benefit is obtained, the recommendation coefficient of the candidate equity chosen as the recommendation equity may be increased; if a negative benefit is obtained, it may be decreased. The updated coefficients are used to determine the recommendation equity for the next round of equity recommendation operations. After multiple rounds of equity recommendation and coefficient updates, the recommendation coefficients move closer to the user's preferences, the equity recommendation operations obtain higher benefits, and the target object can be recommended to the user more efficiently.
A reinforcement learning model may be employed to determine the recommendation equity and to update the recommendation coefficients of the candidate equities based on the benefit. FIG. 2 is a schematic diagram explaining the reinforcement learning model. As shown in FIG. 2, its decision process involves an execution environment, states, actions, and benefits (rewards). An agent perceives the execution environment it is in; the state is the agent's representation of that environment. For the perceived state, the agent determines an action according to a predetermined policy and performs it. The performed action may change the state of the environment, transitioning it to a next state, and the transition yields a certain benefit: positive if the next state is the desired state or close to it, negative if it is not or is far from it. From the benefit obtained, the agent can judge whether the performed action was a "good" action or a "bad" action and update the policy it uses to choose actions. The reinforcement learning model can thus adapt its policy to the execution environment through trial and error. Here, the equity recommendation operation serves as the action, and the benefit determined from the user's response behavior data for the recommended equity serves as the reward, so the recommendation coefficients of the candidate equities can be updated by performing one or more rounds of equity recommendation operations.
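The recommend/observe/update loop described above behaves like a multi-armed bandit. The sketch below is one possible instantiation, not the patented model: the epsilon-greedy exploration, the neutral initial value, and the learning rate are assumptions; the text only specifies that coefficients rise on positive benefit and fall on negative benefit.

```python
import random

class EquityRecommender:
    """Minimal bandit-style sketch of the equity recommendation loop:
    choose an equity (action), observe a benefit (reward), and nudge
    the chosen equity's recommendation coefficient accordingly."""

    def __init__(self, candidates, epsilon=0.1, lr=0.1):
        self.coeffs = {c: 0.5 for c in candidates}  # neutral start
        self.epsilon = epsilon  # exploration rate (assumed)
        self.lr = lr            # update step size (assumed)

    def recommend(self):
        # Action: usually the highest-coefficient equity,
        # occasionally a random one to keep exploring.
        if random.random() < self.epsilon:
            return random.choice(list(self.coeffs))
        return max(self.coeffs, key=self.coeffs.get)

    def update(self, equity, benefit):
        # Reward: positive benefit raises the coefficient,
        # negative benefit lowers it.
        self.coeffs[equity] += self.lr * benefit
```

Each round would call `recommend()`, push the equity to the terminal device, derive the benefit from the response behavior data, and call `update()`.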
In one example, when updating the recommendation coefficients, user behavior data collected since the recommendation equity was last determined may be gathered, and a recommendation coefficient model may predict each candidate equity's recommendation coefficient for the user from the newly collected data. The coefficients may then be updated from both the predicted coefficients and the benefit determined in this round. For example, for a candidate equity not chosen as the recommendation equity last time, the predicted coefficient may simply become the new coefficient; for a candidate equity that was chosen, weights may be assigned to the benefit and the predicted coefficient, and the new coefficient computed as their weighted combination.
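The weighted combination just described can be sketched in a few lines. The 50/50 weighting is purely illustrative; the text only says weights are assigned to the benefit and the predicted coefficient for equities that were recommended.

```python
def weighted_update(predicted, benefit, was_recommended, w_benefit=0.5):
    """Combine a model-predicted coefficient with this round's benefit.

    Non-recommended equities take the predicted value directly;
    recommended equities blend benefit and prediction by weight.
    """
    if not was_recommended:
        return predicted
    return w_benefit * benefit + (1 - w_benefit) * predicted
```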
When the equity recommendation operation is performed, the user may be a single user or a user group. A user group may be obtained by clustering collected users based on user information, either manually or with a machine learning algorithm. User information may include profile information (e.g., age, gender, occupation, locale), behavior information, and device information (e.g., device type, application version). After clustering, users with similar preferences fall into the same group, so the benefit determined during the equity recommendation operation reflects more truly the benefit the operation brings for each user in the group, and equity recommendation operations that would bring a benefit are less likely to be overlooked. Taking click rate as the benefit, for example: if in the current round some users clicked the target object but the overall click rate did not reach the expected value, the round would be judged a non-positive benefit, and equity recommendation operations that actually had a positive effect for some users would be ignored. Clustering users with similar preferences into one group and performing equity recommendation operations per group therefore minimizes such overlooked operations.
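A very simple cohort grouping, standing in for the manual or machine-learning clustering mentioned above, might look like this. The choice of features (age band and gender) is an illustrative assumption; a real system could use any of the profile, behavior, and device features listed.

```python
from collections import defaultdict

def cluster_users(users):
    """Group users into cohorts by coarse profile features.

    Returns {(age_band, gender): [user ids]} so that users with
    presumably similar preferences share one equity recommendation
    coefficient set.
    """
    groups = defaultdict(list)
    for u in users:
        key = (u["age"] // 10 * 10, u["gender"])  # e.g. the 20s-female cohort
        groups[key].append(u["id"])
    return dict(groups)

users = [
    {"id": 1, "age": 24, "gender": "F"},
    {"id": 2, "age": 27, "gender": "F"},
    {"id": 3, "age": 41, "gender": "M"},
]
# users 1 and 2 fall into one cohort, user 3 into another
```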
In recommending equity to a user population, each candidate equity may have a recommendation coefficient for each user. In this example, the recommendation interests may be determined for the respective users based on the recommendation coefficients for the respective users, respectively, as shown in fig. 3. FIG. 3 is a flow diagram of a method for facilitating a user to perform a target behavior with respect to a target object in accordance with another embodiment of the present description.
As shown in FIG. 3, for each user in the user population, a recommendation interest for the user is determined from the set of candidate interests based on the recommendation coefficient for the user for each candidate interest, and the recommendation interest is provided on the terminal device of the user, at block 320. At block 340, response behavior data of each user for the recommendation interests is obtained from the terminal device of each user.
After the response behavior data is obtained, at block 360, the benefit of the equity recommendation operation is determined based on the response behavior data of the respective users. Taking the target behavior being a conversion behavior as an example, the conversion rate of the user group may be determined from the response behavior data of each user and used as the benefit.
After the benefit is determined, the recommendation coefficients of each candidate interest for each user are updated based on the benefit at block 380.
The target object recommendation task may have a task duration. When the task duration is below a predetermined threshold, the target object recommendation task is a short-term recommendation task. For a short-term recommendation task, the task duration is short and the user traffic is limited; after the task starts, if the recommendation coefficients of the candidate equities cannot be matched to each user's preference quickly, part of the user traffic is wasted. In the example in which each candidate equity has a recommendation coefficient for each user, the process shown in fig. 4 may be used for short-term recommendation tasks to match the recommendation coefficients to each user's preference quickly, thereby avoiding wasted user traffic and maximizing the positive effect of the equity recommendation operations.
FIG. 4 is a flow diagram of one example of a recommendation interest determination process in a method for facilitating a user to perform a target behavior with respect to a target object, according to another embodiment of the present description. Fig. 5 is an exemplary diagram for explaining a recommendation interest determination process in the method for facilitating a user to perform a target behavior with respect to a target object according to this example of the present specification. In the example shown in fig. 4, the recommendation coefficient is a recommendation probability, and the recommendation probabilities for respective candidate benefits for respective users obey a predetermined probability distribution, the probability distribution having a probability distribution parameter.
As shown in FIG. 4, for each candidate interest in the set of candidate interests, a random probability for the candidate interest for the user is determined based on a probability distribution corresponding to the probability of recommendation for the user for the candidate interest, at block 402. The following description will take an example in which the recommendation probability follows a Beta distribution.
FIG. 5 shows an example of a Beta distribution, where the horizontal axis may represent the recommendation probability of a candidate equity for the user. f(x; α, β) in FIG. 5 is the probability density function of the Beta distribution, and α and β are the probability distribution parameters: α may represent the number of times a positive benefit was produced in historical equity recommendation operations, and β may represent the number of times no positive benefit was produced. The initial values of α and β may be determined randomly or based on empirical values. FIG. 5 shows the probability density curve for α = 81 and β = 219.
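Assuming the recommendation probability follows Beta(α, β) as described above, a random probability can be drawn with the standard library; the α = 81, β = 219 values correspond to the curve in FIG. 5, which concentrates around α / (α + β) = 0.27:

```python
import random

def sample_recommendation_probability(alpha, beta_param, rng=random):
    """Draw one random recommendation probability from Beta(alpha, beta).

    alpha: count of historical rounds with a positive benefit.
    beta_param: count of historical rounds without a positive benefit.
    """
    return rng.betavariate(alpha, beta_param)

# With alpha=81 and beta=219 the samples cluster near 81 / (81 + 219) = 0.27,
# mirroring the density curve shown in FIG. 5.
```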
After determining the random probability for each candidate benefit for the user, at block 404, a recommendation benefit for the user is determined from each candidate benefit based on the random probability for the user for each candidate benefit. For example, at least one candidate benefit corresponding to the one or more random probabilities with the largest value may be determined as the recommendation benefit for the user.
After determining the recommendation interests for the respective users, at block 406, the recommendation interests for the respective users are provided on the terminal devices of the respective users to recommend the recommendation interests to the respective users. Then, at block 408, response behavior data of each user for the corresponding recommendation interests is obtained from the terminal device of each user.
After the response behavior data of each user is obtained, at block 410, the benefit of the equity recommendation operation is determined based on the response behavior data of each user in the user group for the recommendation equity. After the benefit is determined, the probability distribution parameters of the candidate equities corresponding to each user's recommendation equity are updated based on the benefit, at block 412.
For example, if the user clicks on a target object, the profit may be determined to be +1, and if the user does not click on the target object, the profit may be determined to be 0.
For example, assuming the recommendation probability obeys a Beta distribution, when the click-through rate during the predetermined time interval reaches the expected value (i.e., there is a positive benefit), the α of the candidate equity corresponding to each user's recommendation equity is incremented by one; when the click-through rate does not reach the expected value (i.e., there is no positive benefit), the corresponding β is incremented by one. For example, assume there are five users U1-U5 and three candidate equities O1-O3. Each candidate equity has a recommendation probability for each user, and each recommendation probability obeys its own Beta distribution. In each round of the equity recommendation operation, a recommendation equity is selected for each user according to these distributions, and the distribution parameters of the selected candidate equities are then updated according to the benefit, so that over multiple rounds the recommendation probabilities gradually converge to each user's actual preference.
The above example for short-term recommendation tasks may be implemented based on a multi-armed bandit model in reinforcement learning. The multi-armed bandit algorithm may be, for example, Thompson sampling. The candidate equities serve as the arms of the multi-armed bandit model, and the user response behavior data for the recommendation equities serves as the reward used to train the model over multiple rounds of equity recommendation operations.
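A minimal sketch of the Thompson-sampling bandit described above, treating candidate equities as arms: a positive benefit increments α, its absence increments β, and each round recommends the candidate with the largest randomly drawn probability. Class and method names are illustrative:

```python
import random

class ThompsonSamplingRecommender:
    """Minimal Thompson-sampling bandit over candidate equities (the 'arms')."""

    def __init__(self, candidate_ids):
        # Beta(1, 1) is the uniform prior for every candidate equity.
        self.params = {cid: [1, 1] for cid in candidate_ids}

    def select(self, rng=random):
        # Draw one random probability per candidate and recommend the largest,
        # matching blocks 402-404 of FIG. 4.
        draws = {cid: rng.betavariate(a, b) for cid, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, cid, positive_gain):
        # Positive benefit increments alpha; otherwise beta is incremented.
        if positive_gain:
            self.params[cid][0] += 1
        else:
            self.params[cid][1] += 1
```

Over repeated rounds the parameters shift toward the candidate that most often yields a positive benefit, so exploration fades as the preference becomes clear.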
The target object recommendation task may also be a long-term recommendation task, i.e., one whose task duration is not less than a predetermined duration. For example, the equity recommendation task may be a long-running marketing task or questionnaire task. For a long-term recommendation task, each candidate equity may have a recommendation probability for the user group, and equity recommendation may be performed using the example shown in fig. 3.
In another example, for a long-term recommendation task, each candidate equity may have a single recommendation coefficient for all users in the user group. That is, the user group is treated as a whole, and each candidate equity has one recommendation coefficient for that whole. For example, with five candidate equities, the recommendation coefficients for all users in the user group may be A1, A2, A3, A4, and A5; each candidate equity's coefficient is then the same for every user in the group. In this example, the process shown in fig. 6 may be used. FIG. 6 is a flow diagram of a method for facilitating a user to perform a target behavior with respect to a target object in accordance with another embodiment of the present description.
As shown in FIG. 6, at block 620, a recommendation equity for all users in the user group is determined from the candidate equities based on their recommendation coefficients, and the recommendation equity is provided on the terminal devices of the users in the group. Initial values of the recommendation coefficients of the candidate equities for the user group may be predicted with a recommendation probability prediction model based on the user information in the group, and the coefficients may then be updated based on the benefit of each round of the equity recommendation operation.
After the recommendation equity is recommended to each user, at block 640, the response behavior data of the user group for the equity recommendation operation is obtained from the terminal devices of the users. Then, at block 660, the benefit of the equity recommendation operation is determined based on the response behavior data of each user in the group for the recommendation equity. For example, the click-through rate of the user group in response to the current round may be counted as the benefit.
After the benefit is determined, at block 680, the recommendation coefficient of the candidate equity corresponding to the recommendation equity is updated based on the benefit. For example, the coefficient may be increased when there is a positive benefit and decreased otherwise.
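The increase/decrease rule at block 680 might be sketched as follows; the step size and the clamping bounds are assumptions added for the example:

```python
def update_group_coefficient(coeffs, recommended_id, benefit, step=0.1):
    """Raise the recommended candidate's coefficient on positive benefit, else lower it.

    coeffs: dict mapping candidate-equity id -> group-wide recommendation coefficient.
    recommended_id: the candidate equity that was recommended this round.
    benefit: scalar benefit of the round (positive means positive gain).
    """
    if benefit > 0:
        coeffs[recommended_id] = min(1.0, coeffs[recommended_id] + step)
    else:
        coeffs[recommended_id] = max(0.0, coeffs[recommended_id] - step)
    return coeffs
```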
For a long-term recommendation task, the equity recommendation operation and the recommendation coefficient update may be performed based on a policy gradient model in reinforcement learning. Upon receiving the target object recommendation task, the reinforcement learning model used to execute it may be chosen based on the task duration: when the duration is not less than a predetermined threshold, a policy gradient model may be used. The user characteristic information of each user serves as the state of the policy gradient model, each candidate equity in the candidate equity set serves as an action of the policy gradient model, and the user response behavior data for the recommendation equity serves as the reward.
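A stateless sketch of the policy-gradient idea, with candidate equities as actions and the round's benefit as the reward. Omitting the user-state input keeps the example short, so this illustrates only the gradient update rule, not the full model described in the patent:

```python
import math
import random

class SoftmaxPolicy:
    """REINFORCE-style softmax policy over candidate equities (stateless sketch)."""

    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.theta = {a: 0.0 for a in actions}  # one preference value per action
        self.lr = lr

    def probs(self):
        exps = {a: math.exp(t) for a, t in self.theta.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def select(self, rng=random):
        # Sample an action according to the current softmax probabilities.
        p = self.probs()
        r, acc = rng.random(), 0.0
        for a in self.actions:
            acc += p[a]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, action, reward):
        # Policy-gradient step: d log pi(a) / d theta_b = 1[a == b] - pi(b).
        p = self.probs()
        for a in self.actions:
            grad = (1.0 if a == action else 0.0) - p[a]
            self.theta[a] += self.lr * reward * grad
```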
FIG. 7 is an exemplary diagram for illustrating a method of facilitating a user to perform a target behavior with respect to a target object according to another embodiment of the present description.
As shown in fig. 7, in the target object recommendation task, user behavior information, user device information, environment information, and the like may serve as the state of the reinforcement learning algorithm, the equity recommendation operation serves as the action, and the benefit of the equity recommendation operation serves as the reward. The reinforcement learning algorithm may be, for example, a policy gradient algorithm.
The recommendation probabilities P1-Pn of the candidate equities for a user may be predicted with a recommendation probability prediction model based on that user's information. The recommendation probability of each candidate equity for the user group may likewise be predicted based on the aggregated user information of the group. Each candidate equity may be generated from a recommendation opportunity, a recommendation channel, and an equity category. Recommendation opportunities may include, for example, various periods of the day, dining times, weekdays, and non-weekdays. The recommendation channel refers to how the recommendation equity is displayed; it may, for example, be shown on an application start page or pushed to the user's mobile phone by short message. Equity categories may include red envelopes, top-up coupons, memberships, and so on. Recommendation opportunities, recommendation channels, and equity categories may be combined arbitrarily to generate the candidate equities. The benefit may include one or more of click-through rate, conversion rate, number of newly registered users, and equity recommendation cost. For click-through rate, conversion rate, and number of newly registered users, the benefit value is positive when the expected value is reached, and negative or 0 when it is not. For the equity recommendation cost, a negative benefit value may be set, and its absolute value may be set higher as the cost increases.
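Generating candidate equities as arbitrary combinations of opportunity, channel, and category can be sketched with `itertools.product`; the concrete values below are examples, not an exhaustive list from the patent:

```python
from itertools import product

recommendation_opportunities = ["morning", "lunch", "evening"]
recommendation_channels = ["app_start_page", "sms_push"]
equity_categories = ["red_envelope", "top_up_coupon", "membership"]

# Every (opportunity, channel, category) combination yields one candidate equity,
# so 3 * 2 * 3 = 18 candidates are produced here.
candidate_equities = list(product(recommendation_opportunities,
                                  recommendation_channels,
                                  equity_categories))
```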
Push opportunities, push channels, equity categories, and the like all affect different users' preferences for the recommended equity. By configuring each candidate equity from the push opportunity, push channel, and equity category, the trained reinforcement learning model reflects user preferences more comprehensively, and the benefit of the equity recommendation operation can be maximized. In addition, no separate models need to be designed and trained for the push opportunity, push channel, and equity category, which improves the benefit of the equity recommendation operation while reducing cost. Furthermore, by including the push cost in the benefit, the recommendation cost can be minimized while the benefit of the equity recommendation operation is increased.
If the predicted recommendation probabilities are per user, the recommendation equity recommended to a user may be determined based on the recommendation probabilities of the candidate equities for that user, and those probabilities may then be updated based on the benefit. If the predicted recommendation probability is for the user group as a whole, the recommendation equity is determined based on the recommendation probabilities of the candidate equities for the group and recommended to the users in the group, and the recommendation probability of the corresponding candidate equity is then updated based on the benefit.
FIG. 8 is a block diagram of an apparatus for facilitating a user in performing a target behavior with respect to a target object, according to one embodiment of the present description. As shown in fig. 8, the target behavior promoting apparatus 800 includes a recommending operation performing unit 810, a response behavior data acquiring unit 820, a profit determining unit 830, and a recommendation coefficient updating unit 840.
The recommendation operation execution unit 810 determines a recommendation interest for the user from the candidate interest set based on the recommendation coefficients of the candidate interests, and provides the recommendation interest on the terminal device of the user to execute an interest recommendation operation, wherein the recommendation interest is used for prompting the user to execute at least one target behavior for a target object. The response behavior data acquisition unit 820 acquires response behavior data of the user for the recommendation interest from the terminal device. After the response behavior data is acquired, the profit determination unit 830 determines the profit of the equity recommendation operation based on the response behavior data. The recommendation coefficient updating unit 840 updates the recommendation coefficients of the respective candidate interests based on the profit.
The user may comprise a user population. In one example, the respective candidate interests may have a recommendation coefficient for each user in the user population. In this example, the recommendation operation performing unit 810 may determine, for each user in the user group, a recommendation interest for the user from the candidate interest set based on the recommendation coefficient for the user for each candidate interest, and provide the recommendation interest for the user on the terminal device of the user. The recommendation coefficient updating unit 840 may update the recommendation coefficients for the respective candidate interests for the respective users based on the profits.
In another example, the target object recommendation task may have a task duration, and each candidate equity may have a recommendation coefficient for all users in the user group. In this example, when the task duration is not less than a predetermined threshold, the recommendation operation performing unit 810 may determine the recommendation equity for all users from the candidate equities based on their recommendation coefficients and provide the recommendation equity on the terminal devices of all users. The recommendation coefficient updating unit 840 may update the recommendation coefficient of the candidate equity corresponding to the recommendation equity based on the benefit.
FIG. 9 is a block diagram of an example of the recommendation operation performing unit in the apparatus for facilitating a user to perform a target behavior with respect to a target object shown in FIG. 8. In this example, the recommendation coefficient may include a recommendation probability, the task duration of the target object recommendation task is below a predetermined threshold, the recommendation probability of each candidate equity for each user may follow a predetermined probability distribution, and the probability distribution may have probability distribution parameters. As shown in fig. 9, the recommendation operation performing unit 810 includes a random probability determining module 811, a recommendation interest determining module 812, and a recommendation operation performing module 813.
When the task duration is below the predetermined threshold, the random probability determination module 811 determines, for each candidate equity, a random probability of the candidate equity for the user based on the probability distribution corresponding to the recommendation probability of the candidate equity for the user. The recommendation interest determination module 812 determines a recommendation equity for the user from the candidate equities based on the random probabilities. The recommendation operation execution module 813 provides each user's recommendation equity on that user's terminal device. In this example, the recommendation coefficient updating unit 840 may update the probability distribution parameters of the candidate equities corresponding to each user's recommendation equity based on the benefit.
Embodiments of a method and apparatus for facilitating a user to perform a target behavior with respect to a target object according to embodiments of the present specification are described above with reference to fig. 1 to 9. The details mentioned in the above description of the method embodiments apply equally to the embodiments of the device of the embodiments of the present description.
The apparatus for facilitating the user to perform the target behavior on the target object of the embodiments of the present specification may be implemented by hardware, software, or a combination of hardware and software. The various embodiments in this specification are described in a progressive manner, and identical or similar parts may be referred to each other.
Taking a software implementation as an example, the apparatus is formed as a logical device by the processor of the device in which it is located reading corresponding computer program instructions from storage into memory and running them. In the embodiments of the present specification, the apparatus may be implemented, for example, using a computing device.
FIG. 10 is a block diagram of a computing device that is configured to facilitate a method of a user performing a target behavior with respect to a target object, according to one embodiment of the specification. As shown in fig. 10, the computing device 1000 includes a processor 1010, a storage 1020, a memory 1030, a communication interface 1040, and an internal bus 1050, and the processor 1010, the storage (e.g., a non-volatile storage) 1020, the memory 1030, and the communication interface 1040 are connected together via the bus 1050. According to one embodiment, the computing device 1000 may include at least one processor 1010, the at least one processor 1010 executing at least one computer-readable instruction (i.e., an element described above as being implemented in software) stored or encoded in a computer-readable storage medium (i.e., memory 1020).
In one embodiment, computer-executable instructions are stored in the memory 1020 that, when executed, cause the at least one processor 1010 to: determine a recommendation interest for the user from the candidate interest set based on the recommendation coefficients of the candidate interests, and provide the recommendation interest on the terminal device of the user to perform an interest recommendation operation, wherein the recommendation interest is used for prompting the user to perform at least one target behavior for a target object; acquire response behavior data of the user for the recommendation interest from the terminal device; determine a benefit of the interest recommendation operation based on the response behavior data; and update the recommendation coefficients of the candidate interests based on the benefit.
It should be appreciated that the computer-executable instructions stored in the memory 1020, when executed, cause the at least one processor 1010 to perform the various operations and functions described above in connection with fig. 1-9 in the various embodiments of the present specification.
According to one embodiment, a program product, such as a non-transitory machine-readable medium, is provided. A non-transitory machine-readable medium may have instructions (i.e., elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform various operations and functions described above in connection with fig. 1-9 in various ones of the embodiments of the present specification.
Specifically, a system or apparatus may be provided which is provided with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and causes a computer or processor of the system or apparatus to read out and execute instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Not all steps and elements in the above flows and system structure diagrams are necessary, and some steps or elements may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Although the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the embodiments of the present disclosure are not limited to the specific details of the embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present disclosure within the technical concept of the embodiments of the present disclosure, and all of them fall within the scope of the embodiments of the present disclosure.
The previous description of the contents of the embodiments of the present specification is provided to enable any person skilled in the art to make or use the contents of the embodiments of the present specification. Various modifications to the disclosure of the embodiments herein will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the embodiments herein. Thus, the embodiments of the present specification are not intended to be limited to the examples and designs described herein but are to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for facilitating a user to perform a target behavior with respect to a target object during a target object recommendation task, comprising:
determining recommendation rights for the user from the candidate rights sets based on the recommendation coefficients of the respective candidate rights, and providing the recommendation rights on the terminal device of the user to perform rights recommendation operation, wherein the recommendation rights are used for prompting the user to perform at least one target behavior for a target object;
acquiring response behavior data of a user aiming at the recommendation rights and interests from the terminal equipment;
determining a benefit of the equity recommendation operation based on the response behavior data; and
updating the recommendation coefficients of the candidate interests based on the benefit.
2. The method of claim 1, wherein the user comprises a user group, each candidate interest in the candidate interest set has a recommendation coefficient for each user, determining the recommendation interest for the user from the candidate interest set based on the recommendation coefficients of the candidate interests and providing the recommendation interest on the user's terminal device comprises:
for each user in the user group, determining the recommendation interest for the user from the candidate interest set based on the recommendation coefficient of each candidate interest for the user, and providing the recommendation interest for the user on the terminal equipment of the user,
updating the recommendation coefficients for the respective candidate equity based on the revenue includes:
and updating the recommendation coefficients of the candidate rights and interests for the users based on the income.
3. The method of claim 2, wherein the recommendation coefficient comprises a recommendation probability, the target object recommendation task has a task duration, the recommendation probability of each candidate interest for each user obeys a predetermined probability distribution having probability distribution parameters, and, when the task duration is below a predetermined threshold, determining the recommendation interest for the user from the candidate interest set based on the recommendation coefficients of the candidate interests for the user comprises:
for each candidate interest, determining the random probability of the candidate interest for the user based on the probability distribution corresponding to the recommendation probability of the candidate interest for the user;
determining a recommended equity for the user from the respective candidate equity based on the random probability for the respective candidate equity for the user,
updating the recommendation coefficients for the respective candidate equity based on the revenue includes:
updating probability distribution parameters corresponding to candidate benefits corresponding to the recommended benefits of the respective users based on the benefits.
4. The method of claim 1, wherein the user comprises a user group, the target object recommendation task has a task duration, each candidate interest has a recommendation coefficient for all users in the user group, and, when the task duration is not below a predetermined threshold, determining the recommendation interest for the user from the candidate interest set based on the recommendation coefficients of the candidate interests and providing the recommendation interest on the user's terminal device comprises:
determining the recommendation rights for the all users from the candidate rights based on the recommendation coefficients of the candidate rights, and providing the recommendation rights to the terminal equipment of the all users,
updating the recommendation coefficients for the respective candidate equity based on the revenue includes:
updating a recommendation coefficient corresponding to a candidate equity of the recommendation equity based on the income.
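Once the task duration is no longer below the threshold, claim 4 switches from per-user sampling to a single population-level choice. One plausible reading is a greedy pick over population-level coefficients with a running-mean update; the function names and the running-mean rule are illustrative assumptions, not specified by the claims:

```python
def recommend_for_population(coefficients):
    # Exploit: pick the candidate interest whose population-level
    # recommendation coefficient (estimated mean revenue) is highest,
    # and push that single interest to all users.
    return max(range(len(coefficients)), key=coefficients.__getitem__)

def update_coefficient(coefficients, counts, candidate, revenue):
    # Running-mean update of the chosen candidate's coefficient from
    # the revenue observed after this recommendation operation.
    counts[candidate] += 1
    coefficients[candidate] += (revenue - coefficients[candidate]) / counts[candidate]
```

Under this reading, the early (below-threshold) phase explores per user while this late phase exploits the accumulated population-level estimates.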
5. The method of claim 4, wherein the respective candidate interests are configured based on an interest recommendation push timing, an interest recommendation push channel, and an interest recommendation type.
6. The method of claim 4, wherein the recommendation coefficients of the respective candidate interests are configured based on the interest recommendation push timing, the interest recommendation push channel, and the interest recommendation type.
7. The method of claim 1, wherein the target behavior comprises at least one of click, conversion, and reach, and the revenue comprises at least one of click rate, conversion rate, and reach rate.
8. The method of claim 1, wherein the revenue further comprises an interest push cost.
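Claims 7 and 8 together suggest a revenue signal combining response-behavior rates net of the push cost. A minimal sketch; the weights and function signature are illustrative assumptions, since the claims only name the components:

```python
def compute_revenue(clicks, conversions, impressions, push_cost,
                    click_weight=1.0, conversion_weight=5.0):
    """Revenue of one interest recommendation operation: a weighted sum
    of response-behavior rates minus the cost of pushing the interest.
    The weights are illustrative, not specified by the claims."""
    click_rate = clicks / impressions if impressions else 0.0
    conversion_rate = conversions / impressions if impressions else 0.0
    return (click_weight * click_rate
            + conversion_weight * conversion_rate
            - push_cost)
```

For example, 10 clicks and 2 conversions over 100 impressions at a push cost of 0.05 yields 1.0 x 0.10 + 5.0 x 0.02 - 0.05 = 0.15.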
9. An apparatus for facilitating a user to perform a target behavior with respect to a target object during a target object recommendation task, comprising:
a recommendation operation execution unit that determines a recommendation interest for the user from a candidate interest set based on the recommendation coefficient of each candidate interest and provides the recommendation interest on the terminal device of the user to perform an interest recommendation operation, wherein the recommendation interest is used to prompt the user to perform at least one target behavior with respect to the target object;
a response behavior data acquisition unit that acquires, from the terminal device, response behavior data of the user with respect to the recommendation interest;
a revenue determination unit that determines the revenue of the interest recommendation operation based on the response behavior data; and
a recommendation coefficient updating unit that updates the recommendation coefficient of each candidate interest based on the revenue.
10. The apparatus of claim 9, wherein the user comprises a user population, each candidate interest in the candidate interest set has a recommendation coefficient for each user,
the recommendation operation execution unit determines, for each user in the user population, the recommendation interest for the user from the candidate interest set based on the recommendation coefficient of each candidate interest for the user, and provides the recommendation interest for the user on the terminal device of the user,
and the recommendation coefficient updating unit updates the recommendation coefficient of each candidate interest for each user based on the revenue.
11. The apparatus of claim 10, wherein the recommendation coefficient comprises a recommendation probability, the target object recommendation task has a task duration, the recommendation probability of each candidate interest for each user obeys a predetermined probability distribution, the probability distribution having a probability distribution parameter, and the recommendation operation execution unit comprises:
a random probability determination module that, when the task duration is below a predetermined threshold, determines a random probability of each candidate interest for the user based on the probability distribution corresponding to the recommendation probability of the candidate interest for the user;
a recommendation interest determination module that determines the recommendation interest for the user from the respective candidate interests based on the random probabilities of the respective candidate interests for the user; and
a recommendation operation execution module that provides the recommendation interest for each user on the terminal device of the corresponding user,
wherein the recommendation coefficient updating unit updates, based on the revenue, the probability distribution parameter of the candidate interest corresponding to the recommendation interest of each user.
12. The apparatus of claim 9, wherein the user comprises a user population, the target object recommendation task has a task duration, each candidate interest has a recommendation coefficient for all users in the user population,
the recommendation operation execution unit, when the task duration is not below the predetermined threshold, determines the recommendation interest for all the users from the respective candidate interests based on the recommendation coefficients of the respective candidate interests, and provides the recommendation interest to the terminal devices of all the users,
and the recommendation coefficient updating unit updates, based on the revenue, the recommendation coefficient of the candidate interest corresponding to the recommendation interest.
13. A computing device, comprising:
at least one processor; and
a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 8.
14. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any one of claims 1 to 8.
CN202010047694.XA 2020-01-16 2020-01-16 Method and apparatus for facilitating user to perform target behavior for target object Pending CN111292122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010047694.XA CN111292122A (en) 2020-01-16 2020-01-16 Method and apparatus for facilitating user to perform target behavior for target object


Publications (1)

Publication Number Publication Date
CN111292122A true CN111292122A (en) 2020-06-16

Family

ID=71029150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010047694.XA Pending CN111292122A (en) 2020-01-16 2020-01-16 Method and apparatus for facilitating user to perform target behavior for target object

Country Status (1)

Country Link
CN (1) CN111292122A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015556A (en) * 2020-08-31 2020-12-01 广东技术师范大学 Mobile crowd sensing data balancing method based on block chain rights and interests certification mechanism
CN112507104A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Dialog system acquisition method, apparatus, storage medium and computer program product
CN112884525A (en) * 2021-03-16 2021-06-01 口碑(上海)信息技术有限公司 Object processing method and device
CN113298572A (en) * 2021-06-16 2021-08-24 口碑(上海)信息技术有限公司 Page display method and device and computer equipment
CN114266601A (en) * 2021-12-24 2022-04-01 深圳前海微众银行股份有限公司 Marketing strategy determination method and device, terminal equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038600A (en) * 2017-02-23 2017-08-11 阿里巴巴集团控股有限公司 The method and apparatus for providing rights and interests
CN109389420A (en) * 2018-09-04 2019-02-26 阿里巴巴集团控股有限公司 The distribution method and device of equity
CN110019290A (en) * 2017-08-31 2019-07-16 腾讯科技(深圳)有限公司 Recommended method and device based on statistics priori
CN110046736A (en) * 2018-12-20 2019-07-23 阿里巴巴集团控股有限公司 Equity recommended method, device and equipment
CN110111152A (en) * 2019-05-10 2019-08-09 腾讯科技(深圳)有限公司 A kind of content recommendation method, device and server
CN110634010A (en) * 2018-06-25 2019-12-31 北京嘀嘀无限科技发展有限公司 Method and device for determining coupon issuing strategy



Similar Documents

Publication Publication Date Title
CN111292122A (en) Method and apparatus for facilitating user to perform target behavior for target object
US10939166B2 (en) Optimization of broadcast event effectiveness
US11210690B2 (en) Deep reinforcement learning methods and apparatuses for referral marketing
Viswanathan et al. The dynamics of consumer engagement with mobile technologies
Theocharous et al. Lifetime value marketing using reinforcement learning
Kim Toward a successful CRM: variable selection, sampling, and ensemble
US20160224987A1 (en) Customer activity score
US20170178199A1 (en) Method and system for adaptively providing personalized marketing experiences to potential customers and users of a tax return preparation system
JP6516809B2 (en) INFORMATION ANALYSIS DEVICE, INFORMATION ANALYSIS METHOD, AND PROGRAM
US20130246139A1 (en) System and method for sharing incentives among groups
Tucker The implications of improved attribution and measurability for antitrust and privacy in online advertising markets
US20120239590A1 (en) Managing customer communications among a plurality of channels
US20140040068A1 (en) Service Recommender System For Mobile Users
CN112132209B (en) Attribute prediction method based on biasing characteristics
Atahan et al. Accelerated learning of user profiles
Daljord et al. The design and targeting of compliance promotions
Barutcu et al. From mass to personalized mobile marketing strategies: the new dimensions through expert systems
CN116468109B (en) Training method, using method and related device of prediction model
Moazeni et al. Sequential learning in designing marketing campaigns for market entry
Davoudi et al. Time-aware subscription prediction model for user acquisition in digital news media
US11232483B2 (en) Marketing attribution capturing synergistic effects between channels
JP2019046172A (en) Information analysis apparatus, information analysis method, and program
US20220156786A1 (en) Systems, Methods and Media for Automatic Prioritizer
Song et al. Uncovering Characteristic Paths to Purchase of Consumers
US11961109B2 (en) Multi-objective customer journey optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200616