CN110647687B - Service recommendation method and device

Info

Publication number: CN110647687B
Application number: CN201910916020.6A
Authority: CN
Inventors: 范丰麟; 孙传亮; 赵华; 朱通
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2019-09-26
Filing date: 2019-09-26
Publication date: 2022-02-08
Anticipated expiration: 2039-09-26
Also published as: CN110647687A

Abstract

One embodiment of the present specification provides a service recommendation method and apparatus. The method includes: acquiring behavior data of network behaviors that are executed by a user and are related to a fee withholding service, where the fee withholding service is provided by a third-party payment platform and the network behaviors comprise a service signing behavior and a service termination behavior; determining, by means of reinforcement learning and according to the behavior data, a subscription willingness value of the user for the fee withholding service; determining a payment credit score of the user on the third-party payment platform based on historical payment information of the user on the third-party payment platform; and determining, according to the subscription willingness value and the payment credit score, a recommendation strategy for recommending the fee withholding service to the user.

Description

Service recommendation method and device

Technical Field

The present disclosure relates to the field of computer devices, and in particular, to a service recommendation method and apparatus.

Background

With the continuous development of internet technology, users can subscribe to various internet services, such as video membership services, over the network.

To make it easier for users to order internet services, third-party payment platforms have launched a fee withholding service. If a user selects the fee withholding service when ordering an internet service, the third-party payment platform can deduct the fee automatically on behalf of the service provider, for example by collecting the monthly membership fee on behalf of a video provider. This improves the user's payment experience and makes it convenient for the user to enjoy the internet service.

Based on this, considering that different users accept the fee withholding service to different degrees, a technical solution is needed for determining a recommendation strategy for recommending the fee withholding service to a user, so as to improve the accuracy of recommending the fee withholding service.

Disclosure of Invention

An embodiment of the present disclosure provides a service recommendation method and apparatus, so as to determine a recommendation policy for recommending a fee withholding service to a user, and improve accuracy of recommending the fee withholding service.

To solve the above technical problem, one embodiment of the present specification is implemented as follows:

one embodiment of the present specification provides a service recommendation method, including:

acquiring behavior data of network behaviors that are executed by a user and are related to a fee withholding service, where the fee withholding service is provided by a third-party payment platform, and the network behaviors comprise a service signing behavior and a service termination behavior;

determining, by means of reinforcement learning and according to the behavior data, a subscription willingness value of the user for the fee withholding service;

determining a payment credit score of the user on the third-party payment platform based on historical payment information of the user on the third-party payment platform;

and determining, according to the subscription willingness value and the payment credit score, a recommendation strategy for recommending the fee withholding service to the user.

One embodiment of the present specification provides a service recommendation apparatus including:

a data acquisition module, configured to acquire behavior data of network behaviors that are executed by a user and are related to a fee withholding service, where the fee withholding service is provided by a third-party payment platform, and the network behaviors comprise a service signing behavior and a service termination behavior;

a willingness determination module, configured to determine, by means of reinforcement learning and according to the behavior data, a subscription willingness value of the user for the fee withholding service;

a credit determination module, configured to determine a payment credit score of the user on the third-party payment platform based on historical payment information of the user on the third-party payment platform;

and a strategy recommendation module, configured to determine, according to the subscription willingness value and the payment credit score, a recommendation strategy for recommending the fee withholding service to the user.

One embodiment of the present specification provides a service recommendation apparatus including: a processor; and a memory arranged to store computer executable instructions which, when executed, cause the processor to implement the steps of the service recommendation method described above.

One embodiment of the present specification provides a storage medium for storing computer-executable instructions that, when executed, implement the steps of the service recommendation method described above.

In an embodiment of the present specification, a subscription willingness value of a user for the fee withholding service is determined according to behavior data of network behaviors that the user executes in relation to the fee withholding service; a payment credit score of the user on the third-party payment platform is determined based on the user's historical payment information on that platform; and a recommendation strategy for recommending the fee withholding service to the user is determined according to the subscription willingness value and the payment credit score. The recommendation strategy is thus determined from multiple sources of information, from the perspective of both user willingness and user credit, which improves the accuracy of recommending the fee withholding service.

Drawings

To illustrate the technical solutions in one or more embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some of the embodiments described in the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1a is a schematic view of a scenario of a service recommendation method provided in an embodiment of the present specification;

Fig. 1b is a schematic view of a scenario of a service recommendation method provided in an embodiment of the present specification;

Fig. 2 is a flowchart illustrating a service recommendation method according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of a recommendation strategy provided in one embodiment of the present specification;

Fig. 4 is a schematic block diagram of a service recommendation device according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of a service recommendation device according to an embodiment of the present specification.

Detailed Description

To help those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, those technical solutions are described clearly and completely below with reference to the drawings. The described embodiments are only a part of the embodiments of the present disclosure, not all of them. All other embodiments that a person skilled in the art can derive from one or more of the embodiments described herein without inventive effort shall fall within the scope of protection of this document.

Fig. 1a and Fig. 1b are schematic views of scenarios of a service recommendation method provided in an embodiment of the present specification. As shown in Fig. 1a, the scenario includes user terminals and a server of a third-party payment platform. The user terminals include, but are not limited to, the tablet computer 101, the mobile phone 102, the desktop computer 103, and the notebook computer 104 shown in Fig. 1a, and the server of the third-party payment platform includes, but is not limited to, the server 200 shown in Fig. 1a. In this scenario, the user may order various internet services through a user terminal and the server of the third-party payment platform, for example the membership service of a video website shown in Fig. 1b. During ordering, the server of the third-party payment platform can recommend the fee withholding service to the user through the user terminal. If the user ticks the fee withholding service, the third-party payment platform can collect the monthly membership fee automatically on behalf of the video provider, which improves the user's payment experience and makes it convenient for the user to enjoy the internet service. In Fig. 1a, the server of the third-party payment platform may execute the service recommendation method of an embodiment of this specification to determine a recommendation strategy for recommending the fee withholding service to the user, thereby improving the accuracy of recommending the fee withholding service.

Fig. 2 is a schematic flowchart of a service recommendation method according to an embodiment of the present disclosure. As shown in Fig. 2, the flow includes the following steps:

step S202, acquiring behavior data of network behaviors that are executed by a user and are related to a fee withholding service, where the fee withholding service is provided by a third-party payment platform, and the network behaviors comprise a service signing behavior and a service termination behavior;

step S204, determining, by means of reinforcement learning and according to the behavior data, a subscription willingness value of the user for the fee withholding service;

step S206, determining a payment credit score of the user on the third-party payment platform based on historical payment information of the user on the third-party payment platform;

and step S208, determining, according to the subscription willingness value and the payment credit score, a recommendation strategy for recommending the fee withholding service to the user.

In this embodiment, the fee withholding service is a service among the user, the third-party payment platform, and the service provider. After the three parties sign a fee withholding service agreement, the third-party payment platform may, at the time agreed in the service, automatically deduct the agreed fee from the agreed user account and transfer the deducted fee to the service provider in the agreed manner. In one case, the fee withholding service includes a password-free withholding service, in which the third-party payment platform deducts the fee automatically, without the user entering a password and without the user perceiving the deduction.

In step S202, the server acquires behavior data of network behaviors that the user executes in relation to the fee withholding service. The fee withholding service is provided by a third-party payment platform. The network behaviors in step S202 include the user's service signing behavior and the user's service termination behavior; correspondingly, the behavior data include behavior data of the service signing behavior and behavior data of the service termination behavior.

In this embodiment, the behavior data of the service signing behavior include, but are not limited to, the time at which the user performs the service signing behavior, the identifier of the corresponding service provider, the identifier of the mobile terminal used for signing, and the network type at the time of signing. The behavior data of the service termination behavior include, but are not limited to, the time at which the user performs the service termination behavior, the identifier of the corresponding service provider, the identifier of the mobile terminal used for termination, and the network type at the time of termination.
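As an illustration only, the behavior data listed above could be held in a small record type. The following is a minimal sketch; the names `BehaviorType`, `BehaviorRecord`, and `sort_by_time`, the field names, and the sample values are hypothetical and are not part of the specification.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class BehaviorType(Enum):
    SIGN = "sign"            # service signing behavior
    TERMINATE = "terminate"  # service termination behavior


@dataclass(frozen=True)
class BehaviorRecord:
    behavior: BehaviorType
    occurred_at: datetime  # time at which the behavior was performed
    provider_id: str       # identifier of the corresponding service provider
    terminal_id: str       # identifier of the mobile terminal used
    network_type: str      # network type at the time (hypothetical labels)


def sort_by_time(records):
    """Return the records in chronological order."""
    return sorted(records, key=lambda r: r.occurred_at)


# Hypothetical sample data, deliberately out of order.
records = [
    BehaviorRecord(BehaviorType.TERMINATE, datetime(2019, 3, 1), "video_A", "phone_1", "4g"),
    BehaviorRecord(BehaviorType.SIGN, datetime(2019, 1, 1), "video_A", "phone_1", "wifi"),
]
ordered = sort_by_time(records)
```

Keeping the records in time order matters later, because the state action sequence used for reinforcement learning is arranged chronologically.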

In one embodiment, the network behaviors further include the user's payment behavior after service signing. It should be noted that, when the fee withholding service includes the password-free withholding service, the payment behavior after service signing is an automatic deduction: the third-party payment platform completes the deduction automatically, without the user entering a password or performing any other operation. When the fee withholding service includes a withholding service that requires the user to enter a password, the payment behavior after service signing is a payment by the user, and the third-party payment platform performs the deduction according to the password entered by the user. The behavior data of the payment behavior after service signing include, but are not limited to, the payment time, the identifier of the corresponding service provider, the identifier of the mobile terminal used for payment, and the network type at the time of payment.

It should be noted that the network behaviors in this embodiment include, but are not limited to, the service signing behavior, the service termination behavior, and the payment behavior after service signing mentioned above; other network behaviors related to the fee withholding service also fall within the scope of the network behaviors described above.

In step S204, the subscription willingness value of the user for the fee withholding service is determined by means of reinforcement learning according to the acquired behavior data. Reinforcement learning is a learning method in which an agent learns by trial and error: its behavior is guided by the rewards it obtains through interaction with the environment, and the goal is for the agent to obtain the maximum reward. One embodiment of the present specification determines the subscription willingness value mainly through the Q-learning algorithm. Q-learning is one of the main reinforcement learning algorithms; it gives an intelligent system the ability to learn, from the action sequences it has experienced, to select the optimal action in a Markov environment.

When Q-learning is applied to judge a user's willingness to sign up for fee withholding, the scene must first be defined. The user's current static state (including the current signing state of withholding products and the accumulated number of withholding transactions) can be defined as the environment in reinforcement learning; each signing of a new product or termination of an old product by the user is an action; and the number and/or amount of payments made after signing a product is defined as the reward. During Q-learning, the agent is placed in this environment, continuously learns from the user's signing behavior and withholding transaction behavior, and computes the optimal Q matrix.

In the step S204, determining a subscription willingness value of the user for the fee deduction service in a reinforcement learning manner according to the acquired behavior data, specifically:

(a1) determining user state information and user action information in the Q-learning algorithm according to the acquired behavior data, and initializing a reward matrix R and an action utility function matrix Q in the Q-learning algorithm based on the user state information and the user action information;

in the reward matrix R and the action utility function matrix Q, rows correspond to user state information, and columns correspond to user action information; each element value in the reward matrix R is an action reward value for the user executing the corresponding action in the corresponding user state, and each element value in the action utility function matrix Q is an action willingness value for the user executing the corresponding action in the corresponding user state;

(a2) updating the action utility function matrix Q according to the acquired behavior data and the reward matrix R, and determining the subscription willingness value of the user for the fee withholding service according to the updated action utility function matrix Q.

In action (a1), the user state information and the user action information in the Q-learning algorithm are first determined from the acquired behavior data. Specifically, the different states in which the user has been can be determined from the acquired behavior data, yielding the user state information. Similarly, the different actions performed by the user can be determined from the acquired behavior data, yielding the user action information. In one embodiment, the user action information comprises service signing action information and service termination action information. In another embodiment, it comprises service signing action information, service termination action information, and payment action information, where the payment action comprises an automatic deduction action that requires no password after the user signs the fee withholding service and an active payment action that requires a password.

In action (a1), initializing the reward matrix R and the action utility function matrix Q in the Q-learning algorithm based on the user state information and the user action information specifically means the following. For the reward matrix R, the rows are set to correspond to user state information, the columns are set to correspond to user action information, and each element value is initialized; each element value in R is the action reward value for the user executing the corresponding action in the corresponding user state. For the action utility function matrix Q, the rows are likewise set to correspond to user state information, the columns to user action information, and each element value is initialized; each element value in Q is the action willingness value for the user executing the corresponding action in the corresponding user state.

In one embodiment, the element values in the reward matrix R are initialized to predetermined fixed values. In another embodiment, they are initialized according to the amount paid and/or the number of payments made after the user signs the fee withholding service; for example, the amount paid after the user signs the fee withholding service is used as the element value in R. In this embodiment, each element value in the action utility function matrix Q may be initialized to 0.

An example of initializing the reward matrix R and the action utility function matrix Q is given here. In a specific implementation, R and Q in the Q-learning algorithm may be initialized according to actual requirements.

In the illustrated example, the rows of R correspond to user state information (state) and the columns of R correspond to user action information (action). The first row of R corresponds to the user state in which no service is signed, and the second row corresponds to the user state in which a service is signed. The first column of R corresponds to the user action of terminating a service, and the second column corresponds to the user action of signing a service. Each element value of R is the action reward value for the user executing the corresponding action in the corresponding user state, and these action reward values are preset fixed values.

In the illustrated example, the rows of Q correspond to user state information (state) and the columns of Q correspond to user action information (action). The first row of Q corresponds to the user state in which no service is signed, and the second row corresponds to the user state in which a service is signed. The first column of Q corresponds to the user action of terminating a service, and the second column corresponds to the user action of signing a service. Each element value of Q is the action willingness value for the user executing the corresponding action in the corresponding user state, and these action willingness values are initialized to 0.
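The layout of the two matrices can be sketched as follows. This is a minimal illustration, not the layout prescribed by the specification: the dictionary representation and the names `STATES`, `ACTIONS`, `R`, and `Q` are assumptions. The three reward values are the ones the worked example in the text uses; the reward for terminating while no service is signed is not stated in the text and is deliberately left out rather than guessed.

```python
# Rows: user states. s0 = no service signed, s1 = service signed.
STATES = ("s0", "s1")
# Columns: user actions. a0 = terminate a service, a1 = sign a service.
ACTIONS = ("a0", "a1")

# Reward matrix R with preset fixed values, keyed by (state, action).
# R(s0, a0) is not given in the text, so no entry is created for it.
R = {
    ("s0", "a1"): 100,
    ("s1", "a0"): -100,
    ("s1", "a1"): 100,
}

# Action utility function matrix Q: every element value is initialized to 0.
Q = {(state, action): 0.0 for state in STATES for action in ACTIONS}
```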

In the action (a2), updating the action utility function matrix Q according to the obtained behavior data and the reward matrix R specifically includes:

(a21) according to the acquired behavior data, constructing a state action sequence of the user; the state action sequence comprises a plurality of state action combinations which are arranged according to the time sequence, and the state action combinations are composed of user state information and user action information; the user action information in the state action combination is used for expressing the association relationship between the user state information in the state action combination and the user state information in the next state action combination;

(a22) in the state action sequence of the user, each state action combination is sequentially acquired according to the time sequence, and after each acquisition, the corresponding element value of the acquired state action combination in the action utility function matrix Q is updated based on the corresponding element value of the acquired state action combination in the reward matrix R.

In action (a21), a state action sequence of the user, which may also be referred to as an (s, a) sequence, is first constructed from the user's behavior data. The state action sequence comprises a plurality of state action combinations arranged in time order, and each state action combination comprises one piece of user state information and one piece of user action information. For any state action combination, the user action information in the combination expresses the association between the user state information in that combination and the user state information in the next combination: because the user executes that action, the user's state changes from the state given in the current combination to the state given in the next combination.

How to construct the user's state action sequence from the user's behavior data is explained here with a specific example. In this example, the behavior data show that the user, while in the unsigned state, signs the fee withholding service with the third-party payment platform and video website A; the user then terminates the service just signed; the user then signs the fee withholding service with the third-party payment platform and video website B; and the user then signs the fee withholding service with the third-party payment platform and video website C. From these behavior data, the user's state action sequence (s, a) can be constructed as { (s0, a1), (s1, a0), (s0, a1), (s1, a1) }, where s0 denotes the unsigned user state, s1 denotes the signed user state, a0 denotes the service termination action, and a1 denotes the service signing action. In this sequence, between the first and second state action combinations the user's state changes from unsigned to signed, because the user performs the signing action in the unsigned state; between the second and third combinations the state changes from signed to unsigned, because the user performs the termination action in the signed state; between the third and fourth combinations the state changes from unsigned to signed, because the user performs the signing action in the unsigned state; and the fourth state action combination is (s1, a1), in which the user performs the signing action in the signed state.
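The construction in this example can be sketched in a few lines. The function name `build_sequence` and the table `NEXT_STATE` are hypothetical; the entry for terminating in the unsigned state is an assumption added only so that the table is complete, since the text does not discuss that case.

```python
# State reached after executing an action in a state.
NEXT_STATE = {
    ("s0", "a1"): "s1",  # signing while unsigned leads to the signed state
    ("s1", "a0"): "s0",  # terminating while signed leads to the unsigned state
    ("s1", "a1"): "s1",  # signing again while signed stays in the signed state
    ("s0", "a0"): "s0",  # assumption: terminating while unsigned changes nothing
}


def build_sequence(initial_state, actions):
    """Pair each action with the state in which the user executed it."""
    sequence = []
    state = initial_state
    for action in actions:
        sequence.append((state, action))
        state = NEXT_STATE[(state, action)]
    return sequence


# Sign with website A, terminate, sign with website B, sign with website C.
sequence = build_sequence("s0", ["a1", "a0", "a1", "a1"])
print(sequence)
```

Run on the four behaviors of the example, the sketch yields the same four state action combinations as the sequence given in the text.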

In the above-mentioned action (a22), in the state action sequence of the user, each state action combination is sequentially acquired in time order, and after each acquisition, the element value of the acquired state action combination in the action utility function matrix Q is updated based on the element value of the acquired state action combination in the reward matrix R.

In this embodiment, the general Q-matrix update formula of the Q-learning algorithm may be adopted: after each acquisition, the element value corresponding to the acquired state action combination in the action utility function matrix Q is updated based on the element value corresponding to that combination in the reward matrix R. The formula is:

$$Q(s, a) = R(s, a) + \gamma \max_{a'} Q(s', a')$$

In the formula, Q(s, a) on the left-hand side represents the updated element value in the Q matrix, R(s, a) represents the element value in the R matrix, γ represents the attenuation coefficient, which may take the value 0.8 or another value between 0 and 1, and $\max_{a'} Q(s', a')$ represents the maximum element value in the Q matrix corresponding to the next state s' of the current state s.
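A single application of this update rule can be sketched as below. The function name `update_q`, its parameters, and the dictionary representation of the matrices are assumptions made for illustration; the reward values 100 and -100 and the coefficient 0.8 are the ones used in the text's example.

```python
GAMMA = 0.8  # attenuation coefficient; the text allows other values between 0 and 1


def update_q(q, r, state, action, next_state, actions=("a0", "a1"), gamma=GAMMA):
    """Set Q(s, a) to R(s, a) plus gamma times the largest Q value of the next state."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] = r[(state, action)] + gamma * best_next
    return q[(state, action)]


# Q starts at 0 everywhere; only the rewards needed for the two steps are listed.
q = {(s, a): 0.0 for s in ("s0", "s1") for a in ("a0", "a1")}
r = {("s0", "a1"): 100, ("s1", "a0"): -100}

first = update_q(q, r, "s0", "a1", next_state="s1")   # signing while unsigned
second = update_q(q, r, "s1", "a0", next_state="s0")  # terminating while signed
```

The second call sees the value written by the first, which is why the order in which state action combinations are acquired affects the resulting Q matrix.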

The following uses the user's state action sequence (s, a) exemplified above, { (s0, a1), (s1, a0), (s0, a1), (s1, a1) }, together with the R matrix and Q matrix exemplified above, to illustrate how the action utility function matrix Q is updated by the above formula.

First, the first state action combination (s0, a1) is acquired from the user's state action sequence (s, a). According to the meaning of (s0, a1), namely that the user performs a signing action in the unsigned state, the value of (s0, a1) in the R matrix is determined to be 100 and $\max_{a'} Q(s', a')$ is determined to be 0. Updating the Q matrix according to the above formula gives

$$Q(s_0, a_1) = 100 + 0.8 \times 0 = 100$$

Next, the second state action combination (s1, a0) is acquired from the user's state action sequence (s, a). According to the meaning of (s1, a0), namely that the user performs a termination action in the signed state, the value of (s1, a0) in the R matrix is determined to be -100 and $\max_{a'} Q(s', a')$ is determined to be 100. Updating the Q matrix according to the above formula gives

$$Q(s_1, a_0) = -100 + 0.8 \times 100 = -20$$

Then, the third state action combination (s0, a1) is acquired from the user's state action sequence (s, a). According to the meaning of (s0, a1), namely that the user performs a signing action in the unsigned state, the value of (s0, a1) in the R matrix is determined to be 100 and $\max_{a'} Q(s', a')$ is determined to be 0. Updating the Q matrix according to the above formula gives

$$Q(s_0, a_1) = 100 + 0.8 \times 0 = 100$$

Finally, the fourth state action combination (s1, a1) is acquired from the user's state action sequence (s, a). According to the meaning of (s1, a1), namely that the user performs a signing action in the signed state, the value of (s1, a1) in the R matrix is determined to be 100 and $\max_{a'} Q(s', a')$ is determined to be 100. Updating the Q matrix according to the above formula gives

$$Q(s_1, a_1) = 100 + 0.8 \times 100 = 180$$
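The arithmetic of the four updates can be replayed as a quick check. This sketch takes the reward and the next-state maximum for each step exactly as the text states them, rather than recomputing the maximum from the Q matrix; the variable names are hypothetical and the coefficient 0.8 is the example value mentioned for γ.

```python
GAMMA = 0.8  # example value of the attenuation coefficient

# (state action combination, R(s, a), next-state maximum as stated in the text)
steps = [
    (("s0", "a1"), 100, 0),
    (("s1", "a0"), -100, 100),
    (("s0", "a1"), 100, 0),
    (("s1", "a1"), 100, 100),
]

q = {}
for state_action, reward, max_next in steps:
    # Q(s, a) = R(s, a) + gamma * max Q(s', a')
    q[state_action] = reward + GAMMA * max_next

for state_action, value in sorted(q.items()):
    print(state_action, round(value, 6))
```

With these inputs the signed-state row of Q ends with the values for the signing and termination actions that the text goes on to use when reading off the willingness value.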

Through actions (a21) and (a22), the action utility function matrix Q can be updated according to the behavior data and the reward matrix R. It can be understood that, in this embodiment, after the R matrix and the Q matrix are established, the Q matrix may be updated continuously as the user's behavior changes. The R matrix and the Q matrix above are only illustrative examples; in a specific implementation, they may be constructed according to actual requirements.

In action (a2), determining the subscription willingness value of the user for the fee withholding service according to the updated action utility function matrix Q is specifically:

(a23) determining user state information corresponding to the current user state of the user in the updated action utility function matrix Q, and acquiring each user action information and each element value corresponding to the user state information;

(a24) and determining a subscription willingness value of the user for the fee withholding service according to each user action information and each element value corresponding to the user state information.

Taking the above example, assume that the current user state is the service signed state. The user state information corresponding to the current user state is then determined to be the service signed state information. In the updated Q matrix, the user action information corresponding to the service signed state information is obtained, namely service signing action information and service termination action information, where the element value corresponding to the service signing action information is 180 and the element value corresponding to the service termination action information is -20. The subscription willingness value of the user for the fee withholding service may then be determined to be 180; that is, given that the user has currently signed the fee withholding service, the user's willingness value for signing the fee withholding service again is 180.

Of course, in other embodiments, the subscription willingness value of the user for the fee withholding service may be determined to be 160, based on the element value 180 corresponding to the service signing action information and the element value -20 corresponding to the service termination action information. In still other embodiments, depending on the requirements of the specific implementation, the subscription willingness value may be determined by other calculation methods from the obtained user action information and element values corresponding to the user state information, which is not specifically limited herein.
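The two readings of the signed-state row can be sketched side by side. The function names are hypothetical. The first function follows the embodiment that takes the element value of the signing action directly. The second sums the row, which is an assumption: the text gives only the result 160 and does not name the calculation, and summing is merely one rule that is consistent with that figure.

```python
def willingness_from_sign_action(q_row):
    """Use the element value of the signing action (a1) as the willingness value."""
    return q_row["a1"]


def willingness_from_sum(q_row):
    """Assumed rule: combine the element values of all actions by summing them."""
    return sum(q_row.values())


# Row of the updated Q matrix for the service signed state, as given in the text.
signed_row = {"a1": 180, "a0": -20}

direct_value = willingness_from_sign_action(signed_row)
combined_value = willingness_from_sum(signed_row)
```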

In the step S206, the payment credit score of the user on the third-party payment platform is determined based on the historical payment information of the user on the third-party payment platform.

In one embodiment, supervised machine learning modeling may be performed according to historical payment information of the user on the third-party payment platform to obtain the payment credit score calculation model, for example, a supervised machine learning algorithm such as random forest, GBDT, and the like is used for modeling to obtain the payment credit score calculation model.

For example, with the user as the primary key, the historical payment information of the user on the third-party payment platform is used as the machine learning modeling feature. The number of payments and the payment amount of the user in the last week or the last month are extracted, together with the user's spending habits on specific channels such as credit card and Huabei. These are combined with static features, such as the user's basic personal information, to generate a user-based wide table of payment behavior. Supervised machine learning algorithms such as random forest and GBDT are then used to learn the user's payment behavior features, and a payment credit score calculation model is constructed to calculate the user's payment credit score. It should be noted that supervised machine learning modeling is a mature prior-art technique for those skilled in the art and is not described further here.
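The feature extraction step described above can be sketched for a single user as follows. This covers only the construction of one row of the wide table, not the model training. The function name `payment_features`, the feature names, the channel labels in `CREDIT_CHANNELS`, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the credit-type payment channels of interest.
CREDIT_CHANNELS = {"credit_card", "credit_pay"}


def payment_features(payments, now, static_info):
    """Build one wide-table row from (timestamp, amount, channel) payment tuples."""
    row = dict(static_info)  # static features such as basic personal information
    for label, days in (("7d", 7), ("30d", 30)):
        cutoff = now - timedelta(days=days)
        recent = [p for p in payments if p[0] >= cutoff]
        row[f"pay_count_{label}"] = len(recent)
        row[f"pay_amount_{label}"] = sum(p[1] for p in recent)
        row[f"credit_channel_count_{label}"] = sum(
            1 for p in recent if p[2] in CREDIT_CHANNELS
        )
    return row


now = datetime(2019, 9, 26)
payments = [
    (datetime(2019, 9, 25), 30.0, "credit_card"),
    (datetime(2019, 9, 10), 15.0, "balance"),
    (datetime(2019, 7, 1), 99.0, "credit_pay"),
]
row = payment_features(payments, now, {"user_id": "u1", "age_band": "25-34"})
```

Rows of this kind, one per user, would then be the input to whichever supervised learner is chosen, for example a random forest or GBDT implementation.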

After the payment credit score calculation model is established, the historical payment information of the user on the third-party payment platform can be input into the payment credit score calculation model for processing, and the payment credit score of the user on the third-party payment platform is obtained.
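The feature preparation described above can be sketched as follows. This is only an illustrative sketch using the standard library: the record fields (`user_id`, `date`, `amount`, `channel`), the channel labels, the feature names, and the 30-day window are all assumptions, and the resulting rows would be handed to whichever supervised model (for example random forest or GBDT) is actually trained.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical labels for the "specific channels" whose usage is tracked.
CREDIT_CHANNELS = {"credit_card", "consumer_credit"}


def build_wide_table(payments, profiles, today, window_days=30):
    """Return {user_id: feature_row}, one wide row per user (user ID as primary key)."""
    cutoff = today - timedelta(days=window_days)
    rows = defaultdict(
        lambda: {"pay_count": 0, "pay_amount": 0.0, "credit_channel_count": 0}
    )
    for p in payments:
        if p["date"] < cutoff:
            continue  # outside the look-back window
        row = rows[p["user_id"]]
        row["pay_count"] += 1
        row["pay_amount"] += p["amount"]
        if p["channel"] in CREDIT_CHANNELS:
            row["credit_channel_count"] += 1
    # Join the dynamic aggregates with the static profile features.
    return {
        user_id: {**rows[user_id], **profiles.get(user_id, {})}
        for user_id in set(rows) | set(profiles)
    }


payments = [
    {"user_id": "u1", "date": date(2019, 9, 20), "amount": 10.0, "channel": "credit_card"},
    {"user_id": "u1", "date": date(2019, 9, 25), "amount": 25.5, "channel": "balance"},
    {"user_id": "u1", "date": date(2019, 6, 1), "amount": 99.0, "channel": "balance"},
]
profiles = {"u1": {"account_age_years": 3}}
wide = build_wide_table(payments, profiles, today=date(2019, 9, 26))
```

Each row of `wide` corresponds to one user, so it can serve both as a training sample (with a credit label attached) and as the input when scoring that user.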

In another embodiment, supervised machine learning modeling may be performed on both the historical payment information of the user on the third-party payment platform and the account activity information of the user on the third-party payment platform to obtain the payment credit score calculation model, for example by using a supervised machine learning algorithm such as random forest or GBDT.

The historical payment information and account activity information of users are mined and modeled with a supervised machine learning algorithm such as random forest or GBDT, so as to learn the correspondence between this information and the credit condition of the users and thereby calculate the payment credit score of each user.

For example, with the user dimension as the primary key, the historical payment information and account activity information of the user on the third-party payment platform are used as the machine learning modeling features. For instance, the number of payments, the payment amount, and the spending habits on specific channels of the user in the last week or the last month are extracted and combined with static features, such as the user's basic personal information, to generate a user-based wide table of payment behavior. A supervised machine learning algorithm such as random forest or GBDT is then used to learn the user's payment behavior features and account features, and a payment credit score calculation model is constructed to calculate the payment credit score of the user. It should be noted that supervised machine learning modeling is a mature technique well known to those skilled in the art and is not elaborated further here.

Accordingly, in this embodiment, before determining the payment credit score of the user on the third-party payment platform, the method may further include: acquiring account activity information of the user on the third-party payment platform. Determining the payment credit score of the user on the third-party payment platform then specifically includes: calculating the payment credit score of the user on the third-party payment platform through a pre-trained payment credit score calculation model, based on the historical payment information and the account activity information of the user on the third-party payment platform; wherein the payment credit score calculation model is a neural network model.

In this embodiment, the historical payment information includes at least one of: the historical number of payments, the historical payment amount, the historical payment success rate for the enjoy-first-pay-later fee deduction service, and the historical payment default rate for the enjoy-first-pay-later fee deduction service. The account activity information includes at least one of: account login information, account age information, account balance information, account expenditure information, and account revenue information.

Enjoy first, pay later (NSF) is a payment mode in which, based on the credit level of the user, the user enjoys a service first and the fee is deducted afterwards, which improves the convenience of the service and the user experience.

In this embodiment, a higher payment credit score indicates better credit: the user is more likely to pay on time under the signed agreement, for example to pay in time for the enjoy-first-pay-later fee deduction service without defaulting. A lower payment credit score indicates that the user is less likely to pay in time and more likely to default on the signed agreement, for example by failing to pay for the enjoy-first-pay-later fee deduction service.

It should be noted that there are various ways of determining the payment credit score of the user on the third-party payment platform based on the historical payment information of the user on the third-party payment platform. For example, the payment credit score may be calculated simply from the user's number of successful payments and number of payment defaults (for example, the number of payment defaults on enjoy-first-pay-later products), or it may be calculated by the model-based method described in this embodiment. In a specific implementation, an appropriate method may be selected according to requirements.
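The simple counting approach mentioned above could, for instance, look like the following sketch. The success-ratio formula, the 0 to 100 scale, and the neutral score for users without history are illustrative assumptions, not values given in this specification.

```python
def simple_payment_credit_score(success_count: int, default_count: int,
                                scale: float = 100.0) -> float:
    """Score a user by the share of successful payments among all payment attempts."""
    total = success_count + default_count
    if total == 0:
        # Assumption: a user with no payment history receives a neutral mid-scale score.
        return scale / 2
    return scale * success_count / total


print(simple_payment_credit_score(9, 1))
```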

In step S208, determining a recommendation policy for recommending the fee withholding service to the user according to the subscription willingness value and the payment credit score specifically includes:

(b1) judging whether the subscription willingness value is greater than a preset willingness threshold to obtain a first judgment result, and judging whether the payment credit score is greater than a preset credit threshold to obtain a second judgment result;

(b2) determining a recommendation policy for recommending the fee withholding service to the user according to the first judgment result and the second judgment result.

Fig. 3 is a schematic diagram of a recommendation policy provided in an embodiment of this specification. As shown in Fig. 3, if the first judgment result indicates that the subscription willingness value of the user is greater than the preset willingness threshold, and the second judgment result indicates that the payment credit score of the user is greater than the preset credit threshold, the user is determined to be a high-willingness, high-credit user, and the recommendation policy corresponding to the user is to recommend the periodic fee withholding service. In the periodic fee withholding service, after the user signs once, the third-party payment platform deducts fees periodically according to the agreement, for example at the beginning of each month.

If the first judgment result indicates that the subscription willingness value of the user is greater than the preset willingness threshold, and the second judgment result indicates that the payment credit score of the user is smaller than the preset credit threshold, the user is determined to be a high-willingness, high-risk user, and the recommendation policy corresponding to the user is to recommend the pay-first-then-enjoy fee withholding service. In the pay-first-then-enjoy fee withholding service, after the user signs once, the third-party payment platform deducts the fee before the user enjoys the product. The pay-first-then-enjoy fee withholding service may be a periodic fee withholding service or a single fee withholding service.

If the first judgment result indicates that the subscription willingness value of the user is smaller than the preset willingness threshold, and the second judgment result indicates that the payment credit score of the user is greater than the preset credit threshold, the user is determined to be a low-willingness, high-credit user, and the recommendation policy corresponding to the user is to recommend the single fee withholding service. The single fee withholding service is a service in which only one fee deduction is performed after the user signs once.

If the first judgment result indicates that the subscription willingness value of the user is smaller than the preset willingness threshold, and the second judgment result indicates that the payment credit score of the user is smaller than the preset credit threshold, the user is determined to be a low-willingness, high-risk user, and the recommendation policy corresponding to the user is not to recommend the fee withholding service. After the recommendation policy is determined, the fee withholding service can be recommended to the user according to the recommendation policy.
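The four cases above amount to a two-by-two decision table over the two judgment results, which can be sketched as follows. The enum member names and the threshold values in the usage lines are hypothetical. The text does not say how a value exactly equal to its threshold is handled; this sketch assumes it is treated as not greater than the threshold.

```python
from enum import Enum


class Policy(Enum):
    PERIODIC = "recommend the periodic fee withholding service"
    DEDUCT_BEFORE_USE = "recommend the service that deducts the fee before the product is enjoyed"
    SINGLE = "recommend the single fee withholding service"
    NONE = "do not recommend the fee withholding service"


def recommendation_policy(willingness, credit_score,
                          willingness_threshold, credit_threshold):
    """Map the two threshold comparisons to one of the four recommendation policies."""
    high_willingness = willingness > willingness_threshold  # first judgment result
    high_credit = credit_score > credit_threshold           # second judgment result
    if high_willingness and high_credit:
        return Policy.PERIODIC
    if high_willingness:
        return Policy.DEDUCT_BEFORE_USE
    if high_credit:
        return Policy.SINGLE
    return Policy.NONE


# Hypothetical thresholds: willingness 100, credit 60.
print(recommendation_policy(160, 80, 100, 60).value)
```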

In summary, the present embodiment has the following technical effects:

(1) the recommendation policy for recommending the fee withholding service to the user is determined from the perspectives of both user willingness and user credit by combining information from multiple sources, which improves the accuracy of recommending the fee withholding service;

(2) because the recommendation policy is determined according to both user willingness and user credit, the influence of user credit on the service provider is taken into account, which avoids the risk of bad debts for the service provider caused by users with poor credit failing to pay on time;

(3) reinforcement learning (Q-learning) is used to learn from and optimize over the user's long-term signing, cancellation, and transaction behaviors, so as to obtain an optimal user recommendation policy. Reinforcement learning is well suited to the long-running problem addressed by the method: the model is updated continuously and learns the user's changing characteristics in real time, so the model does not degrade significantly.

Fig. 4 is a schematic diagram illustrating a module composition of a service recommendation device according to an embodiment of the present disclosure, and as shown in fig. 4, the device includes:

a data obtaining module 41, configured to obtain behavior data of a network behavior executed by a user and related to a fee deduction service; the fee withholding service is provided by a third-party payment platform; the network behavior comprises a service signing behavior and a service contract-resolving behavior;

a willingness determining module 42, configured to determine, according to the behavior data, a subscription willingness value of the user for the fee deduction service in a reinforcement learning manner;

a credit determination module 43 for determining a payment credit score of the user on the third party payment platform based on historical payment information of the user on the third party payment platform;

and a policy recommending module 44, configured to determine, according to the subscription willingness value and the payment credit score, a recommendation policy for recommending the fee withholding service to the user.

Optionally, the willingness determining module 42 is specifically configured to: determining user state information and user action information in the reinforcement learning Q-learning algorithm according to the behavior data, and initializing a reward matrix R and an action utility function matrix Q in the reinforcement learning Q-learning algorithm based on the user state information and the user action information; in the reward matrix R and the action utility function matrix Q, rows correspond to the user state information, and columns correspond to the user action information; each element value in the reward matrix R is an action reward value for the user to execute a corresponding action in a corresponding user state, and each element value in the action utility function matrix Q is an action willingness value for the user to execute the corresponding action in the corresponding user state; and updating the action utility function matrix Q according to the behavior data and the reward matrix R, and determining a subscription willingness value of the user for the fee deduction service according to the updated action utility function matrix Q.

Optionally, the willingness determining module 42 is further specifically configured to: constructing a state action sequence of the user according to the behavior data; the state action sequence comprises a plurality of state action combinations arranged in chronological order, and each state action combination is composed of the user state information and the user action information; the user action information in a state action combination represents the association between the user state information in that state action combination and the user state information in the next state action combination; and in the state action sequence of the user, sequentially acquiring each state action combination in chronological order, and after each acquisition, updating the element value corresponding to the acquired state action combination in the action utility function matrix Q based on the element value corresponding to the acquired state action combination in the reward matrix R.

Optionally, the willingness determining module 42 is further specifically configured to: determining the user state information corresponding to the current user state of the user in the updated action utility function matrix Q, and acquiring each piece of user action information and each element value corresponding to the user state information; and determining a subscription willingness value of the user for the fee withholding service according to each user action information and each element value corresponding to the user state information.

Optionally, the device further includes an information obtaining module, configured to: before determining the payment credit score of the user on the third-party payment platform, acquire account activity information of the user on the third-party payment platform; the credit determination module 43 is specifically configured to: calculating a payment credit score of the user on the third-party payment platform through a pre-trained payment credit score calculation model based on historical payment information of the user on the third-party payment platform and the account activity information; wherein the payment credit score calculation model is a neural network model.

Optionally, the historical payment information includes at least one of: the historical number of payments, the historical payment amount, the historical payment success rate for the enjoy-first-pay-later fee deduction service, and the historical payment default rate for the enjoy-first-pay-later fee deduction service; the account activity information includes at least one of: account login information, account age information, account balance information, account expenditure information, and account revenue information.

Optionally, the policy recommending module 44 is specifically configured to: judging whether the subscription willingness value is greater than a preset willingness threshold to obtain a first judgment result, and judging whether the payment credit score is greater than a preset credit threshold to obtain a second judgment result; and determining a recommendation policy for recommending the fee withholding service to the user according to the first judgment result and the second judgment result.

The service recommendation device in this embodiment of the present specification can implement the respective processes of the foregoing service recommendation method and achieve the same effects and functions, which are not repeated here.
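To make the module composition concrete, the following sketch wires four callables in the order of modules 41 to 44. It is only an illustrative arrangement: the class name, field names, and the stub implementations in the usage lines are hypothetical stand-ins for the real modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ServiceRecommendationDevice:
    data_obtaining: Callable[[str], Sequence[Any]]             # module 41
    willingness_determining: Callable[[Sequence[Any]], float]  # module 42
    credit_determining: Callable[[str], float]                 # module 43
    policy_recommending: Callable[[float, float], str]         # module 44

    def recommend(self, user_id: str) -> str:
        """Run the four modules in sequence for one user."""
        behavior_data = self.data_obtaining(user_id)
        willingness = self.willingness_determining(behavior_data)
        credit_score = self.credit_determining(user_id)
        return self.policy_recommending(willingness, credit_score)


# Stub modules with made-up values, for illustration only.
device = ServiceRecommendationDevice(
    data_obtaining=lambda user_id: ["sign", "cancel", "sign"],
    willingness_determining=lambda behavior: 160.0,
    credit_determining=lambda user_id: 80.0,
    policy_recommending=lambda w, c: "periodic" if w > 100 and c > 60 else "other",
)
print(device.recommend("u1"))
```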

Further, another embodiment of the present specification provides a service recommendation device. Fig. 5 is a schematic structural diagram of the service recommendation device provided in one embodiment of the present specification. As shown in Fig. 5, service recommendation devices may differ considerably depending on configuration or performance, and may include one or more processors 901 and a memory 902, in which one or more application programs or data may be stored. The memory 902 may be transient storage or persistent storage. The application program stored in the memory 902 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the service recommendation device. Still further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the service recommendation device, the series of computer-executable instructions in the memory 902. The service recommendation device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input-output interfaces 905, one or more keyboards 906, and the like.

In one particular embodiment, the service recommendation device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the service recommendation device, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:

Optionally, when executed, the computer-executable instructions determine, by means of reinforcement learning, a subscription willingness value of the user for a fee deduction service according to the behavior data, and include:

determining user state information and user action information in the reinforcement learning Q-learning algorithm according to the behavior data, and initializing a reward matrix R and an action utility function matrix Q in the reinforcement learning Q-learning algorithm based on the user state information and the user action information;

in the reward matrix R and the action utility function matrix Q, rows correspond to the user state information, and columns correspond to the user action information; each element value in the reward matrix R is an action reward value for the user to execute a corresponding action in a corresponding user state, and each element value in the action utility function matrix Q is an action willingness value for the user to execute the corresponding action in the corresponding user state;

and updating the action utility function matrix Q according to the behavior data and the reward matrix R, and determining a subscription willingness value of the user for the fee deduction service according to the updated action utility function matrix Q.
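The matrix layout described above (rows for user states, columns for user actions) can be sketched as follows. The state names, action names, and reward values are hypothetical, and initializing every element of Q to zero is a common convention assumed here rather than something stated in this passage.

```python
def init_matrices(states, actions, rewards):
    """Build the reward matrix R and the action utility function matrix Q.

    Rows are indexed by user state and columns by user action.
    """
    state_index = {s: i for i, s in enumerate(states)}
    action_index = {a: j for j, a in enumerate(actions)}
    R = [[0.0] * len(actions) for _ in states]
    Q = [[0.0] * len(actions) for _ in states]  # assumption: Q starts at zero
    for (state, action), value in rewards.items():
        R[state_index[state]][action_index[action]] = value
    return R, Q, state_index, action_index


# Hypothetical states, actions, and reward values.
states = ["not_signed", "signed"]
actions = ["sign", "cancel"]
rewards = {("not_signed", "sign"): 10.0, ("signed", "cancel"): -5.0}
R, Q, state_index, action_index = init_matrices(states, actions, rewards)
```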

Optionally, when executed, the computer-executable instructions update the action utility function matrix Q based on the behavior data and the reward matrix R, including:

constructing a state action sequence of the user according to the behavior data; the state action sequence comprises a plurality of state action combinations arranged in chronological order, and each state action combination is composed of the user state information and the user action information; the user action information in a state action combination represents the association between the user state information in that state action combination and the user state information in the next state action combination;

and in the state action sequence of the user, sequentially acquiring each state action combination according to a time sequence, and after each acquisition, updating the corresponding element value of the acquired state action combination in the action utility function matrix Q based on the corresponding element value of the acquired state action combination in the reward matrix R.
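The sequential update can be sketched with the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α·(R(s, a) + γ·max Q(s', ·) − Q(s, a)), where s' is the state of the next state action combination. Using this particular rule, and the learning rate `alpha` and discount factor `gamma` values below, are assumptions made for illustration; the matrices and the sequence are toy data.

```python
def update_q(Q, R, sequence, alpha=0.5, gamma=0.8):
    """Update Q in place from a time-ordered list of (state_index, action_index) pairs."""
    for t, (s, a) in enumerate(sequence):
        if t + 1 < len(sequence):
            next_state = sequence[t + 1][0]
            future = max(Q[next_state])  # best utility reachable from the next state
        else:
            future = 0.0  # the last combination has no successor
        Q[s][a] += alpha * (R[s][a] + gamma * future - Q[s][a])
    return Q


# Toy 2x2 example: state 0 takes action 0, leading to state 1, which takes action 1.
R = [[10.0, 0.0], [0.0, -5.0]]
Q = [[0.0, 0.0], [0.0, 0.0]]
update_q(Q, R, [(0, 0), (1, 1)])
print(Q)
```

Because each combination is processed as soon as it is acquired, the same function can be called again whenever new behavior data extends the user's sequence.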

Optionally, when executed, the computer-executable instructions determine, according to the updated action utility function matrix Q, a subscription willingness value of the user for a fee deduction service, including:

determining the user state information corresponding to the current user state of the user in the updated action utility function matrix Q, and acquiring each piece of user action information and each element value corresponding to the user state information;

and determining a subscription willingness value of the user for the fee withholding service according to each user action information and each element value corresponding to the user state information.

Optionally, when the computer-executable instructions are executed,

before determining the payment credit score of the user on the third-party payment platform, the method further includes: acquiring account activity information of the user on the third-party payment platform;

determining a payment credit score of the user on the third party payment platform, comprising:

calculating a payment credit score of the user on the third-party payment platform through a pre-trained payment credit score calculation model based on historical payment information of the user on the third-party payment platform and the account activity information;

wherein the payment credit score calculation model is a neural network model.

Optionally, when the computer-executable instructions are executed,

the historical payment information includes at least one of: the historical number of payments, the historical payment amount, the historical payment success rate for the enjoy-first-pay-later fee deduction service, and the historical payment default rate for the enjoy-first-pay-later fee deduction service;

the account activity information includes at least one of: account login information, account age information, account balance information, account expenditure information, and account revenue information.

Optionally, when executed, the computer-executable instructions determine a recommendation policy for recommending the fee withholding service to the user according to the subscription willingness value and the payment credit score, including:

judging whether the subscription willingness value is greater than a preset willingness threshold to obtain a first judgment result, and judging whether the payment credit score is greater than a preset credit threshold to obtain a second judgment result;

and determining a recommendation strategy for recommending the fee withholding service to the user according to the first judgment result and the second judgment result.

Further, another embodiment of the present specification provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored on the storage medium, when executed by a processor, implement the following processes:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine a subscription willingness value of the user for the fee deduction service in a reinforcement learning manner according to the behavior data, including:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, update the action utility function matrix Q based on the behavior data and the reward matrix R, including:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine, according to the updated action utility function matrix Q, a subscription willingness value of the user for the fee deduction service, including:

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor,

wherein the payment credit score calculation model is a neural network model.

Optionally, the storage medium stores computer-executable instructions that, when executed by the processor, determine a recommendation policy for recommending the fee withholding service to the user according to the subscription willingness value and the payment credit score, including:

The storage medium in this embodiment of the present specification is capable of realizing the respective processes of the aforementioned service recommendation method and achieving the same effects and functions, and will not be repeated here.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

In the 1990s, an improvement to a technology could be clearly classified as either an improvement in hardware (for example, an improvement in circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement in a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained by merely programming the method flow into an integrated circuit with a little logic programming in one of the hardware description languages described above.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Indeed, the means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner. For parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described briefly; for relevant details, reference may be made to the corresponding description of the method embodiment.

The above description is only an example of the present specification and is not intended to limit it. Various modifications and changes to the embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present specification shall fall within the scope of the claims.

Claims

1. A service recommendation method, comprising:

determining a subscription willingness value of the user for the fee withholding service in a reinforcement learning mode according to the behavior data; wherein the reinforcement learning mode comprises a reward matrix R and an action utility function matrix Q, the action utility function matrix Q is updated according to the behavior data and the reward matrix R, and the subscription willingness value is determined according to the action willingness value, in the updated action utility function matrix Q, of each user action executed by the user in the current user state;

2. The method of claim 1, wherein determining the subscription willingness value of the user for the fee withholding service in a reinforcement learning mode according to the behavior data comprises:

3. The method of claim 2, wherein updating the action utility function matrix Q according to the behavior data and the reward matrix R comprises:

4. The method of claim 2, wherein determining the subscription willingness value of the user for the fee withholding service according to the updated action utility function matrix Q comprises:

5. The method of claim 1, wherein:

wherein the payment credit score calculation model is a neural network model.

6. The method of claim 5, wherein:

7. The method of any one of claims 1 to 6, wherein determining a recommendation policy for recommending the fee withholding service to the user according to the subscription willingness value and the payment credit score comprises:

8. A service recommendation apparatus, comprising:

an intention determining module, configured to determine a subscription willingness value of the user for the fee withholding service in a reinforcement learning mode according to the behavior data; wherein the reinforcement learning mode comprises a reward matrix R and an action utility function matrix Q, the action utility function matrix Q is updated according to the behavior data and the reward matrix R, and the subscription willingness value is determined according to the action willingness value, in the updated action utility function matrix Q, of each user action executed by the user in the current user state;

9. The apparatus of claim 8, wherein the intention determining module is specifically configured to:

10. The apparatus of claim 8, further comprising an information acquisition module configured to:

before the payment credit score of the user on the third-party payment platform is determined, acquire account activity information of the user on the third-party payment platform;

the credit determination module is specifically configured to:

wherein the payment credit score calculation model is a neural network model.

11. The apparatus according to any one of claims 8 to 10, wherein the policy recommendation module is specifically configured to:

12. A service recommendation device comprising: a processor; and a memory arranged to store computer executable instructions which, when executed, cause the processor to carry out the steps of the service recommendation method of any of claims 1 to 7 above.

13. A storage medium storing computer-executable instructions which, when executed, implement the steps of the service recommendation method of any of claims 1 to 7 above.
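
By way of non-limiting illustration, the update of the action utility function matrix Q from the behavior data and the reward matrix R recited in claims 1 to 4 may be sketched as follows. The sketch assumes the standard tabular Q-learning update rule. The state names, the action names, the learning rate, the discount factor, the reward values, and the softmax used to turn one row of Q into a willingness value are hypothetical choices made for illustration and are not taken from the present specification.

```python
import numpy as np

# Hypothetical user states and user actions; the claims do not enumerate them.
STATES = ["unsigned", "signed", "cancelled"]
ACTIONS = ["sign", "cancel", "stay"]


def update_q(q, r, transitions, alpha=0.5, gamma=0.8):
    """Return a copy of Q updated with the standard tabular Q-learning rule.

    q           -- action utility function matrix, shape (n_states, n_actions)
    r           -- reward matrix of the same shape
    transitions -- observed (state, action, next_state) index triples,
                   standing in for the user's behavior data
    alpha/gamma -- learning rate and discount factor (assumed values)
    """
    q = q.copy()
    for s, a, s_next in transitions:
        target = r[s, a] + gamma * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])
    return q


def willingness(q, state, sign_action=0):
    """Assumed aggregation: softmax over the current state's row of Q,
    returning the share assigned to the signing action."""
    row = q[state]
    weights = np.exp(row - row.max())  # shift by the max for numerical stability
    return float(weights[sign_action] / weights.sum())


# Hypothetical rewards: signing is rewarded, cancelling is penalized.
R = np.zeros((len(STATES), len(ACTIONS)))
R[0, 0] = 1.0   # "unsigned" + "sign"
R[1, 1] = -1.0  # "signed" + "cancel"

Q = np.zeros_like(R)
# One observed behavior: an unsigned user signs and moves to "signed".
Q = update_q(Q, R, [(0, 0, 1)])
value = willingness(Q, state=0)
```

With an all-zero Q, every action in a state receives the same share, so the value for a state with three actions is 1/3. After the single signing behavior above, the entry for ("unsigned", "sign") rises to 0.5 and the willingness value for the "unsigned" state rises above 1/3. How such a value is combined with the payment credit score to select a recommendation policy is left to the embodiments described above.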