CN109711871B - Potential customer determination method, device, server and readable storage medium

Potential customer determination method, device, server and readable storage medium

Info

Publication number
CN109711871B
CN109711871B
Authority
CN
China
Prior art keywords
action
user
product
information
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811526942.8A
Other languages
Chinese (zh)
Other versions
CN109711871A (en)
Inventor
盛名扬
陆子龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201811526942.8A priority Critical patent/CN109711871B/en
Publication of CN109711871A publication Critical patent/CN109711871A/en
Application granted granted Critical
Publication of CN109711871B publication Critical patent/CN109711871B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a potential customer determination method, apparatus, server and readable storage medium. The method comprises the following steps: obtaining state information of a product in a platform, the state information including a first profit value brought to the platform by the product, a second profit value brought to platform users of the platform by the product, and a third profit value brought by the product to the product users who use it; inputting the state information and a plurality of pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; and determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model. In this way, potential customers can be determined through the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost.

Description

Potential customer determination method, device, server and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, a server, and a readable storage medium for determining a potential customer.
Background
To better promote products (such as a fans-headline promotion product) in a live broadcast platform, the potential customers of a product often need to be determined, so that those users can be attracted, in a targeted manner, to use the product. A potential user generally refers to a customer who intends to purchase the product but has not yet become a user of the product.
Currently, potential customers of a product are determined manually: a staff member judges which users are potential customers based on past experience. However, this approach demands considerable experience from the staff, and the staff must analyze a large number of users to identify potential customers. In other words, this manner of identifying potential customers is inefficient and requires significant labor cost.
Disclosure of Invention
To overcome the problems in the related art, the present application provides a potential customer determination method, apparatus, server, and readable storage medium, so as to improve the efficiency of determining potential customers and reduce the labor cost.
According to a first aspect of embodiments of the present application, there is provided a potential user determination method, including:
obtaining status information of a product in the platform; the state information includes: a first profit value brought to the platform by the product, a second profit value brought to a platform user of the platform by the product and a third profit value brought to a product user using the product by the product;
inputting the state information and a plurality of pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information at least comprises: feature information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a forgoing-order action; and determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model.
Optionally, in an embodiment of the present application, the deep reinforcement learning model includes a deep Q network model.
Optionally, in this embodiment of the application, before the step of inputting the state information and the plurality of pieces of action information of the user to be analyzed into the pre-trained deep reinforcement learning model, the method may further include:
constructing a Markov decision process model; wherein the Markov decision process model is: { S, A, R, T }; s represents the state information of the product, A represents the action information of the platform user for the action executed by the product, R represents the reward function, and T represents the state transition function;
obtaining a plurality of training samples based on the Markov decision process model; wherein each training sample comprises: historical state information of the product, action information of an action performed on the product by a target user among the platform users under that state information, an instant reward value obtained after the target user performs the target action in the action information, and next state information to which the state information transitions after the target action is performed; the target action is: an ordering action or a forgoing-order action;
optimizing parameters of the initial Q function by using the training sample to obtain a trained deep Q network model; the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully-connected layers; the parameters include: learning rate, discount factor, and Q value.
Optionally, in this embodiment of the present application, optimizing parameters of the initial Q function by using a training sample, to obtain a trained deep Q network model includes:
optimizing the parameters of the initial Q function by using the training samples and a greedy algorithm, namely the ε-greedy algorithm, to obtain the trained deep Q network model.
Optionally, in this embodiment of the application, the instant reward value output by the reward function is defined as: the value corresponding to the ordering action = first positive number × the profit value added to the platform + second positive number × the profit value added to the platform users + third positive number × the profit value added to the user to be analyzed; the value corresponding to the forgoing-order action = a first negative number; and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-order action.
Optionally, in this embodiment of the present application, the identification of the ordering action for the product includes: one or more of a first identification to perform a subscription action based on the user recommendation, a second identification to perform the subscription action based on the privacy recommendation, a third identification to perform the subscription action based on the coupon activity, and a fourth identification to perform the subscription action through a subscription portal in the platform.
Optionally, in this embodiment of the present application, the feature information of the user to be analyzed includes:
one or more of the account information, the number of fans, the number of live broadcast works and the preferred work types of the user to be analyzed.
According to a second aspect of embodiments of the present application, there is provided a potential user determination apparatus, the apparatus comprising:
a first obtaining module configured to obtain status information of a product in a platform; the state information includes: a first profit value brought to the platform by the product, a second profit value brought to a platform user of the platform by the product and a third profit value brought to a product user using the product by the product;
an input module configured to input the state information and the plurality of pieces of action information of the user to be analyzed into the pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information at least comprises: feature information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a forgoing-order action;
and the determining module is configured to determine whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Optionally, in an embodiment of the present application, the deep reinforcement learning model includes a deep Q network model.
Optionally, in an embodiment of the present application, the apparatus further includes:
a building module configured to build a Markov decision process model before the state information and the plurality of pieces of action information of the user to be analyzed are input into the pre-trained deep reinforcement learning model; wherein the Markov decision process model is: {S, A, R, T}; S represents the state information of the product, A represents action information of actions performed on the product by the platform users, R represents the reward function, and T represents the state transition function;
a second obtaining module configured to obtain a plurality of training samples based on the Markov decision process model; wherein each training sample comprises: historical state information of the product, action information of an action performed on the product by a target user among the platform users under that state information, an instant reward value obtained after the target user performs the target action in the action information, and next state information to which the state information transitions after the target action is performed; the target action is: an ordering action or a forgoing-order action;
the optimization module is configured to optimize parameters of the initial Q function by using the training samples to obtain a trained deep Q network model; the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully-connected layers; the parameters include: learning rate, discount factor, and Q value.
Optionally, in this embodiment of the present application, the optimization module is specifically configured to:
optimizing the parameters of the initial Q function by using the training samples and a greedy algorithm, namely the ε-greedy algorithm, to obtain the trained deep Q network model.
Optionally, in this embodiment of the application, the instant reward value output by the reward function is defined as: the value corresponding to the ordering action = first positive number × the profit value added to the platform + second positive number × the profit value added to the platform users + third positive number × the profit value added to the user to be analyzed; the value corresponding to the forgoing-order action = a first negative number; and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-order action.
Optionally, in this embodiment of the present application, the identification of the ordering action for the product includes: one or more of a first identification to perform a subscription action based on the user recommendation, a second identification to perform the subscription action based on the privacy recommendation, a third identification to perform the subscription action based on the coupon activity, and a fourth identification to perform the subscription action through a subscription portal in the platform.
Optionally, in this embodiment of the present application, the feature information of the user to be analyzed includes:
one or more of the account information, the number of fans, the number of live broadcast works and the preferred work types of the user to be analyzed.
According to a third aspect of embodiments of the present application, there is provided a server, including:
a processor, a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method steps of any one of the potential user determination methods of the first aspect described above.
According to a fourth aspect of embodiments herein, there is provided a readable storage medium, wherein instructions, when executed by a processor of a server, enable the server to perform the method steps of any one of the potential user determination methods of the first aspect.
According to a fifth aspect of embodiments herein, there is provided a computer program product which, when run on a server, causes the server to perform: method steps of any of the potential user determination methods of the first aspect described above.
In an embodiment of the present application, a server may obtain status information of products in a platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method of potential user determination according to an example embodiment.
FIG. 2 is a flow chart illustrating a calculation of Q according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating the structure of a deep neural network in accordance with an exemplary embodiment.
Fig. 4 is a block diagram illustrating a potential user determination device in accordance with an example embodiment.
FIG. 5 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
In order to solve the problems that the method for determining the potential customer is inefficient and requires a large amount of labor cost in the prior art, embodiments of the present application provide a potential customer determination method, apparatus, server, and computer-readable storage medium.
The following first explains a potential customer determination method provided in an embodiment of the present application.
FIG. 1 is a flow diagram illustrating a method of potential customer determination in accordance with an exemplary embodiment. The potential customer determination method is applied to a server, and as shown in fig. 1, the method comprises the following steps:
s101: obtaining status information of a product in the platform; the state information includes: a first profit value brought to the platform by the product, a second profit value brought to a platform user of the platform by the product and a third profit value brought to a product user using the product by the product;
the platform in the embodiment of the application can be a live broadcast platform, and the product can be a vermicelli product in the live broadcast platform, but is not limited to the vermicelli product.
It will be appreciated that, in one implementation, the product status information obtained by the server may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product.
The first profit value brought to the platform by the product can be calculated from the number of product users and the product price. For example, the first profit value = the number of product users × the product price.
The second profit value brought to the platform users by the product can be obtained as follows: the server counts the increase in the duration for which the platform users use the platform, and then quantifies that increase as the second profit value that the product brings to the platform users. For example, the second profit value = the increase in platform-usage duration × a first profit coefficient, where the first profit coefficient may represent the potential profit value brought by each unit of increased duration.
The third profit value brought by the product to the product users who use it can be calculated as follows: the server counts the fan increase of the product user and the number of newly added works of the product user, and then quantifies the fan increase and the number of newly added works as the third profit value that the product brings to the product user. For example, the third profit value = the fan increase of the product user × a second profit coefficient + the number of newly added works of the product user × a third profit coefficient, where the second profit coefficient may represent the potential profit value brought by each added fan, and the third profit coefficient may represent the potential profit value brought by each added work.
In another implementation, in addition to the above, the state information may further include one or more of: the total number of platform users, the order distribution of the product (e.g., the number of users who have purchased 10 orders of the product and the number of users who have purchased 20 orders of the product), and the total delivered volume of the product (e.g., the total exposure of the product in live broadcast works).
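As a concrete illustration of how such state information might be assembled, the following is a minimal Python sketch; the field names, helper parameters and coefficients are assumptions made for illustration only and are not prescribed by this embodiment.

```python
from dataclasses import dataclass

@dataclass
class ProductState:
    """State information of the product, as described above."""
    platform_profit: float        # first profit value brought to the platform
    platform_user_profit: float   # second profit value brought to the platform users
    product_user_profit: float    # third profit value brought to the product users

def build_product_state(num_product_users: int, product_price: float,
                        usage_duration_increase: float, fan_increase: int,
                        new_works: int, c1: float, c2: float, c3: float) -> ProductState:
    # first profit value = number of product users x product price
    platform_profit = num_product_users * product_price
    # second profit value = increase in platform-usage duration x first profit coefficient
    platform_user_profit = usage_duration_increase * c1
    # third profit value = fan increase x second coefficient + new works x third coefficient
    product_user_profit = fan_increase * c2 + new_works * c3
    return ProductState(platform_profit, platform_user_profit, product_user_profit)
```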
S102: inputting the state information and a plurality of pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information at least comprises: feature information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a forgoing-order action;
the deep reinforcement learning model includes a deep Q network model (i.e., DQN model), but is not limited thereto.
The feature information of the user to be analyzed may include: one or more of the account information, the number of fans, the number of live broadcast works and the preferred work types of the user to be analyzed on the platform. Of course, it is also reasonable for the feature information to include embedding features obtained by pre-training.
Also, in one implementation, the action identification may be an identification of an order action for the product or an identification of a forgoing order action. In this implementation, the identification of the ordering action for the product corresponds to one action information, and the identification of the forgoing ordering action for the product corresponds to another action information. In this way, after the state information of the product and the two pieces of action information are input into the trained deep reinforcement learning model, the estimation value of the long-term feedback corresponding to each piece of action information can be output.
In another implementation, the identification of the ordering action for the product may further include one or more of: a first identification of performing the ordering action based on a user recommendation, a second identification of performing the ordering action based on a privacy recommendation, a third identification of performing the ordering action based on a coupon activity, and a fourth identification of performing the ordering action through an ordering portal in the platform. When the identification of the ordering action for the product includes the first, second, third and fourth identifications, each of these four identifications corresponds to one piece of action information, and the identification of the forgoing-order action for the product corresponds to one further piece of action information, giving five pieces of action information in total. Then, after the state information of the product and the five pieces of action information are input into the trained deep reinforcement learning model, the model may output an estimated long-term feedback value corresponding to each piece of action information.
When the deep reinforcement learning model is the DQN model, as shown in fig. 2, after the state information of the product and the five pieces of action information are input into the trained DQN model, the DQN model may output a Q value corresponding to each piece of action information, i.e., Q1, Q2, Q3, Q4 and Q5.
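For illustration, building the candidate pieces of action information and collecting a Q value for each can be sketched as follows. The action names, identifier values and the assumption that the trained model is callable as q_model(state, action_info) are illustrative conventions, not part of this embodiment.

```python
def build_action_infos(user_features: dict) -> list:
    """Build the candidate pieces of action information for one user to be analyzed.
    The identifiers 0-4 (four ordering channels plus the forgoing-order action) are
    assumed values for illustration."""
    action_ids = {
        "order_via_user_recommendation": 0,
        "order_via_privacy_recommendation": 1,
        "order_via_coupon_activity": 2,
        "order_via_platform_portal": 3,
        "forgo_order": 4,
    }
    return [{"user_features": user_features, "action_id": aid, "action_name": name}
            for name, aid in action_ids.items()]

def score_actions(q_model, state, action_infos) -> dict:
    """Feed the product state and each piece of action information to the trained model
    and collect the estimated long-term feedback (Q1..Q5) per action."""
    return {info["action_name"]: q_model(state, info) for info in action_infos}
```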
In addition, it can be understood that the server may build a Markov decision process model before performing step S102, and a plurality of training samples may then be obtained based on the Markov decision process model.
When the constructed Markov decision process model is {S, A, R, T}, each training sample includes: historical state information of the product, action information of an action performed on the product by a target user among the platform users under that state information, an instant reward value obtained after the target user performs the target action in the action information, and next state information to which the state information transitions after the target action is performed. The target action is: an ordering action or a forgoing-order action.
The historical status information may be set according to the setting mode of the status information of the product in step S101, which is not described herein again.
In addition, R = R(s, a, s′), where R denotes the instant reward value obtained when action a is performed in the state corresponding to state information s and the state transitions to the state corresponding to state information s′. T = T(s, a, s′), where T denotes the probability of transitioning to state s′ when action a is performed in state s. In addition, according to the deep reinforcement learning related art, the transition from the state corresponding to state information s is determined by the action taken under that state information.
In one example of the present application, the instant reward value output by the reward function may be defined as follows: the value corresponding to the ordering action = first positive number × the profit value added to the platform + second positive number × the profit value added to the platform users + third positive number × the profit value added to the user to be analyzed; the value corresponding to the forgoing-order action = a first negative number; and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-order action. Of course, the design of the reward function is not limited thereto. The values of the first positive number, the second positive number, the third positive number and the first negative number may be set according to actual conditions, and are not specifically limited herein.
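A minimal sketch of such a reward function follows; the weight and penalty defaults, and the identifier of the forgoing-order action, are placeholders (assumptions) to be set according to actual conditions.

```python
def instant_reward(action_id: int, platform_gain: float, platform_user_gain: float,
                   analyzed_user_gain: float,
                   w1: float = 1.0, w2: float = 1.0, w3: float = 1.0,
                   penalty: float = -1.0) -> float:
    """Instant reward as described above: a weighted sum of the added profit values
    for an ordering action, and a first negative number for the forgoing-order action."""
    FORGO_ORDER = 4  # assumed identifier of the forgoing-order action
    if action_id == FORGO_ORDER:
        return penalty
    return w1 * platform_gain + w2 * platform_user_gain + w3 * analyzed_user_gain
```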
In addition, when the deep reinforcement learning model to be trained is the DQN model, after the training samples are obtained, the server can also optimize the parameters of the initial Q function by using the training samples to obtain the trained DQN model. The deep neural network corresponding to the initial Q function may be composed of two convolutional layers and two fully-connected layers shown in fig. 3. The parameters include: learning rate, discount factor, and Q value. The DQN model obtained by training stores the learned knowledge, and can be used as a mapping relation between state information and optimal actions.
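For illustration only, a deep neural network with two convolutional layers and two fully-connected layers that takes the concatenated state information and action information as input and outputs a scalar Q value could be sketched as below; the layer widths, kernel sizes and the use of PyTorch are assumptions, not part of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Q(state, action information) -> scalar, built from two conv layers and two FC layers."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(8, 16, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(16 * input_dim, 64)
        self.fc2 = nn.Linear(64, 1)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim); action: (batch, action_dim) = user features + action id
        x = torch.cat([state, action], dim=1).unsqueeze(1)  # (batch, 1, input_dim)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc1(x.flatten(1)))
        return self.fc2(x).squeeze(-1)                      # (batch,) Q values
```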
Specifically, Q(S, A) may be defined as the Q value in the original state, Q(S′, a) as the Q value after S is transformed into S′ by the effect of action a, and W as the forward propagation of the deep neural network; then:
Q(S′, a) = W(S, a, feature information of the user to be analyzed)
When the network W receives the original state S, the action a and the feature information of the user to be analyzed as input, the model optimization function is:
Q(S, A) ← Q(S, A) + α[R + γ max_a Q(S′, a) − Q(S, A)];
S ← S′;
The above steps are iterated in a loop until convergence.
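A minimal training step mirroring the update rule above could look as follows. Here the learning rate α is handled by the optimizer, q_net is any module mapping (state, action information) to a scalar Q value (for instance the QNet sketched earlier), and the batch layout and the default discount factor are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, optimizer, batch, gamma: float = 0.9) -> float:
    """One optimization step on a batch of training samples (S, A, R, S')
    drawn from the Markov decision process described above."""
    s, a, r, s_next = batch["state"], batch["action"], batch["reward"], batch["next_state"]
    q_sa = q_net(s, a)                                   # Q(S, A)
    with torch.no_grad():
        # max over the candidate actions a available in the next state
        q_next = torch.stack([q_net(s_next, cand) for cand in batch["candidate_actions"]])
        target = r + gamma * q_next.max(dim=0).values    # R + gamma * max_a Q(S', a)
    loss = F.smooth_l1_loss(q_sa, target)                # temporal-difference error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```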
In this strategy, it is required to solve for the action a that maximizes Q(S′, a); here a greedy algorithm, the ε-greedy algorithm, is used:
a = argmax_a Q(a), with probability 1 − ε;
a = a randomly selected action, with probability ε;
wherein the algorithm can achieve the explore-exploit balance by adjusting the probability threshold ε.
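The selection rule above can be sketched directly; this is a minimal illustration that assumes the Q values per candidate action have already been computed (e.g., by score_actions above).

```python
import random

def epsilon_greedy(q_values: dict, epsilon: float = 0.1) -> str:
    """Pick the action with the largest Q value with probability 1 - epsilon,
    otherwise pick a random action (exploration) with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(q_values))
    return max(q_values, key=q_values.get)
```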
The initial Q function is a function in the DQN-related technique, and the learning rate, the discount factor and the Q value are parameters in the DQN-related technique, which are not described in detail herein.
In addition, after the trained DQN model is obtained, the new training sample can be used for carrying out parameter fine tuning on the DQN model, so that the DQN model is updated. It is reasonable that the update cycle (e.g. 1 week) of the DQN model can be adjusted according to specific requirements, so that the DQN model has better extensibility and robustness, and thus the DQN model can more accurately determine whether a user is a potential customer.
S103: and determining whether the user to be analyzed is a potential user of the product or not according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
When the action corresponding to the maximum estimated value output by the deep reinforcement learning model is an ordering action, the user to be analyzed can be determined to be a potential user, and product recommendation information (such as an advertisement) can be sent to the user to be analyzed, so that the potential user is converted into an actual user and the obtained estimated long-term feedback value is maximized. Furthermore, when the action corresponding to the maximum estimated value is the ordering action corresponding to the third identification, namely performing the ordering action based on a coupon activity, coupon-activity information can be sent to the user to be analyzed, so that the advertisement accurately reaches the user and the conversion rate from potential user to actual user is improved. In addition, when the action corresponding to the maximum estimated value output by the deep reinforcement learning model is the forgoing-order action, it can be determined that the user to be analyzed is not a potential user.
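Putting the inference step together, a minimal decision sketch could be written as below; the action names, the example Q values and the recommendation dispatch are illustrative assumptions, not prescribed values.

```python
def decide_potential_user(q_values: dict) -> tuple:
    """Return (is_potential_user, best_action): the user is treated as a potential user
    when the action with the maximum estimated long-term feedback is an ordering action
    rather than the forgoing-order action."""
    best_action = max(q_values, key=q_values.get)
    return best_action != "forgo_order", best_action

# Example usage with scores produced by score_actions(...) above (values are made up):
q_values = {"order_via_user_recommendation": 0.8, "order_via_privacy_recommendation": 0.5,
            "order_via_coupon_activity": 1.2, "order_via_platform_portal": 0.7,
            "forgo_order": -0.3}
is_potential, best_action = decide_potential_user(q_values)
if is_potential and best_action == "order_via_coupon_activity":
    pass  # e.g., send coupon-activity information to the user to be analyzed
```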
In addition, the estimated long-term feedback value corresponding to a piece of action information is the estimated long-term feedback obtained after the action corresponding to that action information is performed; therefore, the larger the estimated value, the better it matches the goal to be achieved: maximizing the total revenue of the platform, the platform users and the product users.
Moreover, the deep reinforcement learning model not only optimizes the short-term click benefit (i.e. the instant reward value), but also captures the long-term benefit index (i.e. the estimated value of the long-term feedback). Therefore, by applying the potential user determination method provided by the embodiment of the application, the ordering behavior can bring the improvement of the long-term income index.
In an embodiment of the present application, a server may obtain status information of products in a platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
In conclusion, by applying the potential user determination method provided by the embodiment of the application, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, potential users can be determined under the condition that the benefits of the platform, platform users and product users are guaranteed.
Corresponding to the foregoing method embodiment, an embodiment of the present application further provides a potential user determining apparatus, and referring to fig. 4, the apparatus includes:
a first obtaining module 401 configured to obtain status information of a product in a platform; the state information includes: a first profit value brought to the platform by the product, a second profit value brought to a platform user of the platform by the product and a third profit value brought to a product user using the product by the product;
an input module 402 configured to input the state information and the plurality of pieces of action information of the user to be analyzed into the pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information at least comprises: feature information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a forgoing-order action;
and the determining module 403 is configured to determine whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
By applying the device provided by the embodiment of the application, the server can obtain the state information of the product in the platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
Optionally, in an embodiment of the present application, the deep reinforcement learning model includes a deep Q network model.
Optionally, in an embodiment of the present application, the apparatus further includes:
a building module configured to build a Markov decision process model before the state information and the plurality of pieces of action information of the user to be analyzed are input into the pre-trained deep reinforcement learning model; wherein the Markov decision process model is: {S, A, R, T}; S represents the state information of the product, A represents action information of actions performed on the product by the platform users, R represents the reward function, and T represents the state transition function;
a second obtaining module configured to obtain a plurality of training samples based on the Markov decision process model; wherein each training sample comprises: historical state information of the product, action information of an action performed on the product by a target user among the platform users under that state information, an instant reward value obtained after the target user performs the target action in the action information, and next state information to which the state information transitions after the target action is performed; the target action is: an ordering action or a forgoing-order action;
the optimization module is configured to optimize parameters of the initial Q function by using the training samples to obtain a trained deep Q network model; the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully-connected layers; the parameters include: learning rate, discount factor, and Q value.
Optionally, in this embodiment of the present application, the optimization module is specifically configured to:
optimizing the parameters of the initial Q function by using the training samples and a greedy algorithm, namely the ε-greedy algorithm, to obtain the trained deep Q network model.
Optionally, in this embodiment of the application, the instant reward value output by the reward function is defined as: the value corresponding to the ordering action = first positive number × the profit value added to the platform + second positive number × the profit value added to the platform users + third positive number × the profit value added to the user to be analyzed; the value corresponding to the forgoing-order action = a first negative number; and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-order action.
Optionally, in this embodiment of the present application, the identification of the ordering action for the product includes: one or more of a first identification to perform a subscription action based on the user recommendation, a second identification to perform the subscription action based on the privacy recommendation, a third identification to perform the subscription action based on the coupon activity, and a fourth identification to perform the subscription action through a subscription portal in the platform.
Optionally, in this embodiment of the present application, the feature information of the user to be analyzed includes:
one or more of the account information, the number of fans, the number of live broadcast works and the preferred work types of the user to be analyzed.
Fig. 5 is a block diagram illustrating an apparatus 1900 for implementing the determination of potential users in accordance with an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 5, the device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform method steps of any of the potential user determination methods described above.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an embodiment of the present application, a server may obtain status information of products in a platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
Corresponding to the above method embodiment, the present application further provides a readable storage medium, and when executed by a processor of a server, the instructions in the storage medium enable the server to perform the method steps of any one of the above potential user determination methods. Wherein the readable storage medium is a computer readable storage medium.
After the computer program stored in the readable storage medium provided by the embodiment of the application is executed by the processor of the server, the server can obtain the state information of the product in the platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
Corresponding to the above method embodiment, this application embodiment also provides a computer program product, which, when run on a server, causes the server to perform the method steps of any one of the above potential user determination methods.
After the computer program product provided by the embodiment of the application is executed by the processor of the server, the server can obtain the state information of the product in the platform. The status information may include: the first profit value brought to the platform by the product, the second profit value brought to the platform user by the product and the third profit value brought to the product user using the product by the product. Then, the state information and the plurality of action information of the user to be analyzed are input into a deep reinforcement learning model obtained through pre-training, and an estimated value of long-term feedback corresponding to each action information is obtained. Wherein each action information at least comprises: the characteristic information of the user to be analyzed and an action identifier, wherein the action identifier is an identifier of an ordering action for the product or an identifier of a quitting ordering action. And then, determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
Because the deep reinforcement learning model can establish the optimal mapping relation between the state information and the action information, the server can determine the optimal action corresponding to the current state information of the product through the deep reinforcement learning model, namely, the optimal action of the user to be analyzed on the product can be determined through the deep reinforcement learning model. Further, when the action is determined to be a subscription action, then the user to be analyzed may be determined to be a potential user. In this way, the potential customers can be determined through the deep reinforcement learning model, so that the efficiency of determining the potential customers is improved, and the labor cost can be reduced. Moreover, the potential user determining method can determine the potential user under the condition of ensuring the benefits of the platform, the platform user and the product user.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, server, computer-readable storage medium, and computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for related matters, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for potential user determination, the method comprising:
obtaining status information of a product in the platform; the state information includes: a first profit value brought to the platform by the product, a second profit value brought to a platform user of the platform by the product, and a third profit value brought to a product user using the product by the product;
inputting the state information and a plurality of pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information at least comprises: feature information of the user to be analyzed and an action identifier, the action identifier being the identifier of an ordering action for the product or the identifier of a forgoing-order action, wherein the pre-trained deep reinforcement learning model is: a trained deep Q network model obtained by optimizing parameters of an initial Q function using training samples, wherein each training sample comprises historical state information of the product, action information of an action performed on the product by a target user among the platform users under that state information, an instant reward value obtained after the target user performs the target action in the action information, and next state information to which the state information transitions after the target action is performed, and the parameters comprise: learning rate, discount factor, and Q value;
and determining whether the user to be analyzed is a potential user of the product according to the action corresponding to the maximum estimation value output by the deep reinforcement learning model.
2. The method of claim 1, wherein the deep reinforcement learning model comprises a deep Q network model.
3. The method of claim 2, wherein prior to the step of inputting the state information and the plurality of motion information of the user to be analyzed into a pre-trained deep reinforcement learning model, the method further comprises:
constructing a Markov decision process model, wherein the Markov decision process model is {S, A, R, T}, S representing state information of the product, A representing action information of actions performed by the platform users on the product, R representing a reward function, and T representing a state transition function;
obtaining a plurality of training samples based on the Markov decision process model; wherein each training sample comprises: historical state information of the product, action information of an action performed on the product, under that state information, by a target user among the platform users, an instant reward value obtained after the target user performs a target action in the action information, and next state information corresponding to that state information after the target action is performed; the target action being an ordering action or a forgoing-ordering action;
optimizing parameters of the initial Q function by using the training samples to obtain the trained deep Q network model; wherein the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters comprise: a learning rate, a discount factor, and a Q value.
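A rough sketch of the training step in claim 3, written in PyTorch as an assumption (the claims do not name a framework). For brevity, a small fully connected network stands in for the claimed two-convolutional-layer, two-fully-connected-layer architecture; the learning rate enters through the optimizer, the discount factor through `gamma`, and `candidate_actions` (a hypothetical name) is the set of possible action-information vectors.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Stand-in for the deep Q network: maps a flat (state, action information)
    vector to one estimated long-term feedback value."""
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def dqn_update(q_net, optimizer, batch, candidate_actions, gamma=0.9):
    """One optimization step over training samples
    (historical state, action information, instant reward, next state).

    candidate_actions: list of 1-D tensors, one per possible action information.
    """
    states, actions, rewards, next_states = batch            # batch-first tensors
    q_pred = q_net(torch.cat([states, actions], dim=1))      # Q(s, a)
    with torch.no_grad():
        # Bellman target: instant reward + discounted best value in the next state.
        next_q = torch.stack(
            [q_net(torch.cat([next_states, a.expand(len(next_states), -1)], dim=1))
             for a in candidate_actions], dim=1)
        target = rewards + gamma * next_q.max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The learning rate is a parameter of the optimizer, e.g.:
# q_net = QNet(input_dim=state_dim + action_dim)
# optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
```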
4. The method of claim 3, wherein optimizing the parameters of the initial Q function by using the training samples to obtain the trained deep Q network model comprises:
optimizing the parameters of the initial Q function by using the training samples and an epsilon-greedy algorithm to obtain the trained deep Q network model.
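A minimal sketch of the epsilon-greedy selection referred to in claim 4; the value of `epsilon` here is an assumption, not specified by the claims.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise exploit the
    action whose estimated Q value is largest.

    q_values: dict mapping action identifier -> estimated Q value.
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))
    return max(q_values, key=q_values.get)
```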
5. The method according to claim 3, wherein the instant reward value output by the reward function is: (the value corresponding to the ordering action) × (a first positive number × the benefit value added to the platform + a second positive number × the benefit value added to the platform user + a third positive number × the benefit value added to the user to be analyzed) + (the value corresponding to the forgoing-ordering action) × (a first negative number); and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-ordering action.
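Reading claim 5 with the arithmetic made explicit, the reward could be sketched as below; the weights `c1`, `c2`, `c3` and `penalty` stand for the claimed first, second, and third positive numbers and the first negative number, and their default values here are placeholders.

```python
def instant_reward(ordered, platform_gain, platform_user_gain, user_gain,
                   c1=1.0, c2=1.0, c3=1.0, penalty=-1.0):
    """Instant reward in the form described by claim 5.

    ordered : 1 if the target action is the ordering action, 0 if it is forgone
    *_gain  : benefit value added to the platform, to the platform user, and to
              the user to be analyzed, respectively
    """
    forgone = 1 - ordered  # value corresponding to the forgoing-ordering action
    return ordered * (c1 * platform_gain + c2 * platform_user_gain + c3 * user_gain) \
        + forgone * penalty
```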
6. The method of claim 1, wherein the identifier of the ordering action for the product comprises one or more of: a first identifier of an ordering action performed based on a user recommendation, a second identifier of an ordering action performed based on a privacy recommendation, a third identifier of an ordering action performed based on a coupon activity, and a fourth identifier of an ordering action performed through an ordering entry in the platform.
7. The method according to any one of claims 1-6, wherein the characteristic information of the user to be analyzed comprises:
one or more of: account information of the user to be analyzed, a number of fans, a number of live-broadcast works, and a preferred work type.
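For concreteness only, one possible layout of the action information described in claims 6 and 7; all field names are illustrative assumptions, not taken from the claims.

```python
from dataclasses import dataclass

@dataclass
class ActionInfo:
    """Action information: characteristic information of the user to be analyzed
    plus an action identifier."""
    account_info: str          # account information
    fan_count: int             # number of fans
    live_work_count: int       # number of live-broadcast works
    preferred_work_type: str   # preferred type of works
    action_id: int             # e.g. one of the four claimed ordering-action
                               # identifiers, or an identifier for forgoing the order
```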
8. A potential user determination apparatus, the apparatus comprising:
a first obtaining module configured to obtain state information of a product in a platform; wherein the state information comprises: a first profit value brought to the platform by the product, a second profit value brought by the product to a platform user of the platform, and a third profit value brought by the product to a product user using the product;
an input module configured to input the state information and a plurality of pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated long-term feedback value corresponding to each piece of action information; each piece of action information comprises at least characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an ordering action for the product or an identifier of a forgoing-ordering action; wherein the pre-trained deep reinforcement learning model is a deep Q network model obtained by optimizing parameters of an initial Q function with training samples, each training sample comprising historical state information of the product, action information of an action performed on the product, under that state information, by a target user among the platform users, an instant reward value obtained after the target user performs a target action in the action information, and next state information corresponding to that state information after the target action is performed, and the parameters comprising: a learning rate, a discount factor, and a Q value;
and a determining module configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
9. The apparatus of claim 8, wherein the deep reinforcement learning model comprises a deep Q network model.
10. The apparatus of claim 9, further comprising:
a building module configured to construct a Markov decision process model before the state information and the plurality of pieces of action information of the user to be analyzed are input into the pre-trained deep reinforcement learning model; wherein the Markov decision process model is {S, A, R, T}, S representing state information of the product, A representing action information of actions performed by the platform users on the product, R representing a reward function, and T representing a state transition function;
a second obtaining module configured to obtain a plurality of training samples based on the Markov decision process model; wherein each training sample comprises: historical state information of the product, action information of an action performed on the product, under that state information, by a target user among the platform users, an instant reward value obtained after the target user performs a target action in the action information, and next state information corresponding to that state information after the target action is performed; the target action being an ordering action or a forgoing-ordering action;
an optimization module configured to optimize parameters of the initial Q function by using the training samples to obtain the trained deep Q network model; wherein the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters comprise: a learning rate, a discount factor, and a Q value.
11. The apparatus of claim 10, wherein the optimization module is specifically configured to:
optimize the parameters of the initial Q function by using the training samples and an epsilon-greedy algorithm to obtain the trained deep Q network model.
12. The apparatus according to claim 10, wherein the instant reward value output by the reward function is: (the value corresponding to the ordering action) × (a first positive number × the benefit value added to the platform + a second positive number × the benefit value added to the platform user + a third positive number × the benefit value added to the user to be analyzed) + (the value corresponding to the forgoing-ordering action) × (a first negative number); and the value corresponding to the ordering action is 1 minus the value corresponding to the forgoing-ordering action.
13. The apparatus of claim 8, wherein the identifier of the ordering action for the product comprises one or more of: a first identifier of an ordering action performed based on a user recommendation, a second identifier of an ordering action performed based on a privacy recommendation, a third identifier of an ordering action performed based on a coupon activity, and a fourth identifier of an ordering action performed through an ordering entry in the platform.
14. The apparatus according to any one of claims 8-13, wherein the characteristic information of the user to be analyzed comprises:
one or more of: account information of the user to be analyzed, a number of fans, a number of live-broadcast works, and a preferred work type.
15. A server, comprising:
a processor; and a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
16. A readable storage medium having instructions stored thereon which, when executed by a processor of a server, enable the server to perform the method of any one of claims 1 to 7.
CN201811526942.8A 2018-12-13 2018-12-13 Potential customer determination method, device, server and readable storage medium Active CN109711871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811526942.8A CN109711871B (en) 2018-12-13 2018-12-13 Potential customer determination method, device, server and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811526942.8A CN109711871B (en) 2018-12-13 2018-12-13 Potential customer determination method, device, server and readable storage medium

Publications (2)

Publication Number Publication Date
CN109711871A CN109711871A (en) 2019-05-03
CN109711871B (en) 2021-03-12

Family

ID=66255738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811526942.8A Active CN109711871B (en) 2018-12-13 2018-12-13 Potential customer determination method, device, server and readable storage medium

Country Status (1)

Country Link
CN (1) CN109711871B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027676B (en) * 2019-11-28 2022-03-18 支付宝(杭州)信息技术有限公司 Target user selection method and device
CN111382359B (en) * 2020-03-09 2024-01-12 北京京东振世信息技术有限公司 Service policy recommendation method and device based on reinforcement learning, and electronic equipment
CN112200610A (en) * 2020-10-10 2021-01-08 苏州创旅天下信息技术有限公司 Marketing information delivery method, system and storage medium
CN113129108B (en) * 2021-04-26 2023-05-30 山东大学 Product recommendation method and device based on Double DQN algorithm
CN113256390A (en) * 2021-06-16 2021-08-13 平安科技(深圳)有限公司 Product recommendation method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005918A (en) * 2015-07-24 2015-10-28 金鹃传媒科技股份有限公司 Online advertisement push method based on user behavior data and potential user influence analysis and push evaluation method thereof
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108305167A (en) * 2018-01-12 2018-07-20 华南理工大学 A kind of foreign currency trade method and system enhancing learning algorithm based on depth

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6429819B2 (en) * 2016-03-18 2018-11-28 ヤフー株式会社 Information providing apparatus and information providing method
CN107451832B (en) * 2016-05-30 2023-09-05 北京京东尚科信息技术有限公司 Method and device for pushing information
CN108230057A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 A kind of intelligent recommendation method and system
CN108492146A (en) * 2018-03-30 2018-09-04 口口相传(北京)网络技术有限公司 Preferential value calculating method, server-side and client based on user-association behavior
CN108960929A (en) * 2018-07-16 2018-12-07 苏州大学 Consider the social networks marketing seed user choosing method that existing product influences

Also Published As

Publication number Publication date
CN109711871A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109711871B (en) Potential customer determination method, device, server and readable storage medium
US11200592B2 (en) Simulation-based evaluation of a marketing channel attribution model
CN107463701B (en) Method and device for pushing information stream based on artificial intelligence
US20110258045A1 (en) Inventory management
CN108777701B (en) Method and device for determining information audience
CN107463580B (en) Click rate estimation model training method and device and click rate estimation method and device
US11593860B2 (en) Method, medium, and system for utilizing item-level importance sampling models for digital content selection policies
CN112100489B (en) Object recommendation method, device and computer storage medium
US10657559B2 (en) Generating and utilizing a conversational index for marketing campaigns
CN110971659A (en) Recommendation message pushing method and device and storage medium
JP2011096255A (en) Ranking oriented cooperative filtering recommendation method and device
CN108734499B (en) Promotion information effect analysis method and device and computer readable medium
CN111798280B (en) Multimedia information recommendation method, device and equipment and storage medium
WO2021130771A1 (en) System and method of machine learning based deviation prediction and interconnected-metrics derivation for action recommendations
CN112785144A (en) Model construction method, device and storage medium based on federal learning
CN116739665A (en) Information delivery method and device, electronic equipment and storage medium
CN107527128B (en) Resource parameter determination method and equipment for advertisement platform
US20220108334A1 (en) Inferring unobserved event probabilities
CN107644042B (en) Software program click rate pre-estimation sorting method and server
CN110490682B (en) Method and device for analyzing commodity attributes
CN111260416A (en) Method and device for determining associated user of object
CN115345635A (en) Processing method and device for recommended content, computer equipment and storage medium
CN112015978B (en) Custom information sending method and device and electronic equipment
CN117114766A (en) Cost control factor determining method, device, equipment and storage medium
CN111506643B (en) Method, device and system for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant