CN109711871A - Potential-customer determination method, apparatus, server, and readable storage medium - Google Patents
Potential-customer determination method, apparatus, server, and readable storage medium
- Publication number
- CN109711871A (application number CN201811526942.8A)
- Authority
- CN
- China
- Prior art keywords
- user
- product
- platform
- value
- status information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a potential-customer determination method, apparatus, server, and readable storage medium. The method comprises: obtaining status information of a product on a platform, the status information including a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product; inputting the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message; and determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost.
Description
Technical field
This application relates to the field of Internet technology, and in particular to a potential-customer determination method, apparatus, server, and readable storage medium.
Background technique
To better promote a product on a live-streaming platform (for example, a fan-headline product), it is often necessary to determine the product's potential customers, so that potential users can be attracted to use the product in a targeted way. A potential user typically refers to a customer who has an intention to purchase but has not yet become a user of the product.
At present, the potential customers of a product are determined manually: staff decide, based on past experience, which users are potential customers. However, this way of determining potential customers requires staff with substantial experience; moreover, such personnel can only identify potential customers after analyzing a large number of users. In other words, this way of determining potential customers is inefficient and consumes considerable labor cost.
Summary of the invention
To overcome the problems in the related art, the application provides a potential-customer determination method, apparatus, server, and readable storage medium that can improve the efficiency of determining potential customers and reduce labor cost.
According to a first aspect of the embodiments of the application, a potential-user determination method is provided, the method comprising:
obtaining status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product;
inputting the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message, each action message including at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action; and
determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Optionally, in the embodiments of the application, the deep reinforcement learning model includes a deep Q-network (DQN) model.
Optionally, in the embodiments of the application, before the step of inputting the status information and the multiple action messages of the user to be analyzed into the pre-trained deep reinforcement learning model, the method may further include:
constructing a Markov decision process model {S, A, R, T}, where S denotes the status information of the product, A denotes an action message for an action performed on the product by a platform user, R denotes the reward function, and T denotes the state transition function;
obtaining multiple training samples based on the Markov decision process model, each training sample including: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed, the target action being an order action or an abandon-order action; and
optimizing the parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: the learning rate, the discount factor, and the Q value.
Optionally, in the embodiments of the application, optimizing the parameters of the initial Q function using the training samples to obtain the trained deep Q-network model includes: optimizing the parameters of the initial Q function using the training samples and the ε-greedy algorithm to obtain the trained deep Q-network model.
Optionally, in the embodiments of the application, the immediate reward value output by the reward function = (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number; where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
Optionally, in the embodiments of the application, the identifier of the order action for the product includes one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform.
Optionally, in the embodiments of the application, the characteristic information of the user to be analyzed includes one or more of: the account information of the user to be analyzed, the number of fans, the number of live-streamed works, and the preferred work type.
According to a second aspect of the embodiments of the application, a potential-user determination apparatus is provided, the apparatus comprising:
a first obtaining module configured to obtain status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product;
an input module configured to input the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message, each action message including at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action; and
a determining module configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Optionally, in the embodiments of the application, the deep reinforcement learning model includes a deep Q-network model.
Optionally, in the embodiments of the application, the apparatus further includes:
a construction module configured to construct a Markov decision process model {S, A, R, T} before the status information and the multiple action messages of the user to be analyzed are input into the pre-trained deep reinforcement learning model, where S denotes the status information of the product, A denotes an action message for an action performed on the product by a platform user, R denotes the reward function, and T denotes the state transition function;
a second obtaining module configured to obtain multiple training samples based on the Markov decision process model, each training sample including: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed, the target action being an order action or an abandon-order action; and
an optimization module configured to optimize the parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: the learning rate, the discount factor, and the Q value.
Optionally, in the embodiments of the application, the optimization module is specifically configured to: optimize the parameters of the initial Q function using the training samples and the ε-greedy algorithm to obtain the trained deep Q-network model.
Optionally, in the embodiments of the application, the immediate reward value output by the reward function = (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number; where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
Optionally, in the embodiments of the application, the identifier of the order action for the product includes one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform.
Optionally, in the embodiments of the application, the characteristic information of the user to be analyzed includes one or more of: the account information of the user to be analyzed, the number of fans, the number of live-streamed works, and the preferred work type.
According to a third aspect of the embodiments of the application, a server is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the method steps of any potential-user determination method of the first aspect.
According to a fourth aspect of the embodiments of the application, a readable storage medium is provided; when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of any potential-user determination method of the first aspect.
According to a fifth aspect of the embodiments of the application, a computer program product is provided which, when run on a server, causes the server to perform the method steps of any potential-user determination method of the first aspect.
In the embodiments of the application, a server can obtain the status information of a product on a platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product. The status information and multiple action messages of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message. Each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. The server can then determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action messages, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost. Moreover, this potential-user determination approach can identify potential users while safeguarding the revenue of all three parties: the platform, the platform users, and the product users.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a potential-user determination method according to an exemplary embodiment.
Fig. 2 is a flowchart of calculating Q values according to an exemplary embodiment.
Fig. 3 is a structural schematic diagram of a deep neural network according to an exemplary embodiment.
Fig. 4 is a block diagram of a potential-user determination apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of a server according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the application as detailed in the appended claims.
To solve the prior-art problems that the way of determining potential customers is inefficient and consumes considerable labor cost, the embodiments of the application provide a potential-customer determination method, apparatus, server, and computer-readable storage medium.
The potential-customer determination method provided by the embodiments of the application is described first below.
Fig. 1 is a flowchart of a potential-customer determination method according to an exemplary embodiment. The method is applied to a server and, as shown in Fig. 1, includes the following steps:
S101: obtain status information of a product on a platform; the status information includes: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product.
The platform in the embodiments of the application may be a live-streaming platform, and the product may be a fan-headline product on that platform, though of course neither is limited to these.
It can be understood that, in one implementation, the status information of the product obtained by the server may include: the first revenue value the product brings to the platform, the second revenue value the product brings to platform users, and the third revenue value the product brings to users of the product.
The first revenue value the product brings to the platform can be calculated from the number of product users and the product price. For example: first revenue value = number of product users × product price.
The second revenue value the product brings to platform users can be obtained as follows: the server counts the increase in the time platform users spend on the platform, then quantifies that increase as the second revenue value. For example: second revenue value = increase in platform-user usage time × first revenue coefficient, where the first revenue coefficient represents the potential revenue brought by each unit-time increase in usage.
The third revenue value the product brings to its users can be calculated as follows: the server counts the increase in the number of fans of product users and the number of new works published by product users, then quantifies these as the third revenue value. For example: third revenue value = fan increase of product users × second revenue coefficient + number of new works of product users × third revenue coefficient, where the second revenue coefficient represents the potential revenue brought by each additional fan and the third revenue coefficient represents the potential revenue brought by each additional work.
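The three revenue formulas above can be sketched in Python as follows; the function names, coefficient values, and example numbers are illustrative assumptions, not values given in the application:

```python
def platform_revenue(user_count, product_price):
    # First revenue value: what the product brings to the platform,
    # i.e. number of product users x product price.
    return user_count * product_price

def platform_user_revenue(usage_time_increase, first_coeff):
    # Second revenue value: the increase in platform-user usage time,
    # quantified by a per-unit-time revenue coefficient.
    return usage_time_increase * first_coeff

def product_user_revenue(fan_increase, second_coeff, new_works, third_coeff):
    # Third revenue value: fans gained plus new works published by
    # product users, each quantified by its own revenue coefficient.
    return fan_increase * second_coeff + new_works * third_coeff

# Example product status information (all numbers are made up):
state = [
    platform_revenue(1000, 5.0),
    platform_user_revenue(120.0, 0.3),
    product_user_revenue(50, 0.2, 8, 1.5),
]
```

The resulting three-element list is one concrete form the status information S could take as model input.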
In another implementation, in addition to the above, the status information may also include one or more of: the total number of platform users, the order distribution of the product (for example, the number of users who purchased a 10-yuan order of the product and the number of users who purchased a 20-yuan order), and the total delivery volume of the product (for example, the product's total exposure in live-streamed works); all of these are reasonable.
S102: input the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message; each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action.
The deep reinforcement learning model includes a deep Q-network (DQN) model, though of course it is not limited thereto. The characteristic information of the user to be analyzed may include one or more of: the user's account information on the platform, number of fans, number of live-streamed works, and preferred work type. It may of course also include pre-trained embedding features; all of these are reasonable.
Also, in one implementation, the action identifier may be the identifier of the order action for the product or the identifier of the abandon-order action. In this implementation, the identifier of the order action corresponds to one action message and the identifier of the abandon-order action corresponds to another. After the product's status information and these two action messages are input into the trained deep reinforcement learning model, the model can output the estimated value of the long-term feedback corresponding to each action message.
In another implementation, the identifier of the order action for the product may include one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform. When the identifiers of the order action for the product include the first, second, third, and fourth identifiers, each of the four corresponds to one action message, and the identifier of the abandon-order action corresponds to one more, giving five action messages in total. After the product's status information and these five action messages are input into the trained deep reinforcement learning model, the model can output the estimated value of the long-term feedback corresponding to each action message.
When the deep reinforcement learning model is a DQN model, as shown in Fig. 2, after the product's status information and the five action messages are input into the trained DQN model, the model can output the Q value corresponding to each action message, i.e., Q1, Q2, Q3, Q4, and Q5.
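Assuming a trained Q-network is available as a callable, the scoring step of Fig. 2 can be sketched as follows; `toy_q_network` and the dictionary layout of the action messages are placeholders standing in for the trained DQN model and its inputs:

```python
def score_actions(q_network, state, action_messages):
    # Feed the product status information together with each candidate
    # action message into the network; collect one Q value per action.
    return [q_network(state, a) for a in action_messages]

def toy_q_network(state, action):
    # Stand-in for the trained DQN model: scores grow with the action id.
    return sum(state) * 0.001 + action["action_id"] * 0.1

# Five action messages: four order-action identifiers plus one
# abandon-order identifier, each carrying the user's feature info.
actions = [{"action_id": i, "features": "user-features"} for i in range(5)]
q_values = score_actions(toy_q_network, [5000.0, 36.0, 22.0], actions)
best = max(range(len(q_values)), key=lambda i: q_values[i])
```

With the toy network the five outputs play the role of Q1 through Q5, and `best` indexes the action with the maximum estimated long-term feedback.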
It can further be understood that, before performing step S102, the server may construct a Markov decision process model and then obtain multiple training samples based on it. With the constructed Markov decision process model {S, A, R, T}, each training sample includes: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed. The target action is an order action or an abandon-order action.
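A training sample of this form can be represented as a simple transition record; the field names and example numbers below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One training sample drawn from the Markov decision process {S, A, R, T}."""
    state: List[float]       # historical product status information S
    action: int              # id of the target action the target user performed
    reward: float            # immediate reward value R(s, a, s')
    next_state: List[float]  # status information S' after the target action

sample = Transition(state=[5000.0, 36.0, 22.0], action=1,
                    reward=4.2, next_state=[5100.0, 40.0, 25.0])
```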
The historical status information above can be set in the same way as the status information of the product in step S101, which is not repeated here.
In addition, R = R(s, a, s') denotes the immediate reward value obtained when action a is performed in the state corresponding to status information s and the state transfers to the state corresponding to status information s'. T = T(s, a, s') denotes the probability of transferring to state s' when action a is performed in state s. Moreover, according to the related art of deep reinforcement learning, the state transition from the state corresponding to status information s is determined by the action taken under that status information.
In one example of the application, the immediate reward value output by the reward function may be set as: (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number, where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action. Of course, the design of the reward function is not limited to this. The values of the first positive number, the second positive number, the third positive number, and the first negative number can be set according to the actual situation and are not specifically limited here.
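The reward function above translates directly into Python; the default coefficient values (the three positive numbers and the first negative number) are illustrative choices, not values fixed by the application:

```python
def immediate_reward(ordered, platform_gain, platform_user_gain,
                     product_user_gain, p1=1.0, p2=1.0, p3=1.0, n1=-1.0):
    # ordered: True if the target user performed an order action,
    # False if the user abandoned the order.
    order_value = 1 if ordered else 0
    abandon_value = 1 - order_value  # the two values always sum to 1
    gain = (p1 * platform_gain + p2 * platform_user_gain
            + p3 * product_user_gain)
    return order_value * gain + abandon_value * n1

# Ordering while all three revenue values grew is rewarded...
r_order = immediate_reward(True, 2.0, 0.5, 0.5)     # 3.0
# ...while abandoning the order yields the negative penalty n1.
r_abandon = immediate_reward(False, 2.0, 0.5, 0.5)  # -1.0
```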
In addition, when the deep reinforcement learning model to be trained is a DQN model, after obtaining the training samples the server can optimize the parameters of the initial Q function using the training samples to obtain the trained DQN model. The deep neural network corresponding to the initial Q function may consist of two convolutional layers and two fully connected layers, as shown in Fig. 3. The parameters include: the learning rate, the discount factor, and the Q value. The trained DQN model stores the learned knowledge, which can be regarded as the mapping between status information and the best action.
Specifically, Q(S, A) can be defined as the Q value in the original state, Q(S', a) as the Q value after S transfers to S' under the effect of action a, and W as the forward propagation of the deep neural network, so that:
Q(S', a) = W(S, A, characteristic information of the user to be analyzed)
The network W takes the original state S, the original action A (standing for a), and the characteristic information of the user to be analyzed as input. The model optimization function is then:
Q(S, A) ← Q(S, A) + α[R + γ max_a Q(S', a) − Q(S, A)];
S ← S';
The above steps are iterated in a loop until S converges.
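The optimization function above is the standard Q-learning update; a minimal tabular sketch (with illustrative state and action names and hypothetical α and γ values) is:

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Q(S, A) <- Q(S, A) + alpha * [R + gamma * max_a Q(S', a) - Q(S, A)]
    best_next = max(q_table[s_next].values())
    td_target = r + gamma * best_next
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]

q = {"s0": {"order": 0.0, "abandon": 0.0},
     "s1": {"order": 1.0, "abandon": -1.0}}
updated = q_update(q, "s0", "order", r=2.0, s_next="s1")
# 0.0 + 0.1 * (2.0 + 0.9 * 1.0 - 0.0) = 0.29
```

In the DQN setting the table lookup is replaced by the forward propagation W, but the update target is the same.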
In this strategy, it is necessary to solve for the action a that maximizes Q(S', a); the ε-greedy algorithm is used here:
a = argmax_a Q(a), with probability 1 − ε;
a = a randomly chosen action, with probability ε;
where the strategy can balance exploration and exploitation by adjusting the probability threshold ε.
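The ε-greedy selection above can be sketched as follows, with the probability threshold ε passed in explicitly:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon explore: choose a random action;
    # otherwise exploit: choose a = argmax_a Q(a).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# epsilon = 0 always exploits, epsilon = 1 always explores.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.4], epsilon=0.0)  # index 1
```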
The initial Q function is the Q function of DQN in the related art, and the learning rate, discount factor, and Q value are DQN parameters in the related art, which are not elaborated here.
In addition, after the trained DQN model is obtained, its parameters can be fine-tuned with new training samples so as to update the DQN model. The update period of the DQN model (for example, one week) can be adjusted according to specific requirements, so that the DQN model has better scalability and robustness and can determine more accurately whether a user is a potential customer; this is reasonable.
S103: determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
When the action corresponding to the maximum estimated value output by the deep reinforcement learning model is an order action, the user to be analyzed can be determined to be a potential user, and product recommendation information (such as an advertisement) can be sent to that user so that the potential user is converted into an actual user, keeping the obtained estimated value of the long-term feedback maximal. Moreover, when the action corresponding to the maximum estimated value is the order action corresponding to the third identifier, i.e., performing the order action based on a coupon campaign, coupon action information can be sent to the user to be analyzed, so that the advertisement reaches the user accurately and the conversion rate from potential users to actual users is improved. In addition, when the action corresponding to the maximum estimated value output by the deep reinforcement learning model is the abandon-order action, the user to be analyzed can be determined not to be a potential user.
Furthermore, since the estimated value of the long-term feedback corresponding to an action message is the estimated long-term feedback obtained after the action in that message is performed, a larger estimated value indicates that the expectation to be achieved is better met: maximizing the total revenue of the platform, the platform users, and the product users.
Moreover, because the deep reinforcement learning model optimizes not only for short-term click revenue (the immediate reward value) but also captures long-term revenue indicators (the estimated value of the long-term feedback), the potential-user determination method provided by the embodiments of the application enables order behavior to bring improvements in long-term revenue indicators.
In the embodiments of the application, a server can obtain the status information of a product on a platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product. The status information and multiple action messages of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message. Each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. The server can then determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action messages, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost. Moreover, this potential-user determination approach can identify potential users while safeguarding the revenue of all three parties: the platform, the platform users, and the product users.
In summary, with the potential-user determination method provided by the embodiments of the present application, potential customers can be identified through a deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs. Furthermore, potential users can be determined while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a potential-user determining apparatus. Referring to FIG. 4, the apparatus includes:
a first obtaining module 401, configured to obtain status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
an input module 402, configured to input the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information including at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
a determining module 403, configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
With the apparatus provided by the embodiments of the present application, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Optionally, in this embodiment of the present application, the deep reinforcement learning model includes a deep Q-network model.
Optionally, in this embodiment of the present application, the apparatus further includes:
a construction module, configured to construct a Markov decision process model before the status information and the multiple pieces of action information of the user to be analyzed are input into the pre-trained deep reinforcement learning model, where the Markov decision process model is {S, A, R, T}: S denotes the status information of the product, A denotes the action information of actions performed by platform users on the product, R denotes a reward function, and T denotes a state transition function;
a second obtaining module, configured to obtain multiple training samples based on the Markov decision process model, where each training sample includes: a historical piece of status information of the product, the action information of an action performed on the product by a target user among the platform users under that status information, an instant reward value obtained by the target user after performing the target action in that action information, and next status information corresponding to the status information after the target action is performed, the target action being an order action or an abandon-order action;
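Each training sample described above is a standard (state, action, reward, next state) transition of the MDP {S, A, R, T}. A minimal sketch, with illustrative field names and made-up values:

```python
from collections import namedtuple

# One training sample = one MDP transition (s, a, r, s').
Transition = namedtuple("Transition", "state action reward next_state")

sample = Transition(
    state=(120.0, 30.0, 8.0),       # first/second/third revenue values
    action="order",                 # target action: order or abandon order
    reward=6.9,                     # instant reward value after the action
    next_state=(126.0, 31.5, 8.4),  # product status after the action
)
print(sample.action)  # order
```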
an optimization module, configured to optimize parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: a learning rate, a discount factor, and Q values.
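The two-convolutional-layer, two-fully-connected-layer network shape mentioned above can be sketched as a plain forward pass. Layer sizes, the ReLU nonlinearity, and all variable names are illustrative assumptions; the patent text fixes only the layer count.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    # 'valid' cross-correlation over a 1-D input vector
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

def q_forward(x, params):
    k1, k2, w1, b1, w2, b2 = params
    h = relu(conv1d(x, k1))   # convolutional layer 1
    h = relu(conv1d(h, k2))   # convolutional layer 2
    h = relu(h @ w1 + b1)     # fully connected layer 1
    return h @ w2 + b2        # fully connected layer 2 -> Q-values

x = rng.normal(size=12)       # status information + action information
params = (
    rng.normal(size=3), rng.normal(size=3),  # two conv kernels (len 12 -> 10 -> 8)
    rng.normal(size=(8, 16)), np.zeros(16),  # FC layer 1
    rng.normal(size=(16, 2)), np.zeros(2),   # FC layer 2: Q(order), Q(abandon)
)
q = q_forward(x, params)
print(q.shape)  # (2,)
```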
Optionally, in this embodiment of the present application, the optimization module is specifically configured to:
optimize the parameters of the initial Q function using the training samples and the ε-greedy algorithm, to obtain the trained deep Q-network model.
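The ε-greedy rule referred to above is standard: with probability ε explore a random action, otherwise exploit the action with the highest current Q estimate. A small sketch with illustrative names:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# with epsilon = 0 the rule always exploits the best-valued action
print(epsilon_greedy([0.1, 0.9], epsilon=0.0))  # 1
```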
Optionally, in this embodiment of the present application, the instant reward value output by the reward function = the value corresponding to the order action × (a first positive number × the revenue increase of the platform + a second positive number × the revenue increase of the platform user + a third positive number × the revenue increase of the user to be analyzed) + the value corresponding to the abandon-order action × a first negative number, where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
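The reward formula above can be written out directly. The three positive weights and the negative constant are free parameters of the embodiment; the values below are placeholders chosen only for illustration.

```python
def instant_reward(ordered, d_platform, d_platform_user, d_user,
                   k1=0.5, k2=0.3, k3=0.2, c=-1.0):
    """Instant reward per the formula: order value is 1 when the target
    action is an order action and 0 otherwise (order = 1 - abandon)."""
    a_order = 1.0 if ordered else 0.0
    a_abandon = 1.0 - a_order
    return (a_order * (k1 * d_platform + k2 * d_platform_user + k3 * d_user)
            + a_abandon * c)

print(instant_reward(True, 10.0, 5.0, 2.0))   # 0.5*10 + 0.3*5 + 0.2*2 = 6.9
print(instant_reward(False, 10.0, 5.0, 2.0))  # the negative constant, -1.0
```

Note that exactly one of the two terms is active per sample: an order action is rewarded in proportion to the three revenue increases, while an abandon-order action incurs the fixed negative value.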
Optionally, in this embodiment of the present application, the identifier of the order action for the product includes one or more of: a first identifier of an order action performed based on a user recommendation, a second identifier of an order action performed based on a private-message recommendation, a third identifier of an order action performed based on a coupon campaign, and a fourth identifier of an order action performed through an ordering entry on the platform.
Optionally, in this embodiment of the present application, the feature information of the user to be analyzed includes one or more of: the account information, number of followers, number of live-streaming works, and preferred work type of the user to be analyzed.
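An illustrative feature record matching the four optional fields listed above; all field names and values are made up for the example.

```python
user_features = {
    "account_info": {"account_id": "u_001", "account_age_days": 420},
    "follower_count": 1530,
    "live_work_count": 27,
    "preferred_work_type": "game streaming",
}
print(sorted(user_features))
```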
FIG. 5 is a block diagram of an apparatus 1900 for determining a potential user according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 5, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. The processing component 1922 is configured to execute the instructions to perform the method steps of any of the above potential-user determination methods.
The apparatus 1900 may further include a power supply component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
With the server provided by the embodiments of the present application, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a readable storage medium. When the instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of any of the above potential-user determination methods. The storage medium is a computer-readable storage medium.
After the computer program stored in the readable storage medium provided by the embodiments of the present application is executed by a processor of a server, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a computer program product which, when run on a server, causes the server to perform the method steps of any of the above potential-user determination methods.
After the computer program product provided by the embodiments of the present application is executed by a processor of a server, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Other embodiments of the application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed by this application. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application being indicated by the claims.
It should be understood that the application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, server, computer-readable storage medium, and computer program product embodiments are substantially similar to the method embodiment, so their descriptions are relatively simple; for relevant parts, refer to the description of the method embodiment.
The foregoing is merely a description of the preferred embodiments of the application and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall be included in the protection scope of this application.
Claims (10)
1. A potential-user determination method, characterized in that the method comprises:
obtaining status information of a product on a platform, the status information comprising: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
inputting the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information comprising at least: feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
2. The method according to claim 1, characterized in that the deep reinforcement learning model comprises a deep Q-network model.
3. The method according to claim 2, characterized in that, before the step of inputting the status information and the multiple pieces of action information of the user to be analyzed into the pre-trained deep reinforcement learning model, the method further comprises:
constructing a Markov decision process model, the Markov decision process model being {S, A, R, T}, wherein S denotes the status information of the product, A denotes the action information of actions performed by the platform users on the product, R denotes a reward function, and T denotes a state transition function;
obtaining multiple training samples based on the Markov decision process model, each training sample comprising: a historical piece of status information of the product, the action information of an action performed on the product by a target user among the platform users under that status information, an instant reward value obtained by the target user after performing the target action in that action information, and next status information corresponding to the status information after the target action is performed, the target action being an order action or an abandon-order action;
optimizing parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, wherein the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters comprise: a learning rate, a discount factor, and Q values.
4. The method according to claim 3, characterized in that optimizing the parameters of the initial Q function using the training samples to obtain the trained deep Q-network model comprises:
optimizing the parameters of the initial Q function using the training samples and the ε-greedy algorithm, to obtain the trained deep Q-network model.
5. The method according to claim 3, characterized in that the instant reward value output by the reward function = the value corresponding to the order action × (a first positive number × the revenue increase of the platform + a second positive number × the revenue increase of the platform user + a third positive number × the revenue increase of the user to be analyzed) + the value corresponding to the abandon-order action × a first negative number, wherein the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
6. The method according to claim 1, characterized in that the identifier of the order action for the product comprises one or more of: a first identifier of an order action performed based on a user recommendation, a second identifier of an order action performed based on a private-message recommendation, a third identifier of an order action performed based on a coupon campaign, and a fourth identifier of an order action performed through an ordering entry on the platform.
7. The method according to any one of claims 1 to 6, characterized in that the feature information of the user to be analyzed comprises one or more of:
the account information, number of followers, number of live-streaming works, and preferred work type of the user to be analyzed.
8. A potential-user determining apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain status information of a product on a platform, the status information comprising: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
an input module, configured to input the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information comprising at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
a determining module, configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
9. A server, characterized by comprising:
a processor; and a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method steps of the potential-user determination method according to any one of claims 1 to 7.
10. A readable storage medium, wherein, when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of the potential-user determination method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811526942.8A CN109711871B (en) | 2018-12-13 | 2018-12-13 | Potential customer determination method, device, server and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109711871A true CN109711871A (en) | 2019-05-03 |
CN109711871B CN109711871B (en) | 2021-03-12 |
Family
ID=66255738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811526942.8A Active CN109711871B (en) | 2018-12-13 | 2018-12-13 | Potential customer determination method, device, server and readable storage medium |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027676A (en) * | 2019-11-28 | 2020-04-17 | 支付宝(杭州)信息技术有限公司 | Target user selection method and device |
CN111382359A (en) * | 2020-03-09 | 2020-07-07 | 北京京东振世信息技术有限公司 | Service strategy recommendation method and device based on reinforcement learning and electronic equipment |
CN112200610A (en) * | 2020-10-10 | 2021-01-08 | 苏州创旅天下信息技术有限公司 | Marketing information delivery method, system and storage medium |
CN113129108A (en) * | 2021-04-26 | 2021-07-16 | 山东大学 | Product recommendation method and device based on Double DQN algorithm |
CN113256390A (en) * | 2021-06-16 | 2021-08-13 | 平安科技(深圳)有限公司 | Product recommendation method and device, computer equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005918A (en) * | 2015-07-24 | 2015-10-28 | 金鹃传媒科技股份有限公司 | Online advertisement push method based on user behavior data and potential user influence analysis and push evaluation method thereof |
JP2017173873A (en) * | 2016-03-18 | 2017-09-28 | ヤフー株式会社 | Information providing device and information providing method |
CN107451832A (en) * | 2016-05-30 | 2017-12-08 | 北京京东尚科信息技术有限公司 | The method and apparatus of pushed information |
CN108230058A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Products Show method and system |
CN108230057A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | A kind of intelligent recommendation method and system |
CN108305167A (en) * | 2018-01-12 | 2018-07-20 | 华南理工大学 | A kind of foreign currency trade method and system enhancing learning algorithm based on depth |
CN108492146A (en) * | 2018-03-30 | 2018-09-04 | 口口相传(北京)网络技术有限公司 | Preferential value calculating method, server-side and client based on user-association behavior |
CN108960929A (en) * | 2018-07-16 | 2018-12-07 | 苏州大学 | Consider the social networks marketing seed user choosing method that existing product influences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||