CN109711871A - Potential-customer determination method, apparatus, server, and readable storage medium - Google Patents
Potential-customer determination method, apparatus, server, and readable storage medium
- Publication number
- CN109711871A (application number CN201811526942.8A)
- Authority
- CN
- China
- Prior art keywords
- user
- product
- platform
- value
- status information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a potential-customer determination method, apparatus, server, and readable storage medium. The method comprises: obtaining status information of a product on a platform, the status information including a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product; inputting the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message; and determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost.
Description
Technical field
This application relates to the field of Internet technology, and in particular to a potential-customer determination method, apparatus, server, and readable storage medium.
Background technique
To better promote a product on a live-streaming platform (for example, a fan-headline product), it is often necessary to determine the product's potential customers, so that potential users can be attracted to use the product in a targeted way. A potential user typically refers to a customer who has an intention to purchase but has not yet become a user of the product.
At present, the potential customers of a product are determined manually: staff decide, based on past experience, which users are potential customers. However, this way of determining potential customers requires staff with substantial experience; moreover, such personnel can only identify potential customers after analyzing a large number of users. In other words, this way of determining potential customers is inefficient and consumes considerable labor cost.
Summary of the invention
To overcome the problems in the related art, the application provides a potential-customer determination method, apparatus, server, and readable storage medium that can improve the efficiency of determining potential customers and reduce labor cost.
According to a first aspect of the embodiments of the application, a potential-user determination method is provided, the method comprising:
obtaining status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product;
inputting the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message, each action message including at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action; and
determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Optionally, in the embodiments of the application, the deep reinforcement learning model includes a deep Q-network (DQN) model.
Optionally, in the embodiments of the application, before the step of inputting the status information and the multiple action messages of the user to be analyzed into the pre-trained deep reinforcement learning model, the method may further include:
constructing a Markov decision process model {S, A, R, T}, where S denotes the status information of the product, A denotes an action message for an action performed on the product by a platform user, R denotes the reward function, and T denotes the state transition function;
obtaining multiple training samples based on the Markov decision process model, each training sample including: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed, the target action being an order action or an abandon-order action; and
optimizing the parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: the learning rate, the discount factor, and the Q value.
Optionally, in the embodiments of the application, optimizing the parameters of the initial Q function using the training samples to obtain the trained deep Q-network model includes: optimizing the parameters of the initial Q function using the training samples and the ε-greedy algorithm to obtain the trained deep Q-network model.
Optionally, in the embodiments of the application, the immediate reward value output by the reward function = (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number; where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
Optionally, in the embodiments of the application, the identifier of the order action for the product includes one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform.
Optionally, in the embodiments of the application, the characteristic information of the user to be analyzed includes one or more of: the account information of the user to be analyzed, the number of fans, the number of live-streamed works, and the preferred work type.
According to a second aspect of the embodiments of the application, a potential-user determination apparatus is provided, the apparatus comprising:
a first obtaining module configured to obtain status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product;
an input module configured to input the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message, each action message including at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action; and
a determining module configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Optionally, in the embodiments of the application, the deep reinforcement learning model includes a deep Q-network model.
Optionally, in the embodiments of the application, the apparatus further includes:
a construction module configured to construct a Markov decision process model {S, A, R, T} before the status information and the multiple action messages of the user to be analyzed are input into the pre-trained deep reinforcement learning model, where S denotes the status information of the product, A denotes an action message for an action performed on the product by a platform user, R denotes the reward function, and T denotes the state transition function;
a second obtaining module configured to obtain multiple training samples based on the Markov decision process model, each training sample including: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed, the target action being an order action or an abandon-order action; and
an optimization module configured to optimize the parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: the learning rate, the discount factor, and the Q value.
Optionally, in the embodiments of the application, the optimization module is specifically configured to: optimize the parameters of the initial Q function using the training samples and the ε-greedy algorithm to obtain the trained deep Q-network model.
Optionally, in the embodiments of the application, the immediate reward value output by the reward function = (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number; where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
Optionally, in the embodiments of the application, the identifier of the order action for the product includes one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform.
Optionally, in the embodiments of the application, the characteristic information of the user to be analyzed includes one or more of: the account information of the user to be analyzed, the number of fans, the number of live-streamed works, and the preferred work type.
According to a third aspect of the embodiments of the application, a server is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the method steps of any potential-user determination method of the first aspect.
According to a fourth aspect of the embodiments of the application, a readable storage medium is provided; when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of any potential-user determination method of the first aspect.
According to a fifth aspect of the embodiments of the application, a computer program product is provided which, when run on a server, causes the server to perform the method steps of any potential-user determination method of the first aspect.
In the embodiments of the application, a server can obtain the status information of a product on a platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product. The status information and multiple action messages of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message. Each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. The server can then determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action messages, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost. Moreover, this potential-user determination approach can identify potential users while safeguarding the revenue of all three parties: the platform, the platform users, and the product users.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart of a potential-user determination method according to an exemplary embodiment.
Fig. 2 is a flowchart of calculating Q values according to an exemplary embodiment.
Fig. 3 is a structural schematic diagram of a deep neural network according to an exemplary embodiment.
Fig. 4 is a block diagram of a potential-user determination apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of a server according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the application as detailed in the appended claims.
To solve the prior-art problems that the way of determining potential customers is inefficient and consumes considerable labor cost, the embodiments of the application provide a potential-customer determination method, apparatus, server, and computer-readable storage medium.
The potential-customer determination method provided by the embodiments of the application is described first below.
Fig. 1 is a flowchart of a potential-customer determination method according to an exemplary embodiment. The method is applied to a server and, as shown in Fig. 1, includes the following steps:
S101: obtain status information of a product on a platform; the status information includes: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product.
The platform in the embodiments of the application may be a live-streaming platform, and the product may be a fan-headline product on that platform, though of course neither is limited to these.
It can be understood that, in one implementation, the status information of the product obtained by the server may include: the first revenue value the product brings to the platform, the second revenue value the product brings to platform users, and the third revenue value the product brings to users of the product.
The first revenue value the product brings to the platform can be calculated from the number of product users and the product price. For example: first revenue value = number of product users × product price.
The second revenue value the product brings to platform users can be obtained as follows: the server counts the increase in the time platform users spend on the platform, then quantifies that increase as the second revenue value. For example: second revenue value = increase in platform-user usage time × first revenue coefficient, where the first revenue coefficient represents the potential revenue brought by each unit-time increase in usage.
The third revenue value the product brings to its users can be calculated as follows: the server counts the increase in the number of fans of product users and the number of new works published by product users, then quantifies these as the third revenue value. For example: third revenue value = fan increase of product users × second revenue coefficient + number of new works of product users × third revenue coefficient, where the second revenue coefficient represents the potential revenue brought by each additional fan and the third revenue coefficient represents the potential revenue brought by each additional work.
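The three revenue formulas above can be sketched in Python as follows; the function names, coefficient values, and example numbers are illustrative assumptions, not values given in the application:

```python
def platform_revenue(user_count, product_price):
    # First revenue value: what the product brings to the platform,
    # i.e. number of product users x product price.
    return user_count * product_price

def platform_user_revenue(usage_time_increase, first_coeff):
    # Second revenue value: the increase in platform-user usage time,
    # quantified by a per-unit-time revenue coefficient.
    return usage_time_increase * first_coeff

def product_user_revenue(fan_increase, second_coeff, new_works, third_coeff):
    # Third revenue value: fans gained plus new works published by
    # product users, each quantified by its own revenue coefficient.
    return fan_increase * second_coeff + new_works * third_coeff

# Example product status information (all numbers are made up):
state = [
    platform_revenue(1000, 5.0),
    platform_user_revenue(120.0, 0.3),
    product_user_revenue(50, 0.2, 8, 1.5),
]
```

The resulting three-element list is one concrete form the status information S could take as model input.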
In another implementation, in addition to the above, the status information may also include one or more of: the total number of platform users, the order distribution of the product (for example, the number of users who purchased a 10-yuan order of the product and the number of users who purchased a 20-yuan order), and the total delivery volume of the product (for example, the product's total exposure in live-streamed works); all of these are reasonable.
S102: input the status information and multiple action messages of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message; each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action.
The deep reinforcement learning model includes a deep Q-network (DQN) model, though of course it is not limited thereto. The characteristic information of the user to be analyzed may include one or more of: the user's account information on the platform, number of fans, number of live-streamed works, and preferred work type. It may of course also include pre-trained embedding features; all of these are reasonable.
Also, in one implementation, the action identifier may be the identifier of the order action for the product or the identifier of the abandon-order action. In this implementation, the identifier of the order action corresponds to one action message and the identifier of the abandon-order action corresponds to another. After the product's status information and these two action messages are input into the trained deep reinforcement learning model, the model can output the estimated value of the long-term feedback corresponding to each action message.
In another implementation, the identifier of the order action for the product may include one or more of: a first identifier for performing the order action based on a user recommendation, a second identifier for performing the order action based on a private message, a third identifier for performing the order action based on a coupon campaign, and a fourth identifier for performing the order action through an order entry on the platform. When the identifiers of the order action for the product include the first, second, third, and fourth identifiers, each of the four corresponds to one action message, and the identifier of the abandon-order action corresponds to one more, giving five action messages in total. After the product's status information and these five action messages are input into the trained deep reinforcement learning model, the model can output the estimated value of the long-term feedback corresponding to each action message.
When the deep reinforcement learning model is a DQN model, as shown in Fig. 2, after the product's status information and the five action messages are input into the trained DQN model, the model can output the Q value corresponding to each action message, i.e., Q1, Q2, Q3, Q4, and Q5.
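Assuming a trained Q-network is available as a callable, the scoring step of Fig. 2 can be sketched as follows; `toy_q_network` and the dictionary layout of the action messages are placeholders standing in for the trained DQN model and its inputs:

```python
def score_actions(q_network, state, action_messages):
    # Feed the product status information together with each candidate
    # action message into the network; collect one Q value per action.
    return [q_network(state, a) for a in action_messages]

def toy_q_network(state, action):
    # Stand-in for the trained DQN model: scores grow with the action id.
    return sum(state) * 0.001 + action["action_id"] * 0.1

# Five action messages: four order-action identifiers plus one
# abandon-order identifier, each carrying the user's feature info.
actions = [{"action_id": i, "features": "user-features"} for i in range(5)]
q_values = score_actions(toy_q_network, [5000.0, 36.0, 22.0], actions)
best = max(range(len(q_values)), key=lambda i: q_values[i])
```

With the toy network the five outputs play the role of Q1 through Q5, and `best` indexes the action with the maximum estimated long-term feedback.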
It can further be understood that, before performing step S102, the server may construct a Markov decision process model and then obtain multiple training samples based on it. With the constructed Markov decision process model {S, A, R, T}, each training sample includes: a historical status information of the product, the action message for the action performed on the product by a target user among the platform users under that status information, the immediate reward value obtained after the target user performs the target action in that action message, and the next status information corresponding to the status after the target action is performed. The target action is an order action or an abandon-order action.
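A training sample of this form can be represented as a simple transition record; the field names and example numbers below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    """One training sample drawn from the Markov decision process {S, A, R, T}."""
    state: List[float]       # historical product status information S
    action: int              # id of the target action the target user performed
    reward: float            # immediate reward value R(s, a, s')
    next_state: List[float]  # status information S' after the target action

sample = Transition(state=[5000.0, 36.0, 22.0], action=1,
                    reward=4.2, next_state=[5100.0, 40.0, 25.0])
```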
The historical status information above can be set in the same way as the status information of the product in step S101, which is not repeated here.
In addition, R = R(s, a, s') denotes the immediate reward value obtained when action a is performed in the state corresponding to status information s and the state transfers to the state corresponding to status information s'. T = T(s, a, s') denotes the probability of transferring to state s' when action a is performed in state s. Moreover, according to the related art of deep reinforcement learning, the state transition from the state corresponding to status information s is determined by the action taken under that status information.
In one example of the application, the immediate reward value output by the reward function may be set as: (value corresponding to the order action) × (first positive number × revenue value gained by the platform + second positive number × revenue value gained by the platform users + third positive number × revenue value gained by the user to be analyzed) + (value corresponding to the abandon-order action) × first negative number, where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action. Of course, the design of the reward function is not limited to this. The values of the first positive number, the second positive number, the third positive number, and the first negative number can be set according to the actual situation and are not specifically limited here.
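The reward function above translates directly into Python; the default coefficient values (the three positive numbers and the first negative number) are illustrative choices, not values fixed by the application:

```python
def immediate_reward(ordered, platform_gain, platform_user_gain,
                     product_user_gain, p1=1.0, p2=1.0, p3=1.0, n1=-1.0):
    # ordered: True if the target user performed an order action,
    # False if the user abandoned the order.
    order_value = 1 if ordered else 0
    abandon_value = 1 - order_value  # the two values always sum to 1
    gain = (p1 * platform_gain + p2 * platform_user_gain
            + p3 * product_user_gain)
    return order_value * gain + abandon_value * n1

# Ordering while all three revenue values grew is rewarded...
r_order = immediate_reward(True, 2.0, 0.5, 0.5)     # 3.0
# ...while abandoning the order yields the negative penalty n1.
r_abandon = immediate_reward(False, 2.0, 0.5, 0.5)  # -1.0
```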
In addition, when the deep reinforcement learning model to be trained is a DQN model, after obtaining the training samples the server can optimize the parameters of the initial Q function using the training samples to obtain the trained DQN model. The deep neural network corresponding to the initial Q function may consist of two convolutional layers and two fully connected layers, as shown in Fig. 3. The parameters include: the learning rate, the discount factor, and the Q value. The trained DQN model stores the learned knowledge, which can be regarded as the mapping between status information and the best action.
Specifically, Q(S, A) can be defined as the Q value in the original state, Q(S', a) as the Q value after S transfers to S' under the effect of action a, and W as the forward propagation of the deep neural network, so that:
Q(S', a) = W(S, A, characteristic information of the user to be analyzed)
The network W takes the original state S, the original action A (standing for a), and the characteristic information of the user to be analyzed as input. The model optimization function is then:
Q(S, A) ← Q(S, A) + α[R + γ max_a Q(S', a) − Q(S, A)];
S ← S';
The above steps are iterated in a loop until S converges.
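The optimization function above is the standard Q-learning update; a minimal tabular sketch (with illustrative state and action names and hypothetical α and γ values) is:

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Q(S, A) <- Q(S, A) + alpha * [R + gamma * max_a Q(S', a) - Q(S, A)]
    best_next = max(q_table[s_next].values())
    td_target = r + gamma * best_next
    q_table[s][a] += alpha * (td_target - q_table[s][a])
    return q_table[s][a]

q = {"s0": {"order": 0.0, "abandon": 0.0},
     "s1": {"order": 1.0, "abandon": -1.0}}
updated = q_update(q, "s0", "order", r=2.0, s_next="s1")
# 0.0 + 0.1 * (2.0 + 0.9 * 1.0 - 0.0) = 0.29
```

In the DQN setting the table lookup is replaced by the forward propagation W, but the update target is the same.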
In this strategy, it is necessary to solve for the action a that maximizes Q(S', a); the ε-greedy algorithm is used here:
a = argmax_a Q(a), with probability 1 − ε;
a = a randomly chosen action, with probability ε;
where the strategy can balance exploration and exploitation by adjusting the probability threshold ε.
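The ε-greedy selection above can be sketched as follows, with the probability threshold ε passed in explicitly:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon explore: choose a random action;
    # otherwise exploit: choose a = argmax_a Q(a).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# epsilon = 0 always exploits, epsilon = 1 always explores.
greedy_choice = epsilon_greedy([0.1, 0.9, 0.4], epsilon=0.0)  # index 1
```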
The initial Q function is the Q function of DQN in the related art, and the learning rate, discount factor, and Q value are DQN parameters in the related art, which are not elaborated here.
In addition, after the trained DQN model is obtained, its parameters can be fine-tuned with new training samples so as to update the DQN model. The update period of the DQN model (for example, one week) can be adjusted according to specific requirements, so that the DQN model has better scalability and robustness and can determine more accurately whether a user is a potential customer; this is reasonable.
S103: determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
When the action corresponding to the maximum estimated value output by the deep reinforcement learning model is an order action, the user to be analyzed can be determined to be a potential user, and product recommendation information (such as an advertisement) can be sent to that user so that the potential user is converted into an actual user, keeping the obtained estimated value of the long-term feedback maximal. Moreover, when the action corresponding to the maximum estimated value is the order action corresponding to the third identifier, i.e., performing the order action based on a coupon campaign, coupon action information can be sent to the user to be analyzed, so that the advertisement reaches the user accurately and the conversion rate from potential users to actual users is improved. In addition, when the action corresponding to the maximum estimated value output by the deep reinforcement learning model is the abandon-order action, the user to be analyzed can be determined not to be a potential user.
Furthermore, since the estimated value of the long-term feedback corresponding to an action message is the estimated long-term feedback obtained after the action in that message is performed, a larger estimated value indicates that the expectation to be achieved is better met: maximizing the total revenue of the platform, the platform users, and the product users.
Moreover, because the deep reinforcement learning model optimizes not only for short-term click revenue (the immediate reward value) but also captures long-term revenue indicators (the estimated value of the long-term feedback), the potential-user determination method provided by the embodiments of the application enables order behavior to bring improvements in long-term revenue indicators.
In the embodiments of the application, a server can obtain the status information of a product on a platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users, and a third revenue value the product brings to users of the product. The status information and multiple action messages of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each action message. Each action message includes at least the characteristic information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. The server can then determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action messages, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be determined by the deep reinforcement learning model, which improves the efficiency of determining potential customers and reduces labor cost. Moreover, this potential-user determination approach can identify potential users while safeguarding the revenue of all three parties: the platform, the platform users, and the product users.
In summary, with the potential-user determination method provided by the embodiments of the present application, potential customers can be identified through a deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs. Furthermore, potential users can be determined while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a potential-user determining apparatus. Referring to FIG. 4, the apparatus includes:
a first obtaining module 401, configured to obtain status information of a product on a platform, the status information including: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
an input module 402, configured to input the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information including at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
a determining module 403, configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
With the apparatus provided by the embodiments of the present application, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Optionally, in this embodiment of the present application, the deep reinforcement learning model includes a deep Q-network model.
Optionally, in this embodiment of the present application, the apparatus further includes:
a construction module, configured to construct a Markov decision process model before the status information and the multiple pieces of action information of the user to be analyzed are input into the pre-trained deep reinforcement learning model, where the Markov decision process model is {S, A, R, T}: S denotes the status information of the product, A denotes the action information of actions performed by platform users on the product, R denotes a reward function, and T denotes a state transition function;
a second obtaining module, configured to obtain multiple training samples based on the Markov decision process model, where each training sample includes: a historical piece of status information of the product, the action information of an action performed on the product by a target user among the platform users under that status information, an instant reward value obtained by the target user after performing the target action in that action information, and next status information corresponding to the status information after the target action is performed, the target action being an order action or an abandon-order action;
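Each training sample described above is a standard (state, action, reward, next state) transition of the MDP {S, A, R, T}. A minimal sketch, with illustrative field names and made-up values:

```python
from collections import namedtuple

# One training sample = one MDP transition (s, a, r, s').
Transition = namedtuple("Transition", "state action reward next_state")

sample = Transition(
    state=(120.0, 30.0, 8.0),       # first/second/third revenue values
    action="order",                 # target action: order or abandon order
    reward=6.9,                     # instant reward value after the action
    next_state=(126.0, 31.5, 8.4),  # product status after the action
)
print(sample.action)  # order
```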
an optimization module, configured to optimize parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, where the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters include: a learning rate, a discount factor, and Q values.
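The two-convolutional-layer, two-fully-connected-layer network shape mentioned above can be sketched as a plain forward pass. Layer sizes, the ReLU nonlinearity, and all variable names are illustrative assumptions; the patent text fixes only the layer count.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernel):
    # 'valid' cross-correlation over a 1-D input vector
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

def q_forward(x, params):
    k1, k2, w1, b1, w2, b2 = params
    h = relu(conv1d(x, k1))   # convolutional layer 1
    h = relu(conv1d(h, k2))   # convolutional layer 2
    h = relu(h @ w1 + b1)     # fully connected layer 1
    return h @ w2 + b2        # fully connected layer 2 -> Q-values

x = rng.normal(size=12)       # status information + action information
params = (
    rng.normal(size=3), rng.normal(size=3),  # two conv kernels (len 12 -> 10 -> 8)
    rng.normal(size=(8, 16)), np.zeros(16),  # FC layer 1
    rng.normal(size=(16, 2)), np.zeros(2),   # FC layer 2: Q(order), Q(abandon)
)
q = q_forward(x, params)
print(q.shape)  # (2,)
```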
Optionally, in this embodiment of the present application, the optimization module is specifically configured to:
optimize the parameters of the initial Q function using the training samples and the ε-greedy algorithm, to obtain the trained deep Q-network model.
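The ε-greedy rule referred to above is standard: with probability ε explore a random action, otherwise exploit the action with the highest current Q estimate. A small sketch with illustrative names:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-value estimates."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# with epsilon = 0 the rule always exploits the best-valued action
print(epsilon_greedy([0.1, 0.9], epsilon=0.0))  # 1
```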
Optionally, in this embodiment of the present application, the instant reward value output by the reward function = the value corresponding to the order action × (a first positive number × the revenue increase of the platform + a second positive number × the revenue increase of the platform user + a third positive number × the revenue increase of the user to be analyzed) + the value corresponding to the abandon-order action × a first negative number, where the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
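The reward formula above can be written out directly. The three positive weights and the negative constant are free parameters of the embodiment; the values below are placeholders chosen only for illustration.

```python
def instant_reward(ordered, d_platform, d_platform_user, d_user,
                   k1=0.5, k2=0.3, k3=0.2, c=-1.0):
    """Instant reward per the formula: order value is 1 when the target
    action is an order action and 0 otherwise (order = 1 - abandon)."""
    a_order = 1.0 if ordered else 0.0
    a_abandon = 1.0 - a_order
    return (a_order * (k1 * d_platform + k2 * d_platform_user + k3 * d_user)
            + a_abandon * c)

print(instant_reward(True, 10.0, 5.0, 2.0))   # 0.5*10 + 0.3*5 + 0.2*2 = 6.9
print(instant_reward(False, 10.0, 5.0, 2.0))  # the negative constant, -1.0
```

Note that exactly one of the two terms is active per sample: an order action is rewarded in proportion to the three revenue increases, while an abandon-order action incurs the fixed negative value.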
Optionally, in this embodiment of the present application, the identifier of the order action for the product includes one or more of: a first identifier of an order action performed based on a user recommendation, a second identifier of an order action performed based on a private-message recommendation, a third identifier of an order action performed based on a coupon campaign, and a fourth identifier of an order action performed through an ordering entry on the platform.
Optionally, in this embodiment of the present application, the feature information of the user to be analyzed includes one or more of: the account information, number of followers, number of live-streaming works, and preferred work type of the user to be analyzed.
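An illustrative feature record matching the four optional fields listed above; all field names and values are made up for the example.

```python
user_features = {
    "account_info": {"account_id": "u_001", "account_age_days": 420},
    "follower_count": 1530,
    "live_work_count": 27,
    "preferred_work_type": "game streaming",
}
print(sorted(user_features))
```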
FIG. 5 is a block diagram of an apparatus 1900 for determining a potential user according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 5, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. The processing component 1922 is configured to execute the instructions to perform the method steps of any of the above potential-user determination methods.
The apparatus 1900 may further include a power supply component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
With the server provided by the embodiments of the present application, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a readable storage medium. When the instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of any of the above potential-user determination methods. The storage medium is a computer-readable storage medium.
After the computer program stored in the readable storage medium provided by the embodiments of the present application is executed by a processor of a server, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Corresponding to the above method embodiment, an embodiment of the present application further provides a computer program product which, when run on a server, causes the server to perform the method steps of any of the above potential-user determination methods.
After the computer program product provided by the embodiments of the present application is executed by a processor of a server, the server can obtain status information of a product on the platform. The status information may include: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product. The status information and multiple pieces of action information of a user to be analyzed are then input into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information. Each piece of action information includes at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action. Afterwards, whether the user to be analyzed is a potential user of the product can be determined according to the action corresponding to the maximum estimated value output by the model.
Because the deep reinforcement learning model can establish an optimal mapping between status information and action information, the server can use the model to determine the best action corresponding to the product's current status information, that is, the best action of the user to be analyzed with respect to the product. When that action is determined to be an order action, the user to be analyzed can be determined to be a potential user. In this way, potential customers can be identified through the deep reinforcement learning model, which improves the efficiency of identifying potential customers and reduces labor costs, while safeguarding the revenues of the platform, platform users, and product users.
Other embodiments of the application will readily occur to those skilled in the art upon consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed by this application. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application being indicated by the claims.
It should be understood that the application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, server, computer-readable storage medium, and computer program product embodiments are substantially similar to the method embodiment, so their descriptions are relatively simple; for relevant parts, refer to the description of the method embodiment.
The foregoing is merely a description of the preferred embodiments of the application and is not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall be included in the protection scope of this application.
Claims (10)
1. A potential-user determination method, characterized in that the method comprises:
obtaining status information of a product on a platform, the status information comprising: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
inputting the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information comprising at least: feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
determining, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
2. The method according to claim 1, characterized in that the deep reinforcement learning model comprises a deep Q-network model.
3. The method according to claim 2, characterized in that, before the step of inputting the status information and the multiple pieces of action information of the user to be analyzed into the pre-trained deep reinforcement learning model, the method further comprises:
constructing a Markov decision process model, the Markov decision process model being {S, A, R, T}, wherein S denotes the status information of the product, A denotes the action information of actions performed by the platform users on the product, R denotes a reward function, and T denotes a state transition function;
obtaining multiple training samples based on the Markov decision process model, each training sample comprising: a historical piece of status information of the product, the action information of an action performed on the product by a target user among the platform users under that status information, an instant reward value obtained by the target user after performing the target action in that action information, and next status information corresponding to the status information after the target action is performed, the target action being an order action or an abandon-order action;
optimizing parameters of an initial Q function using the training samples to obtain a trained deep Q-network model, wherein the deep neural network corresponding to the initial Q function consists of two convolutional layers and two fully connected layers, and the parameters comprise: a learning rate, a discount factor, and Q values.
4. The method according to claim 3, characterized in that optimizing the parameters of the initial Q function using the training samples to obtain the trained deep Q-network model comprises:
optimizing the parameters of the initial Q function using the training samples and the ε-greedy algorithm, to obtain the trained deep Q-network model.
5. The method according to claim 3, characterized in that the instant reward value output by the reward function = the value corresponding to the order action × (a first positive number × the revenue increase of the platform + a second positive number × the revenue increase of the platform user + a third positive number × the revenue increase of the user to be analyzed) + the value corresponding to the abandon-order action × a first negative number, wherein the value corresponding to the order action = 1 − the value corresponding to the abandon-order action.
6. The method according to claim 1, characterized in that the identifier of the order action for the product comprises one or more of: a first identifier of an order action performed based on a user recommendation, a second identifier of an order action performed based on a private-message recommendation, a third identifier of an order action performed based on a coupon campaign, and a fourth identifier of an order action performed through an ordering entry on the platform.
7. The method according to any one of claims 1 to 6, characterized in that the feature information of the user to be analyzed comprises one or more of:
the account information, number of followers, number of live-streaming works, and preferred work type of the user to be analyzed.
8. A potential-user determining apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain status information of a product on a platform, the status information comprising: a first revenue value the product brings to the platform, a second revenue value the product brings to platform users of the platform, and a third revenue value the product brings to product users who use the product;
an input module, configured to input the status information and multiple pieces of action information of a user to be analyzed into a pre-trained deep reinforcement learning model to obtain an estimated value of the long-term feedback corresponding to each piece of action information, each piece of action information comprising at least feature information of the user to be analyzed and an action identifier, the action identifier being an identifier of an order action for the product or an identifier of an abandon-order action;
a determining module, configured to determine, according to the action corresponding to the maximum estimated value output by the deep reinforcement learning model, whether the user to be analyzed is a potential user of the product.
9. A server, characterized by comprising:
a processor; and a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method steps of the potential-user determination method according to any one of claims 1 to 7.
10. A readable storage medium, wherein, when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the method steps of the potential-user determination method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811526942.8A CN109711871B (en) | 2018-12-13 | 2018-12-13 | Potential customer determination method, device, server and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109711871A true CN109711871A (en) | 2019-05-03 |
CN109711871B CN109711871B (en) | 2021-03-12 |
Family
ID=66255738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811526942.8A Active CN109711871B (en) | 2018-12-13 | 2018-12-13 | Potential customer determination method, device, server and readable storage medium |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027676A (en) * | 2019-11-28 | 2020-04-17 | 支付宝(杭州)信息技术有限公司 | Target user selection method and device |
CN111382359A (en) * | 2020-03-09 | 2020-07-07 | 北京京东振世信息技术有限公司 | Service strategy recommendation method and device based on reinforcement learning and electronic equipment |
CN112200610A (en) * | 2020-10-10 | 2021-01-08 | 苏州创旅天下信息技术有限公司 | Marketing information delivery method, system and storage medium |
CN113129108A (en) * | 2021-04-26 | 2021-07-16 | 山东大学 | Product recommendation method and device based on Double DQN algorithm |
CN113256390A (en) * | 2021-06-16 | 2021-08-13 | 平安科技(深圳)有限公司 | Product recommendation method and device, computer equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005918A (en) * | 2015-07-24 | 2015-10-28 | 金鹃传媒科技股份有限公司 | Online advertisement push method based on user behavior data and potential user influence analysis and push evaluation method thereof |
JP2017173873A (en) * | 2016-03-18 | 2017-09-28 | ヤフー株式会社 | Information providing device and information providing method |
CN107451832A (en) * | 2016-05-30 | 2017-12-08 | 北京京东尚科信息技术有限公司 | The method and apparatus of pushed information |
CN108230058A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Products Show method and system |
CN108230057A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | A kind of intelligent recommendation method and system |
CN108305167A (en) * | 2018-01-12 | 2018-07-20 | 华南理工大学 | A kind of foreign currency trade method and system enhancing learning algorithm based on depth |
CN108492146A (en) * | 2018-03-30 | 2018-09-04 | 口口相传(北京)网络技术有限公司 | Preferential value calculating method, server-side and client based on user-association behavior |
CN108960929A (en) * | 2018-07-16 | 2018-12-07 | 苏州大学 | Consider the social networks marketing seed user choosing method that existing product influences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||