CN109559216A

CN109559216A - Method and device for predicting user behavior using deep reinforcement learning

Info

Publication number: CN109559216A
Application number: CN201811210445.7A
Authority: CN
Inventors: 阎翔; 李晨晨
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-10-17
Filing date: 2018-10-17
Publication date: 2019-04-02

Abstract

Embodiments of this specification provide a method and device for predicting user behavior using deep reinforcement learning. In the method, feature data of a current user is first obtained as the current environment state s of the deep reinforcement learning, the feature data including at least fund-related data of the current user. The feature data is then input into a deep neural network, which is trained to determine, at least according to the reward score r corresponding to each candidate borrowing action a, the cumulative reward Q that can be expected when each candidate borrowing action is taken in the current environment state s. Each candidate borrowing action includes a loan amount, a loan term and a loan interest rate, and the corresponding reward score r is determined at least according to the loan amount, the loan term and the loan interest rate. Finally, according to the cumulative rewards Q so obtained, one of the candidate borrowing actions is selected as the predicted borrowing action of the current user.

Description

Method and device for predicting user behavior using deep reinforcement learning

Technical field

One or more embodiments of this specification relate to the fields of artificial intelligence and machine learning, and more particularly to a method and device for predicting user borrowing behavior using deep reinforcement learning.

Background technique

With the development of computer technology, various computer models have come to be used to predict or simulate user operations or user behavior, so that products can be designed around behavior closer to that of real users and better matching what users expect, thereby improving the user experience.

For example, to improve the usage rate of, and user experience with, a specific product, user models have been built to simulate and predict how users use the product. Traditional user models are generally built according to product or operational needs, for example by establishing a direct relationship between basic user data (gender, age, region, etc.) and the purchase rate (or another metric of interest) of a target product, in order to provide functions such as prediction. Such models are usually over-customized and are applicable only to a specific product and a specific data set. Moreover, selecting the types of data used to characterize users requires the modeler to have a deep understanding of users, the product and the market, which places high demands on the modeler's expertise and involves a considerable workload.

In the user models of most marketing scenarios, a user's decision when presented with a marketing benefit is usually assumed to be a rational judgment: for example, the user decides whether to buy according to whether the value of the marketing benefit received reaches a threshold the user has in mind, or the probability that the user buys increases as the value of the marketing benefit increases. This means that, when designing marketing benefits, one can only keep trying benefits of different values to discover the user's psychological threshold, or the benefit value at which the user's purchase probability reaches the marketing target. The threshold can only be approached through repeated trials, which is very inefficient. In addition, since users' mindsets change over time, new trials must be carried out continually.

An improved solution is therefore desired that predicts users' usage of products more efficiently, in particular their usage of lending products.

Summary of the invention

One or more embodiments of this specification describe a method and device for predicting user behavior using deep reinforcement learning, which can simulate and predict a user's borrowing behavior when using a lending product more accurately and effectively.

According to a first aspect, a method for predicting user behavior using deep reinforcement learning is provided, the method comprising:

obtaining feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;

inputting the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score r corresponding to each candidate borrowing action a among a plurality of candidate borrowing actions, the plurality of cumulative rewards Q that can be expected when the plurality of candidate borrowing actions are respectively taken in the current environment state s; wherein the action features of each candidate borrowing action a include a user loan amount, a loan term and a loan interest rate; the reward score r corresponding to each candidate borrowing action is the difference between a loan income term and a repayment expenditure term, the loan income term being determined based on the user loan amount, and the repayment expenditure term being determined at least based on the user loan amount, the loan term and the loan interest rate;

selecting, according to the plurality of cumulative rewards Q, a first candidate action from the plurality of candidate borrowing actions as the predicted borrowing action of the current user.

In one embodiment, the fund-related data may include fund usage history data within a predetermined period, and a current fund status.

Further, in one example, the fund usage history data includes one or more of the following: a transaction history data sequence, a borrowing history data sequence, and a repayment history data sequence.

According to one possible design, the user loan amount is selected within a preset maximum loan limit, the loan term is selected from the available loan durations, and the loan interest rate is determined according to the user loan amount and/or the loan term.

In one embodiment, the feature data of the current user further includes a user expected discount rate determined according to the fund-related data, which characterizes the degree of the current user's borrowing demand; correspondingly, the repayment expenditure term is also determined based on the user expected discount rate.

In one embodiment, the fund-related data includes a plurality of data items, each data item having a preset correlation coefficient with the expected discount rate; the user expected discount rate is determined by combining the plurality of data items using the correlation coefficients.

Further, in one embodiment, the repayment expenditure term is determined based on a repayment amount and a discount factor, the repayment amount being determined based on the user loan amount, the loan term and the loan interest rate, and the discount factor being determined according to the user expected discount rate and the loan term.

More specifically, in one example, the discount factor is determined by the following formula:

where Rd is the user expected discount rate and D is the loan term.

According to one embodiment, the deep neural network is trained on sample data of a plurality of samples, the sample data of each sample including at least a sample environment state S, a sample action A and a sample reward score R, wherein:

the sample environment state S includes fund-related data of a sample user;

the sample action A is a sample borrowing action executed in the sample environment state S, the sample borrowing action including a sample loan amount, a sample loan term and a sample loan interest rate;

the sample reward score R is the difference between an income term and an expenditure term, the income term being determined based on the sample loan amount, and the expenditure term being determined at least based on the sample loan amount, the sample loan term and the sample loan interest rate.

In one possible design, the first candidate action is selected as follows:

among the plurality of cumulative rewards, the candidate borrowing action corresponding to the cumulative reward with the largest value is selected as the first candidate action.

In another possible design, the first candidate action is selected as follows:

according to the value of each of the plurality of cumulative rewards, a selection probability is determined for the candidate borrowing action corresponding to each cumulative reward;

according to the selection probabilities, the first candidate action is selected from the plurality of candidate borrowing actions.
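The two selection designs above can be sketched as follows. This is a minimal illustration, not the method of the specification: the text does not say how a selection probability is derived from a cumulative reward, so the softmax mapping, the `temperature` parameter and all function names here are assumptions.

```python
import math
import random

def select_greedy(q_values):
    """Return the index of the candidate action with the largest cumulative reward Q."""
    return max(range(len(q_values)), key=lambda k: q_values[k])

def selection_probabilities(q_values, temperature=1.0):
    """Map Q values to selection probabilities with a softmax (an assumed mapping)."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def select_by_probability(q_values, temperature=1.0, rng=random):
    """Sample one candidate action index according to the softmax probabilities."""
    probs = selection_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

The greedy variant always returns the same action for the same Q values, whereas the probabilistic variant occasionally picks lower-valued actions, which may better reflect users who do not always act on the single best option.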

According to a second aspect, a device for predicting user behavior using deep reinforcement learning is provided, the device comprising:

an environment acquisition unit configured to obtain feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;

a reward determination unit configured to input the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score r corresponding to each candidate borrowing action a among a plurality of candidate borrowing actions, the plurality of cumulative rewards Q that can be expected when the plurality of candidate borrowing actions are respectively taken in the current environment state s; wherein the action features of each candidate borrowing action a include a user loan amount, a loan term and a loan interest rate; the reward score r corresponding to each candidate borrowing action is the difference between a loan income term and a repayment expenditure term, the loan income term being determined based on the user loan amount, and the repayment expenditure term being determined at least based on the user loan amount, the loan term and the loan interest rate;

an action prediction unit configured to select, according to the plurality of cumulative rewards Q, a first candidate action from the plurality of candidate borrowing actions as the predicted borrowing action of the current user.

According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.

According to a fourth aspect, a computing device is provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method of the first aspect is implemented.

According to the method and device provided by the embodiments of this specification, the environment state, actions and reward scores in deep reinforcement learning are characterized specifically for borrowing behavior, and deep reinforcement learning is applied to the user borrowing scenario, so that an agent can simulate and predict a user's borrowing behavior when using a lending product. Using deep reinforcement learning in this way, user borrowing behavior is predicted more accurately and a more accurate and flexible user model is obtained, which helps product designers understand users more deeply and thus adopt more appropriate marketing strategies or improve product design to enhance the user experience.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 shows a schematic diagram of a deep reinforcement learning system;

Fig. 2 shows a flowchart of a method for predicting user behavior using deep reinforcement learning according to one embodiment;

Fig. 3 shows a schematic diagram of improving product design based on a user model that uses deep reinforcement learning, according to one embodiment;

Fig. 4 shows a schematic block diagram of a user behavior prediction device according to one embodiment.

Detailed description of embodiments

The solutions provided in this specification are described below with reference to the drawings.

As mentioned above, conventional user models depend heavily on the modeler's deep understanding of how the product is used and of user psychology; they place high demands on the modeler and are inflexible. Moreover, there is currently no user model well suited to lending products. For this reason, the embodiments of this specification consider using a reinforcement learning model to simulate and predict user behavior when users use lending products.

Fig. 1 shows a schematic diagram of a deep reinforcement learning system. Generally, a deep reinforcement learning system includes an agent and an execution environment; the agent continually learns and optimizes its policy through interaction with, and feedback from, the execution environment. Specifically, the agent observes and obtains the state of the execution environment and, according to a certain policy, determines the behavior or action to take for the current state of the execution environment. This action acts on the execution environment and changes its state, and at the same time produces feedback to the agent, known as the reward or reward score. From the reward score obtained, the agent judges whether its previous action was correct and whether its policy needs adjusting, and then updates the policy. By repeatedly observing states, determining actions and receiving feedback, the agent keeps updating its policy; the ultimate goal is to learn a policy that maximizes the cumulative reward score obtained. This is the typical reinforcement learning process. If, in learning and adjusting its policy, the agent uses deep learning algorithms including neural networks, the system is called a deep reinforcement learning system.

Based on these characteristics of deep reinforcement learning systems, the solution of this specification proposes to draw on the agent's decision process in deep reinforcement learning to simulate the user's decision process when using a lending product. That is, when a user chooses whether, and how, to use a lending product, the starting point of the user's decision is also to maximize his or her own cumulative benefit. This decision process bears a certain similarity to that of the agent in deep reinforcement learning, so the agent's policy algorithm in a deep reinforcement learning system can be used to simulate and predict the user's borrowing behavior.

When a deep reinforcement learning system is applied to the scenario of a user using a lending product, the agent can be used to simulate the user's decision process. Specifically, the environment state observed by the agent before making a decision is the user's borrowing environment; the action the agent takes based on a certain policy is the simulated user's borrowing action; each action has a corresponding reward score; and the agent makes its decisions so as to optimize the cumulative reward. Further, in predicting user borrowing behavior with deep reinforcement learning, how the environment state (state), the action (action) and the reward score (reward) are characterized in view of the features of lending products is the key to accurately predicting user behavior.

Fig. 2 shows a flowchart of a method for predicting user behavior using deep reinforcement learning according to one embodiment. It can be understood that the method may be executed by the agent in a deep reinforcement learning system, and the agent may be implemented by any apparatus or device with computing and processing capability. As shown in Fig. 2, the method includes: step 21, obtaining feature data of a current user as the current environment state of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user; step 22, inputting the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score corresponding to each candidate borrowing action among a plurality of candidate borrowing actions, the plurality of cumulative rewards that can be expected when the plurality of candidate borrowing actions are respectively taken in the current environment state; and step 23, selecting, according to the plurality of cumulative rewards, a first candidate action from the plurality of candidate borrowing actions as the predicted borrowing action of the current user. The specific execution of each of these steps is described below.

First, in step 21, feature data of the current user is obtained as the current environment state of the deep reinforcement learning. To apply deep reinforcement learning to the user borrowing scenario, the environment state of the reinforcement learning must first be characterized. Here, the user's feature data is used as the current environment state, and this feature data includes the user's fund-related data.

In one embodiment, the fund-related data includes the user's fund usage history data within a predetermined period, and the user's current fund status.

More specifically, in one example, the fund usage history data includes a transaction history data sequence within a predetermined period (for example, the last month, the week before the current time, the last 30 days, etc.), which may include the user's consumption and purchase records as well as other transaction-related outgoing or incoming transfers. In one example, the fund usage history data may include a borrowing history data sequence within the predetermined period, which mainly includes the user's previous borrowing records that can be obtained. In another example, the fund usage history data may include a repayment history data sequence within the predetermined period, which may include the user's credit card repayment records and repayment records for various loan products. In other examples, the fund usage history data may also include other historical data related to the use of funds, such as the user's investment or wealth management records over a period of time.

The user's current fund status may include the user's currently available balance in related products (for example, Yuebao) and the user's outstanding debts. Outstanding debts may include the amount owed on credit cards and the outstanding loan balances of lending products.

In one embodiment, the user's feature data further includes a user expected discount rate Rd. The user expected discount rate Rd is a parameter characterizing the strength of the user's borrowing demand: the stronger the user's borrowing demand, the higher the expected discount rate. The user expected discount rate Rd can be determined from the above fund-related data according to a predetermined algorithm.

For example, in one example, the fund-related data includes a plurality of data items, such as the available balance, the credit card amount owed, the consumption amount computed from the transaction history data, and so on. A correlation coefficient between each of these data items and the expected discount rate Rd is preset. The sign of the correlation coefficient indicates whether the corresponding data item is positively or negatively correlated with the expected discount rate, and its magnitude indicates the strength of the association between them. Based on these correlation coefficients, the data items in the fund-related data are combined to determine the user expected discount rate Rd.

In one embodiment, the user expected discount rate Rd determined above is further processed, through statistical distribution analysis and normalization, into a value between 0 and 1.
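One way to read the combination and normalization described above is a weighted sum followed by a squashing function. The sketch below is only an illustration under stated assumptions: the data item names, the coefficient values and the use of a logistic function for normalization are all hypothetical and are not given in the specification.

```python
import math

# Hypothetical data items and correlation coefficients, for illustration only.
COEFFICIENTS = {
    "available_balance": -0.5,  # more cash on hand -> weaker borrowing demand
    "credit_card_debt": 0.8,    # more debt -> stronger borrowing demand
    "monthly_spending": 0.3,
}

def expected_discount_rate(data_items, coefficients=COEFFICIENTS):
    """Combine (already standardized) data items into a raw score, then map it into (0, 1)."""
    raw = sum(coefficients[name] * value for name, value in data_items.items())
    return 1.0 / (1.0 + math.exp(-raw))  # logistic normalization (an assumption)
```

With these assumed coefficients, a user with a high standardized credit card debt and a low available balance receives an Rd closer to 1, matching the statement that stronger borrowing demand means a higher expected discount rate.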

Of course, the user's feature data may also include other data related to the user's profile information, for example the user's attribute features such as gender, age, region, income range and length of registration. It may also include the user's group labels, such as workplace newcomer, fund novice, and so on; these group labels can be obtained by assigning users to groups in advance through clustering, classification or similar methods. Such user profile information helps characterize the user's borrowing environment more comprehensively.

In this way, the user's borrowing environment is characterized based on the user's fund-related data and used as the environment state of the deep reinforcement learning. Having observed and obtained the environment state, the agent of the deep reinforcement learning system determines, according to a certain policy, the action to execute in that environment state. The agent chooses the action so that the cumulative reward obtained by executing it is optimal.

Generally, policy evaluation in reinforcement learning often uses the state-action value function Q to represent the expected cumulative reward. The state-action value function is also called the Q function, where Q(s, a) denotes the cumulative reward obtained by starting from state s, executing action a, and then following policy π. The cumulative reward is usually computed in the form of a T-step cumulative reward or a γ-discounted cumulative reward; the expressions of the Q function under these two forms are as follows:

where the subscript t denotes the index of the subsequent step, and r_t is the reward score of step t.
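For reference, the T-step and γ-discounted forms of the state-action value function are commonly written in textbooks as shown below. These are the standard definitions, given here as an aid to reading; the exact notation of the original formulas is an assumption.

$$Q_T^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\frac{1}{T}\sum_{t=1}^{T} r_t \,\middle|\, s_0 = s,\ a_0 = a\right]$$

$$Q_{\gamma}^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{+\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a\right]$$

Here $\gamma \in [0, 1)$ is the discount rate of the reinforcement learning process itself and is distinct from the user expected discount rate Rd.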

The reinforcement learning process is a Markov decision process and has the Markov property, that is, the state of the system at the next time step is determined only by the state at the current time step. The Q function can therefore also be expressed in a recursive form satisfying the Bellman equation:

$$Q_{i+1}(s, a) = \mathbb{E}_{s' \sim \mathcal{E}}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]$$

Improving and determining the agent's policy is in effect the process of solving the above Bellman equation so as to maximize the Q function.

It can be seen that the Q function is obtained by accumulating reward scores under a certain policy. In simple scenarios where the environment state space and the candidate action space have low dimensionality, the Q function can be represented by a table or matrix, and the policy can be evaluated and improved based on that Q table or matrix.

However, for the complex environments faced by deep reinforcement learning, the environment model is unknown. For user borrowing behavior, for example, how many states the borrowing environment has in total, the transition probabilities of the environment and the reward scores of the various candidate actions are all extremely complex, and it is difficult to proceed simply by looking up a Q table or solving the Q function. In this case, the Q function is determined by a deep neural network.

Therefore, in deep reinforcement learning, a deep neural network model is trained in advance. According to the reward scores corresponding to the various candidate actions, the model can determine the cumulative reward, that is, the Q function, that can be expected when each candidate action is taken in the current environment state. Based on the Q function values corresponding to the candidate actions, the agent selects the action to execute according to a certain selection policy.

Applied to the user borrowing scenario, after the user's feature data has been observed and obtained as the environment state in step 21, the feature data of the current user is input, as the environment state, into the above deep neural network in step 22. The deep neural network is trained to determine, at least according to the reward score corresponding to each candidate borrowing action among a plurality of candidate borrowing actions, the cumulative reward, that is, the Q function, that can be expected when each candidate borrowing action is taken in the current environment state.

As shown by the expression of the Q function above, the Q function accumulates expected reward scores, including the current reward score obtained after taking the current candidate borrowing action and the expected reward scores obtained in the subsequent t steps of following the policy. This requires characterizing the candidate borrowing action a and the reward score r corresponding to each candidate borrowing action.

In one embodiment, the action features of a candidate borrowing action include the user loan amount, the loan term and the loan interest rate.

In one embodiment, the loan product sets a maximum loan limit, which may be a fixed value or may be associated with the user's credit status. For example, the loan product may assess the user's credit in advance and determine the maximum loan limit according to the user's credit status. Correspondingly, the user loan amount is restricted to the range of the maximum loan limit. Some loan products offer a set of loan amount options within the maximum loan limit. For example, assuming the maximum loan limit is 10,000 yuan, 10 selectable loan amounts of 1k, 2k, ..., 10k may be offered. In that case the user loan amount is an amount selected from these options within the maximum loan limit.

Similarly, the loan product may also set selectable loan durations, such as 3 months, 6 months, 12 months, and so on, and the user's loan term is selected from these durations. Of course, the loan product may instead provide only a maximum loan duration, in which case the user's loan term is restricted to a value within that maximum duration.

In one embodiment, the loan interest rate is set by the loan product and is a fixed value. In another embodiment, the loan product may determine a floating loan interest rate according to the user loan amount and/or the loan term. For example, the larger the loan amount, the lower the loan interest rate; the longer the loan term, the lower the loan interest rate. In this way, once the candidate loan amount and candidate loan term of a candidate borrowing action are determined, the corresponding loan interest rate can be determined.

A candidate borrowing action is thus characterized by the three dimensions of loan amount, loan term and loan interest rate, that is, candidate borrowing action a=(M, D, i), where M is the loan amount, D is the loan term and i is the loan interest rate. In one example, assume the loan amount has m possible values (for example, if the loan product offers amounts from 1k to 10k, then m=10), the loan term has n options, and the loan interest rate is determined by the loan amount and/or the loan term. There are then at least m*n+1 candidate borrowing actions, where the one action beyond the m*n corresponds to not borrowing, for which M=0.
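The m*n+1 candidate actions can be enumerated directly. The following is a minimal sketch; the `interest_rate` formula is a hypothetical floating-rate rule chosen only so that larger amounts and longer terms get a lower rate, and the term options are taken from the 3/6/12-month example above.

```python
from itertools import product

def interest_rate(amount, term_months):
    """Hypothetical floating rate: larger amounts and longer terms get a lower rate."""
    return round(0.10 - 0.002 * (amount / 1000) - 0.001 * term_months, 4)

def candidate_actions(amounts, terms):
    """Build the m*n borrowing actions a=(M, D, i) plus one 'do not borrow' action."""
    actions = [(0, 0, 0.0)]  # M=0: the user does not borrow
    for amount, term in product(amounts, terms):
        actions.append((amount, term, interest_rate(amount, term)))
    return actions

amounts = [1000 * k for k in range(1, 11)]  # 1k, 2k, ..., 10k -> m = 10
terms = [3, 6, 12]                          # months -> n = 3
actions = candidate_actions(amounts, terms)
print(len(actions))  # m*n + 1 = 31
```

Because the interest rate is a function of the other two features, the size of the action space is governed by m and n alone.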

With the candidate borrowing action a characterized in this way, the reward score r corresponding to each candidate borrowing action a can be characterized as follows. It should be understood that, unlike the cumulative reward represented by the Q function, the reward score r below is the immediate reward score obtained from the environment after executing the candidate borrowing action a, also called the current reward score.

In one embodiment, the current reward score r can be expressed as the difference between a loan income term B and a repayment expenditure term P:

r = B - P

where the loan income term B is determined based on the user loan amount M in the action features of the candidate borrowing action, and the repayment expenditure term P is determined at least based on the user loan amount, the loan term and the loan interest rate in the action features.

More specifically, the loan income term B characterizes the benefit the user obtains through the borrowing action. In one example, the loan income term B can be set equal to the user loan amount M. In another example, the loan income term B can be set to the user loan amount M multiplied by a proportionality coefficient α:

B = M * α

The proportionality coefficient can be set as needed. In one example, the proportionality coefficient α may be related to the user expected discount rate Rd in the user's feature data, for example proportional to Rd. In this way, the influence of the expected discount rate Rd is introduced into the loan income term B: the stronger the user's borrowing demand, the larger Rd, and the larger the benefit the user is considered to obtain by borrowing.

On the other hand, expenditure item P is refunded for characterizing, the cost that user needs to pay for debt-credit movement.At one In example, expenditure of refunding item P is set equal to refund number M ', which is borrowed money based on the user in motion characteristic Number M, borrowing time D and borrowing rate i and determine.More specifically, it can be based on loaning bill number M, borrowing time D and benefit of borrowing money Rate i determines interest number I, is then based on loaning bill number M and interest number I, determines total refund number M '.

In another example, the influence of the expected discount rate Rd of user is introduced in the expenditure item P that refunds.

Specifically, refund expenditure item P can be determined based on refund number M ' and discount factor d, it may be assumed that

P=d*M '

Wherein refund number M ' can be true based on user's loaning bill number M, borrowing time P and borrowing rate i as described above It is fixed, and discount factor d is determined according to the expected discount rate Rd and borrowing time D of user.

In one example, the discount factor d can be expressed as: d=(1+Rd) * D

In another example, for a lending action with a borrowing time of D days, the repayment expenditure term P can be expressed as:

P = (M + I) * Σ_t 1/(1+Rd)^t, with t running over the days of the borrowing time D

Here, M is the borrowing amount, I is the interest amount, and M+I is the repayment amount M'; Rd is the user's expected discount rate. In the above formula, 1/(1+Rd) is raised to the power of the day number t and summed over the D days of the borrowing time, and this sum serves as the discount factor d.

In the various ways described above, the borrowing income term B and the repayment expenditure term P are determined, and the reward score corresponding to the candidate lending action a is then obtained as r=B-P.
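As a rough illustration of the reward construction described above, the following Python sketch computes r = B - P for one candidate action. The simple-interest rule I = M * i * D, the summation bounds t = 1..D in `summed_discount_factor`, and all function and parameter names are assumptions made for illustration; the text only states that I is determined from M, D and i.

```python
def repayment_amount(M, D, i):
    """Total repayment M' = M + I, assuming simple daily interest I = M * i * D."""
    interest = M * i * D  # assumed interest rule, not specified in the text
    return M + interest


def summed_discount_factor(Rd, D):
    """Sum of (1 / (1 + Rd)) ** t over the D days; the bounds t = 1..D are an assumption."""
    return sum((1.0 / (1.0 + Rd)) ** t for t in range(1, D + 1))


def reward(M, D, i, d, alpha=1.0):
    """Reward score r = B - P, with B = alpha * M and P = d * M'."""
    B = alpha * M                      # borrowing income term
    P = d * repayment_amount(M, D, i)  # repayment expenditure term
    return B - P


# Example: borrow 1000 for 10 days at 0.1% per day, with no discounting (d = 1).
r = reward(M=1000.0, D=10, i=0.001, d=1.0)
```

The discount factor is passed in as `d` so that either form of discounting mentioned in the text can be plugged in without changing the reward function.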

In step 22, a deep neural network trained on the candidate lending actions a and the corresponding reward scores r characterized in this way can determine, for the current environment state s, the cumulative reward expected from executing each candidate lending action a, that is, the value of the Q function.

Assume there are N possible candidate lending actions a₁, a₂, ..., a_N (in the preceding example, N=m*n+1). Then, in step 22, the deep neural network can determine the cumulative rewards Q₁(s,a₁), Q₂(s,a₂), …, Q_N(s,a_N) that correspond to these N candidate lending actions in the current environment state s.
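The candidate action set can be sketched as follows. This sketch assumes that the N = m*n+1 actions are m borrowing amounts combined with n borrowing times plus one "no borrowing" action, and that the interest rate is looked up through a hypothetical `rate_for` function; neither assumption is confirmed by the text here.

```python
from itertools import product


def rate_for(amount, days):
    """Hypothetical pricing rule: a flat daily rate, used only for illustration."""
    return 0.0005


def build_actions(amounts, durations):
    """Return (amount, days, rate) tuples plus one assumed no-borrowing action."""
    actions = [(0.0, 0, 0.0)]  # assumed "do not borrow" action (the "+1")
    for amount, days in product(amounts, durations):
        actions.append((amount, days, rate_for(amount, days)))
    return actions


# m = 3 amounts and n = 2 durations give N = 3 * 2 + 1 = 7 candidate actions.
actions = build_actions(amounts=[500.0, 1000.0, 2000.0], durations=[7, 30])
```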

It should be noted that, in deep reinforcement learning, the neural network used to determine the Q function can be trained on training samples in a variety of ways. In general, a training sample suitable for deep reinforcement learning is a <s, a, r, s'> sequence, where s denotes the environment state, a denotes the action the agent executes in state s, r denotes the reward score obtained by executing action a, and s' denotes the new state the environment transitions to after action a is executed. With many such samples and any of various training methods, the neural network that determines the Q function in deep reinforcement learning can be obtained.

When deep reinforcement learning is applied to the user lending scenario, the neural network used in step 22 to determine the Q function of each candidate lending action in the current environment state can likewise be trained on training samples using various methods, and this specification places no limitation on this.

As for the training samples, in keeping with the application scenario, the neural network can be trained with samples of known lending behavior. Specifically, in one embodiment, the neural network is trained with the sample data of multiple lending samples. Each lending sample has a corresponding sample user, and the sample data of each lending sample includes at least a sample environment state S, a sample action A, and a sample reward score R. Optionally, the sample data can also include the new state S' that the sample environment transitions to after the sample action A is executed. It will be appreciated that the sample environment state S, sample action A, and sample reward score R are characterized in the same way as described above for the current user.

Specifically, the sample environment state S includes the fund-related data of the sample user, such as fund usage history data and fund state; optionally, it further includes the expected discount rate of the sample user.

The sample action A is the sample lending action executed in the sample environment state S, and includes a sample borrowing amount, a sample borrowing time, and a sample borrowing interest rate.

The sample reward score R is the difference between an income term and an expenditure term, where the income term is determined based on the sample borrowing amount, and the expenditure term is determined based at least on the sample borrowing amount, sample borrowing time, and sample borrowing interest rate. In different embodiments, the income term and the expenditure term can take the forms described above for the borrowing income term B and the repayment expenditure term P, respectively, which are not repeated here.

The training samples can be obtained by Monte Carlo simulation. For example, the borrowing actions issued by the agent can be simulated at random: starting from an arbitrary borrowing environment state S, a borrowing action A is selected at random, and the corresponding reward score R and the new borrowing environment state S' are calculated; the process is then repeated from S', yielding a <S, A, R, S'> sequence. Simulating this process many times yields multiple such sequences, which serve as training samples for the neural network.
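The random simulation described above might look like the following minimal sketch. The reward and transition functions are toy stand-ins (the state is reduced to an outstanding balance), and every name here is hypothetical; the text does not specify how states evolve.

```python
import random


def simulate_episode(initial_state, actions, reward_fn, transition_fn, steps, rng):
    """Roll out random actions and record (S, A, R, S') tuples."""
    samples = []
    state = initial_state
    for _ in range(steps):
        action = rng.choice(actions)                # random borrowing action A
        r = reward_fn(state, action)                # reward score R
        next_state = transition_fn(state, action)   # new state S'
        samples.append((state, action, r, next_state))
        state = next_state                          # repeat the process from S'
    return samples


def toy_reward(state, action):
    """Toy reward: amount borrowed minus amount repaid with simple interest."""
    amount, days, rate = action
    return amount - amount * (1 + rate * days)


def toy_transition(state, action):
    """Toy transition: the outstanding balance grows by the borrowed amount."""
    return state + action[0]


rng = random.Random(0)
toy_actions = [(0.0, 0, 0.0), (1000.0, 30, 0.0005)]
episode = simulate_episode(0.0, toy_actions, toy_reward, toy_transition, steps=5, rng=rng)
```

Calling `simulate_episode` repeatedly from different initial states would produce the multiple sequences used as training samples.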

Given enough training samples, various training methods, such as DQN (Deep Q-Network), Actor-Critic, and PG (Policy Gradient), can be used to train the deep neural network required in step 22. The deep neural network can then determine the cumulative rewards Q₁(s,a₁), Q₂(s,a₂), …, Q_N(s,a_N) corresponding to the N candidate lending actions in the current environment state s formed by the feature data of the current user.
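As one possible illustration of the DQN option, the sketch below computes the standard one-step Q-learning target y = R + γ · max Q(S', a') for a batch of <S, A, R, S'> samples. The text leaves the training method open, so this is only an example; the future-reward discount γ (`gamma`) and the toy Q function are assumptions not taken from the text.

```python
def td_targets(samples, q_values, gamma=0.9):
    """Return y = R + gamma * max over a' of Q(S', a') for each (S, A, R, S') sample."""
    targets = []
    for _state, _action, reward, next_state in samples:
        next_q = q_values(next_state)  # list of Q estimates, one per candidate action
        targets.append(reward + gamma * max(next_q))
    return targets


def toy_q_values(state):
    """Toy stand-in for the network: two candidate actions, Q depends only on the state."""
    return [0.0, 0.1 * state]


targets = td_targets([(0.0, 1, -15.0, 1000.0)], toy_q_values, gamma=0.9)
```

In an actual DQN setup the network would be regressed toward these targets for the action taken in each sample.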

Then, in step 23, a first candidate action is selected from the candidate lending actions according to the multiple cumulative rewards, as the predicted lending action of the current user.

In one embodiment, the cumulative reward Q_i with the largest value is selected from the multiple cumulative rewards Q₁, Q₂, …, Q_N, its corresponding candidate lending action a_i is determined, and that action is taken as the predicted user lending action. This corresponds to the "greedy" strategy in deep reinforcement learning.
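The greedy selection amounts to an argmax over the cumulative rewards. A minimal sketch, with hypothetical names:

```python
def select_greedy(actions, q_values):
    """Return the candidate action whose cumulative reward Q is largest."""
    best_index = max(range(len(q_values)), key=lambda k: q_values[k])
    return actions[best_index]


predicted = select_greedy(["a1", "a2", "a3"], [0.2, 1.5, -0.3])
```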

In another embodiment, the selection probabilities P₁, P₂, …, P_N of the candidate lending actions are determined according to the values of the cumulative rewards Q₁, Q₂, …, Q_N. For example, each selection probability can be set proportional to the corresponding cumulative reward and normalized so that the probabilities sum to 1. Then, according to the selection probabilities P₁, P₂, …, P_N, one candidate action is selected from the candidate lending actions a₁, a₂, ..., a_N as the predicted user lending action.
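A sketch of this probabilistic selection is given below. The text does not say how non-positive cumulative rewards are handled, so shifting the values to make all weights positive is an assumption of this sketch, as are the function names.

```python
import random


def selection_probabilities(q_values):
    """Probabilities proportional to Q, normalized so that they sum to 1."""
    low = min(q_values)
    if low <= 0:
        # Assumed handling of non-positive Q: shift so every weight is positive.
        weights = [q - low + 1e-9 for q in q_values]
    else:
        weights = list(q_values)
    total = sum(weights)
    return [w / total for w in weights]


def select_by_probability(actions, q_values, rng):
    """Draw one candidate action according to the selection probabilities."""
    probs = selection_probabilities(q_values)
    return rng.choices(actions, weights=probs, k=1)[0]


probs = selection_probabilities([1.0, 3.0])
choice = select_by_probability(["a1", "a2"], [1.0, 3.0], random.Random(0))
```

Unlike the greedy strategy, this occasionally selects actions with lower cumulative reward, in proportion to their share of the total.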

In other embodiments, the lending action to be executed can be determined from the cumulative rewards Q of the candidate lending actions according to other strategies, and taken as the predicted user lending behavior.

In this way, by characterizing the environment state, action, and reward score of deep reinforcement learning in terms of lending behavior, deep reinforcement learning is applied to the user lending scenario, so that the agent can simulate and predict the user's lending behavior when using a loan product. The resulting agent can serve as a user model for the loan product.

With the deep reinforcement learning approach above, user behavior can be predicted more accurately, yielding a more precise and flexible user model. This gives product designers a deeper understanding of users, so that they can adopt more suitable marketing strategies or improve the product design and the user experience.

In one embodiment, the user model obtained in the above manner can be used to assist product design and thereby optimize product effect and user experience. Fig. 3 is a schematic diagram of improving product design using a deep reinforcement learning user model according to one embodiment. As shown in Fig. 3, n candidate product design schemes can be provided, each with different product elements. For example, when the product is a loan product, the product elements can include the user borrowing limit, the selectable borrowing times, and the borrowing interest rate. These are the option ranges that the product designer offers to the user, and they correspond to the feature space of the user's candidate lending actions in the deep reinforcement learning user model. Since each candidate product scheme has a different feature space, a user model based on deep reinforcement learning can be trained for each candidate product design scheme.

These user models can then be used to test product effect. Specifically, the same test cases can be input into the user model of each candidate product scheme; a test case can be, for example, the borrowing environment state of a test user. Using the method shown in Fig. 2, the deep reinforcement learning user model predicts the user's lending behavior from the borrowing environment state. This simulates the different lending behaviors a user may exhibit under the different candidate product schemes. Next, it is considered which of these lending behaviors better meets the product expectations, for example by calculating the expected income of each user lending behavior and treating the behavior that brings the higher income as the better behavior. The candidate product scheme that produces this better behavior is then determined to be the better product scheme. In this way, a better product design is selected from multiple candidate product schemes, so that the product better achieves the desired product effect and improves the user experience.
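The scheme comparison just described can be sketched as below. Each user model is represented by a hypothetical callable that maps a test state to a predicted (amount, days, rate) action, and `interest_income` is an assumed stand-in for the expected-income calculation, which the text does not define.

```python
def pick_best_scheme(schemes, test_states, expected_income):
    """schemes maps a scheme name to its user model (state -> predicted action)."""
    totals = {}
    for name, user_model in schemes.items():
        # Feed the same test cases to every scheme's user model.
        totals[name] = sum(expected_income(user_model(s)) for s in test_states)
    best = max(totals, key=totals.get)
    return best, totals


def interest_income(action):
    """Assumed income measure: simple interest earned on the predicted loan."""
    amount, days, rate = action
    return amount * days * rate


# Toy user models that ignore the state and always predict the same action.
schemes = {
    "scheme_A": lambda state: (500.0, 7, 0.0005),
    "scheme_B": lambda state: (1000.0, 30, 0.0005),
}
best, totals = pick_best_scheme(schemes, test_states=[0.0, 100.0], expected_income=interest_income)
```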

According to an embodiment of another aspect, an apparatus for predicting user behavior using deep reinforcement learning is also provided. The apparatus can be implemented by any device or equipment with computing and processing capabilities. Fig. 4 shows a schematic block diagram of a user behavior prediction apparatus according to one embodiment. As shown in Fig. 4, the apparatus 400 includes:

an environment acquiring unit 41, configured to obtain the feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;

a reward determination unit 42, configured to input the feature data of the current user into a deep neural network, the deep neural network being trained to determine, based at least on the reward score r corresponding to each candidate lending action a among multiple candidate lending actions, the multiple cumulative rewards Q expected from respectively taking the multiple candidate lending actions in the current environment state s; wherein the action features of each candidate lending action a include a user borrowing amount, a borrowing time, and a borrowing interest rate; the reward score r corresponding to each candidate lending action is the difference between a borrowing income term and a repayment expenditure term, the borrowing income term being determined based on the user borrowing amount, and the repayment expenditure term being determined based at least on the user borrowing amount, borrowing time, and borrowing interest rate;

an action prediction unit 43, configured to select, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate lending actions as the predicted lending action of the current user.

Further, the fund usage history data may include one or more of the following: a transaction history data sequence, a lending history data sequence, a repayment history data sequence, and so on.

According to one embodiment, the user borrowing amount is selected within a preset maximum borrowing limit, the borrowing time is selected from the selectable borrowing durations, and the borrowing interest rate is determined according to the user borrowing amount and/or the borrowing time.

According to one possible design, the feature data of the current user serving as the environment state further includes a user expected discount rate determined from the fund-related data, which characterizes the degree of the current user's borrowing demand; correspondingly, the repayment expenditure term is also determined based on the user expected discount rate.

In one embodiment, the fund-related data includes multiple data items, each with a preset correlation coefficient with respect to the expected discount rate; the user expected discount rate can then be determined by combining the multiple data items using these correlation coefficients.

In one embodiment, the repayment expenditure term in the reward score is determined based on a repayment amount and a discount factor, where the repayment amount is determined based on the user borrowing amount, borrowing time, and borrowing interest rate, and the discount factor is determined according to the user expected discount rate and the borrowing time.
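One simple way to combine data items with preset coefficients is a weighted sum, sketched below. The linear form, the data item names, and the coefficient values are all assumptions for illustration; the text only says that the items are combined using the correlation coefficients.

```python
def expected_discount_rate(data_items, coefficients):
    """Weighted sum of fund-related data items (the linear form is an assumption)."""
    return sum(coefficients[name] * value for name, value in data_items.items())


# Hypothetical data items and preset coefficients.
items = {"overdue_count": 2.0, "balance_ratio": 0.5}
coeffs = {"overdue_count": 0.01, "balance_ratio": -0.02}
Rd = expected_discount_rate(items, coeffs)
```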

More specifically, in one example, the discount factor is determined by following formula:

According to a kind of embodiment, above-mentioned deep neural network is trained by the sample data of multiple samples, each sample This sample data includes at least, sample environment state S, sample action A and sample bonus points R；Wherein:

According to a kind of possible design, the action prediction unit 43 is configurable to:

According to alternatively possible design, the action prediction unit 43 is configurable to:

In this way, the above apparatus uses deep reinforcement learning to simulate and predict a user's lending behavior when using a loan product.

According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Fig. 2.

According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where executable code is stored in the memory, and the processor, when executing the executable code, implements the method described in conjunction with Fig. 2.

Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for predicting user behavior using deep reinforcement learning, the method comprising:

obtaining the feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;

inputting the feature data of the current user into a deep neural network, the deep neural network being trained to determine, based at least on the reward score r corresponding to each candidate lending action a among multiple candidate lending actions, the multiple cumulative rewards Q expected from respectively taking the multiple candidate lending actions in the current environment state s; wherein the action features of each candidate lending action a include a user borrowing amount, a borrowing time, and a borrowing interest rate; the reward score r corresponding to each candidate lending action is the difference between a borrowing income term and a repayment expenditure term, the borrowing income term being determined based on the user borrowing amount, and the repayment expenditure term being determined based at least on the user borrowing amount, borrowing time, and borrowing interest rate;

selecting, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate lending actions as the predicted lending action of the current user.

2. The method according to claim 1, wherein the fund-related data includes fund usage history data within a predetermined time period, and a current fund state.

3. The method according to claim 2, wherein the fund usage history data includes one or more of the following: a transaction history data sequence, a lending history data sequence, and a repayment history data sequence.

4. The method according to claim 1, wherein the user borrowing amount is selected within a preset maximum borrowing limit, the borrowing time is selected from the selectable borrowing durations, and the borrowing interest rate is determined according to the user borrowing amount and/or the borrowing time.

5. The method according to any one of claims 1-4, wherein the feature data of the current user further includes a user expected discount rate determined from the fund-related data, which characterizes the degree of the current user's borrowing demand;

the repayment expenditure term is also determined based on the user expected discount rate.

6. The method according to claim 5, wherein the fund-related data includes multiple data items, each with a preset correlation coefficient with respect to the expected discount rate; the user expected discount rate is determined by combining the multiple data items using the correlation coefficients.

7. The method according to claim 5, wherein the repayment expenditure term is determined based on a repayment amount and a discount factor, the repayment amount is determined based on the user borrowing amount, borrowing time, and borrowing interest rate, and the discount factor is determined according to the user expected discount rate and the borrowing time.

8. The method according to claim 7, wherein the discount factor is determined by the following formula:

9. The method according to claim 1, wherein the deep neural network is trained with the sample data of multiple samples, the sample data of each sample including at least a sample environment state S, a sample action A, and a sample reward score R; wherein:

the sample action A is the sample lending action executed in the sample environment state S, and the sample lending action includes a sample borrowing amount, a sample borrowing time, and a sample borrowing interest rate;

the sample reward score R is the difference between an income term and an expenditure term, the income term being determined based on the sample borrowing amount, and the expenditure term being determined based at least on the sample borrowing amount, sample borrowing time, and sample borrowing interest rate.

10. The method according to claim 1, wherein selecting, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate lending actions comprises:

selecting, as the first candidate action, the candidate lending action corresponding to the cumulative reward with the largest value among the multiple cumulative rewards.

11. The method according to claim 1, wherein selecting, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate lending actions comprises:

determining, according to the value of each of the multiple cumulative rewards, the selection probability of the candidate lending action corresponding to each cumulative reward;

12. An apparatus for predicting user behavior using deep reinforcement learning, the apparatus comprising:

an environment acquiring unit, configured to obtain the feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;

a reward determination unit, configured to input the feature data of the current user into a deep neural network, the deep neural network being trained to determine, based at least on the reward score r corresponding to each candidate lending action a among multiple candidate lending actions, the multiple cumulative rewards Q expected from respectively taking the multiple candidate lending actions in the current environment state s; wherein the action features of each candidate lending action a include a user borrowing amount, a borrowing time, and a borrowing interest rate; the reward score r corresponding to each candidate lending action is the difference between a borrowing income term and a repayment expenditure term, the borrowing income term being determined based on the user borrowing amount, and the repayment expenditure term being determined based at least on the user borrowing amount, borrowing time, and borrowing interest rate;

an action prediction unit, configured to select, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate lending actions as the predicted lending action of the current user.

13. The apparatus according to claim 12, wherein the fund-related data includes fund usage history data within a predetermined time period, and a current fund state.

14. The apparatus according to claim 13, wherein the fund usage history data includes one or more of the following: a transaction history data sequence, a lending history data sequence, and a repayment history data sequence.

15. The apparatus according to claim 12, wherein the user borrowing amount is selected within a preset maximum borrowing limit, the borrowing time is selected from the selectable borrowing durations, and the borrowing interest rate is determined according to the user borrowing amount and/or the borrowing time.

16. The apparatus according to any one of claims 12-15, wherein the feature data of the current user further includes a user expected discount rate determined from the fund-related data, which characterizes the degree of the current user's borrowing demand;

17. The apparatus according to claim 16, wherein the fund-related data includes multiple data items, each with a preset correlation coefficient with respect to the expected discount rate; the user expected discount rate is determined by combining the multiple data items using the correlation coefficients.

18. The apparatus according to claim 16, wherein the repayment expenditure term is determined based on a repayment amount and a discount factor, the repayment amount is determined based on the user borrowing amount, borrowing time, and borrowing interest rate, and the discount factor is determined according to the user expected discount rate and the borrowing time.

19. The apparatus according to claim 18, wherein the discount factor is determined by the following formula:

20. The apparatus according to claim 12, wherein the deep neural network is trained with the sample data of multiple samples, the sample data of each sample including at least a sample environment state S, a sample action A, and a sample reward score R; wherein:

21. The apparatus according to claim 12, wherein the action prediction unit is configured to:

22. The apparatus according to claim 12, wherein the action prediction unit is configured to:

23. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-11.

24. A computing device, including a memory and a processor, characterized in that executable code is stored in the memory, and the processor, when executing the executable code, implements the method of any one of claims 1-11.