Summary of the invention
One or more embodiments of this specification describe a method and apparatus for predicting user behavior using deep reinforcement learning, which can simulate and predict a user's borrowing behavior when using a loan product more accurately and effectively.
According to a first aspect, a method for predicting user behavior using deep reinforcement learning is provided, the method comprising:
obtaining feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;
inputting the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score r corresponding to each candidate borrowing action a among multiple candidate borrowing actions, the multiple cumulative rewards Q expected to be obtained by respectively taking the multiple candidate borrowing actions in the current environment state s; wherein the action features of each candidate borrowing action a include a user loan amount, a loan term, and a loan interest rate; the reward score r corresponding to each candidate borrowing action is the difference between a loan income term and a repayment expenditure term, the loan income term being determined based on the user loan amount, and the repayment expenditure term being determined at least based on the user loan amount, the loan term, and the loan interest rate;
selecting, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate borrowing actions as the predicted borrowing action of the current user.
In one embodiment, the fund-related data may include fund usage history data within a predetermined time period, and a current fund status.
Further, in one example, the fund usage history data includes one or more of the following: a transaction history data sequence, a borrowing history data sequence, and a repayment history data sequence.
According to one possible design, the user loan amount is selected within a preset maximum loan limit, the loan term is selected from the optional loan durations, and the loan interest rate is determined according to the user loan amount and/or the loan term.
In one embodiment, the feature data of the current user further includes a user expected discount rate determined according to the fund-related data, which characterizes the degree of borrowing demand of the current user; correspondingly, the repayment expenditure term is also determined based on the user expected discount rate.
In one embodiment, the fund-related data includes multiple data items, each of which has a preset correlation coefficient with the expected discount rate; the user expected discount rate is determined by integrating the multiple data items using the correlation coefficients.
Further, in one embodiment, the repayment expenditure term is determined based on a repayment amount and a discount factor, the repayment amount being determined based on the user loan amount, the loan term, and the loan interest rate, and the discount factor being determined according to the user expected discount rate and the loan term.
More specifically, in one example, the discount factor d is determined by the following formula:
d = (1 + Rd) * D
where Rd is the user expected discount rate, and D is the loan term.
According to one embodiment, the deep neural network is trained using sample data of multiple samples, the sample data of each sample including at least a sample environment state S, a sample action A, and a sample reward score R, wherein:
the sample environment state S includes fund-related data of a sample user;
the sample action A is the sample borrowing action executed in the sample environment state S, and the sample borrowing action includes a sample loan amount, a sample loan term, and a sample loan interest rate;
the sample reward score R is the difference between an income term and an expenditure term, the income term being determined based on the sample loan amount, and the expenditure term being determined at least based on the sample loan amount, the sample loan term, and the sample loan interest rate.
In one possible design, the first candidate action is selected as follows:
among the multiple cumulative rewards, the candidate borrowing action corresponding to the cumulative reward with the largest value is selected as the first candidate action.
In another possible design, the first candidate action is selected as follows:
according to the value of each of the multiple cumulative rewards, determining the selection probability of the candidate borrowing action corresponding to each cumulative reward;
selecting the first candidate action from the multiple candidate borrowing actions according to the selection probabilities.
According to a second aspect, an apparatus for predicting user behavior using deep reinforcement learning is provided, the apparatus comprising:
an environment obtaining unit, configured to obtain feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;
a reward determination unit, configured to input the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score r corresponding to each candidate borrowing action a among multiple candidate borrowing actions, the multiple cumulative rewards Q expected to be obtained by respectively taking the multiple candidate borrowing actions in the current environment state s; wherein the action features of each candidate borrowing action a include a user loan amount, a loan term, and a loan interest rate; the reward score r corresponding to each candidate borrowing action is the difference between a loan income term and a repayment expenditure term, the loan income term being determined based on the user loan amount, and the repayment expenditure term being determined at least based on the user loan amount, the loan term, and the loan interest rate;
an action prediction unit, configured to select, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate borrowing actions as the predicted borrowing action of the current user.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory, and the method of the first aspect is implemented when the processor executes the executable code.
According to the method and apparatus provided by the embodiments of this specification, the environment state, action, and reward score of deep reinforcement learning are characterized in terms of borrowing behavior, so that deep reinforcement learning is applied to the user borrowing scenario and the agent can simulate and predict a user's borrowing behavior when using a loan product. In this way, deep reinforcement learning is used to predict user borrowing behavior more accurately and to obtain a more accurate and flexible user model. This helps product designers understand users more deeply, so that they can adopt more appropriate marketing strategies or improve the product design to improve the user experience.
Detailed description of embodiments
The solutions provided in this specification are described below with reference to the accompanying drawings.
As mentioned above, conventional user models depend heavily on the modeler's in-depth understanding of the product usage process and of user psychology, which places high demands on the modeler and is inflexible. Moreover, for lending products, there is currently no user model with good applicability. For this reason, the embodiments of this specification consider using a reinforcement learning model to simulate and predict user behavior when a user uses a lending product.
Fig. 1 shows a schematic diagram of a deep reinforcement learning system. Generally, a deep reinforcement learning system includes an agent and an execution environment. The agent continuously learns and optimizes its policy through interaction with, and feedback from, the execution environment. Specifically, the agent observes and obtains the state of the execution environment and, according to a certain policy, determines the behavior or action to take for the current state of the execution environment. Such a behavior acts on the execution environment and changes its state, and at the same time produces feedback to the agent, which is also called a reward or reward score. Based on the reward score obtained, the agent judges whether its previous behavior was correct and whether its policy needs to be adjusted, and then updates the policy. By repeatedly observing states, determining behaviors, and receiving feedback, the agent continuously updates its policy, with the ultimate goal of learning a policy that maximizes the accumulated reward score. This is the typical reinforcement learning process. If, in the course of learning and adjusting the policy, the agent uses deep learning algorithms that include neural networks, such a system is called a deep reinforcement learning system.
Based on the above characteristics of deep reinforcement learning systems, this specification proposes drawing on the decision process of the agent in deep reinforcement learning to simulate the decision process of a user who uses a lending product. That is, when a user chooses whether to use a lending product and how to use it, the starting point of the user's decision is also to maximize the user's own cumulative income. Such a decision process is similar to the decision process of the agent in deep reinforcement learning. Therefore, the policy algorithm of the agent in a deep reinforcement learning system can be used to simulate and predict the user's borrowing behavior.
When a deep reinforcement learning system is applied to the scenario of a user using a lending product, the agent can be used to simulate the user's decision process. Specifically, the environment state observed by the agent before it makes a decision is the user's borrowing environment, and the action taken by the agent based on a certain policy is the simulated borrowing action of the user. Each action has a corresponding reward score, and the agent makes its decision so that the cumulative reward is optimal. Further, when deep reinforcement learning is used to predict user borrowing behavior, the key to accurately predicting user behavior is how to characterize the environment state, the action, and the reward score according to the characteristics of loan products.
Fig. 2 shows a flowchart of a method for predicting user behavior using deep reinforcement learning according to one embodiment. It can be understood that the method can be executed by the agent in a deep reinforcement learning system, and the agent can be implemented by any apparatus or device with computing and processing capabilities. As shown in Fig. 2, the method includes: step 21, obtaining feature data of a current user as the current environment state of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user; step 22, inputting the feature data of the current user into a deep neural network, the deep neural network being trained to determine, at least according to the reward score corresponding to each candidate borrowing action among multiple candidate borrowing actions, the multiple cumulative rewards expected to be obtained by respectively taking the multiple candidate borrowing actions in the current environment state; step 23, selecting, according to the multiple cumulative rewards, a first candidate action from the multiple candidate borrowing actions as the predicted borrowing action of the current user. The specific execution of each of these steps is described below.
First, in step 21, the feature data of the current user is obtained as the current environment state of the deep reinforcement learning. To apply deep reinforcement learning to the user borrowing scenario, the environment state of the reinforcement learning must first be characterized. Here, the feature data of the user is used as the current environment state, and the feature data includes the fund-related data of the user.
In one embodiment, the fund-related data includes the user's fund usage history data within a predetermined time period, and the user's current fund status.
More specifically, in one example, the fund usage history data includes a transaction history data sequence within a predetermined time period (for example, the last month, the week before the current time, the last 30 days, and so on), which may include the user's consumption and purchase records, and may also include other transaction-related outflows or inflows. In one example, the fund usage history data may include a borrowing history data sequence within the predetermined time period, which mainly includes the user's previous borrowing records that can be obtained. In another example, the fund usage history data may include a repayment history data sequence within the predetermined time period, which may include the user's credit card repayment records and the repayment records of various loan products. In other examples, the fund usage history data may also include other historical data related to fund usage, such as the user's investment or wealth management records over a period of time.
The user's current fund status may include the balance currently available to the user in a related product (such as Yu'e Bao), and the user's outstanding debt. The outstanding debt may include credit card arrears and the unpaid loan balance in lending products.
In one embodiment, the feature data of the user further includes the user expected discount rate Rd. The user expected discount rate Rd is a parameter that characterizes the degree of the user's borrowing demand: the stronger the user's borrowing demand, the higher the expected discount rate. The user expected discount rate Rd can be determined from the above fund-related data according to a predetermined algorithm.
For example, in one example, the fund-related data includes multiple data items, such as the available balance, the credit card arrears, the consumption amount counted from the transaction history data, and so on. A correlation coefficient between each of these data items and the expected discount rate Rd is preset. The sign of the correlation coefficient indicates whether the corresponding data item is positively or negatively correlated with the expected discount rate, and the magnitude of the correlation coefficient indicates the strength of the association between the corresponding data item and the expected discount rate. By integrating the data items in the fund-related data based on these correlation coefficients, the user expected discount rate Rd can be determined.
In one embodiment, the user expected discount rate Rd determined above is further processed, through statistical distribution and normalization, into a value between 0 and 1.
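The integration and normalization described above can be sketched as a weighted sum followed by a squashing step. The sketch below is illustrative only: the data item names, the coefficient values, and the use of a logistic function for normalization are assumptions that do not come from this specification, which only states that statistical distribution and normalization are applied.

```python
import math

# Hypothetical correlation coefficients between fund data items and Rd.
# The sign encodes positive or negative correlation; the magnitude encodes strength.
CORRELATION = {
    "available_balance": -0.5,  # more cash on hand, weaker borrowing demand
    "credit_card_debt": 0.3,
    "monthly_spending": 0.2,
}

def expected_discount_rate(items, coeffs=CORRELATION):
    """Integrate standardized data items into a score and squash it into (0, 1)."""
    raw = sum(coeffs[name] * value for name, value in items.items())
    return 1.0 / (1.0 + math.exp(-raw))  # logistic normalization (assumed)
```

In this sketch the data items are assumed to be standardized beforehand, so that the coefficients are comparable across items.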
Of course, the feature data of the user may also include other data related to user profile information, for example the user's attribute features, such as gender, age, region, income range, and length of registration. It may also include crowd labels of the user, such as workplace newcomer or fund novice. These crowd labels can be obtained in advance by assigning the user to a certain crowd through clustering, classification, or similar means. Such user profile information can help characterize the user's borrowing environment more comprehensively.
In this way, the user's borrowing environment is characterized based on the user's fund-related data and serves as the environment state of the deep reinforcement learning. After observing and obtaining the environment state, the agent of the deep reinforcement learning system determines, according to a certain policy, the action to be executed in that environment state. The agent determines the action to execute so that the cumulative reward obtained by executing the action is optimal.
Generally, policy evaluation in reinforcement learning often uses the state-action value function Q to represent the expected cumulative reward. The state-action value function is also called the Q function, where Q(s, a) denotes the cumulative reward brought by following policy π after starting from state s and executing action a. The cumulative reward is usually calculated in the form of a T-step cumulative reward or a γ-discounted cumulative reward. The expressions of the Q function under these two forms are as follows:
$$Q_{T}^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\frac{1}{T}\sum_{t=1}^{T} r_{t} \,\middle|\, s_{0}=s,\ a_{0}=a\right]$$
$$Q_{\gamma}^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_{0}=s,\ a_{0}=a\right]$$
where the subscript t denotes the step number of subsequent execution, and r_t is the reward score of step t.
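For a single observed reward sequence, the two forms of cumulative reward can be computed as in the sketch below. It assumes the usual textbook definitions (the T-step form averages the rewards over T steps, and the γ-discounted form weights the reward of step t+1 by γ^t); the function names are illustrative. The Q function is the expectation of such returns over the trajectories produced by the policy.

```python
def t_step_return(rewards):
    """T-step form: average of the rewards r_1 .. r_T in `rewards`."""
    return sum(rewards) / len(rewards)

def discounted_return(rewards, gamma):
    """Discounted form: sum of gamma**t * r_(t+1) over the observed steps."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

A smaller γ makes later reward scores count less, so the agent behaves more short-sightedly.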
The reinforcement learning process is a Markov decision process and has the Markov property, that is, the state of the system at the next moment is determined only by the state at the current moment. The Q function can therefore also be expressed in a recursive form that satisfies the Bellman equation:
$$Q_{i+1}(s,a) = \mathbb{E}_{s' \in \varepsilon}\left[r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s, a\right]$$
Improving and determining the agent's policy is in effect the process of solving the above Bellman equation so that the Q function is maximized.
It can be seen that the Q function is obtained by accumulating reward scores under a certain policy. In simple scenarios where the environment state space and the candidate action space have small dimensions, the Q function can be expressed as a table or matrix, and the policy can be evaluated and improved based on the Q function table or matrix.
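When the state and action spaces are small, the Q function can be held in a table and improved step by step with the Bellman recursion. The following is a minimal sketch that assumes the standard tabular Q-learning update; the learning rate alpha and the state and action names are assumptions and do not come from this specification.

```python
from collections import defaultdict

def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]

# Q table keyed by (state, action); unseen pairs default to 0.0.
q_table = defaultdict(float)
```

Each call moves one table entry a fraction alpha of the way toward its Bellman target, which is only practical while every (state, action) pair can be enumerated.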
However, in the complex environments faced by deep reinforcement learning, the environment model is unknown. For user borrowing behavior, for example, the number of states of the borrowing environment, the transition probabilities of the environment, and the reward scores of the various candidate actions are all extremely complex, so the problem is difficult to solve by simply looking up a Q table or solving for the Q function. In this case, the Q function is determined by a deep neural network.
Therefore, in deep reinforcement learning, a deep neural network model is trained in advance. Based on the reward scores corresponding to the various candidate actions, the deep neural network model can determine the cumulative reward, that is, the Q function, expected to be obtained when each candidate action is taken in the current environment state. Based on the Q function values corresponding to the various candidate actions, the agent selects the action to execute according to a certain selection policy.
In the user borrowing scenario, after the feature data of the user is observed and obtained as the environment state in step 21, the feature data of the current user is input, as the environment state, into the above deep neural network in step 22. The deep neural network is trained to determine, at least according to the reward score corresponding to each candidate borrowing action among multiple candidate borrowing actions, the cumulative reward, that is, the Q function, expected to be obtained by taking each candidate borrowing action in the current environment state.
As the foregoing expressions of the Q function show, the Q function accumulates the expected reward scores, including the current reward score obtained after taking the current candidate borrowing action and the expected reward scores obtained in the subsequent t steps of continuing to execute the policy. This requires characterizing the candidate borrowing action a and the reward score r corresponding to each candidate borrowing action.
In one embodiment, the action features of a candidate borrowing action include the user loan amount, the loan term, and the loan interest rate.
In one embodiment, the loan product sets a maximum loan limit, which may be a fixed value or may be associated with the user's credit status. For example, the loan product may assess the user's credit in advance and determine the user's maximum loan limit according to the user's credit status. Correspondingly, the user loan amount is restricted to the range of the maximum loan limit. Some loan products offer a set of loan amount options within the maximum loan limit. For example, assuming that the maximum loan limit is 10,000 yuan, 10 optional loan amounts of 1k, 2k, and so on up to 10k can be offered. In that case the user loan amount is an amount selected from multiple options within the maximum loan limit.
Similarly, the loan product may also set optional loan durations, such as 3 months, 6 months, 12 months, and so on, and the user's loan term is selected from the optional loan durations. Of course, the loan product may also specify only a maximum loan duration, in which case the user's loan term is restricted to a value within the maximum loan duration.
In one embodiment, the loan interest rate is set by the loan product and is a fixed value. In another embodiment, the loan product may determine a floating loan interest rate according to the user loan amount and/or the loan term. For example, the larger the loan amount, the lower the loan interest rate; the longer the loan term, the lower the loan interest rate. In this way, once the candidate loan amount and the candidate loan term of a candidate borrowing action are determined, the corresponding loan interest rate can be determined.
In this way, a candidate borrowing action is characterized by the three dimensions of loan amount, loan term, and loan interest rate, that is, the candidate borrowing action a = (M, D, i), where M is the loan amount, D is the loan term, and i is the loan interest rate. In one example, assume that the loan amount has m possible values (for example, if the loan product offers amount options from 1k to 10k, then m = 10), the loan term has n options, and the loan interest rate is determined by the loan amount and/or the loan term. Then there are at least m*n+1 candidate borrowing actions, where the 1 in addition to the m*n actions corresponds to not borrowing, in which case M = 0.
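The candidate action space described above can be enumerated as in the following sketch. The floating rate function `rate_for` and its constants are purely hypothetical and only illustrate a rate that falls as the amount and the term grow; the three term options are likewise assumed.

```python
from itertools import product

def rate_for(amount, days):
    """Hypothetical floating daily rate: larger or longer loans get a lower rate."""
    return 0.0005 - 0.00001 * (amount // 1000) - 0.0000002 * days

def candidate_actions(amounts, terms):
    """Return the m*n borrowing actions a = (M, D, i) plus one 'do not borrow' action."""
    actions = [(0, 0, 0.0)]  # M = 0: the user does not borrow
    actions += [(m, d, rate_for(m, d)) for m, d in product(amounts, terms)]
    return actions

amounts = [1000 * k for k in range(1, 11)]  # 1k .. 10k, so m = 10
terms = [90, 180, 360]                      # three optional terms in days, so n = 3
actions = candidate_actions(amounts, terms)
```

With these assumed options the list holds m*n+1 = 31 candidate actions.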
With the candidate borrowing action a characterized in this way, the reward score r corresponding to each candidate borrowing action a can be characterized as follows. It should be understood that, unlike the cumulative reward represented by the Q function, the reward score r below is the immediate reward score that can be obtained from the environment after the candidate borrowing action a is executed, also called the current reward score.
In one embodiment, the current reward score r can be expressed as the difference between the loan income term B and the repayment expenditure term P:
r = B - P
where the loan income term B is determined based on the user loan amount M in the action features of the candidate borrowing action, and the repayment expenditure term P is determined at least based on the user loan amount, the loan term, and the loan interest rate in the action features.
More specifically, the loan income term B characterizes the income that the user obtains through the borrowing action. In one example, the loan income term B can be set equal to the user loan amount M. In another example, the loan income term B can be set to the user loan amount M multiplied by a proportionality coefficient α:
B = M * α
The proportionality coefficient can be set as needed. In one example, the proportionality coefficient α can be related to the user expected discount rate Rd in the feature data of the user, for example proportional to Rd. In this way, the influence of the expected discount rate Rd is introduced into the loan income term B: the stronger the user's borrowing demand, the larger Rd, and the larger the income the user is considered to obtain from borrowing.
On the other hand, the repayment expenditure term P characterizes the cost that the user has to pay for the borrowing action. In one example, the repayment expenditure term P is set equal to the repayment amount M', which is determined based on the user loan amount M, the loan term D, and the loan interest rate i in the action features. More specifically, the interest amount I can be determined based on the loan amount M, the loan term D, and the loan interest rate i, and the total repayment amount M' is then determined based on the loan amount M and the interest amount I.
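As a small worked illustration of determining M' from M, D, and i, the sketch below assumes simple interest at a daily rate, that is, I = M * i * D. This specification does not fix the interest formula, so this choice and the function name are assumptions.

```python
def repayment_amount(m, d, i):
    """Total repayment M' = M + I, with simple interest I = M * i * D (assumed)."""
    interest = m * i * d
    return m + interest
```

For instance, under this assumption a loan of 1000 for 30 days at a daily rate of 0.0005 carries interest of about 15, giving a repayment amount of about 1015.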
In another example, the influence of the user expected discount rate Rd is introduced into the repayment expenditure term P.
Specifically, the repayment expenditure term P can be determined based on the repayment amount M' and the discount factor d, that is:
P = d * M'
where the repayment amount M' can be determined based on the user loan amount M, the loan term D, and the loan interest rate i as described above, and the discount factor d is determined according to the user expected discount rate Rd and the loan term D.
In one example, the discount factor d can be expressed as: d = (1 + Rd) * D
In another example, for a borrowing action with a loan term of D days, the repayment expenditure term P can be expressed as:
$$P = (M + I)\sum_{t=1}^{D}\left(\frac{1}{1+R_d}\right)^{t}$$
where M is the loan amount, I is the interest amount, and M + I is the repayment amount M'; Rd is the user expected discount rate. In the above formula, 1/(1 + Rd) raised to the power of the day number t, summed over the total D days of the loan term, serves as the discount factor d.
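The summation form of the discount factor described above (1/(1 + Rd) raised to the day number t and summed over the D days of the loan term) can be sketched as follows. Starting the sum at t = 1 is an assumption, and the function names are illustrative.

```python
def summed_discount_factor(rd, days):
    """Sum of (1 / (1 + rd)) ** t for t = 1 .. days (start index assumed)."""
    base = 1.0 / (1.0 + rd)
    return sum(base ** t for t in range(1, days + 1))

def repayment_expenditure(m, interest, rd, days):
    """P = d * M', where M' = M + I and d is the summed discount factor."""
    return summed_discount_factor(rd, days) * (m + interest)
```

A larger Rd shrinks every term of the sum, so a user with stronger borrowing demand perceives a smaller repayment cost for the same loan.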
In the various ways described above, the loan income term B and the repayment expenditure term P are determined respectively, and the reward score r = B - P corresponding to the candidate borrowing action a is then determined.
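Putting the two terms together, the reward score of one candidate borrowing action can be sketched as below. The repayment amount M' and the discount factor d are passed in as inputs so that the sketch does not commit to any particular interest or discount formula; the parameter names and the default values of 1.0 are assumptions.

```python
def reward_score(loan_amount, repayment_amount, alpha=1.0, discount=1.0):
    """Immediate reward r = B - P for one candidate borrowing action."""
    income = loan_amount * alpha               # B = M * alpha
    expenditure = discount * repayment_amount  # P = d * M'
    return income - expenditure
```

With alpha = 1 and d = 1 the reward of any real loan is negative (the interest paid), so a user is only predicted to borrow when the discount factor or the proportionality coefficient reflects enough borrowing demand to offset the interest.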
Based on the various candidate borrowing actions a and the corresponding reward scores r characterized in this way, in step 22 the trained deep neural network can determine the cumulative rewards, that is, the values of the Q function, expected to be obtained if each candidate borrowing action a is executed in the current environment state s.
Assume that there are N possible candidate borrowing actions a1, a2, ..., aN (for example, N = m*n+1 in the preceding example). Then, in step 22, the deep neural network can determine the cumulative rewards Q1(s,a1), Q2(s,a2), ..., QN(s,aN) respectively corresponding to these N candidate borrowing actions in the current environment state s.
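The mapping from an environment state to N cumulative rewards can be pictured with the toy network below. It is an untrained, randomly initialized two-layer perceptron written in plain Python purely for illustration; the layer sizes, the four-dimensional state vector, and the helper names are assumptions, and a real implementation would use a deep learning framework and trained weights.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu=False):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def q_values(state, layers):
    """Map a state feature vector to one Q value per candidate action."""
    h = state
    for layer in layers[:-1]:
        h = dense(h, layer, relu=True)
    return dense(h, layers[-1])

rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 31, rng)]  # 4 features, N = 31 actions
qs = q_values([0.2, 0.5, 0.1, 0.7], layers)
```

The output list has one entry per candidate borrowing action, which is the shape that the action selection in step 23 consumes.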
It should be noted that the neural network used to determine the Q function in deep reinforcement learning can be trained on training samples in various ways. Generally, a training sample suitable for deep reinforcement learning includes a sequence <s, a, r, s'>, where s denotes the environment state, a denotes the action executed by the agent in the environment state s, r denotes the reward score obtained by executing the action a, and s' denotes the new state to which the environment transitions after the action a is executed. With many such samples and any of various training methods, the neural network that determines the Q function in deep reinforcement learning can be obtained by training.
When deep reinforcement learning is applied to the user borrowing scenario, the neural network used in step 22 to determine the Q function corresponding to each candidate borrowing action in the current environment state can likewise be obtained by training on training samples using various methods, which is not limited in this specification.
As for the training samples, in line with the application scenario of the deep reinforcement learning, the neural network can be trained using samples of known borrowing behavior. Specifically, in one embodiment, the neural network can be trained using the sample data of multiple borrowing samples. Each borrowing sample has a corresponding sample user, and the sample data of each borrowing sample includes at least a sample environment state S, a sample action A, and a sample reward score R. Optionally, the sample data may also include the new state S' to which the sample environment transitions after the sample action A is executed. It can be understood that the sample environment state S, the sample action A, and the sample reward score R are characterized in the same way as described above for the current user.
Specifically, the sample environment state S includes the fund-related data of the sample user, such as fund usage history data and fund status; optionally, it also includes the expected discount rate of the sample user.
The sample action A is the sample borrowing action executed in the sample environment state S, and the sample borrowing action includes a sample loan amount, a sample loan term, and a sample loan interest rate.
The sample reward score R is the difference between an income term and an expenditure term, where the income term is determined based on the sample loan amount, and the expenditure term is determined at least based on the sample loan amount, the sample loan term, and the sample loan interest rate. In different embodiments, the specific forms of the income term and the expenditure term correspond respectively to the loan income term B and the repayment expenditure term P in the embodiments described above, and are not repeated here.
The above training samples can be obtained by Monte Carlo sampling. For example, the borrowing actions issued by the agent can be simulated randomly: starting from any borrowing environment state S, the selection of each borrowing action A is simulated at random, the corresponding reward score R and the new borrowing environment state S' are then calculated, and the process is repeated from S', yielding a group of <S, A, R, S'> sequences. By simulating this process many times, multiple groups of such sequences are obtained and used as the training samples of the neural network.
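The random simulation described above can be sketched as a simple rollout loop. The reward and transition functions are passed in as callables; the toy versions shown here (the state is just the outstanding debt, and the reward is a flat cost per unit borrowed) are placeholders introduced for illustration and are not the characterization used in this specification.

```python
import random

def simulate_episode(initial_state, actions, reward_fn, transition_fn, steps, rng):
    """Roll out random borrowing actions and record <S, A, R, S'> tuples."""
    samples, state = [], initial_state
    for _ in range(steps):
        action = rng.choice(actions)               # randomly simulated action A
        r = reward_fn(state, action)               # corresponding reward score R
        next_state = transition_fn(state, action)  # new environment state S'
        samples.append((state, action, r, next_state))
        state = next_state                         # repeat the process from S'
    return samples

# Toy stand-ins: the state is the outstanding debt, the action is an amount.
def toy_reward(state, action):
    return -0.01 * action

def toy_transition(state, action):
    return state + action

episode = simulate_episode(0, [0, 1000, 2000], toy_reward, toy_transition,
                           5, random.Random(0))
```

Running many such episodes from different initial states yields the groups of sequences used as training samples.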
Given enough training samples, various training methods, such as DQN (Deep Q-Network), Actor-Critic, and PG (Policy Gradient), can be used to train the deep neural network required in step 22. The trained deep neural network can then determine, in the current environment state s formed by the feature data of the current user, the cumulative rewards Q1(s,a1), Q2(s,a2), …, QN(s,aN) corresponding to the N candidate borrowing actions.
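As one illustration of how <S, A, R, S'> samples feed a DQN-style method, the regression target for each sample is commonly the reward plus the discounted best cumulative reward of the next state. The sketch below computes only these targets; the function name `td_targets`, the callable `q_fn`, and the factor `gamma` are assumptions introduced for illustration and are not specified in this document.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Transition = Tuple[State, Action, float, State]


def td_targets(batch: Sequence[Transition],
               q_fn: Callable[[State, Action], float],
               actions: Sequence[Action],
               gamma: float = 0.9) -> List[float]:
    """Compute targets r + gamma * max_a' Q(s', a') for a batch of samples.

    `q_fn` stands in for the current network's estimate of Q(s, a);
    `gamma` is an assumed weighting of future rewards.
    """
    targets: List[float] = []
    for _state, _action, reward, next_state in batch:
        best_next = max(q_fn(next_state, a) for a in actions)
        targets.append(reward + gamma * best_next)
    return targets
```

The network would then be fitted so that its output for (S, A) approaches the corresponding target; the fitting step depends on the chosen framework and is omitted here.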
Then, in step 23, a first candidate action is selected from the candidate borrowing actions according to the above cumulative rewards, as the predicted borrowing action of the current user.
In one embodiment, the cumulative reward Qi with the largest value is selected from the cumulative rewards Q1, Q2, …, QN, its corresponding candidate borrowing action ai is determined, and that candidate action is taken as the predicted user borrowing action. This corresponds to the "greedy" policy in deep reinforcement learning.
In another embodiment, selection probabilities P1, P2, …, PN of the candidate borrowing actions are determined from the values of the corresponding cumulative rewards Q1, Q2, …, QN. For example, each selection probability can be set proportional to the corresponding cumulative reward and normalized so that the probabilities sum to 1. A candidate action is then selected from the candidate borrowing actions a1, a2, …, aN according to the selection probabilities P1, P2, …, PN, as the predicted user borrowing action.
In other embodiments, the borrowing action to be executed can also be determined under other policies, based on the cumulative reward Q corresponding to each candidate borrowing action, as the predicted user borrowing behavior.
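The greedy selection and the probability-proportional selection described above can be sketched as follows. The function names are hypothetical. The handling of negative cumulative rewards (shifting all values so the smallest becomes zero) and of an all-zero list (uniform probabilities) are assumptions added so that the probabilities are well defined; the text itself only states proportionality and normalization.

```python
import random
from typing import List, Optional, Sequence


def select_greedy(q_values: Sequence[float]) -> int:
    """Return the index of the candidate action with the largest cumulative reward."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def selection_probabilities(q_values: Sequence[float]) -> List[float]:
    """Probabilities proportional to the cumulative rewards, normalized to sum to 1."""
    low = min(q_values)
    # Assumption: shift negative values so every weight is non-negative.
    shifted = [q - low for q in q_values] if low < 0 else list(q_values)
    total = sum(shifted)
    if total == 0:
        # Assumption: fall back to a uniform choice when all weights are zero.
        return [1.0 / len(shifted)] * len(shifted)
    return [q / total for q in shifted]


def select_proportional(q_values: Sequence[float],
                        rng: Optional[random.Random] = None) -> int:
    """Sample the index of a candidate action according to the selection probabilities."""
    rng = rng or random.Random()
    probs = selection_probabilities(q_values)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

The greedy variant always returns the same action for the same state, whereas the proportional variant occasionally returns lower-valued actions, which may better reflect the variability of real user behavior.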
In this way, by characterizing the environment state, action, and reward score of deep reinforcement learning in terms of borrowing behavior, deep reinforcement learning is applied to the user borrowing scenario, so that the agent can simulate and predict a user's borrowing behavior when using a loan product. The resulting agent can serve as a user model for the loan product.
With the above deep reinforcement learning approach, user behavior can be predicted more accurately, yielding a more precise and flexible user model. This gives product designers a deeper understanding of users, so that they can adopt more appropriate marketing strategies, or improve the product design and the user experience.
In one embodiment, the user model obtained in the above manner can be used to assist product design, thereby optimizing product effect and user experience. Fig. 3 is a schematic diagram of improving product design using a user model based on deep reinforcement learning, according to one embodiment. As shown in Fig. 3, n candidate product design schemes can be provided, each with different product elements. For example, where the product is a loan product, the product elements may include the user's borrowing limit, the available loan terms, and the loan interest rate. These are the ranges of options that the product designer offers to users, and they correspond to the feature space of the user's candidate borrowing actions in the deep reinforcement learning user model. Because each candidate product scheme has a different feature space, a deep reinforcement learning user model as described above can be trained for each candidate product design scheme.
Product effect testing can then be carried out using these user models. Specifically, the same test cases can be input into the user model corresponding to each candidate product scheme; a test case can be, for example, the borrowing environment state of a test user. Using the method shown in Fig. 2, the deep reinforcement learning user model predicts the user's borrowing behavior from the borrowing environment state. This simulates the different borrowing behaviors a user may exhibit under the different candidate product schemes. Next, these borrowing behaviors are compared to see which better meets product expectations. For example, the expected revenue of each borrowing behavior is computed, and the behavior that brings the higher revenue is determined to be the better behavior. The candidate product scheme that produces the better behavior is in turn determined to be the better product scheme. In this way, a better product design is selected from the multiple candidate product schemes, so that the product better achieves the desired effect and improves the user experience.
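The comparison of candidate product schemes described above can be sketched as follows. Each user model is represented here as a plain callable from a test state to a predicted borrowing action; the names `pick_best_scheme` and `expected_revenue`, and the simple-interest revenue measure, are hypothetical and chosen only to make the sketch concrete.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Sequence[float]
Action = Tuple[float, int, float]  # (loan amount, loan term in days, annual rate)
UserModel = Callable[[State], Action]


def expected_revenue(action: Action) -> float:
    """Assumed revenue measure: simple interest earned on the predicted loan."""
    amount, term_days, rate = action
    return amount * rate * term_days / 365.0


def pick_best_scheme(models: Dict[str, UserModel],
                     test_cases: Sequence[State],
                     revenue_fn: Callable[[Action], float] = expected_revenue
                     ) -> Tuple[str, Dict[str, float]]:
    """Feed the same test cases to every scheme's user model and rank the schemes."""
    scores: Dict[str, float] = {}
    for name, model in models.items():
        total = sum(revenue_fn(model(case)) for case in test_cases)
        scores[name] = total / len(test_cases)  # mean revenue per test case
    best = max(scores, key=lambda name: scores[name])
    return best, scores
```

Any other measure of product expectation could be substituted for `revenue_fn` without changing the comparison loop.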
According to an embodiment of another aspect, an apparatus for predicting user behavior using deep reinforcement learning is also provided. The apparatus can be implemented by any apparatus or device with computing and processing capabilities. Fig. 4 shows a schematic block diagram of a user behavior prediction apparatus according to one embodiment. As shown in Fig. 4, the apparatus 400 includes:
an environment acquisition unit 41, configured to obtain feature data of a current user as the current environment state s of the deep reinforcement learning, the feature data of the current user including at least fund-related data of the current user;
a reward determination unit 42, configured to input the feature data of the current user into a deep neural network, the deep neural network being trained to determine, based at least on the reward score r corresponding to each candidate borrowing action a among multiple candidate borrowing actions, the multiple cumulative rewards Q that can be expected from respectively taking the multiple candidate borrowing actions in the current environment state s; wherein the action features of each candidate borrowing action a include a user loan amount, a loan term, and a loan interest rate; the reward score r corresponding to each candidate borrowing action is the difference between a loan income term and a repayment expenditure term, the loan income term being determined based on the user loan amount, and the repayment expenditure term being determined based at least on the user loan amount, the loan term, and the loan interest rate;
an action prediction unit 43, configured to select, according to the multiple cumulative rewards Q, a first candidate action from the multiple candidate borrowing actions as the predicted borrowing action of the current user.
In one embodiment, the above fund-related data may include fund usage history data within a predetermined time period, and the current fund status.
Further, the above fund usage history data may include one or more of the following: a transaction history data sequence, a borrowing history data sequence, a repayment history data sequence, and so on.
According to one implementation, the user loan amount is selected within a preset maximum borrowing limit, the loan term is selected from the available loan durations, and the loan interest rate is determined according to the user loan amount and/or the loan term.
According to one possible design, the feature data of the current user serving as the environment state further includes an expected discount rate of the user, which is determined according to the fund-related data and characterizes the degree of the current user's borrowing demand; correspondingly, the repayment expenditure term is also determined based on the expected discount rate of the user.
In one embodiment, the fund-related data includes multiple data items, each with a preset correlation coefficient with respect to the expected discount rate; the expected discount rate of the user can then be determined by combining the multiple data items using the correlation coefficients.
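One simple way to combine data items with preset correlation coefficients is a weighted sum. The sketch below assumes such a linear combination; the text does not state the exact form of the combination, and the function name, the item names in the usage note, and the optional `base_rate` offset are hypothetical.

```python
from typing import Mapping


def expected_discount_rate(items: Mapping[str, float],
                           coefficients: Mapping[str, float],
                           base_rate: float = 0.0) -> float:
    """Combine fund-related data items into an expected discount rate.

    Assumption: the combination is a weighted sum of the items, each
    weighted by its preset correlation coefficient, plus an optional offset.
    """
    missing = set(items) - set(coefficients)
    if missing:
        raise KeyError(f"no coefficient preset for: {sorted(missing)}")
    return base_rate + sum(coefficients[name] * value for name, value in items.items())
```

For example, with items such as a recent borrowing count of 2.0 (coefficient 0.01) and a balance ratio of 0.5 (coefficient -0.02), the sketch combines them into a single rate that rises with borrowing activity and falls with available funds.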
In one embodiment, the repayment expenditure term in the reward score is determined based on a repayment amount and a discount factor, the repayment amount being determined based on the user loan amount, the loan term, and the loan interest rate, and the discount factor being determined according to the expected discount rate of the user and the loan term.
More specifically, in one example, the discount factor is determined by the following formula:
where Rd is the expected discount rate of the user, and D is the loan term.
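To illustrate how the loan amount, repayment amount, and discount factor might combine into a reward score, the following sketch assumes a standard compound discount factor of the form 1/(1+Rd)^D and a simple-interest repayment amount. Both forms are assumptions made for illustration and are not asserted to be the formula of this example; the function names, and the choice of the loan amount itself as the loan income term, are likewise hypothetical.

```python
def repayment_amount(amount: float, periods: float, rate_per_period: float) -> float:
    """Principal plus simple interest over the loan term (assumed form)."""
    return amount * (1.0 + rate_per_period * periods)


def discount_factor(rd: float, periods: float) -> float:
    """Assumed compound discounting: 1 / (1 + Rd) ** D."""
    return 1.0 / (1.0 + rd) ** periods


def reward_score(amount: float, periods: float,
                 rate_per_period: float, rd: float) -> float:
    """Loan income term minus discounted repayment expenditure term.

    Assumption: the loan income term equals the loan amount.
    """
    income = amount
    expenditure = repayment_amount(amount, periods, rate_per_period) * discount_factor(rd, periods)
    return income - expenditure
```

Under these assumptions, a user with a higher expected discount rate weighs future repayments less heavily and therefore obtains a higher reward from the same loan, which is consistent with the discount rate characterizing the degree of borrowing demand.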
According to one implementation, the above deep neural network is trained on sample data of multiple samples, the sample data of each sample including at least a sample environment state S, a sample action A, and a sample reward score R, wherein:
the sample environment state S includes the fund-related data of a sample user;
the sample action A is the sample borrowing action performed in the sample environment state S, the sample borrowing action including a sample loan amount, a sample loan term, and a sample loan interest rate;
the sample reward score R is the difference between an income term and an expenditure term, the income term being determined based on the sample loan amount, and the expenditure term being determined based at least on the sample loan amount, the sample loan term, and the sample loan interest rate.
According to one possible design, the action prediction unit 43 may be configured to:
select, as the first candidate action, the candidate borrowing action corresponding to the cumulative reward with the largest value among the multiple cumulative rewards.
According to another possible design, the action prediction unit 43 may be configured to:
determine, according to the value of each of the multiple cumulative rewards, the selection probability of the candidate borrowing action corresponding to each cumulative reward;
select the first candidate action from the multiple candidate borrowing actions according to the selection probabilities.
In this way, the above apparatus uses deep reinforcement learning to simulate and predict a user's borrowing behavior when using a loan product.
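The division of the apparatus into three units can be mirrored in software roughly as follows. This is a structural sketch only: the class name `BehaviorPredictor`, its method names, and the callable `q_fn` standing in for the trained deep neural network are hypothetical, and the greedy selection shown is just one of the selection designs described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, int, float]  # (loan amount, loan term, loan interest rate)


@dataclass
class BehaviorPredictor:
    q_fn: Callable[[State, Action], float]  # stand-in for the trained network
    actions: List[Action]                   # candidate borrowing actions

    def acquire_state(self, fund_data: Sequence[float]) -> State:
        """Role of unit 41: turn the user's fund-related data into state s."""
        return tuple(float(x) for x in fund_data)

    def cumulative_rewards(self, state: State) -> List[float]:
        """Role of unit 42: cumulative reward Q for every candidate action."""
        return [self.q_fn(state, action) for action in self.actions]

    def predict_action(self, fund_data: Sequence[float]) -> Action:
        """Role of unit 43: select the action with the largest cumulative reward."""
        state = self.acquire_state(fund_data)
        q_values = self.cumulative_rewards(state)
        best = max(range(len(q_values)), key=lambda i: q_values[i])
        return self.actions[best]
```

Keeping the three roles as separate methods makes it straightforward to swap the selection step for a probability-based one without touching state acquisition or reward determination.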
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described in conjunction with Fig. 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method described in conjunction with Fig. 2 is implemented.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.