CN106447463A - Commodity recommendation method based on Markov decision-making process model - Google Patents

Commodity recommendation method based on Markov decision-making process model

Info

Publication number
CN106447463A
CN106447463A
Authority
CN
China
Prior art keywords
state
commodity
user
recommendation
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610920407.5A
Other languages
Chinese (zh)
Inventor
刘峰
蔡慧
刘劭
罗瑶
文煊义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201610920407.5A priority Critical patent/CN106447463A/en
Publication of CN106447463A publication Critical patent/CN106447463A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a commodity recommendation method based on a Markov decision process model. The commodity recommendation method comprises the following steps: 1) a preparation stage; 2) an initial model generation stage: read a subsequence, i.e. a state s, from the state set C and a commodity r from the commodity set, compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function; 3) a recommendation stage: obtain the current user's recent purchase or browse history; generate the current user's state from the records; obtain the recommendation item that yields the maximum return and return it to the current user; record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log; and 4) an offline model update stage: update the model offline at a fixed interval T.

Description

A commodity recommendation method based on a Markov decision process model
Technical field
The present invention relates to commodity recommendation methods for e-commerce platforms, and more particularly to a commodity recommendation method based on a Markov decision process model.
Background technology
Commodity recommendation is a technique that recommends commodities of likely interest to a user according to the user's interest characteristics and purchasing behavior. With the continuing expansion of e-commerce, the number and variety of commodities grow rapidly, and customers must spend a great deal of time to find the commodities they want to buy. This process of browsing large amounts of irrelevant information and products steadily drives away consumers who are overwhelmed by information overload. To solve these problems, commodity recommendation systems, built on large-scale data mining and intelligent decision making, help e-commerce websites provide effective decision support and information services for their customers' purchases.
A commodity recommendation system discovers patterns from users' behavior and preferences and makes recommendations accordingly. User behavior includes ratings, browsing, purchases, page dwell time, and so on; among these, the users' browse and purchase logs are the more effective means for an e-commerce platform to acquire user preferences. At present, the main algorithms used in commodity recommendation systems are association-rule-based recommendation, content-based recommendation, and collaborative filtering recommendation.
Recommendation systems based on simple association rules are not built on a complete model and therefore cannot effectively reflect users' uncertain points of interest. Content-based recommendation systems have limited feature-extraction ability: they cannot discover new resources of interest for a customer, and they produce unsatisfactory recommendations for commodities whose content is hard to extract. Collaborative filtering systems suffer from the sparsity problem, which can make the computed similarity between users inaccurate, and from scalability problems as the numbers of users and commodities grow.
To make full use of the user browse and purchase logs available on an e-commerce platform, this invention proposes a commodity recommendation method based on a Markov decision process model, which strengthens the effect of commodity recommendation through a reinforcement learning mechanism.
Content of the invention
The invention provides a novel commodity recommendation method based on a Markov decision process model. Each user browse or purchase record entry s consists of multiple commodity items r ∈ s. The transition probabilities between adjacent browse or purchase record entries, together with iterative solution of the state value function, strengthen the accuracy of commodity recommendation. The method first obtains the user purchase or browse records, then filters the data to generate training data, then preprocesses the data to generate a Markov decision process model (MDP), and iteratively solves the MDP to obtain the optimal recommendation in each state. Commodity recommendations are made with reference to the current user's recent purchase or browse records, user behavior continues to be recorded in logs, and the MDP model is periodically updated offline accordingly.
A Markov decision process (MDP) is a generally applicable decision model. An MDP establishes a state space S and an action space A for the agent of the decision process. The agent's actions affect the surrounding environment, so the state makes uncertain transitions, and the feedback from an action in turn affects the agent's action selection. In the present invention, a user's purchase or browse sequence is modeled as the state S, the recommendations made according to the purchase or browse records are modeled as A, and the commodity with the maximum expected return according to the state value function is recommended to the user. The user's previous purchases or browses affect the recommendation results, while the recommendation results in turn affect the user's next browse or purchase decision. This process iterates until termination.
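As an illustration only (not part of the patent text), the following Python sketch shows one plausible in-memory layout for the model components named above; all identifiers (ItemId, State, MDPModel) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

ItemId = str                  # a commodity item r
State = Tuple[ItemId, ...]    # a state s: the user's last k commodity items

@dataclass
class MDPModel:
    """Hypothetical container for the MDP recommender described above."""
    k: int                                                  # items per state
    items: Set[ItemId] = field(default_factory=set)         # commodity set R
    states: Set[State] = field(default_factory=set)         # state set C
    buy_states: Set[State] = field(default_factory=set)     # C_buy
    view_states: Set[State] = field(default_factory=set)    # C_view
    # tr[(s, r, s·r)]: probability that recommending r in s leads to s·r
    tr: Dict[Tuple[State, ItemId, State], float] = field(default_factory=dict)
    best_action: Dict[State, ItemId] = field(default_factory=dict)  # π(s)
```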
The technical scheme of the present invention is a recommendation method based on a Markov decision process model, comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform; the data set includes two parts: the user purchase data set (purchase records) and the user browse path records obtained from web logs;
b) Filter the data and generate the training data. The filtering criteria are: filter out the data of commodity items purchased or visited fewer than N times (N = 100 in the present invention), and filter out the data of users whose purchase or browse records contain fewer than k commodity items (k = 5 in the present invention);
c) End the preparation stage;
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {r_i}. Parse the user logs to generate the user purchase sequence set C_buy and the user browse sequence set C_view, and the superset C = C_buy ∪ C_view, C = {s_i}. Each subsequence s in the set (called a state) contains k commodity items; if the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set C_buy, and if the last commodity item of s is a browsed commodity, s is placed in the browse sequence set C_view;
b) Read a state s from the state set C and a commodity r from the commodity set; compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, iteratively compute the optimal recommendation item in this state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browse records;
b) Generate the current user's state from the records;
c) Obtain the recommendation item that yields the maximum return and return it to the current user;
d) Record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) Update the model offline at a fixed time interval T;
b) End the offline model update stage;
The subsequence s described in step 2-a):
1) Obtain a purchase or browse commodity path x_1, x_2, …, x_n from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x_1, …, x_k>, <x_2, …, x_{k+1}>, …, <x_{n-k+1}, …, x_n>; each subsequence is called a state, and each state contains k commodities (see the sketch below);
2) End;
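As an illustration only, a minimal Python sketch of this sliding-window decomposition, assuming a path is given as a list of item identifiers (all names hypothetical):

```python
from typing import List, Tuple

def path_to_states(path: List[str], k: int = 5) -> List[Tuple[str, ...]]:
    """Decompose a purchase/browse path x_1..x_n into overlapping k-item
    subsequences <x_i, ..., x_{i+k-1}>; each subsequence is one state."""
    if len(path) < k:
        return []                 # too short to form any state
    return [tuple(path[i:i + k]) for i in range(len(path) - k + 1)]

# A path of 7 items with k = 5 yields 3 states:
# [('a','b','c','d','e'), ('b','c','d','e','f'), ('c','d','e','f','g')]
states = path_to_states(["a", "b", "c", "d", "e", "f", "g"], k=5)
```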
The successor state s·r described in step 2-b):
1) The 1st through (k-1)-th items of the successor state s·r coincide with the 2nd through k-th items of the original state s; that is, the original state is s: <x_1, …, x_k> and the successor state is s·r: <x_2, …, x_k, r>;
2) End;
The state transition function described in step 2-b):
1) According to the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as tr_predict(s, s·r) = count(<x_1, …, x_k, r>) / count(<x_1, …, x_k>), where count(<x_1, x_2, …, x_k>) denotes the number of times the sequence x_1, x_2, …, x_k occurs in the data set C;
2) Considering the influence that commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) tr_MDP(s, r, s·r) = α_{s,r} · tr_predict(s, s·r), where α_{s,r} is a boost factor computed from the probability of buying commodity r and a constant ω (ω takes a very small constant value in the present invention; its defining formula is not reproduced in the source text), and count(r) denotes the number of times commodity r occurs in the training data set;
b) tr_MDP(s, r', s·r) = β_{s,r} · tr_predict(s, s·r), r' ≠ r, where β_{s,r} < 1 is the corresponding damping factor (formula likewise not reproduced) and P(s·r | s) is the probability of buying commodity r in state s. If the computed β is negative, it is set to a small positive value (a fixed small constant in the present invention), and the probabilities are then normalized;
c) End (see the sketch below);
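As an illustration only, the following sketch estimates tr_predict from sequence counts. Because the exact α and β formulas are not reproduced in the source, the correction step uses placeholder constants ALPHA > 1 and BETA < 1 that merely stand in for the patented definitions:

```python
from collections import Counter
from typing import Dict, List, Tuple

State = Tuple[str, ...]

ALPHA = 1.2   # placeholder for the boost factor α_{s,r} (formula not in source)
BETA = 0.8    # placeholder for the damping factor β_{s,r} (formula not in source)

def estimate_transitions(paths_states: List[List[State]], items: List[str]
                         ) -> Dict[Tuple[State, str, State], float]:
    """tr_predict(s, s·r) = count(<x_1..x_k, r>)/count(<x_1..x_k>), then
    tr_MDP(s, r, s·r) = α·tr_predict and tr_MDP(s, r', s·r) = β·tr_predict."""
    state_count: Counter = Counter()
    pair_count: Counter = Counter()
    for seq in paths_states:                  # one ordered state list per user path
        state_count.update(seq)
        pair_count.update(zip(seq, seq[1:]))  # consecutive states (s, s·r)
    tr: Dict[Tuple[State, str, State], float] = {}
    for (s, s_next), c in pair_count.items():
        r = s_next[-1]                        # the item appended to s to form s·r
        tr_predict = c / state_count[s]
        tr[(s, r, s_next)] = ALPHA * tr_predict     # r itself was recommended
        for r_other in items:                       # any other recommendation r'
            if r_other != r:
                tr[(s, r_other, s_next)] = BETA * tr_predict
    # A full implementation would renormalize so each (s, r, ·) row sums to 1.
    return tr
```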
The computation of the optimal recommendation item described in step 2-e):
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Set the initial policy π_0(s_0) = argmax_{r∈R} Rwd(s_0, r), where Rwd(s_0, r) denotes the return of recommending r in state s_0, and argmax_{r∈R} Rwd(s_0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the previous policy, and update the policy:
i. V_t(s) = Rwd(s, π_{t-1}(s)) + γ · Σ_{s'} tr_MDP(s, π_{t-1}(s), s') · V_{t-1}(s')
ii. π_t(s) = argmax_{r∈R} [ Rwd(s, r) + γ · Σ_{s'} tr_MDP(s, r, s') · V_t(s') ]
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (γ = 0.6 in the present invention). The immediate return value Rwd(s, r) of a state is computed by the following rules:
i. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_buy, then Rwd(s, r) = μ · Reward(r), μ > 1 (μ = 1.5 in the present invention);
ii. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_view, then Rwd(s, r) = ν · Reward(r), ν ∈ [0, 1) (ν = 0.5 in the present invention);
iii. If the successor state s·r produced by state s and the selected recommendation item r exists in both C_buy and C_view, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit for commodity item r given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy or the number of repetitions reaches the maximum iteration count (200 in the present invention), generating the optimal recommendation items (see the sketch below);
2) End;
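As an illustration only, a compact Python sketch of this policy-iteration loop, reusing the hypothetical estimate_transitions output; the V_t and π_t formulas above follow the standard policy-iteration form, and the zero return for never-observed successors is an assumption:

```python
def immediate_return(s, r, buy_states, view_states, reward, mu=1.5, nu=0.5):
    """Rwd(s, r): μ·Reward(r) if s·r only in C_buy, ν·Reward(r) if only in
    C_view, (μ+ν)·Reward(r) if in both (assumed 0 if never observed)."""
    s_next = s[1:] + (r,)
    in_buy, in_view = s_next in buy_states, s_next in view_states
    if in_buy and in_view:
        return (mu + nu) * reward[r]
    if in_buy:
        return mu * reward[r]
    if in_view:
        return nu * reward[r]
    return 0.0

def policy_iteration(states, items, tr, buy_states, view_states, reward,
                     gamma=0.6, max_iter=200):
    """Alternate the value step (i.) and policy step (ii.) until stable."""
    items = list(items)

    def q(s, r, V):
        total = 0.0
        for r2 in items:                       # sum over possible successors s·r2
            s_next = s[1:] + (r2,)
            total += tr.get((s, r, s_next), 0.0) * V.get(s_next, 0.0)
        return immediate_return(s, r, buy_states, view_states, reward) + gamma * total

    V = {s: 0.0 for s in states}
    policy = {s: max(items, key=lambda r: immediate_return(
        s, r, buy_states, view_states, reward)) for s in states}   # π_0
    for _ in range(max_iter):
        V = {s: q(s, policy[s], V) for s in states}                # value step i.
        new_policy = {s: max(items, key=lambda r: q(s, r, V)) for s in states}
        if new_policy == policy:                                   # converged
            break
        policy = new_policy
    return policy, V
```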
The generation of the user state described in step 3-b):
1) If the number of purchase or browse records is 0 ≤ m < k, generate a user state s_0 that contains no commodity item, i.e. a dummy state;
2) If the number of purchase or browse records is m ≥ k, obtain only the most recent k records and generate a user state s_0 containing k commodity items;
3) End (see the sketch below);
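As an illustration only, a minimal sketch of this state-construction rule (names hypothetical):

```python
from typing import List, Tuple

def build_user_state(recent_items: List[str], k: int = 5) -> Tuple[str, ...]:
    """Build s_0 from the user's most recent purchase/browse records:
    a dummy (empty) state if fewer than k records, else the last k items."""
    if len(recent_items) < k:
        return ()                     # dummy state: no commodity items
    return tuple(recent_items[-k:])
```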
The acquisition of the maximum-return recommendation item described in step 3-c):
1) If the user state s_0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s_0 appears in the training data set, obtain the optimal policy corresponding to that state from the model, i.e. the optimal recommendation item r* = π(s_0);
3) If the user state s_0 does not appear in the training data set, search the state set C for the state s* with the highest similarity to the current user state s_0 and return the optimal recommendation item corresponding to state s* in the model. The state s* is computed as s* = argmax_{s_i ∈ C} [ sim(s_0, s_i) ], where sim(s_0, s_i) compares the two states element by element, δ(x, y) is the Kronecker function, defined as δ(x, y) = 1 if x = y and 0 otherwise, and s_i^m denotes the m-th element of state s_i;
4) End (see the sketch below);
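As an illustration only, a sketch of the fallback lookup; the exact sim formula is an image in the source, so the unweighted count of position-wise Kronecker matches used here is an assumption:

```python
def most_similar_state(s0, candidate_states):
    """s* = argmax over s_i in C of sim(s0, s_i); sim is assumed to be the
    number of positions m where δ(s0^m, s_i^m) = 1."""
    def sim(a, b):
        return sum(1 for x, y in zip(a, b) if x == y)   # Σ_m δ(a^m, b^m)
    return max(candidate_states, key=lambda s: sim(s0, s))

def recommend(s0, best_action, reward, train_states):
    """Dispatch over the three cases of step 3-c)."""
    if not s0:                                   # case 1: dummy state
        return max(reward, key=reward.get)       # highest Reward(r)
    if s0 in train_states:                       # case 2: known state
        return best_action[s0]                   # optimal policy π(s0)
    s_star = most_similar_state(s0, train_states)
    return best_action[s_star]                   # case 3: nearest known state
```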
The state-recommendation-selection log described in step 3-d):
1) In the state-recommendation-selection log, the state represents the user's original state s_0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or another commodity r', r* ≠ r'), the selection being of three types: browse only, buy only, and browse-and-buy (see the sketch below);
2) End;
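As an illustration only, one log entry could be represented by a record like the following (names hypothetical):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class LogEntry:
    """One line of the state-recommendation-selection log."""
    state: Tuple[str, ...]   # the user's original state s_0
    recommended: str         # the optimal recommendation r* from the model
    selected: str            # the item the user actually chose (r* or some r')
    choice_type: str         # "browse", "buy", or "browse_and_buy"
```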
The offline model update described in step 4-b):
1) Update the state transition function:
a) If a new state s_new is found in the state-recommendation-selection log, the state set C, the purchase sequence set C_buy, and the browse sequence set C_view must be updated, and initial values set as follows:
i. C_in^0(s, r, s·r) = ξ_s · tr_MDP(s, r, s·r)
ii. C_out^0(s, r, s·r) = ξ_s · tr_MDP(s, r', s·r)
iii. C_total^0(s, s·r) = ξ_s
where C_in(s, r, s·r) denotes the number of times recommendation item r was accepted in state s, C_out(s, r, s·r) denotes the number of times the user selected commodity item r in state s without r having been recommended, and C_total(s, s·r) denotes the total number of times the user selected commodity item r. In the initialization procedure, to improve precision, the value ξ_s is proportional to the number of times state s occurs in the collected data; in the present invention, ξ_s = 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, initial values must be set as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user selected r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user selected r in state s, and update the transition function:
tr_MDP(s, r, s·r) = C_in^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r),
tr_MDP(s, r', s·r) = C_out^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r), r ≠ r',
where:
i. C_in^{t+1}(s, r, s·r) = C_in^t(s, r, s·r) + count(s, r, s·r)
ii. C_out^{t+1}(s, r, s·r) = C_out^t(s, r, s·r) + count(s, s·r) - count(s, r, s·r)
iii. C_total^{t+1}(s, s·r) = C_total^t(s, s·r) + count(s, s·r)
2) Update the optimal recommendation item corresponding to each state in set C; the computation process is the same as that described in 2-e) (see the sketch below);
End;
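As an illustration only, a sketch of this count-based offline update. The initialization and increment formulas above are reconstructions (the original formula images are not reproduced in the source), so this code shows the reconstructed scheme, not the verbatim patented one:

```python
from collections import defaultdict

def offline_update(tr, log_entries, count_s):
    """Refresh tr_MDP from the state-recommendation-selection log.
    count_s[s] is how often state s occurs in the collected data."""
    c_in, c_out, c_total = defaultdict(float), defaultdict(float), defaultdict(float)
    # Seed the counts from the current model so it acts as a prior:
    # C_in^0 = ξ_s·tr(s,r,s·r), C_out^0 = ξ_s·tr(s,r',s·r), C_total^0 = ξ_s.
    for (s, rec, s_next), p in tr.items():
        xi = 10 * count_s.get(s, 1)              # ξ_s = 10·count(s)
        if rec == s_next[-1]:                    # the recommended item was taken
            c_in[(s, s_next)] = xi * p
        else:                                    # same β-probability for every r'
            c_out[(s, s_next)] = xi * p
        c_total[(s, s_next)] = xi
    # Accumulate the observed selections from the log.
    for e in log_entries:
        s_next = e.state[1:] + (e.selected,)
        key = (e.state, s_next)
        c_total[key] += 1                        # count(s, s·r)
        if e.selected == e.recommended:
            c_in[key] += 1                       # count(s, r, s·r): accepted
        else:
            c_out[key] += 1                      # chosen without the recommendation
    # tr_MDP = C_in/C_total if r was the recommended item, else C_out/C_total.
    for (s, rec, s_next) in list(tr):
        key = (s, s_next)
        numer = c_in[key] if rec == s_next[-1] else c_out[key]
        tr[(s, rec, s_next)] = numer / (c_total[key] or 1.0)
    return tr
```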
Description of the drawings
Fig. 1 is the overall workflow diagram of the present invention;
Fig. 2 is the workflow diagram of the recommendation method based on the Markov decision process model of the present invention;
Fig. 3 is the flow chart of state set generation;
Fig. 4 is the workflow diagram of initial model generation;
Fig. 5 is the workflow diagram of recommendation;
Fig. 6 is the workflow diagram of the computation of the optimal recommendation item.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
The present invention is a recommendation method based on a Markov decision process model, intended to improve the effectiveness of recommendation and to provide useful and satisfying recommendations to users. Fig. 1 shows the processing procedure of the present invention. The invention first obtains purchase and browse records and generates the training data set, then generates the initial recommendation model, then makes commodity recommendations according to the particular user state, and finally updates the model offline.
In the present invention, the process is divided into four stages: the preparation stage, the initial model generation stage, the recommendation stage, and the offline model update stage, as shown in Fig. 2. The key of the present invention is to obtain, from the training data, the maximum-return recommendation item for each generated user state, then recommend commodities according to the user's recent purchase or browse records, and use the purchase or browse choices the user makes after receiving the recommendation items to generate the state-recommendation-selection log for the offline model update.
Step 2-0 is the initial state of the recommendation method based on the Markov decision process model of the present invention;
The preparation stage includes steps 2-1 and 2-2;
Step 2-1 obtains the user purchase and browse records from the e-commerce platform;
Step 2-2 filters the data and generates the training data;
The initial model generation stage includes steps 2-3, 2-4, 2-5, and 2-6;
Step 2-3 preprocesses the data, reads the training data, and obtains the set R and the set C;
Step 2-4 reads each state and each commodity record;
Step 2-5 computes the transition probability to the successor state s·r and generates the state transition function;
Step 2-6 computes and stores the optimal recommendation item in each state;
The recommendation stage includes steps 2-7, 2-8, and 2-9;
Step 2-7 obtains the current user's recent purchase or browse records and generates the user state;
Step 2-8 returns the maximum-return recommendation item according to the user state;
Step 2-9 records the recommendation item and the user's choice and generates the log;
The offline model update stage includes step 2-10;
Step 2-10 updates the model offline;
Step 2-11 is the end state.
Fig. 3 is a detailed description of the state set generation process.
Step 3-0 is the start state of state set generation;
Step 3-1 obtains the purchase or browse commodity paths from the training data;
Step 3-2 decomposes each path into multiple subsequences, each containing k commodities;
Step 3-3 reads a subsequence s;
Step 3-4 partitions s into the purchase sequence set or the browse sequence set according to its last item;
Step 3-5 judges whether all subsequences have been processed; if so, go to 3-6; if not, go to 3-3;
Step 3-6 is the end state of state set generation.
Fig. 4 is a detailed description of the initial model generation process.
Step 4-0 is the start state of initial model generation;
Step 4-1 preprocesses the data;
Step 4-2 obtains the training data and generates the commodity set R and the state set C;
Step 4-3 reads a state s;
Step 4-4 reads a commodity r;
Step 4-5 computes the transition probability tr from s to s·r and generates the state transition function;
Step 4-6 judges whether all commodities have been processed; if so, go to 4-7; if not, go to 4-4;
Step 4-7 judges whether all states have been processed; if so, go to 4-8; if not, go to 4-3;
Step 4-8 reads a state, computes the optimal recommendation item, and stores it;
Step 4-9 judges whether all states have been processed; if so, go to 4-10; if not, go to 4-8;
Step 4-10 is the end state of the initial model generation process.
Fig. 5 is a detailed description of the recommendation process.
Step 5-0 is the start state of recommendation;
Step 5-1 obtains the current user's recent purchase or browse records;
Step 5-2 obtains the recommendation item that yields the maximum return and returns it to the user;
Step 5-3 records the recommendation item and the user's purchase or browse choice, generating the state-recommendation-selection log;
Step 5-4 is the end state of recommendation.
Fig. 6 is a detailed description of the computation of the optimal recommendation item.
Step 6-0 is the start state of the computation of the optimal recommendation item;
Step 6-1 sets the initial policy;
Step 6-2 computes the value function from the previous policy;
Step 6-3 updates the policy;
Step 6-4 judges whether the policy has converged to the optimal policy or the maximum number of iterations has been reached; if so, go to 6-5; if not, go to 6-2;
Step 6-5 generates the optimal recommendation items;
Step 6-6 is the end state of the computation of the optimal recommendation item.
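As an illustration only, the hypothetical sketches above can be wired together along the four stages on toy data (illustrative values only):

```python
# Stages 1-2: prepare toy data and build the initial model.
paths = [["a", "b", "c", "d", "e", "f"], ["b", "c", "d", "e", "f", "g"]]
items = sorted({r for p in paths for r in p})
reward = {r: 1.0 for r in items}                   # assumed flat net profit

paths_states = [path_to_states(p, k=5) for p in paths]
states = {s for seq in paths_states for s in seq}
buy_states, view_states = set(states), set()       # assume all ended in purchases
tr = estimate_transitions(paths_states, items)
best, V = policy_iteration(states, items, tr, buy_states, view_states,
                           reward, gamma=0.6, max_iter=200)

# Stage 3: serve the current user.
s0 = build_user_state(["b", "c", "d", "e", "f"], k=5)
print(recommend(s0, best, reward, states))         # optimal recommendation for s0

# Stage 4: at a fixed interval T, replay the log and refresh the model offline.
count_s = {s: 1 for s in states}
log = [LogEntry(state=s0, recommended=best[s0], selected=best[s0],
                choice_type="buy")]
tr = offline_update(tr, log, count_s)
best, V = policy_iteration(states, items, tr, buy_states, view_states,
                           reward, gamma=0.6, max_iter=200)
```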

Claims (3)

1. A commodity recommendation method based on a Markov decision process model, characterized by comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform; the data set includes two parts: the user purchase data set (purchase records) and the user browse path records obtained from web logs;
b) Filter the data and generate the training data. The filtering criteria are: filter out the data of commodity items purchased or visited fewer than N times (N = 100 in the present invention), and filter out the data of users whose purchase or browse records contain only one commodity item;
c) End the preparation stage;
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {r_i}. Parse the user logs to generate the user purchase sequence set C_buy and the user browse sequence set C_view, and the superset C = C_buy ∪ C_view, C = {s_i}. Each subsequence s in the superset C (called a state) contains k commodity items; if the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set C_buy, and if the last commodity item of s is a browsed commodity, s is placed in the browse sequence set C_view;
b) Read a subsequence, i.e. a state s, from the state set C and a commodity r from the commodity set; compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, compute the optimal recommendation item in this state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browse records;
b) Generate the current user's state from the records;
c) Obtain the recommendation item that yields the maximum return and return it to the current user;
d) Record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) Update the model offline at a fixed time interval T;
b) End the offline model update stage;
The subsequence s described in step 2-a):
1) Obtain a purchase or browse commodity path x_1, x_2, …, x_n from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x_1, …, x_k>, <x_2, …, x_{k+1}>, …, <x_{n-k+1}, …, x_n>; each subsequence is called a state, and each state contains k commodities;
2) End;
The successor state s·r described in step 2-b):
1) The 1st through (k-1)-th items of the successor state s·r coincide with the 2nd through k-th items of the original state s; that is, the original state is s: <x_1, …, x_k> and the successor state is s·r: <x_2, …, x_k, r>;
2) End;
The state transition function described in step 2-b):
1) According to the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as tr_predict(s, s·r) = count(<x_1, …, x_k, r>) / count(<x_1, …, x_k>), where count(<x_1, x_2, …, x_k>) denotes the number of times the sequence x_1, x_2, …, x_k occurs in the data set;
2) Considering the influence that commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) tr_MDP(s, r, s·r) = α_{s,r} · tr_predict(s, s·r), where α_{s,r} is a boost factor computed from the probability of buying commodity r and a constant ω (ω takes a very small constant value in the present invention; its defining formula is not reproduced in the source text), and count(r) denotes the number of times commodity r occurs in the training data set;
b) tr_MDP(s, r', s·r) = β_{s,r} · tr_predict(s, s·r), r' ≠ r, where β_{s,r} < 1 (formula likewise not reproduced) and P(s·r | s) is the probability of buying commodity r in state s. If the computed β is negative, it is set to a small positive value, and the probabilities are then normalized;
c) End;
The computation of the optimal recommendation item described in step 2-e):
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Set the initial recommendation policy π_0(s_0) = argmax_{r∈R} Rwd(s_0, r), where Rwd(s_0, r) denotes the return of recommending r in state s_0, and argmax_{r∈R} Rwd(s_0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the previous policy, and update the policy:
i. V_t(s) = Rwd(s, π_{t-1}(s)) + γ · Σ_{s'} tr_MDP(s, π_{t-1}(s), s') · V_{t-1}(s')
ii. π_t(s) = argmax_{r∈R} [ Rwd(s, r) + γ · Σ_{s'} tr_MDP(s, r, s') · V_t(s') ]
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (γ = 0.6 in the present invention). The immediate return value Rwd(s, r) of a state is computed by the following rules:
i. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_buy, then Rwd(s, r) = μ · Reward(r), μ > 1 (μ = 1.5 in the present invention);
ii. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_view, then Rwd(s, r) = ν · Reward(r), ν ∈ [0, 1) (ν = 0.5 in the present invention);
iii. If the successor state s·r produced by state s and the selected recommendation item r exists in both C_buy and C_view, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit for commodity item r given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy, generating the optimal recommendation items;
2) End.
2. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that:
The generation of the user state described in step 3-b):
1) If the number of purchase or browse records is m = 0, generate a user state s_0 that contains no commodity item, i.e. a dummy state;
2) If the number of purchase or browse records is 0 < m < k, generate a user state s_0 containing m commodity items;
3) If the number of purchase or browse records is m ≥ k, obtain only the most recent k records and generate a user state s_0 containing k commodity items;
4) End;
The acquisition of the maximum-return recommendation item described in step 3-c):
1) If the user state s_0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s_0 appears in the training data set, obtain the optimal policy corresponding to that state from the model, i.e. the optimal recommendation item r* = π(s_0);
3) If the user state s_0 does not appear in the training data set, search the state set C for the state s* with the highest similarity to the current user state s_0 and return the optimal recommendation item corresponding to state s* in the model. The state s* is computed as
s* = argmax_{s_i ∈ C} [ sim(s_0, s_i) ],
where sim(s_0, s_i) compares the two states element by element, δ(x, y) is the Kronecker function, and s_i^m denotes the m-th element of state s_i;
4) End;
The state-recommendation-selection log described in step 3-d):
1) In the state-recommendation-selection log, the state represents the user's original state s_0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or another commodity r', r* ≠ r'), the selection being of three types: browse only, buy only, and browse-and-buy;
2) End.
3. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that:
The offline model update described in step 4-b):
1) Update the state transition function:
a) If a new state s_new is found in the state-recommendation-selection log, the state set C, the purchase sequence set C_buy, and the browse sequence set C_view must be updated, and initial values set as follows:
i. C_in^0(s, r, s·r) = ξ_s · tr_MDP(s, r, s·r)
ii. C_out^0(s, r, s·r) = ξ_s · tr_MDP(s, r', s·r)
iii. C_total^0(s, s·r) = ξ_s
where C_in(s, r, s·r) denotes the number of times recommendation item r was accepted in state s, C_out(s, r, s·r) denotes the number of times the user selected commodity item r in state s without r having been recommended, and C_total(s, s·r) denotes the total number of times the user selected commodity item r. In the initialization procedure, to improve precision, the value ξ_s is proportional to the number of times state s occurs in the collected data; in the present invention, ξ_s = 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, initial values must be set as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user selected r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user selected r in state s, and update the transition function:
tr_MDP(s, r, s·r) = C_in^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r),
tr_MDP(s, r', s·r) = C_out^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r), r ≠ r',
where:
i. C_in^{t+1}(s, r, s·r) = C_in^t(s, r, s·r) + count(s, r, s·r)
ii. C_out^{t+1}(s, r, s·r) = C_out^t(s, r, s·r) + count(s, s·r) - count(s, r, s·r)
iii. C_total^{t+1}(s, s·r) = C_total^t(s, s·r) + count(s, s·r)
2) Update the optimal recommendation item corresponding to each state in set C; the computation process is the same as that described in 2-e);
3) End.
CN201610920407.5A 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model Pending CN106447463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610920407.5A CN106447463A (en) 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model


Publications (1)

Publication Number Publication Date
CN106447463A true CN106447463A (en) 2017-02-22

Family

ID=58176526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610920407.5A Pending CN106447463A (en) 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model

Country Status (1)

Country Link
CN (1) CN106447463A (en)


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997488A (en) * 2017-03-22 2017-08-01 扬州大学 A kind of action knowledge extraction method of combination markov decision process
CN107885774B (en) * 2017-09-29 2020-11-20 北京京东尚科信息技术有限公司 Data processing method and system
CN107885774A (en) * 2017-09-29 2018-04-06 北京京东尚科信息技术有限公司 Data processing method and system
CN109697255A (en) * 2017-10-23 2019-04-30 中国科学院沈阳自动化研究所 A kind of Personalize News jettison system and method based on automatic measure on line
CN109840796B (en) * 2017-11-24 2021-08-24 财团法人工业技术研究院 Decision factor analysis device and decision factor analysis method
TWI645350B (en) * 2017-11-24 2018-12-21 財團法人工業技術研究院 Decision factors analyzing device and decision factors analyzing method
CN109840796A (en) * 2017-11-24 2019-06-04 财团法人工业技术研究院 Decision factor analytical equipment and decision factor analysis method
US10572929B2 (en) 2017-11-24 2020-02-25 Industrial Technology Research Institute Decision factors analyzing device and decision factors analyzing method
CN109858985A (en) * 2017-11-30 2019-06-07 阿里巴巴集团控股有限公司 Merchandise news processing, the method shown and device
CN108092891A (en) * 2017-12-07 2018-05-29 重庆邮电大学 A kind of data dispatching method based on markov decision process
CN110020168A (en) * 2017-12-27 2019-07-16 艾迪普(北京)文化科技股份有限公司 A kind of three-dimensional material recommended method based on big data
CN110413867A (en) * 2018-04-28 2019-11-05 第四范式(北京)技术有限公司 Method and system for commending contents
CN109472629A (en) * 2018-05-14 2019-03-15 口口相传(北京)网络技术有限公司 It is a kind of configuration and displaying favor information method and device and electronics and storage equipment
CN109062919A (en) * 2018-05-31 2018-12-21 腾讯科技(深圳)有限公司 A kind of content recommendation method and device based on deeply study
CN110708469A (en) * 2018-07-10 2020-01-17 北京地平线机器人技术研发有限公司 Method and device for adapting exposure parameters and corresponding camera exposure system
CN111222931A (en) * 2018-11-23 2020-06-02 阿里巴巴集团控股有限公司 Product recommendation method and system
CN111222931B (en) * 2018-11-23 2023-05-05 阿里巴巴集团控股有限公司 Product recommendation method and system
CN109493195B (en) * 2018-12-24 2021-07-30 成都品果科技有限公司 Double-gathering recommendation method and system based on reinforcement learning
CN109493195A (en) * 2018-12-24 2019-03-19 成都品果科技有限公司 A kind of double focusing class recommendation method and system based on intensified learning
CN111401937A (en) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Data pushing method and device and storage medium
WO2021169218A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data pushing method and system, electronic device and storage medium
CN114444698A (en) * 2022-01-28 2022-05-06 腾讯科技(深圳)有限公司 Information recommendation model training method and device, computer equipment and storage medium
CN115270004A (en) * 2022-09-28 2022-11-01 云南师范大学 Education resource recommendation method based on field factor decomposition
CN115270004B (en) * 2022-09-28 2023-10-27 云南师范大学 Educational resource recommendation method based on field factor decomposition

Similar Documents

Publication Publication Date Title
CN106447463A (en) Commodity recommendation method based on Markov decision-making process model
US20220301024A1 (en) Sequential recommendation method based on long-term and short-term interests
Sharma et al. Collaborative filtering-based recommender system: Approaches and research challenges
CN110222272A (en) A kind of potential customers excavate and recommended method
CN103473354A (en) Insurance recommendation system framework and insurance recommendation method based on e-commerce platform
CN102567900A (en) Method for recommending commodities to customers
CN112365283B (en) Coupon issuing method and device, terminal equipment and storage medium
CN111709810A (en) Object recommendation method and device based on recommendation model
CN104268292A (en) Label word library update method of portrait system
CN108717654B (en) Multi-provider cross recommendation method based on clustering feature migration
CN103678518A (en) Method and device for adjusting recommendation lists
Chen et al. Dig users’ intentions via attention flow network for personalized recommendation
CN102073720A (en) FR method for optimizing personalized recommendation results
KR102049777B1 (en) Item recommendation method and apparatus based on user behavior
CN105630946A (en) Big data based field cross recommendation method and apparatus
CN110689402A (en) Method and device for recommending merchants, electronic equipment and readable storage medium
CN113190751B (en) Recommendation method fusing keyword generation
CN117495458B (en) Advertisement online pushing method based on user portrait
Li Accurate digital marketing communication based on intelligent data analysis
CN116957691B (en) Cross-platform intelligent advertisement putting method and system for commodities of e-commerce merchants
CN115860880B (en) Personalized commodity recommendation method and system based on multi-layer heterogeneous graph convolution model
CN104933595A (en) Collaborative filtering recommendation method based on Markov prediction model
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN110347923B (en) Traceable fast fission type user portrait construction method
CN115600009A (en) Deep reinforcement learning-based recommendation method considering future preference of user

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170222