CN106447463A - Commodity recommendation method based on Markov decision-making process model - Google Patents

Commodity recommendation method based on Markov decision-making process model

Info

Publication number
CN106447463A
CN106447463A
Authority
CN
China
Prior art keywords
state
commodity
user
recommendation
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610920407.5A
Other languages
Chinese (zh)
Inventor
刘峰
蔡慧
刘劭
罗瑶
文煊义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201610920407.5A priority Critical patent/CN106447463A/en
Publication of CN106447463A publication Critical patent/CN106447463A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a commodity recommendation method based on a Markov decision process model. The commodity recommendation method comprises the following steps: 1) a preparation stage; 2) an initial model generation stage: read a subsequence, i.e. a state s, from the state set C and a commodity r from the commodity set, compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function; 3) a recommendation stage: obtain the current user's recent purchase or browse history; generate the current user's state from the records; obtain the recommendation item that yields the maximum return and return it to the current user; record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log; and 4) an offline model update stage: update the model offline at a fixed interval T.

Description

A commodity recommendation method based on a Markov decision process model
Technical field
The present invention relates to commodity recommendation methods for e-commerce platforms, and more particularly to a commodity recommendation method based on a Markov decision process model.
Background technology
Commodity recommendation is a technique that recommends commodities of likely interest to a user according to the user's interest characteristics and purchasing behavior. With the continuing expansion of e-commerce, the number and variety of commodities grow rapidly, and customers must spend a great deal of time to find the commodities they want to buy. This process of browsing large amounts of irrelevant information and products steadily drives away consumers who are overwhelmed by information overload. To solve these problems, commodity recommendation systems, built on large-scale data mining and intelligent decision making, help e-commerce websites provide effective decision support and information services for their customers' purchases.
A commodity recommendation system discovers patterns from users' behavior and preferences and makes recommendations accordingly. User behavior includes ratings, browsing, purchases, page dwell time, and so on; among these, the users' browse and purchase logs are the more effective means for an e-commerce platform to acquire user preferences. At present, the main algorithms used in commodity recommendation systems are association-rule-based recommendation, content-based recommendation, and collaborative filtering recommendation.
Recommendation systems based on simple association rules are not built on a complete model and therefore cannot effectively reflect users' uncertain points of interest. Content-based recommendation systems have limited feature-extraction ability: they cannot discover new resources of interest for a customer, and they produce unsatisfactory recommendations for commodities whose content is hard to extract. Collaborative filtering systems suffer from the sparsity problem, which can make the computed similarity between users inaccurate, and from scalability problems as the numbers of users and commodities grow.
To make full use of the user browse and purchase logs available on an e-commerce platform, this invention proposes a commodity recommendation method based on a Markov decision process model, which strengthens the effect of commodity recommendation through a reinforcement learning mechanism.
Content of the invention
The invention provides a novel commodity recommendation method based on a Markov decision process model. Each user browse or purchase record entry s consists of multiple commodity items r ∈ s. The transition probabilities between adjacent browse or purchase record entries, together with iterative solution of the state value function, strengthen the accuracy of commodity recommendation. The method first obtains the user purchase or browse records, then filters the data to generate training data, then preprocesses the data to generate a Markov decision process model (MDP), and iteratively solves the MDP to obtain the optimal recommendation in each state. Commodity recommendations are made with reference to the current user's recent purchase or browse records, user behavior continues to be recorded in logs, and the MDP model is periodically updated offline accordingly.
A Markov decision process (MDP) is a generally applicable decision model. An MDP establishes a state space S and an action space A for the agent of the decision process. The agent's actions affect the surrounding environment, so the state makes uncertain transitions, and the feedback from an action in turn affects the agent's action selection. In the present invention, a user's purchase or browse sequence is modeled as the state S, the recommendations made according to the purchase or browse records are modeled as A, and the commodity with the maximum expected return according to the state value function is recommended to the user. The user's previous purchases or browses affect the recommendation results, while the recommendation results in turn affect the user's next browse or purchase decision. This process iterates until termination.
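As an illustration only (not part of the patent text), the following Python sketch shows one plausible in-memory layout for the model components named above; all identifiers (ItemId, State, MDPModel) are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

ItemId = str                  # a commodity item r
State = Tuple[ItemId, ...]    # a state s: the user's last k commodity items

@dataclass
class MDPModel:
    """Hypothetical container for the MDP recommender described above."""
    k: int                                                  # items per state
    items: Set[ItemId] = field(default_factory=set)         # commodity set R
    states: Set[State] = field(default_factory=set)         # state set C
    buy_states: Set[State] = field(default_factory=set)     # C_buy
    view_states: Set[State] = field(default_factory=set)    # C_view
    # tr[(s, r, s·r)]: probability that recommending r in s leads to s·r
    tr: Dict[Tuple[State, ItemId, State], float] = field(default_factory=dict)
    best_action: Dict[State, ItemId] = field(default_factory=dict)  # π(s)
```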
The technical scheme of the present invention is a recommendation method based on a Markov decision process model, comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform; the data set includes two parts: the user purchase data set (purchase records) and the user browse path records obtained from web logs;
b) Filter the data and generate the training data. The filtering criteria are: filter out the data of commodity items purchased or visited fewer than N times (N = 100 in the present invention), and filter out the data of users whose purchase or browse records contain fewer than k commodity items (k = 5 in the present invention);
c) End the preparation stage;
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {r_i}. Parse the user logs to generate the user purchase sequence set C_buy and the user browse sequence set C_view, and the superset C = C_buy ∪ C_view, C = {s_i}. Each subsequence s in the set (called a state) contains k commodity items; if the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set C_buy, and if the last commodity item of s is a browsed commodity, s is placed in the browse sequence set C_view;
b) Read a state s from the state set C and a commodity r from the commodity set; compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, iteratively compute the optimal recommendation item in this state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browse records;
b) Generate the current user's state from the records;
c) Obtain the recommendation item that yields the maximum return and return it to the current user;
d) Record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) Update the model offline at a fixed time interval T;
b) End the offline model update stage;
The subsequence s described in step 2-a):
1) Obtain a purchase or browse commodity path x_1, x_2, …, x_n from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x_1, …, x_k>, <x_2, …, x_{k+1}>, …, <x_{n-k+1}, …, x_n>; each subsequence is called a state, and each state contains k commodities (see the sketch below);
2) End;
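As an illustration only, a minimal Python sketch of this sliding-window decomposition, assuming a path is given as a list of item identifiers (all names hypothetical):

```python
from typing import List, Tuple

def path_to_states(path: List[str], k: int = 5) -> List[Tuple[str, ...]]:
    """Decompose a purchase/browse path x_1..x_n into overlapping k-item
    subsequences <x_i, ..., x_{i+k-1}>; each subsequence is one state."""
    if len(path) < k:
        return []                 # too short to form any state
    return [tuple(path[i:i + k]) for i in range(len(path) - k + 1)]

# A path of 7 items with k = 5 yields 3 states:
# [('a','b','c','d','e'), ('b','c','d','e','f'), ('c','d','e','f','g')]
states = path_to_states(["a", "b", "c", "d", "e", "f", "g"], k=5)
```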
The successor state s·r described in step 2-b):
1) The 1st through (k-1)-th items of the successor state s·r coincide with the 2nd through k-th items of the original state s; that is, the original state is s: <x_1, …, x_k> and the successor state is s·r: <x_2, …, x_k, r>;
2) End;
The state transition function described in step 2-b):
1) According to the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as tr_predict(s, s·r) = count(<x_1, …, x_k, r>) / count(<x_1, …, x_k>), where count(<x_1, x_2, …, x_k>) denotes the number of times the sequence x_1, x_2, …, x_k occurs in the data set C;
2) Considering the influence that commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) tr_MDP(s, r, s·r) = α_{s,r} · tr_predict(s, s·r), where α_{s,r} is a boost factor computed from the probability of buying commodity r and a constant ω (ω takes a very small constant value in the present invention; its defining formula is not reproduced in the source text), and count(r) denotes the number of times commodity r occurs in the training data set;
b) tr_MDP(s, r', s·r) = β_{s,r} · tr_predict(s, s·r), r' ≠ r, where β_{s,r} < 1 is the corresponding damping factor (formula likewise not reproduced) and P(s·r | s) is the probability of buying commodity r in state s. If the computed β is negative, it is set to a small positive value (a fixed small constant in the present invention), and the probabilities are then normalized;
c) End (see the sketch below);
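As an illustration only, the following sketch estimates tr_predict from sequence counts. Because the exact α and β formulas are not reproduced in the source, the correction step uses placeholder constants ALPHA > 1 and BETA < 1 that merely stand in for the patented definitions:

```python
from collections import Counter
from typing import Dict, List, Tuple

State = Tuple[str, ...]

ALPHA = 1.2   # placeholder for the boost factor α_{s,r} (formula not in source)
BETA = 0.8    # placeholder for the damping factor β_{s,r} (formula not in source)

def estimate_transitions(paths_states: List[List[State]], items: List[str]
                         ) -> Dict[Tuple[State, str, State], float]:
    """tr_predict(s, s·r) = count(<x_1..x_k, r>)/count(<x_1..x_k>), then
    tr_MDP(s, r, s·r) = α·tr_predict and tr_MDP(s, r', s·r) = β·tr_predict."""
    state_count: Counter = Counter()
    pair_count: Counter = Counter()
    for seq in paths_states:                  # one ordered state list per user path
        state_count.update(seq)
        pair_count.update(zip(seq, seq[1:]))  # consecutive states (s, s·r)
    tr: Dict[Tuple[State, str, State], float] = {}
    for (s, s_next), c in pair_count.items():
        r = s_next[-1]                        # the item appended to s to form s·r
        tr_predict = c / state_count[s]
        tr[(s, r, s_next)] = ALPHA * tr_predict     # r itself was recommended
        for r_other in items:                       # any other recommendation r'
            if r_other != r:
                tr[(s, r_other, s_next)] = BETA * tr_predict
    # A full implementation would renormalize so each (s, r, ·) row sums to 1.
    return tr
```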
The computation of the optimal recommendation item described in step 2-e):
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Set the initial policy π_0(s_0) = argmax_{r∈R} Rwd(s_0, r), where Rwd(s_0, r) denotes the return of recommending r in state s_0, and argmax_{r∈R} Rwd(s_0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the previous policy, and update the policy:
i. V_t(s) = Rwd(s, π_{t-1}(s)) + γ · Σ_{s'} tr_MDP(s, π_{t-1}(s), s') · V_{t-1}(s')
ii. π_t(s) = argmax_{r∈R} [ Rwd(s, r) + γ · Σ_{s'} tr_MDP(s, r, s') · V_t(s') ]
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (γ = 0.6 in the present invention). The immediate return value Rwd(s, r) of a state is computed by the following rules:
i. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_buy, then Rwd(s, r) = μ · Reward(r), μ > 1 (μ = 1.5 in the present invention);
ii. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_view, then Rwd(s, r) = ν · Reward(r), ν ∈ [0, 1) (ν = 0.5 in the present invention);
iii. If the successor state s·r produced by state s and the selected recommendation item r exists in both C_buy and C_view, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit for commodity item r given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy or the number of repetitions reaches the maximum iteration count (200 in the present invention), generating the optimal recommendation items (see the sketch below);
2) End;
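As an illustration only, a compact Python sketch of this policy-iteration loop, reusing the hypothetical estimate_transitions output; the V_t and π_t formulas above follow the standard policy-iteration form, and the zero return for never-observed successors is an assumption:

```python
def immediate_return(s, r, buy_states, view_states, reward, mu=1.5, nu=0.5):
    """Rwd(s, r): μ·Reward(r) if s·r only in C_buy, ν·Reward(r) if only in
    C_view, (μ+ν)·Reward(r) if in both (assumed 0 if never observed)."""
    s_next = s[1:] + (r,)
    in_buy, in_view = s_next in buy_states, s_next in view_states
    if in_buy and in_view:
        return (mu + nu) * reward[r]
    if in_buy:
        return mu * reward[r]
    if in_view:
        return nu * reward[r]
    return 0.0

def policy_iteration(states, items, tr, buy_states, view_states, reward,
                     gamma=0.6, max_iter=200):
    """Alternate the value step (i.) and policy step (ii.) until stable."""
    items = list(items)

    def q(s, r, V):
        total = 0.0
        for r2 in items:                       # sum over possible successors s·r2
            s_next = s[1:] + (r2,)
            total += tr.get((s, r, s_next), 0.0) * V.get(s_next, 0.0)
        return immediate_return(s, r, buy_states, view_states, reward) + gamma * total

    V = {s: 0.0 for s in states}
    policy = {s: max(items, key=lambda r: immediate_return(
        s, r, buy_states, view_states, reward)) for s in states}   # π_0
    for _ in range(max_iter):
        V = {s: q(s, policy[s], V) for s in states}                # value step i.
        new_policy = {s: max(items, key=lambda r: q(s, r, V)) for s in states}
        if new_policy == policy:                                   # converged
            break
        policy = new_policy
    return policy, V
```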
The generation of the user state described in step 3-b):
1) If the number of purchase or browse records is 0 ≤ m < k, generate a user state s_0 that contains no commodity item, i.e. a dummy state;
2) If the number of purchase or browse records is m ≥ k, obtain only the most recent k records and generate a user state s_0 containing k commodity items;
3) End (see the sketch below);
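As an illustration only, a minimal sketch of this state-construction rule (names hypothetical):

```python
from typing import List, Tuple

def build_user_state(recent_items: List[str], k: int = 5) -> Tuple[str, ...]:
    """Build s_0 from the user's most recent purchase/browse records:
    a dummy (empty) state if fewer than k records, else the last k items."""
    if len(recent_items) < k:
        return ()                     # dummy state: no commodity items
    return tuple(recent_items[-k:])
```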
The acquisition of the maximum-return recommendation item described in step 3-c):
1) If the user state s_0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s_0 appears in the training data set, obtain the optimal policy corresponding to that state from the model, i.e. the optimal recommendation item r* = π(s_0);
3) If the user state s_0 does not appear in the training data set, search the state set C for the state s* with the highest similarity to the current user state s_0 and return the optimal recommendation item corresponding to state s* in the model. The state s* is computed as s* = argmax_{s_i ∈ C} [ sim(s_0, s_i) ], where sim(s_0, s_i) compares the two states element by element, δ(x, y) is the Kronecker function, defined as δ(x, y) = 1 if x = y and 0 otherwise, and s_i^m denotes the m-th element of state s_i;
4) End (see the sketch below);
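As an illustration only, a sketch of the fallback lookup; the exact sim formula is an image in the source, so the unweighted count of position-wise Kronecker matches used here is an assumption:

```python
def most_similar_state(s0, candidate_states):
    """s* = argmax over s_i in C of sim(s0, s_i); sim is assumed to be the
    number of positions m where δ(s0^m, s_i^m) = 1."""
    def sim(a, b):
        return sum(1 for x, y in zip(a, b) if x == y)   # Σ_m δ(a^m, b^m)
    return max(candidate_states, key=lambda s: sim(s0, s))

def recommend(s0, best_action, reward, train_states):
    """Dispatch over the three cases of step 3-c)."""
    if not s0:                                   # case 1: dummy state
        return max(reward, key=reward.get)       # highest Reward(r)
    if s0 in train_states:                       # case 2: known state
        return best_action[s0]                   # optimal policy π(s0)
    s_star = most_similar_state(s0, train_states)
    return best_action[s_star]                   # case 3: nearest known state
```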
The state-recommendation-selection log described in step 3-d):
1) In the state-recommendation-selection log, the state represents the user's original state s_0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or another commodity r', r* ≠ r'), the selection being of three types: browse only, buy only, and browse-and-buy (see the sketch below);
2) End;
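As an illustration only, one log entry could be represented by a record like the following (names hypothetical):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class LogEntry:
    """One line of the state-recommendation-selection log."""
    state: Tuple[str, ...]   # the user's original state s_0
    recommended: str         # the optimal recommendation r* from the model
    selected: str            # the item the user actually chose (r* or some r')
    choice_type: str         # "browse", "buy", or "browse_and_buy"
```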
The offline model update described in step 4-b):
1) Update the state transition function:
a) If a new state s_new is found in the state-recommendation-selection log, the state set C, the purchase sequence set C_buy, and the browse sequence set C_view must be updated, and initial values set as follows:
i. C_in^0(s, r, s·r) = ξ_s · tr_MDP(s, r, s·r)
ii. C_out^0(s, r, s·r) = ξ_s · tr_MDP(s, r', s·r)
iii. C_total^0(s, s·r) = ξ_s
where C_in(s, r, s·r) denotes the number of times recommendation item r was accepted in state s, C_out(s, r, s·r) denotes the number of times the user selected commodity item r in state s without r having been recommended, and C_total(s, s·r) denotes the total number of times the user selected commodity item r. In the initialization procedure, to improve precision, the value ξ_s is proportional to the number of times state s occurs in the collected data; in the present invention, ξ_s = 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, initial values must be set as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user selected r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user selected r in state s, and update the transition function:
tr_MDP(s, r, s·r) = C_in^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r),
tr_MDP(s, r', s·r) = C_out^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r), r ≠ r',
where:
i. C_in^{t+1}(s, r, s·r) = C_in^t(s, r, s·r) + count(s, r, s·r)
ii. C_out^{t+1}(s, r, s·r) = C_out^t(s, r, s·r) + count(s, s·r) - count(s, r, s·r)
iii. C_total^{t+1}(s, s·r) = C_total^t(s, s·r) + count(s, s·r)
2) Update the optimal recommendation item corresponding to each state in set C; the computation process is the same as that described in 2-e) (see the sketch below);
End;
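As an illustration only, a sketch of this count-based offline update. The initialization and increment formulas above are reconstructions (the original formula images are not reproduced in the source), so this code shows the reconstructed scheme, not the verbatim patented one:

```python
from collections import defaultdict

def offline_update(tr, log_entries, count_s):
    """Refresh tr_MDP from the state-recommendation-selection log.
    count_s[s] is how often state s occurs in the collected data."""
    c_in, c_out, c_total = defaultdict(float), defaultdict(float), defaultdict(float)
    # Seed the counts from the current model so it acts as a prior:
    # C_in^0 = ξ_s·tr(s,r,s·r), C_out^0 = ξ_s·tr(s,r',s·r), C_total^0 = ξ_s.
    for (s, rec, s_next), p in tr.items():
        xi = 10 * count_s.get(s, 1)              # ξ_s = 10·count(s)
        if rec == s_next[-1]:                    # the recommended item was taken
            c_in[(s, s_next)] = xi * p
        else:                                    # same β-probability for every r'
            c_out[(s, s_next)] = xi * p
        c_total[(s, s_next)] = xi
    # Accumulate the observed selections from the log.
    for e in log_entries:
        s_next = e.state[1:] + (e.selected,)
        key = (e.state, s_next)
        c_total[key] += 1                        # count(s, s·r)
        if e.selected == e.recommended:
            c_in[key] += 1                       # count(s, r, s·r): accepted
        else:
            c_out[key] += 1                      # chosen without the recommendation
    # tr_MDP = C_in/C_total if r was the recommended item, else C_out/C_total.
    for (s, rec, s_next) in list(tr):
        key = (s, s_next)
        numer = c_in[key] if rec == s_next[-1] else c_out[key]
        tr[(s, rec, s_next)] = numer / (c_total[key] or 1.0)
    return tr
```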
Description of the drawings
Fig. 1 is the overall workflow diagram of the present invention;
Fig. 2 is the workflow diagram of the recommendation method based on the Markov decision process model of the present invention;
Fig. 3 is the flow chart of state set generation;
Fig. 4 is the workflow diagram of initial model generation;
Fig. 5 is the workflow diagram of recommendation;
Fig. 6 is the workflow diagram of the computation of the optimal recommendation item.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
The present invention is a recommendation method based on a Markov decision process model, intended to improve the effectiveness of recommendation and to provide useful and satisfying recommendations to users. Fig. 1 shows the processing procedure of the present invention. The invention first obtains purchase and browse records and generates the training data set, then generates the initial recommendation model, then makes commodity recommendations according to the particular user state, and finally updates the model offline.
In the present invention, the process is divided into four stages: the preparation stage, the initial model generation stage, the recommendation stage, and the offline model update stage, as shown in Fig. 2. The key of the present invention is to obtain, from the training data, the maximum-return recommendation item for each generated user state, then recommend commodities according to the user's recent purchase or browse records, and use the purchase or browse choices the user makes after receiving the recommendation items to generate the state-recommendation-selection log for the offline model update.
Step 2-0 is the initial state of the recommendation method based on the Markov decision process model of the present invention;
The preparation stage includes steps 2-1 and 2-2;
Step 2-1 obtains the user purchase and browse records from the e-commerce platform;
Step 2-2 filters the data and generates the training data;
The initial model generation stage includes steps 2-3, 2-4, 2-5, and 2-6;
Step 2-3 preprocesses the data, reads the training data, and obtains the set R and the set C;
Step 2-4 reads each state and each commodity record;
Step 2-5 computes the transition probability to the successor state s·r and generates the state transition function;
Step 2-6 computes and stores the optimal recommendation item in each state;
The recommendation stage includes steps 2-7, 2-8, and 2-9;
Step 2-7 obtains the current user's recent purchase or browse records and generates the user state;
Step 2-8 returns the maximum-return recommendation item according to the user state;
Step 2-9 records the recommendation item and the user's choice and generates the log;
The offline model update stage includes step 2-10;
Step 2-10 updates the model offline;
Step 2-11 is the end state.
Fig. 3 is a detailed description of the state set generation process.
Step 3-0 is the start state of state set generation;
Step 3-1 obtains the purchase or browse commodity paths from the training data;
Step 3-2 decomposes each path into multiple subsequences, each containing k commodities;
Step 3-3 reads a subsequence s;
Step 3-4 partitions s into the purchase sequence set or the browse sequence set according to its last item;
Step 3-5 judges whether all subsequences have been processed; if so, go to 3-6; if not, go to 3-3;
Step 3-6 is the end state of state set generation.
Fig. 4 is a detailed description of the initial model generation process.
Step 4-0 is the start state of initial model generation;
Step 4-1 preprocesses the data;
Step 4-2 obtains the training data and generates the commodity set R and the state set C;
Step 4-3 reads a state s;
Step 4-4 reads a commodity r;
Step 4-5 computes the transition probability tr from s to s·r and generates the state transition function;
Step 4-6 judges whether all commodities have been processed; if so, go to 4-7; if not, go to 4-4;
Step 4-7 judges whether all states have been processed; if so, go to 4-8; if not, go to 4-3;
Step 4-8 reads a state, computes the optimal recommendation item, and stores it;
Step 4-9 judges whether all states have been processed; if so, go to 4-10; if not, go to 4-8;
Step 4-10 is the end state of the initial model generation process.
Fig. 5 is a detailed description of the recommendation process.
Step 5-0 is the start state of recommendation;
Step 5-1 obtains the current user's recent purchase or browse records;
Step 5-2 obtains the recommendation item that yields the maximum return and returns it to the user;
Step 5-3 records the recommendation item and the user's purchase or browse choice, generating the state-recommendation-selection log;
Step 5-4 is the end state of recommendation.
Fig. 6 is a detailed description of the computation of the optimal recommendation item.
Step 6-0 is the start state of the computation of the optimal recommendation item;
Step 6-1 sets the initial policy;
Step 6-2 computes the value function from the previous policy;
Step 6-3 updates the policy;
Step 6-4 judges whether the policy has converged to the optimal policy or the maximum number of iterations has been reached; if so, go to 6-5; if not, go to 6-2;
Step 6-5 generates the optimal recommendation items;
Step 6-6 is the end state of the computation of the optimal recommendation item.
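As an illustration only, the hypothetical sketches above can be wired together along the four stages on toy data (illustrative values only):

```python
# Stages 1-2: prepare toy data and build the initial model.
paths = [["a", "b", "c", "d", "e", "f"], ["b", "c", "d", "e", "f", "g"]]
items = sorted({r for p in paths for r in p})
reward = {r: 1.0 for r in items}                   # assumed flat net profit

paths_states = [path_to_states(p, k=5) for p in paths]
states = {s for seq in paths_states for s in seq}
buy_states, view_states = set(states), set()       # assume all ended in purchases
tr = estimate_transitions(paths_states, items)
best, V = policy_iteration(states, items, tr, buy_states, view_states,
                           reward, gamma=0.6, max_iter=200)

# Stage 3: serve the current user.
s0 = build_user_state(["b", "c", "d", "e", "f"], k=5)
print(recommend(s0, best, reward, states))         # optimal recommendation for s0

# Stage 4: at a fixed interval T, replay the log and refresh the model offline.
count_s = {s: 1 for s in states}
log = [LogEntry(state=s0, recommended=best[s0], selected=best[s0],
                choice_type="buy")]
tr = offline_update(tr, log, count_s)
best, V = policy_iteration(states, items, tr, buy_states, view_states,
                           reward, gamma=0.6, max_iter=200)
```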

Claims (3)

1. A commodity recommendation method based on a Markov decision process model, characterized by comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform; the data set includes two parts: the user purchase data set (purchase records) and the user browse path records obtained from web logs;
b) Filter the data and generate the training data. The filtering criteria are: filter out the data of commodity items purchased or visited fewer than N times (N = 100 in the present invention), and filter out the data of users whose purchase or browse records contain only one commodity item;
c) End the preparation stage;
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {r_i}. Parse the user logs to generate the user purchase sequence set C_buy and the user browse sequence set C_view, and the superset C = C_buy ∪ C_view, C = {s_i}. Each subsequence s in the superset C (called a state) contains k commodity items; if the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set C_buy, and if the last commodity item of s is a browsed commodity, s is placed in the browse sequence set C_view;
b) Read a subsequence, i.e. a state s, from the state set C and a commodity r from the commodity set; compute the transition probability from state s to the successor state s·r, including tr_MDP(s, r, s·r) for recommending commodity r and tr_MDP(s, r', s·r) for recommending another commodity r', and generate the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, compute the optimal recommendation item in this state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browse records;
b) Generate the current user's state from the records;
c) Obtain the recommendation item that yields the maximum return and return it to the current user;
d) Record the recommendation item and the user's purchase or browse choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) Update the model offline at a fixed time interval T;
b) End the offline model update stage;
The subsequence s described in step 2-a):
1) Obtain a purchase or browse commodity path x_1, x_2, …, x_n from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x_1, …, x_k>, <x_2, …, x_{k+1}>, …, <x_{n-k+1}, …, x_n>; each subsequence is called a state, and each state contains k commodities;
2) End;
The successor state s·r described in step 2-b):
1) The 1st through (k-1)-th items of the successor state s·r coincide with the 2nd through k-th items of the original state s; that is, the original state is s: <x_1, …, x_k> and the successor state is s·r: <x_2, …, x_k, r>;
2) End;
The state transition function described in step 2-b):
1) According to the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as tr_predict(s, s·r) = count(<x_1, …, x_k, r>) / count(<x_1, …, x_k>), where count(<x_1, x_2, …, x_k>) denotes the number of times the sequence x_1, x_2, …, x_k occurs in the data set;
2) Considering the influence that commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) tr_MDP(s, r, s·r) = α_{s,r} · tr_predict(s, s·r), where α_{s,r} is a boost factor computed from the probability of buying commodity r and a constant ω (ω takes a very small constant value in the present invention; its defining formula is not reproduced in the source text), and count(r) denotes the number of times commodity r occurs in the training data set;
b) tr_MDP(s, r', s·r) = β_{s,r} · tr_predict(s, s·r), r' ≠ r, where β_{s,r} < 1 (formula likewise not reproduced) and P(s·r | s) is the probability of buying commodity r in state s. If the computed β is negative, it is set to a small positive value, and the probabilities are then normalized;
c) End;
The computation of the optimal recommendation item described in step 2-e):
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Set the initial recommendation policy π_0(s_0) = argmax_{r∈R} Rwd(s_0, r), where Rwd(s_0, r) denotes the return of recommending r in state s_0, and argmax_{r∈R} Rwd(s_0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the previous policy, and update the policy:
i. V_t(s) = Rwd(s, π_{t-1}(s)) + γ · Σ_{s'} tr_MDP(s, π_{t-1}(s), s') · V_{t-1}(s')
ii. π_t(s) = argmax_{r∈R} [ Rwd(s, r) + γ · Σ_{s'} tr_MDP(s, r, s') · V_t(s') ]
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (γ = 0.6 in the present invention). The immediate return value Rwd(s, r) of a state is computed by the following rules:
i. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_buy, then Rwd(s, r) = μ · Reward(r), μ > 1 (μ = 1.5 in the present invention);
ii. If the successor state s·r produced by state s and the selected recommendation item r exists only in the set C_view, then Rwd(s, r) = ν · Reward(r), ν ∈ [0, 1) (ν = 0.5 in the present invention);
iii. If the successor state s·r produced by state s and the selected recommendation item r exists in both C_buy and C_view, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit for commodity item r given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy, generating the optimal recommendation items;
2) End.
2. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that:
The generation of the user state described in step 3-b):
1) If the number of purchase or browse records is m = 0, generate a user state s_0 that contains no commodity item, i.e. a dummy state;
2) If the number of purchase or browse records is 0 < m < k, generate a user state s_0 containing m commodity items;
3) If the number of purchase or browse records is m ≥ k, obtain only the most recent k records and generate a user state s_0 containing k commodity items;
4) End;
The acquisition of the maximum-return recommendation item described in step 3-c):
1) If the user state s_0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s_0 appears in the training data set, obtain the optimal policy corresponding to that state from the model, i.e. the optimal recommendation item r* = π(s_0);
3) If the user state s_0 does not appear in the training data set, search the state set C for the state s* with the highest similarity to the current user state s_0 and return the optimal recommendation item corresponding to state s* in the model. The state s* is computed as
s* = argmax_{s_i ∈ C} [ sim(s_0, s_i) ],
where sim(s_0, s_i) compares the two states element by element, δ(x, y) is the Kronecker function, and s_i^m denotes the m-th element of state s_i;
4) End;
The state-recommendation-selection log described in step 3-d):
1) In the state-recommendation-selection log, the state represents the user's original state s_0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or another commodity r', r* ≠ r'), the selection being of three types: browse only, buy only, and browse-and-buy;
2) End.
3. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that:
The offline model update described in step 4-b):
1) Update the state transition function:
a) If a new state s_new is found in the state-recommendation-selection log, the state set C, the purchase sequence set C_buy, and the browse sequence set C_view must be updated, and initial values set as follows:
i. C_in^0(s, r, s·r) = ξ_s · tr_MDP(s, r, s·r)
ii. C_out^0(s, r, s·r) = ξ_s · tr_MDP(s, r', s·r)
iii. C_total^0(s, s·r) = ξ_s
where C_in(s, r, s·r) denotes the number of times recommendation item r was accepted in state s, C_out(s, r, s·r) denotes the number of times the user selected commodity item r in state s without r having been recommended, and C_total(s, s·r) denotes the total number of times the user selected commodity item r. In the initialization procedure, to improve precision, the value ξ_s is proportional to the number of times state s occurs in the collected data; in the present invention, ξ_s = 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, initial values must be set as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user selected r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user selected r in state s, and update the transition function:
tr_MDP(s, r, s·r) = C_in^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r),
tr_MDP(s, r', s·r) = C_out^{t+1}(s, r, s·r) / C_total^{t+1}(s, s·r), r ≠ r',
where:
i. C_in^{t+1}(s, r, s·r) = C_in^t(s, r, s·r) + count(s, r, s·r)
ii. C_out^{t+1}(s, r, s·r) = C_out^t(s, r, s·r) + count(s, s·r) - count(s, r, s·r)
iii. C_total^{t+1}(s, s·r) = C_total^t(s, s·r) + count(s, s·r)
2) Update the optimal recommendation item corresponding to each state in set C; the computation process is the same as that described in 2-e);
3) End.
CN201610920407.5A 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model Pending CN106447463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610920407.5A CN106447463A (en) 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model


Publications (1)

Publication Number Publication Date
CN106447463A true CN106447463A (en) 2017-02-22

Family

ID=58176526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610920407.5A Pending CN106447463A (en) 2016-10-21 2016-10-21 Commodity recommendation method based on Markov decision-making process model

Country Status (1)

Country Link
CN (1) CN106447463A (en)


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997488A (en) * 2017-03-22 2017-08-01 扬州大学 A kind of action knowledge extraction method of combination markov decision process
CN107885774B (en) * 2017-09-29 2020-11-20 北京京东尚科信息技术有限公司 Data processing method and system
CN107885774A (en) * 2017-09-29 2018-04-06 北京京东尚科信息技术有限公司 Data processing method and system
CN109697255A (en) * 2017-10-23 2019-04-30 中国科学院沈阳自动化研究所 A kind of Personalize News jettison system and method based on automatic measure on line
CN109840796B (en) * 2017-11-24 2021-08-24 财团法人工业技术研究院 Decision factor analysis device and decision factor analysis method
TWI645350B (en) * 2017-11-24 2018-12-21 財團法人工業技術研究院 Decision factors analyzing device and decision factors analyzing method
CN109840796A (en) * 2017-11-24 2019-06-04 财团法人工业技术研究院 Decision factor analytical equipment and decision factor analysis method
US10572929B2 (en) 2017-11-24 2020-02-25 Industrial Technology Research Institute Decision factors analyzing device and decision factors analyzing method
CN109858985A (en) * 2017-11-30 2019-06-07 阿里巴巴集团控股有限公司 Merchandise news processing, the method shown and device
CN108092891A (en) * 2017-12-07 2018-05-29 重庆邮电大学 A kind of data dispatching method based on markov decision process
CN110020168A (en) * 2017-12-27 2019-07-16 艾迪普(北京)文化科技股份有限公司 A kind of three-dimensional material recommended method based on big data
CN110413867A (en) * 2018-04-28 2019-11-05 第四范式(北京)技术有限公司 Method and system for commending contents
CN109472629A (en) * 2018-05-14 2019-03-15 口口相传(北京)网络技术有限公司 It is a kind of configuration and displaying favor information method and device and electronics and storage equipment
CN109062919A (en) * 2018-05-31 2018-12-21 腾讯科技(深圳)有限公司 A kind of content recommendation method and device based on deeply study
CN110708469A (en) * 2018-07-10 2020-01-17 北京地平线机器人技术研发有限公司 Method and device for adapting exposure parameters and corresponding camera exposure system
CN111222931A (en) * 2018-11-23 2020-06-02 阿里巴巴集团控股有限公司 Product recommendation method and system
CN111222931B (en) * 2018-11-23 2023-05-05 阿里巴巴集团控股有限公司 Product recommendation method and system
CN109493195B (en) * 2018-12-24 2021-07-30 成都品果科技有限公司 Double-gathering recommendation method and system based on reinforcement learning
CN109493195A (en) * 2018-12-24 2019-03-19 成都品果科技有限公司 A kind of double focusing class recommendation method and system based on intensified learning
CN111401937A (en) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Data pushing method and device and storage medium
WO2021169218A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data pushing method and system, electronic device and storage medium
CN114444698A (en) * 2022-01-28 2022-05-06 腾讯科技(深圳)有限公司 Information recommendation model training method and device, computer equipment and storage medium
CN115270004A (en) * 2022-09-28 2022-11-01 云南师范大学 Education resource recommendation method based on field factor decomposition
CN115270004B (en) * 2022-09-28 2023-10-27 云南师范大学 Educational resource recommendation method based on field factor decomposition

Similar Documents

Publication Publication Date Title
CN106447463A (en) Commodity recommendation method based on Markov decision-making process model
US20220301024A1 (en) Sequential recommendation method based on long-term and short-term interests
Sharma et al. Collaborative filtering-based recommender system: Approaches and research challenges
CN110222272A (en) A kind of potential customers excavate and recommended method
CN103473354A (en) Insurance recommendation system framework and insurance recommendation method based on e-commerce platform
CN102567900A (en) Method for recommending commodities to customers
CN112365283B (en) Coupon issuing method and device, terminal equipment and storage medium
CN111709810A (en) Object recommendation method and device based on recommendation model
CN104268292A (en) Label word library update method of portrait system
CN108717654B (en) Multi-provider cross recommendation method based on clustering feature migration
CN103678518A (en) Method and device for adjusting recommendation lists
Chen et al. Dig users’ intentions via attention flow network for personalized recommendation
CN102073720A (en) FR method for optimizing personalized recommendation results
KR102049777B1 (en) Item recommendation method and apparatus based on user behavior
CN105630946A (en) Big data based field cross recommendation method and apparatus
CN110689402A (en) Method and device for recommending merchants, electronic equipment and readable storage medium
CN113190751B (en) Recommendation method fusing keyword generation
CN117495458B (en) Advertisement online pushing method based on user portrait
Li Accurate digital marketing communication based on intelligent data analysis
CN116957691B (en) Cross-platform intelligent advertisement putting method and system for commodities of e-commerce merchants
CN115860880B (en) Personalized commodity recommendation method and system based on multi-layer heterogeneous graph convolution model
CN104933595A (en) Collaborative filtering recommendation method based on Markov prediction model
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN110347923B (en) Traceable fast fission type user portrait construction method
CN115600009A (en) Deep reinforcement learning-based recommendation method considering future preference of user

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170222