CN106447463A - Commodity recommendation method based on Markov decision-making process model - Google Patents
- Publication number: CN106447463A
- Application number: CN201610920407.5A
- Authority: CN (China)
- Prior art keywords: state, commodity, user, recommendation, item
- Prior art date: 2016-10-21
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Abstract
The invention discloses a commodity recommendation method based on a Markov decision process model, comprising the following steps: 1) a preparation stage; 2) an initial model generation stage: read each subsequence (state) s in the state set C and each commodity r in the commodity set, compute the probability that state s transitions to the successor state s·r, including trMDP(s, r, s·r) when commodity r is recommended and trMDP(s, r′, s·r) when some other commodity r′ is recommended, and generate the state transition function; 3) a recommendation stage: obtain the current user's recent purchase or browsing history; generate the current user's state from these records; obtain the recommendation item that yields the maximum expected return and return it to the current user; record the recommendation item together with the user's purchase or browsing choice, generating a state-recommendation-selection log; and 4) an offline model update stage: update the model offline at a fixed time interval T.
Description
Technical field
The present invention relates to commodity recommendation methods for e-commerce platforms, and more particularly to a commodity recommendation method based on a Markov decision process (MDP) model.
Background art
Commodity recommendation is a technique that recommends commodities of likely interest to a user according to the user's interest characteristics and purchasing behavior. With the continuing expansion of e-commerce, the number and variety of commodities grow rapidly, and a customer must spend a substantial amount of time to find the commodities he or she wants to buy. This process of browsing through large amounts of irrelevant information and products steadily drives away consumers drowning in information overload. To address these problems, commodity recommendation systems, built on large-scale data mining and intelligent decision-making, help e-commerce websites provide effective decision support and information services for their customers' purchases.
A commodity recommendation system discovers regularities in users' behavior and preferences and makes recommendations accordingly. User behavior includes ratings, browsing, purchases, page dwell time, and so on; among these, browsing and purchase logs are the more effective sources of user preferences on an e-commerce platform. At present, the main algorithms used in commodity recommendation systems are association-rule-based recommendation, content-based recommendation, and collaborative filtering.
Recommender systems based on simple association rules are not built on a complete model and therefore cannot effectively capture a user's uncertain points of interest. Content-based recommender systems have limited feature-extraction ability: they cannot discover new resources of interest for a customer, and they produce unsatisfactory recommendations for commodities whose content is difficult to extract. Collaborative filtering systems suffer from the sparsity problem, which can make user-similarity estimates inaccurate, and from scalability problems as the numbers of users and commodities grow.
To make full use of the user browsing and purchase logs available on e-commerce platforms, this invention proposes a commodity recommendation method based on a Markov decision process model, which strengthens the effect of commodity recommendation through a reinforcement learning mechanism.
Content of the invention
The invention provides a novel commodity recommendation method based on a Markov decision process model. Each user's browsing or purchase record entry s is composed of several commodity items r ∈ s. Transition probabilities between adjacent browsing or purchase record entries, together with iterative computation of a state value function, are used to improve the accuracy of commodity recommendation. The method first obtains the users' purchase or browsing records, filters the data to generate training data, preprocesses the data to build a Markov decision process (MDP) model, and iteratively solves the MDP to obtain the optimal recommendation in each state. Recommendations are then made according to the current user's recent purchase or browsing records, user behavior continues to be logged, and the MDP model is periodically updated offline from these logs.
A Markov decision process (MDP) is a widely applicable decision model. An MDP defines a state space S and an action space A for the decision-making agent. The agent's actions affect the environment, causing uncertain state transitions, and the feedback from each action in turn affects the agent's subsequent choice of actions. In the present invention, the user's purchase or browsing sequences are modeled as the states S, the recommendations made on the basis of those records are modeled as the actions A, and commodities are recommended to the user by maximizing expected return according to the state value function. The user's previous purchases or browsing affect the recommendation results, and at the same time the recommendation results affect the user's next browsing or purchase decision. This process iterates continuously until it terminates.
The technical scheme of the invention is a recommendation method based on a Markov decision process model, comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform. The data set consists of two parts: the users' purchase data set (purchase records) and the users' browsing-path records extracted from the web logs;
b) Filter the data to generate training data (a sketch of this filtering step follows this stage). The filtering criteria are: remove commodity items purchased or visited fewer than N times (in the present invention, N is 100), and remove the data of users who purchased or browsed fewer than k commodity items (in the present invention, k = 5);
c) End the preparation stage;
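A minimal sketch of step 1-b, assuming the raw log is a list of (user, item, event-type) records; the thresholds follow the N and k values stated above, and all function and field names are illustrative:

```python
from collections import Counter

N = 100  # minimum occurrences for a commodity item to be kept
K = 5    # minimum number of items in a user's record

def filter_training_data(events):
    """events: list of (user_id, item_id, kind) with kind in {'buy', 'view'}."""
    item_counts = Counter(item for _, item, _ in events)
    # Drop commodity items purchased or visited fewer than N times.
    events = [e for e in events if item_counts[e[1]] >= N]
    user_counts = Counter(user for user, _, _ in events)
    # Drop users with fewer than K purchased/browsed items.
    return [e for e in events if user_counts[e[0]] >= K]
```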
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {ri}. Parse the user logs to generate the user purchase sequence set Cbuy, the user browsing sequence set Cview, and the superset C = Cbuy ∪ Cview, C = {si}. Each subsequence s in the set (called a state) contains k commodity items. If the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set Cbuy; if the last commodity item of s is a browsed commodity, s is placed in the browsing sequence set Cview;
b) Read a state s from the state set C and a commodity r from the commodity set, and compute the probability that state s transitions to the successor state s·r, including trMDP(s, r, s·r) when commodity r is recommended and trMDP(s, r′, s·r) when some other commodity r′ is recommended, generating the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, iteratively compute the optimal recommendation item in that state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browsing records;
b) Generate the current user's state from these records;
c) Obtain the recommendation item that yields the maximum return, and return it to the current user;
d) Record the recommendation item and the user's purchase or browsing choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) At a fixed time interval T, perform an offline model update;
b) End the offline model update stage;
The subsequence s mentioned in step 2-a) is obtained as follows (a sketch follows this block):
1) Obtain a purchase or browsing commodity path x1, x2, …, xn from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x1, …, xk>, <x2, …, xk+1>, …, <xn−k+1, …, xn>. Each subsequence is called a state, and each state contains k commodities;
2) End;
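A minimal sketch of this sliding-window decomposition, with the partition into Cbuy and Cview driven by the event type of each state's last item (names are illustrative):

```python
def decompose_path(path, k):
    """path: list of (item_id, kind) with kind in {'buy', 'view'}, in order.
    Returns (c_buy, c_view): states whose last item was bought vs. browsed."""
    c_buy, c_view = set(), set()
    for i in range(len(path) - k + 1):
        window = path[i:i + k]
        state = tuple(item for item, _ in window)   # <x_i, ..., x_{i+k-1}>
        last_kind = window[-1][1]
        (c_buy if last_kind == "buy" else c_view).add(state)
    return c_buy, c_view

c_buy, c_view = decompose_path(
    [("a", "view"), ("b", "view"), ("c", "buy"), ("d", "view")], k=3)
# ('a','b','c') ends in a purchase -> C_buy; ('b','c','d') -> C_view
```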
The successor state s·r mentioned in step 2-b) is defined as follows:
1) The 1st to (k−1)-th items of the successor state s·r coincide with the 2nd to k-th items of the original state s; that is, for the original state s = <x1, …, xk>, the successor state is s·r = <x2, …, xk, r>;
2) End;
The state transition function mentioned in step 2-b) is generated as follows (a sketch follows this block):
1) From the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as trpredict(s, s·r) = count(<x1, x2, …, xk, r>) / count(<x1, x2, …, xk>), where count(<x1, x2, …, xk>) denotes the number of times the sequence x1, x2, …, xk occurs in the data set C;
2) Taking into account the influence that a commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) trMDP(s, r, s·r) = αs,r · trpredict(s, s·r), where the correction coefficient αs,r is computed from the probability of purchasing commodity r and a very small constant ω, and count(r) denotes the number of times commodity r occurs in the training data;
b) trMDP(s, r′, s·r) = βs,r · trpredict(s, s·r) for r′ ≠ r, where βs,r is a correction coefficient and P(s·r | s) is the probability of choosing commodity r in state s. If the computed β is negative, it is set to a small positive value (a fixed small constant in the present invention), and the probabilities are then re-normalized;
c) End;
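A minimal sketch of this estimation, under one stated assumption: the exact formulas for αs,r and βs,r appear only as images in the original, so the sketch simply boosts the recommended item's transition by an illustrative factor alpha, damps the non-recommended case so the pair stays roughly normalized, clips negatives to a small positive eps, which matches the correction's described behavior but is not the invention's exact rule:

```python
from collections import Counter

def estimate_transitions(paths, k, alpha=1.2, eps=1e-6):
    """Maximum-likelihood tr_predict(s, s.r) = count(s + r) / count(s),
    followed by a recommended/non-recommended correction (assumed form).
    paths: list of item-id lists."""
    state_count, step_count = Counter(), Counter()
    for path in paths:
        for i in range(len(path) - k):
            s = tuple(path[i:i + k])
            state_count[s] += 1
            step_count[(s, path[i + k])] += 1       # state s followed by item r
    tr_predict = {(s, r): c / state_count[s] for (s, r), c in step_count.items()}

    tr_mdp = {}
    for (s, r), p in tr_predict.items():
        boosted = min(alpha * p, 1.0)               # tr when r IS recommended
        beta = (1.0 - boosted) / max(1.0 - p, eps)  # keeps the mass roughly normalized
        damped = max(beta, eps) * p                 # tr when another item is recommended
        tr_mdp[(s, r)] = {"recommended": boosted, "not_recommended": damped}
    return tr_mdp, tr_predict
```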
The computation of the optimal recommendation item mentioned in step 2-e) proceeds as follows (a sketch follows this block):
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Initial policy: π0(s0) = argmaxr∈R Rwd(s0, r), where Rwd(s0, r) denotes the return of recommending r in state s0, and argmaxr∈R Rwd(s0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the current policy, then update the policy:
i. V(s) = Rwd(s, π(s)) + γ · Σs′ trMDP(s, π(s), s′) · V(s′);
ii. π(s) = argmaxr∈R [ Rwd(s, r) + γ · Σs′ trMDP(s, r, s′) · V(s′) ];
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (in the present invention, γ is 0.6). The immediate return value Rwd(s, r) of a state is computed according to the following rules:
i. If the successor state s·r produced by state s and the chosen recommendation item r occurs only in the set Cbuy, then Rwd(s, r) = μ · Reward(r), with μ > 1 (in the present invention, μ is 1.5);
ii. If the successor state s·r produced by state s and the chosen recommendation item r occurs only in the set Cview, then Rwd(s, r) = ν · Reward(r), with ν ∈ [0, 1) (in the present invention, ν is 0.5);
iii. If the successor state s·r produced by state s and the chosen recommendation item r occurs in both Cbuy and Cview, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit on commodity item r as given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy or the number of repetitions reaches the maximum iteration count (200 in the present invention), and generate the optimal recommendation items;
2) End;
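A minimal policy-iteration sketch consistent with the rules above; the constants gamma and max_iter follow the values stated in the text, while the dict-based transition and reward tables (with Rwd assumed precomputed via the μ/ν rules) are illustrative assumptions:

```python
def policy_iteration(states, items, tr, rwd, gamma=0.6, max_iter=200):
    """tr[(s, r)] -> list of (s_next, prob); rwd[(s, r)] -> immediate return.
    Returns the optimal recommendation pi[s] for every state."""
    V = {s: 0.0 for s in states}
    # Initial policy: the item with the highest immediate return in each state.
    pi = {s: max(items, key=lambda r: rwd.get((s, r), 0.0)) for s in states}
    for _ in range(max_iter):
        # One evaluation sweep of the current policy...
        for s in states:
            r = pi[s]
            V[s] = rwd.get((s, r), 0.0) + gamma * sum(
                p * V.get(s2, 0.0) for s2, p in tr.get((s, r), []))
        # ...followed by greedy policy improvement.
        stable = True
        for s in states:
            best = max(items, key=lambda r: rwd.get((s, r), 0.0) + gamma * sum(
                p * V.get(s2, 0.0) for s2, p in tr.get((s, r), [])))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:  # converged to the optimal policy
            break
    return pi
```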
The generation of the user state mentioned in step 3-b) proceeds as follows (a sketch follows this block):
1) If the number of purchase or browsing records is 0 ≤ m < k, generate a user state s0 containing no commodity items, i.e. the dummy state;
2) If the number of purchase or browsing records is m ≥ k, take only the most recent k records and generate a user state s0 containing k commodity items;
3) End;
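A minimal sketch of this state construction; the None return marking the dummy state is an illustrative convention:

```python
def user_state(history, k):
    """history: the user's purchase/browsing item ids, oldest first.
    Returns the last-k-items state, or None for the dummy state (m < k)."""
    if len(history) < k:
        return None                 # dummy state: no commodity items
    return tuple(history[-k:])      # most recent k records

assert user_state(["a", "b"], k=3) is None
assert user_state(["a", "b", "c", "d"], k=3) == ("b", "c", "d")
```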
The acquisition of the maximum-return recommendation item mentioned in step 3-c) proceeds as follows (a sketch follows this block):
1) If the user state s0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s0 appears in the training data, obtain the optimal policy for that state from the model, i.e. the optimal recommendation item r* = π(s0);
3) If the user state s0 does not appear in the training data, search the state set C for the state s* with the highest similarity to the current user state s0, and return the optimal recommendation item corresponding to s* in the model. The state s* is computed as s* = argmaxsi∈C sim(s0, si), where the similarity sim counts position-wise matches between the two states, δ(x, y) is the Kronecker delta (δ(x, y) = 1 if x = y and 0 otherwise), and si^m denotes the m-th element of state si;
4) End;
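A minimal sketch of this fallback lookup; the similarity used here is a plain position-wise Kronecker-delta match count, which is one concrete reading of the image-only formula and could be weighted differently:

```python
def recommend(s0, pi, states, reward, items):
    """pi: state -> optimal item; states: the training state set C;
    reward[r]: the platform's net profit Reward(r)."""
    if s0 is None:                                    # dummy state
        return max(items, key=lambda r: reward[r])    # highest Reward(r)
    if s0 in pi:                                      # state seen in training
        return pi[s0]
    # Unseen state: fall back to the most similar training state s*.
    def sim(a, b):
        return sum(1 for x, y in zip(a, b) if x == y)  # Kronecker-delta matches
    s_star = max(states, key=lambda s: sim(s0, s))
    return pi[s_star]
```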
The state-recommendation-selection log mentioned in step 3-d) is structured as follows (an example record follows this block):
1) In a state-recommendation-selection log entry, the state represents the user's initial state s0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or some other commodity r′, r* ≠ r′), which is one of three types: browse only, buy only, or browse and buy;
2) End;
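A minimal illustration of one such log record (the field names are illustrative, not specified by the invention):

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    state: tuple          # the user's initial state s0
    recommended: str      # the model's optimal recommendation r*
    selected: str         # the item the user actually chose (r* or some r')
    selection_type: str   # 'browse', 'buy', or 'browse_and_buy'

entry = LogEntry(("b", "c", "d"), recommended="e", selected="f",
                 selection_type="browse")
```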
The offline model update mentioned in step 4-b) proceeds as follows (a sketch follows this block):
1) Update the state transition function:
a) If a new state snew is found in the state-recommendation-selection log, update the state set C, the purchase sequence set Cbuy, and the browsing sequence set Cview, and set initial values for the counters Cin(s, r, s·r), Cout(s, r, s·r), and Ctotal(s, s·r) (the initialization formulas, which involve the precision weight ξs, are given as images in the original publication);
where Cin(s, r, s·r) denotes the number of times recommendation item r was accepted in state s; Cout(s, r, s·r) denotes the number of times the user chose commodity item r in state s without r having been recommended; and Ctotal(s, s·r) denotes the total number of times the user chose commodity item r. During initialization, to improve precision, the value ξs is proportional to the number of occurrences of state s in the data; in the present invention, ξs is 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, set its initial values as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user chose r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user chose r in state s, and update the transition function from the accumulated counters (the update formulas, which combine Cin, Cout, and Ctotal with the observed counts, are given as images in the original publication);
2) Update the optimal recommendation item corresponding to each state in the set C; the computation is the same as described in step 2-e);
3) End;
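A minimal sketch of the count-based update. Since the exact blending formulas are image-only, this uses one plausible reading in which the logged acceptance rate is smoothed toward the prior transition value with weight ξs = 10 · count(s); this is an assumption, not the invention's exact rule:

```python
from collections import defaultdict

class OfflineUpdater:
    """Accumulates state-recommendation-selection log counts and re-derives
    the 'recommended' transition probabilities from them."""

    def __init__(self, state_occurrences):
        self.c_in = defaultdict(float)      # r accepted after being recommended
        self.c_total = defaultdict(float)   # r chosen in s, recommended or not
        self.xi = {s: 10.0 * n for s, n in state_occurrences.items()}

    def observe(self, s, recommended, selected):
        self.c_total[(s, selected)] += 1.0
        if selected == recommended:
            self.c_in[(s, selected)] += 1.0

    def tr_recommended(self, s, r, prior):
        """Blend the logged acceptance rate with the prior tr_MDP value,
        weighting the prior by xi_s (assumed smoothing scheme)."""
        xi = self.xi.get(s, 10.0)
        return (self.c_in[(s, r)] + xi * prior) / (self.c_total[(s, r)] + xi)
```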
Description of the drawings
Fig. 1 is the overall workflow diagram of the present invention;
Fig. 2 is the workflow diagram of the recommendation method based on the Markov decision process model of the present invention;
Fig. 3 is the flow chart of state set generation;
Fig. 4 is the workflow diagram of initial model generation;
Fig. 5 is the workflow diagram of the recommendation stage;
Fig. 6 is the workflow diagram of the computation of the optimal recommendation item.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
The present invention is a recommendation method based on a Markov decision process model, intended to improve the effectiveness of recommendation and to provide users with useful and satisfying recommendations. Fig. 1 depicts the processing flow of the present invention. The invention first obtains purchase and browsing records and generates the training data set, then generates the initial recommendation model, then makes commodity recommendations according to the particular user state, and finally performs the model update offline.
In the present invention, the process is divided into four stages: the preparation stage, the initial model generation stage, the recommendation stage, and the offline model update stage, as shown in Fig. 2. The key of the invention is to obtain, from the training data, the maximum-return recommendation item in each generated user state, then to recommend commodities according to the user's recent purchase or browsing records, and to generate a state-recommendation-selection log from the purchase or browsing choices the user makes after receiving the recommendation, for use in the offline model generation.
Step 2-0 is the initial state of the recommendation method based on the Markov decision process model of the present invention;
The preparation stage comprises steps 2-1 and 2-2;
Step 2-1 obtains the users' purchase and browsing records from the e-commerce platform;
Step 2-2 filters the data and generates the training data;
The initial model generation stage comprises steps 2-3, 2-4, 2-5, and 2-6;
Step 2-3 preprocesses the data: it reads the training data and obtains the set R and the set C;
Step 2-4 reads each state and each commodity record;
Step 2-5 computes the transition probability to the successor state s·r and generates the state transition function;
Step 2-6 computes and stores the optimal recommendation item in each state;
The recommendation stage comprises steps 2-7, 2-8, and 2-9;
Step 2-7 obtains the current user's recent purchase or browsing records and generates the user state;
Step 2-8 returns the maximum-return recommendation item according to the user state;
Step 2-9 records the recommendation item and the user's choice, generating the log;
The offline model update stage comprises step 2-10;
Step 2-10 performs the offline model update;
Step 2-11 is the end state.
Fig. 3 is the detailed description of the state set generation process.
Step 3-0 is the start state of state set generation;
Step 3-1 obtains the purchase or browsing commodity paths from the training data;
Step 3-2 decomposes each path into multiple subsequences, each containing k commodities;
Step 3-3 reads a subsequence s;
Step 3-4 places s into the purchase sequence set or the browsing sequence set according to its last item;
Step 3-5 judges whether all subsequences have been processed: if so, go to 3-6; if not, go to 3-3;
Step 3-6 is the end state of state set generation.
Fig. 4 is the detailed description of the initial model generation process.
Step 4-0 is the start state of initial model generation;
Step 4-1 preprocesses the data;
Step 4-2 obtains the training data and generates the commodity set R and the state set S;
Step 4-3 reads a state s;
Step 4-4 reads a commodity r;
Step 4-5 computes the transition probability tr of s -> s·r and generates the state transition function;
Step 4-6 judges whether all commodities have been processed: if so, go to 4-7; if not, go to 4-4;
Step 4-7 judges whether all states have been processed: if so, go to 4-8; if not, go to 4-3;
Step 4-8 reads a state, computes the optimal recommendation item, and stores it;
Step 4-9 judges whether all states have been processed: if so, go to 4-10; if not, go to 4-8;
Step 4-10 is the end state of the initial model generation process.
Fig. 5 is the detailed description of the recommendation process.
Step 5-0 is the start state of the recommendation process;
Step 5-1 obtains the current user's recent purchase or browsing records;
Step 5-2 obtains the maximum-return recommendation item and returns it to the user;
Step 5-3 records the recommendation item and the user's purchase or browsing choice, generating the state-recommendation-selection log;
Step 5-4 is the end state of the recommendation process.
Fig. 6 is the detailed description of the computation of the optimal recommendation item.
Step 6-0 is the start state of the computation of the optimal recommendation item;
Step 6-1 sets the initial policy;
Step 6-2 computes the value function from the current policy;
Step 6-3 updates the policy;
Step 6-4 judges whether the policy has converged to the optimal policy or the maximum iteration count has been reached: if so, go to 6-5; if not, go to 6-2;
Step 6-5 generates the optimal recommendation items;
Step 6-6 is the end state of the computation of the optimal recommendation item.
Claims (3)
1. A commodity recommendation method based on a Markov decision process model, characterized by comprising the following steps:
1) Preparation stage
a) Obtain a data set from the e-commerce platform, the data set comprising two parts: the users' purchase data set (purchase records) and the users' browsing-path records extracted from the web logs;
b) Filter the data to generate training data, the filtering criteria being: remove commodity items purchased or visited fewer than N times (in the present invention, N is 100), and remove the data of users whose purchase or browsing records contain only one commodity item;
c) End the preparation stage;
2) Initial model generation stage
a) Read the training data generated in step 1-b); parse and record each commodity, denoted r; all commodities form the set R, R = {ri}. Parse the user logs to generate the user purchase sequence set Cbuy, the user browsing sequence set Cview, and the superset C = Cbuy ∪ Cview, C = {si}. Each subsequence s (called a state) in the superset C contains k commodity items. If the last commodity item of s is a purchased commodity, s is placed in the purchase sequence set Cbuy; if the last commodity item of s is a browsed commodity, s is placed in the browsing sequence set Cview;
b) Read a subsequence, i.e. a state s, from the state set C and a commodity r from the commodity set, and compute the probability that state s transitions to the successor state s·r, including trMDP(s, r, s·r) when commodity r is recommended and trMDP(s, r′, s·r) when some other commodity r′ is recommended, generating the state transition function;
c) Repeat step b) until all commodities in the commodity set have been processed;
d) Repeat steps b) and c) until all states in the state set have been processed;
e) Read a state s from the state set C, compute the optimal recommendation item in that state, and store it;
f) Repeat step e) until all states in the set have been processed;
g) End the model generation stage;
3) Recommendation stage
a) Obtain the current user's recent purchase or browsing records;
b) Generate the current user's state from these records;
c) Obtain the recommendation item that yields the maximum return, and return it to the current user;
d) Record the recommendation item and the user's purchase or browsing choice, generating a state-recommendation-selection log;
e) Repeat steps a), b), c) until the current user exits the session;
f) End the recommendation stage;
4) Offline model update stage
a) At a fixed time interval T, perform an offline model update;
b) End the offline model update stage;
wherein the subsequence s mentioned in step 2-a) is obtained as follows:
1) Obtain a purchase or browsing commodity path x1, x2, …, xn from the training data (n is the total number of commodities purchased or browsed), and decompose the path in order into multiple subsequences <x1, …, xk>, <x2, …, xk+1>, …, <xn−k+1, …, xn>; each subsequence is called a state, and each state contains k commodities;
2) End;
wherein the successor state s·r mentioned in step 2-b) is defined as follows:
1) The 1st to (k−1)-th items of the successor state s·r coincide with the 2nd to k-th items of the original state s; that is, for the original state s = <x1, …, xk>, the successor state is s·r = <x2, …, xk, r>;
2) End;
wherein the state transition function mentioned in step 2-b) is generated as follows:
1) From the state set C in the training data, compute the initial transition probability by maximum likelihood estimation as trpredict(s, s·r) = count(<x1, x2, …, xk, r>) / count(<x1, x2, …, xk>), where count(<x1, x2, …, xk>) denotes the number of times the sequence x1, x2, …, xk occurs in the data set;
2) Taking into account the influence that a commodity recommendation exerts on the user, correct the initial transition probabilities as follows:
a) trMDP(s, r, s·r) = αs,r · trpredict(s, s·r), where the correction coefficient αs,r is computed from the probability of purchasing commodity r and a very small constant ω, and count(r) denotes the number of times commodity r occurs in the training data;
b) trMDP(s, r′, s·r) = βs,r · trpredict(s, s·r) for r′ ≠ r, where βs,r is a correction coefficient and P(s·r | s) is the probability of choosing commodity r in state s; if the computed β is negative, it is set to a small positive value, and the probabilities are then re-normalized;
c) End;
wherein the computation of the optimal recommendation item mentioned in step 2-e) proceeds as follows:
1) Solve for the optimal recommendation item r = π(s) by policy iteration, as follows:
a) Initial recommendation policy: π0(s0) = argmaxr∈R Rwd(s0, r), where Rwd(s0, r) denotes the return of recommending r in state s0, and argmaxr∈R Rwd(s0, r) selects, among all recommendations r, the one with the maximum return value;
b) Compute the value function from the current policy, then update the policy:
i. V(s) = Rwd(s, π(s)) + γ · Σs′ trMDP(s, π(s), s′) · V(s′);
ii. π(s) = argmaxr∈R [ Rwd(s, r) + γ · Σs′ trMDP(s, r, s′) · V(s′) ];
where V(s) is the value function of state s and γ ∈ [0, 1) is the discount factor (in the present invention, γ is 0.6); the immediate return value Rwd(s, r) of a state is computed according to the following rules:
i. If the successor state s·r produced by state s and the chosen recommendation item r occurs only in the set Cbuy, then Rwd(s, r) = μ · Reward(r), with μ > 1 (in the present invention, μ is 1.5);
ii. If the successor state s·r produced by state s and the chosen recommendation item r occurs only in the set Cview, then Rwd(s, r) = ν · Reward(r), with ν ∈ [0, 1) (in the present invention, ν is 0.5);
iii. If the successor state s·r produced by state s and the chosen recommendation item r occurs in both Cbuy and Cview, then Rwd(s, r) = (μ + ν) · Reward(r);
where Reward(r) is the net profit on commodity item r as given by the e-commerce platform;
c) Repeat step b) until the policy converges to the optimal policy, and generate the optimal recommendation items;
2) End.
2. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that
the generation of the user state mentioned in step 3-b) proceeds as follows:
1) If the number of purchase or browsing records is m = 0, generate a user state s0 containing no commodity items, i.e. the dummy state;
2) If the number of purchase or browsing records is 0 < m < k, generate a user state s0 containing m commodity items;
3) If the number of purchase or browsing records is m ≥ k, take only the most recent k records and generate a user state s0 containing k commodity items;
4) End;
wherein the acquisition of the maximum-return recommendation item mentioned in step 3-c) proceeds as follows:
1) If the user state s0 is the dummy state, recommend the commodity item r with the highest Reward(r) value;
2) If the user state s0 appears in the training data, obtain the optimal policy for that state from the model, i.e. the optimal recommendation item r* = π(s0);
3) If the user state s0 does not appear in the training data, search the state set C for the state s* with the highest similarity to the current user state s0, and return the optimal recommendation item corresponding to s* in the model; the state s* is computed as s* = argmaxsi∈C sim(s0, si), where the similarity sim counts position-wise matches between the two states, δ(x, y) is the Kronecker delta, and si^m denotes the m-th element of state si;
4) End;
wherein the state-recommendation-selection log mentioned in step 3-d) is structured as follows:
1) In a state-recommendation-selection log entry, the state represents the user's initial state s0; the recommendation represents the optimal recommendation r* obtained from the model; and the selection represents the user's next choice (selecting r* or some other commodity r′, r* ≠ r′), which is one of three types: browse only, buy only, or browse and buy;
2) End.
3. The commodity recommendation method based on a Markov decision process model according to claim 1, characterized in that
the offline model update mentioned in step 4-b) proceeds as follows:
1) Update the state transition function:
a) If a new state snew is found in the state-recommendation-selection log, update the state set C, the purchase sequence set Cbuy, and the browsing sequence set Cview, and set initial values for the counters Cin(s, r, s·r), Cout(s, r, s·r), and Ctotal(s, s·r) (the initialization formulas are given as images in the original publication);
where Cin(s, r, s·r) denotes the number of times recommendation item r was accepted in state s; Cout(s, r, s·r) denotes the number of times the user chose commodity item r in state s without r having been recommended; and Ctotal(s, s·r) denotes the total number of times the user chose commodity item r; during initialization, to improve precision, the value ξs is proportional to the number of occurrences of state s in the data; in the present invention, ξs is 10 · count(s);
b) Choose a state s in the set C; if this state is being updated offline for the first time, set its initial values as described in step a); otherwise proceed to the next step c);
c) From the user state-recommendation-selection log, record the number of times count(s, r, s·r) that the user chose r after commodity r was recommended in state s, and the total number of times count(s, s·r) that the user chose r in state s, and update the transition function from the accumulated counters (the update formulas combining Cin, Cout, and Ctotal with the observed counts are given as images in the original publication);
2) Update the optimal recommendation item corresponding to each state in the set C; the computation is the same as described in step 2-e);
3) End.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610920407.5A | 2016-10-21 | 2016-10-21 | Commodity recommendation method based on Markov decision-making process model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106447463A (en) | 2017-02-22 |
Family
ID=58176526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610920407.5A | Commodity recommendation method based on Markov decision-making process model | 2016-10-21 | 2016-10-21 |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997488A (en) * | 2017-03-22 | 2017-08-01 | 扬州大学 | A kind of action knowledge extraction method of combination markov decision process |
CN107885774B (en) * | 2017-09-29 | 2020-11-20 | 北京京东尚科信息技术有限公司 | Data processing method and system |
CN107885774A (en) * | 2017-09-29 | 2018-04-06 | 北京京东尚科信息技术有限公司 | Data processing method and system |
CN109697255A (en) * | 2017-10-23 | 2019-04-30 | 中国科学院沈阳自动化研究所 | A kind of Personalize News jettison system and method based on automatic measure on line |
CN109840796B (en) * | 2017-11-24 | 2021-08-24 | 财团法人工业技术研究院 | Decision factor analysis device and decision factor analysis method |
TWI645350B (en) * | 2017-11-24 | 2018-12-21 | 財團法人工業技術研究院 | Decision factors analyzing device and decision factors analyzing method |
CN109840796A (en) * | 2017-11-24 | 2019-06-04 | 财团法人工业技术研究院 | Decision factor analytical equipment and decision factor analysis method |
US10572929B2 (en) | 2017-11-24 | 2020-02-25 | Industrial Technology Research Institute | Decision factors analyzing device and decision factors analyzing method |
CN109858985A (en) * | 2017-11-30 | 2019-06-07 | 阿里巴巴集团控股有限公司 | Merchandise news processing, the method shown and device |
CN108092891A (en) * | 2017-12-07 | 2018-05-29 | 重庆邮电大学 | A kind of data dispatching method based on markov decision process |
CN110020168A (en) * | 2017-12-27 | 2019-07-16 | 艾迪普(北京)文化科技股份有限公司 | A kind of three-dimensional material recommended method based on big data |
CN110413867A (en) * | 2018-04-28 | 2019-11-05 | 第四范式(北京)技术有限公司 | Method and system for commending contents |
CN109472629A (en) * | 2018-05-14 | 2019-03-15 | 口口相传(北京)网络技术有限公司 | It is a kind of configuration and displaying favor information method and device and electronics and storage equipment |
CN109062919A (en) * | 2018-05-31 | 2018-12-21 | 腾讯科技(深圳)有限公司 | A kind of content recommendation method and device based on deeply study |
CN110708469A (en) * | 2018-07-10 | 2020-01-17 | 北京地平线机器人技术研发有限公司 | Method and device for adapting exposure parameters and corresponding camera exposure system |
CN111222931A (en) * | 2018-11-23 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Product recommendation method and system |
CN111222931B (en) * | 2018-11-23 | 2023-05-05 | 阿里巴巴集团控股有限公司 | Product recommendation method and system |
CN109493195B (en) * | 2018-12-24 | 2021-07-30 | 成都品果科技有限公司 | Double-gathering recommendation method and system based on reinforcement learning |
CN109493195A (en) * | 2018-12-24 | 2019-03-19 | 成都品果科技有限公司 | A kind of double focusing class recommendation method and system based on intensified learning |
CN111401937A (en) * | 2020-02-26 | 2020-07-10 | 平安科技(深圳)有限公司 | Data pushing method and device and storage medium |
WO2021169218A1 (en) * | 2020-02-26 | 2021-09-02 | 平安科技(深圳)有限公司 | Data pushing method and system, electronic device and storage medium |
CN114444698A (en) * | 2022-01-28 | 2022-05-06 | 腾讯科技(深圳)有限公司 | Information recommendation model training method and device, computer equipment and storage medium |
CN115270004A (en) * | 2022-09-28 | 2022-11-01 | 云南师范大学 | Education resource recommendation method based on field factor decomposition |
CN115270004B (en) * | 2022-09-28 | 2023-10-27 | 云南师范大学 | Educational resource recommendation method based on field factor decomposition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106447463A (en) | Commodity recommendation method based on Markov decision-making process model | |
US20220301024A1 (en) | Sequential recommendation method based on long-term and short-term interests | |
Sharma et al. | Collaborative filtering-based recommender system: Approaches and research challenges | |
CN110222272A (en) | A kind of potential customers excavate and recommended method | |
CN103473354A (en) | Insurance recommendation system framework and insurance recommendation method based on e-commerce platform | |
CN102567900A (en) | Method for recommending commodities to customers | |
CN112365283B (en) | Coupon issuing method and device, terminal equipment and storage medium | |
CN111709810A (en) | Object recommendation method and device based on recommendation model | |
CN104268292A (en) | Label word library update method of portrait system | |
CN108717654B (en) | Multi-provider cross recommendation method based on clustering feature migration | |
CN103678518A (en) | Method and device for adjusting recommendation lists | |
Chen et al. | Dig users’ intentions via attention flow network for personalized recommendation | |
CN102073720A (en) | FR method for optimizing personalized recommendation results | |
KR102049777B1 (en) | Item recommendation method and apparatus based on user behavior | |
CN105630946A (en) | Big data based field cross recommendation method and apparatus | |
CN110689402A (en) | Method and device for recommending merchants, electronic equipment and readable storage medium | |
CN113190751B (en) | Recommendation method fusing keyword generation | |
CN117495458B (en) | Advertisement online pushing method based on user portrait | |
Li | Accurate digital marketing communication based on intelligent data analysis | |
CN116957691B (en) | Cross-platform intelligent advertisement putting method and system for commodities of e-commerce merchants | |
CN115860880B (en) | Personalized commodity recommendation method and system based on multi-layer heterogeneous graph convolution model | |
CN104933595A (en) | Collaborative filtering recommendation method based on Markov prediction model | |
CN111429214B (en) | Transaction data-based buyer and seller matching method and device | |
CN110347923B (en) | Traceable fast fission type user portrait construction method | |
CN115600009A (en) | Deep reinforcement learning-based recommendation method considering future preference of user |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 2017-02-22 |