CN109471963A - A recommendation algorithm based on deep reinforcement learning - Google Patents
A recommendation algorithm based on deep reinforcement learning
- Publication number
- CN109471963A (application number CN201811070447.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- value
- current state
- action
- next state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Abstract
The present invention proposes a recommendation algorithm based on deep reinforcement learning. It constructs a dual-network model consisting of a MainNet neural network and a TargetNet neural network, where the MainNet is the main neural network, used to obtain the recommendation list for the user, and the TargetNet is used to train the model parameters, obtain the optimal model parameters, and continually update them. The current state used as the input of the MainNet includes not only long-term user features but also external condition features, laying a foundation for accurately predicting user shopping behavior. The invention overcomes a shortcoming of conventional machine learning in that no accumulation of historical data is needed: as long as the website has transaction activity, the algorithm can learn, optimize, and improve by itself.
Description
Technical field
The present invention relates to the field of recommendation methods, and more particularly to a recommendation algorithm based on deep reinforcement learning.
Background technique
At present, analyzing user behavior so that the system can "guess" which items interest a user, thereby improving the user experience, is a major piece of system engineering. Common recommendation algorithms include collaborative filtering, content-based recommendation, and association-rule-based recommendation, and these algorithms have the following problems: (1) the cold-start problem: it is difficult to decide which items to recommend to a new user, or to which users a new item should be recommended, so new users and new items cannot receive scientifically sound recommendations; (2) the long-tail phenomenon: popular items are recommended heavily while less popular items are recommended rarely, so popular items become ever more popular and unpopular items ever less popular; the recommender system also cannot recommend novel items and cannot bring the user pleasant surprises; (3) protection of user privacy: a recommender system needs the user's historical behavior information and even demographic attributes; a system that cannot protect user privacy well makes users feel insecure and unwilling to provide personal information, and may even become unable to produce effective recommendations; (4) common recommendation algorithms belong to the field of machine learning and require large accumulated historical data sets, which is unrealistic for an e-commerce system that has just gone online and has no user or transaction data.
Summary of the invention
To overcome at least one of the above drawbacks of the prior art, the present invention provides a recommendation algorithm based on deep reinforcement learning.
To solve the above technical problems, the technical solution of the present invention is as follows:
A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: Initialize the experience pool and set its capacity W. The experience pool is the set of product-recommendation actions and is used to store training samples; it is empty before training starts;
S2: Build the MainNet neural network and initialize it; the MainNet is the main neural network, used to obtain the recommendation list;
S3: Build the TargetNet neural network and initialize it; the TargetNet is used to train the model parameters and obtain the optimal model parameters;
S4: Set the training batch size M;
S5: Initialize the current state s_t from the N products the user browsed most recently at time t; if the user has browsed no products recently, use N popular products instead;
S6: Use the current state s_t as the input of the MainNet to obtain the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t, where s_t is the current state, a_t is the executed action, and θ_u is the parameter of the MainNet;
S7: According to the executed action a_t, the user clicks, purchases, or ignores according to his or her own interest; compute the reward r_t and the next state s_{t+1};
S8: Store the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool;
S9: Repeat steps S6-S8 until W training samples are stored in the experience pool;
S10: Randomly take M training samples from the experience pool and use the next state s_{t+1} of each sample as the input of the TargetNet to obtain the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1};
S11: Update the parameter θ_u of the MainNet;
S12: Every C rounds of iteration, where C is a preset iteration count, copy the parameters of the MainNet to the TargetNet.
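By way of illustration only (this sketch is not part of the claimed method), steps S1-S12 can be put together as the following minimal training loop. PyTorch is assumed; the state dimension, the number of optional actions, the network architecture, the small pool capacity, and the stub user-feedback function are illustrative assumptions rather than details given by the invention.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 15, 100    # assumed sizes, not fixed by the patent
W, M, C, GAMMA = 500, 64, 5, 0.9  # small pool for the sketch; the embodiment uses W = 100000

def make_net():
    # Assumed architecture: the patent only requires MainNet and TargetNet to
    # map a state to a Q-value list over the optional actions.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

main_net, target_net = make_net(), make_net()          # S2, S3
target_net.load_state_dict(main_net.state_dict())
optimizer = torch.optim.Adam(main_net.parameters())
pool = deque(maxlen=W)                                 # S1: experience pool of capacity W

def user_feedback(state, action):
    # S7 stub: on a live site this would observe the user's click/purchase/ignore
    # and return the reward r_t and the next state s_{t+1}.
    return 0.0, torch.randn(STATE_DIM)

def train_step():
    batch = random.sample(list(pool), M)               # S10: sample M tuples
    s  = torch.stack([b[0] for b in batch])
    a  = torch.tensor([b[1] for b in batch])
    r  = torch.tensor([b[2] for b in batch])
    s2 = torch.stack([b[3] for b in batch])
    with torch.no_grad():                              # TargetNet supplies TargetQ
        target_q = r + GAMMA * target_net(s2).max(dim=1).values
    q = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s_t, a_t, θ_u)
    loss = nn.functional.mse_loss(q, target_q)                # the S112 loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()  # S11: update θ_u

state = torch.randn(STATE_DIM)                         # S5 stub initial state
for step in range(1, 2001):
    with torch.no_grad():
        q_list = main_net(state)                       # S6: Q-value list from MainNet
    action = int(q_list.argmax())                      # execute an optional action
    reward, next_state = user_feedback(state, action)  # S7
    pool.append((state, action, reward, next_state))   # S8
    if len(pool) == W:                                 # S9: pool filled with W samples
        train_step()                                   # S10-S11
    if step % C == 0:                                  # S12: copy θ_u every C rounds
        target_net.load_state_dict(main_net.state_dict())
    state = next_state
```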
Further, the initialization described in step S2 comprises: initializing the parameter θ_u; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t.
Further, the current state s_t is expressed as follows:
s_t = (b_t^1, b_t^2, ..., b_t^N, sex, holiday, month, day, weather)
where b_t^i is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day, and weather respectively indicate whether it is a festival or holiday, the current month, the day, and the weather.
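By way of illustration, assembling s_t might look as follows; the integer encodings of the context fields, the zero padding, and the popular-item fallback list are assumptions for the sketch, with only the overall layout and the S5 fallback rule taken from the text above.

```python
def build_state(browsed, popular, sex, holiday, month, day, weather, N=10):
    """Assemble s_t = (b_t^1, ..., b_t^N, sex, holiday, month, day, weather)."""
    items = browsed[-N:] if browsed else popular[:N]  # S5: fall back to N popular products
    items = items + [0] * (N - len(items))            # pad to length N (an assumption)
    return items + [sex, holiday, month, day, weather]

# Example: a new user with no browsing history on 2018-09-13, clear weather.
s_t = build_state(browsed=[],
                  popular=[101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
                  sex=1, holiday=0, month=9, day=13, weather=2)
```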
Further, the initialization described in step S3 comprises: initializing the parameter θ_u'; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1}.
Further, step S6 specifically comprises the following steps:
S61: Compute the Q values of the optional actions under the current state s_t by the following formula:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, and R_{t+1}, R_{t+2}, and R_{t+3} are the reward-function values at times t+1, t+2, and t+3; E_π is the return value when Q(s, a, θ_u) is maximal and is a state decision function;
S62: Generate the Q-value list Q(s_t, a_t, θ_u) of the optional actions under the current state s_t.
Further, the executed action a_t described in step S7 is expressed as follows:
a_t = (p^1, p^2, ..., p^K)
where K is the number of products recommended to the user and p^i is the i-th product recommended to the user.
Further, the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the product the user purchased; in all other cases, the reward is 0.
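Expressed as code, this reward rule might read as follows; the feedback record and its field names are hypothetical stand-ins for whatever the site actually logs.

```python
def reward(feedback):
    """r_t per S7: click -> number of clicked products, purchase -> price, otherwise 0."""
    if feedback["type"] == "click":
        return float(len(feedback["items"]))  # number of products the user clicked
    if feedback["type"] == "purchase":
        return float(feedback["price"])       # price of the purchased product
    return 0.0                                # the user ignored the recommendation

# reward({"type": "click", "items": [12, 57]})               -> 2.0
# reward({"type": "purchase", "items": [57], "price": 19.9}) -> 19.9
```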
Further, step S10 specifically comprises the following steps:
S101: Randomly take M training samples from the experience pool;
S102: Compute the Q values of the optional actions under the next state s_{t+1} by the following formula:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + ... | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, and R_{t+2}, R_{t+3}, and R_{t+4} are the reward-function values at times t+2, t+3, and t+4;
S103: Generate the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under the next state s_{t+1}.
Further, step S11 specifically comprises the following steps:
S111: Compute the TargetQ value under the current state s_t by the following formula:
TargetQ = r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u')
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u' is the parameter of the TargetNet, and Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet;
S112: Compute the loss function, and update the parameter θ_u of the MainNet so that the loss function reaches its minimum. The loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t, θ_u))²]
       = E[(r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u') - Q(s_t, a_t, θ_u))²]
where E denotes the expectation, r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t, θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet, and θ_u' is the parameter of the TargetNet.
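A toy numeric check of S111 and S112 for a single transition, with hand-picked Q values and an assumed discount factor γ = 0.9 (the patent does not fix γ); minimizing the expectation of this squared error over sampled batches is what drives the S11 update of θ_u.

```python
import numpy as np

gamma = 0.9
r_t = 5.0                              # e.g. the user clicked 5 products
q_next = np.array([1.2, 3.4, 2.0])     # Q(s_{t+1}, a_{t+1}, θ_u') from the TargetNet
target_q = r_t + gamma * q_next.max()  # S111: TargetQ = 5.0 + 0.9 * 3.4 = 8.06
q_main = 7.5                           # Q(s_t, a_t, θ_u) from the MainNet
sq_error = (target_q - q_main) ** 2    # one sample of the S112 loss: 0.56**2 = 0.3136
print(target_q, sq_error)
```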
Further, sex in the current state s_t is a long-term feature of the user, used to distinguish different groups; different user groups may make different choices under the same recommendation list. holiday, month, day, and weather in the current state s_t are external condition features; different external conditions can change a user's shopping behavior to a large extent, for example users are more active during festivals and holidays.
Further, when the user neither clicks nor purchases in response to the executed action a_t, the recommendation list remains unchanged; when the user clicks or purchases in response to a_t, the recommendation list changes: the earlier browsed products at the front of the list are removed and the products that have just received a click or purchase are filled in.
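A minimal sketch of this update rule, assuming the recently browsed products of the state are kept as a fixed-length list (the list and item IDs are illustrative):

```python
def update_recent(recent, acted_items, N=10):
    """Drop the oldest browsed products and append those just clicked or purchased."""
    if not acted_items:            # no click and no purchase: list unchanged
        return recent
    merged = recent + acted_items  # products that just got a click/purchase enter
    return merged[-N:]             # keep the N most recent, removing the front

# update_recent([1, 2, 3], [9], N=3) -> [2, 3, 9]
```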
Compared with the prior art, the beneficial effects of the technical solution of the present invention are: (1) on the data side, the recommendation algorithm based on deep reinforcement learning overcomes the shortcomings of conventional machine learning and needs no historical data; as long as the website has transaction activity, it can gradually learn, self-optimize, and self-improve; (2) by using the Q-value list of the optional actions, the correlation between items is fully taken into account, and the executed action is defined as the list of items recommended to the user; (3) the dual-network structure comprising the MainNet neural network and the TargetNet neural network improves the stability of the algorithm.
Description of the drawings
Fig. 1 is a flowchart of the recommendation algorithm based on deep reinforcement learning of the present invention.
Specific embodiment
The attached figures are only for illustrative purposes and cannot be understood as limiting this patent;
To better illustrate this embodiment, certain components in the figures are omitted, enlarged, or reduced, and do not represent the size of the actual product;
For those skilled in the art, it is understandable that certain known structures and their explanations may be omitted from the figures.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
In conjunction with Fig. 1, the specific implementation steps of the invention are as follows:
A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: Initialize the experience pool and set its capacity W = 100000. The experience pool is the set of product-recommendation actions and is used to store training samples; it is empty before training starts;
S2: Build the MainNet neural network and initialize it; the MainNet is the main neural network, used to obtain the recommendation list. The initialization comprises: initializing the network parameter θ_u from a standard normal distribution; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t;
S3: Build the TargetNet neural network and initialize it; the TargetNet is used to train the model parameters and obtain the optimal model parameters. The initialization comprises: initializing the parameter θ_u' from a standard normal distribution; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1};
S4: Set the training batch size M = 64;
S5: Initialize the current state s_t from the N products the user browsed most recently at time t, where N = 10; if the user has browsed no products recently, use N popular products instead;
S6: Use the current state s_t as the input of the MainNet to obtain the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t;
S7: According to the executed action a_t, the user clicks, purchases, or ignores according to his or her own interest; compute the reward r_t and the next state s_{t+1};
S8: Store the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool;
S9: Repeat steps S6-S8 until W training samples are stored in the experience pool, where W = 100000;
S10: Randomly take M training samples from the experience pool, where M = 64, and use the next state s_{t+1} of each sample as the input of the TargetNet to obtain the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1};
S11: Update the parameter θ_u of the MainNet;
S12: Every C rounds of iteration, where C = 5, copy the parameters of the MainNet to the TargetNet.
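Collected in one place, the concrete hyperparameter values of this embodiment are:

```python
# Hyperparameters stated in the embodiment above (the grouping into a dict is
# merely for readability and is not part of the invention).
CONFIG = {
    "W": 100_000,  # experience-pool capacity (S1)
    "M": 64,       # training samples drawn per batch (S4, S10)
    "N": 10,       # recently browsed products kept in the state (S5)
    "C": 5,        # rounds between MainNet-to-TargetNet parameter copies (S12)
}
```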
Specifically, the current state s_t is expressed as follows:
s_t = (b_t^1, b_t^2, ..., b_t^N, sex, holiday, month, day, weather)
where b_t^i is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day, and weather respectively indicate whether it is a festival or holiday, the current month, the day, and the weather.
Specifically, step S6 comprises the following steps:
S61: Compute the Q values of the optional actions under the current state s_t by the following formula:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, and R_{t+1}, R_{t+2}, and R_{t+3} are the reward-function values at times t+1, t+2, and t+3; E_π is the return value when Q(s, a, θ_u) is maximal and is a state decision function;
S62: Generate the Q-value list Q(s_t, a_t, θ_u) of the optional actions under the current state s_t.
Specifically, the executed action a_t described in step S7 is expressed as follows:
a_t = (p^1, p^2, ..., p^K)
where K is the number of products recommended to the user and p^i is the i-th product recommended to the user.
Specifically, the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the product the user purchased; in all other cases, the reward is 0.
Specifically, step S10 comprises the following steps:
S101: Randomly take M training samples from the experience pool, where M = 64;
S102: Compute the Q values of the optional actions under the next state s_{t+1} by the following formula:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + ... | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, and R_{t+2}, R_{t+3}, and R_{t+4} are the reward-function values at times t+2, t+3, and t+4;
S103: Generate the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under the next state s_{t+1}.
Specifically, step S11 comprises the following steps:
S111: Compute the TargetQ value under the current state s_t by the following formula:
TargetQ = r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u')
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u' is the parameter of the TargetNet, and Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet;
S112: Compute the loss function, and update the parameter θ_u of the MainNet so that the loss function reaches its minimum. The loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t, θ_u))²]
       = E[(r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u') - Q(s_t, a_t, θ_u))²]
where E denotes the expectation, r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t, θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet, and θ_u' is the parameter of the TargetNet.
Specifically, sex in the current state s_t is a long-term feature of the user, used to distinguish different groups; different user groups may make different choices under the same recommendation list. holiday, month, day, and weather in the current state s_t are external condition features; different external conditions can change a user's shopping behavior to a large extent, for example users are more active during festivals and holidays.
Specifically, when the user neither clicks nor purchases in response to the executed action a_t, the recommendation list remains unchanged; when the user clicks or purchases in response to a_t, the recommendation list changes: the earlier browsed products at the front of the list are removed and the products that have just received a click or purchase are filled in.
The same or similar reference signs correspond to the same or similar components;
The terms describing positional relationships in the figures are only for illustration and should not be understood as limiting this patent;
Obviously, the above embodiment of the present invention is merely an example given to clearly illustrate the invention and is not a limitation on the embodiments of the present invention. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall be included within the protection scope of the claims of the present invention.
Claims (9)
1. A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: Initialize the experience pool and set its capacity W. The experience pool is the set of product-recommendation actions and is used to store training samples; it is empty before training starts;
S2: Build the MainNet neural network and initialize it; the MainNet is the main neural network, used to obtain the recommendation list;
S3: Build the TargetNet neural network and initialize it; the TargetNet is used to train the model parameters and obtain the optimal model parameters;
S4: Set the training batch size M;
S5: Initialize the current state s_t from the N products the user browsed most recently at time t; if the user has browsed no products recently, use N popular products instead;
S6: Use the current state s_t as the input of the MainNet to obtain the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t, where s_t is the current state, a_t is the executed action, and θ_u is the parameter of the MainNet;
S7: According to the executed action a_t, the user clicks, purchases, or ignores according to his or her own interest; compute the reward r_t and the next state s_{t+1};
S8: Store the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool;
S9: Repeat steps S6-S8 until W training samples are stored in the experience pool;
S10: Randomly take M training samples from the experience pool and use the next state s_{t+1} of each sample as the input of the TargetNet to obtain the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1};
S11: Update the parameter θ_u of the MainNet;
S12: Every C rounds of iteration, where C is a preset iteration count, copy the parameters of the MainNet to the TargetNet.
2. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the initialization described in step S2 comprises: initializing the parameter θ_u; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t, θ_u) of the optional actions under s_t.
3. The recommendation algorithm based on deep reinforcement learning according to claim 2, characterized in that the current state s_t is expressed as follows:
s_t = (b_t^1, b_t^2, ..., b_t^N, sex, holiday, month, day, weather)
where b_t^i is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day, and weather respectively indicate whether it is a festival or holiday, the current month, the day, and the weather.
4. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the initialization described in step S3 comprises: initializing the parameter θ_u'; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under s_{t+1}.
5. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S6 specifically comprises the following steps:
S61: Compute the Q values of the optional actions under the current state s_t by the following formula:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, and R_{t+1}, R_{t+2}, and R_{t+3} are the reward-function values at times t+1, t+2, and t+3; E_π is the return value when Q(s, a, θ_u) is maximal and is a state decision function;
S62: Generate the Q-value list Q(s_t, a_t, θ_u) of the optional actions under the current state s_t.
6. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the executed action a_t described in step S7 is expressed as follows:
a_t = (p^1, p^2, ..., p^K)
where K is the number of products recommended to the user and p^i is the i-th product recommended to the user.
7. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the product the user purchased; in all other cases, the reward is 0.
8. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S10 specifically comprises the following steps:
S101: Randomly take M training samples from the experience pool;
S102: Compute the Q values of the optional actions under the next state s_{t+1} by the following formula:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + ... | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, and R_{t+2}, R_{t+3}, and R_{t+4} are the reward-function values at times t+2, t+3, and t+4;
S103: Generate the Q-value list Q(s_{t+1}, a_{t+1}, θ_u') of the optional actions under the next state s_{t+1}.
9. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S11 specifically comprises the following steps:
S111: Compute the TargetQ value under the current state s_t by the following formula:
TargetQ = r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u')
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u' is the parameter of the TargetNet, and Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet;
S112: Compute the loss function, and update the parameter θ_u of the MainNet so that the loss function reaches its minimum. The loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t, θ_u))²]
       = E[(r_t + γ max Q(s_{t+1}, a_{t+1}, θ_u') - Q(s_t, a_t, θ_u))²]
where E denotes the expectation, r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t, θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}, θ_u') is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet, and θ_u' is the parameter of the TargetNet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811070447.0A | 2018-09-13 | 2018-09-13 | A recommendation algorithm based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109471963A (en) | 2019-03-15 |
Family
ID=65664609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date | Status |
---|---|---|---|---|
CN201811070447.0A | A recommendation algorithm based on deep reinforcement learning | 2018-09-13 | 2018-09-13 | Pending |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109471963A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
US20170337478A1 (en) * | 2016-05-22 | 2017-11-23 | Microsoft Technology Licensing, Llc | Self-Learning Technique for Training a PDA Component and a Simulated User Component |
CN108230058A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Products Show method and system |
CN108230057A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | A kind of intelligent recommendation method and system |
CN108038545A (en) * | 2017-12-06 | 2018-05-15 | 湖北工业大学 | Fast learning algorithm based on Actor-Critic neutral net continuous controls |
Non-Patent Citations (2)
Title |
---|
Qin Xingchen (秦星辰): "Research and Design of a Recommendation System Based on the RHadoop Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology (Monthly) * |
草帽B-O-Y: "Deep Reinforcement Learning - DQN", CSDN, https://blog.csdn.net/u013236946/article/details/72871858 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109967741A (en) * | 2019-03-29 | 2019-07-05 | 贵州翰凯斯智能技术有限公司 | 3D printing process optimization method based on reinforcement learning |
CN109967741B (en) * | 2019-03-29 | 2021-02-02 | 贵州翰凯斯智能技术有限公司 | 3D printing process optimization method based on reinforcement learning |
CN110135951A (en) * | 2019-05-15 | 2019-08-16 | 网易(杭州)网络有限公司 | Recommendation method and device for game goods, and readable storage medium |
CN111738787A (en) * | 2019-06-13 | 2020-10-02 | 北京京东尚科信息技术有限公司 | Information pushing method and device |
CN110581808A (en) * | 2019-08-22 | 2019-12-17 | 武汉大学 | Congestion control method and system based on deep reinforcement learning |
CN110659947A (en) * | 2019-10-11 | 2020-01-07 | 沈阳民航东北凯亚有限公司 | Commodity recommendation method and device |
CN110838024A (en) * | 2019-10-16 | 2020-02-25 | 支付宝(杭州)信息技术有限公司 | Information pushing method, device and equipment based on deep reinforcement learning |
CN111859099B (en) * | 2019-12-05 | 2021-08-31 | 马上消费金融股份有限公司 | Recommendation method, device, terminal and storage medium based on reinforcement learning |
CN111859099A (en) * | 2019-12-05 | 2020-10-30 | 马上消费金融股份有限公司 | Recommendation method, device, terminal and storage medium based on reinforcement learning |
CN110942208B (en) * | 2019-12-10 | 2023-07-07 | 萍乡市恒升特种材料有限公司 | Method for determining optimal production conditions of silicon carbide foam ceramic |
CN110942208A (en) * | 2019-12-10 | 2020-03-31 | 萍乡市恒升特种材料有限公司 | Method for determining optimal production conditions of silicon carbide foam ceramic |
CN111159558A (en) * | 2019-12-31 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Recommendation list generation method and device and electronic equipment |
CN111159558B (en) * | 2019-12-31 | 2023-07-18 | 支付宝(杭州)信息技术有限公司 | Recommendation list generation method and device and electronic equipment |
CN111309907A (en) * | 2020-02-10 | 2020-06-19 | 大连海事大学 | Real-time Bug assignment method based on deep reinforcement learning |
CN111401937A (en) * | 2020-02-26 | 2020-07-10 | 平安科技(深圳)有限公司 | Data pushing method and device and storage medium |
WO2021169218A1 (en) * | 2020-02-26 | 2021-09-02 | 平安科技(深圳)有限公司 | Data pushing method and system, electronic device and storage medium |
CN111339675A (en) * | 2020-03-10 | 2020-06-26 | 南栖仙策(南京)科技有限公司 | Training method for intelligent marketing strategy based on machine learning simulation environment |
CN112085524A (en) * | 2020-08-31 | 2020-12-15 | 中国人民大学 | Q learning model-based result pushing method and system |
CN112085524B (en) * | 2020-08-31 | 2022-11-15 | 中国人民大学 | Q learning model-based result pushing method and system |
CN112733004A (en) * | 2021-01-22 | 2021-04-30 | 上海交通大学 | Movie and television work recommendation method based on a multi-armed bandit algorithm |
CN112733004B (en) * | 2021-01-22 | 2022-09-30 | 上海交通大学 | Movie and television work recommendation method based on a multi-armed bandit algorithm |
CN117290609A (en) * | 2023-11-24 | 2023-12-26 | 中国科学技术大学 | Product data recommendation method and product data recommendation device |
CN117290609B (en) * | 2023-11-24 | 2024-03-29 | 中国科学技术大学 | Product data recommendation method and product data recommendation device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109471963A (en) | A recommendation algorithm based on deep reinforcement learning | |
CN103678518B (en) | Method and device for adjusting recommendation lists | |
CN108009897A (en) | Real-time commodity recommendation method, system and readable storage medium | |
CN103246980B (en) | Information output method and server | |
CN108153791B (en) | Resource recommendation method and related device | |
CN106447463A (en) | Commodity recommendation method based on Markov decision-making process model | |
CN103886001A (en) | Personalized commodity recommendation system | |
CN102479366A (en) | Commodity recommending method and system | |
CN106168980A (en) | Multimedia resource recommendation ranking method and device | |
CN107145506B (en) | Improved content-based agricultural commodity recommendation method | |
Fainmesser | Community structure and market outcomes: A repeated games-in-networks approach | |
CN104933595A (en) | Collaborative filtering recommendation method based on Markov prediction model | |
CN109034960A (en) | A method of multi-attribute inference based on user node embedding | |
US20160196579A1 (en) | Dynamic deep links based on user activity of a particular user | |
Lu et al. | Research on e-commerce customer repeat purchase behavior and purchase stickiness | |
Flajolet et al. | Real-time bidding with side information | |
Karpenko et al. | The influence of the consumer’s type–physical or digital–on their behavioral characteristics | |
Jiang et al. | Intertemporal pricing via nonparametric estimation: Integrating reference effects and consumer heterogeneity | |
Bergemann et al. | Progressive participation | |
CN110288419A (en) | E-commerce agricultural product recommendation method with dynamically updated weights | |
CN107967627A (en) | Content-based linear regression financial product recommendation method | |
CN113781134A (en) | Item recommendation method and device and computer-readable storage medium | |
Wu et al. | Design of optimal control strategies for a supply chain with competing manufacturers under consignment contract | |
Liu et al. | A semiparametric varying coefficient model of monotone auction bidding processes | |
Kota et al. | Temporal multi-hierarchy smoothing for estimating rates of rare events |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-03-15 |