CN109471963A - A recommendation algorithm based on deep reinforcement learning - Google Patents

A recommendation algorithm based on deep reinforcement learning

Info

Publication number
CN109471963A
Authority
CN
China
Prior art keywords
neural network
value
current state
action
next state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811070447.0A
Other languages
Chinese (zh)
Inventor
陈曦
蓝志坚
余智君
陈卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Feng Shi Technology Co Ltd
Original Assignee
Guangzhou Feng Shi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Feng Shi Technology Co Ltd filed Critical Guangzhou Feng Shi Technology Co Ltd
Priority to CN201811070447.0A priority Critical patent/CN109471963A/en
Publication of CN109471963A publication Critical patent/CN109471963A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0631 Item recommendations

Abstract

The present invention proposes a recommendation algorithm based on deep reinforcement learning. A dual-network structure consisting of a MainNet neural network and a TargetNet neural network is constructed, where the MainNet neural network is the main neural network and is used to obtain the recommendation list for the user, and the TargetNet neural network is used to train the model parameters, obtain the optimal model parameters and continuously update them. The current state used as the input of the MainNet neural network includes not only long-term user features but also external-condition features, which lays the foundation for accurately predicting the user's shopping behavior. The present invention overcomes the shortcomings of conventional machine learning and does not require accumulated historical data: as long as the website has trading activity, the algorithm can achieve self-learning, self-optimization and self-improvement.

Description

A recommendation algorithm based on deep reinforcement learning
Technical field
The present invention relates to the field of recommendation methods, and more particularly to a recommendation algorithm based on deep reinforcement learning.
Background technique
At present, analyzing user behavior so that a system can "guess" the articles a user is interested in, and thereby improve the user experience, is a large and complex system-engineering task. Common recommendation algorithms include collaborative filtering, content-based recommendation algorithms and recommendation algorithms based on association rules, and these algorithms have the following problems: (1) cold-start problem: it is difficult to decide which articles to recommend to a new user and to which users a new article should be recommended, so new users and new articles cannot obtain scientific and reasonable recommendations; (2) long-tail phenomenon: popular articles are recommended heavily while less popular articles are recommended rarely, so popular articles become ever more popular and unpopular articles become ever less visible; the recommender system also cannot recommend novel articles and cannot surprise the user; (3) user privacy protection: the recommender system needs the user's historical behavior information and even demographic attribute information; a system that cannot protect user privacy well makes users feel insecure and unwilling to provide personal information, and may even become unable to provide effective recommendations; (4) common recommendation algorithms belong to the field of machine learning and require a large accumulated historical data set, which is unrealistic for a newly launched e-commerce system that has no accumulated user or transaction data.
Summary of the invention
In order to overcome at least one of the drawbacks of the prior art described above, the present invention provides a recommendation algorithm based on deep reinforcement learning.
In order to solve the above technical problems, the technical scheme of the present invention is as follows:
A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: initialize the experience pool and set its capacity W; the experience pool is a set of product-recommendation actions used to store training samples, and is empty before training starts;
S2: establish the MainNet neural network and initialize it; the MainNet neural network is the main neural network and is used to obtain the recommendation list;
S3: establish the TargetNet neural network and initialize it; the TargetNet neural network is used to train the model parameters and obtain the optimal model parameters;
S4: set the number of training samples M;
S5: initialize the current state s_t from the N products the user browsed most recently at time t; if the user has not browsed any products recently, use N popular products instead;
S6: use the current state s_t as the input of the MainNet neural network to obtain the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t, where s_t is the current state, a_t is the executed action and θ_u is the parameter of the MainNet neural network;
S7: according to the executed action a_t, after the user clicks, purchases or ignores according to his or her own interest, calculate the reward r_t and the next state s_{t+1};
S8: store the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool (a minimal replay-buffer sketch is given after this list);
S9: repeat steps S6-S8 until W training samples are stored in the experience pool;
S10: randomly take M training samples from the experience pool, use the next state s_{t+1} of each training sample as the input of the TargetNet neural network, and obtain the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1};
S11: update the parameter θ_u of the MainNet neural network;
S12: every C rounds of iteration, where C is a preset iteration number, copy the parameters of the MainNet neural network to the TargetNet neural network.
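As a reading aid only, the experience pool of steps S1, S8 and S10 behaves like a standard replay buffer. The following Python sketch is illustrative and not part of the claimed method; the class name ExperiencePool, the tuple layout and the capacity value are assumptions.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity pool of (state, action, reward, next_state) tuples (steps S1, S8, S10)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples are dropped once capacity W is reached

    def store(self, state, action, reward, next_state):
        # Step S8: store one product-recommendation transition.
        self.buffer.append((state, action, reward, next_state))

    def sample(self, m):
        # Step S10: randomly take M training samples from the pool.
        return random.sample(self.buffer, m)

    def __len__(self):
        return len(self.buffer)

# Usage sketch: create the pool before training starts (step S1); W = 100000 follows the embodiment below.
pool = ExperiencePool(capacity=100000)
```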
Further, the initialization described in step S2 includes: initializing the parameter θ_u; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t.
Further, the current state s_t is expressed as follows:
where the i-th entry is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day and weather respectively denote whether it is a holiday, the current time period, the time and the weather.
Further, the initialization described in step S3 includes: initializing the parameter θ_u′; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1}.
Further, step S6 specifically includes the following steps:
S61: calculate the Q values of the optional actions under the current state s_t; the calculation formula is as follows:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, R_{t+1} is the reward function value at time t+1, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, E_π is the return value when Q(s, a, θ_u) is maximal, and π is a state decision function;
S62: generate the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t.
Further, the executed action a_t described in step S7 is expressed as follows:
where K is the number of products recommended to the user and the i-th entry is the i-th product recommended to the user.
Further, the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the products the user purchased; in all other cases, the reward value is 0.
Further, step S10 specifically includes the following steps:
S101: randomly take M training samples from the experience pool;
S102: calculate the Q values of the optional actions under the next state s_{t+1}; the calculation formula is as follows:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + … | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, and R_{t+4} is the reward function value at time t+4;
S103: generate the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1}.
Further, step S11 specifically includes the following steps:
S111: calculate the TargetQ value under the current state s_t; the calculation formula is as follows:
TargetQ = r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′)
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u′ is the parameter of the TargetNet neural network, and Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet neural network;
S112: calculate the loss function and, when the loss function reaches its minimum value, update the parameter θ_u of the MainNet neural network; the loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t; θ_u))²]
       = E[(r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′) - Q(s_t, a_t; θ_u))²]
where E denotes the average (expectation), r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t; θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet neural network, and θ_u′ is the parameter of the TargetNet neural network.
Further, sex in the current state s_t is a long-term feature of the user, used to distinguish different user groups, because different user groups may make different choices under the same recommendation list; holiday, month, day and weather in the current state s_t are external-condition features, and different external conditions can change the user's shopping behavior to a large extent, for example users are more active during holidays.
Further, when the user neither clicks nor purchases in response to the executed action a_t, the recommendation list remains unchanged; when the user clicks or purchases in response to the executed action a_t, the recommendation list changes, that is, the previously browsed products at the front of the recommendation list are removed and the products that were just clicked or purchased are filled in.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are: (1) with respect to data, the recommendation algorithm based on deep reinforcement learning overcomes the shortcomings of conventional machine learning and does not need historical data; as long as the website has trading behavior, it gradually learns, self-optimizes and improves itself; (2) by using the Q-value list of optional actions, the correlation between items is fully taken into account, and the executed action is defined as the list of items recommended to the user; (3) the dual-network structure comprising the MainNet neural network and the TargetNet neural network improves the stability of the algorithm.
Detailed description of the invention
Fig. 1 is a flowchart of a recommendation algorithm based on deep reinforcement learning according to the present invention.
Specific embodiment
The attached figures are only for illustrative purposes and shall not be understood as limiting this patent;
In order to better illustrate this embodiment, certain components in the attached figures are omitted, enlarged or reduced, and do not represent the size of the actual product;
For those skilled in the art, it is understandable that certain known structures and their descriptions may be omitted in the attached figures.
The following further describes the technical solution of the present invention with reference to the accompanying drawings and examples.
In conjunction with Fig. 1, the specific implementation steps of the present invention are as follows:
A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: initialize the experience pool and set its capacity W = 100000; the experience pool is a set of product-recommendation actions used to store training samples, and is empty before training starts;
S2: establish the MainNet neural network and initialize it; the MainNet neural network is the main neural network and is used to obtain the recommendation list; the initialization consists of initializing the network parameter θ_u from a standard normal distribution; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t;
S3: establish the TargetNet neural network and initialize it; the TargetNet neural network is used to train the model parameters and obtain the optimal model parameters; the initialization consists of initializing the parameter θ_u′ from a standard normal distribution; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1};
S4: set the number of training samples M = 64;
S5: initialize the current state s_t from the N products the user browsed most recently at time t, where N = 10; if the user has not browsed any products recently, use N popular products instead;
S6: use the current state s_t as the input of the MainNet neural network to obtain the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t;
S7: according to the executed action a_t, after the user clicks, purchases or ignores according to his or her own interest, calculate the reward r_t and the next state s_{t+1};
S8: store the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool;
S9: repeat steps S6-S8 until W training samples are stored in the experience pool, where W = 100000;
S10: randomly take M training samples from the experience pool, where M = 64, use the next state s_{t+1} of each training sample as the input of the TargetNet neural network, and obtain the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1};
S11: update the parameter θ_u of the MainNet neural network;
S12: every C rounds of iteration, with C = 5, copy the parameters of the MainNet neural network to the TargetNet neural network (an end-to-end training sketch in Python follows this list).
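Purely as a reading aid, the following PyTorch sketch mirrors steps S1-S12 with the hyperparameters of this embodiment (W = 100000, M = 64, C = 5). The network architecture, the candidate-action indexing, the discount factor GAMMA, the optimizer and the env_step placeholder are illustrative assumptions and are not specified by the invention.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 32, 50        # assumed sizes of the state vector and of the candidate-action set
W, M, C, GAMMA = 100000, 64, 5, 0.9    # pool capacity, batch size and copy period from this embodiment; GAMMA is assumed

class QNet(nn.Module):
    """Maps a state to a Q-value list with one entry per optional action (steps S2/S3)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_ACTIONS))
        for p in self.parameters():                  # standard-normal parameter initialization, as in steps S2/S3
            nn.init.normal_(p, mean=0.0, std=1.0)
    def forward(self, s):
        return self.net(s)

main_net, target_net = QNet(), QNet()                # S2, S3: dual-network structure
target_net.load_state_dict(main_net.state_dict())
optimizer = torch.optim.Adam(main_net.parameters(), lr=1e-3)
pool = deque(maxlen=W)                               # S1: experience pool of capacity W

def env_step(state, action):
    """Placeholder for the real user interaction of step S7 (click / purchase / ignore)."""
    reward = 0.0                                     # would follow the reward rule of step S7
    next_state = torch.randn(STATE_DIM)              # would be the updated browsing state
    return reward, next_state

state = torch.randn(STATE_DIM)                       # S5: initial state from recently browsed products
for it in range(1, 20001):
    with torch.no_grad():
        q_list = main_net(state.unsqueeze(0)).squeeze(0)   # S6: Q-value list under the current state
    action = int(torch.argmax(q_list))               # greedy choice among the optional actions
    reward, next_state = env_step(state, action)     # S7
    pool.append((state, action, reward, next_state)) # S8
    state = next_state

    if len(pool) < M:                                # the embodiment first fills the pool to W (S9); simplified here
        continue
    batch = random.sample(pool, M)                   # S10: randomly take M samples
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s1 = torch.stack([b[3] for b in batch])

    with torch.no_grad():                            # S111: TargetQ = r_t + GAMMA * max Q(s_{t+1}, a'; theta_u')
        target_q = r + GAMMA * target_net(s1).max(dim=1).values
    q_sa = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = ((target_q - q_sa) ** 2).mean()           # S112: mean squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                 # S11: update theta_u

    if it % C == 0:                                  # S12: every C rounds copy MainNet parameters to TargetNet
        target_net.load_state_dict(main_net.state_dict())
```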
Specifically, the current state s_t is expressed as follows:
where the i-th entry is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day and weather respectively denote whether it is a holiday, the current time period, the time and the weather (a hedged sketch of assembling such a state vector is given below).
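The text does not give a numerical encoding for s_t; the following sketch only illustrates how the listed fields (the N recently browsed products, sex, holiday, month, day, weather) could be packed into one feature vector. All encodings, embedding sizes and scalings are assumptions made for illustration.

```python
import numpy as np

N = 10                       # number of recently browsed products in this embodiment
ITEM_EMB_DIM = 8             # assumed embedding size per product

def build_state(recent_item_embeddings, sex, holiday, month, day, weather_id, num_weather=5):
    """Concatenate the long-term feature (sex), the browsing history and the
    external-condition features (holiday, month, day, weather) into one state vector."""
    items = np.zeros((N, ITEM_EMB_DIM))
    items[:len(recent_item_embeddings)] = recent_item_embeddings[:N]   # pad with zeros if fewer than N
    weather = np.zeros(num_weather)
    weather[weather_id] = 1.0                                          # one-hot weather encoding
    external = np.array([float(holiday), month / 12.0, day / 31.0])    # simple scaled encodings
    return np.concatenate([items.ravel(), [float(sex)], external, weather])

# Usage sketch: embeddings of the 10 most recent products (popular products would be used for a new user).
state = build_state(np.random.rand(10, ITEM_EMB_DIM), sex=1, holiday=0, month=9, day=13, weather_id=2)
```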
Specifically, step S6 includes the following steps:
S61: calculate the Q values of the optional actions under the current state s_t; the calculation formula is as follows:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, R_{t+1} is the reward function value at time t+1, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, E_π is the return value when Q(s, a, θ_u) is maximal, and π is a state decision function;
S62: generate the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t (a minimal sketch of this forward pass is given after these steps).
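Step S62 can be read as one forward pass of the MainNet that yields a Q value for every optional action, from which the K highest-valued candidate products could form the recommendation list. In the sketch below, the single linear layer standing in for the MainNet, its standard-normal initialization and the candidate-action count are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, K = 32, 50, 5          # assumed dimensions; K products are recommended

main_net = nn.Linear(STATE_DIM, NUM_ACTIONS)   # minimal stand-in for the MainNet neural network
nn.init.normal_(main_net.weight)               # step S2: standard-normal parameter initialization
nn.init.normal_(main_net.bias)

s_t = torch.randn(STATE_DIM)                   # current state (see the state sketch above)
q_list = main_net(s_t)                         # S62: Q-value list Q(s_t, a; theta_u) over the optional actions
topk = torch.topk(q_list, K).indices           # indices of the K highest-valued candidate products
print(q_list.tolist(), topk.tolist())
```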
Specifically, the executed action a_t described in step S7 is expressed as follows:
where K is the number of products recommended to the user and the i-th entry is the i-th product recommended to the user.
Specifically, the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the products the user purchased; in all other cases, the reward value is 0. A small sketch of this reward rule is given below.
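A direct transcription of this reward rule into Python could look as follows; the function name, the argument layout and the precedence of purchases over clicks are illustrative assumptions.

```python
def compute_reward(clicked_items, purchased_prices):
    """Reward r_t per step S7: number of clicked products on a click action,
    price of the purchased products on a purchase action, otherwise 0."""
    if purchased_prices:                      # a purchase action occurred under the current state
        return float(sum(purchased_prices))   # reward = price paid by the user
    if clicked_items:                         # a click action occurred under the current state
        return float(len(clicked_items))      # reward = number of products clicked
    return 0.0                                # the user ignored the recommendation

# Usage sketch
print(compute_reward(clicked_items=["item_12", "item_7"], purchased_prices=[]))   # -> 2.0
print(compute_reward(clicked_items=[], purchased_prices=[59.0]))                  # -> 59.0
```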
Specifically, step S10 includes the following steps:
S101: randomly take M training samples from the experience pool, where M = 64;
S102: calculate the Q values of the optional actions under the next state s_{t+1}; the calculation formula is as follows:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + … | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, and R_{t+4} is the reward function value at time t+4;
S103: generate the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1}.
Specifically, step S11 includes the following steps:
S111: calculate the TargetQ value under the current state s_t; the calculation formula is as follows:
TargetQ = r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′)
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u′ is the parameter of the TargetNet neural network, and Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet neural network;
S112: calculate the loss function and, when the loss function reaches its minimum value, update the parameter θ_u of the MainNet neural network; the loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t; θ_u))²]
       = E[(r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′) - Q(s_t, a_t; θ_u))²]
where E denotes the average (expectation), r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t; θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet neural network, and θ_u′ is the parameter of the TargetNet neural network. A small numerical sketch of this update is given after this paragraph.
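The TargetQ and loss expressions of steps S111 and S112 reduce to a few array operations. The numpy sketch below only makes the arithmetic concrete; the discount factor γ = 0.9 and the batch of three transitions are assumed values, not values given by the invention.

```python
import numpy as np

gamma = 0.9                                               # assumed discount factor
r = np.array([2.0, 0.0, 59.0])                            # rewards r_t of three sampled transitions
q_next = np.array([[1.0, 3.0], [0.5, 0.2], [4.0, 1.0]])   # Q(s_{t+1}, a; theta_u') from the TargetNet
q_curr = np.array([2.5, 0.3, 55.0])                       # Q(s_t, a_t; theta_u) from the MainNet for the taken actions

target_q = r + gamma * q_next.max(axis=1)                 # S111: TargetQ = r_t + gamma * max_a' Q(s_{t+1}, a'; theta_u')
loss = np.mean((target_q - q_curr) ** 2)                  # S112: L(theta_u) = E[(TargetQ - Q(s_t, a_t; theta_u))^2]
print(target_q, loss)                                     # gradients of this loss update theta_u only
```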
Specifically, sex in the current state s_t is a long-term feature of the user, used to distinguish different user groups, because different user groups may make different choices under the same recommendation list; holiday, month, day and weather in the current state s_t are external-condition features, and different external conditions can change the user's shopping behavior to a large extent, for example users are more active during holidays.
Specifically, when the user neither clicks nor purchases in response to the executed action a_t, the recommendation list remains unchanged; when the user clicks or purchases in response to the executed action a_t, the recommendation list changes, that is, the previously browsed products at the front of the recommendation list are removed and the products that were just clicked or purchased are filled in, as illustrated in the sketch below.
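The list-update rule of this paragraph can be illustrated with a short helper; the function name, the position chosen for the newly interacted products and the way duplicates are handled are assumptions made for illustration.

```python
def update_recommendation_list(rec_list, interacted_items):
    """If the user clicked or purchased nothing, the list is unchanged; otherwise the browsed
    items at the front are dropped and the just clicked or purchased products are filled in."""
    if not interacted_items:
        return rec_list                                 # no click and no purchase: list stays the same
    n = len(interacted_items)
    remainder = [x for x in rec_list[n:] if x not in interacted_items]  # drop the front browsed items
    return interacted_items + remainder                 # interacted products fill the freed positions

# Usage sketch
print(update_recommendation_list(["a", "b", "c", "d"], []))      # -> ['a', 'b', 'c', 'd']
print(update_recommendation_list(["a", "b", "c", "d"], ["c"]))   # -> ['c', 'b', 'd']
```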
The same or similar reference labels correspond to the same or similar components;
The terms describing positional relationships in the attached figures are only for illustration and shall not be understood as limiting this patent;
Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and are not restrictions on the embodiments of the present invention. For those of ordinary skill in the art, other variations or changes in different forms can also be made on the basis of the above description. It is neither necessary nor possible to exhaust all the embodiments here. Any modifications, equivalent replacements, improvements and the like made within the spirit and principle of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (9)

1. A recommendation algorithm based on deep reinforcement learning, characterized in that it comprises the following steps:
S1: initializing the experience pool and setting its capacity W; the experience pool is a set of product-recommendation actions used to store training samples, and is empty before training starts;
S2: establishing the MainNet neural network and initializing it; the MainNet neural network is the main neural network and is used to obtain the recommendation list;
S3: establishing the TargetNet neural network and initializing it; the TargetNet neural network is used to train the model parameters and obtain the optimal model parameters;
S4: setting the number of training samples M;
S5: initializing the current state s_t from the N products the user browsed most recently at time t; if the user has not browsed any products recently, using N popular products instead;
S6: using the current state s_t as the input of the MainNet neural network to obtain the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t, where s_t is the current state, a_t is the executed action and θ_u is the parameter of the MainNet neural network;
S7: according to the executed action a_t, after the user clicks, purchases or ignores according to his or her own interest, calculating the reward r_t and the next state s_{t+1};
S8: storing the product-recommendation action tuple (s_t, a_t, r_t, s_{t+1}) in the experience pool;
S9: repeating steps S6-S8 until W training samples are stored in the experience pool;
S10: randomly taking M training samples from the experience pool, using the next state s_{t+1} of each training sample as the input of the TargetNet neural network, and obtaining the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1};
S11: updating the parameter θ_u of the MainNet neural network;
S12: every C rounds of iteration, where C is a preset iteration number, copying the parameters of the MainNet neural network to the TargetNet neural network.
2. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the initialization described in step S2 comprises: initializing the parameter θ_u; the input is the current state s_t and the output is the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t.
3. The recommendation algorithm based on deep reinforcement learning according to claim 2, characterized in that the current state s_t is expressed as follows:
where the i-th entry is the i-th product the user browsed most recently at time t, sex is the user's gender, and holiday, month, day and weather respectively denote whether it is a holiday, the current time period, the time and the weather.
4. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the initialization described in step S3 comprises: initializing the parameter θ_u′; the input is the next state s_{t+1} and the output is the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1}.
5. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S6 comprises the following steps:
S61: calculating the Q values of the optional actions under the current state s_t; the calculation formula is as follows:
Q(s_t, a_t) = E_π[R_{t+1} + γR_{t+2} + γ²R_{t+3} + … | s = s_t, a = a_t]
where γ is the discount factor, s_t is the current state, a_t is the current action, R_{t+1} is the reward function value at time t+1, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, E_π is the return value when Q(s, a, θ_u) is maximal, and π is a state decision function;
S62: generating the Q-value list Q(s_t, a_t; θ_u) of the optional actions under the current state s_t.
6. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the executed action a_t described in step S7 is expressed as follows:
where K is the number of products recommended to the user and the i-th entry is the i-th product recommended to the user.
7. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that the reward r_t described in step S7 is defined as follows: if a product-click action occurs under the current state, the reward is the number of products the user clicked; if a product-purchase action occurs under the current state, the reward is the price of the products the user purchased; in all other cases, the reward value is 0.
8. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S10 comprises the following steps:
S101: randomly taking M training samples from the experience pool;
S102: calculating the Q values of the optional actions under the next state s_{t+1}; the calculation formula is as follows:
Q(s_{t+1}, a_{t+1}) = E_π[R_{t+2} + γR_{t+3} + γ²R_{t+4} + … | s = s_{t+1}, a = a_{t+1}]
where γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, R_{t+2} is the reward function value at time t+2, R_{t+3} is the reward function value at time t+3, and R_{t+4} is the reward function value at time t+4;
S103: generating the Q-value list Q(s_{t+1}, a_{t+1}; θ_u′) of the optional actions under the next state s_{t+1}.
9. The recommendation algorithm based on deep reinforcement learning according to claim 1, characterized in that step S11 comprises the following steps:
S111: calculating the TargetQ value under the current state s_t; the calculation formula is as follows:
TargetQ = r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′)
where r_t is the reward of the current action, γ is the discount factor, s_{t+1} is the next state, a_{t+1} is the next action, θ_u′ is the parameter of the TargetNet neural network, and Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1} output by the TargetNet neural network;
S112: calculating the loss function and, when the loss function reaches its minimum value, updating the parameter θ_u of the MainNet neural network; the loss function is as follows:
L(θ_u) = E[(TargetQ - Q(s_t, a_t; θ_u))²]
       = E[(r_t + γ·max Q(s_{t+1}, a_{t+1}; θ_u′) - Q(s_t, a_t; θ_u))²]
where E denotes the average (expectation), r_t is the reward of the current action, γ is the discount factor, Q(s_t, a_t; θ_u) is the Q-value list of the optional actions under the current state s_t, Q(s_{t+1}, a_{t+1}; θ_u′) is the Q-value list of the optional actions under the next state s_{t+1}, s_t is the current state, a_t is the current action, s_{t+1} is the next state, a_{t+1} is the next action, θ_u is the parameter of the MainNet neural network, and θ_u′ is the parameter of the TargetNet neural network.
CN201811070447.0A 2018-09-13 2018-09-13 A recommendation algorithm based on deep reinforcement learning Pending CN109471963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811070447.0A CN109471963A (en) 2018-09-13 2018-09-13 A recommendation algorithm based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811070447.0A CN109471963A (en) 2018-09-13 2018-09-13 A recommendation algorithm based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN109471963A true CN109471963A (en) 2019-03-15

Family

ID=65664609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811070447.0A Pending CN109471963A (en) A recommendation algorithm based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109471963A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105637540A (en) * 2013-10-08 2016-06-01 谷歌公司 Methods and apparatus for reinforcement learning
US20170337478A1 (en) * 2016-05-22 2017-11-23 Microsoft Technology Licensing, Llc Self-Learning Technique for Training a PDA Component and a Simulated User Component
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108230057A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 A kind of intelligent recommendation method and system
CN108038545A (en) * 2017-12-06 2018-05-15 湖北工业大学 Fast learning algorithm based on Actor-Critic neutral net continuous controls

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
秦星辰 (Qin Xingchen): "Research and Design of a Recommendation System Based on the RHadoop Cloud Platform", China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *
草帽B-O-Y: "Deep Reinforcement Learning - DQN", CSDN, HTTPS://BLOG.CSDN.NET/U013236946/ARTICLE/DETAILS/72871858 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109967741A (en) * 2019-03-29 2019-07-05 贵州翰凯斯智能技术有限公司 A 3D printing process optimization method based on reinforcement learning
CN109967741B (en) * 2019-03-29 2021-02-02 贵州翰凯斯智能技术有限公司 3D printing process optimization method based on reinforcement learning
CN110135951A (en) * 2019-05-15 2019-08-16 网易(杭州)网络有限公司 Recommendation method and device for game commodities, and readable storage medium
CN111738787A (en) * 2019-06-13 2020-10-02 北京京东尚科信息技术有限公司 Information pushing method and device
CN110581808A (en) * 2019-08-22 2019-12-17 武汉大学 Congestion control method and system based on deep reinforcement learning
CN110659947A (en) * 2019-10-11 2020-01-07 沈阳民航东北凯亚有限公司 Commodity recommendation method and device
CN110838024A (en) * 2019-10-16 2020-02-25 支付宝(杭州)信息技术有限公司 Information pushing method, device and equipment based on deep reinforcement learning
CN111859099B (en) * 2019-12-05 2021-08-31 马上消费金融股份有限公司 Recommendation method, device, terminal and storage medium based on reinforcement learning
CN111859099A (en) * 2019-12-05 2020-10-30 马上消费金融股份有限公司 Recommendation method, device, terminal and storage medium based on reinforcement learning
CN110942208B (en) * 2019-12-10 2023-07-07 萍乡市恒升特种材料有限公司 Method for determining optimal production conditions of silicon carbide foam ceramic
CN110942208A (en) * 2019-12-10 2020-03-31 萍乡市恒升特种材料有限公司 Method for determining optimal production conditions of silicon carbide foam ceramic
CN111159558A (en) * 2019-12-31 2020-05-15 支付宝(杭州)信息技术有限公司 Recommendation list generation method and device and electronic equipment
CN111159558B (en) * 2019-12-31 2023-07-18 支付宝(杭州)信息技术有限公司 Recommendation list generation method and device and electronic equipment
CN111309907A (en) * 2020-02-10 2020-06-19 大连海事大学 Real-time Bug assignment method based on deep reinforcement learning
CN111401937A (en) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Data pushing method and device and storage medium
WO2021169218A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data pushing method and system, electronic device and storage medium
CN111339675A (en) * 2020-03-10 2020-06-26 南栖仙策(南京)科技有限公司 Training method for intelligent marketing strategy based on machine learning simulation environment
CN112085524A (en) * 2020-08-31 2020-12-15 中国人民大学 Q learning model-based result pushing method and system
CN112085524B (en) * 2020-08-31 2022-11-15 中国人民大学 Q learning model-based result pushing method and system
CN112733004A (en) * 2021-01-22 2021-04-30 上海交通大学 Film and television recommendation method based on multi-armed bandit algorithm
CN112733004B (en) * 2021-01-22 2022-09-30 上海交通大学 Film and television recommendation method based on multi-armed bandit algorithm
CN117290609A (en) * 2023-11-24 2023-12-26 中国科学技术大学 Product data recommendation method and product data recommendation device
CN117290609B (en) * 2023-11-24 2024-03-29 中国科学技术大学 Product data recommendation method and product data recommendation device

Similar Documents

Publication Publication Date Title
CN109471963A (en) A recommendation algorithm based on deep reinforcement learning
CN103678518B (en) Method and device for adjusting recommendation lists
CN108009897A (en) Real-time commodity recommendation method, system and readable storage medium
CN103246980B (en) Information output method and server
CN108153791B (en) Resource recommendation method and related device
CN106447463A (en) Commodity recommendation method based on Markov decision-making process model
CN103886001A (en) Personalized commodity recommendation system
CN102479366A (en) Commodity recommending method and system
CN106168980A (en) Multimedia resource recommendation sorting method and device
CN107145506B (en) Improved content-based agricultural commodity recommendation method
Fainmesser Community structure and market outcomes: A repeated games-in-networks approach
CN104933595A (en) Collaborative filtering recommendation method based on Markov prediction model
CN109034960A (en) A multi-attribute inference method based on user node embedding
US20160196579A1 (en) Dynamic deep links based on user activity of a particular user
Lu et al. Research on e-commerce customer repeat purchase behavior and purchase stickiness
Flajolet et al. Real-time bidding with side information
Karpenko et al. The influence of the consumer’s type–physical or digital–on their behavioral characteristics
Jiang et al. Intertemporal pricing via nonparametric estimation: Integrating reference effects and consumer heterogeneity
Bergemann et al. Progressive participation
CN110288419A (en) An e-commerce agricultural product recommendation method with dynamically updated weights
CN107967627A (en) A content-based linear regression financial product recommendation method
CN113781134A (en) Item recommendation method and device and computer-readable storage medium
Wu et al. Design of optimal control strategies for a supply chain with competing manufacturers under consignment contract
Liu et al. A semiparametric varying coefficient model of monotone auction bidding processes
Kota et al. Temporal multi-hierarchy smoothing for estimating rates of rare events

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190315