CN106850289A - Service composition method combining Gaussian process and reinforcement learning - Google Patents

Service composition method combining Gaussian process and reinforcement learning

Info

Publication number
CN106850289A
CN106850289A (application CN201710055817.2A)
Authority
CN
China
Prior art keywords
state
service
values
action
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710055817.2A
Other languages
Chinese (zh)
Other versions
CN106850289B (en)
Inventor
王红兵 (Wang Hongbing)
李佳杰 (Li Jiajie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201710055817.2A
Publication of CN106850289A
Application granted
Publication of CN106850289B
Legal status: Active


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L41/50 - Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

The invention discloses a service composition method combining Gaussian process and reinforcement learning, comprising the following steps: 1. model the service composition problem as a four-tuple Markov decision process; 2. solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, where the Q values are updated through a Gaussian prediction model of the Q values; 3. map the optimal policy to a web service composition workflow. The method models the learning of Q values with a Gaussian process, giving it better accuracy and generalization.

Description

Service composition method combining Gaussian process and reinforcement learning
Technical field
The present invention relates to a computer-implemented method for web service composition, and belongs to the field of artificial intelligence.
Background technology
With the development of computer technology, the requirements placed on software systems have become increasingly complex and changeable. Together with the development of the Internet and information technology, this has gradually given rise to service-oriented software architecture (Service-Oriented Architecture, SOA): software or components that implement certain functions are deployed in the Internet environment as web services, and users communicate with these web services through a message protocol in order to use their functions. New software systems that meet a given demand are then built by composing various web services. Common web services today include weather services, map positioning services, and so on.
For a given function, there are typically multiple services offered by different providers that are similar in function but differ in quality of service (Quality of Service, QoS). The class of services that can fulfill a given function is called an abstract service, and the multiple concrete services that fulfill the function are called the candidate services of that abstract service. Given a user request, selecting the best-quality service from multiple candidate services and ultimately deriving the optimal combination of services is the service composition problem; service selection and composition optimization based on the QoS attributes of the services is called QoS-aware service composition. Because the Internet environment is highly dynamic, the QoS attributes of a service may fluctuate or change over time with the environment, so a service composition method needs a degree of adaptivity to cope with environmental changes. At the same time, candidate services keep increasing in number and business requirements keep growing more complex: a complex user request usually involves multiple abstract services and their corresponding candidate services, so a service composition method must also face the challenge of large-scale service composition. To address these two problems, some researchers have proposed service composition methods based on Markov decision processes (Markov Decision Processes, MDP) and reinforcement learning. MDP is a decision-planning technique: in service composition, the current network environment and context are modeled as states in the MDP, and the candidate services selectable in the current state are modeled as the actions available in the MDP; after an action is executed, the process transfers to a new state, where the next round of selection takes place, until the whole service composition is completed. Once the service composition process is modeled as an MDP, the problem of finding the optimal service composition is converted into solving the MDP model, which can then be done with a reinforcement learning method. Reinforcement learning is an effective approach to solving MDP models: especially in the large-scale, dynamic environment of the service composition problem, it learns through iterative interaction with the environment, is naturally adaptive, and can therefore cope well with service composition in a network environment. In the traditional reinforcement learning algorithm Q-learning, however, Q values are recorded in a value table, which lacks generalization ability; the learning results are not accurate enough and are rather sensitive to noise.
Summary of the invention
Objective of the invention: to address the problems in the prior art, the invention discloses a service composition method combining Gaussian process and reinforcement learning, which models the learning of Q values with a Gaussian process and thereby achieves better accuracy and generalization.
Technical solution: the technical solution adopted by the present invention is as follows.
A service composition method combining Gaussian process and reinforcement learning comprises the following steps:
(1) Model the service composition problem as a four-tuple Markov decision process;
(2) Solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy;
(3) Map the optimal policy to a web service composition workflow.
Specifically, in step (1) the service composition problem is modeled as the following four-tuple Markov decision process:
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
Specifically, step (2) solves the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, comprising the following steps:
(21) Take the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, and build a Gaussian prediction model of the Q values;
(22) Initialize the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) Select a service a as the current service a_t with the ε-greedy policy and execute it;
(24) Record the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; compute the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) Update the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1};
then update the Gaussian prediction model with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1});
(26) Update the current state: s_t = s_{t+1}. When s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise go to step (23).
Specifically, the kernel function k(·) in the Gaussian prediction model is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
Specifically, the convergence condition in step (26) is: the change of the Q values is less than the threshold Q_th, i.e. |Q(z_t) − Q(z_{t+1})| < Q_th.
Beneficial effects: compared with the prior art, the service composition method disclosed by the invention has the following advantages. When computing the reinforcement learning Q values, the traditional method of recording and looking up Q values in a value table is improved: each selected and invoked service together with its observed QoS attributes is treated as an input-output pair of an unknown function, and during the Q-value iteration the Q values are estimated by a Gaussian process rather than looked up in a value table, while the parameters of the Gaussian process are also learned and updated. This makes the prediction of Q values more accurate and yields a better service composition result. Moreover, the reinforcement learning service composition method employing a Gaussian process can train a Gaussian process model from existing data and use it to predict and estimate new data, giving it good generalization ability and making it suitable for dynamic, changeable web service composition environments.
Brief description of the drawings
Fig. 1 is the basic service composition model;
Fig. 2 is a schematic diagram of service composition modeled with an MDP;
Fig. 3 is a schematic diagram of a basic Gaussian process;
Fig. 4 is the flow chart of the service composition method combining Gaussian process and reinforcement learning.
Specific embodiment
The present invention is further elucidated below with reference to the accompanying drawings and the specific embodiment.
The basic model of service composition is shown in Fig. 1. A complex software system can be regarded as a workflow composed of multiple components or subsystems; in the service composition field, the components are web services. When performing service composition, the user's demand can therefore be modeled as an abstract task workflow in which each component is an abstract service. For each abstract service there may exist multiple candidate services that have similar functions but different QoS (quality of service), so a suitable concrete service can be selected from the candidates based on QoS attributes, and the selected services are finally composed into a usable service composition system.
The service composition method combining Gaussian process and reinforcement learning disclosed by the invention comprises the following steps.
Step 1: model the service composition problem as the four-tuple Markov decision process
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
Fig. 2 gives an example of service composition modeled by an MDP; the example describes the service composition process of planning a trip. In the MDP model, the candidate services that can be invoked are modeled as different actions. Invoking different actions may lead to different states, and the new state determines the set of services that can be invoked next. The different invoked services are evaluated through their observed QoS attributes, which define the reward function of the MDP model. In this way, a service composition problem is converted into an MDP model, which can then be optimized by a reinforcement learning method.
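As a concrete illustration of this modeling, the following minimal Python sketch encodes such a trip-planning composition as the four-tuple M = <S, A, P, R>; all names, the deterministic transition, and the QoS scoring are illustrative assumptions, not prescribed by the patent.

```python
# Minimal sketch of the MDP model M = <S, A, P, R> for service
# composition. All identifiers and the QoS weighting are illustrative.

# S: finite states, i.e. stages of the trip-planning workflow.
S = ["start", "flight_booked", "hotel_booked", "done"]

# A(s): candidate services invocable in each state, modeled as actions.
A = {
    "start":         ["flight_svc_1", "flight_svc_2"],
    "flight_booked": ["hotel_svc_1", "hotel_svc_2"],
    "hotel_booked":  ["payment_svc_1"],
}

def P(s, a):
    """P(s'|s,a): here deterministic, invoking any service in a state
    advances the workflow to the next stage."""
    nxt = {"start": "flight_booked",
           "flight_booked": "hotel_booked",
           "hotel_booked": "done"}
    return nxt[s]

def R(s, a, qos):
    """R(s,a): reward derived from the QoS observed when invoking the
    service, here an assumed weighted score of two QoS attributes."""
    return 0.7 * qos["availability"] - 0.3 * qos["response_time"]
```

In general P(s′|s,a) may be stochastic, reflecting service failures or environment changes; the deterministic table above only keeps the sketch short.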
Step 2: solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy.
Solving the MDP model means finding the optimal service selection policy in each state, so that the final composition result is better. In the MDP model, the quality of selecting an action depends not only on the immediate reward produced by that action but also on the successor states and rewards it leads to. In the reinforcement learning algorithm Q-learning, the value of selecting action a in state s is assessed with the Q-value function Q(s,a), whose iterative formula is:

$$Q(s,a) = (1-\sigma)\,Q(s,a) + \sigma\left(r + \gamma \max_{a'} Q(s',a')\right)$$

where σ is the learning rate, which controls how much the Q value changes at each update, and γ is the discount rate, which controls the influence of future states. Reinforcement learning theory holds that the immediate reward should outweigh possible future rewards, so γ takes a value between 0 and 1. r is R(s,a), the reward of executing action a in state s. Q(s′,a′) denotes the Q value of selecting a′ in the state s′ reached after executing action a, and represents the future reward.
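Written as code, this update is a single assignment. The sketch below is the plain value-table variant of the formula (illustrative names, in Python), i.e. exactly the traditional scheme whose limitations are discussed next:

```python
# Tabular Q-learning update:
#   Q(s,a) <- (1-sigma)*Q(s,a) + sigma*(r + gamma * max_a' Q(s',a'))
from collections import defaultdict

Q = defaultdict(float)  # value table mapping (state, action) -> Q value

def q_update(s, a, r, s_next, next_actions, sigma=0.1, gamma=0.9):
    """One Q-learning step for the state-action pair (s, a)."""
    future = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - sigma) * Q[(s, a)] + sigma * (r + gamma * future)
    return Q[(s, a)]
```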
In traditional reinforcement learning, the computed Q values are recorded in a value table, and when updating Q, the term Q(s′,a′) is obtained by looking up previously computed and recorded values in the table. This is sufficient in some application scenarios, but in highly dynamic service composition scenarios the method lacks generalization ability and cannot cope with the data variation of real scenes. Moreover, as the scale of service composition grows, the space and time needed to store and query the value table consume substantial computing power, and real-time requirements cannot be met well. The present invention therefore proposes to model the estimation of Q values with a Gaussian process, improving generalization ability, coping better with dynamic environments, and obtaining better results in practical applications.
As shown in Fig. 4, the method specifically comprises the following steps:
(21) Take the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, and build a Gaussian prediction model of the Q values.
A Gaussian process is illustrated in Fig. 3: a Gaussian process model is trained from known input-output data, and when a new input arrives, its corresponding output is predicted by the model. A Gaussian process model is uniquely determined by its mean function and covariance function, is easy to adjust and optimize, and converges relatively quickly in iteration.
Specifically, choose a group of n training samples {(z_i = (s_i,a_i), Q(z_i)) | i = 1..n}, where the state-action pair z_i = (s_i,a_i) is the input and the corresponding Q value Q(z_i) is the output; z_* and Q_* are the data to be predicted. A Gaussian process assumes that inputs and outputs obey a joint probability distribution. Let K(X,X_*) denote the n × n_* covariance matrix between all training points X and the test points X_* (n is the number of training points, n_* the number of test points); the element of K(X,X_*) in row i and column j is k(X_i,X_*), X_i being the i-th element of the set X. K(X,X), K(X_*,X), and K(X_*,X_*) are defined similarly. The joint distribution of the training outputs f and the test outputs f_* is then:

$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\; \begin{bmatrix} K(X,X)+\omega_n^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right)$$

from which the mean of Q(z_*) is computed as α_*^T K(Z,Z_*), where α_* = [K(Z,Z) + ω_n² I]^{-1} f̄. Here ω_n denotes the noise parameter, whose value is 1 in this embodiment; I is the identity matrix; Z is the set of historical state-action pairs and f̄ the set of historical Q values corresponding to Z; K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function; K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1}.
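The following NumPy sketch shows this prediction step; it is an illustrative implementation of the formula above (the patent prescribes only the mathematics), computing the posterior mean α^T K(Z, z_*) with α = [K(Z,Z) + ω_n² I]^{-1} f̄ under the Gaussian kernel used in this embodiment:

```python
import numpy as np

def gauss_kernel(x, y, sigma_k=1.0):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 * sigma_k^2))."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                  / (2 * sigma_k ** 2))

def predict_q(Z, f_bar, z_new, omega_n=1.0, sigma_k=1.0):
    """Posterior mean of Q(z_new) given historical state-action pairs Z
    (as numeric feature vectors) and their Q values f_bar."""
    n = len(Z)
    K = np.array([[gauss_kernel(Z[i], Z[j], sigma_k) for j in range(n)]
                  for i in range(n)])                              # K(Z, Z)
    k_star = np.array([gauss_kernel(z, z_new, sigma_k) for z in Z])  # K(Z, z*)
    alpha = np.linalg.solve(K + omega_n ** 2 * np.eye(n), f_bar)
    return k_star @ alpha   # = alpha^T K(Z, z_new)
```

For example, predict_q([[0, 1], [1, 0]], [0.4, 0.7], [1, 1]) returns the GP estimate of the Q value for an unseen pair; this is how Q(z_{t+1}) is obtained in step (25) below.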
(22) Initialize the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) Select a service a as the current service a_t with the ε-greedy policy and execute it. Specifically, generate a random number υ uniformly in the interval (0,1); if υ > ε, randomly select a new service a; if υ ≤ ε, select the service with the maximal current Q value as the new service a. This avoids getting trapped in local optima;
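A sketch of this selection rule in Python (illustrative; note the patent's convention that υ ≤ ε exploits the current Q values while υ > ε explores at random):

```python
import random

def epsilon_greedy(state, candidates, Q, eps=0.8):
    """Select a candidate service in `state` with the patent's
    convention: explore when v > eps, otherwise exploit."""
    v = random.random()                 # v drawn uniformly from (0, 1)
    if v > eps:
        return random.choice(candidates)         # explore a random service
    return max(candidates, key=lambda a: Q.get((state, a), 0.0))  # exploit
```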
(24) Record the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; compute the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) Update the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1}. Various kernel functions can be used; in this embodiment the kernel function k is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
Because a new data point has been added, the Gaussian model has changed, so the Gaussian prediction model needs to be updated with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1}), ready for the next iteration of the Q values;
(26) Update the current state: s_t = s_{t+1}. When s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise go to step (23).
The convergence condition in this embodiment is that the Q values have stabilized, i.e. the change of the Q values is less than the threshold Q_th: |Q(z_t) − Q(z_{t+1})| < Q_th. The optimal policy is then obtained, and the final service composition result is derived from this optimal policy.
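Putting steps (22) to (26) together, one possible shape of the whole learning loop is sketched below. It reuses the hypothetical epsilon_greedy helper above, assumes an env_step(s, a) function that invokes service a and returns the observed reward and successor state, and stops once the largest Q-value change in an episode falls below the threshold Q_th; it is a sketch of the method's control flow, not the patented implementation.

```python
def learn(start_state, A, env_step, sigma=0.1, gamma=0.9, eps=0.8,
          q_th=1e-3, max_episodes=1000):
    """Steps (22)-(26): run episodes until the Q change drops below q_th."""
    Q, history_Z, history_f = {}, [], []        # step (22): initialization
    for _ in range(max_episodes):
        s, max_delta = start_state, 0.0
        while s in A:                           # until a terminal state
            a = epsilon_greedy(s, A[s], Q, eps)        # step (23)
            r, s_next = env_step(s, a)                 # step (24): observe QoS
            future = max((Q.get((s_next, a2), 0.0) for a2 in A.get(s_next, [])),
                         default=0.0)
            q_new = (1 - sigma) * Q.get((s, a), 0.0) \
                    + sigma * (r + gamma * future)
            max_delta = max(max_delta, abs(q_new - Q.get((s, a), 0.0)))
            Q[(s, a)] = q_new
            # step (25): grow the GP training set; a numeric encoding of
            # (s, a) would be needed before calling predict_q on it.
            history_Z.append((s, a))
            history_f.append(q_new)
            s = s_next
        if max_delta < q_th:                    # step (26): convergence
            break
    return Q                # a greedy policy over Q gives the composition
```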

Claims (5)

1. A service composition method combining Gaussian process and reinforcement learning, characterized by comprising the following steps:
(1) modeling the service composition problem as a four-tuple Markov decision process;
(2) solving the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy;
(3) mapping the optimal policy to a web service composition workflow.
2. The service composition method combining Gaussian process and reinforcement learning according to claim 1, characterized in that in step (1) the service composition problem is modeled as the following four-tuple Markov decision process:
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
3. The service composition method combining Gaussian process and reinforcement learning according to claim 2, characterized in that step (2) solves the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, comprising the following steps:
(21) taking the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, building a Gaussian prediction model of the Q values;
(22) initializing the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) selecting a service a as the current service a_t with the ε-greedy policy and executing it;
(24) recording the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; computing the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) updating the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1};
updating the Gaussian prediction model with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1});
(26) updating the current state: s_t = s_{t+1}; when s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise going to step (23).
4. The service composition method combining Gaussian process and reinforcement learning according to claim 3, characterized in that the kernel function k(·) in the Gaussian prediction model is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
5. The service composition method combining Gaussian process and reinforcement learning according to claim 3, characterized in that the convergence condition in step (26) is: the change of the Q values is less than the threshold Q_th, i.e. |Q(z_t) − Q(z_{t+1})| < Q_th.
CN201710055817.2A 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning Active CN106850289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710055817.2A CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710055817.2A CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Publications (2)

Publication Number Publication Date
CN106850289A true CN106850289A (en) 2017-06-13
CN106850289B CN106850289B (en) 2020-04-24

Family

ID=59120622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710055817.2A Active CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Country Status (1)

Country Link
CN (1) CN106850289B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103248693A * 2013-05-03 2013-08-14 东南大学 Large-scale adaptive composite service optimization method based on multi-agent reinforcement learning
CN103646008A * 2013-12-13 2014-03-19 东南大学 Web service composition method
CN105046351A * 2015-07-01 2015-11-11 内蒙古大学 Reinforcement-learning-based service composition method and system in an uncertain environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGBING WANG et al.: "Integrating Gaussian Process with Reinforcement Learning for Adaptive Service Composition", Lecture Notes in Computer Science *
WU QIN: "Research on QoS-aware Service Composition Optimization Based on Reinforcement Learning", China Masters' Theses Full-text Database, Information Science and Technology *
ZHAO HAIYAN et al.: "Service Composition Based on a Multi-Agent Learning Mechanism", Computer Engineering & Science *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319852A * 2018-02-08 2018-07-24 北京安信天行科技有限公司 Event discrimination strategy creation method and device
CN108319852B * 2018-02-08 2022-05-06 北京安信天行科技有限公司 Event discrimination strategy creation method and device
CN108972546A * 2018-06-22 2018-12-11 华南理工大学 Robot constant-force curved surface tracking method based on reinforcement learning
CN108972546B * 2018-06-22 2021-07-20 华南理工大学 Robot constant-force curved surface tracking method based on reinforcement learning
CN108958916A * 2018-06-29 2018-12-07 杭州电子科技大学 Workflow offloading optimization method in a mobile edge environment
CN108958916B * 2018-06-29 2021-06-22 杭州电子科技大学 Workflow offloading optimization method in a mobile edge environment
CN109388484A * 2018-08-16 2019-02-26 广东石油化工学院 Multi-resource cloud job scheduling method based on the Deep Q-network algorithm
CN109388484B * 2018-08-16 2020-07-28 广东石油化工学院 Multi-resource cloud job scheduling method based on the Deep Q-network algorithm
CN109670637A * 2018-12-06 2019-04-23 苏州科技大学 Building energy consumption prediction method, storage medium, device and system
CN112101695A * 2019-06-17 2020-12-18 唯慕思解决方案株式会社 Simulation-based reinforcement learning and in-plant scheduling method and device
CN113065284A * 2021-03-31 2021-07-02 天津国科医工科技发展有限公司 Q-learning-based parameter optimization strategy calculation method for a triple quadrupole mass spectrometer
CN113065284B * 2021-03-31 2022-11-01 天津国科医工科技发展有限公司 Q-learning-based parameter optimization strategy calculation method for a triple quadrupole mass spectrometer

Also Published As

Publication number Publication date
CN106850289B (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN106850289A (en) With reference to Gaussian process and the service combining method of intensified learning
US11675940B2 (en) Generating integrated circuit floorplans using neural networks
Xu et al. Determining China's CO2 emissions peak with a dynamic nonlinear artificial neural network approach and scenario analysis
Li et al. Deep learning based densely connected network for load forecasting
CN110235148A Training action selection neural networks
CN107241213A Web service composition method based on deep reinforcement learning
Deal et al. The role of multidirectional temporal analysis in scenario planning exercises and Planning Support Systems
CN106875004B (en) Composite mode neuronal messages processing method and system
CN108932671A LSTM wind power load forecasting method with parameters tuned by a deep Q neural network
CN106600065B (en) Method and system for extracting and splicing personalized learning path based on directed hypergraph
Papageorgiou et al. Application of fuzzy cognitive maps to water demand prediction
CN105894372A (en) Method and device for predicting group credit
CN110378488A (en) Federal training method, device, training terminal and the storage medium of client variation
CN112541302A (en) Air quality prediction model training method, air quality prediction method and device
CN112184089B (en) Training method, device and equipment of test question difficulty prediction model and storage medium
Liu et al. Large-scale and adaptive service composition based on deep reinforcement learning
Cui et al. Construction and Development of Modern Brand Marketing Management Mode Based on Artificial Intelligence
Vliet et al. FCMs as a common base for linking participatory products and models
US20220269835A1 (en) Resource prediction system for executing machine learning models
Huang et al. An Online Inference-Aided Incentive Framework for Information Elicitation Without Verification
CN113239034A (en) Big data resource integration method and system based on artificial intelligence and cloud platform
CN106878403A Heuristic service composition method based on recent exploration
CN114492905A (en) Customer appeal rate prediction method and device based on multi-model fusion and computer equipment
Bohner Decision-support systems for sustainable urban planning
CN112348175A (en) Method for performing feature engineering based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant