CN106850289A - Service composition method combining Gaussian process and reinforcement learning - Google Patents

Service composition method combining Gaussian process and reinforcement learning

Info

Publication number
CN106850289A
CN106850289A (application CN201710055817.2A)
Authority
CN
China
Prior art keywords
state
service
values
action
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710055817.2A
Other languages
Chinese (zh)
Other versions
CN106850289B (en)
Inventor
王红兵 (Wang Hongbing)
李佳杰 (Li Jiajie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201710055817.2A
Publication of CN106850289A
Application granted
Publication of CN106850289B
Legal status: Active


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L41/50 - Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

The invention discloses a service composition method combining Gaussian process and reinforcement learning, comprising the following steps: 1. model the service composition problem as a four-tuple Markov decision process; 2. solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, where the Q values are updated through a Gaussian prediction model of the Q values; 3. map the optimal policy to a web service composition workflow. The method models the learning of Q values with a Gaussian process, giving it better accuracy and generalization.

Description

Service composition method combining Gaussian process and reinforcement learning
Technical field
The present invention relates to a computer-implemented method for web service composition, and belongs to the field of artificial intelligence.
Background technology
With the development of computer technology, the requirements placed on software systems have become increasingly complex and changeable. Together with the development of the Internet and information technology, this has gradually given rise to service-oriented software architecture (Service-Oriented Architecture, SOA): software or components that implement certain functions are deployed in the Internet environment as web services, and users communicate with these web services through a message protocol in order to use their functions. New software systems that meet a given demand are then built by composing various web services. Common web services today include weather services, map positioning services, and so on.
For a given function, there are typically multiple services offered by different providers that are similar in function but differ in quality of service (Quality of Service, QoS). The class of services that can fulfill a given function is called an abstract service, and the multiple concrete services that fulfill the function are called the candidate services of that abstract service. Given a user request, selecting the best-quality service from multiple candidate services and ultimately deriving the optimal combination of services is the service composition problem; service selection and composition optimization based on the QoS attributes of the services is called QoS-aware service composition. Because the Internet environment is highly dynamic, the QoS attributes of a service may fluctuate or change over time with the environment, so a service composition method needs a degree of adaptivity to cope with environmental changes. At the same time, candidate services keep increasing in number and business requirements keep growing more complex: a complex user request usually involves multiple abstract services and their corresponding candidate services, so a service composition method must also face the challenge of large-scale service composition. To address these two problems, some researchers have proposed service composition methods based on Markov decision processes (Markov Decision Processes, MDP) and reinforcement learning. MDP is a decision-planning technique: in service composition, the current network environment and context are modeled as states in the MDP, and the candidate services selectable in the current state are modeled as the actions available in the MDP; after an action is executed, the process transfers to a new state, where the next round of selection takes place, until the whole service composition is completed. Once the service composition process is modeled as an MDP, the problem of finding the optimal service composition is converted into solving the MDP model, which can then be done with a reinforcement learning method. Reinforcement learning is an effective approach to solving MDP models: especially in the large-scale, dynamic environment of the service composition problem, it learns through iterative interaction with the environment, is naturally adaptive, and can therefore cope well with service composition in a network environment. In the traditional reinforcement learning algorithm Q-learning, however, Q values are recorded in a value table, which lacks generalization ability; the learning results are not accurate enough and are rather sensitive to noise.
Summary of the invention
Objective of the invention: to address the problems in the prior art, the invention discloses a service composition method combining Gaussian process and reinforcement learning, which models the learning of Q values with a Gaussian process and thereby achieves better accuracy and generalization.
Technical solution: the technical solution adopted by the present invention is as follows.
A service composition method combining Gaussian process and reinforcement learning comprises the following steps:
(1) Model the service composition problem as a four-tuple Markov decision process;
(2) Solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy;
(3) Map the optimal policy to a web service composition workflow.
Specifically, in step (1) the service composition problem is modeled as the following four-tuple Markov decision process:
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
Specifically, step (2) solves the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, comprising the following steps:
(21) Take the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, and build a Gaussian prediction model of the Q values;
(22) Initialize the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) Select a service a as the current service a_t with the ε-greedy policy and execute it;
(24) Record the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; compute the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) Update the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1};
then update the Gaussian prediction model with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1});
(26) Update the current state: s_t = s_{t+1}. When s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise go to step (23).
Specifically, the kernel function k(·) in the Gaussian prediction model is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
Specifically, the convergence condition in step (26) is: the change of the Q values is less than the threshold Q_th, i.e. |Q(z_t) − Q(z_{t+1})| < Q_th.
Beneficial effects: compared with the prior art, the service composition method disclosed by the invention has the following advantages. When computing the reinforcement learning Q values, the traditional method of recording and looking up Q values in a value table is improved: each selected and invoked service together with its observed QoS attributes is treated as an input-output pair of an unknown function, and during the Q-value iteration the Q values are estimated by a Gaussian process rather than looked up in a value table, while the parameters of the Gaussian process are also learned and updated. This makes the prediction of Q values more accurate and yields a better service composition result. Moreover, the reinforcement learning service composition method employing a Gaussian process can train a Gaussian process model from existing data and use it to predict and estimate new data, giving it good generalization ability and making it suitable for dynamic, changeable web service composition environments.
Brief description of the drawings
Fig. 1 is the basic service composition model;
Fig. 2 is a schematic diagram of service composition modeled with an MDP;
Fig. 3 is a schematic diagram of a basic Gaussian process;
Fig. 4 is the flow chart of the service composition method combining Gaussian process and reinforcement learning.
Specific embodiment
The present invention is further elucidated below with reference to the accompanying drawings and the specific embodiment.
The basic model of service composition is shown in Fig. 1. A complex software system can be regarded as a workflow composed of multiple components or subsystems; in the service composition field, the components are web services. When performing service composition, the user's demand can therefore be modeled as an abstract task workflow in which each component is an abstract service. For each abstract service there may exist multiple candidate services that have similar functions but different QoS (quality of service), so a suitable concrete service can be selected from the candidates based on QoS attributes, and the selected services are finally composed into a usable service composition system.
The service composition method combining Gaussian process and reinforcement learning disclosed by the invention comprises the following steps.
Step 1: model the service composition problem as the four-tuple Markov decision process
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
Fig. 2 gives an example of service composition modeled by an MDP; the example describes the service composition process of planning a trip. In the MDP model, the candidate services that can be invoked are modeled as different actions. Invoking different actions may lead to different states, and the new state determines the set of services that can be invoked next. The different invoked services are evaluated through their observed QoS attributes, which define the reward function of the MDP model. In this way, a service composition problem is converted into an MDP model, which can then be optimized by a reinforcement learning method.
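As a concrete illustration of this modeling, the following minimal Python sketch encodes such a trip-planning composition as the four-tuple M = <S, A, P, R>; all names, the deterministic transition, and the QoS scoring are illustrative assumptions, not prescribed by the patent.

```python
# Minimal sketch of the MDP model M = <S, A, P, R> for service
# composition. All identifiers and the QoS weighting are illustrative.

# S: finite states, i.e. stages of the trip-planning workflow.
S = ["start", "flight_booked", "hotel_booked", "done"]

# A(s): candidate services invocable in each state, modeled as actions.
A = {
    "start":         ["flight_svc_1", "flight_svc_2"],
    "flight_booked": ["hotel_svc_1", "hotel_svc_2"],
    "hotel_booked":  ["payment_svc_1"],
}

def P(s, a):
    """P(s'|s,a): here deterministic, invoking any service in a state
    advances the workflow to the next stage."""
    nxt = {"start": "flight_booked",
           "flight_booked": "hotel_booked",
           "hotel_booked": "done"}
    return nxt[s]

def R(s, a, qos):
    """R(s,a): reward derived from the QoS observed when invoking the
    service, here an assumed weighted score of two QoS attributes."""
    return 0.7 * qos["availability"] - 0.3 * qos["response_time"]
```

In general P(s′|s,a) may be stochastic, reflecting service failures or environment changes; the deterministic table above only keeps the sketch short.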
Step 2: solve the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy.
Solving the MDP model means finding the optimal service selection policy in each state, so that the final composition result is better. In the MDP model, the quality of selecting an action depends not only on the immediate reward produced by that action but also on the successor states and rewards it leads to. In the reinforcement learning algorithm Q-learning, the value of selecting action a in state s is assessed with the Q-value function Q(s,a), whose iterative formula is:

$$Q(s,a) = (1-\sigma)\,Q(s,a) + \sigma\left(r + \gamma \max_{a'} Q(s',a')\right)$$

where σ is the learning rate, which controls how much the Q value changes at each update, and γ is the discount rate, which controls the influence of future states. Reinforcement learning theory holds that the immediate reward should outweigh possible future rewards, so γ takes a value between 0 and 1. r is R(s,a), the reward of executing action a in state s. Q(s′,a′) denotes the Q value of selecting a′ in the state s′ reached after executing action a, and represents the future reward.
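Written as code, this update is a single assignment. The sketch below is the plain value-table variant of the formula (illustrative names, in Python), i.e. exactly the traditional scheme whose limitations are discussed next:

```python
# Tabular Q-learning update:
#   Q(s,a) <- (1-sigma)*Q(s,a) + sigma*(r + gamma * max_a' Q(s',a'))
from collections import defaultdict

Q = defaultdict(float)  # value table mapping (state, action) -> Q value

def q_update(s, a, r, s_next, next_actions, sigma=0.1, gamma=0.9):
    """One Q-learning step for the state-action pair (s, a)."""
    future = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - sigma) * Q[(s, a)] + sigma * (r + gamma * future)
    return Q[(s, a)]
```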
In traditional reinforcement learning, the computed Q values are recorded in a value table, and when updating Q, the term Q(s′,a′) is obtained by looking up previously computed and recorded values in the table. This is sufficient in some application scenarios, but in highly dynamic service composition scenarios the method lacks generalization ability and cannot cope with the data variation of real scenes. Moreover, as the scale of service composition grows, the space and time needed to store and query the value table consume substantial computing power, and real-time requirements cannot be met well. The present invention therefore proposes to model the estimation of Q values with a Gaussian process, improving generalization ability, coping better with dynamic environments, and obtaining better results in practical applications.
As shown in Fig. 4, the method specifically comprises the following steps:
(21) Take the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, and build a Gaussian prediction model of the Q values.
A Gaussian process is illustrated in Fig. 3: a Gaussian process model is trained from known input-output data, and when a new input arrives, its corresponding output is predicted by the model. A Gaussian process model is uniquely determined by its mean function and covariance function, is easy to adjust and optimize, and converges relatively quickly in iteration.
Specifically, choose a group of n training samples {(z_i = (s_i,a_i), Q(z_i)) | i = 1..n}, where the state-action pair z_i = (s_i,a_i) is the input and the corresponding Q value Q(z_i) is the output; z_* and Q_* are the data to be predicted. A Gaussian process assumes that inputs and outputs obey a joint probability distribution. Let K(X,X_*) denote the n × n_* covariance matrix between all training points X and the test points X_* (n is the number of training points, n_* the number of test points); the element of K(X,X_*) in row i and column j is k(X_i,X_*), X_i being the i-th element of the set X. K(X,X), K(X_*,X), and K(X_*,X_*) are defined similarly. The joint distribution of the training outputs f and the test outputs f_* is then:

$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\; \begin{bmatrix} K(X,X)+\omega_n^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right)$$

from which the mean of Q(z_*) is computed as α_*^T K(Z,Z_*), where α_* = [K(Z,Z) + ω_n² I]^{-1} f̄. Here ω_n denotes the noise parameter, whose value is 1 in this embodiment; I is the identity matrix; Z is the set of historical state-action pairs and f̄ the set of historical Q values corresponding to Z; K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function; K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1}.
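The following NumPy sketch shows this prediction step; it is an illustrative implementation of the formula above (the patent prescribes only the mathematics), computing the posterior mean α^T K(Z, z_*) with α = [K(Z,Z) + ω_n² I]^{-1} f̄ under the Gaussian kernel used in this embodiment:

```python
import numpy as np

def gauss_kernel(x, y, sigma_k=1.0):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 * sigma_k^2))."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2)
                  / (2 * sigma_k ** 2))

def predict_q(Z, f_bar, z_new, omega_n=1.0, sigma_k=1.0):
    """Posterior mean of Q(z_new) given historical state-action pairs Z
    (as numeric feature vectors) and their Q values f_bar."""
    n = len(Z)
    K = np.array([[gauss_kernel(Z[i], Z[j], sigma_k) for j in range(n)]
                  for i in range(n)])                              # K(Z, Z)
    k_star = np.array([gauss_kernel(z, z_new, sigma_k) for z in Z])  # K(Z, z*)
    alpha = np.linalg.solve(K + omega_n ** 2 * np.eye(n), f_bar)
    return k_star @ alpha   # = alpha^T K(Z, z_new)
```

For example, predict_q([[0, 1], [1, 0]], [0.4, 0.7], [1, 1]) returns the GP estimate of the Q value for an unseen pair; this is how Q(z_{t+1}) is obtained in step (25) below.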
(22) Initialize the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) Select a service a as the current service a_t with the ε-greedy policy and execute it. Specifically, generate a random number υ uniformly in the interval (0,1); if υ > ε, randomly select a new service a; if υ ≤ ε, select the service with the maximal current Q value as the new service a. This avoids getting trapped in local optima;
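A sketch of this selection rule in Python (illustrative; note the patent's convention that υ ≤ ε exploits the current Q values while υ > ε explores at random):

```python
import random

def epsilon_greedy(state, candidates, Q, eps=0.8):
    """Select a candidate service in `state` with the patent's
    convention: explore when v > eps, otherwise exploit."""
    v = random.random()                 # v drawn uniformly from (0, 1)
    if v > eps:
        return random.choice(candidates)         # explore a random service
    return max(candidates, key=lambda a: Q.get((state, a), 0.0))  # exploit
```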
(24) Record the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; compute the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) Update the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1}. Various kernel functions can be used; in this embodiment the kernel function k is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
Because a new data point has been added, the Gaussian model has changed, so the Gaussian prediction model needs to be updated with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1}), ready for the next iteration of the Q values;
(26) Update the current state: s_t = s_{t+1}. When s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise go to step (23).
The convergence condition in this embodiment is that the Q values have stabilized, i.e. the change of the Q values is less than the threshold Q_th: |Q(z_t) − Q(z_{t+1})| < Q_th. The optimal policy is then obtained, and the final service composition result is derived from this optimal policy.
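Putting steps (22) to (26) together, one possible shape of the whole learning loop is sketched below. It reuses the hypothetical epsilon_greedy helper above, assumes an env_step(s, a) function that invokes service a and returns the observed reward and successor state, and stops once the largest Q-value change in an episode falls below the threshold Q_th; it is a sketch of the method's control flow, not the patented implementation.

```python
def learn(start_state, A, env_step, sigma=0.1, gamma=0.9, eps=0.8,
          q_th=1e-3, max_episodes=1000):
    """Steps (22)-(26): run episodes until the Q change drops below q_th."""
    Q, history_Z, history_f = {}, [], []        # step (22): initialization
    for _ in range(max_episodes):
        s, max_delta = start_state, 0.0
        while s in A:                           # until a terminal state
            a = epsilon_greedy(s, A[s], Q, eps)        # step (23)
            r, s_next = env_step(s, a)                 # step (24): observe QoS
            future = max((Q.get((s_next, a2), 0.0) for a2 in A.get(s_next, [])),
                         default=0.0)
            q_new = (1 - sigma) * Q.get((s, a), 0.0) \
                    + sigma * (r + gamma * future)
            max_delta = max(max_delta, abs(q_new - Q.get((s, a), 0.0)))
            Q[(s, a)] = q_new
            # step (25): grow the GP training set; a numeric encoding of
            # (s, a) would be needed before calling predict_q on it.
            history_Z.append((s, a))
            history_f.append(q_new)
            s = s_next
        if max_delta < q_th:                    # step (26): convergence
            break
    return Q                # a greedy policy over Q gives the composition
```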

Claims (5)

1. A service composition method combining Gaussian process and reinforcement learning, characterized by comprising the following steps:
(1) modeling the service composition problem as a four-tuple Markov decision process;
(2) solving the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy;
(3) mapping the optimal policy to a web service composition workflow.
2. The service composition method combining Gaussian process and reinforcement learning according to claim 1, characterized in that in step (1) the service composition problem is modeled as the following four-tuple Markov decision process:
M = <S, A, P, R>
where S is the set of finite states of the environment; A is the set of callable actions, with A(s) denoting the set of actions available in state s; P is the state transition function of the MDP, with P(s′|s,a) denoting the probability of transferring to state s′ after calling action a in state s; R is the reward function, with R(s,a) denoting the reward obtained by calling action a in state s.
3. The service composition method combining Gaussian process and reinforcement learning according to claim 2, characterized in that step (2) solves the four-tuple Markov decision process with a Q-learning-based reinforcement learning method to obtain the optimal policy, comprising the following steps:
(21) taking the state-action pair z = <s,a> as input and the corresponding Q value Q(z) as output, building a Gaussian prediction model of the Q values;
(22) initializing the Q-learning learning rate σ, the discount rate γ, the greedy policy probability ε, the current state s = 0, and the current time step t = 0;
(23) selecting a service a as the current service a_t with the ε-greedy policy and executing it;
(24) recording the reward r_t of executing the current service a_t in the current state s_t and the state s_{t+1} after executing the service; computing the Q value of the state-action pair z_t = <s_t,a_t> according to:

$$Q(z_t) = (1-\sigma)\,Q(z_t) + \sigma\left(r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right)$$

where Q(z_t) is the Q value of the state-action pair z_t = <s_t,a_t>, σ is the learning rate, r is the reward, γ is the discount rate, s_{t+1} is the successor state reached from the current state s_t after executing service a_t, a_{t+1} is the service selected in state s_{t+1}, and Q(s_{t+1},a_{t+1}) denotes the Q value of the state-action pair <s_{t+1},a_{t+1}>;
(25) updating the Q values according to the Gaussian prediction model:

$$Q(z_{t+1}) = \left(\left[K(Z,Z) + \omega_n^2 I\right]^{-1}\bar{f}\right)^{\top} K(Z, z_{t+1})$$

where I is the identity matrix, ω_n is the noise parameter, Z is the set of historical state-action pairs, f̄ is the set of historical Q values corresponding to Z, K(Z,Z) is the covariance matrix between the historical state-action pairs, whose element in row i and column j is k(z_i,z_j), k(·) being the kernel function, and K(Z,z_{t+1}) is the covariance matrix between the historical state-action pairs and the newly input state-action pair z_{t+1};
updating the Gaussian prediction model with the state-action pair z_{t+1} = <s_{t+1},a_{t+1}> and its corresponding Q value Q(z_{t+1});
(26) updating the current state: s_t = s_{t+1}; when s_t is a terminal state and the convergence condition is met, reinforcement learning ends and the optimal policy is obtained; otherwise going to step (23).
4. The service composition method combining Gaussian process and reinforcement learning according to claim 3, characterized in that the kernel function k(·) in the Gaussian prediction model is the Gaussian kernel:

$$k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma_k^2\right)$$

where σ_k is the width of the Gaussian kernel.
5. The service composition method combining Gaussian process and reinforcement learning according to claim 3, characterized in that the convergence condition in step (26) is: the change of the Q values is less than the threshold Q_th, i.e. |Q(z_t) − Q(z_{t+1})| < Q_th.
CN201710055817.2A 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning Active CN106850289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710055817.2A CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710055817.2A CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Publications (2)

Publication Number Publication Date
CN106850289A true CN106850289A (en) 2017-06-13
CN106850289B CN106850289B (en) 2020-04-24

Family

ID=59120622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710055817.2A Active CN106850289B (en) 2017-01-25 2017-01-25 Service combination method combining Gaussian process and reinforcement learning

Country Status (1)

Country Link
CN (1) CN106850289B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103248693A * 2013-05-03 2013-08-14 东南大学 Large-scale adaptive composite service optimization method based on multi-agent reinforcement learning
CN103646008A * 2013-12-13 2014-03-19 东南大学 Web service composition method
CN105046351A * 2015-07-01 2015-11-11 内蒙古大学 Reinforcement-learning-based service composition method and system in an uncertain environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGBING WANG et al.: "Integrating Gaussian Process with Reinforcement Learning for Adaptive Service Composition", Lecture Notes in Computer Science *
WU QIN: "Research on QoS-aware Service Composition Optimization Based on Reinforcement Learning", China Masters' Theses Full-text Database, Information Science and Technology *
ZHAO HAIYAN et al.: "Service Composition Based on a Multi-Agent Learning Mechanism", Computer Engineering & Science *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319852A * 2018-02-08 2018-07-24 北京安信天行科技有限公司 Event discrimination strategy creation method and device
CN108319852B * 2018-02-08 2022-05-06 北京安信天行科技有限公司 Event discrimination strategy creation method and device
CN108972546A * 2018-06-22 2018-12-11 华南理工大学 Robot constant-force curved surface tracking method based on reinforcement learning
CN108972546B * 2018-06-22 2021-07-20 华南理工大学 Robot constant-force curved surface tracking method based on reinforcement learning
CN108958916A * 2018-06-29 2018-12-07 杭州电子科技大学 Workflow offloading optimization method in a mobile edge environment
CN108958916B * 2018-06-29 2021-06-22 杭州电子科技大学 Workflow offloading optimization method in a mobile edge environment
CN109388484A * 2018-08-16 2019-02-26 广东石油化工学院 Multi-resource cloud job scheduling method based on the Deep Q-network algorithm
CN109388484B * 2018-08-16 2020-07-28 广东石油化工学院 Multi-resource cloud job scheduling method based on the Deep Q-network algorithm
CN109670637A * 2018-12-06 2019-04-23 苏州科技大学 Building energy consumption prediction method, storage medium, device and system
CN112101695A * 2019-06-17 2020-12-18 唯慕思解决方案株式会社 Simulation-based reinforcement learning and in-plant scheduling method and device
CN113065284A * 2021-03-31 2021-07-02 天津国科医工科技发展有限公司 Q-learning-based parameter optimization strategy calculation method for a triple quadrupole mass spectrometer
CN113065284B * 2021-03-31 2022-11-01 天津国科医工科技发展有限公司 Q-learning-based parameter optimization strategy calculation method for a triple quadrupole mass spectrometer

Also Published As

Publication number Publication date
CN106850289B (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN106850289A (en) With reference to Gaussian process and the service combining method of intensified learning
US11675940B2 (en) Generating integrated circuit floorplans using neural networks
Xu et al. Determining China's CO2 emissions peak with a dynamic nonlinear artificial neural network approach and scenario analysis
Li et al. Deep learning based densely connected network for load forecasting
CN110235148A Training action selection neural networks
CN107241213A Web service composition method based on deep reinforcement learning
Deal et al. The role of multidirectional temporal analysis in scenario planning exercises and Planning Support Systems
CN106875004B (en) Composite mode neuronal messages processing method and system
CN108932671A LSTM wind power load forecasting method with parameters tuned by a deep Q neural network
CN106600065B (en) Method and system for extracting and splicing personalized learning path based on directed hypergraph
Papageorgiou et al. Application of fuzzy cognitive maps to water demand prediction
CN105894372A (en) Method and device for predicting group credit
CN110378488A (en) Federal training method, device, training terminal and the storage medium of client variation
CN112541302A (en) Air quality prediction model training method, air quality prediction method and device
CN112184089B (en) Training method, device and equipment of test question difficulty prediction model and storage medium
Liu et al. Large-scale and adaptive service composition based on deep reinforcement learning
Cui et al. Construction and Development of Modern Brand Marketing Management Mode Based on Artificial Intelligence
Vliet et al. FCMs as a common base for linking participatory products and models
US20220269835A1 (en) Resource prediction system for executing machine learning models
Huang et al. An Online Inference-Aided Incentive Framework for Information Elicitation Without Verification
CN113239034A (en) Big data resource integration method and system based on artificial intelligence and cloud platform
CN106878403A Heuristic service composition method based on recent exploration
CN114492905A (en) Customer appeal rate prediction method and device based on multi-model fusion and computer equipment
Bohner Decision-support systems for sustainable urban planning
CN112348175A (en) Method for performing feature engineering based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant