CN105046351A - Reinforcement learning-based service combination method and system in uncertain environment

Reinforcement learning-based service combination method and system in uncertain environment

Info

Publication number
CN105046351A
CN105046351A
Authority
CN
China
Prior art keywords
service
qos
state
learning
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510376842.1A
Other languages
Chinese (zh)
Inventor
于磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University filed Critical Inner Mongolia University
Priority to CN201510376842.1A priority Critical patent/CN105046351A/en
Publication of CN105046351A publication Critical patent/CN105046351A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a reinforcement learning-based service combination method and system in an uncertain environment. The method includes: S1. receiving a service request; S2. obtaining an optimal strategy according to a learning algorithm trained in advance; and S3. invoking a service according to the optimal strategy. The reinforcement learning-based service combination method in the uncertain environment solves the problems of slow learning speed and low service combination success rate found in existing service combination methods.

Description

Reinforcement learning-based service composition method and system in an uncertain environment
Technical field
The present invention relates to the field of computer technology, and in particular to a reinforcement learning-based service composition method and system for uncertain environments.
Background technology
The Internet environment is open and dynamic, and Web services are distributed, heterogeneous, autonomous, and subject to dynamic change. Together these characteristics introduce two kinds of uncertainty into services: uncertainty in the result of a Web service invocation and uncertainty in the quality of service (QoS). On the one hand, Web service behavior is uncertain: the execution result of a Web service cannot be determined in advance. On the other hand, QoS indicators of a Web service, such as response time and availability, depend on external factors such as network delay; uncontrollable network delay makes the QoS of a service uncertain as well. Even QoS indicators that appear to be under the service's own control (indicators that can be improved by upgrading software and hardware processing capacity) may follow a probability distribution because of factors such as system load. For example, the execution time of a Web service varies with the system load at different times, so the most recent request-processing time should not be used directly as the expected value of the next execution time. QoS information is therefore difficult to guarantee over the long term and in a stable way, which degrades the service composition success rate and the quality of the composite service. Accordingly, when uncertainty is taken into account, how to provide an optimized service composition strategy based on tacit knowledge of the business logic, and how to measure the QoS of Web services accurately and manage it adaptively and dynamically, are of great significance to research on reliable service composition methods.
Because both the QoS and the result of a Web service invocation are uncertain, a Web service composition method should fully account for the execution results of the services that cause this uncertainty, for example, which interval the QoS value falls into at invocation time and whether the invocation succeeds. Once service uncertainty is considered, the Web service composition can be adjusted adaptively so that it fits the execution environment under different conditions. A Web service composition should therefore not be a single scheme built for specific conditions, but an optimized strategy over a set of uncertain conditions. Markov decision processes (MDPs) can be used to guide Web service composition: an MDP makes decisions according to the actual current state of the system, but in many cases the exact state of the system is difficult to obtain. The improved Markov decision process proposed in this patent is an extension of the MDP. The improved MDP assumes that the system's state information follows a certain probability distribution, so it can model a system whose state information is only available as a probability distribution and make decisions from the current, incomplete information.
Two service composition methods have been proposed in the prior art. The first is an adaptive service composition method that maps each service to an action in a Markov decision process and builds a network graph of the composite service; each path in the computed result is a generated workflow path.
The second proposes measures for each stochastic QoS index of a Web service together with an adaptive QoS management architecture, and uses the dynamic control method for stochastic discrete event systems, the Markov decision process, to design a reliable, stochastic-QoS-aware Web service composition algorithm. Experimental results show that a QoS measurement method and QoS management architecture that account for randomness, together with an MDP that balances "risk" and "reward", effectively improve the service composition success rate.
The defect of the first method is that as the number of tasks and the number of planning states increase sharply, its computation time also increases significantly. This is because the method has a high time complexity and does not update the optimal strategy in real time, so the computation time is long and the learning speed is not fast enough.
The defect of the second method is that the MDP-based approach simply uses the reliability of the service invocation, or the benefit of invoking the service, as the service transition probability, and does not consider partially observable QoS variables. In addition, it does not map QoS grades to observable data with a probability distribution so as to suit real service environments, which limits the improvement of the composition success rate.
Summary of the invention
To address the defects of the prior art, the present invention provides a reinforcement learning-based service composition method and system for uncertain environments, which solve the problems of slow learning speed and low service composition success rate in existing service composition methods.
To solve the above technical problems, the present invention provides the following technical solutions.
In a first aspect, the present invention provides a reinforcement learning-based service composition method for an uncertain environment, comprising:
S1. receiving a service request;
S2. obtaining an optimal strategy according to a learning algorithm trained in advance;
S3. invoking services according to the optimal strategy.
Further, the training process of the learning algorithm comprises:
S01. establishing an original service QoS sample library;
S02. initializing the service QoS sample library;
S03. performing reinforcement learning on the QoS data;
S04. detecting changes in the service QoS sample library;
S05. if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space;
S06. if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update;
S07. judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step S09; otherwise, performing step S08;
S08. outputting the training result;
S09. modifying the original service QoS sample library and re-executing step S03.
Further, performing reinforcement learning on the QoS data in step S03 comprises:
S031. inputting the original service sample library;
S032. building an improved Markov decision process (MDP) model;
wherein the Markov decision process model is M = (S, A, T, R, Z, W, H), where S is the set of states, A is the set of actions, T is the state transition probability, R denotes the reward for performing action A in state S, Z is the set of QoS values of the step function, W is the probability that performing action A in state S yields QoS interval value Z, and H is the planning horizon;
the Web service composition is represented with the improved MDP as follows:
S: states; each Web service represents a state, so the composite service becomes a flow graph from an initial state to a final state; at the current service, if the next service is selected successfully, the process moves to that service, otherwise services continue to be selected until one succeeds;
A: actions, i.e., invocations of Web services;
T: S × A × S → [0,1] is the state transition function, which gives the probability distribution over the next service after a given invocation is performed in the current state; T(s, a, s') = P(s'|s, a) is the probability that the service changes to s' after invocation a is performed in service s;
R: the reward function, which gives the return obtained by taking action a in the current service s and arriving at the next service s';
Z: QoS interval values, the set of all QoS grades that the service caller may observe;
W: S × A × Z → [0,1] is the observation function, which gives the probability distribution over observations given the current state and the invocation action of the previous step; W(a, s', z) = Pr(z|a, s') is the probability that the observed service grade is z when the service becomes s' after invocation a;
H: the planning horizon, i.e., the number of planning steps, which may be finite or infinite; the discount factor γ ∈ [0,1] describes the size of the discount and makes the value of a reward decrease as the number of planning steps increases, so the reward obtained at step h of the plan is discounted by γ^h;
S033. computing the update function based on the improved MDP model;
S034. outputting the learning result.
Further, computing the update function based on the MDP model in step S033 comprises:
the update function is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
according to the update function, the reinforcement learning-based service composition method is as follows:
S11. randomly initializing Q(s, a);
S12. setting the number of episodes;
S13. for each episode, performing step S14 until the set number of episodes is reached;
S14. initializing s, and selecting a_t from s_t using the ε-greedy strategy derived from Q(s, a);
for each step of each episode, performing action a_t, observing r_t and s_{t+1}, selecting a_{t+1} from s_{t+1} using the ε-greedy strategy derived from Q(s, a), updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)], and setting s_t ← s_{t+1}, a_t ← a_{t+1}, until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t, r_t are the state, action, and reward of the previous step, all initially empty;
wherein the strategy determines the composition scheme, the reinforcement learning-based service composition method performs the composition according to the learned strategy, and the learning result is the learned optimal strategy.
In a second aspect, the present invention further provides a reinforcement learning-based service composition system for an uncertain environment, comprising:
a receiving unit, configured to receive a service request;
an optimal strategy acquiring unit, configured to obtain an optimal strategy according to a learning algorithm trained in advance;
a composite service providing unit, configured to invoke services according to the optimal strategy.
Further, the system further comprises a training unit, configured to train the learning algorithm;
wherein the training unit is specifically configured to perform the following operations:
S01. establishing an original service QoS sample library;
S02. initializing the service QoS sample library;
S03. performing reinforcement learning on the QoS data;
S04. detecting changes in the service QoS sample library;
S05. if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space;
S06. if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update;
S07. judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step S09; otherwise, performing step S08;
S08. outputting the training result;
S09. modifying the original service QoS sample library and re-executing step S03.
Further, when performing step S03, the training unit is specifically configured to perform the following operations:
S031. inputting the original service sample library;
S032. building an improved Markov decision process (MDP) model;
wherein the Markov decision process model is M = (S, A, T, R, Z, W, H), where S is the set of states, A is the set of actions, T is the state transition probability, R denotes the reward for performing action A in state S, Z is the set of QoS values of the step function, W is the probability that performing action A in state S yields QoS interval value Z, and H is the planning horizon;
the Web service composition is represented with the improved MDP as follows:
S: states; each Web service represents a state, so the composite service becomes a flow graph from an initial state to a final state; at the current service, if the next service is selected successfully, the process moves to that service, otherwise services continue to be selected until one succeeds;
A: actions, i.e., invocations of Web services;
T: S × A × S → [0,1] is the state transition function, which gives the probability distribution over the next service after a given invocation is performed in the current state; T(s, a, s') = P(s'|s, a) is the probability that the service changes to s' after invocation a is performed in service s;
R: the reward function, which gives the return obtained by taking action a in the current service s and arriving at the next service s';
Z: QoS interval values, the set of all QoS grades that the service caller may observe;
W: S × A × Z → [0,1] is the observation function, which gives the probability distribution over observations given the current state and the invocation action of the previous step; W(a, s', z) = Pr(z|a, s') is the probability that the observed service grade is z when the service becomes s' after invocation a;
H: the planning horizon, i.e., the number of planning steps, which may be finite or infinite; the discount factor γ ∈ [0,1] describes the size of the discount and makes the value of a reward decrease as the number of planning steps increases, so the reward obtained at step h of the plan is discounted by γ^h;
S033. computing the update function based on the improved MDP model;
S034. outputting the learning result.
Further, when performing step S033, the training unit is specifically configured to perform the following operations:
the update function is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
according to the update function, the reinforcement learning-based service composition method is as follows:
S11. randomly initializing Q(s, a);
S12. setting the number of episodes;
S13. for each episode, performing step S14 until the set number of episodes is reached;
S14. initializing s, and selecting a_t from s_t using the ε-greedy strategy derived from Q(s, a);
for each step of each episode, performing action a_t, observing r_t and s_{t+1}, selecting a_{t+1} from s_{t+1} using the ε-greedy strategy derived from Q(s, a), updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)], and setting s_t ← s_{t+1}, a_t ← a_{t+1}, until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t, r_t are the state, action, and reward of the previous step, all initially empty;
wherein the strategy determines the composition scheme, the reinforcement learning-based service composition method performs the composition according to the learned strategy, and the learning result is the learned optimal strategy.
As can be seen from the above technical solutions, the reinforcement learning-based service composition method of the present invention for uncertain environments models the QoS information of user service invocations as specific probability distributions and then maps these probability distributions into a Markov decision process. The present invention uses a real-time Q-value update function in the reinforcement learning algorithm, which accelerates learning and the acquisition of the optimal strategy. When the number of services or the service QoS attributes changes, the proposed method still adapts, which follows from the rapid convergence of the algorithm.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the reinforcement learning-based service composition method in an uncertain environment provided by the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the training process of the learning algorithm;
Fig. 3 is a schematic diagram of the process of performing reinforcement learning on QoS data;
Fig. 4 is a schematic diagram of a supply chain;
Fig. 5 is an optimized strategy graph;
Fig. 6 is the optimized strategy graph after changes;
Fig. 7 compares the composition success rates of the method of the present invention and the second background-art method;
Fig. 8 compares the time consumption of the method of the present invention and the second background-art method;
Fig. 9 compares the learning speed of the method of the present invention and the first background-art method with respect to the number of services;
Fig. 10 compares the learning speed of the method of the present invention and the first background-art method with respect to the number of actions;
Fig. 11 compares the average cumulative reward of the method of the present invention and the first background-art method under different QoS change rates;
Fig. 12 compares the average cumulative reward of the method of the present invention and the first background-art method under different service change rates;
Fig. 13 is a structural diagram of the reinforcement learning-based service composition system in an uncertain environment provided by the second embodiment of the present invention.
Embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The present invention proposes a reinforcement learning-based service composition method for uncertain environments. Targeting the two kinds of uncertainty in Web service composition, namely the uncertainty of the service invocation result and the uncertainty of the QoS, the method uses a discrete-time Markov decision process to model the service composition under uncertainty, takes the aggregated QoS value of a service as the immediate reward, and finally obtains the optimal strategy for the composition. The proposed method does not need to know the transition probability of each state; it models the service composition with QoS values that follow a probability distribution. It uses a machine learning algorithm to obtain the optimal composite service, and the machine learning algorithm can learn a strategy when the model is unknown. Experimental results show that the method has a short learning cycle and strong adaptability.
The first embodiment of the present invention provides a reinforcement learning-based service composition method for an uncertain environment. Referring to Fig. 1, the method comprises the following steps:
Step 101: receiving a service request.
Step 102: obtaining an optimal strategy according to a learning algorithm trained in advance.
Step 103: invoking services according to the optimal strategy.
The solution of the present invention comprises two processes: the training of the reinforcement learning method and the service composition itself. Referring to Fig. 2, the training process of the learning algorithm comprises:
Step 201: establishing the original service QoS sample library.
Step 202: initializing the service QoS sample library.
In this step, the historical invocation records of each service are sorted, including the statistics of the service invocation results and the concrete QoS values of the service at each invocation. The statistics of the invocation results in turn include whether each invocation succeeded or failed and the concrete values output by the service.
Step 203: performing reinforcement learning on the QoS data.
In this step, the reinforcement learning method adopts an improved Q-learning method; the detailed process is described in steps 301-304 below.
Step 204: detecting changes in the service QoS sample library.
Step 205: if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space.
Step 206: if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update.
Step 207: judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step 209; otherwise, performing step 208.
Step 208: outputting the training result.
Step 209: modifying the original service QoS sample library and re-executing step 203.
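A minimal Python sketch of the training loop in steps 201-209 is given below for illustration. The names QoSSampleLibrary, run_q_learning, satisfies_request, and refresh_library are assumptions introduced for this sketch and are not part of the present invention; the reinforcement learning of step 203 is treated as a callable supplied by the caller that returns a learned strategy.

```python
# Illustrative sketch of the training loop of steps 201-209 (Fig. 2).
# All names below are assumptions for this sketch only.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class QoSSampleLibrary:
    # service id -> invocation history of (invocation succeeded?, observed QoS grade)
    records: Dict[str, List[Tuple[bool, int]]] = field(default_factory=dict)

def train(library: QoSSampleLibrary,
          request: dict,
          run_q_learning: Callable[[QoSSampleLibrary], dict],
          satisfies_request: Callable[[QoSSampleLibrary, dict], bool],
          refresh_library: Callable[[QoSSampleLibrary, dict], QoSSampleLibrary]) -> dict:
    known_services = set(library.records)                          # step 202: initialise the library view
    known_qos = {s: list(r) for s, r in library.records.items()}
    strategy = run_q_learning(library)                             # step 203: improved Q-learning (steps 301-304)
    while True:
        if set(library.records) != known_services:                 # steps 204-205: service count changed,
            known_services = set(library.records)                  # so rebuild the state space by relearning
            strategy = run_q_learning(library)
        elif any(library.records.get(s) != known_qos.get(s) for s in library.records):
            known_qos = {s: list(r) for s, r in library.records.items()}   # step 206: note updated QoS values
        if satisfies_request(library, request):                    # step 207: does the data fit the request?
            return strategy                                        # step 208: output the training result
        library = refresh_library(library, request)                # step 209: modify the library ...
        strategy = run_q_learning(library)                         # ... and re-execute step 203
```

In this sketch the learning of step 203 is supplied by the caller, so the loop itself only tracks changes in the sample library and decides when to relearn.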
Referring to Fig. 3, performing reinforcement learning on the QoS data in step 203 comprises:
Step 301: inputting the original service sample library.
Step 302: building the improved Markov decision process (MDP) model.
In this step, to solve the Web service composition problem in an uncertain environment, the definition of the Markov decision process is improved to M = (S, A, T, R, Z, W, H), where S is the set of states, A is the set of actions, T is the state transition probability, R denotes the reward for performing action A in state S, Z is the set of QoS values of the step function, W is the probability that performing action A in state S yields QoS interval value Z, and H is the planning horizon. The goal of the Web service composition is to find an optimal strategy that specifies the operation each candidate service should take so that the cumulative return of the composite service is maximized.
The Web service composition is represented with the improved MDP as follows.
S: states; each Web service represents a state. The composite service thus becomes a flow graph from an initial state to a final state: at the current service, if the next service is selected successfully, the process moves to that service; otherwise services continue to be selected until one succeeds.
A: actions, i.e., invocations of Web services. The result of invoking a Web service is uncertain.
T: S × A × S → [0,1] is the state transition function, which gives the probability distribution over the next service after a given invocation is performed in the current state. T(s, a, s') = P(s'|s, a) is the probability that the service changes to s' after invocation a is performed in service s. In effect, the MDP incorporates the uncertainty of the invocation outcome into planning through the state transition function. Because the next state s' is affected only by the current state s and the currently taken action a, the state transition function T has the Markov property. The transition probability can be learned by Bayesian methods or reinforcement learning, or the reliability of the service invocation can simply be used as the transition probability.
R: the reward (reward and penalty) function, which gives the return obtained by taking action a in the current service s and arriving at the next service s'. Introducing the reward function is one of the major differences between an MDP and classical planning models. Through the reward function, the MDP model converts the goal of planning into maximizing the long-term gain, so the gain obtained by any single step does not play a decisive role. In Web service composition, the reward can reflect a user's preference for a Web service, for example a preference for a particular supplier.
Z: QoS interval values, i.e., the set of all QoS grades that the service caller may observe. In an environment with uncertain QoS, complicating factors such as the network and the load prevent the service caller from obtaining complete QoS information for a service immediately, so QoS information is only partially observable. A QoS grade is a composite rating of the QoS attributes, built from the five common QoS attributes: execution time (Time), execution cost (Cost), reliability, availability, and reputation. Because the QoS attributes have different value ranges and different units, the relative importance of each attribute cannot be weighed during service composition; the QoS attributes should therefore be normalized to a unified scale and then converted into QoS grades, for example three grades. Furthermore, because the QoS attributes of a Web service are composite, fuzzy, dynamic, and correlated, the assessed QoS grade carries some uncertainty; for example, it may be grade 1 30% of the time, grade 2 60% of the time, and grade 3 10% of the time.
W: S × A × Z → [0,1] is the observation function, which gives the probability distribution over observations given the current state and the invocation action of the previous step. W(a, s', z) = Pr(z|a, s') is the probability that the observed service grade is z when the service becomes s' after invocation a.
H: the planning horizon, i.e., the number of planning steps, which may be finite or infinite. Considering only the gain of the currently taken action cannot achieve the maximum return in the future, so several subsequent steps of actions must be planned. Every step in the plan yields a gain, and gains obtained later are worth less, which makes the cumulative gain converge. The discount factor γ ∈ [0,1] describes the size of the discount and makes the value of a reward decrease as the number of planning steps increases; the reward obtained at step h of the plan is discounted by γ^h.
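To make the tuple M = (S, A, T, R, Z, W, H) concrete, the following Python sketch encodes the improved MDP as a plain data structure; the container class and the toy entries (two services, one invocation, three QoS grades) are assumptions for illustration only, not the invention's implementation.

```python
# Illustrative encoding of the improved MDP M = (S, A, T, R, Z, W, H); assumed names and toy values.
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

State, Action, Grade = str, str, int

@dataclass
class ImprovedMDP:
    states: FrozenSet[State]                               # S: one state per Web service
    actions: FrozenSet[Action]                             # A: Web service invocations
    transition: Dict[Tuple[State, Action, State], float]   # T(s, a, s') = P(s' | s, a)
    reward: Dict[Tuple[State, Action, State], float]       # R(s, a, s')
    qos_grades: FrozenSet[Grade]                           # Z: observable QoS grades
    observation: Dict[Tuple[Action, State, Grade], float]  # W(a, s', z) = Pr(z | a, s')
    horizon: int                                           # H: planning horizon (finite case)
    gamma: float                                           # discount factor γ

# Toy instance: two services, one invocation, and the three-grade distribution used as an example above.
mdp = ImprovedMDP(
    states=frozenset({"retailer", "manufacturer1"}),
    actions=frozenset({"order_from_m1"}),
    transition={("retailer", "order_from_m1", "manufacturer1"): 0.9,
                ("retailer", "order_from_m1", "retailer"): 0.1},
    reward={("retailer", "order_from_m1", "manufacturer1"): 1.0,
            ("retailer", "order_from_m1", "retailer"): -0.2},
    qos_grades=frozenset({1, 2, 3}),
    observation={("order_from_m1", "manufacturer1", 1): 0.3,   # grade 1 30% of the time
                 ("order_from_m1", "manufacturer1", 2): 0.6,   # grade 2 60% of the time
                 ("order_from_m1", "manufacturer1", 3): 0.1},  # grade 3 10% of the time
    horizon=10,
    gamma=0.9,
)
```

An instance such as mdp above is the kind of input that step 303 consumes when computing the update function.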
Step 303: computing the update function based on the improved MDP model.
In this step, computing the update function based on the improved MDP model comprises the following.
The update function is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
according to the update function, the reinforcement learning-based service composition method comprises:
Step 401: randomly initializing Q(s, a);
Step 402: setting the number of episodes;
Step 403: for each episode, performing step 404 until the set number of episodes is reached;
Step 404: initializing s, and selecting a_t from s_t using the ε-greedy strategy derived from Q(s, a);
for each step of each episode, performing action a_t, observing r_t and s_{t+1}, selecting a_{t+1} from s_{t+1} using the ε-greedy strategy derived from Q(s, a), updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)], and setting s_t ← s_{t+1}, a_t ← a_{t+1}, until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t, r_t are the state, action, and reward of the previous step, all initially empty;
wherein the strategy determines the composition scheme, the reinforcement learning-based service composition method performs the composition according to the learned strategy, and the learning result is the learned optimal strategy.
In general, when selecting an action, Q-learning must trade off exploring new environment information (exploration) against exploiting the existing optimized strategy (exploitation), so as to avoid being trapped in a local optimum. Previous literature adopts the ε-greedy strategy, which selects the action with the largest current Q value with probability ε (0 < ε < 1) and tries a new action with probability 1 − ε; this ensures that every action is learned while giving priority to the action with the largest Q value, which both guarantees the learning result and accelerates convergence. On top of the ε-greedy strategy, this step takes the probability distribution of the QoS values into account and proposes a method suited to service composition in an uncertain environment. The update function of the new method is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
As the above equation shows, the value of the current action depends not only on the adjacent state and action but also on the QoS grade of the next service and the probability associated with that grade. In addition, the update formula removes the max function, so the strategy learned in real time can be used to accelerate the search for the optimal strategy. According to the above update formula, the learning-based service composition method proposed in this step is as follows.
Learning-based service composition method:
Here the strategy determines the composition scheme, and the composition method of the present invention performs the composition according to the learned strategy. The result of the method is the learned optimal strategy, i.e., the optimal service composition scheme, and the method can adapt to a changing environment by changing the learned strategy.
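A minimal Python sketch of the learning-based composition method (steps 401-404) under the above update rule follows. The environment interface (reset/step), the conventional form of the ε-greedy selection (exploring with probability ε), and the reading of the Z × W factor as the expected observable QoS grade, Σ_z z·W(a, s', z), are assumptions made for this sketch and do not represent the invention's own implementation.

```python
# Illustrative sketch of the learning-based service composition method (steps 401-404).
import random
from collections import defaultdict

def expected_qos_grade(observation, action, next_state, grades):
    """Assumed reading of the Z x W factor: expected observable QoS grade after (action, next_state)."""
    return sum(z * observation.get((action, next_state, z), 0.0) for z in grades)

def epsilon_greedy(q, state, actions, epsilon):
    if random.random() < epsilon:                        # explore a new action
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])     # exploit the current Q table

def learn_composition(env, actions, observation, grades,
                      episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)                               # step 401: initialise Q(s, a)
    for _ in range(episodes):                            # steps 402-403: run the set number of episodes
        s = env.reset()                                  # step 404: initialise s
        a = epsilon_greedy(q, s, actions, epsilon)
        done = False
        while not done:                                  # until s is the final state
            s_next, r, done = env.step(a)                # invoke the service, observe r and s'
            a_next = epsilon_greedy(q, s_next, actions, epsilon)
            zw = expected_qos_grade(observation, a, s_next, grades)
            # Real-time update without the max operator, weighted by the QoS grade term.
            q[(s, a)] += alpha * (r + gamma * zw * q[(s_next, a_next)] - q[(s, a)])
            s, a = s_next, a_next                        # s_t <- s_{t+1}, a_t <- a_{t+1}
    return q                                             # the learned strategy as a Q-value table
```

Under this sketch, the returned Q-value table plays the role of the learned strategy: executing the composition then amounts to choosing, in each state, the action prescribed by that table.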
Step 304: outputting the learning result.
Previous research used a discrete-time Markov decision process to model service composition under uncertainty, took the aggregated QoS value of a service as the immediate reward, and finally obtained the optimal strategy for the composition. That approach requires the transition probability of each state to be known in advance, but transition probabilities are difficult to obtain. In addition, previous research did not consider that QoS values follow a probability distribution. The present invention extends the service composition model and models the composition with QoS values that follow a probability distribution; the proposed method uses a machine learning algorithm to obtain the optimal composite service, and the machine learning algorithm can learn a strategy when the model is unknown. Experimental results show that the method has a short learning cycle and strong adaptability.
The present invention models the QoS information of user service invocations as specific probability distributions and then maps these probability distributions into the Markov decision process. It uses a real-time Q-value update function in the reinforcement learning algorithm, which accelerates learning and the acquisition of the optimal strategy. When the number of services or the service QoS attributes changes, the proposed method still adapts, which follows from the rapid convergence of the algorithm.
The service composition method provided by the above embodiment is explained below with reference to a specific example.
The supply chain service is a composite service (see Fig. 4) established to satisfy a retailer's demand for goods. The members of the composite service are the retailer (Retailer, R), manufacturer 1 and manufacturer 2 (Manufacturer, M1, M2), the preferred supplier (Preferred Supplier, PrS), the other suppliers (Other Supplier, OS), the spot market (Spot Market, SpM), and the delivery service (Delivery service, D). In Fig. 4, manufacturer 1 and manufacturer 2 share the common other suppliers and the spot market, but only manufacturer 1 has a preferred supplier. The retailer's ordering process may be as follows: a suitable manufacturer is selected and an order is placed with it; if the manufacturer has a preferred supplier, the preferred supplier's stock is queried first; if the preferred supplier cannot satisfy the order, the manufacturer then queries the other suppliers; alternatively, a new supplier is sought on the spot market, or the required goods are bought directly from the spot market. After a supplier is selected, a delivery service is finally chosen to deliver the goods and complete the order. Several factors must be considered during execution; for example, any query or invocation may fail, in which case the process must return to the original service and re-execute, or continue selecting other services. In addition, the spot market can usually satisfy the order, but usually at a higher cost, so it cannot be the first-choice supply channel for the ordering service and must instead be chosen according to a certain strategy.
The manufacturers in the supply chain provide corresponding Web services, and these Web services constitute a business process; this business process is the composite service. As indicated earlier, Web services are subject to uncertainty in both the invocation result and the QoS. The uncertainty of the invocation result shows up as the retailer not knowing whether manufacturer 1 or manufacturer 2 is more likely to provide the required goods. The uncertainty of the QoS shows up as the spot market having plentiful cheap goods but slow delivery during one period, and the opposite during another.
In Fig. 5, the states of the MDP are represented by the services provided by the retailer, the manufacturers, the preferred supplier, the other suppliers, the spot market, and the delivery service. The actions of the MDP are the Web service invocations that are initiated; the results of these invocations are random and may or may not satisfy the demand for goods. The observations of the MDP are QoS grades, which combine the grades of indicators such as execution time, cost, reliability, and reputation. Fig. 5 is the optimized strategy graph produced by the proposed algorithm.
This optimized strategy graph is used to guide the composite service. According to this optimized scheme, the retailer places an order with manufacturer 1. If the QoS grade of manufacturer 1 is 1 (about a 67% chance), the process moves to manufacturer 1 (left node), and manufacturer 1 then queries the preferred supplier's stock. If the QoS grade of manufacturer 1 is 2 (about a 33% chance), the process still moves to manufacturer 1 (right node), and manufacturer 1 then queries the other suppliers' stock. When manufacturer 1 queries the preferred supplier's stock, two cases arise. In the first, the observed QoS grade is 1 (about a 49% chance) or 2 (about a 33% chance); both satisfy the set QoS requirement, so the order is finally passed from the preferred supplier down to the delivery service. In the second, the observed QoS grade is 3 (about an 18% chance); the query then turns to the other suppliers, whose QoS grade is 1 (about a 92% chance), and the order is finally passed from the other suppliers down to the delivery service.
The three QoS grades of manufacturer 1 form a probability distribution, and the probabilities of the three grades sum to 1. When the QoS grade distribution of manufacturer 1 is not always biased toward the high QoS grades, and the QoS grade distributions of the other manufacturers are close to uniform, the optimized strategy graph may become more complicated, as shown in Fig. 6.
The basic reason the strategy graph becomes more complicated is that a near-uniform probability distribution reduces the number of candidate services with an absolute advantage, and in turn the number of execution paths with an absolute advantage, which creates more selection conditions. According to this optimized strategy graph, the optimal composition scheme is as follows. The retailer first places an order with manufacturer 1. If the QoS grade of manufacturer 1 is 1 (about an 11% chance), the process moves to manufacturer 1, and manufacturer 1 then queries the preferred supplier's stock. If the QoS grade of manufacturer 1 is 2 (about a 27% chance), the process also moves to manufacturer 1, manufacturer 1 then queries the other suppliers' stock, and the subsequent strategy is roughly the same as in Fig. 5. The difference is that when the QoS grade of manufacturer 1 is 3 (about a 62% chance), the order with manufacturer 1 is canceled and an order is placed with manufacturer 2 instead. When the QoS grade of manufacturer 2 is 1 (about a 93% chance; other probabilities are ignored here for simplicity), the process moves to manufacturer 2, and manufacturer 2 then queries the other suppliers' stock. When the QoS grade of the other suppliers is 1 (about a 96% chance), the process turns to the other suppliers, and the order is finally passed from the other suppliers down to the delivery service.
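To illustrate how such an optimized strategy graph can drive the composition, the following sketch encodes part of the Fig. 5 behaviour as a lookup table from (current service, observed QoS grade) to the next invocation. The table entries and names are an assumed partial rendering of Fig. 5 based on the description above, not data taken from the drawing itself.

```python
# Assumed partial encoding of the Fig. 5 optimized strategy graph.
strategy = {
    ("retailer", None): "order_from_manufacturer1",
    ("manufacturer1", 1): "query_preferred_supplier",    # grade 1, about a 67% chance
    ("manufacturer1", 2): "query_other_suppliers",       # grade 2, about a 33% chance
    ("preferred_supplier", 1): "order_delivery",         # grades 1 and 2 meet the QoS requirement
    ("preferred_supplier", 2): "order_delivery",
    ("preferred_supplier", 3): "query_other_suppliers",  # grade 3, about an 18% chance
    ("other_suppliers", 1): "order_delivery",            # grade 1, about a 92% chance
}

def next_invocation(current_service, observed_grade):
    """Next Web service invocation prescribed by the strategy graph, or None if uncovered."""
    return strategy.get((current_service, observed_grade))

print(next_invocation("manufacturer1", 2))   # -> query_other_suppliers
```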
Experimental comparison: the method provided by the present invention and the second background-art method (Random Composition, RC) are compared in terms of service composition success rate and computation time. The number of tasks is kept the same and 30 candidate services are used; RC denotes the result of the second background-art method and LC denotes the result of the method of the present invention. As Fig. 7 shows, in an uncertain environment the composition success rate of the method of the present invention is higher. This is because general MDP-based methods simply use the reliability of the service invocation, or the benefit of invoking the service, as the service transition probability and do not consider partially observable QoS variables, whereas the method of the present invention maps QoS grades to observable data with a probability distribution, suiting real service environments and thereby improving the composition success rate.
As Fig. 8 shows, as the number of tasks increases, the computation time of the method of the present invention is significantly lower than that of the method in the literature. This is because, as the number of planning states increases sharply, the computation time of the classical method grows exponentially, whereas the algorithm adopted by the present invention updates the optimal strategy in real time and has lower time complexity, so its computation time is smaller.
The experiments show that the method adopted by the present invention, by introducing more model parameters that bring the model closer to the real situation, greatly improves the service composition success rate, and at the same time reduces the computation time for solving the problem by updating the strategy in real time.
Figs. 9 and 10 compare the method of the present invention with the first background-art method (Adaptive and Dynamic Composition, ADC); the learning method of the present invention has a shorter convergence cycle and a faster learning speed.
In a dynamic network environment, the mean QoS attribute values of services do not remain constant, so in the experiment the QoS attribute values are changed periodically with a certain probability. In Fig. 11, 5% and 10% of the QoS attribute values, respectively, are changed every 100 learning cycles and compared with the case of fixed QoS values (0%). The results show that in all three cases (0%, 5%, and 10% change), LC converges faster than ADC. As can be seen from Fig. 11, although changes in the QoS values increase the time the algorithm needs to obtain the optimized solution, they do not prevent it from obtaining that solution.
In a dynamic network environment, the number of Web services may increase because new services are established or decrease because services fail. In another group of experiments, the number of services is changed periodically in three cases, 0%, +2%, and −2%, i.e., the number of services stays constant, increases by 2%, or decreases by 2%. Fig. 12 shows the comparison; the results show that reducing the number of services shortens the solving time, while increasing the number of services lengthens it.
The second embodiment of the present invention provides a reinforcement learning-based service composition system for an uncertain environment. Referring to Fig. 13, the system comprises a receiving unit 11, an optimal strategy acquiring unit 12, and a composite service providing unit 13;
the receiving unit 11 is configured to receive a service request;
the optimal strategy acquiring unit 12 is configured to obtain an optimal strategy according to a learning algorithm trained in advance;
the composite service providing unit 13 is configured to invoke services according to the optimal strategy.
Preferably, the system further comprises a training unit 14, configured to train the learning algorithm;
wherein the training unit 14 is specifically configured to perform the following operations:
S01. establishing an original service QoS sample library;
S02. initializing the service QoS sample library;
S03. performing reinforcement learning on the QoS data;
S04. detecting changes in the service QoS sample library;
S05. if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space;
S06. if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update;
S07. judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step S09; otherwise, performing step S08;
S08. outputting the training result;
S09. modifying the original service QoS sample library and re-executing step S03.
Further, when performing step S03, the training unit 14 is specifically configured to perform the following operations:
S031. inputting the original service sample library;
S032. building an improved Markov decision process (MDP) model;
wherein the Markov decision process model is M = (S, A, T, R, Z, W, H), where S is the set of states, A is the set of actions, T is the state transition probability, R denotes the reward for performing action A in state S, Z is the set of QoS values of the step function, W is the probability that performing action A in state S yields QoS interval value Z, and H is the planning horizon;
the Web service composition is represented with the improved MDP as follows:
S: states; each Web service represents a state, so the composite service becomes a flow graph from an initial state to a final state; at the current service, if the next service is selected successfully, the process moves to that service, otherwise services continue to be selected until one succeeds;
A: actions, i.e., invocations of Web services;
T: S × A × S → [0,1] is the state transition function, which gives the probability distribution over the next service after a given invocation is performed in the current state; T(s, a, s') = P(s'|s, a) is the probability that the service changes to s' after invocation a is performed in service s;
R: the reward function, which gives the return obtained by taking action a in the current service s and arriving at the next service s';
Z: QoS interval values, the set of all QoS grades that the service caller may observe;
W: S × A × Z → [0,1] is the observation function, which gives the probability distribution over observations given the current state and the invocation action of the previous step; W(a, s', z) = Pr(z|a, s') is the probability that the observed service grade is z when the service becomes s' after invocation a;
H: the planning horizon, i.e., the number of planning steps, which may be finite or infinite; considering only the gain of the currently taken action cannot achieve the maximum return in the future, so several subsequent steps of actions must be planned; every step in the plan yields a gain, and gains obtained later are worth less, which makes the cumulative gain converge; the discount factor γ ∈ [0,1] describes the size of the discount and makes the value of a reward decrease as the number of planning steps increases, so the reward obtained at step h of the plan is discounted by γ^h;
S033. computing the update function based on the improved MDP model;
S034. outputting the learning result.
Further, when performing step S033, the training unit 14 is specifically configured to perform the following operations:
the update function is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
according to the update function, the reinforcement learning-based service composition method is as follows:
S11. randomly initializing Q(s, a);
S12. setting the number of episodes;
S13. for each episode, performing step S14 until the set number of episodes is reached;
S14. initializing s, and selecting a_t from s_t using the ε-greedy strategy derived from Q(s, a);
for each step of each episode, performing action a_t, observing r_t and s_{t+1}, selecting a_{t+1} from s_{t+1} using the ε-greedy strategy derived from Q(s, a), updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)], and setting s_t ← s_{t+1}, a_t ← a_{t+1}, until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t, r_t are the state, action, and reward of the previous step, all initially empty;
wherein the strategy determines the composition scheme, the reinforcement learning-based service composition method performs the composition according to the learned strategy, and the learning result is the learned optimal strategy.
The system provided by this embodiment of the present invention may be used to perform the method described in the above embodiment; its principles and technical effects are similar and are not described in detail again here.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A reinforcement learning-based service composition method in an uncertain environment, characterized by comprising:
S1. receiving a service request;
S2. obtaining an optimal strategy according to a learning algorithm trained in advance;
S3. invoking services according to the optimal strategy.
2. The method according to claim 1, characterized in that the training process of the learning algorithm comprises:
S01. establishing an original service QoS sample library;
S02. initializing the service QoS sample library;
S03. performing reinforcement learning on the QoS data;
S04. detecting changes in the service QoS sample library;
S05. if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space;
S06. if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update;
S07. judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step S09; otherwise, performing step S08;
S08. outputting the training result;
S09. modifying the original service QoS sample library and re-executing step S03.
3. The method according to claim 2, characterized in that performing reinforcement learning on the QoS data in step S03 comprises:
S031. inputting the original service sample library;
S032. building an improved Markov decision process (MDP) model;
wherein the Markov decision process model is M = (S, A, T, R, Z, W, H), where S is the set of states, A is the set of actions, T is the state transition probability, R denotes the reward for performing action A in state S, Z is the set of QoS values of the step function, W is the probability that performing action A in state S yields QoS interval value Z, and H is the planning horizon;
the Web service composition is represented with the improved MDP as follows:
S: states; each Web service represents a state, so the composite service becomes a flow graph from an initial state to a final state; at the current service, if the next service is selected successfully, the process moves to that service, otherwise services continue to be selected until one succeeds;
A: actions, i.e., invocations of Web services;
T: S × A × S → [0,1] is the state transition function, which gives the probability distribution over the next service after a given invocation is performed in the current state; T(s, a, s') = P(s'|s, a) is the probability that the service changes to s' after invocation a is performed in service s;
R: the reward function, which gives the return obtained by taking action a in the current service s and arriving at the next service s';
Z: QoS interval values, the set of all QoS grades that the service caller may observe;
W: S × A × Z → [0,1] is the observation function, which gives the probability distribution over observations given the current state and the invocation action of the previous step; W(a, s', z) = Pr(z|a, s') is the probability that the observed service grade is z when the service becomes s' after invocation a;
H: the planning horizon, i.e., the number of planning steps, which may be finite or infinite; the discount factor γ ∈ [0,1] describes the size of the discount and makes the value of a reward decrease as the number of planning steps increases, so the reward obtained at step h of the plan is discounted by γ^h;
S033. computing the update function based on the improved MDP model;
S034. outputting the learning result.
4. The method according to claim 3, characterized in that computing the update function based on the MDP model in step S033 comprises:
the update function is:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
according to the update function, the reinforcement learning-based service composition method is as follows:
S11. randomly initializing Q(s, a);
S12. setting the number of episodes;
S13. for each episode, performing step S14 until the set number of episodes is reached;
S14. initializing s, and selecting a_t from s_t using the ε-greedy strategy derived from Q(s, a);
for each step of each episode, performing action a_t, observing r_t and s_{t+1}, selecting a_{t+1} from s_{t+1} using the ε-greedy strategy derived from Q(s, a), updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)], and setting s_t ← s_{t+1}, a_t ← a_{t+1}, until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t, r_t are the state, action, and reward of the previous step, all initially empty;
wherein the strategy determines the composition scheme, the reinforcement learning-based service composition method performs the composition according to the learned strategy, and the learning result is the learned optimal strategy.
5. A reinforcement learning-based service composition system in an uncertain environment, characterized by comprising:
a receiving unit, configured to receive a service request;
an optimal strategy acquiring unit, configured to obtain an optimal strategy according to a learning algorithm trained in advance;
a composite service providing unit, configured to invoke services according to the optimal strategy.
6. The system according to claim 5, characterized by further comprising a training unit configured to train the learning algorithm;
wherein the training unit is specifically configured to perform the following operations:
S01. establishing an original service QoS sample library;
S02. initializing the service QoS sample library;
S03. performing reinforcement learning on the QoS data;
S04. detecting changes in the service QoS sample library;
S05. if the number of services in the service QoS sample library has changed, deleting the corresponding services and reconstructing the service state space;
S06. if a service QoS value in the service QoS sample library has changed, updating the QoS value of the service and marking the update;
S07. judging whether the data in the service QoS sample library satisfy the current situation and the requirements of the current service request; if not, performing step S09; otherwise, performing step S08;
S08. outputting the training result;
S09. modifying the original service QoS sample library and re-executing step S03.
7. system according to claim 8, is characterized in that, described training unit perform step S03 time specifically for performing following operation:
S031. original service Sample Storehouse is inputted;
S032. the Markovian decision process MDP model of improvement is built;
Wherein, Markovian decision process model is: M=(S, A, T, R, Z, W, H); Wherein, S is the set of state, and A is the set of action, and T is state transition probability, and R performs an action under representing state S the return of A, and Z is the set of the qos value of ladder function, and W is that the A that performs an action under state S obtains the probability of QoS section value Z, and H is the stage of planning;
The Web service combination is represented with the improved MDP as follows:
S: states; each Web service represents a state, so that the composite service becomes a flow graph from an initial state to a final state; at the current service, if the next service is selected and invoked successfully, the process moves on to that next service; otherwise, service selection continues until it succeeds;
A: actions; an action is the invocation of a Web service;
T: S × A × S → [0, 1] is the state transition function, representing the probability distribution over the next service after an invocation is performed in the current state; T(s, a, s') = P(s' | s, a) denotes the probability that, with the current service being s, the service changes to s' after a is invoked;
R: the reward function, representing the reward obtainable for reaching the next service s' by taking action a at the current service s;
Z: the QoS interval values, i.e., the set of all QoS grades that may be observed by the service caller;
W: S × A × Z → [0, 1] is the observation function, representing the probability distribution over observations given the current state and the previous invocation action; W(a, s', z) = Pr(z | a, s') denotes the probability that the observed QoS grade is z when the service becomes s' after a is invoked;
H: the horizon, i.e., the number of planning steps, which may be finite or infinite; the discount factor γ ∈ [0, 1] determines how strongly rewards are discounted, reducing the value of a reward as the number of planning steps increases, so that a reward obtained at step h of the plan is weighted by γ^h;
S033. computing an update function based on said improved MDP model;
S034. outputting the learning result.
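The tuple M = (S, A, T, R, Z, W, H) of claim 7 can be held in a simple data structure; the Python dataclass below is only an illustrative assumption about how the elements might be stored, not a representation prescribed by the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ImprovedMDP:
    states: set          # S: one state per Web service, plus initial and final states
    actions: set         # A: Web service invocations
    T: dict              # T[(s, a, s_next)] = P(s_next | s, a), state transition probability
    R: dict              # R[(s, a, s_next)] = reward for reaching s_next by taking a in s
    Z: list              # Z: the observable QoS grades (interval values)
    W: dict              # W[(a, s_next, z)] = Pr(z | a, s_next), observation probability
    H: Optional[int]     # H: planning horizon in steps; None for an infinite horizon
    gamma: float = 0.9   # discount factor γ ∈ [0, 1]; a step-h reward is weighted by γ**h
```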
8. The system according to claim 7, characterized in that, when performing step S033, said training unit is specifically configured to perform the following operations:
said update function being:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
According to said update function, the reinforcement learning-based service combination method is as follows:
S11. randomly initializing Q(s, a);
S12. setting the number of episodes;
S13. for each episode, performing step S14 until the set number of episodes is reached;
S14. initializing s, and selecting a_t from s_t according to the ε-greedy strategy derived from Q(s, a);
for each step of the episode: performing action a_t; observing r_t and s_{t+1}; selecting a_{t+1} from s_{t+1} according to the ε-greedy strategy derived from Q(s, a); updating Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ × Z × W × Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]; setting s_t ← s_{t+1} and a_t ← a_{t+1}; repeating until s is the final state; wherein Q(s, a) is a Q-value table storing the value of state s and action a, t is the time step, α is the learning rate, and s_t, a_t and r_t are respectively the previous state, action and reward, which are initially empty;
wherein the strategy determines the composition scheme: the reinforcement learning-based service combination method performs the combination by learning a strategy, and the learning result is the optimal strategy obtained through learning.
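The learning loop of claim 8 can be sketched in Python as follows; the environment interface (env.reset, env.step, env.is_final) and the reading of the Z × W factor as the expected observed QoS grade Σ_z z·W(a, s', z) are assumptions made for illustration, not details fixed by the claim.

```python
import random
from collections import defaultdict

def learn(env, mdp, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of steps S11-S14; env and the Z × W reading are illustrative assumptions."""
    Q = defaultdict(float)                      # S11: Q(s, a) table, initialized here to zero

    def epsilon_greedy(s):
        # ε-greedy strategy derived from the current Q table
        if random.random() < epsilon:
            return random.choice(list(mdp.actions))
        return max(mdp.actions, key=lambda a: Q[(s, a)])

    def expected_qos(a, s_next):
        # The Z × W factor is read here as the expected observed QoS grade
        return sum(z * mdp.W.get((a, s_next, z), 0.0) for z in mdp.Z)

    for _ in range(episodes):                   # S12/S13: loop over the set number of episodes
        s = env.reset()                         # S14: initialize s
        a = epsilon_greedy(s)
        while not env.is_final(s):
            r, s_next = env.step(s, a)          # perform a_t, observe r_{t+1} and s_{t+1}
            a_next = epsilon_greedy(s_next)
            Q[(s, a)] += alpha * (r + gamma * expected_qos(a, s_next) * Q[(s_next, a_next)]
                                  - Q[(s, a)])
            s, a = s_next, a_next
    return Q                                    # the greedy policy over Q is the learned optimal strategy
```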
CN201510376842.1A 2015-07-01 2015-07-01 Reinforcement learning-based service combination method and system in uncertain environment Pending CN105046351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510376842.1A CN105046351A (en) 2015-07-01 2015-07-01 Reinforcement learning-based service combination method and system in uncertain environment

Publications (1)

Publication Number Publication Date
CN105046351A true CN105046351A (en) 2015-11-11

Family

ID=54452878

Country Status (1)

Country Link
CN (1) CN105046351A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101895581A (en) * 2010-07-16 2010-11-24 浙江大学 QoS technology perception-based dynamic web service selection method
CN102868757A (en) * 2012-09-28 2013-01-09 南京大学 Dynamic Web service combination method based on quality of service (QoC) indexes
CN103248693A (en) * 2013-05-03 2013-08-14 东南大学 Large-scale self-adaptive composite service optimization method based on multi-agent reinforced learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU, LEI: "Key Mechanisms of Web Service Composition", China Doctoral Dissertations Full-text Database, Information Science and Technology *
XU, XIAOMING: "Research on Key Technologies of QoS-Aware Web Service Intelligent Acquisition", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878403A (en) * 2017-01-25 2017-06-20 东南大学 Based on the nearest heuristic service combining method explored
CN106850289A (en) * 2017-01-25 2017-06-13 东南大学 With reference to Gaussian process and the service combining method of intensified learning
CN106850289B (en) * 2017-01-25 2020-04-24 东南大学 Service combination method combining Gaussian process and reinforcement learning
CN107241213B (en) * 2017-04-28 2020-05-05 东南大学 Web service combination method based on deep reinforcement learning
CN107241213A (en) * 2017-04-28 2017-10-10 东南大学 A kind of web service composition method learnt based on deeply
CN109117983A (en) * 2018-07-09 2019-01-01 南京邮电大学 Build method for managing resource and system, computer readable storage medium and terminal
CN110119268A (en) * 2019-05-21 2019-08-13 成都派沃特科技股份有限公司 Workflow optimization method based on artificial intelligence
CN110119399A (en) * 2019-05-21 2019-08-13 成都派沃特科技股份有限公司 Work Flow Optimizing method based on machine learning
CN110971683A (en) * 2019-11-28 2020-04-07 海南大学 Service combination method based on reinforcement learning
CN110971683B (en) * 2019-11-28 2021-06-15 海南大学 Service combination method based on reinforcement learning
CN111400031A (en) * 2020-03-01 2020-07-10 南京大学 Value function-based reinforcement learning method for processing unit deployment
CN111400031B (en) * 2020-03-01 2023-08-22 南京大学 Value function-based reinforcement learning method for processing unit deployment
CN113065284A (en) * 2021-03-31 2021-07-02 天津国科医工科技发展有限公司 Triple quadrupole mass spectrometer parameter optimization strategy calculation method based on Q learning

Similar Documents

Publication Publication Date Title
CN105046351A (en) Reinforcement learning-based service combination method and system in uncertain environment
Truong-Huu et al. A novel model for competition and cooperation among cloud providers
CN109669452A (en) A kind of cloud robot task dispatching method and system based on parallel intensified learning
CN113811915A (en) Unified order serving and fleet management for online shared travel platform
CN107241213A (en) A kind of web service composition method learnt based on deeply
Larson et al. Bargaining with limited computation: Deliberation equilibrium
CN112052071B (en) Cloud software service resource allocation method combining reinforcement learning and machine learning
US7644048B2 (en) System, method and software for cognitive automation
JP2019124990A (en) Solution search processing apparatus and solution search processing method
CN104537446B (en) Two layers of band fuzzy stochastic time window vehicle routing optimization method
Kniazieva et al. Method of strategic planning and management decision-making considering the life cycle theory
CN109831806A (en) The base station of intensive scene User oriented priority cooperates with caching method
Alameddine et al. Low-latency service schedule orchestration in NFV-based networks
CN109840625B (en) Courier group path navigation method
Santa Chávez et al. A metaheuristic ACO to solve the multi-depot vehicle routing problem with backhauls
CN111343095B (en) Method for realizing controller load balance in software defined network
Sadeghi et al. Deep reinforcement learning based coalition formation for energy trading in smart grid
Martín-Pérez et al. Dqn dynamic pricing and revenue driven service federation strategy
CN103354506A (en) IOT service structure and service combining method
CN115062868B (en) Pre-polymerization type vehicle distribution path planning method and device
Pillac et al. A fast re-optimization approach for dynamic vehicle routing
KR100687075B1 (en) Method of Business Management and Supply Chain Management using Work Flow Engine
Yu et al. Bringing reputation-awareness into crowdsourcing
Wang et al. Optimal path selection for logistics transportation based on an improved ant colony algorithm
CN113240189B (en) Reputation value-based dynamic vehicle task and calculation force matching method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151111