CN109214518A - Global optimization system and method based on continuous action learning automaton
Abstract
A global optimization system and method based on a continuous action learning automaton (CALA), the system comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action enters the environment, the environment feedback module obtains the environment feedback corresponding to the action, and a local optimum is obtained; the update module updates the algorithm parameters according to the environment feedback and inputs the updated parameters to the action selection module to complete one iteration, improving the smoothed function; the improved smoothed function is introduced into the environment feedback module of the next iteration, multiple iterations are performed, an extreme point is finally obtained, and the current environment feedback is input to the output module and output as the global minimum. The present invention is rationally designed: it introduces a smoothed function and improves it by adding a slope component, so that the CALA escapes local minimum solutions more easily and subsequent search becomes directional, greatly improving the convergence speed and accuracy of the algorithm.
Description
Technical field
The present invention relates to a technology in the field of learning automaton optimization, specifically a global optimization system and method based on a continuous action learning automaton.
Background art
Stochastic function optimization methods generally describe the iterative process of their solutions by probabilistic mechanisms, in contrast to the deterministic point sequences of deterministic function optimization methods; the essential characteristic of such algorithms is therefore randomness. These algorithms are widely applicable and are often used to solve problems such as large-scale continuous and discrete function optimization and combinatorial optimization. In theory they can be guaranteed to converge with probability 1 to a global optimum, but the time spent is correspondingly large. Since learning automata have advantages such as very strong anti-interference ability and global optimization ability, they show good application prospects when applied to function optimization.
A learning automaton (LA) is built by simulating the process in which a living organism, born with very little innate knowledge, learns through continuous interaction with a random environment. Through continuous interaction with the environment, the LA adjusts its decision probability vector and adaptively learns an optimal action, where the optimal action is the action with the maximum reward probability. The operation of a learning automaton can thus be described simply as a series of repeated feedback loops of interaction with the environment.
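The interaction loop described above can be sketched as follows. This is a minimal illustration of a finite-action learning automaton using the linear reward-inaction (L_RI) scheme, not the patent's CALA; the environment and reward probabilities are placeholder assumptions:

```python
import random

def learning_automaton(reward_prob, steps=20000, lr=0.01):
    """Minimal two-action learning automaton (L_RI scheme, illustrative only).

    reward_prob[i] is the probability that the random environment rewards
    action i; the automaton adapts its decision probability vector toward
    the action with the maximum reward probability."""
    p = [0.5, 0.5]  # decision probability vector
    for _ in range(steps):
        a = 0 if random.random() < p[0] else 1    # select an action
        if random.random() < reward_prob[a]:      # environment feedback
            p[a] += lr * (1.0 - p[a])             # reinforce rewarded action
            p[1 - a] = 1.0 - p[a]
    return p

random.seed(0)
p = learning_automaton([0.1, 0.9])
print(p)  # probability mass concentrates on the better action
```

After repeated interaction, the probability of selecting the action with the higher reward probability approaches 1, which is exactly the "optimal action" described above.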
According to the type of action set, learning automata are divided into finite action learning automata (FALA) and continuous action learning automata (CALA). The action set of a FALA has a finite number of elements, while the action set of a CALA is infinite, generally a representative interval taken from the real axis. In practical function optimization, real environments are often changeable and complex, so finite-action learning automata are far less applicable in random environments than continuous-action ones.
In global optimization, the most common problem is getting stuck after finding a local minimum solution, so that the globally optimal solution cannot be found. An important question in global optimization is therefore how to escape local optima. Although the classical continuous action learning automaton algorithm can converge to the optimal action, it suffers from the rather serious problems of easily falling into local minima and weak noise resistance.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention proposes a global optimization system and method based on a continuous action learning automaton (CALA), which introduces a smoothed function into the CALA algorithm and improves it, helping the search escape local optima of the global optimization objective and making subsequent search more directional.
The present invention is achieved by the following technical solutions:
The present invention relates to a global optimization system based on a continuous action learning automaton, comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
The present invention also relates to a global optimization method based on a continuous action learning automaton (CALA): first a local optimum is obtained by the existing CALA algorithm, and an improved smoothed function is obtained from this local optimum; the improved smoothed function is then introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum.
The iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
The number of iterations is the same in each round.
The improved smoothed function F′(x, x*) augments the smoothed function F(x, x*) = min(f(x), f(x*)) with a slope component controlled by the set constants a and b, so that the plateau where f(x) ≥ f(x*) is tilted away from the current local optimum; here x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
The optimized CALA algorithm comprises the following steps:
Step 1: initialize the parameters of the CALA algorithm, and set the number of iterations and the value of each constant parameter.
Step 2: randomly generate a value according to the current distribution, input this value to the environment as the action, and obtain the observation.
Step 3: obtain the corresponding feedback from the environment, i.e., introduce the improved smoothed function.
Step 4: update the mean and standard deviation of the distribution according to the feedback in Step 3.
Step 5: count the number of iterations; if the number of iterations is less than the set number per round, return to Step 2; otherwise take the observation as this round's local optimum and proceed to Step 6.
Step 6: when the local optimum obtained by this round of iteration is smaller than the local optimum obtained by the previous round, assign this round's local optimum to the previous round's local optimum point, record it, and return to Step 2 for the next round of iteration; otherwise output the point of the local optimum obtained by the previous round (i.e., the extreme point) as the global minimum and terminate the algorithm.
The iteration counter used for the standard deviation is not reset between rounds.
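The six steps above can be sketched as the following loop on a one-dimensional objective. This is a hedged sketch, not the full claimed method: the normalized feedback β = (min(f_min, f(α)) − λ1)/λ2 and the mean update are taken from the detailed embodiment below, while the standard-deviation update and the exact slope term of the improved smoothed function are only partially specified in the source and are omitted (σ stays fixed):

```python
import math
import random

def optimized_cala(f, mu=0.0, sigma=2.0, a=0.02, lam1=-15.0, lam2=30.0,
                   iters_per_round=2000, max_rounds=4):
    """Sketch of Steps 1-6 on a 1-D objective f.

    Inner loop (Steps 2-5): sample an action from N(mu, sigma), observe the
    smoothed value min(f_min, f(alpha)), normalize it to the feedback
    beta = (obs - lam1) / lam2, and apply the embodiment's mean update.
    Outer loop (Step 6): keep a new round's optimum only if it improves on
    the previous round's; otherwise stop and return the previous optimum."""
    f_min, x_star = math.inf, None
    for _ in range(max_rounds):
        best_val, best_x = math.inf, None
        for _ in range(iters_per_round):
            alpha = random.gauss(mu, sigma)             # Step 2
            obs = min(f_min, f(alpha))                  # Step 3: smoothed observation
            beta = (obs - lam1) / lam2                  # normalized feedback
            mu = mu - a * beta * (alpha - mu) * sigma   # Step 4: mean update
            if f(alpha) < best_val:
                best_val, best_x = f(alpha), alpha
        if best_val < f_min:                            # Step 6
            f_min, x_star = best_val, best_x
        else:
            break
    return x_star, f_min

random.seed(42)
x_star, f_min = optimized_cala(lambda x: x * x)
print(x_star, f_min)  # a point near the minimum of x^2 at 0
```

The outer comparison of successive rounds is what the claims call "the iteration": rounds continue only while each round's local optimum improves on the previous one.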
Technical effect
Compared with the prior art, the present invention first introduces a smoothed function to escape local optima of the global optimization objective, and then improves on the non-directional search that the existing smoothed function exhibits on plateau regions by adding a slope component. The improved smoothed function retains the advantages of the original smoothed function: it ignores all solutions that are not better than the current point, and makes it easy to jump away from the current local optimum. Since the objective now has a certain slope, the direction of subsequent search leads away from the local minimum, avoiding wasted back-and-forth movement around the current point, and the convergence speed of the algorithm is improved.
Brief description of the drawings
Fig. 1 is a flow diagram of the global optimization method.
Specific embodiment
In fields such as optimal route selection and optimal treatment (e.g., information propagation maximization, selection of an optimal drug dosage), the method first obtains a local optimum of the path using the existing CALA algorithm. Since this path may only be a locally optimal path rather than the globally optimal path, an improved smoothed function is then obtained from the local optimum; the improved smoothed function is introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum of the path selection, giving the optimal route selection scheme.
The iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
The number of iterations is the same in each round; the specific number is determined experimentally and must not be too low, otherwise the extreme point may not be found.
As shown in Fig. 1, this embodiment specifically comprises the following steps:
Step 1: initialize the parameters of the continuous action learning automaton (CALA) algorithm and set the value of each parameter.
The parameters include: the number of iterations T, the constants a, b, λ1 and λ2 (here T is set to 20000, the number of rounds m = 4, and the iteration counter n of each round starts at 0).
Set fmin = inf and t = inf, where t records the iteration at which the current minimum was found, to ensure that the first round of search does not use the smoothed function.
Step 2: according to the behaviour of the environment, the action can be assumed to follow a normal distribution; a value αn (n is the iteration number) is randomly generated from the current distribution N(μ, σ) (μ is the mean and σ the standard deviation; the initial values of these two vectors are determined by the specific environment and can be obtained as in the general CALA algorithm), and αn is input to the environment as the action.
Step 3: obtain the corresponding feedback β(αn) from the environment, i.e., introduce the improved smoothed function.
The improved smoothed function adds a term at the end of the smoothed function so that the search can jump away from the current local optimum, and it acts as the environment feedback in the CALA, thereby influencing action selection.
The smoothed function is defined as follows: when the value of the original function at some point is greater than the value of the optimal solution obtained so far, the value at that point is reassigned to the optimal solution obtained so far; when the value of the original function at some point is less than the value of the optimal solution obtained so far, the value at that point remains unchanged, i.e., F(x, x*) = min(f(x), f(x*)), where x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
The improved smoothed function F′(x, x*) replaces the plateau value f(x*) in min(f(x), f(x*)) with a sloped term controlled by the set constants a and b.
The feedback β(αn) = (min(fmin, f(αn)) − λ1)/λ2, where f(αn) is the observation of the Shubert function (the optimization benchmark function) at the point αn.
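With the embodiment's settings λ1 = −15 and λ2 = 30, this feedback maps the observed function values into the unit interval; the numeric range of the Shubert function used below is taken from the experiments later in this description:

```python
def feedback(obs, f_min=float("inf"), lam1=-15.0, lam2=30.0):
    """Normalized environment feedback:
    beta = (min(f_min, f(alpha)) - lam1) / lam2."""
    return (min(f_min, obs) - lam1) / lam2

print(feedback(-12.87))  # near the global minimum -> small beta (about 0.071)
print(feedback(14.5))    # near the maximum -> beta close to 1
```

Once a minimum f_min has been recorded, every observation worse than f_min is clipped to it, which is exactly the smoothed-function behaviour expressed through the feedback.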
Step 4: update the mean μ and standard deviation σ of the distribution according to the environment feedback β(αn).
The update refers to: μn+1 = μn − a·β(αn)·(αn − μn)·σn.
Step 5: if the iteration number n is less than T, return to Step 2; otherwise proceed to Step 6.
Step 6: if fmin is greater than f(αn), i.e., a smaller value has been found, then set fmin = f(αn) and t = n, record the current minimum point f(αn), return to Step 2 and restart the iteration count, except that the iteration count used for the standard deviation σ is not reset to zero; otherwise terminate the algorithm.
To demonstrate generality, this embodiment was tested in five environments in total: a noise-free environment, two Gaussian-noise environments with standard deviations of 0.1 and 0.5, and two uniform white-noise environments with ranges [−0.1, 0.1] and [−1, 1]. On the Shubert function, 50 experiments were carried out for each of four algorithms and the results averaged, yielding the accuracy and average number of iterations of each algorithm, from which the conclusions of the comparative experiments are drawn.
The four algorithms are: the function optimization algorithm based on CALA (Algorithm 1), the function optimization algorithm based on CALA combined with the improved smoothed function (Algorithm 2), the function optimization algorithm based on the improved CALA (Algorithm 3), and the function optimization algorithm based on the improved CALA combined with the improved smoothed function (Algorithm 4), where Algorithm 4 is the algorithm of the present invention.
The Shubert function is a benchmark function used to test optimization performance.
The number of iterations is denoted T; each round uses T iterations to search for the minimum, after which it is checked whether a smaller value has been found and whether the algorithm needs to continue running; T = 20000 in the experiments.
The purpose of the experiments is to address the rather serious problems of easily falling into local minima and weak noise resistance, shared by the common CALA algorithm (Algorithm 1) and the improved CALA algorithm (Algorithm 3), by comparing Algorithm 1 against Algorithm 2 (which adds the improved smoothed function) and the improved Algorithm 3 against Algorithm 4 (which also adds the improved smoothed function), and obtaining the experimental results.
Algorithm 1 specifically comprises the following steps:
S1: initialize the mean μ and standard deviation σ of the Gaussian distribution of the action value; both are randomly chosen according to the action interval.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000, and αmin ≤ αn ≤ αmax (n is 1 at this point).
S3: obtain the feedbacks β(αn) and β(μn) from the environment.
S4: update μ and σ:
μn+1 = μn + p·f1[μn, σn, αn, β(αn), β(μn)],
σn+1 = σn + p·f2[μn, σn, αn, β(αn), β(μn)] − c·α·[σn − σl];
where f1(·) and f2(·) are the CALA update functions (given in the accompanying figures) and σl is the set optimal standard deviation, set to 1 here.
S5: when the number of iterations reaches T, terminate the algorithm and output μn as the function optimization point; otherwise return to S2.
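The update in S4 can be sketched as follows. Since f1 and f2 are given only in the source's figures, the standard CALA update functions of Santharam, Sastry and Thathachar are assumed here, and the factor multiplying [σn − σl] is taken as c·p (the source's "c·α" appears to be a transcription artifact) — both are assumptions, not the patent's verbatim definitions:

```python
SIGMA_L = 0.01  # lower bound sigma_l on the standard deviation

def phi(sigma):
    """Keep the effective standard deviation above its lower bound."""
    return max(sigma, SIGMA_L)

def cala_update(mu, sigma, alpha, beta_alpha, beta_mu, p=3e-4, c=5.0):
    """One CALA update step.  f1 and f2 follow the standard CALA
    formulation (an assumption -- the patent defines them only in
    figures); beta_* are the environment feedbacks for the sampled
    action alpha and for the mean mu."""
    s = phi(sigma)
    common = (beta_alpha - beta_mu) / s
    f1 = common * (alpha - mu) / s
    f2 = common * (((alpha - mu) / s) ** 2 - 1.0)
    mu_next = mu + p * f1
    sigma_next = sigma + p * f2 - c * p * (sigma - SIGMA_L)
    return mu_next, sigma_next

mu1, s1 = cala_update(0.0, 1.0, 0.5, 0.2, 0.1)
print(mu1, s1)  # small step of the mean toward the better-rewarded action
```

With beta_alpha > beta_mu and alpha > mu, the mean takes a small step toward alpha, while sigma shrinks toward its lower bound, which is the intended exploitation behaviour.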
Algorithm 2 specifically comprises the following steps:
S1: initialize the values of T, a, b, μ and σ, and set fmin = inf and t = inf, to ensure that the first round of search does not use the smoothed function.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000.
S3: obtain the feedbacks β(αn) and β(μn) from the environment.
S4: update μ and σ:
μn+1 = μn + a·f1[μn, σn, αn, β(αn), β(μn)],
σn+1 = σn + a·f2[μn, σn, αn, β(αn), β(μn)] − c·αn·[σn − σl];
where f1(·), f2(·) and σl are defined as in Algorithm 1.
S5: if the number of iterations is less than T, return to S2; otherwise proceed to S6.
S6: if fmin is greater than the function value found in this round, i.e., the T CALA iterations of this round have found a local minimum point smaller than the local minimum obtained by the previous round, then set fmin to this value and t = n to record the current minimum point, return to S2 and restart the iteration count; otherwise terminate the algorithm.
Algorithm 3 specifically comprises the following steps:
S1: initialize the mean μ and standard deviation σ of the Gaussian distribution of the action value; both are randomly chosen according to the action interval.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000.
S3: input αn and μn to the environment as two actions and obtain the feedbacks from the environment: β(αn) = (min(fmin, f(αn)) − λ1)/λ2 and β(μn) = (min(fmin, f(μn)) − λ1)/λ2.
S4: update μ and σ: μn+1 = μn − c·β(αn)·(αn − μn)·σn.
S5: when the number of iterations n reaches T, terminate the algorithm and output μn as the function optimization point; otherwise return to S2.
The parameter settings of the four algorithms are as follows:
Algorithm 1: m = 4, p = 3×10⁻⁴, σl = 0.01, c = 5;
Algorithm 2: m = 4, p = 3×10⁻⁴, σl = 0.01, c = 5, a = 10, b = 1;
Algorithm 3: c = 1, λ1 = −15, λ2 = 30, m = 4;
Algorithm 4: λ1 = −15, λ2 = 30, m = 4, a = 10, b = 1.
The Shubert function is: f(x) = Σ_{i=1}^{5} i·cos((i+1)x + i) + u(x, b, k, m), where u(x, b, k, m) is the penalty function: when x > b, u(x, b, k, m) = k(x − b)^m; when |x| ≤ b, u(x, b, k, m) = 0; when x < −b, u(x, b, k, m) = k(−x − b)^m.
The Shubert function possesses 19 local minimum points on [−10, 10], 3 of which are global minimum points; the global minimum is attained at x ≈ −5.9, 0.4 and 6.8, with global minimum value approximately −12.87.
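Under the reconstruction above, the benchmark can be sketched as follows; the summation part is the standard one-dimensional Shubert function, while the penalty parameters are taken as the conventional u(x, 10, 100, 2) — an assumption, since the source leaves b, k and m unspecified (on [−10, 10] the penalty is zero, so it does not affect the minima):

```python
import math

def u(x, b, k, m):
    """Penalty term confining the search to [-b, b]."""
    if x > b:
        return k * (x - b) ** m
    if x < -b:
        return k * (-x - b) ** m
    return 0.0

def shubert(x, b=10.0, k=100.0, m=2):
    """One-dimensional Shubert benchmark with boundary penalty."""
    return sum(i * math.cos((i + 1) * x + i) for i in range(1, 6)) + u(x, b, k, m)

# A fine grid over [-10, 10] locates the global minimum value near -12.87.
best = min(shubert(i * 1e-3) for i in range(-10000, 10001))
print(round(best, 2))
```

The grid value agrees with the stated global minimum of about −12.87, and the three global minimizers fall roughly one period 2π apart, consistent with x ≈ −5.9, 0.4 and 6.8.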
The experimental results are shown in Table 1; the number of iteration rounds is 3.
Table 1: Average accuracy (%) of each algorithm
The experiments carried out in Gaussian-noise environments show that when the standard deviation is 0.1, the average accuracies of Algorithm 3 and Algorithm 4 are 56% and 86% respectively, a performance improvement of 0.34 times; when the standard deviation is 0.5, the average accuracies of Algorithm 3 and Algorithm 4 are 58% and 84% respectively, a performance improvement of 0.44 times.
From the experimental results it can be concluded that after the improved smoothed function is added to the existing CALA algorithm and to the improved CALA algorithm, local minima can be escaped and noise resistance is greatly increased.
This embodiment relates to an optimal-path optimization system, comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
With the algorithm of this embodiment, the obtained optimal path is, compared with prior-art path selection, not constrained by local minimum points, and noise resistance is also significantly improved.
The above specific implementation may be locally adjusted in different ways by those skilled in the art without departing from the principle and purpose of the present invention; the protection scope of the present invention is defined by the claims and is not limited by the above specific implementation, and each implementation within its scope is bound by the present invention.
Claims (5)
1. A global optimization system based on a continuous action learning automaton, characterized by comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
2. A global optimization method of the continuous action learning automaton based on the system of claim 1, characterized in that a local optimum is first obtained by the existing CALA algorithm, and an improved smoothed function is obtained from this local optimum; the improved smoothed function is introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum;
the iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
3. The global optimization method according to claim 2, characterized in that the number of iterations is the same in each round.
4. The global optimization method according to claim 2, characterized in that the improved smoothed function F′(x, x*) augments the smoothed function F(x, x*) = min(f(x), f(x*)) with a slope component controlled by the set constants a and b, where x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
5. The global optimization method according to claim 2, characterized in that the optimized CALA algorithm comprises the following steps:
Step 1: initialize the parameters of the CALA algorithm, and set the number of iterations and the value of each constant parameter;
Step 2: randomly generate a value according to the current distribution, input this value to the environment as the action, and obtain the observation;
Step 3: obtain the corresponding feedback from the environment, i.e., introduce the improved smoothed function;
Step 4: update the mean and standard deviation of the distribution according to the feedback in Step 3;
Step 5: count the number of iterations; if the number of iterations is less than the set number per round, return to Step 2; otherwise take the observation as this round's local optimum and proceed to Step 6;
Step 6: when the local optimum obtained by this round of iteration is smaller than the local optimum obtained by the previous round, assign this round's local optimum to the previous round's local optimum point, record it, and return to Step 2 for the next round of iteration; otherwise output the point of the local optimum obtained by the previous round, i.e., the extreme point, as the global minimum and terminate the algorithm;
the iteration counter used for the standard deviation is not reset between rounds.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710520108.7A CN109214518A (en) | 2017-06-30 | 2017-06-30 | Global optimization system and method based on continuous action learning automaton |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109214518A true CN109214518A (en) | 2019-01-15 |
Family
- ID=64976594
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460883A (en) * | 2020-01-22 | 2020-07-28 | 电子科技大学 | Video behavior automatic description method based on deep reinforcement learning |
CN111460883B (en) * | 2020-01-22 | 2022-05-03 | 电子科技大学 | Video behavior automatic description method based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190115 |