CN109214518A - Global optimization system and method based on continuous action learning automaton
Abstract
A global optimization system and method based on a continuous action learning automaton (CALA), the system comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action enters the environment, the environment feedback module obtains the environment feedback corresponding to the action, and a local optimum is obtained; the update module updates the algorithm parameters according to the environment feedback and inputs the updated parameters to the action selection module to complete one iteration, improving the smoothed function; the improved smoothed function is introduced into the environment feedback module of the next iteration, multiple iterations are performed, an extreme point is finally obtained, and the current environment feedback is input to the output module and output as the global minimum. The present invention is rationally designed: it introduces a smoothed function and improves it by adding a slope component, so that the CALA escapes local minimum solutions more easily and subsequent search becomes directional, greatly improving the convergence speed and accuracy of the algorithm.
Description
Technical field
The present invention relates to a technology in the field of learning automaton optimization, specifically a global optimization system and method based on a continuous action learning automaton.
Background art
Stochastic function optimization methods generally describe the iterative process of their solutions by probabilistic mechanisms, in contrast to the deterministic point sequences of deterministic function optimization methods; the essential characteristic of such algorithms is therefore randomness. These algorithms are widely applicable and are often used to solve problems such as large-scale continuous and discrete function optimization and combinatorial optimization. In theory they can be guaranteed to converge with probability 1 to a global optimum, but the time spent is correspondingly large. Since learning automata have advantages such as very strong anti-interference ability and global optimization ability, they show good application prospects when applied to function optimization.
A learning automaton (LA) is built by simulating the process in which a living organism, born with very little innate knowledge, learns through continuous interaction with a random environment. Through continuous interaction with the environment, the LA adjusts its decision probability vector and adaptively learns an optimal action, where the optimal action is the action with the maximum reward probability. The operation of a learning automaton can thus be described simply as a series of repeated feedback loops of interaction with the environment.
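The interaction loop described above can be sketched as follows. This is a minimal illustration of a finite-action learning automaton using the linear reward-inaction (L_RI) scheme, not the patent's CALA; the environment and reward probabilities are placeholder assumptions:

```python
import random

def learning_automaton(reward_prob, steps=20000, lr=0.01):
    """Minimal two-action learning automaton (L_RI scheme, illustrative only).

    reward_prob[i] is the probability that the random environment rewards
    action i; the automaton adapts its decision probability vector toward
    the action with the maximum reward probability."""
    p = [0.5, 0.5]  # decision probability vector
    for _ in range(steps):
        a = 0 if random.random() < p[0] else 1    # select an action
        if random.random() < reward_prob[a]:      # environment feedback
            p[a] += lr * (1.0 - p[a])             # reinforce rewarded action
            p[1 - a] = 1.0 - p[a]
    return p

random.seed(0)
p = learning_automaton([0.1, 0.9])
print(p)  # probability mass concentrates on the better action
```

After repeated interaction, the probability of selecting the action with the higher reward probability approaches 1, which is exactly the "optimal action" described above.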
According to the type of action set, learning automata are divided into finite action learning automata (FALA) and continuous action learning automata (CALA). The action set of a FALA has a finite number of elements, while the action set of a CALA is infinite, generally a representative interval taken from the real axis. In practical function optimization, real environments are often changeable and complex, so finite-action learning automata are far less applicable in random environments than continuous-action ones.
In global optimization, the most common problem is getting stuck after finding a local minimum solution, so that the globally optimal solution cannot be found. An important question in global optimization is therefore how to escape local optima. Although the classical continuous action learning automaton algorithm can converge to the optimal action, it suffers from the rather serious problems of easily falling into local minima and weak noise resistance.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention proposes a global optimization system and method based on a continuous action learning automaton (CALA), which introduces a smoothed function into the CALA algorithm and improves it, helping the search escape local optima of the global optimization objective and making subsequent search more directional.
The present invention is achieved by the following technical solutions:
The present invention relates to a global optimization system based on a continuous action learning automaton, comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
The present invention also relates to a global optimization method based on a continuous action learning automaton (CALA): first a local optimum is obtained by the existing CALA algorithm, and an improved smoothed function is obtained from this local optimum; the improved smoothed function is then introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum.
The iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
The number of iterations is the same in each round.
The improved smoothed function F′(x, x*) augments the smoothed function F(x, x*) = min(f(x), f(x*)) with a slope component controlled by the set constants a and b, so that the plateau where f(x) ≥ f(x*) is tilted away from the current local optimum; here x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
The optimized CALA algorithm comprises the following steps:
Step 1: initialize the parameters of the CALA algorithm, and set the number of iterations and the value of each constant parameter.
Step 2: randomly generate a value according to the current distribution, input this value to the environment as the action, and obtain the observation.
Step 3: obtain the corresponding feedback from the environment, i.e., introduce the improved smoothed function.
Step 4: update the mean and standard deviation of the distribution according to the feedback in Step 3.
Step 5: count the number of iterations; if the number of iterations is less than the set number per round, return to Step 2; otherwise take the observation as this round's local optimum and proceed to Step 6.
Step 6: when the local optimum obtained by this round of iteration is smaller than the local optimum obtained by the previous round, assign this round's local optimum to the previous round's local optimum point, record it, and return to Step 2 for the next round of iteration; otherwise output the point of the local optimum obtained by the previous round (i.e., the extreme point) as the global minimum and terminate the algorithm.
The iteration counter used for the standard deviation is not reset between rounds.
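The six steps above can be sketched as the following loop on a one-dimensional objective. This is a hedged sketch, not the full claimed method: the normalized feedback β = (min(f_min, f(α)) − λ1)/λ2 and the mean update are taken from the detailed embodiment below, while the standard-deviation update and the exact slope term of the improved smoothed function are only partially specified in the source and are omitted (σ stays fixed):

```python
import math
import random

def optimized_cala(f, mu=0.0, sigma=2.0, a=0.02, lam1=-15.0, lam2=30.0,
                   iters_per_round=2000, max_rounds=4):
    """Sketch of Steps 1-6 on a 1-D objective f.

    Inner loop (Steps 2-5): sample an action from N(mu, sigma), observe the
    smoothed value min(f_min, f(alpha)), normalize it to the feedback
    beta = (obs - lam1) / lam2, and apply the embodiment's mean update.
    Outer loop (Step 6): keep a new round's optimum only if it improves on
    the previous round's; otherwise stop and return the previous optimum."""
    f_min, x_star = math.inf, None
    for _ in range(max_rounds):
        best_val, best_x = math.inf, None
        for _ in range(iters_per_round):
            alpha = random.gauss(mu, sigma)             # Step 2
            obs = min(f_min, f(alpha))                  # Step 3: smoothed observation
            beta = (obs - lam1) / lam2                  # normalized feedback
            mu = mu - a * beta * (alpha - mu) * sigma   # Step 4: mean update
            if f(alpha) < best_val:
                best_val, best_x = f(alpha), alpha
        if best_val < f_min:                            # Step 6
            f_min, x_star = best_val, best_x
        else:
            break
    return x_star, f_min

random.seed(42)
x_star, f_min = optimized_cala(lambda x: x * x)
print(x_star, f_min)  # a point near the minimum of x^2 at 0
```

The outer comparison of successive rounds is what the claims call "the iteration": rounds continue only while each round's local optimum improves on the previous one.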
Technical effect
Compared with the prior art, the present invention first introduces a smoothed function to escape local optima of the global optimization objective, and then improves on the non-directional search that the existing smoothed function exhibits on plateau regions by adding a slope component. The improved smoothed function retains the advantages of the original smoothed function: it ignores all solutions that are not better than the current point, and makes it easy to jump away from the current local optimum. Since the objective now has a certain slope, the direction of subsequent search leads away from the local minimum, avoiding wasted back-and-forth movement around the current point, and the convergence speed of the algorithm is improved.
Brief description of the drawings
Fig. 1 is a flow diagram of the global optimization method.
Specific embodiment
In fields such as optimal route selection and optimal treatment (e.g., information propagation maximization, selection of an optimal drug dosage), the method first obtains a local optimum of the path using the existing CALA algorithm. Since this path may only be a locally optimal path rather than the globally optimal path, an improved smoothed function is then obtained from the local optimum; the improved smoothed function is introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum of the path selection, giving the optimal route selection scheme.
The iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
The number of iterations is the same in each round; the specific number is determined experimentally and must not be too low, otherwise the extreme point may not be found.
As shown in Fig. 1, this embodiment specifically comprises the following steps:
Step 1: initialize the parameters of the continuous action learning automaton (CALA) algorithm and set the value of each parameter.
The parameters include: the number of iterations T, the constants a, b, λ1 and λ2 (here T is set to 20000, the number of rounds m = 4, and the iteration counter n of each round starts at 0).
Set fmin = inf and t = inf, where t records the iteration at which the current minimum was found, to ensure that the first round of search does not use the smoothed function.
Step 2: according to the behaviour of the environment, the action can be assumed to follow a normal distribution; a value αn (n is the iteration number) is randomly generated from the current distribution N(μ, σ) (μ is the mean and σ the standard deviation; the initial values of these two vectors are determined by the specific environment and can be obtained as in the general CALA algorithm), and αn is input to the environment as the action.
Step 3: obtain the corresponding feedback β(αn) from the environment, i.e., introduce the improved smoothed function.
The improved smoothed function adds a term at the end of the smoothed function so that the search can jump away from the current local optimum, and it acts as the environment feedback in the CALA, thereby influencing action selection.
The smoothed function is defined as follows: when the value of the original function at some point is greater than the value of the optimal solution obtained so far, the value at that point is reassigned to the optimal solution obtained so far; when the value of the original function at some point is less than the value of the optimal solution obtained so far, the value at that point remains unchanged, i.e., F(x, x*) = min(f(x), f(x*)), where x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
The improved smoothed function F′(x, x*) replaces the plateau value f(x*) in min(f(x), f(x*)) with a sloped term controlled by the set constants a and b.
The feedback β(αn) = (min(fmin, f(αn)) − λ1)/λ2, where f(αn) is the observation of the Shubert function (the optimization benchmark function) at the point αn.
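With the embodiment's settings λ1 = −15 and λ2 = 30, this feedback maps the observed function values into the unit interval; the numeric range of the Shubert function used below is taken from the experiments later in this description:

```python
def feedback(obs, f_min=float("inf"), lam1=-15.0, lam2=30.0):
    """Normalized environment feedback:
    beta = (min(f_min, f(alpha)) - lam1) / lam2."""
    return (min(f_min, obs) - lam1) / lam2

print(feedback(-12.87))  # near the global minimum -> small beta (about 0.071)
print(feedback(14.5))    # near the maximum -> beta close to 1
```

Once a minimum f_min has been recorded, every observation worse than f_min is clipped to it, which is exactly the smoothed-function behaviour expressed through the feedback.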
Step 4: update the mean μ and standard deviation σ of the distribution according to the environment feedback β(αn).
The update refers to: μn+1 = μn − a·β(αn)·(αn − μn)·σn.
Step 5: if the iteration number n is less than T, return to Step 2; otherwise proceed to Step 6.
Step 6: if fmin is greater than f(αn), i.e., a smaller value has been found, then set fmin = f(αn) and t = n, record the current minimum point f(αn), return to Step 2 and restart the iteration count, except that the iteration count used for the standard deviation σ is not reset to zero; otherwise terminate the algorithm.
To demonstrate generality, this embodiment was tested in five environments in total: a noise-free environment, two Gaussian-noise environments with standard deviations of 0.1 and 0.5, and two uniform white-noise environments with ranges [−0.1, 0.1] and [−1, 1]. On the Shubert function, 50 experiments were carried out for each of four algorithms and the results averaged, yielding the accuracy and average number of iterations of each algorithm, from which the conclusions of the comparative experiments are drawn.
The four algorithms are: the function optimization algorithm based on CALA (Algorithm 1), the function optimization algorithm based on CALA combined with the improved smoothed function (Algorithm 2), the function optimization algorithm based on the improved CALA (Algorithm 3), and the function optimization algorithm based on the improved CALA combined with the improved smoothed function (Algorithm 4), where Algorithm 4 is the algorithm of the present invention.
The Shubert function is a benchmark function used to test optimization performance.
The number of iterations is denoted T; each round uses T iterations to search for the minimum, after which it is checked whether a smaller value has been found and whether the algorithm needs to continue running; T = 20000 in the experiments.
The purpose of the experiments is to address the rather serious problems of easily falling into local minima and weak noise resistance, shared by the common CALA algorithm (Algorithm 1) and the improved CALA algorithm (Algorithm 3), by comparing Algorithm 1 against Algorithm 2 (which adds the improved smoothed function) and the improved Algorithm 3 against Algorithm 4 (which also adds the improved smoothed function), and obtaining the experimental results.
Algorithm 1 specifically comprises the following steps:
S1: initialize the mean μ and standard deviation σ of the Gaussian distribution of the action value; both are randomly chosen according to the action interval.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000, and αmin ≤ αn ≤ αmax (n is 1 at this point).
S3: obtain the feedbacks β(αn) and β(μn) from the environment.
S4: update μ and σ:
μn+1 = μn + p·f1[μn, σn, αn, β(αn), β(μn)],
σn+1 = σn + p·f2[μn, σn, αn, β(αn), β(μn)] − c·α·[σn − σl];
where f1(·) and f2(·) are the CALA update functions (given in the accompanying figures) and σl is the set optimal standard deviation, set to 1 here.
S5: when the number of iterations reaches T, terminate the algorithm and output μn as the function optimization point; otherwise return to S2.
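The update in S4 can be sketched as follows. Since f1 and f2 are given only in the source's figures, the standard CALA update functions of Santharam, Sastry and Thathachar are assumed here, and the factor multiplying [σn − σl] is taken as c·p (the source's "c·α" appears to be a transcription artifact) — both are assumptions, not the patent's verbatim definitions:

```python
SIGMA_L = 0.01  # lower bound sigma_l on the standard deviation

def phi(sigma):
    """Keep the effective standard deviation above its lower bound."""
    return max(sigma, SIGMA_L)

def cala_update(mu, sigma, alpha, beta_alpha, beta_mu, p=3e-4, c=5.0):
    """One CALA update step.  f1 and f2 follow the standard CALA
    formulation (an assumption -- the patent defines them only in
    figures); beta_* are the environment feedbacks for the sampled
    action alpha and for the mean mu."""
    s = phi(sigma)
    common = (beta_alpha - beta_mu) / s
    f1 = common * (alpha - mu) / s
    f2 = common * (((alpha - mu) / s) ** 2 - 1.0)
    mu_next = mu + p * f1
    sigma_next = sigma + p * f2 - c * p * (sigma - SIGMA_L)
    return mu_next, sigma_next

mu1, s1 = cala_update(0.0, 1.0, 0.5, 0.2, 0.1)
print(mu1, s1)  # small step of the mean toward the better-rewarded action
```

With beta_alpha > beta_mu and alpha > mu, the mean takes a small step toward alpha, while sigma shrinks toward its lower bound, which is the intended exploitation behaviour.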
Algorithm 2 specifically comprises the following steps:
S1: initialize the values of T, a, b, μ and σ, and set fmin = inf and t = inf, to ensure that the first round of search does not use the smoothed function.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000.
S3: obtain the feedbacks β(αn) and β(μn) from the environment.
S4: update μ and σ:
μn+1 = μn + a·f1[μn, σn, αn, β(αn), β(μn)],
σn+1 = σn + a·f2[μn, σn, αn, β(αn), β(μn)] − c·αn·[σn − σl];
where f1(·), f2(·) and σl are defined as in Algorithm 1.
S5: if the number of iterations is less than T, return to S2; otherwise proceed to S6.
S6: if fmin is greater than the function value found in this round, i.e., the T CALA iterations of this round have found a local minimum point smaller than the local minimum obtained by the previous round, then set fmin to this value and t = n to record the current minimum point, return to S2 and restart the iteration count; otherwise terminate the algorithm.
Algorithm 3 specifically comprises the following steps:
S1: initialize the mean μ and standard deviation σ of the Gaussian distribution of the action value; both are randomly chosen according to the action interval.
S2: randomly generate a value αn from the current distribution N(μ, σ) and input both αn and μn to the environment as two actions, where n is the current iteration number, 1 ≤ n ≤ 20000.
S3: input αn and μn to the environment as two actions and obtain the feedbacks from the environment: β(αn) = (min(fmin, f(αn)) − λ1)/λ2 and β(μn) = (min(fmin, f(μn)) − λ1)/λ2.
S4: update μ and σ: μn+1 = μn − c·β(αn)·(αn − μn)·σn.
S5: when the number of iterations n reaches T, terminate the algorithm and output μn as the function optimization point; otherwise return to S2.
The parameter settings of the four algorithms are as follows:
Algorithm 1: m = 4, p = 3×10⁻⁴, σl = 0.01, c = 5;
Algorithm 2: m = 4, p = 3×10⁻⁴, σl = 0.01, c = 5, a = 10, b = 1;
Algorithm 3: c = 1, λ1 = −15, λ2 = 30, m = 4;
Algorithm 4: λ1 = −15, λ2 = 30, m = 4, a = 10, b = 1.
The Shubert function is: f(x) = Σ_{i=1}^{5} i·cos((i+1)x + i) + u(x, b, k, m), where u(x, b, k, m) is the penalty function: when x > b, u(x, b, k, m) = k(x − b)^m; when |x| ≤ b, u(x, b, k, m) = 0; when x < −b, u(x, b, k, m) = k(−x − b)^m.
The Shubert function possesses 19 local minimum points on [−10, 10], 3 of which are global minimum points; the global minimum is attained at x ≈ −5.9, 0.4 and 6.8, with global minimum value approximately −12.87.
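Under the reconstruction above, the benchmark can be sketched as follows; the summation part is the standard one-dimensional Shubert function, while the penalty parameters are taken as the conventional u(x, 10, 100, 2) — an assumption, since the source leaves b, k and m unspecified (on [−10, 10] the penalty is zero, so it does not affect the minima):

```python
import math

def u(x, b, k, m):
    """Penalty term confining the search to [-b, b]."""
    if x > b:
        return k * (x - b) ** m
    if x < -b:
        return k * (-x - b) ** m
    return 0.0

def shubert(x, b=10.0, k=100.0, m=2):
    """One-dimensional Shubert benchmark with boundary penalty."""
    return sum(i * math.cos((i + 1) * x + i) for i in range(1, 6)) + u(x, b, k, m)

# A fine grid over [-10, 10] locates the global minimum value near -12.87.
best = min(shubert(i * 1e-3) for i in range(-10000, 10001))
print(round(best, 2))
```

The grid value agrees with the stated global minimum of about −12.87, and the three global minimizers fall roughly one period 2π apart, consistent with x ≈ −5.9, 0.4 and 6.8.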
The experimental results are shown in Table 1; the number of iteration rounds is 3.
Table 1: Average accuracy (%) of each algorithm
The experiments carried out in Gaussian-noise environments show that when the standard deviation is 0.1, the average accuracies of Algorithm 3 and Algorithm 4 are 56% and 86% respectively, a performance improvement of 0.34 times; when the standard deviation is 0.5, the average accuracies of Algorithm 3 and Algorithm 4 are 58% and 84% respectively, a performance improvement of 0.44 times.
From the experimental results it can be concluded that after the improved smoothed function is added to the existing CALA algorithm and to the improved CALA algorithm, local minima can be escaped and noise resistance is greatly increased.
This embodiment relates to an optimal-path optimization system, comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
With the algorithm of this embodiment, the obtained optimal path is, compared with prior-art path selection, not constrained by local minimum points, and noise resistance is also significantly improved.
The above specific implementation may be locally adjusted in different ways by those skilled in the art without departing from the principle and purpose of the present invention; the protection scope of the present invention is defined by the claims and is not limited by the above specific implementation, and each implementation within its scope is bound by the present invention.
Claims (5)
1. A global optimization system based on a continuous action learning automaton, characterized by comprising: an initialization module, an action selection module, an environment feedback module, an update module and an output module, wherein: the initialization module initializes the parameters of the CALA algorithm and inputs them to the action selection module, which selects an action; the action is applied along a path through the environment and then enters the environment feedback module, which obtains the environment feedback corresponding to the action; the update module updates the parameters of the CALA algorithm according to the environment feedback and inputs the updated parameters to the action selection module, completing one iteration; when the number of iterations reaches a set value, the current environment feedback is input to the output module, which outputs the optimal path information.
2. A global optimization method of the continuous action learning automaton based on the system of claim 1, characterized in that a local optimum is first obtained by the existing CALA algorithm, and an improved smoothed function is obtained from this local optimum; the improved smoothed function is introduced into the existing CALA algorithm to obtain an optimized CALA algorithm, multiple iterations are performed by the optimized CALA algorithm, and the extreme point finally obtained is output as the global minimum;
the iteration refers to: one round of iteration of the CALA algorithm with the improved smoothed function yields a local optimum; when this round's local optimum is smaller than the local optimum obtained by the previous round, the local optimum obtained by this round is taken as the new local optimum point and the next round of iteration is carried out, until the local optimum of the current round is greater than or equal to the local optimum obtained by the previous round, at which point the local optimum obtained by the previous round is the extreme point.
3. The global optimization method according to claim 2, characterized in that the number of iterations is the same in each round.
4. The global optimization method according to claim 2, characterized in that the improved smoothed function F′(x, x*) augments the smoothed function F(x, x*) = min(f(x), f(x*)) with a slope component controlled by the set constants a and b, where x is the currently selected value, x* is the value of the local optimum obtained so far, and f(·) is the objective function.
5. The global optimization method according to claim 2, characterized in that the optimized CALA algorithm comprises the following steps:
Step 1: initialize the parameters of the CALA algorithm, and set the number of iterations and the value of each constant parameter;
Step 2: randomly generate a value according to the current distribution, input this value to the environment as the action, and obtain the observation;
Step 3: obtain the corresponding feedback from the environment, i.e., introduce the improved smoothed function;
Step 4: update the mean and standard deviation of the distribution according to the feedback in Step 3;
Step 5: count the number of iterations; if the number of iterations is less than the set number per round, return to Step 2; otherwise take the observation as this round's local optimum and proceed to Step 6;
Step 6: when the local optimum obtained by this round of iteration is smaller than the local optimum obtained by the previous round, assign this round's local optimum to the previous round's local optimum point, record it, and return to Step 2 for the next round of iteration; otherwise output the point of the local optimum obtained by the previous round, i.e., the extreme point, as the global minimum and terminate the algorithm;
the iteration counter used for the standard deviation is not reset between rounds.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710520108.7A CN109214518A (en) | 2017-06-30 | 2017-06-30 | Global optimization system and method based on continuous action learning automaton |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109214518A true CN109214518A (en) | 2019-01-15 |
Family
- ID=64976594
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460883A (en) * | 2020-01-22 | 2020-07-28 | 电子科技大学 | Video behavior automatic description method based on deep reinforcement learning |
CN111460883B (en) * | 2020-01-22 | 2022-05-03 | 电子科技大学 | Video behavior automatic description method based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190115 |