CN115081323A - Method for solving multi-objective constrained optimization problem and storage medium thereof - Google Patents

Method for solving multi-objective constrained optimization problem and storage medium thereof

Info

Publication number
CN115081323A
Authority
CN
China
Prior art keywords
population
individuals
constraint
objective
optimization problem
Prior art date
Legal status
Pending
Application number
CN202210688940.9A
Other languages
Chinese (zh)
Inventor
何克晶
黄秋越
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202210688940.9A
Publication of CN115081323A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/12 - Computing arrangements based on biological models using genetic models
    • G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 - Details relating to CAD techniques
    • G06F2111/06 - Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]

Abstract

The invention discloses a method for solving a multi-objective constrained optimization problem and a storage medium thereof. The method comprises the following steps: generating an initial population and calculating the objective function value and constraint value of each individual; adaptively generating a new difference factor and a constraint relaxation factor with the policy network of deep reinforcement learning; generating new individuals by evolution with the difference factor and combining them with the previous-generation population into a temporary population; obtaining the Chebyshev value and constraint value of each individual in the temporary population based on the decomposition-based multi-objective evolutionary algorithm MOEA/D, and then updating to a new-generation population; and, if the maximum number of iterations is reached, taking the finally updated population as the optimal solution of the multi-objective constrained optimization problem, otherwise calculating a feedback reward, returning it to the policy network of deep reinforcement learning to update the loss function, and continuing the iteration. The method adaptively adjusts the sensitive parameters of the evolutionary algorithm and balances convergence and distribution during evolution.

Description

Method for solving multi-objective constrained optimization problem and storage medium thereof
Technical Field
The invention belongs to the technical field of optimization algorithms, and particularly relates to a method for solving a multi-objective constrained optimization problem and a storage medium thereof.
Background
Complex problems in science and engineering, such as workshop scheduling, model design and path optimization, are generally NP-hard. These problems involve multiple objectives and are at the same time subject to numerous constraints.
The evolutionary algorithm simulates the self-learning, self-adapting and problem-solving processes of biological genetic evolution and comprises four basic operations: reproduction, recombination, competition and selection; it is often used to solve such complex problems. The decomposition-based multi-objective evolutionary algorithm (MOEA/D) is a classic prior-art algorithm for solving multi-objective problems. Its core idea is to decompose a multi-objective optimization problem into several single-objective optimization sub-problems and solve them one by one, finally obtaining the Pareto-optimal solutions on the Pareto front.
However, in the existing decomposition-based multi-objective optimization algorithm, the parameters used in each generation of the iterative MOEA/D process are fixed, and fixed parameters are not necessarily suitable for every iteration. Moreover, most practical scientific and engineering problems are constrained, and the industry focuses on constraint-handling mechanisms for multi-objective optimization problems without fundamentally solving the performance loss that parameter fixation in the constraint mechanism causes. On the other hand, the prior art contains schemes that introduce a differential evolution algorithm into the decomposition-based multi-objective optimization algorithm, but the performance of differential evolution depends heavily on the mutation and crossover strategy and the associated control parameters, and the parameter-setting process is time-consuming and has certain limitations.
Disclosure of Invention
In order to overcome one or more of the drawbacks and deficiencies of the prior art, a first object of the present invention is to provide a method for solving a multi-objective constrained optimization problem that introduces reinforcement learning and an evolutionary algorithm into the solution process, and a second object of the present invention is to provide a corresponding storage medium.
In order to achieve the above object, the present invention adopts the following technical means.
A method for solving a multi-objective constrained optimization problem comprises the following steps:
generating an initial population P_0 from the raw data of the multi-objective constrained optimization problem, then calculating the objective function value F(x) and the constraint value Φ(x) of each individual x in the initial population;
adaptively generating a new difference factor θ_DE^t and a constraint relaxation factor ε_t using the policy network of deep reinforcement learning, where t denotes the number of iterations;
starting the iteration from the initial population P_0, generating new individuals by evolution with the difference factor θ_DE^t, and combining the new individuals with the previous-generation population P_{t-1} to form a temporary population Q_t;
converting the multi-objective constrained optimization problem into single-objective optimization sub-problems based on the decomposition-based multi-objective evolutionary algorithm MOEA/D, obtaining the Chebyshev value and the constraint value Φ(x) of each individual in the temporary population Q_t, and then updating to a new-generation population based on the constraint relaxation factor ε_t;
judging whether the maximum number of iterations t_end has been reached; if so, taking the finally updated population as the optimal solution of the multi-objective constrained optimization problem; if not, using the difference between the new population and the previous-generation population as the feedback reward of reinforcement learning, returning it to the policy network of deep reinforcement learning to update the loss function, and then continuing the iteration until the maximum number of iterations t_end is reached.
Preferably, the process of generating the initial population P_0 and calculating the objective function value F(x) and the constraint value Φ(x) comprises:
forming a search space from the raw data of the multi-objective constrained optimization problem, and randomly generating an initial population P_0 = {x_1, x_2, …, x_N} from the search space, where N denotes the population size;
setting the maximum number of iterations t_end for iterating the population;
calculating the objective function value F(x), the inequality constraint value g(x) and the equality constraint value h(x) of each individual in the initial population P_0, where Φ(x) = Σ g(x) + Σ c(x), g(x) being the inequality constraint value and c(x) the equality constraint value.
Preferably, the policy network of deep reinforcement learning is specifically a long short-term memory artificial neural network; the input of the long short-term memory artificial neural network is the objective function values F(x) of the individuals in the population, and its output is the difference factor θ_DE^t = (CR, F) and the constraint relaxation factor ε_t, where CR is the mutation probability and F is the scaling factor.
Further, the input and output of the learning of the long short-term memory artificial neural network are as follows:
(θ_DE^t, ε_t, c_t, h_t) = LSTM(I_t, W, c_{t-1}, h_{t-1})
where I_t is the information of the current population, including the objective function value F(x), the inequality constraint value g(x) and the equality constraint value h(x) of all individuals in the current population, W is the weight information of the LSTM, c_{t-1} is the cell state of the LSTM, h_{t-1} is the hidden state of the LSTM, and LSTM denotes the learning process of the long short-term memory artificial neural network.
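As an illustration of such a policy network, the following is a minimal sketch in PyTorch (the patent does not name a framework, so this is an assumption). The state I_t is assumed to be flattened into a single input vector; the class name PolicyLSTM, the layer sizes, and the sigmoid/softplus squashing of the outputs are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class PolicyLSTM(nn.Module):
    # hypothetical policy network: maps population state I_t to (CR, F, eps_t)
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)

    def forward(self, state, hc=None):
        # state: (batch, seq_len, state_dim); hc: previous (h, c) of the LSTM,
        # playing the role of (h_{t-1}, c_{t-1}) in the formula above
        out, hc = self.lstm(state, hc)
        raw = self.head(out[:, -1])
        cr = torch.sigmoid(raw[:, 0])            # mutation probability in (0, 1)
        f = torch.sigmoid(raw[:, 1])             # scaling factor in (0, 1)
        eps = nn.functional.softplus(raw[:, 2])  # relaxation factor >= 0
        return cr, f, eps, hc
```

One forward pass per generation yields the parameter triple (CR, F, ε_t), while the recurrent state (h, c) carries information across generations.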
Preferably, the process of generating new individuals by evolution with the difference factor θ_DE^t and forming the temporary population Q_t comprises:
starting from the initial population P_0: randomly selecting two individuals from the neighborhood of any individual of the previous-generation population P_{t-1}, and operating on the three individuals with the difference factor θ_DE^t based on the mutation probability CR to generate a new individual;
combining the newly generated individuals with the previous-generation population P_{t-1} to form the temporary population Q_t.
Further, the process of generating new individuals is:
in the neighborhood Λ^{t-1} of any individual x_i^{t-1} of the previous-generation population P_{t-1}, randomly selecting two individuals x_{i_j}^{t-1} and x_{i_k}^{t-1}, and operating on the three individuals x_i^{t-1}, x_{i_j}^{t-1} and x_{i_k}^{t-1} with the difference factor θ_DE^t based on the mutation probability CR to generate a new individual y_i^t, as shown in the following formula:
y_i^t = x_i^{t-1} + F · (x_{i_j}^{t-1} - x_{i_k}^{t-1}) if rand(0,1) ≤ CR, otherwise y_i^t = x_i^{t-1}, with i = 1, …, N and j, k ∈ {1, …, M}
where rand(0,1) denotes a probability between 0 and 1, N denotes the total number of individuals in the population, M denotes the total number of individuals in the neighborhood, i is the index of the individual, j and k are indices within the neighborhood, x_{i_j}^{t-1} denotes the jth individual in the neighborhood of the ith individual, and x_{i_k}^{t-1} denotes the kth individual in the neighborhood of the ith individual.
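A minimal NumPy sketch of this neighborhood-based differential mutation follows; the helper name de_generate, the (N, D) array layout, and the per-individual (rather than per-dimension) application of CR are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def de_generate(pop, neighbors, cr, f, rng):
    # pop: (N, D) array holding the previous generation P_{t-1};
    # neighbors[i]: the M neighbor indices of individual i
    n = pop.shape[0]
    new = pop.copy()
    for i in range(n):
        j, k = rng.choice(neighbors[i], size=2, replace=False)
        if rng.random() <= cr:                    # mutate with probability CR
            new[i] = pop[i] + f * (pop[j] - pop[k])
    return np.vstack([pop, new])                  # temporary population of size 2N
```

For example, q_t = de_generate(pop, nbrs, cr=0.9, f=0.5, rng=np.random.default_rng()) returns the 2N-row temporary population formed from P_{t-1} and the new individuals.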
Preferably, the process of updating to the next-generation population based on the decomposition-based multi-objective evolutionary algorithm MOEA/D is as follows:
from the neighborhood of an individual x_i of the temporary population Q_t, randomly selecting an individual x_j, calculating the Chebyshev value and the constraint value Φ(x) of each of the two, and comparing them;
if Φ(x_i) and Φ(x_j) are both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller Chebyshev value enter the next-generation population P_t;
if Φ(x_i) and Φ(x_j) are not both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller constraint value Φ(x) enter the next-generation population P_t;
where i denotes the ith individual of the temporary population Q_t and j denotes the jth individual in its neighborhood.
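A sketch of this ε-constraint comparison follows, assuming the standard Chebyshev (Tchebycheff) aggregation of MOEA/D, which the embodiment below spells out; the function names and the tie-breaking with <= are illustrative assumptions.

```python
import numpy as np

def chebyshev(fx, lam, z_star):
    # g^te(x | lambda, z*) = max_k lambda_k * |f_k(x) - z*_k|
    return np.max(lam * np.abs(fx - z_star))

def eps_select(fx_i, fx_j, phi_i, phi_j, lam, z_star, eps_t):
    # returns True if individual i should enter the next generation
    if phi_i < eps_t and phi_j < eps_t:      # both within the relaxed threshold
        return chebyshev(fx_i, lam, z_star) <= chebyshev(fx_j, lam, z_star)
    return phi_i <= phi_j                    # otherwise the smaller violation wins
```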
Preferably, when it is judged that the maximum number of iterations t_end has not been reached, the steps are specifically:
using the difference between the next-generation population P_t and the previous-generation population P_{t-1} as the feedback reward of reinforcement learning, returning it to the policy network of deep reinforcement learning to update the loss function, and thereby training the policy network;
then continuing the iteration to update the population until the maximum number of iterations is reached.
Further, the calculation of the feedback reward is specifically as follows:
the difference between the new population P_t and the previous-generation population P_{t-1} is expressed by the inverted generational distance index IGD; the feedback reward is denoted R_t, and R_t is calculated according to the following formula:
R_t = abs(IGD_{t-1} - IGD_t)
where abs denotes the absolute-value operation, and:
IGD(A, P*) = ( Σ_{y*∈P*} d(y*, A) ) / |P*|, with d(y*, A) = min_{y∈A} √( Σ_{i=1}^m (y_i - y_i*)² )
where A denotes the solution set to be evaluated, P* denotes uniformly distributed sample individuals on the true Pareto front, d(y*, A) denotes the minimum Euclidean distance from y* to A, y* denotes an individual in P*, y denotes an individual in A, y_i and y_i* denote the data of each dimension of y and y*, the subscript i denotes the dimension, and m is the total dimension of the data.
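A sketch of the IGD computation and the resulting reward under these formulas, assuming A and P* are given as NumPy arrays of objective vectors:

```python
import numpy as np

def igd(a, p_star):
    # a: (na, m) objective vectors of the solution set A;
    # p_star: (ns, m) uniform samples on the true Pareto front P*
    dists = np.linalg.norm(p_star[:, None, :] - a[None, :, :], axis=2)
    return dists.min(axis=1).mean()          # mean of min distances d(y*, A)

def reward(a_new, a_prev, p_star):
    # R_t = abs(IGD_{t-1} - IGD_t)
    return abs(igd(a_prev, p_star) - igd(a_new, p_star))
```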
A storage medium is used to store a computer program arranged to perform any of the foregoing methods of solving a multi-objective constrained optimization problem.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
compared with existing multi-objective optimization algorithms and differential evolution algorithms, the invention introduces deep reinforcement learning on the basis of the multi-objective evolutionary algorithm MOEA/D and designs an IGD-based reward feedback mechanism for it; the invention can adaptively adjust the sensitive parameters of the evolutionary algorithm (the difference factor and the constraint relaxation factor), effectively balance convergence and distribution during evolution, and adjust the attention paid to the constraint conditions at different stages of population evolution; this effectively solves the performance loss of multi-objective evolutionary algorithms caused by parameter sensitivity in constrained scenarios, realizes adaptive parameter adjustment, and achieves better performance.
Drawings
FIG. 1 is a general flow diagram of a method of solving a multi-objective constrained optimization problem according to the present invention;
FIG. 2 is a box plot illustrating the performance of the method of FIG. 1 on the LIR-CMOP1 test problem;
FIG. 3 is a box plot illustrating the performance of the method of FIG. 1 on the LIR-CMOP7 test problem.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments thereof. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
For a multi-objective constrained optimization problem, it can be described in the form:
minimize F(x) = (f_1(x), f_2(x), …, f_m(x))
subject to g_u(x) ≤ 0, u = 1, …, p; h_v(x) = 0, v = 1, …, q
where F(x) is the objective vector, g_u(x) ≤ 0 is the uth inequality constraint, h_v(x) = 0 is the vth equality constraint, x = (x_1, x_2, …, x_D) ∈ R^D is the decision variable, and f_m(x) is the mth objective function.
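To make this form concrete, here is a hypothetical toy instance (m = 2, p = 1, q = 1) together with the violation aggregation Φ(x) = Σ g(x) + Σ c(x) used later in the method; the specific functions are invented for illustration only.

```python
import numpy as np

def objectives(x):
    # F(x) = (f1(x), f2(x)): a toy bi-objective function, m = 2
    return np.array([x[0], 1.0 - np.sqrt(abs(x[0])) + x[1] ** 2])

def violation(x):
    # Phi(x) aggregates max(0, g_u(x)) for g_u(x) <= 0 and |h_v(x)| for h_v(x) = 0
    g = [x[0] + x[1] - 1.5]                  # p = 1 inequality constraint
    h = [x[0] ** 2 + x[1] ** 2 - 1.0]        # q = 1 equality constraint
    return sum(max(0.0, gu) for gu in g) + sum(abs(hv) for hv in h)
```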
As shown in FIG. 1, the method for solving the multi-objective constrained optimization problem in this embodiment comprises the following specific steps:
S1, forming a search space from the raw data of the multi-objective constrained optimization problem, and randomly generating an initial population P_0 from the search space; the tth-generation population is denoted P_t = {x_1^t, x_2^t, …, x_N^t}, where N denotes the size of the population, t denotes the number of iterations, x denotes an individual in the population and P_t denotes the population;
then setting the maximum number of iterations for iterating the population, and calculating the objective function value F(x), the inequality constraint value g(x) and the equality constraint value h(x) of each individual in the initial population P_0;
s2, generating new difference factor by strategy network self-adaption using deep reinforcement learning
Figure BDA0003700827290000065
And a constraint relaxation factor epsilon t Wherein the difference factor
Figure BDA0003700827290000066
The subscript DE means deep reinforcement learning, CR means variation probability, F means a scaling factor, and the influence degree of differential disturbance on the generated test vector is determined by the value of F; the specific process is as follows:
constructing a Long Short-Term Memory artificial neural network (LSTM) as a strategy network for deep reinforcement learning to learn; the input of the long-short term memory artificial neural network is the objective function value F (x) of the current population individual, and the output is a differential factor
Figure BDA0003700827290000067
And a constraint relaxation factor epsilon t (ii) a The input and output of the long-short term memory artificial neural network for learning are shown as follows:
Figure BDA0003700827290000068
wherein the content of the first and second substances,
Figure BDA0003700827290000069
is the information of the current population,
Figure BDA00037008272900000610
including the objective function value F (x), the inequality constraint value g (x), the equality constraint value h (x) of all the individuals in the current population,
Figure BDA00037008272900000611
is the weight information of the LSTM,
Figure BDA0003700827290000071
is a cell in the LSTM, and is a cell,
Figure BDA0003700827290000072
is a hidden unit in the LSTM, and the meaning of the LSTM is the learning process of the long-term and short-term memory artificial neural network;
s3, starting from the initial population
Figure BDA0003700827290000073
Using difference factors
Figure BDA0003700827290000074
Evolving to generate new individuals, and combining the new individuals with the previous generation population
Figure BDA0003700827290000075
Forming a temporary population
Figure BDA0003700827290000076
The method comprises the following steps:
s31, starting from the initial population
Figure BDA0003700827290000077
From the previous generation population
Figure BDA0003700827290000078
Any one of the individuals
Figure BDA0003700827290000079
Of (2)
Figure BDA00037008272900000710
In, randomly selecting Lambda t-1 Two individuals in
Figure BDA00037008272900000711
And
Figure BDA00037008272900000712
will be provided with
Figure BDA00037008272900000713
Three individuals using difference factors
Figure BDA00037008272900000714
Performing operation based on the variation probability CR to generate new individuals
Figure BDA00037008272900000715
The process is shown as follows:
Figure BDA00037008272900000716
wherein rand (0,1) represents the probability between 0 and 1, N represents the total number of individuals in the population, M represents the total number of individuals in the neighborhood, i represents the number of individuals, j represents the number of individuals in the neighborhood, k represents the number of individuals in the neighborhood, i represents the number of individuals in the neighborhood, and j representing the jth individual, i, in the neighborhood of the ith individual j Representing taking the kth individual in the neighborhood of the ith individual;
s32, combining N newly generated individuals with the previous generation population
Figure BDA00037008272900000717
Composition of temporary populations of size 2N
Figure BDA00037008272900000718
S4, converting the multi-objective constrained optimization problem into single-objective optimization sub-problems with the decomposition-based multi-objective evolutionary algorithm MOEA/D, obtaining the Chebyshev value and the constraint value Φ(x) of each individual in the temporary population Q_t, and updating to the next-generation population; the Chebyshev value is calculated by the following formula:
g^te(x_i | λ^i, z*) = max_{1≤k≤m} λ_k^i · |f_k(x_i) - z_k*|
where g^te(x_i | λ^i, z*) is the Chebyshev value of the ith individual, z* = (z_1*, …, z_m*) is the set of ideal points, λ^i is the weight coefficient, f_k(x) is the objective function at the kth ideal point, x_i denotes the ith individual, and k indexes the ideal points;
the steps of obtaining the Chebyshev value and the constraint value Φ(x) and updating the population are as follows:
S41, for each single individual x of the population Q_t, recording the corresponding constraint value Φ(x) = Σ g(x) + Σ c(x), where g(x) is the inequality constraint value and c(x) is the equality constraint value;
S42, based on the obtained constraint relaxation factor ε_t, updating Q_t to a new population P_t of size N according to the ε-constraint rule, specifically:
from the neighborhood of an individual x_i, randomly selecting an individual x_j, calculating the Chebyshev value and the constraint value Φ(x) of each of the two, and comparing them;
if Φ(x_i) and Φ(x_j) are both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller Chebyshev value enter the next-generation population P_t;
if Φ(x_i) and Φ(x_j) are not both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller constraint value Φ(x) enter the next-generation population P_t;
S5, judging whether the current number of iterations has reached the termination condition of the set maximum number of iterations, i.e., whether the optimal solution has been obtained:
S51, if the set maximum number of iterations has not been reached, using the difference between the new population P_t and the previous-generation population P_{t-1} as the feedback reward of reinforcement learning, and returning it to the long short-term memory artificial neural network to update its loss function, thereby realizing the training of the network; then returning to execute steps S2 to S4 in sequence; the difference between the new population P_t and the previous-generation population P_{t-1} is expressed using the inverted generational distance (IGD) index;
the feedback reward is denoted R_t, and R_t is calculated according to the following formula:
R_t = abs(IGD_{t-1} - IGD_t)
where abs denotes the absolute-value operation, and:
IGD(A, P*) = ( Σ_{y*∈P*} d(y*, A) ) / |P*|, with d(y*, A) = min_{y∈A} √( Σ_{i=1}^m (y_i - y_i*)² )
in the above formula, A denotes the solution set to be evaluated, P* denotes uniformly distributed sample individuals on the true Pareto front, d(y*, A) denotes the minimum Euclidean distance from y* to A, y* denotes an individual in P*, y denotes an individual in A, y_i and y_i* denote the data of each dimension of y and y*, the subscript i denotes the dimension, and m is the total dimension of the data;
S52, if the set maximum number of iterations has been reached, outputting the population P_{t_end} representing the optimal solution of the multi-objective constrained optimization problem, where t_end is the preset maximum number of iterations.
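Pulling S1 to S5 together, the following condensed sketch strings the steps into one loop. It simplifies under several stated assumptions: a hypothetical toy problem, random neighborhoods and weight vectors, in-place replacement instead of the 2N temporary population, and the LSTM policy network stubbed out by a simple callable; the reward feedback of S51 is indicated only in a comment, since the toy problem's true Pareto front is not specified.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M, T_END = 20, 2, 5, 50

def evaluate(x):
    # toy bi-objective F(x), invented for illustration
    return np.array([x[0], 1.0 - x[0] + x[1] ** 2])

def violation(x):
    # toy Phi(x) = max(0, g1(x)) + |h1(x)|, invented for illustration
    return max(0.0, x[0] + x[1] - 1.5) + abs(x[0] ** 2 + x[1] ** 2 - 1.0)

def chebyshev(fx, lam, z_star):
    return np.max(lam * np.abs(fx - z_star))

pop = rng.random((N, D))                                      # S1: initial P_0
lams = rng.dirichlet(np.ones(2), size=N)                      # weight vectors
nbrs = [rng.choice(N, size=M, replace=False) for _ in range(N)]
policy = lambda progress: (0.9, 0.5, max(0.0, 1.0 - progress))  # LSTM stub

for t in range(1, T_END + 1):
    cr, f, eps = policy(t / T_END)                            # S2: adaptive parameters
    z_star = np.min([evaluate(x) for x in pop], axis=0)       # ideal point z*
    for i in range(N):                                        # S3 + S4
        j, k = rng.choice(nbrs[i], size=2, replace=False)
        y = pop[i] + f * (pop[j] - pop[k]) if rng.random() <= cr else pop[i].copy()
        phi_y, phi_i = violation(y), violation(pop[i])
        if phi_y < eps and phi_i < eps:                       # epsilon-constraint rule
            keep_y = (chebyshev(evaluate(y), lams[i], z_star)
                      <= chebyshev(evaluate(pop[i]), lams[i], z_star))
        else:
            keep_y = phi_y <= phi_i
        if keep_y:
            pop[i] = y
    # S5/S51: R_t = abs(IGD_{t-1} - IGD_t) would be computed here and fed back
    # to train the policy network; omitted in this stub
```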
The inverted generational distance (IGD) index describes the convergence and distribution of an algorithm; the smaller the IGD, the better the convergence and distribution. FIG. 2 and FIG. 3 are box plots of the IGD values of the optimization results of this embodiment on the classical constrained multi-objective test problems LIR-CMOP1 and LIR-CMOP7, respectively, with the population size set to 300, the number of evolution generations to 500 and the number of runs to 30.
Compared with the prior art, the method for solving a multi-objective constrained optimization problem and its storage medium have the following advantages:
compared with existing multi-objective optimization algorithms and differential evolution algorithms, the method introduces deep reinforcement learning on the basis of the multi-objective evolutionary algorithm MOEA/D and designs an IGD-based reward feedback mechanism for it; the method adaptively adjusts the sensitive parameters of the evolutionary algorithm, namely the difference factor and the constraint relaxation factor, effectively balances convergence and distribution during evolution, adjusts the attention paid to the constraint conditions at different stages of population evolution, effectively solves the performance loss of multi-objective evolutionary algorithms caused by parameter sensitivity in constrained scenarios, realizes adaptive parameter adjustment, and achieves better performance.
Example 2
The storage medium of this embodiment stores a computer program for executing the method of solving a multi-objective constrained optimization problem of Embodiment 1; the computer program is retained in the storage medium in the form of data.
The storage medium of this embodiment is provided in a computer device; when a processing unit of the computer device executes the computer program of the method of solving a multi-objective constrained optimization problem of Embodiment 1, the data corresponding to the computer program are read from the storage medium of this embodiment.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent and is included in the scope of the present invention.

Claims (10)

1. A method for solving a multi-objective constrained optimization problem, characterized by comprising the following steps:
generating an initial population P_0 from the raw data of the multi-objective constrained optimization problem, then calculating the objective function value F(x) and the constraint value Φ(x) of each individual x in the initial population;
adaptively generating a new difference factor θ_DE^t and a constraint relaxation factor ε_t using the policy network of deep reinforcement learning, where t denotes the number of iterations;
starting the iteration from the initial population P_0, generating new individuals by evolution with the difference factor θ_DE^t, and combining the new individuals with the previous-generation population P_{t-1} to form a temporary population Q_t;
converting the multi-objective constrained optimization problem into single-objective optimization sub-problems based on the decomposition-based multi-objective evolutionary algorithm MOEA/D, obtaining the Chebyshev value and the constraint value Φ(x) of each individual in the temporary population Q_t, and then updating to a new-generation population based on the constraint relaxation factor ε_t;
judging whether the maximum number of iterations t_end has been reached; if so, taking the finally updated population as the optimal solution of the multi-objective constrained optimization problem; if not, using the difference between the new population and the previous-generation population as the feedback reward of reinforcement learning, returning it to the policy network of deep reinforcement learning to update the loss function, and then continuing the iteration until the maximum number of iterations t_end is reached.
2. The method for solving a multi-objective constrained optimization problem according to claim 1, wherein the process of generating the initial population P_0 and calculating the objective function value F(x) and the constraint value Φ(x) comprises:
forming a search space from the raw data of the multi-objective constrained optimization problem, and randomly generating an initial population P_0 = {x_1, x_2, …, x_N} from the search space, where N denotes the population size;
setting the maximum number of iterations t_end for iterating the population;
calculating the objective function value F(x), the inequality constraint value g(x) and the equality constraint value h(x) of each individual in the initial population P_0, where Φ(x) = Σ g(x) + Σ c(x), g(x) being the inequality constraint value and c(x) the equality constraint value.
3. The method for solving a multi-objective constrained optimization problem according to claim 1, wherein the policy network of deep reinforcement learning is specifically a long short-term memory artificial neural network, the input of the long short-term memory artificial neural network is the objective function values F(x) of the individuals in the population, and its output is the difference factor θ_DE^t = (CR, F) and the constraint relaxation factor ε_t, where CR is the mutation probability and F is the scaling factor.
4. The method for solving a multi-objective constrained optimization problem according to claim 3, wherein the input and output of the learning of the long short-term memory artificial neural network are as follows:
(θ_DE^t, ε_t, c_t, h_t) = LSTM(I_t, W, c_{t-1}, h_{t-1})
where I_t is the information of the current population, including the objective function value F(x), the inequality constraint value g(x) and the equality constraint value h(x) of all individuals in the current population, W is the weight information of the LSTM, c_{t-1} is the cell state of the LSTM, h_{t-1} is the hidden state of the LSTM, and LSTM denotes the learning process of the long short-term memory artificial neural network.
5. The method for solving a multi-objective constrained optimization problem according to claim 1, wherein the process of generating new individuals by evolution with the difference factor θ_DE^t and forming the temporary population Q_t comprises:
starting from the initial population P_0: randomly selecting two individuals from the neighborhood of any individual of the previous-generation population P_{t-1}, and operating on the three individuals with the difference factor θ_DE^t based on the mutation probability CR to generate a new individual;
combining the newly generated individuals with the previous-generation population P_{t-1} to form the temporary population Q_t.
6. The method for solving a multi-objective constrained optimization problem according to claim 5, wherein the process of generating new individuals is:
in the neighborhood Λ^{t-1} of any individual x_i^{t-1} of the previous-generation population P_{t-1}, randomly selecting two individuals x_{i_j}^{t-1} and x_{i_k}^{t-1}, and operating on the three individuals x_i^{t-1}, x_{i_j}^{t-1} and x_{i_k}^{t-1} with the difference factor θ_DE^t based on the mutation probability CR to generate a new individual y_i^t, as shown in the following formula:
y_i^t = x_i^{t-1} + F · (x_{i_j}^{t-1} - x_{i_k}^{t-1}) if rand(0,1) ≤ CR, otherwise y_i^t = x_i^{t-1}, with i = 1, …, N and j, k ∈ {1, …, M}
where rand(0,1) denotes a probability between 0 and 1, N denotes the total number of individuals in the population, M denotes the total number of individuals in the neighborhood, i is the index of the individual, j and k are indices within the neighborhood, x_{i_j}^{t-1} denotes the jth individual in the neighborhood of the ith individual, and x_{i_k}^{t-1} denotes the kth individual in the neighborhood of the ith individual.
7. The method for solving a multi-objective constrained optimization problem according to claim 1, wherein the process of updating to the next-generation population based on the decomposition-based multi-objective evolutionary algorithm MOEA/D is as follows:
from the neighborhood of an individual x_i of the temporary population Q_t, randomly selecting an individual x_j, calculating the Chebyshev value and the constraint value Φ(x) of each of the two, and comparing them;
if Φ(x_i) and Φ(x_j) are both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller Chebyshev value enter the next-generation population P_t;
if Φ(x_i) and Φ(x_j) are not both less than the constraint relaxation factor ε_t, letting the one of the two with the smaller constraint value Φ(x) enter the next-generation population P_t;
where i denotes the ith individual of the temporary population Q_t and j denotes the jth individual in its neighborhood.
8. The method for solving a multi-objective constrained optimization problem according to claim 1, wherein, when it is judged that the maximum number of iterations t_end has not been reached, the steps are specifically:
using the difference between the next-generation population P_t and the previous-generation population P_{t-1} as the feedback reward of reinforcement learning, returning it to the policy network of deep reinforcement learning to update the loss function, and thereby training the policy network;
then continuing the iteration to update the population until the maximum number of iterations is reached.
9. The method for solving a multi-objective constrained optimization problem according to claim 8, wherein the calculation of the feedback reward is specifically:
the difference between the new population P_t and the previous-generation population P_{t-1} is expressed by the inverted generational distance index IGD; the feedback reward is denoted R_t, and R_t is calculated according to the following formula:
R_t = abs(IGD_{t-1} - IGD_t)
where abs denotes the absolute-value operation, and:
IGD(A, P*) = ( Σ_{y*∈P*} d(y*, A) ) / |P*|, with d(y*, A) = min_{y∈A} √( Σ_{i=1}^m (y_i - y_i*)² )
where A denotes the solution set to be evaluated, P* denotes uniformly distributed sample individuals on the true Pareto front, d(y*, A) denotes the minimum Euclidean distance from y* to A, y* denotes an individual in P*, y denotes an individual in A, y_i and y_i* denote the data of each dimension of y and y*, the subscript i denotes the dimension, and m is the total dimension of the data.
10. A storage medium for storing a computer program configured to execute the method for solving a multi-objective constrained optimization problem according to any one of claims 1 to 9.
CN202210688940.9A 2022-06-17 2022-06-17 Method for solving multi-objective constrained optimization problem and storage medium thereof Pending CN115081323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210688940.9A CN115081323A (en) 2022-06-17 2022-06-17 Method for solving multi-objective constrained optimization problem and storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210688940.9A CN115081323A (en) 2022-06-17 2022-06-17 Method for solving multi-objective constrained optimization problem and storage medium thereof

Publications (1)

Publication Number Publication Date
CN115081323A true CN115081323A (en) 2022-09-20

Family

ID=83254237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210688940.9A Pending CN115081323A (en) 2022-06-17 2022-06-17 Method for solving multi-objective constrained optimization problem and storage medium thereof

Country Status (1)

Country Link
CN (1) CN115081323A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116667467A (en) * 2023-08-01 2023-08-29 齐齐哈尔市君威节能科技有限公司 Intelligent control magnetic suspension breeze power generation capacity-increasing compensation device
CN116667467B (en) * 2023-08-01 2023-10-13 齐齐哈尔市君威节能科技有限公司 Intelligent control magnetic suspension breeze power generation capacity-increasing compensation device

Similar Documents

Publication Publication Date Title
Stoyanov et al. Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure
CN111260030B (en) A-TCN-based power load prediction method and device, computer equipment and storage medium
Modares et al. Parameter estimation of bilinear systems based on an adaptive particle swarm optimization
CN110909926A (en) TCN-LSTM-based solar photovoltaic power generation prediction method
CN111260124A (en) Chaos time sequence prediction method based on attention mechanism deep learning
Han et al. Network traffic prediction using variational mode decomposition and multi-reservoirs echo state network
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
Wang et al. A compact constraint incremental method for random weight networks and its application
CN111832817A (en) Small world echo state network time sequence prediction method based on MCP penalty function
CN116542382A (en) Sewage treatment dissolved oxygen concentration prediction method based on mixed optimization algorithm
CN115081323A (en) Method for solving multi-objective constrained optimization problem and storage medium thereof
Liang et al. A wind speed combination forecasting method based on multifaceted feature fusion and transfer learning for centralized control center
Upadhyay et al. IIR system identification using differential evolution with wavelet mutation
CN112381591A (en) Sales prediction optimization method based on LSTM deep learning model
Yang Combination forecast of economic chaos based on improved genetic algorithm
CN116667322A (en) Power load prediction method based on phase space reconstruction and improved RBF neural network
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
Sujamol et al. A genetically optimized method for weight updating in fuzzy cognitive maps
Ortelli et al. Faster estimation of discrete choice models via dataset reduction
CN114202063A (en) Fuzzy neural network greenhouse temperature prediction method based on genetic algorithm optimization
Karadede et al. A hierarchical soft computing model for parameter estimation of curve fitting problems
CN111639797A (en) Gumbel-softmax technology-based combined optimization method
Zhu et al. Application of Improved Deep Belief Network Based on Intelligent Algorithm in Stock Price Prediction
Phiromlap et al. A frequency-based updating strategy in compact genetic algorithm
Li et al. Macroeconomics modelling on UK GDP growth by neural computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination