CN116245245A - Distributed blocking flow shop scheduling optimization system based on co-evolution algorithm - Google Patents

Distributed blocking flow shop scheduling optimization system based on co-evolution algorithm

Info

Publication number: CN116245245A
Application number: CN202310231932.6A
Authority: CN (China)
Prior art keywords: formula, equation, value, region, mean
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 朱宁宁, 赵付青, 宋彬, 张建林, 许天鹏, 刘欢, 左阳, 李英堂, 朱波
Current and original assignee: Lanzhou University of Technology (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Application filed by Lanzhou University of Technology; priority to CN202310231932.6A

Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06Q10/06312 Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06Q10/06316 Sequencing of tasks or work
    • G06Q50/04 Manufacturing

Abstract

The invention provides a distributed blocking flow shop scheduling optimization system based on a co-evolution algorithm, which applies a co-evolution algorithm assisted by a knowledge-driven cross-region interactive learning mechanism to distributed blocking flow shop scheduling under energy consumption constraints. To reduce energy consumption and total delay in the production process, a heuristic method based on problem characteristics is designed. To improve population diversity and convergence rate, the invention designs a cross-region interactive learning mechanism guided by the reinforcement-learning optimal state-value function, realizing comprehensive coordination at the algorithm, parameter, and individual levels and improving the accuracy and efficiency of solving. To improve solution quality, an inferior-solution repair strategy and an individual regeneration mechanism are proposed. Furthermore, energy consumption is reduced through energy-saving operations on the critical path. Comparative verification on a continuous optimization test set and on instances with different numbers of factories, workpieces, and machines shows that the optimization system designed by the invention outperforms the compared optimization systems.

Description

Distributed blocking flow shop scheduling optimization system based on co-evolution algorithm
Technical Field
The invention belongs to the technical field of manufacturing production scheduling and route overall planning, and relates to a distributed blocking flow shop scheduling optimization system based on a co-evolution algorithm.
Background
With the rapid development of society, optimization problems in intelligent computing touch ever more fields, such as continuous optimization, engineering applications, and intelligent transportation. As the complexity of optimization problems increases, various improved optimization systems have been proposed on the basis of the original system frameworks. The main objective of optimization is to find the best among a set of candidate solutions, typically by minimizing or maximizing an objective.
Complex process industrial systems give rise to many typical problem abstractions, such as the distributed blocking flow shop scheduling problems found in the steel industry, industrial waste treatment, and nonferrous metal smelting, where the selection of factories, the allocation of machines, the operating speeds of machines, and the order of workpieces directly affect the productivity, profitability, and energy consumption of an enterprise. Therefore, designing a distributed blocking flow shop scheduling optimization system with high production benefit and low energy consumption has important research significance and value.
The continuous optimization and distributed shop scheduling problems in complex process industrial systems tend to be complex and multi-modal. Intelligent optimization algorithms are widely applicable and robust, can adapt and learn on their own, and are not limited by the nature of the problem, and are therefore widely used in optimization systems. Typical intelligent optimization algorithms include Differential Evolution (DE), the Estimation of Distribution Algorithm (EDA), and the Genetic Algorithm (GA), among others. DE guides evolutionary behavior through mutation operators and difference terms, comprising mutation, crossover, and selection; it evaluates each solution according to an objective function, selects better solutions with a greedy strategy, and iteratively approaches the optimal solution. DE is widely used for solving complex optimization problems owing to its good performance, few parameters, simple structure, and strong global search capability. EDA is a newer evolutionary algorithm developed from genetic algorithms and based on probability theory and statistics, touching the fields of mathematical statistics and intelligent computation. EDA builds a probability model from selected dominant individuals to generate high-quality solutions that replace inferior solutions in the original population, so that the globally optimal solution can be found with higher probability.
The co-evolution approach brings the advantages of each algorithm into full play by designing a suitable cooperation strategy, eliminating the limitations of the individual algorithms and even achieving complementary advantages; such an algorithm framework can formulate an optimization scheme for potential search areas that can improve fitness, thereby improving convergence speed or search accuracy to varying degrees. Reinforcement learning is used to learn sequential decision tasks: the agent optimizes its behavior through continuous experimentation, learning through interaction with a dynamic environment and evaluating the current environment through reward feedback. In recent years, reinforcement learning has been successfully combined with intelligent optimization algorithms to solve various optimization problems.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention, supported by an Innovation Star project (No. 2023 CXZX-476) of the Gansu Provincial Department of Education, carries out related research and provides a technical solution, namely a distributed blocking flow shop scheduling optimization system based on a co-evolution algorithm. The invention designs a cross-region interactive learning mechanism driven by problem-characteristic knowledge and guided by reinforcement learning, and an optimization method realizing comprehensive coordination at the algorithm, parameter, and individual levels, as shown in figure 1 and detailed in steps 1-3. Performance verification on continuous optimization problems and application to the distributed blocking flow shop scheduling problem with energy consumption constraint show that the optimization system designed by the invention can reduce energy consumption while improving production benefits.
A distributed blocking flow shop scheduling optimization system based on a co-evolution algorithm, characterized by comprising: a heuristic method based on problem characteristics that considers total energy consumption and total delay; a cross-region interactive learning mechanism guided by an optimal state-value function based on reinforcement learning; and an inferior-solution repair strategy and an individual regeneration mechanism aimed at specific phenomena of the optimization problem. The interaction of the key ideas of the system is shown in figure 2 and specifically comprises the following steps 1-3:
Step 1: in the described initialization method taking into account the total delay and the total energy consumption, the position and operating speed of each workpiece are initialized randomly; the method considers the influence of front-end delay, blocking time, and idle time, and in particular places the workpiece with the smallest delay time at the first position. The factory with the largest total delay is selected as the critical factory according to the workpiece sequence, and the total delay of the distributed blocking flow shop with energy consumption constraint is optimized by optimizing the total delay of the critical factory;
step 2: the cross-regional interactive learning mechanism guided by the optimal state cost function based on reinforcement learning, the inferior solution restoration strategy and the individual regeneration mechanism aiming at the specific phenomenon of the optimization problem comprise the following steps of A1-A7:
step A1: dividing the region;
The regions are divided according to the distance between each individual in the population and the elite solution, giving three regions from near to far: the third of individuals closest to the elite solution form the enhancement region, the third farthest away form the attenuation region, and the remaining middle third form the stabilization region. The distance is defined in equations 1-2:
d(X_i, X_best) = ||X_i − X_best||, i ∈ {1, 2, …, NP}   (equation 1)
NP = NP_total / 3   (equation 2)
where X_best is the elite solution, i.e., the individual with the smallest fitness value, X_i is the i-th individual vector, NP is the number of individuals per region, and NP_total is the total population size.
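As an illustrative sketch (function and variable names are the editor's, not the patent's, and NumPy is assumed), the region split of step A1 and equations 1-2 can be written as:

```python
import numpy as np

def divide_regions(pop, fitness):
    """Split the population into enhancement / stabilization / attenuation
    regions by Euclidean distance to the elite (lowest-fitness) individual,
    following equations 1-2."""
    x_best = pop[np.argmin(fitness)]           # elite solution X_best
    d = np.linalg.norm(pop - x_best, axis=1)   # d(X_i, X_best), equation 1
    order = np.argsort(d)                      # indices sorted nearest-first
    np_region = len(pop) // 3                  # NP = NP_total / 3, equation 2
    enhancement = order[:np_region]            # closest third of individuals
    stabilization = order[np_region:2 * np_region]
    attenuation = order[2 * np_region:]        # farthest third
    return enhancement, stabilization, attenuation
```

The elite itself always lands in the enhancement region, since its distance to itself is zero.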
Step A2: establishing an enhanced learning mechanism based on an epsilon-greedy strategy and simulated annealing, which specifically comprises the substeps A2.1-A2.5:
Step A2.1: considering that the evolution process is a continuous learning process, the discount factor γ weights the contribution to the total reward differently across iterations. Therefore, γ is not fixed but adaptively updated;
γ = 0.5 × (0.9 − 0.6 × G/G_max)   (equation 3)
where G is the current generation and G_max is the maximum number of iterations;
Step A2.2: approximately taking the optimal state-value function as the learning target, the Q value is iteratively updated according to equation 4;
Q_g ← Q_g + α × [R_(g+1) + γ × ΔQ_(g+1)]   (equation 4)
where α is the learning rate, R_g is the reward of the g-th generation, and ΔQ_(g+1) is the increment of Q.
Step A2.3: R_g is set according to distance; the distance between each individual in a region and that region's optimal individual is given by equation 5;
l(X_(i,k), X_(best,k)) = ||X_(i,k) − X_(best,k)||, k ∈ {1, 2, 3}   (equation 5)
where X_(i,k) is the i-th individual of the k-th region and X_(best,k) is the optimal individual of the k-th region;
Step A2.4: R_g rewards or penalizes each individual by comparing distances: a larger R_g means the individual is closer to the optimal solution, and a smaller R_g means it is farther away;
Step A2.5: an ε-greedy strategy is introduced and the ε value is dynamically adjusted by simulated annealing. The temperature is updated according to equations 6-7;
T_0 = T_max   (equation 6)
T_(g+1) = T_min + θ × (T_g − T_min)   (equation 7)
where T_0 is the initial temperature, set to the maximum value, T_min is the minimum temperature, and θ ∈ [0,1] is the annealing factor. During the iterative process, the change of the Q-value table is considered, and ε is defined from the Q-value increment and the dynamic temperature.
ε = exp(−ΔQ_g / T_g)   (equation 8)
where ΔQ_g is the increment of Q in the g-th generation and T_g is the temperature of the g-th generation;
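The scalar update rules of step A2 (equations 3, 7, and 8) can be sketched as below; the function names are the editor's, not the patent's:

```python
import math

def discount_factor(g, g_max):
    """Adaptive discount factor, equation 3: gamma = 0.5 * (0.9 - 0.6 * g/g_max)."""
    return 0.5 * (0.9 - 0.6 * g / g_max)

def next_temperature(t_g, t_min, theta):
    """Simulated-annealing temperature update toward T_min, equation 7."""
    return t_min + theta * (t_g - t_min)

def epsilon_value(delta_q, t_g):
    """Dynamic epsilon from the Q-table increment and temperature, equation 8."""
    return math.exp(-delta_q / t_g)
```

Note that γ decays from 0.45 at the first generation to 0.15 at the last, and ε shrinks as the Q-table increment grows relative to the current temperature.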
step A3: implementing a cross-region dynamic resource allocation scheme;
the parameters of the three areas are updated according to different formulas, and specifically comprise substeps a3.1-a3.4:
Step A3.1: updating the parameters in the mutation strategy of the enhancement region according to equations 9-14;
F_(i,1) ~ randn_i(μ_(F_1), 0.1)
Cr_(i,1) ~ randn_i(μ_(Cr_1), 0.1)   (equation 9)
F_(i+1,1)' = c × F_(i+1,1) + (1 − c) × F_(i,1)
Cr_(i+1,1)' = c × Cr_(i+1,1) + (1 − c) × Cr_(i,1)   (equation 10)
c = 0.9 − 0.9 × 10^(−5) × G/G_max   (equation 11)
μ_(F_1) = (1 − c) × μ_(F_1) + c × mean_A(ω_i × F_(i,1))   (equation 12)
μ_(Cr_1) = (1 − c) × μ_(Cr_1) + c × mean_A(ω_i × Cr_(i,1))   (equation 13)
ω_i = Δfit_i / Σ_k Δfit_k   (equation 14)
where F_i and Cr_i are the mutation factor and crossover factor, μ_F is the mean of F_i, μ_Cr is the mean of Cr_i, fit_i is the fitness value of the i-th individual, Δfit_i is the change in fitness value, and mean_A denotes the arithmetic mean. In this region F_i and Cr_i follow Gaussian distributions; the change in fitness value serves as the weight coefficient, and the mean is computed as a weighted arithmetic mean.
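A minimal sketch of one generation of the enhancement-region parameter adaptation (equations 9 and 11-14), assuming NumPy; clipping F and Cr to [0, 1] is the editor's assumption, as the patent does not state bounds:

```python
import numpy as np

def adapt_enhancement(mu_f, mu_cr, delta_fit, g, g_max, rng):
    """One generation of enhancement-region parameter adaptation:
    Gaussian sampling of F and Cr around running means (equation 9),
    fitness-improvement weights (equation 14), and weighted arithmetic
    mean updates (equations 12-13) with blend coefficient c (equation 11)."""
    n = len(delta_fit)
    c = 0.9 - 0.9e-5 * g / g_max                      # equation 11
    f = np.clip(rng.normal(mu_f, 0.1, n), 0.0, 1.0)   # equation 9 (clip assumed)
    cr = np.clip(rng.normal(mu_cr, 0.1, n), 0.0, 1.0)
    w = delta_fit / delta_fit.sum()                   # omega_i, equation 14
    mu_f = (1 - c) * mu_f + c * np.sum(w * f)         # equation 12
    mu_cr = (1 - c) * mu_cr + c * np.sum(w * cr)      # equation 13
    return f, cr, mu_f, mu_cr
```

Because the update is a convex combination of the old mean and a weighted average of the clipped samples, the running means stay inside [0, 1].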
Step a3.2: updating parameters in the mutation strategy of the stable region according to formulas 15-19;
F_(i,2)∈randc_i(μ_(F_2),0.1)
cr_ (i, 2) ∈randn_i (μ_ (Cr_2), 0.1)) equation 15
F_(i+1,2)'=c*F_(i+1,2)+(1-c)*F_(i,2)
Cr_ (i+1, 2)' =c cr_ (i+1, 2) + (1-c) cr_ (i, 2) formula 16
μ_ (f_2) = (1-c) = (f_2) +c × mean_l (ω_i × f_ (i, 2)) equation 17
Mu_ (cr_2) = (1-c) mu_ (cr_2) +c mean_a (ω_i cr_ (i, 2)) equation 18
mean_l (ω_i×f_ (i, 2)) = (Σω_i×f_ (i, 2)/(Σω_i×f_ (i, 2)) equation 19
Wherein f_i follows the cauchy distribution, and the mean value adopts the lemer mean value, mean_l (ω_i×f_ (i, 2)), cr_i follows the gaussian distribution, and arithmetic mean value calculation is adopted.
Step a3.3: updating parameters in a mutation strategy of the attenuation region according to formulas 20-25;
F_(i,3)∈randc_i(μ_(F_3),0.1)
cr_ (i, 3) ∈random_i (μ_ (Cr_3), 0.1) equation 20
F_(i+1,3)'=c*F_(i+1,3)+(1-c)*F_(i,3)
Cr_ (i+1, 3)' =c cr_ (i+1, 3) + (1-c) cr_ (i, 3) formula 21
μ_ (f_3) = (1-c) = (f_3) +c × mean_l (ω_i × f_ (i, 3)) formula 22
Mu_ (cr_3) = (1-c) = (cr_3) +c x mean_l (ω_i x cr_ (i, 3)) equation 23
mean_l (ω_i×f_ (i, 3)) = (Σω_i×f_ (i, 3)/(Σω_i×f_ (i, 3)) equation 24
mean_l (ω_i cr_ (i, 3)) = (Σω_i cr_ (i, 3)/(Σω_i cr_ (i, 3)) equation 25
Wherein, F_i and Cr_i both obey the Korotkoff distribution, and the average value is calculated by adopting the Lamermer average value which can obtain a larger value.
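Equations 19, 24, and 25 as printed reduce to a ratio of identical sums; the sketch below assumes the standard weighted Lehmer (contraharmonic) mean with a squared numerator, Σ(w·x²)/Σ(w·x), which matches the stated property of yielding a larger value than the arithmetic mean:

```python
def lehmer_mean(values, weights):
    """Weighted Lehmer (contraharmonic) mean, the assumed reading of
    equations 19/24/25: sum(w * x^2) / sum(w * x).  Biased toward the
    larger samples, which keeps the F mean from collapsing."""
    num = sum(w * x * x for w, x in zip(weights, values))
    den = sum(w * x for w, x in zip(weights, values))
    return num / den

def arithmetic_mean(values, weights):
    """Weighted arithmetic mean mean_A, used for Cr in equations 13 and 18."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)
```

For example, with equal weights, `lehmer_mean([1, 3], [1, 1])` is 2.5 versus an arithmetic mean of 2.0, confirming the bias toward larger samples.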
Step A3.4: the average fitness of each generation in each region and the average fitness after mutation and crossover are calculated according to equations 26-27, respectively (equations 26-27 are rendered as images in the original document and are not reproduced here);
Step A4: when the condition for successful mutation given by equation 28 (rendered as an image in the original document) is satisfied, a self-feedback strategy is executed: the generation is considered successfully mutated and parameter self-feedback is performed, with the self-feedback designs of the three regions shown in equations 10, 16, and 21; otherwise the parameters are resampled according to equations 9, 15, and 20.
Step A5: improving individuals with small Q values and individuals too dense in the region in the reinforcement learning mechanism, and executing a bad solution improvement strategy, wherein the method specifically comprises the following substeps A5.1-A5.6:
step a5.1: improving individuals with small Q values and too dense individuals in the region in a reinforcement learning mechanism, and using the individuals in the evolution process of the population;
the individual's density is measured by the average distance of the individual from the centroid of the region and the average fitness value. The centroid of a region is found according to equation 29.
m_c = (Σ X_i) / NP   (equation 29)
Step a5.2: the average distance of an individual to the centroid of the respective region is calculated according to equation 30;
d̄ = (Σ ||X_i − m_c||) / NP   (equation 30)
Step a5.3: the average fitness increment is defined according to equation 31;
Δf = (Σ (f(X_i) − f(m_c))) / NP   (equation 31)
Step a5.4: the degree of intensity DD is calculated according to a formula 32;
dd=Δf (1+λ/d-) formula 32
Wherein Δf represents the amount of change in the fitness value;
Step A5.5: the value of λ is calculated according to equation 33;
λ = (Σ_d ((U_d − L_d) / NP)²)^(1/2)   (equation 33)
where D represents the dimension, L_d represents the lower bound, and U_d represents the upper bound in dimension d.
Step a5.6: individuals with small Q values and too dense individuals in the modified population are defined as inferior solutions, denoted by I_i, and the repair strategy is equation 34:
i_i=i_i+ζ (m_c-i_i), I e {1,2, …, NP } equation 34
Where I_i is the ith repair solution, ζ ε [0,1] is used to control how close the inferior solution is to the centroid.
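The centroid computation and repair step of equations 29 and 34 can be sketched as follows (illustrative names, NumPy assumed):

```python
import numpy as np

def repair_inferior(pop, inferior_idx, zeta=0.5):
    """Pull inferior solutions toward the region centroid:
    m_c = mean of the region (equation 29), then
    I_i <- I_i + zeta * (m_c - I_i) (equation 34)."""
    m_c = pop.mean(axis=0)                    # centroid, equation 29
    repaired = pop.copy()
    repaired[inferior_idx] += zeta * (m_c - repaired[inferior_idx])
    return repaired, m_c
```

With ζ = 0 an inferior solution is left untouched, and with ζ = 1 it is moved exactly onto the centroid.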
Step A6: when the fitness value is not updated for several successive generations, an individual regeneration mechanism is executed, specifically comprising the sub-steps A6.1-A6.3:
Step A6.1: the covariance matrix C is decomposed into a lower triangular matrix M by the Cholesky decomposition method;
C = M × M^T   (equation 35)
Step A6.2: sampling from the standard normal distribution N(0, 1) yields a 1×D matrix R;
step A6.3: generating a new individual according to formula 36;
x_new=μ+η (g) M R formula 36
Wherein η (g) is a scaling factor, η (g) >1.η (g) is updated according to formulas 37-40.
η (G) =η_min+ ((g_max-G)/(g_max))ζ (η_max- η_min) formula 37
ηmax= (u_d-l_d) 0.01 equation 38
ηmin=ηmax/10 equation 39
Beta=1-e (g/(0.15×g_max)) equation 40
Wherein, eta_max and eta_min represent the maximum value and the minimum value allowed, and are determined according to the scale of the search space, L_d is used for representing the lower bound, U_d is used for representing the upper bound, G_max is the maximum iteration number, and beta is the correction parameter of nonlinear adjustment eta.
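A sketch of the regeneration sampling of equations 35-39 (NumPy assumed). Note that equation 40 as printed yields a negative β for g > 0; the code below assumes the intended exponent is 1 − exp(−g/(0.15 × G_max)) — this sign is the editor's assumption, not the patent's literal text:

```python
import math
import numpy as np

def regenerate(mu, cov, g, g_max, lower, upper, rng):
    """Sample a regenerated individual X_new = mu + eta(g) * M @ R,
    with M the lower-triangular Cholesky factor of C (equation 35)."""
    m = np.linalg.cholesky(cov)                # C = M M^T, equation 35
    r = rng.standard_normal(len(mu))           # R ~ N(0, I), step A6.2
    eta_max = 0.01 * (upper - lower)           # equation 38
    eta_min = eta_max / 10.0                   # equation 39
    beta = 1.0 - math.exp(-g / (0.15 * g_max))          # equation 40 (assumed sign)
    eta = eta_min + ((g_max - g) / g_max) ** beta * (eta_max - eta_min)  # eq. 37
    return mu + eta * (m @ r)                  # equation 36
```

Early in the run η is close to η_max, giving wide regeneration steps, and it shrinks toward η_min as G approaches G_max.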
Step A7: after entering the development stage, performing a distance-based EDA local search, performed according to formulas 41-42;
x_i (g+1) =n (μ_i, δ_i) formula 41
Where N (μ_i, δ_i) is a Gaussian distribution with mean μ_i and variance δ_i;
delta_i=τ || x_best-x_i (g) || formula 42
Where τ is the scale factor.
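The EDA local search of equations 41-42 can be sketched as follows; the patent does not state what the sampling mean μ_i is, so centering on the current individual is an editor's assumption, and δ_i is used here as the standard deviation:

```python
import numpy as np

def eda_local_search(x_i, x_best, tau=0.1, rng=None):
    """Distance-based Gaussian sampling: delta_i = tau * ||x_best - x_i||
    (equation 42), then sample around the current individual (equation 41,
    mean assumed to be x_i)."""
    rng = rng if rng is not None else np.random.default_rng(2)
    delta = tau * np.linalg.norm(x_best - x_i)   # equation 42
    return rng.normal(x_i, delta)                # equation 41
```

The perturbation automatically shrinks as an individual approaches the best solution, since δ_i is proportional to their distance.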
Step 3: the energy-saving operation based on the critical path is carried out according to the steps B1-B3;
step B1: the processing speed of part of the workpiece is reduced, and the energy consumption can be reduced;
step B2: the energy consumption is reduced by reducing the speed of non-critical workpieces on a critical path so as to achieve the purpose of energy conservation;
step B3: if the total energy consumption and the total delay are reduced or any index is reduced, receiving; when other conditions occur, the new solution is discarded, and the original solution is reserved.
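Step B3 can be read as a non-dominated acceptance rule; the tie-breaking below (accept only if at least one objective improves and neither worsens) is the editor's reading, not the patent's literal text:

```python
def accept_energy_saving(new_energy, new_delay, old_energy, old_delay):
    """One reading of the step B3 acceptance rule: keep the slowed-down
    schedule only if it improves at least one objective without worsening
    the other; otherwise the original solution is retained."""
    improved = new_energy < old_energy or new_delay < old_delay
    not_worse = new_energy <= old_energy and new_delay <= old_delay
    return improved and not_worse
```

Under this rule a speed reduction that saves energy but increases total delay is rejected, which keeps the energy-saving pass from degrading the production objective.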
Compared with the prior art, the invention has the following beneficial effects:
(1) Aiming at reducing energy consumption and total delay in the production process, a heuristic method based on problem characteristics is designed, realizing balanced bidirectional optimization of production efficiency and energy consumption;
(2) To improve population diversity and convergence rate, the invention designs a cross-region interactive learning mechanism guided by the reinforcement-learning optimal state-value function, realizing comprehensive coordination at the algorithm, parameter, and individual levels and improving the accuracy and efficiency of solving;
(3) To improve solution quality, an inferior-solution repair strategy and an individual regeneration mechanism are proposed. In addition, energy consumption is reduced through energy-saving operations on the critical path, improving production efficiency while achieving energy saving.
Drawings
FIG. 1 is a flow chart of the improved method of the present invention;
fig. 2 is a diagram of the design concept of the present invention;
FIG. 3 is a box plot of the present optimization system and the comparison optimization systems on the continuous optimization problem test set;
FIG. 4 is a diagram of the energy-saving operation on a factory critical path.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A co-evolution algorithm-based distributed blocking flow shop scheduling optimization system, comprising: a heuristic step based on problem characteristics considering total energy consumption and total delay, a cross-regional interactive learning step guided by an optimal state cost function based on reinforcement learning, a bad solution restoration step and an individual regeneration step aiming at a specific phenomenon of an optimization problem, and an energy-saving operation based on a critical path; the method specifically comprises the following steps:
Step 1: in an initialization step taking into account the total delay and the total energy consumption, the position and operating speed of each workpiece are initialized randomly; the influence of front-end delay, blocking time, and idle time is considered, and in particular the workpiece with the smallest delay time is placed at the first position; the factory with the largest total delay is selected as the critical factory according to the workpiece sequence, and the total delay of the distributed blocking flow shop with energy consumption constraints is optimized by optimizing the total delay of the critical factory.
Step 2: the cross-region interactive learning step guided by the optimal state cost function based on reinforcement learning, the inferior solution repairing step and the individual regenerating step aiming at the specific phenomenon of the optimization problem comprise the steps of A1-A7:
Step A1: dividing the regions; the regions are divided according to the distance between each individual in the population and the elite solution, giving three regions from near to far: the third of individuals closest to the elite solution form the enhancement region, the third farthest away form the attenuation region, and the remaining middle third form the stabilization region; the distance is defined in equations 1-2:
d(X_i, X_best) = ||X_i − X_best||, i ∈ {1, 2, …, NP}   (equation 1)
NP = NP_total / 3   (equation 2)
where X_best is the elite solution, i.e., the individual with the smallest fitness value, X_i is the i-th individual vector, NP is the number of individuals per region, and NP_total is the total population size;
step A2: establishing an enhanced learning mechanism based on an epsilon-greedy strategy and simulated annealing, which specifically comprises the substeps A2.1-A2.5:
Step A2.1: considering that the evolution process is a continuous learning process, the discount factor γ weights the contribution to the total reward differently across iterations; therefore, γ is not fixed but adaptively updated;
γ = 0.5 × (0.9 − 0.6 × G/G_max)   (equation 3)
where G is the current generation and G_max is the maximum number of iterations;
Step A2.2: approximately taking the optimal state-value function as the learning target, the Q value is iteratively updated according to equation 4;
Q_g ← Q_g + α × [R_(g+1) + γ × ΔQ_(g+1)]   (equation 4)
where α is the learning rate, R_g is the reward of the g-th generation, and ΔQ_(g+1) is the increment of Q;
Step A2.3: R_g is set according to distance; the distance between each individual in a region and that region's optimal individual is given by equation 5;
l(X_(i,k), X_(best,k)) = ||X_(i,k) − X_(best,k)||, k ∈ {1, 2, 3}   (equation 5)
where X_(i,k) is the i-th individual of the k-th region and X_(best,k) is the optimal individual of the k-th region;
Step A2.4: R_g rewards or penalizes each individual by comparing distances; a larger R_g means the individual is closer to the optimal solution, and a smaller R_g means it is farther away;
Step A2.5: an ε-greedy strategy is introduced, the ε value is dynamically adjusted by simulated annealing, and the temperature is updated according to equations 6-7;
T_0 = T_max   (equation 6)
T_(g+1) = T_min + θ × (T_g − T_min)   (equation 7)
where T_0 is the initial temperature, set to the maximum value, T_min is the minimum temperature, and θ ∈ [0,1] is the annealing factor; during the iterative process, the change of the Q-value table is considered, and ε is defined from the Q-value increment and the dynamic temperature;
ε = exp(−ΔQ_g / T_g)   (equation 8)
where ΔQ_g is the increment of Q in the g-th generation and T_g is the temperature of the g-th generation;
step A3: implementing a cross-region dynamic resource allocation scheme; the parameters of the three areas are updated according to different formulas, and specifically comprise substeps a3.1-a3.4:
Step A3.1: updating the parameters in the mutation strategy of the enhancement region according to equations 9-14;
F_(i,1) ~ randn_i(μ_(F_1), 0.1)
Cr_(i,1) ~ randn_i(μ_(Cr_1), 0.1)   (equation 9)
F_(i+1,1)' = c × F_(i+1,1) + (1 − c) × F_(i,1)
Cr_(i+1,1)' = c × Cr_(i+1,1) + (1 − c) × Cr_(i,1)   (equation 10)
c = 0.9 − 0.9 × 10^(−5) × G/G_max   (equation 11)
μ_(F_1) = (1 − c) × μ_(F_1) + c × mean_A(ω_i × F_(i,1))   (equation 12)
μ_(Cr_1) = (1 − c) × μ_(Cr_1) + c × mean_A(ω_i × Cr_(i,1))   (equation 13)
ω_i = Δfit_i / Σ_k Δfit_k   (equation 14)
where F_i and Cr_i are the mutation factor and crossover factor, μ_F is the mean of F_i, μ_Cr is the mean of Cr_i, fit_i is the fitness value of the i-th individual, Δfit_i is the change in fitness value, and mean_A denotes the arithmetic mean; in this region F_i and Cr_i follow Gaussian distributions; the change in fitness value serves as the weight coefficient, and the mean is computed as a weighted arithmetic mean;
Step a3.2: updating parameters in the mutation strategy of the stable region according to formulas 15-19;
F_(i,2)∈randc_i(μ_(F_2),0.1)
cr_ (i, 2) ∈randn_i (μ_ (Cr_2), 0.1)) equation 15
F_(i+1,2)'=c*F_(i+1,2)+(1-c)*F_(i,2)
Cr_ (i+1, 2)' =c cr_ (i+1, 2) + (1-c) cr_ (i, 2) formula 16
μ_ (f_2) = (1-c) = (f_2) +c × mean_l (ω_i × f_ (i, 2)) equation 17
Mu_ (cr_2) = (1-c) mu_ (cr_2) +c mean_a (ω_i cr_ (i, 2)) equation 18
mean_l (ω_i×f_ (i, 2)) = (Σω_i×f_ (i, 2)/(Σω_i×f_ (i, 2)) equation 19
Wherein F_i obeys the Cauchy distribution, the mean value adopts the Lamer mean_L (omega_i. Times.F_ (i, 2)), and Cr_i obeys the Gaussian distribution, and arithmetic mean value calculation is adopted;
Step A3.3: updating the parameters in the mutation strategy of the attenuation region according to equations 20-25;
F_(i,3) ~ randc_i(μ_(F_3), 0.1)
Cr_(i,3) ~ randc_i(μ_(Cr_3), 0.1)   (equation 20)
F_(i+1,3)' = c × F_(i+1,3) + (1 − c) × F_(i,3)
Cr_(i+1,3)' = c × Cr_(i+1,3) + (1 − c) × Cr_(i,3)   (equation 21)
μ_(F_3) = (1 − c) × μ_(F_3) + c × mean_L(ω_i × F_(i,3))   (equation 22)
μ_(Cr_3) = (1 − c) × μ_(Cr_3) + c × mean_L(ω_i × Cr_(i,3))   (equation 23)
mean_L(ω_i × F_(i,3)) = Σ(ω_i × F_(i,3)²) / Σ(ω_i × F_(i,3))   (equation 24)
mean_L(ω_i × Cr_(i,3)) = Σ(ω_i × Cr_(i,3)²) / Σ(ω_i × Cr_(i,3))   (equation 25)
where F_i and Cr_i both follow the Cauchy distribution, and the mean is the Lehmer mean, which yields a larger value;
Step A3.4: in each region, the average fitness φ of the current generation of individuals and the average fitness φ' after mutation and crossover are calculated according to formulas 26-27;
φ = (Σ f(X_i)) / NP    formula 26
φ' = (Σ f(U_i)) / NP    formula 27
φ' < φ    formula 28
if formula 28 is satisfied, the mutation of this generation is successful and parameter self-feedback is performed; otherwise the parameters are resampled according to formulas 9, 15 and 20; the self-feedback designs of the three regions are shown in formulas 10, 16 and 21;
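The per-generation success test (formula 28) and the resulting parameter choice (blend via formulas 10/16/21 on success, resample via formulas 9/15/20 on failure) can be sketched as below. Minimization is assumed, and all names are illustrative rather than taken from the patent.

```python
import numpy as np

def self_feedback_step(fit_parents, fit_trials, F_old, F_new, mu_F, c=0.9, rng=None):
    # formulas 26-28: compare mean fitness before and after mutation/crossover
    phi = np.mean(fit_parents)
    phi_prime = np.mean(fit_trials)
    if phi_prime < phi:
        # successful generation: self-feedback blend (formulas 10/16/21)
        return c * np.asarray(F_new) + (1 - c) * np.asarray(F_old)
    # failed generation: resample around the region mean (formulas 9/15/20)
    rng = rng or np.random.default_rng()
    return rng.normal(mu_F, 0.1, size=len(F_old))
```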
step A4: when the condition of successful mutation is met, executing a self-feedback strategy;
step A5: carrying out the inferior-solution improvement process on individuals with small Q values in the reinforcement learning mechanism and on individuals in overly dense regions, specifically comprising the substeps A5.1-A5.6:
step A5.1: individuals with small Q values in the reinforcement learning mechanism and overly dense individuals within a region are improved and used in the evolution process of the population;
measuring the density of an individual by using the average distance between the individual and the centroid of the area and the average fitness value, wherein the centroid of one area is obtained according to a formula 29;
m_c = (Σ X_i) / NP    formula 29
Step A5.2: the average distance from an individual to the centroid of the respective region is calculated according to formula 30;
d̄ = (Σ ‖X_i - m_c‖) / NP    formula 30
Step A5.3: the average fitness increment is defined according to formula 31;
Δf = (Σ (f(X_i) - f(m_c))) / NP    formula 31
Step A5.4: the density degree DD is calculated according to formula 32;
DD = Δf × (1 + λ/d̄)    formula 32
Wherein Δf represents the amount of change in the fitness value;
step A5.5: the value of λ is calculated according to formula 33;
λ = (Σ ((U_d - L_d)/NP)^2)^(1/2)    formula 33
where D denotes the dimension, L_d denotes the lower bound, and U_d denotes the upper bound; evidently, λ is adaptively adjusted with the search space;
step A5.6: individuals with small Q values and overly dense individuals in the population are defined as inferior solutions, denoted by I_i, and the repair strategy is formula 34:
I_i = I_i + ζ × (m_c - I_i), i ∈ {1,2,…,NP}    formula 34
where I_i is the i-th repaired solution and ζ ∈ [0,1] controls the extent to which the inferior solution approaches the centroid;
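The region centroid (formula 29), mean distance (formula 30), scale factor λ (formula 33) and centroid-pull repair (formula 34) of steps A5.1-A5.6 can be sketched as follows. A minimal NumPy illustration; the function and variable names are assumptions.

```python
import numpy as np

def repair_inferior(X, inferior_idx, lower, upper, zeta=0.5):
    # formula 29: centroid of the region (NP x D population matrix X)
    m_c = X.mean(axis=0)
    NP = len(X)
    # formula 30: mean distance from the individuals to the centroid
    d_bar = np.mean(np.linalg.norm(X - m_c, axis=1))
    # formula 33: lambda adapts to the scale of the search space
    lam = np.sqrt(np.sum(((upper - lower) / NP) ** 2))
    # formula 34: pull each inferior solution toward the centroid
    X = X.copy()
    X[inferior_idx] = X[inferior_idx] + zeta * (m_c - X[inferior_idx])
    return X, d_bar, lam
```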
step A6: when the fitness value is not updated for several successive generations, an individual regeneration mechanism is executed, specifically comprising the sub-steps A6.1-A6.3:
step A6.1: generating a lower triangular matrix M by applying Cholesky decomposition to the covariance matrix C;
C = M × M^T    formula 35
Step A6.2: sampling from the standard normal distribution N(0,1) to obtain a 1×D matrix R;
step A6.3: generating a new individual according to formula 36;
X_new = μ + η(G) × M × R    formula 36
where η(G) is a scaling factor with η(G) > 1, updated according to formulas 37-40;
η(G) = η_min + ((G_max - G)/G_max)^β × (η_max - η_min)    formula 37
η_max = (U_d - L_d) × 0.01    formula 38
η_min = η_max / 10    formula 39
β = 1 - e^(-G/(0.15 × G_max))    formula 40
where η_max and η_min denote the maximum and minimum allowed values, determined by the scale of the search space, L_d denotes the lower bound, U_d denotes the upper bound, G_max is the maximum number of iterations, and β is the correction parameter for nonlinearly adjusting η;
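The regeneration mechanism of steps A6.1-A6.3 can be sketched as follows. A minimal NumPy illustration; the scalar choice of η_max over the widest dimension and the sign of the exponent in the β update (formula 40) are assumptions.

```python
import numpy as np

def regenerate(mu, C, g, G_max, lower, upper, rng):
    # formula 35: Cholesky factor M with C = M M^T (C must be positive definite)
    M = np.linalg.cholesky(C)
    # 1xD sample from the standard normal distribution N(0, 1)
    R = rng.standard_normal(len(mu))
    # formulas 38-39 (a scalar variant over the widest dimension is assumed)
    eta_max = 0.01 * float(np.max(upper - lower))
    eta_min = eta_max / 10.0
    # formula 40 (exponent sign assumed so that beta stays in (0, 1))
    beta = 1.0 - np.exp(-g / (0.15 * G_max))
    # formula 37: eta decays nonlinearly from eta_max toward eta_min
    eta = eta_min + ((G_max - g) / G_max) ** beta * (eta_max - eta_min)
    # formula 36: new individual sampled around the distribution mean mu
    return mu + eta * (M @ R)
```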
step A7: after entering the exploitation stage, a distance-based EDA local search is performed according to formulas 41-42;
X_i(g+1) = N(μ_i, δ_i)    formula 41
Where N (μ_i, δ_i) is a Gaussian distribution with mean μ_i and variance δ_i;
δ_i = τ × ‖X_best - X_i(g)‖    formula 42
Where τ is the scale factor.
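The distance-based EDA local search of formulas 41-42 can be sketched as below. Taking μ_i = X_i (centering the Gaussian on the current individual) is an assumption; the names are illustrative.

```python
import numpy as np

def eda_local_search(X_i, X_best, tau=0.1, rng=None):
    # formula 42: the sampling spread scales with the distance to the best individual
    delta = tau * np.linalg.norm(X_best - X_i)
    rng = rng or np.random.default_rng()
    # formula 41: sample the next position from N(mu_i, delta_i); mu_i = X_i
    # is an assumption here
    return rng.normal(loc=X_i, scale=delta, size=np.shape(X_i))
```

Note that when an individual coincides with the best solution the spread collapses to zero, so the search intensifies as individuals approach the incumbent best.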
Step 3: the critical-path-based energy-saving operation, shown in fig. 4, where different numbers represent different workpieces, is performed according to steps B1-B3;
step B1: reducing the processing speed of some workpieces can reduce the energy consumption;
step B2: the speed of non-critical workpieces on the critical path is reduced, lowering the energy consumption so as to achieve energy saving;
step B3: if the total energy consumption and the total delay are both reduced, or either index is reduced, the new solution is accepted; otherwise the new solution is discarded and the original solution is retained.
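The acceptance rule of step B3 admits the following literal reading as a sketch; the function name and the strict-inequality interpretation are assumptions.

```python
def accept_energy_saving_move(old_tec, old_td, new_tec, new_td):
    # Step B3, read literally: accept when both total energy consumption (TEC)
    # and total delay (TD) are reduced, or when either single index is reduced;
    # in all other cases the new solution is discarded and the original kept.
    return new_tec < old_tec or new_td < old_td
```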
The optimization system proposed by the present invention was verified on example 1.
Step 1: the performance of the proposed optimization system and of the comparison optimization systems in the tables is verified by comparing mean and variance. The mean reflects the overall optimization performance, and the variance reflects the stability of the system. Evaluations were performed in 10 dimensions (10D), 30 dimensions (30D), 50 dimensions (50D) and 100 dimensions (100D), as shown in tables 1 to 4. The best results are marked in bold.
TABLE 1 mean variance comparison results for several exemplary optimization systems (10D)
TABLE 2 mean variance comparison results (30D) for several exemplary optimization systems
TABLE 3 mean variance comparison results (50D) for several exemplary optimization systems
TABLE 4 mean variance comparison results (100D) for several exemplary optimization systems
Step 2: from the above results it can be seen that, compared with the other four optimization methods, the optimization method provided by the invention has better overall performance.
Step 3: the box plots for the optimization problems f3, f6, f14 and f26 are shown in fig. 3, with the abscissa representing the respective optimization system and the ordinate representing the error value. Fig. 3 reflects the stability of the optimization systems: the stability of the optimization method provided by the invention is the best among the compared optimization methods.
Step 4: the significant differences of the present optimization system compared to other optimization systems are shown in table 5.
Table 5 statistical test values for eight optimization systems
The optimization system proposed by the present invention was verified on example 2.
The optimization system provided by the invention and current classical optimizers are verified on 720 instances composed of different numbers of workpieces, machines and factories. The numbers of workpieces and machines are set to {(20, 50, 100) × (5, 10, 20); 200 × (10, 20); 500 × 20}, and the number of factories is set to 2, 3, 4, 5, 6 and 7, respectively. Performance was evaluated using the overall non-dominated vector generation (ONVG) and the C metric.
TABLE 6 ONVG index comparison for four optimization systems
Table 6 illustrates that the results obtained by the present optimization system are optimal.
Table 7 non-parametric verification of optimized systems
The results in table 7 show that, compared with the other optimization systems, the present optimization system exhibits a significant difference.
In summary, the optimization system of the present invention is superior to the other optimization systems on these shop production instances.
The basic principles and main features of the present invention have been described above with reference to the accompanying drawings. Modifications and variations may be made by those skilled in the art without departing from the principles of the present invention, and such modifications are intended to be included within the scope of the present invention.

Claims (8)

1. A co-evolution algorithm-based distributed blocking flow shop scheduling optimization system, comprising: a heuristic step based on problem characteristics considering total energy consumption and total delay, a cross-regional interactive learning step guided by an optimal state cost function based on reinforcement learning, a bad solution restoration step and an individual regeneration step aiming at a specific phenomenon of an optimization problem, and an energy-saving operation based on a critical path;
the method specifically comprises the following steps:
step 1: in the initialization step considering the total delay and the total energy consumption, the position and the operating speed of each workpiece are initialized randomly; in this step the effects of front-end delay, blocking time and idle time are considered, and in particular the workpiece with the smallest delay time is placed at the first position; the factory with the largest total delay is selected as the key factory according to the workpiece sequence;
step 2: the cross-region interactive learning step guided by the optimal state cost function based on reinforcement learning, the inferior solution repairing step and the individual regenerating step aiming at the specific phenomenon of the optimization problem comprise the steps of A1-A7:
step A1: dividing the region;
step A2: establishing a reinforcement learning mechanism based on an ε-greedy strategy and simulated annealing;
step A3: implementing a cross-region dynamic resource allocation scheme;
step A4: when the condition of successful mutation is met, executing a self-feedback strategy;
step A5: improving individuals with small Q values and individuals too dense in the region in the reinforcement learning mechanism, and executing an inferior solution improvement process;
step A6: when the fitness value is not updated for several successive generations, executing an individual regeneration mechanism;
step A7: after entering the exploitation stage, performing a distance-based EDA local search;
step 3: the energy-saving operation based on the critical path is carried out according to the steps B1-B3;
step B1: the processing speed of part of the workpiece is reduced, and the energy consumption can be reduced;
step B2: the speed of non-critical workpieces on a critical path is reduced, and the energy consumption is reduced so as to achieve the purpose of energy conservation;
step B3: if the total energy consumption and the total delay are both reduced, or either index is reduced, the new solution is accepted; otherwise the new solution is discarded and the original solution is retained.
2. A co-evolution algorithm based distributed blocking flow shop scheduling optimization system according to claim 1, wherein in step 1: taking the plant with the largest total delay as the critical plant, optimizing the total delay of the distributed blocking flow shop with energy consumption constraints by optimizing the total delay of the critical plant.
3. The distributed blocking flow shop scheduling optimization system based on the co-evolution algorithm according to claim 1, wherein in sub-step A1 of step 2, the regions are divided by the distances of the individuals in the population from the elite solution; three regions are formed: the region containing the 1/3 of individuals closest to the elite solution is defined as the enhancement region, the region containing the 1/3 of individuals farthest from it is defined as the attenuation region, and the remaining region at intermediate distances is defined as the stabilization region; the distance is defined in formulas 1-2:
d(X_i, X_best) = ‖X_i - X_best‖, i ∈ {1,2,⋯,NP}    formula 1
NP = NP_total / 3    formula 2
where X_best is the elite solution, i.e. the individual with the smallest fitness value, X_i is the i-th individual vector, NP is the number of individuals per region, and NP_total is the total population size.
4. The distributed blocking flow shop scheduling optimization system based on co-evolution algorithm according to claim 1, wherein in sub-step A2 of step 2, the sub-steps C1-C5 are specifically included:
step C1: considering that the evolution process is a continuous learning process, the discount factor γ emphasizes the contribution to the total reward at different iteration stages; therefore γ does not take a fixed value but is adaptively updated;
γ = 0.5 × (0.9 - 0.6 × G/G_max)    formula 3;
where G is the current generation number and G_max is the maximum number of iterations;
step C2: taking an approximation of the optimal state cost function as the learning target, iterative updating is performed according to formula 4;
Q_g ← Q_g + α × [R_(g+1) + γ × ΔQ_(g+1)]    formula 4;
where α is the learning rate, R_g is the reward of the g-th generation, and ΔQ_(g+1) is the increment of Q.
step C3: R_g is set according to the distance; the distance between an individual in each region and the optimal individual is given by formula 5;
L(X_(i,k), X_(best,k)) = ‖X_(i,k) - X_(best,k)‖, k ∈ {1,2,3}    formula 5;
where X_ (i, k) is the ith individual of the kth region and X_ (best, k) is the optimal individual of the kth region;
step C4: R_g is obtained by assigning reward or penalty values to individuals through comparison of these distances; a larger R_g indicates that the individual is closer to the optimal solution, and a smaller R_g indicates that it is farther from the optimal solution;
step C5: an epsilon-greedy strategy is introduced, an epsilon value is dynamically adjusted by simulated annealing, and the temperature is updated according to formulas 6-7;
T_0 = T_max    formula 6;
T_(g+1) = T_min + θ × (T_g - T_min)    formula 7;
where T_0 is the initialization temperature, taking the maximum value, T_min is the minimum temperature value, and θ ∈ [0,1] is the annealing factor; in the iterative process, the change of the Q-value table is considered, and the ε value is defined using the increment of the Q value and the dynamic temperature value;
ε = exp(-ΔQ_g / T_g)    formula 8;
where ΔQ_g is the increment of Q in the g-th generation, and T_g is the temperature value of the g-th generation.
5. The co-evolution algorithm-based distributed blocking flow shop scheduling optimization system according to claim 1, wherein: in sub-step A3 of step 2, the parameters of the three areas are updated according to different formulas, specifically comprising sub-steps D1-D4:
step D1: updating parameters in the mutation strategy of the enhancement region according to formulas 9-14;
F_(i,1) ∈ randn_i(μ_(F_1), 0.1)
Cr_(i,1) ∈ randn_i(μ_(Cr_1), 0.1)    formula 9;
F_(i+1,1)' = c × F_(i+1,1) + (1-c) × F_(i,1)
Cr_(i+1,1)' = c × Cr_(i+1,1) + (1-c) × Cr_(i,1)    formula 10;
c = 0.9 - 0.9 × 10^(-5) × G/G_max    formula 11;
μ_(F_1) = (1-c) × μ_(F_1) + c × mean_A(ω_i × F_(i,1))    formula 12;
μ_(Cr_1) = (1-c) × μ_(Cr_1) + c × mean_A(ω_i × Cr_(i,1))    formula 13;
ω_i = Δfit_i / Σ_k Δfit_k    formula 14;
where F_i and Cr_i are the mutation factor and the crossover factor, μ_F is the mean of F_i, μ_Cr is the mean of Cr_i, fit_i is the fitness value of the i-th individual, Δfit_i is the change in fitness value, and mean_A denotes the arithmetic mean; in this region F_i and Cr_i obey a Gaussian distribution; the change in fitness value is taken as the weight coefficient, and the mean is computed as an arithmetic average;
step D2: updating parameters in the mutation strategy of the stable region according to formulas 15-19;
F_(i,2) ∈ randc_i(μ_(F_2), 0.1)
Cr_(i,2) ∈ randn_i(μ_(Cr_2), 0.1)    formula 15;
F_(i+1,2)' = c × F_(i+1,2) + (1-c) × F_(i,2)
Cr_(i+1,2)' = c × Cr_(i+1,2) + (1-c) × Cr_(i,2)    formula 16;
μ_(F_2) = (1-c) × μ_(F_2) + c × mean_L(ω_i × F_(i,2))    formula 17;
μ_(Cr_2) = (1-c) × μ_(Cr_2) + c × mean_A(ω_i × Cr_(i,2))    formula 18;
mean_L(ω_i × F_(i,2)) = Σ(ω_i × F_(i,2)^2) / Σ(ω_i × F_(i,2))    formula 19;
where F_i obeys the Cauchy distribution and its mean uses the Lehmer mean mean_L(ω_i × F_(i,2)), while Cr_i obeys the Gaussian distribution and its mean is computed as an arithmetic average;
step D3: updating parameters in a mutation strategy of the attenuation region according to formulas 20-25;
F_(i,3) ∈ randc_i(μ_(F_3), 0.1)
Cr_(i,3) ∈ randc_i(μ_(Cr_3), 0.1)    formula 20;
F_(i+1,3)' = c × F_(i+1,3) + (1-c) × F_(i,3)
Cr_(i+1,3)' = c × Cr_(i+1,3) + (1-c) × Cr_(i,3)    formula 21;
μ_(F_3) = (1-c) × μ_(F_3) + c × mean_L(ω_i × F_(i,3))    formula 22;
μ_(Cr_3) = (1-c) × μ_(Cr_3) + c × mean_L(ω_i × Cr_(i,3))    formula 23;
mean_L(ω_i × F_(i,3)) = Σ(ω_i × F_(i,3)^2) / Σ(ω_i × F_(i,3))    formula 24;
mean_L(ω_i × Cr_(i,3)) = Σ(ω_i × Cr_(i,3)^2) / Σ(ω_i × Cr_(i,3))    formula 25;
where F_i and Cr_i obey the Cauchy distribution, and their means are computed with the Lehmer mean, which is biased toward larger values;
step D4: in each region, the average fitness φ of each generation of individuals and the average fitness φ' after mutation and crossover are calculated according to formulas 26-27;
φ = (Σ f(X_i)) / NP    formula 26;
φ' = (Σ f(U_i)) / NP    formula 27;
φ' < φ    formula 28;
if formula 28 is satisfied, the mutation of this generation is successful and parameter self-feedback is performed; otherwise the parameters are resampled according to formulas 9, 15 and 20; the self-feedback designs of the three regions are shown in formulas 10, 16 and 21.
6. The co-evolution algorithm-based distributed blocking flow shop scheduling optimization system according to claim 1, wherein: in the inferior solution improvement strategy of the sub-step A5 of the step 2, the method specifically comprises the following sub-steps E1-E6:
step E1: individuals with small Q values in the reinforcement learning mechanism and overly dense individuals within a region are improved and used in the evolution process of the population;
the density of an individual is measured by the average distance between the individual and the centroid of the region together with the average fitness value, where the centroid of a region is obtained according to formula 29;
m_c = (Σ X_i) / NP    formula 29;
step E2: the average distance of an individual to the centroid of the respective region is calculated according to equation 30;
d̄ = (Σ ‖X_i - m_c‖) / NP    formula 30;
step E3: the average fitness increment is defined according to equation 31;
Δf = (Σ (f(X_i) - f(m_c))) / NP    formula 31;
step E4: the density degree DD is calculated according to formula 32;
DD = Δf × (1 + λ/d̄)    formula 32;
wherein Δf represents the amount of change in the fitness value;
step E5: the value of lambda is calculated according to formula 33;
λ = (Σ ((U_d - L_d)/NP)^2)^(1/2)    formula 33;
where D denotes the dimension, L_d denotes the lower bound, and U_d denotes the upper bound; evidently, λ is adaptively adjusted with the search space;
step E6: individuals with small Q values and overly dense individuals in the population are defined as inferior solutions, denoted by I_i, and the repair strategy is formula 34:
I_i = I_i + ζ × (m_c - I_i), i ∈ {1,2,⋯,NP}    formula 34;
where I_i is the i-th repaired solution and ζ ∈ [0,1] controls the extent to which the inferior solution approaches the centroid.
7. The co-evolution algorithm-based distributed blocking flow shop scheduling optimization system according to claim 1, wherein: in the substep A6 of step 2, aiming at the phenomenon that the fitness value is not updated for several successive generations in the evolution process, the regeneration mode of the individual specifically comprises the substeps F1-F3:
step F1: generating a lower triangular matrix M by applying Cholesky decomposition to the covariance matrix C;
C = M × M^T    formula 35;
step F2: sampling from the standard normal distribution N(0,1) to obtain a 1×D matrix R;
step F3: generating a new individual according to formula 36;
X_new = μ + η(G) × M × R    formula 36;
where η(G) is a scaling factor with η(G) > 1, updated according to formulas 37-40;
η(G) = η_min + ((G_max - G)/G_max)^β × (η_max - η_min)    formula 37;
η_max = (U_d - L_d) × 0.01    formula 38;
η_min = η_max / 10    formula 39;
β = 1 - e^(-G/(0.15 × G_max))    formula 40;
where η_max and η_min denote the maximum and minimum allowed values, determined by the scale of the search space, L_d denotes the lower bound, U_d denotes the upper bound, G_max is the maximum number of iterations, and β is the correction parameter for nonlinearly adjusting η.
8. The co-evolution algorithm-based distributed blocking flow shop scheduling optimization system according to claim 1, wherein: in sub-step A7 of step 2, the local search is performed according to formulas 41-42;
X_i(g+1) = N(μ_i, δ_i)    formula 41
Where N (μ_i, δ_i) is a Gaussian distribution with mean μ_i and variance δ_i;
δ_i = τ × ‖X_best - X_i(g)‖    formula 42
Where τ is the scale factor.
CN202310231932.6A 2023-03-12 2023-03-12 Distributed blocking flow shop scheduling optimization system based on co-evolution algorithm Pending CN116245245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310231932.6A CN116245245A (en) 2023-03-12 2023-03-12 Distributed blocking flow shop scheduling optimization system based on co-evolution algorithm


Publications (1)

Publication Number Publication Date
CN116245245A true CN116245245A (en) 2023-06-09


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523433A (en) * 2023-07-03 2023-08-01 常州唯实智能物联创新中心有限公司 Four-way vehicle scheduling method and system based on bidirectional dynamic side weight
CN116523433B (en) * 2023-07-03 2023-09-01 常州唯实智能物联创新中心有限公司 Four-way vehicle scheduling method and system based on bidirectional dynamic side weight


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination