CN113935610A

CN113935610A - Multi-robot joint scheduling method of flexible manufacturing system

Info

Publication number: CN113935610A
Application number: CN202111175163.XA
Authority: CN
Inventors: 辛斌; 鲁赛; 王晴; 张佳; 陈杰; 贺英媚; 王丹敬
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-10-09
Filing date: 2021-10-09
Publication date: 2022-01-14

Abstract

The invention discloses a multi-robot joint scheduling method for a flexible manufacturing system. The method dynamically adjusts key parameters of a genetic algorithm and controls its iterative search process, thereby improving the solving efficiency of the algorithm and achieving efficient joint scheduling of the workshop. A three-layer encoding is adopted in which a machine sequence, a process sequence and an AGV sequence are matched. First, the algorithm is initialized, the fitness values of all individuals in the current population are calculated, and the best individual found so far is recorded. Next, the state data of the current-generation population are calculated and compared with the state data of the previous generation to obtain a return value R, the Q-Table is updated, and the current-generation state data are discretized. An action value a is then selected according to the discrete state value, the elements of the corresponding Q-Table row and the ε-greedy strategy, and the corresponding genetic algorithm parameter adjustment action is executed. The selection, crossover and mutation operators of the genetic algorithm are executed with the adjusted parameters to form a new population. When the population iteration reaches the maximum number of iterations, the optimal individual and its fitness value are output.

Description

Multi-robot joint scheduling method of flexible manufacturing system

Technical Field

The invention relates to the technical field of flexible manufacturing, in particular to a multi-robot joint scheduling method of a flexible manufacturing system.

Background

As user demands become more refined and manufacturing becomes more flexible, the manufacturing mode is gradually changing from product-driven to user-driven. Flexible Manufacturing Systems (FMS) can respond quickly to user-driven global manufacturing markets while increasing production efficiency, controlling production costs, shortening production cycles and improving production quality. With the development of production automation and information integration, modern industrial production places higher demands on logistics systems. An Automatic Guided Vehicle (AGV) is an automatic transport apparatus that can accomplish a designated transport task given a path and scenario layout information. The AGV transportation management system is an important component of the AGV system and is responsible for managing and scheduling the AGVs. Because processing and transportation are closely coupled, simultaneous scheduling of machines and AGVs in an FMS is of great significance for improving system flexibility and production efficiency. Compared with the traditional workshop scheduling problem, the joint scheduling problem adds a new class of scheduling entity; traditional workshop scheduling algorithms schedule only the processing machines in the workshop and are therefore not suited to it. A literature search found few studies on solving this problem, and none that solve it by combining reinforcement learning with a genetic algorithm.

Disclosure of Invention

In view of this, the invention provides a multi-robot joint scheduling method for a flexible manufacturing system, which can dynamically adjust key parameters of a genetic algorithm by a reinforcement learning method and control an iterative search process of the genetic algorithm, thereby improving the solving efficiency of the algorithm and realizing efficient joint scheduling of a workshop.

In order to achieve the purpose, the technical scheme of the invention comprises the following steps:

First, a three-layer encoding is adopted in which a machine sequence, a process sequence and an AGV sequence are matched: each gene in the process sequence represents the number of a workpiece to be processed, and the number of times that workpiece number has appeared from the left end of the process sequence up to the current position gives the index of the processing operation; each gene in the machine sequence represents the number of the machine that performs the corresponding operation; each gene in the AGV sequence represents the number of the AGV that conveys the corresponding workpiece to the corresponding machine;

Second, the processing operations are decoded strictly in the order in which they appear in the encoding, from front to back, subject to the operation-order constraints and the processing constraints;

Step 1: Algorithm initialization:

Step 1.1: Genetic algorithm initialization, comprising: establishing a random initial population and setting the number of iterations g_max, the selection proportion Ps, the crossover probability Pc and the mutation probability Pm;

Step 1.2: Q-Learning algorithm initialization, comprising: the Q-Table, the state set S, the action set A, the Q-Table update coefficient β and the action selection coefficient ε. The elements of the state set S are chosen as follows: state data are extracted from the genetic algorithm population, comprising the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c; the discretized state data are combined as sd = <sd_a, sd_v, sd_c>, where sd_a is the discretized population diversity value, sd_v is the discretized population convergence speed value and sd_c is the discretized population convergence trend value; the discretized state data are taken as the elements of the state set S;

The elements of the action set A are chosen as follows: increasing or decreasing the selection proportion Ps, the crossover probability Pc and the mutation probability Pm of the genetic algorithm. The Q-Table is established as an initial table with all elements equal to 0, with the state set S indexing the rows and the action set A indexing the columns;

Step 2: Calculate the fitness values of all individuals in the current population, and record the best individual found so far;

Step 3: Calculate the state data of the current-generation population from the fitness values, the state data comprising the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c; compare them with the state data of the previous-generation population to obtain the return value R; update the Q-Table; then discretize the state data of the current-generation population;

Step 4: Select an action value a according to the discrete state value sd = <sd_a, sd_v, sd_c>, the element values of the corresponding Q-Table row, and the ε-greedy strategy;

Step 5: Execute the corresponding genetic algorithm parameter adjustment action according to the selected action value a;

Step 6: Execute the selection operator, crossover operator and mutation operator of the genetic algorithm with the adjusted parameters to form a new population;

Step 7: Judge whether the population iteration has reached the maximum number of iterations; if so, output the optimal individual and its fitness value; otherwise, return to step 2.

Further, the calculation process of the population diversity sa, the population convergence speed sv and the population convergence trend sc in the step 1.2 includes the following steps:

Step 201: Calculate the average of the fitness values of all individuals in the k-th generation population as shown in formula (2), where Y(k) represents the average fitness value of the individuals in the k-th generation population and f(x_j^k) represents the fitness value of the j-th individual in the k-th generation population;

Step 202: Calculate the average of the fitness values of all individuals over 10 consecutive generations as shown in formula (3), where ave(i) represents the average fitness value of the individuals in the populations from generation 10i-9 to generation 10i.

Here i is the learning period of the Q-Learning algorithm, i ∈ [1, iter_max/N], and iter_max is the set maximum number of iterations; Q-Learning is performed once every 10 iterations of the population.

Step 203: Calculate the average population diversity over 10 consecutive generations as shown in formula (4), where div(i) represents the average population diversity from generation 10i-9 to generation 10i.

Step 204: Calculate the slope con(i) of a first-order (linear) fit to the average population fitness values of 10 consecutive generations, as shown in formula (5):

Step 205: Calculate the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c of the genetic algorithm as shown in formula (6):

where s_a(i) is the population diversity of the i-th learning period, s_v(i) is the population convergence speed of the i-th learning period, and s_c(i) is the population convergence trend of the i-th learning period.

Step 206: Calculate the discretized state set sd = <sd_a, sd_v, sd_c> of the population parameters of the genetic algorithm, as shown in formula (7) and formula (8);

sd_a(i) is the discretized value of the population diversity of the i-th learning period; sd_v(i) is the discretized value of the population convergence speed of the i-th learning period; sd_c(i) is the discretized value of the population convergence trend of the i-th learning period.

Further, the return value R in step 3 is calculated as shown in formula (10):

R(i) is the return value of the i-th learning period.

Further, in step 5, the corresponding genetic algorithm parameter adjustment action is executed according to the selected action value a, where Pa is a general name for the selection proportion Ps, the crossover probability Pc and the mutation probability Pm of the genetic algorithm, and all three parameters are adjusted using the following formula;

where i is a value

Further, in step 6, the selection operator, crossover operator and mutation operator of the genetic algorithm are executed with the adjusted parameters to form a new population, specifically as follows;

Selection operator: according to the selection proportion Ps and the elite retention strategy, N × Ps individuals are generated through selection, crossover and mutation; the remaining N × (1 - Ps) individuals are generated randomly; individuals are selected as parents for the crossover operator using the tournament algorithm;

Crossover operator: each time the crossover operator is executed, one of the following three crossover schemes is adopted at random;

Crossover scheme 1: only the chromosome position information of specific workpieces is exchanged between the two individuals; the <workpiece, machine, AGV> combinations are not changed;

Crossover scheme 2: the <workpiece, machine> pairs and the positions of the corresponding workpieces in the two parent chromosomes are crossed; the AGV gene positions are not changed;

Crossover scheme 3: the <workpiece, machine, AGV> triples and the positions of the corresponding workpieces in the two parent chromosomes are crossed.

Mutation operator: each time the mutation operator is executed, one of the following three mutation strategies is selected, according to the mutation probability, to apply a different mutation operation to each gene:

Mutation strategy 1: two adjacent <workpiece, machine, AGV> triples are exchanged;

Mutation strategy 2: another machine capable of executing the corresponding operation is randomly selected;

Mutation strategy 3: another AGV capable of executing the corresponding transfer task is randomly selected.

Advantageous Effects:

1. The invention provides a multi-robot joint scheduling method for a flexible manufacturing system that addresses the joint scheduling of processing machines and AGVs. Because this problem adds a new class of scheduling entity compared with the conventional flexible shop scheduling problem, conventional shop scheduling algorithms are not suited to it. The invention provides an evolutionary optimization method based on reinforcement learning and a genetic algorithm: an upper-layer Q-Learning algorithm dynamically adjusts the key parameters of the genetic algorithm and controls its iterative search process, while a lower-layer genetic algorithm performs the iterative evolutionary optimization. Compared with the traditional genetic algorithm, the solution quality is significantly improved.

2. For the multi-layer encoding scheme of the problem, the invention provides several crossover strategies and mutation strategies, which together allow the search to cover the whole encoded solution space, improve the solving efficiency of the algorithm, and achieve efficient joint scheduling of the workshop.

Drawings

FIG. 1 is a flowchart of a multi-robot joint scheduling method for a flexible manufacturing system according to the present invention;

fig. 2 is a schematic diagram of an encoding scheme in a multi-robot joint scheduling method of a flexible manufacturing system according to the present invention.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The flexible workshop joint scheduling problem addressed by the present disclosure can be described as follows. The workshop contains several processing machines with different capabilities, forming the set M = {M_1, ..., M_m, ..., M_{N_M}}, that is, the 1st to N_M-th processing machines. There are multiple homogeneous transfer AGVs, forming the set R = {R_1, ..., R_r, ..., R_{N_R}}, that is, the 1st to N_R-th transfer AGVs. The set of workpieces to be processed is J = {J_1, ..., J_j, ..., J_{N_J}}. N_M, N_R and N_J denote the total numbers of processing machines, transfer AGVs and workpieces to be processed, respectively. Different workpieces J_j have different processing operations O_ji, which have different processing times P_jim on different processing machines M_m. The joint scheduling problem is to find an efficient processing and transfer flow for every processing operation of all workpieces to be processed such that the maximum completion time over all workpieces, that is, the makespan, is minimized. The problem is subject to the following assumptions and constraints: 1) the storage capacity of the loading/unloading buffer of each processing machine is unlimited; 2) each machine can process only one workpiece at a time; 3) a processing operation cannot be interrupted once it has started; 4) all AGVs are homogeneous and perform transfer tasks at the same speed; 5) the energy consumption of the AGVs is not considered; 6) after an AGV completes a transfer task, it moves to the next task point and waits for the transfer task.

The multi-robot joint scheduling method of the flexible manufacturing system combines an upper-layer Q-Learning algorithm, which dynamically adjusts the genetic algorithm parameters, with a lower-layer genetic algorithm, which performs the iterative evolutionary optimization. Compared with the traditional genetic algorithm, the solution quality is significantly improved. The flow of the method is shown in FIG. 1 and comprises the following steps:

First, a three-layer encoding is adopted in which a machine sequence, a process sequence and an AGV sequence are matched; the encoding scheme is shown in FIG. 2. Each gene in the process sequence represents the number of a workpiece to be processed, and the number of times that workpiece number has appeared from the left end of the process sequence up to the current position gives the index of the processing operation; each gene in the machine sequence represents the number of the machine that performs the corresponding operation; each gene in the AGV sequence represents the number of the AGV that transports the corresponding workpiece to the corresponding machine.

Second, the processing operations are decoded strictly in the order in which they appear in the encoding, from front to back, subject to the operation-order constraints and the processing constraints.
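The counting rule for operation indices can be illustrated with a short Python sketch. It only expands the three gene layers into (workpiece, operation index, machine, AGV) tuples in order of appearance; it does not compute start times or check machine capability. The function name and the example gene values are hypothetical and are not taken from FIG. 2.

```python
from collections import defaultdict


def decode_operations(process_seq, machine_seq, agv_seq):
    """Expand three aligned gene layers into (job, op_index, machine, agv) tuples.

    The op_index of a gene is the number of times its job number has
    appeared in process_seq from the left end up to that position.
    """
    if not (len(process_seq) == len(machine_seq) == len(agv_seq)):
        raise ValueError("the three layers must have the same length")

    seen = defaultdict(int)  # occurrences of each job number so far
    operations = []
    for job, machine, agv in zip(process_seq, machine_seq, agv_seq):
        seen[job] += 1
        operations.append((job, seen[job], machine, agv))
    return operations


# Hypothetical chromosome: 3 workpieces, 5 operations in total.
example = decode_operations(
    process_seq=[1, 2, 1, 3, 2],
    machine_seq=[2, 1, 3, 4, 2],
    agv_seq=[1, 2, 1, 1, 2],
)
```

In this example the third gene is the second occurrence of workpiece 1, so it denotes operation 2 of workpiece 1, processed on machine 3 and delivered by AGV 1.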

Step 1: Algorithm initialization:

Step 1.1: Genetic algorithm initialization, comprising: setting the population size N, establishing a random initial population, and setting the number of iterations g_max, the selection proportion Ps = 0.8, the crossover probability Pc = 0.6, the mutation probability Pm = 0.05, and the like;

Step 1.2: Q-Learning algorithm initialization, comprising: the Q-Table, the state set S, the action set A, α, β, ε, and the like. The elements of the state set S are chosen as follows: key information such as the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c is extracted from the genetic algorithm population and discretized to form the combination sd = <sd_a, sd_v, sd_c>, which is used as an element of the state set. The elements of the action set A are chosen as follows: increasing or decreasing the selection proportion Ps, the crossover probability Pc and the mutation probability Pm of the genetic algorithm. The Q-Table is established as an initial table with all elements equal to 0, with the state set S indexing the rows and the action set A indexing the columns;

TABLE 1 Q-Table initialization example

State	Ps+	Ps-	Pc+	Pc-	Pm+	Pm-
s0	0	0	0	0	0	0
s1	0	0	0	0	0	0
s2	0	0	0	0	0	0
…	0	0	0	0	0	0
s14	0	0	0	0	0	0
s15	0	0	0	0	0	0
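A zero-initialized table of this shape can be sketched in Python as a list of rows, one per state, with one column per action. The 16 states and 6 actions are read off Table 1; the variable and function names are illustrative assumptions.

```python
# Column labels as in Table 1: increase (+) or decrease (-) each GA parameter.
ACTIONS = ["Ps+", "Ps-", "Pc+", "Pc-", "Pm+", "Pm-"]
NUM_STATES = 16  # rows s0 .. s15 in Table 1


def init_q_table(num_states=NUM_STATES, actions=ACTIONS):
    """Return a num_states x len(actions) table filled with zeros."""
    # A fresh inner list per row, so that rows do not share storage.
    return [[0.0] * len(actions) for _ in range(num_states)]


q_table = init_q_table()
```

Building each row with its own list avoids the aliasing that `[[0.0] * 6] * 16` would introduce, where updating one state would silently update all of them.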

Step 3: Calculate the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c from the fitness values; compare these data with the state data of the previous-generation population to obtain the return value R; update the Q-Table; then discretize the state data;

Step 3.1: Calculate the average of the fitness values of all individuals in the k-th generation population as shown in the following formula, where Y(k) represents the average fitness value of the individuals in the k-th generation population and f(x_j^k) represents the fitness value of the j-th individual in the k-th generation population.

Step 3.2: Calculate the average of the fitness values of all individuals over 10 consecutive generations as shown below, where ave(i) represents the average fitness value of the individuals in the populations from generation 10i-9 to generation 10i.

Step 3.3: Calculate the average population diversity over 10 consecutive generations as shown below, where div(i) represents the average population diversity from generation 10i-9 to generation 10i.

Step 3.4: Calculate the slope of a first-order (linear) fit to the average population fitness values of 10 consecutive generations as shown below.
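The slope of a first-order fit can be computed with the ordinary least-squares formula. The Python sketch below assumes the generation numbers 1 to 10 as the x values and the per-generation average fitness values Y(k) as the y values; it illustrates the general technique and is not a reproduction of the patent's own formula. The function name is hypothetical.

```python
def linear_fit_slope(values):
    """Least-squares slope of values[0..n-1] against x = 1..n."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed for a linear fit")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical average fitness values that drop by 2 per generation.
window = [100 - 2 * k for k in range(10)]
slope = linear_fit_slope(window)
```

For a minimization objective such as makespan, a negative slope indicates that the average fitness is still improving over the window, while a slope near zero indicates stagnation.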

Step 3.5: Calculate the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c of the genetic algorithm as shown in the following formula.

Step 3.6: Calculate the discretized state set <sd_a, sd_v, sd_c> of the population parameters of the genetic algorithm as shown in the following formula.

Step 3.7: The return value R is calculated as follows.

Step 3.8: The Q-Table update formula is shown below, where Q is the Q-Table; s_{i+1} represents the state value of the next-generation learning process; s_i represents the state value of the current-generation learning process; a_{i+1} represents the action value obtained from the Q-Table in the next generation; a_i represents the action value obtained from the Q-Table in the current generation;

Q(s_i, a_i) ← (1-α)·Q(s_i, a_i) + α·(R_{t+1} + γ·Q(s_{i+1}, a_{i+1})) (10)
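Formula (10) translates directly into a single table update. The Python sketch below assumes the Q-Table is stored as a list of rows indexed by state and action numbers; the function name and the default values of α and γ are illustrative assumptions, since the text does not state them.

```python
def update_q(q_table, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(reward + gamma*Q(s_next,a_next)).

    alpha and gamma defaults are assumed values for illustration only.
    Returns the updated entry.
    """
    target = reward + gamma * q_table[s_next][a_next]
    q_table[s][a] = (1 - alpha) * q_table[s][a] + alpha * target
    return q_table[s][a]


# Hypothetical usage on a 16 x 6 table.
q = [[0.0] * 6 for _ in range(16)]
q[3][2] = 1.0
new_value = update_q(q, s=0, a=1, reward=2.0, s_next=3, a_next=2,
                     alpha=0.5, gamma=0.5)
```

With these numbers the target is 2.0 + 0.5 × 1.0 = 2.5, so the entry moves from 0 to 0.5 × 0 + 0.5 × 2.5 = 1.25.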

Step 4: Select an action value a according to the discrete state value sd = <sd_a, sd_v, sd_c>, the element values of the corresponding Q-Table row, and the ε-greedy strategy;

Step 4.2: The ε-greedy action selection strategy is as follows:
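The selection formula itself is not reproduced in this text. As an illustration only, the widely used form of ε-greedy selection can be sketched in Python: with probability ε a random action is explored, and otherwise the action with the largest Q-value in the current state's row is exploited. The function name, the random tie-breaking rule and the `rng` parameter are assumptions and are not taken from the patent.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one Q-Table row.

    With probability epsilon: a uniformly random action (exploration).
    Otherwise: an action with the maximum Q-value (exploitation),
    with ties broken at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    candidates = [i for i, q in enumerate(q_row) if q == best]
    return rng.choice(candidates)


# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy([0.0, 1.0, 5.0, 2.0, 0.0, 0.0], epsilon=0.0)
```

Breaking ties at random matters at the start of the run, when every entry of the row is still 0 and a fixed tie-break would always pick the first action.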

Step 5: Execute the corresponding genetic algorithm parameter adjustment action according to the selected action value a. The parameter adjustment strategy of the genetic algorithm is shown in the following formula, where Pa is a general name for the selection proportion Ps, the crossover probability Pc and the mutation probability Pm of the genetic algorithm; all three parameters are adjusted using this formula.

Step 6: Execute the selection operator, crossover operator and mutation operator of the genetic algorithm with the adjusted parameters to form a new population. The strategies of the selection, crossover and mutation operators in the invention are as follows:

Selection operator: according to the selection proportion Ps and the elite retention strategy, N × Ps individuals are generated through selection, crossover and mutation; the remaining N × (1 - Ps) individuals are generated randomly; individuals are selected as parents for the crossover operator using the tournament algorithm.
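Tournament selection of a parent can be sketched in Python as follows. The sketch assumes that fitness is the makespan, so a smaller value wins the tournament; the tournament size `k`, the function name and the `rng` parameter are illustrative assumptions, since the text does not specify them.

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Return one parent: the best of k individuals drawn without replacement.

    fitness[i] is the fitness of population[i]; lower is better here
    because the objective (makespan) is minimized.
    """
    contenders = rng.sample(range(len(population)), k)
    winner = min(contenders, key=lambda idx: fitness[idx])
    return population[winner]


# Hypothetical population of three labelled individuals.
pop = ["a", "b", "c"]
fit = [30.0, 10.0, 20.0]
parent = tournament_select(pop, fit, k=3)  # k = population size: best always wins
```

A small `k` keeps selection pressure low and preserves diversity, while a larger `k` favours the current best individuals more strongly.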

Crossover operator: because a three-layer encoding scheme is used, the invention designs 3 crossover schemes, and one of them is adopted at random each time the crossover operator is executed.

Crossover scheme 1: only the chromosome position information of specific workpieces is exchanged between the two individuals; the <workpiece, machine, AGV> combinations are not changed;

Crossover scheme 2: the <workpiece, machine> pairs and the positions of the corresponding workpieces in the two parent chromosomes are crossed; the AGV gene positions are not changed;

Crossover scheme 3: the <workpiece, machine, AGV> triples and the positions of the corresponding workpieces in the two parent chromosomes are crossed.

Mutation operator: because a three-layer encoding scheme is used and each layer has a different practical meaning, 3 mutation strategies are designed, and each time the mutation operator is executed a different mutation operation is applied to each gene according to the mutation probability.

Mutation strategy 1: two adjacent <workpiece, machine, AGV> triples are exchanged;

Mutation strategy 2: another machine capable of executing the corresponding operation is randomly selected;

Mutation strategy 3: another AGV capable of executing the corresponding transfer task is randomly selected;
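The three mutation strategies can be sketched in Python on a chromosome stored as a list of (workpiece, machine, AGV) triples. Reading strategy 1 as a swap of neighbouring triples is an interpretation of the text. All function names, the `capable` and `agvs` arguments and the example values are hypothetical, and the sketch does not re-check machine capability after a swap of two genes that belong to the same workpiece, which a full implementation would need to do.

```python
import random


def swap_adjacent(chrom, pos):
    """Strategy 1: exchange the triples at positions pos and pos + 1."""
    out = list(chrom)
    out[pos], out[pos + 1] = out[pos + 1], out[pos]
    return out


def reselect_machine(chrom, pos, capable, rng=random):
    """Strategy 2: pick a different machine from those able to run the operation."""
    job, machine, agv = chrom[pos]
    others = [m for m in capable if m != machine]
    out = list(chrom)
    if others:  # leave the gene unchanged if there is no alternative
        out[pos] = (job, rng.choice(others), agv)
    return out


def reselect_agv(chrom, pos, agvs, rng=random):
    """Strategy 3: pick a different AGV for the transfer task."""
    job, machine, agv = chrom[pos]
    others = [r for r in agvs if r != agv]
    out = list(chrom)
    if others:
        out[pos] = (job, machine, rng.choice(others))
    return out


# Hypothetical chromosome of (workpiece, machine, AGV) triples.
chrom = [(1, 2, 1), (2, 1, 2), (1, 3, 1)]
swapped = swap_adjacent(chrom, 0)
remachined = reselect_machine(chrom, 1, capable=[1, 4])
reassigned = reselect_agv(chrom, 0, agvs=[1, 2])
```

Each function returns a new list and leaves its input untouched, which makes it straightforward to keep the parent when a mutated child turns out to be infeasible.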

Example: taking EX11 as an example, the data of the problem instance are shown in the following tables; the instance has 5 workpieces to be processed, 4 processing machines and 2 transfer AGVs:

TABLE 2 transfer time of workshop nodes

TABLE 3 workpiece Process information

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multi-robot joint scheduling method of a flexible manufacturing system is characterized by comprising the following steps:

Step 1: Algorithm initialization:

Step 1.2: Q-Learning algorithm initialization, comprising: the Q-Table, the state set S, the action set A, the Q-Table update coefficient β and the action selection coefficient ε; the elements of the state set S are chosen as follows: state data are extracted from the genetic algorithm population, comprising the population diversity s_a, the population convergence speed s_v and the population convergence trend s_c; the discretized state data are combined as sd = <sd_a, sd_v, sd_c>, where sd_a is the discretized population diversity value, sd_v is the discretized population convergence speed value and sd_c is the discretized population convergence trend value; the discretized state data are taken as the elements of the state set S;

2. The method according to claim 1, wherein the calculation process of the population diversity sa, the population convergence rate sv and the population convergence trend sc in step 1.2 comprises the following steps:

Step 202: Calculate the average of the fitness values of all individuals over 10 consecutive generations as shown in formula (3), where ave(i) represents the average fitness value of the individuals in the populations from generation 10i-9 to generation 10i;

where i is the learning period of the Q-Learning algorithm, i ∈ [1, iter_max/N], and iter_max is the set maximum number of iterations; Q-Learning is performed once every 10 iterations of the population;

Step 203: Calculate the average population diversity over 10 consecutive generations as shown in formula (4), where div(i) represents the average population diversity from generation 10i-9 to generation 10i;

where s_a(i) is the population diversity of the i-th learning period, s_v(i) is the population convergence speed of the i-th learning period, and s_c(i) is the population convergence trend of the i-th learning period;

3. The method of claim 1, wherein the return value R in step 3 is calculated according to formula (10):

R(i) is the return value of the i-th learning period.

4. The method according to claim 1, wherein in step 5, the corresponding genetic algorithm parameter adjustment action is executed according to the selected action value a, where Pa is a general name for the selection proportion Ps, the crossover probability Pc and the mutation probability Pm of the genetic algorithm, and all three parameters are adjusted using the following formula;

here i takes the value.

5. The method according to claim 1, wherein step 6, selecting operator, crossover operator and mutation operator of genetic algorithm are executed according to the adjusted parameters to form new population, and the specific steps include the following;