CN116500986A - Method and system for generating priority scheduling rule of distributed job shop - Google Patents


Info

Publication number: CN116500986A
Application number: CN202310439782.8A
Authority: CN (China)
Prior art keywords: time, scheduling rule, graph, factory, distributed job
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李新宇, 黄江平, 高亮
Current Assignee: Huazhong University of Science and Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202310439782.8A

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00: Programme-control systems
    • G05B 19/02: Programme-control systems electric
    • G05B 19/418: Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B 19/41865: Total factory control characterised by job scheduling, process planning, material flow
    • G05B 2219/00: Program-control systems
    • G05B 2219/30: Nc systems
    • G05B 2219/32: Operator till task planning
    • G05B 2219/32252: Scheduling production, machining, job shop
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of shop scheduling, and particularly discloses a method and a system for generating a priority scheduling rule for a distributed job shop. The method comprises the following steps: constructing a scheduling rule generation model for making decisions on the distributed job shop scheduling problem, wherein the problem is represented as a disjunctive graph: each factory corresponds to one sub-disjunctive graph, and the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents both the factory allocation and the sequencing of processes within each factory, with each node of the disjunctive graph carrying its allocated factory information; solving the disjunctive graph through a Markov decision model, in which features of the disjunctive graph are extracted by a graph neural network and action decisions are made by an actor network; and training the scheduling rule generation model on a pre-acquired data set, updating the parameters of the graph neural network and the actor network to obtain a trained scheduling rule generation model. The invention realizes priority scheduling rule generation for distributed job shops, with good performance and generalization.

Description

Method and system for generating priority scheduling rule of distributed job shop
Technical Field
The invention belongs to the field of shop scheduling, and particularly relates to a method and a system for generating a priority scheduling rule for a distributed job shop.
Background
Production scheduling is an important link in a manufacturing system and directly affects the profit and competitiveness of an enterprise. Distributed manufacturing has become one of the important development directions of the manufacturing industry: it offers high flexibility, rapid response, and high reliability, can cope with urgent production demands, supports customized, low-cost, and small-batch production, and reduces the dependence of production on the environment. The distributed job shop scheduling problem (Distributed Job Shop Scheduling Problem, DJSP) is a typical representative in equipment manufacturing: each plant is treated as one job shop, and the process routes of different workpieces may differ. It mainly contains 2 sub-problems, the shop allocation of workpieces and the scheduling of processes on the machines in each shop, to meet different production requirements, as shown in fig. 1.
Priority dispatch rules (Priority Dispatch Rule, PDR) are a classical heuristic method that has been widely used in practical production. A PDR is intuitive, fast, and easy to understand, and has been applied to various scheduling problems. The advantages of PDRs are even more evident for complex production scenarios lacking prior knowledge. A good PDR is built on rich domain knowledge and is continuously refined through trial and error. In addition, the performance of a PDR is greatly affected by the scale of the problem. Therefore, designing a general priority scheduling rule generation method with self-learning and self-evolution capability is very important for solving complex and variable production scheduling problems.
Deep reinforcement learning (Deep Reinforcement Learning, DRL) combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and is an artificial intelligence method closer to the human way of thinking. DRL is autonomous, can learn optimal action selection, and reacts to the environment in real time. In addition, it has strong generalization capability and high solving speed, so it is worthwhile to explore its application in the field of shop scheduling.
Graph theory is widely applied in the field of shop scheduling; the disjunctive-graph representation of a solution to a scheduling problem clearly illustrates the constraint relationships among the processes of the same workpiece. In the field of deep learning, the graph neural network (Graph Neural Network, GNN) is a network structure that operates directly on graph data, and has been successfully applied to problems such as decentralized wireless resource allocation, remaining-useful-life prediction of industrial equipment, and voltage stability control in power systems.
Disclosure of Invention
Aiming at the defects or improvement demands of the prior art, the invention provides a method and a system for generating a priority scheduling rule for a distributed job shop, and aims to provide a priority scheduling rule generation method with self-learning and self-evolution capability and strong universality, so as to realize priority scheduling rule generation for distributed job shops.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for generating a priority scheduling rule for a distributed job shop, including the steps of:
constructing a scheduling rule generation model for deciding a distributed job shop scheduling problem, wherein:
representing the distributed job shop scheduling problem as a disjunctive graph: each factory corresponds to one sub-disjunctive graph, and the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents the factory allocation and the sequencing of processes within each factory, with each node of the disjunctive graph containing its allocated factory information;
solving the disjunctive graph through a Markov decision model: the Markov decision model updates the disjunctive graph through multiple decisions, gradually completing all nodes in the disjunctive graph to obtain a final solution; in the decision process, features of the disjunctive graph are extracted by a graph neural network, and action decisions are made by an actor network;
training the constructed scheduling rule generation model on a pre-acquired data set, iteratively updating the parameters of the graph neural network and the actor network to obtain a trained scheduling rule generation model;
and generating scheduling rules for the distributed job shop through the trained scheduling rule generation model.
As a further preference, in the scheduling rule generation model, the factory allocation of each workpiece is determined first, and the disjunctive graph is then solved through the Markov decision model; the factory allocation method for the workpieces is:
calculating the total processing time required for each workpiece to complete all of its processes, and sorting the workpieces in ascending order of total processing time; then placing the first f workpieces into the f factories in sequence, where f is the total number of factories;
for the remaining workpieces, calculating the total processing time of all workpieces already in each factory, and allocating the foremost workpiece in the current order to the factory with the minimum total processing time; this process is repeated until the factory allocation of all workpieces is completed.
As a further preference, the Markov decision model includes state features at each moment in the decision process;
at any decision point t, the nodes of the disjunctive graph contain 5 features, and these 5 features form the state feature. The 5 features are:
1) The processing time p_ji of the process O_ji corresponding to node v;
2) A binary variable b(v, s_t): when the process O_ji corresponding to node v has been scheduled at time t, b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) The factory fac(v, s_t) of the process O_ji corresponding to node v: if a workpiece is assigned to factory k, the factory feature of all processes of that workpiece at time t is fac(v, s_t) = k;
4) The lower bound c_LB(O_ji, s_t) on the completion time of the process O_ji corresponding to node v:
when O_ji is the first process of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of O_ji;
when O_ji is not the first process of workpiece J_j, two cases are distinguished: if O_ji has finished processing at time t, the estimated lower bound equals the actual completion time; otherwise the lower bound is computed as c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where c_LB(O_j,i-1, s_t) is the estimated completion-time lower bound of the process preceding O_ji and p_ji is the processing time of O_ji;
5) The earliest release time est(O_ji, s_t) of the process O_ji corresponding to node v:
in the initial state s_0: when O_ji is the first process of workpiece J_j, its earliest release time is 0; when O_ji is not the first process of J_j, the earliest release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where est(O_j,i-1, s_0) is the earliest release time of the preceding process O_j,i-1 and p_j,i-1 is its processing time;
in a state s_t with t ≠ 0: when O_ji has finished processing, its earliest release time is 0; when O_ji has not been processed and the preceding process O_j,i-1 has finished, the earliest release time of O_ji equals the completion time of O_j,i-1; when O_ji has not been processed and O_j,i-1 has not finished, the earliest release time is the earliest release time of O_j,i-1 plus its processing time p_j,i-1.
As a further preference, the Markov decision model also includes actions, state transitions, and rewards at each moment in the decision process, specifically:
Actions: the action space A_t at decision point t is the set formed by the next process of every unfinished workpiece;
State transition: the process selected as the action is inserted at a feasible position such that its earliest start processing time is minimized;
Reward: at decision point t, if action a_t corresponds to a process in factory k, the current reward is R(s_t, a_t) = C_k(s_t) - C_k(s_t+1), where R(s_t, a_t) is the reward value obtained by selecting action a_t in state s_t, C_k(s_t) is the maximum completion time of factory k at time t, and C_k(s_t+1) is the maximum completion time of factory k at time t+1 after taking action a_t.
As a further preference, the way the 5 features are combined into a state feature is:
each feature is an [n, m] matrix and is reshaped into a one-dimensional [n×m, 1] matrix, where n is the total number of workpieces and m is the number of processes per workpiece; the one-dimensional matrices corresponding to the 5 features are concatenated into an [n×m, 5] matrix, which is then normalized to obtain the state feature.
As a further preference, in the decision process, features of the disjunctive graph are extracted by the graph neural network and input to the actor network; the actor network scores each action, the probability of each action being selected is computed from the scores with a softmax function, and the action with the highest selection probability is output.
As a further preference, the scheduling rule generation model is trained with an Adam optimizer.
As a further preference, when training the scheduling rule generation model, the network loss function is computed with an asynchronous advantage actor-critic network, the learning rate is dynamically adjusted by the Adam optimizer, and the parameters of the graph neural network and the actor network are updated.
According to a second aspect of the present invention, there is provided a priority scheduling rule generating system of a distributed job shop, comprising a processor for executing the above-mentioned priority scheduling rule generating method of the distributed job shop.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described priority scheduling rule generation method of a distributed job shop.
In general, compared with the prior art, the above technical solution conceived by the present invention mainly has the following technical advantages:
1. According to the characteristics of the distributed job shop, the invention designs a disjunctive-graph representation method and further establishes the relationship between the disjunctive-graph representation of the distributed job shop scheduling problem and the graph neural network, thereby constructing a scheduling rule generation model that makes real-time scheduling decisions by observing the current scheduling environment and providing a new perspective for solving scheduling problems; at the same time, the time needed to solve the problem is shortened and the solution quality is improved, so the production efficiency of enterprises can be remarkably improved.
2. Compared with existing priority scheduling rule methods, the invention has strong self-learning and self-evolution capability, higher solving efficiency, and stronger optimization ability; compared with existing meta-heuristic algorithms, the method is simple, easy to understand, has strong generalization and stability, and is applicable to different scheduling environments.
3. The invention designs factory allocation rules so that the factory allocation of workpieces is performed in advance, and then, according to the characteristics of the distributed problem, designs matching state features, reward mechanisms, and other elements of the Markov decision model, thereby realizing efficient and accurate generation of priority scheduling rules for the distributed job shop.
Drawings
FIG. 1 is a schematic view of a factory distribution in a distributed job shop according to an embodiment of the present invention;
fig. 2 (a) and (b) are disjunctive graphs of a solution of a distributed job shop according to an embodiment of the present invention;
FIGS. 3 (a) - (c) are schematic diagrams of the action space in the Markov decision model according to an embodiment of the present invention;
fig. 4 (a) and (b) are schematic views of state transitions in a markov decision model according to an embodiment of the present invention;
fig. 5 (a) - (c) are graphs of the comparison results on the TA dataset with 2 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 6 (a) - (c) are graphs of the comparison results on the TA dataset with 3 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 7 (a) - (c) are graphs of the comparison results on the TA dataset with 4 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 8 is a flowchart of a method for generating a priority scheduling rule in a distributed job shop according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The method for generating the priority scheduling rule of the distributed job shop provided by the embodiment of the invention, as shown in fig. 8, comprises the following steps:
s1, constructing a scheduling rule generation model for deciding a scheduling problem of a distributed job shop; in the scheduling rule generation model, the scheduling problem of the distributed job shop is expressed as a disjunctive graph, and the disjunctive graph is solved through a Markov decision model; the method specifically comprises the following steps:
(1) Solution representation method
In the shop scheduling problem, the disjunctive graph is a classical solution representation. A disjunctive graph is a directed graph G = (V, C ∪ D), where V is the node set; V contains the process nodes of the workpieces and 2 dummy nodes {S, T} with processing time 0, and all processing tasks start from node S and end at node T. C is the set of conjunctive arcs, representing the precedence constraints among the processes of the same workpiece. D is the set of disjunctive arcs, representing the processing order of the processes on the same machine.
Aiming at the DJSP, the invention provides a spliced disjunctive-graph representation in which each node contains basic node information, such as the processing time and earliest start time of its corresponding process, as well as additional information, namely the factory in which the process is performed. The factory allocation information in the disjunctive graph reflects the factory allocation scheme of the DJSP.
For example, for an instance with 2 factories and 6 workpieces, the workpiece processing information is shown in Table 1. The disjunctive graph of the initial information is shown in fig. 2 (a), and fig. 2 (b) shows one feasible solution, with workpieces 1, 2, and 3 processed in factory 1 (first three rows) and workpieces 4, 5, and 6 processed in factory 2 (second three rows). The processing order of the processes on the same machine is marked by dashed arrows of the same color; for example, processes O_11, O_22, and O_31 are processed in sequence on machine M_2, while the process precedence within the same workpiece is marked by black arrows.
Table 1: processing time and processing machine information of the processes
(2) Factory allocation rules
Factory allocation rules are used to solve the first sub-problem of the DJSP, the shop allocation of workpieces. First, all workpieces are sorted in ascending order of their total processing time, and the first f workpieces (f being the total number of factories) are placed into the f factories in sequence; for each remaining workpiece, the total processing time of all workpieces already in each factory is calculated, and the foremost workpiece in the current order is allocated to the factory with the minimum total processing time; this process is repeated until the factory allocation of all workpieces is completed.
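The allocation rule above can be sketched in Python. This is an illustrative sketch only; the data layout (a list of per-workpiece operation-time lists) is an assumption, not the patent's implementation.

```python
def assign_factories(processing_times, f):
    """Assign each workpiece to one of f factories, balancing total workload.

    processing_times[j] is the list of processing times of workpiece j's
    operations (assumed layout). Returns a dict: workpiece index -> factory.
    """
    totals = [sum(ops) for ops in processing_times]
    # Sort workpiece indices in ascending order of total processing time.
    order = sorted(range(len(totals)), key=lambda j: totals[j])
    factory_of = {}
    load = [0.0] * f
    # The first f workpieces seed the f factories, one each.
    for k, j in enumerate(order[:f]):
        factory_of[j] = k
        load[k] += totals[j]
    # Each remaining workpiece goes to the currently least-loaded factory.
    for j in order[f:]:
        k = min(range(f), key=lambda q: load[q])
        factory_of[j] = k
        load[k] += totals[j]
    return factory_of
```

With 4 workpieces of total times 1, 2, 3, 4 and f = 2, workpieces 1 and 2 seed the factories and the rest alternate to the lighter-loaded one.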
(3) Markov decision model
The DJSP is a sequential decision problem and can be formulated as a Markov decision process. A Markov decision process can be represented by a five-tuple (S, A, P, γ, R): S is the set of states, A the set of actions, P the dynamics model, γ a discount factor between 0 and 1, and R the reward function.
At any decision point t, the agent observes the current environment state s_t ∈ S and selects an action a_t ∈ A according to a given policy π(S → A); the agent then enters a new state s_t+1 with probability p(s_t+1 | s_t, a_t) and obtains a real-time reward r_t ∈ R.
The invention represents the solution of the DJSP as a disjunctive graph, and the Markov model is established as follows:
State: the disjunctive graph G(t) = (V, C ∪ D_c(t) ∪ D_u(t)) at any decision point t reflects the state of the current solution. D_c(t) contains the disjunctive arcs that have been given a direction, and D_u(t) contains the disjunctive arcs still without direction, with D_c(t) ∪ D_u(t) = D. In the initial state of the DJSP, D_c(t) = ∅; when the DJSP has been fully scheduled, D_u(t) = ∅. The set V includes all processes.
Any node v in the set V contains the following 5 features:
1) The processing time p_ji of the process O_ji corresponding to node v;
2) A binary variable b(v, s_t): when the process O_ji corresponding to node v has been scheduled at time t, b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) The factory fac(v, s_t) of the process O_ji corresponding to node v: if workpiece J_j is allocated to factory k, the factory feature of all processes of the workpiece at time t is fac(v, s_t) = k, k ∈ [1, f];
4) The lower bound c_LB(v, s_t) on the completion time of the process O_ji corresponding to node v:
when i = 1, i.e. O_ji is the first process of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of O_ji;
when i ≠ 1, i.e. O_ji is not the first process of J_j, two cases are distinguished: if O_ji has finished processing at time t, the estimated lower bound of the completion time equals the actual completion time; otherwise the lower bound is computed by the formula c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where O_j,i-1 is the process preceding O_ji and p_ji is the processing time of O_ji;
5) The earliest release time est(O_ji, s_t) of the process O_ji corresponding to node v:
in the initial state s_0: when i = 1, i.e. O_ji is the first process of workpiece J_j, the earliest release time is 0, i.e. est(O_j1, s_0) = 0; when i ≠ 1, i.e. O_ji is not the first process of J_j, the initial release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where O_j,i-1 is the process preceding O_ji and p_j,i-1 is its processing time;
in a state s_t (t ≠ 0): if O_ji has finished processing, the earliest release time is set to 0, i.e. est(O_ji, s_t) = 0; when O_ji has not been processed and the preceding process O_j,i-1 has finished, the earliest release time of O_ji equals the completion time of O_j,i-1, i.e. est(O_ji, s_t) = c_LB(O_j,i-1, s_t); otherwise, if the preceding process O_j,i-1 has not finished, the earliest release time is the earliest release time of O_j,i-1 plus its processing time p_j,i-1, i.e. est(O_ji, s_t) = est(O_j,i-1, s_t) + p_j,i-1.
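The two recursive quantities c_LB and est can be written down directly from the case analysis above. This is an illustration only; the container names (p, done, finish, release) are hypothetical, and workpieces are assumed to be released at time 0.

```python
def completion_lower_bound(j, i, p, done, finish, release):
    """Lower bound c_LB on the completion time of operation i of workpiece j.

    p[j][i]: processing time of O_ji; done[j][i]: scheduled flag;
    finish[j][i]: actual completion time if scheduled (hypothetical layout).
    """
    if done[j][i]:
        return finish[j][i]            # scheduled: bound = actual completion
    if i == 0:
        return release[j] + p[j][0]    # first operation: release time + p_j1
    # Otherwise: predecessor's bound plus own processing time.
    return completion_lower_bound(j, i - 1, p, done, finish, release) + p[j][i]

def earliest_release(j, i, p, done, finish, release):
    """Earliest release time est of operation i of workpiece j at state s_t."""
    if done[j][i]:
        return 0                       # already processed: set to 0
    if i == 0:
        return release[j]              # first operation (release assumed 0)
    if done[j][i - 1]:
        # Predecessor finished: release at its completion time.
        return completion_lower_bound(j, i - 1, p, done, finish, release)
    # Predecessor not finished: shift its release by its processing time.
    return earliest_release(j, i - 1, p, done, finish, release) + p[j][i - 1]
```

For a single workpiece with times (3, 2, 4) whose first operation finished at time 3, the bound on the last operation is 3 + 2 + 4 = 9 and its earliest release is 3 + 2 = 5.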
These 5 features represent the state of the DJSP at any time: the five [n, m] feature matrices are each reshaped into an [n×m, 1] matrix and concatenated into an [n×m, 5] matrix, which is then normalized by a normalization formula to obtain the state features.
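The reshape-and-concatenate step can be sketched with numpy. Min-max scaling per column is an assumption here; the patent's exact normalization formula is not reproduced in this excerpt.

```python
import numpy as np

def build_state_features(features):
    """Assemble the [n*m, 5] state-feature matrix from five [n, m] matrices.

    Columns are min-max scaled to [0, 1] (an assumed normalization).
    """
    cols = [f.reshape(-1, 1) for f in features]   # each [n, m] -> [n*m, 1]
    state = np.concatenate(cols, axis=1)          # -> [n*m, 5]
    lo, hi = state.min(axis=0), state.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    return (state - lo) / np.where(hi > lo, hi - lo, 1.0)
```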
The actions are as follows: number of workpiecesFor n, the DJSP problem of m number of work pieces each contains n×m number of work pieces, one work piece is selected by each decision point agent. According to the definition of DJSP problem, every workpiece can only finish the processing of one procedure at any time. Therefore, the operation space a at time t t Is a set formed by the next working procedure of all unfinished workpieces, and the action space is gradually reduced along with the gradual processing of the workpieces. As shown in fig. 3 (a), step O 11 ,O 21 And O 31 Has been scheduled, so the action space is { O } 12 ,O 22 ,O 32 ,O 41 ,O 51 ,O 61 -a }; as shown in FIG. 3 (b), in the step O 22 Selected and scheduling is completed, then the action space becomes { O } 12 ,O 23 ,O 32 ,O 41 ,O 51 ,O 61 -a }; as shown in FIG. 3 (c), if step O 23 Scheduled, the action space becomes { O } 12 ,O 32 ,O 41 ,O 51 ,O 61 And the size of the motion space is reduced by one.
State transition: when an agent selects an action, it is necessary to determine its earliest start time so that the completion time of the current process is minimized. As shown in FIG. 4, step O circled with a solid line 11 ,O 21 And O 31 The scheduling has been completed and the remaining unfinished scheduled steps are circled with dashed lines. As shown in fig. 4 (a), in state s 3 Action a 3 =O 22 Is from the state space { O 12 ,O 22 ,O 32 ,O 41 ,O 51 ,O 61 Selected from the group. Procedure O 22 The earliest start processing time of (2) is 3, i.e. procedure O 21 Is completed in machine M 2 In the process O 22 Earlier than procedure O 31 When the processing time is minimized, the state of the extracted graph is changed from that shown in fig. 4 (a) to that shown in fig. 4 (b), step O 31 The starting time of (2) is changed from 2 to 6.
Rewarding: the reward is a feedback signal of the environment indicating the quality of the decision made by the agent at the current decision point. The goal of reinforcement learning is to make the current rewards earned by the agent as large as possible, hopefullyIs the largest. Wherein the bonus function at time t is defined as R (s t ,a t )=H(s t )-H(s t+1 ) H (·) is a quality assessment of the different states. In the invention, the maximum completion time of DJSP directly reflects the production efficiency of workshops. By combining the distributed characteristics of DJSP, a reward function based on the maximum completion time of each factory is designed. The maximum completion time of each plant is equal to the maximum completion time of all the processes in the plant, i.e. C k =max(C LB (O ji )|fac(O ji )=k),k∈[1,f]. When all the working procedures are scheduled, the maximum completion time is C max =max(C k ). At decision point t, if action a t In the corresponding process, H(s) t )=C k (s t ). The reward function of the invention is R (s t ,a t )=C k (s t )-C k (s t+1 ). As can be seen by the reward function, maximizing the jackpot equates to minimizing the maximum completion time.
(4) GNN-based policies
A conventional PDR selects a process for scheduling with probability 1 at each step. The invention instead designs a policy π(a_t | s_t) that outputs a probability distribution over the action space. To optimize the policy, the policy parameters θ are determined through training to obtain the policy π_θ(a_t | s_t) with optimal parameters.
Graph embedding: an embedding is a compressed representation of information, and the invention employs the Graph Isomorphism Network (GIN), a graph neural network, to extract features of the disjunctive graph. Given a graph G = (V, C ∪ D), through K iterations GIN computes a p-dimensional embedding vector for every node in the set V. The node-update formula of GIN is h_v^(k) = MLP^(k)((1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1)), where h_v^(k) is the compressed information of node v after k iterations, h_v^(0) is the raw feature of node v, MLP^(k) is a multi-layer perceptron network whose parameters θ_k are obtained at the k-th iteration, ε^(k) is a learnable parameter, and N(v) is the neighborhood of node v.
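One GIN update iteration can be sketched in numpy. As an assumption, a single linear layer with ReLU stands in for MLP^(k); the patent does not fix the MLP architecture in this excerpt.

```python
import numpy as np

def gin_layer(h, adj, W, eps):
    """One GIN iteration: h_v <- MLP((1 + eps) * h_v + sum of neighbor h_u).

    h: [|V|, p] node embeddings; adj: [|V|, |V|] 0/1 adjacency matrix;
    W: weight matrix standing in for the MLP; eps: learnable scalar.
    """
    agg = (1.0 + eps) * h + adj @ h   # self term plus neighborhood sum
    return np.maximum(agg @ W, 0.0)   # MLP stand-in: linear layer + ReLU
```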
Information aggregation: the node information after K iterations is aggregated with an average pooling function, h_g(s_t) = (1/|V|) · Σ_{v ∈ V} h_v^(K). In any state s_t, the raw feature of a process O_ji (O_ji ∈ V) is the 5-dimensional vector formed by the normalized values of its 5 features, obtained from the normalization formula. After K iterations, the embedding of process O_ji is h_{O_ji}^(K), and the global graph embedding h_g(s_t) is obtained from the average pooling function.
Action selection: the action space is a set composed of the current processes of all the unfinished machined workpieces. The invention is thatThe probability distribution of the action space is calculated by using an Actor Network (Actor Network) composed of two layers of multi-layer perceptrons. Integrating node embedded information and global embedded information to obtainAnd inputting the probability into an Actor Network, scoring each action by the Actor Network, calculating the probability of each action being selected by a softmax function, and selecting the action with the highest probability for output.
Reinforcement learning method: the Actor-Critic network is a highly effective reinforcement learning method that combines the policy gradient with temporal-difference learning. The Actor Network is a policy function that learns a policy so that the agent obtains as high a reward from the environment as possible. The Critic Network is a value function used to evaluate the quality of the current policy. Based on the value function, the Actor-Critic network updates its parameters once per step. The invention adopts the Asynchronous Advantage Actor-Critic (A3C) network to realize self-learning of the policy and thereby update the network parameters.
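The advantage-based losses that A3C-style training optimizes can be sketched as below; this is a synchronous single-worker illustration, and the discount factor and the absence of entropy regularization are simplifying assumptions:

```python
# Sketch of the advantage actor-critic loss terms:
# advantage A_t = R_t - V(s_t); actor loss = -mean(log pi(a_t|s_t) * A_t);
# critic loss = mean(A_t^2). Pure-NumPy illustration of the losses only.
import numpy as np

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """log_probs/values/rewards: per-step arrays over one episode."""
    returns = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):    # discounted return R_t
        running = rewards[t] + gamma * running
        returns[t] = running
    adv = returns - values                     # advantage estimate A_t
    actor_loss = -(log_probs * adv).mean()     # policy-gradient term
    critic_loss = (adv ** 2).mean()            # value-regression term
    return actor_loss, critic_loss

al, cl = a2c_losses(np.log([0.5, 0.4]), np.array([1.0, 0.5]), np.array([0.0, 1.0]))
print(round(float(al), 3), round(float(cl), 5))  # -> 0.226 0.12505
```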
S2, training the constructed scheduling rule generation model according to the pre-acquired data set to obtain a trained scheduling rule generation model; the method specifically comprises the following steps:
(5) Model training
The scheduling rule generation model is trained with the Adam optimizer.
For DJSPs of different scales, a separate model is trained for each scale; each problem scale is iterated 1000 times, with one parameter update of the model per iteration. Every 10 iterations, the model is validated on 50 pre-designed fixed test cases; if the current model improves the average of the 50 test-case results, the current model is saved.
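The training schedule above can be sketched as follows; `train_step`, `evaluate` and `save` are hypothetical stand-ins for the actual update, validation and checkpointing code:

```python
def train(model, scales, train_step, evaluate, save, n_iters=1000):
    for scale in scales:                      # one model per problem scale
        best_avg = float("inf")
        for it in range(1, n_iters + 1):
            train_step(model, scale)          # one parameter update (Adam)
            if it % 10 == 0:                  # validate on the 50 fixed cases
                avg = evaluate(model, scale)
                if avg < best_avg:            # average improved -> save model
                    best_avg = avg
                    save(model, scale)

# Toy run with stub callbacks counting what happens.
calls = {"updates": 0, "saves": 0}
averages = iter([120.0, 110.0, 115.0, 105.0] + [106.0] * 6)

def stub_train(model, scale):
    calls["updates"] += 1

def stub_eval(model, scale):
    return next(averages)

def stub_save(model, scale):
    calls["saves"] += 1

train(None, ["10x10"], stub_train, stub_eval, stub_save, n_iters=100)
print(calls)  # -> {'updates': 100, 'saves': 3}: saved only when the average improved
```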
(6) Model effect verification
The saved model can be directly used for solving the DJSP problem, so that the real-time scheduling requirement is met.
To verify the practical application effect of the invention, the TA standard benchmark set is selected for simulation tests, and the invention is compared with 8 classical PDRs, 5 meta-heuristic algorithms and 3 RL algorithms.
Figs. 5-7 show the comparison results of the invention against the other algorithms in solving the TA datasets with 2, 3 and 4 factories. As the results in the figures show, the GNN- and RL-based PDR generation method can solve DJSPs of different scales and has clear advantages over traditional PDRs, meta-heuristic algorithms and related RL algorithms.
Besides verifying the effectiveness of the invention through the designed experiments, its generalization is verified by solving large-scale problems with models trained on problems of different scales. The experimental results are shown in Table 2; the data in the table show that, for the same problem, the objective function values obtained by different models differ little, so the invention generalizes well.
Table 2 model generalization verification data
The response variable in Figs. 5-7 is the relative percentage increase (RPI), computed as RPI = (Method_sol - Best_sol) / Best_sol × 100, where Method_sol is the maximum completion time of the current instance obtained by algorithm Method, and Best_sol is the best value of the maximum completion time of the current instance over all compared algorithms.
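For concreteness, the RPI computation can be sketched in one line (the values below are made-up makespans, not experimental data):

```python
def rpi(method_sol, best_sol):
    """Relative percentage increase of an algorithm's makespan over the best one."""
    return (method_sol - best_sol) / best_sol * 100.0

print(rpi(1100, 1000))  # -> 10.0 (10% worse than the best compared algorithm)
```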
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method for generating a priority scheduling rule of a distributed job shop is characterized by comprising the following steps:
constructing a scheduling rule generation model for making decisions on the distributed job shop scheduling problem, wherein:
representing the distributed job shop scheduling problem as a disjunctive graph: each factory corresponds to one sub-disjunctive graph; the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents both the factory assignment and the sequencing of the operations within each factory, and each node of the disjunctive graph contains its assigned factory information;
solving the disjunctive graph through a Markov decision model: the Markov decision model updates the disjunctive graph through multiple decisions, gradually completing all nodes in the graph to obtain the final solution; during the decision process, features of the disjunctive graph are extracted by a graph neural network, and action decisions are made by an actor network;
training the constructed scheduling rule generation model according to a pre-acquired data set, iteratively updating the parameters of the graph neural network and the actor network, to obtain a trained scheduling rule generation model;
and generating priority scheduling rules for the distributed job shop through the trained scheduling rule generation model.
2. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein in the scheduling rule generation model, the factory allocation of each workpiece is determined first, and the disjunctive graph is then solved by the Markov decision model; the factory allocation method for the workpieces comprises:
respectively calculating, for each workpiece, the total processing time required to complete all of its operations, and sorting the workpieces in ascending order of total processing time; then placing the first f workpieces into the f factories in turn, f being the total number of factories;
for the remaining workpieces, calculating the total processing time of all workpieces in each factory, and allocating the foremost workpiece in the current order to the factory with the smallest total processing time; this process is repeated until the factory allocation of all workpieces is completed.
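A sketch of this allocation heuristic (the `proc_times` input format, a list of per-operation processing times for each workpiece, is an illustrative assumption):

```python
# Factory allocation of claim 2: sort workpieces by total processing time
# ascending, seed the f factories with the first f, then greedily assign the
# rest to the least-loaded factory.
def assign_factories(proc_times, f):
    totals = [sum(ops) for ops in proc_times]
    order = sorted(range(len(proc_times)), key=lambda j: totals[j])  # ascending
    assignment = {}
    load = [0.0] * f                          # total processing time per factory
    for rank, j in enumerate(order):
        k = rank if rank < f else min(range(f), key=lambda x: load[x])
        assignment[j] = k
        load[k] += totals[j]
    return assignment

# Three workpieces with totals 5, 2 and 10; two factories.
print(assign_factories([[3, 2], [1, 1], [5, 5]], 2))  # -> {1: 0, 0: 1, 2: 0}
```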
3. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein the Markov decision model includes a state feature at each moment of the decision process;
in the disjunctive graph at any decision point t, each node contains 5 features, and the 5 features constitute the state feature; the 5 features are specifically:
1) the processing time p_ji of the operation O_ji corresponding to node v;
2) a binary variable b(v, s_t): if the operation O_ji corresponding to node v has been scheduled at time t, then b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) the factory fac(v, s_t) of the operation O_ji corresponding to node v: if a workpiece is assigned to factory k, the factory feature of every operation of that workpiece at time t is denoted fac(v, s_t) = k;
4) the lower bound c_LB(O_ji, s_t) of the completion time of the operation O_ji corresponding to node v;
when operation O_ji is the first operation of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of operation O_ji;
when operation O_ji is not the first operation of workpiece J_j, the following is judged: if operation O_ji has finished processing at time t, the estimated lower bound of its completion time equals its actual completion time; otherwise, the lower bound is computed as c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where c_LB(O_j,i-1, s_t) is the estimated completion-time lower bound of the operation preceding O_ji and p_ji is the processing time of O_ji;
5) the earliest release time est(O_ji, s_t) of the operation O_ji corresponding to node v;
in the initial state s_0: when O_ji is the first operation of workpiece J_j, its earliest release time is 0; when O_ji is not the first operation of J_j, its earliest release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where est(O_j,i-1, s_0) is the earliest release time of the preceding operation O_j,i-1 and p_j,i-1 is the processing time of O_j,i-1;
in a state s_t with t ≠ 0: when O_ji has finished processing, its earliest release time is 0; when O_ji has not been processed and its preceding operation O_j,i-1 has finished processing, the earliest release time of O_ji equals the completion time of O_j,i-1; when O_ji has not been processed and O_j,i-1 has not finished processing, the earliest release time of O_ji is the earliest release time of O_j,i-1 plus the processing time p_j,i-1 of O_j,i-1.
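The recursive completion-time lower bound of feature 4) can be sketched for a single workpiece as follows (the `done_at` mapping of finished operations to their actual completion times is an illustrative assumption):

```python
# Lower bound of feature 4): first operation -> workpiece release time plus its
# own processing time; later operations -> predecessor's bound plus own
# processing time; finished operations -> their actual completion time.
def completion_lb(release, proc, done_at):
    """release: workpiece release time; proc: list of processing times;
    done_at: dict op_index -> actual completion time for finished operations."""
    lb = []
    for i, p in enumerate(proc):
        if i in done_at:
            lb.append(done_at[i])             # finished: actual completion time
        elif i == 0:
            lb.append(release + p)            # first operation of the workpiece
        else:
            lb.append(lb[i - 1] + p)          # c_LB(O_ji) = c_LB(O_j,i-1) + p_ji
    return lb

# Workpiece released at 0 with 3 operations; the first finished at time 3.
print(completion_lb(0, [3, 2, 4], {0: 3}))  # -> [3, 5, 9]
```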
4. The method for generating a priority scheduling rule for a distributed job shop according to claim 3, wherein the Markov decision model further includes an action, a state transition and a reward at each moment of the decision process; specifically:
action: the action space A_t at decision point t is the set of the next operations of all unfinished workpieces;
state transition: the operation selected by the action is inserted into a feasible position such that its earliest start time is minimized;
reward: at decision point t, if action a_t corresponds to an operation in factory k, the current reward is R(s_t, a_t) = C_k(s_t) - C_k(s_{t+1}), where R(s_t, a_t) is the reward value obtained by selecting action a_t in state s_t, C_k(s_t) is the maximum completion time of factory k at time t, and C_k(s_{t+1}) is the maximum completion time of factory k at time t+1, after action a_t has been taken.
5. The method for generating a priority scheduling rule for a distributed job shop according to claim 3, wherein the 5 features are combined into the state feature as follows:
each feature matrix has shape [n, m] and is reshaped into a one-dimensional matrix of shape [n×m, 1], where n is the total number of workpieces and m is the total number of operations per workpiece; the one-dimensional matrices corresponding to the 5 features are spliced into a matrix of shape [n×m, 5], which is then normalized to obtain the state feature.
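A sketch of this assembly step; since the normalization formula itself is not given in this excerpt, column-wise min-max normalization is an assumption:

```python
# Splice five [n, m] feature matrices into a normalized [n*m, 5] state feature.
import numpy as np

def build_state_features(features):
    """features: list of five [n, m] arrays -> normalized [n*m, 5] array."""
    cols = [f.reshape(-1, 1) for f in features]      # each flattened to [n*m, 1]
    x = np.concatenate(cols, axis=1)                 # spliced into [n*m, 5]
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid divide-by-zero
    return (x - lo) / span                           # min-max per column (assumed)

n, m = 2, 3
feats = [np.arange(n * m, dtype=float).reshape(n, m) + i for i in range(5)]
out = build_state_features(feats)
print(out.shape, out.min(), out.max())  # -> (6, 5) 0.0 1.0
```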
6. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein during the decision process, features of the disjunctive graph are extracted by a graph neural network and input into the actor network; the actor network scores each action, the probability of each action being selected is computed from the scores by a softmax function, and the action with the highest selection probability is output.
7. A method of generating a priority scheduling rule for a distributed job shop as claimed in any one of claims 1 to 6, wherein the scheduling rule generation model is trained by an Adam optimizer.
8. The method for generating a priority scheduling rule for a distributed job shop according to claim 7, wherein when training the scheduling rule generation model, the network loss function is calculated through the asynchronous advantage actor-critic network, the learning rate is dynamically adjusted by the Adam optimizer, and the parameters of the graph neural network and the actor network are updated.
9. A priority scheduling rule generating system for a distributed job shop, comprising a processor for executing the priority scheduling rule generating method for a distributed job shop as claimed in any one of claims 1-8.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the priority scheduling rule generating method of a distributed job shop according to any one of claims 1-8.
CN202310439782.8A 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop Pending CN116500986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310439782.8A CN116500986A (en) 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop

Publications (1)

Publication Number Publication Date
CN116500986A true CN116500986A (en) 2023-07-28

Family

ID=87316053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310439782.8A Pending CN116500986A (en) 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop

Country Status (1)

Country Link
CN (1) CN116500986A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116957172A (en) * 2023-09-21 2023-10-27 山东大学 Dynamic job shop scheduling optimization method and system based on deep reinforcement learning
CN116993028A (en) * 2023-09-27 2023-11-03 美云智数科技有限公司 Workshop scheduling method and device, storage medium and electronic equipment
CN117057569A (en) * 2023-08-21 2023-11-14 重庆大学 Non-replacement flow shop scheduling method and device based on neural network
CN117555306A (en) * 2024-01-11 2024-02-13 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system

Similar Documents

Publication Publication Date Title
CN116500986A (en) Method and system for generating priority scheduling rule of distributed job shop
Luo et al. Energy-efficient scheduling for multi-objective flexible job shops with variable processing speeds by grey wolf optimization
CN112734172B (en) Hybrid flow shop scheduling method based on time sequence difference
CN107767022B (en) Production data driven dynamic job shop scheduling rule intelligent selection method
CN111756653B (en) Multi-coflow scheduling method based on deep reinforcement learning of graph neural network
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN111353646B (en) Steelmaking flexible scheduling optimization method, system, medium and equipment with switching time
CN114565247B (en) Workshop scheduling method, device and system based on deep reinforcement learning
CN111160755B (en) Real-time scheduling method for aircraft overhaul workshop based on DQN
CN112348314A (en) Distributed flexible workshop scheduling method and system with crane
Du et al. Collaborative optimization of service scheduling for industrial cloud robotics based on knowledge sharing
CN115454005A (en) Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
Pol et al. Global Reward Design for Cooperative Agents to Achieve Flexible Production Control under Real-time Constraints.
CN117331700B (en) Computing power network resource scheduling system and method
CN114611897A (en) Intelligent production line self-adaptive dynamic scheduling strategy selection method
Iklassov et al. On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling.
CN116562584A (en) Dynamic workshop scheduling method based on Conv-lasting and generalization characterization
CN116796964A (en) Method for solving job shop scheduling problem based on generation countermeasure imitation study
CN116300756A (en) Double-target optimal scheduling method and system for flexible manufacturing workshop with transportation robot
CN117057528A (en) Distributed job shop scheduling method based on end-to-end deep reinforcement learning
CN116151581A (en) Flexible workshop scheduling method and system and electronic equipment
CN115933568A (en) Multi-target distributed hybrid flow shop scheduling method
Wei et al. Composite rules selection using reinforcement learning for dynamic job-shop scheduling
Marchesano et al. Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination